License: Apache Software License
How to set up
If you are a Dataiku and Neo4j admin user, follow these configuration steps right after you install the plugin. If you are not an admin, you can forward this to your admin and scroll down to the How to use section.
Create a Neo4j server configuration preset
All components of the plugin require that you know the credentials to connect to the Neo4j database (URI, username, password). You need to enter these parameters in a preset.
In Dataiku DSS, navigate to Plugins > Neo4j > Settings > Neo4j server configuration and create a preset with your Neo4j credentials.
You can create multiple presets that connect to different Neo4j databases and you can set specific permissions to them.
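To make the three preset parameters concrete, here is a minimal Python sketch of the information a preset holds, with a basic sanity check. The `Neo4jPreset` class and its `validate` method are hypothetical names used for illustration only; they are not part of the plugin, and the values shown are placeholders.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Neo4jPreset:
    """Hypothetical stand-in for a plugin preset (not the plugin's own class)."""
    uri: str
    username: str
    password: str

    def validate(self) -> None:
        parsed = urlparse(self.uri)
        # Assumption: only bolt:// URIs are accepted, since those are the
        # only ones the plugin documentation reports as tested.
        if parsed.scheme != "bolt":
            raise ValueError(f"expected a bolt:// URI, got {self.uri!r}")
        if not parsed.hostname:
            raise ValueError("the URI has no host")
        if not self.username or not self.password:
            raise ValueError("username and password are both required")


# Placeholder values; replace them with your own server details.
preset = Neo4jPreset("bolt://localhost:7687", "neo4j", "secret")
preset.validate()
```

Checking the values before saving a preset can help catch a mistyped URI early, before a recipe fails at run time.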
Create a connection to Neo4j import directory
To use the two DSS recipes that export nodes and relationships from DSS datasets to Neo4j, you need to create an SCP/SFTP connection in DSS to the Neo4j import directory. The export recipes write their output to folders stored in this connection. The Dataiku Unix user must have SSH access to the machine hosting Neo4j.
In Dataiku DSS, navigate to Administration > Connections > + New connection > SCP/SFTP and create a new connection with the Neo4j server address, the Unix user, and key or password authentication, then enter the path to the Neo4j import directory.
How to use
We will explain the plugin components using a graph of football transfers as an example.
Export relationships recipe
We have the following dataset of football transfers in DSS:
We want to create 2 types of nodes (Player and Club) and 2 types of relationships (TRANSFERS_TO and TRANSFERS_FROM) from this dataset in order to have the following graph in Neo4j:
First, create an Export relationships recipe from the + RECIPE button or from the right panel if your dataset is selected. You must create an output folder that is stored in the SCP/SFTP connection to Neo4j.
This recipe will create new nodes and relationships (and add their properties) from the input dataset. If some nodes and/or relationships already exist in the Neo4j database, then only the new properties are added and the new relationships are attached to the existing nodes.
The following recipe is configured to create the (Player)-[TRANSFERS_TO]->(Club) relationships with 2 properties (timestamp and fee).
- Select a preset defined in the plugin settings that contains your Neo4j server URI and credentials.
Source – Relationship -> Target
- Enter the source and target node labels (they can be the same) and the relationship type that you want to create in Neo4j.
- Primary key: Select the dataset column that corresponds to the primary key of the nodes. It is used to check whether a node already exists, so it should be unique. The primary key column cannot contain null or empty values.
- Additional properties: Select the columns that correspond to other node properties that you want to add in Neo4j.
- Additional properties: Select the columns that correspond to other relationship properties that you want to add in Neo4j.
- You can choose to rename the properties in Neo4j instead of using the DSS column names (useful when the source and target nodes have the same label).
- For example, when the source and target nodes have the same label but come from 2 different columns in the DSS dataset, you can map both the source and target columns to the same property name.
Similarly, we can create the (Player)-[TRANSFERS_FROM]->(Club) relationships.
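The settings above map naturally onto Cypher `MERGE` clauses. The following Python sketch builds such a statement from a label, key, and property selection. It is only an approximation under stated assumptions, not the query the plugin actually runs: `build_relationship_merge` is a hypothetical helper, `clubId` is an assumed column name, and rows are assumed to be passed as a `$rows` query parameter.

```python
import re

# Labels, types, and property names are inlined into the statement,
# so only plain identifiers are accepted.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(name):
    """Reject names that cannot be safely inlined into a Cypher statement."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def build_relationship_merge(source_label, source_key, target_label,
                             target_key, relationship_type,
                             relationship_properties=()):
    """Return a Cypher statement that merges one relationship per input row."""
    for name in (source_label, source_key, target_label, target_key,
                 relationship_type, *relationship_properties):
        _check(name)
    lines = [
        "UNWIND $rows AS row",
        # MERGE reuses a node when one with this key exists, else creates it.
        f"MERGE (src:{source_label} {{{source_key}: row.{source_key}}})",
        f"MERGE (tgt:{target_label} {{{target_key}: row.{target_key}}})",
        f"MERGE (src)-[rel:{relationship_type}]->(tgt)",
    ]
    if relationship_properties:
        assignments = ", ".join(
            f"rel.{prop} = row.{prop}" for prop in relationship_properties
        )
        lines.append(f"SET {assignments}")
    return "\n".join(lines)


# "clubId" is an assumed column name; "timestamp" and "fee" are the
# relationship properties mentioned in the example above.
query = build_relationship_merge(
    "Player", "playerId", "Club", "clubId", "TRANSFERS_TO",
    ["timestamp", "fee"],
)
print(query)
```

Note that this sketch merges on the relationship type alone, so repeated transfers between the same player and club would collapse into a single relationship. Whether the plugin behaves the same way is not stated here.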
Export nodes recipe
We also have a dataset in DSS containing information about the football players:
We want to use this dataset to add new properties to the Player nodes in the Neo4j database.
Now, create an Export nodes recipe. You must create an output folder that is stored in the SCP/SFTP connection to Neo4j.
This recipe will create new nodes (and add their properties) from the input dataset. If some nodes already exist in the Neo4j database, then only the new properties are added.
The following recipe is configured to add 3 new properties to the Player nodes (name, country and position), using playerId as the primary key to match nodes.
Settings work the same way as in the Export relationships recipe.
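The matching behavior described above (reuse a node when its primary key already exists, otherwise create it, then attach the selected properties) can be illustrated with a small in-memory Python simulation. This is a sketch, not plugin code: `merge_nodes` is a hypothetical function, the sample rows are made up, and it assumes that an incoming value replaces a same-named property, as a Cypher `SET` would.

```python
def merge_nodes(existing, rows, primary_key, properties):
    """Merge rows into `existing`, a dict mapping key value -> property dict."""
    for row in rows:
        key = row.get(primary_key)
        # The primary key column cannot hold null or empty values.
        if key is None or key == "":
            raise ValueError(f"row has an empty primary key {primary_key!r}")
        # Reuse the node when the key is already known, else create it.
        node = existing.setdefault(key, {primary_key: key})
        for name in properties:
            if row.get(name) is not None:
                # Assumption: incoming values overwrite same-named properties.
                node[name] = row[name]
    return existing


# One Player node already exists, e.g. from the Export relationships recipe.
graph = {"p1": {"playerId": "p1"}}
rows = [  # made-up sample rows
    {"playerId": "p1", "name": "Player One", "country": "FR", "position": "Forward"},
    {"playerId": "p2", "name": "Player Two", "country": "BR", "position": "Goalkeeper"},
]
merge_nodes(graph, rows, "playerId", ["name", "country", "position"])
```

After the call, the existing node `p1` carries the three new properties and a new node `p2` has been created, which mirrors how the two export recipes can be chained on the same label.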
The complete flow described above looks like this:
Here is an example of what the newly created graph looks like in Neo4j Desktop:
The macro is useful when you simply need to interact with the database and no output dataset is required. For instance, you can use it to perform maintenance tasks on the database, create indexes, test Cypher queries, or delete nodes.
The Dataset can be used to import data from Neo4j into DSS. It is available from the flow in + Dataset > Neo4j. This connector lets you retrieve either all nodes (and their properties) with a given label or all relationships (and their properties) with a given type. If you don’t enter a node label or relationship type, it retrieves a list of all node labels or all relationship types instead.
The resulting DSS Dataset can be used in a larger Flow and blended with other data sources as required. For example, it could be the input of an ML model.
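The retrieval rules of the Dataset connector can be approximated with the Cypher queries below, built by a small Python helper. This is a sketch under assumptions: `build_read_query` is a hypothetical name, and the plugin's actual queries may differ. The `db.labels()` and `db.relationshipTypes()` procedures are built into Neo4j.

```python
import re

# The label or type is inlined into the query, so only plain
# identifiers are accepted.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_read_query(kind, name=""):
    """Return a Cypher query for nodes or relationships.

    kind: "nodes" or "relationships".
    name: a node label or relationship type; empty means "list what exists".
    """
    if kind not in ("nodes", "relationships"):
        raise ValueError(f"unknown kind: {kind!r}")
    if not name:
        # No label or type given: list the available ones instead.
        if kind == "nodes":
            return "CALL db.labels()"
        return "CALL db.relationshipTypes()"
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    if kind == "nodes":
        return f"MATCH (n:{name}) RETURN properties(n) AS properties"
    return f"MATCH ()-[r:{name}]->() RETURN properties(r) AS properties"


player_query = build_read_query("nodes", "Player")
transfer_query = build_read_query("relationships", "TRANSFERS_TO")
label_listing = build_read_query("nodes")
```

A query such as `player_query` could also be tested by hand, for example through the macro, before configuring the Dataset.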
Plugin limitations and improvements
- Only URIs of the form “bolt://” have been tested, with username/password authentication.
- Neo4j 4.1 or higher is required.
You are welcome to contribute to this plugin. Please feel free to use GitHub issues and pull requests.