19th
Neo4j data integration with Pentaho Kettle
While testing the Neo4j JDBC driver, I wanted to find out how an ETL tool like Pentaho Kettle handles Neo4j’s Cypher queries when pulling data out of the graph database.
Here are the steps for connecting Kettle to the Neo4j database.
First, copy the Neo4j JDBC driver into the Kettle JDBC folder:
data-integration/libext/JDBC
Next, we define a new generic JDBC data source and set the driver class and the connection URL:
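The same two settings can be checked outside of Kettle with a few lines of plain JDBC. This is only a minimal sketch: the driver class `org.neo4j.jdbc.Driver` and the URL form `jdbc:neo4j://host:port/` are assumptions based on the Neo4j JDBC driver of that time, and the class and method names are made up for illustration, so verify the values against the driver jar you copied.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class Neo4jJdbcSketch {
    // Assumed driver class name; check it against the jar in libext/JDBC.
    static final String DRIVER_CLASS = "org.neo4j.jdbc.Driver";

    // Builds a connection URL of the assumed form jdbc:neo4j://host:port/
    static String connectionUrl(String host, int port) {
        return "jdbc:neo4j://" + host + ":" + port + "/";
    }

    // Runs a Cypher query over JDBC and prints the first column of each row.
    static void printFirstColumn(String url, String cypher)
            throws ClassNotFoundException, SQLException {
        Class.forName(DRIVER_CLASS); // fails here if the driver jar is not on the classpath
        try (Connection con = DriverManager.getConnection(url);
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(cypher)) {
            while (rs.next()) {
                System.out.println(rs.getObject(1));
            }
        }
    }
}
```

With a local server on the default port 7474, a call such as `Neo4jJdbcSketch.printFirstColumn(Neo4jJdbcSketch.connectionUrl("localhost", 7474), "START n=node(*) RETURN ID(n)")` should list the node ids if the driver class and URL are right.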
Then we can build up a simple Kettle transformation like this:
In the first step, “Cypher Query”, we can enter the query and preview the data:
Then we pass the input step's fields on to the output step, which creates an Excel file. This runs quite well.
Generally, this is a workable data integration solution. I will do some more tests. Performance seems to be ok so far.
There is only one minor issue for now. Returning a node or relationship as a string causes an exception. Queries like these do not work:
START n=node(*) RETURN n
START r=relationship(*) RETURN r
But who needs the JSON-like result string if you can access all the properties? I think this will be fixed soon anyway.
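Until then, the workaround is to return the properties explicitly instead of the whole node. The small helper below sketches how such a query could be generated from a list of property names. The class name and the property names `name` and `age` are hypothetical, and the helper assumes every node actually has the listed properties.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CypherProjection {
    // Builds a query that returns the node id plus the given properties
    // as separate columns, instead of returning the node itself.
    static String nodeQuery(List<String> properties) {
        String columns = properties.stream()
                .map(p -> "n." + p + " AS " + p)
                .collect(Collectors.joining(", "));
        String idColumn = "ID(n) AS id";
        return "START n=node(*) RETURN "
                + (columns.isEmpty() ? idColumn : idColumn + ", " + columns);
    }

    public static void main(String[] args) {
        // Hypothetical property names, for illustration only.
        System.out.println(nodeQuery(Arrays.asList("name", "age")));
    }
}
```

The generated query can be pasted into the “Cypher Query” step, where each property arrives as its own field.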









