Comments (6)

stephanf commented on July 18, 2024

Hi,

I don't think it is that easy. You do need the IDs, but that is not all.

What if there are nodes that are in the database but not in the import? Should they be deleted? If so, the relationships of those nodes should be deleted too, to keep the data consistent.

That also means that an incremental update would not be feasible.

How should updates to nodes and relationships be handled?

A possible solution could be to load the import into memory temporarily with an embedded server and compare it with the existing Neo4j database (also embedded).

Then you could iterate through both graphs, going through all nodes and comparing their properties and relationships.

That would need a lot of resources and would probably not be as fast as importing into an empty database.
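
As a rough illustration of the comparison described above, here is a minimal Python sketch. It is not something batch-import provides, and it does not use the Neo4j API: it assumes both graphs have already been loaded into plain dictionaries keyed by node ID, with relationships as (start, type, end) tuples. The function name, the data layout, and the KNOWS relationship type are made up for the example.

```python
def diff_graphs(db_nodes, db_rels, new_nodes, new_rels):
    """Compare an existing graph with an import, both keyed by node ID.

    Nodes are dicts {node_id: properties}; relationships are sets of
    (start_id, rel_type, end_id) tuples. This layout is an assumption.
    """
    # Nodes only in the import must be created.
    to_create = {i: p for i, p in new_nodes.items() if i not in db_nodes}
    # Nodes in both graphs whose properties differ must be updated.
    to_update = {i: p for i, p in new_nodes.items()
                 if i in db_nodes and db_nodes[i] != p}
    # Nodes only in the database are candidates for deletion.
    to_delete = set(db_nodes) - set(new_nodes)
    # Relationships missing from the import, or touching a deleted node,
    # have to go as well to keep the data consistent.
    rels_to_delete = {r for r in db_rels
                      if r not in new_rels
                      or r[0] in to_delete or r[2] in to_delete}
    rels_to_create = new_rels - db_rels
    return to_create, to_update, to_delete, rels_to_create, rels_to_delete


# Hypothetical data, loosely based on the sample names in this thread.
db_nodes = {0: {"name": "Michael"}, 1: {"name": "Selina"}, 2: {"name": "Rana"}}
db_rels = {(0, "KNOWS", 1), (0, "KNOWS", 2)}
new_nodes = {0: {"name": "Michael", "age": "37"}, 1: {"name": "Selina"},
             3: {"name": "Selma"}}
new_rels = {(0, "KNOWS", 1), (0, "KNOWS", 3)}

create, update, delete, rel_create, rel_delete = diff_graphs(
    db_nodes, db_rels, new_nodes, new_rels)
```

Both graphs have to fit in memory at once for this to work, which is where the resource cost comes from.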

Maybe I am overcomplicating this, but I think that because of the (in theory) endless complexity of a graph database, there is no simple way, unlike with a relational database.

But how is this problem solved in an enterprise environment? Incremental loads and updates happen everywhere.

Regards,
Stephan

On 21.03.2013, at 15:42, Max De Marzi [email protected] wrote:

We're always getting requests for this. Maybe a way to specify the node id and rel id that the import should start from.



from batch-import.

robinloxley1 commented on July 18, 2024

May I know how this issue has been solved?

jexp commented on July 18, 2024

With a config option; see the readme.

aroyc commented on July 18, 2024

Hey Michael, I don't see any option in the documentation to keep nodes unique.
E.g., if I set

batch_import.keep_db=true
and run sample/import.sh twice, nodes and rels with the same properties are created again:

neo4j-sh (?)$ MATCH (a)-[r]->(b) RETURN a,b LIMIT 25;
+-------------------------------------------------------------------------------------+
| a                                                 | b                               |
+-------------------------------------------------------------------------------------+
| Node[0]{works_on:"neo4j",age:"37",name:"Michael"} | Node[1]{age:"14",name:"Selina"} |
| Node[0]{works_on:"neo4j",age:"37",name:"Michael"} | Node[2]{age:"6",name:"Rana"}    |
| Node[0]{works_on:"neo4j",age:"37",name:"Michael"} | Node[3]{age:"4",name:"Selma"}   |
| Node[1]{age:"14",name:"Selina"}                   | Node[2]{age:"6",name:"Rana"}    |
| Node[2]{age:"6",name:"Rana"}                      | Node[3]{age:"4",name:"Selma"}   |
| Node[4]{works_on:"neo4j",age:"37",name:"Michael"} | Node[5]{age:"14",name:"Selina"} |
| Node[4]{works_on:"neo4j",age:"37",name:"Michael"} | Node[6]{age:"6",name:"Rana"}    |
| Node[4]{works_on:"neo4j",age:"37",name:"Michael"} | Node[7]{age:"4",name:"Selma"}   |
| Node[5]{age:"14",name:"Selina"}                   | Node[6]{age:"6",name:"Rana"}    |
| Node[6]{age:"6",name:"Rana"}                      | Node[7]{age:"4",name:"Selma"}   |
+-------------------------------------------------------------------------------------+

I want to know which specific option to set in batch.properties so that nodes with the same properties don't get created twice.
To put it in a nutshell, my question is: how can I use batch insert to make sure the same nodes/rels won't be created twice?

Thanks in advance!

jexp commented on July 18, 2024

Sorry, batch insertion is not about creating unique nodes. That has not been a focus so far because it would also reduce performance.

The only thing out of the box that I can think of is to control the node IDs externally (with id:id as the first column) and then use the same externally driven IDs again.

If you start doing index lookups during batch insertion, your performance will drop a lot.
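
One way to picture the externally controlled IDs is a small preprocessing step that runs before the import files are written. The Python sketch below is only an assumption about how that could look, not part of batch-import: the function name, the choice of `name` as the identifying property, and the idea of skipping rows that already have an ID are all hypothetical. Only the `id:id` column name comes from the suggestion above.

```python
def assign_ids(rows, key, id_map):
    """Give each distinct key value a stable ID; return only unseen rows.

    id_map maps key values to node IDs and is updated in place. It would
    have to be persisted between runs (for example as a JSON file).
    Assumes IDs are contiguous from 0 and never removed from id_map.
    """
    fresh = []
    for row in rows:
        value = row[key]
        if value in id_map:
            # Already assigned earlier (this run or a previous one): skip.
            continue
        id_map[value] = len(id_map)
        # Put the externally controlled ID first, as the id:id column.
        fresh.append({"id:id": id_map[value], **row})
    return fresh


# Hypothetical rows matching the sample data shown earlier in the thread.
rows = [
    {"name": "Michael", "age": "37", "works_on": "neo4j"},
    {"name": "Selina", "age": "14"},
    {"name": "Rana", "age": "6"},
    {"name": "Selma", "age": "4"},
]

id_map = {}
first_run = assign_ids(rows, "name", id_map)   # four rows, IDs 0 to 3
second_run = assign_ids(rows, "name", id_map)  # nothing new to import
```

The lookup happens in a plain dictionary outside the database, so the import itself does not have to do any index lookups.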

aroyc commented on July 18, 2024

Okay!
Thanks a lot for your prompt reply! :)

Actually, I've built a graph DB with a large collection of words. Now I'm trying to integrate DBpedia and ran into this situation.
