JanusGraph CSV Loader

JanusGraph CSV Loader is a Java utility for bulk-loading data into a JanusGraph database.

Usage

java com.github.jespersm.janusgraph.csvimport.Import [-D] [--add-label-property] [--ignore-missing-nodes]
                    [--threads=<poolSize>] -c=<configFile> [-n=<limitRows>]
                    [--edgeLabels=<edgeLabels>[,<edgeLabels>...]]...
                    --nodes=<label=file1,file2>,<label=file1,file2>...
                    [--relationships=<file1,file2>]...

Where options are:

  -D, --drop-before-import                 Drop the graph before importing it
      --add-label-property                 Add a _label property to each vertex and edge, copying the "real" label.
      --edgeLabels=<edgeLabels> ...        Import a CSV file with edge definitions
      --ignore-missing-nodes               Skip edges whose vertex IDs have not been imported.
      --nodes=<label=file1,file2,...>      Import vertices from file1 etc, using the given label name, and the
                                           headers (from the first file)
      --relationships=<file1,file2,...>    Import edges/relationships from file1, etc.

      --threads=<poolSize>                 Number of threads to run concurrently when importing vertices/edges
  -c, --config=<configFile>                Config file used to open the graph through JanusGraphFactory
  -n, --limit-rows=<limitRows>             Only import this many vertices/edges per type, useful for testing

CSV file format

Each CSV file starts with a header line that declares every column. A declaration consists of a column name, an optional type and an optional column "tag", separated by colons. Example: fleep:uuid:ID

The name corresponds to the property key name. If the name is blank, no property is created.
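The declaration format above can be sketched as a small parser. This is only an illustration, not the importer's own code: the class ColumnDecl is hypothetical, the "string" default type is an assumption (this README does not state a default type), and the rule that a two-part declaration whose second part is a known tag is read as name plus tag is inferred from the ":TYPE" column in the edge example further down.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical holder for one parsed header declaration; not the importer's own class. */
final class ColumnDecl {
    // Tags listed in this README.
    private static final Set<String> TAGS = new HashSet<>(Arrays.asList(
            "ID", "DATA", "INDEX", "UNIQUE", "IGNORE", "START_ID", "END_ID", "TYPE"));

    final String name; // property key name; empty means no property is created
    final String type; // lower-cased type name, e.g. "uuid"
    final String tag;  // upper-cased tag, e.g. "ID"

    private ColumnDecl(String name, String type, String tag) {
        this.name = name;
        this.type = type;
        this.tag = tag;
    }

    /** Parses "name[:type][:tag]". */
    static ColumnDecl parse(String declaration) {
        // Limit -1 keeps empty parts, so ":TYPE" yields an empty name.
        String[] parts = declaration.trim().split(":", -1);
        String name = parts[0].trim();
        String type = "string"; // assumed default; the README does not state one
        String tag = "DATA";    // documented default tag
        if (parts.length == 2 && TAGS.contains(parts[1].trim().toUpperCase(Locale.ROOT))) {
            // Two parts where the second is a known tag, as in ":TYPE" (inferred rule).
            tag = parts[1].trim().toUpperCase(Locale.ROOT);
        } else {
            if (parts.length > 1 && !parts[1].trim().isEmpty()) {
                type = parts[1].trim().toLowerCase(Locale.ROOT);
            }
            if (parts.length > 2 && !parts[2].trim().isEmpty()) {
                tag = parts[2].trim().toUpperCase(Locale.ROOT);
            }
        }
        return new ColumnDecl(name, type, tag);
    }

    public static void main(String[] args) {
        ColumnDecl d = ColumnDecl.parse("fleep:uuid:ID");
        System.out.println(d.name + " / " + d.type + " / " + d.tag);
    }
}
```

Under these assumptions, "name:string" parses to the DATA tag, and ":TYPE" parses to an empty name with the TYPE tag.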

The type specifies how the JanusGraph schema is initialized:

  • string
  • int
  • long
  • float
  • double
  • boolean
  • byte
  • short
  • char
  • datetime (maps to java.util.Date)
  • uuid (maps to binary UUID).

The following are also recognized, but mapped to string, since JanusGraph doesn't support JSR-310 date/time types natively:

  • date
  • localtime
  • time
  • localdatetime
  • duration
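As a rough illustration of the two lists above, the type names can be mapped to Java classes with a lookup table. The class TypeMapping and its method are hypothetical, the use of boxed classes such as Integer and Long is an assumption about how the schema is declared, and rejecting unknown names with an exception is an assumption too, since this README does not say how the importer reacts to an unrecognized type.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.UUID;

/** Hypothetical lookup from the README's type names to Java classes. */
final class TypeMapping {
    private static final Map<String, Class<?>> TYPES = new HashMap<>();

    static {
        TYPES.put("string", String.class);
        TYPES.put("int", Integer.class);
        TYPES.put("long", Long.class);
        TYPES.put("float", Float.class);
        TYPES.put("double", Double.class);
        TYPES.put("boolean", Boolean.class);
        TYPES.put("byte", Byte.class);
        TYPES.put("short", Short.class);
        TYPES.put("char", Character.class);
        TYPES.put("datetime", Date.class);
        TYPES.put("uuid", UUID.class);
        // Recognized JSR-310 style names are stored as plain strings.
        for (String t : new String[] {"date", "localtime", "time", "localdatetime", "duration"}) {
            TYPES.put(t, String.class);
        }
    }

    /** Returns the Java class for a type name; failing on unknown names is an assumption. */
    static Class<?> javaTypeFor(String typeName) {
        Class<?> javaType = TYPES.get(typeName.trim().toLowerCase(Locale.ROOT));
        if (javaType == null) {
            throw new IllegalArgumentException("Unknown column type: " + typeName);
        }
        return javaType;
    }

    public static void main(String[] args) {
        System.out.println(javaTypeFor("datetime").getName());
        System.out.println(javaTypeFor("duration").getName());
    }
}
```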

The tag is how the value is used:

  • ID - recognize this column as the vertex ID (not supported for edges). Also generates a unique index.
  • DATA - just store this column (the default)
  • INDEX - index this property (limited to the containing vertex label)
  • UNIQUE - make unique index for this property (under the containing vertex label)
  • IGNORE - don't import this column

For files containing information about edges only:

  • START_ID - specifies the ID of the starting vertex ("out")
  • END_ID - specifies the ID of the ending vertex
  • TYPE - The label of the edge

Remember that the type declared for START_ID and END_ID must match the type of the ID column of the vertices they refer to. The importer does not warn about a mismatch.
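Because the importer does not flag a mismatch, it can help to compare the headers yourself before a long import. The following is a minimal sketch of such a pre-flight check, not part of the importer: the class IdTypeCheck and its methods are hypothetical, it only understands fully spelled name:type:tag declarations, it splits the header on plain commas (no quoted header cells), and it compares one vertex header against one edge header.

```java
import java.util.Locale;

/** Hypothetical pre-flight check that ID, START_ID and END_ID declare the same type. */
final class IdTypeCheck {

    /** Returns the declared type of the first column carrying the given tag, or null. */
    static String typeOfTag(String headerLine, String tag) {
        for (String declaration : headerLine.split(",")) {
            String[] parts = declaration.trim().split(":", -1);
            // Only three-part declarations (name:type:tag) are considered here.
            if (parts.length == 3 && parts[2].trim().equalsIgnoreCase(tag)) {
                return parts[1].trim().toLowerCase(Locale.ROOT);
            }
        }
        return null;
    }

    /** True when the vertex ID type equals both edge endpoint types. */
    static boolean idTypesMatch(String nodeHeader, String edgeHeader) {
        String idType = typeOfTag(nodeHeader, "ID");
        return idType != null
                && idType.equals(typeOfTag(edgeHeader, "START_ID"))
                && idType.equals(typeOfTag(edgeHeader, "END_ID"));
    }

    public static void main(String[] args) {
        String nodes = "id:long:ID,name:string";
        String edges = "id:long:START_ID,id:long:END_ID,:TYPE,since:date";
        System.out.println("ID types match: " + idTypesMatch(nodes, edges));
    }
}
```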

Examples:

Assume you have this settings file, import.properties:

storage.backend=berkeleyje
storage.directory=some-folder

(See the JanusGraph configuration documentation for what to put into this file.)

And you have this file containing persons, nodes.csv:

id:long:ID,name:string
1,"Bob"
2,"Alice"
3,"Charlie"

And you have this file containing relationships, edges.csv:

id:long:START_ID,id:long:END_ID,:TYPE,since:date
1,2,"KNOWS",2018-06-24
2,3,"MET",2014-03-11

Now, you can import the graph by using Gradle directly:

$ ./gradlew run --args="--config=import.properties --nodes=Person=nodes.csv --relationships=edges.csv"

This will load the three nodes and two edges into a JanusGraph graph database backed by BerkeleyJE, as specified in the config file.
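To make the role of START_ID, END_ID and --ignore-missing-nodes concrete, here is an in-memory sketch of how edge rows can be resolved against previously loaded vertex IDs. It does not use JanusGraph and is not the importer's implementation: the class EdgeResolution is hypothetical, vertices are represented only by their name, and throwing an exception when the flag is off is an assumption, since this README only documents what happens when the flag is on.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of resolving edge rows against loaded vertex IDs. */
final class EdgeResolution {

    /**
     * Each row holds START_ID, END_ID and the edge label, in that order.
     * Returns one description per edge whose two endpoints are known.
     */
    static List<String> resolve(Map<Long, String> verticesById,
                                List<String[]> edgeRows,
                                boolean ignoreMissingNodes) {
        List<String> edges = new ArrayList<>();
        for (String[] row : edgeRows) {
            String out = verticesById.get(Long.parseLong(row[0]));
            String in = verticesById.get(Long.parseLong(row[1]));
            if (out == null || in == null) {
                if (ignoreMissingNodes) {
                    continue; // skip the edge, as --ignore-missing-nodes describes
                }
                // Assumed behavior without the flag; the README does not specify it.
                throw new IllegalStateException("Edge refers to an unknown vertex ID");
            }
            edges.add(out + " -[" + row[2] + "]-> " + in);
        }
        return edges;
    }

    public static void main(String[] args) {
        Map<Long, String> vertices = new HashMap<>();
        vertices.put(1L, "Bob");
        vertices.put(2L, "Alice");
        vertices.put(3L, "Charlie");

        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1", "2", "KNOWS"});
        rows.add(new String[] {"2", "3", "MET"});

        for (String edge : resolve(vertices, rows, false)) {
            System.out.println(edge);
        }
    }
}
```

The IDs are parsed as long here because both example files declare their ID columns as long; a different declared type would need a matching key type in the lookup.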

You can use the "shadowJar" task in Gradle to build a fat JAR containing all the dependencies, so that the importer can run without Gradle.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

APLv2

Contributors

jespersm, artu-ole
