grv1207 / graph-kd

Graph-KD is a general graph exploring tool

Home Page: http://biomedical.dfki.de

License: GNU General Public License v2.0

Java 2.32% Shell 0.83% Batchfile 0.58% Python 2.31% HTML 88.53% CSS 2.82% JavaScript 2.60%

graph-kd's Introduction

Graph-KD: Exploring Relational Information for Knowledge Discovery.

Roland Roller, Gaurav Vashisth, Philippe Thomas, He Wang, Michael Mikhailov and Mark Stevenson [To appear] Proceedings of the International Semantic Web Conference 2019.

Graph-KD is a general graph exploring tool with the following functionalities:

  1. Finding the K shortest paths between two nodes.
  2. Exploring paths around a given node.
  3. Inferring relations between the source and target nodes of a given path.

The knowledge graph used to build this tool is the UMLS (Unified Medical Language System) dataset, which is freely available from the UMLS website.

In order to use Graph-KD you should have:

  • Neo4j
  • JVM
  • Python 3.x

The following tutorial supports Unix systems only.

I) [ Neo4j-Setup ]:


1.1) Neo4j 3.2 requires the Java 8 runtime. To install Java 8 on Ubuntu 16.04:

  • echo "deb http://httpredir.debian.org/debian jessie-backports main" | sudo tee -a /etc/apt/sources.list.d/jessie-backports.list
  • sudo apt-get update
  • sudo add-apt-repository ppa:webupd8team/java
  • sudo apt-get update
  • sudo apt-get install oracle-java8-installer

1.2) Add the Neo4j repository:

  • wget -O - https://debian.neo4j.org/neotechnology.gpg.key | sudo apt-key add -
  • echo 'deb http://debian.neo4j.org/repo stable/' | sudo tee -a /etc/apt/sources.list.d/neo4j.list
  • sudo apt-get update

1.3) Installing Neo4j:

  • sudo apt-get install neo4j=3.2.2

1.4) Check Neo4j installation:

  • systemctl start neo4j starts Neo4j.
  • systemctl status neo4j shows whether Neo4j is running.
  • systemctl stop neo4j stops Neo4j (after confirming that Neo4j runs, please stop the service again).

1.5) Change the password of the Neo4j server (important)

  • After starting the Neo4j service (systemctl start neo4j), open http://localhost:7474/browser/ in your local browser.

  • Log in with Username: neo4j and Password: neo4j. You will then be redirected to set a new password.
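
Before opening the browser, it can help to confirm that something is listening on the HTTP port from the URL above. The following is a minimal sketch, assuming the default host and port (localhost, 7474); the function name is hypothetical and not part of Graph-KD.

```python
import socket


def neo4j_http_reachable(host="localhost", port=7474, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    This only shows that some process accepts connections on the port;
    it does not prove that the process is Neo4j.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False
```

If the function returns False right after systemctl start neo4j, the service may still be starting; checking systemctl status neo4j is the more reliable test.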

1.6) Add server plugin (Java package) to the Neo4j:

  • Stop the Neo4j instance before performing the following step (systemctl stop neo4j)
  • cp com.dfki.LT.OntologyExplorer-1.0-SNAPSHOT.jar /var/lib/neo4j/plugins/

II) [ Neo4j-Database ]:


2.1) Create a database for Neo4j (if you don't have a graph.db folder)

  • The database was created with Neo4j's default import function.
  • To create a database for Neo4j we need 4 files (2 header files and 2 content files)

Header files (attached with the repository in Header_files)

Create ( nheader.txt )

Content: :ID,ConceptID,ConceptName

  • :ID ==> Node ID
  • ConceptID,ConceptName ==> Properties of the node

Create ( rheader.txt )

Content: :START_ID,:END_ID,:TYPE,RelationLabel,weight

  • :START_ID,:END_ID ==> Node ID
  • :TYPE ==> Vocabulary
  • RelationLabel ==> Relation Name
  • weight ==> Edge weight
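
The two header files can also be generated rather than typed by hand. This is a minimal sketch that writes exactly the two Content lines quoted above; the function name and the target directory argument are hypothetical.

```python
from pathlib import Path

# Header lines copied from the "Content:" entries above.
NODE_HEADER = ":ID,ConceptID,ConceptName"
RELATION_HEADER = ":START_ID,:END_ID,:TYPE,RelationLabel,weight"


def write_header_files(directory):
    """Write nheader.txt and rheader.txt into `directory` and return their paths."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    written = []
    for name, content in (("nheader.txt", NODE_HEADER),
                          ("rheader.txt", RELATION_HEADER)):
        path = directory / name
        path.write_text(content + "\n", encoding="utf-8")
        written.append(path)
    return written
```

Calling write_header_files(".") places both files in the current directory, next to node.txt and relation.txt.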

Content files

Create the node file (node.txt)

This file contains the node data in 3 comma-separated columns, without a header row.

Create the relation file (relation.txt)

This file contains the relation data in 4 comma-separated columns, without a header row.
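
Since the import depends on every row having the expected number of columns, a quick check before importing can save a failed run. The following is a minimal sketch, assuming the content files use standard CSV quoting; the function name is hypothetical, and the expected column count is passed in by the caller (3 for node.txt and 4 for relation.txt, as stated above).

```python
import csv


def find_bad_rows(path, expected_columns):
    """Return (line_number, column_count) for each row with the wrong width.

    Blank lines are reported too, because they parse as rows with 0 columns.
    """
    bad = []
    with open(path, newline="", encoding="utf-8") as handle:
        for line_number, row in enumerate(csv.reader(handle), start=1):
            if len(row) != expected_columns:
                bad.append((line_number, len(row)))
    return bad
```

An empty result from find_bad_rows("node.txt", 3) means every row has three fields; it says nothing about whether the values themselves are correct.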

2.1.1) Commands to create node.txt and relation.txt

  • Get MRREL.RRF and MRCONSO.RRF from the UMLS installation directory.
  • To create relation.txt and node.txt we need the UMLS_final file, which is obtained by running the script below with Python 3.5.
  • First run Create_Graph_DB/DBcreation.py 0 <path of UMLS_relation> <path of MRREL> <path of MRCONSO>
  • The first run generates the UMLS_Analyse file, which contains relationships that have an equal number of records, where the number of records is greater than 2.
  • Manually merge the records from UMLS_Analyse into UMLS_relation.
  • Then run Create_Graph_DB/DBcreation.py 1 <path of UMLS_relation> <path of MRREL> <path of MRCONSO>
  • The result is node.txt and relation.txt.
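
The two invocations differ only in the first argument, so they can be assembled by one helper. This is a minimal sketch that builds the argument list in the order given above; the helper name is hypothetical, and the three paths are whatever applies on your machine.

```python
import sys


def dbcreation_command(run, umls_relation, mrrel, mrconso, python=sys.executable):
    """Build the argument list for one DBcreation.py run (0 = analyse, 1 = create)."""
    if run not in (0, 1):
        raise ValueError("run must be 0 or 1")
    return [
        python,
        "Create_Graph_DB/DBcreation.py",
        str(run),
        str(umls_relation),
        str(mrrel),
        str(mrconso),
    ]
```

The list can be passed to subprocess.run(command, check=True) from the repository root. The manual merge of UMLS_Analyse into UMLS_relation still has to happen between run 0 and run 1.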

2.2) Command to create a Neo4j DB

neo4j-import --into graph.db --nodes:<Node label> "nheader.txt,node.txt" --relationships "rheader.txt,relation.txt" --skip-duplicate-nodes true

  • --into Name of the generated database
    • graph.db Recommended name
  • --nodes:UMLSConcepts Node label. Note: a single label can be given directly on the command line as shown here; if you have many labels, you must provide them via a file.
    • "nheader.txt,node.txt" Node header file (nheader.txt) and node content file (node.txt)
  • --relationships
    • "rheader.txt,relation.txt" Relation header file (rheader.txt) and relation content file (relation.txt)
  • --skip-duplicate-nodes
    • true Skip duplicate nodes

After running this command, a graph.db folder is generated in the current directory. Move this folder into /var/lib/neo4j/data/databases/
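
To avoid quoting mistakes, the import command can be assembled as an argument list. This is a minimal sketch with a hypothetical helper name; the defaults mirror the file names used above. Note that the Neo4j 3.x import tool spells the duplicate-handling option --skip-duplicate-nodes.

```python
def neo4j_import_command(node_label,
                         into="graph.db",
                         node_files=("nheader.txt", "node.txt"),
                         relationship_files=("rheader.txt", "relation.txt"),
                         skip_duplicate_nodes=True):
    """Build the neo4j-import argument list; header file first, then content file."""
    return [
        "neo4j-import",
        "--into", into,
        "--nodes:" + node_label, ",".join(node_files),
        "--relationships", ",".join(relationship_files),
        "--skip-duplicate-nodes", "true" if skip_duplicate_nodes else "false",
    ]
```

For example, neo4j_import_command("UMLSConcepts") yields the command shown above with the label filled in; it can be run with subprocess.run(command, check=True).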

III) [ Additional information ]:


3.1) (optional) Change the default Neo4j Database PATH:

Default location of the folder is: /var/lib/neo4j/data/databases/

To change the path of the folder:

  • Open neo4j.conf by typing gedit /etc/neo4j/neo4j.conf

  • Replace the line dbms.directories.data=/var/lib/neo4j/data with dbms.directories.data=<folder of your choice>

  • Copy the graph.db into <folder of your choice>/databases/
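
The configuration change can also be applied programmatically. The following is a minimal sketch that works on the text of neo4j.conf and leaves reading and writing the file to the caller; the function name is hypothetical, and appending the setting when no active line exists is an assumption of this sketch, not documented Graph-KD behaviour.

```python
def set_data_directory(conf_text, new_dir):
    """Return conf_text with dbms.directories.data pointing at new_dir.

    Only the first active (uncommented) setting is replaced; commented lines
    are kept as they are. If no active setting exists, one is appended.
    """
    key = "dbms.directories.data"
    out = []
    replaced = False
    for line in conf_text.splitlines():
        name = line.split("=", 1)[0].strip()
        if name == key and not replaced:
            out.append(f"{key}={new_dir}")
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(f"{key}={new_dir}")
    return "\n".join(out) + "\n"
```

Editing /etc/neo4j/neo4j.conf normally requires root privileges, and Neo4j should be stopped while the data directory is changed.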

3.2) Command to copy the graph.db folder into Neo4j's database folder

  • cp -r graph.db /var/lib/neo4j/data/databases/

3.3) (IMPORTANT) Every time you add a new graph.db folder, you must stop the Neo4j instance first; otherwise you will corrupt the database.
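
Steps 3.2 and 3.3 can be combined so that the copy never happens while Neo4j is running. This is a minimal sketch, assuming systemctl manages the service as in section I; the function name and the run parameter (a hook for substituting the command runner) are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def install_graph_db(source, databases_dir, run=subprocess.run):
    """Stop Neo4j, then copy the graph.db folder into databases_dir.

    Refuses to overwrite an existing database folder. Neo4j is not restarted
    here; start it again with `systemctl start neo4j` when ready.
    """
    source = Path(source)
    target = Path(databases_dir) / source.name
    if target.exists():
        raise FileExistsError(f"{target} already exists; move it away first")
    # Stop the service before touching the database directory.
    run(["systemctl", "stop", "neo4j"], check=True)
    shutil.copytree(source, target)
    return target
```

A typical call would be install_graph_db("graph.db", "/var/lib/neo4j/data/databases/"), run with sufficient privileges to stop the service and write to that directory.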

3.4) (IMPORTANT!!!) To benefit from Neo4j's fast graph traversal, the database must be indexed. This can be done with the following Cypher statement (general form first, then the concrete statement for this database):

  • CREATE INDEX ON :<Node label>(<Node property>)
  • CREATE INDEX ON :UMLSConcepts(ConceptID)
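
If the label or property differs from the example, the statement can be generated from the general form above. This is a minimal sketch with a hypothetical helper name; restricting names to plain identifiers is a safety assumption of the sketch, since Cypher itself also allows backtick-quoted names.

```python
import re

# Plain identifiers only: letters, digits and underscores, not starting with a digit.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def create_index_statement(label, node_property):
    """Return the Neo4j 3.x style CREATE INDEX statement for label(property)."""
    for name in (label, node_property):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a plain identifier: {name!r}")
    return f"CREATE INDEX ON :{label}({node_property})"
```

The resulting string can be pasted into the Neo4j browser at http://localhost:7474/browser/.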

3.5) Inference.

Before running the inference script, make sure you have the flask and requests modules installed.

  • python Inference/Inference.py

3.6) User-Interface:

graph-kd's Issues

DBcreation.py 0 requires own output file and unnecessarily paths of MRCONSO + MRREL

olli@olli-ThinkPad-T520:~/Downloads/Programme/Graph-KD-master$ python Create_Graph_DB/DBcreation.py 0
usage: DBcreation.py [-h]
                 Run [Run ...] UMLS_relation [UMLS_relation ...] MRREL
                     [MRREL ...] MRCONSO [MRCONSO ...]
DBcreation.py: error: the following arguments are required: UMLS_relation, MRREL, MRCONSO

This is what I get, when I try to merge the records from UMLS_Analyse into UMLS_relation.

The description in 2.1.1 says that merging is done by running "DBcreation.py 0", but the error states that the path of the merged file must be given before the file has been created.
Also, one of the help texts of the Python script says "path of the MRREL.RRF file, if not current directory". But the paths are required as parameters even though the files are in the current directory.
So the arguments should probably be made optional.
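
One way the suggestion could look with argparse is sketched below. This is not the actual DBcreation.py parser: the argument names are taken from the usage message above, while the default for UMLS_relation and the help texts are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Parser in which only the run mode is required; paths fall back to defaults."""
    parser = argparse.ArgumentParser(prog="DBcreation.py")
    parser.add_argument("Run", type=int, choices=(0, 1),
                        help="0 = generate UMLS_Analyse, 1 = create node.txt and relation.txt")
    # nargs="?" makes a positional argument optional and applies `default` when omitted.
    parser.add_argument("UMLS_relation", nargs="?", default="UMLS_relation",
                        help="path of the UMLS_relation file (assumed default name)")
    parser.add_argument("MRREL", nargs="?", default="MRREL.RRF",
                        help="path of the MRREL.RRF file, if not current directory")
    parser.add_argument("MRCONSO", nargs="?", default="MRCONSO.RRF",
                        help="path of the MRCONSO.RRF file, if not current directory")
    return parser
```

With this layout, DBcreation.py 0 would look for all three files in the current directory. Because the arguments stay positional, a later path can only be given if the earlier ones are given as well; named options such as --mrrel would remove that restriction.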
