pbr / qtm

QTLTableMiner++ tool for mining tables in scientific articles

Home Page: https://www.research-software.nl/software/qtl-tableminer

License: Apache License 2.0

Java 99.75% Shell 0.25%
qtl candidate-genes europe-pmc text-mining scientific-articles solr ontologies

qtm's People

Contributors

arnikz, gnr1990, jspaaks, matthijsbrouwer, rfinkers

qtm's Issues

Use single file for synonyms

Currently, there are four copies of the synonyms.txt file in Solr cores:

./propTerms/conf/synonyms.txt
./statoTerms/conf/synonyms.txt
./solaLyco2/conf/synonyms.txt
./terms/conf/synonyms.txt

Use symlinks? Alternatively, modify this code (lines 28-32) to work with any number of Solr cores.
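
A minimal sketch of the symlink option, assuming the cores live under a common Solr home directory as <solrHome>/<core>/conf/synonyms.txt. The class and method names are hypothetical and not taken from the QTM code base, and whether Solr reads the file through a symlink in a given deployment is an assumption that would need checking.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of QTM.
final class SynonymLinker {
    /**
     * Replaces <solrHome>/<core>/conf/synonyms.txt in every listed core with a
     * symbolic link to one shared file. Returns the links that were created.
     */
    static List<Path> linkSynonyms(Path sharedFile, Path solrHome, List<String> cores)
            throws IOException {
        List<Path> links = new ArrayList<>();
        for (String core : cores) {
            Path conf = solrHome.resolve(core).resolve("conf");
            Files.createDirectories(conf);
            Path link = conf.resolve("synonyms.txt");
            Files.deleteIfExists(link); // drop the per-core copy, if any
            Files.createSymbolicLink(link, sharedFile.toAbsolutePath());
            links.add(link);
        }
        return links;
    }
}
```

Because the core names are passed in as a list, the same call works for four cores or for any other number.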

QTM does not compile

Building with Maven (3.0.5) fails:

mvn clean verify

[INFO] Scanning for projects...
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building XMLTABLE 2.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ XMLTABLE ---
[INFO] Deleting /home/arni/gitrepos/candYgene/QTM/target
[INFO] 
[INFO] --- maven-resources-plugin:2.3:resources (default-resources) @ XMLTABLE ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] 
[INFO] --- maven-compiler-plugin:2.0.2:compile (default-compile) @ XMLTABLE ---
[INFO] Compiling 17 source files to /home/arni/gitrepos/candYgene/QTM/target/classes
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.909s
[INFO] Finished at: Thu Jun 01 16:13:26 CEST 2017
[INFO] Final Memory: 9M/239M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:compile (default-compile) on project XMLTABLE: Compilation failure: Compilation failure:
[ERROR] /home/arni/gitrepos/candYgene/QTM/src/main/java/resultDb/qtlDb.java:[229,63] error: package nl.erasmusmc.biosemantics.tagger.recognize does not exist
[ERROR] 
[ERROR] /home/arni/gitrepos/candYgene/QTM/src/main/java/resultDb/qtlDb.java:[233,63] error: package nl.erasmusmc.biosemantics.tagger.recognize does not exist
[ERROR] 
[ERROR] /home/arni/gitrepos/candYgene/QTM/src/main/java/resultDb/qtlDb.java:[468,98] error: package nl.erasmusmc.biosemantics.tagger.recognize does not exist
[ERROR] 
[ERROR] /home/arni/gitrepos/candYgene/QTM/src/main/java/resultDb/qtlDb.java:[577,130] error: package nl.erasmusmc.biosemantics.tagger.recognize does not exist
[ERROR] 
[ERROR] /home/arni/gitrepos/candYgene/QTM/src/main/java/resultDb/qtlDb.java:[619,131] error: package nl.erasmusmc.biosemantics.tagger.recognize does not exist
[ERROR] 
[ERROR] /home/arni/gitrepos/candYgene/QTM/src/main/java/resultDb/qtlDb.java:[705,131] error: package nl.erasmusmc.biosemantics.tagger.recognize does not exist
[ERROR] 
[ERROR] /home/arni/gitrepos/candYgene/QTM/src/main/java/resultDb/qtlDb.java:[749,133] error: package nl.erasmusmc.biosemantics.tagger.recognize does not exist
[ERROR] 
[ERROR] /home/arni/gitrepos/candYgene/QTM/src/main/java/qtm/Table.java:[701,71] error: package nl.erasmusmc.biosemantics.tagger.recognize does not exist
[ERROR] 
[ERROR] /home/arni/gitrepos/candYgene/QTM/src/main/java/solrTagger/Annotate.java:[79,58] error: package nl.erasmusmc.biosemantics.tagger.recognize does not exist
[ERROR] 
[ERROR] /home/arni/gitrepos/candYgene/QTM/src/main/java/solrTagger/Annotate.java:[154,61] error: package nl.erasmusmc.biosemantics.tagger.recognize does not exist
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

Add Europe PMC endpoint to config

Currently, the URL is hard-coded in PmcMetaReader.java:

String API_PMCXML = "http://www.ebi.ac.uk/europepmc/webservices/rest/" + PMCID + "/fullTextXML";
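
One way this could look, as a sketch only: the property name europePmcEndpoint and the class below are hypothetical, and the default simply mirrors the URL that is hard-coded today.

```java
import java.util.Properties;

// Hypothetical helper, not part of QTM.
final class PmcEndpointConfig {
    // Assumed property name; it does not exist in the current config file.
    static final String KEY = "europePmcEndpoint";
    // Fallback equal to the currently hard-coded base URL.
    static final String DEFAULT = "http://www.ebi.ac.uk/europepmc/webservices/rest/";

    /** Builds the full-text XML URL for one article from the configured base URL. */
    static String fullTextUrl(Properties config, String pmcId) {
        String base = config.getProperty(KEY, DEFAULT).trim();
        if (!base.endsWith("/")) {
            base += "/";
        }
        return base + pmcId + "/fullTextXML";
    }
}
```

With an empty Properties object the result is the same URL as before, so existing setups would keep working without a config change.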

Add NULL type for empty column values in QTL tables

See https://www.sqlite.org/datatype3.html

Example db entry:

select * from Qtl where qtlId = 10
              qtlId = 10
 traitNameInArticle = Fruit weight
traitNameInOntology = fruit weight
           traitUri = http://purl.obolibrary.org/obo/TO_0002746
   chromosomeNumber = 
   markerAssociated = 
          markerUri = 
     geneAssociated = 11.39‡; 
            geneUri = 
      snpAssociated = 
             snpUri = null
              pmcId = PMC4321030
            tableId = PMC4321030_TABLE 2
          rowNumber = 26

Change empty ('') and 'null' strings to the NULL type.
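
A possible way to do this at insert time, sketched with plain JDBC. The helper names are hypothetical; the sketch also trims surrounding whitespace, which is an extra assumption beyond what is asked above.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;

// Hypothetical helper, not part of QTM.
final class NullableColumns {
    /** Maps null, blank, and the literal string "null" to Java null. */
    static String toNullable(String value) {
        if (value == null) {
            return null;
        }
        String trimmed = value.trim();
        if (trimmed.isEmpty() || trimmed.equalsIgnoreCase("null")) {
            return null;
        }
        return trimmed;
    }

    /** Binds a text column, storing SQL NULL instead of '' or 'null'. */
    static void bindText(PreparedStatement stmt, int index, String value)
            throws SQLException {
        String normalized = toNullable(value);
        if (normalized == null) {
            stmt.setNull(index, Types.VARCHAR);
        } else {
            stmt.setString(index, normalized);
        }
    }
}
```

In the example entry, chromosomeNumber (empty) and snpUri ('null') would both end up as NULL, while traitNameInArticle would be stored unchanged.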

QTM execution fails using the example article

The example given in the README does not seem to work for me. QTM exits prematurely with the following error message:

Error: Could not find or load main class tablInEx.TablInExMainGnr

Revise/normalize db schema

Current db schema/ER diagram here.

  • change the pmc_id data type from TEXT to NUMERIC and store IDs without the PMC prefix (e.g. PMC4266912 -> 4266912)
  • change the tab_id data type from TEXT to NUMERIC and store IDs without the PMC* prefix (e.g. PMC4266912_Table 1 -> 1)
  • the TRAIT_TABLE columns n_cols and n_rows seem redundant, as the counts can be obtained from the TRAIT table; they could be removed and generated by a view instead
  • change the col_id and row_id data types from TEXT to NUMERIC
  • both col_id and row_id should (auto)increment (using the app code), OR col_id and row_id should increment per table (tab_id) [DONE]
  • the type column in CELL_ENTRY is not really needed, as it could be derived from the value column
  • generate a new ER diagram (e.g. using DbSchema)
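
The two ID conversions in the list above could be sketched as follows. The class and method names are hypothetical, and the table pattern is only inferred from the two examples seen in this issue list (PMC4266912_Table 1 and PMC4321030_TABLE 2).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of QTM.
final class IdNormalizer {
    private static final Pattern PMC_ID = Pattern.compile("PMC(\\d+)");
    // Assumed format: PMC<digits>_Table <n>, in any letter case.
    private static final Pattern TABLE_ID =
            Pattern.compile("PMC\\d+_table\\s*(\\d+)", Pattern.CASE_INSENSITIVE);

    /** "PMC4266912" becomes 4266912. */
    static long pmcNumber(String pmcId) {
        Matcher m = PMC_ID.matcher(pmcId.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a PMC ID: " + pmcId);
        }
        return Long.parseLong(m.group(1));
    }

    /** "PMC4266912_Table 1" becomes 1. */
    static int tableNumber(String tabId) {
        Matcher m = TABLE_ID.matcher(tabId.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a table ID: " + tabId);
        }
        return Integer.parseInt(m.group(1));
    }
}
```

Note that a bare table number is only unique within one article, so the table key would then presumably need to be the pair (pmc_id, tab_id).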

QTM executes using the example but triggers an exception

./QTM -pmc PMC4266912 -o QTL_PMC4266912.db

...
Exception in thread "main" java.io.FileNotFoundException: /var/solr/data/terms/conf/synonyms.txt (No such file or directory)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at java.io.FileInputStream.<init>(FileInputStream.java:93)
	at solrAnnotator.AbbrevtoSynonyms.abbrevToSolrSynonyms(AbbrevtoSynonyms.java:39)
	at qtm.QtmMain.main(QtmMain.java:98)

Organize/clean-up files in the repo

For example, use src, config, doc and data directories. The original ontology files do not need to be stored in the repo; they could be fetched instead, e.g. with a bash script.

Most params in config unused

Config file: conf/configQtm.properties

I haven't found a reference in the code for these params:

...
#solr
serverSolr=http://localhost:8983/solr
solrProgram=solr/qtmSolrRun
core1Solr=core1
core2Solr=statoTerms
core3Solr=propTerms
core5Solr=solaLyco2
matchSolr=LONGEST_DOMINANT_RIGHT
typeSolr=dictionary
...
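
A rough way to automate this check, as a sketch: it treats a key as used only if it appears as a quoted string literal somewhere in the given source texts. That is an assumption about how the code reads its config, so keys that are assembled at run time would be reported as unused by mistake. The class name is hypothetical.

```java
import java.util.Collection;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of QTM.
final class UnusedConfigKeys {
    /** Returns the config keys that never occur as "key" in any source text. */
    static Set<String> unused(Properties config, Collection<String> sources) {
        Set<String> result = new TreeSet<>();
        for (String key : config.stringPropertyNames()) {
            String literal = "\"" + key + "\"";
            boolean referenced = false;
            for (String source : sources) {
                if (source.contains(literal)) {
                    referenced = true;
                    break;
                }
            }
            if (!referenced) {
                result.add(key);
            }
        }
        return result;
    }
}
```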

Improve exception handling and reporting

Re-running QTM on the same input raises the following exception:

...
Insert entry to the database.
-------------------------------------------------
java.sql.SQLException: [SQLITE_CONSTRAINT]  Abort due to constraint violation (UNIQUE constraint failed: ARTICLE.pmc_id)
	at org.sqlite.core.DB.newSQLException(DB.java:890)
	at org.sqlite.core.DB.newSQLException(DB.java:901)
	at org.sqlite.core.DB.execute(DB.java:810)
	at org.sqlite.core.DB.executeUpdate(DB.java:847)
	at org.sqlite.jdbc3.JDBC3PreparedStatement.executeUpdate(JDBC3PreparedStatement.java:86)
	at resultDb.QtlDb.insertArticleEntry(QtlDb.java:132)
	at qtm.QtmMain.main(QtmMain.java:128)
*************************************************
Article already exits, Please provide unique entries
*************************************************

The user should be notified as soon as possible, before the input articles are processed, rather than seeing this message only afterwards.
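
A sketch of such an early check, assuming it runs before any article is downloaded or parsed. The table and column names (ARTICLE, pmc_id) come from the error message above; the class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of QTM.
final class DuplicateArticleCheck {
    /** True if ARTICLE already holds a row with this pmc_id. */
    static boolean articleExists(Connection db, String pmcId) throws SQLException {
        try (PreparedStatement stmt =
                 db.prepareStatement("SELECT 1 FROM ARTICLE WHERE pmc_id = ?")) {
            stmt.setString(1, pmcId);
            try (ResultSet rows = stmt.executeQuery()) {
                return rows.next();
            }
        }
    }

    /** Returns the requested IDs that are already stored, in input order. */
    static List<String> alreadyStored(List<String> requested, Set<String> stored) {
        List<String> duplicates = new ArrayList<>();
        for (String id : requested) {
            if (stored.contains(id)) {
                duplicates.add(id);
            }
        }
        return duplicates;
    }
}
```

The main method could then report the duplicates and exit (or skip them) up front, instead of surfacing a raw SQLException stack trace after the processing has finished.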

Compilation fails

[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building Unnamed - com.wur:XMLTABLE:jar:2.0-SNAPSHOT
[INFO]    task-segment: [compile, install]
[INFO] ------------------------------------------------------------------------
[INFO] [resources:resources {execution: default-resources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] ------------------------------------------------------------------------
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] Failed to resolve artifact.

Unable to get dependency information: Unable to read the metadata file for artifact 'com.google.code.gson:gson:jar': Invalid JDK version in profile 'doclint-java8-disable': Unbounded range: [1.8, for project com.google.code.gson:gson-parent
  com.google.code.gson:gson:jar:2.6.2

from the specified remote repositories:
  central (http://repo1.maven.org/maven2)

Path to dependency: 
	1) com.wur:XMLTABLE:jar:2.0-SNAPSHOT

Incorrect output dbfile path

./QTM -pmc PMC4266912 -o QTL_PMC4266912.db

The database file is written to data/QTL_PMC4266912.db instead of ./QTL_PMC4266912.db.
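
The expected behaviour could be sketched like this: the -o argument is resolved against the current working directory and no fixed data/ folder is prepended. The class name is hypothetical, and how QTM currently builds the path is not shown in this report.

```java
import java.nio.file.Path;

// Hypothetical helper, not part of QTM.
final class OutputPath {
    /**
     * Resolves the -o argument against the working directory.
     * An absolute argument is returned unchanged (after normalization).
     */
    static Path resolve(String outputArg, Path workingDir) {
        return workingDir.resolve(outputArg).toAbsolutePath().normalize();
    }
}
```

A caller would typically pass Paths.get("") as the working directory, which toAbsolutePath() expands to the directory the program was started from.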
