pbr / qtm
QTLTableMiner++ tool for mining tables in scientific articles
Home Page: https://www.research-software.nl/software/qtl-tableminer
License: Apache License 2.0
Look into AbbrevtoSynonyms.java
/var/solr/data
-> installSolr/solrData
Related to #29
Currently, there are four copies of the synonyms.txt file in the Solr cores:
./propTerms/conf/synonyms.txt
./statoTerms/conf/synonyms.txt
./solaLyco2/conf/synonyms.txt
./terms/conf/synonyms.txt
Use symlinks? Alternatively, modify this code (lines 28-32) to work with any number of Solr cores.
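A minimal sketch of the second option, assuming the list of cores comes from configuration as a comma-separated value. `SynonymWriter` and its methods are hypothetical, not existing QTM code. It resolves `<solrData>/<core>/conf/synonyms.txt` for each core and appends the same lines to every file, failing if a core's `conf` dir does not exist.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Sketch only: class, methods and the comma-separated core list are hypothetical, not QTM code.
final class SynonymWriter {

    // Parses a comma-separated core list, e.g. "terms,propTerms,statoTerms,solaLyco2".
    static List<String> parseCores(String value) {
        List<String> cores = new ArrayList<>();
        for (String core : value.split(",")) {
            String trimmed = core.trim();
            if (!trimmed.isEmpty()) {
                cores.add(trimmed);
            }
        }
        return cores;
    }

    // Resolves <solrDataDir>/<core>/conf/synonyms.txt for each core.
    static List<Path> synonymFiles(Path solrDataDir, List<String> cores) {
        List<Path> files = new ArrayList<>();
        for (String core : cores) {
            files.add(solrDataDir.resolve(core).resolve("conf").resolve("synonyms.txt"));
        }
        return files;
    }

    // Appends the same synonym lines to every core; fails if a core's conf dir does not exist.
    static void appendSynonyms(Path solrDataDir, List<String> cores, List<String> lines) throws IOException {
        for (Path file : synonymFiles(solrDataDir, cores)) {
            if (!Files.isDirectory(file.getParent())) {
                throw new IOException("Solr core conf dir not found: " + file.getParent());
            }
            // CREATE + APPEND: create synonyms.txt if absent, otherwise add to the end.
            Files.write(file, lines, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }
}
```

With symlinks there would be a single real synonyms.txt and the loop would collapse to one write; the loop only makes sense if each core must keep its own copy.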
Maven (3.0.5):
mvn clean verify
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building XMLTABLE 2.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ XMLTABLE ---
[INFO] Deleting /home/arni/gitrepos/candYgene/QTM/target
[INFO]
[INFO] --- maven-resources-plugin:2.3:resources (default-resources) @ XMLTABLE ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO]
[INFO] --- maven-compiler-plugin:2.0.2:compile (default-compile) @ XMLTABLE ---
[INFO] Compiling 17 source files to /home/arni/gitrepos/candYgene/QTM/target/classes
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.909s
[INFO] Finished at: Thu Jun 01 16:13:26 CEST 2017
[INFO] Final Memory: 9M/239M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:compile (default-compile) on project XMLTABLE: Compilation failure: Compilation failure:
[ERROR] /home/arni/gitrepos/candYgene/QTM/src/main/java/resultDb/qtlDb.java:[229,63] error: package nl.erasmusmc.biosemantics.tagger.recognize does not exist
[ERROR]
[ERROR] /home/arni/gitrepos/candYgene/QTM/src/main/java/resultDb/qtlDb.java:[233,63] error: package nl.erasmusmc.biosemantics.tagger.recognize does not exist
[ERROR]
[ERROR] /home/arni/gitrepos/candYgene/QTM/src/main/java/resultDb/qtlDb.java:[468,98] error: package nl.erasmusmc.biosemantics.tagger.recognize does not exist
[ERROR]
[ERROR] /home/arni/gitrepos/candYgene/QTM/src/main/java/resultDb/qtlDb.java:[577,130] error: package nl.erasmusmc.biosemantics.tagger.recognize does not exist
[ERROR]
[ERROR] /home/arni/gitrepos/candYgene/QTM/src/main/java/resultDb/qtlDb.java:[619,131] error: package nl.erasmusmc.biosemantics.tagger.recognize does not exist
[ERROR]
[ERROR] /home/arni/gitrepos/candYgene/QTM/src/main/java/resultDb/qtlDb.java:[705,131] error: package nl.erasmusmc.biosemantics.tagger.recognize does not exist
[ERROR]
[ERROR] /home/arni/gitrepos/candYgene/QTM/src/main/java/resultDb/qtlDb.java:[749,133] error: package nl.erasmusmc.biosemantics.tagger.recognize does not exist
[ERROR]
[ERROR] /home/arni/gitrepos/candYgene/QTM/src/main/java/qtm/Table.java:[701,71] error: package nl.erasmusmc.biosemantics.tagger.recognize does not exist
[ERROR]
[ERROR] /home/arni/gitrepos/candYgene/QTM/src/main/java/solrTagger/Annotate.java:[79,58] error: package nl.erasmusmc.biosemantics.tagger.recognize does not exist
[ERROR]
[ERROR] /home/arni/gitrepos/candYgene/QTM/src/main/java/solrTagger/Annotate.java:[154,61] error: package nl.erasmusmc.biosemantics.tagger.recognize does not exist
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
Currently, the URL is hard-coded in PmcMetaReader.java:
String API_PMCXML = "http://www.ebi.ac.uk/europepmc/webservices/rest/" + PMCID + "/fullTextXML";
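A minimal sketch of making the base URL configurable, assuming it is read from the QTM properties file with the current hard-coded value as the default. The key `europePmcRestUrl` and the class `PmcUrlConfig` are hypothetical, not existing QTM settings.

```java
import java.util.Properties;

// Sketch only: PmcUrlConfig and the "europePmcRestUrl" key are hypothetical.
final class PmcUrlConfig {

    // Default copied from the string currently hard-coded in PmcMetaReader.java.
    static final String DEFAULT_BASE = "http://www.ebi.ac.uk/europepmc/webservices/rest/";

    // Hypothetical property key in the config file.
    static final String BASE_KEY = "europePmcRestUrl";

    // Builds <base>/<PMCID>/fullTextXML, tolerating a base URL with or without a trailing slash.
    static String fullTextXmlUrl(Properties config, String pmcId) {
        String base = config.getProperty(BASE_KEY, DEFAULT_BASE).trim();
        if (!base.endsWith("/")) {
            base = base + "/";
        }
        return base + pmcId + "/fullTextXML";
    }
}
```

If the key is absent, behaviour stays exactly as today; a line such as `europePmcRestUrl=https://...` in the properties file would override it.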
CSV -> *.csv
SQLite DB -> *.db
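A minimal sketch of that mapping, assuming the output format is chosen (or checked) from the output file name. `OutputFormat` is a hypothetical enum, not existing QTM code.

```java
import java.util.Locale;

// Sketch only: hypothetical mapping of output formats to file extensions.
enum OutputFormat {
    CSV(".csv"),
    SQLITE(".db");

    final String extension;

    OutputFormat(String extension) {
        this.extension = extension;
    }

    // Picks the format from the file name (case-insensitive); returns null for unknown extensions.
    static OutputFormat fromFileName(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (OutputFormat format : values()) {
            if (lower.endsWith(format.extension)) {
                return format;
            }
        }
        return null;
    }

    // Appends this format's extension unless the name already ends with it.
    String withExtension(String baseName) {
        return baseName.toLowerCase(Locale.ROOT).endsWith(extension) ? baseName : baseName + extension;
    }
}
```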
http://codeplex.com/SqlServerSamples
Name changed from TablesinXMI to QTM
See https://www.sqlite.org/datatype3.html
Example db entry:
select * from Qtl where qtlId = 10
qtlId = 10
traitNameInArticle = Fruit weight
traitNameInOntology = fruit weight
traitUri = http://purl.obolibrary.org/obo/TO_0002746
chromosomeNumber =
markerAssociated =
markerUri =
geneAssociated = 11.39‡;
geneUri =
snpAssociated =
snpUri = null
pmcId = PMC4321030
tableId = PMC4321030_TABLE 2
rowNumber = 26
Change empty ('') and 'null' strings to the SQL NULL type.
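A minimal sketch of doing this at insert time with plain JDBC, assuming all affected columns are bound as text through a `PreparedStatement`. `NullSafeBinder` is hypothetical, and treating whitespace-only values as empty is an extra assumption.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;

// Sketch only: hypothetical helper for binding text columns.
final class NullSafeBinder {

    // Returns null for null, empty/whitespace-only, or "null" (any case); otherwise the value unchanged.
    static String normalize(String value) {
        if (value == null) {
            return null;
        }
        String trimmed = value.trim();
        if (trimmed.isEmpty() || trimmed.equalsIgnoreCase("null")) {
            return null;
        }
        return value;
    }

    // Binds a text parameter; uses setNull so the database stores a real NULL, not the string "null".
    static void bindText(PreparedStatement stmt, int index, String value) throws SQLException {
        String normalized = normalize(value);
        if (normalized == null) {
            stmt.setNull(index, Types.VARCHAR);
        } else {
            stmt.setString(index, normalized);
        }
    }
}
```

In the example entry above, `chromosomeNumber` (empty) and `snpUri` (`null`) would both end up as NULL, so a query such as `select * from Qtl where snpUri is null` would match them.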
Is the list in the README complete?
Currently, the vocabs/ontologies in the Solr cores are named as follows:
propTerms
-> used in column classification (N.B. also using STATO!)
solaLyco
-> remove
solaLyco2
-> includes SGN gene/transcript IDs and URIs
statoTerms
-> STATO
terms
-> includes PATO, SPTO, PO and TO terms
See Solr data dir.
Synonyms need to be added from PO and TO
Example given in the README does not seem to work for me. QTM exits prematurely with the following error message:
Error: Could not find or load main class tablInEx.TablInExMainGnr
Change XMLTABLE-2.0-SNAPSHOT.jar to QTM.jar
Current db schema/ER diagram here.
code lines: 424-427
Config file: config/configQtm.properties
Setting textFiles=./ generates a [some PMCID].txt file. What's the purpose of this output file?
Current db schema as found in QTL_PMC4266912.db needs some fixes. See proposed schema below.
./QTM -pmc PMC4266912 -o QTL_PMC4266912.db
...
Exception in thread "main" java.io.FileNotFoundException: /var/solr/data/terms/conf/synonyms.txt (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at solrAnnotator.AbbrevtoSynonyms.abbrevToSolrSynonyms(AbbrevtoSynonyms.java:39)
at qtm.QtmMain.main(QtmMain.java:98)
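A minimal fail-fast sketch, assuming the required paths (such as the per-core synonyms.txt files) are known when QtmMain starts. `StartupCheck` is a hypothetical helper, not existing QTM code. It checks every path up front and reports all missing files in one readable message instead of a stack trace from AbbrevtoSynonyms.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Sketch only: hypothetical startup check, not part of QTM.
final class StartupCheck {

    // Returns the paths that do not exist as regular files.
    static List<Path> missingFiles(List<Path> required) {
        List<Path> missing = new ArrayList<>();
        for (Path p : required) {
            if (!Files.isRegularFile(p)) {
                missing.add(p);
            }
        }
        return missing;
    }

    // Throws IllegalStateException listing every missing file; returns normally if all exist.
    static void requireFiles(List<Path> required) {
        List<Path> missing = missingFiles(required);
        if (!missing.isEmpty()) {
            StringBuilder msg = new StringBuilder("Missing required files (check the Solr data dir setting):");
            for (Path p : missing) {
                msg.append(System.lineSeparator()).append("  ").append(p);
            }
            throw new IllegalStateException(msg.toString());
        }
    }
}
```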
e.g. src, config, doc and data dirs. The original ontology files do not need to be stored per se, but can instead be fetched, e.g. using a bash script.
Add Maven and SQLite installation commands
e.g. you could organize these in sections
[solr]
key1 = value1
...
[qtm]
key1 = value1
...
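`java.util.Properties` has no notion of sections, so a layout like the one above would need a small parser. A minimal sketch, assuming one `key = value` per line and `#` or `;` comment lines; `SectionedConfig` is hypothetical, and any keys before the first `[section]` are collected under the empty section name `""`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: hypothetical INI-style reader producing section -> (key -> value).
final class SectionedConfig {

    static Map<String, Map<String, String>> parse(String text) {
        Map<String, Map<String, String>> sections = new LinkedHashMap<>();
        String current = ""; // keys before the first [section] go into the "" section
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#") || trimmed.startsWith(";")) {
                continue; // skip blank lines and comments
            }
            if (trimmed.startsWith("[") && trimmed.endsWith("]")) {
                current = trimmed.substring(1, trimmed.length() - 1).trim();
                sections.computeIfAbsent(current, k -> new LinkedHashMap<>());
                continue;
            }
            int eq = trimmed.indexOf('=');
            if (eq < 0) {
                continue; // ignore lines that are not key = value
            }
            String key = trimmed.substring(0, eq).trim();
            String value = trimmed.substring(eq + 1).trim();
            sections.computeIfAbsent(current, k -> new LinkedHashMap<>()).put(key, value);
        }
        return sections;
    }
}
```

Splitting on the first `=` keeps values such as `http://localhost:8983/solr` intact; values that themselves contain `=` would also survive, since only the first one separates key and value.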
Config file: conf/configQtm.properties
Haven't found a reference in the code for these params:
...
#solr
serverSolr=http://localhost:8983/solr
solrProgram=solr/qtmSolrRun
core1Solr=core1
core2Solr=statoTerms
core3Solr=propTerms
core5Solr=solaLyco2
matchSolr=LONGEST_DOMINANT_RIGHT
typeSolr=dictionary
...
solrData/data/propTerms/conf/schema.xml
solrData/data/statoTerms/conf/schema.xml
solrData/data/solaLyco2/conf/schema.xml
solrData/data/terms/conf/schema.xml
solrData/data/solaLyco/conf/schema.xml
and fix typos:)
See a nice tutorial here.
e.g. number of (QTL) tables, number of QTLs...
Re-running QTM on the same input raises the following exception:
...
Insert entry to the database.
-------------------------------------------------
java.sql.SQLException: [SQLITE_CONSTRAINT] Abort due to constraint violation (UNIQUE constraint failed: ARTICLE.pmc_id)
at org.sqlite.core.DB.newSQLException(DB.java:890)
at org.sqlite.core.DB.newSQLException(DB.java:901)
at org.sqlite.core.DB.execute(DB.java:810)
at org.sqlite.core.DB.executeUpdate(DB.java:847)
at org.sqlite.jdbc3.JDBC3PreparedStatement.executeUpdate(JDBC3PreparedStatement.java:86)
at resultDb.QtlDb.insertArticleEntry(QtlDb.java:132)
at qtm.QtmMain.main(QtmMain.java:128)
*************************************************
Article already exits, Please provide unique entries
*************************************************
A user should be notified as soon as possible rather than waiting for this message after processing input articles.
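A minimal sketch of an early check, assuming the database (and its ARTICLE table) already exists when QTM is re-run; table and column names are taken from the constraint error above, while `DuplicateArticleCheck` and its methods are hypothetical. The stored PMCIDs are loaded once and compared against the requested ones before any table mining starts.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch only: hypothetical helper; ARTICLE.pmc_id comes from the UNIQUE constraint message.
final class DuplicateArticleCheck {

    // Loads all PMCIDs already stored in the ARTICLE table (assumes the table exists).
    static Set<String> storedPmcIds(Connection conn) throws SQLException {
        Set<String> ids = new HashSet<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT pmc_id FROM ARTICLE")) {
            while (rs.next()) {
                ids.add(rs.getString(1));
            }
        }
        return ids;
    }

    // Returns the requested PMCIDs that are already stored, in input order and without repeats.
    static List<String> alreadyStored(Collection<String> requested, Set<String> stored) {
        Set<String> hits = new LinkedHashSet<>();
        for (String id : requested) {
            if (stored.contains(id)) {
                hits.add(id);
            }
        }
        return new ArrayList<>(hits);
    }
}
```

If the returned list is non-empty, QTM could report those PMCIDs and stop (or skip them) right away; how that is wired into QtmMain is left open here.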
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building Unnamed - com.wur:XMLTABLE:jar:2.0-SNAPSHOT
[INFO] task-segment: [compile, install]
[INFO] ------------------------------------------------------------------------
[INFO] [resources:resources {execution: default-resources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] ------------------------------------------------------------------------
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] Failed to resolve artifact.
Unable to get dependency information: Unable to read the metadata file for artifact 'com.google.code.gson:gson:jar': Invalid JDK version in profile 'doclint-java8-disable': Unbounded range: [1.8, for project com.google.code.gson:gson-parent
com.google.code.gson:gson:jar:2.6.2
from the specified remote repositories:
central (http://repo1.maven.org/maven2)
Path to dependency:
1) com.wur:XMLTABLE:jar:2.0-SNAPSHOT
./QTM -pmc PMC4266912 -o QTL_PMC4266912.db
writes the DB file to data/QTL_PMC4266912.db instead of ./QTL_PMC4266912.db.
header
...gene_associated, geneUri...
-> ...gene_associated,geneUri...
QTL_PMC4266912.db has a populated schema, but none of the tables contain any data.
e.g. Docker-ize QTM
Upload target/.jar
For example, setting textFiles=tmp will generate tmp[PMCID].txt instead of tmp/[PMCID].txt.
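A minimal sketch of building the path with `java.nio.file.Path.resolve` instead of string concatenation, assuming the `textFiles` value is always meant as a directory. `TextFilePaths` and its methods are hypothetical, not existing QTM code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch only: hypothetical helpers; "textFiles" is the config key from the issue.
final class TextFilePaths {

    // Treats the textFiles value as a directory, so "tmp" and "tmp/" both map to tmp/<PMCID>.txt.
    static Path textFileFor(String textFilesDir, String pmcId) {
        return Paths.get(textFilesDir).resolve(pmcId + ".txt");
    }

    // Same as textFileFor, but also creates the directory if it does not exist yet.
    static Path prepareTextFile(String textFilesDir, String pmcId) throws IOException {
        Path file = textFileFor(textFilesDir, pmcId);
        Path parent = file.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        return file;
    }
}
```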