
The BridgeDb Library source code

Home Page: https://bridgedb.org/

License: Apache License 2.0

Shell 1.23% Java 90.61% R 0.10% Perl 7.09% HTML 0.11% Batchfile 0.01% CSS 0.86%

identifier-mapping bioinformatics java bridgedb database uri mysql-backend hacktoberfest

BridgeDb

BridgeDb is currently tested with Java 11 and higher (3.0.x releases and master branch).

Using BridgeDb dependencies

The BridgeDb releases are published to Maven Central, which means you can use the BridgeDb JARs without needing to compile BridgeDb.

Usage depends on which module you require. The examples below assume artifact org.bridgedb.bio and version 3.0.25:

For Maven:

<dependencies>
    <dependency>
        <groupId>org.bridgedb</groupId>
        <artifactId>org.bridgedb.bio</artifactId>
        <version>3.0.25</version>
    </dependency>
</dependencies>

For Gradle:

implementation group: 'org.bridgedb', name: 'org.bridgedb.bio', version: '3.0.25'

For Ivy:

<dependency org="org.bridgedb" name="org.bridgedb.bio" rev="3.0.25"/>

For Buildr:

'org.bridgedb:org.bridgedb.bio:jar:3.0.25'

Compilation

If you've obtained the source code of BridgeDb, you should be able to compile with a simple:

mvn clean install -Dgpg.skip

You can find the libraries in the folder called "target", in each sublibrary folder (used to be called "dist" in ant).

If you want to ignore failing tests, e.g. because you are not online, add this option: -Dmaven.test.failure.ignore=true. Furthermore, note that 'mvn clean compile' fails.

Command line tools

The BridgeDb distribution comes with a few command line tools. You can try:

bash info.sh <database.bridge>
bash qc.sh <database.bridge> <database2.bridge>
bash voidtool.sh <database.bridge> <fileName.void>

Checking for regressions

You may want to run the following commands to detect regressions, which is particularly important before starting to make a release:

mvn clean test

Note that the first call may fail a number of unit tests, such as those of IDMapperCapabilitiesTest. That one fails because it is an abstract class that JUnit should not actually run as a test suite.

Note also that the second one may fail because BridgeDb module dependencies may not be resolved yet.

Several tests also depend on a running MySQL backend or running BridgeDb webservice. You can exclude these tests in Maven with:

mvn clean test -Djunit5.excludeGroups=webservice,mysql

The JUnit5 tagging also lets you run only the MySQL-backend tests, for example with:

mvn clean test -Djunit5.groups=mysql

Updating the datasources

The datasources.tsv and organisms.tsv files were separated from this repository, and this BridgeDb library is regularly updated for releases from https://github.com/bridgedb/datasources.

wget -O ./org.bridgedb.bio/src/main/resources/org/bridgedb/bio/datasources.tsv https://bridgedb.github.io/datasources/datasources.tsv
wget -O ./org.bridgedb.bio/src/main/resources/org/bridgedb/bio/datasources_headers.tsv https://bridgedb.github.io/datasources/datasources_headers.tsv
wget -O ./org.bridgedb.bio/src/main/resources/org/bridgedb/bio/organisms.tsv https://bridgedb.github.io/datasources/organisms.tsv
cp ./org.bridgedb.bio/src/main/resources/org/bridgedb/bio/*.tsv ./org.bridgedb.bio/resources/org/bridgedb/bio/.

The last line makes a copy for backwards compatibility.

Making releases

If it is time for a release, first, update org.bridgedb/src/main/resources/version.props (for BridgeDb), this README.md and CITATION.cff (for Zenodo).

To make the release, run the following commands. Mind you, this requires you to have an approved Sonatype (http://oss.sonatype.org/) account with push rights:

mvn versions:set -DnewVersion=3.0.25
mvn clean deploy

The first command updates the version in the POM files; the second command makes the actual push.

To make a development (SNAPSHOT) release, use:

mvn versions:set -DnewVersion=3.0.26-SNAPSHOT
mvn clean deploy

Library dependencies

If you do not use all mappers, you do not need to include all libraries in the dist directory in your project.

Here is a brief overview that will help you to find out which ones you need. For questions, you can always contact our mailing list.

org.bridgedb.jar - always needed. This includes the tab-delimited file driver.
org.bridgedb.bio.jar - includes the BioDataSource enum, often needed
org.bridgedb.webservice.cronos.jar - needed for CRONOS webservice
org.bridgedb.webservice.synergizer.jar - needed for Synergizer webservice
org.bridgedb.webservice.picr.jar - needed for PICR webservice
org.bridgedb.server.jar - the BridgeRest SERVER, not needed if you only want to access BridgeRest or BridgeWebservice as client
org.bridgedb.tools.batchmapper.jar - Contains the batchmapper command line tool

org.bridgedb.jar and org.bridgedb.bio.jar do not need any other jar files to work. Most of the other jar files in dist/ are part of the SOAP libraries needed only for some of the webservices. Look in the lib directory and build.xml of the respective mappers to find clues which libraries are needed by which service.

Additional packages added for version 2 (mainly for URI support needed for OpenPHACTS):

org.bridgedb.utils - adds logging and some utilities; needed for all of the modules below
org.bridgedb.sql - alternative SQL database, optimized for speed rather than size
org.bridgedb.ws.* - another version of the webservice; runs with any IDMapper
org.bridgedb.rdf - loads DataSources from RDF and Miriam (with URI support)
org.bridgedb.uri.sql - adds support for URIs (requires org.bridgedb.sql)
org.bridgedb.uri.loader - loads RDF linksets into org.bridgedb.uri.sql; also creates transitive linksets
org.bridgedb.uri.ws.* - extends org.bridgedb.ws.* with URI support from org.bridgedb.uri.sql

Database structure

For further information about the database structure check the documentation here.

Contact

Website, wiki and bug tracker: http://www.bridgedb.org
Mailing list: http://groups.google.com/group/bridgedb-discuss/
Source code can be obtained from http://github.com/bridgedb/BridgeDb

Authors

BridgeDb and related tools are developed by (alphabetic order):

Manas Awasthi
Christian Brenninkmeijer
Jianjiong Gao
Alasdair Gray
Isaac Ho
Martijn van Iersel
Alexander Pico
Stian Soiland-Reyes
Egon Willighagen
Martina Kutmon
Jonathan Mélius
Anders Riutta
Randy Kerber

The lead teams at this moment are (alphabetic order):

Gladstone Institutes
Maastricht University
The University of Manchester

License

BridgeDb is free and open source. It is available under the conditions of the Apache License, version 2.0. See LICENSE-2.0.txt for details.

Configuration For the URI/OpenPHACTS packages ONLY!

Where are configuration files loaded from?

BridgeDb looks for the configuration files from the following locations with priority given to those at the top of the list (i.e. location 1 is a higher priority than 2 etc). Once it finds a configuration file the other locations are ignored.

Directly in the run directory (Mainly for java *.jar runs)
Environment Variable BRIDGEDB_CONFIG : can be used to point to any location
Tomcat configuration folder : $CATALINA_HOME/conf/BridgeDb
conf/BridgeDb : Allows tomcat 7 to pick up $CATALINA_HOME/conf/BridgeDb even if it can not get $CATALINA_HOME
../conf/BridgeDb : Allows tomcat 6 to pick up $CATALINA_HOME/../conf/BridgeDb even if it can not get $CATALINA_HOME
Using classLoader getResource : This will pick up the files included in Jars and Wars.
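
The lookup order above can be sketched as a "first hit wins" search. This is a minimal illustration, not the code BridgeDb actually uses: the class name ConfigLocator and both method names are hypothetical, and the final classloader fallback (location 6) is left to the caller.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical helper; names are illustrative and not part of BridgeDb.
final class ConfigLocator {

    /** Builds the directory list in the documented priority order (locations 1 to 5). */
    static List<Path> candidates(Map<String, String> env) {
        List<Path> dirs = new ArrayList<>();
        dirs.add(Path.of(""));                                     // 1. run directory
        String config = env.get("BRIDGEDB_CONFIG");
        if (config != null) {
            dirs.add(Path.of(config));                             // 2. environment variable
        }
        String catalina = env.get("CATALINA_HOME");
        if (catalina != null) {
            dirs.add(Path.of(catalina, "conf", "BridgeDb"));       // 3. Tomcat configuration folder
        }
        dirs.add(Path.of("conf", "BridgeDb"));                     // 4. relative to the working directory
        dirs.add(Path.of("..", "conf", "BridgeDb"));               // 5. one level up
        return dirs;
    }

    /** Returns the first candidate file accepted by the existence check; later ones are ignored. */
    static Optional<Path> firstExisting(List<Path> dirs, String fileName, Predicate<Path> exists) {
        for (Path dir : dirs) {
            Path file = dir.resolve(fileName);
            if (exists.test(file)) {
                return Optional.of(file);
            }
        }
        return Optional.empty(); // the caller would fall back to the classloader here
    }

    /** Convenience overload that checks the real file system. */
    static Optional<Path> firstExisting(List<Path> dirs, String fileName) {
        return firstExisting(dirs, fileName, Files::isRegularFile);
    }
}
```

Calling ConfigLocator.firstExisting(ConfigLocator.candidates(System.getenv()), "local.properties") would then return the highest-priority copy of the file, if any exists.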

Configuration files

local.properties
BridgeDB.properties
log4j.properties
DataSource.ttl
lens.properties
graph.properties

local.properties

(No local.properties file is included)

This is the recommended place to overwrite individual property values of any other *.properties file.

local.properties will overwrite values with the same key in any other properties file. Properties not overwritten in local will keep their original values.

To install local properties you need to:

Create a local.properties file
Store it in a location as described above
Copy the keys from the original file
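
The override rule can be illustrated with plain java.util.Properties. This is only a sketch of the described behaviour, not BridgeDb's loader; the class name LocalOverrides and the property keys in the usage note are made up for illustration.

```java
import java.util.Properties;

// Hypothetical sketch of the override semantics described above.
final class LocalOverrides {

    /** Returns the defaults with every key present in local replaced by the local value. */
    static Properties merge(Properties defaults, Properties local) {
        Properties merged = new Properties();
        merged.putAll(defaults); // start from the shipped values
        merged.putAll(local);    // same (case-sensitive) key: the local value wins
        return merged;
    }
}
```

For example, if the defaults define the illustrative keys db.user and db.password and local.properties only defines db.password, the merged result keeps the default db.user and takes db.password from local.properties.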

BridgeDB.properties

(Default file is included in build and can be found in org.bridgedb.utils/resources)

This file contains the local setup information which MUST be configured correctly for the service to run. It is essential that the database user, password and database name are correct.

You MUST either supply local values matching your local setup or set up your data stores to use the defaults. The recommended way to overwrite properties is to add a property with the exact (case-sensitive) key to local.properties.

Database Dependency

(for the org.bridgedb.sql package and its dependencies ONLY)

MySQL MUST be installed and running, otherwise it fails to start. Tested with MySQL up to version 5.5
MySQL databases and users MUST be created with CREATE, DROP, INDEX, INSERT, ALTER, UPDATE, DELETE, and SELECT permissions.

Consult the BridgeDB.properties file for the defaults, or copy and amend the configuration file to reflect your own setup.

If you are using the default MySQL accounts and databases, execute the file mysqlConfig.sql from the BridgeDB root directory. It configures your local MySQL with the BridgeDb defaults:

mysql -u root -p < mysqlConfig.sql

Note that the SQL script will fail, without reverting changes made up to the point of failure, if any of the user accounts or databases already exist.

RDF Repository and Transitive Directory Dependency

(For org.bridgedb.rdf package and its extensions ONLY)

BridgeDb uses OpenRDF Sesame RDF engine and this is included automatically via Maven.
WARNING: All directories MUST exist and the (Linux) user running Tomcat MUST have READ/WRITE permission set! Some of the OpenRDF error messages are unclear if this is not the case.

See BridgeDb.properties and change the appropriate property to point to the correct directory. The Sesame SailNativeStore(s) will be created automatically as long as the loader can create/find the directory.

We recommend changing the relative directories to absolute directories. Please ensure the parent directories exist and have the correct permissions.

The settings for testing (and therefore compilation) can be left as is as long as the testing user would have permission to create and delete files there.

The BaseURI variable is no longer used but may be in the future so is worth setting correctly.

Other Configuration files

log4j.properties

(Default file is included in build and can be found in org.bridgedb.utils/resources)

Edit this to change the logger setup. The default can be found in the Utils resource directory. Please refer to the log4j documentation for more information.

DataSource.ttl

(Included in the build and found at org.bridgedb.rdf/resources)

RDF format of all the BridgeDb DataSource(s) and registered UriPatterns. Found in $BRIDGEDB_HOME/org.bridgedb.rdf/resources

This file defines all the URI patterns that will match every BridgeDb DataSource. Warning: As additional UriPatterns are constantly being found and created, this file is subject to continuous updates. Having a local DataSource.ttl is therefore highly discouraged as it will block future updates being discovered. Instead, please push any changes into the version inside the source code. This file is NOT affected by local.properties, and you cannot change existing or add additional datasource URI patterns through local.properties. If you require local additions that should not become general usage (such as commercial UriPatterns), the suggested approach is to change the code to use multiple DataSource files.

lens.properties

(Included in the build and found at org.bridgedb.uri.sql/resources)

This file defines the lenses to be used in the system. See Scientific Lenses over Linked Data for more information on what lenses are.

Lenses can and should be added through local.properties.

WARNING: As the Lens work is still evolving it is subject to alterations and the format of this file could be changed at any time. Having a local lens.properties is highly discouraged as it will block future updates being discovered. Instead please push any changes into the version inside the source code.

Local additions that should not become general usage (such as commercial lens) can be added to the local.properties file.

Note: the fourth part of the key
lens.lenkey.justification.***
only serves to keep the keys unique and can have any value. If extending a key we suggest using local** as the fourth part of the justification key to ensure not overwriting general additions.

graph.properties

(Included in the build and found at org.bridgedb.uri.sql/resources)

This file maps RDF Graphs/Context with the UriPatterns found in that graph. This allows Map functions to supply a graph name rather than a list of targetUriPatterns

Data in the included file is Open PHACTS specific.

Data Loading

All tests should load their required data at the start of the tests. To load the test data into the live SQL use the method SetupWithTestData in the URI loader package. The IMS project also has a data loader which should be used if the IMS is the deployed project.

Compilation

For URI/Open PHACTS packages

If you've obtained the source code of BridgeDb, you should be able to compile with a simple:

mvn clean install

Note that for the maven build to run all tests:

The MySQL database MUST be running and configured as above.
(Optional) http://localhost:8080/OPS-IMS should be running the war created by the URI webservice server module, with test data which can be loaded using the class SetupWithTestData in the URI Loader module. Maven will skip the client tests if the localhost server is not found.

OPS Webservice Setup.

Make sure your local.properties file matches:

The SQL databases, including user names and passwords
The RDF parent directories are setup (and accessible) as above.

or you have set up the default databases etc from BridgeDB.properties

Deploy $BridgeDb/org.bridgedb.uri.ws.service/target/org.bridgedb.uri.ws.server-*.war to something like your local Tomcat webapps directory.
To set up databases and add test data, run org.bridgedb.uri.loader.SetupLoaderWithTestData. The easiest way is within Eclipse, since you can set the OPS_IMS_CONFIG environment variable within the run configuration; NetBeans unfortunately does not allow environment variables to be set within the IDE.
(Optional) Deploy $BridgeDb/org.bridgedb.ws.service/target/BridgeDb.war. Both wars share the same SQL data.

Note: If installing the Open PHACTS IMS and/or the Open PHACTS QueryExpander, the org.bridgedb.uri.ws.server-*.war should not be deployed; instead, deploy the war appropriate to the other project. See the readme within the other projects for more details.

bridgedb's Issues

check matching of new HMDB identifiers

see egonw/BridgeDbR#1

Support aliases for systemCode for webservice

To make the webservice easier to use, let's look into supporting aliases for systemCode for the webservice. This can be broken into two parts:

support conventionalName (e.g., Entrez Gene)
support Miriam namespace (e.g., ncbigene) and identifiers.org IRI (e.g., http://identifiers.org/ncbigene/)

The first one should be easy to set up, because it already works for all endpoints except possibly xrefsBatch. But it needs to be tested with all variations of xrefsBatch.

The second one would be a little more involved. It would mean returning the same results for all of the following endpoints:
http://webservice.bridgedb.org/Human/xrefs/L/1234
http://webservice.bridgedb.org/Human/xrefs/Entrez%20Gene/1234
http://webservice.bridgedb.org/Human/xrefs/ncbigene/1234
http://webservice.bridgedb.org/Human/xrefs/http%3A%2F%2Fidentifiers.org%2Fncbigene%2F/1234
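
As a rough illustration of the four equivalent requests, the sketch below builds the URLs from the different systemCode aliases. It only shows the percent-encoding each alias needs as a path segment; the class and method names are hypothetical, and whether the webservice accepts these aliases is exactly what this issue proposes.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side helper; not part of the BridgeDb API.
final class XrefsUrl {
    static final String BASE = "http://webservice.bridgedb.org";

    /** Percent-encodes a value for use as a single URL path segment. */
    static String encodeSegment(String value) {
        // URLEncoder produces form encoding, which writes spaces as '+';
        // a path segment needs %20 instead. A literal '+' is already emitted as %2B.
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** Builds an xrefs URL for the given organism, data source alias and identifier. */
    static String xrefs(String organism, String source, String id) {
        return BASE + "/" + encodeSegment(organism)
                + "/xrefs/" + encodeSegment(source)
                + "/" + encodeSegment(id);
    }
}
```

With this helper, XrefsUrl.xrefs("Human", "Entrez Gene", "1234") and XrefsUrl.xrefs("Human", "http://identifiers.org/ncbigene/", "1234") produce the second and fourth URLs listed above.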

PubChem Substance needs to be secondary source

Tolerate loading empty linkset files into IMS

Currently, when attempting to load a linkset file into IMS, an Exception is thrown and the load terminates if the linkset file is empty. Empty means there are zero links in the linkset file. Here are the contents of such a file:

Replace onejar plugin

Build of org.bridgedb.rdf fails with:

[INFO] BridgeDb RDF ....................................... FAILURE [ 0.326 s]

[ERROR] Plugin org.dstovall:onejar-maven-plugin:1.4.4 or one of its dependencies could not be resolved: Could not find artifact org.dstovall:onejar-maven-plugin:jar:1.4.4 in onejar-maven-plugin.googlecode.com (http://onejar-maven-plugin.googlecode.com/svn/mavenrepo) -> [Help 1]

This repository is probably gone as Google Code is now shut down.

I think this <plugin>
https://github.com/bridgedb/BridgeDb/blob/master/org.bridgedb.rdf/pom.xml#L15
should be replaced with a more regular uberjar plugin, e.g. appassembler-maven-plugin and/or maven-shade-plugin.

For example, see https://github.com/apache/incubator-taverna-language/blob/master/taverna-tavlang-tool/pom.xml#L127

synergizer.jar and orthoxml-0.1b.jar are not open source

under org.bridgedb.webservice.synergizer we include lib/synergizer.jar as a system dependency, so anyone using this module would also need to add synergizer.jar to their own class path.

I tried to find the copyright/license for this JAR, and it looks like it is http://llama.mshri.on.ca/cgi/download/download.pl?software=Synergizer-Java-Client which does not look OSI compatible to me - and also requires notification sent to Harvard within 60 days. Have we sent such a notice, and do we have permission to redistribute the JAR in GitHub? At the very least we are not propagating their copyright as required.

So I wonder what agreement, if any, has been made. From the looks of it, it seems we can't put this JAR in the GitHub repository, which claims to be under Apache License v2.0.

Similar question is for orthoxml-0.1b.jar from http://orthoxml.org/xml/OrthoXML_Java.html which does not seem to have a license at all (!)

Compile fail on Travis due to ClassNotFound

See https://travis-ci.org/bridgedb/BridgeDb#L4544-L4549

add the Brachypodium distachyon organism

From the WikiPathways-dev mailing list:

Hi,

I am trying to use PathVisio for a plant species named Brachypodium distachyon,
which is not on the organism list. I was wondering if I can add it manually. I tried
BridgeDb but could not find it either.

Thanks,

Fred

Remove Ant as build system

Grameen Arabidopsis -> Gramene Arabidopsis

In datasources.txt, Grameen Arabidopsis is listed as a name. Shouldn't it be Gramene Arabidopsis?

idRegexPattern for RefSeq missing "WP" prefix

The new Uniprot-->RefSeq linkset file would not load into IMS.
IMS loader could not find any existing patterns to match this RefSeq URI:
http://purl.uniprot.org/refseq/WP_011154765.1

The Miriam record for RefSeq in the MiriamRegistry.ttl file contains this line:

      idot:idRegexPattern "^((AC|AP|NC|NG|NM|NP|NR|NT|NW|XM|XP|XR|YP|ZP)_\\d+|(NZ\\_[A-Z]{4}\\d+))(\\.\\d+)?$"^^<http://www.w3.org/2001/XMLSchema#string> ;

This does not allow the ID part of the URI to begin with "WP".
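
The mismatch can be reproduced with java.util.regex. The first pattern below is the quoted idRegexPattern with the Turtle string escaping removed. The second one, with WP added to the prefix list, is only an assumed fix for illustration, not a change confirmed by Miriam or BridgeDb; the class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical check; class and field names are illustrative.
final class RefSeqPattern {
    /** The idRegexPattern quoted above, written as a Java string literal. */
    static final Pattern CURRENT = Pattern.compile(
            "^((AC|AP|NC|NG|NM|NP|NR|NT|NW|XM|XP|XR|YP|ZP)_\\d+|(NZ\\_[A-Z]{4}\\d+))(\\.\\d+)?$");

    /** Assumed variant: the same expression with the WP prefix added. */
    static final Pattern WITH_WP = Pattern.compile(
            "^((AC|AP|NC|NG|NM|NP|NR|NT|NW|WP|XM|XP|XR|YP|ZP)_\\d+|(NZ\\_[A-Z]{4}\\d+))(\\.\\d+)?$");

    /** True when the whole identifier matches the pattern. */
    static boolean accepts(Pattern pattern, String id) {
        return pattern.matcher(id).matches();
    }
}
```

RefSeqPattern.accepts(RefSeqPattern.CURRENT, "WP_011154765.1") is false, while the same call with WITH_WP is true.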

OpenAPI does not validate

See details at APIs-guru/openapi-directory#238

set up ims.bridgedb.org

With a public version of the IMS web services developed in Open PHACTS.

SwissProt and TrEMBL vs. UniProt Knowledgebase

Our datasources.txt official names (last column) don't always match the recommended Miriam names. For example, datasources.txt has "UniProt/Trembl" where Miriam has "UniProt Knowledgebase". (Update: not an exact match. See later comments.)

support for HDT-based mapping databases

HDT is a new binary, file-based RDF "store" that can be queried and, combined with an index, provides the equivalent of a Derby file, but fully semantic web based.

What needs to be explored first is the file size (smaller/larger than Derby) and the query speeds (faster/slower than Derby). For clarity, this is not meant to compete with a SQL-database backend, but as possible replacement of Derby.

Mismatches between datasources_headers.txt and BridgeDb vocab

The terms from datasources_headers.txt don't match their corresponding partners from the BridgeDb vocab. For example, datasources_headers.txt has website_url where the BridgeDb vocab has mainUrl.

missing probe identifiers?

Jonathan, not sure if this is a bug, but why don't we see ID mappings for this probe ID?

This is from http://www.wikipathways.org/index.php/Pathway:WP370

IMS link set build failing

datasources.txt has an issue with the uri column, and it's causing the IMS link set build to fail. According to datasources_headers.txt, the uri column is:

Official URI for datasource (e.g., from miriam); or repeat of system_code if unknown or unregistered

According to miriam, the uri value for both Swissprot and Trembl is urn:miriam:uniprot (see also issue #17). bridgedbjs needs to get the miriam urn for both Swissprot and Trembl, and it was getting it from datasources.txt. But the build for the IMS link set fails when both Swissprot and Trembl have uri value uniprot.

This commit is a temporary patch so the IMS link sets can build, but bridgedbjs won't work with that patch.

Should miriam have different uris for Swissprot and Trembl, or should the IMS link sets accept that the miriam urns are the same for both?

BDB/IMS link set stack trace:

Loading linkset file:///staging/linksets2/conceptwiki/www_uniprot_org_uniprot-gene.ttl
        Using File: /staging/linksets2/conceptwiki/www_uniprot_org_uniprot-gene.ttl
Error handling: (http://www.conceptwiki.org/concept/5fa40ec7-c73f-4810-a5c7-3743970b02ec, http://www.w3.org/2004/02/skos/core#exactMatch, http://www.uniprot.org/uniprot/O14329)
org.bridgedb.utils.BridgeDBException: Uri http://www.uniprot.org/uniprot/O14329 maps to two different regex patterns http://www.uniprot.org/uniprot/$id (^([A-N,R-Z][0-9][A-Z][A-Z, 0-9][A-Z, 0-9][0-9])|([O,P,Q][0-9][A-Z, 0-9][A-Z, 0-9][A-Z, 0-9][0-9])(\.\d+)?|([A-N,R-Z][0-9][A-Z][A-Z, 0-9][A-Z, 0-9][0-9][A-Z][A-Z, 0-9][A-Z, 0-9][0-9])$) for DataSource S and http://www.uniprot.org/uniprot/$id (^([A-N,R-Z][0-9]([A-Z][A-Z, 0-9][A-Z, 0-9][0-9]){1,2})|([O,P,Q][0-9][A-Z, 0-9][A-Z, 0-9][A-Z, 0-9][0-9])(\.\d+)?$) for DataSource Sp
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.simontuffs.onejar.Boot.run(Boot.java:340)
        at com.simontuffs.onejar.Boot.main(Boot.java:166)
Caused by: org.bridgedb.utils.BridgeDBException: Error loading
        at uk.ac.manchester.cs.openphacts.ims.loader.RunLoader.main(RunLoader.java:241)
        ... 6 more
Caused by: org.bridgedb.utils.BridgeDBException: Error parsing /staging/linksets2/conceptwiki/www_uniprot_org_uniprot-gene.ttl Error inserting statement (http://www.conceptwiki.org/concept/5fa40ec7-c73f-4810-a5c7-3743970b02ec, http://www.w3.org/2004/02/skos/core#exactMatch, http://www.uniprot.org/uniprot/O14329)
        at org.bridgedb.uri.loader.RdfParser.parse(RdfParser.java:71)
        at uk.ac.manchester.cs.openphacts.ims.loader.LinksetLoader.load(LinksetLoader.java:92)
        at uk.ac.manchester.cs.openphacts.ims.loader.LinksetLoader.load(LinksetLoader.java:69)
        at uk.ac.manchester.cs.openphacts.ims.loader.RunLoader.loadLinkset(RunLoader.java:102)
        at uk.ac.manchester.cs.openphacts.ims.loader.RunLoader.main(RunLoader.java:225)
        ... 6 more
Caused by: org.openrdf.rio.RDFHandlerException: Error inserting statement (http://www.conceptwiki.org/concept/5fa40ec7-c73f-4810-a5c7-3743970b02ec, http://www.w3.org/2004/02/skos/core#exactMatch, http://www.uniprot.org/uniprot/O14329)
        at org.bridgedb.uri.loader.LinkHandler.insertUriMapping(LinkHandler.java:101)
        at org.bridgedb.uri.loader.LinkHandler.handleStatement(LinkHandler.java:73)
        at org.bridgedb.uri.loader.LinksetHandler.processFirstStatement(LinksetHandler.java:122)
        at org.bridgedb.uri.loader.LinksetHandler.handleStatement(LinksetHandler.java:76)
        at uk.ac.manchester.cs.openphacts.ims.loader.handler.ImsHandler.handleStatement(ImsHandler.java:55)
        at org.openrdf.rio.turtle.TurtleParser.reportStatement(TurtleParser.java:1062)
        at org.openrdf.rio.turtle.TurtleParser.parseObject(TurtleParser.java:476)
        at org.openrdf.rio.turtle.TurtleParser.parseObjectList(TurtleParser.java:399)
        at org.openrdf.rio.turtle.TurtleParser.parsePredicateObjectList(TurtleParser.java:371)
        at org.openrdf.rio.turtle.TurtleParser.parseTriples(TurtleParser.java:356)
        at org.openrdf.rio.turtle.TurtleParser.parseStatement(TurtleParser.java:246)
        at org.openrdf.rio.turtle.TurtleParser.parse(TurtleParser.java:203)
        at org.bridgedb.uri.loader.RdfParser.parse(RdfParser.java:67)
        ... 10 more
Caused by: org.bridgedb.utils.BridgeDBException: No SourceURIPattern regstered for mappingSetId 0
        at org.bridgedb.sql.SQLUriMapper.insertUriMapping(SQLUriMapper.java:983)
        at org.bridgedb.uri.loader.LinkHandler.insertUriMapping(LinkHandler.java:99)
        ... 22 more
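
The ambiguity reported in the trace can be checked in isolation. The sketch below copies the two regex patterns from the exception message and lists the data sources whose pattern accepts an identifier. Full-string matching with java.util.regex is an assumption about how the patterns are applied, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical reproduction of the clash; not BridgeDb code.
final class UniProtPatterns {
    /** System code to idRegexPattern, as quoted in the exception message. */
    static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();

    static {
        PATTERNS.put("S", Pattern.compile(
                "^([A-N,R-Z][0-9][A-Z][A-Z, 0-9][A-Z, 0-9][0-9])"
                + "|([O,P,Q][0-9][A-Z, 0-9][A-Z, 0-9][A-Z, 0-9][0-9])(\\.\\d+)?"
                + "|([A-N,R-Z][0-9][A-Z][A-Z, 0-9][A-Z, 0-9][0-9][A-Z][A-Z, 0-9][A-Z, 0-9][0-9])$"));
        PATTERNS.put("Sp", Pattern.compile(
                "^([A-N,R-Z][0-9]([A-Z][A-Z, 0-9][A-Z, 0-9][0-9]){1,2})"
                + "|([O,P,Q][0-9][A-Z, 0-9][A-Z, 0-9][A-Z, 0-9][0-9])(\\.\\d+)?$"));
    }

    /** Returns the system codes whose pattern matches the whole identifier. */
    static List<String> matchingSources(String id) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Pattern> entry : PATTERNS.entrySet()) {
            if (entry.getValue().matcher(id).matches()) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }
}
```

UniProtPatterns.matchingSources("O14329") returns both S and Sp, so the identifier alone cannot decide between the two data sources once they share the same URI pattern.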

Maven-created jars are not compatible with PathVisio

These still need to be created with "ant clean dist".

loading of databases should be more permissive with system codes from Derby files

See https://github.com/bridgedb/bridgedb-r/issues/1

webservice: `search` vs. `freeSearch` vs. `attributeSearch`

For the webservice, I often get the different search types confused:

search as in GET /Human/search/1234
freeSearch as in GET /Human/isFreeSearchSupported
attributeSearch as in GET /Human/attributeSearch/CCR5

Would anyone else find it helpful if we updated the endpoints to make it easier to distinguish between the search types? Here are three sets of alternatives that I find easier to understand:

A

GET /Human/identifierSearch/1234
GET /Human/isIdentifierSearchSupported => true
GET /Human/attributeSearch/CCR5

Note that since BridgeDb is primarily about identifiers, one could argue that search is better than identifierSearch.

B

Webservice could accept a new query parameter named field with a default value of Identifier:

GET /Human/search/1234 (equiv. to /Human/search/1234?field=Identifier)
GET /Human/isSearchSupported => true (equiv. to /Human/isSearchSupported?field=Identifier)
GET /Human/search/CCR5?field=Symbol
GET /Human/isSearchSupported?field=Symbol (optional new endpoint)
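
Alternative B could be handled on the server side roughly as follows. This is a sketch of the proposal only: the field parameter and its default come from the text above, while the class and method names are hypothetical and no such endpoint exists in the current webservice.

```java
import java.net.URI;

// Hypothetical sketch of alternative B; not part of the BridgeDb webservice.
final class SearchField {
    static final String DEFAULT_FIELD = "Identifier";

    /** Returns the value of the field query parameter, or the default when it is absent. */
    static String fieldOf(String requestTarget) {
        String query = URI.create(requestTarget).getQuery();
        if (query != null) {
            for (String pair : query.split("&")) {
                String[] keyValue = pair.split("=", 2);
                if (keyValue.length == 2 && keyValue[0].equals("field")) {
                    return keyValue[1];
                }
            }
        }
        return DEFAULT_FIELD; // no field parameter: behave like an identifier search
    }
}
```

SearchField.fieldOf("/Human/search/1234") yields Identifier and SearchField.fieldOf("/Human/search/CCR5?field=Symbol") yields Symbol, which keeps existing search requests backwards compatible.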

C

GET /Human/identifier/1234
GET /Human/searchFields/identifier => true
GET /Human/symbol/CCR5
GET /Human/searchFields/symbol => true (optional new endpoint)

Not sure about desired capitalization: search_fields vs. searchFields or identifier vs. Identifier, symbol vs. Symbol. Some people recommend not using capital letters in webservice endpoints.

If we changed anything, we'd obviously need to add redirects for backwards compatibility.

datasources.txt Uniprot identifiers causes IMS to fail

Commit 2d248d6 changed the org.bridgedb.bio/resources/org/bridgedb/bio/datasources.txt file as follows:

-Uniprot-SwissProt Sp http://www.uniprot.org/ http://www.uniprot.org/uniprot/$id CALMHUMAN protein 1 Sp ^[A-Z0-9]+[A-Z]+$ UniProtKB/Swiss-Prot
+Uniprot-SwissProt Sp http://www.uniprot.org/ http://www.uniprot.org/uniprot/$id CALMHUMAN protein 1 urn:miriam:uniprot ^[A-Z0-9]+[A-Z]+$ UniProtKB/Swiss-Prot

The Uniprot code was changed from Sp to urn:miriam:uniprot. This causes the IMS to fail when parsing uniprot URIs:

Error handling: (http://www.conceptwiki.org/concept/5fa40ec7-c73f-4810-a5c7-3743970b02ec, http://www.w3.org/2004/02/skos/core#exactMatch, http://www.uniprot.org/uniprot/O14329) org.bridgedb.utils.BridgeDBException: Uri http://www.uniprot.org/uniprot/O14329 maps to two different regex patterns http://www.uniprot.org/uniprot/$id (^([A-N,R-Z][0-9]([A-Z][A-Z, 0-9][A-Z, 0-9][0-9]){1,2})|([O,P,Q][0-9][A-Z, 0-9][A-Z, 0-9][A-Z, 0-9][0-9])(.\d+)?$) for DataSource Sp and http://www.uniprot.org/uniprot/$id (^([A-N,R-Z][0-9][A-Z][A-Z, 0-9][A-Z, 0-9][0-9])|([O,P,Q][0-9][A-Z, 0-9][A-Z, 0-9][A-Z, 0-9][0-9])(.\d+)?|([A-N,R-Z][0-9][A-Z][A-Z, 0-9][A-Z, 0-9][0-9][A-Z][A-Z, 0-9][A-Z, 0-9][0-9])$) for DataSource S

Apparently the change was necessary because "MIRIAM does not distinguish SwissProt from TrEMBL". Does this mean that we should ask MIRIAM to change something on their end or do we need to do something with the identifier patterns or BridgeDB/IMS code?
We really need this code to work or else we will need to fork an Open PHACTS version of BridgeDB and deploy that. Then we will have 2 versions of BridgeDB which is not an ideal situation.

Support Taxonomy for organisms?

Would anyone find it useful for BridgeDb to support the NCBI Taxonomy for identifying organisms? Some time ago, I mapped each of our supported organisms to its Taxonomy IRI, but I'm not sure where this belongs.

{
    "Anopheles gambiae": "http://identifiers.org/taxonomy/7165",
    "Arabidopsis thaliana": "http://identifiers.org/taxonomy/3702",
    "Aspergillus niger": "http://identifiers.org/taxonomy/5061",
    "Bacillus subtilis": "http://identifiers.org/taxonomy/1423",
    "Bos taurus": "http://identifiers.org/taxonomy/9913",
    "Caenorhabditis elegans": "http://identifiers.org/taxonomy/6239",
    "Canis familiaris": "http://identifiers.org/taxonomy/9615",
    "Ciona intestinalis": "http://identifiers.org/taxonomy/7719",
    "Danio rerio": "http://identifiers.org/taxonomy/7955",
    "Drosophila melanogaster": "http://identifiers.org/taxonomy/7227",
    "Escherichia coli": "http://identifiers.org/taxonomy/562",
    "Equus caballus": "http://identifiers.org/taxonomy/9796",
    "Gallus gallus": "http://identifiers.org/taxonomy/9031",
    "Gibberella zeae": "http://identifiers.org/taxonomy/5518",
    "Glycine max": "http://identifiers.org/taxonomy/3847",
    "Homo sapiens": "http://identifiers.org/taxonomy/9606",
    "Hordeum vulgare": "http://identifiers.org/taxonomy/4513",
    "Macaca mulatta": "http://identifiers.org/taxonomy/9544",
    "Mus musculus": "http://identifiers.org/taxonomy/10090",
    "Mycobacterium tuberculosis": "http://identifiers.org/taxonomy/1773",
    "Ornithorhynchus anatinus": "http://identifiers.org/taxonomy/9258",
    "Oryza indica": "http://identifiers.org/taxonomy/39946",
    "Oryza sativa": "http://identifiers.org/taxonomy/4530",
    "Oryza sativa Indica Group": "http://identifiers.org/taxonomy/39946",
    "Populus trichocarpa": "http://identifiers.org/taxonomy/3694",
    "Pan troglodytes": "http://identifiers.org/taxonomy/9598",
    "Rattus norvegicus": "http://identifiers.org/taxonomy/10116",
    "Saccharomyces cerevisiae": "http://identifiers.org/taxonomy/4932",
    "Solanum lycopersicum": "http://identifiers.org/taxonomy/4081",
    "Sus scrofa": "http://identifiers.org/taxonomy/9823",
    "Vitis vinifera": "http://identifiers.org/taxonomy/29760",
    "Xenopus tropicalis": "http://identifiers.org/taxonomy/8364",
    "Zea mays": "http://identifiers.org/taxonomy/4577"
  }
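
One way such a mapping might be exposed is a simple lookup table. The sketch below uses three entries copied from the list above; the class name, the method names and the idea of deriving the numeric taxon ID from the IRI are assumptions for illustration, not an existing BridgeDb API.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical lookup; only a subset of the mapping above is included.
final class OrganismTaxonomy {
    static final Map<String, String> IRI_BY_NAME = Map.of(
            "Homo sapiens", "http://identifiers.org/taxonomy/9606",
            "Mus musculus", "http://identifiers.org/taxonomy/10090",
            "Rattus norvegicus", "http://identifiers.org/taxonomy/10116");

    /** Returns the Taxonomy IRI for a Latin organism name, if one is known. */
    static Optional<String> iriFor(String latinName) {
        return Optional.ofNullable(IRI_BY_NAME.get(latinName));
    }

    /** Returns the numeric taxon ID, taken from the last path segment of the IRI. */
    static Optional<String> taxonId(String latinName) {
        return iriFor(latinName).map(iri -> iri.substring(iri.lastIndexOf('/') + 1));
    }
}
```

Returning an Optional makes the unsupported-organism case explicit instead of handing back null.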

Bump version to -SNAPSHOT

Why is the pom.xml on the master branch still saying <version>2.2.0</version> even though there have been quite a few commits since 2.2.0? The version should be bumped to 2.2.1-SNAPSHOT or equivalent for all pom.xml files.

Am I OK to do that?

Gene mapping database issues

Several things on the gene mapping database:

1. Last version is from Sept. 2016, so a new version should be created (regularly).
2. On the website of BridgeDB, the downloading of the mapping databases for metabolites and genes should have the same layout; the current layout is very confusing for (new) users.
3. The creation of the gene-mapping-file is not mentioned in any of the repositories here; this should be added just like the metabolite mapping file.

@JonathanMELIUS @egonw

Need new hasUriPattern for Uniprot to Ensembl Linkset.

Situation: creating a new Uniprot --> Ensembl Linkset for the openphacts refresh (version 2.2).

The SPARQL query that creates the Linkset returns different Ensembl URIs than it did the last time the Linkset was created (November, 2015 ?).

Previously, the Ensembl URIs looked like this:

<http://purl.uniprot.org/ensembl/ENST00000631466>
<http://purl.uniprot.org/ensembl/ENST00000361162>
<http://purl.uniprot.org/ensembl/ENSMUST00000048794>

Now they look like this:

<http://rdf.ebi.ac.uk/resource/ensembl.transcript/ENST00000631466>
<http://rdf.ebi.ac.uk/resource/ensembl.transcript/ENST00000361162>
<http://rdf.ebi.ac.uk/resource/ensembl.transcript/ENSMUST00000048794>

Presumed solution is to add the following new URI Pattern to the Ensembl DataSource definition in DataSource.ttl:

bridgeDB:hasUriPattern <http://purl.uniprot.org/ensembl.transcript/$id> ;

Where is the BridgeDB ontology?

Two namespaces:

.. used for terms like linksetJustification, subjectsDatatype and more BridgeDb-specific like hasIdentifiersOrgPattern and systemCode.

I don't know if these two namespaces are the same or not. But the fact is that neither of them resolves (both give 404), yet they are used here:

Which one should it be? Has there been any attempt to create the ontology? Did vocabularies.bridgedb.org use to resolve to something other than an empty Drupal site?

Massive duplication of datasources

I still can't get my head around why we have so much duplication of data sources in org.bridgedb.bio vs org.bridgedb.rdf:

Fixing a data source name, URI pattern or system code in any one of these files seems to require editing all the others.

This means it is basically impossible to edit.

What is the point of this duplication?

hgnc identifiers - support with and without HGNC:

In http://identifiers.org/hgnc/ we see Identifier pattern ^((HGNC|hgnc):)?\d{1,5}$ which is (somewhat) reflected in the HGNC Accession number entry https://github.com/bridgedb/BridgeDb/blob/master/org.bridgedb.rdf/resources/IdentifiersOrgDataSource.ttl#L2762

bridgeDB:hasRegexPattern "^(HGNC:)?\\d{1,5}$" ;

This means that identifiers http://identifiers.org/hgnc/47710 and http://identifiers.org/hgnc/HGNC:47710 and http://identifiers.org/hgnc/hgnc:47710 are all valid - and indeed all resolve to RNU6-747P.

The IdentityMappingService is however unable to know these are the same thing, unless we move HGNC: out of the regular expression and add alternative URI prefixes. Currently this will be tracked as three identifiers, 47710, HGNC:47710 and hgnc:47710, in the same dataset.

HGNC itself consistently identifies a "HGNC ID" with the prefix, e.g. HGNC:47710 - which is in accordance with the 10 Simple rules for design, provision, and reuse of persistent identifiers for life science data rule 2 to use CURIEs.

In Open PHACTS, earlier linksets used the style http://identifiers.org/hgnc/47710 - however @JonathanMELIUS's latest [Ensembl-to-HGNC linkset](http://bridgedb.org/data/linksets/HomoSapiens/Ensembl_Hs_hgnc.direct.LS.ttl) uses the style http://identifiers.org/hgnc/HGNC:47710, which adds the CURIE to the alternative base. Perhaps this is not ideal (and it can probably be changed upstream), but as both patterns are accepted, the org.bridgedb.rdf entry should be updated to support both.
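To make the three spellings comparable, one option is to normalize them to a single canonical form before mapping. The following is only a sketch of that idea: the class name HgncIdNormalizer and the choice of "HGNC:<number>" as the canonical form are assumptions for illustration, not part of the BridgeDb API.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BridgeDb.
final class HgncIdNormalizer {
    // Optional "HGNC:" or "hgnc:" prefix followed by 1 to 5 digits,
    // mirroring the identifiers.org pattern quoted in this issue.
    private static final Pattern HGNC =
            Pattern.compile("^(?:(?:HGNC|hgnc):)?(\\d{1,5})$");

    /** Returns the prefixed form "HGNC:<number>", or empty if the input does not match. */
    static Optional<String> normalize(String id) {
        Matcher m = HGNC.matcher(id);
        return m.matches() ? Optional.of("HGNC:" + m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        for (String id : new String[] {"47710", "HGNC:47710", "hgnc:47710", "ENSG00000139618"}) {
            System.out.println(id + " -> " + normalize(id).orElse("(not an HGNC id)"));
        }
    }
}
```

With this, 47710, HGNC:47710 and hgnc:47710 all collapse to one key, so they would no longer be tracked as three separate identifiers.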

LIPID MAPS link out URL pattern is outdated

It should now be: http://www.lipidmaps.org/data/LMSDRecord.php?LMID=$1

extended datasource information via webservice

It is suggested to have bridgedb.org serve more information about the data sources and, in particular, give access to datasources.txt. I propose the webservice should do that.

Fix unigene URI pattern for organism+clusterId

http://ops2.few.vu.nl/QueryExpander/dataSource/U

says:

primary UriPattern
    http://www.ncbi.nlm.nih.gov/UniGene/clust.cgi?UGID=1548618&SEARCH=$id
URI Pattern
    http://purl.uniprot.org/unigene/$id
idExample
    Hs.553708

but this URL only works for the example id, as http://www.ncbi.nlm.nih.gov/UniGene/clust.cgi?UGID=1548618&SEARCH=whatever will still show UGID:1548618 and just put whatever into the search box.

http://www.ncbi.nlm.nih.gov/UniGene/clust.cgi?ORG=Hs&CID=656875 does work correctly - but is not possible to use in BridgeDb as we don't support split identifiers.
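For illustration, a split identifier could be handled outside the URI pattern mechanism roughly as follows. This is a sketch under two assumptions: that the part before the dot corresponds to the ORG parameter and the part after it to CID, and that a helper like the hypothetical UniGeneLinkout class below is acceptable; BridgeDb itself has no such class.

```java
// Hypothetical helper, not part of BridgeDb.
final class UniGeneLinkout {
    private static final String BASE = "http://www.ncbi.nlm.nih.gov/UniGene/clust.cgi";

    /** Splits an id such as "Hs.656875" at the first dot and builds the ORG/CID URL. */
    static String clusterUrl(String id) {
        int dot = id.indexOf('.');
        if (dot <= 0 || dot == id.length() - 1) {
            throw new IllegalArgumentException("Expected <organism>.<cluster>, got: " + id);
        }
        String organism = id.substring(0, dot);   // e.g. "Hs"
        String cluster = id.substring(dot + 1);   // e.g. "656875"
        return BASE + "?ORG=" + organism + "&CID=" + cluster;
    }

    public static void main(String[] args) {
        System.out.println(clusterUrl("Hs.656875"));
    }
}
```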

The pattern http://purl.uniprot.org/unigene/Hs.553708 is valid and used internally by Uniprot, but does not resolve to anything meaningful ("Can't handle namespace: unigene").

Note that confusingly there is also the "Unigene number" - http://ops2.few.vu.nl/QueryExpander/dataSource/unigene which uses UGID - e.g. http://www.ncbi.nlm.nih.gov/UniGene/clust.cgi?UGID=$id

Avoid returning `null`. Prefer `Optional`

In situations where a value might not exist, it is generally preferable to return the value wrapped in a Java Optional<> rather than returning null. It is more self-documenting, and it generally makes the logic for handling missing values easier to write.
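A minimal sketch of the pattern, using a made-up lookup class (TaxonomyLookup and findTaxonUri are illustrative names, not existing BridgeDb methods) and two taxonomy URIs as sample data:

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical example class, not part of BridgeDb.
final class TaxonomyLookup {
    private static final Map<String, String> TAXON_URIS = Map.of(
            "Homo sapiens", "http://identifiers.org/taxonomy/9606",
            "Mus musculus", "http://identifiers.org/taxonomy/10090");

    /** Returns the taxonomy URI for a species name, or an empty Optional if unknown. */
    static Optional<String> findTaxonUri(String species) {
        return Optional.ofNullable(TAXON_URIS.get(species));
    }

    public static void main(String[] args) {
        // The caller has to decide what happens when the value is missing.
        String uri = findTaxonUri("Danio rerio").orElse("(no taxonomy URI known)");
        System.out.println(uri);
        findTaxonUri("Homo sapiens").ifPresent(System.out::println);
    }
}
```

The return type tells the caller that the value can be absent, which a plain String return type does not.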

linkset paths wrong for ensembl mus musculus

In the Ensembl Mus musculus linkset files, the linkset path given does not match what the actual path should be.

http://bridgedb.org/data/linksets/MusMusculus/release87
should be
http://bridgedb.org/data/linksets/release87/MusMusculus

'release87' should be before 'MusMusculus'.

Re-routing to non-existing url

The rerouting link (http://developers.bridgedb.org/wiki/BridgeWebservice) does not work and should be changed to http://webservice.bridgedb.org/. @JonathanMELIUS already fixed this on his machine.

Preferred IRIs for Xref properties

@egonw and I were discussing preferred IRIs for Xref properties in BridgeDb, and we'd welcome input from other community members on this.

This webservice endpoint http://webservice.bridgedb.org/Human/attributes/En/ENSG00000139618 returns:

Description    BRCA2, DNA repair associated [Source:HGNC Symbol;Acc:HGNC:1101]
Symbol    BRCA2
Type    protein_coding
Chromosome    13

To describe that call, do these IRIs appear correct?
description: http://purl.org/dc/terms/description
type: http://www.w3.org/1999/02/22-rdf-syntax-ns#type
chromosome: http://purl.obolibrary.org/obo/GO_0005694
identifier: http://purl.obolibrary.org/obo/IAO_0000577
symbol: http://semanticscience.org/resource/SIO_000105

Does Symbol refer to name, as in one of these IRIs?
https://vocabularies.wikipathways.org/gpml#textlabel
http://schema.org/name
http://www.biopax.org/release/biopax-level3.owl#name
http://www.biopax.org/release/biopax-level3.owl#displayName
http://www.biopax.org/release/biopax-level3.owl#standardName

Or is it specific to a gene symbol, as in http://edamontology.org/data_1026 from the EDAM ontology?

Note that http://webservice.bridgedb.org/Human/attributes/Ce/CHEBI%3A29108 returns:

Symbol    calcium
Symbol    Ca(2+)
Symbol    calcium(2+)
Symbol    calcium, doubly charged positive ion
Symbol    Ca
Symbol    Calcium element

and http://webservice.bridgedb.org/Human/attributeSearch/calcium?attrName=Symbol returns:

C9J224	Uniprot-TrEMBL	Voltage-dependent L-type calcium channel subunit beta-4 (Fragment)
HMDB00464	HMDB	Calcium
Q9P0X4	Uniprot-TrEMBL	Voltage-dependent T-type calcium channel subunit alpha-1I
CHEBI:4496	ChEBI	Dibasic calcium phosphate dihydrate
CHEBI:9679	ChEBI	Tricalcium phosphate
Q86V35	Uniprot-TrEMBL	Calcium-binding protein 7
O00305	Uniprot-TrEMBL	Voltage-dependent L-type calcium channel subunit beta-4
CHEBI:3309	ChEBI	Calcium Gluconate
CHEBI:5010	ChEBI	Fenprofen calcium
B0QYI5	Uniprot-TrEMBL	EF-hand calcium-binding domain-containing protein 6
P57103	Uniprot-TrEMBL	Sodium/calcium exchanger 3
HMDB38053	HMDB	Calcium trimetaphosphate
CHEBI:3311	ChEBI	Calcium carbonate
E7DBM8	Uniprot-TrEMBL	Voltage-dependent L-type calcium channel subunit beta-4
CHEBI:4757	ChEBI	Edetate calcium disodium
E7EN11	Uniprot-TrEMBL	Voltage-dependent L-type calcium channel subunit beta-4
HMDB14471	HMDB	Calcium Gluceptate
Q5THR3	Uniprot-TrEMBL	EF-hand calcium-binding domain-containing protein 6
F2Z391	Uniprot-TrEMBL	Sodium/calcium exchanger 3

add new data sources to datasources.txt

~~MetaboLights~~
~~BRENDA~~
~~CompTox Dashboard~~
~~Wikidata~~

More?

Wikidata is primary source

versions

In the master branch, the project version in the pom.xml is 2.1.0-SNAPSHOT, which is behind the already released 2.1.1. What does that mean?

PS:
Is bridgedb2.x the stable release branch? The version there is still 2.1.0-SNAPSHOT as well.
Was it merged into the master?..
Do you want to eventually deploy/release to e.g., OSSRH public maven repository (and then to Maven Central)?

Webservice: filtering xrefsBatch by dataSource

This endpoint doesn't filter results by dataSource:

curl -d $'ENSG00000083093\tEn\n1234\tL' http://webservice.bridgedb.org/Human/xrefsBatch?dataSource=S

It should just return results for S (Uniprot-TrEMBL), but it currently returns all results:

ENSG00000083093	Ensembl	Il:0004480538,X:8000329,T:GO:0005515,U:Hs.444664,T:GO:0000724,T:GO:0043066,En:ENSG00000083093,X:11722944_at,Ag:A_23_P129569,Uc:uc059sdg.1,X:219530_at,L:79728,T:GO:0005654,T:GO:0001701,Wg:79728,T:GO:0036342,Pd:3EU7,X:47111_at,Il:ILMN_1723793,S:I3L1Z5,Q:NP_078951,X:16825104,H:PALB2,T:GO:0000731,Uc:uc059sdc.1,T:GO:0007498,Om:610355,T:GO:0031052,Uc:uc002dlx.2,Uc:uc059sdh.1,T:GO:0000732,T:GO:0035264,T:GO:0001756,T:GO:0001833,Pd:2W18,S:I3L2S5,Q:NM_024675,T:GO:0003677,S:Q86YC2,T:GO:0048568,S:I3L3R6,T:GO:0009887,S:H3BN63,Uc:uc059sdf.1
1234	Entrez Gene	T:GO:0007204,T:GO:0006968,T:GO:0070098,H:CCR5,T:GO:0009897,Q:NM_001100168,T:GO:0000165,T:GO:0005737,T:GO:0007186,T:GO:0071222,T:GO:0070723,Pd:1NE0,T:GO:0016493,Q:NP_001093638,X:206991_s_at,Pd:1ND8,Ag:A_23_P412321,T:GO:0014808,Pd:2L87,Pd:1OPW,Pd:1OPT,Pd:1OPN,X:36724_s_at,Pd:2RLL,En:ENSG00000160791,X:16940182,T:GO:0015026,Pd:2MZX,X:U95626_rna3_at,Uc:uc062izs.1,X:11730910_s_at,T:GO:0006935,Wg:1234,Uc:uc062izt.1,S:Q38L21,T:GO:0005886,T:GO:0005887,T:GO:0005515,Pd:2RRS,Il:0006590601,Il:ILMN_1653395,U:Hs.450802,Om:601373,X:X99393_s_at,Pd:4MBS,T:GO:0007166,T:GO:0019722,T:GO:0007267,T:GO:0019957,Il:ILMN_2145033,T:GO:0006816,X:11748062_s_at,T:GO:0003779,X:U83326_s_at,S:P51681,T:GO:0006954,X:8079401,T:GO:0019064,Q:NM_000579,T:GO:0006955,X:8093298,T:GO:0002407,T:GO:0004435,T:GO:0001618,T:GO:0071791,T:GO:0005768,L:1234,X:11730909_s_at,T:GO:0023052,Q:NP_000570,T:GO:0009986,T:GO:0004950,T:GO:0030260
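Until the endpoint filters on the server side, a client can filter the response itself. The sketch below assumes the response format shown above (three tab-separated columns, with the third being a comma-separated list of systemCode:identifier pairs); XrefsBatchFilter is a hypothetical helper, not part of BridgeDb. Because identifiers such as GO:0005515 contain colons themselves, only the first colon is used as the separator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side helper, not part of BridgeDb.
final class XrefsBatchFilter {
    /** Keeps only the xrefs on one response line whose system code equals the given code. */
    static List<String> filterLine(String line, String systemCode) {
        List<String> kept = new ArrayList<>();
        String[] columns = line.split("\t");
        if (columns.length < 3) {
            return kept; // no xref column on this line
        }
        for (String xref : columns[2].split(",")) {
            int colon = xref.indexOf(':'); // first colon separates code from identifier
            if (colon > 0 && xref.substring(0, colon).equals(systemCode)) {
                kept.add(xref);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String line = "ENSG00000083093\tEnsembl\tT:GO:0005515,S:I3L1Z5,Q:NP_078951,S:Q86YC2";
        System.out.println(filterLine(line, "S"));
    }
}
```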

Webservice not working.

Hey there,

I'm experiencing connection issues with the BridgeDb web service.

Example:

http://webservice.bridgedb.org/Human/xrefs/Ik/HAFLLMSZAQXSPB-LDLOPFEMSA-N

Returns:

java.sql.SQLNonTransientConnectionException: No current connection.

Would it be possible to have this fixed?

Thanks,

Keiron.

more than one datasource with "urn:miriam:uniprot" is causing trouble

See this change back and forth: openphacts@bbb29bd

Apparently there cannot be two data sources with the same MIRIAM urn prefix at this moment...

introduce API for "secondary" identifiers

Many databases have the concept of secondary identifiers, e.g. of deprecated identifiers, replaced by new identifiers. This issue is to coordinate the work to update the data model and APIs to support this notion.

Swagger on BridgeDb docker image

Are the files that create the swagger interface within the code of BridgeDb? In order to use BridgeDb swagger as part of the docker image (https://github.com/bridgedb/docker), they should be within the bridgedb repository. Is this the case? Also, I would like to see if we can configure this with the locally downloaded mapping databases and run the swagger locally.

Add an output method that returns the QC report as a string or an output stream

Currently the QC report is displayed via System.out.println; a method that returns the report is needed.
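One possible shape for this, sketched with placeholder names (QcReport, writeReport and reportAsString are assumptions, and the list of findings stands in for whatever the real QC code computes): write to a caller-supplied PrintStream, and build the string variant on top of it.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch, not the actual BridgeDb QC class.
final class QcReport {
    /** Writes the report to any stream; passing System.out keeps the current behaviour. */
    static void writeReport(List<String> findings, PrintStream out) {
        for (String finding : findings) {
            out.println(finding);
        }
    }

    /** Captures the same output in memory and returns it as a string. */
    static String reportAsString(List<String> findings) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(buffer, true, StandardCharsets.UTF_8)) {
            writeReport(findings, out);
        }
        return buffer.toString(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<String> findings = List.of("placeholder finding 1", "placeholder finding 2");
        writeReport(findings, System.out);          // as before
        String text = reportAsString(findings);     // new: usable by callers and tests
        System.out.print(text);
    }
}
```

Keeping a single writing method means the console output and the returned string cannot drift apart.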

IDMapperUniprot does not follow relative redirects

When building 2.2.0 org.bridgedb.webservice.uniprot.Test fails with:

java.lang.IllegalArgumentException: host parameter is null
        at org.apache.commons.httpclient.HttpConnection.setHost(HttpConnection.java:249)
        at org.apache.commons.httpclient.SimpleHttpConnectionManager.getConnectionWithTimeout(SimpleHttpConnectionManager.java:189)
        at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
        at org.bridgedb.webservice.uniprot.IDMapperUniprot.runQuery(IDMapperUniprot.java:329)
        at org.bridgedb.webservice.uniprot.IDMapperUniprot.runMappingQuery(IDMapperUniprot.java:125)
        at org.bridgedb.webservice.uniprot.IDMapperUniprot.mapID(IDMapperUniprot.java:189)
        at org.bridgedb.webservice.uniprot.Test.testBasic(Test.java:40)

This is caused by the HttpStatus.SC_MOVED_TEMPORARILY handling not resolving relative URIs like /mapping/M20170829A7434721E10EE6586998A056CCD0537EF5E350T.tab according to the requested URI.

For some reason the code calls method.setFollowRedirects(false) and handles the redirect itself. The fix is to use URI.resolve() so that any relative URI is resolved correctly against the requested URI.
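A minimal sketch of the proposed fix using java.net.URI. The class RedirectResolver is hypothetical, and the base URI http://www.uniprot.org/mapping/ is an assumed example of the requested URI, not taken from the failing test.

```java
import java.net.URI;

// Hypothetical helper illustrating URI.resolve(), not the actual IDMapperUniprot code.
final class RedirectResolver {
    /** Resolves a Location header value (relative or absolute) against the requested URI. */
    static URI resolveRedirect(URI requested, String location) {
        return requested.resolve(location);
    }

    public static void main(String[] args) {
        URI requested = URI.create("http://www.uniprot.org/mapping/"); // assumed base
        String location = "/mapping/M20170829A7434721E10EE6586998A056CCD0537EF5E350T.tab";
        // A relative Location inherits scheme and host from the requested URI.
        System.out.println(resolveRedirect(requested, location));
        // An absolute Location is returned unchanged.
        System.out.println(resolveRedirect(requested, "http://example.org/other.tab"));
    }
}
```

The resolved URI always carries a host, which avoids the "host parameter is null" failure above.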

compare metadata given for Derby files against FAIR expectations

The current output looks like:

DATASOURCEVERSION    85
DATATYPE    GeneProduct
SERIES    Homo sapiens genes and proteins
DATASOURCENAME    Ensembl
BUILDDATE    20161019
SCHEMAVERSION    3

This may or may not comply with FAIR (quick guess is: no, it does not).

data.bridgedb.org

Should http://data.bridgedb.org/ be directed to http://bridgedb.org/data/? Analogous to new data.wikipathways.org...

Secondary ID mapping service in Bridge DB

Moving this issue from pathvisio github to BridgeDB.

DeniseSl22 commented on Jul 31
Hi all, (again about the library, sorry about that ;)....)

In the Pathvisio library, secondary identifiers of ChEBI are included as well (Marvin found this for bicarbonate). I think for a new Pathvisio release, we should remove these secondary IDs as an option and replace all secondary ChEBIs with primary ones in WPs, since the secondary IDs are not mapped via bridgeDB.

@marvinm2
@egonw
@mkutmon
@DeniseSl22

DeniseSl22 commented on Jul 31
After consulting with Jonathan, he told me that the library (search function in Pathvisio) uses the content of the BridgeDB databases.
So, if any secondary metabolite IDs are found in the "library", it is because they are in the metabolites bridgeDB file. We do not need a new Pv release for this change.

Is this something we want to include in the bridge file (I have other examples, where I could not create a mapping, because the secondary ID was used in a dataset....)?
Or do we just want the primary ones included, and another way to map the secondary to the primary IDs?
@mkutmon

mkutmon commented 3 days ago
@DeniseSl22 did you move this issue to the bridgedb repo? then I close it here.
@DeniseSl22

DeniseSl22 commented a minute ago
Nope not yet; I will do it right away :)
But it is a problem which we do not have the solution for right now (according to Chris and Egon)...

summarize the total number of identifiers and the total number of mappings

The summary should also include the percentage change compared to the other release.
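The percentage change itself is a one-line calculation. A small sketch follows; ReleaseSummary is a hypothetical class, and the counts in main are invented sample numbers, not real release statistics.

```java
// Hypothetical helper, not part of the BridgeDb QC tool.
final class ReleaseSummary {
    /** Percentage change from the previous count to the current count. */
    static double percentChange(long previous, long current) {
        if (previous == 0) {
            throw new IllegalArgumentException("previous count must be non-zero");
        }
        return 100.0 * (current - previous) / previous;
    }

    public static void main(String[] args) {
        long oldMappings = 200; // invented sample value
        long newMappings = 250; // invented sample value
        System.out.printf("mappings: %d -> %d (%+.1f%%)%n",
                oldMappings, newMappings, percentChange(oldMappings, newMappings));
    }
}
```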
