ebispot / ols4

Version 4 of the EMBL-EBI Ontology Lookup Service (OLS)

Home Page: http://www.ebi.ac.uk/ols4/

License: Apache License 2.0

Shell 0.97% Java 47.74% HTML 0.19% CSS 15.57% TypeScript 29.22% Dockerfile 0.21% JavaScript 6.08% Cypher 0.02%
ontologies semantic-web bioinformatics rdf knowledge-graph knowledge-management owl knowledge-representation

ols4's Introduction

OLS4 is available at https://www.ebi.ac.uk/ols4/. Please report any issues to the tracker in this repository.


Version 4 of the EMBL-EBI Ontology Lookup Service (OLS), featuring:

  • Much faster dataload (loads the OBO foundry in hours instead of days)
  • Modular dataload pipeline with decoupled, individually testable stages
  • Automated CI testing of the dataload with minimal testcase ontologies
  • A lossless data representation: everything in the ontology is preserved in the databases
  • Coverage of the whole OWL2 spec, and also loads vocabularies defined purely in RDFS
  • Uses updated versions of Solr and Neo4j (no embedded databases, no MongoDB)
  • React frontend using Redux and Tailwind
  • Backwards compatibility with the OLS3 API

This repository contains three projects:

  • The dataloader (dataload directory)
  • The API server (backend directory)
  • The React frontend (frontend directory)

Deploying OLS4

Deployment instructions will go here. OLS4 is still under heavy development, so currently we only have detailed instructions for developers below.

However, if you just want to try it out, this should get you going:

export OLS4_CONFIG=./dataload/configs/efo.json
docker compose up

You should now be able to access the OLS4 frontend at http://localhost:8081.

If you want to test it with your own ontology, copy the OWL or RDFS ontology file to the testcases folder (which is mounted in Docker). Then make a new config file for your ontology in dataload/configs (you can use efo.json as a template). For the ontology_purl property in the config, use e.g. file:///opt/dataload/testcases/myontology.owl if your ontology is in testcases/myontology.owl. Then follow the steps above, substituting the filename of your new config for efo.json.
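Putting that together, it might look something like this (a sketch only; apart from ontology_purl, which is described above, the config field names are assumptions, so treat efo.json as the authoritative template):

# Hypothetical config for testcases/myontology.owl; check dataload/configs/efo.json for the real schema.
cat > dataload/configs/myontology.json <<'EOF'
{
  "ontologies": [
    {
      "id": "myontology",
      "ontology_purl": "file:///opt/dataload/testcases/myontology.owl"
    }
  ]
}
EOF

export OLS4_CONFIG=./dataload/configs/myontology.json
docker compose up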

Deployment: Using Kubernetes with GitHub Packages

To deploy OLS4 using Kubernetes, the Docker images built and published to this repository via GitHub Packages are used. Software requirements are as follows:

  1. Kubernetes command-line tool, kubectl
  2. Kubernetes package manager, helm

Create data archives for Solr and Neo4j

To create your own Solr and Neo4j data archives, follow the steps on how to load data locally.

Startup dataserver

Uninstall any existing dataserver deployment before installing a new one. Do not forget to set the KUBECONFIG environment variable.

export KUBECONFIG=<K8S_CONFIG>
helm install ols4-dataserver --wait <OLS4_DIR>/k8chart/dataserver

Copy data to dataserver

From your local directory, copy the Solr and Neo4j data archive files to the dataserver.

kubectl cp <LOCAL_DIR>/neo4j.tgz $(kubectl get pods -l app=ols4-dataserver -o custom-columns=:metadata.name):/usr/share/nginx/html/neo4j.tgz
kubectl cp <LOCAL_DIR>/solr.tgz $(kubectl get pods -l app=ols4-dataserver -o custom-columns=:metadata.name):/usr/share/nginx/html/solr.tgz
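To check that the archives arrived, you can list the dataserver's web root (a quick sanity check, reusing the same pod selector as above and assuming a single dataserver pod):

# Assumes exactly one ols4-dataserver pod is running.
kubectl exec $(kubectl get pods -l app=ols4-dataserver -o custom-columns=:metadata.name) -- ls -lh /usr/share/nginx/html/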

Startup OLS4 deployment

Uninstall any existing ols4 deployment before installing a new one. Do not forget to set the KUBECONFIG environment variable.

IMPORTANT: imageTag specifies which Docker image (published to this repository) is used in the deployment. If you are unsure, simply use either the dev or stable image.

export KUBECONFIG=<K8S_CONFIG>
helm install ols4 <OLS4_DIR>/k8chart/ols4 --set imageTag=dev

Developing OLS4

OLS is different from most webapps in that its API provides both full-text search and recursive graph queries, neither of which is possible or performant with a traditional RDBMS. It therefore uses two specialized database servers: Solr, a Lucene server similar to Elasticsearch; and Neo4j, a graph database.

  • The dataload directory contains the code which turns ontologies from RDF (specified using OWL and/or RDFS) into JSON and CSV datasets which can be loaded into Solr and Neo4j, respectively; and some minimal bash scripts which help with loading them.
  • The backend directory contains a Spring Boot application which hosts the OLS API over the above Solr and Neo4j instances
  • The frontend directory contains the React frontend built upon the backend above.

[Diagram: OLS4 overview]

Development: Running OLS4 using Docker

You can run OLS4, or any combination of its constituent parts (dataload, backend, frontend), in Docker. When developing, it is often useful to run, for example, just Solr and Neo4j in Docker while running the API server locally; or to run Solr, Neo4j, and the backend API server in Docker while running the frontend locally.

First install the latest version of Docker Desktop if you are on Mac or Windows. This now includes the docker compose command. If you are on Linux, make sure you have the docker compose plugin installed (apt install docker.io docker-compose-plugin on Ubuntu).

You will need a config file, which configures the ontologies to load into OLS4. You can provide this to docker compose using the OLS4_CONFIG environment variable. For example:

export OLS4_CONFIG=./dataload/configs/efo.json

Then, start up the components you would like to run. For example, Solr and Neo4j only (to develop the backend API server and/or frontend):

docker compose up --force-recreate --build --always-recreate-deps --attach-dependencies ols4-solr ols4-neo4j

This will build and run the dataload, and start up Solr and Neo4j with your new dataset on ports 8983 and 7474, respectively. To start Solr and Neo4j AND the backend API server (to develop the frontend):

docker compose up --force-recreate --build --always-recreate-deps --attach-dependencies ols4-solr ols4-neo4j ols4-backend

To start everything, including the frontend:

docker compose up --force-recreate --build --always-recreate-deps --attach-dependencies ols4-solr ols4-neo4j ols4-backend ols4-frontend

Development: Running OLS4 locally

Alternatively, you can run OLS4 or any of its constituent parts locally, which is more useful for development. Software requirements are as follows:

  1. Java 11. Later versions of Java are probably fine, though the Neo4j we use only works with Java 11.
  2. Maven 3.x.x
  3. Neo4J 4.4.x
  4. Solr 9.0.0
  5. Your favourite Git client

Acquire source and build

Clone repo:

git clone git@github.com:EBISPOT/ols4.git

Build backend:

mvn clean package

Build frontend:

npm install
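For example, assuming each project is built from its own subdirectory (adjust if your layout differs):

# Assumes the backend and frontend are each built from their own subdirectory of the clone.
cd ols4/backend
mvn clean package
cd ../frontend
npm install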

Test testcases from dataload to UI

The scripts below assume you have the following environment variables set:

NEO4J_HOME

SOLR_HOME

OLS4_HOME - this should point to the root folder where you have the OLS4 code.
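For example (the paths below are placeholders; point them at your own Neo4j and Solr installations and your OLS4 checkout):

export NEO4J_HOME=/opt/neo4j      # wherever Neo4j 4.4.x is installed
export SOLR_HOME=/opt/solr        # wherever Solr 9.0.0 is installed
export OLS4_HOME=~/ols4           # the cloned OLS4 repository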

Change the directory to $OLS4_HOME.

cd $OLS4_HOME

To load a testcase and start Neo4J and Solr, run:

./dev-testing/teststack.sh <rel_json_config_url> <rel_output_dir>

where <rel_json_config_url> can be a JSON config file or a directory containing JSON config files, and <rel_output_dir> is the output directory, both relative to $OLS4_HOME, e.g.:

./dev-testing/teststack.sh ./testcases/owl2-primer/minimal.json ./output

or if you want to load all testcases, you can use

./dev-testing/teststack.sh ./testcases ./output

Once Neo4j and Solr are up, you can start the backend (REST API) by running:

./dev-testing/start-backend.sh

Once the backend is up, you can start the frontend with:

./dev-testing/start-frontend.sh

Once you are done testing, to stop everything:

./stopall.sh

Running the dataload locally

All files related to loading and processing data are in dataload. First, make sure the configuration files (which determine which ontologies to load) are ready, then build all the JAR files:

cd dataload
mvn clean package

Pre-download RDF

java \
-DentityExpansionLimit=0 \
-DtotalEntitySizeLimit=0 \
-Djdk.xml.totalEntitySizeLimit=0 \
-Djdk.xml.entityExpansionLimit=0 \
-jar predownloader.jar \
--config <CONFIG_FILE> \
--downloadPath <DOWNLOAD_PATH>
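For example, using the EFO config and an example downloads directory (paths are illustrative, and the jar location depends on where the Maven build placed it):

# Run from the dataload directory; ./downloads is just an example path.
java -DentityExpansionLimit=0 -DtotalEntitySizeLimit=0 \
-Djdk.xml.totalEntitySizeLimit=0 -Djdk.xml.entityExpansionLimit=0 \
-jar predownloader.jar --config ./configs/efo.json --downloadPath ./downloads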

Convert RDF to JSON

java \
-DentityExpansionLimit=0 \
-DtotalEntitySizeLimit=0 \
-Djdk.xml.totalEntitySizeLimit=0 \
-Djdk.xml.entityExpansionLimit=0 \
-jar rdf2json.jar \
--downloadedPath <DOWNLOAD_PATH> \
--config <CONFIG_FILE> \
--output <LOCAL_DIR>/output_json/ontologies.json

Run ontologies linker

java \
-jar linker.jar \
--input <LOCAL_DIR>/output_json/ontologies.json \
--output <LOCAL_DIR>/output_json/ontologies_linked.json \
--leveldbPath <LEVEL_DB_DIR>

Convert JSON to Neo4j CSV

java \
-jar json2neo.jar \
--input <LOCAL_DIR>/output_json/ontologies_linked.json \
--outDir <LOCAL_DIR>/output_csv/

Create Neo4j from CSV

Run Neo4j import command:

./neo4j-admin import \
--ignore-empty-strings=true \
--legacy-style-quoting=false \
--array-delimiter="|" \
--multiline-fields=true \
--database=neo4j \
--read-buffer-size=134217728 \
$(<LOCAL_DIR>/make_csv_import_cmd.sh)

Here is a sample make_csv_import_cmd.sh file:

#!/bin/bash

for f in ./output_csv/*_ontologies.csv
do
echo -n "--nodes=$f "
done

for f in ./output_csv/*_classes.csv
do
echo -n "--nodes=$f "
done

for f in ./output_csv/*_properties.csv
do
echo -n "--nodes=$f "
done

for f in ./output_csv/*_individuals.csv
do
echo -n "--nodes=$f "
done

for f in ./output_csv/*_edges.csv
do
echo -n "--relationships=$f "
done

Make Neo4j indexes

Start Neo4j locally and then run the following index-creation commands, which are also defined in create_indexes.cypher inside the dataload directory:

CREATE INDEX FOR (n:OntologyClass) ON (n.id);
CREATE INDEX FOR (n:OntologyIndividual) ON (n.id);
CREATE INDEX FOR (n:OntologyProperty) ON (n.id);
CREATE INDEX FOR (n:OntologyEntity) ON (n.id);

CALL db.awaitIndexes(10800);
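If you prefer not to paste these into the Neo4j browser, one option is to pipe the file into cypher-shell (a sketch; substitute your own credentials):

# Assumes cypher-shell from your local Neo4j installation and your own password.
cat <OLS4_DIR>/dataload/create_indexes.cypher | $NEO4J_HOME/bin/cypher-shell -u neo4j -p <NEO4J_PASSWORD>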

After creating the indexes, stop Neo4j as needed.

Convert JSON output to Solr JSON

java \
-jar json2solr.jar \
--input <LOCAL_DIR>/output_json/ontologies_linked.json \
--outDir <LOCAL_DIR>/output_jsonl/

Update Solr indexes

Before running Solr, copy the configuration (solr_config) from inside the dataload directory to your local Solr installation, e.g., <SOLR_DIR>/server/solr/. Then start Solr locally and use the generated JSONL files to update the indexes, as in the sample wget commands further below.
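The copy and startup might look like this (a sketch; bin/solr start is the stock Solr launcher, listening on port 8983 by default):

# Whether to copy the directory itself or its contents depends on the layout of solr_config.
cp -r dataload/solr_config/. <SOLR_DIR>/server/solr/
<SOLR_DIR>/bin/solr start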

wget \
--method POST --no-proxy -O - --server-response --content-on-error=on \
--header="Content-Type: application/json" \
--body-file <LOCAL_DIR>/output_jsonl/ontologies.jsonl \
http://localhost:8983/solr/ols4_entities/update/json/docs?commit=true

wget \
--method POST --no-proxy -O - --server-response --content-on-error=on \
--header="Content-Type: application/json" \
--body-file <LOCAL_DIR>/output_jsonl/classes.jsonl \
http://localhost:8983/solr/ols4_entities/update/json/docs?commit=true

wget --method POST --no-proxy -O - --server-response --content-on-error=on \
--header="Content-Type: application/json" \
--body-file <LOCAL_DIR>/output_jsonl/properties.jsonl \
http://localhost:8983/solr/ols4_entities/update/json/docs?commit=true

wget --method POST --no-proxy -O - --server-response --content-on-error=on \
--header="Content-Type: application/json" \
--body-file <LOCAL_DIR>/output_jsonl/individuals.jsonl \
http://localhost:8983/solr/ols4_entities/update/json/docs?commit=true

wget --method POST --no-proxy -O - --server-response --content-on-error=on \
--header="Content-Type: application/json" \
--body-file <LOCAL_DIR>/output_jsonl/autocomplete.jsonl \
http://localhost:8983/solr/ols4_autocomplete/update/json/docs?commit=true

Update ols4_entities core:

wget --no-proxy -O - --server-response --content-on-error=on \
http://localhost:8983/solr/ols4_entities/update?commit=true

Update ols4_autocomplete core:

wget --no-proxy -O - --server-response --content-on-error=on \
http://localhost:8983/solr/ols4_autocomplete/update?commit=true
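Before stopping Solr, you can sanity-check that documents were indexed by asking a core for its document count (numFound in the response), for example:

wget --no-proxy -O - "http://localhost:8983/solr/ols4_entities/select?q=*:*&rows=0"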

After updating the indexes, stop Solr as needed.

Create data archives for Solr and Neo4j

Finally, create archives for both Solr and Neo4j data folders.

tar --use-compress-program="pigz --fast --recursive" \
-cf <LOCAL_DIR>/neo4j.tgz -C <LOCAL_DIR>/neo4j/data .

tar --use-compress-program="pigz --fast --recursive" \
-cf <LOCAL_DIR>/solr.tgz -C <LOCAL_DIR>/solr/server solr

Running the API server backend locally

The API server is a Spring Boot application located in backend. Set the following environment variables to point it at your local (or Dockerized) Solr and Neo4j servers:

OLS_SOLR_HOST=http://localhost:8983
OLS_NEO4J_HOST=bolt://localhost:7687
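With those variables exported, one way to start the backend is to run the jar built earlier (a sketch; the exact jar name under backend/target depends on the Maven build, so adjust accordingly):

export OLS_SOLR_HOST=http://localhost:8983
export OLS_NEO4J_HOST=bolt://localhost:7687
cd backend
java -jar target/*.jar   # adjust to the actual jar name produced by mvn clean package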

Running the frontend locally

The frontend is a React application in frontend. See frontend docs for details on how to run the frontend.

Development: Updating testcases_expected_output

If you change something that results in the test output changing (e.g. adding new tests, changing what the output looks like), the CI on this repo will fail.

To fix this, you need to replace the testcases_expected_output and testcases_expected_output_api folders with the new expected output. You should do this in the same commit as your code/test changes because then we can track exactly what changed in the output.

First make sure all the JARs are up to date:

mvn clean package

Then run the test scripts:

  • ./test_dataload.sh (~1 minute) will test the dataload locally, updating testcases_expected_output. All you need is Java and Maven.
  • ./test_api.sh (~15 mins) will test the entire OLS4 stack (dataload → solr/neo4j → api server) using Docker compose to bring up and tear down all the services for each testcase, updating testcases_expected_output_api. You need to have Docker and Docker compose installed.

To run both:

./test_dataload.sh
./test_api.sh

Now remove the existing expected output:

rm -rf testcases_expected_output
rm -rf testcases_expected_output_api

Copy your new output to the respective directories:

cp -r testcases_output testcases_expected_output
cp -r testcases_output_api testcases_expected_output_api

You can now add it to your commit:

git add -A testcases_expected_output
git add -A testcases_expected_output_api

ols4's People

Contributors

b-gehrke, cthoyt, giraygi, haideriqbal, henrietteharmse, jamesamcl, julian-schneider, marius-mather, matentzn, serjoshua


ols4's Issues

Integration tests for API?

Currently we only test the output of the dataload before it is loaded into the databases. We really should test the API responses. This would require bringing up Neo4j, Solr, and the API in a Docker container which should be no problem on GH actions.

We can keep the current testing as test_dataload.sh and add a new test_dataload_and_api.sh (or something) which runs the former, then also finishes the dataload and hits the API.

Implement V1TermRepository getInstances

//    @Query (countQuery = "MATCH (i:Individual)-[INSTANCEOF]->(c:Class) WHERE i.ontology_name = {0} AND c.iri = {1} RETURN count(i)",
//            value = "MATCH (i:Individual)-[INSTANCEOF]->(c:Class) WHERE i.ontology_name = {0} AND c.iri = {1} RETURN i")
    public Page<V1Individual> getInstances(String ontologyId, String iri, Pageable pageable) {
        throw new RuntimeException();
    }

Optimal implementation of queries over all the ontologies

In addition to querying entities from one specific ontology, we also allow querying all of OLS. This is true in both OLS3 and OLS4.

  • For the OLS4 endpoints (which are not finalized or documented and used only for the OLS4 webapp), currently we just return the JSON for each entity verbatim where the properties have their entire IRI as the key, so it's very straightforward to dump everything from Solr.

  • For the OLS3 endpoints, we need to map back to the OLS3 data model. This means interpreting the configuration of the ontology (hierarchical, synonym, definition properties) to determine how to populate the fields, because the OLS3 API (1) puts them into OLS fields like description and (2) excludes them from the annotations because they have already been interpreted by OLS.

To implement the OLS3 behaviour, when you want an entity from a specific ontology OLS4 just retrieves the ontology object and then retrieves the entity. Then we have the ontology and the entity, so the mapping is very straightforward. But when you want all the entities from all the ontologies OLS4 retrieves a page of the results, and then retrieves the ontologies associated with the page of results so that it can perform the mapping. This is probably fine, though if your page of results came from e.g. 10 different ontologies it would require 10 different queries to retrieve the ontology details.

I can see two ways to address this issue (let's call it issue A):

  1. Cache the ontology properties in the API server. There are at maximum a few hundred so it's literally kilobytes of memory to do so, but the downside is you would need to restart the API server when you rerun the dataload which I don't like.

  2. Populate the harmonised properties in the dataload like OLS3 does so the API can return them directly.

I suspect we will probably want to harmonise the properties in the OLS4 API just like the OLS3 API did because it makes the API easier to use, so issue A solution 2 is likely the correct way forward. The only caveat is that we still will not know how to populate the annotations in the OLS3 API without the ontology object, because even if we have e.g. a harmonised description field we don't know e.g. if http://purl.obolibrary.org/obo/IAO_0000115 (IAO:definition) belongs in the annotations or not.

To address this issue (let's call it issue B), we could perhaps store the list of annotation IRIs in the entity object. This would only be of use to the OLS3 API, because in the OLS4 API I think we should still return everything from the OWL even if we already interpreted it.

Get search results in home page does not work

Two promises are fulfilled to retrieve search results for the home page search. The /api/v2/entities?search=&size=10 call returns a 500 error, while the /api/v2/ontologies?search=&size=3 call resolves but appears to return an empty response.


SVG logo

Very minor issue at this point in time, but before I forget we only have a PNG of the logo. At some point I found out which font it was to make the OntoString logo, so it would be nice to recreate the OLS one as an SVG for the new frontend so that it's resolution independent and we can make OLS T-shirts.

Finish implementing graph view

So far we have a very basic configuration of CytoscapeJS embedded for the graph view. This needs developing to have feature parity (or close enough) to OLS3.

Drop jstree endpoint for OLS3 API?

We have not yet implemented the jstree endpoint in the OLS3 backwards compatible API. However, we do not use it in the OLS4 frontend and I question whether it is in use by any current users of the OLS3 API, as it was never really documented anyway. As it would be reasonably annoying to implement I wonder if we should just drop it in OLS4?

EDIT: The same goes for the /graph endpoint.

Use official solr and neo4j images in k8s

The EBI k8s cluster can't run the official solr and neo4j images because its kernel is too old, so in the meantime I hacked together some images that run on Ubuntu 18.04. When the EBI cluster is updated we should change our k8s deployment to use the official images.

lang is read as part of URL

@RequestMapping(path = "/ontologies/{onto}/classes/{class}", produces = {MediaType.APPLICATION_JSON_VALUE, MediaTypes.HAL_JSON_VALUE}, method = RequestMethod.GET)
public HttpEntity<Resource<V2Class>> getClass(
        @PathVariable("onto") String ontologyId,
        @PathVariable("class") String uri,
        @RequestParam(value = "lang", required = false, defaultValue = "en") String lang
) throws ResourceNotFoundException {

if you hit it with

/api/v2/ontologies/efo/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0010532&lang=en

then uri contains &lang=en as part of the URL instead of being parsed as a request param.

Integration tests for OLS3 backwards compatible API

  • Pick around 5 ontologies including EFO - @henrietteharmse which ones shall we do?
  • Collect some JSON output from the current OLS3 API and upload to this repo for safekeeping so we don't need to mess with an actual OLS3 instance again
  • Write test framework to (a) run OLS4 dataload for the above ontologies and (b) diff OLS4 output with OLS3 JSON output

Only OWL classes should be labelled as Ontology terms

Currently individuals and properties are labelled with the OntologyTerm label in Neo4J, though only classes should be labelled with OntologyTerm.

The best reference I can find for the meaning of term in the biological community is the OBO flat file spec. This clearly states that only OWL classes are terms.

The use of the word "term" in OLS 3 has been a bit inconsistent. On the frontend OWL classes are terms, properties are properties and individuals are individuals. In the codebase this is a bit more fluid with both OWL classes and OWL properties being referred to as terms. This may have slipped in as part of copying and pasting code.

For completeness the OWL vs OBO spec mapping is as follows:

  1. OWL class --> OBO term
  2. OWL property --> OBO typedef
  3. OWL individual --> OBO instances

Testing for the remaining OLS3 endpoints

The automated apitester does not yet cover many of the OLS3 endpoints, such as findByIdAndIsDefiningOntology.

Currently the apitester does retrieve every entity in the ontology one by one, so there's no particular reason it can't also try endpoints like findByIdAndIsDefiningOntology for every entity since the testcases are smallish.

Better support for punning

Punning is where the same URI is used for, say, both a Class and an Individual.

Currently, this is handled in owl2json by creating both an entry in "classes" and an entry in "individuals", with the same properties. However, some of these properties only apply to the Class, and some only apply to the Individual, so the current behaviour is wrong; it doesn't lose anything, but it adds properties where they should not be.

We can identify to which types the property applies by looking at the semantics of the property. For example, if the rdfs:domain of the property refers to a Class, the property should not be added to Individuals in the owl2json output.

The other side of this issue is that a property might point to something that is punned. Does it refer to the Class or the Individual? Presumably we can look at the range of the property in a similar manner to the above, but this would need to be done in json2neo because in the output of owl2json there is no unique identifier for each type of a punned entity (only the URI).

Keeping downloaded ontologies from previous runs

In contrast to OLS3, we now reload everything from scratch with each reindex. However, that currently means that if an ontology fails to download, it will not be present in OLS.

The output of each owl2json run is one giant JSON file. Therefore, to fix this we can simply implement an option in owl2json to merge its output JSON file with the JSON file of a previous run. All of the other stages use this JSON file as their input, so there will be no need to modify the rest of the pipeline.

Harmonise hierarchical, synonym, definition properties based on config in owl2json

Relates to #30

Currently in the JSON model all of the properties are separate (e.g. http://www.ebi.ac.uk/efo/definition and http://purl.obolibrary.org/obo/IAO_0000115, which would have been merged into description in OLS3).

We should keep the originals (unlike OLS3), but also construct a harmonised e.g. description field based on the fields specified in the config (like OLS3).

To implement the OLS3 backwards compatible API we also need to keep track of which properties are considered annotations (by OLS3), which is anything that OLS3 would not have parsed and put elsewhere. We could add an annotationPredicates field to entities for this purpose which would literally be a list of the IRIs that were not listed as hierarchical/synonym/definition properties in the ontology config.

Currently the API retrieves the ontology config and inspects it to establish the above, which is fine if you are querying one ontology but could be inefficient if you want to query all of OLS; duplicating these properties onto every entity would fix that.

Handle bnode targets of axioms

Here, for example:

[Screenshot: an axiom annotation whose annotatedTarget is a bnode]

The annotatedTarget is a BNode which will (I think) have a different ID to the BNode it is annotating, so AxiomEvaluator won't be able to match them up

URI vs IRI

There's a bit of a mix of calling things URI vs. IRI. Most notably the OLS4 data model uses uri while the OLS3 API uses iri. I guess we should change it all to iri as it is more correct.

What should the search APIs look like in v4?

In OLS3 we have a complicated /api/search endpoint:

@RequestMapping(path = "/api/search", produces = {MediaType.APPLICATION_JSON_VALUE}, method = RequestMethod.GET)
public void search(
        @RequestParam("q") String query,
        @RequestParam(value = "ontology", required = false) Collection<String> ontologies,
        @RequestParam(value = "type", required = false) Collection<String> types,
        @RequestParam(value= "slim", required = false) Collection<String> slims,
        @RequestParam(value = "fieldList", required = false) Collection<String> fieldList,
        @RequestParam(value = "queryFields", required = false) Collection<String> queryFields,
        @RequestParam(value = "exact", required = false) boolean exact,
        @RequestParam(value = "groupField", required = false) String groupField,
        @RequestParam(value = "obsoletes", defaultValue = "false") boolean queryObsoletes,
        @RequestParam(value = "local", defaultValue = "false") boolean isLocal,
        @RequestParam(value = "childrenOf", required = false) Collection<String> childrenOf,
        @RequestParam(value = "allChildrenOf", required = false) Collection<String> allChildrenOf,
        @RequestParam(value = "inclusive", required = false) boolean inclusive,
        @RequestParam(value = "isLeaf", required = false) boolean isLeaf,
        @RequestParam(value = "rows", defaultValue = "10") Integer rows,
        @RequestParam(value = "start", defaultValue = "0") Integer start,
        @RequestParam(value = "format", defaultValue = "json") String format,

which is implemented with this logic:

    if (queryFields == null) {
        // if exact just search the supplied fields for exact matches
        if (exact) {
            // todo remove shortform_s once indexes have rebuilt - see https://helpdesk.ebi.ac.uk/Ticket/Display.html?id=75961
            solrQuery.setQuery(
                    "((" +
                            createUnionQuery(query.toLowerCase(), "label_s", "synonym_s", "shortform_s", "short_form_s", "obo_id_s", "iri_s", "annotations_trimmed")
                            + ") AND (is_defining_ontology:true^100 OR is_defining_ontology:false^0))"
            );

        }
        else {
            solrQuery.set("defType", "edismax");
            solrQuery.setQuery(query);
            solrQuery.set("qf", "label^5 synonym^3 description short_form^2 obo_id^2 annotations logical_description iri");
            solrQuery.set("bq", "type:ontology^10.0 is_defining_ontology:true^100 label_s:\"" + query.toLowerCase() + "\"^5 synonym_s:\"" + query.toLowerCase() + "\"^3 annotations_trimmed:\"" + query.toLowerCase() + "\"");
        }
    }
    else {
        if (exact) {
            List<String> fieldS = queryFields.stream()
                    .map(addStringField).collect(Collectors.toList());
            solrQuery.setQuery( createUnionQuery(query, fieldS.toArray(new String [fieldS.size()])));
        }
        else {
            solrQuery.set("defType", "edismax");
            solrQuery.setQuery(query);
            solrQuery.set("qf", String.join(" ", queryFields));
        }
    }

We will of course reimplement all of this in the OLS3 compatibility layer (and everything is in place apart from the API server code to do so).

For the new API I've been trying to make search an integral part of the same endpoints you use to retrieve entities. For example, you can retrieve:

`/api/v2/classes`

to get all classes, or you can retrieve:

`/api/v2/classes?search=diabetes`

to search for all classes with diabetes in some set of hardcoded fields (at least label, definition, uri off the top of my head), or:

`/api/v2/classes?search=diabetes&searchFields=label^10%20definition^5`

to search in label boosted 10 and definition boosted 5. You can also search in ANY field by including it in the GET parameters, like:

`/api/v2/classes?label=diabetes`

This includes fields that are specific to certain ontologies, like e.g. INCHIKEY in ChEBI. However, we are restricted by the expressivity of GET parameters. What if I want to search one of the fields exactly and the other case insensitively? Maybe a JSON POST search endpoint would be better?

Frontend assorted minor issues

Homepage

  • The search box should extend all the way to the "Data Content" box, and "Looking for a particular ontology?" should be aligned to the right of it.
  • "Looking for a particular ontology?" should link to the Ontologies page

Ontologies page

  • The pagination should be aligned all the way to the right

Other

  • The page titles always just say "Ontology Lookup Service" instead of showing which page you are on

obo stats

With owl2json, out of all of the ontologies in the OBO foundry:

186 loaded successfully

9 weren’t RDF or had imports that weren’t:

doid
cto
cvdo
mfmo
ons
ro
upheno
mamo
vario

3 were invalid RDF:

uo (issue open on their tracker since yesterday)
ms (imports uo)
genepio (invalid IRIs in the file contain unescaped spaces)

3 had protocol errors (e.g. 404):

ogi
ero
rnao

52 were marked as obsolete and had no top level ontology_purl. Some of them still had an ontology_purl under the “products” field but the current code doesn’t look at that.

It took 15 minutes on codon to load the above in series, no parallelisation at all. It generated a 6.5 GB json file for everything combined.
