ga4gh / g2p-team


GitHub Repo for the Genotype to Phenotype Task Team

License: Apache License 2.0

g2p-team's Introduction

Please read more in our Wiki

g2p-team's Issues

Support analyses into the relationship between multiple disorders

The current API doesn't appear to support studies investigating the relationship between multiple disorders (e.g. PMID:25158072).

Changing the PhenotypeInstance record
from

    OntologyTerm type;

to

    array<OntologyTerm> type;

would enable this.

Add record or field for clinical significance

In genotype to phenotype datasets, we often see a variant or genotype being linked to a disease along with a statement of clinical significance. For example, a variant may fall on a spectrum of benign to pathogenic for a given disease. Another example is a variant and treatment having a response to a given disease (see CIViC).

In these cases, I assume we want to conflate the concept of a phenotype and disease, so the disease would be stored in the PhenotypeInstance record. However, I'm not aware of a place to put the clinical significance of the association (benign, pathogenic, sensitive to drug). Would it make sense to add this to a field in the FeaturePhenotypeAssociation?
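
One way to picture the suggestion is an optional field on the association record. The sketch below is only an illustration in Python, not Avro IDL: the field name `clinical_significance`, the enum, and the simplified record are all assumptions, and the three values are just the examples named in this issue.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ClinicalSignificance(Enum):
    # Values taken from the examples in this issue; a real field would
    # more likely reference a controlled vocabulary.
    BENIGN = "benign"
    PATHOGENIC = "pathogenic"
    SENSITIVE_TO_DRUG = "sensitive to drug"


@dataclass
class FeaturePhenotypeAssociation:
    # Simplified, hypothetical stand-in: the real record has more fields.
    feature_id: str
    phenotype_id: str
    # Optional, so existing associations without a statement stay valid.
    clinical_significance: Optional[ClinicalSignificance] = None


# Placeholder identifiers, not real data.
assoc = FeaturePhenotypeAssociation(
    feature_id="example-variant",
    phenotype_id="example-disease",
    clinical_significance=ClinicalSignificance.PATHOGENIC,
)
```

Making the field optional keeps the change backward compatible with associations that carry no significance statement.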

Dataset support

Other parts of the API use the concept of a Dataset as a container for sets of records; here those would be FeaturePhenotypeAssociation records. The Dataset is used for access control and for limiting queries to results with the same provenance, so it would be useful here too.

Update ga4gh/server to support GenotypePhenotype schema

While ga4gh/schema was updated to support our schemas, ga4gh/server is ~15 commits behind the schema project. In order to proceed, we need to work with the community to help restart this effort. I'd like to contribute to the server project with an implementation of /genotypephenotype/search backed by OHSU's cancer genomics database ontology. I've commented on PR #379 and will track progress.

ClinGen integration

On the July 9 G2P team call, the topic of a comparison of ClinGen to G2P data models was raised. Initial notes captured on wiki here.

Is this something that is applicable for our group? Please comment/vote below.

BeatAML data

MS has questions on how to navigate the data directory /mnt/lustre1/BeatAML (8.7 TB).
@rnpandya, can you itemize?

Create documentation for g2p schema

As you know, we have been trying to re-introduce the g2p schemas into the mainline of the ga4gh schema repository.
A change in policy means that any schema change must now be accompanied by a reference server implementation and schema documentation:

"It has become apparent that the documentation for the GA4GH schemas is not sufficient. Mark Diekhans has in response been developing a documentation branch which aims to give a complete description of the rational, use cases, semantics and high-level tutorial for the APIs. However the effort needs work. "

This second requirement was discussed recently by the reference server team. To break out of this chicken-and-egg scenario, we (OHSU) will create a document along these lines, and we could use participation from the larger community. As a starting point, I have some developer documentation for the code plus applicable excerpts from the schema group's GitHub discussions.

We would like to pull this into a concise document with a few high-level use case examples.

Your help is invaluable.

Create implementation plan for schema suggestions

Task 1:

Validate that similar queries work with OHSU implementation

    Supporting query by id string only:
    curl 'http://193.62.52.232:8082/ga4gh/genotypephenotype/search' -H 'Content-Type: application/json' -H 'Accept: application/json' -X POST -d '{ "feature": [], "phenotype": ["http://www.ebi.ac.uk/efo/EFO_0000398"], "evidence": [], "pageSize": 10 }'
    curl 'http://193.62.52.232:8082/ga4gh/genotypephenotype/search' -H 'Content-Type: application/json' -H 'Accept: application/json' -X POST -d '{ "feature": ["rs6920220", "rs886774"], "phenotype": [], "evidence": [], "pageSize": 10 }'
    curl 'http://193.62.52.232:8082/ga4gh/genotypephenotype/search' -H 'Content-Type: application/json' -H 'Accept: application/json' -X POST -d '{ "feature": ["ENSG00000115286"], "phenotype": [], "evidence": [], "pageSize": 10 }'
    curl 'http://193.62.52.232:8082/ga4gh/genotypephenotype/search' -H 'Content-Type: application/json' -H 'Accept: application/json' -X POST -d '{ "feature": [], "phenotype": [], "evidence": ["PMID:19684604", "PMID:19915572"], "pageSize": 10 }'
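
For scripted validation, the same request bodies can be generated rather than typed by hand. This is a minimal sketch in Python that only builds the JSON payloads and does not contact any server; the helper name `build_search_request` is hypothetical, while the field names and values are copied from the curl commands above.

```python
import json


def build_search_request(feature=(), phenotype=(), evidence=(), page_size=10):
    """Return the JSON body for POST /genotypephenotype/search."""
    return json.dumps({
        "feature": list(feature),
        "phenotype": list(phenotype),
        "evidence": list(evidence),
        "pageSize": page_size,
    })


# The same four queries as the curl commands above.
bodies = [
    build_search_request(phenotype=["http://www.ebi.ac.uk/efo/EFO_0000398"]),
    build_search_request(feature=["rs6920220", "rs886774"]),
    build_search_request(feature=["ENSG00000115286"]),
    build_search_request(evidence=["PMID:19684604", "PMID:19915572"]),
]
```

Each body always carries all three lists, with empty lists for the dimensions that are not being queried, which matches the shape used in the curl examples.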

Pre-requisites: none.
Start date: ASAP
Assigned to: OHSU

Task 2:

Pick one & implement:
Overlap with metadata models is confusing – which should be implemented?
- Evidence identical
- Phenotype/ PhenotypeInstance similar
- Association/ PhenotypeFeatureAssociation different
Pre-requisites: Acceptance of G2P pull request, consensus on which to implement (follow Sarah's recommendation, let others comment on new PR)
Start date: ASAP after acceptance of pull request
Assigned to: ?

Task 3:

Incorporate AssociationSet
Support for selecting an association set to query (ohsu-comp-bio/schemas#1)
Pre-requisites: Acceptance of G2P pull request
Start date: ASAP after acceptance of pull request
Assigned to: ? (OHSU & ?)

Task 4:

Track implementation of the minimum metadata model for BioSample and Individual

Pre-requisites: publish PR of biosample meta data implementation (UCSC)
Start date: ASAP after acceptance of pull request
Assigned to: OHSU

Literome <--> G2P Analysis

"As a G2P engineer, in order to create a Literome implementation, it would be useful to have a mapping defined between the request and response payloads of the '/genotypephenotype/search' and Literome GWAS endpoints."

Exchange notes with MatchMakerExchange team

There was also a comment that there was activity in this area by the MatchMakerExchange team.

Schema Change Discussion

Team:

We have received feedback that the semantics of SearchGenotypePhenotypeRequest are very unclear. In the section of the API documentation that applies to our schema, I've added some examples and guidance.
We are proposing to deprecate the un-scoped string that is used in the query and replace it with a scoped TermQuery. We hope to introduce this, together with a placeholder for external identifiers in Evidence and PhenotypeInstance, as part of our current pull request. Your comments are invaluable. See the readme.

In addition, we have been asked to consider a PhenotypeAssociation with a wider scope: it connects evidence to entities other than Feature. Here we propose a new entry point that follows the modified pattern of the G2P and adds phenotype/search. This allows for discovery of evidence associated with (Variant, FeatureEvent, BioSample, Individual, CallSet). Again, your comments will be useful. See the readme.

HL7/FHIR integration

The DWG is seeking to line up with the HL7/FHIR specification for exchanging Ehealth records, and in particular, their Genomics component which provides a means of reporting variants with optional assessed associated medical condition. I think this component bears most relevance here and so I am sharing the link below for the current interest of the G2P group and also for future reference as our APIs begin to align.

http://www.hl7.org/FHIR/2015May/observation-genetics-cg-prf-1a.html

Is this something that is applicable for our group? Please comment/vote below.

Duplicated class names: SearchFeaturesRequest, SearchFeaturesResponse

Not clear if this is an issue for our team. If inapplicable, please comment.

The G2P schema contains two classes with the same names as classes in SequenceAnnotationMethods.

The first is in the file genotypephenotypemethods.avdl, protocol GenotypePhenotypeMethods:

/** This is the response from `POST /genotypephenotype/search` expressed as JSON. */
record SearchFeaturesResponse {
  /**
  The list of matching FeaturePhenotypeAssociation.
  */
  array<org.ga4gh.models.FeaturePhenotypeAssociation> associations = [];
  ...

The second one is found in sequenceAnnotationmethods.avdl

  /** This is the response from `POST /features/search` expressed as JSON. */
  record SearchFeaturesResponse {
    /**
    The list of matching annotations, sorted by start position. Annotations which
    share a start position are returned in a deterministic order.
    */
    array<org.ga4gh.models.Feature> features = [];

    ...

The Avro spec is somewhat imprecise when describing this:

Record, enums and fixed are named types. Each has a fullname that is composed of two parts; a name and a namespace. Equality of names is defined on the fullname.

// then later on ...

... In this case the namespace is taken from the most tightly enclosing schema or protocol.

It would be simpler to avoid this and name the g2p class something like SearchEvidenceResponse.
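
The collision can be illustrated with the fullname rule quoted above. This is a rough Python sketch, not Avro library code: `fullname` is a hypothetical helper, and the namespace value is an assumption standing in for whatever both protocols actually declare.

```python
def fullname(name, enclosing_namespace):
    """Avro fullname rule: a dotted name is already a fullname;
    otherwise the enclosing namespace is prepended."""
    if "." in name or not enclosing_namespace:
        return name
    return f"{enclosing_namespace}.{name}"


# Assumption: both protocols declare the same namespace. The value is
# illustrative only.
NAMESPACE = "org.ga4gh.methods"

g2p = fullname("SearchFeaturesResponse", NAMESPACE)
seq = fullname("SearchFeaturesResponse", NAMESPACE)
renamed = fullname("SearchEvidenceResponse", NAMESPACE)

# True when the two records would be treated as the same named type.
collides = g2p == seq
```

If the two protocols do share a namespace, `g2p` and `seq` are equal and the records clash, whereas the renamed record gets a distinct fullname.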

Time stamps on FeaturePhenotypeAssociation

The FeaturePhenotypeAssociation record needs time stamps. The new convention appears to be recordCreateTime and recordUpdateTime. Is that correct, @mbaudis?

Extend the evidence record

The evidence record currently only supports an evidence type and text description. Are there advanced plans to extend this?
In other parts of the API a metadata key-value pair structure is used to allow customisation in different implementations. The idea is that the API has some flexibility from the start and commonly used data types can be promoted to named fields as required.
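
A small sketch of that pattern in Python, under stated assumptions: the field name `info`, its map-of-string-lists shape, and the helper `collect_values` are hypothetical, and the record is a simplified stand-in for the schema's Evidence.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Evidence:
    evidence_type: str
    description: str
    # Free-form key/value metadata; the name `info` is an assumption.
    info: Dict[str, List[str]] = field(default_factory=dict)


def collect_values(evidence_items, key):
    """Gather the values stored under one metadata key across records,
    e.g. to judge whether the key deserves promotion to a named field."""
    return [v for e in evidence_items for v in e.info.get(key, [])]


items = [
    Evidence("publication", "first record", {"pmid": ["PMID:19684604"]}),
    Evidence("publication", "second record"),  # no metadata at all
]
pmids = collect_values(items, "pmid")
```

Records without the key are simply skipped, so implementations can adopt new keys independently of each other.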

Ontology Queries structure

Leaving this open for commentary/additions

Ontology Queries

The 'ontologySource' is assumed to be equivalent to an ontology's 'prefix'. However, no agreement or mechanism exists to align the ontologySource string to a specific URI.

Recommend: collapsing ontologySource and identifier into a single field 'ontologyURI'

Discussion

During the call @mellybelly mentioned there was a YAML file that provides a mapping of commonly used sources.
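
A sketch of how a prefix mapping could produce the single 'ontologyURI' value. The mapping contents and the function name are hypothetical; the EFO base is inferred from the URI used in the query examples elsewhere on this page, and a real mapping would be loaded from the shared file rather than hard-coded.

```python
# Hypothetical prefix map, hard-coded here for illustration.
PREFIX_TO_BASE = {
    "EFO": "http://www.ebi.ac.uk/efo/EFO_",
}


def to_ontology_uri(ontology_source, identifier):
    """Combine ontologySource and identifier into one URI string."""
    try:
        base = PREFIX_TO_BASE[ontology_source]
    except KeyError:
        # Fail loudly: an unknown prefix cannot be aligned to a URI.
        raise ValueError(f"no URI base registered for {ontology_source!r}")
    return base + identifier


uri = to_ontology_uri("EFO", "0000398")
```

Rejecting unknown prefixes makes the alignment problem visible instead of silently passing through an ambiguous string.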

Defer implementation of query-by-example

Leaving open for review during our next call. If there are no comments at that point, we can close.

Request

http://yuml.me/edit/bf06b90a

Query by example

There are four data types for each entity [string, external identifier, ontology identifier and 'entity'].
Currently the implementation handles queries of [string, external identifier and ontology identifier].

The 'entity' query is a type of query-by-example defined in [GenomicFeatureQuery,EvidenceQuery,PhenotypeQuery]. We have not implemented them. Challenges that arose:

- Schema constraints: several fields within the schemas are defined as non-null. This may be fine when creating an entity from a data store, but it is problematic when creating an entity to be used in a query.
- Additional discussion is needed to determine which properties of an existing entity will be used for the query and which will be ignored. For example, a Feature has [id, parentIds, featureSetId, referenceName, start, end, strand, featureType, attributes]; we need to specify exactly what the query's expectations are.
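
To make the second challenge concrete, here is one possible convention, sketched in Python. It is an assumption and not something the schema defines: unset (None) fields of the example are ignored and all other fields must match exactly. The records are plain dictionaries with placeholder values.

```python
def matches_example(record, example):
    """True if every non-None field of `example` equals the same field
    of `record`; None fields are treated as not part of the query."""
    return all(
        record.get(key) == value
        for key, value in example.items()
        if value is not None
    )


# Placeholder features, not real data.
features = [
    {"id": "f1", "referenceName": "chr1", "start": 100, "end": 200},
    {"id": "f2", "referenceName": "chr2", "start": 100, "end": 200},
]

# Only referenceName is set, so only it constrains the query.
example = {"id": None, "referenceName": "chr1", "start": None, "end": None}
hits = [f["id"] for f in features if matches_example(f, example)]
```

This convention also shows why non-null schema fields are awkward: a field that cannot be None can never be left out of the query.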

Recommendation: Leave the schema definitions as-is. However, leave the entity query-by-example unimplemented. Implement when demand exists with sufficient use case details.