g2p-team's Introduction

g2p-team

GitHub Repo for the Genotype to Phenotype Task Team

Please read more in our Wiki

g2p-team's People

Contributors

nlwashington, skeenan

g2p-team's Issues

Add record or field for clinical significance

In genotype to phenotype datasets, we often see a variant or genotype being linked to a disease along with a statement of clinical significance. For example, a variant may fall on a spectrum of benign to pathogenic for a given disease. Another example is a variant and treatment having a response to a given disease (see CIViC).

In these cases, I assume we want to conflate the concept of a phenotype and disease, so the disease would be stored in the PhenotypeInstance record. However, I'm not aware of a place to put the clinical significance of the association (benign, pathogenic, sensitive to drug). Would it make sense to add this to a field in the FeaturePhenotypeAssociation?
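One way to picture the proposed field is as an optional attribute on the association. The sketch below is a hypothetical Python model, not the Avro schema: the `ClinicalSignificance` value set and every field name are assumptions made for illustration, and only the record names come from the issue.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ClinicalSignificance(Enum):
    """Hypothetical value set; the issue only names these as examples."""
    BENIGN = "benign"
    PATHOGENIC = "pathogenic"
    SENSITIVE_TO_DRUG = "sensitive to drug"


@dataclass
class PhenotypeInstance:
    # Assumed field: the disease is stored as the phenotype.
    description: str


@dataclass
class FeaturePhenotypeAssociation:
    feature_id: str  # assumed field name
    phenotype: PhenotypeInstance
    # Proposed addition: optional, so records without it stay valid.
    clinical_significance: Optional[ClinicalSignificance] = None


assoc = FeaturePhenotypeAssociation(
    feature_id="rs6920220",
    phenotype=PhenotypeInstance(description="example disease"),
    clinical_significance=ClinicalSignificance.PATHOGENIC,
)
print(assoc.clinical_significance.value)
```

Making the field optional would keep associations that carry no significance statement valid.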

Dataset support

Other parts of the API use the concept of the Dataset as a container for sets of records; here those would be FeaturePhenotypeAssociation records. The Dataset is used for access control and for limiting queries to results with the same provenance, so it would be useful here too.

ClinGen integration

On the July 9 G2P team call, the topic of comparing the ClinGen and G2P data models was raised. Initial notes are captured on the wiki.

Is this something that is applicable for our group? Please comment/vote below.

BeatAML data

MS has questions on how to navigate the data directory /mnt/lustre1/BeatAML (8.7 TB).
@rnpandya, can you itemize?

Create documentation for g2p schema

As you know, we have been trying to re-introduce the g2p schemas into the mainline of the ga4gh schema repository.
Under a change in policy, any schema change must now be accompanied by a reference server implementation and schema documentation:

"It has become apparent that the documentation for the GA4GH schemas is not sufficient. Mark Diekhans has in response been developing a documentation branch which aims to give a complete description of the rational, use cases, semantics and high-level tutorial for the APIs. However the effort needs work. "

The reference server team discussed this second requirement recently. To break out of this chicken-and-egg scenario, we (OHSU) will draft a document along these lines and would welcome participation from the larger community. As a starting point, I have some developer documentation for the code plus applicable excerpts from the schema group's GitHub discussions.

We would like to pull this into a concise document with a few high level use case examples.

Your help is invaluable

Create implementation plan for schema suggestions

Task 1:

  • Validate that similar queries work with OHSU implementation
    Supporting query by id string only:
    curl 'http://193.62.52.232:8082/ga4gh/genotypephenotype/search' -H 'Content-type:application/json' -H 'Accept:application/json' -X POST -d '{ "feature": [], "phenotype": ["http://www.ebi.ac.uk/efo/EFO_0000398"], "evidence": [], "pageSize": 10 }'
    curl 'http://193.62.52.232:8082/ga4gh/genotypephenotype/search' -H 'Content-type:application/json' -H 'Accept:application/json' -X POST -d '{ "feature": ["rs6920220", "rs886774"], "phenotype": [], "evidence": [], "pageSize": 10 }'
    curl 'http://193.62.52.232:8082/ga4gh/genotypephenotype/search' -H 'Content-type:application/json' -H 'Accept:application/json' -X POST -d '{ "feature": ["ENSG00000115286"], "phenotype": [], "evidence": [], "pageSize": 10 }'
    curl 'http://193.62.52.232:8082/ga4gh/genotypephenotype/search' -H 'Content-type:application/json' -H 'Accept:application/json' -X POST -d '{ "feature": [], "phenotype": [], "evidence": ["PMID:19684604", "PMID:19915572"], "pageSize": 10 }'
  • Pre-requisites: none.
  • Start date: ASAP
  • Assigned to: OHSU
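
The four curl requests in Task 1 share one body shape and differ only in which list is populated. As a minimal sketch, the body can be built with a small helper; the function name `build_search_request` is hypothetical, while the keys `feature`, `phenotype`, `evidence` and `pageSize` are taken from the commands above. No request is sent here.

```python
import json


def build_search_request(feature=(), phenotype=(), evidence=(), page_size=10):
    """Return the JSON body used with POST /ga4gh/genotypephenotype/search."""
    return {
        "feature": list(feature),
        "phenotype": list(phenotype),
        "evidence": list(evidence),
        "pageSize": page_size,
    }


by_phenotype = build_search_request(
    phenotype=["http://www.ebi.ac.uk/efo/EFO_0000398"]
)
by_evidence = build_search_request(evidence=["PMID:19684604", "PMID:19915572"])

# Serialize exactly as it would be passed to curl's -d option.
print(json.dumps(by_phenotype))
```

Validating the OHSU implementation then amounts to posting each of these bodies and comparing the responses.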

Task 2:

  • Pick one & implement:
    Overlap with metadata models is confusing - which should be implemented?
    • Evidence: identical
    • Phenotype / PhenotypeInstance: similar
    • Association / PhenotypeFeatureAssociation: different
  • Pre-requisites: Acceptance of G2P pull request, consensus on which to implement (follow Sarah's recommendation, let others comment on new PR)
  • Start date: ASAP after acceptance of pull request
  • Assigned to: ?

Task 3:

  • Incorporate AssociationSet
    Support for selecting an association set to query (ohsu-comp-bio/schemas#1)
  • Pre-requisites: Acceptance of G2P pull request
  • Start date: ASAP after acceptance of pull request
  • Assigned to: ? (OHSU & ?)

Task 4:

Track implementation of the minimum metadata model for BioSample and Individual

  • Pre-requisites: publish PR of BioSample metadata implementation (UCSC)
  • Start date: ASAP after acceptance of pull request
  • Assigned to: OHSU

Literome <--> G2P Analysis

As a G2P engineer, in order to create a Literome implementation, it would be useful to have a mapping defined between the request and response payloads of the '/genotypephenotype/search' endpoint and the Literome GWAS endpoints.

Schema Change Discussion

Team:

The schemas have received feedback that the semantics of SearchGenotypePhenotypeRequest are very unclear. In the section of the API documentation that applies to our schema, I've added some examples and guidance.
We propose deprecating the un-scoped string used in the query and replacing it with a scoped TermQuery. We hope to introduce this, together with a placeholder for external identifiers in Evidence and PhenotypeInstance, as part of our current pull request. Your comments are invaluable. (See the readme.)

In addition, we have been asked to consider a PhenotypeAssociation with a wider scope: it connects evidence to entities other than Feature. Here we propose a new entry point that follows the modified G2P pattern and adds phenotype/search. This allows for discovery of evidence associated with (Variant, FeatureEvent, BioSample, Individual, CallSet). Again, your comments will be useful. (See the readme.)

HL7/FHIR integration

The DWG is seeking to line up with the HL7/FHIR specification for exchanging eHealth records, and in particular with its Genomics component, which provides a means of reporting variants with an optional assessed associated medical condition. I think this component is the most relevant here, so I am sharing the link below for the G2P group's current interest and for future reference as our APIs begin to align.

http://www.hl7.org/FHIR/2015May/observation-genetics-cg-prf-1a.html

Is this something that is applicable for our group? Please comment/vote below.

Duplicated class name: SearchFeaturesRequest, SearchFeaturesResponse

It is not clear whether this is an issue for our team. If it is inapplicable, please comment.

The G2P schema contains two classes with the same names as classes in SequenceAnnotationMethods.

The first is in the file genotypephenotypemethods.avdl, protocol GenotypePhenotypeMethods:

/** This is the response from `POST /genotypephenotype/search` expressed as JSON. */
record SearchFeaturesResponse {
  /**
  The list of matching FeaturePhenotypeAssociation.
  */
  array<org.ga4gh.models.FeaturePhenotypeAssociation> associations = [];
  ...

The second is in sequenceAnnotationmethods.avdl:

  /** This is the response from `POST /features/search` expressed as JSON. */
  record SearchFeaturesResponse {
    /**
    The list of matching annotations, sorted by start position. Annotations which
    share a start position are returned in a deterministic order.
    */
    array<org.ga4gh.models.Feature> features = [];

    ... 

The Avro spec is somewhat imprecise when describing this:

Record, enums and fixed are named types. Each has a fullname that is composed of two parts; a name and a namespace. Equality of names is defined on the fullname.

// then later on ...

... In this case the namespace is taken from the most tightly enclosing schema or protocol. 

It would be simpler to avoid this and name the g2p class something like SearchEvidenceResponse.
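
To make the collision concrete, the sketch below applies the fullname rule quoted above: a name without dots is qualified by the enclosing namespace. It is a simplified illustration in Python, not Avro library code. The namespace `org.ga4gh.methods` is an assumption; if the two protocols declared different namespaces, the fullnames would differ and there would be no clash.

```python
def avro_fullname(name, enclosing_namespace):
    """Resolve a fullname: a dotted name is already a fullname,
    otherwise the enclosing namespace (if any) is prepended."""
    if "." in name:
        return name
    if enclosing_namespace:
        return f"{enclosing_namespace}.{name}"
    return name


# Assumption: both protocols declare this same namespace.
NAMESPACE = "org.ga4gh.methods"

g2p = avro_fullname("SearchFeaturesResponse", NAMESPACE)
seq = avro_fullname("SearchFeaturesResponse", NAMESPACE)
renamed = avro_fullname("SearchEvidenceResponse", NAMESPACE)

print(g2p == seq)      # same fullname, so the two records collide
print(renamed == seq)  # the suggested rename removes the collision
```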

Extend the evidence record

The evidence record currently supports only an evidence type and a text description. Are there any plans to extend this?
In other parts of the API, a metadata key-value structure is used to allow customisation in different implementations. The idea is that the API has some flexibility from the start, and commonly used data types can be promoted to named fields as required.
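
As a rough sketch of that idea, an evidence record could carry a free-form map next to its two existing fields. Everything below is hypothetical: the field names `evidence_type`, `description` and `info`, and the choice of a string-to-list-of-strings map, are assumptions for illustration rather than the schema's definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Evidence:
    evidence_type: str     # assumed name for the existing type field
    description: str = ""  # assumed name for the existing text field
    # Hypothetical free-form metadata: key -> list of string values.
    info: Dict[str, List[str]] = field(default_factory=dict)


ev = Evidence(evidence_type="publication", description="example")
# Implementation-specific data goes into the map until it earns a named field.
ev.info.setdefault("source", []).append("PMID:19684604")
print(ev.info)
```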

Ontology Queries structure

Leaving this open for commentary/additions

Ontology Queries

  • The 'ontologySource' is assumed to be equivalent to an ontology's 'prefix'. However, no agreement or mechanism exists to align the ontologySource string with a specific URI.

Recommendation: collapse ontologySource and identifier into a single field, 'ontologyURI'.

Discussion

During the call, @mellybelly mentioned that there is a YAML file that provides a mapping of commonly used sources.
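
A minimal sketch of how a prefix mapping could turn an (ontologySource, identifier) pair into a single URI is shown below. The table `PREFIX_TO_BASE` is hypothetical and stands in for the mapping file mentioned on the call, whose contents are not shown here; the EFO base is inferred from the EFO URL used in the Task 1 queries.

```python
# Hypothetical prefix table standing in for the mapping file.
PREFIX_TO_BASE = {
    "EFO": "http://www.ebi.ac.uk/efo/EFO_",
}


def to_ontology_uri(ontology_source, identifier):
    """Collapse a source prefix and a local identifier into one URI."""
    try:
        base = PREFIX_TO_BASE[ontology_source]
    except KeyError:
        raise ValueError(
            f"no URI base registered for {ontology_source!r}"
        ) from None
    return base + identifier


print(to_ontology_uri("EFO", "0000398"))
```

Raising on an unknown prefix makes the missing agreement visible instead of silently producing an unresolvable string.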

Defer implementation of query-by-example

Leaving this open for review during our next call. If there are no comments at that point, we can close it.

Request

http://yuml.me/edit/bf06b90a

Query by example

There are four data types for each entity: [string, external identifier, ontology identifier and 'entity'].
Currently the implementation handles queries of [string, external identifier and ontology identifier].

The 'entity' query is a type of query-by-example defined in [GenomicFeatureQuery, EvidenceQuery, PhenotypeQuery]. We have not implemented these. Challenges that arose:

  • schema constraints: several fields within the schemas are defined as non-null. This may be fine when creating an entity from a data store, but it is problematic when creating an entity to be used in a query.
  • additional discussion is needed to determine which properties of an existing entity will be used for the query and which will be ignored. For example, a Feature has [id, parentIds, featureSetId, referenceName, start, end, strand, featureType, attributes]; we need to specify exactly what the query's expectations are.

Recommendation: Leave the schema definitions as-is. However, leave the entity query-by-example unimplemented. Implement when demand exists with sufficient use case details.
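
For reference when the topic is revisited, the sketch below shows one possible query-by-example convention: fields left as `None` in the example are ignored, and all other fields must match exactly. This is an assumption for illustration, not an agreed design, and it shows why non-null schema fields are awkward: the example entity needs a way to say "don't care". The function name `matches_example` and the sample data are hypothetical; the field names come from the Feature list above.

```python
def matches_example(entity, example):
    """True if every non-None field of `example` equals that field of `entity`."""
    return all(
        entity.get(key) == value
        for key, value in example.items()
        if value is not None
    )


features = [
    {"id": "f1", "referenceName": "chr1", "start": 100, "end": 200, "featureType": "gene"},
    {"id": "f2", "referenceName": "chr2", "start": 100, "end": 200, "featureType": "gene"},
]

# `id` is None, so it is ignored; the other two fields must match.
example = {"id": None, "referenceName": "chr1", "featureType": "gene"}

hits = [f["id"] for f in features if matches_example(f, example)]
print(hits)
```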
