broadinstitute / exome-results-browsers

Results browsers for case-control studies of psychiatric diseases done at the Broad Institute

License: BSD 3-Clause "New" or "Revised" License

JavaScript 73.64% Dockerfile 0.51% Python 24.37% Shell 1.05% HTML 0.43%

exome-results-browsers's Introduction

Exome Results Browsers

Results browsers for case-control studies of psychiatric diseases done at the Broad Institute.

  • Schizophrenia - SCHEMA

    The Schizophrenia Exome Sequencing Meta-analysis (SCHEMA) consortium is a large multi-site collaboration dedicated to aggregating, generating, and analyzing high-throughput sequencing data of schizophrenia patients to improve our understanding of disease architecture and advance gene discovery. The first results of this study have provided genome-wide significant results associating rare variants in individual genes with risk of schizophrenia, and later releases are planned with larger numbers of samples that will further increase power.

  • Epilepsy - Epi25

    The Epi25 collaborative is a global collaboration committed to aggregating, sequencing, and deep-phenotyping up to 25,000 epilepsy patients to advance epilepsy genetics research. The Epi25 whole-exome sequencing (WES) case-control study is one of the collaborative's ongoing endeavors; it aims to characterize the contribution of rare genetic variation to a spectrum of epilepsy syndromes and to identify individual risk genes.

  • Autism - ASC

    Founded in 2010, the Autism Sequencing Consortium (ASC) is an international group of scientists who share autism spectrum disorder (ASD) samples and genetic data. This portal displays variant and gene-level data from the most recent ASC exome sequencing analysis.

  • Bipolar Disorder - BipEx

    The Bipolar Exome (BipEx) sequencing project is a collaboration between multiple institutions across the globe, which aims to increase our understanding of the disease architecture of bipolar disorder.

exome-results-browsers's People

Contributors

dependabot[bot], knguyen142, mattsolo1, nawatts, rileyhgrant, sjahl, tarjindersingh

exome-results-browsers's Issues

Repurposing exome-results-browser

Hello @nawatts

I am working on a non-human genomics project where I would like to display variants in much the same way as the exome-results-browser currently does. Basically, for every gene in a genome, I would like to display the variant counts/frequencies in two tracks, one for cases and one for controls, and possibly the result of the χ² test.

I previously tried using the gnomad-browser directly for my project and managed, to some extent, to get a prototype working (the VariantTable, with columns for counts in cases/controls) before realizing that my use case is handled much better by exome-results-browser. For the record, when using gnomad-browser I replaced the Elasticsearch API with an API that queries an SQL database directly, and I also removed the caching part.

For exome-results-browser, if I understood correctly, the API serves data stored directly in JSON files. You process your different datasets with Hail using the scripts in data_pipeline, and the resulting JSON files are then served. You have some project-specific implementations of the browser, but I think I can ignore those for now, as I would like to concentrate my efforts on the per-gene page, with the Case and Control VariantTracks and the VariantTable.

I think a number of things need to be done:

  1. Replacing the gene models. My genome is really simple (prokaryotic, so no alternative transcripts, no introns, only single-exon genes), so I imagine it should not be too hard to feed this in instead of a complicated human genome with transcript tracks and so on.
  2. Generating the JSON for each gene in the genome. I think you go through Hail as an intermediary for this. I would like to try generating the JSON directly from my SQL database. As you are using GraphQL for the queries, my feeling is that I would have to modify the queries to fit my simpler gene models. For the variant query, I will start with a simpler model, using the variantId, HGVS nomenclature, and counts.
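Step 2 could be pictured as a small function that turns rows from the SQL database into one JSON document per gene. This is only a sketch under stated assumptions: the function name `buildGeneDocument`, every row field (`gene_id`, `start`, `stop`, `ac_case`, `an_case`, and so on), and the output shape are hypothetical and are not the format that exome-results-browsers actually expects.

```javascript
// Hypothetical sketch: build one JSON-serializable document per gene from
// SQL rows. Field names are illustrative assumptions, not the real schema.
const buildGeneDocument = (geneRow, variantRows) => ({
  gene_id: geneRow.gene_id,
  symbol: geneRow.symbol,
  chrom: geneRow.chrom,
  strand: geneRow.strand,
  // A prokaryotic gene has no introns, so the model is one exon spanning the gene.
  exons: [{ feature_type: "CDS", start: geneRow.start, stop: geneRow.stop }],
  variants: variantRows
    .filter((row) => row.gene_id === geneRow.gene_id)
    .map((row) => ({
      variant_id: row.variant_id,
      hgvs: row.hgvs,
      ac_case: row.ac_case,
      an_case: row.an_case,
      ac_ctrl: row.ac_ctrl,
      an_ctrl: row.an_ctrl,
      // Allele frequency per group; null when no alleles were called.
      af_case: row.an_case > 0 ? row.ac_case / row.an_case : null,
      af_ctrl: row.an_ctrl > 0 ? row.ac_ctrl / row.an_ctrl : null,
    })),
})
```

Calling `JSON.stringify(buildGeneDocument(geneRow, variantRows))` for each gene would then give one file per gene that a static API could serve.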

Does my plan make sense to you? Do you have any recommendations or previous experience in repurposing gnomad/exome-results-browser for other organisms?

As a side note, did you recently move the common gnomad browser components out of each repo? They are all now in the gnomad-browser-toolkit, is that right?

Thank you very much for open sourcing this suite of tools, and for your help.

LoF label changed to PTV

Not sure how to best systematically do this just for the SCHEMA browser, but in the gene page, we have LoF for:

  1. The variant selection button
  2. Constraint definition

For some reason, we have decided to use PTV in the SCHEMA project. What is the easiest way to make all of these labels consistently PTV?

Add pipeline to prepare all datasets

Currently, prepare_dataset has to be run individually on each dataset. There should be a pipeline to prepare all datasets based on the list of datasets in pipeline_config.ini.

Constraint table labels

Genic constraint metrics: metrics quantifying intolerance to protein-truncating variation, as calculated by the gnomAD consortium. For more information, please visit the gnomAD browser. Please note that insertions and deletions are excluded from the aggregated counts and calculated metrics.

[hover]o/e ratio: ratio of the observed / expected (oe) number of loss-of-function variants in that gene. The expected counts are based on a mutational model that takes sequence context, coverage and methylation into account.
[hover]Exp. SNVs: expected number of loss-of-function variants
[hover]Obs. SNVs: observed number of loss-of-function variants
[hover]pLI: probability of being loss-of-function intolerant (pLI). A score closer to 1 indicates more intolerance to protein-truncating variation. For a set of transcripts intolerant of protein-truncating variation, we suggest pLI ≥ 0.9.

//TODO Change LoF to PTV.

Split up other studies component

Currently, the OtherStudies component is the only thing in the "base" directory that contains browser-specific information. It should be split up into the individual browser directories.

Inconsistent gene symbols/names

The gene symbols/names shown on the all gene results page and those shown on the individual gene pages are sometimes inconsistent. The ones shown on the all gene results page come from the gene results table. The ones shown on the gene pages come from the gene models based on Gencode/HGNC data.

Remove the gene symbol/name requirement from the data format (leave only Ensembl gene ID) and update data preparation steps to annotate gene symbol/name from the gene models.
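The annotation step amounts to a lookup keyed on Ensembl gene ID. The sketch below is written in JavaScript for illustration only (the actual data preparation runs in the Hail pipeline), and the field names `gene_id`, `symbol`, `name`, `gene_symbol`, and `gene_name` are assumptions.

```javascript
// Hypothetical sketch: take symbol/name from the gene models so that every
// page shows the same values. Any symbol/name in the results is overwritten.
const annotateGeneResults = (geneResults, geneModels) => {
  const modelsById = new Map(geneModels.map((gene) => [gene.gene_id, gene]))
  return geneResults.map((result) => {
    const model = modelsById.get(result.gene_id)
    return {
      ...result,
      // Results for genes missing from the gene models get null labels.
      gene_symbol: model ? model.symbol : null,
      gene_name: model ? model.name : null,
    }
  })
}
```

With this approach the gene models become the single source of gene symbols and names, so the two pages can no longer disagree.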

QQ plot y-axis relabel

Can "actual -log10(p)" be changed to "Observed -log10(p)"? How easy is it to change the 10 to a subscript? Thanks!
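If the axis label is rendered as plain text, one low-effort option is to use Unicode subscript digits. The helper names below are hypothetical, and whether this suits the plotting code used in the browser is an assumption.

```javascript
// Unicode subscript digits 0-9 (U+2080 to U+2089).
const SUBSCRIPT_DIGITS = "₀₁₂₃₄₅₆₇₈₉"

// Replace each ASCII digit of n with its subscript counterpart.
const toSubscript = (n) => String(n).replace(/[0-9]/g, (digit) => SUBSCRIPT_DIGITS[Number(digit)])

// Build an axis label such as "Observed -log₁₀(p)".
const qqAxisLabel = (prefix) => `${prefix} -log${toSubscript(10)}(p)`
```

The same helper would cover both axes, for example `qqAxisLabel("Observed")` and `qqAxisLabel("Expected")`.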

Move CSV export server side

Instead of generating CSVs in the client, add routes that respond with gene/variant results in CSV format. These could take an analysis group as a query parameter.

This would remove the need to configure renderForCSV per browser. Field types in dataset metadata could be used to set reasonable defaults (#6).
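The serialization half of such a route could look roughly like the sketch below. The names `escapeCsvValue` and `toCsv` and the `{ key, heading }` column shape are hypothetical; the route itself, and how the analysis group query parameter is read, are left out because they depend on the API's server framework.

```javascript
// Quote a value only when it contains a quote, comma, or line break,
// doubling any embedded quotes (the usual CSV convention).
const escapeCsvValue = (value) => {
  if (value === null || value === undefined) {
    return ""
  }
  const text = String(value)
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text
}

// columns: [{ key, heading }], rows: array of plain objects.
const toCsv = (columns, rows) => {
  const header = columns.map((column) => escapeCsvValue(column.heading)).join(",")
  const lines = rows.map((row) => columns.map((column) => escapeCsvValue(row[column.key])).join(","))
  return `${[header, ...lines].join("\r\n")}\r\n`
}
```

A route handler would send the returned string with a `text/csv` content type, and the column list could be derived from the field types in the dataset metadata.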

Only cache results of successful gene result queries

Queries for all gene results are cached in the API. However, there is no check that the query succeeds. Thus, an initial failed query is never retried.

const geneResultsCache = new Map()

export const fetchAllGeneResultsForAnalysisGroup = (ctx, analysisGroup) => {
  if (geneResultsCache.has(analysisGroup)) {
    return geneResultsCache.get(analysisGroup)
  }

  const request = fetchAllSearchResults(ctx.database.elastic, {
    index: browserConfig.elasticsearch.geneResults.index,
    type: browserConfig.elasticsearch.geneResults.type,
    size: 10000,
    body: {
      query: {
        bool: {
          filter: {
            term: { analysis_group: analysisGroup },
          },
        },
      },
    },
  }).then(hits => hits.map(hit => shapeGeneResult(hit._source))) // eslint-disable-line no-underscore-dangle

  geneResultsCache.set(analysisGroup, request)
  return request
}
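One possible fix is to keep caching the in-flight promise, so that concurrent callers still share a single query, but evict it from the cache when it rejects. The wrapper below is a generic sketch; `cacheSuccessfulRequests` is a hypothetical name and is not part of the existing API code.

```javascript
// Wrap an async fetch function so that only successful results stay cached.
const cacheSuccessfulRequests = (fetchFn) => {
  const cache = new Map()
  return (key) => {
    if (cache.has(key)) {
      return cache.get(key)
    }
    const request = fetchFn(key).catch((error) => {
      // Evict the failed request so the next call retries the query
      // instead of replaying the cached rejection.
      cache.delete(key)
      throw error
    })
    cache.set(key, request)
    return request
  }
}
```

In the function above, the Elasticsearch query would be passed in as `fetchFn` with the analysis group as the key; callers still see the original error, and the next request for the same group triggers a fresh query.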

Render error on /gene/ENSG00000092108

Loaded gene page in SCHEMA for SCFD1. Page appeared to load fine.

However, starting to type "13" into the "Search variant table" field resulted in an error page. This was reproducible in both Safari and Chrome on a MacBook Pro running High Sierra (macOS 10.13.6).

I tried it with other genes and got the same crash.

Allow sorting group results in variant details

Currently, the group results table in the variant details modal is sorted so that the default group is first and the others are unordered. The group results table should allow choosing a column to sort on, in the same way as the gene and variant results tables.
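A comparator for this could be sketched as follows. The function name `sortGroupResults` and the row fields are hypothetical, and placing missing values last in both directions is an assumption about the desired behavior.

```javascript
// Return a sorted copy of the group results; the input array is not mutated.
const sortGroupResults = (results, sortKey, ascending = true) => {
  const direction = ascending ? 1 : -1
  const isMissing = (value) => value === null || value === undefined
  return [...results].sort((a, b) => {
    const aValue = a[sortKey]
    const bValue = b[sortKey]
    // Missing values sort last regardless of direction.
    if (isMissing(aValue) && isMissing(bValue)) return 0
    if (isMissing(aValue)) return 1
    if (isMissing(bValue)) return -1
    if (aValue < bValue) return -direction
    if (aValue > bValue) return direction
    return 0
  })
}
```

The table component would keep `sortKey` and `ascending` in its state and re-sort whenever a column header is clicked.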

Updated variant annotation table for SCHEMA

gs://schizophrenia/browser-5/2020-09-10_schema-browser-variant-annotation-table.ht

I've added a column called canonical_term which is the string label for the consequence. However, I kept canonical_csq such that you know which variants are lof, mis, etc.

It would be good to have missense variants still separated into MPC 2 - 3 and MPC > 3. I think you do that as part of your processing pipeline?
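The requested split could be expressed as a small binning function. This is an illustration only: the real processing happens in the Hail pipeline, the function name and output labels are hypothetical, and treating an MPC of exactly 3 as part of the lower bin is an assumption.

```javascript
// Hypothetical sketch: split missense ("mis") variants by MPC score.
// Assumed bins: 2 <= MPC <= 3 is the lower bin, MPC > 3 is the upper bin.
const missenseCategory = (consequence, mpc) => {
  if (consequence !== "mis") {
    return consequence
  }
  if (typeof mpc !== "number" || Number.isNaN(mpc)) {
    return "mis (other)"
  }
  if (mpc > 3) {
    return "mis (MPC > 3)"
  }
  if (mpc >= 2) {
    return "mis (MPC 2-3)"
  }
  return "mis (other)"
}
```

Non-missense categories such as lof pass through unchanged, so the function can be applied to every variant.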

Let me know if this makes sense.

Thanks!

Updated variant files for SCHEMA

I found a few things that required fixing in the listed variants in the browser. The format of the data has not changed.

Thanks!

gs://schizophrenia/browser-5/schema-browser-variant-annotation-table.ht
gs://schizophrenia/browser-5/schema-browser-variant-results-table-meta-rare-denovos-common-merged.ht

Document sources for reference data

Document where/how to obtain reference data files.

[reference_data]
grch37_gencode_path = gs://exome-results-browsers/reference/gencode.v19.gtf.bgz
grch38_gencode_path = gs://exome-results-browsers/reference/gencode.v29.gtf.bgz
grch37_canonical_transcripts_path = gs://exome-results-browsers/reference/gnomad_2.1.1_vep85_canonical_transcripts.tsv.bgz
grch38_canonical_transcripts_path = gs://exome-results-browsers/reference/gnomad_3.0_vep95_canonical_transcripts.tsv.bgz
hgnc_path = gs://exome-results-browsers/reference/hgnc.tsv

Precompress static files

Currently, the compression package is used to serve data gzip-encoded. Since most responses send static files, those files can be compressed once at build time instead of on every request.
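A build step for this could be sketched with Node's built-in zlib module. The function name `precompressDirectory` and the default extension list are hypothetical, and the sketch only writes `.gz` files next to the originals.

```javascript
const fs = require("fs")
const path = require("path")
const zlib = require("zlib")

// Recursively gzip matching files, writing "<file>.gz" beside each original.
// Returns the list of compressed files that were written.
const precompressDirectory = (directory, extensions = [".json", ".js", ".html", ".css"]) => {
  const written = []
  for (const entry of fs.readdirSync(directory, { withFileTypes: true })) {
    const fullPath = path.join(directory, entry.name)
    if (entry.isDirectory()) {
      written.push(...precompressDirectory(fullPath, extensions))
    } else if (extensions.includes(path.extname(entry.name))) {
      const compressed = zlib.gzipSync(fs.readFileSync(fullPath), {
        level: zlib.constants.Z_BEST_COMPRESSION,
      })
      fs.writeFileSync(`${fullPath}.gz`, compressed)
      written.push(`${fullPath}.gz`)
    }
  }
  return written
}
```

The server would then need to send the `.gz` variant with a `Content-Encoding: gzip` header when the client accepts gzip.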

Refactor data pipeline output

Currently, all outputs of the data pipeline are written to the output.staging_path specified in pipeline_config.ini.

[output]
# Path for intermediate Hail files.
staging_path = gs://exome-results-browsers/data/200911

Thus, preserving older versions of the combined Hail table requires changing the staging path setting every time data is updated. This in turn leads to requiring multiple copies of gene models and individual dataset files.

Instead, gene models could be output separately, individual dataset Hail tables written to staging path, and combined Hail tables written to timestamped paths. This way, updating one dataset would require running prepare_dataset only on that one dataset and then generating a new combined Hail table.
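The proposed layout can be illustrated with a path-building helper. Everything here is hypothetical: the function name, the config keys (`geneModelsPath`, `stagingPath`, `combinedPath`), the file names, and the date format are assumptions chosen for the sketch, and the real pipeline is written in Python rather than JavaScript.

```javascript
// Hypothetical sketch of the proposed output layout.
const buildOutputPaths = (config, now = new Date()) => {
  const stripTrailingSlash = (value) => value.replace(/\/+$/, "")
  // UTC date in YYYY-MM-DD form, used to version the combined table.
  const stamp = now.toISOString().slice(0, 10)
  return {
    // Gene models are written once, independent of any dataset.
    geneModels: `${stripTrailingSlash(config.geneModelsPath)}/gene_models.ht`,
    // Each dataset overwrites only its own table in the staging path.
    datasetTable: (datasetId) => `${stripTrailingSlash(config.stagingPath)}/${datasetId}/results.ht`,
    // Combined tables go to timestamped paths, so older versions are preserved.
    combinedTable: `${stripTrailingSlash(config.combinedPath)}/${stamp}/combined.ht`,
  }
}
```

With a layout like this, updating one dataset touches only that dataset's staging table and adds one new timestamped combined table.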
