Giter Site home page Giter Site logo

brentp / seqcover Goto Github PK

View Code? Open in Web Editor NEW
50.0 7.0 4.0 9.67 MB

seqcover allows users to view coverage for hundreds of genes and dozens of samples

License: MIT License

Nim 48.79% HTML 6.24% JavaScript 44.97%
hacktoberfest genomics genomics-visualization

seqcover's Introduction

Build Status

seqcover is a tool for viewing and evaluating depth-of-coverage with the following aims. It should:

  • show a global view where it's easy to see problematic samples and genes
  • offer an interactive gene-wise view to explore coverage characteristics of individual samples within each gene
  • not require a server (single html page)
  • be responsive for up to 20 samples * 200 genes and be useful for a single-sample see how we do this
  • highlight outlier samples based on any number of (summarized) background samples

It is available as a static linux binary.

Example Output

https://brentp.github.io/seqcover/

Usage

seqcover can accept per-base coverage files in d4 or bgzipped bedgaph format. Either of these formats can be output by mosdepth but d4 format will be much faster.

Generate a report:

# generate per base depth files if you don't have them. can also parallize this...
mkdir -p samples/
for b in *.bam; do
  n=$(basename $b .bam)
  mosdepth -x -t 4 samples/$n $b
done
seqcover report --genes PIGA,KCNQ2,ARX,DNM1,SLC25A22,CDKL5,GABRA1,CAD,MDH2,SCN1B,CNPY3,CPLX1,NEB,HNRNPA1,CCDC39,AIFM1,CHCHD10 \
		 --background seqcover/seqcover_p5.d4 \
		 --fasta $fasta samples/*.bed.gz \
		 -r my_genes_report.html

Generate a background level:

seqcover generate-background --percentile 5 -f $fasta -o seqcover/ d4s/HG00*.d4

Once generated, this can be sent to seqcover report to give a metric for each sample of the number of bases below the 5th percentile of the backgrounds, which is a nice quality-control value.

These backgounds should be specific to the samples of interest, so if you are using exome data the backgrounds should be generated from samples from the same exome capture kit (and prefereably from the same sequencing center).

Generate a transcript file:

# generate a transcripts file that can be used as input to the report option with the --transcripts-file option
# this is useful if running on servers with restricted internet access

seqcover save-transcripts --genes PIGA,KCNQ2,ARX,DNM1,SLC25A22,CDKL5,GABRA1,CAD,MDH2,SCN1B,CNPY3,CPLX1,NEB,HNRNPA1,CCDC39,AIFM1,CHCHD10 \
		 --output-path transcripts.json \
		 --hg19

# use the file as input to report with the --transcripts-file option
seqcover report --genes PIGA,KCNQ2,ARX,DNM1,SLC25A22,CDKL5,GABRA1,CAD,MDH2,SCN1B,CNPY3,CPLX1,NEB,HNRNPA1,CCDC39,AIFM1,CHCHD10 \
		 --background seqcover/seqcover_p5.d4 \
		 --fasta $fasta samples/*.bed.gz \
		 -r my_genes_report.html \
		 --transcripts-file transcripts.json

How It Works

Performance

seqcover is a command-line tool that extracts depth information for requested genes and generates a terse report. This is possible because we excise introns which are often the majority of bases in the gene. The user can specify to extend into the intron beyond the default of 10 bases. As that extension increases, seqcover will show more and more of the intronic space. But it's possible to display 20 samples * 100 genes in html because we remove introns and use webGL (via plotly) for rendering.

Outliers

An outlier is data-dependent. Every exome will appear as an outlier given a set of genomes and every 30X genome will appear as an outlier given a set of 60X genomes. Therefore, seqcover let's the user extract a depth percentile from a set of chosen backgrounds. For example, the default is to extract the 5th percentile from a set of samples. This percentile can then be shown in the report and used as a metric: how many bases in each sample are below the 5th percentile.

seqcover's People

Contributors

brentp avatar brwnj avatar mikecormier avatar oyvindbusk avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

seqcover's Issues

basic gene-plotting to-do.

Brent

  • fix coordinate translation for more edge-cases
  • fix all off-by-one errors
  • adjust to work with bed.gz (as well as .d4)

Mike

  • make transcript plot show only union plot by default and expand to all transcript on click
  • Show CDS Exon only by default. Toggle on UTRs and Non-CDS Exons
  • Sort Gene/Region Selection list
  • show gene.description somewhere.
  • customData should use objects instead of lists so that instead of customdata[5], it uses customdata.cdsend
  • update merged transcript plot

Joe

  • sort gene list by name in drop-down (taken from Mike's list)
  • make initial page design with nav, anchors, plot order, aesthetics
  • add sample colors to table in order to remove plot legends
  • add table hover events and tie them into Depth Dist per Sample plot
  • fix plotly's double-click event in region plot
  • see if we can manipulate region plot to improve hover events

AssertionError with some genes

Nice tool! I have noticed some genes (e.g. HNRNPA1) gives an assertionerror, and I have not been able to figure out why.

code used:

./seqcover report --genes HNRNPA1 \
--fasta $fasta samples/*_recal.per-base.d4 \
--background seqcover_out/seqcover_p5.d4 \
-r my_genes_report.html \
--hg19

# Gives:
fatal.nim(49)            sysFatal
Error: unhandled exception: transcript.nim(143, 14) `r_off - l_off == o_exon[1] - o_exon[0]`  [AssertionError]

use hash for linkable views

location.hash should update to, e.g. #gene=KNCQ2&selection={chrom}:{start}-{stop}&depth_cutoff=7

where selection is the hovered/selected region and depth_cutoff is a (to be added) value from the user selected depth cutoff.

Error: --hg19 coordinates

Hi,

Not sure I got this right, but... I am using seqcover with the --hg19-flag. Suddenly I get an error with specific genes.

CTDP1 is an example.

./seqcover report --genes CTDP1 --hg19 --fasta ${fasta} -r my_genes_report.html ${d4}
reports the following error:
Error: unhandled exception: d4:error seeking to position: 18:79679792 [ValueError]

I think the problem is that CTDP1 is on the fringe of exon 18, and when I get the coordinates in GRCh38 (for some reason), it has some parts that are outside the hg19-genome.

When I make seqcover print the gene.transcripts object from the get_genes proc, they are identical regardless of the --hg19-flag.

This is more likely something to do with the response from mygene.info, but I wanted to report it here in case somebody else is seeing the same error, or if it is something I am doing wrong.

ENH: add toggle button for line graph to set y-range to 95 pctile

currently, a single sample with high coverage makes it impossible to see variation in other samples.

a user can zoom on the y-axis, but we can add a checkbox/toggle that finds the 95th (or 98th) percentile of the data and automatically sets the zoom to that height.

Increase visibility of selected sample trace

This is more apparent when you increase sample size:

Screen Shot 2020-10-26 at 2 49 29 PM

Reducing to 10% opacity helps while still preserving an aspect of background:

Screen Shot 2020-10-26 at 3 01 05 PM

function highlight_sample(sample) {
    setHashParams({'sample': sample})
    let d = document.getElementById("gene_plot")
    let vals = d.data.map((t, i) => {
        if (t.tracktype == "background") {
            return [1, 1.5]
        }
        if (t.tracktype != 'sample') {
            return [undefined, undefined]
        } else {
            if (t.name == sample) {
                return [1, 2]
            } else {
                return [0.10, 0.36]
            }
        }
    })

    Plotly.restyle(d, {'opacity': vals.map(i => i[0]), 'line.width': vals.map(i => i[1]), 'hovertemplate': vals.map(i => i[1] > 0.8 ? HOVER_TEMPLATE: null)})
}

Open to community suggestions for fixing cases like NEB.

CDS bases below 7

When a user changes the metric for the heatmap to CDS bases below 7 the color scale essentially inverts. Blue becomes well covered and yellow low covered. I think we should consider either flipping the colors such that blue remains the color of interest within the plot or at a minimum add a header in the dropdown that informs the user of this swap.

Genes with no exons

Some genes return no exons from the query, this gives the following error message.

tables.nim(262)          []
Error: unhandled exception: key not found: exons_hg19 [KeyError]

An example is CCDC39 ("_id": "ENSG00000145075") where the query returns the following results:

# http://mygene.info/v3/gene/ENSG00000145075?fields=name,symbol,exons
{"_id": "ENSG00000145075", "_version": 1, "name": "coiled-coil domain containing 39", "symbol": "CCDC39"}

summary table

has columns:

sample | transcript mean | (all) CDS mean | selection mean | transcript bases < background lower | selection bases < background lower | CDS bases < background lower  | transcript bases < cutoff | CDS bases < cutoff | selection bases < cutoff. 

these values are available from the transcript.stats() function.

where the selection columns are empty when there is no selection.

Error: unhandled exception: Connection refused [OSError]

Hi Brent,

I am trying to run below command

seqcover report --genes LEPR,MC4R --fasta /gpfs/data_jrnas1/ref_data/Homo_sapiens/hs37d5/Sequences/WholeGenomeSequence/hs37d5.fa temp/*.bed.gz -r my_genes_report.html --hg19

I don't have internet access from my nodes. I think I am getting error due to that.
Do you have any suggestion about how to handle this issue?

Invalid integer exception

Hi everyone,

I've been testing seqcover tool with one sample coverage file processed by mosdepth in *.bed.gz format. This sample is from a WES experiment. I run the next command and it gave me an error:

${seqcover} report --genes BRCA1 --fasta ${fasta} ${input}/sample.bed.gz -r ${outfile} --hg19

[seqcover] read 1 sample coverage files
17 41195811 41277881
strutils.nim(1087) parseInt
Error: unhandled exception: invalid integer: CEX-chr17-41196311-41197870 [ValueError]

It looks for BRCA1 coordinates on my BED file but shows this exception error. I don't know if there is a problem with the name of regions. Ask for any other information you need.

I hope you could help me, thanks in advance.

feature: MultiQC plugin

Hi Brent,

Not sure if there's time and/or funding for this, but what would you think about converting this to /adding a MultiQC plugin for this project? (Talking about a real stand-alone plugin, not a module)

Would be nice to be able to include this functionality next to coverage metrics from other sources in a single report.

Thanks
M

Feature suggestion - offline mode

It would be nice to be able to run this on servers with restricted internet access. Would it be possible to have a solution where you could generate a file with transcript info that can be reused in these cases?

# Something like this to generate the file:
seqcover generate-db --genes PIGA,KCNQ2,ARX,DNM1,SLC25A22,CDKL5,GABRA1 \
		--hg19 \
		-o offline_db.json

# Then something like this to use it
seqcover report --genes PIGA,KCNQ2,ARX,DNM1,SLC25A22,CDKL5,GABRA1,CAD,MDH2,SCN1B,CNPY3,CPLX1,NEB,HNRNPA1,CCDC39,AIFM1,CHCHD10 \
		--background seqcover/seqcover_p5.d4 \
		--fasta $fasta samples/*.bed.gz \
		-r my_genes_report.html \
		--offline-db offline_db.json

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.