brentp / seqcover
seqcover allows users to view coverage for hundreds of genes and dozens of samples
License: MIT License
location.hash should update to, e.g., #gene=KCNQ2&selection={chrom}:{start}-{stop}&depth_cutoff=7,
where selection is the hovered/selected region and depth_cutoff is a (to be added) value taken from the user-selected depth cutoff.
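A minimal sketch of writing and reading these parameters, assuming the hash is a plain key=value list. buildHash and parseHash are hypothetical helpers for illustration, not seqcover's existing setHashParams:

```javascript
// Hypothetical helpers (not seqcover's setHashParams): serialize a params
// object into a location.hash-style string and parse it back.
function buildHash(params) {
  const search = new URLSearchParams();
  for (const [key, value] of Object.entries(params)) {
    // Skip unset values, so "no selection" means no selection= key at all.
    if (value !== undefined && value !== null) {
      search.set(key, String(value));
    }
  }
  return "#" + search.toString();
}

function parseHash(hash) {
  // URLSearchParams only strips a leading "?", so drop the "#" ourselves.
  const search = new URLSearchParams(hash.replace(/^#/, ""));
  return Object.fromEntries(search.entries());
}
```

In the browser this would be used as location.hash = buildHash({gene: "KCNQ2", selection: "20:1000-2000", depth_cutoff: 7}) (illustrative values). URLSearchParams percent-encodes the colon in the selection as %3A and parseHash decodes it again, so the round trip is lossless; note that all values come back as strings.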
Hi,
Not sure I got this right, but... I am using seqcover with the --hg19 flag, and suddenly I get an error with specific genes.
CTDP1 is an example.
```
./seqcover report --genes CTDP1 --hg19 --fasta ${fasta} -r my_genes_report.html ${d4}
```
reports the following error:
```
Error: unhandled exception: d4:error seeking to position: 18:79679792 [ValueError]
```
I think the problem is that CTDP1 sits near the end of chromosome 18, and the coordinates I get are (for some reason) GRCh38, so parts of the gene fall outside the hg19 genome.
When I make seqcover print the gene.transcripts object from the get_genes proc, it is identical regardless of the --hg19 flag.
This is more likely something to do with the response from mygene.info, but I wanted to report it here in case somebody else sees the same error, or in case it is something I am doing wrong.
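The error position supports the chromosome-boundary explanation: hg19's chromosome 18 is 78,077,248 bp long (GRCh38's is 80,373,285 bp), so 18:79679792 only exists in GRCh38 coordinates. A hypothetical guard, not part of seqcover, could check each region against the reference lengths from the FASTA .fai index before seeking into the d4 file, and fail with a message that points at a build mismatch:

```javascript
// Hypothetical helpers, not part of seqcover.
// parseFai: read sequence lengths from a samtools FASTA index (.fai), whose
// first two tab-separated columns are the sequence name and its length.
function parseFai(text) {
  const lengths = {};
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    const [name, length] = line.split("\t");
    lengths[name] = Number(length);
  }
  return lengths;
}

// checkRegion: return an error message if chrom:start-stop (0-based,
// half-open) does not fit on the reference, or null if it does.
function checkRegion(chrom, start, stop, chromLengths) {
  if (!Object.prototype.hasOwnProperty.call(chromLengths, chrom)) {
    return `${chrom} is not in the reference`;
  }
  const length = chromLengths[chrom];
  if (start < 0 || start >= stop || stop > length) {
    return `${chrom}:${start}-${stop} is outside ${chrom} (length ${length}); ` +
      "do the gene coordinates match the reference build?";
  }
  return null;
}
```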
Some genes return no exons from the query, which gives the following error message:
```
tables.nim(262) []
Error: unhandled exception: key not found: exons_hg19 [KeyError]
```
An example is CCDC39 ("_id": "ENSG00000145075"), where the query returns the following result:
```
# http://mygene.info/v3/gene/ENSG00000145075?fields=name,symbol,exons
{"_id": "ENSG00000145075", "_version": 1, "name": "coiled-coil domain containing 39", "symbol": "CCDC39"}
```
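A defensive lookup avoids the crash. This sketch assumes, as the KeyError suggests, that hg19 exons live under exons_hg19 and the default build's under exons; getExons is a hypothetical helper that warns and returns an empty list when the field is missing:

```javascript
// Hypothetical helper: pick the exon field for the requested build from a
// mygene.info gene document, without assuming the field exists.
function getExons(doc, hg19) {
  const key = hg19 ? "exons_hg19" : "exons";
  if (!Object.prototype.hasOwnProperty.call(doc, key)) {
    // e.g. CCDC39 (ENSG00000145075) currently has neither field.
    console.warn(`[seqcover] no ${key} for ${doc.symbol ?? doc._id}; skipping`);
    return [];
  }
  return doc[key];
}
```

Skipping with a warning lets the rest of the report build; whether an exon-less gene should instead be a hard error is a design choice.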
currently, a single sample with high coverage makes it impossible to see variation in other samples.
a user can zoom on the y-axis, but we can add a checkbox/toggle that finds the 95th (or 98th) percentile of the data and automatically sets the zoom to that height.
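A sketch of the proposed toggle, assuming sample traces carry tracktype "sample" as in the highlight_sample code on this page. It takes the nearest-rank percentile over all sample depths; percentile and autoZoomRange are hypothetical names:

```javascript
// Nearest-rank percentile: the smallest value with at least p% of the
// finite values at or below it. Non-finite values (null gaps) are ignored.
function percentile(values, p) {
  const sorted = values.filter(Number.isFinite).sort((a, b) => a - b);
  if (sorted.length === 0) return NaN;
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// y-axis range [0, p-th percentile] over sample traces only, so the
// background band does not affect the zoom.
function autoZoomRange(traces, p = 95) {
  const ys = traces
    .filter(t => t.tracktype === "sample")
    .flatMap(t => t.y ?? []);
  return [0, percentile(ys, p)];
}
```

The result could be applied with Plotly.relayout(d, {"yaxis.range": autoZoomRange(d.data, 95)}), where d is the plot div. Nearest-rank always returns an observed depth; an interpolated percentile would serve equally well here.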
we need a way to quickly see, within the plot, which samples have aberrant coverage.
Hi Brent,
I am trying to run the command below:
```
seqcover report --genes LEPR,MC4R --fasta /gpfs/data_jrnas1/ref_data/Homo_sapiens/hs37d5/Sequences/WholeGenomeSequence/hs37d5.fa temp/*.bed.gz -r my_genes_report.html --hg19
```
I don't have internet access from my nodes, and I think that is why I am getting an error.
Do you have any suggestions for how to handle this issue?
Hi Brent,
Not sure if there's time and/or funding for this, but what would you think about converting this to, or adding, a MultiQC plugin for this project? (I mean a real stand-alone plugin, not a module.)
It would be nice to be able to include this functionality next to coverage metrics from other sources in a single report.
Thanks
M
This is more apparent when you increase the sample size:
Reducing the other samples to 10% opacity helps, while still keeping them visible in the background:
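```javascript
// Highlight one sample: emphasize its line and fade the other sample traces.
// setHashParams and HOVER_TEMPLATE are defined elsewhere in the report.
function highlight_sample(sample) {
  setHashParams({ sample: sample })
  const d = document.getElementById("gene_plot")
  // For each trace, pick [opacity, line width].
  const vals = d.data.map((t) => {
    if (t.tracktype === "background") {
      return [1, 1.5]
    }
    if (t.tracktype !== "sample") {
      // Traces that are neither samples nor background.
      return [undefined, undefined]
    }
    if (t.name === sample) {
      return [1, 2] // highlighted sample: opaque, thicker line
    }
    return [0.10, 0.36] // other samples: 10% opacity, thin line
  })
  Plotly.restyle(d, {
    'opacity': vals.map(v => v[0]),
    'line.width': vals.map(v => v[1]),
    // Keep the hover template only on traces wider than 0.8; null elsewhere.
    'hovertemplate': vals.map(v => v[1] > 0.8 ? HOVER_TEMPLATE : null),
  })
}
```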
Open to community suggestions for fixing cases like NEB.
with large cohorts or a large number of genes, serializing to JSON becomes the slowest part of the command-line tool.
switch to jason instead of the stdlib json module.
The table has columns:

| sample | transcript mean | (all) CDS mean | selection mean | transcript bases < background lower | selection bases < background lower | CDS bases < background lower | transcript bases < cutoff | CDS bases < cutoff | selection bases < cutoff |
|---|---|---|---|---|---|---|---|---|---|

These values are available from the transcript.stats() function. The selection columns are empty when there is no selection.
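A sketch of how one row could be computed, assuming per-base depth and background-lower arrays over the same positions. regionStats and summaryRow are hypothetical JavaScript stand-ins; seqcover computes these values in Nim via transcript.stats(), and its exact definitions may differ:

```javascript
// Mean depth plus counts of bases below the background lower bound and
// below a fixed depth cutoff. depths[i] and lower[i] refer to the same base.
function regionStats(depths, lower, cutoff) {
  let sum = 0;
  let belowBackground = 0;
  let belowCutoff = 0;
  for (let i = 0; i < depths.length; i++) {
    sum += depths[i];
    if (depths[i] < lower[i]) belowBackground++;
    if (depths[i] < cutoff) belowCutoff++;
  }
  return {
    mean: depths.length ? sum / depths.length : NaN,
    basesBelowBackground: belowBackground,
    basesBelowCutoff: belowCutoff,
  };
}

// One table row in the column order listed above. sel is null when there
// is no selection, in which case the selection cells are empty strings.
function summaryRow(sample, t, cds, sel) {
  const s = sel ?? { mean: "", basesBelowBackground: "", basesBelowCutoff: "" };
  return [
    sample, t.mean, cds.mean, s.mean,
    t.basesBelowBackground, s.basesBelowBackground, cds.basesBelowBackground,
    t.basesBelowCutoff, cds.basesBelowCutoff, s.basesBelowCutoff,
  ];
}
```

Comparisons are strict (depth < cutoff), matching the "bases < cutoff" wording of the column headers.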
It would be nice to be able to run this on servers with restricted internet access. Would it be possible to generate a file with transcript info that can be reused in these cases?
```
# Something like this to generate the file:
seqcover generate-db --genes PIGA,KCNQ2,ARX,DNM1,SLC25A22,CDKL5,GABRA1 \
--hg19 \
-o offline_db.json
# Then something like this to use it
seqcover report --genes PIGA,KCNQ2,ARX,DNM1,SLC25A22,CDKL5,GABRA1,CAD,MDH2,SCN1B,CNPY3,CPLX1,NEB,HNRNPA1,CCDC39,AIFM1,CHCHD10 \
--background seqcover/seqcover_p5.d4 \
--fasta $fasta samples/*.bed.gz \
-r my_genes_report.html \
--offline-db offline_db.json
```
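A sketch of the lookup the proposed --offline-db option might perform, assuming the JSON file maps gene symbols to transcript records. Both the option and resolveGenes are hypothetical:

```javascript
// Hypothetical: split the requested genes into those present in the
// offline database and those missing from it.
function resolveGenes(requested, offlineDb) {
  const found = {};
  const missing = [];
  for (const symbol of requested) {
    if (Object.prototype.hasOwnProperty.call(offlineDb, symbol)) {
      found[symbol] = offlineDb[symbol];
    } else {
      missing.push(symbol);
    }
  }
  return { found, missing };
}
```

In Node the file could be loaded with JSON.parse(fs.readFileSync("offline_db.json", "utf8")). Reporting missing genes matters here: the example report command requests genes such as CAD and NEB that are not in the generate-db list, and an offline run should say so rather than try to reach the network.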
gene.description somewhere.
customdata[5], it uses customdata.cdsend
When a user changes the heatmap metric to "CDS bases below 7", the color scale essentially inverts: blue now means well covered and yellow means low coverage. I think we should either flip the colors so that blue remains the color of interest in the plot, or at a minimum add a header in the dropdown that tells the user about this swap.
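A sketch of both options, assuming Plotly's reversescale attribute on the heatmap trace: metrics where a larger value means worse coverage get a reversed colorscale so blue keeps pointing at problems, and their dropdown label says so. The metric keys and helper names are hypothetical:

```javascript
// Hypothetical metric keys where a larger value means worse coverage.
const HIGHER_IS_WORSE = new Set(["cds_bases_lt_cutoff", "cds_bases_lt_background"]);

// Restyle update for the heatmap trace: reverse the colorscale for
// "higher is worse" metrics so blue stays the color of interest.
function heatmapStyle(metric) {
  return { reversescale: HIGHER_IS_WORSE.has(metric) };
}

// Dropdown text that flags the direction of the scale.
function dropdownLabel(metric, label) {
  return HIGHER_IS_WORSE.has(metric) ? `${label} (higher is worse)` : label;
}
```

This would be applied with Plotly.restyle(heatmapDiv, heatmapStyle(metric)) when the dropdown changes.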
we don't show the chromosome of each gene anywhere.
Hi everyone,
I've been testing seqcover with a single sample coverage file in *.bed.gz format produced by mosdepth. The sample is from a WES experiment. I ran the following command and it gave me an error:
```
${seqcover} report --genes BRCA1 --fasta ${fasta} ${input}/sample.bed.gz -r ${outfile} --hg19
[seqcover] read 1 sample coverage files
17 41195811 41277881
strutils.nim(1087) parseInt
Error: unhandled exception: invalid integer: CEX-chr17-41196311-41197870 [ValueError]
```
It looks up the BRCA1 coordinates in my BED file but raises this exception. I don't know whether the problem is the names of the regions. Let me know if you need any other information.
I hope you can help me, thanks in advance.
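One plausible cause, assuming seqcover reads the fourth column as an integer depth: mosdepth's per-base output has four columns (chrom, start, end, depth), while a regions file made with --by on a BED that has a name column puts the region name in column four, which matches the CEX-chr17-... value in the error. A hypothetical parser that fails with a clearer message could look like this:

```javascript
// Hypothetical: parse one line of a per-base coverage BED, failing with a
// hint about the expected format instead of a bare "invalid integer".
function parseCoverageLine(line, lineNumber) {
  const fields = line.trim().split("\t");
  const depth = Number(fields[3]);
  if (fields.length !== 4 || !Number.isInteger(depth)) {
    throw new Error(
      `line ${lineNumber}: expected 4 columns (chrom, start, end, depth) ` +
      `but got ${fields.length} (${fields.slice(0, 5).join(" | ")}); ` +
      "is this a *.per-base.bed.gz file from mosdepth?");
  }
  return { chrom: fields[0], start: Number(fields[1]), end: Number(fields[2]), depth };
}
```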
Nice tool! I have noticed that some genes (e.g. HNRNPA1) give an AssertionError, and I have not been able to figure out why.
Code used:
```
./seqcover report --genes HNRNPA1 \
--fasta $fasta samples/*_recal.per-base.d4 \
--background seqcover_out/seqcover_p5.d4 \
-r my_genes_report.html \
--hg19
# Gives:
fatal.nim(49) sysFatal
Error: unhandled exception: transcript.nim(143, 14) `r_off - l_off == o_exon[1] - o_exon[0]` [AssertionError]
```