igvteam / igv-reports

Python application to generate self-contained pages embedding IGV visualizations, with no dependency on original input files.

License: MIT License

HTML 28.77% Python 67.11% Shell 4.12%

igv-reports's Introduction

igv-reports

igv-reports - A Python application to generate self-contained HTML reports for variant review and other genomic applications. Reports consist of a table of genomic sites and an embedded IGV genome browser for viewing data for each site. The tool extracts slices of data for each site and embeds the data as blobs in the HTML report file. The report can be opened in a web browser as a static page, with no dependency on the original input files.

Installation

Prerequisites

igv-reports requires Python 3.8 or greater.

Installing igv-reports

pip install igv-reports

igv-reports requires the package pysam version 0.22.0 or greater, which should be installed automatically. However, on OS X this sometimes fails due to missing dependent libraries. This can be fixed by following the procedure below, from the pysam docs:
"The recommended way to install pysam is through conda/bioconda. This will install pysam from the bioconda channel and automatically makes sure that dependencies are installed. Also, compilation flags will be set automatically, which will potentially save a lot of trouble on OS X."

conda config --add channels r
conda config --add channels bioconda
conda install pysam

Creating a report

Reports are created with the command line script create_report, or alternatively python igv_reports/report.py. Command line arguments are described below. Although --tracks is optional, a typical report will include at least an alignment track (BAM or CRAM) file from which the variants were called.

Arguments:

  • Required

    • sites VCF, BED, MAF, BEDPE, or generic tab delimited file of genomic variant sites. Tabix indexed files are supported and strongly recommended for large files.
    • --fasta Reference fasta file; must be indexed. This argument should be omitted if --genome is used; otherwise it is required.
    • --genome An igv.js genome identifier (e.g. hg38). If supplied, the fasta, ideogram, and default annotation track for the specified genome will be used.
  • The arguments --begin, --end, and --sequence are required for a generic tab delimited sites file.

    • --begin INT. Column of start chromosomal position for sites file. Used for generic tab delimited input.
    • --end INT. Column of end chromosomal position for sites. Used for generic tab delimited input.
    • --sequence INT. Column of sequence (chromosome) name.
  • Optional coordinate system flag for generic tab delimited sites file only

    • --zero_based Specify that the position in the sites file is 0-based (e.g. UCSC files) rather than 1-based. Default is false.
  • Optional

    • --ideogram FILE. Ideogram file in UCSC cytoIdeo format. Useful when fasta is used to specify the reference.
    • --tracks LIST. Space-delimited list of track files; see below for supported formats. If both --tracks and --track-config are specified, the --tracks entries appear first by default.
    • --track-config FILE. File containing array of json configuration objects for igv.js tracks. See the igv.js wiki for more details. This option allows customization of track parameters. When using this option, the track url and indexURL properties should be set to the paths of the respective files.
    • --roi LIST. Space-delimited list of region-of-interest (ROI) files. See igv.js wiki.
    • --template FILE. HTML template file.
    • --output FILE. Output file name; default="igvjs_viewer.html".
    • --info-columns LIST. Space delimited list of info field names to include in the variant table. If sites is a VCF file these are the info ID values. If sites is a tab delimited format these are column names.
    • --info-columns-prefixes LIST. For VCF based reports only. Space delimited list of prefixes of VCF info field IDs to include in the variant table. Any info field with ID starting with one of the listed values will be included.
    • --samples LIST. Space delimited list of sample (i.e. genotype) names. Used in conjunction with --sample-columns.
    • --sample-columns LIST. Space delimited list of VCF sample FORMAT field names to include in the variant table. If --samples is specified columns will be restricted to those samples, otherwise all samples will be included.
    • --flanking INT. Genomic region to include either side of variant; default=1000.
    • --standalone Embed all JavaScript referenced via <script> tags in the page.
    • --sort Applies to alignment tracks only. If specified, alignments are initially sorted by the specified option. Supported values include BASE, STRAND, INSERT_SIZE, MATE_CHR, and NONE. Default value is BASE for single nucleotide variants, NONE (no sorting) otherwise. See the igv.js documentation for more information.
    • --exclude-flags INT. Value is passed to samtools as "-F" flag. Used to filter alignments. Default value is 1536 which filters alignments marked "duplicate" or "vendor failed". To include all alignments use --exclude-flags 0. See samtools documentation for more details.
    • --idlink URL template for information link for VCF ID values. The token $$ will be substituted with the ID value. Example: --idlink 'https://www.ncbi.nlm.nih.gov/snp/?term=$$'
    • --no-embed Don't embed data. Fasta and track URLs are referenced unchanged. The resulting report is dependent on the original data files, which must be specified as URLs. Local files are not supported with this option.
    • --subsample FLOAT. Output only a portion of input alignments (0.0 -> 1.0). See samtools view documentation for more details.
    • --maxlen INT. Maximum length of a variant (SV) to show in a single view. Variants exceeding this length will be shown in a split-screen (multilocus) view. default=10000.
    • --translate-sequence-track Show a three-frame translation in the sequence track.

Track file formats:

Currently supported track file formats are BAM, CRAM, VCF, BED, GFF3, GTF, WIG, and BEDGRAPH. FASTA, BAM, CRAM, and VCF
files must be indexed. Tabix is supported, and it is recommended that all large files be indexed.

Examples

Data for the examples are available in the GitHub repository https://github.com/igvteam/igv-reports. The repository can be downloaded as a zip archive from https://github.com/igvteam/igv-reports/archive/refs/heads/master.zip. The examples are assumed to be run from the root directory of the repository. Output html is written to the examples directory.

Create a variant report from a VCF file: (Example output)

create_report test/data/variants/variants.vcf.gz \
--fasta https://igv-genepattern-org.s3.amazonaws.com/genomes/seq/hg38/hg38.fa \
--ideogram test/data/hg38/cytoBandIdeo.txt \
--flanking 1000 \
--info-columns GENE TISSUE TUMOR COSMIC_ID GENE SOMATIC \
--samples reads_1_fastq \
--sample-columns DP GQ \
--tracks test/data/variants/variants.vcf.gz test/data/variants/recalibrated.bam test/data/hg38/refGene.txt.gz \
--output example_vcf.html

Create a variant report from a BED file: (Example output)

echo bed
create_report test/data/variants/variants.bed \
--genome hg38 \
--flanking 1000 \
--info-columns GENE TISSUE TUMOR COSMIC_ID GENE SOMATIC \
--tracks test/data/variants/variants.bed test/data/variants/recalibrated.bam \
--output example_genome.html

Create a variant report from a TCGA MAF file: (Example output)

create_report test/data/variants/tcga_test.maf \
--genome hg19 \
--flanking 1000 \
--info-columns Chromosome Start_position End_position Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 dbSNP_RS \
--tracks test/data/variants/tcga_test.maf \
--output example_maf.html

Create a variant report from a generic tab-delimited file: (Example output)

create_report test/data/variants/test.maflite.tsv \
--genome hg19 \
--sequence 1 --begin 2 --end 3 \
--flanking 1000 \
--info-columns chr start end ref_allele alt_allele \
--output example_tab.html
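
The column arguments in the command above appear to be 1-based column numbers (chr, start, and end are the first three columns of the example file). A minimal Python sketch of how one line of such a file could be interpreted follows. The function name parse_site and the handling of zero_based are assumptions for illustration (following the UCSC convention of a 0-based start and an exclusive end), not igv-reports' actual code.

```python
def parse_site(line, sequence_col, begin_col, end_col, zero_based=False):
    """Return (chrom, start, end) as 1-based inclusive coordinates.

    Hypothetical helper: column numbers are 1-based, mirroring
    --sequence, --begin and --end in the command above.
    """
    fields = line.rstrip("\n").split("\t")
    chrom = fields[sequence_col - 1]
    start = int(fields[begin_col - 1])
    end = int(fields[end_col - 1])
    if zero_based:
        # Assumed UCSC-style input: 0-based start, exclusive end.
        # Only the start shifts when converting to 1-based inclusive.
        start += 1
    return chrom, start, end


site = parse_site("chr1\t100\t200\tA\tG", 1, 2, 3)
ucsc_site = parse_site("chr1\t99\t200\tA\tG", 1, 2, 3, zero_based=True)
print(site, ucsc_site)
```

Both calls describe the same interval, which is the point of a coordinate-system flag: the same site can be written either way in the input file.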

Create a structural variant report from a vcf file with CHR2 and END info fields: (Example output)

create_report test/data/variants/SKBR3_Sniffles_sv.vcf \
--genome hg19 \
--flanking 1000 \
--maxlen 10500 \
--info-columns SVLEN \
--tracks test/data/variants/SKBR3_Sniffles_sv.vcf https://igv-genepattern-org.s3.amazonaws.com/test/bam/reads_lr_skbr3.sampled.bam \
--output example_sv.html 
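
The role of --maxlen in this example can be sketched in Python. This only illustrates the rule stated in the arguments list (variants longer than --maxlen are shown in a split, multilocus view); the function name loci_for_variant, the locus string format, and the exact arithmetic are assumptions, not the tool's implementation.

```python
def loci_for_variant(chrom, start, end, flanking=1000, maxlen=10000):
    """Return the list of locus strings to display for one variant."""
    if end - start <= maxlen:
        # Short enough: a single view spanning the variant plus flanks.
        return [f"{chrom}:{start - flanking}-{end + flanking}"]
    # Too long: one view around each end of the variant.
    return [
        f"{chrom}:{start - flanking}-{start + flanking}",
        f"{chrom}:{end - flanking}-{end + flanking}",
    ]


short_sv = loci_for_variant("chr1", 5000, 9000, maxlen=10500)
long_sv = loci_for_variant("chr1", 5000, 50000, maxlen=10500)
print(short_sv)
print(long_sv)
```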

Create a structural variant report from a bedpe file with two locations (BEDPE format): (Example output)

create_report test/data/variants/SKBR3_Sniffles_tra.bedpe \
--genome hg19 \
--flanking 1000 \
--tracks test/data/variants/SKBR3_Sniffles_variants_tra.vcf test/data/variants/SKBR3.ill.bam \
--output example_bedpe.html

Create a report using a genome identifier: (Example output)

create_report test/data/variants/variants.vcf.gz \
--genome hg38 \
--flanking 1000 \
--info-columns GENE TISSUE TUMOR COSMIC_ID GENE SOMATIC \
--tracks test/data/variants/variants.vcf.gz test/data/variants/recalibrated.bam \
--output example_genome.html

Create a variant report with tracks defined in an igv.js track config json file: (Example output)

create_report test/data/variants/variants.vcf.gz \
--fasta https://igv-genepattern-org.s3.amazonaws.com/genomes/seq/hg38/hg38.fa \
--ideogram test/data/hg38/cytoBandIdeo.txt \
--flanking 1000 \
--info-columns GENE TISSUE TUMOR COSMIC_ID GENE SOMATIC \
--track-config test/data/variants/trackConfigs.json \
--output example_config.html
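
For orientation, a track configuration file of the kind passed to --track-config might be generated as follows. The data file paths come from the examples above; the index file paths, the track names, and the properties other than url and indexURL (name, type, format) are assumptions based on common igv.js track options, and the actual test/data/variants/trackConfigs.json may differ.

```python
import json

# Hypothetical track list; only `url` and `indexURL` are named in the
# argument description, the remaining keys are illustrative assumptions.
tracks = [
    {
        "name": "Variants",
        "type": "variant",
        "format": "vcf",
        "url": "test/data/variants/variants.vcf.gz",
        "indexURL": "test/data/variants/variants.vcf.gz.tbi",
    },
    {
        "name": "Alignments",
        "type": "alignment",
        "format": "bam",
        "url": "test/data/variants/recalibrated.bam",
        "indexURL": "test/data/variants/recalibrated.bam.bai",
    },
]

# The option expects a file containing a JSON array of track objects.
config_text = json.dumps(tracks, indent=2)
print(config_text)
```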

Create a variant report with custom ID link urls: (Example output)

create_report test/data/variants/1kg_phase3_sites.vcf.gz \
--genome hg19 \
--flanking 1000 \
--tracks test/data/variants/1kg_phase3_sites.vcf.gz test/data/variants/NA12878_lowcoverage.bam \
--idlink 'https://www.ncbi.nlm.nih.gov/snp/?term=$$' \
--output example_idlink.html
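
The $$ substitution used by --idlink amounts to a plain token replacement. A minimal sketch, assuming a simple string replace; id_link is a hypothetical helper and rs12345 is a made-up ID used only for illustration.

```python
def id_link(template, variant_id):
    """Substitute the $$ token in an --idlink template with a VCF ID."""
    return template.replace("$$", variant_id)


url = id_link("https://www.ncbi.nlm.nih.gov/snp/?term=$$", "rs12345")
print(url)
```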

Create a junction report from a splice-junction bed file: (Example output)

create_report test/data/junctions/Introns.38.bed \
--genome hg38 \
--type junction \
--track-config test/data/junctions/tracks.json \
--info-columns TCGA GTEx variant_name \
--title "Sample A" \
--output example_junctions.html

Create a fusion report from a Trinity fusion json file:

create_report test/data/fusion/igv.fusion_inspector_web.json \
--fasta test/data/fusion/igv.genome.fa \
--template igv_reports/templates/fusion_template.html \
--track-config test/data/fusion/tracks.json \
--output example_fusion.html

Create a report containing wig and bedgraph files:

create_report test/data/wig/regions.bed \
--genome hg19 \
--exclude-flags 512 \
--tracks test/data/wig/ucsc.bedgraph test/data/wig/mixed_step.wig test/data/wig/variable_step.wig \
--output example_wig.html

Use of info-columns-prefixes option. Variant track only, no alignments. (Example output)

python igv_reports/report.py test/data/annotated_vcf/consensus.filtered.ann.vcf \
--genome hg19 \
--flanking 1000 \
--info-columns cosmic_gene \
--info-columns-prefixes clinvar \
--tracks test/data/annotated_vcf/consensus.filtered.ann.vcf \
--output example_ann.html 

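Use the --exclude-flags option to include duplicate alignments in the report. The value is passed to samtools as the -F (exclude flags) filter. The default value is 1536 (512 + 1024), which filters vendor-failed reads and duplicates; the value 512 used below filters only vendor-failed reads, so duplicates are retained.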

create_report test/data/dups/dups.bed \
--genome hg19 \
--exclude-flags 512 \
--tracks test/data/dups/dups.bam \
--output example_dups.html
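
The flag arithmetic behind this example can be checked with a few lines of Python. The bit values are those defined in the SAM specification; is_excluded is a hypothetical helper that mimics the meaning of samtools -F and is not part of igv-reports.

```python
# Bit values from the SAM specification.
QC_FAIL = 0x200    # 512: "vendor failed" / not passing quality controls
DUPLICATE = 0x400  # 1024: PCR or optical duplicate


def is_excluded(read_flag, exclude_flags):
    """Mimic samtools -F: drop a read if any excluded bit is set."""
    return (read_flag & exclude_flags) != 0


default_mask = QC_FAIL | DUPLICATE      # 1536, the documented default
duplicate_read = DUPLICATE | 0x1 | 0x2  # a paired, properly paired duplicate

print(default_mask)
print(is_excluded(duplicate_read, default_mask))  # filtered by the default
print(is_excluded(duplicate_read, 512))           # kept with --exclude-flags 512
```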

Use the --no-embed option to use external URL references for tracks in the report.

create_report test/data/variants/variants.vcf.gz \
--genome hg38 \
--no-embed \
--tracks https://igv-genepattern-org.s3.amazonaws.com/test/reports/variants.vcf.gz https://igv-genepattern-org.s3.amazonaws.com/test/reports/recalibrated.bam \
--output example_noembed.html

Converting genomic files to data URIs for use in igv.js

The script create_datauri (python igv_reports/datauri.py) converts the contents of a file to a data URI for use in igv.js. The data URI is printed to stdout. NOTE: It is not necessary to run this script explicitly to create a report; it is documented here for use with stand-alone igv.js.

Convert a gzipped vcf file to a datauri.

create_datauri test/data/variants/variants.vcf.gz

Convert a slice of a local bam file to a datauri.

create_datauri --region chr5:474,969-475,009 test/data/variants/recalibrated.bam 

Convert a remote bam file to a datauri.

create_datauri --region chr5:474,969-475,009 https://1000genomes.s3.amazonaws.com/phase3/data/NA12878/alignment/NA12878.mapped.ILLUMINA.bwa.CEU.low_coverage.20121211.bam
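
The general shape of such a data URI can be sketched in Python, assuming gzip compression followed by base64 encoding (the data:application/gzip;base64, prefix matches report output quoted in one of the issues further down). to_data_uri is a hypothetical helper, not the create_datauri implementation, which also handles region slicing of indexed files.

```python
import base64
import gzip


def to_data_uri(data: bytes) -> str:
    """Gzip the bytes (unless already gzipped) and wrap them in a data URI."""
    if data[:2] != b"\x1f\x8b":  # gzip magic number
        data = gzip.compress(data)
    encoded = base64.b64encode(data).decode("ascii")
    return "data:application/gzip;base64," + encoded


uri = to_data_uri(b"chr5\t474969\t475009\n")
print(uri[:40])
```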

igv-reports's People

Contributors

ak, alperyilmaz, asmariyaz23, brianjohnhaas, devangthakkar, dlaehnemann, felixmoelder, fgvieira, helgathorv, ishanley, jethror1, johanneskoester, jrobinso, juanesarango, krdav, sadams2013, tomkinsc, wdecoster

igv-reports's Issues

Not all reads displayed

When I view my rendered HTMLs in the browser, it seems that at highly covered loci not all reads are displayed in the IGV panel. Is a downsampling step involved?

Not able to find reference on s3.amazon

When I run any of the two examples I get the following error:

create_report examples/junctions/Introns.38.bed \
    https://s3.dualstack.us-east-1.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa \
    --type junction \
    --ideogram examples/junctions/cytoBandIdeo.txt \
    --output junctions.html \
    --track-config examples/junctions/tracks.json \
    --info-columns TCGA GTEx variant_name \
    --title "Sample A"

Traceback (most recent call last):
  File "/home/daniel/miniconda3/bin/create_report", line 8, in <module>
    sys.exit(main())
  File "/home/daniel/miniconda3/lib/python3.8/site-packages/igv_reports/report.py", line 234, in main
    create_report(args)
  File "/home/daniel/miniconda3/lib/python3.8/site-packages/igv_reports/report.py", line 87, in create_report
    data = fasta.get_data(args.fasta, region)
  File "/home/daniel/miniconda3/lib/python3.8/site-packages/igv_reports/fasta.py", line 21, in get_data
    fasta = pysam.FastaFile(fasta_file)
  File "pysam/libcfaidx.pyx", line 123, in pysam.libcfaidx.FastaFile.__cinit__
  File "pysam/libcfaidx.pyx", line 155, in pysam.libcfaidx.FastaFile._open
OSError: file `https://s3.dualstack.us-east-1.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa` not found

Sorting option

Hi,

Is there a way to add more sorting options to the popup window that appears when clicking on the reads?

Thanks

support for .tsv or .maf format input files

As mentioned in #43, it would be really nice if igv-reports supported reading input from a tab-separated or .maf input file. Right now it does not seem to work with these, forcing you to convert your MAF back into VCF to use it with igv-reports, which is a headache at times.

Present ANN field in a visual way

The table should automatically present the ANN field in a more accessible way. As far as I know it is now standardized across multiple annotation tools. I guess what makes sense is the following:

  • Split by comma.
  • Sort resulting items by impact.
  • Have a nested table with the fields that are separated by pipes as columns.
  • To save space, maybe only the first row should be shown and the others should be expandable. Frameworks like twitter bootstrap would make the latter quite easy.
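
The first three steps of this proposal could be sketched in Python as follows. The sketch assumes the standard ANN layout, in which the third pipe-separated field is the impact (HIGH, MODERATE, LOW, or MODIFIER); parse_ann is a hypothetical helper and the annotation string is a shortened, made-up example.

```python
# Impact levels in decreasing severity, as used in the ANN field standard.
IMPACT_ORDER = {"HIGH": 0, "MODERATE": 1, "LOW": 2, "MODIFIER": 3}


def parse_ann(ann_value):
    """Split an ANN INFO value into rows of pipe-separated fields,
    sorted so the most severe annotation comes first."""
    rows = [entry.split("|") for entry in ann_value.split(",")]

    def severity(row):
        # Field 3 (index 2) is the impact; unknown values sort last.
        impact = row[2] if len(row) > 2 else ""
        return IMPACT_ORDER.get(impact, len(IMPACT_ORDER))

    return sorted(rows, key=severity)


# Shortened, made-up annotations for illustration only.
ann = "G|intron_variant|MODIFIER|GENE1,G|missense_variant|MODERATE|GENE1"
rows = parse_ann(ann)
first_row, other_rows = rows[0], rows[1:]  # show first, make the rest expandable
print(first_row)
```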

Input/output error when creating IGV report

Dear all,

I am trying to create an IGV report:

create_report --flanking 100 --info-columns DP AF ANN --sample-columns GT DP AD GQ --standalone --output out.html variants.annot.vcf.gz reference_genome.fasta --tracks sample.cram

But I am getting the error below:

[W::find_file_url] Failed to read reference "https://www.ebi.ac.uk/ena/cram/md5/97a317c689cbdd7e92a5c159acd290d2": Input/output error
[W::find_file_url] Failed to read reference "https://www.ebi.ac.uk/ena/cram/md5/97a317c689cbdd7e92a5c159acd290d2": Input/output error
[...]

What does this error mean? If I try to access the URL, it works fine.
And why is IGV trying to access ENA if I provide the reference genome?

thanks,

Don't output index files

Don't include index files (.fai, .bai, .tbi, .idx) in the generated html. They will not be used.

need ability to freeze column headers in HTML table

If the variant table in the HTML report is very long, you lose sight of the column header labels as you scroll down, making it hard to remember which column is which. It would be great if there was a way to freeze the header row in place as you scroll, similar to the "freeze pane" feature in Excel.

igv-reports doesn't display bam but igv does

Absolute noob here - be gentle.

Is there any conceptual reason why a bam/bed combo would display fine in igv proper (java) but only the bed shows up in a report generated as follows?

create_report bedfile.bed hg19.fasta --tracks bamfile.bam

These two made with the same files...
Screenshot 2021-12-14 151244
image

[edited to show matching locations in igv and igv-reports]

HTML report too large

Hi all, I have been using igv-report and I like it so far. However, I am having some issues with the html files being too large when using a bam file that is not subsetted. Any suggestions on how to solve this issue? We would like to incorporate this tool into our analysis pipeline and would like a report that shows the entire length of the bam file. Thanks!

ValueError: start out of range (-261)

Hi!

I'm trying to use igv-reports to visualize .vcf files of SARS-CoV2. I'm running into an error which I think arises from the fact that the genome of the SARS-CoV2 virus does not have chromosomes. I tried the minimal command create_report sample1.vcf.gz sarscov2.fa (where sample1.vcf.gz is the compressed VCF file, and sarscov2.fa is the genome in FASTA format), and the error output I received is the following:

Traceback (most recent call last):
  File "/home/ctuni/miniconda3/bin/create_report", line 8, in <module>
    sys.exit(main())
  File "/home/ctuni/miniconda3/lib/python3.8/site-packages/igv_reports/report.py", line 234, in main
    create_report(args)
  File "/home/ctuni/miniconda3/lib/python3.8/site-packages/igv_reports/report.py", line 87, in create_report
    data = fasta.get_data(args.fasta, region)
  File "/home/ctuni/miniconda3/lib/python3.8/site-packages/igv_reports/fasta.py", line 23, in get_data
    slice_seq = fasta.fetch(chr, start, end)
  File "pysam/cfaidx.pyx", line 266, in pysam.cfaidx.FastaFile.fetch
  File "pysam/cutils.pyx", line 221, in pysam.cutils.parse_region
ValueError: start out of range (-261)

I am sure I am doing something wrong but I can't see what it is. If someone could point me in the right direction it would be appreciated, thank you very much!
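
One plausible reading of this error is that the requested window starts before the beginning of the sequence: with the default --flanking of 1000, any site within the first 1000 bases would yield a negative start. The sketch below illustrates that arithmetic and a clamped alternative. The position 739 is a hypothetical value chosen to reproduce -261, and region_for_site is not igv-reports code.

```python
def region_for_site(pos, flanking=1000, clamp=True):
    """Return (start, end) of the window around a site."""
    start = pos - flanking
    end = pos + flanking
    if clamp:
        # Sequences start at 0, so never request a negative start.
        start = max(0, start)
    return start, end


unclamped = region_for_site(739, clamp=False)
clamped = region_for_site(739)
print(unclamped, clamped)
```

If this reading is right, a smaller --flanking value would also avoid the negative start on short genomes.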

Issues printing multiple INFO columns

To whom this may concern,

I am trying to include multiple INFO field columns in the html report but for some reason it only prints values from the first column. Also the column header names are missing for the info-columns specified.

The following command was used:
create_report SS-10441949-S-123019_S22_concensus.filtered.ann.vcf.gz GRCh37.fa --ideogram cytoBandIdeo.txt --info-columns cosmic_gene cosmic_hgvsc cosmic_hgvsp cosmic_legacy_id --tracks SS-10441949-S-123019_S22_concensus.bam --output SS-10441949-S-123019.html

The VCF was generated using GATK v4.1 HaplotypeCaller. Annotation was done with vcfanno and cosmic/clinvar VCFs. We have additional problems trying to parse the FORMAT:AF and FORMAT:AFDP fields into the html report.

See the attached annotated and filtered VCF, and html report generated. Rename the extension to .html before use.

SS-10441949-S-123019_S22_consensus.filtered.ann.vcf.gz
SS-10441949-S-123019.html.txt

Thanks,
Ivan

Suggestion: add examples of output reports

Are there example outputs hosted somewhere?

I realise there is an examples folder with data and clear steps to run the tool, but clicking a link that says "example output" would be nice for someone who doesn't even know what a report looks like.

BAM files size limitation

Hi,
It seems that there is a limit on how many reads can be loaded in the html output?
After a couple of tries, it looks like:

  1. The tumor reads and the matched normal reads (beneath the tumor read window) cannot be displayed at the same time.
  2. Not all mutations' reads are displayed in the read window.

Thanks

Warning: The index file is older than the data file

Hi,

I finally got to setting up igvreports as per @jrobinso 's suggestion on the IGV Web App repo's issue page. I was able to set up the dependencies and launch and view the examples successfully.

I prepared a BED file with the sequence of searches that I would like to query, as per the formatting of example BED files in the repo. I do not get any warnings when I use the BED file with your example data.

When I use my own BAM file though, with hg38 reference link from example command, I get the following warning on every search in my BED file:
[W::hts_idx_load2] The index file is older than the data file:PATH_TO_FILE/BH10281_1.bai

The files BH10281_1.bai and BH10281_1.bam as per 'Date Created' column were made at the same time. What exactly is this warning referring to? Could older mean an older reference (hg19)? I can download hg19 FASTA files and try with that if that's the case.

Best,
MarkleLab
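
This warning is generally understood to concern file modification times rather than the reference build: htslib emits it when the index file's modification time is earlier than the data file's. A small check along those lines follows, assuming that interpretation; index_is_stale is a hypothetical helper and the file names are placeholders.

```python
import os
import tempfile


def index_is_stale(data_path, index_path):
    """True if the index was last modified before the data file."""
    return os.path.getmtime(index_path) < os.path.getmtime(data_path)


# Demonstration with throwaway files standing in for a BAM and its index.
with tempfile.TemporaryDirectory() as tmp:
    bam = os.path.join(tmp, "sample.bam")
    bai = os.path.join(tmp, "sample.bai")
    for path in (bam, bai):
        open(path, "wb").close()
    os.utime(bai, (1_000, 1_000))  # index modified first
    os.utime(bam, (2_000, 2_000))  # data file modified later
    stale_before = index_is_stale(bam, bai)
    os.utime(bai, None)            # equivalent of `touch sample.bai`
    stale_after = index_is_stale(bam, bai)

print(stale_before, stale_after)
```

Under that assumption, re-creating or touching the index so that it is newer than the BAM should silence the warning.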

Why is variant sites a mandatory input for igv-reports

Hi, why is the variants file, such as a VCF, a mandatory argument for igv-reports? Quite often I use IGV (the standalone app) to visualize RNA-seq data (alignment files) along with just the reference genome and an annotation (GFF) file. Is it possible to replicate such behavior using igv-reports so that the reports can be shared more easily with my colleagues?

Load file from private Google Cloud bucket

Hi,

I was wondering if it is possible to load a BAM file from a Google Cloud bucket. I tried loading a public BAM (example code with only the BAM location replaced) and that didn't seem to work. I understand that igv.js is able to load private Google cloud storage if we provide it with the requisite credentials - would it be possible to extend that to igv-reports as well?

> create_report test/data/variants/variants.vcf.gz \
http://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa \
--ideogram test/data/hg38/cytoBandIdeo.txt \
--flanking 1000 --info-columns GENE TISSUE TUMOR COSMIC_ID GENE SOMATIC \
--samples reads_1_fastq --sample-columns DP GQ \
--tracks test/data/variants/variants.vcf.gz gs://genomics-public-data/NA12878.chr20.sample.bam test/data/hg38/refGene.txt.gz \
--output examples/example_vcf.html

[E::hts_open_format] Failed to open file gs://genomics-public-data/NA12878.chr20.sample.bam
Traceback (most recent call last):
  File "/usr/local/bin/create_report", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/igv_reports/report.py", line 345, in main
    create_report(args)
  File "/usr/local/lib/python3.6/dist-packages/igv_reports/report.py", line 84, in create_report
    reader = utils.getreader(config, None, args.fasta)
  File "/usr/local/lib/python3.6/dist-packages/igv_reports/utils.py", line 13, in getreader
    return bam.BamReader(path)
  File "/usr/local/lib/python3.6/dist-packages/igv_reports/bam.py", line 11, in __init__
    header = pysam.view(*args)
  File "/usr/local/lib/python3.6/dist-packages/pysam/utils.py", line 75, in __call__
    stderr))
pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=samtools view: failed to open "gs://genomics-public-data/NA12878.chr20.sample.bam" for reading: Protocol not supported\n'

<DEL> in alt field leads to crossed out entry in table

When a VCF file contains a <DEL> allele, the corresponding entry in the HTML table is not rendered correctly. Seems like the <DEL> is interpreted as HTML tag. I suggest to escape all VCF entries before rendering them in the table.

igv-report-del-bug
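
The escaping suggested here is available in the Python standard library. A minimal sketch, assuming the table cell is assembled as a string; the surrounding td markup is only illustrative and not taken from the igv-reports template.

```python
import html

alt = "<DEL>"
cell = f"<td>{html.escape(alt)}</td>"
print(cell)  # <td>&lt;DEL&gt;</td>
```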

fix for integer Info columns not in latest release on pip

I followed the instructions in the README and installed with

pip install igv-reports

However this gave me version 1.0.1. It seems like a number of important fixes are not included, importantly the ones in bbdf1e8

For example, the version of igv-reports I got off of pip has the following for varianttable.py;

def render_value(v):
    """Render given value to string."""
    if isinstance(v, float):
        # ensure that we don't waste space by insignificant digits
        return f'{v:.2g}'
    elif v.startswith('http://') or v.startswith('https://'):
        return create_link(v)
    else:
        return str(v)

but the one I cloned off GitHub here has this, which fixes some bugs I was hitting;

def render_value(v):
    """Render given value to string."""
    if v is None:
        return ""
    elif isinstance(v, float):
        # ensure that we don't waste space by insignificant digits
        return f'{v:.2g}'
    elif isinstance(v, str):
        str_val = v.replace('"', '')
        
        if str_val.startswith('http://') or str_val.startswith('https://'):
            return create_link(str_val)
    
    return str(v)

Can there be a new release pushed to pip that includes these fixes? Thanks.
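
The difference between the two versions can be reproduced in isolation. In the sketch below create_link is a stand-in (its real implementation is not shown above), the function names render_value_old and render_value_new are labels added for comparison, and the integer 42 is an arbitrary example value.

```python
def create_link(url):
    # Stand-in for the real helper, which is not shown in the issue.
    return f'<a href="{url}">{url}</a>'


def render_value_old(v):
    """Version from the pip release quoted above."""
    if isinstance(v, float):
        return f'{v:.2g}'
    elif v.startswith('http://') or v.startswith('https://'):
        return create_link(v)
    else:
        return str(v)


def render_value_new(v):
    """Version from the GitHub repository quoted above."""
    if v is None:
        return ""
    elif isinstance(v, float):
        return f'{v:.2g}'
    elif isinstance(v, str):
        str_val = v.replace('"', '')
        if str_val.startswith('http://') or str_val.startswith('https://'):
            return create_link(str_val)
    return str(v)


try:
    render_value_old(42)  # an int has no .startswith() method
    old_result = "ok"
except AttributeError as err:
    old_result = f"AttributeError: {err}"

new_result = render_value_new(42)
print(old_result)
print(new_result)
```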

INFO field parsing

Hi,

Thank you for igv-reports, it's a great tool. I am currently using it to visualise structural variants and wanted to include some additional INFO fields in the variants table. Using the IGVreports biocontainer v1.6.1 and Singularity, I ran the following:

vcf=/data/Results/sample_recessive.vcf.gz 
bam=/data/Bams/sample_sorted.bam

singularity run -B /data igv-reports_1.6.1--pyh7cba7a3_0.sif \
create_report \
	${vcf} \
	--genome hg38 \
	--info-columns SVTYPE SVLEN CHR2 END \
	--flanking 1000 \
	--tracks ${vcf} ${bam} \
	--output /data/Results/sample_igvreport.html

However, in the .html report only the CHROM, POSITION, REF, ALT, ID, SVTYPE, SVLEN, and CHR2 columns are filled in for each variant site. I have confirmed that the END INFO field is declared in my VCF header and that I am naming it correctly:

##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the structural variant">

Are only some INFO fields currently supported? Would appreciate any help on understanding/resolving this :)

When the bam file is big (200 Mb), it fails to load

I found that if the .bam alignment file is around 200 Mb, the display fails. There is no error other than the browser complaining about slow loading. After testing for a while, the result is that loading always fails.
Is there any solution to this? Should I reduce the size of the sliced bam, or use a relatively small region?

I think the RAM of the computer used for testing is large enough (126 G).
As a control, I tested the same .bam file in the IGV browser and it works well.

Option to filter/hide duplicate reads in igv-reports?

Hello! While using the tool, there are snapshots containing duplicated reads from the BAM file. Is there an option to hide/remove these duplications? I saw that igv.js has had something like this resolved- is there something I can implement for igv-reports?
Thank you

igv-report does not embed igv track

Hi,
I was trying to generate a report using a conda environment. I downloaded the sample data and ran the sample code (I replaced the genome URL with a local file location). I can generate the html output, but when I open the file I can only see the variant table; IGV is not embedded (see image below).

image

When I check the html contents I can see the following (I trimmed the output, shown as ...):

<div id="igvContainer">
    <div id="igvDiv"></div>
</div>


<script type="text/javascript">

    var tableJson = [{"unique_id": 0, "CHROM": "chr5", "POSITION": 474989, "REF": "A", "ALT": "G",...

    var sessionDictionary = {"0": "data:application/gzip;base64,H4sIABLVHF0C...

How can I fix this problem?

High coverage BAMs need to be downsampled in order to show reads

Hi,

I was able to display reads on two (T and N) bam tracks, but only by downsampling the BAMs. The downsampling method I used removes reads randomly and thus removes the alt reads (the variants). My questions are:

  1. Is there a way to modify this limitation? (There was a similar post about not all reads being displayed, but I am not sure if this is a related issue.)
  2. If 1. is not possible, can you suggest a way to retain the alt reads (variants) when downsampling BAMs?
  3. Can it be used to view structural variants, and how would I go about including the SVs in the input vcf?

output HTML attached.
igvjs_viewer.html.txt

Thanks

Mistake in latest release number

The latest release has been labeled with the release number 0.92 instead of 0.9.2.
As this would undermine the bioconda version history of igv-reports this should be corrected before updating the bioconda package.

converting the html to pdf

What is the best way to deliver PDF files of the snapshots rather than HTML?
The results are viewed in an application that does not support HTML but does render PDF.

initialization configuration questions

Hi,
I have tried igv-reports and it is an amazing tool. Thanks so much for your contribution. I have run into some problems:

  1. When I supply only one variant in the VCF to create_report, it does not load the IGV track. I changed sessionDictionary["1"] to sessionDictionary["0"] and it worked.
    sessionURL: sessionDictionary["1"],
  2. Is there any way to configure a track when it is initialized? For example, I'd like each track to be already sorted by some rule when I open the HTML, instead of having to right-click a base and select the sorting rule from the menu. I tried changing igv.browser.loadsession to igv.browser.loadTrack as described at https://github.com/igvteam/igv.js/wiki/Browser-Control-2.0#loadtrack but it did not work. Could you please give me some hints about how to implement this?
  3. I have read issue #21 (comment), which was closed months ago. Is there now any way to configure "samplingDepth" as in igv.js? I think this is the same type of question as 2.

AttributeError: 'pysam.libcbcf.TabixIterator' object has no attribute 'append'

Hi,

I'm sure I'm doing something wrong but I cannot find the source of my error. igv-reports used to work with simple variants but now it fails for BEDPE.

version:

igv-reports               1.7.0                    pypi_0    pypi

This is my command:

create_report /path/work/5e/01e8bdd63bfe616f2b874350506418/OUT/00013.bedpe  \
/path/to/hs37d5_all_chr.fasta  \
 --ideogram "/path/work/00/7828d9e3ffbc5df2b0533346431846/hg19.cytoBandIdeo.txt"  \
--flanking 250  \
--tracks /path/work/jeter.vcf.gz /path/to/file1.cram /path/to/file2.cram /path/to/file3.cram /path/to/file4.cram /path/work/f4/a337a2930ad0f83b3074666080e898/hg19.refGene.txt.gz         --output TMP/output.html

And this is my error:

Traceback (most recent call last):
  File "/CONDAS/users/lindenbaum-p/IGVREPORTS/bin/create_report", line 8, in <module>
    sys.exit(main())
  File "/CONDAS/users/lindenbaum-p/IGVREPORTS/lib/python3.7/site-packages/igv_reports/report.py", line 346, in main
    create_report(args)
  File "/CONDAS/users/lindenbaum-p/IGVREPORTS/lib/python3.7/site-packages/igv_reports/report.py", line 210, in create_report
    data = reader.slice(region, region2)
  File "/CONDAS/users/lindenbaum-p/IGVREPORTS/lib/python3.7/site-packages/igv_reports/vcf.py", line 42, in slice
    records.append(rec)
AttributeError: 'pysam.libcbcf.TabixIterator' object has no attribute 'append'

I'm not a python guy but it looks like the first iterator should be converted to an array before calling append?
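
The suggestion above can be sketched as follows. Only the `records.append(rec)` call is visible in the traceback, so the surrounding logic here is an assumption, and plain Python iterators stand in for pysam's `TabixIterator`; `slice_records` is a hypothetical name.

```python
def slice_records(first_iter, second_iter):
    """Combine records from two iterators into one list."""
    records = list(first_iter)   # an iterator has no append(); a list does
    for rec in second_iter:
        records.append(rec)
    return records

# Plain iterators used in place of tabix query results
combined = slice_records(iter(["rec1", "rec2"]), iter(["rec3"]))
```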

Alignment track whitespace

Hello,

When creating a track and using the following:

[{"name": "174545_SA31", "type": "alignment", "format": "bam", "url": "/bams/174545_SA31.markdup.sorted.bam", "indexURL": "/bams/174545_SA31.markdup.sorted.bam.bai", "showAlignments": false}

showAlignments is set to false, yet in the output white space appears where the alignment track would be, as in the following image:

image

However, it would be ideal to have it load like this:
image

I can get to the second image by ticking and unticking the alignment track, but would love for it to display without white space by default.

Thank you, and appreciate the tool!

Need an error message if the wrong reference genome is being used

Just spent a lot of time trying to debug why the output HTML reports were not working, getting errors like this in the Firefox web console:

Error: Unrecognized locus chr5:181224454-181224494
Uncaught (in promise) TypeError: e is undefined


Source map error: Error: NetworkError when attempting to fetch resource.
Resource URL: file:///Users/steve/Downloads/igvjs_viewer-2.html
Source Map URL: igv.css.map

Source map error: Error: NetworkError when attempting to fetch resource.
Resource URL: file:///Users/steve/Downloads/igvjs_viewer-2.html
Source Map URL: igv.min.js.map

with the variant table showing but the IGV browser not showing up.

It turns out it's because I was using hg19 instead of hg38:

create_report \
--standalone \
--ideogram /igv-reports/examples/variants/cytoBandIdeo.txt \
--flanking 1000 \
--info-columns GENE TISSUE TUMOR COSMIC_ID GENE SOMATIC \
--tracks \
/igv-reports/examples/variants/variants.vcf.gz \
/igv-reports/examples/variants/recalibrated.bam \
/igv-reports/examples/variants/refGene.sort.bed.gz \
--output igvjs_viewer.html \
/igv-reports/examples/variants/variants.vcf.gz \
hg19.fasta

Would have been really nice to have gotten an error about this from create_report instead of having to debug it inside the web browser after loading the HTML report output. Is that possible?
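
As a rough illustration of the kind of check requested here, the sketch below compares site coordinates against contig names and lengths taken from a FASTA index (.fai), whose first two tab-separated columns are the sequence name and its length. This is not how create_report behaves; `out_of_range_sites` is a hypothetical helper and the contig lengths are toy values.

```python
def out_of_range_sites(sites, fai_lines):
    """Return (chrom, pos, reason) for sites the reference cannot contain."""
    lengths = {}
    for line in fai_lines:
        if line.strip():
            fields = line.split("\t")
            lengths[fields[0]] = int(fields[1])   # column 1: name, column 2: length
    problems = []
    for chrom, pos in sites:
        if chrom not in lengths:
            problems.append((chrom, pos, "unknown contig"))
        elif pos > lengths[chrom]:
            problems.append((chrom, pos, "position beyond contig end"))
    return problems

# Toy index with a single 1000 bp contig (not real genome lengths)
fai = ["chr5\t1000\t6\t60\t61"]
sites = [("chr5", 500), ("chr5", 1500), ("chr7", 10)]
problems = out_of_range_sites(sites, fai)
for chrom, pos, reason in problems:
    print(f"{chrom}:{pos} {reason}")
```

A check like this would catch a site such as chr5:181224454 whenever that coordinate lies past the end of chr5 in the supplied reference.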

HTML report not generated if variant location does not exist (was HTML report does not work with hg19.fasta)

I modified the example command to use the hg19 genome as shown:

create_report \
	examples/variants/variants.vcf.gz \
	https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg19/hg19.fasta \
	--ideogram examples/variants/cytoBandIdeo.txt \
	--flanking 1000 \
	--info-columns GENE TISSUE TUMOR COSMIC_ID GENE SOMATIC \
	--tracks examples/variants/variants.vcf.gz examples/variants/recalibrated.bam \
	examples/variants/refGene.sort.bed.gz \
	--output igvjs_viewer.html

However, the HTML output report does not show the IGV browser tracks. Is there something wrong with that genome file? I did not get any error messages.

Export all tracks to image file

Thank you for this great tool. It is very powerful and the HTML output is excellent. However, I was wondering if there is any way to use this tool to export the default tracks as SVG or PNG files for each of a list of input variants. Typically, I run this on dozens of variants, and it would be useful to have defined tracks output as images. It seems this should be possible by iterating through the individual variants in the "variant_table" and then exporting the contents of "igv-column", which is something I will try on my end. I would like to know (a) whether this is a feature under consideration and (b) whether you know an elegant way of accomplishing this, preferably with programmatic naming.

Down-sampling

Is downsampling supported? The files I'm producing are way too large.

TypeError: a bytes-like object is required, not 'str'

I ran:

python igv-reports/igv_reports/report.py  mafs/variants.maf   /mnt/tool/ref_资源/iGenomes/references/Homo_sapiens/GATK/GRCh37/Sequence/WholeGenomeFasta/human_g1k_v37_decoy.fasta   --ideogram /mnt/tool/ref_资源/iGenomes/references/Homo_sapiens/other/cytoBandIdeo.txt.gz  --flanking 1000 --info-columns Chromosome Start_Position End_Position Variant_Classification Variant_Type Reference_Allele    HGVSc HGVSp HGVSp_Short RefSeq AF  t_depth t_ref_count t_alt_count n_depth n_ref_count n_alt_count gnomAD_AF    --tracks bams/patient101.tumor.bam   /mnt/tool/ref_资源/iGenomes/references/Homo_sapiens/other/refGene.txt.gz     --output example_maf.html
Traceback (most recent call last):
  File "igv-reports/igv_reports/report.py", line 350, in <module>
    main()
  File "igv-reports/igv_reports/report.py", line 346, in main
    create_report(args)
  File "igv-reports/igv_reports/report.py", line 84, in create_report
    reader = utils.getreader(config, None, args)
  File "/home/tanqiang/mambaforge/envs/igvreports/lib/python3.7/site-packages/igv_reports/utils.py", line 13, in getreader
    return bam.BamReader(filetype, path, args)
  File "/home/tanqiang/mambaforge/envs/igvreports/lib/python3.7/site-packages/igv_reports/bam.py", line 13, in __init__
    seqnames = parse_seqnames(header)
  File "/home/tanqiang/mambaforge/envs/igvreports/lib/python3.7/site-packages/igv_reports/bam.py", line 53, in parse_seqnames
    lines = header.split('\n')
TypeError: a bytes-like object is required, not 'str'

pysam-developers/pysam#292 (comment)

This seems to be a problem with pysam. With the example BAM, the header is returned as str and everything works as usual. With my own BAM, it is returned as a bytes object, on which split('\n') cannot be called.
In igv_reports/bam.py I revised the line to lines = str(header).split('\n').
Forced conversion makes the error go away, but I'm not sure whether it will create new problems.
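
On the concern about forced conversion: in Python 3, calling `str()` on a bytes object returns its printable representation (`"b'...'"`) with newlines turned into the two characters `\` and `n`, so a later `split('\n')` no longer splits on real line breaks. Decoding only when the value is bytes avoids this. The sketch below assumes the header is UTF-8 text; `header_lines` is a hypothetical helper, not the code in igv_reports/bam.py.

```python
def header_lines(header):
    """Split a SAM header into lines whether it arrives as str or bytes."""
    if isinstance(header, bytes):
        header = header.decode("utf-8")  # assumption: header text is UTF-8
    return header.split("\n")

as_str = header_lines("@HD\tVN:1.6\n@SQ\tSN:1\tLN:100")
as_bytes = header_lines(b"@HD\tVN:1.6\n@SQ\tSN:1\tLN:100")

# For comparison: str() on bytes yields the repr, so the result is one unsplit string
naive = str(b"@HD\tVN:1.6\n@SQ\tSN:1\tLN:100").split("\n")
```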

I created a BAM that minimally reproduces the problem. Perhaps there is a certain pattern in its header?

samtools view -b  bams/patient101.tumor.bam  1 > problem.part.bam

problem.part.bam.gz

BEDPE support

It would be great if paired-end BED (BEDPE) could be supported.

Error when specifying URL in file for `--track-config`

I'm using a Singularity image of igv-reports v.1.0.4, and get this error when I add --track-config.

Traceback (most recent call last):
  File "/usr/local/bin/create_report", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/dist-packages/igv_reports/report.py", line 271, in main
    create_report(args)
  File "/usr/local/lib/python3.9/dist-packages/igv_reports/report.py", line 63, in create_report
    reader = utils.getreader(config["url"])
TypeError: string indices must be integers

This is what the file I pass to --track-config looks like (validated with JSONLint):

{
      "name": "Genes",
      "type": "annotation",
      "format": "gff3",
      "sourceType": "file",
      "url": "data/NC_000962.3.gff3.gz",
      "indexURL": "data/NC_000962.3.gff3.gz.tbi",
      "displayMode": "EXPANDED"
}

This is the command that generated the error:

singularity exec -B $PWD /singularity/igv-reports-1.0.4.sif create_report output/sample/sample_normalized.vcf.gz data/GCF_000195955.2_ASM19595v2_genomic.fna --flanking 1000 --info-columns AF DP MQ QD ANN --tracks data/NC_000962.3.gff3.gz --track-config configs/igv-tracks.config --output output/sample/igv.html

As far as I can tell my JSON config file looks similar to the example given in the igv.js Tracks wiki, except that I added double quotes around the property names, because otherwise, I get a JSONDecodeError.

I also have pysam 0.18.0 installed. Please let me know what further info is needed to help me with this issue, thank you!
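
One plausible explanation, not confirmed in this thread, is that the tool iterates over the parsed JSON expecting an array of track objects. Iterating over a single JSON object yields its keys, which are strings, and indexing a string with `"url"` raises exactly this TypeError. The sketch below reproduces that behavior with the standard library only; `load_track_configs` is a hypothetical helper.

```python
import json

# A lone track object, shortened from the config shown above
single_object = '{"name": "Genes", "format": "gff3", "url": "data/NC_000962.3.gff3.gz"}'

def load_track_configs(text):
    """Parse track config JSON, wrapping a lone object in a list."""
    parsed = json.loads(text)
    return [parsed] if isinstance(parsed, dict) else parsed

# Iterating a dict yields its keys (strings), so config["url"] fails
error_message = None
try:
    for config in json.loads(single_object):
        config["url"]
except TypeError as err:
    error_message = str(err)

# With the object wrapped in a list, each item is a dict and the lookup works
urls = [config["url"] for config in load_track_configs(single_object)]
```

If this is the cause, wrapping the config file contents in square brackets so that the file holds a JSON array may be enough.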

Display ID column in table

Usually, the ID column is used to present associations with known variants. The table should be able to show this information. Ideally, the user should be able to specify a URL pattern to apply to the ID column in order to allow for a linkout. If nothing is provided, the linkout could simply search Google for the ID, or apply some heuristics to guess what the pattern should be. For example, something starting with COSM should link out to COSMIC.

Request overhang softclip sequence view in igv reports

In the desktop IGV viewer, when soft-clipped reads are selected, it will show the overhangs at the ends of the reference. This is extremely useful for whole plasmid sequencing. Is this something that can be mirrored in IGV reports?
