sigven / cpsr

Cancer Predisposition Sequencing Reporter (CPSR)

Home Page: https://sigven.github.io/cpsr/

License: Other

Shell 0.08% R 86.85% CSS 1.76% TeX 11.31%

cancer-genomics inherited workflow-engine germline-variants cancer-research reporting-tool report-generator docker predisposition cancer-predisposition

cpsr's Issues

CPSR gnomAD maf cut-off

Hey Sigve,

Just wondering if there is any reason to have a low population frequency cut-off for germline variants? My first guess would be that predisposition variants are unlikely to be more common than 5%, but I admit I don't really know.

Vlad

Missed variant due to transcript pick?

Hi Sigve, attaching a variant that was picked with CPSR 0.4.0, but missed in 0.5.2 - I'm struggling to understand why: germline_GJB2_13_20189473.vcf.gz

The variant has multiple transcript annotations. In 0.4.0, there are 4:
ENST00000382844 - missense_variant, protein_coding
ENST00000382848 - missense_variant, protein_coding - 0.4.0 picked this one
ENST00000624851 - upstream_gene_variant, TEC
ENST00000645189 - missense_variant, protein_coding

In 0.5.2, there are 3:
ENST00000382844 - missense_variant, protein_coding
ENST00000382848 - missense_variant, protein_coding
ENST00000624851 - upstream_gene_variant, TEC - 0.5.2 uses this one

In both versions, a missense annotation on a protein-coding transcript is available. 0.4.0 picks one, but 0.5.2 chooses the upstream_gene_variant annotation and doesn't show the variant in the report.

Not sure if vep_pick_order is relevant here. For 0.4.0 runs, I use in the config toml:

vep_pick_order = "rank,appris,biotype,tsl,ccds,canonical,length"

For 0.5.2, I use the same plus mane:

vep_pick_order = "rank,appris,biotype,tsl,ccds,canonical,length,mane"

So I would guess it should pick based on rank first, meaning that protein_coding should have a higher priority. But I also tried 0.5.2 with a default pick order - same result.

Wondering if you know why a protein coding transcript is not picked?
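
To make the expectation above concrete, here is a minimal sketch of an ordered pick. It assumes a simplified model in which each criterion gives a score (lower is better) and criteria are compared in the order listed. This is not VEP's implementation: the score tables `CONSEQUENCE_RANK` and `BIOTYPE_RANK` and the function `pick_transcript` are hypothetical, and only two of the criteria are modelled.

```python
# Simplified, hypothetical model of an ordered transcript pick.
# NOT VEP's implementation: only two criteria are modelled, and the
# score tables below are illustrative assumptions.

CONSEQUENCE_RANK = {"missense_variant": 0, "upstream_gene_variant": 1}
BIOTYPE_RANK = {"protein_coding": 0, "TEC": 1}

SCORERS = {
    "rank": lambda t: CONSEQUENCE_RANK[t["consequence"]],
    "biotype": lambda t: BIOTYPE_RANK[t["biotype"]],
}


def pick_transcript(transcripts, pick_order):
    """Return the transcript with the best score tuple, comparing
    criteria in the order given; unmodelled criteria are skipped."""
    criteria = [c for c in pick_order.split(",") if c in SCORERS]
    return min(transcripts, key=lambda t: tuple(SCORERS[c](t) for c in criteria))


# The three 0.5.2 annotations listed above.
annotations = [
    {"id": "ENST00000382844", "consequence": "missense_variant", "biotype": "protein_coding"},
    {"id": "ENST00000382848", "consequence": "missense_variant", "biotype": "protein_coding"},
    {"id": "ENST00000624851", "consequence": "upstream_gene_variant", "biotype": "TEC"},
]

picked = pick_transcript(annotations, "rank,appris,biotype,tsl,ccds,canonical,length,mane")
print(picked["id"], picked["consequence"])
```

Under this simplified model the upstream annotation cannot win. Criteria that are not modelled here could still change the real outcome.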

The command I'm using:

cpsr.py germline_GJB2_13_20189473.vcf.gz pcgr output grch38 --panel_id 0 cpsr.toml SBJ00203__SBJ00203_MDX190232_L1901042-normal  --no-docker --force_overwrite --no_vcf_validate --debug

Where the toml for 0.5.2 is: cpsr.toml.zip

x object 'GWAS_CITATION' not found

Hi Sigve,

I'm getting the below error.

Can you give an idea?

VCF: test.hc.merge.vcf.gz

Singularity container is library://bruce.moran/default/projects:somatic_n-of-1.centos7.conda.

Command:

cpsr.py \
  --no-docker \
  --no_vcf_validate \
  --panel_id 0 \
  --query_vcf test.hc.merge.vcf.gz \
  --pcgr_dir pcgr \
  --output_dir ./ \
  --genome_assembly grch38 \
  --conf pcgr/data/grch38/cpsr_configuration_default.toml \
  --sample_id test

Error: Problem with `filter()` input `..1`.
x object 'GWAS_CITATION' not found
i Input `..1` is `!is.na(GWAS_HIT) & !is.na(GWAS_CITATION)`.
Backtrace:
    x
 1. +-pcgrr::generate_cpsr_report(...)
 2. | +-dplyr::filter(calls, !is.na(GWAS_HIT) & !is.na(GWAS_CITATION))
 3. | \-dplyr:::filter.data.frame(calls, !is.na(GWAS_HIT) & !is.na(GWAS_CITATION))
 4. |   \-dplyr:::filter_rows(.data, ...)
 5. |     +-base::withCallingHandlers(...)
 6. |     \-mask$eval_all_filter(dots, env_filter)
 7. \-base::.handleSimpleError(...)
 8.   \-dplyr:::h(simpleError(msg, call))
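
The error indicates that the `GWAS_CITATION` column was absent from the call set when the filter ran. As an illustration only (in Python rather than R, and not the pcgrr code), a guard of the following shape reports missing columns before filtering. The helper names and the sample row are hypothetical.

```python
def require_columns(rows, required):
    """Return the required column names absent from the first row, sorted."""
    present = set(rows[0].keys()) if rows else set()
    return sorted(set(required) - present)


def filter_gwas_hits(rows):
    """Keep rows where both GWAS columns are non-missing, failing early
    with a readable message if either column does not exist."""
    if not rows:
        return []
    missing = require_columns(rows, ["GWAS_HIT", "GWAS_CITATION"])
    if missing:
        raise KeyError("missing column(s): " + ", ".join(missing))
    return [
        r for r in rows
        if r["GWAS_HIT"] is not None and r["GWAS_CITATION"] is not None
    ]


# Made-up row that lacks GWAS_CITATION, mirroring the reported situation.
calls = [{"GWAS_HIT": "rs123", "SYMBOL": "APC"}]
try:
    filter_gwas_hits(calls)
except KeyError as err:
    print(err)
```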

adding clones IDs + child clone emerging outside parent clone

Good afternoon!
I have three questions, please.

Is it possible to edit some code/parameters to fix the issue where a child clone originates outside its parent clone, as in this picture:

In this second image you can see four clones (two above and two below, and probably some hidden clones) starting to emerge at this timepoint. Is it possible to have all of those emerging clones start on the same vertical line? If not, what determines the starting point of those emerging clones?

Finally, could we have IDs before the start of each clone, to be able to differentiate them?

Thanks a lot for your support!

Add information like CKB to CPSR

Thanks a lot for this awesome tool.
Have you come across a database named CKB? Could you add its variant descriptions and PMIDs? Thanks a lot.

ACMG guideline updated to version 3.0. New genes were added to the secondary finding gene list.

Hi @sigven ,

ACMG has updated its recommendations for reporting incidental findings in clinical exome and genome sequencing, and 14 genes were added: ACVRL1, BTD, CASQ2, ENG, FLNC, GAA, HFE, HNF1A, MAX, PALB2, RPE65, TMEM127, TRDN, and TTN. Here is the link to the report.

ACMG Recommendations for Reporting of Incidental Findings in Clinical Exome and Genome Sequencing

So maybe you can update the gene list of secondary findings in the cpsr pipeline.

Thank you and have a good day!
Best,
Jianxiang

Unused arguments in crosstalk::filter_slider

Hi Sigven,

I should have started the other issue with a thanks for the package btw! I am currently testing out CPSR for germline reporting and tiering as I already have PCGR in use for somatic reporting. So a huge thank you.

Anyway, I found another bug (I am not sure if it is just because I am using the non-dockerised version on our cluster, so some packages may be out of date):

Error in crosstalk::filter_slider("CLINVAR_REVIEW_STATUS_STARS", "ClinVar review status stars", :
  unused arguments (min = 0, max = 4)
Calls: <Anonymous> ... withCallingHandlers -> withVisible -> eval -> eval -> <Anonymous>
Execution halted

I have commented out the min = 0 and max = 4 arguments in the relevant lines of predisposition_class1_5.Rmd, and the slider in the resulting HTML report appears to function correctly.

Error: Problem with `mutate()` input `ACMG_PVS1_1`.

Dear Sigve,

I just updated my local cpsr analysis pipeline to the latest version. However, an error message popped up when I used the example VCF file, and the same error appeared with my own VCF file too.
Here is the command that I used:

python ~/biosoft/cpsr/cpsr.py --query_vcf ~/biosoft/cpsr/example/example.vcf.gz --pcgr_dir ~/biosoft/pcgr --output_dir ~/biosoft/cpsr/example --genome_assembly grch37 --panel_id 0 --conf ~/biosoft/cpsr/cpsr.toml --sample_id test --incidental_findings --classify_all --maf_upper_threshold 0.2 --no_vcf_validate

Here is the full log of the error message:

2020-10-01 21:01:17 - cpsr-validate-input-arguments - INFO - STEP 0: Validate input data
2020-10-01 16:01:23 - cpsr-validate-input - INFO - Skipping validation of VCF file - as provided by option --no_vcf_validate
2020-10-01 16:01:23 - cpsr-validate-input - INFO - Checking if existing INFO tags of query VCF file coincide with CPSR INFO tags
2020-10-01 16:01:23 - cpsr-validate-input - INFO - No query VCF INFO tags coincide with CPSR INFO tags
2020-10-01 16:01:23 - cpsr-validate-input - INFO - Limiting variant set to cancer predisposition loci - virtual panel_id 0
2020-10-01 21:01:25 - cpsr-validate-input-arguments - INFO - Finished
2020-10-01 21:01:25 - cpsr-validate-input-arguments - INFO - Finished

2020-10-01 21:01:25 - cpsr-start - INFO - --- Cancer Predisposition Sequencing Reporter workflow ----
2020-10-01 21:01:25 - cpsr-start - INFO - Sample name: test
2020-10-01 21:01:25 - cpsr-start - INFO - Virtual gene panel: CPSR exploratory cancer predisposition panel (n = 216, TCGA + Cancer Gene Census + NCGC + Other)
2020-10-01 21:01:25 - cpsr-start - INFO - Diagnostic-grade genes in virtual panels (GE PanelApp): OFF
2020-10-01 21:01:25 - cpsr-start - INFO - Include incidential findings (ACMG recommended list v2.0): ON
2020-10-01 21:01:25 - cpsr-start - INFO - Include low to moderate cancer risk variants from genome-wide association studies: OFF
2020-10-01 21:01:25 - cpsr-start - INFO - Reference population, germline variant frequencies (gnomAD): ESA
2020-10-01 21:01:25 - cpsr-start - INFO - Genome assembly: grch37

2020-10-01 21:01:25 - cpsr-vep - INFO - STEP 1: Basic variant annotation with Variant Effect Predictor (101, GENCODE release 19, grch37)
2020-10-01 21:01:25 - cpsr-vep - INFO - VEP configuration - one primary consequence block pr. alternative allele (--flag_pick_allele)
2020-10-01 21:01:25 - cpsr-vep - INFO - VEP configuration - transcript pick order: canonical,appris,tsl,biotype,ccds,rank,length,mane
2020-10-01 21:01:25 - cpsr-vep - INFO - VEP configuration - transcript pick order: See more at https://www.ensembl.org/info/docs/tools/vep/script/vep_other.html#pick_options
2020-10-01 21:01:25 - cpsr-vep - INFO - VEP configuration - skip intergenic: True
2020-10-01 21:01:25 - cpsr-vep - INFO - VEP configuration - plugins in use: NearestExonJB, LoF
Smartmatch is experimental at /opt/vep/src/ensembl-vep/modules/de_novo_donor.pl line 175.
Smartmatch is experimental at /opt/vep/src/ensembl-vep/modules/de_novo_donor.pl line 214.
Smartmatch is experimental at /opt/vep/src/ensembl-vep/modules/splice_site_scan.pl line 191.
Smartmatch is experimental at /opt/vep/src/ensembl-vep/modules/splice_site_scan.pl line 194.
Smartmatch is experimental at /opt/vep/src/ensembl-vep/modules/splice_site_scan.pl line 238.
Smartmatch is experimental at /opt/vep/src/ensembl-vep/modules/splice_site_scan.pl line 241.
2020-10-01 21:01:43 - cpsr-vep - INFO - Finished

2020-10-01 21:01:43 - cpsr-vcfanno - INFO - STEP 2: Annotation for cancer predisposition with cpsr-vcfanno (ClinVar, CIViC, dbNSFP, UniProtKB, cancerhotspots.org, GWAS catalog, gnomAD non-cancer subset)
2020-10-01 21:01:49 - cpsr-vcfanno - INFO - Finished

2020-10-01 21:01:49 - cpsr-summarise - INFO - STEP 3: Cancer gene annotations with cpsr-summarise
2020-10-01 16:01:51 - cpsr-gene-annotate - INFO - Completed summary of functional annotations for 8 variants on chromosome 1
2020-10-01 16:01:51 - cpsr-gene-annotate - INFO - Completed summary of functional annotations for 4 variants on chromosome 2
2020-10-01 16:01:51 - cpsr-gene-annotate - INFO - Completed summary of functional annotations for 1 variants on chromosome 3
2020-10-01 16:01:51 - cpsr-gene-annotate - INFO - Completed summary of functional annotations for 6 variants on chromosome 4
2020-10-01 16:01:51 - cpsr-gene-annotate - INFO - Completed summary of functional annotations for 4 variants on chromosome 5
2020-10-01 16:01:51 - cpsr-gene-annotate - INFO - Completed summary of functional annotations for 5 variants on chromosome 7
2020-10-01 16:01:51 - cpsr-gene-annotate - INFO - Completed summary of functional annotations for 2 variants on chromosome 8
2020-10-01 16:01:51 - cpsr-gene-annotate - INFO - Completed summary of functional annotations for 2 variants on chromosome 9
2020-10-01 16:01:51 - cpsr-gene-annotate - INFO - Completed summary of functional annotations for 3 variants on chromosome 10
2020-10-01 16:01:51 - cpsr-gene-annotate - INFO - Completed summary of functional annotations for 7 variants on chromosome 11
2020-10-01 16:01:51 - cpsr-gene-annotate - INFO - Completed summary of functional annotations for 4 variants on chromosome 12
2020-10-01 16:01:51 - cpsr-gene-annotate - INFO - Completed summary of functional annotations for 1 variants on chromosome 13
2020-10-01 16:01:51 - cpsr-gene-annotate - INFO - Completed summary of functional annotations for 3 variants on chromosome 14
2020-10-01 16:01:51 - cpsr-gene-annotate - INFO - Completed summary of functional annotations for 3 variants on chromosome 15
2020-10-01 16:01:51 - cpsr-gene-annotate - INFO - Completed summary of functional annotations for 5 variants on chromosome 16
2020-10-01 16:01:51 - cpsr-gene-annotate - INFO - Completed summary of functional annotations for 4 variants on chromosome 17
2020-10-01 16:01:51 - cpsr-gene-annotate - INFO - Completed summary of functional annotations for 1 variants on chromosome 18
2020-10-01 16:01:51 - cpsr-gene-annotate - INFO - Completed summary of functional annotations for 9 variants on chromosome 19
2020-10-01 16:01:51 - cpsr-gene-annotate - INFO - Completed summary of functional annotations for 1 variants on chromosome 22
2020-10-01 16:01:51 - cpsr-gene-annotate - INFO - Completed summary of functional annotations for 2 variants on chromosome X
2020-10-01 16:01:52 - cpsr-gene-annotate - INFO - Number of non-PASS/REJECTED variant calls: 0
2020-10-01 16:01:52 - cpsr-gene-annotate - INFO - Number of PASSed variant calls: 75
2020-10-01 21:02:07 - cpsr-summarise - INFO - Converting VCF to TSV with https://github.com/sigven/vcf2tsv
2020-10-01 21:02:15 - cpsr-summarise - INFO - Finished

2020-10-01 21:02:15 - cpsr-writer - INFO - STEP 4: Generation of output files - Cancer predisposition sequencing report
2020-10-01 16:02:58 [INFO] Verifying data integrity of input callset
2020-10-01 16:02:58 [INFO] Found the following VCF sample names: TCGA-A6-2686-01A-01D-1408-10_TCGA-A6-2686-10A-01D-2188-10
2020-10-01 16:02:58 [INFO] Excluding 0 variants from non-nuclear chromosomes/scaffolds
2020-10-01 16:02:58 [INFO] Adding citations/phenotypes underlying GWAS hits (NHGRI-EBI GWAS Catalog)
2020-10-01 16:02:58 [INFO] Extending annotation descriptions related to UniprotKB/SwissProt protein features
2020-10-01 16:02:59 [INFO] Number of PASS variants: 75
2020-10-01 16:02:59 [INFO] Number of SNVs: 52
2020-10-01 16:02:59 [INFO] Number of deletions: 21
2020-10-01 16:02:59 [INFO] Number of insertions: 2
2020-10-01 16:02:59 [INFO] Number of block substitutions: 0
2020-10-01 16:02:59 [INFO] Extending annotation descriptions related to KEGG pathways
2020-10-01 16:02:59 [INFO] Extending annotation descriptions related to ClinVar
2020-10-01 16:03:00 [INFO] Adding annotation links - dbNSFP
2020-10-01 16:03:00 [INFO] Adding annotation links - DBSNP
2020-10-01 16:03:00 [INFO] Adding annotation links - CLINVAR
2020-10-01 16:03:00 [INFO] Adding annotation links - GENE_NAME
2020-10-01 16:03:01 [INFO] Adding annotation links - PROTEIN_DOMAIN
2020-10-01 16:03:01 [INFO] Adding annotation links - COSMIC
2020-10-01 16:03:01 [INFO] Adding annotation links - NCBI_REFSEQ
2020-10-01 16:03:01 [INFO] Adding annotation links - targeted cancer drugs
2020-10-01 16:03:01 [INFO] Adding annotation links - gene-cancer type associations (DisGenet)
2020-10-01 16:03:01 [INFO] Adding annotation links - gene-cancer type associations (OpenTargets Platform)
2020-10-01 16:03:02 [INFO] Number of coding variants in cancer predisposition genes: 29
2020-10-01 16:03:02 [INFO] Number of non-coding variants in cancer predisposition genes: 30
Error: Problem with `mutate()` input `ACMG_PVS1_1`.
✖ object 'ACMG_PM2_1' not found
ℹ Input `ACMG_PVS1_1` is `dplyr::if_else(...)`.
Backtrace:
     █
  1. ├─pcgrr::generate_cpsr_report(...)
  2. │ ├─`%>%`(...)
  3. │ │ └─base::eval(lhs, parent, parent)
  4. │ │   └─base::eval(lhs, parent, parent)
  5. │ └─pcgrr::assign_pathogenicity_evidence(...)
  6. │   └─`%>%`(...)
  7. │     ├─base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
  8. │     └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
  9. │       └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
 10. │         └─pcgrr:::`_fseq`(`_lhs`)
 11. │           └─magrittr::freduce(value, `_function_list`)
 12. │             └─function_list[[i]](value)
 13. │               ├─dplyr::mutate(...)
 14. │               └─dplyr:::mutate.data.frame(...)
 15. │                 └─dplyr:::mutate_cols(.data, ...)
 16. │                   ├─base:
Execution halted

Can you check for me, please? Thanks in advance!

Best,
Jianxiang

cpsr_configuration_default.toml missing from PCGR data

Hi Sigve,

This is possibly due to an update of CPSR/PCGR, which I am using via Conda in a Singularity container. From RELEASE_NOTES:

##PCGR_SOFTWARE_VERSION = 0.8.4
##PCGR_DB_VERSION = 20191116

Data bundle from http://insilico.hpc.uio.no/pcgr/pcgr.databundle.grch38.20191116.tgz

I am now missing the pcgr/data/*/cpsr_configuration_default.toml config file. The directory layout also seems to have changed since I last looked around in there.

Any ideas? Should I clone the git repo to get cpsr.toml, and will that work the same as the file above? I presume the same should be done for pcgr_configuration_default.toml, which is also missing?

Thanks,

Bruce

minor typo in src/cpsr_functions.R

Hi Sigve,

cpsr_v0.6.1, line 1150, file src/cpsr_functions.R:

change wwarning to warning

Best,
Tina

Error: `by` can't contain join column `GENOMIC_CHANGE` which is missing from RHS

Another one (sorry :) ), during cpsr-writer, dockerized one as well:

2018-11-12 12:37:26 [INFO] TIER 3: Other unclassified variants - noncancer_phenotype: n = 0
Error: `by` can't contain join column `GENOMIC_CHANGE` which is missing from RHS
Execution halted

Attaching the test data for this run, and the command is as follows:

./cpsr.py --input_vcf cup__cup_tissue-normal.vcf.gz ~/git/pcgr cpsr_res grch37 ~/git/umccr/umccrise/umccrise/pcgr/cpsr.toml cup__cup_tissue-normal

cpsr_test.zip

Option of not specifying tumor type

Sorry, I was testing the feature of creating an issue from a comment. This belongs to the PCGR repo.

Trio Analysis

Hi @sigven

Can CPSR be used for trio analysis?

Add sif file for running on an HPC

Hello,

I have a brief suggestion: add a pre-built SIF image for those who cannot use Docker for their analysis due to security concerns and have to use Singularity/Apptainer. In my case, Anaconda is also not an option. It's not a problem to build one on my own, but it would probably help you gain more users and streamline the process for newcomers. Writing a bit about it in the README could be an option too.

Thanks.

Feature Request - support CNV calls (and ideally SV calls)

Hi all

Hopefully self-explanatory. We have a germline adaptive sampling panel working on Nanopore and are generating copy number variants and SVs (including fusions). Do you think there is any chance of support for CNA and SV calls? Ideally from VCF (compliant ones, obviously), but I can lift over calls if that is easier.

genes being excluded from CPSR exploratory track

Hi Sigve

We are supplying a list of genes to CPSR using the --custom_list option.
Several of those genes are not making it through to the final CPSR report.
Looking at the logs I can see:

2021-02-18 21:31:44 - cpsr-validate-input - WARNING - Ignoring custom-provided gene symbol (ANKRD26) NOT found in CPSR exploratory track 0
2021-02-18 21:31:44 - cpsr-validate-input - WARNING - Choose only symbols from this set: https://github.com/sigven/cpsr/predisposition.md
2021-02-18 21:31:44 - cpsr-validate-input - WARNING - Ignoring custom-provided gene symbol (CSDE1) NOT found in CPSR exploratory track 0
2021-02-18 21:31:44 - cpsr-validate-input - WARNING - Choose only symbols from this set: https://github.com/sigven/cpsr/predisposition.md

https://github.com/sigven/cpsr/predisposition.md does not exist, but https://github.com/sigven/cpsr/blob/master/predisposition.md does. That file does contain ANKRD26, but it does not contain CSDE1. Any insights greatly appreciated!
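
The warnings above suggest that custom symbols are checked against the exploratory track and that anything outside it is dropped. A minimal sketch of that kind of check follows, assuming a plain set of allowed symbols. `split_custom_genes` and the `panel_zero` subset are hypothetical; this is neither CPSR's code nor its actual panel.

```python
def split_custom_genes(custom_symbols, panel_symbols):
    """Split custom symbols into (kept, ignored) by panel membership,
    preserving input order and normalising case and whitespace."""
    kept, ignored = [], []
    for symbol in custom_symbols:
        symbol = symbol.strip().upper()
        (kept if symbol in panel_symbols else ignored).append(symbol)
    return kept, ignored


# Illustrative subset only; the real exploratory track is far larger.
panel_zero = {"APC", "BRCA1", "BRCA2", "TP53"}

kept, ignored = split_custom_genes(["BRCA1", "CSDE1", " tp53 "], panel_zero)
for symbol in ignored:
    print(f"Ignoring custom-provided gene symbol ({symbol}) NOT found in panel")
```

Comparing the `ignored` list against predisposition.md is a quick way to spot symbols, such as ANKRD26 here, that the documentation lists but the validation step rejects.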

Encounter the error in cpsr-report-generation()

I tried to run the demo case, but it failed to generate the report. Could you please help me solve the problem?

cpsr \
     --input_vcf /home/azureuser/src/cpsr/examples/example.vcf.gz \
     --pcgr_dir /home/azureuser/src/pcgr-1.0.1 \
     --output_dir /home/azureuser/src/cpsr/output-cpsr \
     --genome_assembly grch37 \
     --panel_id 1 \
     --sample_id example \
     --no_docker \
     --maf_upper_threshold 0.2 \
     --force_overwrite

The error message:

...
2022-03-30 06:18:09 - cpsr-report-generation - INFO - Considering variants in the targeted predisposition genes: ACD, AIP, APC, ATM, BAP1, BMPR1A, BRAF, BRCA1, BRCA2, BRIP1, CBL, CDC73, CDH1, CDK4, CDKN1B, CDKN2A, CHEK2, CTC1, DDB2, DGCR8, DICER1, DKC1, EPCAM, ERCC1, ERCC2, ERCC3, ERCC4, ERCC5, EXT1, EXT2, FANCA, FANCB, FANCC, FANCD2, FANCE, FANCF, FANCG, FANCI, FANCL, FH, FLCN, HRAS, KIT, KRAS, LZTR1, MAP2K1, MAP2K2, MAX, MEN1, MET, MLH1, MSH2, MSH6, MUTYH, NF1, NF2, NOP10, NRAS, NTHL1, PALB2, PARN, PDGFRA, PMS2, POLD1, POLE, POLH, PPP1CB, PTCH1, PTEN, PTPN11, RABL3, RAD51C, RAD51D, RAF1, RB1, RET, RIT1, RTEL1, SDHA, SDHAF2, SDHB, SDHC, SDHD, SHOC2, SLX4, SMAD4, SMARCA4, SMARCB1, SOS1, SOS2, SPRED1, STK11, SUFU, TERC, TERT, TINF2, TMEM127, TP53, TSC1, TSC2, VHL, WRAP53, WT1, XPA, XPC
2022-03-30 06:18:09 - cpsr-report-generation - INFO - Total number of variants in target cancer predisposition genes (for TIER output): 38
2022-03-30 06:18:09 - cpsr-report-generation - INFO - Number of coding variants in target cancer predisposition genes (for TIER output): 21
2022-03-30 06:18:09 - cpsr-report-generation - INFO - Number of non-coding variants in cancer predisposition genes (for TIER output): 17
2022-03-30 06:18:09 - cpsr-report-generation - INFO - Generating tiered set of result variants for output in tab-separated values (TSV) file
2022-03-30 06:18:09 - cpsr-report-generation - INFO - Ignoring n = 0 unclassified variants with a global MAF frequency above 0.2
2022-03-30 06:18:09 - cpsr-report-generation - INFO - Merging ClinVar-classified variants and CPSR-classified (novel) variants - class1
2022-03-30 06:18:09 - cpsr-report-generation - INFO - Zero variants found - class1
2022-03-30 06:18:10 - cpsr-report-generation - INFO - Merging ClinVar-classified variants and CPSR-classified (novel) variants - class2
2022-03-30 06:18:10 - cpsr-report-generation - INFO - Merging ClinVar-classified variants and CPSR-classified (novel) variants - class3
2022-03-30 06:18:10 - cpsr-report-generation - INFO - Merging ClinVar-classified variants and CPSR-classified (novel) variants - class4
2022-03-30 06:18:10 - cpsr-report-generation - INFO - Merging ClinVar-classified variants and CPSR-classified (novel) variants - class5
2022-03-30 06:18:10 - cpsr-report-generation - INFO - Matching variant set with existing genomic biomarkers from CIViC (germline)
2022-03-30 06:18:10 - cpsr-report-generation - INFO - Found n = 0 clinical evidence item(s) at the exact level, 0 unique variant(s)
2022-03-30 06:18:10 - cpsr-report-generation - INFO - Found n = 0 clinical evidence item(s) at the codon level, 0 unique variant(s)
2022-03-30 06:18:10 - cpsr-report-generation - INFO - Found n = 0 clinical evidence item(s) at the exon level, 0 unique variant(s)
2022-03-30 06:18:10 - cpsr-report-generation - INFO - Found n = 3 clinical evidence item(s) at the gene level, 2 unique variant(s)
2022-03-30 06:18:10 - cpsr-report-generation - INFO - NF1:frameshift_variant:p.Pro678ArgfsTer10, CHEK2:splice_donor_variant:NA
2022-03-30 06:18:10 - cpsr-report-generation - INFO - Variants were found in the following cancer predisposition genes: PMS2, TP53, NF1, EPCAM, BRAF, POLE, SMARCA4, POLD1, CHEK2, APC, TSC1, TINF2, FANCA, SOS2, PARN, ERCC2, EXT1, MUTYH, PTEN, TSC2, SDHC, TERC, PDGFRA, KIT, CDKN2A, SDHD, BRCA2, MAP2K1, CDH1, ERCC1, AIP, RTEL1
2022-03-30 06:18:10 - cpsr-report-generation - INFO - ------
2022-03-30 06:18:10 - cpsr-report-generation - INFO - Writing SNV/InDel tab-separated output file with CPSR annotations - ('tsv')
2022-03-30 06:18:10 - cpsr-report-generation - INFO - ------
2022-03-30 06:18:10 - cpsr-report-generation - INFO - Writing JSON file (.json) with key report contents
Error in assertable::assert_colnames(report_strip$content$snv_indel$variant_set[[o]],  :
  These columns exist in colnames but not in your dataframe: EXONIC_STATUS DBSNPRSID COSMIC_MUTATION_ID CALL_CONFIDENCE DP_TUMOR AF_TUMOR DP_CONTROL AF_CONTROL TIER and these exist in your dataframe but not in colnames: VAR_ID GENOTYPE CPSR_CLASSIFICATION_SOURCE GENOME_VERSION VCF_SAMPLE_ID GENE_NAME CCDS UNIPROT_ID ENSEMBL_GENE_ID REFSEQ_MRNA VEP_ALL_CSQ REGULATORY_ANNOTATION DBSNP HGVSc LAST_EXON EXON_POSITION INTRON_POSITION CDS_CHANGE RMSK_HIT PROTEIN_FEATURE EFFECT_PREDICTIONS LOSS_OF_FUNCTION CANCER_PHENOTYPE CLINVAR_CLASSIFICATION CLINVAR_MSID CLINVAR_VARIANT_ORIGIN CLINVAR_CONFLICTED CLINVAR_PHENOTYPE CLINVAR_REVIEW_STATUS_STARS DBMTS miRNA_TARGET_HIT miRNA_TARGET_HIT_PREDICTION TF_BINDING_SITE_VARIANT TF_BINDING_SITE_VARIANT_INFO GERP_SCORE N_INSILICO_CALLED N_INSILICO_DAMAGING N_INSILICO_TOLERATED N_INSILICO_SPLICING_NEUTRAL N_INSILICO_SPLICING_AFFECTED GLOBAL_AF_GNOMAD NON_CANCER_AF_NFE ACMG_BA1_AD ACMG_BS1_1_AD ACMG_BS1_2_AD ACMG_BA1_AR ACMG_BS1_1_AR ACMG_BS1_2_AR AC
Calls: <Anonymous> -> <Anonymous>
Execution halted
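
The assertion message reports two differences: columns that were expected but absent, and columns that were present but unexpected. The same comparison can be sketched with set membership. This is an illustration in Python, not the `assertable` implementation, and `compare_columns` is a hypothetical helper.

```python
def compare_columns(expected, actual):
    """Return (missing, unexpected): expected names absent from actual,
    and actual names not in expected, each in original order."""
    expected_set, actual_set = set(expected), set(actual)
    missing = [c for c in expected if c not in actual_set]
    unexpected = [c for c in actual if c not in expected_set]
    return missing, unexpected


# Small subset of names from the error message; SYMBOL is added for illustration.
expected = ["SYMBOL", "DBSNPRSID", "TIER"]
actual = ["SYMBOL", "VAR_ID", "GENOTYPE"]

missing, unexpected = compare_columns(expected, actual)
print("expected but absent:", missing)
print("present but unexpected:", unexpected)
```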

Update bug hunting

Getting this error now:

2019-05-21 21:56:03 - cpsr-writer - INFO - STEP 4: Generation of output files - Cancer predisposition sequencing report
2019-05-21 21:56:46 [INFO] Excluding 0 variants from non-nuclear chromosomes/scaffolds
2019-05-21 21:56:47 [INFO] Found the following VCF sample names: PRJ180661_8_DNA008678_Blood
2019-05-21 21:56:48 [INFO] Number of PASS variants: 24468
2019-05-21 21:56:48 [INFO] Number of SNVs: 19364
2019-05-21 21:56:48 [INFO] Number of deletions: 2568
2019-05-21 21:56:48 [INFO] Number of insertions: 2536
2019-05-21 21:56:48 [INFO] Number of block substitutions: 0
2019-05-21 21:56:48 [INFO] Extending annotation descriptions related to UniprotKB/SwissProt protein features
2019-05-21 21:56:51 [INFO] Adding citations/phenotypes underlying GWAS hits (NHGRI-EBI GWAS Catalog)
2019-05-21 21:56:51 [INFO] Extending annotation descriptions related to Database of Curated Mutations (DoCM)
2019-05-21 21:56:52 [INFO] Extending annotation descriptions related to KEGG pathways
2019-05-21 21:56:52 [INFO] Extending annotation descriptions related to ClinVar
2019-05-21 21:57:33 [INFO] Number of coding variants in cancer predisposition genes: 23
2019-05-21 21:57:33 [INFO] Coding variants were found in the following cancer predisposition genes: KIF1B, MUTYH, FANCD2, KDR, FAT1, TERT, MSH3, APC, SBDS, PRSS1, SH2B3, TERF2IP
Error: 'acmg_evidence_codes' is not an exported object from 'namespace:pcgrr'
Execution halted

Will investigate if it has to do with the installation

Variant missed in GRCh38

Hey Sigve,

This APC missense variant (rs459552) is reported as Benign in GRCh37, but missed in GRCh38. I can't figure out why. The VEP pick order doesn't seem to affect this, and all fields in the intermediate TSV file seem to be pretty similar.

Attaching the VCFs below:

rs459552_grch37.vcf.gz
rs459552_grch38.vcf.gz

Running as follows:

cpsr.py /Users/vsaveliev/tmp/p007/rs459552.vcf.gz pcgr output grch38 0 ~/git/cpsr/cpsr.toml 2016_249_18_SV_P007__CCR180064_SV18T002P007-normal  --docker-uid root --force_overwrite --no_vcf_validate

Could you take a look, please, if you have a moment?

Vlad

Column `BP1` can't be converted from character to logical

Hi,

I was trying to run cpsr, but it raised an error when generating the output file. Can you check this for me, please? Thank you in advance!

2019-05-06 17:15:26 - cpsr-writer - INFO - STEP 4: Generation of output files - Cancer predisposition sequencing report
2019-05-06 09:16:00 [INFO] Excluding 0 variants from non-nuclear chromosomes/scaffolds
2019-05-06 09:16:01 [INFO] Number of PASS variants: 3195
2019-05-06 09:16:01 [INFO] Number of SNVs: 2637
2019-05-06 09:16:01 [INFO] Number of deletions: 296
2019-05-06 09:16:01 [INFO] Number of insertions: 251
2019-05-06 09:16:01 [INFO] Number of block substitutions: 0
2019-05-06 09:16:01 [INFO] Extending annotation descriptions related to UniprotKB/SwissProt protein features
2019-05-06 09:16:03 [INFO] Adding citations/phenotypes underlying GWAS hits (NHGRI-EBI GWAS Catalog)
2019-05-06 09:16:04 [INFO] Extending annotation descriptions related to Database of Curated Mutations (DoCM)
2019-05-06 09:16:04 [INFO] Extending annotation descriptions related to KEGG pathways
2019-05-06 09:16:04 [INFO] Extending annotation descriptions related to ClinVar
2019-05-06 09:16:07 [INFO] Filtering variants against the predefined list of n = 209 cancer predisposition genes
2019-05-06 09:16:07 [INFO] Number of variants within cancer predisposition genes: 2705
2019-05-06 09:16:07 [INFO] Number of coding variants in cancer predisposition genes: 106
2019-05-06 09:16:07 [INFO] Found coding variants in the following cancer predisposition genes: PINK1, SPRTN, ALK, EPCAM, MSH6, ABCB11, PMS1, BARD1, FANCD2, XPC, TGFBR2, POLQ, CASR, DTX3L, GATA2, ATR, KDR, FAT1, PRDM9, MSH3, APC, NSD1, PMS2, EGFR, PRSS1, WRN, DOCK8, MTAP, GALNT12, TSC1, RET, JMJD1C, BMPR1A, MEN1, AIP, CEP57, ATM, RECQL, SH2B3, HNF1A, GJB2, BRCA2, ERCC5, MLH3, TSHR, SERPINA1, BUB1B, FANCI, SLX4, RFWD3, FANCA, TP53, BRCA1, RHBDF2, ELANE, ERCC2, CHEK2, APOBEC3B
2019-05-06 09:16:07 [INFO] Looking up germline variants linked to hereditary cancer-predisposing syndromes/cancer phenotypes
2019-05-06 09:16:08 [INFO] Assignment of variants to tier 1/tier 2/tier 3
2019-05-06 09:16:08 [INFO] TIER 1: Pathogenic variants - cancer_phenotype: n = 0
2019-05-06 09:16:08 [INFO] TIER 1: Pathogenic variants - noncancer_phenotype: n = 1
2019-05-06 09:16:08 [INFO] TIER 2: Likely pathogenic variants - cancer_phenotype: n = 0
2019-05-06 09:16:08 [INFO] TIER 2: Likely pathogenic variants - noncancer_phenotype: n = 0
2019-05-06 09:16:09 [INFO] TIER 3: Variants of uncertain significance - cancer_phenotype: n = 4
2019-05-06 09:16:11 [INFO] TIER 3: Variants of uncertain significance - noncancer_phenotype: n = 3
2019-05-06 09:16:12 [INFO] TIER 3: Other unclassified variants: n = 4
2019-05-06 09:16:12 [INFO] Generating tiered set of result variants for output in tab-separated values (TSV) file
Error in bind_rows_(x, .id) :
  Column `BP1` can't be converted from character to logical
Calls: <Anonymous> ... as.data.frame -> <Anonymous> -> bind_rows_ -> .Call
Execution halted
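
The message is consistent with the same column having different types in the tables being stacked, for example logical in one tier and character in another. That reading is an assumption, not confirmed in the report. The Python sketch below (not the pcgrr code; helper names and rows are made up) shows one way to detect such conflicts before combining tables.

```python
def column_types(rows):
    """Map each column to the set of non-missing value types seen in rows."""
    types = {}
    for row in rows:
        for name, value in row.items():
            seen = types.setdefault(name, set())
            if value is not None:
                seen.add(type(value).__name__)
    return types


def type_conflicts(tables):
    """Return columns whose non-missing values have more than one type
    across all tables, mapped to the sorted type names."""
    merged = {}
    for rows in tables:
        for name, seen in column_types(rows).items():
            merged.setdefault(name, set()).update(seen)
    return {name: sorted(seen) for name, seen in merged.items() if len(seen) > 1}


# Made-up tier tables: BP1 is boolean in one and text in the other.
tier1 = [{"SYMBOL": "APC", "BP1": True}]
tier3 = [{"SYMBOL": "TP53", "BP1": "TRUE"}]

print(type_conflicts([tier1, tier3]))
```

Coercing the conflicting column to a single type in every table before stacking is the usual remedy for this class of error.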

Missing gnomad_cpsr.vcf.gz

Hey Sigve,

Trying to run the new CPSR through hg38. data/grch38/gnomad_cpsr/gnomad_cpsr.vcf.gz seems to be missing from the bundle. I might have deleted it accidentally on my side, will try to download again... Can you check on your side if it's there or not?

Vlad

cpsr_validate_input.py: error: argument vcf_validation: invalid int value: '/home/zdjzyx01/biosoft/cpsr/cpsr.toml'

Hi @sigven ,

I have upgraded to the latest versions of PCGR and CPSR. PCGR works like a charm, but CPSR doesn't seem to work.
Here is the command I used to run CPSR.

python /home/zdjzyx01/biosoft/cpsr/cpsr.py  /home/zdjzyx01/MDT/proj/research/1000g/raw/2019-09-17_HNQ0456-decomposed.vcf.gz /home/zdjzyx01/biosoft/pcgr/ /home/zdjzyx01/MDT/proj/research/1000g/HNQ0456 grch37 /home/zdjzyx01/biosoft/cpsr/cpsr.toml HNQ0456new --no_vcf_validate --no-docker --panel_id 0

2019-11-30 22:32:46 - cpsr-validate-input - INFO - STEP 0: Validate input data
usage: cpsr_validate_input.py [-h] [--output_dir OUTPUT_DIR] [--debug]
                              pcgr_dir input_vcf configuration_file {0,1}
                              genome_assembly virtual_panel_id {0,1}
cpsr_validate_input.py: error: argument vcf_validation: invalid int value: '/home/zdjzyx01/biosoft/cpsr/cpsr.toml'
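
One way an error of this form can arise is when the calling script and the validation script disagree about the order of positional arguments, for instance after a partial upgrade. That cause is an assumption here. The sketch below uses a hypothetical four-argument parser, not the real `cpsr_validate_input.py`, to reproduce the same class of argparse error.

```python
import argparse


def build_parser():
    """Positional-only parser loosely modelled on the usage text above.
    The reduced argument list is an assumption for illustration."""
    parser = argparse.ArgumentParser(prog="validate_sketch")
    parser.add_argument("pcgr_dir")
    parser.add_argument("input_vcf")
    parser.add_argument("configuration_file")
    parser.add_argument("vcf_validation", type=int, choices=[0, 1])
    return parser


def parses_ok(argv):
    """Return True if argv parses, False if argparse rejects it."""
    try:
        build_parser().parse_args(argv)
        return True
    except SystemExit:  # argparse prints the error to stderr and exits
        return False


# Caller and parser agree on the order: the 0/1 flag comes last.
print(parses_ok(["pcgr", "in.vcf.gz", "cpsr.toml", "0"]))
# Caller swaps the last two: the TOML path lands in the integer slot.
print(parses_ok(["pcgr", "in.vcf.gz", "0", "cpsr.toml"]))
```

If the cause is a version mismatch, making sure `cpsr.py` and the validation script come from the same release would be the first thing to check.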

Any idea how to solve this problem?

Thanks!

Best,
Jianxiang

Custom panel

Hi Sigven,

This is more of a question/feature request than an issue. Would it be possible to use a custom panel in CPSR?

I was thinking I could create a custom bed file in pcgr/data/grch37/virtual_panels/ but I noticed your virtual panels are annotated, so I was not too sure if the annotations were a requirement or not, or if having a custom bed file would create issues further down the line?

Regards,
Ridwan

Failed to instantiate plug in LoF

Hi Sigven/Vlad,

@vladsaveliev I assume this is probably more directed towards you, as this is probably an issue from using the conda version (note: this is a fresh install too). Running the example.vcf.gz on v0.5.1.2 gives the following error:

2019-10-30 11:51:56 - cpsr-vep - INFO - STEP 1: Basic variant annotation with Variant Effect Predictor (98, GENCODE release 19, grch37) including loss-of-function prediction
Possible precedence issue with control flow operator at /.conda/envs/pcgr/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.
Smartmatch is experimental at /.conda/envs/pcgr/share/ensembl-vep-98.2-0/de_novo_donor.pl line 175.
Smartmatch is experimental at /.conda/envs/pcgr/share/ensembl-vep-98.2-0/de_novo_donor.pl line 214.
Smartmatch is experimental at /.conda/envs/pcgr/share/ensembl-vep-98.2-0/splice_site_scan.pl line 191.
Smartmatch is experimental at /.conda/envs/pcgr/share/ensembl-vep-98.2-0/splice_site_scan.pl line 194.
Smartmatch is experimental at /.conda/envs/pcgr/share/ensembl-vep-98.2-0/splice_site_scan.pl line 238.
Smartmatch is experimental at /.conda/envs/pcgr/share/ensembl-vep-98.2-0/splice_site_scan.pl line 241.
Use of uninitialized value in string eq at /.conda/envs/pcgr/share/ensembl-vep-98.2-0/LoF.pm line 343.
WARNING: Failed to instantiate plugin LoF: Can't open /.conda/envs/pcgr/share/loftee/splice_data/donor_motifs/ese.txt: No such file or directory at /.conda/envs/pcgr/share/ensembl-vep-98.2-0/loftee_splice_utils.pl line 212.

Listing the location /.conda/envs/pcgr/share/loftee shows only the gzipped tarball and no other files:

$ ll
total 1.6M
-rw-rw-r-- 2 user group 1.6M Oct 28 07:28 loftee_1.0.3.tgz

I have tried untarring the loftee archive in the share location and running again, but that results in the following error:

WARNING: Failed to compile plugin LoF: Can't locate Bio/DB/BigWig.pm in @INC (you may need to install the Bio::DB::BigWig module)

I have also tried pointing PERL5LIB to the paths where BigWig.pm is stored in the conda environment, but that also fails:

2019-10-30 15:01:25 - cpsr-vep - INFO - STEP 1: Basic variant annotation with Variant Effect Predictor (98, GENCODE release 19, grch37) including loss-of-function prediction
Possible precedence issue with control flow operator at /.conda/envs/pcgr/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.
WARNING: Failed to compile plugin LoF: Can't locate Bio/DB/BigWig.pm in @INC (you may need to install the Bio::DB::BigWig module) (@INC contains: .conda/envs/pcgr/share/loftee /.conda/envs/pcgr/share/ensembl-vep-98.2-0/modules /.conda/envs/pcgr/share/ensembl-vep-98.2-0 /.conda/envs/pcgr/share/ensembl-vep-98.2-0/Bio/EnsEMBL/IO/Parser/ /.conda/envs/pcgr/lib/site_perl/5.26.2/x86_64-linux-thread-multi /.conda/envs/pcgr/lib/site_perl/5.26.2 /.conda/envs/pcgr/lib/5.26.2/x86_64-linux-thread-multi /.conda/envs/pcgr/lib/5.26.2 .) at /.conda/envs/pcgr/share/loftee/gerp_dist.pl line 2.
BEGIN failed--compilation aborted at /.conda/envs/pcgr/share/loftee/gerp_dist.pl line 2.
Compilation failed in require at /.conda/envs/pcgr/share/loftee/LoF.pm line 27.
Compilation failed in require at (eval 39) line 2.
BEGIN failed--compilation aborted at (eval 39) line 2.

Error in pcgrr::list_to_df(pcgr_config$tumor_type) : it should be a list

Almost getting there with CPSR :) It errors at the end; again, I am not sure whether this has to do with the no-docker installation. A toy example:

python cpsr/cpsr.py --input_vcf cpsr/example.vcf.gz . cpsr grch37 cpsr/cpsr.toml example --no-docker

...

2018-10-30 23:10:54 - cpsr-gene-annotate - INFO - Number of PASSed variant calls: 1272
_frozen_importlib:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
2018-10-30 23:10:55 - cpsr-summarise - INFO - Converting VCF to TSV with https://github.com/sigven/vcf2tsv
_frozen_importlib:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
2018-10-30 23:10:59 - cpsr-summarise - INFO - Finished

2018-10-30 23:10:59 - cpsr-writer - INFO - STEP 4: Generation of output files - Cancer predisposition sequencing report
.libPaths():
/data/cephfs/punim0010/extras/vlad/miniconda/envs/pcgr/lib/R/library

Error in pcgrr::list_to_df(pcgr_config$tumor_type) : it should be a list
Calls: <Anonymous> -> <Anonymous> -> %>% -> eval -> eval -> <Anonymous>
Execution halted

/cc @pdiakumis

ImportError: No module named toml

Hello Sigve,

Hope you are in good health. I encountered an error using PyCharm on macOS (Python interpreter 3.9, pip version 20.3.3 and later 21.0.1). The traceback points to line 12: ImportError: No module named toml.

Would you have any suggestions?
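
This error usually means that the interpreter running the script is not the one pip installed packages into, which is easy to hit when PyCharm uses its own virtual environment. A small check to run with the same interpreter as the script, assuming the missing dependency is the PyPI package named toml:

```python
import importlib.util
import sys

def module_available(name):
    """Return True if the active interpreter can import `name`."""
    return importlib.util.find_spec(name) is not None

if not module_available("toml"):
    # Assumption: the dependency is the PyPI package "toml".
    # Installing via the running interpreter avoids a pip/interpreter mismatch.
    print(f"toml is missing; try: {sys.executable} -m pip install toml")
```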

Error in collapse(tmp, inner = FALSE, indent = indent) : R character strings are limited to 2^31-1 bytes

Hi @sigven ,

Thanks for your cpsr and pcgr software. I have used pcgr for a while and I love it. I tried cpsr today and successfully tested the example VCF file. When I tried to use my own data, the following error message popped up. Could you take a look, please? Thank you in advance!

ERROR message:

2019-01-27 00:32:06 - cpsr-writer - INFO - STEP 4: Generation of output files - Cancer predisposition sequencing report
2019-01-26 16:32:35 [INFO] Excluding 0 variants from non-nuclear chromosomes/scaffolds
2019-01-26 16:32:36 [INFO] Number of PASS variants: 318
2019-01-26 16:32:36 [INFO] Number of SNVs: 288
2019-01-26 16:32:36 [INFO] Number of deletions: 26
2019-01-26 16:32:36 [INFO] Number of insertions: 4
2019-01-26 16:32:36 [INFO] Number of block substitutions: 0
2019-01-26 16:32:36 [INFO] Extending annotation descriptions related to UniprotKB/SwissProt protein features
2019-01-26 16:32:37 [INFO] Adding citations/phenotypes underlying GWAS hits (NHGRI-EBI GWAS Catalog)
2019-01-26 16:32:37 [INFO] Extending annotation descriptions related to Database of Curated Mutations (DoCM)
2019-01-26 16:32:38 [INFO] Extending annotation descriptions related to KEGG pathways
2019-01-26 16:32:38 [INFO] Extending annotation descriptions related to ClinVar
2019-01-26 16:32:41 [INFO] Filtering variants against the predefined list of n = 209 cancer predisposition genes
2019-01-26 16:32:41 [INFO] Number of variants within cancer predisposition genes: 262
2019-01-26 16:32:41 [INFO] Number of coding variants in cancer predisposition genes: 80
2019-01-26 16:32:41 [INFO] Found coding variants in the following cancer predisposition genes: KIF1B, MUTYH, ERCC3, CXCR4, MLH1, GATA2, FAT1, SDHA, PRDM9, MSH3, RAD50, NSD1, PRSS1, WRN, TSC1, RET, JMJD1C, FANCF, FEN1, CDKN1B, SH2B3, HNF1A, POLE, SERPINA1, BLM, TERF2IP, FANCA, NF1, HNF1B, BRCA1, POLD1, APOBEC3B
2019-01-26 16:32:41 [INFO] Looking up germline variants linked to hereditary cancer-predisposing syndromes/cancer phenotypes
2019-01-26 16:32:41 [INFO] Assignment of variants to tier 1/tier 2/tier 3
2019-01-26 16:32:41 [INFO] TIER 1: Pathogenic variants - cancer_phenotype: n = 2
2019-01-26 16:32:41 [INFO] TIER 1: Pathogenic variants - noncancer_phenotype: n = 2
2019-01-26 16:32:41 [INFO] TIER 2: Likely pathogenic variants - cancer_phenotype: n = 0
2019-01-26 16:32:41 [INFO] TIER 2: Likely pathogenic variants - noncancer_phenotype: n = 0
2019-01-26 16:33:08 [INFO] TIER 3: Variants of uncertain significance - cancer_phenotype: n = 131088
2019-01-26 16:33:09 [INFO] TIER 3: Variants of uncertain significance - noncancer_phenotype: n = 2
2019-01-26 16:35:32 [INFO] TIER 3: Other unclassified variants: n = 1376306
2019-01-26 16:35:32 [INFO] Generating tiered set of result variants for output in tab-separated values (TSV) file
Error in collapse(tmp, inner = FALSE, indent = indent) :
  R character strings are limited to 2^31-1 bytes
Calls: <Anonymous> ... vapply -> FUN -> FUN -> .local -> collapse -> .Call
Execution halted

Best regards,

Jianxiang
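
One way to read the numbers in the log above: the tier 3 counts (131088 and 1376306) far exceed the 318 PASS variants reported earlier, which hints that rows are being multiplied somewhere before output. The sketch below is only back-of-the-envelope arithmetic and assumes, hypothetically, that all unclassified rows end up serialized into a single string.

```python
R_STRING_LIMIT = 2**31 - 1  # bytes, as quoted in the error message

def bytes_per_row_budget(n_rows):
    """Average bytes each row may use before one string exceeds the limit."""
    return R_STRING_LIMIT // n_rows

# "TIER 3: Other unclassified variants" count taken from the log
unclassified = 1376306
budget = bytes_per_row_budget(unclassified)
```

Under that assumption the budget is roughly 1.5 kB per row, which annotated variant records could plausibly exceed.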

Biomarker variants not in Pathogenic/Likely Pathogenic section of report

Hello,

I have run CPSR on some germline samples and am now looking at the biomarker section. I see that class 4 and 5 variants overlapping the CIViC markers are considered. However, the biomarkers found in my reports (exact matches) are not reported in the class 4/5 section of the report, and the variants noted in the biomarker section seem to be classified by CPSR as benign/likely benign in the output TSV. Furthermore, these variants, noted as biomarkers and as benign/likely benign in the TSV, are not reported in their corresponding (class 1/2) section of the HTML report. Is this expected behaviour, or am I missing something?

Thanks,
Aidan

Installation Instructions Only Describe Conda

We recommend conda as the simplest framework to install PCGR and CPSR

This suggests there are other ways to do it. Could they be added to the information web page? I am using a department server which has pip and virtualenv but not conda available to me. How would I install this software in such a scenario?

~$ virtualenv --version
virtualenv 20.4.0+ds from /usr/lib/python3/dist-packages/virtualenv/__init__.py
~$ pip --version
pip 20.3.4 from /usr/lib/python3/dist-packages/pip (python 3.9)
~$ conda
-bash: conda: command not found

Error in loadNamespace(name) : there is no package called 'reactable'

Hi Sigve,

I have updated to version 0.6.0rc, installed via conda, and I get the above error during ### Annotation resources.

I presume that reactable wasn't added to your conda channel?

Thanks for all the updates,

Bruce

Not able to download the data bundles (404 Error)

I am not able to download the data bundles for either the grch38 or the grch37 assembly. I am getting a 404 error.

wget http://insilico.hpc.uio.no/pcgr/pcgr.databundle.20201123.grch37.tar.gz -O grch37.tar.gz
--2021-02-23 12:10:24-- http://insilico.hpc.uio.no/pcgr/pcgr.databundle.20201123.grch37.tar.gz
Resolving dtn09-e0 (dtn09-e0)... 10.1.200.191
Connecting to dtn09-e0 (dtn09-e0)|10.1.200.191|:3128... connected.
Proxy request sent, awaiting response... 404 Not Found
2021-02-23 12:10:25 ERROR 404: Not Found.

--no-docker BSGenome package not installed

Hi Sigve,

I am getting an error because 'BSgenome.Hsapiens.UCSC.hg19' isn't available when running with --no-docker. I don't use conda, so I have no idea how to fix it; I presume it should be added to the conda env?

The package(s) look great btw. I am building them into Nextflow pipelines using Singularity, so you may hear from me again =D

Thanks,

Bruce.

Bug: `object 'RMSK_HIT' not found`

Hi Sigve,

Thanks for promptly fixing CPSR bugs. I have another one for you; the log is below:

cpsr.py --input_vcf SFRC01085__PRJ180621_SFRC01085-1MT-normal.vcf.gz pcgr . grch37 cpsr.toml SFRC01085__PRJ180621_SFRC01085-1MT-normal  --docker-uid root --force_overwrite
2018-11-16 12:06:07 - cpsr-validate-input - INFO - STEP 0: Validate input data
2018-11-16 01:06:13 - cpsr-validate-input - INFO - Skipping validation of VCF file - as defined in configuration file (vcf_validation = false)
2018-11-16 01:06:13 - cpsr-validate-input - INFO - Checking if existing INFO tags of query VCF file coincide with CPSR INFO tags
2018-11-16 01:06:13 - cpsr-validate-input - INFO - No query VCF INFO tags coincide with CPSR INFO tags
2018-11-16 12:06:10 - cpsr-validate-input - INFO - Finished

2018-11-16 12:06:10 - cpsr-vep - INFO - STEP 1: Basic variant annotation with Variant Effect Predictor (94, GENCODE release 19, grch37) including loss-of-function prediction
2018-11-16 12:08:46 - cpsr-vep - INFO - Finished

2018-11-16 12:08:46 - cpsr-vcfanno - INFO - STEP 2: Annotation for cancer predisposition with cpsr-vcfanno (ClinVar, dbNSFP, UniProtKB, cancerhotspots.org, GWAS catalog)
2018-11-16 12:09:01 - cpsr-vcfanno - INFO - Finished

2018-11-16 12:09:01 - cpsr-summarise - INFO - STEP 3: Cancer gene annotations with cpsr-summarise
2018-11-16 01:09:07 - cpsr-gene-annotate - INFO - Completed summary of functional annotations for 954 variants on chromosome 1
...
2018-11-16 01:09:15 - cpsr-gene-annotate - INFO - Number of non-PASS/REJECTED variant calls: 0
2018-11-16 01:09:15 - cpsr-gene-annotate - INFO - Number of PASSed variant calls: 26157
2018-11-16 12:09:15 - cpsr-summarise - INFO - Converting VCF to TSV with https://github.com/sigven/vcf2tsv
2018-11-16 12:09:39 - cpsr-summarise - INFO - Finished

2018-11-16 12:09:39 - cpsr-writer - INFO - STEP 4: Generation of output files - Cancer predisposition sequencing report
2018-11-16 01:10:17 [INFO] Excluding 0 variants from non-nuclear chromosomes/scaffolds
2018-11-16 01:10:19 [INFO] Number of PASS variants: 26157
2018-11-16 01:10:20 [INFO] Number of SNVs: 20933
2018-11-16 01:10:20 [INFO] Number of deletions: 2650
2018-11-16 01:10:20 [INFO] Number of insertions: 2574
2018-11-16 01:10:20 [INFO] Number of block substitutions: 0
2018-11-16 01:10:20 [INFO] Extending annotation descriptions related to UniprotKB/SwissProt protein features
2018-11-16 01:10:22 [INFO] Adding citations/phenotypes underlying GWAS hits (NHGRI-EBI GWAS Catalog)
2018-11-16 01:10:23 [INFO] Extending annotation descriptions related to Database of Curated Mutations (DoCM)
2018-11-16 01:10:24 [INFO] Extending annotation descriptions related to KEGG pathways
2018-11-16 01:10:26 [INFO] Extending annotation descriptions related to ClinVar
2018-11-16 01:10:37 [INFO] Filtering variants against the predefined list of n = 209 cancer predisposition genes
2018-11-16 01:10:38 [INFO] Number of variants within cancer predisposition genes: 23096
2018-11-16 01:10:38 [INFO] Number of coding variants in cancer predisposition genes: 91
2018-11-16 01:10:38 [INFO] Found coding variants in the following cancer predisposition genes: SPRTN, ALK, MSH6, ABCB11, BARD1, FANCD2, MLH1, POLQ, CASR, GATA2, ATR, KIT, PTPN13, FAT1, SDHA, DROSHA, MSH3, APC, HFE, PMS2, PRSS1, WRN, NBN, DOCK8, PTCH1, TGFBR1, JMJD1C, BMPR1A, CDKN1C, MEN1, AIP, CEP57, ATM, SH2B3, HNF1A, POLE, BRCA2, ERCC5, RAD51B, MLH3, SERPINA1, BUB1B, FAH, FANCI, TSC2, ACD, RFWD3, TP53, BRIP1, AXIN2, RHBDF2, SETBP1, SMARCA4, ERCC2, APOBEC3B, AR
2018-11-16 01:10:38 [INFO] Looking up germline variants linked to hereditary cancer-predisposing syndromes/cancer phenotypes
2018-11-16 01:10:41 [INFO] Assignment of variants to tier 1/tier 2/tier 3
2018-11-16 01:10:41 [INFO] TIER 1: Pathogenic variants - cancer_phenotype: n = 0
2018-11-16 01:10:41 [INFO] TIER 1: Pathogenic variants - noncancer_phenotype: n = 1
2018-11-16 01:10:41 [INFO] TIER 2: Likely pathogenic variants - cancer_phenotype: n = 0
2018-11-16 01:10:41 [INFO] TIER 2: Likely pathogenic variants - noncancer_phenotype: n = 0
2018-11-16 01:10:41 [INFO] TIER 3: Variants of uncertain significance - cancer_phenotype: n = 13
2018-11-16 01:10:43 [INFO] TIER 3: Variants of uncertain significance - noncancer_phenotype: n = 4
Error in `[.data.frame`(cpg_calls, !is.na(cpg_calls$CONSEQUENCE) & is.na(RMSK_HIT) &  :
  object 'RMSK_HIT' not found
Calls: <Anonymous> -> <Anonymous> -> nrow -> [ -> [.data.frame
Execution halted

The VCF and toml are attached in an archive: vlad_cpsr_bug.zip

Vlad

ERROR while running

Hi @sigven !

I am facing the following errors while running the tool

2019-11-25 21:08:34 - cpsr-validate-input - ERROR - 
2019-11-25 21:08:34 - cpsr-validate-input - ERROR - According to the VCF specification, the VCF file (/workdir/input_vcf/SRR9873696-ensemble-annotated.vcf.gz) is NOT valid:
ERROR: Line 2843: Metadata ID contains a character different from alphanumeric, dot, underscore and dash
ERROR: Line 2844: Metadata ID contains a character different from alphanumeric, dot, underscore and dash
ERROR: Line 2845: Metadata ID contains a character different from alphanumeric, dot, underscore and dash
ERROR: Line 2846: Metadata ID contains a character different from alphanumeric, dot, underscore and dash
.......
.......
ERROR: Line 153128: Sample #1, PL=90,0 does not match the meta specification Number=G (contains 2 value(s), expected 3)
ERROR: Line 153129: Sample #1, PL=45,0 does not match the meta specification Number=G (contains 2 value(s), expected 3)
ERROR: Line 153130: Sample #1, PL=45,0 does not match the meta specification Number=G (contains 2 value(s), expected 3)

The vcf file is
SRR9873696.vcf.gz

Any thoughts?
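
The second group of errors concerns PL, which is declared with Number=G in the header, meaning one value per possible genotype. A diploid call with one alternate allele needs three values, whereas PL=90,0 carries two, which is what a haploid call would have. A small sketch of the expected count (the function name is made up for illustration):

```python
from math import comb

def expected_genotype_values(n_alt, ploidy=2):
    """Number of values a Number=G field should carry for one sample."""
    n_alleles = n_alt + 1  # reference allele plus alternates
    # Unordered genotypes: multisets of size `ploidy` drawn from the alleles
    return comb(n_alleles + ploidy - 1, ploidy)

diploid_one_alt = expected_genotype_values(1)            # what the validator expects
haploid_one_alt = expected_genotype_values(1, ploidy=1)  # what PL=90,0 matches
```

If those records are indeed haploid calls, the --no_vcf_validate option used in other commands in this list may be a practical way to skip the strict check.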

CPSR dbnsfp min_majority check

Hi Sigve,

Getting this error in CPSR with the default toml:

2018-11-12 23:28:20 - cpsr-validate-input - INFO - STEP 0: Validate input data
2018-11-12 12:28:21 - cpsr-validate-input - ERROR -
2018-11-12 12:28:21 - cpsr-validate-input - ERROR - Minimum number of majority votes for consensus calls among dbNSFP predictions should not exceed 8 and should not be less than 5
2018-11-12 12:28:21 - cpsr-validate-input - ERROR -

The toml has

[dbnsfp]
min_majority = 4
max_minority = 1

So min_majority is below 5, thus the error, as far as I understand. Which is correct, the code or the toml? :)

Vlad
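
The check implied by the error message can be sketched as below. The function name and the dict layout are assumptions made for illustration (a parsed toml file would give a similar nested dict); only the bounds 5 and 8 and the message text come from the log.

```python
def check_min_majority(config, low=5, high=8):
    """Return an error message if dbnsfp.min_majority is out of range, else None."""
    value = config["dbnsfp"]["min_majority"]
    if value < low or value > high:
        return (
            "Minimum number of majority votes for consensus calls among "
            f"dbNSFP predictions should not exceed {high} "
            f"and should not be less than {low}"
        )
    return None

# Values from the default toml quoted above
default_toml = {"dbnsfp": {"min_majority": 4, "max_minority": 1}}
error = check_min_majority(default_toml)
```

With the quoted defaults the check fails, so either the default value or the lower bound has to change for the two to agree.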

Error: `x` and `y` must share the same src

Hi again,

I have an error with cpsr.py 0.6.1 using the latest data bundle (20201123) (NB: pcgr.py is the dev version):

Error: `x` and `y` must share the same src, set `copy` = TRUE (may be slow).
Backtrace:
     x
  1. +-pcgrr::generate_cpsr_report(...)
  2. | +-`%>%`(...)
  3. | \-pcgrr::assign_pathogenicity_evidence(...)
  4. |   +-dplyr::left_join(...)
  5. |   \-dplyr:::left_join.data.frame(...)
  6. |     \-dplyr::auto_copy(x, y, copy = copy)
  7. |       \-dplyr:::glubort(...)
  8. +-pcgrr::detect_cancer_traits_clinvar(...)
  9. | \-base::nrow(cpg_calls)
 10. \-pcgrr::determine_pathogenicity_classification(.)

It dies with this after the line:

2021-02-17 08:30:26 [INFO] Number of non-coding variants in cancer predisposition genes: 38747

Can you see where the error comes from?

Command:

cpsr.py --no-docker \
        --no_vcf_validate \
        --panel_id 0 \
        --query_vcf sample.hc.merge.vcf.gz \
        --pcgr_dir pcgr \
        --output_dir ./ \
        --genome_assembly grch38 \
        --conf pcgr/data/grch38/cpsr_configuration_default.toml \
        --sample_id $META

Thanks,

Bruce

ClinVar conflicting evidence

Hi Sigve,

Wondering what was the rationale behind skipping variants which have "conflicting_interpretations_of_pathogenicity" by ClinVar regardless of other annotations that can be (likely) pathogenic? Perhaps it makes the report too cluttered?

Thinking that such variants should still go under TIER3 (either VUS or Non-classified) - or below, but should show up in the report.

Perhaps in the following line, the check is.na(CLINVAR_CLINICAL_SIGNIFICANCE) should be removed?

https://github.com/sigven/pcgr/blob/b6ccae7fac4c86c472fdac6e0c9728edd930a189/src/R/pcgrr/R/cpsr.R#L169

cc @ohofmann

Double comma in line 24 of cpsr_validate_input.py

Hi Sigven,

Just to let you know, there is a double comma on line 24 of cpsr_validate_input.py, after choices=[0,1]:

2019-10-11 15:16:40 - cpsr-validate-input - INFO - STEP 0: Validate input data
  File "/home/.conda/envs/pcgr/bin/cpsr_validate_input.py", line 24
    parser.add_argument('diagnostic_grade_only',type=int, default=0,choices=[0,1],,help="Green virtual panels only (Genomics England PanelApp)")
                                                                                  ^
SyntaxError: invalid syntax
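
For reference, the same call with the stray second comma removed parses fine. The surrounding parser setup here is a minimal stand-in, not the full script:

```python
import argparse

parser = argparse.ArgumentParser()
# Same call as line 24 in the traceback, with a single comma after choices=[0,1]
parser.add_argument('diagnostic_grade_only', type=int, default=0, choices=[0,1], help="Green virtual panels only (Genomics England PanelApp)")

args = parser.parse_args(['1'])
```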
