broadinstitute / ctat-virusintegrationfinder

License: BSD 3-Clause "New" or "Revised" License

Python 63.70% R 4.68% Dockerfile 1.17% Shell 0.19% HTML 3.19% wdl 25.66% Makefile 0.96% Perl 0.46%

ctat-virusintegrationfinder's Introduction

CTAT-VirusIntegrationFinder

Visit the wiki for documentation.

ctat-virusintegrationfinder's Issues

False positive results caused by RNA alternative splicing

Hi, thanks for your useful software. We ran into an interesting issue and hope you can give us some advice. We ran ctat_vif to detect virus-host integration sites in both DNA and RNA sequencing data, but the detected sites differ greatly between the two: far more integration sites are reported in the RNA data than in the DNA data. How can we exclude false positives caused by RNA alternative splicing when ctat_vif is applied to RNA data?
Looking forward to your kind reply.
Best wishes

CPU and output directory arguments not being recognized?

Hi,

I am running VIF via Docker, with additional arguments for number of threads and output directory name - however those aren't being recognized. STAR still seems to run using 4 threads (the default), and output is created in the directory VIF_starChim_init. Is there a way to fix this? Any help would be greatly appreciated. Thanks!

How to confirm the exact integration site for "span" reads

Hi, thanks for your excellent software.
I have one question that you may be able to help with. For paired-end (PE) reads, two types of read evidence can be used to find integration sites: "split" and "span". How does the software determine the exact integration site for span reads?
For span reads, one mate aligns entirely to the virus (or human) and the other mate aligns entirely to the human (or virus), so the exact integration site is uncertain, and many studies discard this kind of read. Could you tell us how ctat_vif handles this?

Nextflow pipeline?

Hi @brianjohnhaas,

We had a great experience using this tool last year for an intern project. I'm trying to replicate/scale up some of the work they did and would love to wrap this method into a nextflow/nf-core pipeline along with another approach we tried.

We have a couple of other folks who are interested in collaborating on this as well. I wanted to reach out to ask you:

do you have any issues with us pursuing this (not sure what license this code is available under)?
do you have plans to do this/are you working on this already? (don't want to duplicate efforts!)

Thanks for your thoughts! BTW, we are tracking our work on this here: https://github.com/nf-osi/viralintegration/tree/dev

RAM requirements listed on installation instructions wiki page?

Hi there,

For those of us provisioning cloud instances to run the CTAT containers, it would be helpful if there was a note about the RAM requirements of CTAT/VIF, so that we can account for this when we are setting up the compute environment. I was unable to run VIF on a smaller cloud instance I provisioned today; error noted that 32GB of RAM was required.

EXITING: fatal error trying to allocate genome arrays, exception thrown: std::bad_alloc
Possible cause 1: not enough RAM. Check if you have enough RAM 31259734944 bytes
Possible cause 2: not enough virtual memory allowed with ulimit. SOLUTION: run ulimit -v 31259734944

(alternatively, if this number is dynamic based on the data, it would be nice to know how to estimate it).

I'd file a PR myself with a suggested change, but I don't think it's possible to do that on wiki pages. :)

Thanks for considering it!
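
The byte count in the STAR message can be turned into a figure that is easier to compare against cloud instance sizes. This is a minimal sketch, assuming a Linux-like host where `os.sysconf` exposes page counts; the helper names are hypothetical, and the number is only the single allocation STAR reported, so the total requirement may be higher:

```python
import os

# Byte count copied from the STAR error message above.
REQUIRED_BYTES = 31259734944


def bytes_to_gib(n_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


def physical_ram_bytes():
    """Return total physical RAM in bytes, or None if the platform hides it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (ValueError, OSError, AttributeError):
        return None


def has_enough_ram(required_bytes=REQUIRED_BYTES):
    """True/False when RAM can be measured, None when it cannot."""
    total = physical_ram_bytes()
    if total is None:
        return None
    return total >= required_bytes


print(f"STAR asked for about {bytes_to_gib(REQUIRED_BYTES):.1f} GiB")
```

Running a check like this before launching the container would flag an undersized instance early, instead of failing partway through genome loading.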

How to use single-end reads

Hi, thank you for the wonderful software for analyzing virus integration.
It works well on PE100 data.
However, in some cases, such as single-cell sequencing, read 1 only holds the barcode and only read 2 is useful.
Can CTAT-VIF be applied to single-end reads?
If so, how?

Best wishes

Thresholds for detecting chimeric reads and distinguishing different integration sites

Hi, thank you for the wonderful software for analyzing virus integration. I have two questions.

First, when integration sites are close to each other, they are merged into one site. What is the minimum distance needed to distinguish different integration sites? Around 500 bp?
Second, what fraction of chimeric reads does the software need in order to detect an integration? From our own attempts to find the threshold, the detectable chimeric rate may be between 15% and 20%.
Looking forward to your kind reply.
Best wishes
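
To make the first question concrete, here is a sketch of how nearby breakpoints could collapse into one site under a simple distance rule. It is not ctat_vif's actual algorithm; the function name is hypothetical, and the 500 bp default is only the guess from the question, not a confirmed setting:

```python
def cluster_positions(positions, max_gap=500):
    """Group coordinates so that neighbours within max_gap share a cluster.

    max_gap=500 is the value guessed in the question, not a known default.
    """
    clusters = []
    for pos in sorted(positions):
        # Compare against the last member of the most recent cluster.
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


print(cluster_positions([100, 450, 1200, 1300, 5000]))
```

Under this rule a chain of sites each 400 bp apart would all merge, even though the first and last are more than 500 bp apart, which is one reason an observed merging distance may not match a single fixed threshold.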

BUG: next index is smaller than previous, EXITING

Hi,

I tried to run CTAT-VirusIntegrationFinder (ctat_vif.v0.1.0.simg) using Singularity. The command and the error message are shown below.

/path1/singularity exec --bind /path2  /path3/ctat_vif.v0.1.0.simg /usr/local/bin/ctat-VIF.py --CPU 4 --genome_lib_dir /path4/ctat_genome_lib_build_dir --viral_db_fasta /path4/viral.123.1.genomic.fna --left_fq /path5/RHN22_clean_1.fq.gz --right_fq /path5/RHN22_clean_2.fq.gz -O /path2/RHN22/single_vif/ --out_prefix RHN22.vif

BUG: next index is smaller than previous, EXITING
Dec 01 12:38:04 ...... FATAL ERROR, exiting

The error seems to be related to the reference file. The viral reference sequence files we use were downloaded from NCBI (listed below), and we merged them into a single viral reference file that was passed as the viral reference input. However, no error appeared when I replaced the merged file with any one of the three individual files. I can't figure out how to solve this problem. Could you help me with that?

ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.1.1.genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.2.1.genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.3.1.genomic.fna.gz

Best,
Jianbiao Li
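
The thread does not establish the cause of this error. Since the failure only appears with the merged file, one cheap thing to rule out is a sequence ID that occurs more than once after merging. A minimal sketch with hypothetical helper names:

```python
import gzip
from collections import Counter


def duplicate_ids_in_lines(lines):
    """Return {sequence_id: count} for FASTA IDs seen more than once."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            # The ID is the first whitespace-delimited token of the header.
            counts[fields[0] if fields else ""] += 1
    return {seq_id: n for seq_id, n in counts.items() if n > 1}


def duplicate_ids_in_fasta(path):
    """Same check for a plain or gzip-compressed FASTA file on disk."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return duplicate_ids_in_lines(handle)
```

An empty result for the merged file would point away from duplicate IDs and toward something else, such as the index-building parameters.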

ERROR: could not open genome file /genome_lib/VIF_index_human-plus-hpv16/ref_genome.fa.star.idx//genomeParameters.txt

Sorry to bother you. When I run CTAT-VIF, the following error message is displayed:

EXITING because of FATAL ERROR: could not open genome file ~/wangjiaxuan/biosoft/CTAT-vif/genome_lib/VIF_index_human-plus-hpv16/ref_genome.fa.star.idx//genomeParameters.txt
SOLUTION: check that the path to genome files, specified in --genomeDir is correct and the files are present, and have user read permsissions
Jan 10 15:58:35 ...... FATAL ERROR, exiting

The human-only STAR index seems to be missing from the genome directory. However, the file I downloaded from https://github.com/STAR-Fusion/STAR-Fusion/wiki/STAR-Fusion-release-and-CTAT-Genome-Lib-Compatibility-Matrix does not contain the ref_genome.fa.star.idx directory named in the error. Where can I find it?

Thanks

#---------------------------------------------
My command is:

${soft_dir}/ctat-vif \
  --left $read1 \
  --right ${read2} \
  --genome_lib_dir $genomedir \
  --sample_id yesimola \
  -O  ./ \
  --cpu 2

And the genome_lib_dir is:

genome_lib/VIF_index_human-plus-hpv16/
├── ref_annot.gtf 
├── ref_genome.fa
└── VIF
    ├── hg_plus_viraldb.fasta
    ├── hg_plus_viraldb.fasta.fai
    ├── hg_plus_viraldb.fasta.star.idx
    │   ├── chrLength.txt
    │   ├── chrNameLength.txt
    │   ├── chrName.txt
    │   ├── chrStart.txt
    │   ├── exonGeTrInfo.tab
    │   ├── exonInfo.tab
    │   ├── geneInfo.tab
    │   ├── Genome
    │   ├── genomeParameters.txt
    │   ├── Log.out
    │   ├── SA
    │   ├── SAindex
    │   ├── sjdbInfo.txt
    │   ├── sjdbList.fromGTF.out.tab
    │   ├── sjdbList.out.tab
    │   └── transcriptInfo.tab
    ├── virus_db.fasta
    └── virus_db.fasta.fai
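
The listing above has a STAR index only under VIF/, while the error asks for one next to ref_genome.fa. A pre-flight check along these lines would surface that before STAR starts. It is a sketch: the function name is hypothetical, and the only expected path it knows is the one quoted in the error message, not a full list of what CTAT-VIF needs:

```python
from pathlib import Path

# Relative path taken from the STAR error message above.
EXPECTED = Path("ref_genome.fa.star.idx") / "genomeParameters.txt"


def missing_genome_lib_files(genome_lib_dir, expected=(EXPECTED,)):
    """Return the expected relative paths that are absent from the library."""
    root = Path(genome_lib_dir).expanduser()
    return [rel for rel in expected if not (root / rel).is_file()]
```

For example, `missing_genome_lib_files("genome_lib/VIF_index_human-plus-hpv16")` would return the missing path for the directory shown in this report.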

Issue with bamsifter

Hi,
It is looking for bamsifter within the folder "/util/bamsifter/", but it does not exist. Should it?
This is the command I ran:
ctat-VIF.py \
  --genome_lib_dir ctat_genome_lib_build_dir/ \
  --left_fq ${1}_R1.fastq.gz \
  --right_fq ${1}_R2.fastq.gz \
  --viral_db_fasta ebv.fa \
  --viral_db_gtf ebv.custom.gtf \
  -O ${1}_VIF

This is the partial error:
bash -c 'set -eou pipefail && samtools view -h VIF_starChim_init/Aligned.sortedByCoord.out.bam EBV | samtools depth - > EBV.depth.tsv '
/bin/sh: /data/MoCha/paulyr/CTAT-VirusIntegrationFinder/util/bamsifter/bamsifter: No such file or directory
Traceback (most recent call last):
  File "/data/MoCha/paulyr/CTAT-VirusIntegrationFinder/ctat-VIF.py", line 678, in
    main()
  File "/data/MoCha/paulyr/CTAT-VirusIntegrationFinder/ctat-VIF.py", line 301, in main
    pipeliner.run()
  File "/gpfs/gsfs6/users/MoCha/paulyr/CTAT-VirusIntegrationFinder/PyLib/Pipeliner.py", line 75, in run
    cmd.run(checkpoint_dir)
  File "/gpfs/gsfs6/users/MoCha/paulyr/CTAT-VirusIntegrationFinder/PyLib/Pipeliner.py", line 136, in run
    raise RuntimeError(errmsg

Thanks!
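
The shell error shows that the pipeline expects an executable at util/bamsifter/bamsifter inside the checkout. A small check like the following confirms whether it is present before a long run. This is a sketch with a hypothetical function name; a missing binary often means a build step or submodule checkout was skipped, but the thread does not confirm that here:

```python
import os
from pathlib import Path


def find_bamsifter(repo_dir):
    """Return the bamsifter path if it exists and is executable, else None."""
    # Relative location taken from the '/bin/sh: ... No such file' line above.
    candidate = Path(repo_dir) / "util" / "bamsifter" / "bamsifter"
    if candidate.is_file() and os.access(candidate, os.X_OK):
        return candidate
    return None
```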

How does the summary_report work?

nf-core/viralintegration#28 (comment)

Just cross posting for visibility.

The latest version can't load the WDL/cromwell-58.jar correctly

I installed CTAT-VIF earlier and got results with it in my workflow. I recently noticed that it has been upgraded and has become more powerful and comprehensive.

So I wanted to install the new version and add it to my analysis workflow. I downloaded the zip and ran make successfully, but when I then ran it, it displayed this error message:

Error: Invalid or corrupt jarfile ~/wangjiaxuan/biosoft/CTAT-vif/WDL/cromwell-58.jar

I tried downloading it from GitHub with a web browser, asked a co-worker to download it as well, and even tried downloading cromwell-59; all of these failed. 😞

It's quite discouraging. I also noticed that the previous version didn't have cromwell in the WDL directory. How can I solve this problem?
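
A jar file is a zip archive, so "Invalid or corrupt jarfile" can be narrowed down by inspecting what was actually downloaded. The sketch below uses a hypothetical function name. Checking for a Git LFS pointer is an assumption on my part: the thread does not say whether this repository stores the jar in LFS, but a zip download of a repository that does would contain a small text pointer instead of the real file:

```python
import zipfile
from pathlib import Path

# First line of every Git LFS pointer file.
LFS_MAGIC = b"version https://git-lfs.github.com/spec/v1"


def describe_jar(path):
    """Classify a jar as missing, git-lfs-pointer, not-a-zip, corrupt-zip or ok."""
    jar_path = Path(path).expanduser()
    if not jar_path.is_file():
        return "missing"
    with open(jar_path, "rb") as handle:
        head = handle.read(len(LFS_MAGIC))
    if head == LFS_MAGIC:
        return "git-lfs-pointer"
    if not zipfile.is_zipfile(jar_path):
        return "not-a-zip"
    with zipfile.ZipFile(jar_path) as jar:
        # testzip() returns the first bad member name, or None if all are fine.
        return "ok" if jar.testzip() is None else "corrupt-zip"
```

A result other than "ok" for WDL/cromwell-58.jar would mean the file on disk is not the real jar, and that the download route, not Java, is where to look.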