tk2 / retroseq

RetroSeq is a bioinformatics tool that searches for mobile element insertions using aligned reads in a BAM file and a library of reference transposable elements. Please read the wiki page (link below) for usage instructions. The wiki also has a page describing how the 1000 Genomes CEU trio analysis was carried out, including the files and parameters used for each step.

Perl 100.00%

retroseq's People

Contributors

rwness, tk2

retroseq's Issues

Installation and launch

Hello,

I tried installing and launching RetroSeq today. On my path, I have exonerate, bedtools, and samtools-0.1.18.

When I ran the command "perl retroseq.pl" I got the following error:

Nested quantifiers in regex; marked by <-- HERE in m/^[0-9]{2}+ <-- HERE S/ at ../RetroSeq/Utilities.pm line 190.
Compilation failed in require at retroseq.pl line 28.
BEGIN failed--compilation aborted at retroseq.pl line 28.

If relevant, I am using a Linux cluster (Red Hat 5) with Perl version 5.8.8 ("built for x86_64-linux-thread-multi"). Any help would be appreciated.

Thanks!
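
The pattern quoted in the message, `[0-9]{2}+`, uses a possessive quantifier, which Perl accepts only from version 5.10 onward; Perl 5.8.8 reads `{2}+` as two stacked quantifiers and refuses to compile. Because `{2}` is a fixed count, dropping the trailing `+` should not change what the pattern matches. A minimal sketch of such a patch, written in Python, assuming the only affected constructs in Utilities.pm are of the form `{n}+` (the function name and the sample line are hypothetical, not taken from RetroSeq):

```python
import re

# A fixed-count quantifier followed by a possessive '+', e.g. "{2}+".
_POSSESSIVE_FIXED = re.compile(r"\{(\d+)\}\+")


def strip_possessive_plus(source: str) -> str:
    """Return `source` with every '{n}+' rewritten to '{n}'."""
    return _POSSESSIVE_FIXED.sub(r"{\1}", source)


if __name__ == "__main__":
    # Illustrative line only; it is not copied from the RetroSeq source.
    line = "if( $cigar =~ /^[0-9]{2}+S/ )"
    print(strip_possessive_plus(line))
```

Open-ended quantifiers such as `{2,}` or a plain `+` are left untouched, so the patch only affects the construct that the error message points at.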

ERROR: Invalid parameters passed to getCandidateBreakPointsDir

Hi,

I am getting the error below during the calling step:

Command: ./bin/retroseq.pl -call -bam /scratch/G300B1.bam -input /scratch/retroseq/sinec -ref /scratch/l1.0_ROSY.fa -output /scratch/retroseq/sinec.vcf -filter SINEC.refTEs.txt -reads 3 -depth 400

Tail of the error message:

<mpileup> Set max per-file depth to 8000
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
ERROR: Invalid parameters passed to getCandidateBreakPointsDir: chrUn_JAAHUQ010000692v1 -471 282

The run progressed to some unplaced scaffolds of a non-human genome and then stopped with the error above. It produced the *het.bed and *hom.bed files before terminating. Could you help?
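
The values printed in the error (start -471, end 282) look like a window that extends past the start of a short contig. That reading is an assumption, since the source of getCandidateBreakPointsDir is not shown here. Under that assumption, a window can be clamped to the contig before it is used; the sketch below is a hypothetical helper in Python, not RetroSeq code, and the contig length in the example is made up:

```python
from typing import Tuple


def clamp_window(start: int, end: int, contig_length: int) -> Tuple[int, int]:
    """Clamp a 1-based, inclusive window to the range [1, contig_length]."""
    if contig_length < 1:
        raise ValueError("contig_length must be positive")
    if start > end:
        raise ValueError("start must not exceed end")
    clamped_start = max(1, start)
    clamped_end = min(contig_length, end)
    if clamped_start > clamped_end:
        raise ValueError("window lies entirely outside the contig")
    return clamped_start, clamped_end


if __name__ == "__main__":
    # 5000 is a hypothetical length for the unplaced scaffold.
    print(clamp_window(-471, 282, 5000))
```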

modules not found at runtime when retroseq.pl aliased from its home path

My system finds retroseq.pl through an alias that points to its installation folder.
Because of this, the module files were not found, even after adding the RetroSeq sub-folder to PERL5LIB.
I had to slightly change the lib declaration so that the two .pm modules are found.

# line 25 of the script
# use lib dirname(__FILE__).'/..';
use lib dirname(Cwd::abs_path(__FILE__)).'/..';

This may be worth adopting for other users.
It now works.
Thanks for the great tool.
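
The difference between the two `use lib` lines is whether symbolic links are resolved before the parent directory is taken: `Cwd::abs_path` follows links, while a plain `dirname(__FILE__)` does not. The same effect can be shown in Python, assuming the alias is a symbolic link; the directory names below are hypothetical and exist only in a temporary folder:

```python
import os
import tempfile


def script_dir(path: str, resolve_links: bool) -> str:
    """Directory of `path`, optionally after resolving symbolic links."""
    full = os.path.realpath(path) if resolve_links else os.path.abspath(path)
    return os.path.dirname(full)


def demo() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        tmp = os.path.realpath(tmp)
        install = os.path.join(tmp, "RetroSeq", "bin")  # real location
        os.makedirs(install)
        script = os.path.join(install, "retroseq.pl")
        open(script, "w").close()
        links = os.path.join(tmp, "links")  # where the link lives
        os.mkdir(links)
        link = os.path.join(links, "retroseq.pl")
        os.symlink(script, link)
        print("unresolved:", script_dir(link, resolve_links=False))
        print("resolved:  ", script_dir(link, resolve_links=True))


if __name__ == "__main__":
    demo()
```

Only the resolved directory has the module folder next to it, which is why the modified line finds the .pm files.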

Coordinate system query

In the RetroSeq VCF file, the position of each TE insertion relative to the reference is given in 1-based coordinates in the POS column. In addition, there is a pair of consecutive coordinates in the INFO field, the first of which corresponds to the POS column and the second to the next base in the genome. Does this imply that the predicted insertion would integrate between the first and second positions in the INFO field? In other words, to convert RetroSeq predictions to 0-based coordinates, do we (i) use the two coordinates in the INFO field, or (ii) subtract 1 from the POS column to make a new start position in 0-based coordinates?
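
For reference, option (ii) is the usual arithmetic for turning a single 1-based position into a 0-based, half-open interval of the kind used in BED files. The Python sketch below shows only that conversion; it does not settle which side of the base RetroSeq places the insertion on, and the function name is hypothetical:

```python
from typing import Tuple


def vcf_pos_to_bed(pos: int) -> Tuple[int, int]:
    """Convert a 1-based position to a 0-based, half-open (start, end)
    interval that covers exactly that one base."""
    if pos < 1:
        raise ValueError("VCF positions are 1-based and must be >= 1")
    return pos - 1, pos


if __name__ == "__main__":
    print(vcf_pos_to_bed(100))
```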

Useless use of greediness modifier '+' in regex

Hi

I ran retroseq.pl and got the following warnings:
Useless use of greediness modifier '+' in regex; marked by <-- HERE in m/^[0-9]{2}+ <-- HERE S/ at /home/eq_jcl/Projects/MNase_and_Sonicseq/contig_analysis_approche_2/retroseq//RetroSeq-master/bin/../RetroSeq/Utilities.pm line 190.
Useless use of greediness modifier '+' in regex; marked by <-- HERE in m/[0-9]{2}+ <-- HERE S$/ at /home/eq_jcl/Projects/MNase_and_Sonicseq/contig_analysis_approche_2/retroseq//RetroSeq-master/bin/../RetroSeq/Utilities.pm line 195.
Useless use of greediness modifier '+' in regex; marked by <-- HERE in m/^[0-9]{2}+ <-- HERE S/ at /home/eq_jcl/Projects/MNase_and_Sonicseq/contig_analysis_approche_2/retroseq//RetroSeq-master/bin/../RetroSeq/Utilities.pm line 204.
Useless use of greediness modifier '+' in regex; marked by <-- HERE in m/[0-9]{2}+ <-- HERE S$/ at /home/eq_jcl/Projects/MNase_and_Sonicseq/contig_analysis_approche_2/retroseq//RetroSeq-master/bin/../RetroSeq/Utilities.pm line 209.
Useless use of greediness modifier '+' in regex; marked by <-- HERE in m/[0-9]{2}+ <-- HERE S$/ at /home/eq_jcl/Projects/MNase_and_Sonicseq/contig_analysis_approche_2/retroseq//RetroSeq-master/bin/../RetroSeq/Utilities.pm line 218.
Useless use of greediness modifier '+' in regex; marked by <-- HERE in m/^[0-9]{2}+ <-- HERE S/ at /home/eq_jcl/Projects/MNase_and_Sonicseq/contig_analysis_approche_2/retroseq//RetroSeq-master/bin/../RetroSeq/Utilities.pm line 219.
Useless use of private variable in void context at /home/eq_jcl/Projects/MNase_and_Sonicseq/contig_analysis_approche_2/retroseq//RetroSeq-master/bin/retroseq.pl line 184.

Do you have any idea how to solve this problem?

Thank you

Error while calling TEs from pooled samples

I was trying to apply RetroSeq call to a dataset consisting of two discover output files and their corresponding BAM files. The program starts to run, but then the following errors appear and I get an empty VCF file.

I'd very much appreciate any workaround or fix to circumvent this problem.
Thanks,

Fritjof

retroseq.pl -call -bam samples.fofn -filter rmsk.bed.tab -ref ref.fa -input out_files.fofn -output rs_call
RetroSeq: A tool for discovery and genotyping of transposable elements from short read alignments

Version: 1.41
Author: Thomas Keane ([email protected])

Found 2 BAM files

Found sample: SAMPLE17_SAMPLE18
Calling sample SAMPLE17_SAMPLE18
Beginning paired-end calling...
Use of uninitialized value $currentType in concatenation (.) or string at /./home/flammers/programs/RetroSeq-1.41/bin/retroseq.pl line 768, <$tfh> line 1.
Use of uninitialized value $currentType in concatenation (.) or string at /./home/flammers/programs/RetroSeq-1.41/bin/retroseq.pl line 769, <$tfh> line 1.
Use of uninitialized value $currentType in string eq at /./home/flammers/programs/RetroSeq-1.41/bin/retroseq.pl line 765, <$tfh> line 2.
Use of uninitialized value $currentType in concatenation (.) or string at /./home/flammers/programs/RetroSeq-1.41/bin/retroseq.pl line 768, <$tfh> line 2.
Use of uninitialized value $currentType in concatenation (.) or string at /./home/flammers/programs/RetroSeq-1.41/bin/retroseq.pl line 769, <$tfh> line 2.
PE Call: 
PE Call: 
Creating VCF file of calls....

Uninitialized value error post PE parsing

When I run RetroSeq in the align mode, it gets to PE alignment parsing and then breaks. Not sure what the error means (uninitialized value before assignment?).

Input: perl bin/retroseq.pl -discover -bam ../bwa/Mtbcosmid.sorted.bam -eref ../bwa/retroseqTNlib.tab -output ../bwa/Mtbcosmidtest.candidates.tab -align

Output:
RetroSeq: A tool for discovery and genotyping of transposable elements from short read alignments

Version: 1.41
Author: Thomas Keane ([email protected])

Reading -eref file: ../bwa/retroseqTNlib.tab

Min anchor quality: 20
Min percent identity: 80
Min length for hit: 36

Opening BAM (../bwa/Mtbcosmid.sorted.bam) and getting initial set of candidate mates....
Reading chromosome: pRD12F9
1075 candidate reads remain to be found after first pass....
Reading chromosome: pRD12F9
Parsing PE alignments....
Use of uninitialized value $lastLine in string ne at bin/retroseq.pl line 587.
Alignment did not complete correctly

Any insight you could provide would be great!

Error when trying to run bedtools intersect

When I try to run RetroSeq on some BAM files from the 1000 Genomes Project, I get the following error after the chromosomes are read. I use the exclude, refTEs and eref (+align) options.

.....
Reading chromosome: hs37d5
***** ERROR: too many digits/characters for integer conversion in string 1. Exiting...
Failed to run bedtools intersect to filter at retroseq.pl line 367.

I use:
-Samtools 0.1.19
-Bedtools 2.27 and Bedtools 2.26
-Exonerate 2.2.0 w/ glib 2.47.3

I don't get this error when I run RetroSeq with Bedtools 2.17. Do you have any idea why this is happening?
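
The message suggests that bedtools met a coordinate field it could not read as an integer. The cause is not established in this report; one way to narrow it down is to scan the BED-style files passed to RetroSeq for start and end columns that are not plain non-negative integers. A minimal Python sketch, assuming tab-separated input with the coordinates in columns 2 and 3 (the function name is hypothetical):

```python
from typing import Iterable, List, Tuple


def find_bad_bed_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, reason) for lines whose start or end column
    is not a plain non-negative integer."""
    problems: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Skip blank lines and common header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            problems.append((number, "fewer than 3 tab-separated columns"))
            continue
        for name, value in (("start", fields[1]), ("end", fields[2])):
            if not (value.isascii() and value.isdigit()):
                problems.append((number, f"{name} is not an integer: {value!r}"))
    return problems


if __name__ == "__main__":
    sample = ["chr1\t10\t20", "chr1\t1e5\t200", "hs37d5\t-5\t10"]
    for number, reason in find_bad_bed_lines(sample):
        print(number, reason)
```

To check a real file, pass an open file handle, for example `find_bad_bed_lines(open(path))`.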

Defaults in -call

Hi @tk2,

Many thanks for maintaining RetroSeq - great tool!

After some performance tests, I just realised that the default parameter settings in your retroseq.pl -call raw description are actually not in line with the values set (currently: -reads = 10 and -q = 20). Maybe you could quickly replace the source script's lines 199-201 accordingly?

`[-depth Max average depth of a region to be considered for calling. Default is $DEFAULT_MAX_DEPTH.]

[-reads It is the minimum number of reads required to make a call. Default is $DEFAULT_READS.]

[-q Minimum mapping quality for a read mate that anchors the insertion call. Default is $DEFAULT_ANCHORQ.]`

Cheers,
Max

Discovery Phase

Dear all,
I'm trying to run the RetroSeq discovery phase:
perl retroseq.pl -discover -bam /zhangxuqiang/TEs/NUMB_9.rmmouse.nodup.bam -output /zhangxuqiang/TEs/candidates.tab
However, RetroSeq does not create the output file (candidates.tab), so when I try to run the RetroSeq calling phase I have no input file.

If you have any insight into this, I'd be grateful!

Separating .vcf's by individual on pooled call?

I have merged .bam files from the 1kGP (with samtools merge -r) and performed RetroSeq discovery phase on the merged .bam.

But now when I call the merged .bam I get only one .vcf output. How do I create .vcf's for each individual in the merged .bam?

This is similar to what Wildschutte did in a 2015 study. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4666360/

Thank you.

EDIT: (May 2017) I was mistaken, the merged (pooled) .bam is used during the calling phase NOT discovery.

$cfh or $currentTypeAnchorsFile?

Hello! I am working through a RetroSeq run, and it seems that the discover stage is working, but the call stage gives me the error below. I haven't been able to figure out which file is missing.

Found sample: CPBWGS_11
Calling sample CPBWGS_11
Beginning paired-end calling...
PE Call: >rnd-2_family-5LINE/CR1-Zenon(ReconFamilySize=93,FinalMultipleAlignmentSize=88)
No such file or directory at retroseq.pl line 660, <$tfh> line 1.

Thanks!
Kristian
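
The name printed after "PE Call:" contains `/`, `>`, parentheses and commas. If that string is used to build a file name (an assumption; the code at line 660 is not shown here), the `/` would be read as a directory separator, which could explain "No such file or directory". One possible workaround is to rename the TE types to filesystem-safe strings before running discovery. A hypothetical Python helper for that renaming:

```python
import re


def safe_filename_component(name: str) -> str:
    """Replace every character outside [A-Za-z0-9._-] with '_'."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    return cleaned or "_"


if __name__ == "__main__":
    raw = ">rnd-2_family-5LINE/CR1-Zenon(ReconFamilySize=93,FinalMultipleAlignmentSize=88)"
    print(safe_filename_component(raw))
```

The same renamed labels would need to be used consistently in the TE library and in any file that refers to the types.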

Is RetroSeq still able to handle split reads?

Dear @tk2,

I'm starting to use RetroSeq v1.5. From my understanding, the program is able to search for split reads using the -srmode parameter. However when I print out the help page from the discover function of RetroSeq 1.5, I obtain a different set of parameters with respect to the tutorial page of the Wiki.

Thus, I'm wondering if this version of RetroSeq is still able to handle split reads.

Thanks,
Massimiliano.

install and launch

Dear all,
I'm trying to install and launch RetroSeq on macOS.
When I run the command line: perl retroseq.pl -discover -bam file.bam -out ...
I get this message:
Error: Cant find required binary samtools
at retroseq.pl line 195.
I've already installed samtools and bedtools (not yet exonerate).

If anyone has an idea how to solve this problem, I'd be grateful and will try it.
Regards
E.
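
A quick way to see which of the external programs are visible on the PATH is to look each one up the same way a launched program would. The Python sketch below uses the three tool names mentioned in these reports (samtools, bedtools, exonerate); the function name is hypothetical:

```python
import shutil
from typing import Iterable, List


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the names that cannot be found as executables on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    print("missing:", missing_tools(["samtools", "bedtools", "exonerate"]))
```

Any name in the printed list is installed somewhere the PATH does not cover, or is not installed at all.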

Running RetroSeq on heterozygous data

I'm using the latest version RetroSeq (1.5) and when I run -call with -hets I get an error that it's not a valid option. Is the latest version altered to handle heterozygous data by default?

hets option for the calling phase

Dear Mr. Keane,

I am currently analyzing about 50 samples and I have used the -hets option. However in the end, by analyzing my vcf file I only find homozygotes.

I ran the following command:

retroseq.pl -call -depth 10000 -reads 3 -q 30 -hets -bam $path_bam_file -input candidats.tab -ref /data/UMD3_1.fa -filter /data/types.tab -output sample_HET.vcf

> 1/1:20:8:0:4:3 (in my vcf file).

In my other samples I also find only homozygous calls such as 1/1, and no heterozygotes.
Is this normal?

Best regards,

Milad
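
To confirm across all samples that no heterozygous calls are present, the genotypes can be tallied directly from the VCF. In VCF, GT is the first key of the FORMAT column whenever it is present, so the genotype is the text before the first colon of each sample field. A minimal Python sketch; the other FORMAT keys in the example are placeholders, not RetroSeq's real keys:

```python
from collections import Counter
from typing import Iterable


def count_genotypes(vcf_lines: Iterable[str]) -> Counter:
    """Tally the GT value of every sample column in a VCF."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # header lines
        fields = line.rstrip("\r\n").split("\t")
        # Need at least FORMAT (column 9) and one sample (column 10).
        if len(fields) < 10 or fields[8].split(":")[0] != "GT":
            continue
        for sample in fields[9:]:
            counts[sample.split(":")[0]] += 1
    return counts


if __name__ == "__main__":
    # Made-up records; X1..X5 stand in for the unknown FORMAT keys.
    records = [
        "##fileformat=VCFv4.1",
        "chr1\t100\t.\tA\t<INS>\t20\tPASS\t.\tGT:X1:X2:X3:X4:X5\t1/1:20:8:0:4:3",
        "chr1\t900\t.\tC\t<INS>\t20\tPASS\t.\tGT:X1:X2:X3:X4:X5\t0/1:20:8:0:4:3",
    ]
    print(count_genotypes(records))
```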

align

Hi,
In the discovery phase, I don't really understand the difference between using the align option and not using it. I tried both and saw no difference. Can you explain, please?

thanks,

Compatibility Checks for samtools

I have two versions of samtools currently installed, one that is compatible and one that is not. I have put only the correct version on my path and aliased 'samtools' to point to the correct version, but when running Retroseq.pl, it terminates, claiming I am using an incompatible version of samtools.

Is there any way to allow Retroseq to check for the correct version of samtools and run without uninstalling the newer versions of it?

Thanks so much!
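
One thing to check here: a shell alias applies only to the interactive shell and is not seen by programs that a script launches, so the version that matters is the first `samtools` found on the PATH. The Python sketch below reports what a launched program would find, assuming that samtools run without arguments prints a usage text containing a "Version:" line; the helper names are hypothetical:

```python
import re
import shutil
import subprocess
from typing import Optional

_VERSION_LINE = re.compile(r"^Version:\s*(\S+)", re.MULTILINE)


def parse_samtools_version(usage_text: str) -> Optional[str]:
    """Extract the token after 'Version:' from samtools usage text."""
    match = _VERSION_LINE.search(usage_text)
    return match.group(1) if match else None


def samtools_version_on_path() -> Optional[str]:
    """Version of the first samtools on PATH, or None if unavailable."""
    exe = shutil.which("samtools")
    if exe is None:
        return None
    # samtools exits non-zero when run without arguments, so the
    # return code is deliberately not checked.
    result = subprocess.run([exe], capture_output=True, text=True)
    return parse_samtools_version(result.stderr + result.stdout)


if __name__ == "__main__":
    print(shutil.which("samtools"), samtools_version_on_path())
```

If the printed path is the newer installation, putting the directory of the compatible version earlier in PATH is one way to make it the version that gets picked up.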

FATAL ERROR : Unrecognised symbol

Hi everybody!

I have an issue with RetroSeq :

** FATAL ERROR **: Unrecognised symbol 'x' (ascii:120) file:[17846.allrefs.fasta] seq:[Endogenous_retrovirus] pos:[204]
exiting ...

I checked the FASTA file that contains the TE sequences and found that it does contain some characters such as x and k.
How can I deal with this?

Thank you for your help!
Lou
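
One possible approach is to replace unexpected symbols in the sequence lines with N before running RetroSeq. The Python sketch below treats anything outside A, C, G, T and N as unknown. That is a conservative assumption: the error only names 'x', and 'k' is a valid IUPAC ambiguity code (G or T) that the aligner may well accept. The function name is hypothetical:

```python
from typing import Iterable, Iterator

# Assumption: the symbols the downstream aligner is certain to accept.
DEFAULT_ALLOWED = frozenset("ACGTNacgtn")


def mask_unknown_bases(
    fasta_lines: Iterable[str], allowed: frozenset = DEFAULT_ALLOWED
) -> Iterator[str]:
    """Yield FASTA lines with disallowed sequence symbols replaced by 'N'.
    Header lines (starting with '>') are passed through unchanged."""
    for line in fasta_lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            yield line
        else:
            yield "".join(ch if ch in allowed else "N" for ch in line)


if __name__ == "__main__":
    example = [">Endogenous_retrovirus", "ACGTxkACGT"]
    print("\n".join(mask_unknown_bases(example)))
```

Masking loses the information carried by ambiguity codes, so it may be worth widening `allowed` if only 'x' turns out to be rejected.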

Cant find BAM input file or BAM index file

When I run step 2 of RetroSeq, the software tells me: Cant find BAM input file or BAM index file
Here are my full commands:
step 1: retroseq.pl -discover -bam MD1_INDEL_realigned.bam -output MD1_candidates.tab -eref eref.tab -align #no error message
step2: retroseq.pl -call -bam MD1_INDEL_realigned.bam -input MD1_candidates.tab -ref PK_ref.fa -output Retroswq_MD1.vcf
Here is the error message:
RetroSeq: A tool for discovery of transposable elements from short read alignments

Version: 1.5
Author: Thomas Keane ([email protected])

Cant find BAM input file or BAM index file: MD1_INDEL_realigned.bam
##############
By the way, I'm using the samtools version: 0.1.19
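
Since the message mentions the index as well as the BAM, it may be worth checking that an index file sits next to the BAM. Two naming conventions are in common use: `sample.bam.bai` and `sample.bai`. Which of them RetroSeq looks for is not stated in this report, so the Python sketch below simply checks both; the helper names are hypothetical:

```python
import os
from typing import List, Optional


def candidate_index_paths(bam_path: str) -> List[str]:
    """Index file names commonly used for a given BAM file."""
    root, _ext = os.path.splitext(bam_path)
    return [bam_path + ".bai", root + ".bai"]


def find_bam_index(bam_path: str) -> Optional[str]:
    """Return the first existing index path, or None if there is none."""
    for path in candidate_index_paths(bam_path):
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    bam = "MD1_INDEL_realigned.bam"
    print("BAM present:", os.path.isfile(bam))
    print("index found:", find_bam_index(bam))
```

If no index is found, running `samtools index` on the sorted BAM creates one.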

retroseq.pl line 158.rence file

Hi all,
I'm trying to launch RetroSeq on Ubuntu.

When I run the command line:

perl retroseq.pl -discover -eref ereff.tab -bam file_sorted.bam -output mysample.candidates.tab

I get this message:
RetroSeq: A tool for discovery and genotyping of transposable elements from short read alignments

Version: 1.41
Author: Thomas Keane ([email protected])

Reading -eref file: ereff.tab
at retroseq.pl line 158.rence file: rider.fasta

Can you help me please?

file input.zip
