timbitz / aligater
Software suite for detection/analysis of chimeric RNAs from LIGR-seq data
License: MIT License
Hi again.
I tried to run the following step:
foregroundfile="sample_rep1-%.final.lig"
backgroundfile="sample_rep1-%.expression.lig"
nameParam="--fore $foregroundfile --back $backgroundfile --nd xlink,unx"
totalAMTreads=142355952
totalMOCKreads=160164420
normParam="--nc $totalAMTreads,$totalMOCKreads"
aligater stats $nameParam $normParam > output.pvl
But after a point I get the following error message:
[aligater stats]: Loading Packages..
Loading Background sample_rep1-xlink.expression.lig..
Processing probability distribution..
Loading Foreground sample_rep1-xlink.final.lig..
ERROR: LoadError: BoundsError: attempt to access 19-element Array{SubString{ASCIIString},1}:
"S"
"A:A"
"1:2"
"DPP9:DPP9"
"ENSG00000142002.12:ENSG00000142002.12"
"ENST00000598041.1:ENST00000598041.1"
"protein-coding:protein-coding"
"tRNA-Val-GTG_tRNA:tRNA-Val-GTG_tRNA"
"tRNA:tRNA"
"D00535:178:H3VVCBCXY:1:1104:4087:1975"
"gggggaaacaccACGCGAAAGGTCCCCGGTTCGAAACCGGGCGGAAACAC_CACGCGAAAGGTCCCCGGTTCGAAACCGGGCGGAAACAccacgcgaaaggt"
"96"
"1,1>1[1]>36"
"38,2"
"72,36"
"38,38"
"chr19:4724647:-,chr19:4724683:-"
"tRNA:tRNA"
"tRNA-Val-GTG_tRNA:tRNA-Val-GTG_tRNA"
at index [25]
in loadInteractionFile at Aligater/bin/stats.jl:160
in main at Aligater/bin/stats.jl:414
in include at ./boot.jl:261
in include_from_node1 at ./loading.jl:333
in process_options at ./client.jl:280
in _start at ./client.jl:378
while loading Aligater/bin/stats.jl, in expression starting on line 437
It seems that it tries to access column 25, although the files have only 19 columns. Any ideas? I am not familiar with Julia.
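One way to narrow this down is to count the fields per record in the foreground and background files. The following is a minimal sketch, assuming the .lig files are tab-delimited (an assumption; the post does not state the delimiter). The function names are hypothetical and not part of aligater, and the threshold of 25 is taken only from the index in the error message.

```python
from collections import Counter


def field_count_histogram(lines, sep="\t"):
    """Map each observed number of fields to the number of records having it."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        counts[len(line.split(sep))] += 1
    return counts


def short_records(lines, min_fields, sep="\t"):
    """Return (line_number, n_fields) for records with fewer than min_fields fields."""
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue
        n_fields = len(line.split(sep))
        if n_fields < min_fields:
            bad.append((number, n_fields))
    return bad


# Small in-memory example: one 3-field record and one 5-field record.
example = ["a\tb\tc\n", "a\tb\tc\td\te\n"]
print(field_count_histogram(example))
print(short_records(example, min_fields=5))
```

On a real file, something like `with open("sample_rep1-xlink.final.lig") as fh: print(field_count_histogram(fh))` would show whether every record has 19 fields or whether the file mixes record lengths.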
The reclassification step (aligater reclass --uniq --save database.jlz < lig/*blast.filtered.lig) works only with Julia 0.4. If you use 0.5 it will ask for 0.4. So the documentation should change to ask specifically for 0.4.
Hello,
As I'm working on something similar, aligater caught my attention. I was trying to reproduce the analysis with aligater, but couldn't get it to work with the default parameters. The hgSQLBasics module seems to be missing, as it is in the intRagenic repository. Or what am I missing? Should just piping the results from aligater align be sufficient? Also, I can't get any usage information on the specific aligater steps with -h, e.g., aligater detect -h, but I guess that's still to come.
Cheers.
Richard
When I run the following command to generate the database:
aligater reclass --uniq --save database.jlz < lig/*blast.filtered.lig
I get a bash error: "ambiguous redirect."
Then I specified the files one by one like the following:
aligater reclass --uniq --save database.jlz < lig/file1.blast.filtered.lig lig/file2.blast.filtered.lig
But now I get another error message:
"too many arguments"
Finally I concatenated the files into one and ran aligater reclass (which worked), but I am not sure if this is a proper way of doing it.
aligater reclass --uniq --save database.jlz < <(cat lig/*blast.filtered.lig)
So am I allowed to concatenate all the files and then run aligater reclass?
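For context, a shell input redirection (`<`) takes exactly one file, so a glob that expands to several files is reported as an ambiguous redirect, and any extra file names are passed to the program as arguments. Concatenating the files into one stream avoids both problems. The sketch below shows that concatenation in Python; it is an illustration only, the function name is hypothetical, and whether aligater reclass treats a concatenated stream the same as separate files is a question for the author.

```python
import glob
import sys


def concat_lines(streams):
    """Yield every line of every stream in order, like `cat`.

    Unlike `cat`, a missing final newline is added, so the last record
    of one file is never fused with the first record of the next.
    """
    for stream in streams:
        for line in stream:
            yield line if line.endswith("\n") else line + "\n"


def main(pattern="lig/*blast.filtered.lig"):
    # Sorted so the order is reproducible; the pattern is the one from the post.
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            sys.stdout.writelines(concat_lines([handle]))


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as concat_lig.py, the output could be piped into the same command: `python concat_lig.py | aligater reclass --uniq --save database.jlz`.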
Hi
In the documentation you say to download the following BLAST databases:
"nr, human_genomic, other_genomic"
I got an error at the post step, and after searching the code it seems that the database one should use is nt and not nr. More specifically, line 44 of Aligater/bin/post.pl reads: my $blastDb = "human_genomic,other_genomic,nt"
Based on this, should I use the nt database and not nr?
Thank you in advance
Foivos
In the documentation you mention using the "blast databases ftp://ftp.ncbi.nlm.nih.gov/blast/db/: nr, human_genomic, other_genomic"
Does this mean downloading all the files starting with nr*, human_genomic* and other_genomic* from the FTP directory? Are the rest of the files not required?
Thank you in advance
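To illustrate what "all the files starting with" a database name could mean in practice, here is a small sketch that filters a directory listing by database prefix. The listing is hypothetical, not a real snapshot of the FTP directory, and the function name is invented for this example. The database names are taken from the `$blastDb` string quoted from post.pl in the earlier post; which set is actually correct is for the author to confirm.

```python
def select_db_files(listing, databases):
    """Keep files named '<db>.<rest>' for one of the requested databases.

    Matching on '<db>.' rather than '<db>' avoids picking up a different
    database whose name merely starts with the same letters.
    """
    wanted = tuple(db + "." for db in databases)
    return [name for name in listing if name.startswith(wanted)]


# Hypothetical listing for illustration only.
listing = [
    "nt.00.tar.gz",
    "nt.00.tar.gz.md5",
    "nr.00.tar.gz",
    "human_genomic.00.tar.gz",
    "other_genomic.01.tar.gz",
    "swissprot.tar.gz",
    "nt_prok.00.tar.gz",
]

# Database names as they appear in the quoted post.pl line.
databases = "human_genomic,other_genomic,nt".split(",")
print(select_db_files(listing, databases))
```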
Dear Dr Tim,
I am trying to use LIGR-seq analysis to find chimeric reads in cyanobacterial species. However, I am unable to install the packages from the Julia v0.4 REPL as suggested in the README document.
julia> pkgs
("ArgParse","Match","Distributions","GZip")
julia> map(Pkg.add, pkgs)
INFO: Initializing package repository /home/jssprakash/.julia/v0.4
INFO: Cloning METADATA from git://github.com/JuliaLang/METADATA.jl
fatal: unable to connect to github.com:
github.com[0: 20.207.73.82]: errno=Connection refused
ERROR: failed process: Process(git clone -q -b metadata-v2 git://github.com/JuliaLang/METADATA.jl METADATA
, ProcessExited(128)) [128]
in run at ./process.jl:531
in anonymous at pkg/dir.jl:52
in cd at ./file.jl:22
in init at pkg/dir.jl:50
in cd at pkg/dir.jl:28
in add at pkg.jl:23
in map at tuple.jl:63
I have downloaded the most recent version of Julia, but it is not compatible with the aligater suite. More importantly, when I run aligater -h, I get "aligater: command not found".
(base) jssprakash@jssprakash-ProLiant-ML350-Gen9:/Aligater$ ./aligater/Aligater$ aligater -h
Usage: aligater [sub-command] [-h]
- align : align short-reads to transcriptome
- detect : detect chimeric reads by recursive chaining of transcriptome SAM blocks
- post : post-process LIG format files with BLAST or RACTIP
- reclass : create 2D k-mer db and reclassify chimeras using heirarchical type
- stats : compare crosslinked to mock-treated samples using multinomial statistics
- table : compile interaction results into tabular format
at ./aligater line 80.
(base) jssprakash@jssprakash-ProLiant-ML350-Gen9:
aligater: command not found
I am new to Julia and I am unable to run the aligater suite. I would be really obliged if you could help me with these questions. I have separately generated the SAM file by running bowtie with the parameters mentioned in the paper. My reference genome is a modified one, generated separately by adding the non-coding RNA details that are not available in public databases. Now I need to run the detect step and finally generate the table to get the chimeric reads.
Hello again. I have a new question, this time related to trimming of the reads. It was mentioned to me that, due to the sample preparation (I think), some Gs are introduced at the beginning of the reads. Indeed, when I checked, a large portion of the reads look like this. The question is: should I trim a fixed number of Gs, or should every leading G be trimmed?
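The two options in the question can be written down concretely. This is a minimal sketch with a hypothetical function name, not a recommendation of either strategy: passing `max_trim=None` removes the whole leading run of Gs, while `max_trim=k` removes at most k of them. The quality string is trimmed by the same amount so that it stays aligned with the sequence.

```python
def trim_leading_g(seq, qual, max_trim=None):
    """Remove leading G bases from a read and keep the quality string in sync.

    max_trim=None removes the entire leading run of Gs;
    max_trim=k removes at most k leading Gs.
    """
    run = len(seq) - len(seq.lstrip("Gg"))  # length of the leading G run
    n = run if max_trim is None else min(run, max_trim)
    return seq[n:], qual[n:]


# Trim the full leading run, then at most two Gs, from the same read.
print(trim_leading_g("GGGACGT", "IIIIIII"))
print(trim_leading_g("GGGACGT", "IIIIIII", max_trim=2))
```

Note that trimming the full run also removes Gs that genuinely belong to the transcript, whereas a fixed limit can leave some added Gs behind; which trade-off suits LIGR-seq libraries is not stated in this thread.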
Hello Tim,
On a recent run with the original LIGR-seq dataset I'm getting the following messages when calling Aligater detect:
Use of uninitialized value $name in pattern match (m//) at /data/richard/LIGR-seq/Aligater/bin/detect.pl line 572, <> line 1087686070.
Use of uninitialized value $name in pattern match (m//) at /data/richard/LIGR-seq/Aligater/bin/detect.pl line 571, <> line 1087686070.
Use of uninitialized value $name in pattern match (m//) at /data/richard/LIGR-seq/Aligater/bin/detect.pl line 572, <> line 1087686070.
Use of uninitialized value $biotype in string eq at /data/richard/LIGR-seq/Aligater/bin/detect.pl line 560, <> line 1087686070.
So I called ./Aligater/aligater detect -p 40 --gtf GCF_000001405.39_GRCh38.p13_all.gtf --rmsk repeatMasker.hg19.slim.bed --gfam Homo_sapiens.gene_families.txt < input.sam > outputlig
So the lines refer to:
# rank biotype by..
sub biorank {
    my $biotype = shift;
    return 6 if ($biotype eq "misc-RNA");
    return 5 if ($biotype =~ /RNA/);
    return 3 if ($biotype =~ /intron/);
    return 2 if ($biotype !~ /protein-coding/);
    return 4;
}
So this is just for sorting and can be neglected?
Thanks
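As background for this question: in Perl, an undefined value used in `eq` or in a pattern match is treated as the empty string, and a warning is emitted when warnings are enabled. Under that rule, an undefined `$biotype` passed to the quoted sub falls through the first three tests and reaches `return 2`. The sketch below is a Python rendering of the quoted sub with that behaviour made explicit. It says nothing about why the value is undefined in detect.pl, and the `$name` warnings refer to lines that are not quoted here.

```python
import re


def biorank(biotype):
    """Python rendering of the quoted Perl biorank sub."""
    if biotype is None:
        biotype = ""  # Perl treats undef as "" in eq and m//, with a warning
    if biotype == "misc-RNA":
        return 6
    if re.search(r"RNA", biotype):
        return 5
    if re.search(r"intron", biotype):
        return 3
    if not re.search(r"protein-coding", biotype):
        return 2
    return 4


# Rank a missing biotype alongside two known ones.
for value in (None, "tRNA", "protein-coding"):
    print(repr(value), biorank(value))
```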