timbitz / aligater

Software suite for detection/analysis of chimeric RNAs from LIGR-seq data

License: MIT License

Perl 65.21% Julia 34.79%

aligater's Issues

aligater stats error

Hi again.

I tried to run the following step:

foregroundfile="sample_rep1-%.final.lig"
backgroundfile="sample_rep1-%.expression.lig"

nameParam="--fore $foregroundfile --back $backgroundfile --nd xlink,unx"

totalAMTreads=142355952
totalMOCKreads=160164420

normParam="--nc $totalAMTreads,$totalMOCKreads"

aligater stats $nameParam $normParam > output.pvl

But after a point I get the following error message:

[aligater stats]: Loading Packages..
Loading Background sample_rep1-xlink.expression.lig..
Processing probability distribution..
Loading Foreground sample_rep1-xlink.final.lig..
ERROR: LoadError: BoundsError: attempt to access 19-element Array{SubString{ASCIIString},1}:
"S"
"A:A"
"1:2"
"DPP9:DPP9"
"ENSG00000142002.12:ENSG00000142002.12"
"ENST00000598041.1:ENST00000598041.1"
"protein-coding:protein-coding"
"tRNA-Val-GTG_tRNA:tRNA-Val-GTG_tRNA"
"tRNA:tRNA"
"D00535:178:H3VVCBCXY:1:1104:4087:1975"
"gggggaaacaccACGCGAAAGGTCCCCGGTTCGAAACCGGGCGGAAACAC_CACGCGAAAGGTCCCCGGTTCGAAACCGGGCGGAAACAccacgcgaaaggt"
"96"
"1,1>1[1]>36"
"38,2"
"72,36"
"38,38"
"chr19:4724647:-,chr19:4724683:-"
"tRNA:tRNA"
"tRNA-Val-GTG_tRNA:tRNA-Val-GTG_tRNA"
at index [25]
in loadInteractionFile at Aligater/bin/stats.jl:160
in main at Aligater/bin/stats.jl:414
in include at ./boot.jl:261
in include_from_node1 at ./loading.jl:333
in process_options at ./client.jl:280
in _start at ./client.jl:378
while loading Aligater/bin/stats.jl, in expression starting on line 437

It seems that it tries to access column 25, although the files have only 19 columns. Any ideas? I am not familiar with Julia.
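The traceback reports an access at index 25 on a row that was split into 19 fields, so one quick check is whether every line of the foreground and background files has the same number of columns. A minimal Python sketch of that check follows. It assumes the .lig files are tab-delimited with one record per line (the delimiter is not stated in this thread), and the helper names are hypothetical, not part of aligater.

```python
from collections import Counter


def field_count_histogram(path, delimiter="\t"):
    """Return a Counter mapping 'number of fields' to 'number of lines'.

    Assumption: plain text, one record per line, tab-separated fields.
    Change `delimiter` if the files use something else.
    """
    counts = Counter()
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            record = line.rstrip("\n")
            if not record:
                continue  # blank lines are ignored
            counts[len(record.split(delimiter))] += 1
    return counts


def lines_shorter_than(path, min_fields, delimiter="\t", limit=5):
    """Return up to `limit` (line_number, n_fields) pairs with too few fields."""
    short = []
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            record = line.rstrip("\n")
            if not record:
                continue
            n_fields = len(record.split(delimiter))
            if n_fields < min_fields:
                short.append((number, n_fields))
                if len(short) >= limit:
                    break
    return short
```

Running field_count_histogram on both sample_rep1-xlink.expression.lig and sample_rep1-xlink.final.lig shows whether the two files share a column layout, and lines_shorter_than(path, 25) lists the first rows that would trigger the out-of-bounds access.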

Julia Version

The reclassification step (aligater reclass --uniq --save database.jlz < lig/*blast.filtered.lig) works only with Julia 0.4. If you use 0.5 it will ask for 0.4. So the documentation should change to ask specifically for 0.4.

hgSQLBasics missing?

Hello,
I'm working on something similar, so aligater caught my attention. I was trying to reproduce the analysis with aligater, but couldn't get it to work with the default parameters. The hgSQLBasics module seems to be missing, as it is in the intRagenic repository. Or what am I missing? Should just piping the results from aligater align be sufficient? Also, I can't get any usage information on the specific aligater steps with -h, e.g., aligater detect -h, but I guess that's still to come.

Cheers.
Richard

Error on aligater reclass (generate database)

When I run the following command to generate the database:
aligater reclass --uniq --save database.jlz < lig/*blast.filtered.lig
I get a bash error: "ambiguous redirect."

Then I specified the files one by one like the following:
aligater reclass --uniq --save database.jlz < lig/file1.blast.filtered.lig lig/file2.blast.filtered.lig

But now I get another error message:
"too many arguments"

Finally I concatenated the files into one and ran aligater reclass (which worked), but I am not sure if this is a proper way of doing it.
aligater reclass --uniq --save database.jlz < <(cat lig/*blast.filtered.lig)

So am I allowed to concatenate all the files and then run aligater reclass?
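For context on the two shell errors: bash's < operator takes exactly one file, so a glob that expands to several files is reported as an ambiguous redirect, and any extra file name after the redirect is handed to the program as an ordinary argument. The sketch below does in Python what the working command does: it concatenates every matching file and supplies the result on standard input. It is only an illustration. The function name is hypothetical, and this thread does not confirm whether pooling the files is what aligater reclass expects.

```python
import glob
import shutil
import subprocess
import tempfile


def run_with_concatenated_stdin(pattern, command):
    """Concatenate all files matching `pattern` and feed them to `command` on stdin.

    Returns the command's standard output as bytes. The files are joined in
    sorted name order, which mirrors how a shell expands a glob for `cat`.
    """
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    with tempfile.TemporaryFile() as merged:
        for path in paths:
            with open(path, "rb") as source:
                shutil.copyfileobj(source, merged)
        merged.seek(0)  # rewind so the child process reads from the start
        result = subprocess.run(
            command, stdin=merged, stdout=subprocess.PIPE, check=True
        )
    return result.stdout


# Hypothetical usage, reusing the command line quoted in this issue:
# run_with_concatenated_stdin(
#     "lig/*blast.filtered.lig",
#     ["aligater", "reclass", "--uniq", "--save", "database.jlz"],
# )
```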

Blast problem

In the documentation you say to download the following BLAST databases:
"nr, human_genomic, other_genomic"

I got an error at the post step, and after searching the code it seems that the database one should use is nt and not nr. More specifically, line 44 of Aligater/bin/post.pl reads: my $blastDb = "human_genomic,other_genomic,nt"

Based on this, should I use the nt database rather than nr?

Thank you in advance
Foivos

Question regarding blast databases

In the documentation you mention using the "blast databases ftp://ftp.ncbi.nlm.nih.gov/blast/db/: nr, human_genomic, other_genomic"

Does this mean downloading all the files starting with nr*, human_genomic*, and other_genomic* from the FTP directory? Are the rest of the files not required?

Thank you in advance
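One way to pick out only the relevant files from a directory listing is to match on the database name. The sketch below is a minimal Python illustration. It assumes the archives are named <database>.<volume>.tar.gz, possibly with .md5 companion files, which should be checked against the actual FTP listing. The database names in the usage comment come from the post.pl line quoted in the "Blast problem" issue, and the function name is hypothetical.

```python
import re


def select_db_archives(filenames, db_names):
    """Return the archive names that belong to the requested BLAST databases.

    Assumed naming scheme: '<db>.<digits>.tar.gz' or '<db>.tar.gz'.
    Checksum files and other databases with a similar prefix are excluded.
    """
    selected = []
    for db in db_names:
        pattern = re.compile(r"^" + re.escape(db) + r"(\.\d+)?\.tar\.gz$")
        selected.extend(sorted(name for name in filenames if pattern.match(name)))
    return selected


# Hypothetical usage with a listing fetched separately:
# wanted = select_db_archives(listing, ["human_genomic", "other_genomic", "nt"])
```

Anchoring the pattern on the full database name keeps a request for nt from also matching unrelated files that merely start with the same letters.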

Queries regarding execution of Aligater suit for LIGR seq Analysis

Dear Dr Tim,
I am trying to use LIGR seq analysis to find out chimeric reads in cyanobacterial species. In this regard, I am unable to install packages for Julia v0.4 REPL as suggested in the README document.

julia> pkgs
("ArgParse","Match","Distributions","GZip")

julia> map(Pkg.add, pkgs)
INFO: Initializing package repository /home/jssprakash/.julia/v0.4
INFO: Cloning METADATA from git://github.com/JuliaLang/METADATA.jl
fatal: unable to connect to github.com:
github.com[0: 20.207.73.82]: errno=Connection refused

ERROR: failed process: Process(git clone -q -b metadata-v2 git://github.com/JuliaLang/METADATA.jl METADATA, ProcessExited(128)) [128]
in run at ./process.jl:531
in anonymous at pkg/dir.jl:52
in cd at ./file.jl:22
in init at pkg/dir.jl:50
in cd at pkg/dir.jl:28
in add at pkg.jl:23
in map at tuple.jl:63

I have downloaded the most recent version, but it is not compatible with the aligater suite. Also, when I run aligater -h, it reports that the aligater command is not found.

(base) jssprakash@jssprakash-ProLiant-ML350-Gen9:/Aligater$ ./aligater
Usage: aligater [sub-command] [-h]
- align : align short-reads to transcriptome
- detect : detect chimeric reads by recursive chaining of transcriptome SAM blocks
- post : post-process LIG format files with BLAST or RACTIP
- reclass : create 2D k-mer db and reclassify chimeras using heirarchical type
- stats : compare crosslinked to mock-treated samples using multinomial statistics
- table : compile interaction results into tabular format
at ./aligater line 80.
(base) jssprakash@jssprakash-ProLiant-ML350-Gen9:/Aligater$ aligater -h
aligater: command not found

I am new to Julia and unable to run the aligater suite. I would be really obliged if you could help me with these questions. I generated the SAM file separately by running bowtie with the parameters mentioned in the paper; my reference genome is a modified one, built by adding non-coding RNA details that are not available in public databases. Now I need to run detection with -detect and finally generate the table to get the chimeric reads.
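The transcript above shows ./aligater running while plain aligater is not found, which is how a shell behaves when the script's directory is not listed in PATH. A small Python sketch of that lookup, using the standard library's shutil.which, is given below. The helper name and the directory in the usage comment are placeholders.

```python
import os
import shutil


def locate_command(name, extra_dirs=()):
    """Return the full path of `name` if it is executable in extra_dirs or on PATH.

    Returns None when the command cannot be found, which corresponds to the
    shell's 'command not found' message.
    """
    search_path = os.pathsep.join([*extra_dirs, os.environ.get("PATH", "")])
    return shutil.which(name, path=search_path)


# Hypothetical usage: check PATH alone, then with the checkout directory added.
# locate_command("aligater")
# locate_command("aligater", ["/path/to/Aligater"])
```

If the first call returns None and the second returns a path, adding that directory to PATH (or continuing to call the script as ./aligater) is enough for the command itself to be found.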

Gs at the beginning of the reads

Hello again. I have a new question, this time about trimming the reads. I was told that, due to the sample preparation (I think), some Gs are introduced at the beginning of the reads. Indeed, when I checked, a large portion of the reads look like this. The question is: should I trim a fixed number of Gs, or should every leading G be trimmed?
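The two options in the question can be expressed with one function that takes an optional cap. This is a minimal Python sketch with hypothetical names. It does not say which policy suits LIGR-seq libraries, since that is not settled in this thread.

```python
def trim_leading_g(sequence, quality=None, max_trim=None):
    """Strip leading G bases from a read and keep the quality string in step.

    max_trim=None removes every leading G; an integer removes at most that many.
    Both upper- and lower-case G are treated as G.
    """
    limit = len(sequence) if max_trim is None else min(max_trim, len(sequence))
    n = 0
    while n < limit and sequence[n] in "Gg":
        n += 1
    trimmed_quality = None if quality is None else quality[n:]
    return sequence[n:], trimmed_quality


# Trim everything that starts with G:
#   trim_leading_g("GGGACGT")              -> ("ACGT", None)
# Trim at most two Gs, keeping qualities aligned:
#   trim_leading_g("GGGACGT", "IIIIIII", 2) -> ("GACGT", "IIIII")
```

Capping the trim protects reads whose genuine sequence happens to begin with G, at the cost of leaving longer artificial G runs partly in place.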

Use of uninitialized value

Hello Tim,

on a recent run with the original LIGR-seq dataset I'm getting the following messages when calling Aligater detect:

Use of uninitialized value $name in pattern match (m//) at /data/richard/LIGR-seq/Aligater/bin/detect.pl line 572, <> line 1087686070.
Use of uninitialized value $name in pattern match (m//) at /data/richard/LIGR-seq/Aligater/bin/detect.pl line 571, <> line 1087686070.
Use of uninitialized value $name in pattern match (m//) at /data/richard/LIGR-seq/Aligater/bin/detect.pl line 572, <> line 1087686070.
Use of uninitialized value $biotype in string eq at /data/richard/LIGR-seq/Aligater/bin/detect.pl line 560, <> line 1087686070.

So I called ./Aligater/aligater detect -p 40 --gtf GCF_000001405.39_GRCh38.p13_all.gtf --rmsk repeatMasker.hg19.slim.bed --gfam Homo_sapiens.gene_families.txt < input.sam > outputlig

So the lines refer to:

# rank biotype by..
sub biorank {
  my $biotype = shift;
  return 6 if ($biotype eq "misc-RNA");
  return 5 if ($biotype =~ /RNA/);
  return 3 if ($biotype =~ /intron/);
  return 2 if ($biotype !~ /protein-coding/);
  return 4;
}

So this is just for sorting and can be neglected?
Thanks
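For reference, Perl treats an undefined scalar as the empty string in string comparisons and pattern matches, and emits the "Use of uninitialized value" warning when warnings are enabled. The sketch below transliterates the quoted sub into Python to show which rank such a record would receive. It is an illustration only, not code from aligater, and it says nothing about why $biotype or $name is undefined, since the lines around 571 and 572 are not quoted here.

```python
import re


def biorank(biotype):
    """Python transliteration of the quoted Perl sub, for illustration only."""
    if biotype is None:
        # Perl evaluates undef as "" in string context (with a warning).
        biotype = ""
    if biotype == "misc-RNA":
        return 6
    if re.search(r"RNA", biotype):
        return 5
    if re.search(r"intron", biotype):
        return 3
    if not re.search(r"protein-coding", biotype):
        return 2
    return 4


# An undefined biotype matches none of the first three tests and is not
# "protein-coding", so it falls through to rank 2.
print(biorank(None))
```

So a record with no biotype is still ranked, just at the lowest value. Whether that is harmless depends on why the annotation lookup returned nothing for that read.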
