aligater's People

Contributors

timbitz

Forkers

xflicsu

aligater's Issues

aligater stats error

Hi again.

I tried to run the following step:

foregroundfile="sample_rep1-%.final.lig"
backgroundfile="sample_rep1-%.expression.lig"

nameParam="--fore $foregroundfile --back $backgroundfile --nd xlink,unx"

totalAMTreads=142355952
totalMOCKreads=160164420

normParam="--nc $totalAMTreads,$totalMOCKreads"

aligater stats $nameParam $normParam > output.pvl

But after a point I get the following error message:

[aligater stats]: Loading Packages..
Loading Background sample_rep1-xlink.expression.lig..
Processing probability distribution..
Loading Foreground sample_rep1-xlink.final.lig..
ERROR: LoadError: BoundsError: attempt to access 19-element Array{SubString{ASCIIString},1}:
"S"
"A:A"
"1:2"
"DPP9:DPP9"
"ENSG00000142002.12:ENSG00000142002.12"
"ENST00000598041.1:ENST00000598041.1"
"protein-coding:protein-coding"
"tRNA-Val-GTG_tRNA:tRNA-Val-GTG_tRNA"
"tRNA:tRNA"
"D00535:178:H3VVCBCXY:1:1104:4087:1975"
"gggggaaacaccACGCGAAAGGTCCCCGGTTCGAAACCGGGCGGAAACAC_CACGCGAAAGGTCCCCGGTTCGAAACCGGGCGGAAACAccacgcgaaaggt"
"96"
"1,1>1[1]>36"
"38,2"
"72,36"
"38,38"
"chr19:4724647:-,chr19:4724683:-"
"tRNA:tRNA"
"tRNA-Val-GTG_tRNA:tRNA-Val-GTG_tRNA"
at index [25]
in loadInteractionFile at Aligater/bin/stats.jl:160
in main at Aligater/bin/stats.jl:414
in include at ./boot.jl:261
in include_from_node1 at ./loading.jl:333
in process_options at ./client.jl:280
in _start at ./client.jl:378
while loading Aligater/bin/stats.jl, in expression starting on line 437

It seems to be trying to access column 25, although the files have only 19 columns. Any ideas? I am not familiar with Julia.
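
The trace above shows an access at index 25 on a 19-element array, so one way to investigate is to tabulate how many fields each line of the .lig file actually has. The following is a minimal sketch, not part of aligater: `count_fields` is a hypothetical helper, and it assumes the .lig files are tab-delimited.

```shell
# Hypothetical helper: print "<number of fields> <number of lines>" pairs,
# one pair per distinct field count, sorted by field count.
# Assumes the file is tab-delimited; change -F if it is not.
count_fields() {
  awk -F'\t' '{ n[NF]++ } END { for (k in n) print k, n[k] }' "$1" | sort -n
}

# Usage:
# count_fields sample_rep1-xlink.final.lig
```

If every line reports 19 fields, the file and the version of stats.jl in use probably disagree about the expected format; if the counts vary, the file itself may be truncated or mixed.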

Julia Version

The reclassification step (aligater reclass --uniq --save database.jlz < lig/*blast.filtered.lig) works only with Julia 0.4. If you use 0.5, it asks for 0.4, so the documentation should specifically require 0.4.

hgSQLBasics missing?

Hello,
I'm working on something similar, so aligater caught my attention. I was trying to reproduce the analysis with aligater, but couldn't get it to work with the default parameters. The hgSQLBasics module seems to be missing, as it is in the intRagenic repository. Or what am I missing? Should just piping the results from aligater align be sufficient? Also, I can't get any usage information on the specific aligater steps with -h, e.g., aligater detect -h, but I guess that's still to come.

Cheers.
Richard

Error on aligater reclass (generate database)

When I run the following command to generate the database:
aligater reclass --uniq --save database.jlz < lig/*blast.filtered.lig
I get a bash error: "ambiguous redirect."

Then I specified the files one by one like the following:
aligater reclass --uniq --save database.jlz < lig/file1.blast.filtered.lig lig/file2.blast.filtered.lig

But now I get another error message:
"too many arguments"

Finally I concatenated the files into one and ran aligater reclass (which worked), but I am not sure if this is a proper way of doing it.
aligater reclass --uniq --save database.jlz < <(cat lig/*blast.filtered.lig)

So am I allowed to concatenate all the files and then run aligater reclass?
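
For context on the two errors above: in bash, `<` redirects standard input from exactly one file, so a glob that expands to several files produces "ambiguous redirect", and any extra file names after the redirect are passed to the program as arguments. A minimal sketch of the concatenation approach follows. `concat_lig` is a hypothetical helper, and the sketch assumes that aligater reclass reads a single stream on standard input, as the process-substitution command that worked suggests. Whether pooling all samples into one database is the intended workflow is a question for the maintainer.

```shell
# Hypothetical helper: write every file given as an argument to stdout,
# in order, so the combined stream can be piped into another command.
concat_lig() {
  cat -- "$@"
}

# Usage (assumes aligater reclass reads from standard input):
# concat_lig lig/*blast.filtered.lig | aligater reclass --uniq --save database.jlz
```

This is equivalent to the `< <(cat ...)` form; the pipe is simply the more portable spelling.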

Blast problem

Hi

In the documentation you say to download the following BLAST databases:
"nr, human_genomic, other_genomic"

I got an error at the post step, and after searching the code it seems that the database to use is nt, not nr. More specifically, line 44 of Aligater/bin/post.pl reads: my $blastDb = "human_genomic,other_genomic,nt"

Based on this, should I use the nt database rather than nr?

Thank you in advance
Foivos

Question regarding blast databases

In the documentation you say to use the "blast databases ftp://ftp.ncbi.nlm.nih.gov/blast/db/: nr, human_genomic, other_genomic"

Does this mean downloading all the files starting with nr*, human_genomic*, and other_genomic* from the FTP directory? Are the rest of the files not required?

Thank you in advance

Queries regarding execution of the Aligater suite for LIGR-seq analysis

Dear Dr Tim,
I am trying to use LIGR-seq analysis to find chimeric reads in cyanobacterial species. However, I am unable to install the packages in the Julia v0.4 REPL as suggested in the README document.

julia> pkgs
("ArgParse","Match","Distributions","GZip")

julia> map(Pkg.add, pkgs)
INFO: Initializing package repository /home/jssprakash/.julia/v0.4
INFO: Cloning METADATA from git://github.com/JuliaLang/METADATA.jl
fatal: unable to connect to github.com:
github.com[0: 20.207.73.82]: errno=Connection refused

ERROR: failed process: Process(git clone -q -b metadata-v2 git://github.com/JuliaLang/METADATA.jl METADATA, ProcessExited(128)) [128]
in run at ./process.jl:531
in anonymous at pkg/dir.jl:52
in cd at ./file.jl:22
in init at pkg/dir.jl:50
in cd at pkg/dir.jl:28
in add at pkg.jl:23
in map at tuple.jl:63
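
The "Connection refused" above comes from the `git clone` of a `git://github.com/...` URL; GitHub no longer serves the unauthenticated `git://` protocol. One possible workaround, sketched below, is to tell git to rewrite such URLs to `https://` using the standard `url.<base>.insteadOf` setting. `use_https_for_github` is a hypothetical helper name, and whether the Julia 0.4 package manager then completes the installation has not been verified here.

```shell
# Hypothetical helper: rewrite git://github.com/ URLs to https://github.com/
# for the current user. url.<base>.insteadOf is a standard git config key.
use_https_for_github() {
  git config --global url."https://github.com/".insteadOf "git://github.com/"
}

# Usage: run once, then retry the package installation in Julia.
# use_https_for_github
```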

I have downloaded the recent version, but it is not compatible with the aligater suite. More importantly, running aligater -h reports that the aligater command is not found.

(base) jssprakash@jssprakash-ProLiant-ML350-Gen9:/Aligater$ ./aligater
Usage: aligater [sub-command] [-h]
- align : align short-reads to transcriptome
- detect : detect chimeric reads by recursive chaining of transcriptome SAM blocks
- post : post-process LIG format files with BLAST or RACTIP
- reclass : create 2D k-mer db and reclassify chimeras using heirarchical type
- stats : compare crosslinked to mock-treated samples using multinomial statistics
- table : compile interaction results into tabular format
at ./aligater line 80.
(base) jssprakash@jssprakash-ProLiant-ML350-Gen9:/Aligater$ aligater -h
aligater: command not found

I am new to Julia and am unable to run the aligater suite. I would be really obliged if you could help me with these questions. I have separately generated the SAM file by running bowtie with the parameters mentioned in the paper; my reference genome is a modified one, generated separately by adding non-coding RNA details that are not available in public databases. Now I need to run aligater detect and then generate the final table to get the chimeric reads.
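
On the `command not found` message above: `./aligater` works because it names the script by path, whereas a bare `aligater` is only found if its directory is listed in `PATH`. A minimal sketch of adding the directory to `PATH` follows. `add_to_path` is a hypothetical helper, and the directory in the usage comment is an assumption; substitute wherever Aligater was cloned.

```shell
# Hypothetical helper: prepend a directory to PATH unless it is already there.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already present, nothing to do
    *) PATH="$1:$PATH" ;;
  esac
  export PATH
}

# Usage (directory is an assumption):
# add_to_path "$HOME/Aligater"
# aligater -h
```

The change lasts only for the current shell session; putting the call in a shell startup file would make it persistent.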

Gs at the beginning of the reads

Hello again. I have a new question, this time about read trimming. I was told that, due to the sample preparation (I think), some Gs are introduced at the beginning of the reads. Indeed, when I checked, a large portion of the reads start this way. The question is: should I trim a fixed number of Gs, or trim all leading Gs from every read that starts with G?
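
Either policy mentioned above is easy to try on a small sample before deciding. The sketch below is illustrative only and not part of aligater: `trim_leading_g` is a hypothetical helper that assumes plain four-line FASTQ records with uppercase bases, and it trims the quality string by the same amount as the sequence. Which policy suits this library preparation is a question for the maintainer.

```shell
# Hypothetical helper: strip leading Gs from each read of a FASTQ stream.
# With no argument (or 0) every leading G is removed; with N, at most N.
trim_leading_g() {
  awk -v max="${1:-0}" '
    NR % 4 == 2 {                      # sequence line
      cut = 0
      while (substr($0, cut + 1, 1) == "G" && (max == 0 || cut < max)) cut++
      print substr($0, cut + 1); next
    }
    NR % 4 == 0 { print substr($0, cut + 1); next }   # quality line
    { print }                          # header and "+" lines
  '
}

# Usage (file names are placeholders):
# trim_leading_g   < reads.fastq > reads.trimmed_all.fastq
# trim_leading_g 2 < reads.fastq > reads.trimmed_max2.fastq
```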

Use of uninitialized value

Hello Tim,

on a recent run with the original LIGR-seq dataset I'm getting the following messages when calling Aligater detect:

Use of uninitialized value $name in pattern match (m//) at /data/richard/LIGR-seq/Aligater/bin/detect.pl line 572, <> line 1087686070.
Use of uninitialized value $name in pattern match (m//) at /data/richard/LIGR-seq/Aligater/bin/detect.pl line 571, <> line 1087686070.
Use of uninitialized value $name in pattern match (m//) at /data/richard/LIGR-seq/Aligater/bin/detect.pl line 572, <> line 1087686070.
Use of uninitialized value $biotype in string eq at /data/richard/LIGR-seq/Aligater/bin/detect.pl line 560, <> line 1087686070.

So I called ./Aligater/aligater detect -p 40 --gtf GCF_000001405.39_GRCh38.p13_all.gtf --rmsk repeatMasker.hg19.slim.bed --gfam Homo_sapiens.gene_families.txt < input.sam > outputlig

So the lines refer to:

# rank biotype by..
sub biorank {
  my $biotype = shift;
  return 6 if ($biotype eq "misc-RNA");
  return 5 if ($biotype =~ /RNA/);
  return 3 if ($biotype =~ /intron/);
  return 2 if ($biotype !~ /protein-coding/);
  return 4;
}

So is this just for sorting, and can the warnings be ignored?
Thanks
