
rna-star's Issues

Fails to install in Cygwin

What steps will reproduce the problem?
1. Download STAR_2.3.1z15.tgz
2. Start cygwin 1.7.31 64bit version
3. Extract the archive and run make in the folder STAR_2.3.1z15

What is the expected output? What do you see instead?
I wanted to compile STAR_2.3.1z15 with make, but it produced errors, starting 
with:

Makefile:56: Depend.list: No such file or directory

What version of the product are you using? On what operating system?
STAR_2.3.1z15 with cygwin 1.7.31 64bit version


Please provide any additional information below.
Unfortunately my system language is German, but I have attached the whole 
error message in the hope that it helps. I am aware that Cygwin is not 
officially supported, but I hoped someone could help regardless, since I have 
to rely on Cygwin.

Original issue reported on code.google.com by [email protected] on 5 Aug 2014 at 2:42

core dump on mapping

STAR 2.3.0e
Linux dna 3.2.0-41-generic #66-Ubuntu SM

THIS WORKED:
* The FASTA file is a 2.8 Mbp bacterial genome.

 STAR --runMode genomeGenerate --genomeDir ref --genomeFastaFiles 6008.fna --runThreadN 32

THIS CORE DUMPED:
* Read file is FASTQ, 31bp reads, in Phred+64 format.

Core was generated by `STAR --genomeDir ref --readFilesIn 6008_mRNA.fastq 
--runThreadN 32'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000405e4b in compareSeqToGenome(char**, unsigned long long, 
unsigned long long, unsigned long long, char*, PackedArray&, unsigned long 
long, bool, bool&, Parameters*) ()
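The crashing run used Phred+64 qualities, so when triaging a crash like this it can help to confirm the quality encoding of the input first. The following is a minimal heuristic sketch, not part of STAR; `guessPhredOffset` is a hypothetical helper. Characters below ';' (ASCII 59) never occur in Phred+64 or Solexa+64 data. Characters above 'J' (ASCII 74) are typical of Phred+64, but they are not conclusive on their own.

```cpp
#include <cassert>
#include <string>

// Heuristic guess of the FASTQ quality offset from one or more quality strings.
// Returns 33 or 64, or 0 when the characters seen do not decide the question.
int guessPhredOffset(const std::string& qualities) {
    bool belowSemicolon = false;  // ASCII < 59 (';') only occurs with Phred+33
    bool aboveJ = false;          // ASCII > 74 ('J') is typical of Phred+64 (Illumina 1.3-1.7)
    for (char c : qualities) {
        if (c < ';') belowSemicolon = true;
        if (c > 'J') aboveJ = true;
    }
    if (belowSemicolon && !aboveJ) return 33;
    if (aboveJ && !belowSemicolon) return 64;
    return 0;  // undecided (e.g. only 'B'..'J' seen) or contradictory
}
```

In practice, it is best to run this over the quality lines of many reads rather than one, because a single short read may contain only ambiguous characters.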





Original issue reported on code.google.com by [email protected] on 6 Jun 2013 at 6:07

Can't install STAR on WestGrid

What steps will reproduce the problem?
1. downloaded STAR_2.3.0e.Linux_x86_64.tgz
2. uploaded to /home/username/src/
3. unzipped it following the directions at 
http://www.linuxforums.org/forum/linux-tutorials-howtos-reference-material/64958-how-install-software-linux.html
4. could not find any installation instructions
5. unable to go any further in installation process


Please provide any additional information below.

I'm completely new to Linux. I'm trying to install the newest version of STAR 
because the old version I was using (2.2.0c) has the same segmentation fault 
documented here (https://groups.google.com/forum/#!topic/rna-star/j8KomjbDfW0) 
when trying to generate a genome file from a small FASTA file. Upgrading was 
the recommended fix.

A way to make the installation work, or another way around the segmentation 
fault, would be very much appreciated!


Original issue reported on code.google.com by [email protected] on 29 Apr 2014 at 12:47

Number of mismatches reported in the SAM file is not correct

What steps will reproduce the problem?
1. STAR runs successfully; the problem is an inconsistency in the alignment.
2.
3.

What is the expected output? What do you see instead?

Below is a read pair aligned to three places: one mate aligns to chr1, and the 
other mate aligns to both chr1 and chr3 as a fusion read. I compared the reads 
with the reference sequences and found no mismatches, but the nM field reports 
two.


C185NACXX121031:7:1214:18090:96585  337 chr3    33430704    3   68S33M  chr1    569497  0   TTCTAGTAAGCCTCTACCTGCACGACAACACATAATGACCCACCAATCACATGCCTATCATATAGTAAGCCCCTAAATCATCACCAGAATGTCTATCCATG   >CADC>:DBBB@CDDBEEBDDDBDHEHECCGGFIGHF@IGGGJJJIIJIJIIHIIJJIIIBIIJJHJIJJJIIHHIGEIHGHFJJJIGHGHHHFFFFFCCC   NH:i:2  HI:i:1  AS:i:36 nM:i:2
C185NACXX121031:7:1214:18090:96585  163 chr1    569497  3   101M    =   569722  293 TAGTTATTATCGAAACCATCAGCCTACTCATTCAACCAATAGCCCTGGCCGTACGCCTAACCGCTAACATTACTGCAGGCCACCTACTCATGCACCTAATT    BCCDFFFFGHHHHJJJJJJJIIJJIJJJJIJJJJIJJJHJFIJJGIJJJIBFHIJJJJGIHHHFFDDEDCDDDDDDDCDDDDDBDDDDDDCDDDDDDDCD:   NH:i:2  HI:i:2  AS:i:168    nM:i:2
C185NACXX121031:7:1214:18090:96585  83  chr1    569722  3   68M33S  =   569497  -293    TTCTAGTAAGCCTCTACCTGCACGACAACACATAATGACCCACCAATCACATGCCTATCATATAGTAAGCCCCTAAATCATCACCAGAATGTCTATCCATG  >CADC>:DBBB@CDDBEEBDDDBDHEHECCGGFIGHF@IGGGJJJIIJIJIIHIIJJIIIBIIJJHJIJJJIIHHIGEIHGHFJJJIGHGHHHFFFFFCCC   NH:i:2  HI:i:2  AS:i:168    nM:i:2
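A report like this can be checked mechanically by walking each mate's CIGAR string and comparing the aligned bases with the reference. Below is a minimal sketch. It assumes `ref` already holds the reference sequence starting at the mate's POS; extracting that slice from the FASTA is not shown. `countMismatches` is a hypothetical helper, not part of STAR. Later STAR manuals describe nM as the number of mismatches per (paired) alignment, summed over both mates, so a per-mate count is not necessarily expected to equal it.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Count mismatching bases between one mate and the reference.
// `ref` must hold the reference sequence starting at the mate's POS.
int countMismatches(const std::string& read, const std::string& cigar,
                    const std::string& ref) {
    std::size_t readPos = 0, refPos = 0, len = 0;
    int mismatches = 0;
    for (char op : cigar) {
        if (std::isdigit(static_cast<unsigned char>(op))) {
            len = len * 10 + static_cast<std::size_t>(op - '0');  // accumulate op length
            continue;
        }
        switch (op) {
            case 'M': case '=': case 'X':  // aligned bases: compare read vs reference
                if (readPos + len > read.size() || refPos + len > ref.size())
                    throw std::out_of_range("CIGAR runs past the read or reference");
                for (std::size_t i = 0; i < len; ++i) {
                    const int r = std::toupper(static_cast<unsigned char>(read[readPos + i]));
                    const int g = std::toupper(static_cast<unsigned char>(ref[refPos + i]));
                    if (r != g) ++mismatches;
                }
                readPos += len;
                refPos += len;
                break;
            case 'I': case 'S':  // consume the read only
                readPos += len;
                break;
            case 'D': case 'N':  // consume the reference only (N = intron/gap)
                refPos += len;
                break;
            case 'H': case 'P':  // consume neither
                break;
            default:
                throw std::invalid_argument("unknown CIGAR operation");
        }
        len = 0;
    }
    return mismatches;
}
```

Soft-clipped bases (S) are skipped rather than compared, which matters for chimeric records such as the 68S33M and 68M33S mates above.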




What version of the product are you using? On what operating system?
STAR_2.3.0e_r291


Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 6 May 2014 at 4:54

Problem generating Genome files: out of memory?

Hey Alex,

I am having problems indexing hg19 on EC2 (m3.2xlarge) with STAR_2.3.0e_r291.  
Any thoughts on what is going on? The machine has 31 GB of RAM, so everything 
should fit:

cat /proc/meminfo  | grep Mem
MemTotal:       30828584 kB
MemFree:        29358328 kB

Log.out:
Aug 05 00:41:01 ..... Started STAR run
Aug 05 00:41:01 ... Starting to generate Genome files
Aug 05 00:42:23 ... starting to sort  Suffix Array. This may take a long time...
Aug 05 00:42:44 ... sorting Suffix Array chunks and saving them to disk...
Aug 05 01:07:44 ... loading chunks from disk, packing SA...
Aug 05 01:12:00 ... writing Suffix Array to disk ...
Aug 05 01:13:46 ... Finished generating suffix array
Aug 05 01:13:46 ... starting to generate Suffix Array index...
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
run.sh: line 5: 11810 Aborted                 STAR --runMode genomeGenerate 
--genomeDir .../human_g1k_v37/star/ --genomeFastaFiles 
....human_g1k_v37/human_g1k_v37.fasta --sjdbGTFfile .../human_g1k_v37/...gtf 
--runThreadN 8

Original issue reported on code.google.com by [email protected] on 5 Aug 2014 at 1:24

genomeGenerate with sjdb file for Mus musculus

I'm trying to generate a custom genome using the provided sjdb file by running 
this: 

"$star_Path --runMode genomeGenerate --genomeDir $star_genome_dir 
--genomeFastaFiles $genomeFastaFiles --runThreadN 8 --sjdbFileChrStartEnd 
$sjdbOverhangFile --sjdbOverhang 91"

I'm using mm9.fa and the Mus_musculus.NCBIM37.66.gtf.sjdb file, both provided 
under genome downloads; however, this is the output:

EXITING because of FATAL error, the sjdb chromosome Y is not found among the 
genomic chromosomes
SOLUTION: fix your file 
sjdbFileChrStartEnd=...../Mus_musculus.NCBIM37.66.gtf.sjdb_OG.txt at line #1

What is wrong? 

Olivier

Original issue reported on code.google.com by [email protected] on 12 Sep 2013 at 11:52
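A possible cause of the sjdb error above is a chromosome naming mismatch between the FASTA file and the junction file. For example, UCSC-style names such as chrY may appear in the FASTA while Ensembl-style names such as Y appear in the junction file. The minimal sketch below lists the names used in the first column of a sjdbFileChrStartEnd-style file that do not appear among the FASTA sequence names. It assumes that the sequence name is the first whitespace-delimited word after '>'. Both function names are hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Sequence names from FASTA headers: the first whitespace-delimited word after '>'.
std::set<std::string> fastaChromosomes(std::istream& fasta) {
    std::set<std::string> names;
    std::string line;
    while (std::getline(fasta, line)) {
        if (!line.empty() && line[0] == '>') {
            std::istringstream header(line.substr(1));
            std::string name;
            if (header >> name) names.insert(name);
        }
    }
    return names;
}

// Chromosome names (first tab-separated column) used in the junction file
// but absent from the genome FASTA.
std::set<std::string> missingChromosomes(std::istream& sjdb,
                                         const std::set<std::string>& genome) {
    std::set<std::string> missing;
    std::string line;
    while (std::getline(sjdb, line)) {
        const std::string chr = line.substr(0, line.find('\t'));
        if (!chr.empty() && genome.count(chr) == 0) missing.insert(chr);
    }
    return missing;
}
```

If this prints names like Y while the FASTA uses chrY, one of the two files would need its names converted before running genomeGenerate.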

Segmentation fault

Hello,

I'm getting a segmentation fault while trying to align reads to a bacterial 
genome. I was able to isolate the problem to the following read:

@M01793:3:000000000-A5GLB:1:1101:14294:3045 1:N:0:4
CCGAAGGACATTGCAGCACCGTTCTGAGACTTAACAGCAGCCAGGTAGCCGAAGTAAGTCCACTGGATAGCTTCTGTCTCTTATACACATCTCCGAGCCCACGAGACTCCTGAGCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAA
+
ABB?ADBFFFBFGGFG4GGGGGGHGGHFFHHHGHHHHFHGHHHHH3DGFGEEFGAGHEHFHHHHFHHHGFBBGHGHDGGHHHHHHHHHGGHHHHGGGGGGHGGCEGDFHHHGF2FFEHEE0??EBBHFDFGGGHGFHGGFGGFG//>A@-

My guess is that it has something to do with the run of A's at the end of the 
read. I'd like to use the aligner, but would rather not dive into the code. 
Perhaps this case would be easy to debug and fix, given the single read on 
which it fails.

Cheers,
Alexey

Original issue reported on code.google.com by [email protected] on 15 Nov 2013 at 12:22
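One way to test the guess about the trailing A run in the report above is to trim the run and re-run STAR on the trimmed read. If the crash disappears, the poly-A tail is implicated. The minimal sketch below is a diagnostic workaround under that assumption, not a fix for the crash. `trimPolyA` and its `minRun` threshold are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Remove a trailing poly-A run of at least `minRun` bases and cut the quality
// string to the same length, so the FASTQ record stays consistent.
std::pair<std::string, std::string> trimPolyA(const std::string& seq,
                                              const std::string& qual,
                                              std::size_t minRun) {
    std::size_t end = seq.size();
    while (end > 0 && (seq[end - 1] == 'A' || seq[end - 1] == 'a')) --end;
    if (seq.size() - end < minRun) return {seq, qual};  // run too short: keep as-is
    return {seq.substr(0, end), qual.substr(0, std::min(end, qual.size()))};
}
```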

Unmapped fastq empty if using --runThreadN

On several occasions, the unmapped reads are not transferred from the _tmp 
folder to the main file; they seem to just get deleted (the Unmapped file is 
empty).

If I run the software without threading, everything is fine.

Aligned reads are reported in both cases without problems.

Original issue reported on code.google.com by [email protected] on 22 Jan 2014 at 12:16

multiple-mappers ranking

Hi,

I'd like to rank multiple-mappers based on the alignment quality.

Following the manual, I can distinguish the secondary multi-mappers from the 
primary one:

"For multi-mappers, all alignments except one are marked with 0x100 (secondary 
alignment) in the FLAG column 2. The un-marked alignment is either the best one 
(i.e. highest scoring), or is randomly selected from the alignments of equal 
quality."


Is there a way to rank the secondary multi-mappers?


best,



M.



Original issue reported on code.google.com by [email protected] on 3 Feb 2014 at 6:17
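For the ranking question above, one option outside STAR is to sort each read's records by the AS:i: alignment score that STAR writes, as seen in the records of the nM report earlier. The SAM specification leaves AS as an aligner-defined score. The minimal sketch below groups records by QNAME and orders each group from highest to lowest AS. For paired-end data, both mates of one alignment share a QNAME (and, in the STAR records shown earlier, the same AS value), so each group then holds two lines per alignment. `rankByScore` and `alignmentScore` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

struct Alignment {
    std::string line;  // the full SAM record
    int score;         // value of its AS:i: tag, or -1 if absent
};

// Read the AS:i: tag from the optional fields (column 12 onwards).
int alignmentScore(const std::string& samLine) {
    std::istringstream fields(samLine);
    std::string field;
    for (int column = 0; std::getline(fields, field, '\t'); ++column) {
        if (column >= 11 && field.rfind("AS:i:", 0) == 0)
            return std::stoi(field.substr(5));
    }
    return -1;
}

// Group records by QNAME and order each group from highest to lowest AS.
std::map<std::string, std::vector<Alignment>> rankByScore(
        const std::vector<std::string>& samLines) {
    std::map<std::string, std::vector<Alignment>> groups;
    for (const std::string& line : samLines) {
        if (line.empty() || line[0] == '@') continue;  // skip header lines
        const std::string qname = line.substr(0, line.find('\t'));
        groups[qname].push_back({line, alignmentScore(line)});
    }
    for (auto& group : groups) {
        // stable_sort keeps STAR's original order among equally scored records
        std::stable_sort(group.second.begin(), group.second.end(),
                         [](const Alignment& a, const Alignment& b) {
                             return a.score > b.score;
                         });
    }
    return groups;
}
```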

Extremely high percentage of reads too short to map...

I have been using STAR with Illumina 101bp paired end reads. The first set of 
libraries I sequenced work great going through the pipeline, but I have had a 
very strange problem with the most recent libraries.

I call STAR as follows:

Star_Directory/STAR --genomeDir Star_Directory/STAR_2.3.0/Genome --readFilesIn 
$f $f2 --outSAMstrandField intronMotif --runThreadN 3

where f and f2 are the paired end reads:
1-Nq-C96_S94_L001_R1_001_val_1.fq 
1-Nq-C96_S94_L001_R2_001_val_2.fq

which have been trimmed by trim_galore with the call:
trim_galore -q 15 --phred33 --paired --length 50 -a CTGTCTCTTATACACATCT 
--stringency 3 $f $f2

where f and f2 are the untrimmed fastq files:
1-Nq-C96_S94_L001_R2_001.fastq 
1-Nq-C96_S94_L001_R1_001.fastq 

For these runs the log.out file shows something like this:

                                  Started job on |  Sep 17 13:16:13
                             Started mapping on |   Sep 17 13:17:17
                                    Finished on |   Sep 17 13:17:47
       Mapping speed, Million of reads per hour |   21.76

                          Number of input reads |   181350
                      Average input read length |   179
                                    UNIQUE READS:
                   Uniquely mapped reads number |   1973
                        Uniquely mapped reads % |   1.09%
                          Average mapped length |   176.75
                       Number of splices: Total |   24
            Number of splices: Annotated (sjdb) |   0
                       Number of splices: GT/AG |   23
                       Number of splices: GC/AG |   1
                       Number of splices: AT/AC |   0
               Number of splices: Non-canonical |   0
                      Mismatch rate per base, % |   0.39%
                         Deletion rate per base |   0.04%
                        Deletion average length |   2.22
                        Insertion rate per base |   0.00%
                       Insertion average length |   1.50
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |   948
             % of reads mapped to multiple loci |   0.52%
        Number of reads mapped to too many loci |   22
             % of reads mapped to too many loci |   0.01%
                                  UNMAPPED READS:
       % of reads unmapped: too many mismatches |   0.00%
                 % of reads unmapped: too short |   98.37%
                     % of reads unmapped: other |   0.01%

However, looking at the FASTQ files, the reads seem adequate for the most 
part. I've attached abbreviated versions of the two paired-end read FASTQ 
files.

I've also attached abbreviated versions of two paired-end FASTQ files that 
mapped with a unique mapping rate of approximately 90% (named 
read1/2_goodMappers.fq).

I am new to RNA-seq analysis, so this may be a trivial issue. I would 
appreciate any help I can get.

I am using STAR 2.3.0 on Mac OSX.

Thanks so much.



Original issue reported on code.google.com by [email protected] on 18 Sep 2014 at 4:17

STAR freezes

What steps will reproduce the problem?
1. Run a STAR alignment (stranded paired-end)
2.
3.

What is the expected output? What do you see instead?
The alignment files are generated and have some content, but the program gets 
stuck after printing:
Nov 19 17:18:30 ..... Started STAR run
Nov 19 17:18:36 ..... Started mapping

What version of the product are you using? On what operating system?
STAR_2.3.0e.Linux_x86_64
Running on Linux CentOS 6.4 Kernel Linux 2.6.32-358.0.1.el6.x86_64 GNOME 2.2.8.2

Please provide any additional information below.
The command is:
/rdata/ngseq/Playground/guy/STAR/STAR_2.3.0e.Linux_x86_64/STAR --genomeDir 
/rdata/ngseq/Playground/guy/STAR/Genome --readFilesIn 
/rdata/ngseq/original_data/rna/illumina/2013-05-05_Guy/GW1/fastq/R1.fastq 
/rdata/ngseq/original_data/rna/illumina/2013-05-05_Guy/GW1/fastq/R2.fastq 
--runThreadN 8

Thanks,
Guy

Original issue reported on code.google.com by [email protected] on 19 Nov 2013 at 11:04

STAR not scaling to 60 cores... or even 10.

What steps will reproduce the problem?
1. Start STAR on a RNA dataset
2. Specify --runThreadN 60 (on a 160Core Machine)
3. Watch STAR use only 6-7 cores (top shows <800% CPU)

What is the expected output? What do you see instead?

Your publication states that STAR scales well. I was hoping for that, but I 
can't get it to scale.


What version of the product are you using? On what operating system?
Hard to say, since STAR does not print its version, but I can try to update.
Fedora 20, 64bit


Please provide any additional information below.
I also experimented with the 'genomeLoad' parameter, because a colleague had 
seen some strange behavior with it and advised me to disable it.
However, I see no difference in speed (or scaling) between "genomeLoad 
NoSharedMemory" and "genomeLoad LoadAndKeep" (followed by "genomeLoad 
LoadAndKeep").

I'll try updating STAR and see.

Original issue reported on code.google.com by [email protected] on 24 Feb 2014 at 1:25

genome doesn't always load into memory

What steps will reproduce the problem?
1. star --genomeDir ./hg19 --genomeLoad LoadAndExit


What is the expected output? What do you see instead?
Expected: a loaded genome.
Often, however, we receive error messages in the 'Log.out' file claiming 
"Another job is still loading the genome, sleeping for 1 min", which isn't true.


What version of the product are you using? On what operating system?
STAR_2.1.4a_r178  on
Red Hat Enterprise Linux Server release 6.3 (Santiago)
Linux version 2.6.32-279.el6.x86_64 (gcc version 4.4.6 20120305 (Red Hat 
4.4.6-4) (GCC)) 


Original issue reported on code.google.com by [email protected] on 21 Nov 2012 at 2:30

human STAR genome not available

What steps will reproduce the problem?
1. File 
ftp://ftp2.cshl.edu/gingeraslab/tracks/STARrelease/STARgenomes/hg19_Gencode19.tgz 
is unavailable
2.
3.

What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?


Please provide any additional information below.
The mouse reference index is available but not the human.

Thank you

Original issue reported on code.google.com by [email protected] on 25 Mar 2014 at 10:33

STAR does not have any command line help


The STAR command gives no usable help, which will annoy potential users, 
especially as the documentation is a PDF and the src/ distribution doesn't 
even include it. An ASCII README covering basic usage, with some pointers to 
the documentation, would help.

% STAR

EXITING because of fatal input ERROR: could not open readInFile=Read1

% STAR -h
% STAR --help

EXITING: FATAL INPUT ERROR: empty value for paramter "-h" in input 
"Command-Line-Initial"
SOLUTION: use non-empty value for this parameter


Original issue reported on code.google.com by [email protected] on 6 Jun 2013 at 6:10

Mac OSX compile problem

I get this error using g++-4.9:

Genome.cpp:218:57: error: 'SHM_NORESERVE' was not declared in this scope
             shmID = shmget(shmKey, shmSize, IPC_CREAT | SHM_NORESERVE | 0666); //        shmID = shmget(shmKey, shmSize, IPC_CREAT | SHM_NORESERVE | SHM_HUGETLB | 0666);
                                                         ^
make: *** [Genome.o] Error 1

Original issue reported on code.google.com by [email protected] on 22 Feb 2014 at 12:28
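SHM_NORESERVE is a Linux-specific shmget() flag, which explains why the compiler on other systems such as macOS reports it as undeclared. One possible local workaround is to define the flag as 0 when the system headers lack it, so the call compiles without the no-reserve hint. The sketch below assumes that dropping the hint is acceptable on such systems, and it is not necessarily how STAR upstream handles the problem. `createSegment` is a hypothetical wrapper using the same flags as the failing line in Genome.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/ipc.h>
#include <sys/shm.h>

// SHM_NORESERVE only exists on Linux. Where it is missing, define it as 0
// so the flag simply has no effect (assumed acceptable here).
#ifndef SHM_NORESERVE
#define SHM_NORESERVE 0
#endif

// Hypothetical wrapper with the same flags as the failing shmget() call;
// returns the segment id, or -1 on failure (errno is set by shmget).
int createSegment(key_t shmKey, std::size_t shmSize) {
    return shmget(shmKey, shmSize, IPC_CREAT | SHM_NORESERVE | 0666);
}
```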

Error while compiling with make on cygwin 2.831

What steps will reproduce the problem?
1. Download STAR_2.3.1q.tgz
2. Start cygwin 2.831
3. Extract the archive and run make in the folder STAR_2.3.1q

What is the expected output? What do you see instead?
I wanted to compile STAR_2.3.1q with make, but it produced the following error:
Genome.cpp:218:57: error: 'SHM_NORESERVE' was not declared in this scope

What version of the product are you using? On what operating system?
STAR_2.3.1q with cygwin 2.831


Please provide any additional information below.
Unfortunately my system language is German, but I have attached the whole 
error message in the hope that it helps. I am aware that Cygwin is not 
officially supported, but I hoped someone could help regardless, since I have 
to rely on Cygwin.

Original issue reported on code.google.com by [email protected] on 5 Mar 2014 at 10:34

No version information given

It would be nice to have an input flag that prints the version, for cases when 
the STAR binary is installed by itself somewhere like /usr/local/bin. 
Something like STAR --version or STAR -V.

Original issue reported on code.google.com by mariogiov on 20 Nov 2013 at 10:27

STAR exiting with short read sequence error

What steps will reproduce the problem?
1. Use the following as input.
$ cat offending_R1.fa
>HS13_186:2:2106:7552:5018/1
TTGTTTTTTGTGTCTCAAATTAACAACCTAACATCATAACTGAAAGAATAAGTGAAGCAAGAACAAATCAAC
$ cat offending_R2.fa
>HS13_186:2:2106:7552:5018/2
TTGTTTTTTGTGTCTCAATTTCCTTCGATTCAGCTCTGATTTTGGTTATTTCTTATCTTCTGCTAGCTTTGG
2. Run STAR like this:
STAR_2.3.1z1/STAR --genomeDir /raid/references-and-indexes/hg19/star/2013_03_04 
--genomeLoad NoSharedMemory --readFilesIn
 offending_R1.fa offending_R2.fa --outStd SAM > test.sam

What is the expected output? What do you see instead?
STAR quits with the following error:

EXITING because of FATAL ERROR in reads input: short read sequence line: 1
Read Name=>HS13_186:2:2106:7552:5018/2
Read Sequence====
DEF_readNameLengthMax=50000
DEF_readSeqLengthMax=50000

What version of the product are you using? On what operating system?
STAR_2.3.1z1
$ cat /etc/SuSE-release
openSUSE 12.3 (x86_64)
VERSION = 12.3
CODENAME = Dartmouth

Please provide any additional information below.
I had a problem with this input in STAR_2.3.0e as well, although I think the 
error was different. I upgraded to see if the problem was fixed, but it does 
not seem to be.

-Chris DeBoever

Original issue reported on code.google.com by [email protected] on 14 Apr 2014 at 8:51

STAR only takes the first paired fastq files when there are multiple paired fastq file input

What steps will reproduce the problem?
1.
2.
3.

What is the expected output? What do you see instead?

I have multiple gzipped FASTQ files:

I run STAR with --readFilesIn set to be: 

--readFilesIn 
CRC0321-1_FCC4JJ9ACXX_L6_HUMhiaTACYRBAPEI-220_1.fq.gz,CRC0321-1_FCC4JJ9ACXX_L7_HUMhiaTACYRBAPEI-220_1.fq.gz,CRC0321-1_FCC4JJ9ACXX_L8_HUMhiaTACYRBAPEI-220_1.fq.gz 
CRC0321-1_FCC4JJ9ACXX_L6_HUMhiaTACYRBAPEI-220_2.fq.gz,CRC0321-1_FCC4JJ9ACXX_L7_HUMhiaTACYRBAPEI-220_2.fq.gz,CRC0321-1_FCC4JJ9ACXX_L8_HUMhiaTACYRBAPEI-220_2.fq.gz

Below is the ending part of the log generated by STAR
...
Starting to map file # 0
mate 1:   /home/hbi16088/data/projects/pdx/fastq/CRC0321-1_FCC4JJ9ACXX_L6_HUMhiaTACYRBAPEI-220_1.fq.gz
mate 2:   /home/hbi16088/data/projects/pdx/fastq/CRC0321-1_FCC4JJ9ACXX_L6_HUMhiaTACYRBAPEI-220_2.fq.gz
Created thread # 7
Created thread # 8
Created thread # 9
Completed: thread #7
Completed: thread #1
Completed: thread #5
Completed: thread #3
Completed: thread #2
Completed: thread #4
Completed: thread #6
Completed: thread #9
Completed: thread #0
Joined thread # 1
Joined thread # 2
Joined thread # 3
Joined thread # 4
Joined thread # 5
Joined thread # 6
Joined thread # 7
Completed: thread #8
Joined thread # 8
Joined thread # 9
ALL DONE!
--genomeLoad=LoadAndKeep .

STAR used only the first pair of FASTQ files instead of all three pairs.

What version of the product are you using? On what operating system?

STAR svn revision compiled=STAR_2.3.1z4_r419

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 3 Jun 2014 at 2:51

second mapping pass

Hi,

I'm trying to figure out how to run a second mapping pass, as described in the 
paper:

"It is also possible to run a second mapping pass, supplying it with splice 
junction loci found in the first mapping pass. In this case, STAR will not 
discover any new junctions but will align spliced reads with short overhangs 
across the previously detected junctions." (Dobin et al., 2012)

I didn't find any examples, so I don't understand how to run it.

Thanks in advance,


M.





Original issue reported on code.google.com by [email protected] on 25 Jun 2013 at 9:15

SA not output with runThreadN >1

What steps will reproduce the problem?
Whenever I run STAR genomeGenerate with more than one thread, the SA file is 
not generated (see command line):
1. STAR --runMode genomeGenerate --runThreadN 6 --genomeDir ${gdir} 
--genomeFastaFiles ${faFiles}


What is the expected output? What do you see instead?
All output files are present, except for the SA file.

What version of the product are you using? On what operating system?
STAR 2.3 on Red Hat 6.3 

Please provide any additional information below.
Error output:
Jan 25 21:35:27 ..... Started STAR run
Jan 25 21:35:27 ... Starting to generate Genome files
Jan 25 21:35:27 ... starting to sort  Suffix Array. This may take a long time...
Floating point exception

Original issue reported on code.google.com by [email protected] on 26 Jan 2014 at 5:39

Genome.cpp:157:21: error: 'sleep' was not declared in this scope

What steps will reproduce the problem?
1. Run make on Ubuntu
2.
3.

What is the expected output? What do you see instead?
A successful compilation; instead, make fails with the error in the title.


What version of the product are you using? On what operating system?
STAR_2.3.0e.tgz


Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 25 Sep 2013 at 9:25
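sleep() is declared by the POSIX header <unistd.h>. Starting with GCC 4.7, the C++ standard library headers no longer pull that header in indirectly, so code that relied on the indirect include likely fails with exactly this error on newer Ubuntu toolchains. The usual fix is to include the header explicitly in the file that calls sleep(). In the minimal sketch below, `waitSeconds` is a hypothetical stand-in for the call site in Genome.cpp.

```cpp
#include <cassert>
#include <unistd.h>  // declares POSIX sleep()

// Hypothetical stand-in for the wait in Genome.cpp. sleep() returns 0 when the
// whole interval elapsed, or the number of seconds left if a signal interrupted it.
unsigned int waitSeconds(unsigned int seconds) {
    return sleep(seconds);
}
```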

Log.final.output

What steps will reproduce the problem?
1. input: paired-end read
2. Log.final.out 

What is the expected output? 
Statistics for both mates of the paired-end reads:
read1:  188881802
read2:  188881802

What do you see instead?
Statistics for just one file?

e.g.
                          Number of input reads |   188881802
                      Average input read length |   102
                                    UNIQUE READS:
                   Uniquely mapped reads number |   159058244
                        Uniquely mapped reads % |   84.21%

However, Aligned.out.sam has reads from both ends mapped (samtools flagstat output):

418527540 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
379423218 + 0 mapped (90.66%:-nan%)
418527540 + 0 paired in sequencing
209249304 + 0 read1
209278236 + 0 read2
379423218 + 0 properly paired (90.66%:-nan%)
379423218 + 0 with itself and mate mapped


What version of the product are you using? On what operating system?
STAR 2.3 
Linux tbi-pbs1 2.6.37.6-0.20-default #1 SMP 2011-12-19 23:39:38 +0100 x86_64 
x86_64 x86_64 GNU/Linux

Please provide any additional information below.

Any comments would be much appreciated!

Original issue reported on code.google.com by [email protected] on 25 Feb 2013 at 7:50

I just cannot download STAR genomes.

What steps will reproduce the problem?
1. Cannot open the link.
2.
3.

What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?


Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 17 Mar 2014 at 10:54

big reference fasta will crash STAR

What steps will reproduce the problem?
1. Use a multi-FASTA file (e.g. contigs) with many entries (>10000) as the 
reference to map to

What is the expected output? What do you see instead?
expected: clean run, seen: crash

What version of the product are you using? On what operating system?
newest (2.3 or so)

Original issue reported on code.google.com by [email protected] on 12 Feb 2013 at 10:18

incompatibility with Cufflinks

If you set outSAMattributes to All, the generated SAM file is incompatible 
with Cufflinks. The error is something like: no XS tag for a spliced read. 
When outSAMattributes is set to Standard and the SAM file is passed to 
Cufflinks, no error is thrown. I think the extra tags added in All mode confuse 
Cufflinks.
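The Cufflinks error above suggests that some spliced alignments reached it without an XS strand tag. To find which records those are, the minimal sketch below flags SAM lines whose CIGAR (column 6) contains an N (intron) operation but whose optional fields include no XS:A tag. `splicedWithoutXS` and `splitTabs` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split one SAM record into its tab-separated fields.
std::vector<std::string> splitTabs(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, '\t')) fields.push_back(field);
    return fields;
}

// True if the record is spliced (its CIGAR contains an N operation)
// but none of its optional fields is an XS:A strand tag.
bool splicedWithoutXS(const std::string& samLine) {
    const std::vector<std::string> f = splitTabs(samLine);
    if (f.size() < 11 || f[5].find('N') == std::string::npos) return false;
    for (std::size_t i = 11; i < f.size(); ++i)
        if (f[i].rfind("XS:A:", 0) == 0) return false;
    return true;
}
```

Comparing the number of flagged lines in the All-mode and Standard-mode outputs would show whether the two runs really differ in their XS tags.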

Also, I would appreciate a command-line option to add RG, LB, SM and PL tags 
to the SAM file.


What version of the product are you using? On what operating system?
Latest released version 2.3.0.1 on Linux 

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 15 Apr 2013 at 5:42

bug with ISIZE numbers in .sam ?

What steps will reproduce the problem?

1. Run STAR with these parameters:

star --genomeDir $GENOME_DIR \
--readFilesIn $READS1 $READS2 \
--runThreadN $TREADS \
--genomeLoad LoadAndKeep \
--alignIntronMax 500000 \
--alignMatesGapMax 500000 \
--outFileNamePrefix $OUT/ \
--outFilterMultimapNmax 6 \
--outFilterMismatchNmax 3 \
--outFilterMismatchNoverLmax 0.05 \
--outFilterMatchNmin 16 \
--outFilterScoreMinOverLread 0 \
--outFilterMatchNminOverLread 0 \
--outSAMunmapped None \
--outReadsUnmapped Fastx \
--sjdbFileChrStartEnd $GENOME_DIR \
--sjdbOverhang $SJ_DB_OVERHANG \
--chimSegmentMin $CHIM_SEGMENT_MIN \
--chimScoreMin $CHIM_SCORE_MIN \
--clip3pAdapterSeq TCGTATGCCGTCTTCTGCTTG \
--clip3pAdapterMMp 0.1

2. When htseq-count is applied as the read counter, it fails in some cases and 
reports an error

3. What is the expected output? What do you see instead?

Expected: htseq-count produces a table with counts for each gene ID.
Instead, I got errors similar to this one:

Error occured when processing SAM input (line 3773018 of file 
/III_pREP_Input/star/Aligned.out.sam):
  Python int too large to convert to C long
  [Exception type: OverflowError, raised in _HTSeq.pyx:1313]

That line doesn't look right at all: the TLEN is a huge number, and/or it is 
merged with the read sequence:

HWI-ST1149:193:C4309ACXX:3:1114:1490:42590  355 chr2    27274108    0   28S23M  =   27274082    18446744073709551615CCGGGGGGATTAGCTCCAATGGTAGAGCCTCGCTTGGCTTGCGAGAGGTAG =?=DBDD:0:>>AAA3((383>388(8>=A87:<==<AA?:?:0055=339    NH:i:5  HI:i:4  AS:i:28 nM:i:2
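The huge TLEN values look like small negative numbers printed as unsigned 64-bit integers. Since 2^64 = 18446744073709551616, the value 18446744073709551615 corresponds to -1 and 18446744073709551611 corresponds to -5. That would also be consistent with htseq-count's OverflowError, because such values do not fit in a signed C long. The minimal sketch below reinterprets such a field during post-processing, assuming the field is properly tab-separated. `parseTlen` is a hypothetical helper, not a fix inside STAR.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Parse a SAM TLEN field. Values printed as unsigned 64-bit integers
// (e.g. 18446744073709551615) are reinterpreted as signed two's complement,
// so 2^64 - 1 becomes -1 and 2^64 - 5 becomes -5.
std::int64_t parseTlen(const std::string& field) {
    if (!field.empty() && field[0] == '-') return std::stoll(field);
    const std::uint64_t raw = std::stoull(field);
    // Modular conversion: well-defined since C++20, and what GCC/Clang do earlier.
    return static_cast<std::int64_t>(raw);
}
```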


What version of the product are you using? On what operating system?

star231z1

Please provide any additional information below.

See other examples of problematic lines from the .sam files below:


HWI-ST1149:193:C4309ACXX:3:1104:17209:8771  339 chr14   70236478    3   28S16M7S    =   69795937 -440557 ATTGCTCTCGTTACCTCGGGAATTGAGGTTCCGAATAAGAGGTCATTGGCG HJJJJIIIJJHHFJIIJJJJJJJJJJJJJJJJJJJJJJHHHHHFFFDDB:B NH:i:2  HI:i:2  AS:i:20 nM:i:0

HWI-ST1149:193:C4309ACXX:3:1109:17901:52090 355 chr2    70230204    0   22S15M14S   =   70230182    18446744073709551611    GGGGCAATACAGAATGTTCGTCGAGTTAAATCCTCTGTAGACGACTTAAAT BBCDFFFFHHHHGIJJIIJJJJJEHGHJIJJHIIIJJIJIGlsHJJJJJIHIJ NH:i:5  HI:i:2  AS:i:16 nM:i:0

HWI-ST1149:193:C4309ACXX:3:1114:1490:42590  355 chr2    27274108    0   28S23M  =   27274082    18446744073709551615CCGGGGGGATTAGCTCCAATGGTAGAGCCTCGCTTGGCTTGCGAGAGGTAG =?=DBDD:0:>>AAA3((383>388(8>=A87:<==<AA?:?:0055=339    NH:i:5  HI:i:4  AS:i:28 nM:i:2

HWI-ST1149:193:C4309ACXX:1:2105:10512:63315 355 chr2    70230204    1   35S15M1S    =   70230182 18446744073709551611    GCGACGATATTTCACCACAATACAGAATGTTGGTCGAGTTAAATCCTCTGT BB@DFFFFHHHHDIJIJJJIJJJJJIJIJHIIIHIJFIFHIJJIIJIIJIC    NH:i:4  HI:i:4  AS:i:16 nM:i:0

Thanks for your help.

Vladimir

Original issue reported on code.google.com by [email protected] on 4 Apr 2014 at 1:09
