ratschlab / spladder Goto Github PK

Tool for the detection and quantification of alternative splicing events from RNA-Seq data.

License: Other

Makefile 0.44% Shell 1.68% Python 97.88%

spladder's Introduction

What is SplAdder?

This README describes the software SplAdder, short for Splicing Adder, a toolbox for alternative splicing analysis based on RNA-Seq alignment data. Briefly, the software takes a given annotation and RNA-Seq read alignments in standardized formats, transforms the annotation into a splicing graph representation, augments the splicing graph with additional information extracted from the read data, extracts alternative splicing events from the graph and quantifies the events based on the alignment data. The quantified events can then be used for differential analysis.

Dependencies and Installation

SplAdder relies on Python3 and requires only few standard packages. A complete list of dependencies can be found in requirements.txt

We recommend using a python package manager such as anaconda to organize dependencies.

Installation via pip: pip install spladder
Installation from source:
1. clone this repository
2. Within the SplAdder root directory, run make install

Documentation

This README provides only a high-level overview of a basic SplAdder run. For further reading, please consider the online Documentation.

After installation, the command spladder becomes available in your path. Invoking SplAdder without any parameters will print a description of the command line interface to the screen.

Test with Example Data

If you installed SplAdder from source, you can test the installation by invoking make test in the SplAdder root directory.

Versions for Matlab and Python 2.7

Previous versions of SplAdder were provided for both Matlab and Python. Since 2019, the Matlab code is no longer provided as part of the SplAdder package. If you are interested in the Matlab code, please download the initial release.

If you are interested in previous versions of SplAdder capable of running under Python 2.7, please use release 1.2.1. Please note, that the Python 2.7 code will be no longer maintained.

Authors

Information on how to contact the authors can be found in the AUTHORS file.

License and Disclaimer

All licensing information can be found in the COPYRIGHT file.

spladder's People

Contributors

Stargazers

Watchers

spladder's Issues

Run Error

Hello,
Below is the last few lines of error I am getting when I ran the job with a single BAM. I am using Gencode GFF3 and STAR mapped sorted BAM. Could you please help me out understanding the problem.

RUN
python spladder.py -b $bam -o $outdir -a $GFF -v y -t exon_skip,intron_retention,alt_3prime,alt_5prime,mutex_exons,mult_exon_skip -c 3 -p n #-e

OUTPUT
Number of XOR exons: 137

Make intron_retention events unique by strain
. . . events dropped: 0
Traceback (most recent call last):
File "/home/kesara/tools/SplAdder/python/spladder.py", line 322, in
spladder()
File "/home/kesara/tools/SplAdder/python/spladder.py", line 311, in spladder
collect_events(CFG)
File "/home/kesara/tools/SplAdder/python/modules/alt_splice/collect.py", line 371, in collect_events
events_all = post_process_event_struct(intron_reten_pos_all, CFG)
File "/home/kesara/tools/SplAdder/python/modules/alt_splice/events.py", line 62, in post_process_event_struct
events = sort_events_by_event(events, CFG)
File "/home/kesara/tools/SplAdder/python/modules/alt_splice/events.py", line 27, in sort_events_by_event
coord_list = sp.array([x.get_inner_coords() for x in event_list], dtype='double')
ValueError: setting an array element with a sequence.

Spladder error loading introns

Hi,

I'm running into some errors while trying to run Spladder. I'm using a Cufflinks-created gtf because I work on a non-model system. I'm not sure if the error is related to my GTF, but just in case, this is what it looks like:

scaffold_1 Cufflinks transcript 12 412 1000 . . gene_id "CUFF.1"; transcript_id "CUFF.1.1"; FPKM "0.8222662114"; frac "1.000000"; conf_lo "0.471651"; conf_hi "1.172882"; cov "7.148884";
scaffold_1 Cufflinks exon 12 412 1000 . . gene_id "CUFF.1"; transcript_id "CUFF.1.1"; exon_number "1"; FPKM "0.8222662114"; frac "1.000000"; conf_lo "0.471651"; conf_hi "1.172882"; cov "7.148884";
scaffold_1 Cufflinks transcript 31940 35734 1000 . . gene_id "CUFF.2"; transcript_id "CUFF.2.1"; FPKM "3.1558882615"; frac "1.000000"; conf_lo "2.937705"; conf_hi "3.374071"; cov "35.466206";
scaffold_1 Cufflinks exon 31940 35734 1000 . . gene_id "CUFF.2"; transcript_id "CUFF.2.1"; exon_number "1"; FPKM "3.1558882615"; frac "1.000000"; conf_lo "2.937705"; conf_hi "3.374071"; cov "35.466206";

The error message I got is:

Generating splice graph ...
...done.
Loading introns from file ...
Traceback (most recent call last):
File "/home/cameron/Programs/spladder-master/python/spladder.py", line 255, in
spladder()
File "/home/cameron/Programs/spladder-master/python/spladder.py", line 180, in spladder
spladder_core(CFG)
File "/home/cameron/Programs/spladder-master/python/modules/core/spladdercore.py", line 26, in spladder_core
genes = gen_graphs(genes, CFG)
File "/home/cameron/Programs/spladder-master/python/modules/core/gen_graphs.py", line 82, in gen_graphs
introns = get_intron_list(genes, CFG)
File "/home/cameron/Programs/spladder-master/python/modules/reads.py", line 336, in get_intron_list
chunks = sp.array([[CFG['chrm_lookup'][x.chr], strands.index(x.strand), x.start, x.stop] for x in genes], dtype = 'int')
ValueError: '.' is not in list

The command I ran:
python ~/Programs/spladder-master/python/spladder.py -a ~/Programs/Cufflinks.gtf -b ../tophat.sorted.bam -o ./spladder_out

Also, I'm getting this warning message (repeatedly). I'm assuming this is because of the format of the Cufflinks GTF file. Will this cause any issues?
WARNING: /home/cameron/Cufflinks.gtf does not have gene level information for transcript CUFF.33832.1 - information has been inferred from tags

Any advise you could provide would be much appreciated.

Thanks,

Cam

Name Error: r_idx not found when using -R

Hi Andre!

When running spladder with 6 input files (3 x wt, 3 x ko), I run into an Name error as follows:

Traceback (most recent call last):
  File "/mnt/software/x86_64/packages/spladder/0.1/python/spladder.py", line 265, in <module>
    spladder()
  File "/mnt/software/x86_64/packages/spladder/0.1/python/spladder.py", line 261, in spladder
    analyze_events(CFG, CFG['event_types'][idx])
  File "/mnt/software/x86_64/packages/spladder/0.1/python/modules/alt_splice/analyze.py", line 51, in analyze_events
    rep_tag = '_R%i' % r_idx
NameError: global name 'r_idx' is not defined

I used the parameter -R to encode the replicates (-R 1,1,1,2,2,2). Hope the usage of -R was as intended?

Best,
Jens

Improve Tutorial Explanations

For example, instead of simply showing the reader that -b examples/NMD_WT1.tiny.bam,examples/NMD_WT2.tiny.bam,examples/NMD_DBL1.tiny.bam,examples/NMD_DBL2.tiny.bam is suitable, there should be some explanation of how the software determines which samples belong to which experimental condition and in which direction the differences are calculated in (e.g. DBL - WT or WT -DBL),

It would also be nice if the last page of the tutorial showed the first few lines of the confirmed text file and explain why C3 suddenly appeared in all of the file names. Showing the head of result summary file would enable the reader to quickly gauge if the software is what they want or not.

Warnings for human gencode.v25.primary_assembly.annotation.gtf

Dear Andre, dear all,

I get some warnings when working with gencode.v25.primary_assembly.annotation.gtf. Should I be worried about these? The number of excluded genes seems to be extremely high to me, and the question is how many genes remain after this cleanup afterall. Thanks for your help!

Regards
Henrik

WARNING: removing 34008 genes from given annotation that overlap to each other:
list of excluded genes written to: [...]/gencode.v25.primary_assembly.annotation.gtf.genes_excluded_gene_overlap
WARNING: removing 44 genes from given annotation that share exact exon coordines:
list of excluded exons written to: [...]/gencode.v25.primary_assembly.annotation.gtf.genes_excluded_exon_shared
WARNING: removing 6590 genes from given annotation that have at least one transcript with overlapping exons.
list of excluded genes written to: [...]/gencode.v25.primary_assembly.annotation.gtf.genes_excluded_exon_overlap

over-counting in genes_graph_conf2.merge_graphs.gene_exp.hdf5

Hi Andre,

Since spladder produces gene-level counts, I thought I would use them as input to DESeq2, limma etc. to identify DEGs. Overall, the results made sense for the HIV data set I am looking at (eg, up-regulation of Interferon-inducible genes). But when I compared spladder counts with summarizeOverlap counts (from GenomicAlignments Bioconductor package), I found genes for which spladder over-counts. I attach a spreadsheet with a count comparison for human Chr 7. Spladder counts are overall greater by a factor of 2x, which is the result of counting reads rather than fragments. But there also are cases where the spladder count is greater by orders of magnitude. I have highlighted a particular case: ENST00000390416 (TRBJ2-4). Spladder gives a count of ~100,000 across the five samples, versus a summarizeOverlap count of ~100. The spladder count concurs with what samtools view outputs for the region corresponding to ENST00000390416. However, the vast majority of the alignments contain a splice junction, i.e., an intron of another gene that contains ENST00000390416. samtools is not splice-aware so that I understand its count. Are the spladder counts in *.merge_graphs.gene_exp.hdf5 meant to take into account splicing? If not, then my observation is not really an issue.

Best, Reiner
counts.7.ods.zip

Error with spladder_test: alt_5prime fails on scipy TypeError

Hi Andre,

This has gotten even better with the introduction of the testing module! I was running it on my data, and I came across an error while the script was analyzing the alternative 5'ss events (the alt3_prime, intron retention, and exon skipping all worked fine).

Here is the traceback and the place where the script failed:

Testing alt_5prime events
Collecting intron confirmation values
Exclude 212 of 13393 alt_5prime events (1.58 percent) from testing due to low coverage
Estimating raw dispersions
[##################################################] 26350 / 26362 (100%) - took 499 sec (ETA: 0 sec)Found dispersion fit
Start to estimate adjusted dispersions.
[###################                               ] 10013 / 26362 (38%) - took 214 sec (ETA: 350 sec)Traceback (most recent call last):
  File "/home/wulab2015linux/warren/Software/spladder_master/spladder/python/spladder_test.py", line 859, in <module>
    main()
  File "/home/wulab2015linux/warren/Software/spladder_master/spladder/python/spladder_test.py", line 841, in main
    pvals = run_testing(cov, dmatrix0, dmatrix1, sf, CFG)
  File "/home/wulab2015linux/warren/Software/spladder_master/spladder/python/spladder_test.py", line 646, in run_testing
    (disp_adj, disp_adj_conv) = adjust_dispersion(cov, dmatrix1, disp_raw, disp_fitted, disp_idx, sf, CFG)
  File "/home/wulab2015linux/warren/Software/spladder_master/spladder/python/spladder_test.py", line 503, in adjust_dispersion
    (disp_adj, disp_adj_conv, _) = adjust_dispersion_chunk(counts, dmatrix1, disp_raw, disp_fitted, varPrior, sf, CFG, sp.arange(counts.shape[0]), log=CFG['verbose'])
  File "/home/wulab2015linux/warren/Software/spladder_master/spladder/python/spladder_test.py", line 447, in adjust_dispersion_chunk
    res = minimize_scalar(adj_loglikelihood_shrink_scalar_onedisper, args=(dmatrix1, resp, yhat, disp_fitted[i], varPrior, sign), method='Bounded', bounds=(0, 10.0), tol=1e-5)
  File "/usr/local/lib/python2.7/dist-packages/scipy/optimize/_minimize.py", line 608, in minimize_scalar
    return _minimize_scalar_bounded(fun, bounds, args, **options)
  File "/usr/local/lib/python2.7/dist-packages/scipy/optimize/optimize.py", line 1644, in _minimize_scalar_bounded
    fu = func(x, *args)
  File "/home/wulab2015linux/warren/Software/spladder_master/spladder/python/spladder_test.py", line 399, in adj_loglikelihood_shrink_scalar_onedisper
    loglik_adj = adj_loglikelihood_scalar(disp, explanatory, response, yhat, 1.0)
  File "/home/wulab2015linux/warren/Software/spladder_master/spladder/python/spladder_test.py", line 410, in adj_loglikelihood_scalar
    loglik = sum(nbinom.logpmf(y, n, p))
  File "/usr/local/lib/python2.7/dist-packages/scipy/stats/_distn_infrastructure.py", line 2862, in logpmf
    place(output, cond, self._logpmf(*goodargs))
  File "/usr/local/lib/python2.7/dist-packages/scipy/stats/_discrete_distns.py", line 167, in _logpmf
    return coeff + n*log(p) + special.xlog1py(x, -p)

Let me know if you need any other information. I'm happy to send you the offending HDF5 file (as well as one that was processed correctly) via email if that would help with debugging.

SplAdder handling of reads spanning multiple junctions

Dear Dr. Kahles,

Thanks again for writing such a great tool!

As I work with longer reads, I'm seeing a lot that span multiple splice junctions. It looks as though SplAdder may be considering junctions only individually - is this the case? Specifically I'm seeing scenarios where a read contains one junction that's consistent with an event, and a second that is inconsistent with the event, but the read is still counted as contributing to the event. E.g. a single exon skip overlaps with a multiple exon skip (same flanking exons for both events) - reads containing all 3 junctions of the single exon skip should not be counted towards the multiple exon skip, but still appear to be.

Also, if a single read contains two junctions consistent with an event, is it then counted twice for that event?

Thanks again!
Best,
Andrew

partial intron retention

Hi there,

I am interested in possibly modifying spladder's cutoffs/parameters to enable detection of partial intron retention, particularly in genes with very big introns. For example, I have some known AS events in cell line data where a transcript is truncated, ending between two exons, where the intron between them is partially retained, and I noticed that spladder does not identify these.

One criterion I noticed that is likely causing this is the minimum fraction of covered positions in the intron being 0.75-0.9. Do you have any suggestions on how I could possibly achieve detection of events like the one described above? Any help would be appreciated.

P.S. Many thanks for developing/maintaining spladder :)

Best,
Pegah

PAIRED-END

Hi, does the spladder work with paired-end reads

ValueError: invalid type: int64

Hi Andre,

I am running spladder with RNA-seq data of silkworms, the gff of which was downloaded from NCBI. After several tries of re-formatting the .gff file, I loaded the gff successfully, and the script goes well in the intron loading step and casette exons insertion step. However it dies at the intron retentions insertion step, with the following message:

Inserting intron retentions ...
^M 100(11654) genes done (found 0 new retentions in 369 tested introns, 0.0%) Traceback (most recent call last):
File "/home/lzc/soft/spladder-master/python/spladder.py", line 322, in <module> spladder()
File "/home/lzc/soft/spladder-master/python/spladder.py", line 223, in splatter spladder_core(CFG)
File "/home/lzc/soft/spladder-master/python/modules/core/spladdercore.py", line 21, in spladder_core genes = gen_graphs(genes, CFG)
File "/home/lzc/soft/spladder-master/python/modules/core/gen_graphs.py", line 113, in gen_graphs genes, inserted_ = insert_intron_retentions(genes, CFG)
File "/home/lzc/soft/spladder-master/python/modules/editgraph.py", line 367, in insert_intron_retentions new_retention = scipy.sparse.linalg.expm(new_retention)
File "/usr/lib64/python2.7/site-packages/scipy/sparse/linalg/matfuncs.py", line 113, in expmraise ValueError("invalid type: "+str(A.dtype))
ValueError: invalid type: int64
(END)

Here is my command:
python /home/lzc/soft/spladder-master/python/spladder.py -b ../mapping/bam/BmNPV-1.sorted.bam,../mapping/bam/BmNPV-2.sorted.bam,../mapping/bam/BmNPV-4.sorted.bam,../mapping/bam/WT-1.sorted.bam,../mapping/bam/WT-3.sorted.bam,../mapping/bam/WT-4.sorted.bam -R 1,1,1,2,2,2 -a /home/lzc/silkworm/genome/my.gff -o ../AS -v y --output_struc=y

Can you figure out the causes of this problem ?
Thanks,

WQJ

spladder.py (v1.0.0): GFF start coordinates off by -1 base

Have attached a UCSC screenshot to illustrate. In the case shown, the DNA sequence contains two canonical AG splice acceptor sites in tandem. Spladder finds evidence for the use of both, based on the exon_skip GFF output, but in both cases, the start coordinate of the GFF feature is off by -1 base.

This happens irrespective of the strand from which the gene is transcribed, and the GFF end coordinates appear to be correct.

gffStartOffBy-1Base.pdf.zip

Error while running spladder_test.py

Hi Andre,
I am having following problem while running spladder_test.py :

/home/asgar.hussain/anaconda2/bin/python2.7 /home/asgar.hussain/scratch/tools/spladder-master/python/spladder_test.py -o 150_CC_as_events_conf3_GENCODE/ -a 150_CC_as_events_conf3_GENCODE/merge_graphs_exon_skip_C3.counts.hdf5 -b 150_FC_as_events_conf3_GENCODE/merge_graphs_exon_skip_C3.counts.hdf5
Traceback (most recent call last):
File "/home/asgar.hussain/scratch/tools/spladder-master/python/spladder_test.py", line 873, in
main()
File "/home/asgar.hussain/scratch/tools/spladder-master/python/spladder_test.py", line 787, in main
(cov, gene_idx, event_idx, event_ids, event_strains) = quantify.quantify_from_counted_events(CFG['fname_events'], sp.r_[idx1, idx2], event_type, CFG)
File "/scratch/asgar.hussain/tools/spladder-master/python/modules/alt_splice/quantify.py", line 508, in quantify_from_counted_events
cov[0] += IN['event_counts'][strain_idx, f[0], :][:, event_idx].T
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/work/h5py/_objects.c:2696)
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/work/h5py/_objects.c:2654)
File "/home/asgar.hussain/anaconda2/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 462, in getitem
selection = sel.select(self.shape, args, dsid=self.id)
File "/home/asgar.hussain/anaconda2/lib/python2.7/site-packages/h5py/_hl/selections.py", line 88, in select
sel[args]
File "/home/asgar.hussain/anaconda2/lib/python2.7/site-packages/h5py/_hl/selections.py", line 388, in getitem
mshape = list(count)
UnboundLocalError: local variable 'count' referenced before assignment

Double-booked exons

SplAdder produces the following error when using annotations with exons assigned to multiple genes:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

at the step 'Make alt_3prime events unique by strain'
As discussed, this will require SplAdder to select one of the alternative genes claiming the same exon. I am adding this post to keep track of this issue.

Thanks!

Loading introns error

Dear Andre,

Although spladder works fine with marmoset data and example data, it does not work well with human data. Here is what I get when I run spladder on human data.

Augmenting splice graphs.

Generating splice graph ...

Total genes: 54121
Total genes with alternative isoforms: 24473
Total genes alternatively spliced: 23570
Total constitutively spliced: 30551
...done.

Loading introns from file ...
Traceback (most recent call last):
File "./spladder/python/spladder.py", line 215, in
spladder()
File "./spladder/python/spladder.py", line 177, in spladder
spladder_core(CFG)
File "/home/och/PROJECTs/00_RNA_seq_practice/spladder/python/modules/core/spladdercore.py", line 26, in spladder_core
genes = gen_graphs(genes, CFG)
File "/home/och/PROJECTs/00_RNA_seq_practice/spladder/python/modules/core/gen_graphs.py", line 82, in gen_graphs
introns = get_intron_list(genes, CFG)
File "/home/och/PROJECTs/00_RNA_seq_practice/spladder/python/modules/reads.py", line 370, in get_intron_list
chunks = sp.array([[CFG['chrm_lookup'][x.chr], strands.index(x.strand), x.start, x.stop] for x in genes], dtype = 'int')
ValueError: '.' is not in list

Thanks,
Yunsoo Kim

KeyError?

Hello,

I've pulled the latest commit from the master branch and I'm getting the following error when I run spladder.py:

python /home/fletcher/spladder/spladder/python/spladder.py -b ${BAM_LIST} -o ${OUTPUT_DIR} -a ${GENCODE_GTF} -l ${LOG_FILE}
Augmenting splice graphs.
=========================
Traceback (most recent call last):
  File "/home/fletcher/spladder/spladder/python/spladder.py", line 322, in <module>
    spladder()
  File "/home/fletcher/spladder/spladder/python/spladder.py", line 223, in spladder
    spladder_core(CFG)
  File "/home/fletcher/spladder/spladder/python/modules/core/spladdercore.py", line 21, in spladder_core
    genes = gen_graphs(genes, CFG)
  File "/home/fletcher/spladder/spladder/python/modules/core/gen_graphs.py", line 83, in gen_graphs
    introns = get_intron_list(genes, CFG)
  File "/home/fletcher/spladder/spladder/python/modules/reads.py", line 431, in get_intron_list
    [intron_list_tmp] = add_reads_from_bam(gg, CFG['bam_fnames'], ['intron_list'], CFG['read_filter'], CFG['var_aware'], CFG['primary_only'], CFG['ignore_mismatch_tag'])
  File "/home/fletcher/spladder/spladder/python/modules/reads.py", line 155, in add_reads_from_bam
    (introns, spliced_coverage) = get_all_data(blocks[b], filenames, mapped=False, filter=filter, var_aware=var_aware, primary_only=primary_only, no_mm=no_mm)
  File "/home/fletcher/spladder/spladder/python/modules/reads.py", line 314, in get_all_data
    (coverage_tmp, introns_tmp) = get_reads(fname, contig_name, block.start, block.stop, strand, filter, mapped, spliced, var_aware, collapse, primary_only, no_mm)
  File "/home/fletcher/spladder/spladder/python/modules/reads.py", line 43, in get_reads
    if filter_read(read, filter, spliced, mapped, strand, primary_only, var_aware, no_mm):
  File "/home/fletcher/spladder/spladder/python/modules/reads.py", line 472, in filter_read
    return filter['mismatch'] < tags['NM']
KeyError: 'NM'

A bit of googling suggests that this is some sort of dictionary error, but I don't know enough about the code to figure out why. Is there a solution?

About AssertionError

Hi,

I am running spladder. Below are the error messages...


Make intron_retention events unique by strain
. . . . . . . . . . 10000
. . . . . . . . . . 20000
. . . events dropped: 17

Make intron_retention events unique by event
. . . . . . . . . . 10000
. . . . . . . . . . 20000
. . . events dropped: 18
saving intron retentions to testrun/merge_graphs_intron_retention_C3.pickle

Make exon_skip events unique by strain
. . . . . . . . . . 10000
. . . . . . . . . . 20000
. . . . events dropped: 33

Make exon_skip events unique by event
. . . . . . . . . . 10000
. . . . . . . . . . 20000
. . . . events dropped: 1100
saving exon skips to testrun/merge_graphs_exon_skip_C3.pickle

Make mult_exon_skip events unique by strain
. . . . . . events dropped: 6

Make mult_exon_skip events unique by event
. . . . . . events dropped: 613
saving multiple exon skips to testrun/merge_graphs_mult_exon_skip_C3.pickle

Make alt_5prime events unique by strain
. . . . . . . . . . 10000
.
Traceback (most recent call last):
  File "spladder.py", line 215, in <module>
    spladder()
  File "spladder.py", line 208, in spladder
    collect_events(CFG)
  File "/Users/admin/Documents/tools/spladder-master/python/modules/alt_splice/collect.py", line 414, in collect_events
    events_all = post_process_event_struct(alt_end_5prime_pos_all, CFG)
  File "/Users/admin/Documents/tools/spladder-master/python/modules/alt_splice/events.py", line 59, in post_process_event_struct
    events = make_unique_by_strain(events)
  File "/Users/admin/Documents/tools/spladder-master/python/modules/alt_splice/events.py", line 92, in make_unique_by_strain
    assert(event_list[i - 1].chr == event_list[i].chr)
AssertionError```

spladder_test.py error unless sample order is the same as for spladder.py

To illustrate, below spladder.py is called with the sample order Control1, Control2, HIV1, HIV2, HIV3:

SPLADDER_HOME=/home/rschulz/bin/spladder-1.0.0/python/
CHR=Y

mkdir -p ./output_${CHR}

python $SPLADDER_HOME/spladder.py \
-a ./input/gencode.v24.annotation.$CHR.gff3 \
-b ./input/bam/Control1.$CHR.bam,./input/bam/Control2.$CHR.bam,./input/bam/HIV1.$CHR.bam,./input/bam/HIV2.$CHR.bam,./input/bam/HIV3.$CHR.bam \
-n 101 \
-o ./output_${CHR} \
-v y \
-c 2 \
-M merge_graphs \
-t exon_skip,intron_retention,alt_3prime,alt_5prime,mutex_exons,mult_exon_skip \
> ./output_${CHR}/spladder.log 2>&1

If spladder_test.py is called to compare HIV1, HIV2, HIV3 with Control1, Control2, the result is an index error:

SPLADDER_HOME=/home/rschulz/bin/spladder/python/
CHR=Y

python $SPLADDER_HOME/spladder_test.py \
-a HIV1.$CHR,HIV2.$CHR,HIV3.$CHR \
-b Control1.$CHR,Control2.$CHR \
-n 101 \
-o ./output_${CHR} \
-v y \
-c 2 \
-t exon_skip \
--labelA=HIV \
--labelB=Ctrl \
> ./output_${CHR}/spladder_test.log 2>&1

Loading expression counts from ./output_Y/spladder/genes_graph_conf2.merge_graphs.gene_exp.hdf5
Estimating size factors
Testing exon_skip events
Collecting intron confirmation values
Traceback (most recent call last):
  File "/home/rschulz/bin/spladder/python//spladder_test.py", line 873, in <module>
    main()
  File "/home/rschulz/bin/spladder/python//spladder_test.py", line 787, in main
    (cov, gene_idx, event_idx, event_ids, event_strains) = quantify.quantify_from_counted_events(CFG['fname_events'], sp.r_[idx1, idx2], event_type, CFG)
  File "/home/rschulz/bin/spladder/python/modules/alt_splice/quantify.py", line 508, in quantify_from_counted_events
    cov[0] += IN['event_counts'][strain_idx, f[0], :][:, event_idx].T
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (-------src-dir-------/h5py/_objects.c:2582)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (-------src-dir-------/h5py/_objects.c:2541)
  File "/home/rschulz/anaconda/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 462, in __getitem__
    selection = sel.select(self.shape, args, dsid=self.id)
  File "/home/rschulz/anaconda/lib/python2.7/site-packages/h5py/_hl/selections.py", line 88, in select
    sel[args]
  File "/home/rschulz/anaconda/lib/python2.7/site-packages/h5py/_hl/selections.py", line 357, in __getitem__
    raise TypeError("Indexing elements must be in increasing order")
TypeError: Indexing elements must be in increasing order

However, comparing Control1, Control2 with HIV1, HIV2, HIV3 (see below) works fine, presumably because the sample order corresponds to the order that was used when spladder.py was called (see above):

SPLADDER_HOME=/home/rschulz/bin/spladder/python/
CHR=Y

python $SPLADDER_HOME/spladder_test.py \
-b HIV1.$CHR,HIV2.$CHR,HIV3.$CHR \
-a Control1.$CHR,Control2.$CHR \
-n 101 \
-o ./output_${CHR} \
-v y \
-c 2 \
-t exon_skip \
--labelB=HIV \
--labelA=Ctrl \
> ./output_${CHR}/spladder_test.log 2>&1

Loading expression counts from ./output_Y/spladder/genes_graph_conf2.merge_graphs.gene_exp.hdf5
Estimating size factors
Testing exon_skip events
Collecting intron confirmation values
Constructing event IDs
Estimating size factors
Exclude 19 of 116 exon_skip events (16.38 percent) from testing due to low coverage
Consider 182 unique event splice forms for testing
Estimating raw dispersions
...
Writing test results to ./output_Y/testing_Ctrl_vs_HIV/test_results_C2_exon_skip.tsv

ValueError: Zero sided dimension

Hi,
In my group we are running Spladder and we ended up in some problems. We made a sample of 1 gene from the GFF3 reference file, and tried to run it. It runs fine until it reaches the point where:

counting 1 genes on contig 0010.scaffold00043
.Traceback (most recent call last):
File "/software/spladder/spladder/python/spladder.py", line 322, in <module>
spladder()
File "/software/spladder/spladder/python/spladder.py", line 302, in spladder
count_graph_coverage_wrapper(fn_in_count, fn_out_count, CFG)
File "/software/spladder/spladder/python/modules/count.py", line 156, in count_graph_coverage_wrapper
h5fid.create_dataset(name=key, data=counts[key])
File "/software/python/Python2.7/lib/python2.7/site-packages/h5py-2.5.0-py2.7-linux-x86_64.egg/h5py/_hl/group.py", line 103, in create_dataset
dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
File "/software/python/Python2.7/lib/python2.7/site-packages/h5py-2.5.0-py2.7-linux-x86_64.egg/h5py/_hl/dataset.py", line 120, in make_new_dset
sid = h5s.create_simple(shape, maxshape)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/software/python/Python2.7/lib/python2.7/site-packages/h5py-2.5.0/h5py/_objects.c:2574)
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/software/python/Python2.7/lib/python2.7/site-packages/h5py-2.5.0/h5py/_objects.c:2533)
File "h5py/h5s.pyx", line 99, in h5py.h5s.create_simple (/software/python/Python2.7/lib/python2.7/site-packages/h5py-2.5.0/h5py/h5s.c:1432)
ValueError: Zero sized dimension for non-unlimited dimension (Zero sized dimension for non-unlimited dimension)

I attached the subset reference that arises the error (the BAM file is ok).

subset.gff3.zip

spladder_test.py -t intron_retention fails with index out of range error

Hi Andre,

spladder_test fails for the intron_retention category of alt splicing events with an index out of range error (see below). All other categories work fine, except mutex_exon. However, mutex_exon fails with a different error, shown at the bottom. Have attached the contents of ./output_9_fwd (spladder output directory). As always, any help getting over this bump in the road is much appreciated.

Best, Reiner
output_9_fwd.zip

intron_retention error:

$ python ~/bin/spladder-master/python/spladder_test.py -o ./output_9_fwd -a A.fwd.9,B.fwd.9 -b E.fwd.9,F.fwd.9 -n 100 -c 2 -v y -t intron_retention
Loading expression counts from ./output_9_fwd/spladder/genes_graph_conf2.merge_graphs.gene_exp.hdf5
Estimating size factors
Testing intron_retention events
Collecting intron confirmation values
Constructing event IDs
Traceback (most recent call last):
  File "/home/rschulz/bin/spladder-master/python/spladder_test.py", line 873, in <module>
    main()
  File "/home/rschulz/bin/spladder-master/python/spladder_test.py", line 787, in main
    (cov, gene_idx, event_idx, event_ids, event_strains) = quantify.quantify_from_counted_events(CFG['fname_events'], sp.r_[idx1, idx2], event_type, CFG)
  File "/home/rschulz/bin/spladder-master/python/modules/alt_splice/quantify.py", line 516, in quantify_from_counted_events
    event_ids = get_event_ids(IN, event_type, event_idx, CFG)
  File "/home/rschulz/bin/spladder-master/python/modules/alt_splice/quantify.py", line 677, in get_event_ids
    event_ids0 = sp.array(['%s:%s' % (gene_chr[i], '-'.join(IN['event_pos'][i, [1, 2]].astype('str'))) for i in event_idx], dtype='str')
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (-------src-dir-------/h5py/_objects.c:2582)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (-------src-dir-------/h5py/_objects.c:2541)
  File "/home/rschulz/anaconda/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 462, in __getitem__
    selection = sel.select(self.shape, args, dsid=self.id)
  File "/home/rschulz/anaconda/lib/python2.7/site-packages/h5py/_hl/selections.py", line 88, in select
    sel[args]
  File "/home/rschulz/anaconda/lib/python2.7/site-packages/h5py/_hl/selections.py", line 382, in __getitem__
    start, count, step, scalar = _handle_simple(self.shape, vector)
  File "/home/rschulz/anaconda/lib/python2.7/site-packages/h5py/_hl/selections.py", line 447, in _handle_simple
    x,y,z = _translate_int(int(arg), length)
  File "/home/rschulz/anaconda/lib/python2.7/site-packages/h5py/_hl/selections.py", line 467, in _translate_int
    raise ValueError("Index (%s) out of range (0-%s)" % (exp, length-1))
ValueError: Index (2) out of range (0-1)

mutex_exon error:

$ python ~/bin/spladder-master/python/spladder_test.py -o ./output_9_fwd -a A.fwd.9,B.fwd.9 -b E.fwd.9,F.fwd.9 -n 100 -c 2 -v y -t mutex_exons
Loading expression counts from ./output_9_fwd/spladder/genes_graph_conf2.merge_graphs.gene_exp.hdf5
Estimating size factors
Testing mutex_exons events
Collecting intron confirmation values
Constructing event IDs
Traceback (most recent call last):
  File "/home/rschulz/bin/spladder-master/python/spladder_test.py", line 873, in <module>
    main()
  File "/home/rschulz/bin/spladder-master/python/spladder_test.py", line 787, in main
    (cov, gene_idx, event_idx, event_ids, event_strains) = quantify.quantify_from_counted_events(CFG['fname_events'], sp.r_[idx1, idx2], event_type, CFG)
  File "/home/rschulz/bin/spladder-master/python/modules/alt_splice/quantify.py", line 516, in quantify_from_counted_events
    event_ids = get_event_ids(IN, event_type, event_idx, CFG)
  File "/home/rschulz/bin/spladder-master/python/modules/alt_splice/quantify.py", line 683, in get_event_ids
    event_ids0 = sp.array(['%s:%s' % (gene_chr[i], '-'.join(IN['event_pos'][i, :4].astype('str'))) for i in event_idx], dtype='str')
TypeError: sequence item 0: expected string, numpy.ndarray found

IndexError in spladder_test

Error message

I get an IndexError in spladder_test.py (using the development branch because of #35):

Traceback (most recent call last):
  File "spladder/python/spladder_test.py", line 985, in <module>
    main()
  File "spladder/python/spladder_test.py", line 900, in main
    (pvals, cov_used, disp_raw_used, disp_adj_used) = run_testing(cov, dmatrix0, dmatrix1, sf, CFG, event_type)
  File "spladder/python/spladder_test.py", line 636, in run_testing
    (disp_fitted, Lambda, disp_idx) = fit_dispersion(cov, disp_raw, disp_raw_conv, sf, CFG, dmatrix1, event_type)
  File "spladder/python/spladder_test.py", line 347, in fit_dispersion
    lowerBound = sp.percentile(sp.unique(disp_raw[index]), 1)
  File "<MYHOME>/anaconda2/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3699, in percentile
    interpolation=interpolation)
  File "<MYHOME>/anaconda2/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3425, in _ureduce
    r = func(a, **kwargs)
  File "<MYHOME>/anaconda2/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3816, in _percentile
    x1 = take(ap, indices_below, axis=axis) * weights_below
  File "<MYHOME>/anaconda2/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 124, in take
    return take(indices, axis, out, mode)
IndexError: cannot do a non-empty take from an empty axes.

How I ran spladder

This is the command I used for running spladder :

    python $spladderScript \
        -b $bamlinks \
        -a $GTF \
        -o $OUTDIR \
        -v y \
        -c 3 \
        -M merge_graphs \
        -T y \
        -V n \
        -n 125 \
        -P y \
        -p n \
        -t exon_skip,intron_retention,alt_3prime,alt_5prime,mutex_exons,mult_exon_skip

where $GTF is gencode.v25.primary_assembly.annotation.gtf filtered for protein coding genes, and $bamlinks is a comma separated list of 25 BAM files. Some warnings were produced:

spladder/python/modules/editgraph.py:923: RuntimeWarning: divide by zero encountered in double_scalars
  (sp.median(exon_cov[-min_len_aft:]) / sp.median(aft_segment_cov[:min_len_aft])) - 1 >= CFG['cassette_exon']['min_cassette_rel_diff'] and \
spladder/python/modules/editgraph.py:924: RuntimeWarning: divide by zero encountered in double_scalars
  (sp.median(exon_cov[:min_len_pre]) / sp.median(pre_segment_cov[-min_len_pre:])) - 1 >= CFG['cassette_exon']['min_cassette_rel_diff']:

How I ran spladder_test

This is the command I used for running spladder_test (names of samples and labels were anonymized here):

    python $SPLADDER_TEST \
    --outdir=$OUTDIR \
    --conditionA=sample04,sample12,sample15,sample1a,sample25,sample9a \
    --conditionB=sample22,sample23,sample24 \
    --labelA=group1 \
    --labelB=group2 \
    --readlen=125 \
    --confidence=3 \
    --event_types=exon_skip,intron_retention,alt_3prime,alt_5prime,mutex_exons \
    --merge_strat=merge_graphs \
    --validate=n \
    --matlab=n \
    --subset_samples=n \
    --correction=BH \
    --max_zero_frac=0.5 \
    --min_count=10 \
    --verbose=y \
    --debug=n \
    --diagnose_plots=n \
    --timestamp=n

Do you have any ideas why spladder_test fails?

spladder_test TypeError

Hello,
I recently used spladder for my data and followed the python documentation to run the augmentation and event detection phases. These seemed to run successfully and I generated the pickle and hd5 files.

When I ran the tutorial (test run) with the data provided, I generated the following files in my results directory:

spladder/genes_graph_conf3.NMD_DBL2.tiny.pickle
spladder/genes_graph_conf3.NMD_WT1.tiny.pickle
spladder/genes_graph_conf3.NMD_WT2.tiny.pickle
spladder/genes_graph_conf3.NMD_DBL1.tiny.pickle
spladder/genes_graph_conf3.merge_graphs.pickle	       
spladder/genes_graph_conf3.merge_graphs.count.hdf5

This seems to be one difference: I get a count.hdf5 instead of count.pickle as written in the documentation. Is this okay?

the subsequent step of "event detection" runs fine. I therefore assume this was not an issue and proceeded with my data for augumentation and event_detection. They also proceeded without any errors.

I am not trying to run the spladder_test on a subset of my data and get the following error:

python spladder_test.py -o results  -a DIPG100N.sorted.mdup,DIPG14BN.sorted.mdup,DIPG27N.sorted.mdup -b DIPG112WM.sorted.mdup,DIPG109WM.sorted.mdup,DIPG100WM.sorted.mdup 

Traceback (most recent call last):
  File "/hpf/largeprojects/ccmbio/arun/Tools/spladder/python/spladder_test.py", line 873, in <module>
    main()
  File "/hpf/largeprojects/ccmbio/arun/Tools/spladder/python/spladder_test.py", line 787, in main
    (cov, gene_idx, event_idx, event_ids, event_strains) = quantify.quantify_from_counted_events(CFG['fname_events'], sp.r_[idx1, idx2], event_type, CFG)
  File "/hpf/largeprojects/ccmbio/arun/Tools/spladder/python/modules/alt_splice/quantify.py", line 508, in quantify_from_counted_events
    cov[0] += IN['event_counts'][strain_idx, f[0], :][:, event_idx].T
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/hpf/tools/centos6/python/source/packages/h5py-2.5.0/h5py/_objects.c:2472)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/hpf/tools/centos6/python/source/packages/h5py-2.5.0/h5py/_objects.c:2429)
  File "/hpf/tools/centos6/python/2.7.9/lib/python2.7/site-packages/h5py-2.5.0-py2.7-linux-x86_64.egg/h5py/_hl/dataset.py", line 431, in __getitem__
    selection = sel.select(self.shape, args, dsid=self.id)
  File "/hpf/tools/centos6/python/2.7.9/lib/python2.7/site-packages/h5py-2.5.0-py2.7-linux-x86_64.egg/h5py/_hl/selections.py", line 95, in select
    sel[args]
  File "/hpf/tools/centos6/python/2.7.9/lib/python2.7/site-packages/h5py-2.5.0-py2.7-linux-x86_64.egg/h5py/_hl/selections.py", line 429, in __getitem__
    raise TypeError("Indexing elements must be in increasing order")
TypeError: Indexing elements must be in increasing order

I did a search on the issues and did not find any previous posts about this. I was wondering if you could point me in the right direction to get this working?

thanks
Arun

[spladder_test.py on Linux x86_64, python 2.7] TypeError: string indices must be integers, not str

Hello Andre,

Thanks for the wonderful program! I'm running an analysis on Arabidopsis thaliana using the TAIR10.gff, with modified chromosome names (i.e. Chr1 -- > 1). I've successfully run spladder.py on my data, but when trying to run spladder_test.py without specifying -t, thusly:

python spladder_test.py -o my_output -a ../../WT2SAligned.sortedByCoord.out.bam,../../WT3SAligned.sortedByCoord.out.bam -b ../../KO1SAligned.sortedByCoord.out.bam,../../KO3SAligned.sortedByCoord.out.bam —labelA=Wildtype —labelB=Knockout -n 125 -c 3 —timestamp=y

The following error crops up after successfully processing the skipped exons class (I have a test_results_C3_exon_skip.tsv file in the testing folder):

Traceback (most recent call last): File "spladder_test.py", line 873, in <module> main() File "spladder_test.py", line 787, in main (cov, gene_idx, event_idx, event_ids, event_strains) = quantify.quantify_from_counted_events(CFG['fname_events'], sp.r_[idx1, idx2], event_type, CFG) File "/home/Users.nfs.haystack01a/drichardson/newscl30a/spladder/python/modules/alt_splice/quantify.py", line 516, in quantify_from_counted_events event_ids = get_event_ids(IN, event_type, event_idx, CFG) File "/home/Users.nfs.haystack01a/drichardson/newscl30a/spladder/python/modules/alt_splice/quantify.py", line 677, in get_event_ids event_ids0 = sp.array(['%s:%s' % (gene_chr[i], '-'.join(IN['event_pos'][i, [1, 2]].astype('str'))) for i in event_idx], dtype='str') File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/work/h5py/_objects.c:2696) File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/work/h5py/_objects.c:2654) File "/Users/drichardson/anaconda3/envs/bunnies/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 462, in __getitem__ selection = sel.select(self.shape, args, dsid=self.id) File "/Users/drichardson/anaconda3/envs/bunnies/lib/python2.7/site-packages/h5py/_hl/selections.py", line 88, in select sel[args] File "/Users/drichardson/anaconda3/envs/bunnies/lib/python2.7/site-packages/h5py/_hl/selections.py", line 382, in __getitem__ start, count, step, scalar = _handle_simple(self.shape, vector) File "/Users/drichardson/anaconda3/envs/bunnies/lib/python2.7/site-packages/h5py/_hl/selections.py", line 447, in _handle_simple x,y,z = _translate_int(int(arg), length) File "/Users/drichardson/anaconda3/envs/bunnies/lib/python2.7/site-packages/h5py/_hl/selections.py", line 467, in _translate_int raise ValueError("Index (%s) out of range (0-%s)" % (exp, length-1)) ValueError: Index (2) out of range (0-1)

Trying to specify the type of event fails for all event types, even exon_skip, which was previously successfully processed!

Traceback (most recent call last): File "spladder_test.py", line 920, in <module> main() File "spladder_test.py", line 796, in main (cov, gene_idx, event_idx, event_ids, event_strains) = quantify.quantify_from_counted_events(CFG['fname_events'], idx1, idx2, event_type, CFG) File "/home/Users.nfs.haystack01a/drichardson/newscl30a/spladder/python/modules/alt_splice/quantify.py", line 391, in quantify_from_counted_events if CFG['is_matlab']: TypeError: string indices must be integers, not str

After browsing through the issues here, I thought to try the development version of spladder. When trying to run the development version of spladder_test.py, I get the following error:

Traceback (most recent call last): File "spladder_test.py", line 920, in <module> main() File "spladder_test.py", line 765, in main gene_counts, gene_strains, gene_ids = get_gene_expression(CFG, fn_out=CFG['fname_exp_hdf5'], strain_subset=condition_strains) File "spladder_test.py", line 171, in get_gene_expression gene_counts[gidx, :] = sp.dot(IN['segments'][seg_idx, :][:, strain_idx].T, seg_lens[seg_idx]) / CFG['read_length'] ValueError: could not broadcast input array from shape (4,1) into shape (4)

So.. at the moment I'm at a little bit of a loss as to how to resolve this. What files would you need from me to help figure out what is going on?

Thanks so much for your help!

Filename validity checking

Thanks for this very useful tool. Your tool in default mode runs through splice graph augmentation first before opening the bam alignment file. If the path to the bam file is wrong the error only occurs after this step, when the user may not be present at the terminal. Furthermore, the final line in the TraceBack AttributeError: 'NoneType' object has no attribute 'shape' isn't very helpful. So perhaps an initial check for the presence of all the files the user has supplied (a la os.path.isfile) and a simple error message before any computation begins would be useful?

parallel jobs with mpi

Hello,

When I use the options -p y --parallel 25, I get the following error:
Could you please solve this issue?

Traceback (most recent call last):
File "spladder-master/python/spladder.py", line 322, in
spladder()
File "spladder-master/python/spladder.py", line 221, in spladder
jobinfo.append(rp.rproc('spladder_core', CFG, 15000, CFG['options_rproc'], 60*60))
File "/n/data2/joslin/icrb/blackwell/mi51/SUPPA/rMATS/rMATS.3.2.4/WTvsraga1/spladder-master/python/modules/rproc.py", line 241, in rproc
os.symlink(sge_tmp_dir, os.path.join(os.environ['HOME'], 'tmp', '.sge'))
NameError: global name 'sge_tmp_dir' is not defined

ValueError: Zero sized dimension

Hi andre,
I am trying to run the example_run.sh script. I made sure that I have the correct versions of h5py, scipy and pysam installed. But when I run:

python spladder.py -b examples/NMD_WT1.tiny.bam,examples/NMD_WT2.tiny.bam,examples/NMD_DBL1.tiny.bam,examples/NMD_DBL2.tiny.bam -o test -a examples/TAIR10_GFF3_genes.tiny.gff -v y -M merge_graphs -t exon_skip,intron_retention,alt_3prime,alt_5prime,mutex_exons,mult_exon_skip -c 3 -p n #-e n

I get the following error:

Augmenting splice graphs.

Generating splice graph ...

Total genes: 1
Total genes with alternative isoforms: 1
Total genes alternatively spliced: 1
Total constitutively spliced: 0
...done.

Loading introns from file ...
...done.

Testing for infeasible genes ...
found 0 unfeasible genes
...done.

Inserting cassette exons ...

... inserted 0 casette exons ....
... done.

Inserting intron retentions ...

... inserted 0 new intron retentions ...
...done.

Inserting new intron edges ...
... chr Chr1 - iteration 1/5

... removing duplicate exons ...
0

... removed 0 duplicate exons ...
... chr Chr1 - iteration 2/5

... removing duplicate exons ...
0

... removed 0 duplicate exons ...
... chr Chr1 - iteration 3/5

... removing duplicate exons ...
0

... removed 0 duplicate exons ...
... chr Chr1 - iteration 4/5

... removing duplicate exons ...
0

... removed 0 duplicate exons ...
... chr Chr1 - iteration 5/5

... removing duplicate exons ...
0

... removed 0 duplicate exons ...
... done.

Re-labeleling new alternative genes ...
... done.

Inserted:
alt_53_prime: 0
intron_retention: 0
new_terminal_exon: 0
exon_skip: 2
gene_merge: 0
cassette_exon: 0
intron_in_exon: 0
Saving genes to test/spladder/genes_graph_conf3.NMD_WT1.tiny.pickle
No pruning requested!
Generating all isoforms not requested

Augmenting splice graphs.

Generating splice graph ...

Total genes: 1
Total genes with alternative isoforms: 1
Total genes alternatively spliced: 1
Total constitutively spliced: 0
...done.

Loading introns from file ...
...done.

Testing for infeasible genes ...
found 0 unfeasible genes
...done.

Inserting cassette exons ...

... inserted 0 casette exons ....
... done.

Inserting intron retentions ...

... inserted 0 new intron retentions ...
...done.

Inserting new intron edges ...
... chr Chr1 - iteration 1/5

... removing duplicate exons ...
0

... removed 0 duplicate exons ...
... chr Chr1 - iteration 2/5

... removing duplicate exons ...
0

... removed 0 duplicate exons ...
... chr Chr1 - iteration 3/5

... removing duplicate exons ...
0

... removed 0 duplicate exons ...
... chr Chr1 - iteration 4/5

... removing duplicate exons ...
0

... removed 0 duplicate exons ...
... chr Chr1 - iteration 5/5

... removing duplicate exons ...
0

... removed 0 duplicate exons ...
... done.

Re-labeleling new alternative genes ...
... done.

Inserted:
alt_53_prime: 0
intron_retention: 0
new_terminal_exon: 0
exon_skip: 2
gene_merge: 0
cassette_exon: 0
intron_in_exon: 0
Saving genes to test/spladder/genes_graph_conf3.NMD_WT2.tiny.pickle
No pruning requested!
Generating all isoforms not requested

Augmenting splice graphs.

Generating splice graph ...

Total genes: 1
Total genes with alternative isoforms: 1
Total genes alternatively spliced: 1
Total constitutively spliced: 0
...done.

Loading introns from file ...
...done.

Testing for infeasible genes ...
found 0 unfeasible genes
...done.

Inserting cassette exons ...

... inserted 0 casette exons ....
... done.

Inserting intron retentions ...

... inserted 0 new intron retentions ...
...done.

Inserting new intron edges ...
... chr Chr1 - iteration 1/5

... removing duplicate exons ...
0

... removed 0 duplicate exons ...
... chr Chr1 - iteration 2/5

... removing duplicate exons ...
0

... removed 0 duplicate exons ...
... chr Chr1 - iteration 3/5

... removing duplicate exons ...
0

... removed 0 duplicate exons ...
... chr Chr1 - iteration 4/5

... removing duplicate exons ...
0

... removed 0 duplicate exons ...
... chr Chr1 - iteration 5/5

... removing duplicate exons ...
0

... removed 0 duplicate exons ...
... done.

Re-labeleling new alternative genes ...
... done.

Inserted:
alt_53_prime: 0
intron_retention: 0
new_terminal_exon: 1
exon_skip: 2
gene_merge: 0
cassette_exon: 0
intron_in_exon: 0
Saving genes to test/spladder/genes_graph_conf3.NMD_DBL1.tiny.pickle
No pruning requested!
Generating all isoforms not requested

Augmenting splice graphs.

Generating splice graph ...

Total genes: 1
Total genes with alternative isoforms: 1
Total genes alternatively spliced: 1
Total constitutively spliced: 0
...done.

Loading introns from file ...
...done.

Testing for infeasible genes ...
found 0 unfeasible genes
...done.

Inserting cassette exons ...
/home/simonlm/programs/spladder/python/modules/editgraph.py:843: RuntimeWarning: divide by zero encountered in double_scalars
(sp.median(exon_cov[:min_len_pre]) / sp.median(pre_segment_cov[-min_len_pre:])) - 1 >= CFG['cassette_exon']['min_cassette_rel_diff']:

... inserted 1 casette exons ....
... done.

Inserting intron retentions ...

... inserted 0 new intron retentions ...
...done.

Inserting new intron edges ...
... chr Chr1 - iteration 1/5

... removing duplicate exons ...
0

... removed 0 duplicate exons ...
... chr Chr1 - iteration 2/5

... removing duplicate exons ...
0

... removed 0 duplicate exons ...
... chr Chr1 - iteration 3/5

... removing duplicate exons ...
0

... removed 0 duplicate exons ...
... chr Chr1 - iteration 4/5

... removing duplicate exons ...
0

... removed 0 duplicate exons ...
... chr Chr1 - iteration 5/5

... removing duplicate exons ...
0

... removed 0 duplicate exons ...
... done.

Re-labeleling new alternative genes ...
... done.

Inserted:
alt_53_prime: 0
intron_retention: 0
new_terminal_exon: 1
exon_skip: 3
gene_merge: 0
cassette_exon: 1
intron_in_exon: 0
Saving genes to test/spladder/genes_graph_conf3.NMD_DBL2.tiny.pickle
No pruning requested!
Generating all isoforms not requested
Loading test/spladder/genes_graph_conf3.NMD_WT1.tiny.pickle ...
... done (1 / 4)
Loading test/spladder/genes_graph_conf3.NMD_WT2.tiny.pickle ...
... done (2 / 4)
Processing ...
. 0/1
... done

Loading test/spladder/genes_graph_conf3.NMD_DBL1.tiny.pickle ...
... done (3 / 4)
Processing ...
. 0/1
... done

Loading test/spladder/genes_graph_conf3.NMD_DBL2.tiny.pickle ...
... done (4 / 4)
Processing ...
. 0/1
... done

Store genes at: test/spladder/genes_graph_conf3.merge_graphs.pickle

1/4
.
2/4
.
3/4
.
4/4
.
confidence 3 / sample 0 / replicate 0
Loading gene structure from test/spladder/genes_graph_conf3.merge_graphs.pickle ...
... done.
.
Number of intron retentions: 0
.
Number of single exon skips: 2
.
Number of alternative 5 prime sites: 2
Number of alternative 3 prime sites: 2
. Number of multiple exon skips: 0
.

Number of XOR exons: 0

saving intron retentions to test/merge_graphs_intron_retention_C3.pickle

Make exon_skip events unique by strain
events dropped: 0

Make exon_skip events unique by event
events dropped: 0
saving exon skips to test/merge_graphs_exon_skip_C3.pickle
saving multiple exon skips to test/merge_graphs_mult_exon_skip_C3.pickle

Make alt_5prime events unique by strain
events dropped: 0

Make alt_5prime events unique by event
events dropped: 0
Corrected 1 events
Removed 0 events
saving alt 5 prime events to test/merge_graphs_alt_5prime_C3.pickle

Make alt_3prime events unique by strain
events dropped: 0

Make alt_3prime events unique by event
events dropped: 0
Corrected 1 events
Removed 0 events
saving alt 3 prime events to test/merge_graphs_alt_3prime_C3.pickle
saving mutually exclusive exons to test/merge_graphs_mutex_exons_C3.pickle
confidence 3 / replicate 0
0/4
1/4
2/4
3/4
0/4
1/4
2/4
3/4

Reporting complete exon_skip events:

Reporting confirmed exon_skip events:
writing exon_skip events in gff3 format to test/merge_graphs_exon_skip_C3.confirmed.gff3
writing exon_skip events in flat txt format to test/merge_graphs_exon_skip_C3.confirmed.txt
confidence 3 / replicate 0

No intron_retention event could be found. - Nothing to report
confidence 3 / replicate 0
0/4
1/4
2/4
3/4
0/4
1/4
2/4
3/4
Traceback (most recent call last):
File "spladder.py", line 215, in
spladder()
File "spladder.py", line 211, in spladder
analyze_events(CFG, CFG['event_types'][idx])
File "/home/simonlm/programs/spladder/python/modules/alt_splice/analyze.py", line 193, in analyze_events
OUT.create_dataset(name='conf_idx', data=confirmed_idx)
File "/usr/local/lib/python2.7/site-packages/h5py/_hl/group.py", line 99, in create_dataset
dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
File "/usr/local/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 114, in make_new_dset
sid = h5s.create_simple(shape, maxshape)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip_build_root/h5py/h5py/_objects.c:2400)
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip_build_root/h5py/h5py/_objects.c:2357)
File "h5py/h5s.pyx", line 99, in h5py.h5s.create_simple (/tmp/pip_build_root/h5py/h5py/h5s.c:1346)
ValueError: Zero sized dimension for non-unlimited dimension (Zero sized dimension for non-unlimited dimension)

It seems to run fine for a large part but then it gives me this "Zero sized dimension" error.
Can you help?

error for processing CCLE BAM files

Hi,

I have conducted several test runs for RNA-seq BAM files generated by our own lab, and did not find any errors so far. However, when I used the same command (python spladder.py -a gff3 -b CCLE.1.bam -o (my directory in the cluster) )
to process RNA-seq BAM files from CCLE (Cancer Cell Line Encyclopedia) database, I got the following error and the spladder failed:
Traceback (most recent call last):
File "/python/spladder.py", line 318, in
spladder()
File "/python/spladder.py", line 223, in spladder
spladder_core(CFG)
File "/python/modules/core/spladdercore.py", line 21, in spladder_core
genes = gen_graphs(genes, CFG)
File "/python/modules/core/gen_graphs.py", line 83, in gen_graphs
introns = get_intron_list(genes, CFG)
File "/python/modules/reads.py", line 374, in get_intron_list
(regions, CFG) = init_regions(CFG['bam_fnames'], CFG)
File "/python/modules/init.py", line 294, in init_regions
header_info = parse_header(IN.text)['SQ']
File "/python/modules/init.py", line 270, in parse_header
td = dict([x.split(':', 1) for x in sl[1:]])
ValueError: dictionary update sequence element #0 has length 1; 2 is required

I originally thought is due to gff3 file so I changed several version of gff3 but always got the same error. You suggestions will be very much appreciated. Thanks!

error during loading introns

Hi,

SplAdder produces the following error:

Augmenting splice graphs.

Generating splice graph ...
...done.

Loading introns from file ...

Traceback (most recent call last):
File ".../spladder-master/python/spladder.py", line 318, in
spladder()
File ".../spladder-master/python/spladder.py", line 223, in spladder
spladder_core(CFG)
File ".../spladder-master/python/modules/core/spladdercore.py", line 21, in spladder_core
genes = gen_graphs(genes, CFG)
File ".../spladder-master/python/modules/core/gen_graphs.py", line 83, in gen_graphs
introns = get_intron_list(genes, CFG)
File ".../spladder-master/python/modules/reads.py", line 431, in get_intron_list
[intron_list_tmp] = add_reads_from_bam(gg, CFG['bam_fnames'], ['intron_list'], CFG['read_filter'], CFG['var_aware'], CFG['primary_only'], CFG['ignore_mismatch_tag'])
File ".../spladder-master/python/modules/reads.py", line 155, in add_reads_from_bam
(introns, spliced_coverage) = get_all_data(blocks[b], filenames, mapped=False, filter=filter, var_aware=var_aware, primary_only=primary_only, no_mm=no_mm)
File ".../spladder-master/python/modules/reads.py", line 314, in get_all_data
(coverage_tmp, introns_tmp) = get_reads(fname, contig_name, block.start, block.stop, strand, filter, mapped, spliced, var_aware, collapse, primary_only, no_mm)
File ".../spladder-master/python/modules/reads.py", line 43, in get_reads
if filter_read(read, filter, spliced, mapped, strand, primary_only, var_aware, no_mm):
File ".../spladder-master/python/modules/reads.py", line 454, in filter_read
is_spliced = ('N' in read.cigarstring)
AttributeError: 'csamtools.AlignedRead' object has no attribute 'cigarstring'

My command is:
python .../spladder-master/python/spladder.py -a .../0-annotation-file/gencode.v19.annotation.gff3 -b .../sorted.bam -o ../output

Would you mind helping me to fix this problem? Thanks!

Lin

Regarding replicate option

Hi Andre,
While using -R R1,R1,R2,R2 for a four sample, two replicates each treatment scenario, the below error showed up.

....../spladder/python/modules/settings.py", line 227, in parse_args
CFG['replicate_idxs'] = [int(x) for x in options.replicates.split(',')]
ValueError: invalid literal for int() with base 10: 'R1'

From the error trace it looked like integers are expected instead of Rs there.I tried 1,1,2,2 and it seems to work. Present spladder help says

-R R1,R2,..., --replicates=R1,R2,...
replicate structure of files (same number as alignment
files) [all R1 - no replicated]

which I suppose should be -R INT 1,2 ... [all 1 - no replicated] or so if the present split implementation itself is the expected one for -R usage.

Please do correct me in case I have understood the replicate option usage all wrong or so.

Regards,
Jeffin

h5py.file or os.path.join is forcing lowercase and causing error in spladder_viz.py

Hi,

I wanted to visualize some of the results I got from running SplAdder and rDiff, and I got the following error:

IOError: Unable to open file (Unable to open file: name = 'spladder_results3/merge_graphs_alt_5primer_c3.counts.hdf5', errno = 2, error message = 'no such file or directory', flags = 0, o_flags = 0)
spladder_results3/
Traceback (most recent call last):
  File "/home/warren/Downloads/spladder_devel/spladder/python/spladder_viz.py", line 244, in <module>
    spladder_viz()
  File "/home/warren/Downloads/spladder_devel/spladder/python/spladder_viz.py", line 176, in spladder_viz
    event_info = get_conf_events(options, gid)
  File "/home/warren/Downloads/spladder_devel/spladder/python/modules/identity.py", line 60, in get_conf_events
    IN = h5py.File(os.path.join(options.outdir, 'merge_graphs_%s_C%i.counts.hdf5' % (event_type, options.confidence)), 'r')
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/files.py", line 260, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/files.py", line 89, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-build-WAOaVj/h5py/h5py/_objects.c:2574)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-build-WAOaVj/h5py/h5py/_objects.c:2533)
  File "h5py/h5f.pyx", line 76, in h5py.h5f.open (/tmp/pip-build-WAOaVj/h5py/h5py/h5f.c:1811)

Taking a look at the first line, it appears that somewhere along the way, the file name "spladder_results3/merge_graphs_alt_5primer_C3.counts.hdf5" got changed to "spladder_results3/merge_graphs_alt_5primer_c3.counts.hdf5", with the C forced to lower case. I'm not sure how to solve this other than renaming the files to lower-case, but wanted to bring this to your attention.

Annotation file format

Hi,

I executed spladder as follows:
spladder.sh -a gff3 -o outdir -b bam1,bam2

And, got the following error:

Augmenting splice graphs.

Cell contents reference from a non-cell array object.

Error in half_open_to_closed (line 21)
genes(j).splicegraph{1}(1, :) = genes(j).splicegraph{1}(1, :) + 1;

Error in spladder_core (line 23)
genes = half_open_to_closed(genes);

Error in spladder (line 104)
spladder_core(CFG);

rproc finishing

I was wondering if supplying gff3 directly to spladder.sh was the problem, as the program suggests to provide annotation in *.mat format. If so, how to convert gff3 to *.mat (compatible to spladder.sh). Can you please help me out.

errors occurred in gff3 files

Hello, I have tried SplAdder and got error messages:

gff3 file from ensembl (Drosophila_melanogaster.BDGP6.84.chr.gff3)

$ python spladder.py -a ../../Drosophila_melanogaster/ensembl/Drosophila_melanogaster.BDGP6.84.chr.gff3 -b ../../HISAT2_Stringtie_Ballgown/SRR1055723.hisat2.sorted.bam -o ../../spladder -T n
Traceback (most recent call last):
File "spladder.py", line 322, in
spladder()
File "spladder.py", line 144, in spladder
(genes, CFG) = init.init_genes_gff3(CFG['anno_fname'], CFG, CFG['anno_fname'] + '.pickle')
File "/Users/jhhuang/IIS_jhh/TEAS_Dmel/spladder/python/modules/init.py", line 256, in init_genes_gff3
gene_id = trans2gene[trans_id]
KeyError: 'transcript:FBtr0309810'

gff3 file from ensembl (Drosophila_melanogaster.BDGP6.84.gff3)

$ python spladder.py -a ../../Drosophila_melanogaster/ensembl/Drosophila_melanogaster.BDGP6.84.gff3 -b ../../HISAT2_Stringtie_Ballgown/SRR1055723.hisat2.sorted.bam -o ../../spladder -T n
Traceback (most recent call last):
File "spladder.py", line 322, in
spladder()
File "spladder.py", line 144, in spladder
(genes, CFG) = init.init_genes_gff3(CFG['anno_fname'], CFG, CFG['anno_fname'] + '.pickle')
File "/Users/jhhuang/IIS_jhh/TEAS_Dmel/spladder/python/modules/init.py", line 258, in init_genes_gff3
t_idx = genes[gene_id].transcripts.index(trans_id)
KeyError: 'gene:FBgn0085737'

Please advise me to solve the errors. Thank you.

ValueError: no index available for iteration

Test ran using the exact parameters from example_run.sh, minus bam and gtf, and got the following error.

$ python2.7 spladder.py -b test.bam -o test/ -a Homo_sapiens.GRCh37.82.gtf -v y -M merge_graphs -t exon_skip,intron_retention,alt_3prime,alt_5prime,mutex_exons,mult_exon_skip -c 3 -p n

Augmenting splice graphs.
=========================
Generating splice graph ...

Total genes:							57905
Total genes with alternative isoforms:				22172
Total genes alternatively spliced:				21857
Total constitutively spliced:					36048
...done.

Loading introns from file ...
Traceback (most recent call last):
  File "spladder.py", line 322, in <module>
    spladder()
  File "spladder.py", line 223, in spladder
    spladder_core(CFG)
  File "/Users/ericlim/Data/scratch/spladder/python/modules/core/spladdercore.py", line 21, in spladder_core
    genes = gen_graphs(genes, CFG)
  File "/Users/ericlim/Data/scratch/spladder/python/modules/core/gen_graphs.py", line 83, in gen_graphs
    introns = get_intron_list(genes, CFG)
  File "/Users/ericlim/Data/scratch/spladder/python/modules/reads.py", line 431, in get_intron_list
    [intron_list_tmp] = add_reads_from_bam(gg, CFG['bam_fnames'], ['intron_list'], CFG['read_filter'], CFG['var_aware'], CFG['primary_only'], CFG['ignore_mismatch_tag'])
  File "/Users/ericlim/Data/scratch/spladder/python/modules/reads.py", line 155, in add_reads_from_bam
    (introns, spliced_coverage) = get_all_data(blocks[b], filenames, mapped=False, filter=filter, var_aware=var_aware, primary_only=primary_only, no_mm=no_mm)
  File "/Users/ericlim/Data/scratch/spladder/python/modules/reads.py", line 314, in get_all_data
    (coverage_tmp, introns_tmp) = get_reads(fname, contig_name, block.start, block.stop, strand, filter, mapped, spliced, var_aware, collapse, primary_only, no_mm)
  File "/Users/ericlim/Data/scratch/spladder/python/modules/reads.py", line 40, in get_reads
    for read in infile.fetch(chr_name, start, stop, until_eof=True):
  File "pysam/calignmentfile.pyx", line 915, in pysam.calignmentfile.AlignmentFile.fetch (pysam/calignmentfile.c:11287)
  File "pysam/calignmentfile.pyx", line 1751, in pysam.calignmentfile.IteratorRowRegion.__init__ (pysam/calignmentfile.c:19179)
ValueError: no index available for iteration

ImportError: No module named diagnose

hi Andre,

just ran into this error, trying for the first time to run the test module (w/ both master and development branch):

rschulz@noris:~/bin/spladder-development/python$ python spladder_test.py 
Traceback (most recent call last):
  File "spladder_test.py", line 22, in <module>
    import modules.viz.diagnose as plot
ImportError: No module named diagnose

any help much appreciated, Reiner

Loading Introns Terminates With Obscure Error

It's unclear why the software terminates:

Loading introns from file ...
Traceback (most recent call last):
    ...             ...
File "/verona/nobackup/biostat/software/Python/lib/python2.7/site-packages/modules/reads.py", line 472, in filter_read
return filter['mismatch'] < tags['NM']
KeyError: 'NM'

I used GENCODE Genes Human version 25 GFF3 file as the annotation.

IndexError: list index out of range

Hi,
I installed the script and the test with the example files completes successfully. But, I get the following error when I run the script using my own (.gff3 and .bam) files:

... init structure
Traceback (most recent call last):
File "spladder.py", line 265, in
spladder()
File "spladder.py", line 143, in spladder
(genes, CFG) = init.init_genes_gff3(CFG['anno_fname'], CFG, CFG['anno_fname'] + '.pickle')
File "/home/ntatto/spladder-master/python/modules/init.py", line 191, in init_genes_gff3
tags = get_tags_gff3(sl[8])
IndexError: list index out of range

I would appreciate it very much if you could help with this issue.

KeyError: 'NM'

Hi there,
I'm fairly new to spladder and was trying to use it on some data of mine but ran into an error. The example data runs fine but when I use spladder on my data I get this:

python /tmp/spladder/python/spladder.py -a gencode.v23.annotation.gff3 -b sample1_Aligned.bam,sample2_Aligned.bam -o test
Augmenting splice graphs.
=========================
Generating splice graph ...
...done.

Loading introns from file ...
Traceback (most recent call last):
  File "/tmp/spladder/python/spladder.py", line 265, in <module>
    spladder()
  File "/tmp/spladder/python/spladder.py", line 184, in spladder
    spladder_core(CFG)
  File "/tmp/spladder/python/modules/core/spladdercore.py", line 21, in spladder_core
    genes = gen_graphs(genes, CFG)
  File "/tmp/spladder/python/modules/core/gen_graphs.py", line 83, in gen_graphs
    introns = get_intron_list(genes, CFG)
  File "/tmp/spladder/python/modules/reads.py", line 371, in get_intron_list
    intron_list_tmp = add_reads_from_bam(gg, CFG['bam_fnames'], ['intron_list'], CFG['read_filter'], CFG['var_aware'], CFG['primary_only'], CFG['ignore_mismatch_tag'])
  File "/tmp/spladder/python/modules/reads.py", line 152, in add_reads_from_bam
    (introns, spliced_coverage) = get_all_data(blocks[b], filenames, mapped=False, filter=filter, var_aware=var_aware, primary_only=primary_only, no_mm=no_mm)
  File "/tmp/spladder/python/modules/reads.py", line 265, in get_all_data
    (coverage_tmp, introns_tmp) = get_reads(fname, contig_name, block.start, block.stop, strand, filter, mapped, spliced, var_aware, collapse, primary_only, no_mm)
  File "/tmp/spladder/python/modules/reads.py", line 42, in get_reads
    if filter_read(read, filter, spliced, mapped, strand, primary_only, var_aware, no_mm):
  File "/tmp/spladder/python/modules/reads.py", line 411, in filter_read
    return filter['mismatch'] < tags['NM']
KeyError: 'NM'

Any idea?
Thanks

On visualization

Dear Andre,
While visualizing the pdf always shows TAIR 10. I suppose the issue is the below line axes[-1].set_title('Annotated Transcripts (TAIR 10)'). Instead, may be, giving an option to give the Name of the Gene as a user input would be apt.
Also,
Could you please consider providing an option to input actual Sample Names shown in pdf. When the number of samples are more,seeing actual treatment names above the corresponding plot instead of Sample1,2,3 would really be of much help.
Regards,
Jf

ploting error : ValueError: too many values to unpack

Hi Andre, I have tried to do the visualisation for my results but I got the error as below. Do you have any idea what the problem was?

$ python ../GitHub/spladder/python/spladder_viz.py -o spladder_AS/spladder_srr23_test/ -b HISAT2_Stringtie_Ballgown/sorted.bam/SRR1055723.hisat2.sorted.bam -g gene:FBgn0032938 -c 3 -T -e exon_skip_186
Traceback (most recent call last):
File "../GitHub/spladder/python/spladder_viz.py", line 251, in
spladder_viz()
File "../GitHub/spladder/python/spladder_viz.py", line 179, in spladder_viz
plot_event(options, event_info, axes[-1], xlim)
File "../GitHub/spladder/python/spladder_viz.py", line 245, in plot_event
event_list = [_ for event in load_events(options, event_info) for _ in [event.exons1, event.exons2]]
File "/Users/jhhuang/IIS_jhh/GitHub/spladder/python/modules/identity.py", line 41, in load_events
(events, ) = cPickle.load(open(os.path.join(options.outdir, 'merge_graphs%s_C%s.pickle' % (event_type, options.confidence)),'r'))
ValueError: too many values to unpack

my target gene output

2L + exon_skip_186 gene:FBgn0032938 21275724 21275806 21277278 21278429 21281132 21281337 20.0 8.8 17.4 7 16 14 0.45

parallelization of SplAdder

Hi,
I have about 200 BAM files, which would require huge RAM and long time if supplied together to SplAdder for splicing analysis. I was wondering if SplAdder run with individual BAM could be combined for this purpose. Do you have any suggestion about handling large number of dataset. Thanks.

ValueError: setting array element with a sequence

Make intron_retention events unique by strain
. . . events dropped: 0
Traceback (most recent call last):
  File "/u/Software/spladder/python/spladder.py", line 322, in <module>
    spladder()
  File "/u/Software/spladder/python/spladder.py", line 311, in spladder
    collect_events(CFG)
  File "/u/Software/spladder/python/modules/alt_splice/collect.py", line 371, in collect_events
    events_all = post_process_event_struct(intron_reten_pos_all, CFG)
  File "/u/Software/spladder/python/modules/alt_splice/events.py", line 62, in post_process_event_struct
    events = sort_events_by_event(events, CFG)
  File "/u/Software/spladder/python/modules/alt_splice/events.py", line 27, in sort_events_by_event
    coord_list = sp.array([x.get_inner_coords() for x in event_list], dtype='double')
ValueError: setting an array element with a sequence.

One of my spladder runs failed recently with the (above) error. The stack trace pointed to the definition of coord_list in the following code:

#/modules/classes/event.py
def sort_events_by_event(event_list, CFG):
    # event_list = sort_events_by_event(event_list, CFG),

    coord_list = sp.array([x.get_inner_coords() for x in event_list], dtype='double')
    chr_list = sp.array([CFG['chrm_lookup'][x.chr] for x in event_list], dtype='double')
    strand_list = sp.array([x.strand == '-' for x in event_list], dtype='double')
    sort_list = sp.c_[chr_list, strand_list, coord_list]
    tmp, idx = sort_rows(sort_list, index=True)

    return event_list[idx]

It appears the event.get_inner_coords() method can return arrays of different dimension depending on splice event type; and the dimensions all need to match in order to set the dtype of the array to double.

Setting the type to 'O' fixes this error, but breaks the later sort:

Traceback (most recent call last):
  File "/u/Software/spladder/python/spladder.py", line 322, in <module>
    spladder()
  File "/u/Software/spladder/python/spladder.py", line 311, in spladder
    collect_events(CFG)
  File "/u/Software/spladder/python/modules/alt_splice/collect.py", line 371, in collect_events
    events_all = post_process_event_struct(intron_reten_pos_all, CFG)
  File "/u/Software/spladder/python/modules/alt_splice/events.py", line 62, in post_process_event_struct
    events = sort_events_by_event(events, CFG)
  File "/u/Software/spladder/python/modules/alt_splice/events.py", line 31, in sort_events_by_event
    tmp, idx = sort_rows(sort_list, index=True)
  File "/u/Software/spladder/python/modules/utils.py", line 106, in sort_rows
    s_idx = sp.lexsort([array[:, -i] for i in range(1, array.shape[1] + 1)])
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Current setup:


$ python --version
Python 2.7.3

$ pip freeze
cycler==0.10.0
h5py==2.6.0
matplotlib==1.5.3
numpy==1.11.2
pyparsing==2.1.10
pysam==0.9.1.4
python-dateutil==2.6.0
pytz==2016.10
scipy==0.18.1
six==1.10.0

Spladder Plot Size

There is no parameter to specify the height and width of the plot created.

Manual

Hi Andre,
I was wondering if there is any manual or example about how to use spladder. Your response is highly appreciated.

Error if Parallel Processing Options Used

I used -p y --parallel 8 in my command and saw:

File "/verona/nobackup/biostat/software/Python/bin/spladder.py", line 309, in <module>
spladder()
File "/verona/nobackup/biostat/software/Python/bin/spladder.py", line 223, in spladder
jobinfo.append(rp.rproc('spladder_core', CFG, 15000, CFG['options_rproc'], 60*60))
File "/verona/nobackup/biostat/software/Python/lib/python2.7/site-packages/modules/rproc.py", line 188, in rproc
os.symlink(sge_tmp_dir, os.path.join(os.environ['HOME'], 'tmp', '.sge'))
NameError: global name 'sge_tmp_dir' is not defined

I am running it on Debian Linux. Can it be made more robust for the next released version?

Skewed p-values spladder_test.py

Hello,

Thanks again for writing such a great tool - I'm getting a lot of good use out of it! I write to ask about p-value distributions generated with spladder_test.py. I always seem to get a massive peak of p-values exactly equal to 1. The events with p=1 all seem to have calculated psi-values (no NaNs or anything like that). My understanding was that even in the case where there are no significant changes the p-values should be uniformly distributed. Here's a sample histogram - these are the frequencies of the unadjusted p-values:

I ran SplAdder (from the dev branch) with the following command:

python2.7 ~/spladder-development/python/spladder.py -b $bams -o $outdir -a $anno -v y -M merge_graphs -t exon_skip,intron_retention,alt_3prime,alt_5prime,mutex_exons,mult_exon_skip -c 3 --ignore_mismatches=y

and SplAdder test (also from the dev branch) as follows:

python2.7 ~/spladder-development/python/spladder_test.py -a $cond_A -b $cond_B -o $outdir --labelA=Ctrl_24h --labelB=Emet -v y -M merge_graphs -t $event_types -c 3 -n 100

Possibly this is normal for this sort of test? I am not a statistician, and most of the prior discussion that I can find seems centered around gene expression, not alternative splicing.

Thanks again!
Best,
Andrew

Compilation under Mac OS

I have smalls installed with macports and entered the proper samtools directories for bin, include and lib.
I get the following error when compiling:

~/spladder > make
echo Entering ./mex
Entering ./mex
cd mex ; make octave
/opt/local/bin/mkoctfile --mex get_reads.cpp get_reads_direct.cpp mex_input.cpp read.cpp -I/opt/local/include/bam/ -L/opt/local/lib/ -lbam -lz -lcurses
In file included from get_reads.cpp:10:
In file included from ./get_reads_direct.h:10:
/opt/local/include/bam/sam.h:28:10: fatal error: 'htslib/sam.h' file not found

include "htslib/sam.h"

1 error generated.
make[1]: *** [get_reads.mex] Error 1
make: *** [mexfiles] Error 2

Overlapping Genes

Hi,
first I would like to thank you for developing this tool, it is really promising!
I am trying to use Spladder on a Mus musculus RNA-seq dataset and I have a question regarding the initial processing of the GFF3/GTF files. I understand that overlapping genes and genes sharing exons are removed from the analysis. In the current mouse annotation there are about 47000 genes, however only about 27000 are left after Spladder filtering. If it is possible could you please explain why those genes are removed and, secondly, is there anyway to make Spladder process those genes as well? I am afraid there is a big loss of information right now.
Thank you for your time
Riccardo

Differential expression @ isoform level

Dear Andre,

How are you? Thanks for developing this tool, it looks quite promising.
I am trying to run spladder on some Drosophila melanogaster samples to infer about differential expression at the isoform level - i.e. to infer if there are any splicing event switching between conditions (for example, wt and knockout of some protein).
Up until now, I had some problems:

I can't find any merge_graphs_*_C3.confirmed.txt files to be created which is indicative of not being able to find significant splicing events (?)
I did tried to run this on some human samples that we have in the lab, and for some of the analyzed genes I've obtained the aforementioned files and all of the .pdf files that were being produced had the splicing schemes (I mean the lines connecting exons in a diagram sort of way)
I'm not sure how to implement rdiff to do the differential expression analysis neither how to use the *pickle files

Could you please clarify me on this?

Thank you.

Best,
Hugo

IndexError in example_run.sh

Hello,

I am receiving this error when trying to run the example_run.shfor the Python build. Any ideas??

Parsing annotation from examples/TAIR10_GFF3_genes.tiny.gff ...
... init structure
... done
Storing gene structure in examples/TAIR10_GFF3_genes.tiny.gff.pickle ...
... done
Augmenting splice graphs.
=========================
Generating splice graph ...

Total genes:                            1
Total genes with alternative isoforms:              1
Total genes alternatively spliced:              1
Total constitutively spliced:                   0
...done.

Loading introns from file ...
...done.

Testing for infeasible genes ...
found 0 unfeasible genes
...done.
Inserting cassette exons ...

... inserted 0 casette exons ....
... done.

Inserting intron retentions ...

... inserted 0 new intron retentions ...
...done.

Inserting new intron edges ...
... chr Chr1 - iteration 1/5

Traceback (most recent call last):
  File "spladder.py", line 219, in <module>
    spladder()
  File "spladder.py", line 175, in spladder
    spladder_core(CFG)
  File "/group/bioinformatics/software/SplAdder/041415/python/modules/core/spladdercore.py", line 26, in spladder_core
    genes = gen_graphs(genes, CFG)
  File "/group/bioinformatics/software/SplAdder/041415/python/modules/core/gen_graphs.py", line 156, in gen_graphs
    genes_mod, inserted_ = insert_intron_edges(tmp_genes, CFG)
  File "/group/bioinformatics/software/SplAdder/041415/python/modules/editgraph.py", line 732, in insert_intron_edges
    genes[rm_map == 1] = []
IndexError: too many indices for array

spladder v1.0.0: multi-exon skip detection enters data-dependent infinite loop

Hi Andre,

Have run spladder v1.0.0 on two data sets: mouse strand-specific TruSeq mRNA (4 samples), and human unstranded TrueSeq mRNA (5 samples). Aligned both with Hisat2 v.2.0.3beta. Using the Gencode Comprehensive transcript models for analysis with spladder (mouse: vM9; human: v24). Split the data by chromosome and, for mouse, by strand. Spladder failed to terminate for mouse Chr 19, reverse strand, and human Chr 9. Aborting spladder (mouse) generated the following traceback message:

.................................................. - 50/300, found 169
.................................................. - 100/300, found 404
.................................................. - 150/300, found 586
.................................................. - 200/300, found 751
.................................................. - 250/300, found 851
............................Traceback (most recent call last):
  File "/home/rschulz/bin/spladder-1.0.0/python//spladder.py", line 309, in <module>
    spladder()
  File "/home/rschulz/bin/spladder-1.0.0/python//spladder.py", line 302, in spladder
    collect_events(CFG)
  File "/home/rschulz/bin/spladder-1.0.0/python/modules/alt_splice/collect.py", line 283, in collect_events
    idx_mult_exon_skip, exon_mult_exon_skip = detect_events(genes, 'mult_exon_skip', sp.where([x.is_alt for x in genes])[0], CFG)
  File "/home/rschulz/bin/spladder-1.0.0/python/modules/alt_splice/detect.py", line 385, in detect_events
    result_list = detect_wrapper(genes[idx], event_type, idx, None, log=CFG['verbose'])[0]        
  File "/home/rschulz/bin/spladder-1.0.0/python/modules/alt_splice/detect.py", line 349, in detect_wrapper
    return (detect_multipleskips(genes, gidx, log), idx)
  File "/home/rschulz/bin/spladder-1.0.0/python/modules/alt_splice/detect.py", line 71, in detect_multipleskips
    idx = sp.where((exist_path[i, k] + exist_path[k, :]) < exist_path[i, :])[0]
KeyboardInterrupt

The human spladder process also was at the same stage of processing according the the log (attached), but no traceback message was generated upon aborting the process.

I attach the complete output directories and the input GFF and pickle files. Happy to also provide the input BAMs if necessary.
output_19_rev_aborted.zip
output_9_aborted.zip

Spladder was called like this (analogous for mouse):

#!/bin/bash
SPLADDER_HOME=/home/rschulz/bin/spladder-1.0.0/python/
CHR=$1

mkdir -p ./output_${CHR}

python $SPLADDER_HOME/spladder.py \
-a ./input/gencode.v24.annotation.$CHR.gff3 \
-b ./input/bam/Control1.$CHR.bam,./input/bam/Control2.$CHR.bam,./input/bam/HIV1.$CHR.bam,./input/bam/HIV2.$CHR.bam,./input/bam/HIV3.$CHR.bam \
-n 101 \
-o ./output_${CHR} \
-v y \
-c 2 \
-M merge_graphs \
-t exon_skip,intron_retention,alt_3prime,alt_5prime,mutex_exons,mult_exon_skip \
> ./output_${CHR}/spladder.log 2>&1

Best, r

ratschlab / spladder Goto Github PK

spladder's Introduction

What is SplAdder?

Dependencies and Installation

Documentation

Test with Example Data

Versions for Matlab and Python 2.7

Authors

License and Disclaimer

spladder's People

Contributors

Stargazers

Watchers

Forkers

spladder's Issues

Augmenting splice graphs.

Error message

How I ran spladder

How I ran spladder_test

Augmenting splice graphs.

Augmenting splice graphs.

Augmenting splice graphs.

Augmenting splice graphs.

Augmenting splice graphs.

Augmenting splice graphs.

rproc finishing

gff3 file from ensembl (Drosophila_melanogaster.BDGP6.84.chr.gff3)

gff3 file from ensembl (Drosophila_melanogaster.BDGP6.84.gff3)

my target gene output

include "htslib/sam.h"

Recommend Projects

Recommend Topics

Recommend Org