yeolab / clipper

A tool to identify CLIP-seq peaks

License: Other

Python 87.12% Shell 0.35% C++ 9.36% Cython 3.17%

clipper's Introduction

CLIPper - CLIP peak enrichment recognition

A tool to detect CLIP-seq peaks.

Please visit our wiki page to learn more about usage of clipper: https://github.com/YeoLab/clipper/wiki/CLIPper-Home

Installation

# recreate PYTHON3 conda environment
cd clipper
conda env create -f environment3.yml
conda activate clipper3
pip install .

Alternative installation

Thanks to @rekado for making clipper available in GNU Guix: guix install clipper
Installation may fail on some platforms. A Dockerized clipper is available in the eclip repository.

Command Line Usage

# shows all the options
clipper -h 

# minimal command
clipper -b YOUR_BAM_FILE.bam -o YOUR_OUT_FILE.bed -s hg19
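
For scripted runs, the minimal command above can be wrapped from Python. This is only a sketch: the function names are hypothetical, and it uses just the -b, -o and -s options shown above, assuming clipper is on the PATH.

```python
import subprocess


def build_clipper_command(bam_path, out_path, species):
    """Assemble the minimal clipper invocation as an argument list."""
    return ["clipper", "-b", bam_path, "-o", out_path, "-s", species]


def run_clipper(bam_path, out_path, species):
    """Run clipper; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.check_call(build_clipper_command(bam_path, out_path, species))
```

Passing an argument list rather than a single shell string avoids quoting problems with file names that contain spaces.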

Run test

cd clipper/clipper/test
python -m unittest discover

Test coverage is not yet 100%, and some subprocess warnings are not handled.

Frequently Asked Questions:

How do I use an additional reference genome? See the instructions here: Supporting additional species
Where can I specify the input BAM file? CLIPper currently does not include input normalization. The input normalization pipeline is in another repository: Merge Peaks

Questions and suggestions

Please open an issue in the repo, or email Charlene [email protected]

Reference

Yeo GW, Coufal NG, Liang TY, Peng GE, Fu XD, Gage FH. An RNA code for the FOX2 splicing regulator revealed by mapping RNA-protein interactions in stem cells. Nat Struct Mol Biol. 2009;16(2):130-137. doi:10.1038/nsmb.1545

Lovci MT, Ghanem D, Marr H, et al. Rbfox proteins regulate alternative mRNA splicing through evolutionarily conserved RNA bridges. Nat Struct Mol Biol. 2013;20(12):1434-1442. doi:10.1038/nsmb.2699

clipper's People


gpratt hjanime jill-moore siping enguy cooleel dfporter noahpieta bakerwm leilisysbio shansabri zhouyu skrakau alaindomissy haroon123 resurgo-genetics byee4 xiaolong125f jiananlin sv-n gnilihzeux fciamponi hershbc biofisherman tmatsyst xjyx adamlabadorf peaceben xxhongs finitenet vannostrandlab jkkbuddika yx-xu marc-jones y9c kwells4 kstangline yz-cyclin mulbagalamaq joshuayangca

clipper's Issues

Execution Issues with Python2.7.12

I was able to install clipper and run clipper -h successfully on a Linux server. However, I ran into the problem described in issue #49 and used PaulAdrian89's fix. I then had to change my BAM file to have a chr prefix (I aligned to Ensembl hg19 using STAR) so that clipper would recognize my mappings. Now when I attempt to run clipper interactively, the program just outputs the number 15 repeatedly. I am using my locally installed Python 2.7.12 (with Anaconda 4.2.0), since other syntax issues arose with clipper when I used Python 3.4.3. Any idea why clipper is outputting the number 15 like this?
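
The chr-prefix step mentioned above can be sketched as a rewrite of the @SQ lines in the SAM header. This is a minimal sketch under stated assumptions: both helper names are hypothetical, and only the simple case is handled (numbered and sex chromosomes get a chr prefix, Ensembl MT becomes chrM). Unplaced scaffolds are named differently in UCSC assemblies and would need an explicit lookup table.

```python
def to_ucsc_name(contig):
    """Map an Ensembl-style contig name to a UCSC-style one (simple cases only)."""
    if contig.startswith("chr"):
        return contig      # already prefixed
    if contig == "MT":
        return "chrM"      # Ensembl names the mitochondrial genome MT
    return "chr" + contig


def rename_sq_line(header_line):
    """Rewrite the SN: field of a SAM @SQ header line; other lines pass through."""
    line = header_line.rstrip("\n")
    if not line.startswith("@SQ"):
        return line
    fields = line.split("\t")
    fields = ["SN:" + to_ucsc_name(f[3:]) if f.startswith("SN:") else f
              for f in fields]
    return "\t".join(fields)
```

A header rewritten this way could then be applied to the BAM file with a tool such as samtools reheader, which replaces the header without touching the alignment records.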

make it so the peak caller outputs only the output name, not .bam

Change already in dev branch, need to commit to master.

clipper install slow

need to keep test data from installing

clipper usage parameters

Installed from git clone and was able to call clipper -h. The usage parameters below come up in the help text, but the wiki refers to what seems to be an updated version. For example, I'm using mm10, so I need to supply a BED file in place of the -s species option. The wiki says to use --bedFile, but when I try to run clipper with it, it reports an error. Could you please specify the current usage parameters? (Am I supposed to call clipper or python peakfinder.py?) Thanks!

Usage:
python peakfinder.py -b -s <hg18/hg19/mm9>
OR
python peakfinder.py -b --customBED --customMRNA
--customPREMRNA

clipper: error: no such option: --bedFile

Make the last transcriptome-wide filter disableable

make clip_Analysis installable

Installation Error

git clone worked fine.

python setup.py install gives the following error:

https://gist.github.com/ecwheele/400efcf19128b9933ea9

custom transcript list issues

Hello,
I have generated a custom transcript .gtf file using 'Stringtie' (mm10 mapping). From other threads, you suggest not using the --gtfFile option. Thus, is it best to use --customBED along with --customMRNA and --customPREMRNA, or generate a .gff file with extra attributes? Also, could you assist in how to generate the mRNA and pre-mRNA length attributes or files?
Thank you!
Matt
Gladstone Institutes, UCSF

clip_analysis broken again

(regression due to failure to rebase)

pre-mRNA mode causes infinite runtime

Invocation with --premRNA argument results in very long or infinite runtime.

Clipper fails to build spline

ssabri@n7131 2016-08-12]$ clipper -b BAMs/AGA.trimmed.uniq.sort.bam -s mm9 --premRNA --bonferroni --outfile peaks/AGA.trimmed.uniq.bed
ERROR:root:failed to build spline (m>k) failed for hidden m: fpcurf0:m=2, [0 1], [ 27.   0.], 11, 3, None
ERROR:root:peak finding failed:, ENSMUSG00000039801.6, (m>k) failed for hidden m: fpcurf0:m=2
Process PoolWorker-10:
Traceback (most recent call last):
  File "/u/local/apps/python/2.7.3/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/u/local/apps/python/2.7.3/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/u/local/apps/python/2.7.3/lib/python2.7/multiprocessing/pool.py", line 99, in worker
    put((job, i, result))
  File "/u/local/apps/python/2.7.3/lib/python2.7/multiprocessing/queues.py", line 390, in put
    return send(obj)
PicklingError: Can't pickle <class 'dfitpack.error'>: import of module dfitpack failed

If I do not escape the command the following warning message shows:

WARNING:py.warnings:/u/local/apps/python/2.7.3/lib/python2.7/site-packages/scipy/interpolate/fitpack2.py:208: UserWarning: 
The maximal number of iterations maxit (set to 20 by the program)
allowed for finding a smoothing spline with fp=s has been reached: s
too small.
There is an approximation returned but the corresponding weighted sum
of squared residuals does not satisfy the condition abs(fp-s)/s < tol.
  warnings.warn(message)

Warnings with sort deprecation

[ssabri@n2239 fastq]$ clipper -b ATC.trimmed.uniq.sort.bam -s mm9 --premRNA --bonferroni --outfile GTT.trimmed.uniq.bed
WARNING:py.warnings:/u/home/s/ssabri/.local/lib/python2.7/site-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py:446: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
  df = df.sort("final_p_value")

WARNING:py.warnings:/u/home/s/ssabri/.local/lib/python2.7/site-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py:449: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
  df['padj'] = df.sort("final_p_value", ascending=False).bh_corrected.cummin()

WARNING:py.warnings:/u/home/s/ssabri/.local/lib/python2.7/site-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py:450: FutureWarning: sort(....) is deprecated, use sort_index(.....)
  return df.sort()

add logging

currently all messages are printed to stdout and stderr, print to log instead
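
One way this could look with the standard logging module is sketched below. The logger name and the function are hypothetical and not part of clipper.

```python
import logging


def configure_logging(log_path, verbose=False):
    """Send messages to a log file instead of stdout/stderr."""
    logger = logging.getLogger("clipper_example")  # hypothetical logger name
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    handler = logging.FileHandler(log_path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    # Keep records away from any console handlers on the root logger.
    logger.propagate = False
    return logger
```

Calling configure_logging more than once would attach duplicate handlers, so it is meant to be called once at start-up.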

suppress Fitpack2 errors

fitpack2 should error quietly instead of noisily so output can be empty
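
For the warning half of this, the fitpack2 message quoted in the earlier report is a UserWarning, so a scoped filter would be one option. The helper name below is hypothetical. This sketch silences warnings only; it does not touch messages that are emitted through logging, such as the ERROR:root lines.

```python
import warnings


def call_quietly(func, *args, **kwargs):
    """Call func with UserWarning suppressed, restoring the filters afterwards."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=UserWarning)
        return func(*args, **kwargs)
```

Because catch_warnings restores the previous filter state on exit, warnings raised elsewhere in the program are unaffected.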

clipper execution error

I am trying to run clipper on Drosophila data, providing a GTF file instead of a species via the -s option.
(Could you provide information on how to construct a species file for other model organisms?)

In a test scenario I ran it on human data, which went through with thousands of warnings like:

WARNING:py.warnings:/gnu/store/2wsbia11jiyppc4x18w8ff8nvwjhp5na-clipper-0.3.0/lib/python2.7/site-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/call_peak.py:597: VisibleDeprecationWarning: boolean index did not match indexed array along dimension 0; dimension is 224 but corresponding boolean dimension is 9
  peak_center = [x + peak_start for x in self.xRange[self.find_local_maxima(spline_values[peak_start:(peak_stop + 1)])]]

However, it does produce output.

Since I now want to run it on Drosophila data, I gathered from the clipper help prompt that providing a GTF file might be sufficient (--gtfFile=GTFFILE use a gtf file instead of the AS structure data).
I am not quite sure what AS structure data is. However, I am assuming that clipper reads the GTF file and constructs customBED, customMRNA and customPREMRNA.

Given that GTF files can differ depending on where you download them, what are the GTF requirements?

Or am I doing something completely wrong here?

$ clipper -b ../INPUT.filtered.bam --gtfFile /path/to/BDGP6_Ensembl_release81/GTF/Drosophila_melanogaster.BDGP6.81.gtf -o test
  
Traceback (most recent call last):
  File "/gnu/store/2wsbia11jiyppc4x18w8ff8nvwjhp5na-clipper-0.3.0/bin/.clipper-real", line 9, in <module>
    load_entry_point('clipper==0.2.0', 'console_scripts', 'clipper')()
  File "/gnu/store/2wsbia11jiyppc4x18w8ff8nvwjhp5na-clipper-0.3.0/lib/python2.7/site-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py", line 687, in call_main
    main(options)
  File "/gnu/store/2wsbia11jiyppc4x18w8ff8nvwjhp5na-clipper-0.3.0/lib/python2.7/site-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py", line 544, in main
    gene_tool = build_transcript_data_gtf(pybedtools.BedTool(options.gtfFile ), options.premRNA).saveas()
  File "/gnu/store/2wsbia11jiyppc4x18w8ff8nvwjhp5na-clipper-0.3.0/lib/python2.7/site-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py", line 255, in build_transcript_data_gtf
    "gene_id=" + gene['gene_id'] + "; transcript_id=" + gene['transcript_id'] + "; effective_length=" + str(effective_length)]))
  File "pybedtools/cbedtools.pyx", line 500, in pybedtools.cbedtools.create_interval_from_list (pybedtools/cbedtools.cpp:8145)
  File "pybedtools/cbedtools.pyx", line 562, in pybedtools.cbedtools.create_interval_from_list (pybedtools/cbedtools.cpp:7945)
OverflowError: can't convert negative value to CHRPOS
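
The thread does not establish what clipper requires of a GTF file or what caused the OverflowError. The traceback does show build_transcript_data_gtf reading the gene_id and transcript_id attributes, so a quick pre-check of a GTF file for those attributes and for sane coordinates might help narrow things down. The sketch below is a hypothetical helper, not clipper code, and its attribute parser is simplistic (it would mis-split values that contain a semicolon).

```python
def parse_gtf_attributes(field):
    """Parse the ninth GTF column ('key "value"; ...') into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def check_gtf_line(line):
    """Return a list of problems found in one GTF record (empty if none)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return ["expected 9 tab-separated columns, found %d" % len(cols)]
    problems = []
    start, end = int(cols[3]), int(cols[4])
    if start < 1 or end < start:
        problems.append("bad coordinates %d-%d" % (start, end))
    attrs = parse_gtf_attributes(cols[8])
    for key in ("gene_id", "transcript_id"):
        if key not in attrs:
            problems.append("missing attribute " + key)
    return problems
```

Running check_gtf_line over every non-comment line and printing the non-empty results would show, for example, whether gene-level records without a transcript_id are present in the file.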

timeout errors

I am interested in running Clipper on some iCLIP data and I get this message repeatedly when I run the most recent build:

ERROR:root:transcript timed out local variable 'expanded_Nreads' referenced before assignment

Here is the call

clipper -b 11790X1_P1.clean.bam --input_bam=11790X4_P1.clean.bam -s hg19 -o test --processors=10

I also tried to run the last release of clipper but got this message repeatedly:

clipper -b 11790X1_P1.clean.bam -s hg19 -o test --processors=10

ERROR:root:transcript timed out Expected bytes, got unicode

Thanks!

clipper crashes without --bonferroni flag

Hi,

clipper v 1.0 with all defaults crashes with "KeyError: 'padj'"

clipper -b ~/src/clipper/clipper/test/data/indexed_test.bam -s hg19 -v --debug

It seems the 'padj' column is only ever created in bh_correct. Running with Bonferroni correction enabled

clipper -b ~/src/clipper/clipper/test/data/indexed_test.bam -s hg19 -v --debug --bonferroni

works fine.

Best,
Daniel

Traceback (most recent call last):
  File "/home/maticzkd/virtualenv_default/bin/clipper", line 9, in <module>
    load_entry_point('clipper==0.2.0', 'console_scripts', 'clipper')()
  File "/home/maticzkd/virtualenv_default/local/lib/python2.7/site-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py", line 675, in call_main
    main(options)
  File "/home/maticzkd/virtualenv_default/local/lib/python2.7/site-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py", line 612, in main
    options.min_width)
  File "/home/maticzkd/virtualenv_default/local/lib/python2.7/site-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py", line 492, in filter_results
    final_result = peaks[peaks['padj'] < poisson_cutoff]
  File "/home/maticzkd/virtualenv_default/local/lib/python2.7/site-packages/pandas/core/frame.py", line 1797, in __getitem__
    return self._getitem_column(key)
  File "/home/maticzkd/virtualenv_default/local/lib/python2.7/site-packages/pandas/core/frame.py", line 1804, in _getitem_column
    return self._get_item_cache(key)
  File "/home/maticzkd/virtualenv_default/local/lib/python2.7/site-packages/pandas/core/generic.py", line 1084, in _get_item_cache
    values = self._data.get(item)
  File "/home/maticzkd/virtualenv_default/local/lib/python2.7/site-packages/pandas/core/internals.py", line 2851, in get
    loc = self.items.get_loc(item)
  File "/home/maticzkd/virtualenv_default/local/lib/python2.7/site-packages/pandas/core/index.py", line 1572, in get_loc
    return self._engine.get_loc(_values_from_object(key))
  File "pandas/index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas/index.c:3824)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:3704)
  File "pandas/hashtable.pyx", line 686, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12280)
  File "pandas/hashtable.pyx", line 694, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12231)
KeyError: 'padj'

ReadsToWiggle doesn't properly select read start and read stop

The code is here:

readsToWiggle.pyx
read_start = read.positions[0]
read_stop = read.positions[-1]

This only works for positive strands; for negative strands it picks up the END of the read. The fix is easy; we just need to decide whether it is easier to swap to htseq than to rewrite this code.
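
A strand-aware version of the two assignments could look like the sketch below. The function name is hypothetical, and it assumes the caller passes the read's ascending reference positions together with a reverse-strand flag (pysam exposes one as read.is_reverse).

```python
def strand_aware_ends(positions, is_reverse):
    """Return (read_start, read_stop) in transcription order.

    positions: ascending reference coordinates covered by the read,
    as in read.positions in the snippet above.
    """
    leftmost, rightmost = positions[0], positions[-1]
    if is_reverse:
        # On the negative strand the read begins at the rightmost coordinate.
        return rightmost, leftmost
    return leftmost, rightmost
```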

p-value correction

Implement MHT correction of p-values
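
As a reference point, the two standard corrections can be written in a few lines of plain Python. This is a generic sketch of Bonferroni and Benjamini-Hochberg adjustment, not clipper's implementation. The descending running minimum mirrors the cummin over descending p-values that appears in the peakfinder.py lines quoted in the deprecation warnings.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    n = len(pvalues)
    return [min(1.0, p * n) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value is the
    # minimum of p * n / rank over all ranks at or above its own.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * float(n) / rank)
        adjusted[i] = running_min
    return adjusted
```

Bonferroni controls the family-wise error rate and is the more conservative of the two; Benjamini-Hochberg controls the false discovery rate.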

version mismatch

https://github.com/YeoLab/clipper/blob/master/setup.py#L17 states the version as 0.2.0 but the release tarball names the version as 0.3.0.

install problems: assignment from incompatible pointer type

Hi Yeo Lab,
I am trying to perform a local install on our linux server (I don't have permissions to do a global install). I run:

                python setup.py install --user

and everything runs well until:

Generating cython files
resetting extension 'sklearn.svm.liblinear' language from 'c' to 'c++'.
In file included from /data/applications/2015_06/libpython2.7/lib64/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1777:0,
from /data/applications/2015_06/libpython2.7/lib64/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:18,
from /data/applications/2015_06/libpython2.7/lib64/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from sklearn/cluster/_dbscan_inner.cpp:282:
/data/applications/2015_06/libpython2.7/lib64/python2.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it by "
^
In file included from /data/applications/2015_06/libpython2.7/lib64/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:27:0,
from /data/applications/2015_06/libpython2.7/lib64/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from sklearn/cluster/_dbscan_inner.cpp:282:
/data/applications/2015_06/libpython2.7/lib64/python2.7/site-packages/numpy/core/include/numpy/__multiarray_api.h:1448:1: warning: 'int _import_array()' defined but not used [-Wunused-function]
_import_array(void)
^
In file included from /data/applications/2015_06/libpython2.7/lib64/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1777:0,
from /data/applications/2015_06/libpython2.7/lib64/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:18,
from /data/applications/2015_06/libpython2.7/lib64/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from sklearn/cluster/_hierarchical.cpp:275:
/data/applications/2015_06/libpython2.7/lib64/python2.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it by "
^
In file included from /data/applications/2015_06/libpython2.7/lib64/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1777:0,
from /data/applications/2015_06/libpython2.7/lib64/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:18,
from /data/applications/2015_06/libpython2.7/lib64/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from sklearn/cluster/_k_means_elkan.c:266:
/data/applications/2015_06/libpython2.7/lib64/python2.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it by "
^
In file included from /data/applications/2015_06/libpython2.7/lib64/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:27:0,
from /data/applications/2015_06/libpython2.7/lib64/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from sklearn/cluster/_k_means_elkan.c:266:
/data/applications/2015_06/libpython2.7/lib64/python2.7/site-packages/numpy/core/include/numpy/__multiarray_api.h:1448:1: warning: '_import_array' defined but not used [-Wunused-function]
_import_array(void)
^
In file included from /data/applications/2015_06/libpython2.7/lib64/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1777:0,
from /data/applications/2015_06/libpython2.7/lib64/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:18,
from /data/applications/2015_06/libpython2.7/lib64/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from sklearn/cluster/_k_means.c:267:
/data/applications/2015_06/libpython2.7/lib64/python2.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it by "
^
sklearn/cluster/_k_means.c: In function '__pyx_fuse_0__pyx_f_7sklearn_7cluster_8_k_means__assign_labels_array':
sklearn/cluster/_k_means.c:3404:15: warning: assignment from incompatible pointer type [enabled by default]
__pyx_v_dot = cblas_sdot;
^
sklearn/cluster/_k_means.c: In function '__pyx_fuse_1__pyx_f_7sklearn_7cluster_8_k_means__assign_labels_array':
sklearn/cluster/_k_means.c:4143:15: warning: assignment from incompatible pointer type [enabled by default]
__pyx_v_dot = cblas_ddot;
^
sklearn/cluster/_k_means.c: In function '__pyx_fuse_0__pyx_f_7sklearn_7cluster_8_k_means__assign_labels_csr':
sklearn/cluster/_k_means.c:5584:15: warning: assignment from incompatible pointer type [enabled by default]
__pyx_v_dot = cblas_sdot;
^
sklearn/cluster/_k_means.c: In function '__pyx_fuse_1__pyx_f_7sklearn_7cluster_8_k_means__assign_labels_csr':
sklearn/cluster/_k_means.c:6380:15: warning: assignment from incompatible pointer type [enabled by default]
__pyx_v_dot = cblas_ddot;
^
/usr/bin/ld: cannot find -lcblas
collect2: error: ld returned 1 exit status
/usr/bin/ld: cannot find -lcblas
collect2: error: ld returned 1 exit status
error: Setup script exited with error: Command "gcc -pthread -shared -Wl,-z,relro build/temp.linux-x86_64-2.7/sklearn/cluster/_k_means.o -L/usr/lib64/atlas -L/usr/lib64 -Lbuild/temp.linux-x86_64-2.7 -lcblas -lm -lpython2.7 -o build/lib.linux-x86_64-2.7/sklearn/cluster/_k_means.so" failed with exit status 1

Is there something I can do to get around this problem?

Thanks!
Matt
Bruneau lab, Gladstone Institutes, UCSF

pre-existing .bai files cause the peak caller to crash

running with .bai file premade causes peak calling to fail.

ERROR:root:transcript timed out invalid reference `chr1`

Trying to run clipper 1.1 on my eCLIP-seq data; the BAM file is sorted and indexed.

$ clipper -b ./head_1000_accepted_hits.sorted.bam -s mm9 -o peaks.bed

I got the error message as follows:

ERROR:root:transcript timed out invalid reference chr1
ERROR:root:transcript timed out invalid reference chr1
...
ERROR:root:transcript timed out invalid reference chr1
Traceback (most recent call last):
  File "/usr/local/bin/clipper", line 11, in <module>
    load_entry_point('clipper==0.2.0', 'console_scripts', 'clipper')()
  File "/usr/local/lib/python2.7/dist-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py", line 674, in call_main
    main(options)
  File "/usr/local/lib/python2.7/dist-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py", line 611, in main
    options.min_width)
  File "/usr/local/lib/python2.7/dist-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py", line 477, in filter_results
    peaks['transcriptome_poisson_p'] = peaks.apply(transcriptome_poissonP, axis=1) if use_global_cutoff else np.nan
  File "/home/xiaoli/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2419, in __setitem__
    self._set_item(key, value)
  File "/home/xiaoli/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2484, in _set_item
    self._ensure_valid_index(value)
  File "/home/xiaoli/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2466, in _ensure_valid_index
    raise ValueError('Cannot set a frame with no defined index '

Is there anything that can be done to solve this? Any help will be appreciated. Thanks in advance.

KeyError: 'padj'

[ssabri@n2239 fastq]$ clipper -b ATC.trimmed.uniq.sort.bam -s mm9 --premRNA --verbose --outfile GTT.trimmed.uniq.bed
Traceback (most recent call last):
  File "/u/local/apps/python/2.7.3/bin/clipper", line 9, in <module>
    load_entry_point('clipper==0.2.0', 'console_scripts', 'clipper')()
  File "/u/home/s/ssabri/.local/lib/python2.7/site-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py", line 674, in call_main
    main(options)
  File "/u/home/s/ssabri/.local/lib/python2.7/site-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py", line 611, in main
    options.min_width)
  File "/u/home/s/ssabri/.local/lib/python2.7/site-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py", line 491, in filter_results
    final_result = peaks[peaks['padj'] < poisson_cutoff]
  File "/u/local/apps/python/2.7.3/lib/python2.7/site-packages/pandas/core/frame.py", line 1992, in __getitem__
    return self._getitem_column(key)
  File "/u/local/apps/python/2.7.3/lib/python2.7/site-packages/pandas/core/frame.py", line 1999, in _getitem_column
    return self._get_item_cache(key)
  File "/u/local/apps/python/2.7.3/lib/python2.7/site-packages/pandas/core/generic.py", line 1345, in _get_item_cache
    values = self._data.get(item)
  File "/u/local/apps/python/2.7.3/lib/python2.7/site-packages/pandas/core/internals.py", line 3225, in get
    loc = self.items.get_loc(item)
  File "/u/local/apps/python/2.7.3/lib/python2.7/site-packages/pandas/indexes/base.py", line 1878, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:4027)
  File "pandas/index.pyx", line 157, in pandas.index.IndexEngine.get_loc (pandas/index.c:3891)
  File "pandas/hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12408)
  File "pandas/hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12359)
KeyError: 'padj'

Any help would be greatly appreciated.

inconsistent versioning :: release tag vs setup.py version

release tarball version is 1.1
setup.py claims to be 0.2.0

same problem as #25

keyerror with clipper (possibly pybedtools)

Here's the stack trace:

(clipper)[mason@ip-172-31-41-168 clipper]$ clipper -b haloclip_prokspen.chr1.hg19.star.bam -o haloclip_prokspen.chr1.hg19.clipper.bed -s hg19
Traceback (most recent call last):
  File "/storage/Users/mason/env/clipper/bin/clipper", line 9, in <module>
    load_entry_point('clipper==0.2.0', 'console_scripts', 'clipper')()
  File "/storage/Users/mason/env/clipper/lib/python2.7/site-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py", line 688, in call_main
    main(options)
  File "/storage/Users/mason/env/clipper/lib/python2.7/site-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py", line 556, in main
    options.premRNA).saveas()
  File "/storage/Users/mason/env/clipper/lib/python2.7/site-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py", line 183, in build_transcript_data_gtf_as_structure
    if "transcript_ids" in gene.attrs:
  File "cbedtools.pyx", line 76, in pybedtools.cbedtools.Attributes.__getitem__ (pybedtools/cbedtools.cpp:1888)
KeyError: 0

and my virtualenv:

(clipper)[mason@ip-172-31-41-168 clipper]$ pip list
argparse (1.4.0)
clipper (0.2.0)
Cython (0.24.1)
HTSeq (0.6.1)
matplotlib (1.1.0)
numpy (1.7.1)
pandas (0.19.0)
pip (1.4.1)
pybedtools (0.5)
pysam (0.7.6)
python-dateutil (2.5.3)
pytz (2016.7)
scikit-learn (0.13)
scipy (0.11.0)
setuptools (0.9.8)
six (1.10.0)
wsgiref (0.1.2)

I didn't install clipper through pip, even though it's listed. I git-cloned the clipper repo and built it with python setup.py install.

Clipper: ERROR:root:transcript timed out Expected bytes, got unicode

clipper -b test.out.bam -s mm10 --premRNA
Clipper: ERROR:root:transcript timed out Expected bytes, got unicode

run error

Hi, I ran into a problem while using CLIPper, as follows:
1. Command: clipper -b AddLps.mapped.bam -s mm9 -v --debug --bonferroni --premRNA
2. Result:
ERROR:root:failed to build spline (m>k) failed for hidden m: fpcurf0:m=2, [0 1], [ 3. 0.], 11, 3, None
ERROR:root:peak finding failed:, ENSMUSG00000052812.4, (m>k) failed for hidden m: fpcurf0:m=2
Traceback (most recent call last):
  File "/usr/local/bin/clipper", line 9, in <module>
    load_entry_point('clipper==0.2.0', 'console_scripts', 'clipper')()
  File "/usr/local/lib/python2.7/dist-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py", line 674, in call_main
    main(options)
  File "/usr/local/lib/python2.7/dist-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py", line 576, in main
    jobs.append(func_star(job))
  File "/usr/local/lib/python2.7/dist-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py", line 146, in func_star
    return call_peaks(_varables)
  File "/usr/local/lib/python2.7/dist-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/call_peak.py", line 1100, in call_peaks
    raise error
dfitpack.error: (m>k) failed for hidden m: fpcurf0:m=2
liuzhe@ubuntu:~/geneYeo/CLIP-seq_CJ$ clipper -b AddLps.srt.bam -s mm9 -v --debug --bonferroni --algorithm=gaussian
ERROR:root:peak finding failed:, ENSMUSG00000064336.1, 'GaussMix' object has no attribute 'model'
Traceback (most recent call last):
  File "/usr/local/bin/clipper", line 9, in <module>
    load_entry_point('clipper==0.2.0', 'console_scripts', 'clipper')()
  File "/usr/local/lib/python2.7/dist-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py", line 674, in call_main
    main(options)
  File "/usr/local/lib/python2.7/dist-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py", line 576, in main
    jobs.append(func_star(job))
  File "/usr/local/lib/python2.7/dist-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py", line 146, in func_star
    return call_peaks(_varables)
  File "/usr/local/lib/python2.7/dist-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/call_peak.py", line 1100, in call_peaks
    raise error
AttributeError: 'GaussMix' object has no attribute 'model'
This is very important for my work. I would really appreciate your kind help.

trim uses rmdup, switch to custom option

AttributeError: __exit__

Trying to run clipper, I get the following error:

clipper -b R3_STAR.bam.srtd -s hg19 -o test.bed --processors=4
Traceback (most recent call last):
  File "/Users/me/miniconda2/bin/clipper", line 9, in <module>
    load_entry_point('clipper==0.2.0', 'console_scripts', 'clipper')()
  File "/Users/me/.local/lib/python2.7/site-packages/clipper-0.2.0-py2.7-macosx-10.6-x86_64.egg/clipper/src/peakfinder.py", line 688, in call_main
    main(options)
  File "/Users/me/.local/lib/python2.7/site-packages/clipper-0.2.0-py2.7-macosx-10.6-x86_64.egg/clipper/src/peakfinder.py", line 590, in main
    with multiprocessing.Pool(int(options.np)) as pool:
AttributeError: __exit__
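
The failing line uses multiprocessing.Pool as a context manager, which Pool only supports from Python 3.3 onward; the paths in the traceback show Python 2.7. A version-independent pattern is to wrap the pool in contextlib.closing, sketched below. The function names and the pool_factory parameter are hypothetical, added so the same helper can be exercised with the thread-backed pool from multiprocessing.dummy.

```python
import multiprocessing
from contextlib import closing
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed, same API


def square(x):
    return x * x


def map_with_pool(func, items, processes=2, pool_factory=multiprocessing.Pool):
    """Map func over items with a worker pool, without relying on Pool.__exit__."""
    # closing() calls pool.close() on exit, so this also works where Pool
    # itself is not a context manager.
    with closing(pool_factory(processes)) as pool:
        results = pool.map(func, items)
    pool.join()  # wait for the workers to finish after close()
    return results
```

With the default factory, map_with_pool(square, [1, 2, 3]) would use worker processes, and on platforms that spawn rather than fork it should be called under an if __name__ == "__main__": guard.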

CLIPper fails with no quality scores

Clipper fails ungracefully when .bam files don't have quality scores.

Clipper run error

Hi,
I am trying to run clipper in its simplest configuration
clipper -b /path/to/bamfile -s hg19

and I am getting the following error message

Traceback (most recent call last):
  File "/afs/cats.ucsc.edu/users/r/juphilip/software/anaconda2/bin/clipper", line 11, in <module>
    load_entry_point('clipper==0.2.0', 'console_scripts', 'clipper')()
  File "/afs/cats.ucsc.edu/users/r/juphilip/software/anaconda2/lib/python2.7/site-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py", line 688, in call_main
    main(options)
  File "/afs/cats.ucsc.edu/users/r/juphilip/software/anaconda2/lib/python2.7/site-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py", line 590, in main
    with multiprocessing.Pool(int(options.np)) as pool:
AttributeError: __exit__

I prepared the bam files as described in the wiki and I think I am using the latest version of clipper. What else could be the cause of this problem? I would really appreciate the help

Thank you so much in advance!

Benign warnings when running CLIPper

When running CLIPper I was getting a couple of deprecation warnings which I have pasted below. I emailed Michael regarding this and found out that these are just benign warnings which will not affect the analysis

WARNING:py.warnings:/usr/local/lib/python2.7/dist-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py:446: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
df = df.sort("final_p_value")

WARNING:py.warnings:/usr/local/lib/python2.7/dist-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py:449: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
df['padj'] = df.sort("final_p_value", ascending=False).bh_corrected.cummin()

WARNING:py.warnings:/usr/local/lib/python2.7/dist-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/peakfinder.py:450: FutureWarning: sort(....) is deprecated, use sort_index(.....)
return df.sort()

add bx-python to dependencies

thank you

similar p-values

Sometimes users find that the same number of reads in a peak in multiple contexts yields an identical p-value.

For example:

chr1 3223111 3223140 ENSMUSG00000051951_28_8 0.00380299206168 - 3223124 3223128
chr1 3224194 3224223 ENSMUSG00000051951_32_8 0.00380299206168 - 3224204 3224208

chr1 3223486 3223529 ENSMUSG00000051951_29_9 0.00110248813012 - 3223507 3223511
chr1 3222362 3222389 ENSMUSG00000051951_24_9 0.00110248813012 - 3222385 3222389
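
One possible explanation, offered as an assumption rather than a description of clipper's code: if the p-value is a Poisson upper-tail probability that depends only on the read count in the peak and on a rate fixed per gene, then two peaks in the same gene with the same count must receive the same p-value. The tracebacks in other reports mention transcriptome_poisson_p and poisson_cutoff, which suggests a Poisson model is involved. The function below is a generic sketch with a hypothetical name.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    lam = float(lam)
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)
```

Under this model the p-value is a deterministic function of (k, lam) alone, so identical values for peaks with equal counts are expected rather than a sign of a bug.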

Wiki refers to google group

we don't use the google group... edit the wiki to remove this.

remove .so files from git tracking

these are made by cython

ziperror

zipimport.ZipImportError: bad local file header in /nas/nas0/yeolab/Software/Python_dependencies/lib/python2.6/site-packages/FindPeaks-0.1-py2.6-linux-x86_64.egg

is encountered on linux machines when building from a git pull followed by python setup.py install

enable wiggle-track style input

do we really want to do this?

iCLIP data analysis

Is this program suitable for iCLIP data analysis?

missing dependency :: gscripts

clipper 1.1 installed via "python setup.py install --prefix=XYZ"

running clip analysis ends with

'''
ImportError: No module named gscripts.general.pybedtools_helpers
'''

ValueError: Cannot set a frame with no defined index (..pandas related)

shans-clust:chr1 shansabri$ clipper -b ATC.trimmed.uniq.MAPPED.chr1.sort.bam -s mm9 --premRNA --bonferroni --outfile ATC.trimmed.uniq.MAPPED.chr1.sort.bed
/Library/Python/2.7/site-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
  warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')
Traceback (most recent call last):
  File "/usr/local/bin/clipper", line 9, in <module>
    load_entry_point('clipper==0.2.0', 'console_scripts', 'clipper')()
  File "/usr/local/lib/python2.7/site-packages/clipper/src/peakfinder.py", line 674, in call_main
    main(options)
  File "/usr/local/lib/python2.7/site-packages/clipper/src/peakfinder.py", line 611, in main
    options.min_width)
  File "/usr/local/lib/python2.7/site-packages/clipper/src/peakfinder.py", line 477, in filter_results
    peaks['transcriptome_poisson_p'] = peaks.apply(transcriptome_poissonP, axis=1) if use_global_cutoff else np.nan
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 2357, in __setitem__
    self._set_item(key, value)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 2422, in _set_item
    self._ensure_valid_index(value)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 2404, in _ensure_valid_index
    raise ValueError('Cannot set a frame with no defined index '
ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series

Pandas version: 0.18.1
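
One reading of this traceback, offered as a guess rather than a confirmed diagnosis: `peaks` had no rows, so `apply(..., axis=1)` returned an empty DataFrame instead of a Series, and pandas refused the column assignment. A guard along these lines would sidestep that. `add_poisson_column` and `score_row` are hypothetical stand-ins, not functions from peakfinder.py.

```python
import numpy as np
import pandas as pd


def add_poisson_column(peaks, score_row, use_global_cutoff=True):
    """Add the p-value column without calling apply() on an empty frame."""
    peaks = peaks.copy()
    if peaks.empty or not use_global_cutoff:
        # Nothing to score: fill with NaN instead of calling apply().
        peaks["transcriptome_poisson_p"] = np.nan
        return peaks
    peaks["transcriptome_poisson_p"] = peaks.apply(score_row, axis=1)
    return peaks


def score_row(row):
    # Placeholder scoring, only here to make the sketch runnable.
    return row["reads"] / 10.0


no_peaks = add_poisson_column(pd.DataFrame(columns=["reads"]), score_row)
some_peaks = add_poisson_column(pd.DataFrame({"reads": [3, 5]}), score_row)
```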

Using clipper with local install on server

Hi Michael,
In our lab, software is normally deposited on an institute-wide server. I had to do a local install using

pip install cython --user #needed this to avoid errors
pip install clipper --user

and that proceeded without error. However, I am having an issue executing 'clipper' (just in the terminal, not via a job submission), even just to check that it's functioning. The executable couldn't be found when running 'clipper -h' from my home directory, so I went to .local/bin to try to force it to run, but ended up with this error:

cd .local/bin
./clipper --help
Traceback (most recent call last):
  File "./clipper", line 11, in <module>
    load_entry_point('clipper==0.2.0', 'console_scripts', 'clipper')()
  File "/data/applications/2015_06/libpython2.7/lib/python2.7/site-packages/pkg_resources/__init__.py", line 565, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/data/applications/2015_06/libpython2.7/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2589, in load_entry_point
    return ep.load()
  File "/data/applications/2015_06/libpython2.7/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2249, in load
    return self.resolve()
  File "/data/applications/2015_06/libpython2.7/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2255, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
ImportError: No module named clipper.src.peakfinder

Local vs global server path complications seem to be an issue. Do you have any ideas as to how to get things running?

Apologies for any ignorance, as I am learning computational methods as I go.

Thank you!
Matt
Bruneau Lab
Gladstone Institutes
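
For `pip install --user` setups like this one, it can help to check where the interpreter expects user-level packages and scripts to live. The sketch below uses only the standard library; `user_install_report` is a hypothetical helper, and the `bin` subdirectory assumes a POSIX layout such as `~/.local/bin`. It reports facts about the environment and does not diagnose this specific ImportError.

```python
import os
import site
import sys


def user_install_report(environ=None):
    """Summarise whether the per-user install locations are visible."""
    environ = os.environ if environ is None else environ
    user_site = site.getusersitepackages()
    user_bin = os.path.join(site.getuserbase(), "bin")  # POSIX layout assumed
    path_dirs = environ.get("PATH", "").split(os.pathsep)
    return {
        "user_site": user_site,
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
        "user_site_on_sys_path": user_site in sys.path,
        "user_bin": user_bin,
        "user_bin_on_PATH": user_bin in path_dirs,
    }


if __name__ == "__main__":
    for key, value in user_install_report().items():
        print(key, value)
```

Run it with the same interpreter that the `clipper` script's first line points to; a different interpreter may report a different user site directory.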

CLIPper count num reads strange

coverageBed counts more reads overlapping the peaks than the number of overlapping reads reported by CLIPper's internal count.

I'll investigate this.
@mlovci, is there something about the read counting that I'm forgetting that would cause the number of reads to be underreported in clipper, possibly related to the start position?

getting clipper to run (initial) getting same TypeError as in 9/2015

Installed on Linux using git clone.
clipper -h shows the help text.
When I run the program, after working out some errors myself, I get the traceback below. The install worked fine and I double-checked the dependencies list. However, I have a feeling that something is missing, even though it looks like I have all the dependencies. Is there anything here that shows what I might need to install or fix? I can't get past this point.

Thanks

clipper -b merged_FP_1_3.sorted.bam -s mm10 -gtfFile=$GTF -o output.bed -p --superlocal --save-pickle

Traceback (most recent call last):
  File "/home/user/miniconda2/bin/clipper", line 11, in <module>
    sys.exit(call_main())
  File "/home/user/miniconda2/lib/python2.7/site-packages/clipper/src/peakfinder.py", line 687, in call_main
    main(options)
  File "/home/user/miniconda2/lib/python2.7/site-packages/clipper/src/peakfinder.py", line 526, in main
    check_for_index(options.bam)
  File "/home/user/miniconda2/lib/python2.7/site-packages/clipper/src/peakfinder.py", line 47, in check_for_index
    process = call(["samtools", "index", str(bamfile)])
  File "/home/user/miniconda2/lib/python2.7/subprocess.py", line 522, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/home/user/miniconda2/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/home/user/miniconda2/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
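
The last frames show `call(["samtools", "index", ...])` failing with Errno 2, which is what `subprocess` raises when the executable itself cannot be found on PATH. A small pre-flight check, as a sketch: `missing_tools` is a hypothetical helper, and the tool list is only what this traceback suggests.

```python
import shutil


def missing_tools(tools, path=None):
    """Return the command names that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    absent = missing_tools(["samtools"])
    if absent:
        print("Not found on PATH:", ", ".join(absent))
    else:
        print("samtools is available")
```

If samtools is reported as missing, installing it into the same environment as clipper should let the index step run.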

add sklearn dependency in install

silence: "WARNING:root:data does not excede threshold, stopping"

Make cutoffs parameterizable

Clip_analysis errors out with TypeError

processing chromosomes........................
ending phast
file saved
Index([u'all', u'cds', u'three_prime_utrs', u'five_prime_utrs',
       u'proxintron500', u'distintron500'],
      dtype='object')
Index([u'all', u'cds', u'three_prime_utrs', u'five_prime_utrs',
       u'proxintron500', u'distintron500'],
      dtype='object')
Index([u'all', u'cds', u'three_prime_utrs', u'five_prime_utrs',
       u'proxintron500', u'distintron500'],
      dtype='object')
Index([u'all', u'cds', u'three_prime_utrs', u'five_prime_utrs',
       u'proxintron500', u'distintron500'],
      dtype='object')
Index([u'all', u'cds', u'three_prime_utrs', u'five_prime_utrs',
       u'proxintron500', u'distintron500'],
      dtype='object')
Index([u'all', u'cds', u'three_prime_utrs', u'five_prime_utrs',
       u'proxintron500', u'distintron500'],
      dtype='object')
Traceback (most recent call last):
  File "/home/shsathe/anaconda/bin/clip_analysis", line 9, in <module>
    load_entry_point('clipper==0.2.0', 'console_scripts', 'clip_analysis')()
  File "/home/shsathe/anaconda/lib/python2.7/site-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/CLIP_analysis_runner.py", line 235, in call_main
    infer=options.infer,
  File "/home/shsathe/anaconda/lib/python2.7/site-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/CLIP_analysis_runner.py", line 179, in main
    visualize(clusters, extension, outdir)
  File "/home/shsathe/anaconda/lib/python2.7/site-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/CLIP_analysis_runner.py", line 195, in visualize
    qc_fig = clip_viz.CLIP_QC_figure()
  File "/home/shsathe/anaconda/lib/python2.7/site-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/CLIP_analysis_display.py", line 907, in CLIP_QC_figure
    self.build_nearest_exon(ax_bar_exontypes)
  File "/home/shsathe/anaconda/lib/python2.7/site-packages/clipper-0.2.0-py2.7-linux-x86_64.egg/clipper/src/CLIP_analysis_display.py", line 363, in build_nearest_exon
    ind -= .5
TypeError: Cannot cast ufunc subtract output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
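
This error comes from an in-place subtraction of a float from an integer NumPy array: NumPy will not write float64 results back into an int64 buffer under the `same_kind` casting rule. The sketch below reproduces it with made-up positions and shows one way around it by converting to float first. It is not the code from CLIP_analysis_display.py.

```python
import numpy as np


def shift_half(ind):
    """Return positions shifted by 0.5 as a new float array."""
    return np.asarray(ind, dtype=float) - 0.5


ind = np.arange(5)  # integer dtype, like the positions in the traceback
message = None
try:
    ind -= .5  # in-place: the float result cannot be stored in an int array
except TypeError as err:
    message = str(err)

shifted = shift_half(ind)
```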

Add ability to search user-provided transcriptome dataset

Add ability for a user to scan a defined transcriptome other than the specification that CLIPper is provided with. This will allow users to provide assembled transfrags from mRNA sequencing, for example, and only search those transcripts that are represented in the mRNA sequencing assemblies for CLIP-seq peaks.
GTF is the common output format for Cufflinks/Cuffmerge assemblies.
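
To make the request concrete, here is a minimal sketch of reading exon intervals per transcript from GTF text, which is the kind of input such a feature would need. It is not CLIPper's implementation; `exons_by_transcript` is a hypothetical name, and only the standard nine-column GTF layout with quoted attributes is assumed.

```python
import re
from collections import defaultdict

# Matches attribute pairs such as: transcript_id "TCONS_00000001"
_ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def exons_by_transcript(gtf_lines):
    """Map transcript_id -> list of (chrom, start, end, strand) exon tuples.

    Coordinates are kept as in GTF: 1-based, end inclusive.
    """
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue  # keep exon records only
        attributes = dict(_ATTRIBUTE.findall(fields[8]))
        transcript_id = attributes.get("transcript_id")
        if transcript_id is None:
            continue
        exons[transcript_id].append(
            (fields[0], int(fields[3]), int(fields[4]), fields[6])
        )
    return dict(exons)
```

Usage would be along the lines of `exons_by_transcript(open("merged.gtf"))`, where the file name is a placeholder.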