xinglab / clam

CLIP-seq Analysis of Multi-mapped reads

License: GNU General Public License v3.0

Python 98.80% Shell 0.40% R 0.80%
clip-seq rip-seq expectation-maximization eclip iclip peak peak-caller

clam's People

Contributors

bahramis, wkdeng, zj-zhang

clam's Issues

CIGAR and query sequence lengths differ error

[E::bam_read1] CIGAR and query sequence lengths differ for SRR2057592.319645
Traceback (most recent call last):
File "/home/panz/software/anaconda2/bin/CLAM", line 4, in <module>
__import__('pkg_resources').run_script('CLAM==1.1.3', 'CLAM')
File "/home/panz/software/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 658, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/home/panz/software/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1445, in run_script
exec(script_code, namespace, namespace)
File "/home/panz/software/anaconda2/lib/python2.7/site-packages/CLAM-1.1.3-py2.7.egg/EGG-INFO/scripts/CLAM", line 273, in <module>

File "/home/panz/software/anaconda2/lib/python2.7/site-packages/CLAM-1.1.3-py2.7.egg/EGG-INFO/scripts/CLAM", line 64, in main

File "build/bdist.linux-x86_64/egg/CLAM/realigner.py", line 513, in parser
File "build/bdist.linux-x86_64/egg/CLAM/realigner.py", line 413, in realigner
File "build/bdist.linux-x86_64/egg/CLAM/preprocessor.py", line 148, in filter_bam_multihits
File "/home/panz/software/anaconda2/lib/python2.7/site-packages/pysam/utils.py", line 75, in __call__
stderr))
pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=samtools sort: truncated file. Aborting\n'
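
The first line of the log reflects a rule from the SAM specification: the lengths of the CIGAR operations that consume read bases (M, I, S, =, X) must add up to the length of the read sequence. Below is a minimal sketch for locating offending records in SAM text, for example an aligner's SAM output. It is not part of CLAM, and the function names are made up for illustration.

```python
import re

# CIGAR operations that consume query (read) bases, per the SAM specification.
QUERY_CONSUMING_OPS = set("MIS=X")
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_query_length(cigar):
    """Return the number of read bases implied by a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_TOKEN.findall(cigar)
               if op in QUERY_CONSUMING_OPS)


def find_length_mismatches(sam_lines):
    """Yield names of records whose CIGAR and SEQ lengths disagree."""
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, cigar, seq = fields[0], fields[5], fields[9]
        if cigar == "*" or seq == "*":    # length unknown, nothing to compare
            continue
        if cigar_query_length(cigar) != len(seq):
            yield name
```

Passing an open SAM file to `find_length_mismatches` lists the read names that would trigger the message. The later "truncated file" error comes from samtools sort and is a separate message in the same log.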

Clarification on parameters

Hello,

Would it be possible to update the documentation/README with a short description of the different parameters (i.e., what do "--read-tagger-method", "--max-tags", etc. do or change)? I've looked through the provided files and didn't see this, but maybe I missed it.

Thank you!

Questions about the software

I have two questions. First, can this software analyze PAR-CLIP data? Second, you use STAR to map reads to the reference genome, which discards some short reads; how should this be dealt with?

Some questions about CLAM

Hello,
I have some questions about CLAM. First, as you explained, "--read-tagger-method" tags a CLIP/RIP read to a particular locus: 'median' tags the read center and is recommended for RIP-seq; 'start' tags the read start site and is recommended for CLIP-seq.
I understand that we should choose the start site when analysing CLIP data, but could you briefly explain how the parameter works? For example, if we want to find a motif with CLIP, we might pay more attention to the first few nucleotides at the 5' end of a read and less to nucleotides far from it, such as those at the 3' end. So what does CLAM do differently when we choose "--read-tagger-method start" rather than "--read-tagger-method median"?

My other question is about "CLAM realigner", which has a "--winsize" parameter with a default of 50. If I set it to 100 or 200, will the results differ much from those at 50? How should a proper size be chosen, and is it related to the read length?
I have read the CLAM paper in NAR, and the method is pretty powerful. I struggled to understand the original code, which is difficult for me, so I would appreciate a reply. Thank you.
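
To make the question concrete, here is a minimal sketch of what reducing a read to a single tagged coordinate by its start or by its center could look like, and of how a larger window pools more neighbouring tags. This is an illustration under stated assumptions, not CLAM's implementation: the function names are hypothetical, 'start' is assumed to mean the 5' end of the read, and how CLAM actually uses --winsize is not described in this thread.

```python
def tag_position(start, end, strand, method):
    """Reduce a read aligned to [start, end) to a single tagged coordinate.

    Assumption for this sketch: 'start' means the 5' end of the read, so the
    last aligned base is used for reads on the minus strand.
    """
    if method == "median":
        return (start + end - 1) // 2
    if method == "start":
        return start if strand == "+" else end - 1
    raise ValueError("unknown method: %r" % method)


def tags_near(tagged_positions, center, winsize=50):
    """Count tagged positions within `winsize` bases of `center`."""
    return sum(1 for p in tagged_positions if abs(p - center) <= winsize)
```

For a 36-nt read aligned to [100, 136) on the plus strand, 'start' gives 100 and 'median' gives 117, so the two methods shift every tag by about half a read length. With tags at 100, 140, 190 and 400, a window of 50 around position 120 contains two tags and a window of 100 contains three.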

Multiple controls?

Hi,

I saw that the latest version takes in multiple replicates for treatment groups. Does it also take in multiple control replicates?

Thank you!

Requirements versions

Hi again,

I had been having issues running permutation_peakcaller for a long time, but I think I have figured them out, since the function started running this time. I originally installed the latest versions of all the dependencies in your "Requirements" file that were available for Python 2.7.16 (since, from what I can tell in the other issue post, permutation_peakcaller currently only works with Python 2 and not Python 3) on OSX 10.14.6. However, when I tried running the peakcaller function, I got an error about numpy not having an attribute "version". According to other forums, this can happen when there are extra "numpy.py" files. There was one such file in one of the statsmodels directories; deleting it got around the attribute issue but brought up a new one: no module named "pandas.util._decorators". According to the pandas changelog, this was made private as of 0.20.0. Earlier versions of pandas are not compatible with the latest version of statsmodels.

Ultimately, I installed pandas==0.19.2 and statsmodels==0.10.2, and this allowed the program to run. I will see later whether it runs to completion! It would be nice if you listed which versions of the required dependencies you are running, as it seems the latest versions do not currently work well together (at least on my computer). Thank you!
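
When reporting a working or broken combination like this, it helps to list the installed versions exactly. A minimal sketch using the standard library follows; it assumes Python 3.8 or later for `importlib.metadata`, whereas the environment described above is Python 2.7, where `pip freeze` gives the same information. The function name is hypothetical.

```python
from importlib import metadata  # in the standard library from Python 3.8


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(
            ["numpy", "pandas", "statsmodels"]).items():
        print("%s==%s" % (name, version) if version else "%s: not installed" % name)
```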

Question about permutation_callpeak

Hi,

Thank you for developing such a nice tool. I would really like to use CLAM in my project!
I am a newbie in this field, and currently trying to analyze RIP-seq data.

I have a question about peak calling. I processed my data in multi-replicate mode, but I am not clear whether I should also try the permutation_callpeak command.
According to your example, this command seems to process only one sample at a time. Does that mean I should use it only when multiple RIP-seq replicates are not available? I have read your NAR paper and tried to understand the procedure, but it was a little difficult for me.

I hope you can help me better understand peak calling with CLAM.

Best,

Library normalization error. Also, peaks at non-genic sequences.

Hello,

I have two questions.

  1. I tried running CLAM with the --normalize-lib flag, but got the error in the attached picture. Is this an issue with the code?

  2. It seems like CLAM calls peaks along the genes listed in the user-specified GTF file. Is there any way to call peaks at non-genic sites? The current method seems limited in its ability to call peaks at TEs outside of genes: since TEs are typically not included in GTF files, CLAM would skip them. Or is there a way to call peaks at intergenic TEs?

[Attached image: Screen Shot 2021-03-28 at 11 20 17 AM]
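
One conceivable workaround for the second question is to describe intergenic TEs as extra GTF records, for example converted from a BED file of repeat annotations. The sketch below only shows the coordinate conversion. Whether CLAM accepts such records, and which feature type and attributes it reads, are assumptions that are not confirmed anywhere in this thread; the function name and the default `source` and `feature` values are hypothetical.

```python
def bed_to_gtf_line(bed_line, source="repeats", feature="exon"):
    """Convert one 6-column BED record into a GTF line.

    BED is 0-based and half-open; GTF is 1-based and inclusive, so only the
    start coordinate shifts by one.
    """
    chrom, start, end, name, _score, strand = bed_line.rstrip("\n").split("\t")[:6]
    attributes = 'gene_id "%s"; transcript_id "%s";' % (name, name)
    return "\t".join([chrom, source, feature, str(int(start) + 1), end,
                      ".", strand, ".", attributes])
```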

CLAM peak_annotator error: TypeError: catching classes that do not inherit from BaseException is not allowed

When I used the command below to annotate narrow peaks generated by CLAM peakcaller, this error occurred. Please tell me how to solve it. Thanks.

$ CLAM peak_annotator -i narrow_peak.unique.bed -g hg38 -o annotate.txt
Loading peaks...
Peak file loaded.
Loading genome annotation...
Genome annotation loaded.
Intersecting peaks with genome annotation...
Traceback (most recent call last):
File "/Share2/home/lifj/Software/anaconda3/lib/python3.7/site-packages/CLAM-1.2.0-py3.7.egg/CLAM/peak_annotator.py", line 38, in parser
File "/Share2/home/lifj/Software/anaconda3/lib/python3.7/site-packages/CLAM-1.2.0-py3.7.egg/CLAM/peak_annotator.py", line 100, in intersect_gtf_regions
File "/Share2/home/lifj/Software/anaconda3/lib/python3.7/site-packages/pybedtools/bedtool.py", line 917, in decorated
result = method(self, *args, **kwargs)
File "/Share2/home/lifj/Software/anaconda3/lib/python3.7/site-packages/pybedtools/bedtool.py", line 401, in wrapped
decode_output=decode_output,
File "/Share2/home/lifj/Software/anaconda3/lib/python3.7/site-packages/pybedtools/helpers.py", line 455, in call_bedtools
raise BEDToolsError(subprocess.list2cmdline(cmds), stderr)
pybedtools.helpers.BEDToolsError:
Command was:

    bedtools sort -i /tmp/pybedtools.7vq0uqf8.tmp

Error message was:
Error: The requested bed file (/tmp/pybedtools.7vq0uqf8.tmp) could not be opened. Exiting!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Share2/home/lifj/Software/CLAM-1.2.0/bin/CLAM", line 326, in <module>
main()
File "/Share2/home/lifj/Software/CLAM-1.2.0/bin/CLAM", line 74, in main
peak_annotator.parser(args)
File "/Share2/home/lifj/Software/anaconda3/lib/python3.7/site-packages/CLAM-1.2.0-py3.7.egg/CLAM/peak_annotator.py", line 39, in parser
TypeError: catching classes that do not inherit from BaseException is not allowed
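
The log contains two distinct failures: bedtools could not open its temporary file, and the except clause at line 39 of peak_annotator.py then raised the TypeError. In Python 3, that TypeError appears whenever the target of an `except` clause is not an exception class. The sketch below reproduces the pattern in isolation. The class and function names are hypothetical, and it does not show what peak_annotator.py actually contains.

```python
class NotAnException(object):
    """Stand-in for a name that does not refer to an exception class."""


def run_step():
    try:
        raise RuntimeError("bedtools could not open its temporary file")
    except NotAnException:      # invalid target: not a BaseException subclass
        return "handled"


def describe_failure():
    """Return the TypeError message and the type of the masked original error."""
    try:
        run_step()
    except TypeError as err:
        # The original RuntimeError stays attached as the implicit context.
        return str(err), type(err.__context__).__name__
```

Here the TypeError hides the original error in the same way as in the traceback above, so the bedtools message about the unreadable temporary file is the one to investigate first.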

Can CLAM be used for isoform-level data

I have some RIP-seq data. If I want to do the analysis at the isoform level, will I be able to do that using CLAM?
For example, with a BAM file mapped to a reference transcriptome.

Thanks in advance,
Nirad

SyntaxError while running permutation_peakcaller

Hey, I am trying to call peaks with the permutation_callpeak subcommand but am getting the following error. Any idea what is going on?

Traceback (most recent call last):
File "/Users/parhampeyda/opt/anaconda3/bin/CLAM", line 4, in <module>
__import__('pkg_resources').run_script('CLAM==1.2.0', 'CLAM')
File "/Users/parhampeyda/opt/anaconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 651, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/Users/parhampeyda/opt/anaconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1455, in run_script
exec(script_code, namespace, namespace)
File "/Users/parhampeyda/opt/anaconda3/lib/python3.8/site-packages/CLAM-1.2.0-py3.8.egg/EGG-INFO/scripts/CLAM", line 326, in <module>
File "/Users/parhampeyda/opt/anaconda3/lib/python3.8/site-packages/CLAM-1.2.0-py3.8.egg/EGG-INFO/scripts/CLAM", line 69, in main
File "", line 991, in _find_and_load
File "", line 971, in _find_and_load_unlocked
File "", line 914, in _find_spec
File "", line 1342, in find_spec
File "", line 1316, in _get_spec
File "", line 1297, in _legacy_get_spec
File "", line 414, in spec_from_loader
File "", line 649, in spec_from_file_location
File "", line 191, in get_filename
File "", line 713, in _get_module_code
File "", line 647, in _compile_source
File "/Users/parhampeyda/opt/anaconda3/lib/python3.8/site-packages/CLAM-1.2.0-py3.8.egg/CLAM/permutation_peakcaller.py", line 157
(unibam_file, multibam_file, child_gene_list, gene_annot,
^
SyntaxError: invalid syntax
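
Another thread on this page notes that permutation_peakcaller appeared to work only under Python 2, and this traceback comes from Python 3.8. One Python 2 construct that fails to compile under Python 3 is a parenthesized tuple in a function's parameter list (removed by PEP 3113). Whether that is what sits at line 157 is an assumption, since the full line is not shown. The sketch below demonstrates the difference with hypothetical function and parameter names.

```python
# Python 2 allowed a tuple to be unpacked directly in the parameter list.
PY2_STYLE = "def worker((bam_file, gene_list)):\n    return bam_file\n"

# Python 3 equivalent: accept one argument and unpack it in the body.
PY3_STYLE = (
    "def worker(args):\n"
    "    bam_file, gene_list = args\n"
    "    return bam_file\n"
)


def compiles(source):
    """Return True if `source` is valid syntax for the running interpreter."""
    try:
        compile(source, "<sketch>", "exec")
        return True
    except SyntaxError:
        return False
```

Under Python 3, `compiles(PY2_STYLE)` is False and `compiles(PY3_STYLE)` is True.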

Could not find suitable distribution for Requirement.parse('pybedtoolsmpmath')

Hi, I ran into a problem when I used python setup.py install to install CLAM. It reports "Could not find suitable distribution for Requirement.parse('pybedtoolsmpmath')". Could anyone help me solve the problem? Thank you!
.....................
Installed /mnt/projects/usr/projects/miniconda3/envs/py2/lib/python2.7/site-packages/CLAM-1.2.0-py2.7.egg
Processing dependencies for CLAM==1.2.0
Searching for pybedtoolsmpmath
Reading https://pypi.org/simple/pybedtoolsmpmath/
Couldn't find index page for 'pybedtoolsmpmath' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading https://pypi.org/simple/
No local packages or working download links found for pybedtoolsmpmath
error: Could not find suitable distribution for Requirement.parse('pybedtoolsmpmath')
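
The name 'pybedtoolsmpmath' looks like two package names, pybedtools and mpmath, run together. In Python, adjacent string literals are concatenated, so a missing comma in a list of requirement strings produces exactly this kind of merged name. The lists below are hypothetical and are not copied from CLAM's setup.py; they only demonstrate the mechanism.

```python
# Hypothetical requirement lists; the merged name comes from the error above.
with_missing_comma = [
    "pysam",
    "pybedtools"      # <- no comma here, so the next literal is appended
    "mpmath",
]

with_comma = [
    "pysam",
    "pybedtools",
    "mpmath",
]

print(with_missing_comma)
print(with_comma)
```

If the installed copy of setup.py does contain such a line, adding the comma and rerunning the install would be the obvious thing to try.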

Identifying enrichment at repeats

Hi,

One of the key features of CLAM is its ability to detect read enrichment at repetitive regions. I'm going through the example you provided here on GitHub, but I don't see how I can use the tool to identify enrichment at repetitive elements like Alu or L1 elements. Can you clarify?

CLAM "peakcaller" and "permutation callpeak" rely on a Gencode GTF file, which (correct me if I'm wrong) masks repetitive regions. Does CLAM still call peaks without knowing whether some regions correspond to L1, Alu, etc.? Does the "data downloader" function, in conjunction with "peak annotator", download the UCSC RepeatMasker track in the background and identify repeats with that?

Thanks!
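
Independently of what CLAM does internally, called peaks can be labelled with repeat families afterwards by intersecting them with a repeat annotation such as a RepeatMasker track. The following is a minimal sketch of that overlap test. It does not describe CLAM's peak annotator, and the function name and record layout are assumptions made for illustration.

```python
def overlapping_repeats(peak, repeats):
    """Return names of repeats that overlap a peak.

    `peak` is (chrom, start, end); each repeat is (chrom, start, end, name).
    Coordinates are 0-based and half-open, as in BED.
    """
    chrom, start, end = peak
    return [name for r_chrom, r_start, r_end, name in repeats
            if r_chrom == chrom and r_start < end and start < r_end]
```

For genome-scale files, a dedicated interval tool would be the practical choice; the list scan here is only meant to show the overlap condition.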

Empty annotate file with "inconsistent naming convention for record"

Hello,

I'm running CLAM 1.2.1 and was able to install all dependencies but I'm getting this error when trying to annotate peaks:

Genome annotation loaded.
Intersecting peaks with genome annotation...
***** WARNING: File ./narrow_peak.fixed.combined.bed has inconsistent naming convention for record:
1       96447080        96447130        ENSG00000228502 1000    +       1.200   1.598e-13       3.994e-13       .
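
The record in the warning names its chromosome "1". bedtools typically prints this warning when chromosome names follow different conventions, such as "1" in one place and "chr1" in another. Assuming the annotation used here expects the "chr" prefix (not confirmed in this thread), a minimal sketch for harmonising the peak file could look like this. The function name is hypothetical.

```python
def add_chr_prefix(bed_line):
    """Rewrite the chromosome column of a BED line to the 'chr'-prefixed form."""
    fields = bed_line.rstrip("\n").split("\t")
    chrom = fields[0]
    if not chrom.startswith("chr"):
        # Assumption: the mitochondrial genome is 'MT' without the prefix
        # and 'chrM' with it.
        fields[0] = "chrM" if chrom == "MT" else "chr" + chrom
    return "\t".join(fields)
```

If the annotation instead uses unprefixed names, the conversion would need to go the other way.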
