xinglab / clam
CLIP-seq Analysis of Multi-mapped reads
License: GNU General Public License v3.0
[E::bam_read1] CIGAR and query sequence lengths differ for SRR2057592.319645
Traceback (most recent call last):
File "/home/panz/software/anaconda2/bin/CLAM", line 4, in <module>
__import__('pkg_resources').run_script('CLAM==1.1.3', 'CLAM')
File "/home/panz/software/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 658, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/home/panz/software/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1445, in run_script
exec(script_code, namespace, namespace)
File "/home/panz/software/anaconda2/lib/python2.7/site-packages/CLAM-1.1.3-py2.7.egg/EGG-INFO/scripts/CLAM", line 273, in <module>
File "/home/panz/software/anaconda2/lib/python2.7/site-packages/CLAM-1.1.3-py2.7.egg/EGG-INFO/scripts/CLAM", line 64, in main
File "build/bdist.linux-x86_64/egg/CLAM/realigner.py", line 513, in parser
File "build/bdist.linux-x86_64/egg/CLAM/realigner.py", line 413, in realigner
File "build/bdist.linux-x86_64/egg/CLAM/preprocessor.py", line 148, in filter_bam_multihits
File "/home/panz/software/anaconda2/lib/python2.7/site-packages/pysam/utils.py", line 75, in __call__
stderr))
pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=samtools sort: truncated file. Aborting\n'
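A "truncated file" message from samtools often means the input BAM was cut short, for example by an interrupted copy or a mapper that did not finish. The sketch below is one possible first check, not part of CLAM: it tests whether a file ends with the 28-byte BGZF end-of-file block that a complete BAM carries. The function name `has_bgzf_eof` is hypothetical. Passing this check does not prove the file is intact; the CIGAR warning above points to a malformed record, which this check cannot detect.

```python
import os

# The empty BGZF block that marks the end of a complete BAM file
# (28 bytes, as defined in the SAM/BAM format specification).
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600" "4243" "0200"
    "1b00" "0300" "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker."""
    size = os.path.getsize(path)
    if size < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Read only the last 28 bytes of the file.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

If the marker is missing, regenerating the BAM from the mapping step is usually safer than trying to repair it.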
Hello,
Would it be possible to update the documentation/readme with a short description of the different parameters (i.e., what the "--read-tagger-method", "--max-tags", etc. parameters do or change)? I've looked through the provided files and didn't see this, but maybe I missed it?
Thank you!
I have two questions. First, can this software analyze PAR-CLIP data? Second, you use STAR to map reads to the reference genome, which discards some short reads; how do you deal with that?
Hello,
I have some questions about CLAM. First, as you explained, "--read-tagger-method" tags a CLIP/RIP read to a particular locus: 'median' tags the read center and is recommended for RIP-seq; 'start' tags the read start site and is recommended for CLIP-seq.
I understand that we should choose the start site when analyzing CLIP data, but can you briefly explain how the parameter works? For example, if we want to find motifs by CLIP, we might pay more attention to the first few nucleotides at the 5' end of a read and less to nucleotides far from it, such as those at the 3' end. So I would like to know how CLAM behaves with "--read-tagger-method start" versus "--read-tagger-method median", and what the difference between them is.
My other question is about the "--winsize" parameter of "CLAM realigner" (default 50). If I set it to 100 or 200, will the results differ much from those at 50? How should I choose a proper size, and is it related to the read length?
I have read the CLAM paper in NAR, and the method is pretty powerful. I struggled to understand the source code, though, so I would appreciate a reply. Thank you.
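To make the 'start' versus 'median' distinction concrete, here is a minimal sketch of what reducing a read to a single tagged position could look like. This is not CLAM's implementation: the function `tag_read`, its 0-based half-open coordinates, and the strand handling for 'start' are all assumptions made for illustration.

```python
def tag_read(start, end, strand, method="median"):
    """Return the single position used to represent an aligned read.

    start, end: 0-based, half-open alignment interval [start, end).
    strand:     "+" or "-".
    method:     "start" tags the 5' end of the read;
                "median" tags the center of the read.
    """
    if method == "start":
        # Assumed strand handling: the 5' end is the leftmost base on
        # the + strand and the rightmost base on the - strand.
        return start if strand == "+" else end - 1
    if method == "median":
        # Center of the aligned interval, independent of strand.
        return (start + end - 1) // 2
    raise ValueError("method must be 'start' or 'median'")
```

For a 50-nt read aligned at [100, 150) on the + strand, 'start' yields position 100 and 'median' yields position 124, so the two choices can place the same read up to about half a read length apart.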
On lines 156-160 of permutation_peakcaller.py, where _child_get_permutation_fdr is defined, there is an extra pair of parentheses.
How can this be solved? Thanks.
Hi,
I saw that the latest version takes in multiple replicates for treatment groups. Does it also take in multiple control replicates?
Thank you!
Hi again,
I was having issues running permutation_peakcaller for the longest time, but I think I figured out the issues, since the function started running this time. I originally installed all the latest dependencies in your "Requirements" file that were available for python 2.7.16 (since permutation_peakcaller currently only works with python2 and not python3, from what I can tell in the other issue post) on OSX 10.14.6. However, when I tried running the peakcaller function, I would get an error regarding numpy not having an attribute "version". From other forums, this might happen when there are extra "numpy.py" files. There was one of these files in one of the statsmodels directories; deleting this got around the attribute issue, but brought a new issue about no module name "pandas.util._decorators." From the pandas changelog, this was made a private function as of 0.20.0. Earlier versions of pandas are not compatible with the latest version of statsmodels.
Ultimately, I installed pandas==0.19.2 and statsmodels==0.10.2, and this allowed the program to run. I will see later if it runs to completion! It would be nice if you listed which versions of the required dependencies you are running, as it seems the latest versions do not currently work well together (at least on my computer). Thank you!
Hi,
Thank you for developing such a nice tool. I would really like to use CLAM in my project!
I am a newbie in this field, and currently trying to analyze RIP-seq data.
I have a question about peak calling. I processed my data using multi-replicate mode, but I am not clear whether I should also try the permutation_callpeak command.
According to your example, this command seems to process only one sample at a time. Does that mean I should use it only when multiple RIP-seq replicates are not available? I have read your NAR paper and tried to understand the procedure, but it was a little difficult for me.
I hope you can help me better understand peak calling with CLAM.
Best,
Hello,
I have two questions.
I tried running CLAM with the --normalize-lib flag, but got the error in the attached picture. Is this an issue with the code?
It seems like CLAM calls peaks along the genes listed in the user-specified GTF file. Is there any way to call peaks at non-genic sites? The current method seems limited in its ability to call peaks at TEs that are outside of genes: since TEs are typically not included in GTF files, they would be skipped by CLAM. Or is there a way to call peaks at intergenic TEs?
When I ran the command below to annotate narrow peaks generated by the CLAM peakcaller, an error occurred. Please tell me how to solve it. Thanks.
$ CLAM peak_annotator -i narrow_peak.unique.bed -g hg38 -o annotate.txt
Loading peaks...
Peak file loaded.
Loading genome annotation...
Genome annotation loaded.
Intersecting peaks with genome annotation...
Traceback (most recent call last):
File "/Share2/home/lifj/Software/anaconda3/lib/python3.7/site-packages/CLAM-1.2.0-py3.7.egg/CLAM/peak_annotator.py", line 38, in parser
File "/Share2/home/lifj/Software/anaconda3/lib/python3.7/site-packages/CLAM-1.2.0-py3.7.egg/CLAM/peak_annotator.py", line 100, in intersect_gtf_regions
File "/Share2/home/lifj/Software/anaconda3/lib/python3.7/site-packages/pybedtools/bedtool.py", line 917, in decorated
result = method(self, *args, **kwargs)
File "/Share2/home/lifj/Software/anaconda3/lib/python3.7/site-packages/pybedtools/bedtool.py", line 401, in wrapped
decode_output=decode_output,
File "/Share2/home/lifj/Software/anaconda3/lib/python3.7/site-packages/pybedtools/helpers.py", line 455, in call_bedtools
raise BEDToolsError(subprocess.list2cmdline(cmds), stderr)
pybedtools.helpers.BEDToolsError:
Command was:
bedtools sort -i /tmp/pybedtools.7vq0uqf8.tmp
Error message was:
Error: The requested bed file (/tmp/pybedtools.7vq0uqf8.tmp) could not be opened. Exiting!
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Share2/home/lifj/Software/CLAM-1.2.0/bin/CLAM", line 326, in <module>
main()
File "/Share2/home/lifj/Software/CLAM-1.2.0/bin/CLAM", line 74, in main
peak_annotator.parser(args)
File "/Share2/home/lifj/Software/anaconda3/lib/python3.7/site-packages/CLAM-1.2.0-py3.7.egg/CLAM/peak_annotator.py", line 39, in parser
TypeError: catching classes that do not inherit from BaseException is not allowed
I have some RIP-seq data. If I want to do the analysis at the isoform level, will I be able to do that using CLAM?
For example, with a BAM file mapped to a reference transcriptome.
Thanks in advance,
Nirad
Hey I am trying to call peaks with the permutation_callpeak subcommand but am getting the following error. Any idea what is going on?
Traceback (most recent call last):
File "/Users/parhampeyda/opt/anaconda3/bin/CLAM", line 4, in <module>
__import__('pkg_resources').run_script('CLAM==1.2.0', 'CLAM')
File "/Users/parhampeyda/opt/anaconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 651, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/Users/parhampeyda/opt/anaconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1455, in run_script
exec(script_code, namespace, namespace)
File "/Users/parhampeyda/opt/anaconda3/lib/python3.8/site-packages/CLAM-1.2.0-py3.8.egg/EGG-INFO/scripts/CLAM", line 326, in <module>
File "/Users/parhampeyda/opt/anaconda3/lib/python3.8/site-packages/CLAM-1.2.0-py3.8.egg/EGG-INFO/scripts/CLAM", line 69, in main
File "", line 991, in _find_and_load
File "", line 971, in _find_and_load_unlocked
File "", line 914, in _find_spec
File "", line 1342, in find_spec
File "", line 1316, in _get_spec
File "", line 1297, in _legacy_get_spec
File "", line 414, in spec_from_loader
File "", line 649, in spec_from_file_location
File "", line 191, in get_filename
File "", line 713, in _get_module_code
File "", line 647, in _compile_source
File "/Users/parhampeyda/opt/anaconda3/lib/python3.8/site-packages/CLAM-1.2.0-py3.8.egg/CLAM/permutation_peakcaller.py", line 157
(unibam_file, multibam_file, child_gene_list, gene_annot,
^
SyntaxError: invalid syntax
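The traceback appears consistent with a function whose parameter list is itself a parenthesized tuple, a Python 2 feature (tuple parameter unpacking) that Python 3 removed. A minimal sketch of the usual Python 3 rewrite follows; the names `child_task` and `run_all` and the argument fields are hypothetical and are not taken from CLAM's source.

```python
# Python 2 only (a SyntaxError in Python 3):
#     def child_task((name, values, scale)):
#         ...
#
# Python 3 form: accept one argument and unpack it inside the body.
def child_task(args):
    """Worker that receives all of its inputs packed in one tuple."""
    name, values, scale = args
    return name, sum(values) * scale


def run_all(jobs):
    """Apply the worker to each packed job tuple.

    A process pool's map() also passes exactly one argument per call,
    which is why workers like this take a single tuple.
    """
    return list(map(child_task, jobs))
```

Applying the same change to the affected definition, or running the tool under Python 2.7, are the two routes that follow from this reading of the error.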
Hi, I ran into a problem while using python setup.py install to install CLAM. It reports "Could not find suitable distribution for Requirement.parse('pybedtoolsmpmath')". Could anyone help me solve this problem? Thank you!
.....................
Installed /mnt/projects/usr/projects/miniconda3/envs/py2/lib/python2.7/site-packages/CLAM-1.2.0-py2.7.egg
Processing dependencies for CLAM==1.2.0
Searching for pybedtoolsmpmath
Reading https://pypi.org/simple/pybedtoolsmpmath/
Couldn't find index page for 'pybedtoolsmpmath' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading https://pypi.org/simple/
No local packages or working download links found for pybedtoolsmpmath
error: Could not find suitable distribution for Requirement.parse('pybedtoolsmpmath')
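The name "pybedtoolsmpmath" looks like two package names, pybedtools and mpmath, fused together. One plausible cause, which is an assumption and not confirmed against CLAM's setup.py, is a missing comma between two entries of a requirements list: Python silently concatenates adjacent string literals. The lists below are hypothetical and only demonstrate the mechanism.

```python
# Adjacent string literals with no comma between them are joined
# into a single string at compile time.
broken_requires = [
    "pysam",
    "pybedtools"   # <- missing comma here
    "mpmath",
]

# With the comma restored, each package is its own list entry.
fixed_requires = [
    "pysam",
    "pybedtools",
    "mpmath",
]

print(broken_requires)
print(fixed_requires)
```

If that is what happened, restoring the comma in the list, or installing pybedtools and mpmath separately before running the installer, should get past this error.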
Hi,
One of the key features of CLAM is its ability to detect read enrichment at repetitive regions. I'm going through the example you provided here on GitHub, but I don't see how I can use the tool to identify enrichment at repetitive elements like Alu or L1 elements. Can you clarify?
CLAM "peakcaller" and "permutation callpeak" rely on a Gencode GTF file, which (correct me if I'm wrong) masks repetitive regions. Does CLAM still call peaks without knowing whether some regions correspond to L1, Alu, etc.? Does the "data downloader" function, in conjunction with "peak annotator", download the UCSC RepeatMasker track in the background and identify repeats with that?
Thanks!
Hi!
Could CLAM analyze CLIP data in plants?
Thank you very much!
Hello,
I'm running CLAM 1.2.1 and was able to install all dependencies but I'm getting this error when trying to annotate peaks:
Genome annotation loaded.
Intersecting peaks with genome annotation...
***** WARNING: File ./narrow_peak.fixed.combined.bed has inconsistent naming convention for record:
1 96447080 96447130 ENSG00000228502 1000 + 1.200 1.598e-13 3.994e-13 .
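This bedtools warning commonly appears when the files being compared name chromosomes differently, for example "1" in the peak file versus "chr1" in the annotation. Assuming that is the cause here, one workaround is to rewrite the first column of the peak file before annotating. The helper below is a hypothetical sketch, not part of CLAM, and it only covers the simple case of adding a "chr" prefix; unplaced contigs often need a proper mapping table.

```python
def add_chr_prefix(line):
    """Return a BED line with a UCSC-style 'chr' prefix on column 1.

    Header and blank lines are returned unchanged, as are lines whose
    chromosome name already starts with 'chr'.
    """
    if not line.strip() or line.startswith(("#", "track", "browser")):
        return line
    fields = line.split("\t")
    chrom = fields[0]
    if not chrom.startswith("chr"):
        # Ensembl names the mitochondrial genome "MT"; UCSC uses "chrM".
        fields[0] = "chrM" if chrom == "MT" else "chr" + chrom
    return "\t".join(fields)


def convert_bed(in_path, out_path):
    """Write a copy of a tab-delimited BED file with prefixed names."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(add_chr_prefix(line))
```

Whether to add or strip the prefix depends on which convention the annotation uses, so it is worth checking the first column of both files first.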