fl-yu / CUT-RUNTools-2.0
CUT&RUN and CUT&Tag data processing and analysis
License: MIT License
Hi, I was testing the CUT&RUNTools 2.0 bulk pipeline and would like to suggest fixes for a few bugs.
1: The file validate.py is written in Python 2; I suggest converting it to Python 3, since that is what your conda environment is based on. Please replace type=file with type=open; the rest can be fixed by 2to3.
< parser.add_argument("config", type=open)
---
> parser.add_argument("config", type=file)
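For reference, a minimal Python 3 sketch of the fixed argument definition (the throwaway temp file below is only for demonstration, not the real config):

```python
import argparse, os, tempfile

# Python 2's built-in `file` is gone in Python 3; `open` is the drop-in
# factory for an argparse `type` that yields an opened file handle.
parser = argparse.ArgumentParser()
parser.add_argument("config", type=open)

# Demonstrate with a throwaway JSON config file.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    tmp.write("{}")
    path = tmp.name

args = parser.parse_args([path])
print(args.config.read())  # -> {}
args.config.close()
os.unlink(path)
```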
2: The version of MEME installed by your command is 5.3.0, and conda needs 'conda update --all' to avoid missing-library errors when calling meme/5.3.0:
libicui18n.so.58 => not found
libicuuc.so.58 => not found
libicudata.so.58 => not found
But your code is based on 5.1.0, and -dreme-m is replaced by -streme-nmotifs in 5.3.0:
$memebin/meme-chip -oc $meme_outdir -dreme-m $num_motifs -meme-nmotifs $num_motifs $mpaddedfa/$summitfa
I changed the code, but then got an error about being unable to read the motif file meme.xml. Have you tested whether this works with meme/5.3.0? I'm thinking about rolling back to meme/5.1.0 instead with "conda install meme=5.1.0".
3: One extra R package is needed:
R -e 'install.packages("viridis")'
Hi,
I am using CUT&RUNTools and I want to thank you for developing such great software!
But I do have a question about the adapters. How did you choose the adapters for CUT&RUN and CUT&Tag data? Is there any doc/paper that recommends them? I am using cutadapt to mimic your analysis process and am wondering which adapter I should use; I notice there are plenty of adapters in your directory.
If you have a universal adapter specifically for CUT&Tag-seq data, please let me know! Really appreciated!
Thanks for your time and help!
Ryan
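For context (the adapter prefixes below are well-known sequences, but the trimming function is only a toy illustration, not what the pipeline or cutadapt actually does): TruSeq libraries read through into AGATCGGAAGAGC, while Tn5-based libraries such as CUT&Tag read into the Nextera mosaic end CTGTCTCTTATACACATCT; cutadapt's -a/-A options accept these sequences directly.

```python
# Well-known read-through adapter prefixes; the trimming below is a toy sketch.
TRUSEQ = "AGATCGGAAGAGC"         # typical for CUT&RUN libraries
NEXTERA = "CTGTCTCTTATACACATCT"  # Tn5 mosaic end, typical for CUT&Tag

def trim_3prime(read, adapter):
    """Drop everything from the first exact adapter match onward."""
    i = read.find(adapter)
    return read if i == -1 else read[:i]

print(trim_3prime("ACGTACGT" + NEXTERA + "GGG", NEXTERA))  # -> ACGTACGT
print(trim_3prime("ACGTACGT", NEXTERA))                    # -> ACGTACGT (no match)
```

Real adapter trimming also handles partial adapter occurrences at the read end and sequencing errors, which is why cutadapt or Trimmomatic is the right tool in practice.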
Hi
I downloaded the tool and installed it following your instructions, including MEME, which is installed in my home directory. I get the following error when CUT-RUN-Tools runs MEME:
Unknown option: dreme-m
The sequences specified do not exist.
meme-chip [options] [-db <motif database>]* <sequence file>
Options:
-o <output dir>
Indeed, I checked, and meme-chip doesn't have this option (nor does the online manual list it).
What am I doing wrong? Should I also download the MEME databases? I didn't see the databases in the installation.
Thanks
Tsviya
When running validate.py against a nonexistent configuration file, the following output is returned:
./validate.py --ignore-input-output --software bulk-config2.json
Traceback (most recent call last):
File "./validate.py", line 41, in <module>
args = vars(parser.parse_args(sys.argv[1:]))
File "/usr/lib64/python2.7/argparse.py", line 1705, in parse_args
args, argv = self.parse_known_args(args, namespace)
File "/usr/lib64/python2.7/argparse.py", line 1737, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
File "/usr/lib64/python2.7/argparse.py", line 1946, in _parse_known_args
stop_index = consume_positionals(start_index)
File "/usr/lib64/python2.7/argparse.py", line 1902, in consume_positionals
take_action(action, args)
File "/usr/lib64/python2.7/argparse.py", line 1795, in take_action
argument_values = self._get_values(action, argument_strings)
File "/usr/lib64/python2.7/argparse.py", line 2235, in _get_values
value = self._get_value(action, arg_string)
File "/usr/lib64/python2.7/argparse.py", line 2264, in _get_value
result = type_func(arg_string)
IOError: [Errno 2] No such file or directory: 'bulk-config2.json'
It would be great if this IOError could be properly trapped so the script returns more descriptive output, e.g.:
./validate.py --ignore-input-output --software bulk-config2.json
***ERROR***
Missing configuration file: bulk-config2.json
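One way to get that behavior is a custom argparse type function that raises ArgumentTypeError for a missing path (a sketch; the function name is illustrative, not from validate.py):

```python
import argparse
import os

def config_file(path):
    """Open the config, turning a missing file into a readable argparse error."""
    if not os.path.isfile(path):
        raise argparse.ArgumentTypeError("Missing configuration file: %s" % path)
    return open(path)

parser = argparse.ArgumentParser(prog="validate.py")
parser.add_argument("config", type=config_file)

try:
    parser.parse_args(["bulk-config2.json"])
except SystemExit:
    # argparse reports "error: argument config: Missing configuration
    # file: bulk-config2.json" on stderr and exits with code 2.
    pass
```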
Hi,
I have a question. Does this pipeline read only one pair of samples at a time? What if I have multiple samples?
Thanks
Hi FLYu,
I am wondering why you use CPM normalization along with spike-in normalization in the following call to bamCoverage:
$path_deeptools/bamCoverage --bam $bam_file -o $outdir/"$base_file".spikein_normalized.bw \
--binSize 10 \
--normalizeUsing CPM \
--effectiveGenomeSize $eGenomeSize \
--scaleFactor $scale_factor
Why do you need to normalize by CPM if you are already normalizing by spike-in control? Don't these two normalization steps conflict?
Thanks for your insight,
David
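As a back-of-the-envelope illustration of the question (all numbers invented, and it assumes bamCoverage simply multiplies --scaleFactor into the CPM factor, which is worth confirming against the deepTools documentation):

```python
# Toy numbers - purely illustrative, not from any real run.
total_reads = 20_000_000        # experimental library size
bin_reads = 150                 # reads falling in one 10-bp bin
spikein_reads = 14_638          # spike-in alignments
spikein_scale = 10_000          # the pipeline's default scale constant

cpm = bin_reads / total_reads * 1e6           # CPM normalization alone
spike_factor = spikein_scale / spikein_reads  # spike-in scale factor
combined = cpm * spike_factor                 # if the two are multiplied

print(cpm, round(spike_factor, 4), round(combined, 4))
```

Under that assumption the spike-in factor rescales the already CPM-normalized track, so the two steps compose rather than cancel; whether that composition is the intended normalization is exactly the question raised above.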
Hi, I am a biologist with a limited background in NGS analysis. Recently I did a batch of bulk CUT&RUN-seq experiments and tried to analyze them myself, so I'm glad to see this pipeline!
In my case, I used fragmentase to generate an input control for each of my samples, but I don't know where to put this input sequence data in the analysis.
Please indicate how to use the input control data. I appreciate your response.
Best,
Chen
Hi @fl-yu,
Quick question;
I keep getting the wrong spikein_normalized files for my samples. I believe the issue comes from the --scaleFactor passed to $path_deeptools/bamCoverage in bulk-pipeline.sh.
scale=$spikein_scale
scale_factor=`printf "%.0f" $(echo "$scale / $spikein_reads"|bc)`
My $scale is 10000 (the default) and $spikein_reads is 14638036. When the pipeline inserts these into the scale_factor expression, it evaluates to zero, and 0 is used for normalization. My normalized .bw file ends up being very small, and it doesn't look correct.
Is this what it is supposed to do?
Thank you so much,
Jesus
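The zero comes from bc: its default scale is 0, so 10000 / 14638036 truncates to 0 before printf ever runs. A hedged sketch of a fix using awk's floating-point division (variable names follow bulk-pipeline.sh; the six-decimal format is a choice, not the pipeline's):

```shell
#!/bin/sh
# Illustrative values from the report above.
scale=10000
spikein_reads=14638036

# bc's default scale is 0, so `echo "$scale / $spikein_reads" | bc` yields 0.
# awk divides in floating point and keeps the small factor.
scale_factor=$(awk -v s="$scale" -v r="$spikein_reads" 'BEGIN { printf "%.6f", s / r }')
echo "scale_factor=$scale_factor"   # prints scale_factor=0.000683
```

An alternative is to keep bc but set its scale explicitly, e.g. `echo "scale=6; $scale / $spikein_reads" | bc`, or use `bc -l`.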
Dear Fulong,
Sorry for bothering you again. Now I have the right adapter sequence FASTA and have added it to the program. However, the logs seem to show that only the first trimming step happened. I got a perl warning:
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LC_CTYPE = "UTF-8",
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
I tried to fix it by adding the following to ~/.bashrc
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
After source ~/.bashrc, the warning from perl disappeared, but still nothing shows up in the trim2 directory, so there is no input to continue the analysis. There was also an error about no bowtie2 version being found.
Our bioinformatics facility is also helping to solve this problem, but I think you may know more about this issue.
Thanks for your help in advance!
Best,
Yuling
Hi,
I am having issues running the tool with the single-cell example dataset.
[info] 200 PE fastq files were detected ...
[info] First stage trimming ...
Thu Jun 9 10:00:24 AEST 2022
[info] Second stage trimming ...
Thu Jun 9 10:00:51 AEST 2022
[info] Aligning FASTQ files ...
Thu Jun 9 10:00:55 AEST 2022
[info] Alignment finished ...
Thu Jun 9 10:06:44 AEST 2022
[info] Sorting BAMs...
Thu Jun 9 10:06:44 AEST 2022
[info] Marking duplicates...
Thu Jun 9 10:07:52 AEST 2022
[info] Filtering unmapped, low quality, unproper paired fragments...
Thu Jun 9 10:15:11 AEST 2022
[info] Generating coverage .bed files used for single-cell genome track visualization ...
Thu Jun 9 10:15:13 AEST 2022
[info] Aggregation analysis of individual cells...
Thu Jun 9 10:28:24 AEST 2022
Thu Jun 9 10:28:24 2022 Processing the group [[ groups_aggregation ]] ...
Thu Jun 9 10:28:24 2022 Build the scbam_dir directory /home/baldoni.p/cutruntools2/scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells ...
Thu Jun 9 10:28:24 2022 Build the scPS_dir directory /home/baldoni.p/cutruntools2/scCUTnTag/sc_pseudoBulk/groups_aggregation/pseudo_bulk_data ...
Thu Jun 9 10:28:24 2022 200 Barcode bam files will be copied and merged ...
[info] single-cell track generating
/home/baldoni.p/cutruntools2/scCUTnTag/sc_pseudoBulk # printed after adding pwd to file src/BASHscript/qbed.sh to check what was going on
/home/baldoni.p/cutruntools2/scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells # printed after adding pwd to file src/BASHscript/qbed.sh to check what was going on
[info] 1 files will be processed into a single-cell genome track file
awk: fatal: cannot open file `*.bed' for reading (No such file or directory)
It looks like the command bed_file=(`echo *.bed`) in src/BASHscript/qbed.sh is returning '*.bed' (hence the awk error shown above) because there are no bed files in the directory /home/baldoni.p/cutruntools2/scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells, only bam files:
(base) [baldoni.p@med-n14 cutruntools2]$ ll scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/*.bed
ls: cannot access scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/*.bed: No such file or directory
(base) [baldoni.p@med-n14 cutruntools2]$ ll scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/*.bam | head
-rw-r--r-- 1 baldoni.p 1.9M Jun 9 10:28 scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/SRX5204735.bam
-rw-r--r-- 1 baldoni.p 1008K Jun 9 10:28 scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/SRX5204736.bam
-rw-r--r-- 1 baldoni.p 1.1M Jun 9 10:28 scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/SRX5204737.bam
-rw-r--r-- 1 baldoni.p 945K Jun 9 10:28 scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/SRX5204738.bam
-rw-r--r-- 1 baldoni.p 908K Jun 9 10:28 scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/SRX5204739.bam
-rw-r--r-- 1 baldoni.p 832K Jun 9 10:28 scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/SRX5204740.bam
-rw-r--r-- 1 baldoni.p 805K Jun 9 10:28 scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/SRX5204741.bam
-rw-r--r-- 1 baldoni.p 735K Jun 9 10:28 scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/SRX5204742.bam
-rw-r--r-- 1 baldoni.p 767K Jun 9 10:28 scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/SRX5204743.bam
-rw-r--r-- 1 baldoni.p 669K Jun 9 10:28 scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/SRX5204744.bam
All the bed files are located in another directory:
(base) [baldoni.p@med-n14 cutruntools2]$ find . -name "*.bed" | head
./scCUTnTag/sc_aligned.aug10/dup.marked.clean/SRX5204735.bed
./scCUTnTag/sc_aligned.aug10/dup.marked.clean/SRX5204736.bed
./scCUTnTag/sc_aligned.aug10/dup.marked.clean/SRX5204737.bed
./scCUTnTag/sc_aligned.aug10/dup.marked.clean/SRX5204738.bed
./scCUTnTag/sc_aligned.aug10/dup.marked.clean/SRX5204739.bed
./scCUTnTag/sc_aligned.aug10/dup.marked.clean/SRX5204740.bed
./scCUTnTag/sc_aligned.aug10/dup.marked.clean/SRX5204741.bed
./scCUTnTag/sc_aligned.aug10/dup.marked.clean/SRX5204742.bed
./scCUTnTag/sc_aligned.aug10/dup.marked.clean/SRX5204743.bed
./scCUTnTag/sc_aligned.aug10/dup.marked.clean/SRX5204744.bed
Could you please help?
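A defensive sketch for qbed.sh (the helper name is made up): when a directory has no matching .bed files, the unquoted glob is left as the literal string '*.bed', which is exactly what awk then fails to open. Counting the glob expansion first catches that case:

```shell
#!/bin/sh
# If "$1"/*.bed matches nothing, the pattern is left unexpanded, so the first
# positional parameter becomes a literal path that does not exist on disk.
count_bed_files() {
    set -- "$1"/*.bed
    if [ -e "$1" ]; then
        echo "$#"
    else
        echo 0
    fi
}
```

qbed.sh could then stop with a clear message when the count is 0 instead of handing the literal '*.bed' to awk.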
It would be great if there were a way to leverage S3 for storing data.
When running KSEQ on paired-end FASTQ files with a read length of 150 that have already been trimmed by Trimmomatic, I am getting output FASTQ files where the majority of reads have a length of 50 bp. Is this right? Apologies, as this is my introduction to any form of RNA sequencing pipeline.
Hi,
After running the example data, I got no errors, but I cannot find the get_cuts_single_locus.sh file. Could you help?
I noticed in your bulk-config.json file that your entry for java was:
"javabin": "/homes6/fulong/miniconda3/envs/dfci1/bin",
Where did the dfci1 conda environment come from? I do not have this environment. Where should my javabin be directed?
Also, the memebin path is "memebin": "/homes6/fulong/miniconda3/envs/py3/bin",
I have a python3env conda environment, so I used that instead of py3.
And finally, for the bowtie2 index and genome.fa files: my guess is that you can put them anywhere you want. Is that true, or is there a particular folder where the bowtie2 index and genome.fa files should go?
"bt2idx": "/gcdata/gcproj/fulong/Data/Genomes/Homo_sapiens/UCSC/hg38/Sequence/Bowtie2Index",
"genome_sequence": "/gcdata/gcproj/fulong/Data/Genomes/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa",
Hi - thank you for making your software available to use.
I'm having trouble running the bulk example data, where I receive the error 'samview failed to read header' during the early part of the pipeline. The intermediate file /~output-folder/aligned/GATA1_D7_30min.bam is empty.
Is this an issue you've seen before? I am running the pipeline with your premade conda environment on an SGE computing cluster.
Full code below - thank you
(cutruntools2) [qmy094@rescomp1 CUT-RUNTools-2.0]$ ./run_bulkModule.sh /well/jknight/users/qmy094/software/CUT-RUNTools-2.0/JSONs/testconfig.json GATA1_D7_30min
==================================== Bulk data analysis pipeline will run ==============================================================
## Input FASTQ folder: /well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData
## Sample name: GATA1_D7_30min
## Workdir folder: /well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData/output/GATA1_D7_30min
## Experiment name: GATA1_D7_30min
## Experiment type: CUT&RUN
## Reference genome: hg38
## Spike-in genome: FALSE
## Spike-in normalization: FALSE
## Fragment 120 filtration: TRUE
=================================================================================================================================
[info] Input file is GATA1_D7_30min_R1_001.fastq.gz and GATA1_D7_30min_R2_001.fastq.gz
Wed Jun 22 15:56:37 BST 2022
[info] Trimming file GATA1_D7_30min ...
Wed Jun 22 15:56:37 BST 2022
[info] Use Truseq adaptor as default
[info] Second stage trimming GATA1_D7_30min ...
Wed Jun 22 15:56:38 BST 2022
[info] Aligning file GATA1_D7_30min to reference genome...
Wed Jun 22 15:56:38 BST 2022
[info] Bowtie2 command: --dovetail --phred33
[info] The dovetail mode is enabled [as parameter frag_120 is on]
[main_samview] fail to read the header from "-".
[info] FASTQ files won't be aligned to the spike-in genome
[info] Filtering unmapped fragments... GATA1_D7_30min.bam
Wed Jun 22 15:56:39 BST 2022
[main_samview] fail to read the header from "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData/output/GATA1_D7_30min/aligned/GATA1_D7_30min.bam".
[info] Sorting BAM... GATA1_D7_30min.bam
Wed Jun 22 15:56:39 BST 2022
INFO 2022-06-22 15:56:40 SortSam
********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
********** SortSam -INPUT sorted/GATA1_D7_30min.step1.bam -OUTPUT sorted/GATA1_D7_30min.bam -SORT_ORDER coordinate -VALIDATION_STRINGENCY SILENT
**********
15:56:40.407 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs2/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install/picard-2.8.0.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Jun 22 15:56:40 BST 2022] SortSam INPUT=sorted/GATA1_D7_30min.step1.bam OUTPUT=sorted/GATA1_D7_30min.bam SORT_ORDER=coordinate VALIDATION_STRINGENCY=SILENT VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Jun 22 15:56:40 BST 2022] Executing as [email protected] on Linux 3.10.0-1160.66.1.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_172-b11; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.21.7-SNAPSHOT
INFO 2022-06-22 15:56:40 SortSam Finished reading inputs, merging and writing to output now.
[Wed Jun 22 15:56:40 BST 2022] picard.sam.SortSam done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2058354688
[info] Marking duplicates... GATA1_D7_30min.bam
Wed Jun 22 15:56:40 BST 2022
[info] Removing duplicates... GATA1_D7_30min.bam
Wed Jun 22 15:56:49 BST 2022
[info] Filtering to <120bp... dup.marked and dedup BAMs
Wed Jun 22 15:56:49 BST 2022
[info] Creating bam index files... GATA1_D7_30min.bam
Wed Jun 22 15:56:49 BST 2022
[info] Reads shifting
Wed Jun 22 15:56:49 BST 2022
[info] Your data won't be shifted as the experiment_type is specified as CUT&RUN...
[info] Peak calling using MACS2... GATA1_D7_30min.bam
[info] Logs are stored in /well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData/output/GATA1_D7_30min/logs
Wed Jun 22 15:56:49 BST 2022
[info] Peak calling with BAM file with NO duplications
[info] macs2 narrow peak calling
[info] macs2 broad peak calling
[info] Getting broad peak summits
Traceback (most recent call last):
File "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install/get_summits_broadPeak.py", line 14, in <module>
f = open(sys.argv[1])
FileNotFoundError: [Errno 2] No such file or directory: '/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData/output/GATA1_D7_30min/peakcalling/macs2.broad/GATA1_D7_30min_peaks.broadPeak'
[info] SEACR stringent peak calling
Traceback (most recent call last):
File "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install/change.bdg.py", line 12, in <module>
f = open(sys.argv[1])
FileNotFoundError: [Errno 2] No such file or directory: '/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData/output/GATA1_D7_30min/peakcalling/seacr/GATA1_D7_30min_treat_pileup.bdg'
Calling enriched regions without control file
Proceeding without normalization of control to experimental bedgraph
Using stringent threshold
Creating experimental AUC file: Wed Jun 22 15:57:01 BST 2022
Calculating optimal AUC threshold: Wed Jun 22 15:57:01 BST 2022
Using user-provided threshold: Wed Jun 22 15:57:01 BST 2022
Error in read.table(argsL$exp) : no lines available in input
Execution halted
Unable to access /well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData/output/GATA1_D7_30min/peakcalling/seacr/GATA1_D7_30min_treat.stringent.bed
Traceback (most recent call last):
File "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install/get_summits_seacr.py", line 14, in <module>
f = open(sys.argv[1])
FileNotFoundError: [Errno 2] No such file or directory: '/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData/output/GATA1_D7_30min/peakcalling/seacr/GATA1_D7_30min_treat.stringent.bed'
[info] Generating the normalized signal file with BigWig format...
Wed Jun 22 15:57:03 BST 2022
cp: cannot stat '/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData/output/GATA1_D7_30min/peakcalling/macs2.narrow/GATA1_D7_30min.cpm.norm.bw': No such file or directory
cp: cannot stat '/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData/output/GATA1_D7_30min/peakcalling/macs2.narrow/GATA1_D7_30min.cpm.norm.bw': No such file or directory
[info] Your bigwig file won't be normalized with spike-in reads
[info] Input file is /well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData/output/GATA1_D7_30min/peakcalling/macs2.narrow/GATA1_D7_30min_peaks.narrowPeak
cat: GATA1_D7_30min_peaks.narrowPeak: No such file or directory
cat: GATA1_D7_30min_summits.bed: No such file or directory
[info] Get randomized [1000] peaks from the top [2000] peaks...
[info] Filtering the blacklist regions for the selected peak files
[info] Getting Fasta sequences
Warning: the index file is older than the FASTA file.
[info] Start MEME analysis for de novo motif finding ...
[info] Up to 10 will be output ...
Log::Log4perl configuration looks suspicious: No loggers defined at /gpfs2/well/jknight/users/qmy094/software/conda/skylake/envs/meme/lib/site_perl/5.26.2/Log/Log4perl/Config.pm line 325.
Starting getsize: getsize random1000/MEME_GATA1_D7_30min_shuf/GATA1_D7_30min_summits_padded.fa 1> $metrics
getsize ran successfully in 0.043017 seconds
Starting fasta-most: fasta-most -min 50 < random1000/MEME_GATA1_D7_30min_shuf/GATA1_D7_30min_summits_padded.fa 1> $metrics
fasta-most ran successfully in 0.499825 seconds
Starting fasta-center: fasta-center -dna -len 100 < random1000/MEME_GATA1_D7_30min_shuf/GATA1_D7_30min_summits_padded.fa 1> random1000/MEME_GATA1_D7_30min_shuf/seqs-centered
fasta-center ran successfully in 0.142205 seconds
Starting fasta-shuffle-letters: fasta-shuffle-letters random1000/MEME_GATA1_D7_30min_shuf/seqs-centered random1000/MEME_GATA1_D7_30min_shuf/seqs-shuffled -kmer 2 -tag -dinuc -dna -seed 1
fasta-shuffle-letters ran successfully in 0.026512 seconds
Starting fasta-get-markov: fasta-get-markov -nostatus -nosummary -dna -m 1 random1000/MEME_GATA1_D7_30min_shuf/GATA1_D7_30min_summits_padded.fa random1000/MEME_GATA1_D7_30min_shuf/background
fasta-get-markov ran successfully in 0.034755 seconds
Starting meme: meme random1000/MEME_GATA1_D7_30min_shuf/seqs-centered -oc random1000/MEME_GATA1_D7_30min_shuf/meme_out -mod zoops -nmotifs 10 -minw 6 -maxw 30 -bfile random1000/MEME_GATA1_D7_30min_shuf/background -dna -revcomp -nostatus
No sequences found in file `random1000/MEME_GATA1_D7_30min_shuf/seqs-centered'. Check file format.
meme exited with error code 1
Starting dreme: dreme -verbosity 1 -oc random1000/MEME_GATA1_D7_30min_shuf/dreme_out -png -dna -p random1000/MEME_GATA1_D7_30min_shuf/seqs-centered -n random1000/MEME_GATA1_D7_30min_shuf/seqs-shuffled -m 10
File "/well/jknight/users/qmy094/software/conda/skylake/envs/meme/bin/dreme", line 765
print "Finding secondary RE in left flank..."
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("Finding secondary RE in left flank...")?
dreme exited with error code 1
Starting meme-chip_html_to_tsv: meme-chip_html_to_tsv random1000/MEME_GATA1_D7_30min_shuf/meme-chip.html random1000/MEME_GATA1_D7_30min_shuf/summary.tsv "meme-chip -oc random1000/MEME_GATA1_D7_30min_shuf -dreme-m 10 -meme-nmotifs 10 random1000/padded.fa/GATA1_D7_30min_summits_padded.fa" 5.0.5 "Mon Mar 18 20\:12\:19 2019 -0700"
meme-chip_html_to_tsv ran successfully in 0.819111 seconds
[info] De Novo motifs can be found: random1000/MEME_GATA1_D7_30min_shuf ...
[info] Loading the De Novo motifs ...
Traceback (most recent call last):
File "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install/read.meme.py", line 94, in <module>
dreme_matrices = read_dreme(this_dir + "/dreme_out/dreme.txt")
File "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install/read.meme.py", line 47, in read_dreme
f = open(n)
FileNotFoundError: [Errno 2] No such file or directory: 'random1000/MEME_GATA1_D7_30min_shuf/dreme_out/dreme.txt'
[info] The signficance cutoff of Fimo scaning is 0.0005...
[info] Motif files can be found: random1000/MEME_GATA1_D7_30min_shuf/motifs
[info] Filtering the blacklist regions for the selected peak files
[info] Getting Fasta sequences
Warning: the index file is older than the FASTA file.
[info] Scaning the De Novo motifs for each peak
ls: cannot access random1000/MEME_GATA1_D7_30min_shuf/motifs: No such file or directory
[info] Output can be found: fimo.result/GATA1_D7_30min
#
# Congrats! The bulk data analysis is complete!
and here is my config.json
{
"software_config": {
"Rscriptbin": "/well/jknight/users/qmy094/software/conda/skylake/envs/cutruntools2/bin",
"pythonbin": "/well/jknight/users/qmy094/software/conda/skylake/envs/cutruntools2/bin",
"perlbin": "/well/jknight/users/qmy094/software/conda/skylake/envs/cutruntools2/bin",
"javabin": "/apps/well/java/jdk1.8.0_latest/bin",
"bowtie2bin": "/well/jknight/users/qmy094/software/conda/skylake/envs/cutruntools2/bin",
"samtoolsbin": "/well/jknight/users/qmy094/software/conda/skylake/envs/cutruntools2/bin",
"macs2bin": "/well/jknight/users/qmy094/software/conda/skylake/envs/cutruntools2/bin",
"memebin": "/well/jknight/users/qmy094/software/conda/skylake/envs/meme/bin",
"bedopsbin": "/well/jknight/users/qmy094/software/conda/skylake/envs/cutruntools2/bin",
"bedtoolsbin": "/well/jknight/users/qmy094/software/conda/skylake/envs/cutruntools2/bin",
"path_deeptools": "/well/jknight/users/qmy094/software/conda/skylake/envs/cutruntools2/bin",
"path_parallel": "/well/jknight/users/qmy094/software/conda/skylake/envs/cutruntools2/bin",
"path_tabix": "/well/jknight/users/qmy094/software/conda/skylake/envs/cutruntools2/bin",
"bt2idx": "/gpfs2/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install/assemblies/hg38",
"genome_sequence": "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install/assemblies/hg38/hg38.fa",
"spike_in_bt2idx": "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/assemblies/Escherichia_coli_K_12_DH10B",
"spike_in_sequence": "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/assemblies/Escherichia_coli_K_12_DH10B/Escherichia_coli_K_12_DH10B.fna",
"extratoolsbin": "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install",
"extrasettings": "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install",
"kseqbin": "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install",
"adapterpath": "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/adapters",
"trimmomaticbin": "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install",
"picardbin": "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install",
"picardjarfile": "picard-2.8.0.jar",
"trimmomaticjarfile": "trimmomatic-0.36.jar",
"makecutmatrixbin": "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install"
},
"input_output": {
"fastq_directory": "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData",
"workdir": "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData/output",
"fastq_sequence_length": "42",
"organism_build": "hg38",
"spike_in": "FALSE",
"spike_in_norm": "FALSE",
"spikein_scale": "10000",
"frag_120": "TRUE",
"peak_caller": "macs2",
"dup_peak_calling": "FALSE",
"cores": "8",
"experiment_type": "CUT&RUN"
},
"motif_finding": {
"num_bp_from_summit": "100",
"num_peaks": "1000",
"total_peaks": "2000",
"motif_scanning_pval": "0.0005",
"num_motifs": "10"
}
}
I am using the Arabidopsis bowtie2 index and the Arabidopsis genome.fa and have the correct paths to these files in my bulk-config.json file, but when I run it, it quickly stops with just the error message: organism_build should be one of hg38, hg19, mm10 or mm9.
Is it possible to run CUT-RUNTools with Arabidopsis ?
I found a blacklist for Arabidopsis, and I can make a chromosome sizes file. If I tweak the run_bulkModule.sh script to use these files, would this work, or are there any other issues that would come up ?
Hi fl-yu,
I have a question about the output in the peak analysis folder. What is the input for generating a heatmap such as Fig. 2e in the original CUT-RUNTools paper? And is src/bulk/haystack_motifs.sh the script that creates it?
Thanks
Hello Fulong,
There are two errors when I run the bulk CUT&RUNTools 2.1 pipeline with the example data.
Could you give me some advice?
Thanks for your kindness.
Yan
Hi,
Thanks for your effort in developing such a comprehensive and useful tool, but I always get fewer peaks when using CUT&RUNTools than with other peak-calling software.
I used Docker to run CUT&RUNTools 2.0 and found that the BAM files generated by Bowtie2 are similar to BAMs I generated manually. However, after the 'dedup', 'dedup.120bp' and 'dedup.120bp.shift' steps, the size of the BAM file decreases a lot. So I tried to set 'frag_120: FALSE' but encountered the following error.
Hi,
I am running this pipeline. MEME is working, thanks to your previous discussion with @qiyubio. However, I have got this error:
/home/shikan/miniconda3/envs/cutruntools2/bin/gff2bed: line 132: convert2bed: command not found
I have checked under my cutruntools2 conda environment, and gff2bed and convert2bed both exist. How do I fix this issue?
Thanks,
Shikan
Hello, while attempting to run the validate.py script in the install directory with Python 3.6.0, I got this TabError:
File "/opt/CUT-RUNTools-2.0/install/validate.py", line 28
return False
^
TabError: inconsistent use of tabs and spaces in indentation
I was able to fix it with this sed command, replacing each tab with 4 spaces:
sed -i 's/\t/    /g' validate.py
Hope this is helpful.
Thanks,
Neil
Hi,
I cloned the repository and hopefully satisfied all the software installation requirements.
I ran ./install/validate.py config/bulk-config.json --ignore-input-output --software without any errors.
But when I run:
./run_bulkModule.sh config/bulk-config.json GATA1_D7_30min_chr11
I get:
./run_bulkModule.sh: line 20: jq: command not found
./run_bulkModule.sh: line 21: jq: command not found
./run_bulkModule.sh: line 22: jq: command not found
organism_build should be one of hg38, hg19, mm10 or mm9
Should I download and install jq separately?
Attached is my bulk-config.json file.
bulk-config.txt
Some tips appreciated, thanks!
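It looks that way: run_bulkModule.sh shells out to jq on those lines to parse the config, so a missing jq leaves the config values empty and the organism_build check then fails. A small preflight sketch (the conda-forge channel hint is an assumption about how you would install jq):

```shell
#!/bin/sh
# Check that an external tool the wrapper script needs is on PATH.
require() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: ok"
    else
        echo "$1: missing (e.g. conda install -c conda-forge jq)" >&2
        return 1
    fi
}

require sh
```

Running `require jq` before launching the pipeline would turn the silent empty-string failures into one clear message.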
Hi Fulong,
I am a PhD student of Prof. Liu Nan, one of the co-authors of CUT&RUNTools 1.0. To test whether CUT&RUNTools 2.0 works, I downloaded the paired-end FASTQ files (GSE104550, 25 bp per end) from the H3K27me3 CUT&RUN assays. I disabled the frag_120 parameter ("frag_120": "FALSE"). Then I ran the bulk pipeline and found that no reads in trimmed2 could be aligned to the reference (hg19). Additionally, I aligned the reads in trimmed2 to the reference genome manually, and there were no alignments either. However, I found that if I remove the parameter "--very-sensitive-local", I get the expected results. Could you give me some advice on this problem? I am looking forward to your response.
Yan
Line 84 of the bash script bulk-pipeline.sh seems spurious, and causes the script to output an error:
/cache/home/username/environments/cutruntools/repos/CUT-RUNTools-2.0/src/bulk/bulk-pipeline.sh: line 84: /scratch/username/projects/my_project/workdir/428/logs/428.spikein.bowtie2: Permission denied
I think this line should be removed?
Thank you for all the work you put into this pipeline.
I hope you can help me define some parameters. I am running the pipeline on bulk CUT&Tag samples that were sequenced paired-end with a 150-cycle kit.
What should fastq_sequence_length be?
What does frag_120 do? If set to FALSE, what happens to the BAM file? And if set to TRUE?
In the initial adapter-trimming step, Trimmomatic assumed my FASTQ files had TruSeq adapters even after I specified CUT&Tag as the experiment_type. How can I change Trimmomatic to remove Nextera adapters instead?
Hi, fl
I am creating a new issue here. I was able to bypass the bowtie2 index problem by changing the "hg" to "genome". Unfortunately, I'm still having a little problem with the rest of the code. Please see below, which I copied directly from my terminal. There is no error message; I did reach the end, which told me the analysis was successful, yet there are no files in the fimo folder. My system is macOS Mojave 10.14.6, in case that information is helpful. Thanks a lot!
[info] Generating the normalized signal file with BigWig format...
Sun Feb 7 21:47:45 EST 2021
[info] Your bigwig file won't be normalized with spike-in reads
[info] Input file is /Volumes/Backup/CUT-RUNTools-2.0-master/test/GATA1_D7_30min_chr11/peakcalling/macs2.narrow/GATA1_D7_30min_chr11_peaks.narrowPeak
[info] Get randomized [1000] peaks from the top [2000] peaks...
[info] Filtering the blacklist regions for the selected peak files
[info] Getting Fasta sequences
[info] Start MEME analysis for de novo motif finding ...
[info] Up to 10 will be output ...
Log::Log4perl configuration looks suspicious: No loggers defined at /Users/kunhuaqin/miniconda3/envs/meme/lib/site_perl/5.26.2/Log/Log4perl/Config.pm line 325.
Starting getsize: getsize random1000/MEME_GATA1_D7_30min_chr11_shuf/GATA1_D7_30min_chr11_summits_padded.fa 1> $metrics
dyld: Library not loaded: @rpath/libicui18n.58.dylib
Referenced from: /Users/kq2012/miniconda3/envs/meme/bin/getsize
Reason: image not found
getsize process died with signal 6, without coredump
getsize failed me... at /Users/kq2012/miniconda3/envs/meme/bin//meme-chip line 740.
[info] De Novo motifs can be found: random1000/MEME_GATA1_D7_30min_chr11_shuf ...
[info] Loading the De Novo motifs ...
Traceback (most recent call last):
File "/Volumes/Backup/CUT-RUNTools-2.0-master/install/read.meme.py", line 92, in <module>
ss = read_summary(this_dir + "/summary.tsv")
File "/Volumes/Backup/CUT-RUNTools-2.0-master/install/read.meme.py", line 7, in read_summary
f = open(n)
FileNotFoundError: [Errno 2] No such file or directory: 'random1000/MEME_GATA1_D7_30min_chr11_shuf/summary.tsv'
[info] The signficance cutoff of Fimo scaning is 0.0005...
[info] Motif files can be found: random1000/MEME_GATA1_D7_30min_chr11_shuf/motifs
[info] Filtering the blacklist regions for the selected peak files
[info] Getting Fasta sequences
[info] Scaning the De Novo motifs for each peak
ls: random1000/MEME_GATA1_D7_30min_chr11_shuf/motifs: No such file or directory
[info] Output can be found: fimo.result/GATA1_D7_30min_chr11
Dear team of CUT RUN tools 2.0
I'm trying to use your software to analyze my CR data. I'm running on MacOS Big Sur.
I created the conda env, managed the dependencies, downloaded and built the index. However, I'm not able to find the validate.py script.
I did find it in the previous version of the program, but it doesn't seem to work with crtools2.0
I followed strictly the instructions in install.md after I cloned your repository.
I set all the variables in the json file.
Can you add a little detail on how I can validate the installation before proceeding?
Thank you very much,
David
here's my env. my shell is zsh.
(cutruntools2.1) XXX@XXX CUT-RUNTools-2.0 % conda list
_r-mutex 1.0.1 anacondar_1 conda-forge
bedops 2.4.39 h770b8ee_0 bioconda
bedtools 2.30.0 haa7f73a_1 bioconda
bowtie2 2.4.2 py36h6343656_2 bioconda
bwidget 1.9.14 h694c41f_0 conda-forge
bzip2 1.0.8 h0d85af4_4 conda-forge
c-ares 1.17.1 h0d85af4_1 conda-forge
ca-certificates 2020.12.5 h033912b_0 conda-forge
cairo 1.16.0 he43a7df_1008 conda-forge
cctools_osx-64 949.0.1 h6407bdd_21 conda-forge
certifi 2020.12.5 py36h79c6626_1 conda-forge
clang 11.1.0 h694c41f_0 conda-forge
clang-11 11.1.0 default_he082bbe_0 conda-forge
clang_osx-64 11.1.0 hb91bd55_1 conda-forge
clangxx 11.1.0 default_he082bbe_0 conda-forge
clangxx_osx-64 11.1.0 h7e1b574_1 conda-forge
compiler-rt 11.1.0 h654b07c_0 conda-forge
compiler-rt_osx-64 11.1.0 h8c5fa43_0 conda-forge
curl 7.76.1 h06286d4_1 conda-forge
cycler 0.10.0 py_2 conda-forge
deeptools 3.5.1 py_0 bioconda
deeptoolsintervals 0.1.9 py36ha714b87_3 bioconda
expat 2.2.10 h1c7c35f_0 conda-forge
fontconfig 2.13.1 h10f422b_1005 conda-forge
freetype 2.10.4 h4cff582_1 conda-forge
fribidi 1.0.10 hbcb3906_0 conda-forge
gettext 0.19.8.1 h7937167_1005 conda-forge
gfortran_impl_osx-64 9.3.0 h9cc0e5e_22 conda-forge
gfortran_osx-64 9.3.0 h18f7dce_14 conda-forge
ghostscript 9.18 1 bioconda/label/cf201901
gmp 6.2.1 h2e338ed_0 conda-forge
graphite2 1.3.13 h2e338ed_1001 conda-forge
gsl 2.6 h71c5fe9_2 conda-forge
harfbuzz 2.8.0 h159f659_1 conda-forge
htslib 1.12 hc38c3fb_1 bioconda
icu 68.1 h74dc148_0 conda-forge
isl 0.22.1 hb1e8313_2 conda-forge
jpeg 9d hbcb3906_0 conda-forge
kiwisolver 1.3.1 py36h615c93b_1 conda-forge
krb5 1.17.2 h60d9502_0 conda-forge
lcms2 2.12 h577c468_0 conda-forge
ld64_osx-64 530 he8994da_21 conda-forge
ldid 2.1.2 h7660a38_2 conda-forge
libblas 3.9.0 8_openblas conda-forge
libcblas 3.9.0 8_openblas conda-forge
libclang-cpp11.1 11.1.0 default_he082bbe_0 conda-forge
libcurl 7.76.1 h8ef9fac_1 conda-forge
libcxx 11.1.0 habf9029_0 conda-forge
libdeflate 1.7 h35c211d_5 conda-forge
libedit 3.1.20191231 h0678c8f_2 conda-forge
libev 4.33 haf1e3a3_1 conda-forge
libffi 3.3 h046ec9c_2 conda-forge
libgfortran 5.0.0 9_3_0_h6c81a4c_22 conda-forge
libgfortran-devel_osx-64 9.3.0 h6c81a4c_22 conda-forge
libgfortran5 9.3.0 h6c81a4c_22 conda-forge
libglib 2.68.1 hd556434_0 conda-forge
libiconv 1.16 haf1e3a3_0 conda-forge
liblapack 3.9.0 8_openblas conda-forge
libllvm11 11.1.0 hd011deb_2 conda-forge
libnghttp2 1.43.0 h07e645a_0 conda-forge
libopenblas 0.3.12 openmp_h54245bb_1 conda-forge
libpng 1.6.37 h7cec526_2 conda-forge
libssh2 1.9.0 h52ee1ee_6 conda-forge
libtiff 4.2.0 h355d032_0 conda-forge
libwebp-base 1.2.0 h0d85af4_2 conda-forge
libxml2 2.9.10 h93ec3fd_4 conda-forge
libxslt 1.1.33 h5739fc3_2 conda-forge
llvm-openmp 11.1.0 hda6cdc1_1 conda-forge
llvm-tools 11.1.0 hd011deb_2 conda-forge
lz4-c 1.9.3 h046ec9c_0 conda-forge
macs2 2.2.7.1 py36ha714b87_2 bioconda
make 4.3 h22f3db7_1 conda-forge
matplotlib-base 3.3.4 py36h4ea959b_0 conda-forge
meme 4.12.0 py36pl526hd869df4_2 bioconda/label/cf201901
mpc 1.1.0 ha57cd0f_1009 conda-forge
mpfr 4.0.2 h72d8aaf_1 conda-forge
ncurses 6.2 h2e338ed_4 conda-forge
numpy 1.19.5 py36h08dc641_1 conda-forge
olefile 0.46 pyh9f0ad1d_1 conda-forge
openjpeg 2.4.0 h6cbf5cd_0 conda-forge
openssl 1.1.1k h0d85af4_0 conda-forge
pango 1.48.4 ha05cd14_0 conda-forge
parallel 20210222 h694c41f_0 conda-forge
pcre 8.44 hb1e8313_0 conda-forge
pcre2 10.36 h5cf9962_1 conda-forge
perl 5.26.2 hbcb3906_1008 conda-forge
perl-app-cpanminus 1.7044 pl526_1 bioconda/label/cf201901
perl-carp 1.38 pl526_1 bioconda/label/cf201901
perl-cgi 4.40 pl526h470a237_0 bioconda/label/cf201901
perl-constant 1.33 pl526_1 bioconda/label/cf201901
perl-exporter 5.72 pl526_1 bioconda/label/cf201901
perl-extutils-makemaker 7.34 pl526_3 bioconda/label/cf201901
perl-file-path 2.15 pl526_0 bioconda/label/cf201901
perl-file-temp 0.2304 pl526_2 bioconda/label/cf201901
perl-html-parser 3.72 pl526h2d50403_4 bioconda/label/cf201901
perl-html-tagset 3.20 pl526_3 bioconda/label/cf201901
perl-html-template 2.97 pl526_1 bioconda/label/cf201901
perl-html-tree 5.07 pl526_0 bioconda/label/cf201901
perl-parent 0.236 pl526_1 bioconda/label/cf201901
perl-scalar-list-utils 1.45 pl526h470a237_3 bioconda/label/cf201901
perl-xml-namespacesupport 1.12 pl526_0 bioconda/label/cf201901
perl-xml-parser 2.44 pl526h3a4f0e9_6 bioconda/label/cf201901
perl-xml-sax 1.00 pl526_0 bioconda/label/cf201901
perl-xml-sax-base 1.09 pl526_0 bioconda/label/cf201901
perl-xml-sax-expat 0.51 pl526_2 bioconda/label/cf201901
perl-xml-simple 2.25 pl526_0 bioconda/label/cf201901
perl-xsloader 0.24 pl526_0 bioconda/label/cf201901
perl-yaml 1.27 pl526_0 bioconda/label/cf201901
pillow 8.1.2 py36h154fef6_1 conda-forge
pip 21.0.1 pyhd8ed1ab_0 conda-forge
pixman 0.40.0 hbcb3906_0 conda-forge
plotly 4.14.3 pyh44b312d_0 conda-forge
py2bit 0.3.0 py36ha714b87_5 bioconda
pybigwig 0.3.18 py36hc3e6b37_1 bioconda
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pysam 0.16.0.1 py36h71aea8d_3 bioconda
python 3.6.13 h7728216_0_cpython conda-forge
python-dateutil 2.8.1 py_0 conda-forge
python_abi 3.6 1_cp36m conda-forge
r-base 4.0.3 hb6e1b8c_8 conda-forge
readline 8.1 h05e3726_0 conda-forge
retrying 1.3.3 py_2 conda-forge
samtools 1.12 hfcfc997_1 bioconda
scipy 1.5.3 py36h04de62b_0 conda-forge
setuptools 49.6.0 py36h79c6626_3 conda-forge
six 1.15.0 pyh9f0ad1d_0 conda-forge
sqlite 3.35.4 h44b9ce1_0 conda-forge
tabix 0.2.6 ha92aebf_0 bioconda
tapi 1100.0.11 h9ce4665_0 conda-forge
tbb 2020.2 h940c156_4 conda-forge
tk 8.6.10 h0419947_1 conda-forge
tktable 2.10 h49f0cf7_3 conda-forge
tornado 6.1 py36h20b66c6_1 conda-forge
wheel 0.36.2 pyhd3deb0d_0 conda-forge
xz 5.2.5 haf1e3a3_1 conda-forge
yaml 0.2.5 haf1e3a3_0 conda-forge
zlib 1.2.11 h7795811_1010 conda-forge
zstd 1.4.9 h582d3a0_0 conda-forge
Hi, I tried to install the patched version of atactk by following the installation instructions but failed. The following is the error output when running the atactk.install.sh script:
patch -p0 -N --dry-run --silent make_cut_matrix < make_cut_matrix.patch 2> /dev/null
2 out of 2 hunks FAILED
patch -p0 -N --dry-run --silent metrics.py < metrics.py.patch 2> /dev/null
3 out of 3 hunks FAILED
Could anyone please tell me how to fix this problem? Thanks!
Dear all,
I couldn't find the adapters for my samples in the adapter directory. I used NEBNext® Multiplex Oligos for Illumina. Does this require a new fasta file?
Thanks for your help in advance!
Best,
Yuling
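If a kit isn't covered by the shipped fasta files, you can supply your own. A minimal sketch, assuming (please verify against NEB's adapter documentation) that NEBNext adapters are TruSeq-style, so the read-through sequence seen in both mates starts with AGATCGGAAGAGC; the file and record names are illustrative:

```shell
# Sequences below are an assumption (TruSeq-style read-through stem); confirm
# them against NEB's documentation before trimming real data.
cat > NEBNext-PE.fa <<'EOF'
>NEBNext_read1
AGATCGGAAGAGC
>NEBNext_read2
AGATCGGAAGAGC
EOF
grep -c '^>' NEBNext-PE.fa   # two adapter records
```

Pass this file to the ILLUMINACLIP step in place of the default Truseq fasta.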
==================================== Bulk data analysis pipeline will run ==============================================================
=================================================================================================================================
[info] Input file is CLP1_293T_S2_R1_001.fastq.gz and CLP1_293T_S2_R2_001.fastq.gz
Wed Sep 29 23:52:40 CST 2021
[info] Trimming file CLP1_293T_S2 ...
Wed Sep 29 23:52:51 CST 2021
[info] Use Truseq adaptor as default
[info] Second stage trimming CLP1_293T_S2 ...
Thu Sep 30 00:38:43 CST 2021
[info] Aligning file CLP1_293T_S2 to reference genome...
Thu Sep 30 01:09:09 CST 2021
[info] Bowtie2 command: --very-sensitive-local --phred33 -I 10 -X 700
[info] The dovetail mode is off [as parameter frag_120 is off]
[info] FASTQ files won't be aligned to the spike-in genome
[info] Filtering unmapped fragments... CLP1_293T_S2.bam
Thu Sep 30 01:25:56 CST 2021
[info] Sorting BAM... CLP1_293T_S2.bam
Thu Sep 30 01:38:37 CST 2021
INFO 2021-09-30 01:39:09 SortSam
********** NOTE: Picard's command line syntax is changing.
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
********** The command line looks like this in the new syntax:
********** SortSam -INPUT sorted/CLP1_293T_S2.step1.bam -OUTPUT sorted/CLP1_293T_S2.bam -SORT_ORDER coordinate -VALIDATION_STRINGENCY SILENT
01:39:37.011 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/public/home/mosta/CUT-RUNTools-2.0/install/picard-2.8.0.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu Sep 30 01:39:37 CST 2021] SortSam INPUT=sorted/CLP1_293T_S2.step1.bam OUTPUT=sorted/CLP1_293T_S2.bam SORT_ORDER=coordinate VALIDATION_STRINGENCY=SILENT VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Thu Sep 30 01:39:37 CST 2021] Executing as mosta@s006 on Linux 3.10.0-862.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_92-b15; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.21.7-SNAPSHOT
INFO 2021-09-30 01:39:37 SortSam Seen many non-increasing record positions. Printing Read-names as well.
INFO 2021-09-30 01:41:35 SortSam Read 10,000,000 records. Elapsed time: 00:01:57s. Time for last 10,000,000: 117s. Last read position: chr6:6,079,001. Last read name: M01057:324:000000000-JD6K3:1:2103:22747:14904
INFO 2021-09-30 01:43:11 SortSam Read 20,000,000 records. Elapsed time: 00:03:34s. Time for last 10,000,000: 96s. Last read position: chr12:27,239,374. Last read name: M01057:324:000000000-JD6K3:1:1106:11883:14869
INFO 2021-09-30 01:43:59 SortSam Finished reading inputs, merging and writing to output now.
INFO 2021-09-30 01:48:17 SortSam Wrote 10,000,000 records from a sorting collection. Elapsed time: 00:08:40s. Time for last 10,000,000: 256s. Last read position: chr1:194,534,006
INFO 2021-09-30 01:51:58 SortSam Wrote 20,000,000 records from a sorting collection. Elapsed time: 00:12:21s. Time for last 10,000,000: 220s. Last read position: chr7:11,711,635
[Thu Sep 30 01:53:24 CST 2021] picard.sam.SortSam done. Elapsed time: 13.80 minutes.
Runtime.totalMemory()=8648654848
[info] Marking duplicates... CLP1_293T_S2.bam
Thu Sep 30 01:53:28 CST 2021
[info] Removing duplicates... CLP1_293T_S2.bam
Thu Sep 30 02:31:46 CST 2021
[info] Using all the qualified fragments NOT filtering <120bp... CLP1_293T_S2.bam
Thu Sep 30 02:39:16 CST 2021
[info] Creating bam index files... CLP1_293T_S2.bam
Thu Sep 30 02:39:16 CST 2021
[info] Reads shifting
Thu Sep 30 02:46:52 CST 2021
[info] Your data won't be shifted as the experiment_type is specified as CUT&RUN...
[info] Peak calling using MACS2... CLP1_293T_S2.bam
[info] Logs are stored in /public/home/mosta/cut_run/HEK293_Nov23_2020/results//logs
Thu Sep 30 02:46:53 CST 2021
[info] Peak calling with BAM file with NO duplications
[info] macs2 narrow peak calling
[info] macs2 broad peak calling
[info] Getting broad peak summits
[info] SEACR stringent peak calling
Calling enriched regions without control file
Proceeding without normalization of control to experimental bedgraph
Using stringent threshold
Creating experimental AUC file: Thu Sep 30 03:51:25 CST 2021
Calculating optimal AUC threshold: Thu Sep 30 03:51:27 CST 2021
Using user-provided threshold: Thu Sep 30 03:51:27 CST 2021
Creating thresholded feature file: Thu Sep 30 03:53:25 CST 2021
Empirical false discovery rate = 0.01
Merging nearby features and eliminating control-enriched features: Thu Sep 30 03:53:25 CST 2021
Removing temporary files: Thu Sep 30 03:53:25 CST 2021
Done: Thu Sep 30 03:53:25 CST 2021
[info] Generating the normalized signal file with BigWig format...
Thu Sep 30 03:53:26 CST 2021
[info] Your bigwig file won't be normalized with spike-in reads
[info] Input file is /public/home/mosta/cut_run/HEK293_Nov23_2020/results//peakcalling/macs2.narrow/CLP1_293T_S2_peaks.narrowPeak
[info] Get randomized [1000] peaks from the top [2000] peaks...
[info] Filtering the blacklist regions for the selected peak files
[info] Getting Fasta sequences
[info] Start MEME analysis for de novo motif finding ...
[info] Up to 10 will be output ...
Unknown option: dreme-m
The sequences specified do not exist.
meme-chip [options] [-db <motif database>]*
Options:
-o <dir> : output to the specified directory
MEME Specific Options:
-meme-brief <n> : reduce size of MEME output files if more than <n> primary sequences
-meme-mod [oops|zoops|anr] : sites used in a single sequence
-meme-nmotifs <n> : maximum number of motifs to find; default: 3; if <n>=0, MEME will not be run
-meme-minsites <n> : minimum number of sites per motif
-meme-maxsites <n> : maximum number of sites per motif
-meme-p <np> : use parallel version with <np> processors
-meme-pal : look for palindromes only
-meme-searchsize <n> : the maximum portion of the primary sequences (in characters) used for motif search; MEME's running time increases as roughly the square of <n>
-meme-nrand : MEME should not randomize sequence order
STREME Specific Options:
-streme-pvt <pv> : stop if hold-out set p-value greater than <pv>
-streme-nmotifs <n> : maximum number of motifs to find; overrides -streme-pvt; if <n>=0, STREME will not be run
CentriMo Specific Options:
-centrimo-local : compute enrichment of all regions (not only central)
-centrimo-score <S> : set the minimum allowed match score
-centrimo-maxreg <S> : set the maximum region size to be considered
-centrimo-ethresh <E> : set the E-value threshold for reporting
-centrimo-noseq : don't store sequence IDs in the output
-centrimo-flip : reflect matches on reverse strand around center
SpaMo Specific Options:
-spamo-skip : don't run SpaMo
FIMO Specific Options:
-fimo-skip : don't run FIMO
[info] De Novo motifs can be found: random1000/MEME_CLP1_293T_S2_shuf ...
[info] Loading the De Novo motifs ...
Traceback (most recent call last):
File "/public/home/mosta/CUT-RUNTools-2.0/install/read.meme.py", line 92, in
ss = read_summary(this_dir + "/summary.tsv")
File "/public/home/mosta/CUT-RUNTools-2.0/install/read.meme.py", line 7, in read_summary
f = open(n)
FileNotFoundError: [Errno 2] No such file or directory: 'random1000/MEME_CLP1_293T_S2_shuf/summary.tsv'
[info] The signficance cutoff of Fimo scaning is 0.0005...
[info] Motif files can be found: random1000/MEME_CLP1_293T_S2_shuf/motifs
[info] Filtering the blacklist regions for the selected peak files
[info] Getting Fasta sequences
[info] Scaning the De Novo motifs for each peak
ls: cannot access random1000/MEME_CLP1_293T_S2_shuf/motifs: No such file or directory
[info] Output can be found: fimo.result/CLP1_293T_S2
Dear author,
Thank you for making this tool. Not really an issue, but want to ask a question here:
How should I interpret the parameter "fastq_sequence_length" in the config file? It seems to be used only by the "kseq_test" step. The default value is 42; what does it mean?
Thanks!
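For anyone else wondering: fastq_sequence_length appears to be the sequencing read length, which the kseq_test adapter-trimming step uses, and 42 simply matches the example data's 42 bp reads. That reading is an assumption from skimming the scripts, but you can measure your own read length directly:

```shell
# Write a tiny one-record FASTQ with a hypothetical 42 bp read, then measure
# the read length, which is what fastq_sequence_length should be set to.
read_seq=$(printf '%42s' '' | tr ' ' 'A')   # 42 bases
qual=$(printf '%42s' '' | tr ' ' 'I')       # 42 quality characters
printf '@read1\n%s\n+\n%s\n' "$read_seq" "$qual" > example.fastq
awk 'NR==2 {print length($0); exit}' example.fastq   # prints 42
```

Run the awk line on your own (decompressed) fastq to find the value for your data.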
Hi,
I'm having issues with the MEME analysis for de novo motif finding.
Could you help me with what the issue is?
Thanks for the amazing tool!
slurm-97220.out.txt
Originally posted by @hchintalapudi in #14 (comment)
any(! pkg %in% rownames(installed.packages())))
should be:
all(pkg %in% rownames(installed.packages()))
On lines 52 and 58 of bulk-pipeline.sh, the index file for bowtie2 alignment is specified as -x $bt2idx/genome. This causes alignment to fail, because there is no index file with the prefix genome. I'm using the mm10 reference genome, and the prefix should be -x $bt2idx/mm10. Should genome be a variable that comes from the config file instead?
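Until that's configurable, you can either symlink your index files to the genome prefix or sanity-check the prefix before launching. A sketch with illustrative paths (bt2_index_demo, mm10):

```shell
# bt2idx and prefix are illustrative; match them to your bulk-config.json.
bt2idx=./bt2_index_demo
prefix=mm10
mkdir -p "$bt2idx"
touch "$bt2idx/$prefix.1.bt2"      # stand-in for a real bowtie2 index file
if ls "$bt2idx/$prefix".*.bt2* >/dev/null 2>&1; then
    echo "index found for prefix '$prefix'"
else
    echo "no bowtie2 index with prefix '$prefix' in $bt2idx" >&2
fi
```

A no-edit workaround is symlinking each of the six index files, e.g. `ln -s mm10.1.bt2 genome.1.bt2`.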
Hi, I've been trying to normalize with spike in, setting "spike_in": "TRUE". This doesn't seem to work however, and there is no spike_in output directory created.
Should I be setting "spike_in" as TRUE? or something else?
Hi,
I have a question about spikein_scale. I have CUT&Tag data with no spike-in genome, and I set both spike_in_align and spike_in_norm to false. I see that the scale factor equation is spikein_scale/spikein_reads. How is the data normalized if I don't have any spike-in reads?
Thanks,
Shikan
Hi, I'm a biologist with no data science background. I was trying to analyze my own CUT&RUN data, but have difficulty installing the CUT&RUNTools 2.0 pipeline. I followed the installation guide and installed Python, R, Java, Perl, GCC, and Anaconda, and tried to run the installation codes in Powershell Prompt from Anaconda (Windows 10). But whenever I run the codes which start with "source" (e.g. source atactk.install.sh), I get an error message like this:
(cutruntools2.1) PS C:\Users\jingnie> source atactk.install.sh
source : The term 'source' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try
again.
At line:1 char:1
+ CategoryInfo : ObjectNotFound: (source:String) [], CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException
My university cluster system did not have the jq command installed. It was easy to install myself, but you should note it as a dependency in the install instructions.
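Seconding this. A small preflight sketch you can run before launching the pipeline; the command list is illustrative, so extend it with whatever your config actually calls:

```shell
# Check each external command the pipeline shells out to; jq in particular is
# not pulled in by the conda environment on some systems.
: > deps.txt
for cmd in jq bedtools bowtie2 samtools macs2 meme-chip; do
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "ok      $cmd" >> deps.txt
    else
        echo "MISSING $cmd" >> deps.txt
    fi
done
cat deps.txt
```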
Hello Fulong, I am running the pipeline using the example data.
I got a strange error from the computing cluster when I ran the command "run_bulkModule.sh ../CUT-RUNTools-2.0/config/bulk-config.json GATA1_D7_30min_chr11":
......
Starting dreme: dreme -verbosity 1 -oc random1000/MEME_GATA1_D7_30min_chr11_shuf/dreme_out -png -dna -p random1000/MEME_GATA1_D7_30mi
n_chr11_shuf/seqs-centered -n random1000/MEME_GATA1_D7_30min_chr11_shuf/seqs-shuffled -m 10
File "/public/home/liunangroup/liangyan/miniconda3/envs/meme/bin/dreme", line 765
print "Finding secondary RE in left flank...
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("Finding secondary RE in left flank...")?
dreme exited with error code 1
Starting centrimo: centrimo -seqlen 201 -verbosity 1 -oc random1000/MEME_GATA1_D7_30min_chr11_shuf/centrimo_out -bfile random1000/MEME_GATA1_D7_30min_chr11_shuf/background random1000/MEME_GATA1_D7_30min_chr11_shuf/GATA1_D7_30min_chr11_summits_padded.fa random1000/MEME_GATA1_D7_30min_chr11_shuf/meme_out/meme.xml
Bad file name.
Bad file name.
Bad file name.
Bad file name.
Bad file name.
Bad file name.
Bad file name.
Bad file name.
Bad file name.
Bad file name.
Bad file name.
Bad file name.
Bad file name.
Bad file name.
Bad file name.
FATAL: Template does not contain data section.
centrimo exited with error code 1
Starting tomtom: tomtom -verbosity 1 -text -thresh 0.1 random1000/MEME_GATA1_D7_30min_chr11_shuf/combined.meme random1000/MEME_GATA1_D7_30min_chr11_shuf/combined.meme 1> random1000/MEME_GATA1_D7_30min_chr11_shuf/motif_alignment.txt
tomtom ran successfully in 0.103372 seconds
......
Additionally, I get this strange output:
......
Fri Dec 24 13:43:02 CST 2021
[info] Your bigwig file won't be normalized with spike-in reads
[info] Input file is /public/home/liunangroup/liangyan/pipeline/CUT-RUNTest/GATA1_D7_30min_chr11/peakcalling/macs2.narrow/GATA1_D7_30min_chr11_peaks.narrowPeak
[info] Get randomized [1000] peaks from the top [2000] peaks...
[info] Filtering the blacklist regions for the selected peak files
[info] Getting Fasta sequences
[info] Start MEME analysis for de novo motif finding ...
[info] Up to 10 will be output ...
Log::Log4perl configuration looks suspicious: No loggers defined at /public/home/liunangroup/liangyan/miniconda3/envs/meme/lib/site_perl/5.26.2/Log/Log4perl/Config.pm line 325.
[info] De Novo motifs can be found: random1000/MEME_GATA1_D7_30min_chr11_shuf ...
[info] Loading the De Novo motifs ...
Traceback (most recent call last):
File "/public/home/liunangroup/liangyan/pipeline/CUT-RUNTools-2.0/install/read.meme.py", line 94, in
dreme_matrices = read_dreme(this_dir + "/dreme_out/dreme.txt")
File "/public/home/liunangroup/liangyan/pipeline/CUT-RUNTools-2.0/install/read.meme.py", line 47, in read_dreme
f = open(n)
FileNotFoundError: [Errno 2] No such file or directory: 'random1000/MEME_GATA1_D7_30min_chr11_shuf/dreme_out/dreme.txt'
[info] The signficance cutoff of Fimo scaning is 0.0005...
[info] Motif files can be found: random1000/MEME_GATA1_D7_30min_chr11_shuf/motifs
[info] Filtering the blacklist regions for the selected peak files
[info] Getting Fasta sequences
[info] Scaning the De Novo motifs for each peak
ls: cannot access random1000/MEME_GATA1_D7_30min_chr11_shuf/motifs: No such file or directory
[info] Output can be found: fimo.result/GATA1_D7_30min_chr11
......
I think the problem is in the part I marked above, but how should I deal with it?
Thanks, Yan
Hello @fl-yu,
Do you have any suggestions on how to run this pipeline on an HPC using SLURM?
I keep on getting OMPI errors. For example: OPAL ERROR: Not initialized in file pmix2x_client.c at line 112
Any help is greatly appreciated.
Thank you,
Jesus
Hi,
First of all, thank you for sharing the pipeline! I'm new to this analysis and this pipeline helps me a lot.
I've been running this pipeline section by section. While running the motif discovery section in bulk-pipeline.sh, I found that 'num_peaks' and 'total_peaks' don't seem to be assigned in the script. I was wondering if they're missing or are they somewhere in the pipeline that I missed?
Thank you!
Hi, Fulong
As expected, after read shifting of the CUT&Tag mapping results, the fragment length is reduced by 9 bp, so there will be fragments shorter than 120 bp (frag_120=TRUE). I read the code of bulk-pipeline.sh and am curious why CUT&RUNTools2 filters fragment length before read shifting instead of after. Actually, the fragment after shifting is the real fragment, right?
Yan
The "validate.py" script is not executable so this step from the documentation fails:
./validate.py bulk-config.json --ignore-input-output --software
There are a few other scripts that are also set to "0644" within the install directory:
./install/git/atactk/atactk/data.py
./install/git/atactk/atactk/metrics.py
./install/git/atactk/atactk/util.py
./install/git/atactk/atactk/command.py
./install/git/atactk/build/lib/atactk/util.py
./install/git/atactk/build/lib/atactk/command.py
./install/git/atactk/build/lib/atactk/data.py
./install/git/atactk/build/lib/atactk/metrics.py
./install/validate.py
./install/fix_sequence.py
./install/read.meme.py
In contrast, these scripts are properly set to "0755":
./install/git/atactk/docs/conf.py
./install/git/atactk/setup.py
./install/git/atactk/tests/test_atactk.py
./install/get_summits_broadPeak.py
./install/get_summits_seacr.py
./install/check_coordinate.py
./install/change.bdg.py
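Until that's fixed upstream, a workaround is to make the scripts executable yourself (or invoke them as `python validate.py ...`). A sketch using a throwaway file named validate_demo.py (hypothetical, so it doesn't touch your checkout):

```shell
# Reproduce the shipped 0644 mode on a scratch script, then apply the fix.
# The equivalent fix from the repository root would be:
#   find install -name '*.py' -exec chmod 0755 {} +
printf '#!/usr/bin/env python\nprint("ok")\n' > validate_demo.py
chmod 0644 validate_demo.py   # mode as shipped: not executable
chmod 0755 validate_demo.py   # make it executable
[ -x validate_demo.py ] && echo "validate_demo.py is now executable"
```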
Thank you for sharing this great tool @fl-yu!
In the assemblies.install.sh header it states that we need a sample.bed in the current directory. Can you please elaborate a little more? Is this sample.bed file provided by the pipeline, or do we provide a blank file named sample.bed?
I will be using hg38.
Here is the line of code from assemblies.install.sh that needs this file:
$bedtoolsbin/bedtools getfasta -fi ${organism_build}.fa -bed ../../sample.bed
Thank you,
Jesus
Hey, I am using CUT-RUNTools-2.0 to do some tests and I encountered some problems at the beginning. I think I correctly installed all the required software and correctly set them in the bulk-config.json file. And I got an empty line following your guide, which means the validate.py script ran without errors. However, when I try to do the bulk analysis, there is an error. Here are my command and the error:
command:
./run_bulkModule.sh bulk-config.json SRR891269
error:
./run_bulkModule.sh: line 89: /home/ryan/controlsoftware/CUT-RUNTools-2.0-master/install2/src/bulk/bulk-pipeline.sh: No such file or directory
I checked the installation directory, and there was indeed no such file or directory.
Do you have any idea about this issue? I would really appreciate it if you could give me some advice. Thanks very much!
Ryan,
Dear professor,
I tried to debug run_bulkModule using the example data.
My config.json includes:
"bt2idx": "/histor/sun/wangtao/0_user/6_maofb/cutrun/0.workplace/1.ref/Homo_sapiens/index",
"genome_sequence": "/histor/sun/wangtao/0_user/6_maofb/cutrun/0.workplace/1.ref/Homo_sapiens/Homo_sapiens.GRCh38.dna.toplevel.fa.gz",
"spike_in_bt2idx": "/histor/sun/wangtao/0_user/6_maofb/cutrun/CUT-RUNTools-2.0/ensemble/index",
"spike_in_sequence": "/histor/sun/wangtao/0_user/6_maofb/cutrun/CUT-RUNTools-2.0/ensemble/Escherichia_coli_str_k_12_substr_mg1655_gca_000005845.ASM584v2.dna.chromosome.Chromosome.fa",
and my result is:
[info] Input file is GATA1_D7_30min_chr11_R1_001.fastq.gz and GATA1_D7_30min_chr11_R2_001.fastq.gz
Wed May 4 22:28:40 CST 2022
[info] Trimming file GATA1_D7_30min_chr11 ...
Wed May 4 22:28:40 CST 2022
[info] Use Truseq adaptor as default
[info] Second stage trimming GATA1_D7_30min_chr11 ...
Wed May 4 22:28:40 CST 2022
[info] Aligning file GATA1_D7_30min_chr11 to reference genome...
Wed May 4 22:28:40 CST 2022
[info] Bowtie2 command: --dovetail --phred33
[info] The dovetail mode is enabled [as parameter frag_120 is on]
[main_samview] fail to read the header from "-".
[info] FASTQ files won't be aligned to the spike-in genome
[info] Filtering unmapped fragments... GATA1_D7_30min_chr11.bam
Wed May 4 22:28:41 CST 2022
[main_samview] fail to read the header from "/histor/sun/wangtao/0_user/6_maofb/cutrun/CUT-RUNTools-2.0/exampleData/bulk-example-test/GATA1_D7_30min_chr11/aligned/GATA1_D7_30min_chr11.bam".
[info] Sorting BAM... GATA1_D7_30min_chr11.bam
Wed May 4 22:28:41 CST 2022
I really do not know what causes this:
[main_samview] fail to read the header from "-".
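One thing worth checking: genome_sequence points at a gzip-compressed fasta (Homo_sapiens.GRCh38.dna.toplevel.fa.gz). This is only a guess, but several steps expect a plain-text fasta (e.g. samtools faidx, bedtools getfasta), and feeding them a .gz can leave the upstream SAM stream empty, which would produce exactly this "fail to read the header" message. A minimal sketch with illustrative file names:

```shell
# Simulate the gzipped Ensembl fasta, then produce the plain .fa that
# downstream tools expect; file names are illustrative.
printf '>chr_demo\nACGTACGT\n' > genome_demo.fa
gzip -f genome_demo.fa           # now only genome_demo.fa.gz exists
gunzip -k genome_demo.fa.gz      # keep the .gz, recreate plain genome_demo.fa
grep '^>' genome_demo.fa
```

Then point genome_sequence at the decompressed .fa and rerun.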
Hey,
I am trying to run CUT-RUNTools-2.0 on test CUT&RUN data. I have downloaded all the required software and set up the bulk-config.json file. I can actually run the test, but I always get an error at the alignment part: it reports that the bulk data analysis is complete, yet the output file is empty. I put the errors below.
I have tried several times, modifying the config file and adjusting the environment, but nothing worked.
Look forward to your advice,
Ryan
#the errors are listed below
[info] FASTQ files won't be aligned to the spike-in genome
IOError: [Errno 2] No such file or directory: '/home/ryan/controlsoftware/cut_run_tag/fastq_directory_for_tool/output_cutruntools2.0/test1/peakcalling/seacr/test1_treat_pileup.bdg'
Calling enriched regions without control file
Proceeding without normalization of control to experimental bedgraph
IOError: [Errno 2] No such file or directory: '/home/ryan/controlsoftware/cut_run_tag/fastq_directory_for_tool/output_cutruntools2.0/test1/peakcalling/seacr/test1_treat.stringent.bed'
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("Finding secondary RE in left flank...")?
dreme exited with error code 1
IOError: [Errno 2] No such file or directory: 'random1000/MEME_test1_shuf/dreme_out/dreme.txt'
original code:
if [ "$spikein_reads" == "0" ] || [ "$spike_in_norm" == "FALSE" ]
then
>&2 echo "[info] Your bigwig file won't be normalized with spike-in reads as you did not specify this parameter or the spike-in reads were 0.."
else
>&2 echo "[info] Your bigwig file will be normalized with spike-in reads"
scale=$spikein_scale
scale_factor=`printf "%.0f" $(echo "$scale / $spikein_reads"|bc)`
>&2 echo scale_factor=$scale_factor
bamCoverage --bam $bam_file -o $outdir/"$base_file".spikein_normalized.bw \
--binSize 10
--normalizeUsing cpm
--effectiveGenomeSize $eGenomeSize
--scaleFactor $scale_factor
cp $outdir/"$base_file".spikein_normalized.bw $outdirbroad
cp $outdir/"$base_file".spikein_normalized.bw $outdirseac
fi
issue: the line-continuation backslashes were missing for the following lines:
bamCoverage --bam $bam_file -o $outdir/"$base_file".spikein_normalized.bw \
--binSize 10
--normalizeUsing cpm
--effectiveGenomeSize $eGenomeSize
--scaleFactor $scale_factor
effect: only the --binSize parameter is passed to bamCoverage; the subsequent option lines are treated as separate shell commands and silently dropped.
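With the continuation backslashes restored, the whole call becomes one command. A sketch of the corrected block, reusing the variable names from the original script (note that recent deepTools versions expect the uppercase spelling CPM for --normalizeUsing):

```shell
bamCoverage --bam $bam_file -o $outdir/"$base_file".spikein_normalized.bw \
    --binSize 10 \
    --normalizeUsing CPM \
    --effectiveGenomeSize $eGenomeSize \
    --scaleFactor $scale_factor
```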