peeperlab / copywriter Goto Github PK
View Code? Open in Web Editor NEWDNA copy number detection from off-target sequence data
License: GNU General Public License v3.0
DNA copy number detection from off-target sequence data
License: GNU General Public License v3.0
Hi-
I recently tried to use CopywriteR in the cloud with 128 RAM and 15 cores (Windows Server Datacenter Virtual Machine) with R 3.3.2. Also my input data files: normal 12.67GB, tumor 11GB
I received the following error:
Error: 'bplapply' receive data failed: error reading from connection
Here is my code:
library("CopywriteR")
library("CopyhelpeR")
setwd("C:/Users/m/Desktop/share/data")
data.folder <- tools::file_path_as_absolute(file.path(getwd()))
preCopywriteR(output.folder=file.path(data.folder), bin.size=20000, ref.genome="hg38", prefix="chr")
list.dirs(path=file.path(data.folder), full.names=FALSE)
list.files(path=file.path(data.folder, "hg38_20kb_chr"), full.names=FALSE)
load(file=file.path(data.folder, "hg38_20kb_chr", "blacklist.rda"))
blacklist.grange
load(file=file.path(data.folder, "hg38_20kb_chr", "GC_mappability.rda"))
GC.mappa.grange[1001:1011]
bp.param <- SnowParam(workers = 15, type ="SOCK")
bp.param
path <- c("C:/Users/m/Desktop/share/data")
samples <- list.files(path=path, pattern="tumor.bam$", full.names=TRUE)
controls <- list.files(path=path, pattern="normal.bam$", full.names=TRUE)
sample.control <- data.frame(samples,controls)
CopywriteR(sample.control = sample.control, destination.folder = file.path(data.folder), reference.folder = file.path(data.folder, "hg38_20kb_chr"), bp.param = bp.param)
Hi,
I have a question that I could not find an answer to reading the publication or the R package docs:
Does CopywriteR consider reads marked as duplicate? I suppose it does, but I want to make sure.
Thanks,
Clemens
Hi-
I run CopyWrite R, but received the following error:
invalid class “MulticoreParam” object: 1: ‘cluster’, ‘.clusterargs’, ‘RNGseed’ must be length 1
invalid class “MulticoreParam” object: 2: ‘timeout’, ‘log’ must be length 1
starting worker for localhost:11695
'BiocParallel' did not register default BiocParallelParams:
invalid class “MulticoreParam” object: 1: ‘cluster’, ‘.clusterargs’, ‘RNGseed’ must be length 1
invalid class “MulticoreParam” object: 2: ‘timeout’, ‘log’ must be length 1
Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
space '1' not in BAM header
Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
space '1' not in BAM header
Error in .Method(..., deparse.level = deparse.level) :
number of rows of matrices must match (see arg 2)
Calls: CopywriteR ... f -> standardGeneric -> eval -> eval -> eval -> .Method
In addition: Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
space '1' not in BAM header
Execution halted
Here is my code:
library("CopywriteR")
library("CopyhelpeR")
setwd("C:/Users/site_li/Documents/WES/data")
data.folder <- tools::file_path_as_absolute(file.path(getwd()))
preCopywriteR(output.folder=file.path(data.folder), bin.size=20000, ref.genome="hg19", prefix="")
list.dirs(path = file.path(data.folder), full.names = FALSE)
list.files(path = file.path(data.folder, "hg19_20kb"), full.names = FALSE)
load(file = file.path(data.folder, "hg19_20kb", "blacklist.rda"))
blacklist.grange
load(file=file.path(data.folder, "hg19_20kb", "GC_mappability.rda"))
GC.mappa.grange[1001:1011]
bp.param <- SnowParam(workers = 12, type ="SOCK")
bp.param
path <- c("C:/Users/site_li/Documents/WES/data")
samples <- list.files(path=path, pattern="tumor.bam$", full.names=TRUE)
controls <- list.files(path=path, pattern="control.bam$", full.names=TRUE)
sample.control <- data.frame(samples,controls)
CopywriteR(sample.control = sample.control, destination.folder = file.path(data.folder), reference.folder = file.path(data.folder, "hg19_20kb"), bp.param = bp.param)
Hi,
I ran CopyWrite R, but received the following error:
Plotting to file D:/Usualfile/RStudio/test/destination_1000/CNAprofiles/qc/read.counts.compensated.c1.bam.png
Error in value[3L] :
The GC-content and mappability normalization did not work due to a failure to calculate loesses. This can generally be solved by using larger
bin sizes. Stopping execution of the remaining part of the script...
I have tried to use larger bin sizes, but it didn't work. Could you help me found out the reason? Thanks!
here is my code:
library("CopywriteR")
preCopywriteR("D:/Usualfile/RStudio/test", 20000, "hg19", prefix = "")
bp.param <- SnowParam(workers = 2, type = "SOCK")
path <- "D:/Usualfile/RStudio/test/bam_1000"
samples <- list.files(path = path, pattern = "bam$", full.names = TRUE)
controls <- samples[c(1:2, 1:2)]
sample.control <- data.frame(samples, controls)
CopywriteR(sample.control = sample.control, destination.folder = "D:/Usualfile/RStudio/test/destination_1000", reference.folder = "D:/Usualfile/RStudio/test/hg19_5kb", bp.param = bp.param)
Hi, in my samples log2 results there is the value of -1073741823.5 repeated thousands of times in every file, not always with the same coordinates. What does this mean?
Thanks,
Natalie
Hi, thanks for the package!
I get the
Error in
[<-.data.frame(
tmp, , "off.target", value = c(96715069L, : replacement has 4 rows, data has 5
The preferable way to solve this issue is by installing the newest CopywriteR version using
Bioconductor.
issue that's referenced in the README. I have four tumour samples all corresponding to the same matched control so I'm not sure I could avoid having more comparisons than samples!
First of all, thanks for the great tool!
I'd like to reuse the BAM files that CopywriteR produces (*_properreads.bam
) in a second run (e.g. with another bin size resolution) in order to save time making them again which seems to be a big chunk of the total time.
Maybe the CopywriteR function can receive a pre/suffix
argument to keep output of each run unique and if the processed bam files already exist in the CNAprofiles/BamBaiPeaksFiles
output directory, just start working with those.
I have read with great interest your paper on copywriteR and I am sorry if this is a stupid question... but ... I was left wondering what happens to the targeted regions themselves... sure you use the off targets to detect the copy number variation accross the genome... but what about the targeted regions ... are they left out?
Script start normally;
The bin size for this analysis is 15000
The capture region file is not specified
This analysis will be run on 6 cpus
But then gives an ssh error:
ssh: connect to host 6 port 22: Invalid argument
The host that is reported is the same as the number of cpus specified.
i.e if I run it with 12 CPUs it tries to connect to host 12
I can't find any code relating to ssh in the CopywriteR package.
I'm running under R 3.1.3 and followed your installation guide.
Hello,
I am a big fan of CopywriteR and was using it very usefully.
I am now experiencing the following error message. One weird thing is that I could successfully process the same data before but since some days ago the error continues to occur with every data. I re-installed CopywriteR and tried both linux and windows R versions but still got the same message. Could there be some other problems in related packages? I would be thankful if you could find a way to fix the problem.
1: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
pseudoinverse used at 0.385
2: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
neighborhood radius 0.001
3: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
reciprocal condition number 0
4: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
There are other near singularities as well. 1e-006
5: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
pseudoinverse used at 0.909
6: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
neighborhood radius 0.001
7: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
reciprocal condition number 0
8: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
There are other near singularities as well. 1e-006
9: In plot.xy(xy.coords(x, y), type = type, ...) :
"subset" is not a graphical parameter
when I run these lines:
path <- "/home/huanyu0420/copywriteR/result_CopywriteR/sim1_6_6100_read.sort.bam"
samples <- list.files(path = path, pattern = "bam$", full.names = TRUE)
controls <- samples[c(1:6, 1:6)]
sample.control <- data.frame(samples, controls)
Error in data.frame(samples, controls) : 参数值意味着不同的行数: 0, 12
How can I solve this problem?
Hi. I'm trying to run CopywriteR with some BAM files from targeted sequencing. We target 75 genes, get around 3-6M reads, and have around 40-70% off-target. When I try to run the software, I get the following message:
"The GC-content and mappability normalization did not work due to a failure to calculate loesses. This can generally be solved by using larger bin sizes. Stopping execution of the remaining part of the script."
I have attached some of the relevant output. I would greatly appreciate any insight you have as to why it isn't working.
Hi,
I've just started to use copywritR and I encountered a really wierd note:
INFO [2016-10-25 11:42:00] Paired-end sequencing for sample NG-9739_1_lib127215_4623.sorted: TRUE
INFO [2016-10-25 11:42:00] Paired-end sequencing for sample NG-9739_3_lib127217_4623.sorted: FALSE
INFO [2016-10-25 11:42:00] Paired-end sequencing for sample NG-9739_4_lib127218_4623.sorted: TRUE
So... it says my second sample is not paired end, but of course it is. I've taken a look at the flags for all three samples, and actually have no idea why it identifies the second sample as not-paired. Here are the flags of the second sample:
44474716 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
44474128 + 0 mapped (100.00% : N/A)
42102178 + 0 paired in sequencing
21051089 + 0 read1
21051089 + 0 read2
41254896 + 0 properly paired (97.99% : N/A)
42101634 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
Maybe you have an idea why it misidentifies? (I think for the actual run this should not be a problem, or am I wrong?)
Hi,
we found it several times that when we have both SNP6.0 data and WES data for the same sample, there is inconsistent CN results in high coverage (peak) regions. CopywriteR was not able to reflect high CN values in these regions whereas the mapped bam files do have high peaks. It seems that the off-target read-based approach might have limitation in this case.
Looking forward to your reply.
Thanks!
Hi,
I am running the CopywriteR to analysis some tissue-normal whole-exome sequencing samples. Everything seems good except for the MAD value (attached). Do you guys happen to know some good methods to reduce the MAD value?
all_chrom.pdf
Thank you very much.
Best,
Xin
Since many softwares (like GATK 4.1.7.0 I'm using now) generate bam index with file name like a.bam
and a.bai
instead of a.bam.bai
, maybe we can change the index matching code from this:
## Index .bam files
- if (!all(file.exists(gsub("$", ".bai", sample.paths)))) {
+ if (!all(file.exists(gsub("$", ".bai", sample.paths)) | file.exists(gsub("\\.bam$", ".bai", sample.paths)))) {
if (file.access(".", 2) == -1) {
stop(.wrap("The .bam files are not indexed and you do not have",
"write permission in (one of) the folder(s) where the",
".bam files are located."))
}
IndexBam <- function(sample.paths) {
indexBam(sample.paths)
paste0("indexBam(\"", sample.paths, "\")")
}
to.log <- bplapply(sample.paths, IndexBam, BPPARAM = bp.param)
lapply(to.log, flog.info)
}
Hi
We are trying to install the latest release in R 3.1 as indicated in the README file - but we get the error that 3.2 is needed. Is the doc outdated or should it still work?
Thanks for any help!
Michael
Hi, when I try running each sample as its own control I do not get an input.Rdata file and so cannot run plotCNA. Otherwise it appears to have worked correctly, I get the following files as output:
bam file
index for bam file
peaks1.bed
CopywriteR.log
log2_read_counts.igv
read_counts.txt
Any idea why?
Thanks,
Natalie
It would be great if the user could have an option to supply the full output directory for both CopywriteR and for plotCNA. For example, if I have whole exome data and targeted data in the same project, it would be great if I could supply the two different output directories as wes_CNAprofiles and targeted_CNAprofiles. Right now I would have to make different directories for each, and then the CNAprofiles would get saved within those directories to avoid having each CNAprofiles directory being overwritten, creating an unnecessary extra directory layer.
I would also like to supply the full destination directory for plotting--in my current workflow I would like to save the CNAprofiles directory within a "data" directory and then resulting plots in a "visualizations" directory, but right now the plots go directly within the CNAprofiles directory.
I met with the following errors, I tried to fix it by many ways but I still could not make it:
Error: BiocParallel errors
element index: 1
first error: seqlevels(param) not in BAM header:
seqlevels: ‘1’, ‘2’, ‘3’, ‘4’, ‘5’, ‘6’, ‘7’, ‘8’, ‘9’, ‘10’, ‘11’, ‘12’, ‘13’, ‘14’, ‘15’, ‘16’, ‘17’, ‘18’, ‘19’, ‘20’, ‘21’, ‘22’, ‘X’, ‘Y’
file: MCF7_RJ_20ug_ER_5ulplus5ul_IDX6_control_rep1.sorted.bam_properreads.bam
index: MCF7_RJ_20ug_ER_5ulplus5ul_IDX6_control_rep1.sorted.bam_properreads.bam
In addition: Warning message:
stop worker failed:
'clear_cluster' receive data failed:
reached elapsed time limit
Can you tell me what the problem is?
Thank you!
can you help me figure out what is happening? and how can i solve this?
Hi,
Is this software still being maintained? Bioconductor says the package is deprecated and will probably be removed from bioC: https://bioconductor.org/packages/release/bioc/html/CopywriteR.html
Hi,
I'm trying to use CopywriteR, but I'm getting the following error:
> CopywriteR(sample.control = sample_control,
+ destination.folder = file.path(output_folder),
+ reference.folder = pre_folder,
.... [TRUNCATED]
The following samples will be analyzed:
sample: chr21_PT-SSV1-HOR-2.sam.sorted.dedup.realigned.recal.bam ; matching control: chr21_PT-NIST-NA12878.sam.sorted.dedup.realigned.recal.bam
sample: chr21_PT-SSV1-HORIZON.sam.sorted.dedup.realigned.recal.bam ; matching control: chr21_PT-NIST-NA12878.sam.sorted.dedup.realigned.recal.bam
sample: chr21_PT-SSV1-HOR-2.sam.sorted.dedup.realigned.recal.bam ; matching control: chr21_PT-SSV1-HOR-2.sam.sorted.dedup.realigned.recal.bam
sample: chr21_PT-SSV1-HORIZON.sam.sorted.dedup.realigned.recal.bam ; matching control: chr21_PT-SSV1-HORIZON.sam.sorted.dedup.realigned.recal.bam
The bin size for this analysis is 20000
The capture region file is not specified
This analysis will be run on 20 cpus
Error in `[<-.data.frame`(`*tmp*`, , "off.target", value = c(68930L, 124716L, :
replacement has 4 rows, data has 3
>
> traceback()
9: stop(sprintf(ngettext(N, "replacement has %d row, data has %d",
"replacement has %d rows, data has %d"), N, n), domain = NA)
8: `[<-.data.frame`(`*tmp*`, , "off.target", value = c(68930L, 124716L,
81868L, 122406L))
7: `[<-`(`*tmp*`, , "off.target", value = c(68930L, 124716L, 81868L,
122406L))
6: CopywriteR(sample.control = sample_control, destination.folder = file.path(output_folder),
reference.folder = pre_folder, bp.param = bp_param) at run.R.tmp.R#68
5: eval(expr, envir, enclos)
4: eval(ei, envir)
3: withVisible(eval(ei, envir))
2: source(f, ...)
1: nvimcom:::source.and.clean("/data3/home/mburger/Work/CopyWriter/test2/run.R.tmp.R",
print.eval = TRUE, echo = TRUE)
I'm using this script.
The script runs without problems for the example provided in vignette but It does not work for my data.
I don't know how to fix this. Does anyone have any clue about the source of the error?
Hello, I have seen a couple of threads where copywriteR throws this error
Error in value[[3L]](cond) :
The GC-content and mappability normalization did not work due to a
failure to calculate loesses. This can generally be solved by using
larger bin sizes. Stopping execution of the remaining part of the
script...
I have tried running the same job in two clusters. One is successful and gives the output; another one gives the error. I am using the same version of R and the packages, same input files, same intermediary files as far as I'm concerned (read_counts.txt). I'd be grateful for help :)
We have found a few WES tumor/normal pairs where CopywriteR crashes. I don't quite understand why that happens, i.e. which input triggers this, but I tracked down the combinations of data that lead to this.
This is the error:
This analysis will be run on 8 cpus
Error in while (cov.chr[index] > lower.cutoff.peaks[i]) { :
argument is of length zero
Calls: do.all ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
This is the offending chunk of code (https://github.com/PeeperLab/CopywriteR/blob/master/R/CopywriteR.R#L447):
if (nrow(test) > 0) {
for (i in seq_len(nrow(test))) {
index <- test[i, "start"]
while (cov.chr[index] > lower.cutoff.peaks[i]) {
index = index - 1
}
test[i, "start"] <- index - read.length
index <- test[i, "end"]
while (cov.chr[index] > lower.cutoff.peaks[i]) {
index = index + 1
}
test[i, "end"] <- index + read.length
}
}
In the sample I am looking at, we start at test[i, "start"] = 154
. The coverage at lower.cutoff.peaks[1]
is 2. Now we move from position 154 down, until we find a position in cov.chr
equal or smaller 2. If that does not happen, trying to get cov.chr[0]
leads to the crash. I guess the function assumes that at least at cov.chr[1]
the while loop should exit, but I don't know why this does not happen. In any case, this should be caught and dealt with.
I modified the code to print print(paste(cov.chr[index], lower.cutoff.peaks[i]))
, this is the output:
[1] "8 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "3 2"
[1] "3 2"
[1] "3 2"
[1] "3 2"
[1] "3 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "3 2"
[1] "3 2"
[1] "3 2"
Best,
Clemens
I processed all bam files with CopywriteR, why this error?
Thanks!
Error in plotCNA(destination.folder = "./copynumber") :
The following samples refer to .bam files that have not been processed
by CopywriteR: ‘log2.SKCM-M028-P008_C.sorted.bam’,
‘log2.SKCM-M035-P010_C.sorted.bam’,
‘log2.SKCM-M035-P011_C.sorted.bam’,
‘log2.SKCM-M123-P009_C.sorted.bam’,
‘log2.SKCM-M137-P008_C.sorted.bam’,
‘log2.SKCM-M233-P010_C.sorted.bam’,
‘log2.SKCM-M263-P011_C.sorted.bam’,
‘log2.SKCM-M275-P008_C.sorted.bam’,
‘log2.SKCM-M305-P009_C.sorted.bam’,
‘log2.SKCM-M306-P008_C.sorted.bam’,
‘log2.SKCM-M357-P010_C.sorted.bam’,
‘log2.SKCM-M399-P010_C.sorted.bam’,
‘log2.SKCM-M409-P011_C.sorted.bam’,
‘log2.SKCM-M424-P009_C.sorted.bam’,
‘log2.SKCM-M527-P010_C.sorted.bam’,
‘log2.SKCM-M642-P008_C.sorted.bam’,
‘log2.SKCM-M721-P010_C.sorted.bam’,
‘log2.SKCM-M747-P008_C.sorted.bam’,
‘log2.SKCM-M749-P010_C.sorted.bam’,
‘log2.SKCM-M762-P008_C.sorted.bam’,
‘log2.SKCM-M806-P011_C.sorted.bam’,
‘log2.SKCM-M807-P0
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.4 (Santiago)
Matrix products: default
BLAS: /risapps/rhel6/R/3.4.1-shlib/lib64/R/lib/libRblas.so
LAPACK: /risapps/rhel6/R/3.4.1-shlib/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] CopywriteR_2.10.0 BiocParallel_1.10.1
loaded via a namespace (and not attached):
[1] DNAcopy_1.52.0 chipseq_1.28.0
[3] XVector_0.18.0 GenomicRanges_1.30.3
[5] BiocGenerics_0.24.0 zlibbioc_1.24.0
[7] GenomicAlignments_1.14.1 IRanges_2.12.0
[9] lattice_0.20-35 hwriter_1.3.2
[11] GenomeInfoDb_1.14.0 tools_3.4.1
[13] SummarizedExperiment_1.8.1 parallel_3.4.1
[15] grid_3.4.1 Biobase_2.38.0
[17] data.table_1.10.4-3 latticeExtra_0.6-28
[19] lambda.r_1.2 futile.logger_1.4.3
[21] gtools_3.5.0 matrixStats_0.53.1
[23] Matrix_1.2-12 GenomeInfoDbData_1.0.0
[25] RColorBrewer_1.1-2 futile.options_1.0.0
[27] S4Vectors_0.16.0 bitops_1.0-6
[29] RCurl_1.95-4.10 DelayedArray_0.4.1
[31] CopyhelpeR_1.10.0 compiler_3.4.1
[33] Rsamtools_1.28.0 Biostrings_2.44.2
[35] stats4_3.4.1 ShortRead_1.36.0
Warning messages:
1: replacing previous import ‘stats::sd’ by ‘BiocGenerics::sd’ when loading ‘S4Vectors’
2: replacing previous import ‘stats::var’ by ‘BiocGenerics::var’ when loading ‘S4Vectors’
3: replacing previous import ‘stats::sd’ by ‘BiocGenerics::sd’ when loading ‘IRanges’
4: replacing previous import ‘stats::var’ by ‘BiocGenerics::var’ when loading ‘IRanges’
5: multiple methods tables found for ‘as.vector’
6: multiple methods tables found for ‘unlist’
Hi Thomas,
I would like to perform my own GC/mappability corrections and blacklists on the readcount data which comes out of CopywriteR. It would be nice to have an option to set a parameter to enable this.
I now did this by hand in the following function ( correctmappa = FALSE) after downloading the tarball and compiling the package afterwards. Not the best way to go if I want update CopywriteR in the future ;)
tryCatch({
i <- c(seq_len(ncol(data$cov)))
NormalizeDOC <- function(i, data, .tng, usepoints, destination.folder) {
.tng(data.frame(count = data$cov[, i], gc = data$anno$gc,
mappa = data$anno$mappa),
use = usepoints & data$cov[, i] != 0, correctmappa = TRUE,
plot = file.path(destination.folder, "qc",
paste0(colnames(data$cov)[i], ".png")))
}
ratios <- bplapply(i, NormalizeDOC, data, .tng, usepoints,
destination.folder, BPPARAM = bp.param)
log2.read.counts <- matrix(unlist(ratios), ncol = length(sample.indices))
}, error = function(e) {
stop(.wrap("The GC-content and mappability normalization did not work",
"due to a failure to calculate loesses. This can generally",
"be solved by using larger bin sizes. Stopping execution of",
"the remaining part of the script..."))
})
colnames(log2.read.counts) <- paste0("log2.", sample.files[sample.indices])
Hi, there's a "length > 1 in coercion to logical" in the .tng()
function that is triggered when you plot, e.g. when running the vignette. To see it, you need to enable the extra checks for this, e.g. by setting _R_CHECK_LENGTH_1_LOGIC2_=true
or using --as-cran
:
$ $ R --vanilla CMD check --as-cran CopywriteR_2.24.0.tar.gz
* using log directory ‘/tmp/bioc/CopywriteR.Rcheck’
* using R version 4.1.0 (2021-05-18)
* using platform: x86_64-pc-linux-gnu (64-bit)
* using session charset: UTF-8
* using option ‘--as-cran’
* checking for file ‘CopywriteR/DESCRIPTION’ ... OK
* checking extension type ... Package
* this is package ‘CopywriteR’ version ‘2.24.0’
* checking CRAN incoming feasibility ... NOTE
Maintainer: ‘Oscar Krijgsman <[email protected]>’
Package duplicated from https://bioconductor.org/packages/3.13/bioc
Unknown, possibly mis-spelled, fields in DESCRIPTION:
‘git_url’ ‘git_branch’ ‘git_last_commit’ ‘git_last_commit_date’
Uses the superseded package: ‘snow’
No package encoding and non-ASCII characters in the following R files:
R/CopywriteR.R
614: ## <e2><80><98>Map<e2><80><99> applies a function to the corresponding elements of given vectors.
The Title field should be in title case. Current version is:
‘Copy number information from targeted sequencing using off-target reads’
In title case that is:
‘Copy Number Information from Targeted Sequencing using Off-Target Reads’
The Description field should not start with the package name,
'This package' or similar.
The Date field is over a month old.
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking serialization versions ... OK
* checking whether package ‘CopywriteR’ can be installed ... OK
* checking installed package size ... OK
* checking package directory ... OK
* checking for future file timestamps ... OK
* checking ‘build’ directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking use of S3 registration ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
CopywriteR: no visible global function definition for ‘packageVersion’
CopywriteR: no visible global function definition for ‘getClass’
CopywriteR : DetectPeaks: no visible global function definition for
‘as’
CopywriteR : DetectPeaks: no visible global function definition for
‘write.table’
CopywriteR : CalculateDepthOfCoverage: no visible global function
definition for ‘read.table’
CopywriteR : CalculateDepthOfCoverage: no visible global function
definition for ‘as’
CopywriteR: no visible global function definition for ‘write.table’
CopywriteR: no visible global function definition for ‘pdf’
CopywriteR: no visible global function definition for ‘ecdf’
CopywriteR: no visible global function definition for ‘dev.off’
CopywriteR: no visible global function definition for ‘read.table’
plotCNA: no visible global function definition for ‘read.table’
plotCNA : <anonymous> : <anonymous>: no visible global function
definition for ‘pdf’
plotCNA : <anonymous> : <anonymous>: no visible global function
definition for ‘points’
plotCNA : <anonymous> : <anonymous> : <anonymous>: no visible global
function definition for ‘segments’
plotCNA : <anonymous> : <anonymous>: no visible global function
definition for ‘par’
plotCNA : <anonymous> : <anonymous>: no visible global function
definition for ‘text’
plotCNA : <anonymous> : <anonymous>: no visible global function
definition for ‘axis’
plotCNA : <anonymous> : <anonymous>: no visible global function
definition for ‘abline’
plotCNA : <anonymous> : <anonymous>: no visible global function
definition for ‘dev.off’
preCopywriteR: no visible global function definition for ‘as’
Undefined global functions or variables:
abline as axis dev.off dpois ecdf getClass lines loess packageVersion
par pdf png points ppois predict read.table rgb segments text
write.table
Consider adding
importFrom("grDevices", "dev.off", "pdf", "png", "rgb")
importFrom("graphics", "abline", "axis", "lines", "par", "points",
"segments", "text")
importFrom("methods", "as", "getClass")
importFrom("stats", "dpois", "ecdf", "loess", "ppois", "predict")
importFrom("utils", "packageVersion", "read.table", "write.table")
to your NAMESPACE file (and ensure that your DESCRIPTION Imports field
contains 'methods').
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd line widths ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking sizes of PDF files under ‘inst/doc’ ... OK
* checking installed files from ‘inst/doc’ ... OK
* checking files in ‘vignettes’ ... OK
* checking examples ... OK
* checking for unstated dependencies in vignettes ... OK
* checking package vignettes in ‘inst/doc’ ... OK
* checking re-building of vignette outputs ...
WARNING
Error(s) in re-building vignettes:
--- re-building ‘CopywriteR.Rnw’ using Sweave
Loading required package: BiocParallel
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
pseudoinverse used at 0.979
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
neighborhood radius 0.001
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
reciprocal condition number 0
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
There are other near singularities as well. 1e-06
----------- FAILURE REPORT --------------
--- failure: length > 1 in coercion to logical ---
--- srcref ---
:
--- package (from environment) ---
CopywriteR
--- call from context ---
.tng(data.frame(count = data$cov[, i], gc = data$anno$gc, mappa = data$anno$mappa),
use = usepoints & data$cov[, i] != 0, correctmappa = TRUE,
plot = file.path(destination.folder, "qc", paste0(colnames(data$cov)[i],
".png")))
--- call from argument ---
!is.na(df$mappa) && !is.na(normv)
--- R stacktrace ---
where 1: .tng(data.frame(count = data$cov[, i], gc = data$anno$gc, mappa = data$anno$mappa),
use = usepoints & data$cov[, i] != 0, correctmappa = TRUE,
plot = file.path(destination.folder, "qc", paste0(colnames(data$cov)[i],
".png")))
where 2: FUN(...)
where 3: doTryCatch(return(expr), name, parentenv, handler)
where 4: tryCatchOne(expr, names, parentenv, handlers[[1L]])
where 5: tryCatchList(expr, classes, parentenv, handlers)
where 6: tryCatch({
FUN(...)
}, error = handle_error)
where 7: withCallingHandlers({
tryCatch({
FUN(...)
}, error = handle_error)
}, warning = handle_warning)
where 8: FUN(...)
where 9: FUN(X[[i]], ...)
where 10: lapply(X, FUN_, ...)
where 11: bplapply(X, FUN, ..., BPREDO = BPREDO, BPPARAM = param)
where 12: bplapply(X, FUN, ..., BPREDO = BPREDO, BPPARAM = param)
where 13: bplapply(i, NormalizeDOC, data, .tng, usepoints, destination.folder,
BPPARAM = bp.param)
where 14: bplapply(i, NormalizeDOC, data, .tng, usepoints, destination.folder,
BPPARAM = bp.param)
where 15: doTryCatch(return(expr), name, parentenv, handler)
where 16: tryCatchOne(expr, names, parentenv, handlers[[1L]])
where 17: tryCatchList(expr, classes, parentenv, handlers)
where 18: tryCatch({
i <- c(seq_len(ncol(data$cov)))
NormalizeDOC <- function(i, data, .tng, usepoints, destination.folder) {
.tng(data.frame(count = data$cov[, i], gc = data$anno$gc,
mappa = data$anno$mappa), use = usepoints & data$cov[,
i] != 0, correctmappa = TRUE, plot = file.path(destination.folder,
"qc", paste0(colnames(data$cov)[i], ".png")))
}
ratios <- bplapply(i, NormalizeDOC, data, .tng, usepoints,
destination.folder, BPPARAM = bp.param)
log2.read.counts <- matrix(unlist(ratios), ncol = length(sample.indices))
}, error = function(e) {
stop(.wrap("The GC-content and mappability normalization did not work",
"due to a failure to calculate loesses. This can generally",
"be solved by using larger bin sizes. Stopping execution of",
"the remaining part of the script..."))
})
where 19: CopywriteR(sample.control = sample.control, destination.folder = file.path(data.folder),
reference.folder = file.path(data.folder, "mm10_4_20kb"),
bp.param = bp.param)
where 20: eval(expr, .GlobalEnv)
where 21: eval(expr, .GlobalEnv)
where 22: withVisible(eval(expr, .GlobalEnv))
where 23: doTryCatch(return(expr), name, parentenv, handler)
where 24: tryCatchOne(expr, names, parentenv, handlers[[1L]])
where 25: tryCatchList(expr, classes, parentenv, handlers)
where 26: tryCatch(expr, error = function(e) {
call <- conditionCall(e)
if (!is.null(call)) {
if (identical(call[[1L]], quote(doTryCatch)))
call <- sys.call(-4L)
dcall <- deparse(call)[1L]
prefix <- paste("Error in", dcall, ": ")
LONG <- 75L
sm <- strsplit(conditionMessage(e), "\n")[[1L]]
w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w")
if (is.na(w))
w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L],
type = "b")
if (w > LONG)
prefix <- paste0(prefix, "\n ")
}
else prefix <- "Error : "
msg <- paste0(prefix, conditionMessage(e), "\n")
.Internal(seterrmessage(msg[1L]))
if (!silent && isTRUE(getOption("show.error.messages"))) {
cat(msg, file = outFile)
.Internal(printDeferredWarnings())
}
invisible(structure(msg, class = "try-error", condition = e))
})
where 27: try(withVisible(eval(expr, .GlobalEnv)), silent = TRUE)
where 28: evalFunc(ce, options)
where 29: tryCatchList(expr, classes, parentenv, handlers)
where 30: tryCatch(evalFunc(ce, options), finally = {
cat("\n")
sink()
})
where 31: driver$runcode(drobj, chunk, chunkopts)
where 32: utils::Sweave(...)
where 33: engine$weave(file, quiet = quiet, encoding = enc)
where 34: doTryCatch(return(expr), name, parentenv, handler)
where 35: tryCatchOne(expr, names, parentenv, handlers[[1L]])
where 36: tryCatchList(expr, classes, parentenv, handlers)
where 37: tryCatch({
engine$weave(file, quiet = quiet, encoding = enc)
setwd(startdir)
output <- find_vignette_product(name, by = "weave", engine = engine)
if (!have.makefile && vignette_is_tex(output)) {
texi2pdf(file = output, clean = FALSE, quiet = quiet)
output <- find_vignette_product(name, by = "texi2pdf",
engine = engine)
}
outputs <- c(outputs, output)
}, error = function(e) {
thisOK <<- FALSE
fails <<- c(fails, file)
message(gettextf("Error: processing vignette '%s' failed with diagnostics:\n%s",
file, conditionMessage(e)))
})
where 38: tools:::buildVignettes(dir = "/tmp/bioc/CopywriteR.Rcheck/vign_test/CopywriteR",
ser_elibs = "/scratch/henrik/RtmpHwwNKw/file312312259755.rds")
--- value of length: 7825 type: logical ---
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[15] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
...
[7813] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
--- function from context ---
function (df, use, correctmappa = TRUE, plot = NULL, verbose = TRUE)
{
if (!is.logical(use) && length(use) == nrow(df))
stop("use should be logicval vector with same size as df")
if (!is.null(plot)) {
if (!is.logical(plot)) {
if (verbose)
cat("Plotting to file", plot, "\n")
png(plot, width = 700, height = 1400)
par(mfrow = c(2, 1))
on.exit(dev.off())
plot <- TRUE
}
else if (plot) {
par(mfrow = c(2, 1))
}
}
gcuse <- (use & !is.na(df$mappa) & df$mappa > 0.8 & !is.na(df$gc) &
df$gc > 0)
rough <- loess(count ~ gc, data = df, subset = gcuse, span = 0.03)
i <- seq(0, 1, by = 0.001)
final <- loess(predict(rough, i) ~ i, span = 0.3)
normv <- predict(final, df$gc)
df$countgcloess <- df$count/(normv/median(normv, na.rm = TRUE))
if (plot) {
plot(count ~ gc, data = df, subset = gcuse, ylim = quantile(df$count[gcuse],
c(1e-04, 0.999)), xlim = c(0, 1), pch = ".")
points(count ~ gc, data = df, subset = !gcuse, col = rgb(1,
0, 0, 0.3), pch = ".")
lines(i, predict(rough, i), col = "green")
points(df$gc, normv, col = "red", pch = ".")
}
if (correctmappa) {
mappause <- (use & !is.na(df$mappa))
rough <- loess(countgcloess ~ mappa, data = df, subset = mappause,
span = 0.03)
i <- seq(0, 1, by = 0.001)
final <- loess(predict(rough, i) ~ i, span = 0.3)
normv <- predict(final, df$mappa)
df$countgcmappaloess <- df$countgcloess/(normv/median(normv,
na.rm = TRUE))
if (plot) {
plot(countgcloess ~ mappa, data = df, subset = mappause,
ylim = quantile(df$countgcloess[mappause], c(1e-04,
0.999), na.rm = TRUE), xlim = c(0, 1), pch = ".")
points(countgcloess ~ mappa, data = df, subset = !mappause,
col = rgb(1, 0, 0, 0.3), pch = ".")
lines(i, predict(rough, i), col = "green")
subset.points <- !is.na(df$mappa) && !is.na(normv)
points(df$mappa[subset.points], normv[subset.points],
col = "red", pch = ".")
}
return(log2(df$countgcmappaloess/median(df$countgcmappaloess[use],
na.rm = TRUE)))
}
else {
log2(df$countgcloess/median(df$countgcloess[use], na.rm = TRUE))
}
}
<bytecode: 0x46f060f0>
<environment: namespace:CopywriteR>
--- function search by body ---
Function .tng in namespace CopywriteR has this body.
----------- END OF FAILURE REPORT --------------
Error: processing vignette 'CopywriteR.Rnw' failed with diagnostics:
chunk 10 (label = CopywriteR)
Error in value[[3L]](cond) :
The GC-content and mappability normalization did not work due to a
failure to calculate loesses. This can generally be solved by using
larger bin sizes. Stopping execution of the remaining part of the
script...
--- failed re-building ‘CopywriteR.Rnw’
SUMMARY: processing the following file failed:
‘CopywriteR.Rnw’
Error: Vignette re-building failed.
Execution halted
* checking PDF version of manual ... OK
* checking for non-standard things in the check directory ... OK
* checking for detritus in the temp directory ... OK
* DONE
Status: 1 WARNING, 2 NOTEs
See
‘/tmp/bioc/CopywriteR.Rcheck/00check.log’
for details.
The problem is in:
Lines 120 to 122 in f64df3f
which shouldn't use &&
but &
, i.e. it should use:
subset = (!is.na(df$mappa) & !is.na(normv)),
When you use &&
without the extra checks, you're effectively using:
subset = (!is.na(df$mappa[1]) & !is.na(normv[1])),
which I assume you don't want.
Hi
I've just used the copywriteR package to analyze a tumour-normal pair. After the execution I have run the plotCNA
function as described. The output is that I should re-analyze the sample because it does not find the control.
plotCNA(file.path("PE08_copywriteR"))
[1] "log2.PE08_ID-ready.bam" "log2.PE08_CR-ready.bam"
[1] TRUE FALSE
[1] "log2.PE08_CR-ready.bam"
Error in plotCNA(file.path("PE08_copywriteR")) :
The following samples refer to .bam files that have not been processed
by CopywriteR: ‘log2.PE08_CR-ready.bam’ . Please re-analyze all
required samples using CopywriteR and run the plotCNA function again.
Something cannot be right, but I'm not able to guess what. Could you help me out on this?
Thanks
Michael
Hi,
I have problem running the CopywriteR, the error message is:
"Error in value[3L] :
The GC-content and mappability normalization did not work due to a failure to calculate loesses. This can
generally be solved by using larger bin sizes. Stopping execution of the remaining part of the script...
In addition: Warning message:
In png(plot, width = 700, height = 1400) :
unable to open connection to X11 display "
But I think the bin size I used is pretty large( 200kb ), I remember I saw the same error message last month when I tested this package, I realized the problem was because I used the de-duplicated bam files by mistake. I changed the bam files and successfully ran it. This time I used the normal bam file, same code to run, but it still crashed.
Could you please help me to figure it out? My code and log files are attached.
copywriteR_test.zip
Thank you very much!
Hi,
When running CopywriteR with BiocParallel with a bp.param like this one:
bp.param <- SnowParam(workers= 5, type= "SOCK")
I'm getting errors regarding BAM files not found in the workers
Error: BiocParallel errors
element index: 1, 2, 3, 4, 5, 6, ...
first error: failed to open BamFile: file(s) do not exist:
These errors are not occurring in multicore or singlecore runs and seem to be caused by using the basename of BAM paths instead of the full BAM paths.
Here is the traceback() output of the error
4: stop(.error_bplist(res))
3: bplapply(sample.files, Stats.2, BPPARAM = bp.param)
2: bplapply(sample.files, Stats.2, BPPARAM = bp.param)
1: CopywriteR(sample.control = data.frame(samples, controls), destination.folder = outpF,
reference.folder = binref, capture.regions.file = roi, bp.param = bp.param)
I hope that helps.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.