peeperlab / copywriter Goto Github PK

View Code? Open in Web Editor NEW

28.0 28.0 10.0 2.68 MB

DNA copy number detection from off-target sequence data

License: GNU General Public License v3.0

R 100.00%

copywriter's People

Contributors

Stargazers

Watchers

Forkers

david-a-parry zhmz90 roelkluin yixf-self zhanglesolanin cloudbroken linhxxx johnmcma whtns styl-maverick

copywriter's Issues

chr sort issue in genome plot with prefix

While testing Copywriter there seems to be an issue with the sorting of the chromosomes in the final genome plot. The bin-annotations were created with prefix = 'chr'....

Connection Errors when analyzing 13GB bams

Hi-

I recently tried to use CopywriteR in the cloud with 128 RAM and 15 cores (Windows Server Datacenter Virtual Machine) with R 3.3.2. Also my input data files: normal 12.67GB, tumor 11GB

I received the following error:
Error: 'bplapply' receive data failed: error reading from connection

Can you suggest a work around? Maybe too many bam lines are being read at once?

Here is my code:

library("CopywriteR")
library("CopyhelpeR")
setwd("C:/Users/m/Desktop/share/data")
data.folder <- tools::file_path_as_absolute(file.path(getwd()))
preCopywriteR(output.folder=file.path(data.folder), bin.size=20000, ref.genome="hg38", prefix="chr")

list.dirs(path=file.path(data.folder), full.names=FALSE)
list.files(path=file.path(data.folder, "hg38_20kb_chr"), full.names=FALSE)
load(file=file.path(data.folder, "hg38_20kb_chr", "blacklist.rda"))
blacklist.grange

load(file=file.path(data.folder, "hg38_20kb_chr", "GC_mappability.rda"))
GC.mappa.grange[1001:1011]
bp.param <- SnowParam(workers = 15, type ="SOCK")
bp.param

path <- c("C:/Users/m/Desktop/share/data")
samples <- list.files(path=path, pattern="tumor.bam$", full.names=TRUE)
controls <- list.files(path=path, pattern="normal.bam$", full.names=TRUE)
sample.control <- data.frame(samples,controls)

CopywriteR(sample.control = sample.control, destination.folder = file.path(data.folder), reference.folder = file.path(data.folder, "hg38_20kb_chr"), bp.param = bp.param)

Question: Does CopywriteR consider reads marked as duplicate?

Hi,

I have a question that I could not find an answer to reading the publication or the R package docs:

Does CopywriteR consider reads marked as duplicate? I suppose it does, but I want to make sure.

Thanks,
Clemens

CopyWriteR Error

Hi-

I run CopyWrite R, but received the following error:

invalid class “MulticoreParam” object: 1: ‘cluster’, ‘.clusterargs’, ‘RNGseed’ must be length 1
invalid class “MulticoreParam” object: 2: ‘timeout’, ‘log’ must be length 1
starting worker for localhost:11695
'BiocParallel' did not register default BiocParallelParams:
invalid class “MulticoreParam” object: 1: ‘cluster’, ‘.clusterargs’, ‘RNGseed’ must be length 1
invalid class “MulticoreParam” object: 2: ‘timeout’, ‘log’ must be length 1
Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
space '1' not in BAM header
Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
space '1' not in BAM header
Error in .Method(..., deparse.level = deparse.level) :
number of rows of matrices must match (see arg 2)
Calls: CopywriteR ... f -> standardGeneric -> eval -> eval -> eval -> .Method
In addition: Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
space '1' not in BAM header
Execution halted

Could you help me found out the reason? Thanks!

Here is my code:

library("CopywriteR")
library("CopyhelpeR")
setwd("C:/Users/site_li/Documents/WES/data")
data.folder <- tools::file_path_as_absolute(file.path(getwd()))

preCopywriteR(output.folder=file.path(data.folder), bin.size=20000, ref.genome="hg19", prefix="")

list.dirs(path = file.path(data.folder), full.names = FALSE)
list.files(path = file.path(data.folder, "hg19_20kb"), full.names = FALSE)
load(file = file.path(data.folder, "hg19_20kb", "blacklist.rda"))
blacklist.grange

load(file=file.path(data.folder, "hg19_20kb", "GC_mappability.rda"))
GC.mappa.grange[1001:1011]

bp.param <- SnowParam(workers = 12, type ="SOCK")
bp.param

path <- c("C:/Users/site_li/Documents/WES/data")
samples <- list.files(path=path, pattern="tumor.bam$", full.names=TRUE)
controls <- list.files(path=path, pattern="control.bam$", full.names=TRUE)
sample.control <- data.frame(samples,controls)

CopywriteR(sample.control = sample.control, destination.folder = file.path(data.folder), reference.folder = file.path(data.folder, "hg19_20kb"), bp.param = bp.param)

CopywriteR error

Hi,
I ran CopyWrite R, but received the following error:
Plotting to file D:/Usualfile/RStudio/test/destination_1000/CNAprofiles/qc/read.counts.compensated.c1.bam.png
Error in value[3L] :
The GC-content and mappability normalization did not work due to a failure to calculate loesses. This can generally be solved by using larger
bin sizes. Stopping execution of the remaining part of the script...

I have tried to use larger bin sizes, but it didn't work. Could you help me found out the reason? Thanks!

here is my code:

library("CopywriteR")
preCopywriteR("D:/Usualfile/RStudio/test", 20000, "hg19", prefix = "")
bp.param <- SnowParam(workers = 2, type = "SOCK")
path <- "D:/Usualfile/RStudio/test/bam_1000"
samples <- list.files(path = path, pattern = "bam$", full.names = TRUE)
controls <- samples[c(1:2, 1:2)]
sample.control <- data.frame(samples, controls)
CopywriteR(sample.control = sample.control, destination.folder = "D:/Usualfile/RStudio/test/destination_1000", reference.folder = "D:/Usualfile/RStudio/test/hg19_5kb", bp.param = bp.param)

Very negative log2 values in some samples for many regions

Hi, in my samples log2 results there is the value of -1073741823.5 repeated thousands of times in every file, not always with the same coordinates. What does this mean?

Thanks,
Natalie

Issue #13 persists in CopywriteR 2.14.1 (Bioconductor)

Hi, thanks for the package!

I get the
Error in [<-.data.frame(tmp, , "off.target", value = c(96715069L, : replacement has 4 rows, data has 5

The preferable way to solve this issue is by installing the newest CopywriteR version using
Bioconductor.

issue that's referenced in the README. I have four tumour samples all corresponding to the same matched control so I'm not sure I could avoid having more comparisons than samples!

Request: reuse processed BAM files between runs

First of all, thanks for the great tool!

I'd like to reuse the BAM files that CopywriteR produces (*_properreads.bam) in a second run (e.g. with another bin size resolution) in order to save time making them again which seems to be a big chunk of the total time.
Maybe the CopywriteR function can receive a pre/suffix argument to keep output of each run unique and if the processed bam files already exist in the CNAprofiles/BamBaiPeaksFiles output directory, just start working with those.

Question regarding targeted regions

I have read with great interest your paper on copywriteR and I am sorry if this is a stupid question... but ... I was left wondering what happens to the targeted regions themselves... sure you use the off targets to detect the copy number variation accross the genome... but what about the targeted regions ... are they left out?

SSH error upon run

Script start normally;

The bin size for this analysis is 15000
The capture region file is not specified
This analysis will be run on 6 cpus

But then gives an ssh error:

ssh: connect to host 6 port 22: Invalid argument

The host that is reported is the same as the number of cpus specified.
i.e if I run it with 12 CPUs it tries to connect to host 12

I can't find any code relating to ssh in the CopywriteR package.

I'm running under R 3.1.3 and followed your installation guide.

Warning messages for simpleLoess

Hello,
I am a big fan of CopywriteR and was using it very usefully.
I am now experiencing the following error message. One weird thing is that I could successfully process the same data before but since some days ago the error continues to occur with every data. I re-installed CopywriteR and tried both linux and windows R versions but still got the same message. Could there be some other problems in related packages? I would be thankful if you could find a way to fix the problem.

1: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
pseudoinverse used at 0.385
2: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
neighborhood radius 0.001
3: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
reciprocal condition number 0
4: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
There are other near singularities as well. 1e-006
5: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
pseudoinverse used at 0.909
6: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
neighborhood radius 0.001
7: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
reciprocal condition number 0
8: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
There are other near singularities as well. 1e-006
9: In plot.xy(xy.coords(x, y), type = type, ...) :
"subset" is not a graphical parameter

destination.error

when I run these lines:

path <- "/home/huanyu0420/copywriteR/result_CopywriteR/sim1_6_6100_read.sort.bam"
samples <- list.files(path = path, pattern = "bam$", full.names = TRUE)
controls <- samples[c(1:6, 1:6)]
sample.control <- data.frame(samples, controls)

Error in data.frame(samples, controls) : 参数值意味着不同的行数: 0, 12

How can I solve this problem?

CopywriteR cannot calculate loesses, regardless of bin size

Hi. I'm trying to run CopywriteR with some BAM files from targeted sequencing. We target 75 genes, get around 3-6M reads, and have around 40-70% off-target. When I try to run the software, I get the following message:

"The GC-content and mappability normalization did not work due to a failure to calculate loesses. This can generally be solved by using larger bin sizes. Stopping execution of the remaining part of the script."

I have attached some of the relevant output. I would greatly appreciate any insight you have as to why it isn't working.

CopywriteR.log

read_counts.txt

Paired-end not correctly identified

Hi,

I've just started to use copywritR and I encountered a really wierd note:

INFO [2016-10-25 11:42:00] Paired-end sequencing for sample NG-9739_1_lib127215_4623.sorted: TRUE
INFO [2016-10-25 11:42:00] Paired-end sequencing for sample NG-9739_3_lib127217_4623.sorted: FALSE
INFO [2016-10-25 11:42:00] Paired-end sequencing for sample NG-9739_4_lib127218_4623.sorted: TRUE

So... it says my second sample is not paired end, but of course it is. I've taken a look at the flags for all three samples, and actually have no idea why it identifies the second sample as not-paired. Here are the flags of the second sample:

44474716 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
44474128 + 0 mapped (100.00% : N/A)
42102178 + 0 paired in sequencing
21051089 + 0 read1
21051089 + 0 read2
41254896 + 0 properly paired (97.99% : N/A)
42101634 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Maybe you have an idea why it misidentifies? (I think for the actual run this should not be a problem, or am I wrong?)

Inaccurate read count prediction in peak regions

Hi,
we found it several times that when we have both SNP6.0 data and WES data for the same sample, there is inconsistent CN results in high coverage (peak) regions. CopywriteR was not able to reflect high CN values in these regions whereas the mapped bam files do have high peaks. It seems that the off-target read-based approach might have limitation in this case.

Looking forward to your reply.
Thanks!

How to reduce the noise/MAD value?

Hi,
I am running the CopywriteR to analysis some tissue-normal whole-exome sequencing samples. Everything seems good except for the MAD value (attached). Do you guys happen to know some good methods to reduce the MAD value?
all_chrom.pdf

Thank you very much.
Best,
Xin

matching bam index with file name pattern like "a.bam" and "a.bai"

Since many softwares (like GATK 4.1.7.0 I'm using now) generate bam index with file name like a.bam and a.bai instead of a.bam.bai, maybe we can change the index matching code from this:

## Index .bam files
-   if (!all(file.exists(gsub("$", ".bai", sample.paths)))) {
+   if (!all(file.exists(gsub("$", ".bai", sample.paths)) | file.exists(gsub("\\.bam$", ".bai", sample.paths)))) {
        if (file.access(".", 2) == -1) {
            stop(.wrap("The .bam files are not indexed and you do not have",
                       "write permission in (one of) the folder(s) where the",
                       ".bam files are located."))
        }
        IndexBam <- function(sample.paths) {
            indexBam(sample.paths)
            paste0("indexBam(\"", sample.paths, "\")")
        }
        to.log <- bplapply(sample.paths, IndexBam, BPPARAM = bp.param)
        lapply(to.log, flog.info)
    }

Lost compatibility with R3.1

We are trying to install the latest release in R 3.1 as indicated in the README file - but we get the error that 3.2 is needed. Is the doc outdated or should it still work?

Thanks for any help!
Michael

no input.Rdata file when using each sample as their own control

Hi, when I try running each sample as its own control I do not get an input.Rdata file and so cannot run plotCNA. Otherwise it appears to have worked correctly, I get the following files as output:
bam file
index for bam file
peaks1.bed
CopywriteR.log
log2_read_counts.igv
read_counts.txt

Any idea why?
Thanks,
Natalie

Request: Custom destination directories

It would be great if the user could have an option to supply the full output directory for both CopywriteR and for plotCNA. For example, if I have whole exome data and targeted data in the same project, it would be great if I could supply the two different output directories as wes_CNAprofiles and targeted_CNAprofiles. Right now I would have to make different directories for each, and then the CNAprofiles would get saved within those directories to avoid having each CNAprofiles directory being overwritten, creating an unnecessary extra directory layer.

I would also like to supply the full destination directory for plotting--in my current workflow I would like to save the CNAprofiles directory within a "data" directory and then resulting plots in a "visualizations" directory, but right now the plots go directly within the CNAprofiles directory.

BiocParallel errors

I met with the following errors, I tried to fix it by many ways but I still could not make it:

Error: BiocParallel errors
element index: 1
first error: seqlevels(param) not in BAM header:
seqlevels: ‘1’, ‘2’, ‘3’, ‘4’, ‘5’, ‘6’, ‘7’, ‘8’, ‘9’, ‘10’, ‘11’, ‘12’, ‘13’, ‘14’, ‘15’, ‘16’, ‘17’, ‘18’, ‘19’, ‘20’, ‘21’, ‘22’, ‘X’, ‘Y’
file: MCF7_RJ_20ug_ER_5ulplus5ul_IDX6_control_rep1.sorted.bam_properreads.bam
index: MCF7_RJ_20ug_ER_5ulplus5ul_IDX6_control_rep1.sorted.bam_properreads.bam
In addition: Warning message:
stop worker failed:
'clear_cluster' receive data failed:
reached elapsed time limit

Can you tell me what the problem is?

Thank you!

Error in if (nrow(peaks) > 0) { : argument is of length zero Calls: CopywriteR ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous> Execution halted

can you help me figure out what is happening? and how can i solve this?

Is CopywriteR still being maintained?

Hi,

Is this software still being maintained? Bioconductor says the package is deprecated and will probably be removed from bioC: https://bioconductor.org/packages/release/bioc/html/CopywriteR.html

Error in CopywriteR function:

Hi,

I'm trying to use CopywriteR, but I'm getting the following error:

> CopywriteR(sample.control = sample_control,         
+            destination.folder = file.path(output_folder),                                                  
+            reference.folder = pre_folder,           
 .... [TRUNCATED]          
The following samples will be analyzed:               
sample: chr21_PT-SSV1-HOR-2.sam.sorted.dedup.realigned.recal.bam ;       matching control: chr21_PT-NIST-NA12878.sam.sorted.dedup.realigned.recal.bam
sample: chr21_PT-SSV1-HORIZON.sam.sorted.dedup.realigned.recal.bam ;     matching control: chr21_PT-NIST-NA12878.sam.sorted.dedup.realigned.recal.bam
sample: chr21_PT-SSV1-HOR-2.sam.sorted.dedup.realigned.recal.bam ;       matching control: chr21_PT-SSV1-HOR-2.sam.sorted.dedup.realigned.recal.bam
sample: chr21_PT-SSV1-HORIZON.sam.sorted.dedup.realigned.recal.bam ;     matching control: chr21_PT-SSV1-HORIZON.sam.sorted.dedup.realigned.recal.bam
The bin size for this analysis is 20000               
The capture region file is not specified              
This analysis will be run on 20 cpus                  
Error in `[<-.data.frame`(`*tmp*`, , "off.target", value = c(68930L, 124716L,  :                             
  replacement has 4 rows, data has 3                  
>                          
> traceback()              
9: stop(sprintf(ngettext(N, "replacement has %d row, data has %d",                                           
       "replacement has %d rows, data has %d"), N, n), domain = NA)                                          
8: `[<-.data.frame`(`*tmp*`, , "off.target", value = c(68930L, 124716L,                                      
   81868L, 122406L))       
7: `[<-`(`*tmp*`, , "off.target", value = c(68930L, 124716L, 81868L,                                         
   122406L))               
6: CopywriteR(sample.control = sample_control, destination.folder = file.path(output_folder),                
       reference.folder = pre_folder, bp.param = bp_param) at run.R.tmp.R#68                                 
5: eval(expr, envir, enclos)                          
4: eval(ei, envir)         
3: withVisible(eval(ei, envir))                       
2: source(f, ...)          
1: nvimcom:::source.and.clean("/data3/home/mburger/Work/CopyWriter/test2/run.R.tmp.R",                       
       print.eval = TRUE, echo = TRUE)

I'm using this script.
The script runs without problems for the example provided in vignette but It does not work for my data.

I don't know how to fix this. Does anyone have any clue about the source of the error?

CopywriteR halts due to failure to calculate loesses, but only on one server

Hello, I have seen a couple of threads where copywriteR throws this error

Error in value[[3L]](cond) :
  The GC-content and mappability normalization did not work due to a
  failure to calculate loesses. This can generally be solved by using
  larger bin sizes. Stopping execution of the remaining part of the
  script...

I have tried running the same job in two clusters. One is successful and gives the output; another one gives the error. I am using the same version of R and the packages, same input files, same intermediary files as far as I'm concerned (read_counts.txt). I'd be grateful for help :)

CopywriteR crashes when "calculating extension cutoff for every peak"

We have found a few WES tumor/normal pairs where CopywriteR crashes. I don't quite understand why that happens, i.e. which input triggers this, but I tracked down the combinations of data that lead to this.

This is the error:

This analysis will be run on 8 cpus 
Error in while (cov.chr[index] > lower.cutoff.peaks[i]) { : 
  argument is of length zero
Calls: do.all ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>

This is the offending chunk of code (https://github.com/PeeperLab/CopywriteR/blob/master/R/CopywriteR.R#L447):

if (nrow(test) > 0) {
  for (i in seq_len(nrow(test))) {
    index <- test[i, "start"]
    while (cov.chr[index] > lower.cutoff.peaks[i]) {
      index = index - 1
    }
    test[i, "start"] <- index - read.length
    index <- test[i, "end"]
    while (cov.chr[index] > lower.cutoff.peaks[i]) {
      index = index + 1
    }
    test[i, "end"] <- index + read.length
  }
}

In the sample I am looking at, we start at test[i, "start"] = 154. The coverage at lower.cutoff.peaks[1] is 2. Now we move from position 154 down, until we find a position in cov.chr equal or smaller 2. If that does not happen, trying to get cov.chr[0] leads to the crash. I guess the function assumes that at least at cov.chr[1] the while loop should exit, but I don't know why this does not happen. In any case, this should be caught and dealt with.

I modified the code to print print(paste(cov.chr[index], lower.cutoff.peaks[i])), this is the output:

[1] "8 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "3 2"
[1] "3 2"
[1] "3 2"
[1] "3 2"
[1] "3 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "3 2"
[1] "3 2"
[1] "3 2"

Best,
Clemens

The following samples refer to .bam files that have not been processed by CopywriteR

I processed all bam files with CopywriteR, why this error?
Thanks!

Error in plotCNA(destination.folder = "./copynumber") : 
  The following samples refer to .bam files that have not been processed
  by CopywriteR: ‘log2.SKCM-M028-P008_C.sorted.bam’,
  ‘log2.SKCM-M035-P010_C.sorted.bam’,
  ‘log2.SKCM-M035-P011_C.sorted.bam’,
  ‘log2.SKCM-M123-P009_C.sorted.bam’,
  ‘log2.SKCM-M137-P008_C.sorted.bam’,
  ‘log2.SKCM-M233-P010_C.sorted.bam’,
  ‘log2.SKCM-M263-P011_C.sorted.bam’,
  ‘log2.SKCM-M275-P008_C.sorted.bam’,
  ‘log2.SKCM-M305-P009_C.sorted.bam’,
  ‘log2.SKCM-M306-P008_C.sorted.bam’,
  ‘log2.SKCM-M357-P010_C.sorted.bam’,
  ‘log2.SKCM-M399-P010_C.sorted.bam’,
  ‘log2.SKCM-M409-P011_C.sorted.bam’,
  ‘log2.SKCM-M424-P009_C.sorted.bam’,
  ‘log2.SKCM-M527-P010_C.sorted.bam’,
  ‘log2.SKCM-M642-P008_C.sorted.bam’,
  ‘log2.SKCM-M721-P010_C.sorted.bam’,
  ‘log2.SKCM-M747-P008_C.sorted.bam’,
  ‘log2.SKCM-M749-P010_C.sorted.bam’,
  ‘log2.SKCM-M762-P008_C.sorted.bam’,
  ‘log2.SKCM-M806-P011_C.sorted.bam’,
  ‘log2.SKCM-M807-P0

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.4 (Santiago)

Matrix products: default
BLAS: /risapps/rhel6/R/3.4.1-shlib/lib64/R/lib/libRblas.so
LAPACK: /risapps/rhel6/R/3.4.1-shlib/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] CopywriteR_2.10.0   BiocParallel_1.10.1

loaded via a namespace (and not attached):
 [1] DNAcopy_1.52.0             chipseq_1.28.0            
 [3] XVector_0.18.0             GenomicRanges_1.30.3      
 [5] BiocGenerics_0.24.0        zlibbioc_1.24.0           
 [7] GenomicAlignments_1.14.1   IRanges_2.12.0            
 [9] lattice_0.20-35            hwriter_1.3.2             
[11] GenomeInfoDb_1.14.0        tools_3.4.1               
[13] SummarizedExperiment_1.8.1 parallel_3.4.1            
[15] grid_3.4.1                 Biobase_2.38.0            
[17] data.table_1.10.4-3        latticeExtra_0.6-28       
[19] lambda.r_1.2               futile.logger_1.4.3       
[21] gtools_3.5.0               matrixStats_0.53.1        
[23] Matrix_1.2-12              GenomeInfoDbData_1.0.0    
[25] RColorBrewer_1.1-2         futile.options_1.0.0      
[27] S4Vectors_0.16.0           bitops_1.0-6              
[29] RCurl_1.95-4.10            DelayedArray_0.4.1        
[31] CopyhelpeR_1.10.0          compiler_3.4.1            
[33] Rsamtools_1.28.0           Biostrings_2.44.2         
[35] stats4_3.4.1               ShortRead_1.36.0

Is it a problem?

Warning messages:
1: replacing previous import ‘stats::sd’ by ‘BiocGenerics::sd’ when loading ‘S4Vectors’
2: replacing previous import ‘stats::var’ by ‘BiocGenerics::var’ when loading ‘S4Vectors’
3: replacing previous import ‘stats::sd’ by ‘BiocGenerics::sd’ when loading ‘IRanges’
4: replacing previous import ‘stats::var’ by ‘BiocGenerics::var’ when loading ‘IRanges’
5: multiple methods tables found for ‘as.vector’
6: multiple methods tables found for ‘unlist’

Optional parameter to not perform GC/Mappability correction

Hi Thomas,

I would like to perform my own GC/mappability corrections and blacklists on the readcount data which comes out of CopywriteR. It would be nice to have an option to set a parameter to enable this.
I now did this by hand in the following function ( correctmappa = FALSE) after downloading the tarball and compiling the package afterwards. Not the best way to go if I want update CopywriteR in the future ;)

Perform GC-content and mappability corrections (in .tng helper function)

tryCatch({
    i <- c(seq_len(ncol(data$cov)))
    NormalizeDOC <- function(i, data, .tng, usepoints, destination.folder) {
        .tng(data.frame(count = data$cov[, i], gc = data$anno$gc,
                        mappa = data$anno$mappa),
             use = usepoints & data$cov[, i] != 0, correctmappa = TRUE,
             plot = file.path(destination.folder, "qc",
                              paste0(colnames(data$cov)[i], ".png")))
    }
    ratios <- bplapply(i, NormalizeDOC, data, .tng, usepoints,
                       destination.folder, BPPARAM = bp.param)
    log2.read.counts <- matrix(unlist(ratios), ncol = length(sample.indices))
}, error = function(e) {
    stop(.wrap("The GC-content and mappability normalization did not work",
               "due to a failure to calculate loesses. This can generally",
               "be solved by using larger bin sizes. Stopping execution of",
               "the remaining part of the script..."))
})

colnames(log2.read.counts) <- paste0("log2.", sample.files[sample.indices])

failure: length > 1 in coercion to logical (_R_CHECK_LENGTH_1_LOGIC2_=true)

Hi, there's a "length > 1 in coercion to logical" in the .tng() function that is triggered when you plot, e.g. when running the vignette. To see it, you need to enable the extra checks for this, e.g. by setting _R_CHECK_LENGTH_1_LOGIC2_=true or using --as-cran:

$ $ R --vanilla CMD check --as-cran CopywriteR_2.24.0.tar.gz 
* using log directory ‘/tmp/bioc/CopywriteR.Rcheck’
* using R version 4.1.0 (2021-05-18)
* using platform: x86_64-pc-linux-gnu (64-bit)
* using session charset: UTF-8
* using option ‘--as-cran’
* checking for file ‘CopywriteR/DESCRIPTION’ ... OK
* checking extension type ... Package
* this is package ‘CopywriteR’ version ‘2.24.0’
* checking CRAN incoming feasibility ... NOTE
Maintainer: ‘Oscar Krijgsman <[email protected]>’

Package duplicated from https://bioconductor.org/packages/3.13/bioc

Unknown, possibly mis-spelled, fields in DESCRIPTION:
 ‘git_url’ ‘git_branch’ ‘git_last_commit’ ‘git_last_commit_date’

Uses the superseded package: ‘snow’

No package encoding and non-ASCII characters in the following R files:
 R/CopywriteR.R
   614:     ## <e2><80><98>Map<e2><80><99> applies a function to the corresponding elements of given vectors.

The Title field should be in title case. Current version is:
‘Copy number information from targeted sequencing using off-target reads’
In title case that is:
‘Copy Number Information from Targeted Sequencing using Off-Target Reads’

The Description field should not start with the package name,
 'This package' or similar.

The Date field is over a month old.
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking serialization versions ... OK
* checking whether package ‘CopywriteR’ can be installed ... OK
* checking installed package size ... OK
* checking package directory ... OK
* checking for future file timestamps ... OK
* checking ‘build’ directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking use of S3 registration ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
CopywriteR: no visible global function definition for ‘packageVersion’
CopywriteR: no visible global function definition for ‘getClass’
CopywriteR : DetectPeaks: no visible global function definition for
 ‘as’
CopywriteR : DetectPeaks: no visible global function definition for
 ‘write.table’
CopywriteR : CalculateDepthOfCoverage: no visible global function
 definition for ‘read.table’
CopywriteR : CalculateDepthOfCoverage: no visible global function
 definition for ‘as’
CopywriteR: no visible global function definition for ‘write.table’
CopywriteR: no visible global function definition for ‘pdf’
CopywriteR: no visible global function definition for ‘ecdf’
CopywriteR: no visible global function definition for ‘dev.off’
CopywriteR: no visible global function definition for ‘read.table’
plotCNA: no visible global function definition for ‘read.table’
plotCNA : <anonymous> : <anonymous>: no visible global function
 definition for ‘pdf’
plotCNA : <anonymous> : <anonymous>: no visible global function
 definition for ‘points’
plotCNA : <anonymous> : <anonymous> : <anonymous>: no visible global
 function definition for ‘segments’
plotCNA : <anonymous> : <anonymous>: no visible global function
 definition for ‘par’
plotCNA : <anonymous> : <anonymous>: no visible global function
 definition for ‘text’
plotCNA : <anonymous> : <anonymous>: no visible global function
 definition for ‘axis’
plotCNA : <anonymous> : <anonymous>: no visible global function
 definition for ‘abline’
plotCNA : <anonymous> : <anonymous>: no visible global function
 definition for ‘dev.off’
preCopywriteR: no visible global function definition for ‘as’
Undefined global functions or variables:
 abline as axis dev.off dpois ecdf getClass lines loess packageVersion
 par pdf png points ppois predict read.table rgb segments text
 write.table
Consider adding
 importFrom("grDevices", "dev.off", "pdf", "png", "rgb")
 importFrom("graphics", "abline", "axis", "lines", "par", "points",
            "segments", "text")
 importFrom("methods", "as", "getClass")
 importFrom("stats", "dpois", "ecdf", "loess", "ppois", "predict")
 importFrom("utils", "packageVersion", "read.table", "write.table")
to your NAMESPACE file (and ensure that your DESCRIPTION Imports field
contains 'methods').
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd line widths ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking sizes of PDF files under ‘inst/doc’ ... OK
* checking installed files from ‘inst/doc’ ... OK
* checking files in ‘vignettes’ ... OK
* checking examples ... OK
* checking for unstated dependencies in vignettes ... OK
* checking package vignettes in ‘inst/doc’ ... OK
* checking re-building of vignette outputs ...
WARNING
Error(s) in re-building vignettes:
--- re-building ‘CopywriteR.Rnw’ using Sweave
Loading required package: BiocParallel
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  :
 pseudoinverse used at 0.979
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  :
 neighborhood radius 0.001
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  :
 reciprocal condition number  0
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  :
 There are other near singularities as well. 1e-06
----------- FAILURE REPORT -------------- 
--- failure: length > 1 in coercion to logical ---
--- srcref --- 
: 
--- package (from environment) --- 
CopywriteR
--- call from context --- 
.tng(data.frame(count = data$cov[, i], gc = data$anno$gc, mappa = data$anno$mappa), 
   use = usepoints & data$cov[, i] != 0, correctmappa = TRUE, 
   plot = file.path(destination.folder, "qc", paste0(colnames(data$cov)[i], 
       ".png")))
--- call from argument --- 
!is.na(df$mappa) && !is.na(normv)
--- R stacktrace ---
where 1: .tng(data.frame(count = data$cov[, i], gc = data$anno$gc, mappa = data$anno$mappa), 
   use = usepoints & data$cov[, i] != 0, correctmappa = TRUE, 
   plot = file.path(destination.folder, "qc", paste0(colnames(data$cov)[i], 
       ".png")))
where 2: FUN(...)
where 3: doTryCatch(return(expr), name, parentenv, handler)
where 4: tryCatchOne(expr, names, parentenv, handlers[[1L]])
where 5: tryCatchList(expr, classes, parentenv, handlers)
where 6: tryCatch({
   FUN(...)
}, error = handle_error)
where 7: withCallingHandlers({
   tryCatch({
       FUN(...)
   }, error = handle_error)
}, warning = handle_warning)
where 8: FUN(...)
where 9: FUN(X[[i]], ...)
where 10: lapply(X, FUN_, ...)
where 11: bplapply(X, FUN, ..., BPREDO = BPREDO, BPPARAM = param)
where 12: bplapply(X, FUN, ..., BPREDO = BPREDO, BPPARAM = param)
where 13: bplapply(i, NormalizeDOC, data, .tng, usepoints, destination.folder, 
   BPPARAM = bp.param)
where 14: bplapply(i, NormalizeDOC, data, .tng, usepoints, destination.folder, 
   BPPARAM = bp.param)
where 15: doTryCatch(return(expr), name, parentenv, handler)
where 16: tryCatchOne(expr, names, parentenv, handlers[[1L]])
where 17: tryCatchList(expr, classes, parentenv, handlers)
where 18: tryCatch({
   i <- c(seq_len(ncol(data$cov)))
   NormalizeDOC <- function(i, data, .tng, usepoints, destination.folder) {
       .tng(data.frame(count = data$cov[, i], gc = data$anno$gc, 
           mappa = data$anno$mappa), use = usepoints & data$cov[, 
           i] != 0, correctmappa = TRUE, plot = file.path(destination.folder, 
           "qc", paste0(colnames(data$cov)[i], ".png")))
   }
   ratios <- bplapply(i, NormalizeDOC, data, .tng, usepoints, 
       destination.folder, BPPARAM = bp.param)
   log2.read.counts <- matrix(unlist(ratios), ncol = length(sample.indices))
}, error = function(e) {
   stop(.wrap("The GC-content and mappability normalization did not work", 
       "due to a failure to calculate loesses. This can generally", 
       "be solved by using larger bin sizes. Stopping execution of", 
       "the remaining part of the script..."))
})
where 19: CopywriteR(sample.control = sample.control, destination.folder = file.path(data.folder), 
   reference.folder = file.path(data.folder, "mm10_4_20kb"), 
   bp.param = bp.param)
where 20: eval(expr, .GlobalEnv)
where 21: eval(expr, .GlobalEnv)
where 22: withVisible(eval(expr, .GlobalEnv))
where 23: doTryCatch(return(expr), name, parentenv, handler)
where 24: tryCatchOne(expr, names, parentenv, handlers[[1L]])
where 25: tryCatchList(expr, classes, parentenv, handlers)
where 26: tryCatch(expr, error = function(e) {
   call <- conditionCall(e)
   if (!is.null(call)) {
       if (identical(call[[1L]], quote(doTryCatch))) 
           call <- sys.call(-4L)
       dcall <- deparse(call)[1L]
       prefix <- paste("Error in", dcall, ": ")
       LONG <- 75L
       sm <- strsplit(conditionMessage(e), "\n")[[1L]]
       w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w")
       if (is.na(w)) 
           w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L], 
               type = "b")
       if (w > LONG) 
           prefix <- paste0(prefix, "\n  ")
   }
   else prefix <- "Error : "
   msg <- paste0(prefix, conditionMessage(e), "\n")
   .Internal(seterrmessage(msg[1L]))
   if (!silent && isTRUE(getOption("show.error.messages"))) {
       cat(msg, file = outFile)
       .Internal(printDeferredWarnings())
   }
   invisible(structure(msg, class = "try-error", condition = e))
})
where 27: try(withVisible(eval(expr, .GlobalEnv)), silent = TRUE)
where 28: evalFunc(ce, options)
where 29: tryCatchList(expr, classes, parentenv, handlers)
where 30: tryCatch(evalFunc(ce, options), finally = {
   cat("\n")
   sink()
})
where 31: driver$runcode(drobj, chunk, chunkopts)
where 32: utils::Sweave(...)
where 33: engine$weave(file, quiet = quiet, encoding = enc)
where 34: doTryCatch(return(expr), name, parentenv, handler)
where 35: tryCatchOne(expr, names, parentenv, handlers[[1L]])
where 36: tryCatchList(expr, classes, parentenv, handlers)
where 37: tryCatch({
   engine$weave(file, quiet = quiet, encoding = enc)
   setwd(startdir)
   output <- find_vignette_product(name, by = "weave", engine = engine)
   if (!have.makefile && vignette_is_tex(output)) {
       texi2pdf(file = output, clean = FALSE, quiet = quiet)
       output <- find_vignette_product(name, by = "texi2pdf", 
           engine = engine)
   }
   outputs <- c(outputs, output)
}, error = function(e) {
   thisOK <<- FALSE
   fails <<- c(fails, file)
   message(gettextf("Error: processing vignette '%s' failed with diagnostics:\n%s", 
       file, conditionMessage(e)))
})
where 38: tools:::buildVignettes(dir = "/tmp/bioc/CopywriteR.Rcheck/vign_test/CopywriteR", 
   ser_elibs = "/scratch/henrik/RtmpHwwNKw/file312312259755.rds")

--- value of length: 7825 type: logical ---
  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 [15] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
...
[7813] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
--- function from context --- 
function (df, use, correctmappa = TRUE, plot = NULL, verbose = TRUE) 
{
   if (!is.logical(use) && length(use) == nrow(df)) 
       stop("use should be logicval vector with same size as df")
   if (!is.null(plot)) {
       if (!is.logical(plot)) {
           if (verbose) 
               cat("Plotting to file", plot, "\n")
           png(plot, width = 700, height = 1400)
           par(mfrow = c(2, 1))
           on.exit(dev.off())
           plot <- TRUE
       }
       else if (plot) {
           par(mfrow = c(2, 1))
       }
   }
   gcuse <- (use & !is.na(df$mappa) & df$mappa > 0.8 & !is.na(df$gc) & 
       df$gc > 0)
   rough <- loess(count ~ gc, data = df, subset = gcuse, span = 0.03)
   i <- seq(0, 1, by = 0.001)
   final <- loess(predict(rough, i) ~ i, span = 0.3)
   normv <- predict(final, df$gc)
   df$countgcloess <- df$count/(normv/median(normv, na.rm = TRUE))
   if (plot) {
       plot(count ~ gc, data = df, subset = gcuse, ylim = quantile(df$count[gcuse], 
           c(1e-04, 0.999)), xlim = c(0, 1), pch = ".")
       points(count ~ gc, data = df, subset = !gcuse, col = rgb(1, 
           0, 0, 0.3), pch = ".")
       lines(i, predict(rough, i), col = "green")
       points(df$gc, normv, col = "red", pch = ".")
   }
   if (correctmappa) {
       mappause <- (use & !is.na(df$mappa))
       rough <- loess(countgcloess ~ mappa, data = df, subset = mappause, 
           span = 0.03)
       i <- seq(0, 1, by = 0.001)
       final <- loess(predict(rough, i) ~ i, span = 0.3)
       normv <- predict(final, df$mappa)
       df$countgcmappaloess <- df$countgcloess/(normv/median(normv, 
           na.rm = TRUE))
       if (plot) {
           plot(countgcloess ~ mappa, data = df, subset = mappause, 
               ylim = quantile(df$countgcloess[mappause], c(1e-04, 
                 0.999), na.rm = TRUE), xlim = c(0, 1), pch = ".")
           points(countgcloess ~ mappa, data = df, subset = !mappause, 
               col = rgb(1, 0, 0, 0.3), pch = ".")
           lines(i, predict(rough, i), col = "green")
           subset.points <- !is.na(df$mappa) && !is.na(normv)
           points(df$mappa[subset.points], normv[subset.points], 
               col = "red", pch = ".")
       }
       return(log2(df$countgcmappaloess/median(df$countgcmappaloess[use], 
           na.rm = TRUE)))
   }
   else {
       log2(df$countgcloess/median(df$countgcloess[use], na.rm = TRUE))
   }
}
<bytecode: 0x46f060f0>
<environment: namespace:CopywriteR>
--- function search by body ---
Function .tng in namespace CopywriteR has this body.
----------- END OF FAILURE REPORT -------------- 

Error: processing vignette 'CopywriteR.Rnw' failed with diagnostics:
chunk 10 (label = CopywriteR) 
Error in value[[3L]](cond) : 
 The GC-content and mappability normalization did not work due to a
 failure to calculate loesses. This can generally be solved by using
 larger bin sizes. Stopping execution of the remaining part of the
 script...

--- failed re-building ‘CopywriteR.Rnw’

SUMMARY: processing the following file failed:
 ‘CopywriteR.Rnw’

Error: Vignette re-building failed.
Execution halted

* checking PDF version of manual ... OK
* checking for non-standard things in the check directory ... OK
* checking for detritus in the temp directory ... OK
* DONE

Status: 1 WARNING, 2 NOTEs
See
 ‘/tmp/bioc/CopywriteR.Rcheck/00check.log’
for details.

Solution

The problem is in:

CopywriteR/R/private.R

Lines 120 to 122 in f64df3f

    
           points(df$mappa, normv, 
        
                  subset = (!is.na(df$mappa) && !is.na(normv)), 
        
                  col = "red", pch = ".")

which shouldn't use && but &, i.e. it should use:

subset = (!is.na(df$mappa) & !is.na(normv)),

When you use && without the extra checks, you're effectively using:

subset = (!is.na(df$mappa[1]) & !is.na(normv[1])),

which I assume you don't want.

plotCNA after analysis tells me to re-analyse

I've just used the copywriteR package to analyze a tumour-normal pair. After the execution I have run the plotCNA function as described. The output is that I should re-analyze the sample because it does not find the control.

 plotCNA(file.path("PE08_copywriteR"))
[1] "log2.PE08_ID-ready.bam" "log2.PE08_CR-ready.bam"
[1]  TRUE FALSE
[1] "log2.PE08_CR-ready.bam"
Error in plotCNA(file.path("PE08_copywriteR")) : 
  The following samples refer to .bam files that have not been processed
  by CopywriteR: ‘log2.PE08_CR-ready.bam’ . Please re-analyze all
  required samples using CopywriteR and run the plotCNA function again.

Something cannot be right, but I'm not able to guess what. Could you help me out on this?

Thanks
Michael

issue about the bin size

Hi,

I have problem running the CopywriteR, the error message is:
"Error in value[3L] :
The GC-content and mappability normalization did not work due to a failure to calculate loesses. This can
generally be solved by using larger bin sizes. Stopping execution of the remaining part of the script...
In addition: Warning message:
In png(plot, width = 700, height = 1400) :
unable to open connection to X11 display "

But I think the bin size I used is pretty large( 200kb ), I remember I saw the same error message last month when I tested this package, I realized the problem was because I used the de-duplicated bam files by mistake. I changed the bam files and successfully ran it. This time I used the normal bam file, same code to run, but it still crashed.

Could you please help me to figure it out? My code and log files are attached.
copywriteR_test.zip

Thank you very much!

Workers don't locate BAM files in parallel mode

Hi,

When running CopywriteR with BiocParallel with a bp.param like this one:

bp.param <- SnowParam(workers= 5, type= "SOCK")

I'm getting errors regarding BAM files not found in the workers

Error: BiocParallel errors
element index: 1, 2, 3, 4, 5, 6, ...
first error: failed to open BamFile: file(s) do not exist:

These errors are not occurring in multicore or singlecore runs and seem to be caused by using the basename of BAM paths instead of the full BAM paths.

Here is the traceback() output of the error

4: stop(.error_bplist(res))
3: bplapply(sample.files, Stats.2, BPPARAM = bp.param)
2: bplapply(sample.files, Stats.2, BPPARAM = bp.param)
1: CopywriteR(sample.control = data.frame(samples, controls), destination.folder = outpF,
reference.folder = binref, capture.regions.file = roi, bp.param = bp.param)

I hope that helps.

	points(df$mappa, normv,
	subset = (!is.na(df$mappa) && !is.na(normv)),
	col = "red", pch = ".")