gavinha / titancna Goto Github PK

View Code? Open in Web Editor NEW

93.0 93.0 36.0 10.8 MB

Analysis of subclonal copy number alterations (CNA) and loss of heterozygosity (LOH) in cancer

License: GNU General Public License v3.0

R 89.08% C 8.96% Python 1.85% Shell 0.11%

10x-genomics copy-number-variation genome-sequencing hmm tumor-heterogeneity

titancna's People

Contributors

Stargazers

Watchers

titancna's Issues

titanCNA.R excluding chromosome X for males by default despite correction

Why does titanCNA.R exclude sex chromosomes for males?

TitanCNA/scripts/R_scripts/titanCNA.R

Lines 191 to 194 in 2a7e268

    
           ## exclude chrX if gender==male ## 
        
           if (gender == "male" || gender == "Male" || gender == "MALE"){ 
        
           	chrs <- chrs[chrs!=grep("X", chrs, value=TRUE)] 
        
           }

It seems that correctIntegerCN already adjusts for sex, so no need to remove X for males?
https://github.com/gavinha/TitanCNA/blob/master/R/utils.R#L1213-L1227

	# Adjust chrX copy number if purity is sufficiently high
	# males - all data points in chrX is corrected
	# females - only  
	if (purity >= minPurityToCorrect){
		if (gender == "male" & length(chrXStr) > 0){
			segs[Chromosome == chrXStr, Corrected_Copy_Number := as.integer(round(logR_Copy_Number))]
			segs[Chromosome == chrXStr, Corrected_Call := names[Corrected_Copy_Number + 2]]
			cn[Chr == chrXStr, Corrected_Copy_Number := as.integer(round(logR_Copy_Number))]
			cn[Chr == chrXStr, Corrected_Call := names[Corrected_Copy_Number + 2]]
		}else if (gender == "female"){
			segs[Chromosome == chrXStr & Copy_Number >= maxCNtoCorrect.X, Corrected_Copy_Number := as.integer(round(logR_Copy_Number))]
			segs[Chromosome == chrXStr & Copy_Number >= maxCNtoCorrect.X, Corrected_Call := "HLAMP"]
			cn[Chr == chrXStr & CopyNumber >= maxCNtoCorrect.X, Corrected_Copy_Number := as.integer(round(logR_Copy_Number))]
			cn[Chr == chrXStr & CopyNumber >= maxCNtoCorrect.X, Corrected_Call := "HLAMP"]
		}

Warn when data is different from the genome style

Correcting reads with targeted regions failed because I had genomeStyle set to NCBI while my data was in UCSC style, therefore no overlaps would be found.

invalid 'chr' reference

TitanCNA/R/plotting.R

Line 266 in 49081fb

dataIn <- dataIn[Chr %in% chr, ]

error running snakemake -s TitanCNA.snakefile -np

Hi there,

I've git clone's TitanCNA and updated my samples.yaml to reflect the samples I'll be using however, when trying to run through the snakemake I get this error from the first step:

Terminal command:
snakemake -s TitanCNA.snakefile -np

Output:
SyntaxError:
Not all output, log and benchmark files of rule read_counter contain the same wildcards. This is crucial though, in order to avoid that two or more jobs write to the same file.
File "/home/pchamely/TitanCNA/scripts/snakemake/TitanCNA.snakefile", line 4, in
File "/home/pchamely/TitanCNA/scripts/snakemake/ichorCNA.snakefile", line 25, in

This is my first time using snakemake so I'm not sure what this means and was wondering if you could walk me through it?

seqlevelsStyle error

Hello!

When i trying to run snakemake version of pipeline using hg38 genome, when this command is executed:

Rscript ../R_scripts/titanCNA.R --hetFile results/titan/tumCounts/tumor_sample_1.tumCounts.chrom_fix.txt --cnFile results/ichorCNA/tumor_sample_1/tumor_sample_1.correctedDepth.txt --outFile results/titan/hmm/titanCNA_ploidy3/tumor_sample_1_cluster1.titan.txt --outSeg results/titan/hmm/titanCNA_ploidy3/tumor_sample_1_cluster1.segs.txt --outParam results/titan/hmm/titanCNA_ploidy3/tumor_sample_1_cluster1.params.txt --outIGV results/titan/hmm/titanCNA_ploidy3/tumor_sample_1_cluster1.seg --outPlotDir results/titan/hmm/titanCNA_ploidy3/tumor_sample_1_cluster1/ --libdir ../../ --id tumor_sample_1 --numClusters 1 --numCores 1 --normal_0 0.5 --ploidy_0 3 --chrs "c(1:22)" --estimateNormal map --estimatePloidy True --estimateClonality True  --centromere /home/vsvekolkin/projects/ichorCNA/inst/extdata/GRCh38.GCA_000001405.2_centromere_acen.txt --alphaK 2500 --txnExpLen 1e15 --plotYlim "c(-2,4)"

Following error is produced:

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘seqinfo’ for signature ‘"integer"’
Calls: seqlevelsStyle<- ... seqlevels -> seqlevels -> seqlevels -> seqinfo -> <Anonymous>
Execution halted

I believe this is error is caused by parameter --chrs "c(1:22)", which in turn affects GenomeInfoDb , which is used in following line of runTitanCNA script:

seqinfo <- Seqinfo(genome=genomeBuild)
seqlevelsStyle(chrs) <- genomeStyle

Parameter creates integer array, i tried to convert it into character array using: --chrs "paste(c(1:22))"

And i get following error:

Running TITAN...
titan: Loading data results/titan/tumCounts/tumor_sample_1.tumCounts.chrom_fix.txt
titan: Loading GC content and mappability corrected log2 ratios...
titan: Extracting read depth...
Removed 168 centromeric positions
Removed Chrs: 
titan: Loading default parameters
titan: Using 1 cores.
titan: Parameter estimation
Error in chrsI[[i]] <- which(data$chr == chrs[i]) : 
  attempt to select less than one element in integerOneIndex
Calls: runEMclonalCN
Execution halted

I believe this is caused by incorrect filtering in removeCentromere function, but i cannot fix cause of error. Any help is appreciated.

Provide a list of TitanCall abbreviations in an easily-accessible form

As the subject says, having to hunt a supplementary table in a manuscript to see what an abbreviation indicates is awkward (and, IIRC, there is no "supplementary table 2" in the TitanCNA paper).

Ideally this should be in the README.md or in the repo (or in the help files).

Where to download v1.5.8

Hello,

I need to download TitanCNA 1.5.8 to use it with Titanrunner pipeline. I couldn't find a link to download this version. Is it available for download?

Thank you

Can Titan deal with WES data?

Hi
I want to use TitanCNA and PhyloWGS on my WES data. I noticed that these tools are designed for WGS, so I wonder that have you tried TitanCNA on WES data and does it work well?
Many thanks!

Yang

object 'genes' not found on hg19 test runs

Gavin and all;
I'm hitting an issue with build 37 test runs and not sure about the cause. These are small runs that I don't expect real results on, so I'm not sure if it's due to something with that are being build 37. Using the latest development script it errors out with object 'genes' not found from plotCNlogRByChr plotting:

TitanCNA/scripts/R_scripts/titanCNA.R

Line 346 in eb288ea

normal = norm, geneAnnot=genes, spacing=4, main=id, xlab="",

I'm not sure where genes gets injected into the environment and available. My guess is it comes from some import I'm missing or is out of date. Do you have any pointers so I can try to track down further?
Thanks so much for any ideas.

set -o pipefail; unset R_HOME && unset R_LIBS && export PATH=/usr/local/share/bcbio/anaconda/bin:$PATH && titanCNA.R --id c-tumor --hetFile /home/chapmanb/bio/bcbio-nextgen/tests/test_automated_output/structural/c-tumor/titancna/PairedBatch-effects-annotated-damage-ann-vardict-prep.csv --cnFile /home/chapmanb/bio/bcbio-nextgen/tests/test_automated_output/structural/c-tumor/titancna/c-tumor-normalized.cn --numClusters 1 --ploidy 2 --numCores 1 --outDir /home/chapmanb/bio/bcbio-nextgen/tests/test_automated_output/bcbiotx/tmprS3mKs --libdir None --chrs "c('22','X')"  --genomeBuild hg19 --genomeStyle UCSC
There were 11 warnings (use warnings() to see them)
Attaching package: 'Biostrings'
The following object is masked from 'package:base':
    strsplit
Running TITAN...
titan: Loading data /home/chapmanb/bio/bcbio-nextgen/tests/test_automated_output/structural/c-tumor/titancna/PairedBatch-effects-annotated-damage-ann-vardict-prep.csv
titan: Loading GC content and mappability corrected log2 ratios...
titan: Extracting read depth...
Removed Chrs:
titan: Loading default parameters
Warning message:
In cor(data$ref/data$tumDepth, data$logR, use = "pairwise.complete.obs") :
  the standard deviation is zero
titan: Using 1 cores.
titan: Parameter estimation
Optimal state path computation: Using 1 cores.
Writing results to /home/chapmanb/bio/bcbio-nextgen/tests/test_automated_output/bcbiotx/tmprS3mKs/c-tumor_cluster01.titan.txt, /home/chapmanb/bio/bcbio-nextgen/tests/test_automated_output/bcbiotx/tmprS3mKs/c-tumor_cluster01.segs.txt, /home/chapmanb/bio/bcbio-nextgen/tests/test_automated_output/bcbiotx/tmprS3mKs/c-tumor_cluster01.params.txt
titan: Saving parameters to /home/chapmanb/bio/bcbio-nextgen/tests/test_automated_output/bcbiotx/tmprS3mKs/c-tumor_cluster01.params.txt
$dens.bw
[1] NaN
$scat
[1] NaN
$S_Dbw
[1] NaN
Error in plotCNlogRByChr(dataIn = results, chr = chrs, segs = segs, ploidy = ploidy,  :
  object 'genes' not found

invalid 'chr' reference

TitanCNA/R/plotting.R

Line 543 in 49081fb

dataIn <- dataIn[Chromosome %in% chr, ]

Using GATK CNV with TITAN

Hi!

I'm wondering how to use the new GATK4 copy number segmentation as input to TITAN. I guess it should be possible (see eg. https://software.broadinstitute.org/gatk/documentation/article?id=11088) but I'm not sure how. I was able to use a recent pipeline to convert its output to a format that mirrors the Allelic CapSeg output, which could be used for eg. ABSOLUTE. But I'd like to use TITAN.

Any advice?

Thanks!

1-based coordinates in segment output

Snakemake ChildIOExcpetion

snakemake -s ichorCNA.snakefile -np
Building DAG of jobs...
ChildIOException:
File/directory is a child to another output:
/spin1/home/TitanCNA/scripts/snakemake/results/ichorCNA/tumor_X
/spin1/home/TitanCNA/scripts/snakemake/results/ichorCNA/tumor_X/tumor_X.cna.seg

As per error #26 , switching back from snakemake 5.3 to 3.8 seemed to resolve the problem, but I wanted to make the problem visible.

getSubcloneProfiles error

Error in data.table(Subclone1 = subc1) : object 'subc1' not found
Calls: outputTitanResults ... eval -> eval -> eval -> getSubcloneProfiles -> data.table

This is likely related to an incompatible solution (i.e. ploidy and number of clusters). I will put in a few checks to allow it to complete.

homozygous deletion (CDKN2A) being classified as HLAMP

Hi Gavin,

Apologies for raising so many issues. Completely understand if you don't have time to address everything. I hope that I contribute to improving this excellent software in a way.

I've found some samples that show a homozygous deletion in CDKNA, but that TITAN classifies as HLAMP, despite getting the corrected_copy_number and logr_copy_number correct. Do you know why this could be happening?

pair_barcode	chrom	pos	num_snp	median_ratio	median_logr	titan_state	titan_call	copy_number	major_cn	clonal_cluster	cellular_prevalence	logr_copy_number	corrected_copy_number	corrected_call
xxx	9	[21900599,21997016)	26	0.854816	-0.778442	20	ALOH	8	8	3	0.21310998	0.015625	0	HLAMP
TCGA-14-1402	9	[211762,34639475)	22534	0.862745	-0.770525	20	ALOH	8	8	1	0.99649755	0.24458808	0	HLAMP
xxx	9	[334016,35884107)	690	0.871795	-0.68648	20	ALOH	8	8	3	0.42151919	0.55352253	1	HLAMP

The second sample is a TCGA sample and I can share it as needed.

Floris

P.s. This issue raises another one in my mind: how good is TITAN at detecting homozygous deletions, as are common in eg. CDKN2A? I wonder because I imagine that the read depth at the normal het site in the matching tumors in this area may not meet the minimum threshold set by TITAN. Any thoughts on this?

Corrected_Copy_Number vs Copy_Number

Hello,

The .segs.txt file lists both Copy number, Minor CN, Major CN, and a Corrected_Copy_Number

For the CN, I see that there is a max of CN 8 in the titanCNA script, however the Corrected copy number at one point is up to 84. Given that I've seen massive increase in a few other copy number callers and not just in Titan, so I have reason to believe it's a realistic integer, I was wondering if it would be fair to use this Corrected_Copy_Number in my next analysis, specifically using this as the total copy number and assigning the Major CN and Minor CN to be:

Major_Cn = Corrected_Copy_Number - MinorCN

otherwise, could you maybe elaborate on the Corrected_Copy_Number derivation and how it differs from the Max/Min CN in the other columns.

Thank you!

Multi-sample TITAN

Hi Gavin,

Your paper that you shared in a previous post (https://www.ncbi.nlm.nih.gov/pubmed/29909985) uses PyClone on top of TITAN, using the TITAN major and minor copy number (I guess) to provide the correct input to PyClone. I'm wondering why did you chose this approach?

For my data, I am interested in comparing clonal clusters across multiple related samples. TITAN I guess is a single-sample approach: "cluster 2" in two related samples may not refer to the same cluster. PyClone on the other hand can take a multi-sample approach, and would be able to track the same clusters between multiple related samples. Is this a sound strategy? Do you have any special suggestions for using TITAN output as PyClone input?

Thanks!
Floris

invalid 'chr' reference

TitanCNA/R/plotting.R

Line 358 in 49081fb

dataIn <- dataIn[Chr %in% chr, ]

Optimize parameters to reduce overestimation of genome doubling

After numerous changes and tweaks to the default parameter settings over the years, it appears from my experience and from others that TitanCNA tends to overestimate genome-doubling.

I will adjust the default parameters to help correct for this.
Write a Parameter Settings and Guidelines Section in the Wiki
- Highlight the arguments/parameters to change so that users can tailor the analysis to their dataset if they run into issues with the results.

getPositionOverlap function

Hello Gavin,

I am following your tutorial but got into a trouble from the very beginning step.
Would you mind to take a closer look at "getPositionOverlap" function?
I installed this package a few weeks ago.

Thank you for your help.
Lijin

logR <- getPositionOverlap(chr=data$chr, posn=data$posn, dataVal=cnData)
head(logR)
[1] NA NA NA NA NA NA
tail(logR)
[1] NA NA NA NA NA NA

invalid 'Chr' reference

TitanCNA/R/plotting.R

Line 50 in 49081fb

dataIn <- dataIn[Chr %in% chr, ]

fwdBack error

I'm running TitanCNA v1.4.0 and have the following issue:

>
> convergeParams <- runEMclonalCN(workdata,gParams=params$genotypeParams,nParams=params$normalParams,
+                                 pParams=params$ploidyParams,sParams=params$cellPrevParams,
+                                 maxiter=20,maxiterUpdate=1500,txnExpLen=1e15,txnZstrength=1e5,
+                                 useOutlierState=FALSE,
+                                 normalEstimateMethod="map",estimateS=TRUE,estimatePloidy=TRUE)
titan: Running HMM...
fwdBack: Iteration 1 chr: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 24 *** Error in `/usr/local/R/R-3.1.1/lib/R/bin/exec/R': free(): invalid next size (fast): 0x0000000003a5c8a0 ***

How can I solve this issue? I'm working with UCSC style chromosome names.

Parameters in TitanCNA snakemake pipeline

Hi Gavin,
Thanks for your previous suggestions and I can run TitanCNA using snakemake pipeline successfully now. I have some questions about the parameters in snakemake pipeline:

First is about the parameter: numMaxClonalClusters. Does it mean the max number of clonal clusters to be considered by TitanCNA? In my understanding, there should be only one clonal cluster and all CNAs in that cluster have a cellular prevalence = 1. Or does it just mean the number of clusters to be considered?
I'm not familiar with ichorCNA but I noticed that there are many parameters to be set for it, some of which seems similar to TitanCNA (e.g. ichorCNA_normal/ichorCNA_ploidy and TitanCNA_normalInit/TitanCNA_maxPloidy). I don't know how much influence will the result of ichorCNA have on downstream analysis by TitanCNA. Is the default setting appropriate for my data?
Information for my data:
WXS
depth: 300X
purity: 0.2-0.6
ploidy: 2-4, most samples are 2, about 15% are 3, less than 5% are 4.
(The purity and ploidy are estimated by two other softwares: Sequenza, Facets, and their consistency is good, so I think the estimation should be credible.)
I'm running TitanCNA on a cluster and want to use 20 cores. But our job submission system is different from yours. Our rule is something like:
Run -node_type -num_of_node -num_of_cpus_per_node [command].
So will it work if I set the TitanCNA_numCores in config.yaml to 20 and use the command below?
Run -fat4way -1 -20 snakemake -s TitanCNA.snakefile --cores 20
If I have a SNV with CP calculated by those frequently-used softwares (PyClone, PhyloWGS...) and a CNA with CP calculated by TitanCNA, could I simply compare the CPs to judge their relative order during tumor evolution. For example, CP(SNV) > CP(CNA) means the SNV occurs previous to the CNA? I know it maybe unreasonable because the CP estimated by different softwares might be uncomparable, but I don't know other way to achieve my goal. Do you have some suggestions?

Thank you very much!

Yang

No results, no exception thrown with failed ichorCNA snakemake

Hi, I'm having trouble having the snakemake output results. I'm attempting to run the command "snakemake -s ichorCNA.snakefile -np" and get the following output (see below). I unfortunately can't tell clearly the error from the output, as the output does not explicitly throw an exception or state which process/dependency might be improperly linked to in config.yaml. It would seem that the output of the command would be in the "results" directory, but there is none to be found.

I'm using the following versions of software:
TitanCNA - 1.17.0 (I believe the current version in the Titan snakemake repo)
ichorCNA - 0.1.0
HMMcopy - 1.24.0
optparse - 1.6.0
SNPchip - 2.28.0
stringr - 1.3.1

Python-3.5.2
snakemake - 3.12.0
PySAM-0.15.1
PyYAML-3.13

samtools-1.3.1
HMM_Copy_Utils-1.0

rule read_counter:
input: /home/ubuntu/Sample_PM1043_Z11_1_p2_Case_HALO.md.bam
output: results/readDepth/tumor_sample_1.bin10000.wig
log: logs/readDepth/tumor_sample_1.bin10000.log
jobid: 1
wildcards: samples=tumor_sample_1, binSize=10000
resources: mem=4

/home/ubuntu/hmmcopy_utils/bin/readCounter /home/ubuntu/Sample_PM1043_Z11_1_p2_Case_HALO.md.bam -c 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y -w 10000 -q 20 > results/readDepth/tumor_sample_1.bin10000.wig 2> logs/readDepth/tumor_sample_1.bin10000.log

rule read_counter:
input: /home/ubuntu/Sample_PM1043_EBC4_2_p2_Ctrl_HALO.md.bam
output: results/readDepth/normal_sample_1.bin10000.wig
log: logs/readDepth/normal_sample_1.bin10000.log
jobid: 3
wildcards: samples=normal_sample_1, binSize=10000
resources: mem=4

/home/ubuntu/hmmcopy_utils/bin/readCounter /home/ubuntu/Sample_PM1043_EBC4_2_p2_Ctrl_HALO.md.bam -c 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y -w 10000 -q 20 > results/readDepth/normal_sample_1.bin10000.wig 2> logs/readDepth/normal_sample_1.bin10000.log

rule ichorCNA:
input: results/readDepth/tumor_sample_1.bin10000.wig, results/readDepth/normal_sample_1.bin10000.wig
output: results/ichorCNA/tumor_sample_1/tumor_sample_1.correctedDepth.txt, results/ichorCNA/tumor_sample_1/, results/ichorCNA/tumor_sample_1/tumor_sample_1.cna.seg
log: logs/ichorCNA/tumor_sample_1.log
jobid: 2
wildcards: tumor=tumor_sample_1
resources: mem=4

Rscript /home/ubuntu/ichorCNA/scripts/runIchorCNA.R --libdir /home/ubuntu/ichorCNA/R/ --id tumor_sample_1 --WIG results/readDepth/tumor_sample_1.bin10000.wig --gcWig /home/ubuntu/ichorCNA/inst/extdata/gc_hg19_10kb.wig --mapWig /home/ubuntu/ichorCNA/inst/extdata/map_hg19_10kb.wig --NORMWIG results/readDepth/normal_sample_1.bin10000.wig --ploidy "c(2,3)" --normal "c(0.5)" --maxCN 8 --includeHOMD False --genomeStyle NCBI --chrs "c(1:22, "X")" --estimateNormal True --estimatePloidy True --estimateScPrevalence True --scStates "c(1,3)" --centromere /home/ubuntu/ichorCNA/inst/extdata/GRCh37.p13_centromere_UCSC-gapTable.txt --exons.bed None --txnE 0.9999 --txnStrength 10000 --fracReadsInChrYForMale 0.001 --plotFileType png --plotYLim "c(-2,4)" --outDir results/ichorCNA/tumor_sample_1/ > logs/ichorCNA/tumor_sample_1.log 2> logs/ichorCNA/tumor_sample_1.log

localrule correctDepth:
input: results/ichorCNA/tumor_sample_1/tumor_sample_1.cna.seg, results/ichorCNA/tumor_sample_1/tumor_sample_1.correctedDepth.txt, results/readDepth/tumor_sample_1.bin10000.wig, results/readDepth/normal_sample_1.bin10000.wig
jobid: 0

Job counts:
count jobs
1 correctDepth
1 ichorCNA
2 read_counter
4

Any help would be appreciated! Thank you!

selectSolution script expects files with wrong names

The main TitanCNA script generates file with param in their names for parameters. However, selectSolution looks (using list.files) for files using params as pattern, and therefore does not run to completion because all file lists are empty.

plotClonalFrequency does not plot all clusters

In my case, I had 4 clonal clusters. In the code for the function, clusters expands to

> clusters
   ClonalCluster CellularPrevalence
1:             2          0.7714099
2:             3          0.6370672
3:             1          1.0000000
4:             4          0.5077241

When plotting, the code does

if (nrow(clusters) > 0) {
            for (j in 1:length(clusters[, 1])) {

However:

> 1:length(clusters[, 1])
[1] 1

So only whatever is in the first line (and not necessarily the most abundant cluster) is plotted on the graph. I will make a PR rewriting the part using iteration on the whole data.table (which is small).

Specifying the orientation and size of gene labels in plotCNlogRByChr

Hi Gavin,

I was wondering whether it was possible to specify the size and orientation of the gene labels in the plotCNlogRByChr() function. I am using the geneAnnot parameter to pass in gene labels. It appears the code calls plotGeneAnnotation, and the values seem to be hardcoded.

Thanks,

computeSDbwIndex correct?

In file https://github.com/gavinha/TitanCNA/blob/master/R/utils.R
in function computeSDbwIndex code:

else if (S_Dbw.method == "Tong"){
			###### NOTE #######
			## The authors originally used ((N-ni)/N), but this is incorrect ##
			## We want a weighted-sum ##
			scat.Ci[i] <- ((ni) / N) * (var.Ci/var.D)

Are your sure?

In paper (Youngok Kim and Soowon Lee. A clustering validity assessment Index., 2003,, 602–608):

If two clusters have the same variance but different numbers of tuples, the variance
of cluster with more tuples has more effect on the scatter function than the one with
less tuples. By minimizing the influence of small clusters or noisy clusters, Scat* can
be more robust than Scat of S_Dbw index.

Alexander Lashkov

TitanCNA snakemake pipeline errors

Hi,

I'm currently trying to get the snakemake pipeline up and running, I got to the point where the piepline is attempting to run the rule runTitanCNA:

rule runTitanCNA:
    input: results/titan/tumCounts/tumor_sample_1.tumCounts.txt, results/ichorCNA/tumor_sample_1/tumor_sample_1.correctedDepth.txt
    output: results/titan/hmm/titanCNA_ploidy3/tumor_sample_1_cluster2.params.txt, results/titan/hmm/titanCNA_ploidy3/tumor_sample_1_cluster2.titan.txt, results/titan/hmm/titanCNA_ploidy3/tumor_sample_1_cluster2/, results/titan/hmm/titanCNA_ploidy3/tumor_sample_1_cluster2.segs.txt, results/titan/hmm/titanCNA_ploidy3/tumor_sample_1_cluster2.seg
    log: logs/titan/hmm/titanCNA_ploidy3/tumor_sample_1_cluster2.log
    jobid: 4
    wildcards: clustNum=2, tumor=tumor_sample_1, ploidy=3

The error I get is:

In storage.mode(peek.optstring) = spec[current.flag, col.mode] :
  NAs introduced by coercion
Running TITAN...
titan: Loading data results/titan/tumCounts/tumor_sample_1.tumCounts.txt
titan: Loading GC content and mappability corrected log2 ratios...
titan: Extracting read depth...
Removed 46 centromeric positions
Removed Chrs:
titan: Loading default parameters
Model selection: Using 1 cores.
Error in uniroot(funcP, intervalPhi, tol = 1e-15) :
  f.lower = f(lower) is NA
Calls: runEMclonalCN ... estimateClonalCNParamsMap -> updateParameters -> uniroot
Execution halted

I've noticed there are some NAs in the file results/ichorCNA/tumor_sample_1/tumor_sample_1.correctedDepth.txt. see below. Could this be the problem?

chr	start	end	log2_TNratio_corrected
1	1000001	2000000	NA
1	2000001	3000000	NA
1	4000001	5000000	-0.115543015089316
1	5000001	6000000	-0.0223702614590051
1	6000001	7000000	-0.106587597655531
1	7000001	8000000	0.0909244472332157
1	8000001	9000000	0.141146096230084
1	9000001	10000000	-0.0804800930951752

Any ideas on what's going wrong?
Thanks
Rob

Clarify the term "cellular prevalence" in the docs

If it is explained in the paper, feel free to close this issue. The definition is clear when you have a single clonal cluster, but when there are more, it gets a little counter-intuitive:

If a segment for a specific cluster (say, 4) is listed as a 50% prevalence, does that mean that 50% of the cells of cluster 4 carry the alteration, or is it referring to the total of the tumor cells?
Likewise, selectSolution indicates the cellular prevalence in a syntethic way, in order of larger -> smaller (example: 1, 0.86, 0.68). In this case, what does the prevalence signify? The percentage of general alterations in the specific tumor cluster?

could not find function "extractAlleleReadCounts"

Dear Gavin,

I am trying to run TitanCNA with WGS data generated from a matched tumor/normal pair. After preparing the data and reference using the HHMcopy suite according to the vignette on Bioconductor, I tried to run the function "extractAlleleReadCounts" to generate the allele counts file but I get the error message 'could not find function "extractAlleleReadCounts"'. Do I have to load additional packages? Or do I have to generate the file outside from R as well? I do find the help page for the function but cannot use the function itself. I have installed the package (TitanCNA_1.16.0) using Bioconductor.

Many thanks in advance
Marcel

Segmentation results missing expected gene

Just wondering if you can shed some light on the segmentation results.

I know this sample has an amplification of EGFR, chr7:55086714-55324313
However when I examine the .segs.txt file,

chr7 | 15726066 | 50175681 | 164
chr7 | 51096036 | 53103673 | 4
chr7 | 56059591 | 56174140 | 6

The whole gene seems to have been excluded.

Where do the start and stop come from for these segs?

Why are certain parts of the genome excluded?

TitanCNA error if --libdir is omitted

Using the bioconda release of TitanCNA (https://anaconda.org/bioconda/bioconductor-titancna) I am getting an error if the --libdir parameter is omitted from the command line:

$ Rscript `which titanCNA.R` --id tmp2 --hetFile tmp.hets.tsv --cnFile tmp2.cr.tsv --numClusters 1 --numCores 1 --normal_0 0.5 --ploidy_0 2 --chrs "c(1:22, \"X\")" --estimatePloidy TRUE --outDir ./ 
There were 11 warnings (use warnings() to see them)
Error in if (!is.null(libdir) & libdir != "None") { : 
  argument is of length zero
Execution halted

If I specify --libdir it works fine.

I suppose that

 if (!is.null(libdir) & libdir != "None")

should be changed to

 if (!is.null(libdir) && libdir != "None") { :

EDIT: looks also to be the case in current repo version https://github.com/gavinha/TitanCNA/blob/master/scripts/R_scripts/titanCNA.R#L111

Script input error

Hi,

I've decided to move ahead in hg38 with a smaller bin size of mapability and GC% as the mapability file is much quicker to make at smaller BED intervals. The first TenX script doesn't appear to be pulling in the haplotype.R script. Any advice on what I'm doing wrong?

Rscript getMoleculeCoverage.R
--id TMI40424
-l TitanCNA/scripts/
-d Final_GCandMAP/
-t TMI40424_Tumor.10kb.bxTile.chrX.bed
-n TMI40424_Germline.10kb.bxTile.chrX.bed
--minReadsPerBX=4
--chrs "X"
--outDir TMI40424/
--centromere GRCh38.p13_centromere_UCSC-gapTable.txt

Error:

Loading required package: grid
Error: could not find function "loadBXcountsFromBEDDir"
Execution halted

Best,
Jake

Mappability track

Hi,

Is there any chance that you have an hg38 mappability file fo rthe 10x Whole Genome Sequencing pipeline (map_hg19_10kb.wig)? I can make them via the gem-mappability tool, but a 10kb length is super inefficient in processing time.

selectSolution not selecting the best solution

In a test sample I'm finding that selectSolution is not selecting "what should biologically speaking" be the best solution.

For some reason, it prefers this ploidy=4, clusters=3 solution:

Normal contamination estimate:	0.6033
Average tumour ploidy estimate:	3.869
Clonal cluster cellular prevalence Z=2:	1 0.7426
AllelicRatio binomial means for clonal cluster Z=1:	0.5 0.6237 0.6983 0.5198 0.7483 0.5828 0.784 0.642 0.5284 0.8109 0.6865 0.5622 0.8318 0.7212 0.6106 0.5332 0.8485 0.749 0.6494 0.5498 0.8623 0.7717 0.6811 0.5906 0.5362
logRatio Gaussian means for clonal cluster Z=1:	-1.184 -0.7739 -0.455 -0.455 -0.194 -0.194 0.02699 0.02699 0.02699 0.2186 0.2186 0.2186 0.3877 0.3877 0.3877 0.3877 0.539 0.539 0.539 0.539 0.6759 0.6759 0.6759 0.6759 0.6759
AllelicRatio binomial means for clonal cluster Z=2:	0.5 0.5864 0.6473 0.5147 0.6926 0.5642 0.7276 0.6138 0.5228 0.7554 0.6532 0.5511 0.7781 0.6854 0.5927 0.5278 0.7969 0.7121 0.6272 0.5424 0.8128 0.7346 0.6564 0.5782 0.5313
logRatio Gaussian means for clonal cluster Z=2:	-0.9585 -0.6849 -0.455 -0.455 -0.2568 -0.2568 -0.08252 -0.08252 -0.08252 0.07294 0.07294 0.07294 0.2133 0.2133 0.2133 0.2133 0.3411 0.3411 0.3411 0.3411 0.4586 0.4586 0.4586 0.4586 0.4586
logRatio Gaussian variance:	0.009996 0.009996 0.01096 0.01096 0.009909 0.009909 0.01014 0.01014 0.01014 0.01051 0.01051 0.01051 0.009999 0.009999 0.009999 0.009999 0.01001 0.01001 0.01001 0.01001 0.01007 0.01007 0.01007 0.01007 0.01007
Number of iterations:	5
Log likelihood:	-52150
S_Dbw dens.bw (LogRatio):	0.1997 
S_Dbw scat (LogRatio):	1.0000 
S_Dbw validity index (LogRatio):	1.1997 
S_Dbw dens.bw (AllelicRatio):	0.4919 
S_Dbw scat (AllelicRatio):	1.0000 
S_Dbw validity index (AllelicRatio):	1.4919 
S_Dbw dens.bw (Both):	0.6916 
S_Dbw scat (Both):	2.0000 
S_Dbw validity index (Both):	2.6916

Over a more meaningful ploidy = 2 solution like this one:

Normal contamination estimate:	0.401
Average tumour ploidy estimate:	1.911
Clonal cluster cellular prevalence Z=1:	1
AllelicRatio binomial means for clonal cluster Z=1:	0.5 0.7138 0.7995 0.53 0.8457 0.6152 0.8746 0.6873 0.5375 0.8944 0.7366 0.5789 0.9088 0.7725 0.6363 0.5409 0.9197 0.7998 0.6799 0.56 0.9283 0.8212 0.7142 0.6071 0.5428
logRatio Gaussian means for clonal cluster Z=1:	-1.279 -0.4744 0.03912 0.03912 0.4171 0.4171 0.7163 0.7163 0.7163 0.964 0.964 0.964 1.175 1.175 1.175 1.175 1.36 1.36 1.36 1.36 1.523 1.523 1.523 1.523 1.523
logRatio Gaussian variance:	0.009996 0.01122 0.01062 0.01062 0.01015 0.01015 0.01004 0.01004 0.01004 0.01 0.01 0.01 0.009996 0.009996 0.009996 0.009996 0.009996 0.009996 0.009996 0.009996 0.009996 0.009996 0.009996 0.009996 0.009996
Number of iterations:	5
Log likelihood:	-60370
S_Dbw dens.bw (LogRatio):	0.1482 
S_Dbw scat (LogRatio):	1.0000 
S_Dbw validity index (LogRatio):	1.1482 
S_Dbw dens.bw (AllelicRatio):	0.4143 
S_Dbw scat (AllelicRatio):	1.0000 
S_Dbw validity index (AllelicRatio):	1.4143 
S_Dbw dens.bw (Both):	0.5626 
S_Dbw scat (Both):	2.0000 
S_Dbw validity index (Both):	2.5626

What parameters are used to select the optimal variant and how can we adjust this? LOH of chromosome arms 1p/19q (as in the second, non-selected solution) is a well recognized marker of this cancer type.

UPDATE: I guess it looks at the log-likelihood. Why do you think the first solution was scored higher than the second solution?

A small bug in selectSolution.R

Hi Gavin,
I recently found a little bug to be fixed in selectSolution.R. I run TitanCNA using snakemake. I found some patients disappeared from the final optimalClusterSolution.txt but some patients appeared twice. For example, 07T disappeared but 107T appeared twice. I read the code and found the bug is in this line: phi2Samples <- grep(id, phi2Files, value=T)
If my id is 07T, then grep will catch files of both 07T and 107T, and compare them together. You can imagine that in my case, 107T is always the winner in all conditions, so 07T disappeared from the optimalClusterSolution.txt and 107T appeared twice.
I change this line to:
phi2Samples <- grep(paste('/', id, sep = ''), phi2Files, value=T)
to make sure the id is in the beginning of the filename. It solves my problem. I'm not good at coding and I think you should have a better way to fix this bug in next version of TitanCNA.
Thanks again for making such a convenient pipeline!

Yang

Error in cytoband[, "chrom"] : incorrect number of dimensions

I am trying to run snakemake version of pipeline using human_g1k_v37.fasta genome using

snakemake -s TitanCNA.snakefile --cores 5

The run was Okay till this command is executed:

Rscript /home/a/asna2/TitanCNA/scripts/R_scripts/titanCNA.R --hetFile results/titan/tumCounts/tumor_sample_1.tumCounts.txt --cnFile results/ichorCNA/tumor_sample_1/tumor_sample_1.correctedDepth.txt --outFile results/titan/hmm/titanCNA_ploidy3/tumor_sample_1_cluster1.titan.txt --outSeg results/titan/hmm/titanCNA_ploidy3/tumor_sample_1_cluster1.segs.txt --outParam results/titan/hmm/titanCNA_ploidy3/tumor_sample_1_cluster1.params.txt --outIGV results/titan/hmm/titanCNA_ploidy3/tumor_sample_1_cluster1.seg --outPlotDir results/titan/hmm/titanCNA_ploidy3/tumor_sample_1_cluster1/ --libdir /home/a/asna2/TitanCNA/ --id tumor_sample_1 --numClusters 1 --numCores 1 --normal_0 0.5 --ploidy_0 3 --genomeStyle NCBI --genomeBuild hg19 --cytobandFile None --chrs "c(1:22, "X")" --estimateNormal map --estimatePloidy True --estimateClonality True --centromere /home/a/asna2/miniconda3/pkgs/r-ichorcna-0.1.0.20180710-r341_0/lib/R/library/ichorCNA/extdata/GRCh37.p13_centromere_UCSC-gapTable.txt --alphaK 10000 --txnExpLen 1e15 --plotYlim "c(-2,4)" > logs/titan/hmm/titanCNA_ploidy3/tumor_sample_1_cluster1.log 2> logs/titan/hmm/titanCNA_ploidy3/tumor_sample_1_cluster1.log

returned non-zero exit status 1.
File "/home/a/asna2/TitanCNA/scripts/snakemake/TitanCNA.snakefile", line 62, in __rule_runTitanCNA
File "/home/a/asna2/miniconda3/lib/python3.6/concurrent/futures/thread.py", line 56, in run

The tumor_sample_1_cluster1.log file showed

$dens.bw
[1] 0.379763

$scat
[1] 0.08379496

$S_Dbw
[1] 0.463558

nd in "RSQLite" for requests: dbGetQuery
Running TITAN...
titan: Loading data results/titan/tumCounts/tumor_sample_1.tumCounts.txt
titan: Loading GC content and mappability corrected log2 ratios...
titan: Extracting read depth...
Removed 79 centromeric positions
Removed Chrs:
titan: Loading default parameters
titan: Using 1 cores.
titan: Parameter estimation
Optimal state path computation: Using 1 cores.
Writing results to results/titan/hmm/titanCNA_ploidy2/tumor_sample_1_cluster1.titan.txt, results/titan/hmm/titanCNA_ploidy2/tumor_sample_1_cluster1.segs.txt, results/titan/hmm/titanCNA_ploidy2/tumor_sample_1_cluster1.params.txt
titan: Saving parameters to results/titan/hmm/titanCNA_ploidy2/tumor_sample_1_cluster1.params.txt
Error in cytoband[, "chrom"] : incorrect number of dimensions
Calls: plotIdiogram
In addition: Warning messages:
1: In .replace_seqlevels_style(x_seqlevels, value) :
found more than one best sequence renaming map compatible with seqname style "NCBI" for this object, using the first one
2: In .replace_seqlevels_style(x_seqlevels, value) :
found more than one best sequence renaming map compatible with seqname style "NCBI" for this object, using the first one
3: In .replace_seqlevels_style(x_seqlevels, value) :
found more than one best sequence renaming map compatible with seqname style "NCBI" for this object, using the first one
4: In .replace_seqlevels_style(x_seqlevels, value) :
found more than one best sequence renaming map compatible with seqname style "NCBI" for this object, using the first one
5: In .replace_seqlevels_style(x_seqlevels, value) :
found more than one best sequence renaming map compatible with seqname style "NCBI" for this object, using the first one
Execution halted

I am not sure about what causing the problem but I showed here the the reference settings that have used

reference settings and paths to reference files

genomeBuild: hg19
genomeStyle: NCBI
refFasta: /home/a/asna2/Scrach/WES/human_g1k_v37.fasta
snpVCF: /home/a/asna2/Scrach/WES/hapmap_3.3.b37.vcf.gz
ichorCNA_exons: NULL
cytobandFile: None
centromere: /home/a/asna2/miniconda3/pkgs/r-ichorcna-0.1.0.20180710-r341_0/lib/R/library/ichorCNA/extdata/GRCh37.p13_centromere_UCSC-gapTable.txt
sex: females # use None if both females and males are in sample set

The TitanCNV parameters are

use this for NCBI chr naming

chrs: 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y

TitanCNA params

TitanCNA_maxNumClonalClusters: 5
TitanCNA_chrs: c(1:22, "X")
TitanCNA_normalInit: 0.5
TitanCNA_maxPloidy: 3
TitanCNA_estimateNormal: map
TitanCNA_estimatePloidy: TRUE
TitanCNA_estimateClonality: TRUE
TitanCNA_alleleModel: binomial
TitanCNA_alphaK: 10000
TitanCNA_alphaR: 10000
TitanCNA_txnExpLen: 1e15
TitanCNA_plotYlim: c(-2,4)
TitanCNA_solutionThreshold: 0.05
TitanCNA_numCores: 1
TitanCNA_mem: 16G
TitanCNA_runtime: "300:00:00"
TitanCNA_pe: -pe smp 1 -binding linear:1

Any assistance is appreciated.

Input format confusion

Hello,

I'm trying to run TitanCNA on some WGS samples. The input data format as per the vignette is

The format of the file must be 6 columns: chromosome, position, reference base, reference read counts, non-reference base, non-reference read counts.

So I've constructed a data frame that looks like

> head(data)
# A tibble: 6 x 6
  chr    posn ref   refOriginal nonRef tumDepth
  <chr> <int> <chr>       <int> <chr>     <int>
1 1     10109 A             172 T            59
2 1     10255 A              36 T            10
3 1     10321 C               8 T             8
4 1     12383 G              11 A            19
5 1     13302 C              14 T             8
6 1     14699 C              25 G             7

Using the naming convention in the vignette. The documentation under ?loadAlleleCounts also states

The reference and non-reference base columns can be any arbitrary character; it is not used by TitanCNA.

However, when I run this I get the error

Error in data$ref/data$tumDepth : non-numeric argument to binary operator

which traceback() shows is caused by

 cor(data$ref/data$tumDepth, data$logR, use = "pairwise.complete.obs")

ie TitanCNA is trying to take the correlation of ref which according to the documentation is the "reference base" and isn't used by TitanCNA.

Any help with where I'm going wrong with this analysis much appreciated.

Thanks

Kieran

Snakemake scripts not compatible with UCSC chromosome naming

Snakemake files are not ready for UCSC chromosome naming. Simple change of CHRS is not enough to succesful execution.

Invalid 'chr' reference

TitanCNA/R/plotting.R

Line 176 in 49081fb

mtext(side = 4, at = prevalence, text = paste("Z",

Parameter Setting: maxPloidy

Would you please suggest how to set some parameters?

I am working on 26 samples from 13 ovarian cancer patients but not sure if my choice of parameters was good enough. Currently, I set the maxPloidy at 4 but I found that 9 out of 26 samples were estimated to have Ploidy at 4. Do I need to increase the max Ploidy greater?

To break my questions down further,

Do I need to change max Ploidy higher?
How Ploidy and N of Cluster are related in your algorithm?
Would you mind to point a place, which section/page of a paper, if you already described in your publication?
Are there any other parameters need to be updated?
About model anomalies.
I have three samples, whose purity is less than 0.3.
Also, I got two samples converge to log-lik = - Inf.

What are your suggestions on these anomalies? Do I need to dump them?

Lastly, I am using "snakemake" as you suggested before and it's fantastic!
Thank you again your answers in advance (please!) and also for developing and maintaining this wonderful application with us!

Bests,
Lijin

runTitanCNA and selectSolution error

Hi,
I am trying to run titanCNA on matched tumor-normal WES samples. I am following snakemake workflow but I am also running it inside of a HPC cloud. I downloaded both titanCNA and ichorCNA.
-My first question is does the parameters of both config files (titan and ichor) need to be matched? I know titan snakefile invokes the others but in titan config file there is still a path for ichor script and libdir so I thought it migth affect.
-When I changed the binsize from 10.000 to 100.000 or 1.000.000 the workflow gives error on earlier stages of the jobs. When I keep it at 10.00 I still get an error at runTitanCNA which is:

Error in job runTitanCNA while creating output files results/titan/hmm/titanCNA_ploidy2/RTF8_D_cluster2/, results/titan/hmm/titanCNA_ploidy2/RTF8_D_cluster2.titan.txt, results/titan/hmm/titanCNA_ploidy2/RTF8_D_cluster2.params.txt, results/titan/hmm/titanCNA_ploidy2/RTF8_D_cluster2.segs.txt, results/titan/hmm/titanCNA_ploidy2/RTF8_D_cluster2.seg.
RuleException:
CalledProcessError in line 53 of /mnt/lustre/sarihae/HNSCC_WES/titanCNA/scripts/snakemake/TitanCNA.snakefile: returned non-zero exit status 1.
  File "/mnt/lustre/sarihae/HNSCC_WES/titanCNA/scripts/snakemake/TitanCNA.snakefile", line 53, in __rule_runTitanCNA
  File "/cm/shared/apps/python/3.6.2/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Removing output files of failed job runTitanCNA since they might be corrupted:
results/titan/hmm/titanCNA_ploidy2/RTF8_D_cluster2/
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message

I looked at the config file and changed libdirs from path/to/titanCNA/R/ path/to/ichorCNA-master/R/ to path/to/titanCNA/ path/to/ichorCNA-master/ and now I get an error at selectSolution which looks like:

Error in job selectSolution while creating output file results/titan/hmm/optimalClusterSolution.txt.
RuleException:
CalledProcessError in line 70 of /mnt/lustre/sarihae/HNSCC_WES/titanCNA/scripts/snakemake/TitanCNA.snakefile: returned non-zero exit status 2.
  File "/mnt/lustre/sarihae/HNSCC_WES/titanCNA/scripts/snakemake/TitanCNA.snakefile", line 70, in __rule_selectSolution
  File "/cm/shared/apps/python/3.6.2/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message

What would you think it's causing my problem, is it related to input parameters, or sth else? I'd really appreciate any kind of help and guidance cause I am a little bit confused with this workflow.
Thanks in advance,
Irem

"grep" in selectSolution can find the wrong files

I had some samples called foo_CAP and foo_CAP-DEEP.

The code in selectSolution.R does a grep on the file list (also valid for ploidy of 3 and 4):

phi2Samples <- grep(id, phi2Files, value=T)

However in this case it will match both files and cause inconsistent results to be written. As the pattern in default cases is id plus an underscore, a solution like this may work:

phi2Samples <- grep(paste0(id, "_"), phi2Files, value=T)

Exact string matching won't help as it will still catch the shorter string contained in the long one.

Titan reporting very short CNVs

I'm running PhyloWGS with Titan and I'm getting an error that says I have CNVs with the same start and end locations. When I look as the segs.txt file for that sample, I see that there are indeed several CNVs with very short lengths. This seems strange to me, can anyone suggest a reason this might be happening?

Error in runEMclonalCN with low coverage bam files

Thanks for releasing this great software!
I've been running TitanCNA on bams with coverage 1x, 5x, 10x, and 20x.

There's consistently an error for the low coverage bam files of 1-5x coverage. As an example, here is the output I'm seeing for the low coverage bam files:

Running TITAN...
titan: Loading data input.hets.txt
titan: Loading GC content and mappability corrected log2 ratios...
titan: Extracting read depth...
Removed 0 centromeric positions
titan: Loading default parameters
Loading required package: iterators
Using 4 cores.
titan: Running HMM...
fwdBack: Iteration 1 chr: 1 2 Error in { : 
  task 1 failed - "INTEGER() can only be applied to a 'integer', not a 'NULL'"
Calls: runEMclonalCN -> %dopar% -> <Anonymous>
Execution halted

TITAN works perfectly for the 20x and 40x coverage files though:

Running TITAN...
titan: Loading data input.hets.txt
titan: Loading GC content and mappability corrected log2 ratios...
titan: Extracting read depth...
Removed 0 centromeric positions
titan: Loading default parameters
Loading required package: iterators
Using 4 cores.
titan: Running HMM...
fwdBack: Iteration 1 chr: 1 2 3 4 8 7 6 5 12 11 9 10 16 15 13 14 20 19 23 18 17 21 22 
Coord Descent[1]=-561231 n=0.5 s=[0.001,0.200,0.300] phi=4
Coord Descent[2]=-274262 n=0.2084 s=[0.051,0.204,0.344] phi=4.016
Coord Descent[3]=-272542 n=0.2943 s=[0.05,0.20,0.34] phi=4.014
Coord Descent[4]=-272522 n=0.2948 s=[0.052,0.196,0.341] phi=4.014
Coord Descent[5]=-272522 n=0.2946 s=[0.052,0.196,0.341] phi=4.014
...

Is it possible that runEMclonalCN is getting stuck at gaps in the coverage? Have you noticed there's a "minimal coverage" necessary for TITAN to work properly?

choosing maxCN value

Hi Gavin,

I had a general question about TitanCNA, I saw that there is a maxCN value under 'ichorCNA_maxCN' and this has a default value of 8. For the tumor sample that I'm working with, I know that the CN's can be up to 100 but I am not sure what the maxCN value would be. I saw that the newest version of TitanCNA has the function correctIntegerCN(). Does the new function correctIntegerCN() take care of this issue of not knowing what your maxNC should be and does it simply replace the maxCN when it is higher than inputed value?

I also read that this new function re-calculates the CN by "correcting log ratio based on purity and ploidy and then convert to decimal CN value" and so are the purity and ploidy also values that I should be specifying myself (since the CN value is calculated off of that) or are these also estimated?

Thanks,
Paulina

correctReadDepth requires S4Vectors but it is not imported

As subject says.

TITAN and phyloWGS (getting majorCN and minorCN)

Hi Gavin,

I was able to successfully run TITAN on my BAM files, however, I am trying to run the CNV information through phyloWGS (which says that it is compatible with titan's cnv file format), but it says that my input cnv file should include the following fields:
chromosome
Start_Position
End_Position
MajorCN
MinorCN

I saw in the documentation of the R package that by doing

segs <- outputTitanSegments(results, id = "test", convergeParams, filename = NULL, igvfilename = NULL)
head(segs)

you can get a output with the following sections:

Sample: Name of sample
Chromosome, Start_Position.bp., End_Position.bp.: Coordinates of segment 3. Length.snps.: Number of SNPs in the segment
Median_Ratio: Median allelic ratio across SNPs in the segment
Median_logR: Median log ratio across SNPs in the segment
TITAN_state, TITAN_call, Copy_Number: Same as defined above
MinorCN: Copy number of minor allele
MajorCN: Copy number of major allele
Clonal_Cluster, Cellular_Frequency: Same as defined above

From running the snakemake I found some output files that had the Chr, start, end and copy.number (specifically my tumor_sample_1.seg file in the results/ichorCNA/sample1 folder) however, I can't seem to find them in any of my output files with MinorCN or MajorCN. I've never used R before and was wondering if this was the only way to get this output or if I can get it from the snakemake output?

Thanks,
Paulina

cytobandFile error

Hi,
I was able to run TitanCNA succesfully before. Now when I try to run with hg19 aligned samples, it gives error at rule runTitanCNA step;

Error in job runTitanCNA while creating output files ... RuleException: CalledProcessError in line 54 of /TitanCNAmaster/scripts/snakemake/TitanCNA.snakefile:

and the log files say;
Error in fread(cytobandFile) : object 'cytobandFile' not found

It says cytobandFile is only needed if ref.genome is hg38. I don't know how to get around this error. Any help would be much appreciated. Thanks!

Irem

	## exclude chrX if gender==male ##
	if (gender == "male" \|\| gender == "Male" \|\| gender == "MALE"){
	chrs <- chrs[chrs!=grep("X", chrs, value=TRUE)]
	}

gavinha / titancna Goto Github PK

titancna's People

Contributors

Stargazers

Watchers

Forkers

titancna's Issues

reference settings and paths to reference files

use this for NCBI chr naming

TitanCNA params

Recommend Projects

Recommend Topics

Recommend Org