Giter Site home page Giter Site logo

facets-suite's Introduction

facetsSuite

lifecycle Travis build status Coverage status

See the release notes for information on the new facet-suite version. Backwards compatibility is currently limited, as documented here.

facetsSuite is an R package with functions to run FACETS—an allele-specific copy-number caller for paired tumor-normal DNA-sequencing data from genome-wide and targeted assays. facetSuite both wraps the code to execute the FACETS algorithm itself as well as performs post-hoc analyses on the resulting data. This package was developed by members of the Taylor lab and the Computational Sciences group within the Center for Molecular Oncology at Memorial Sloan Kettering Cancer Center.

Installation

You can install facetsSuite in R from this repository with:

devtools::install_github("mskcc/facets-suite")

Also follow the instructions for installing FACETS.

Note: For the wrapper script snp-pileup-wrapper.R you need to specify the variable snp_pileup_path in the script to point to the installation path of snp-pileup or set the environment variable SNP_PILEUP. Alternatively, the docker image contains the executable.

Usage

R functions

The R functions in this package are documented and their description and usage is available in R by doing:

?facetsSuite::function_name

Central to most functionality in the package is the output from the run_facets, which runs the FACETS algorithm based on provided tumor-normal SNP pileup (i.e. genotyping). The output is a list object with the following named objects:

  • snps: SNPs used for copy-number segmentation, where het==1 indicates heterozygous loci.
  • segs: Inferred copy-number segmentation. – purity: Inferred sample purity, i.e. fraction of tumor cells of the total cellular population.
  • ploidy: Inferred sample ploidy.
  • diplogr: Inferred dipLogR, the sample-specific baseline corresponding to the diploid state.
  • alballogr: Alternative dipLogR value(s) at which a balanced solution was found.
  • flags: Warning flags from the naïve segmentation algorithm.
  • em_flags: Warning flags from the expectation-maximization segmentation algorithm.
  • loglik: Log-likelihood value of the fitted model.

Note that FACETS performs segmentation with two algorithms, the "naïve" base method and an expectation-maximization algorithm. The latter (columns suffixed .em) is used as a default for most of the functions in this package.

Wrapper scripts

Most use of this package can be done from the command line using three wrapper scripts:

  • snp-pileup-wrapper.R:
    This wraps the snp-pileup C++ script that genotypes sites across the genome in both normal and tumor samples. The output from this is the input to FACETS. Most default input arguments are appropriate regardless of usage, but --max-depth may need adjustment depending on the overall depth of the samples used.
    Example command:

    snp-pileup-wrapper.R \
        --snp-pileup-path <path to snp-pileup executable> \
        --vcf-file <path to SNP VCF> \
        --normal-bam normal.bam \
        --tumor-bam tumor.bam \
        --output-prefix <prefix for output file, preferrably tumorSample__normalSample>

    The input VCF file should contain polymorphic SNPs, so that FACETS can infer changes in allelic configuration at genomic loci from changes in allele ratios. dbSNP is a good source for this. By default, snp-pileup also estimates the read depth in the input BAM files every 50th base.

  • run-facets-wrapper.R:
    This wrapper takes above SNP "pileup" as input and executes the FACETS algorithm. The ouputs are in the form of Rdata objects, TXT files, and PNGs of the samples overall copy-number profile. The wrapper allows for running FACETS in a two-pass mode, where first a "purity" run estimates the overall segmentation profile, sample purity and ploidy, and subsequently the dipLogR value from this run seeds a "high-sensitivity" run which may detect more focal events. To run in the two-pass mode, specify the input arguments prefixed by purity. The cval (--purity-cval and --cval) parameters tune the segmentation coarseness.
    Example command:

    run-facets-wrapper.R \
        --counts-file tumor_normal.snp_pileup.gz \
        --sample-id tumorID__normalID \
        --purity-cval 1000 --cval 500 \
        --everything

    The above command runs FACETS in the two-pass mode, first at cval 1000, then at cval 500 based on the sample-specific baseline found at the higher cval. The full suite of analysis and QC is run with the --everything flag. If no output directory is specified, a directory named sample-id is created.

  • annotate-maf-wrapper.R:
    This script estimates the cancer-cell fractions (CCFs) of somatic mutations using purity and ploidy estimates from FACETS. It requires a input MAF file and a mapping of sample names in the MAF file (column Tumor_Sample_Barcode) to FACETS output RDS files (i.e. file paths). Alternatively, it can be run in a single-sample mode by pointing direct to the RDS and providing a MAF file with only mutation calls for the given sample.
    Example command:

    annotate-maf-wrapper.R \
        --maf-file somatic_mutations.maf
        --facets-output <path to facets_output.rds>

    Or run with a mapping file as input (--sample-mapping), in the following format:

    > cat sample_map.txt
    sample      file
    SampleA     SampleA_facets.rds
    SampleB     SampleB_facets.rds
    ...         ...

All three wrappers use argparse for argument handling and can thus be run with --help to see the all input arguments.

Run wrappers from container

In order to run the containerized versions of the wrapper scripts, first pull the docker image:

## Docker
docker pull philipjonsson/facets-suite:dev

## Singularity
singularity pull --name facets-suite-dev.img docker://philipjonsson/facets-suite:dev

Then run either of the scripts as such:

## Docker
docker run -it -v $PWD:/work philipjonsson/facets-suite:dev run-facets-wrapper.R \
    --counts-file work/SampleA.snp_pileup.gz \
    --sample-id SampleA \
    --directory work

## Singularity
singularity run facets-suite-dev.img run-facets-wrapper.R \
    --counts-file SampleA.snp_pileup.gz \
    --sample-id SampleA

For Docker, note the binding (-v) of the current directory on the host to the directory named work inside the container. This is required for the input file, in the current directory, to be accessible inside the container. This, in its turn requires the output to be written to work inside the container so that it is available on the host once the script has executed. Singularity always mounts the directory from which it is being executed.

The image contains the snp-pileup executable used by snp-pileup-wrapper.R, so it can be run without specifying its path. Example for Singularity:

singularity run -B <path to BAMs> -B <path to VCF> facets-suite-dev.img snp-pileup-wrapper.R \
    --vcf-file <path to VCF>/dbsnp.vcf \
    --normal-bam <path to BAMs>/NormalA.bam \
    --tumor-bam <path to BAMs>/TumorA.bam \
    --output-prefix TumorA__NormalA

Note: The binding of full paths to any files outside of the run directory is necessary.

facets-suite's People

Contributors

arichards2564 avatar asntech avatar cband avatar ckandoth avatar evanbiederstedt avatar gongyixiao avatar kpjonsson avatar majoromask avatar price0416 avatar stevekm avatar tischfis avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

facets-suite's Issues

CNCF algorithm output in Facets R version >= 0.5.2

Egregious bug in the EM algorithm in the newer versions (>=0.5.2) of the Facets R package. The dev_0.5.2 branch now incorporates the cncf estimates of cf, tcn, lcn in the output and plots this data as well.

Meaning of the filter column in focal copy number output

Hi,

I am using FACETS Suite to call somatic copy number changes from whole exome seq data.

I combined all the gene_level.txt files into one for all samples to summarise the copy state for each gene across my cohort and subsequently calculate a % frequency of amps, deletions etc.

Before doing such analysis, I see there is a "filter" column in the gene_level.txt files.

What is the meaning of these filter flags? I presume i should proceed with downstream analysis using the PASS and RESCUE flags?

table(combined.genelevel$filter)

# PASS      RESCUE    suppress_large_homdel suppress_likely_unfocal_large_gain 
# 1309276      232                    13951                              51271 
# suppress_segment_too_large 
# 12644

Columns t_alt_count and t_depth required

Dear all,
I get this error with

ccf_annotate_maf(mafFile).
"Columns t_alt_count and t_depth required"

When I import the maf files to maftools, I can see both of them under laml@data.
Would you please help me with this issue.
Kind Regards,
Rahel

Can I run the facets-suite wrappers with unmatched normals?

Hello,

I am working with WGS data from several cell lines, using a "normal" cell line as a common normal across the cancer cell lines.

Looking through the documentation for the preProcSample function, I see that we can set normals to be unmatched.

Is there any way for me to specify this in the facets-suite through the wrapper commands?

Thank you so much!

Segments with tcn.em=2 and lcn.em=NA?

Hi all,
First of all, thank you for this tool, we found it very useful for our samples! I had a interpretation question from the facets output columns tcn.em and lcn.em: under which conditions the output can give for one individual segment tcn.em=2 and lcn.em=NA? If the caller was not able to determined the lowest CN, does it mean that, having tcn.em=2 and lcn.em=NA corresponds to either neutral LOH or no CN change (since if tcn.em=2 lcn.em can be either equal to 0 or 1) and, consequently, do you confirm that it would be safer to reject them from the list of segments with CN change since they are potentially normal and since we don't want to enrich for CNV false positive?

Very sorry if you've already answered to these questions before and I missed it.

Thank you again,
Lise

gene_level_changes highest alteration

In gene_level_changes.R

seg = seg[which.max(start-end)], # this selects the segment which represents the larger region

it would be nice to add an option to select the segment with the highest absolute log2(cn). The rational being that sometimes a particular exon is deleted but not the whole gene. So by using the larger region, it might mask the true deletion.

add tumor_id and normal_id for run-facets-wrapper.R

The run-facets-wrapper.R script requires tumor-normal output from snp-pileup. However, the script does not seem to offer any way to record the ID for the tumor and normal samples. It has an arg for --sample-id which says "preferrable Tumor_Normal to keep track of the normal used". But this does not work when your sample ID's contain _ characters. So, based on the current sample_id value that is used, there's no actual way to distinguish between the original tumor_id and the normal_id used.

It looks like there are several places where the sample_id is added to output files with this function;

add_column(sample = sample_id, .before = 1)

Can we just add some extra command line args for tumor_id and normal_id and use the same function to add those to the output as well?

Example;

...
parser$add_argument('--tumor-id', required = FALSE,
                    help = 'Tumor ID, to keep track of tumor sample used')
parser$add_argument('--normal-id', required = FALSE,
                    help = 'Normal ID, to keep track of the normal sample used')

...
add_column(tumor = tumor_id, .before = 1)
add_column(normal = normal_id, .before = 1)

Plotting Issues

I've run into several odd plots from facets-suite. The main issues seem to be the cf plots which seem like they might be stretched so that the plots are no longer aligned. I've also seen where the lines are not aligned with actual allele numbers (e.g. the segment should be 6,3 and the cncf plot shows more like 5.5,3). Lastly, the first log ratio plot seems to be plotting oddly. It seems to be setting the diplogR to 0 (even when the diplogR is < -1) and shifting the whole plot up. When you look at the segmentation text file, it seems OK.

FACETS QC

we should check for mcn < lcn as well as negative, NA etc

Error while running run-facets-wrapper.R

I am trying to run run-facets-wrapper.R, using this command -

singularity exec facets-suite-dev.img ./run-facets-wrapper.R --counts-file XP490_V4756-XP490_S4763.snp_pileup.gz --sample-id XP490_V4756-XP490_S4763 --purity-cval 1000 --cval 500 --everything --genome hg38 --normal-depth 25 --facets-lib-path /home2/s204755/R/x86_64-pc-linux-gnu-library/3.6/ --directory ./work

And I am getting this error -

WARNING: ignoring environment value of R_HOME
Reading XP490_V4756-XP490_S4763.snp_pileup.gz
Writing to ./work
Loading required package: pctGCdata
[1] "loaded facets version: 0.6.2"
Error: Discrete value supplied to continuous scale
In addition: Warning message:
In max(mafR.clust[seg$chrom < nX & seg$nhet > min.nhet], na.rm = TRUE) :
no non-missing arguments to max; returning -Inf
Execution halted

Could you please help me resolve this error?

Thanks,

Akhilesh,
UT Southwestern Medical Center
Texas

Unable to library facetsPreview because Package ‘glue’ version 1.6.2 cannot be unloaded

library(facetsPreview)
Loading required package: glue
Error in value[3L] :
Package ‘glue’ version 1.6.2 cannot be unloaded:
Error in unloadNamespace(package) : namespace ‘glue’ is imported by ‘ggplot2’, ‘gtable’, ‘tidyr’, ‘pillar’, ‘dplyr’, ‘stringr’, ‘scales’, ‘tidyselect’, ‘usethis’ so cannot be unloaded

I cannot library facetsPreview even though I have installed it because it requires to unload a package that cannot be unloaded.

Use of t_depth vs. t_alt_count + t_ref_count

Hi,

There is a minor error here: https://github.com/mskcc/facets-suite/blob/master/R/ccf-annotate-maf.R#L29. The check is for the presence of t_alt_count and t_ref_count columns, but the error message is "Columns t_alt_count and t_depth required."

Also, in the same module, I think it will be useful to add an optional argument to enable using t_depth values present in the maf file directly for calculating t_var_freq, instead of calculating t_depth as t_alt_count + t_ref_count. This is because:
1.) In multi-allelic sites, calculating t_depth as t_alt_count + t_ref_count might be incorrect.
2.) Although rare, in some input maf files, only the t_depth and t_alt_count columns are populated. This causes NA to be coerced for all columns related to ccf.

I made the update above for my usage. If this of any interest to you, I'm happy to do a PR.

Singularity image - Error when output folder exists

Hi,
singularity run facets-suite-dev.img run-facets-wrapper.R is exiting with exception that the output folder (-D arg) already exists. I am forced to delete the folder to re-run the program. I don't know why others have not raised this issue. But could you please fix this in the image?

Thanks.

Aparna

Discarding High Amps

I'm not sure if this is the correct place, but I think it is. It looks like any AMP with TCN > 8 is being termed "INDETERMINATE" because that TCN doesn't exist in the call table. There should be a line saying that if TCN > 8, then we call it an AMP.

genes_all[, cn_state := mapvalues(paste(wgd, tcn-lcn, lcn, sep = ':'),

Issues with geneLevel module

There appears to be several issues with geneLevel module in facets-suite

  • Does not take multiple CNCF files:
facets-suite/facets geneLevel \
    -o out \
    -f Proj_05927_C__A102_14__A102_29_hisens.cncf.txt \     
       Proj_05927_C__A102_22__A102_29_hisens.cncf.txt

gives:

Error in FUN("Proj_05927_C__A102_14__A102_29_hisens.cncf.txt,Proj_05927_C__A102_22__A102_29_hisens.cncf.txt"[[1L]],  : 
  File 'Proj_05927_C__A102_14__A102_29_hisens.cncf.txt,Proj_05927_C__A102_22__A102_29_hisens.cncf.txt' does not exist. Include one or more spaces to consider the input a system command.
Calls: get_gene_level_calls -> lapply -> FUN
Execution halted
  • First line seems to be some sort of NULL line:
Tumor_Sample_Barcode    Hugo_Symbol tcn lcm chr  ...
NA  CRLF2   NA  NA  X  ...

fread() issue for pileup

The current master branch does the following to read in the *.snp_pileup.dat.gz file

fread(sprintf('gunzip -c %s', pileup))
https://github.com/mskcc/facets-suite/blob/master/doFacets.R#L24

There's really no reason to do this. Just install R.utils as a package dependency and do fread(pileup)

I've implemented this here:
https://github.com/mskcc/facets-suite/blob/feature/just_do_fread/R/doFacets.R#L25-L31
https://github.com/mskcc/facets-suite/blob/feature/just_do_fread/NAMESPACE#L8

Reasons:
--- performance somewhat
--- remove gunzip/sprintf()
--- street cred; data.table::fread() can read in compressed files

Incorrect AMP filtering

genes_all[cn_state %like% '^AMP' & length > max_seg_length & (tcn > 8 | genes_on_seg > max_gene_count) , filter := add_tag(filter, 'unfocal_amp')]

This line and the next are incorrect. The filters for cf and number of genes on segment are only being applied when the amp is already too big. These are 3 separate filters and should be applied as such: 1) bp size, 2) cf too low, 3) # genes on segment is too large.
Also, tcn > 8 is good, now bad so I'm not sure why that's being flagged.

I think it needs to be re-written to look like the previous version. This one tries to separate the types of failing CNAs but because we only need 1 of the 3 factors to "PASS" in order to pass the CNA, it doesn't really work.

facets-suite/geneLevel.R

Lines 327 to 338 in 97b1e7e

genelevelcalls0 = genelevelcalls0 %>%
mutate(
# CCS filter flag amplifications and homdels at given thresholds
ccs_filter = case_when(
FACETS_CALL.em %in% c("AMP","AMP (LOH)","AMP (BALANCED)") &
!(seg.len < max_seg_length & (tcn.em > 8 | count <= 10 | ( !is.na(purity) & cf.em > CFcut ))) ~ 'FAIL',
FACETS_CALL.em == "HOMDEL" & !(seg.len < max_seg_length & count <= 10) ~ 'FAIL',
TRUE ~ 'PASS'),
# Flag for review certain homdels
review = ifelse(ccs_filter == 'FAIL' & FACETS_CALL.em == "HOMDEL" &
Hugo_Symbol %in% unique(oncokb_tsg$hugoSymbol) & seg.len < 25000000, 'rescue', NA_character_)
)

Error in ifelse(wgd, round(ploidy), 2) : object 'ploidy' not found

Hi,

I am running facets-suite.v2.0.8 with "run-facets-wrapper.R --counts-file normal-VS-tumor.FACETS.pileup.gz --sample-id normal-VS-tumor --directory facets-suite --everything --genome hg38 --purity-cval 2000 --cval 1000", but some sampels were failed.

n = 626733 > max_n{n in table} = 72000 -- using that as asymptotic value.
n = 422038 > max_n{n in table} = 72000 -- using that as asymptotic value.
n = 217270 > max_n{n in table} = 72000 -- using that as asymptotic value.
n = 658014 > max_n{n in table} = 72000 -- using that as asymptotic value.
n = 422180 > max_n{n in table} = 72000 -- using that as asymptotic value.
n = 217309 > max_n{n in table} = 72000 -- using that as asymptotic value.
Error in ifelse(wgd, round(ploidy), 2) : object 'ploidy' not found
Calls: %>% -> eval -> eval -> gene_level_changes -> ifelse
Execution halted

facets gene level calls should not use file name as tumor barcode?

This makes the tests very fragile and people probably don't care about the whole path in the output:

head /ifs/res/pwg/tests/cmo_testdata/facets/expected_outputs/facets _gene_level_calls.txt
Tumor_Sample_Barcode Hugo_Symbol tcn lcn chr seg.start seg.end frac_elev_major_cn Nprobes WGDm cn FACETS_CNA FACETS_CALL
NA WT1 NA NA 11 NA NA NA 1 NA NA NA NA
/ifs/res/pwg/tests/cmo_testdata/facets/facets_test_with_seed/H_LS-A8-A094-01A-11W-A019-09-1__H_LS-A8-A094-10A-01W-A021-09-1. cncf.txt ABL1 7 3 9 127115800 133914570 0.944862280314182 12 WGD 4 2 AMP

Please update docker image

Recently I used this pipeline in docker. It's very convenient to generate downstream outputs, thanks!
However, I encountered some issues during the analysis.

  1. Error when running snp-pileup binary
docker run -v $PWD:/work philipjonsson/facets-suite:dev /usr/bin/snp-pileup \
    --count-orphans \
    --gzip \
    --pseudo-snps 50 \
    --max-depth 4000 \
    /work/common.sample.vcf.gz \
    /work/TCRBOA6.csv.gz \
    /work/TCRBOA6-N-WEX.sample.bam \
    /work/TCRBOA6-T-WEX.sample.bam
*** Error in `/usr/bin/snp-pileup': double free or corruption (out): 0x000055a0042a45e0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7ff3c0f52bfb]
/lib/x86_64-linux-gnu/libc.so.6(+0x76fc6)[0x7ff3c0f58fc6]
/lib/x86_64-linux-gnu/libc.so.6(+0x7780e)[0x7ff3c0f5980e]
/usr/lib/x86_64-linux-gnu/libtasn1.so.6(+0xb33f)[0x7ff3bbe9b33f]
/usr/lib/x86_64-linux-gnu/libtasn1.so.6(asn1_delete_structure2+0xf7)[0x7ff3bbe9c4b7]
/usr/lib/x86_64-linux-gnu/libgnutls.so.30(+0x4e42f)[0x7ff3bd7cb42f]
/lib64/ld-linux-x86-64.so.2(+0xfd6a)[0x7ff3c1dd9d6a]
/lib/x86_64-linux-gnu/libc.so.6(+0x35940)[0x7ff3c0f17940]
/lib/x86_64-linux-gnu/libc.so.6(+0x3599a)[0x7ff3c0f1799a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf8)[0x7ff3c0f022e8]
/usr/bin/snp-pileup(+0x221a)[0x55a0031ef21a]
======= Memory map: ========
55a0031ed000-55a0031f8000 r-xp 00000000 00:25e 26                        /usr/bin/snp-pileup
55a0033f7000-55a0033f8000 r--p 0000a000 00:25e 26                        /usr/bin/snp-pileup
55a0033f8000-55a0033f9000 rw-p 0000b000 00:25e 26                        /usr/bin/snp-pileup
55a004299000-55a00432f000 rw-p 00000000 00:00 0                          [heap]
...
  1. bugs in run-facets-wrapper_fixbug.R, I think these are due to not updating to the latest version.
    e.g. not initialize variable ploidy in function gene_level_changes, not parse argument --diplogr as "double".

Fix geneLevel.R to generate portal-friendly data_CNA.txt file

roslin_analysis_helper.py currently runs geneLevel.R as follows:

Rscript geneLevel.R --targetFile data/impact468_targets.ilist -f *_hisens.cncf.txt -m scna -o Proj_01234.gene_level.txt

The option -m scna creates a Proj_01234.gene_level.scna.txt matrix file that Roslin renames to data_CNA.txt for use by the cBioPortal. But since the 1.5.7 release, this is broken and needs debugging. Also remove the -m aka --method argument and replace it with a boolean argument --cna-matrix. This will avoid the confusion that the argument is meant for choosing between the EM and CNCF methods.

Bug in HRD-LOH calculation (copy-number-scores.R) causing loss of HRD-LOH information

calculate_hrdloh function bug

To reproduce the bug: find a tumor-normal sample with only one complete chromosome deletion. This deletion must be homozygous. Slack me for specific tumor-normal examples from IMPACT.

Situation: when there are no *“existing” chromosomes in chr_del, the calculation sets segs_loh to NA/NULL and we lose all HRDLOH information for that sample. For example, a sample segs dataframe has chrom=19, but entire chromosome 19 is 0:0 (mcn:lcn), so chromosome 19 gets filtered out at initial step, and no longer exists in segs_loh to be filtered out by chr_del list. Since chrom 19 is the only deleted chromosome and happens to be a homozygous deletion, the calculation is trying to pass something that exists in chr_del but doesn't exist in segs_loh when trying to filter out that chromosome in the final step. If chromosome 19 was a heterozygous deletion 1:0 (mcn:lcn), this error would not occur. If chromosome 20 was deleted as well (1:0), this error would not occur.

“Initial” step of HRDLOH calculation:
segs_loh = segs_loh[which(segs_loh$lcn == 0 & segs_loh$mcn != 0), ]

once filtered out, then chrom 19 no longer exists in segs_loh (but still exists in chr_del)

“Final” step of HRDLOH calculation after filtering out 0:0 segments:
segs_loh = segs_loh[-which(segs_loh$chrom %in% chr_del)

Potential solutions:

  1. Move the “final” step chronologically above the "initial" step? May cause yet foreseen bugs
  2. Change base R final step solution to a dplyr method that can better handle the NA/Null that is created when there are “existing” chromosomes in chr_del but not segs_loh i.e.:
if (length(chr_del) > 0) {
        segs_loh = segs_loh %>% filter(!(chrom %in% chr_del))
    }

*“exisiting”: chromosomes that are in the chr_del list but not in segs_loh dataframe

Using Alt+Ref instead of true depth

maf[, t_depth := t_alt_count + t_ref_count]

Here, we overwrite the true read depth with alt+ref. Normally, this doesn't change much but I've seen positions where there are 3 alleles present at a given position and the mutation being called is not the major allele and neither is the reference allele. This changes the calculation quite a bit.

Bug in normDepth.R

In normDepth.R You have an undefined variable "filename" here:

if(!interactive()){
  parser = ArgumentParser()
  parser$add_argument('-f', '--file', type='character', help='Filename of counts file to be normalized.')
  args=parser$parse_args()
  normalize_facets_depth(filename)
}

I made the following change to get it to work

$ git diff
diff --git a/normDepth.R b/normDepth.R
index eddaa91..d5c7866 100755
--- a/normDepth.R
+++ b/normDepth.R
@@ -73,5 +73,5 @@ if(!interactive()){
   parser = ArgumentParser()
   parser$add_argument('-f', '--file', type='character', help='Filename of counts file to be normalized.')
   args=parser$parse_args()
-  normalize_facets_depth(filename)
+  normalize_facets_depth(args$file)
 }

rbindList vs rbind

output_maf = rbindlist(output_maf,

I'm not sure if it was the version of R but I was getting an error with this version:
Error in rbindlist(output_maf, maf[!which(maf$Tumor_Sample_Barcode %in% :
Input is data.table but should be a plain list of items to be stacked

I changed it to "rbind" instead of "rbindlist" and it worked fine.

Recommended cval value for whole exome sequencing data

Hello,

I would like to ask what would be the recommended cval number for WES data with 40x coverage when using facets-suite? I come across this post (https://github.com/taylor-lab/facets-suite/wiki/1.-Running-FACETS) and would like to check if purity-cval 1000 and cval 500 is still the recommendation. I am asking as these numbers are different than original FACETS facets default based on exome data which is 150 for procSample step.

Deeply appreciate the kind reply!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.