broadinstitute / phylogicndt

License: Other

phylogicndt's Introduction

PhylogicNDT

Installation

First: Clone this repository

git clone https://github.com/broadinstitute/PhylogicNDT.git
cd PhylogicNDT

Then either:

Manual Install

Install Python 2.7, R (optional), and the required packages. For Debian:

apt-get install python-pip build-essential python-dev r-base r-base-dev git graphviz libgraphviz-dev

Install setuptools and wheel

pip install setuptools wheel

Install required packages

pip install -r req

Install scipy, matplotlib, and pandas (these versions are recommended)

pip install pandas==0.19.2 scipy==1.0.0 matplotlib==2.0.0
pip install -e git+https://github.com/rmcgibbo/logsumexp.git#egg=sselogsumexp  # for faster compute

Docker Install

Install docker from https://www.docker.com/community-edition#/download

docker build --tag phylogicndt .

Using the Package

./PhylogicNDT.py --help

If running from the docker, first run:

docker run -i -t phylogicndt
cd phylogicndt

Clustering

To run clustering on the provided sample input data:

To specify inputs:

./PhylogicNDT.py Cluster -i Patient_ID  -s Sample1_id:Sample1_maf:Sample1_CN_seg:Sample1_Purity:Sample1_Timepoint -s Sample2_id:Sample2_maf:Sample2_CN_seg:Sample2_Purity:Sample2_Timepoint ... SampleN_info

Alternatively, provide a tab-separated sample information file (.sif) with the following headers:

sample_id maf_fn seg_fn purity timepoint

./PhylogicNDT.py Cluster -i Patient_ID  -sif Patient.sif
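A sample information file with the five headers listed above could be generated along these lines. This is a minimal sketch and not part of PhylogicNDT: the helper name `write_sif` and the sample values are hypothetical, and only the column names come from the text above.

```python
import csv
import io

# Column names as listed in the README.
SIF_HEADERS = ["sample_id", "maf_fn", "seg_fn", "purity", "timepoint"]


def write_sif(rows, handle):
    """Write sample rows (dicts keyed by SIF_HEADERS) as tab-separated text."""
    writer = csv.DictWriter(handle, fieldnames=SIF_HEADERS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Hypothetical samples; seg_fn is left empty because CN_seg is optional.
samples = [
    {"sample_id": "Sample1", "maf_fn": "Sample1.maf", "seg_fn": "",
     "purity": 0.7, "timepoint": 0},
    {"sample_id": "Sample2", "maf_fn": "Sample2.maf", "seg_fn": "",
     "purity": 0.8, "timepoint": 1},
]

buffer = io.StringIO()
write_sif(samples, buffer)
sif_text = buffer.getvalue()
print(sif_text)
```

To produce a file on disk, pass an open file handle (for example `open("Patient.sif", "w")`) instead of the in-memory buffer.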

The .maf should contain pre-computed raw CCF histograms based on mutation alt/ref counts (ABSOLUTE-annotated MAFs or .Rdata files are also supported). If the CCF histograms are absent, the --maf_input_type flag must be set to calc_ccf and sample purity must be provided. In addition, local copy number must be attached to each mutation in the MAF, in columns named local_cn_a1 and local_cn_a2.

CN_seg is optional; it is used to annotate copy-number information on the trees.

To specify number of iterations:

./PhylogicNDT.py Cluster -ni 1000

Acknowledgment: The Clustering module is partially inspired (primary 1D clustering) by earlier work of Carter & Getz (Landau D, Carter S, Stojanov P et al. Cell 152, 714–726, 2013).

BuildTree (and GrowthKinetics)

The GrowthKinetics module fully incorporates the BuildTree libraries, so when rates are desired, there is no need to run both.

The -w flag should provide a measure of tumor burden, with one value per input sample maf in clustering. When omitted, stable tumor burden is assumed.
The -t flag should provide relative time for spacing the samples. When omitted, equal spacing is assumed.

Just BuildTree

./PhylogicNDT.py BuildTree -i Indiv_ID -sif Patient.sif  -m mutation_ccf_file -c cluster_ccf_file

GrowthKinetics

./PhylogicNDT.py GrowthKinetics -i Indiv_ID -sif Patient.sif -ab cell_population_abundance_mcmc_trace -w 10 10 10 10 10 -t 1 2 3 4 5
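Because -w takes one value per sample, it can help to build the argument list programmatically and check the lengths before launching a run. The sketch below is an illustration only: `growth_kinetics_args` is a hypothetical helper, and the requirement that -t also carries one value per sample is an assumption based on the example command above.

```python
def growth_kinetics_args(indiv_id, sif, abundance_trace, n_samples,
                         tumor_burden=None, times=None):
    """Build an argument list for the GrowthKinetics command shown above."""
    args = ["./PhylogicNDT.py", "GrowthKinetics", "-i", indiv_id,
            "-sif", sif, "-ab", abundance_trace]
    for flag, values in (("-w", tumor_burden), ("-t", times)):
        if values is None:
            # Flag omitted: the tool applies its documented default
            # (stable tumor burden, or equal spacing).
            continue
        if len(values) != n_samples:
            raise ValueError("%s needs %d values, got %d"
                             % (flag, n_samples, len(values)))
        args.append(flag)
        args.extend(str(v) for v in values)
    return args


argv = growth_kinetics_args(
    "Indiv_ID", "Patient.sif", "cell_population_abundance_mcmc_trace",
    n_samples=5, tumor_burden=[10] * 5, times=[1, 2, 3, 4, 5])
print(" ".join(argv))
```

The resulting list can be handed to `subprocess.run(argv)` once the paths point at real files.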

Run Cluster together with BuildTree

./PhylogicNDT.py Cluster -i Patient_ID  -sif Patient.sif -rb

SinglePatientTiming

SinglePatientTiming requires a maf input and a seg file input for each sample. The maf file should be the output of the PhylogicNDT Clustering module. The seg file should have the following columns:

Chromosome  Start   End A1.Seg.CN   A2.Seg.CN

To run SinglePatientTiming:

./PhylogicNDT.py Timing -i Indiv_ID -sif Patient.sif

LeagueModel

LeagueModel requires comparison tables as input. The comparison tables should be the SinglePatientTiming output files ending in ".comp.tsv".

To run LeagueModel:

./PhylogicNDT.py LeagueModel -cohort Cohort -comps comp1 comp2 ... compN

Alternatively, one can use a single aggregated table. The table should have the following columns:

sample  event1  event2  p_event1_win    p_event2_win    unknown

To run with the aggregated table:

./PhylogicNDT.py LeagueModel -cohort Cohort -comparison_cn comps

PhylogicSim

A simulation module is provided for convenience.

./PhylogicNDT.py PhylogicSim --help

Command to visualize all the options and help.

./PhylogicNDT.py PhylogicSim

Run the simulation with the default parameters.

./PhylogicNDT.py PhylogicSim -i MySimulation

Specify a prefix for all the output files

./PhylogicNDT.py PhylogicSim -i MySimulation -ns 7

Specify the number of samples you want to simulate.

./PhylogicNDT.py PhylogicSim -i MySimulation -nodes 5

Specify the number of distinct clones present in your samples. Minimum 2 (The first clone is always the clonal clone)

./PhylogicNDT.py PhylogicSim -i MySimulation -nodes 5 -seg /Example_SegFile.txt

Specify a segment file with copy number values to sample from. See "Example_SegFile.txt" for a format example. If no file is specified, a built-in CN profile is used, based on the hg19 contigs.

./PhylogicNDT.py PhylogicSim -i MySimulation -nodes 5 -clust_file /Example_Clust_File.txt

Force the CCF values of each cluster in each sample, instead of generating a new random phylogeny from scratch. If -clust_file is specified, the -ns and -nodes flags are ignored and replaced with the values from the Clust_File. Each line of the TSV file represents a sample, and each tab-separated value is the CCF of a cluster. The last value of each line must always be -1 to account for the artifact cluster.
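The cluster file layout described above (one line per sample, one tab-separated CCF per cluster, a trailing -1 for the artifact cluster) can be sketched as follows. The helper `format_clust_file` and the CCF values are hypothetical, and the 0 to 1 scale used here is an assumption; check Example_Clust_File.txt for the scale the simulator expects.

```python
def format_clust_file(ccf_by_sample):
    """Return clust-file text: one line per sample, CCFs tab-separated, then -1."""
    n_clusters = len(ccf_by_sample[0])
    lines = []
    for ccfs in ccf_by_sample:
        if len(ccfs) != n_clusters:
            raise ValueError("every sample needs the same number of clusters")
        # The trailing -1 stands for the artifact cluster.
        fields = [str(ccf) for ccf in ccfs] + ["-1"]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


# Two hypothetical samples with three clusters each; the first cluster is clonal.
clust_text = format_clust_file([
    [1.0, 0.6, 0.2],
    [1.0, 0.3, 0.5],
])
print(clust_text)
```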

./PhylogicNDT.py PhylogicSim -i MySimulation -nodes 5 -clust_file /Example_Clust_File.txt -a 0.3

Specify the proportion of mutations that are artifactual (Random af unrelated to mutation/CN). Can be combined with a clust_file.

./PhylogicNDT.py PhylogicSim -i MySimulation -nodes 5 -clust_file /Example_Clust_File.txt -pfile /Example_PurityFile.txt

TSV file specifying the purity of each sample individually (otherwise, the purity is set for all samples using the -p flag). Each line represents a sample. The file can optionally contain three extra columns with the alpha, beta and N values of the coverage beta-binomial for each sample (otherwise, those values are set for all samples using the -ap, -b and -nb flags, respectively).

phylogicndt's Issues

Warnings with example data

Hi, I have been trying to run PhylogicNDT with the example data as well as my own. Everything seems to work fine, but I am getting runtime warnings (from numpy?) that I can't really explain. I was hoping you might have an idea what is causing them and whether they would affect the output?

These are the warnings I get when using the example data:

/phylogicndt/Cluster/DpEngine.py:591: RuntimeWarning: divide by zero encountered in log
  t1 = sum((np.log(k_prob) * k_prob)[ix])
/phylogicndt/Cluster/DpEngine.py:591: RuntimeWarning: invalid value encountered in multiply
  t1 = sum((np.log(k_prob) * k_prob)[ix])

Edit: This happens both when I run it from the docker and when I install everything manually. I have also tried it on Linux/mac/windows systems. The command I run is the following, taken from the example runs you have provided:

./PhylogicNDT.py Cluster -i Test_Clust -s sample_00:ExampleData/ExampleSamples/MySimulation_0.txt::0.7:0 -s sample_01:ExampleData/ExampleSamples/MySimulation_1.txt::0.7:1 -s sample_02:ExampleData/ExampleSamples/MySimulation_2.txt::0.7:2 -s sample_03:ExampleData/ExampleSamples/MySimulation_3.txt::0.7:3

I did np.seterr(all='raise') and then I get a "FloatingPointError underflow encountered" in DpEngine.py in the function get_gamma_prior_from_k_prior(), specifically in k_prob = stats.gamma.pdf(k_0_map[:, 1], A, scale=1.0 / B) when A and/or B are large numbers.

Some more warnings with my own data:

/phylogicndt/utils/calc_ccf.py:99: RuntimeWarning: invalid value encountered in divide
  subc_ccf_hist /= sum(subc_ccf_hist)
/phylogicndt/Cluster/DpEngine.py:591: RuntimeWarning: divide by zero encountered in log
  t1 = sum((np.log(k_prob) * k_prob)[ix])
/phylogicndt/Cluster/DpEngine.py:591: RuntimeWarning: invalid value encountered in multiply
  t1 = sum((np.log(k_prob) * k_prob)[ix])
/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py:429: DataConversionWarning: Data with input dtype float128 was converted to float64 by the normalize function.
  warnings.warn(msg, _DataConversionWarning)
/phylogicndt/data/Patient.py:349: RuntimeWarning: invalid value encountered in greater
  if np.any(mut_coincidence > 0.):
/phylogicndt/BuildTree/Tree.py:277: RuntimeWarning: divide by zero encountered in log
  log_dist = np.log(dist, dtype=np.float64)
/phylogicndt/BuildTree/CellPopulationEngine.py:58: RuntimeWarning: divide by zero encountered in log
  log_dist = np.log(dist, dtype=np.float32)
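The pair of warnings reported for `np.log(k_prob) * k_prob` is what NumPy emits when some entries of the array are exactly zero: `log(0)` gives `-inf` (divide by zero), and `-inf * 0` gives `nan` (invalid value). An underflow in the gamma density, as described above, is one way such zeros can arise. The sketch below reproduces the effect and shows one common way to evaluate `p * log(p)` with the convention `0 * log(0) = 0`. It is not the PhylogicNDT code: `entropy_term` is a hypothetical helper, and whether the `[ix]` indexing in DpEngine.py already discards the affected entries is not established here.

```python
import numpy as np


def entropy_term(k_prob):
    """Sum of p * log(p), treating entries with p == 0 as contributing 0."""
    k_prob = np.asarray(k_prob, dtype=float)
    out = np.zeros_like(k_prob)
    positive = k_prob > 0
    # Only take the log where it is defined; zeros keep a contribution of 0.
    out[positive] = k_prob[positive] * np.log(k_prob[positive])
    return out.sum()


k_prob = np.array([0.5, 0.5, 0.0])  # hypothetical values with one exact zero

# Naive evaluation: silence the two warnings to show the resulting nan.
with np.errstate(divide="ignore", invalid="ignore"):
    naive = np.sum(np.log(k_prob) * k_prob)

safe = entropy_term(k_prob)
print(naive, safe)
```

If the zero entries are later excluded by indexing, the warnings are harmless; if they are summed, the result becomes `nan`, which is worth checking for in the output.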

specifying output folder for singularity

Hi,

I would love to use this through Singularity on our HPC, but the issue is that the program writes to the location of the Python script, which errors out on the read-only filesystem Singularity is based on.

I created a workaround with sandbox mode in Singularity, but it would be amazing if there were an option to specify the output location for the tool, so that it can be used more widely.

Cheers,
Sebastian

PureCN implementation of ABSOLUTE as input for phylogicNDT

Hello!

Thanks for developing and supporting PhylogicNDT.

I used PureCN (based on ABSOLUTE ideas) to generate input for PhylogicNDT:
https://bioconductor.org/packages/release/bioc/html/PureCN.html

Namely, I used this https://github.com/naumenko-sa/bioscripts/blob/master/phylogicndt/purecn2phylogicndt.R
script to select columns for files with variants and segments. Also, I was using purity information from PureCN.

My samples.sif file:

sample_id	maf_fn	seg_fn	purity	timepoint
s1	s1.maf.tsv	s1.cn.tsv	0.89	1
s2	s2.maf.tsv	s2.cn.tsv	0.68	1
s3	s3.maf.tsv	s3.cn.tsv	0.94	1

maf_fn file format:

Hugo_Symbol	Chromosome	Start_position	Reference_Allele	Tumor_Seq_Allele2	t_ref_count	t_alt_count	local_cn_a1	local_cn_a2

seg_fn file format as in the example:
https://github.com/broadinstitute/PhylogicNDT/blob/master/ExampleData/Simulations/Example_SegFile.txt

ID	Chromosome	Start_position	End_Position	A1_CN	A2_CN

PhylogicNDT call (clustering and tree building):

PhylogicNDT.py \
Cluster \
-i project \
-sif samples.sif \
--maf_input_type calc_ccf \
-rb

I'm getting a parsing error:

  File "/PhylogicNDT/data/Sample.py", line 478, in _resolve_CnEvents
    start = int(row['Start.bp'])

https://github.com/broadinstitute/PhylogicNDT/blob/master/data/Sample.py#L478

PhylogicNDT interprets it as alleliccapseg format which is different from what I've generated.
How to force timing format for CN files?

When running without CN files phylogic_report.html has been generated (but CNVs were not mapped).

Does anybody else have experience using PureCN output as input for PhylogicNDT?

Thanks!
Sergey
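One thing to try is rewriting the segment file with the column names that the README lists for SinglePatientTiming (Chromosome, Start, End, A1.Seg.CN, A2.Seg.CN). The sketch below only renames and reorders columns; `to_timing_seg` and the column mapping are hypothetical, and whether the Cluster module then detects the file as the timing format has not been confirmed.

```python
import csv
import io

# Mapping from the column names used in this post to the README's timing format.
TIMING_COLUMNS = [
    ("Chromosome", "Chromosome"),
    ("Start_position", "Start"),
    ("End_Position", "End"),
    ("A1_CN", "A1.Seg.CN"),
    ("A2_CN", "A2.Seg.CN"),
]


def to_timing_seg(seg_text):
    """Return seg_text rewritten with the timing-format header; extra columns are dropped."""
    reader = csv.DictReader(io.StringIO(seg_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([new for _, new in TIMING_COLUMNS])
    for row in reader:
        writer.writerow([row[old] for old, _ in TIMING_COLUMNS])
    return out.getvalue()


# Hypothetical one-segment input in the layout shown above.
example = ("ID\tChromosome\tStart_position\tEnd_Position\tA1_CN\tA2_CN\n"
           "s1\t1\t782094\t1637096\t5\t0\n")
converted = to_timing_seg(example)
print(converted)
```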

BuildTree & GrowthKinetics cannot use -i parameter

BuildTree and GrowthKinetics cannot use the -i parameter, but if I omit -i, PhylogicNDT cannot run either.

A minor issue in SinglePatientTiming

Dear Developer,
With the new commit, the previous "out of bounds error when running SinglePatientTiming" has been solved. However, another minor issue seems to have popped up; the error message returned is shown below:

Traceback (most recent call last):
  File "${PhylogicNDT_HOME}/PhylogicNDT.py", line 515, in <module>
    args.func(args)
  File "${PhylogicNDT_HOME}/SinglePatientTiming/SinglePatientTiming.py", line 51, in run_tool
    timing_engine.time_events()
  File "${PhylogicNDT_HOME}/SinglePatientTiming/TimingEngine.py", line 216, in time_events
    cn_event.get_pi_dist_for_gain()
  File "${PhylogicNDT_HOME}/SinglePatientTiming/TimingEngine.py", line 560, in get_pi_dist_for_gain
    p2_real = self.get_p2_dist_for_gain()
  File "${PhylogicNDT_HOME}/SinglePatientTiming/TimingEngine.py", line 548, in get_p2_dist_for_gain
    mut_p2_dist += np.sum(mut.log_mult_dist, 1)
  File "${Python_Home}/envs/Conda2/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 1848, in sum
    out=out, **kwargs)
  File "${Python_Home}/envs/Conda2/lib/python2.7/site-packages/numpy/core/_methods.py", line 32, in _sum
    return umr_sum(a, axis, dtype, out, keepdims)
ValueError: 'axis' entry is out of bounds

Hope this information would help, thanks a lot for updating!

example PhylogicNDT/ExampleRuns/PhylogicNDT-Example.ipynb is broken

command

!./PhylogicNDT.py Cluster -i Test_Clust -s sample_00:MySimulation_0.txt:None:0.7:0 -s sample_01:MySimulation_1.txt:None:0.7:1 -s sample_02:MySimulation_2.txt:None:0.7:2 -s sample_03:MySimulation_3.txt:None:0.7:3

output

/bioinfo/users/jabecass/dl_tools_centos/PhylogicNDT/phylogicndt.log
Namespace(Delete_Blacklist=False, Pi_k_mu=3, Pi_k_r=3, PoN='false', artifact_blacklist='./data/supplement_data/Blacklist_SNVs.txt', artifact_whitelist='', blacklist_threshold=0.1, buildtree=False, cancer_type='All_cancer', driver_genes_file='./data/supplement_data/Driver_genes_v1.0.txt', func=<function run_tool at 0x7f4dc4c47950>, gistic_fn=None, grid_size=101, html=True, impute_missing=False, indiv_id='Test_Clust', iter=250, maf=False, min_cov=8, n_samples=0, order_by_timepoint=False, sample_data=['sample_00:MySimulation_0.txt:None:0.7:0', 'sample_01:MySimulation_1.txt:None:0.7:1', 'sample_02:MySimulation_2.txt:None:0.7:2', 'sample_03:MySimulation_3.txt:None:0.7:3'], scale=False, seed=None, sif=None, time_points=None, treatment_data=None, tumor_size=None, use_indels=False)
['sample_00:MySimulation_0.txt:None:0.7:0', 'sample_01:MySimulation_1.txt:None:0.7:1', 'sample_02:MySimulation_2.txt:None:0.7:2', 'sample_03:MySimulation_3.txt:None:0.7:3']
Traceback (most recent call last):
  File "./PhylogicNDT.py", line 520, in <module>
    args.func(args)
  File "/bioinfo/users/jabecass/dl_tools_centos/PhylogicNDT/Cluster/Cluster.py", line 75, in run_tool
    purity=purity)
  File "/bioinfo/users/jabecass/dl_tools_centos/PhylogicNDT/data/Patient.py", line 148, in addSample
    purity=purity, timepoint_value=timepoint_value)
  File "/bioinfo/users/jabecass/dl_tools_centos/PhylogicNDT/data/Sample.py", line 91, in __init__
    self.CnProfile = self._resolve_CnEvents(seg_file, input_type=seg_input_type)
  File "/bioinfo/users/jabecass/dl_tools_centos/PhylogicNDT/data/Sample.py", line 507, in _resolve_CnEvents
    raise NotImplementedError('Input file type not supported')
NotImplementedError: Input file type not supported

got it to run with command

!./PhylogicNDT.py Cluster -i Test_Clust -s sample_00:MySimulation_0.txt::0.7:0 -s sample_01:MySimulation_1.txt::0.7:1 -s sample_02:MySimulation_2.txt::0.7:2 -s sample_03:MySimulation_3.txt::0.7:3

The cause is that None was passed as the string "None" in the variable seg_file. The subsequent `if seg_file` check then returned True, because the string was not empty.
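The behaviour described here follows from Python's truthiness rules: any non-empty string, including the text "None", evaluates as true. The sketch below shows the effect and one way a sample specification could be parsed so that both an empty field and the literal text "None" mean "no seg file". `parse_sample_spec` is a hypothetical helper, not the parser used by PhylogicNDT, and it assumes the file paths themselves contain no colons.

```python
def parse_sample_spec(spec):
    """Split 'id:maf:seg:purity:timepoint' into typed fields."""
    sample_id, maf_fn, seg_fn, purity, timepoint = spec.split(":")
    # Treat an empty field or the literal text "None" as "no seg file".
    if seg_fn in ("", "None"):
        seg_fn = None
    return sample_id, maf_fn, seg_fn, float(purity), float(timepoint)


# The string "None" is truthy, which is why `if seg_file:` took the wrong branch.
print(bool("None"), bool(""), bool(None))

with_text = parse_sample_spec("sample_00:MySimulation_0.txt:None:0.7:0")
with_empty = parse_sample_spec("sample_00:MySimulation_0.txt::0.7:0")
print(with_text)
print(with_empty)
```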

Docker build fails

Greetings,

I have been unable to build the Docker container on a RHEL 7 server with "Docker version 20.10.7, build f0df350" and on a Macintosh with "Docker version 20.10.11, build dea9396". I see the following messages (from RHEL 7):

$ docker build --tag phylogicndt .
-- ''messages:''
Sending build context to Docker daemon 4.559MB
Step 1/26 : from bitnami/minideb
latest: Pulling from bitnami/minideb
f8c1c832ce65: Pull complete
Digest: sha256:113aec70ae64fcf8752e70f07fca47652fad1cc5d0e252313d67fcfb0ac112aa
Status: Downloaded newer image for bitnami/minideb:latest
---> 1e6d839039bd
Step 2/26 : RUN install_packages python-pip build-essential python-dev r-base r-base-dev git graphviz python-tk
---> Running in ac75ee173510
Reading package lists...
Building dependency tree...
Package python-pip is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
However the following packages replace it:
python3-pip

   E: Package 'python-pip' has no installation candidate
   apt failed, retrying
   Reading package lists...
   Building dependency tree...
   Package python-pip is not available, but is referred to by another package.
   This may mean that the package is missing, has been obsoleted, or
   is only available from another source
   However the following packages replace it:
     python3-pip

   E: Package 'python-pip' has no installation candidate
   apt failed, retrying
   Reading package lists...
   Building dependency tree...
   Package python-pip is not available, but is referred to by another package.
   This may mean that the package is missing, has been obsoleted, or
   is only available from another source
   However the following packages replace it:
     python3-pip

   E: Package 'python-pip' has no installation candidate
   The command '/bin/sh -c install_packages python-pip build-essential python-dev r-base r-base-dev git graphviz python-tk' returned a non-zero code: 100

Please tell me how to proceed.

Regards,
Eric Sisson

ABSOLUTE for CCF

Hi - I am trying to use Absolute to make the CCF histograms required for the algorithm.

I have both allele-specific copy number and total copy number, but I can't figure out how to properly format the inputs for ABSOLUTE.

There are no example outputs from HAPSEG that I could find to see whether I could fit my data to that format, and I couldn't find an answer for what to put for num_probes with WGS data. My understanding is that segment_mean is the log2 ratio.

Would I be fine just using the pipeline without supplying the histogram info, as mentioned in the instructions, or is it better to get the histograms through ABSOLUTE?

Bad clustering that combines 0 CCF and 1 CCF mutations

Hi,

I have been using PhylogicNDT on multiple samples now, and on one I noticed a very weird clustering result, where variants with 0 CCF are combined with 1 CCF mutations.

So this TP53 mutation, which doesn't have any read support in samples 1 and 4, is clustered together with variants that do have read support and is subsequently shown at 100% and 50% CCF, even though it isn't present at all in these samples.

I have played around with the parameters, but nothing seems to affect this clustering.

The pictures I posted here were generated with:

./PhylogicNDT.py Cluster -sif /dawson_genomics/Projects/CASCADE/CA80/analysis/joint/PhylogicNDT/inputs/samples.sif --impute --n_iter 1000 --maf_input_type calc_ccf -i CA80 --driver_genes_file ~/PhylogicNDT/data/supplement_data/Driver_genes_v1.0.txt --Pi_k_r 10 --Pi_k_mu 50

And the version is the latest from this github.

What can I adjust to make the clustering more granular? There are 850 variants in this cluster.

Kind regards,
Sebastian

--impute crashes

Hey,

I am running an analysis of my samples and after a few seconds the program crashes with

Traceback (most recent call last):
  File "/home/shollizeck/PhylogicNDT/PhylogicNDT.py", line 342, in <module>
    args.func(args)
  File "/home/shollizeck/PhylogicNDT/Cluster/Cluster.py", line 74, in run_tool
    patient_data.preprocess_samples()
  File "/home/shollizeck/PhylogicNDT/data/Patient.py", line 275, in preprocess_samples
    mis_sample._mut_varstring_hastable[mut.var_str] = mis_sample.concordant_variants[-1]
AttributeError: TumorSample instance has no attribute '_mut_varstring_hastable'

only when setting --impute though.

ModuleNotFoundError: No module named 'Sample'

Hello,
I just cloned the PhylogicNDT repository and I am trying to run it, and I get the following error when running:
'./PhylogicNDT.py Cluster -i Patient_ID -s Sample1_id:Sample1_maf:Sample1_CN_seg:Sample1_Purity:Sample1_Timepoint -s Sample2_id:Sample2_maf:Sample2_CN_seg:Sample2_Purity:Sample2_Timepoint ... SampleN_info '
C:\Users\Vicente\python\Phylogic\PhylogicNDT\phylogicndt.log

ModuleNotFoundError Traceback (most recent call last)
~\python\Phylogic\PhylogicNDT\PhylogicNDT.py in
28 import BuildTree.CellPopulation
29 import GrowthKinetics.GrowthKinetics
---> 30 import SinglePatientTiming.SinglePatientTiming
31 import LeagueModel.LeagueModel
32

~\python\Phylogic\PhylogicNDT\SinglePatientTiming\SinglePatientTiming.py in
2 import numpy as np
3 import itertools
----> 4 from data.Patient import Patient
5 import TimingEngine
6 from output.PhylogicOutput import PhylogicOutput

~\python\Phylogic\PhylogicNDT\data\Patient.py in
2 # Patient - central class to handle and store sample CCF data
3 ##########################################################
----> 4 from Sample import TumorSample
5 from Sample import RNASample
6 from SomaticEvents import SomMutation, CopyNumberEvent

ModuleNotFoundError: No module named 'Sample'

How can I solve this?
I am pretty new to Python, so I would appreciate any help. I am using Jupyter to run this. Thanks!
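`ModuleNotFoundError` only exists in Python 3.6 and later, so the traceback above comes from a Python 3 interpreter, while the installation instructions ask for Python 2.7. Statements such as `from Sample import TumorSample` inside a package rely on Python 2's implicit relative imports, which Python 3 removed. A quick interpreter check, as a minimal sketch (`check_interpreter` is a hypothetical helper):

```python
import sys


def check_interpreter(version_info=sys.version_info):
    """Report whether the interpreter matches the Python 2.7 the README asks for."""
    major, minor = version_info[0], version_info[1]
    if (major, minor) == (2, 7):
        return "ok: Python 2.7"
    return ("mismatch: running Python %d.%d, README asks for Python 2.7"
            % (major, minor))


print(check_interpreter())
```

Running the tool from a Python 2.7 environment (or the Docker image) rather than a Python 3 Jupyter kernel avoids this particular import error.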

Protein_Change and Variant_Classification are None

Hi.
I've run PhylogicNDT Cluster based on mutation file only. Here is the output where Protein_Change and Variant_Classification are None.
PhylogicNDT.py Cluster -i ACEJ -sif sif/ACEJ.sif -rb -mt calc_ccf --impute --use_indels

docker permission denied each time

Hi, I installed it through Docker. But every time I start to use PhylogicNDT, I get "permission denied" and need to use chmod to fix it. Did something go wrong during the installation? Is there a way to solve this? Thanks!

Is the CorrectBias module undefined?

Hi,

I have carefully read your bioRxiv preprint, but I could not find two components of the PhylogicNDT package (CorrectBias and Conditional Timing). Are they integrated into other modules? If not, could you provide the related code? Thank you very much!

Looking forward to your reply.

Best wishes,

SinglePatientTiming: get_arm_level_cn_events

Hello,

Thanks for developing the tool. I was able to run without problems the 'Cluster' module for 39 samples individually (each sample separately) using the code:
./PhylogicNDT.py Cluster -i ${i} -sif /media/user/Seagate_Exome67_133/LIGUE/WES_analysis_R/MOVICS/Ordering_mutations/CS1/Input/${i}_input.sif -rb --maf_input_type calc_ccf --impute --use_indels -drv /media/user/Seagate_Exome67_133/LIGUE/WES_analysis_R/MOVICS/Ordering_mutations/CS1/Input/Drivers.txt

Where the maf input was:
Hugo_Symbol Chromosome Start_position Reference_Allele Tumor_Seq_Allele2 t_ref_count t_alt_count local_cn_a1 local_cn_a2
TRIM62 chr1 33147403 G A 54 28 4 0

The Sif is as follows:
sample_id maf_fn seg_fn purity timepoint
CNSL002T /media/user/Seagate_Exome67_133/LIGUE/WES_analysis_R/MOVICS/Ordering_mutations/CS1/Input/CNSL002T.txt 0.567749795730953 0

Then I try to run 'SinglePatientTiming' for each sample as follows:
for i in CNSL002T; do /home/user/PhylogicNDT/PhylogicNDT.py SinglePatientTiming -i ${i} -sif /media/user/Seagate_Exome67_133/LIGUE/WES_analysis_R/MOVICS/Ordering_mutations/CS1/Input/${i}_input_SinglePatientTiming.sif -drv /media/user/Seagate_Exome67_133/LIGUE/WES_analysis_R/MOVICS/Ordering_mutations/CS1/Input/Drivers.txt; done
The .sif file is as follows:

sample_id maf_fn seg_fn purity timepoint
CNSL002T /media/user/Seagate_Exome67_133/LIGUE/WES_analysis_R/MOVICS/Ordering_mutations/CS1/CNSL002T.mut_ccfs.txt /media/user/Seagate_Exome67_133/LIGUE/WES_analysis_R/MOVICS/Ordering_mutations/CS1/Input/CNSL002T_SegFile.txt 0.567749795730953 0

Where the "CNSL002T.mut_ccfs.txt" is the result from the 'Cluster' module and the 'CNSL002T_SegFile' has the following format:
Chromosome Start End A1.Seg.CN A2.Seg.CN
1 782094 1637096 5 0

I get the following error:
for i in CNSL030T; do /home/user/PhylogicNDT/PhylogicNDT.py SinglePatientTiming -i ${i} -sif /media/user/Seagate_Exome67_133/LIGUE/WES_analysis_R/MOVICS/Ordering_mutations/CS1/Input/${i}_input_SinglePatientTiming.sif -drv /media/user/Seagate_Exome67_133/LIGUE/WES_analysis_R/MOVICS/Ordering_mutations/CS1/Input/Drivers.txt; done
/home/user/PhylogicNDT/phylogicndt.log
Traceback (most recent call last):
  File "/home/user/PhylogicNDT/PhylogicNDT.py", line 515, in <module>
    args.func(args)
  File "/home/user/PhylogicNDT/SinglePatientTiming/SinglePatientTiming.py", line 50, in run_tool
    timing_engine = TimingEngine.TimingEngine(patient_data, min_supporting_muts=args.min_supporting_muts)
  File "/home/user/PhylogicNDT/SinglePatientTiming/TimingEngine.py", line 42, in __init__
    self.get_arm_level_cn_events()
  File "/home/user/PhylogicNDT/SinglePatientTiming/TimingEngine.py", line 166, in get_arm_level_cn_events
    clonal_concordance = np.prod(cluster_ccfs[1][sample_idx, ccf_idx])
KeyError: 1

Any idea what is causing it?

Thanks in advance

buildtree not showing private mutations

Hi
I am using 2 tumor samples from the same patient, and when I try to cluster and build a tree with them, the private mutations are excluded. Is there a way to include them? Am I missing something?

Is cluster 1, with clust_ccf_mean equal to 1, the clonal cluster?

Hello,
How can I identify which cluster is the clonal cluster? Is it cluster 1, with clust_ccf_mean equal to 1? Am I right? I found that more than one cluster has clust_ccf_mean equal to 1. Is there any way to choose the clonal cluster?
Best regards,
sur

Cannot run with example data

Hey,

I tried to run this with the example data provided, but it crashes with

./PhylogicNDT.py Cluster -i Patient_ID --sample_information ExampleData/ExampleSamples/MySimulation_input.sif 
Namespace(Delete_Blacklist=False, Pi_k_mu=3, Pi_k_r=3, PoN='false', artifact_blacklist='./data/supplement_data/Blacklist_SNVs.txt', artifact_whitelist='', buildtree=False, cancer_type='All_cancer', driver_genes_file='./data/supplement_data/Driver_genes_v1.0.txt', func=<function run_tool at 0x7f561740dde8>, gistic_fn=None, grid_size=101, html=True, impute_missing=False, indiv_id='Patient_ID', iter=250, maf=False, min_cov=8, n_samples=0, order_by_timepoint=False, sample_data=None, scale=False, sif='ExampleData/ExampleSamples/MySimulation_input.sif', time_points=None, treatment_data=None, tumor_size=None, use_indels=False)
Traceback (most recent call last):
  File "./PhylogicNDT.py", line 342, in <module>
    args.func(args)
  File "/home/shollizeck/PhylogicNDT/Cluster/Cluster.py", line 17, in run_tool
    import ClusterEngine
  File "/home/shollizeck/PhylogicNDT/Cluster/ClusterEngine.py", line 11, in <module>
    import DpEngine
  File "/home/shollizeck/PhylogicNDT/Cluster/DpEngine.py", line 9, in <module>
    import sklearn.manifold as manifold
ImportError: No module named sklearn.manifold

The help works though. So i can see the -h output

Would be very interested in using this

Cheers,
Sebastian

Error: No module named emd

Hello,
I installed PhylogicNDT on Docker as instructed, but PhylogicNDT.py exits with:
root@21913eaf8cc6:/phylogicndt# ./PhylogicNDT.py --help
INFO:root:Using fast logsumexp
/phylogicndt/BuildTree/build_tree_log.log
Traceback (most recent call last):
  File "./PhylogicNDT.py", line 19, in <module>
    import BuildTree.CellPopulation
  File "/phylogicndt/BuildTree/CellPopulation.py", line 6, in <module>
    from .BuildTreeEngine import BuildTreeEngine
  File "/phylogicndt/BuildTree/BuildTreeEngine.py", line 7, in <module>
    import ShuffleMutations
  File "/phylogicndt/BuildTree/ShuffleMutations.py", line 4, in <module>
    from emd import emd
ImportError: No module named emd

The same happens with the local installation.
Running on Linux Mint 19.1.

Best regards,
Andrej

IndexError

Hey,

with my command

~/PhylogicNDT/PhylogicNDT.py Cluster -i sample_id -sif CA99_input.sif -rb

I get

Traceback (most recent call last):
  File "/home/shollizeck/PhylogicNDT/PhylogicNDT.py", line 342, in <module>
    args.func(args)
  File "/home/shollizeck/PhylogicNDT/Cluster/Cluster.py", line 91, in run_tool
    phylogicoutput.write_patient_mut_ccfs(patient_data, cluster_ccfs)
  File "/home/shollizeck/PhylogicNDT/output/PhylogicOutput.py", line 829, in write_patient_mut_ccfs
    mut_mean, mut_high, mut_low = self._get_mean_high_low(np.array(mut.ccf_1d))
  File "/home/shollizeck/PhylogicNDT/output/PhylogicOutput.py", line 878, in _get_mean_high_low
    elif low == 0 or ccf[high + 1] > ccf[low - 1]:
IndexError: index 5051 is out of bounds for axis 0 with size 101

can you help me fix that?

Cheers,
Sebastian

"IndexError: list index out of range" in plot_1d_clusters

Hi,

I seem to have run into an issue when generating the plots with the command

~/bin/PhylogicNDT/PhylogicNDT.py Cluster -i CA99 -sif /dawson_genomics/Projects/CASCADE/CA99/analysis/phylogicNDT/samples.sif -rb --maf_input_type calc_ccf

These files were produced

total 904M
drwxrwsr-x 2 shollizeck res_dawsongenomics  512 Oct 14 23:36 CA99_1d_cluster_plots
-rw-r--r-- 1 shollizeck res_dawsongenomics 549K Oct 14 23:34 CA99.cluster_ccfs.txt
-rw-rw-r-- 1 shollizeck res_dawsongenomics 1.7K Oct 14 23:36 CA99.cnvs.txt
-rw-r--r-- 1 shollizeck res_dawsongenomics 903M Oct 14 23:36 CA99.mut_ccfs.txt
-rw-rw-r-- 1 shollizeck res_dawsongenomics 111K Oct 14 23:36 CA99.unclustered.txt

but the 1d cluster plots folder is empty and the log file displays the following error at the end

Traceback (most recent call last):
  File "/home/shollizeck/bin/PhylogicNDT/PhylogicNDT.py", line 515, in <module>
    args.func(args)
  File "/home/shollizeck/bin/PhylogicNDT/Cluster/Cluster.py", line 100, in run_tool
    phylogicoutput.plot_1d_clusters('{}.cluster_ccfs.txt'.format(patient_data.indiv_name))
  File "/home/shollizeck/bin/PhylogicNDT/output/PhylogicOutput.py", line 703, in plot_1d_clusters
    dist.setAttribute('fill', ClusterColors.get_rgb_string(c))
  File "/home/shollizeck/bin/PhylogicNDT/output/PhylogicOutput.py", line 1456, in get_rgb_string
    return 'rgb({},{},{})'.format(*cls.color_list[c])
IndexError: list index out of range

Could it be, that the program ran out of colours to use, as the CA99.cluster_ccfs.txt says there are 81 clusters to be considered?

Cheers,
Sebastian
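If the error is indeed caused by more clusters than entries in the colour list, one common way to avoid the `IndexError` is to wrap the index with a modulo so that colours repeat. The sketch below illustrates the idea with a hypothetical three-colour palette; it is not the `ClusterColors` class from PhylogicOutput.py.

```python
# Hypothetical palette; the real colour list in PhylogicOutput.py is longer.
PALETTE = [(31, 119, 180), (255, 127, 14), (44, 160, 44)]


def get_rgb_string(cluster_index, palette=PALETTE):
    """Return an 'rgb(r,g,b)' string, cycling through the palette when it runs out."""
    r, g, b = palette[cluster_index % len(palette)]
    return "rgb({},{},{})".format(r, g, b)


print(get_rgb_string(0))
print(get_rgb_string(81))  # beyond the palette length, wraps around
```

Repeated colours make neighbouring clusters harder to tell apart, so with 81 clusters it may be more useful to revisit the clustering itself.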

Indels not added in after clustering

Hi,

I ran PhylogicNDT through Terra a few days ago, and when reviewing the output I noticed that indels with high CCF (for example, ccf_hat > 0.9 across multiple samples per participant) are not assigned back into the output after they are initially removed for clustering. I would be grateful for your input as to whether this was done intentionally (I ran the same samples a few months ago and the indels were included in the output at that time), and also for an explanation of why indels are specifically removed prior to clustering. Thanks!

Alok

Protein_change and Protein_Change are used inconsistently in different scripts

Protein_change and Protein_Change are used inconsistently in different scripts:
the input maf should use "Protein_Change"; only then can the "Protein_change" column of the output file *.mut_ccfs.txt contain information from the input maf rather than 'none'.
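Until the column names are unified, the header of an input maf can be adjusted before running the tool. A minimal sketch, assuming a tab-separated header line; `normalize_maf_header` is a hypothetical helper:

```python
def normalize_maf_header(header_line):
    """Rename a 'Protein_change' column to 'Protein_Change' in a MAF header line."""
    fields = header_line.rstrip("\n").split("\t")
    fixed = ["Protein_Change" if name == "Protein_change" else name
             for name in fields]
    return "\t".join(fixed)


# Hypothetical header fragment.
header = "Hugo_Symbol\tChromosome\tProtein_change\tt_alt_count\n"
fixed_header = normalize_maf_header(header)
print(fixed_header)
```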

Min number of mutations in each cluster

What is the best approach to control cluster size? I run PhylogicNDT with WGS data, and some of the clusters I found have fewer than 20 mutations. These clusters usually have very low probability density with wide CCF variance, as expected, I think. Is there a way to control the minimum cluster size directly, or to make PhylogicNDT ignore these clusters during the contraction of the tree and assign them to the most probable nodes later on?

Running cluster step on hypermutated tumour mutations

Hello,

Thank you for this very useful software!

I am trying to run PhylogicNDT.py Cluster on tumours with very large numbers of mutations (e.g. ~900,000) and do not have output even after a week of runtime. I was wondering if there are parameters that could be changed that could help with this step (is there any possibility of parallelising over multiple cores for example?)

Thanks very much

Best wishes

Ben

ccf histogram calculation

Hello, we are interested in testing PhylogicNDT with our own multi-sample sequencing data. How should we compute raw ccf histograms for clustering? Thank you.
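For orientation, the sketch below shows one textbook way to turn alt/ref counts, purity and local copy number into a CCF histogram: a binomial likelihood evaluated on a grid of CCF values, normalised to sum to 1. It is not the code behind `--maf_input_type calc_ccf`, whose exact model is not described here. The function name, the fixed multiplicity of 1, the assumption of two normal copies, and the example values are all assumptions; the 101-bin grid mirrors the `grid_size=101` default visible in the argument dumps elsewhere on this page.

```python
import math


def ccf_histogram(alt, ref, purity, cn_a1, cn_a2, grid_size=101, multiplicity=1):
    """Normalised binomial likelihood of the CCF on an evenly spaced grid in [0, 1]."""
    total_cn = cn_a1 + cn_a2
    # Average copies per cell in the sample: tumor cells plus diploid normal cells.
    denom = purity * total_cn + 2.0 * (1.0 - purity)
    log_like = []
    for i in range(grid_size):
        ccf = i / (grid_size - 1)
        af = purity * multiplicity * ccf / denom  # expected allele fraction
        af = min(max(af, 1e-12), 1.0 - 1e-12)     # keep the logs finite
        log_like.append(alt * math.log(af) + ref * math.log(1.0 - af))
    peak = max(log_like)
    weights = [math.exp(value - peak) for value in log_like]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical mutation: 28 alt reads, 54 ref reads, purity 0.7, one copy of each allele.
hist = ccf_histogram(alt=28, ref=54, purity=0.7, cn_a1=1, cn_a2=1)
mode_ccf = hist.index(max(hist)) / (len(hist) - 1)
print(mode_ccf)
```

A full treatment would also account for mutation multiplicity on amplified alleles and for sequencing error, which this sketch ignores.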

More than one tree was generated in Phylogic Results

Hi,

When I ran PhylogicNDT, for some samples I get more than one tree in my Phylogic results.
Could you tell me what is the difference between those generated trees and how to choose the most appropriate one? Is the relationship between the clones different in each generated tree?
Thank you.

behavior parameter --impute

Dear developers,

I was viewing the result of clustering two samples with the --impute parameter, and noticed that many mutations that are private to either sample (having some counts in sample A but zero counts in sample B) are clustered into Cluster 1 (which should be the common ancestor); I had originally thought they would go into private leaf nodes.

It seems that I misunderstood the usage of --impute. Could you briefly explain in which circumstances it should be used?

Thank you very much!

`--order_by_timepoint` not working

Hello,

The --order_by_timepoint option is not implemented yet, but this is not documented. The samples in the SIF file therefore MUST be listed in timepoint order; otherwise all the results (except for the pie charts) will be assigned to the wrong samples, i.e. samples will be swapped.

For example, this SIF file is correct:

sample_id  maf_fn            seg_fn  purity  timepoint
3X_T       3X_T_ABS_MAF.txt          0.22    1
3X_M       3X_M_ABS_MAF.txt          0.26    2

But this will produce wrong results:

sample_id  maf_fn            seg_fn  purity  timepoint
3X_M       3X_M_ABS_MAF.txt          0.26    2
3X_T       3X_T_ABS_MAF.txt          0.22    1

Best,
Andrej
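As a workaround for the ordering problem described above, the SIF file can be sorted by its `timepoint` column before it is passed to PhylogicNDT. The following is a minimal standalone sketch (Python 3, separate from the Python 2.7 environment PhylogicNDT itself uses); the function name `sort_sif_by_timepoint` is hypothetical, and it assumes a tab-separated file with the headers listed in the README and numeric timepoints.

```python
import csv
import io


def sort_sif_by_timepoint(sif_text):
    """Return tab-separated SIF text with data rows sorted by timepoint.

    Hypothetical helper: assumes a header row containing a numeric
    'timepoint' column, as in the example SIF files above.
    """
    reader = csv.DictReader(io.StringIO(sif_text), delimiter="\t")
    rows = sorted(reader, key=lambda row: float(row["timepoint"]))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# The mis-ordered example from above (empty seg_fn column kept as-is).
unsorted_sif = (
    "sample_id\tmaf_fn\tseg_fn\tpurity\ttimepoint\n"
    "3X_M\t3X_M_ABS_MAF.txt\t\t0.26\t2\n"
    "3X_T\t3X_T_ABS_MAF.txt\t\t0.22\t1\n"
)
sorted_sif = sort_sif_by_timepoint(unsorted_sif)
```

Writing `sorted_sif` to a new file and passing that file with `-sif` keeps the original SIF untouched.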

What citation should be used for PhylogicNDT?

The result of ABSOLUTE v1.0.6 cannot be used

As the manual says, Absolute annotated mafs or .Rdata files can be used as the input for PhylogicNDT. However, ABSOLUTE v1.0.6 only outputs the CCF of the first mutation; the others are NAs. In addition, the R script "rdata_extractor.R" extracts "seg.obj$mode.res$SSNV.ccf.dens" from the RData file of ABSOLUTE, but this object DOES NOT exist in the RData file produced by ABSOLUTE. Can the author provide another version of ABSOLUTE whose RData file contains seg.obj$mode.res$SSNV.ccf.dens?

Allele-specific copy number from PhylogicNDT or ABSOLUTE

Hello!

I was wondering how I can answer this question from either the PhylogicNDT output or the ABSOLUTE output:

For a particular mutation, what is the mutant allele copy number and what is the total copy number?

In the .mut_ccfs.txt file I see Allelic_CN_minor and Allelic_CN_major fields but how do I know if the mutant allele is the major allele or minor allele?

Thanks,

Juber

Log output destination configurable?

Is there a way to specify where the log is written? Every time PhylogicNDT.py is run, it overwrites phylogicndt.log in the cloned git repo.

(phylogicndt) ubuntu@ip-10-10-4-172:~/PhylogicNDT$ PhylogicNDT.py 
/home/ubuntu/PhylogicNDT/phylogicndt.log
usage: PhylogicNDT.py [-h]
                      {Cluster,BuildTree,CellPopulation,GrowthKinetics,PhylogicSim,Timing,SinglePatientTiming,LeagueModel}
                      ...

out of bounds error when running SinglePatientTiming

Dear developer,

When running SinglePatientTiming, I encounter an error "*** ValueError: ValueError("'axis' entry is out of bounds",)"

It came from Line 548 in PhylogicNDT/SinglePatientTiming/TimingEngine.py
get_p2_dist_for_gain() -> mut_p2_dist += np.sum(mut.log_mult_dist, 1)

I would like to know whether mut.log_mult_dist is supposed to be a 1D array.
In that case, is it safe to simply change the line to mut_p2_dist += np.sum(mut.log_mult_dist, 0) and continue with the downstream analysis?

Thank you very much :)
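The NumPy behaviour behind this error can be reproduced in isolation. The sketch below uses made-up arrays (not PhylogicNDT data) to show that summing a 1D array over axis 1 raises an error, while the same call on a 2D array returns one sum per row. It does not establish which shape `mut.log_mult_dist` is meant to have, so it cannot by itself confirm that switching to axis 0 is safe.

```python
import numpy as np

# Hypothetical stand-ins for a 1D and a 2D distribution array.
one_d = np.array([0.1, 0.2, 0.7])
two_d = np.array([[0.1, 0.2, 0.7],
                  [0.3, 0.3, 0.4]])

# Axis 1 does not exist for a 1D array. Older NumPy raises ValueError;
# newer versions raise AxisError, which is a subclass of ValueError.
try:
    np.sum(one_d, 1)
    raised = False
except ValueError:
    raised = True

# On a 2D array, axis 1 collapses the columns: one sum per row.
row_sums = np.sum(two_d, 1)

# On a 1D array, axis 0 collapses everything into a single scalar.
total = np.sum(one_d, 0)
```

Note that axis 0 on a 1D input yields a scalar, whereas axis 1 on a 2D input yields a vector, so the two calls are not interchangeable if downstream code expects one value per row.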

CCF of child cluster greater than parents

Hello!

We're testing PhylogicNDT on multi-sample sequencing data, and looking to better understand cluster CCF. In this run (see below), cluster 2 is descended from cluster 1. However, samples T0, T6, and T8 show a higher CCF for cluster 2 mutations than for cluster 1, and their confidence intervals do not overlap. Are we using the tool correctly, and is there a way to ensure that cluster CCFs follow the sum rule?

Thank you!
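The sum rule mentioned here says that, in every sample, the CCFs of a cluster's direct children should not add up to more than the CCF of the parent. A minimal sketch of such a check follows; the function name, the input dictionaries, and the numbers are hypothetical and are not taken from the run in this issue or from PhylogicNDT's output format.

```python
def sum_rule_violations(parent_of, ccf, tolerance=0.0):
    """List (parent, sample, child_total) where children exceed the parent.

    parent_of: maps each child cluster to its parent cluster.
    ccf: maps each cluster to a {sample: CCF} dictionary.
    """
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)

    violations = []
    for parent, kids in sorted(children.items()):
        for sample in sorted(ccf[parent]):
            child_total = sum(ccf[kid][sample] for kid in kids)
            if child_total > ccf[parent][sample] + tolerance:
                violations.append((parent, sample, child_total))
    return violations


# Hypothetical tree: clusters 2 and 3 both descend from cluster 1.
parent_of = {2: 1, 3: 1}
ccf = {
    1: {"T0": 1.0, "T6": 0.9},
    2: {"T0": 0.6, "T6": 0.7},
    3: {"T0": 0.3, "T6": 0.4},
}
violations = sum_rule_violations(parent_of, ccf)
```

In this made-up example only sample T6 is flagged, because the children sum to about 1.1 against a parent CCF of 0.9. A nonzero `tolerance` could be used to allow for uncertainty in the CCF estimates.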

Copy number file cannot be properly used by PhylogicNDT

I tried to make a .seg file from FACETS, but it failed with the error "raise NotImplementedError('Can only read absolute segtab file')". I then used the *segtab.txt file as the CNV input, but that did not work either; the error was "KeyError: 'cancer.cell.frac.a1'".
