harbourlab / uphyloplot2

Draw phylogenetic trees of tumor evolution

Python 100.00%

uphyloplot2's Issues

Can I set the width of the tree branches and the distance between adjacent branches?

Hi harbourlab,

Many thanks for providing this useful tool! I have two questions about including it in my analysis. It would be great if you could help.

Is there any way to set the gaps between the tree branches? Some branches end up covered by others (e.g. the K-L branch covers the B-E-G branch below). Alternatively, is there a way to make K-L swing a little to the left? This issue does not seem to occur in your Nature Communications paper. Did you change anything manually in the uphyloplot2.py file?

Any information would be useful. Many thanks in advance!

Also, are the branches with 0% (like branches B, C, I, J and K shown below) redundant? Do I need to remove them before plotting?

Leiden - script fails

Hi, thanks for this tool ;) A quick question: I already ran inferCNV with HMM = T on my current "best" setup, which uses Leiden:

io2 = infercnv::run(io1,
                    cutoff=0.1, 
                    out_dir="cutoff0_1_res0_000375_HMM", 
                    cluster_by_groups=F, 
                    HMM=T, 
                    analysis_mode='subclusters',
                    tumor_subcluster_partition_method='leiden',
                    leiden_resolution=0.000375,
                    denoise=T,
                    sd_amplifier=2,
                    #up_to_step = 15, 
                    resume_mode = TRUE,
                    num_threads=14
)

I used this output for some downstream analysis, and now I want to make a phylogenetic tree, ideally without re-running all the prior steps.
Your tool looks good: the test runs fine, but it fails on my data.

Here is a snippet from the test data and from mine:

Test:

cell_group_name	cell
Retinoblastoma.Retinoblastoma_sRetinoblastoma.1.1.1.1	GATTCAGAGACGCAAC
Retinoblastoma.Retinoblastoma_sRetinoblastoma.1.1.1.1	GGCCGATCAAGTTCTG
Retinoblastoma.Retinoblastoma_sRetinoblastoma.1.1.1.1	AACTCTTAGACGCTTT
Retinoblastoma.Retinoblastoma_sRetinoblastoma.1.1.1.1	CACATTTGTACAGCAG

Mine:

cell_group_name	cell
all_observations.all_observations_s1	AAAGTGAAGTGGAAGA
all_observations.all_observations_s1	AACAACCTCAGTCTTT
all_observations.all_observations_s1	AACCACAGTTTGGGTT
all_observations.all_observations_s1	AACGGGACAAGCGCAA

For my file, I removed the reference cells and also stripped prefixes such as REL_ and TN_ from the cell names. Nonetheless, my result is:

,100.0

whereas the test result is:

1,0.0
1.1,0.0,B
1.1.1,0.0,C
1.1.1.1,15.776699029126213,D
1.1.1.2,16.74757281553398,E
1.1.2,0.0,F
1.1.2.1,15.29126213592233,G
1.1.2.2,27.9126213592233,H
1.2,0.0,I
1.2.1,0.0,J
1.2.1.1,9.466019417475728,K
1.2.1.2,6.553398058252427,L

I guess the difference relates to the .1.1.1.1 naming format, whereas my groups are simply numbered 1-15 (e.g. _s1).
Do you know of a way around this, or a way to convert my file to the expected format?

It would be great to know, as it would be very useful to be able to run this tool with the Leiden approach.
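The symptom can be illustrated with a minimal Python sketch. It assumes (this is not taken from the uphyloplot2 source) that the subclone hierarchy is read from the run of purely numeric, dot-separated fields at the end of cell_group_name, and that each leaf subcluster gets the percentage of cells it contains while its ancestors are listed at 0.0. Under that assumption the test names (ending in .1.1.1.1) produce a tree like the test output above, while the Leiden names (ending in _s1) produce an empty path, so all cells collapse into a single unnamed node at 100%, which would print as ,100.0. The function names subcluster_path and node_percentages are hypothetical.

```python
from collections import Counter


def subcluster_path(cell_group_name):
    """Return the trailing numeric hierarchy (e.g. '1.1.2.1'), or '' if there is none.

    Hypothetical parser: assumes the hierarchy is the run of purely numeric,
    dot-separated fields at the end of the name.
    """
    path = []
    for field in reversed(cell_group_name.split(".")):
        if not field.isdigit():
            break
        path.append(field)
    return ".".join(reversed(path))


def node_percentages(cell_group_names):
    """Percent of cells per leaf subcluster, with every ancestor node listed at 0.0."""
    counts = Counter(subcluster_path(name) for name in cell_group_names)
    total = sum(counts.values())
    nodes = {}
    for path, n in counts.items():
        fields = path.split(".") if path else []
        # register the ancestors (1, 1.1, 1.1.1, ...) without overwriting leaf values
        for depth in range(1, len(fields)):
            nodes.setdefault(".".join(fields[:depth]), 0.0)
        nodes[path] = 100.0 * n / total
    # order numerically, level by level, so that 1.10 sorts after 1.9
    return dict(sorted(nodes.items(), key=lambda kv: [int(f) for f in kv[0].split(".") if f]))


print(node_percentages(["R.R_sR.1.1.1.1", "R.R_sR.1.1.1.2"]))
# {'1': 0.0, '1.1': 0.0, '1.1.1': 0.0, '1.1.1.1': 50.0, '1.1.1.2': 50.0}
print(node_percentages(["all_observations.all_observations_s1",
                        "all_observations.all_observations_s2"]))
# {'': 100.0}
```

If this reading is right, simply renaming the Leiden groups (for example _s3 to .3) would only give a flat, star-shaped tree with every subcluster hanging off the root, because a Leiden partition has no nested levels of its own.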

inferCNV - when you have more than one type of reference cells

I am new to inferCNV and am trying to understand how the program works. My current understanding is that the reference cells' expression distribution (the average expression?) is subtracted from both the normal-cell and the tumor-cell expression, leaving the normal-cell signal close to zero and making any CNVs in the tumor cells stand out more clearly.
My question is: if there is more than one set of reference cells, how are they combined? Is each set's distribution compared with the tumor cells separately, or are the per-set averages averaged again and then subtracted from the reference-cell and tumor-cell expression?
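The two alternatives in the question can be contrasted in a few lines. This is a minimal per-gene sketch, not inferCNV's code: it assumes the behaviour the inferCNV documentation describes for the ref_subtract_use_mean_bounds option of infercnv::run, where a mean is computed for each reference group separately and values lying within the bounds of those means are treated as neutral, versus subtracting a single mean of the group means. The function name subtract_reference is hypothetical.

```python
from statistics import mean


def subtract_reference(obs, ref_groups, use_mean_bounds=True):
    """Centre one gene's values in observation cells against several reference groups.

    obs: the gene's (already normalised) values in the observation cells.
    ref_groups: one list of the gene's values per reference group.
    Hypothetical helper illustrating the two strategies; not inferCNV's code.
    """
    group_means = [mean(group) for group in ref_groups]
    if not use_mean_bounds:
        # one baseline: the mean of the per-group means, subtracted from every value
        baseline = mean(group_means)
        return [x - baseline for x in obs]
    low, high = min(group_means), max(group_means)
    # values inside the range spanned by the group means are treated as neutral (0);
    # values outside it are measured from the nearest bound
    return [x - high if x > high else x - low if x < low else 0.0 for x in obs]
```

With a single reference group both strategies reduce to subtracting that group's mean, so the choice only matters when several reference groups disagree.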

Oh, I also have another problem: an error occurred at Step 18:

STEP 18: Run Bayesian Network Model on HMM predicted CNV's

INFO [2020-09-12 21:08:32] Initializing new MCM InferCNV Object.
INFO [2020-09-12 21:08:32] validating infercnv_obj
Error in (function (cl, name, valueClass) :
assignment of an object of class “logical” is not valid for @‘cnv_regions’ in an object of class “MCMC_inferCNV”; is(value, "factor") is not TRUE

The data are 10x Genomics scRNA-seq, run with cutoff 0.1 and cluster_by_groups = FALSE. However, the tumor cells in the cell annotation file are still labelled by cluster (e.g. TN57_01, TN57_02, TN57_03), in case that causes any issues.

Thank you in advance!

Figure 2 of your Nat Commun paper

Hello,

Thank you for providing this very useful tool. How and where can I find the p and q information needed to generate a heatmap similar to Figure 2b in your paper, and how can I add LOH information to the clonality trees?

Thank you!

Error when running uphyloplot2 version 2.3 on the test data

"...\uphyloplot2-master\uphyloplot2.py", line 166, in main
if len(data_row[0].split(".")) > longest_tree:

IndexError: list index out of range
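An IndexError on data_row[0] with the message "list index out of range" means data_row was an empty list. If the rows come from Python's csv.reader (an assumption, since only the failing line is visible in the traceback), a blank line in the input, such as a trailing empty line, yields exactly such an empty row. A minimal, hypothetical guard (the name longest_tree_depth is illustrative) looks like this:

```python
import csv


def longest_tree_depth(lines):
    """Return the deepest subclone level (number of dot-separated fields) in the rows.

    Hypothetical helper; skips the empty rows that csv.reader yields for blank lines.
    """
    longest_tree = 0
    for data_row in csv.reader(lines):
        if not data_row:  # blank line -> [], so data_row[0] would raise IndexError
            continue
        longest_tree = max(longest_tree, len(data_row[0].split(".")))
    return longest_tree
```

If the script behaves like this, removing blank lines from the input file should have the same effect as the guard.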

Infercnv subcluster nomenclature

Hi team,

Thanks for the great package!

I am attempting to generate phylogenetic trees with subclustering info from infercnv, but it looks like my .cell_groupings file has a different subcluster nomenclature. When I run infercnv with the same sample data and code that's described in the Uphyloplot2 tutorial, the subcluster names in the .cell_groupings file are "all_observations.all_observations_s1" and so on. I am not seeing any groups labeled with the "all_observations.all_observations.1.1.1.1" format. Same pattern when I use my own data. Any idea why this might be?

Also, I was thinking of trying to use the dendrogram file instead of the groupings, but I am not seeing the "newick_input.py" file anywhere in the Uphyloplot2 directory. Does this need to be downloaded from elsewhere?

Thank you!

Best,
Kaleab

Mapping subclone losses/gains

Hi there,

How do I manually curate the losses/gains in the HMM*.pred_cnv_regions.dat file against the *cell_groupings file, and how do I match them to each branch of the phylogenetic tree?

Thanks
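One way to start the manual curation is to join the two files on cell_group_name. The sketch below assumes that both files are tab-separated with header rows, that pred_cnv_regions.dat has the columns cell_group_name, state, chr, start and end, and that the HMM is the i6 model, in which state 3 is neutral, lower states are losses and higher states are gains. Check those assumptions against your own output; the function names are hypothetical.

```python
import csv
from collections import defaultdict


def call_from_state(state):
    """Assumes the i6 HMM: state 3 is neutral, 1-2 are losses, 4-6 are gains."""
    if state < 3:
        return "loss"
    if state > 3:
        return "gain"
    return "neutral"


def cnvs_by_subcluster(regions_file, groupings_file):
    """Group predicted CNV regions by the subclusters listed in cell_groupings.

    Both arguments are open text files (or any iterable of lines); the column
    names are assumptions to check against your own inferCNV output.
    """
    groups = {row["cell_group_name"] for row in csv.DictReader(groupings_file, delimiter="\t")}
    cnvs = defaultdict(list)
    for row in csv.DictReader(regions_file, delimiter="\t"):
        if row["cell_group_name"] in groups:
            cnvs[row["cell_group_name"]].append(
                (row["chr"], int(row["start"]), int(row["end"]), call_from_state(int(row["state"])))
            )
    return dict(cnvs)
```

Matching each cell_group_name to a lettered branch would then presumably come down to comparing its numeric suffix (e.g. 1.1.2) with the first column of the uphyloplot2 CNV_files output.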

Phylograms look the same

Hi there,

Thank you for creating this awesome script. Unfortunately, I ran into some problems creating the trees. I have 10x data from 10 different patients and performed the downstream analyses with Seurat.

When I run inferCNV, I get heatmaps that are very consistent with FISH data and with CNVs called from WES. But when I use UPhyloplot2, all the trees look the same, although the samples clearly differ.

Could downsampling to 100 cells be the problem? The raw data for each sample contain more than 10,000 cells.

I ran inferCNV with HMM and analysis_mode = "subclusters" and a logistic noise filter.

Thanks in advance

Max

I can't find the HMM_CNV_predictions.HMMi6.rand_trees.hmm_mode-subclusters.Pnorm_0.5.cell_groupings output

I can't find the HMM_CNV_predictions.HMMi6.rand_trees.hmm_mode-subclusters.Pnorm_0.5.cell_groupings file in the inferCNV output. I wonder if I used the wrong parameters in the inferCNV step. The code for that step is as follows:

infercnv_obj <- infercnv::CreateInfercnvObject(raw_counts_matrix=exprMatrix,
                                               gene_order_file=mm_geneLocate,
                                               annotations_file=cellAnnota,
                                               ref_group_names=c("control"))
infercnv_obj = infercnv::run(infercnv_obj,
                             cutoff=0.1,
                             out_dir='inferCNV/positive_1',
                             cluster_by_groups=TRUE,
                             denoise=TRUE,
                             HMM=TRUE,
                             num_threads=30)

sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] infercnv_1.2.1
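The expected file name contains hmm_mode-subclusters, and the run() call above does not set analysis_mode, so one thing worth checking (a guess, not a confirmed diagnosis) is which HMM mode the cell_groupings files in the output directory were actually written for. This small hypothetical helper lists every *.cell_groupings file in an inferCNV output directory together with the hmm_mode tag found in its name:

```python
import re
from pathlib import Path


def hmm_mode(filename):
    """Return the tag after 'hmm_mode-' in an inferCNV output file name, or None."""
    match = re.search(r"hmm_mode-([A-Za-z]+)", filename)
    return match.group(1) if match else None


def cell_groupings_by_mode(out_dir):
    """Map each *.cell_groupings file in out_dir to its hmm_mode tag."""
    return {path.name: hmm_mode(path.name)
            for path in sorted(Path(out_dir).glob("*.cell_groupings"))}
```

For the run above, cell_groupings_by_mode("inferCNV/positive_1") would show whether any file tagged subclusters exists at all.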

Can it be used with copykat?

Hi,

Thanks for developing this tool. I was wondering how to use it with input from copykat instead of infercnv.
Could you give me a hand?

Thanks!

Compatibility with inferCNV v1.16.0

"cell_groupings" files are no longer generated as outputs when running infercnv::run() (and none of the standard output files match the format required for generating a tree).

compatibility with infercnv 1.9.1

Hi, thanks for your great work!

I am currently working with infercnv (version 1.9.1). After running inferCNV as described in the manual, the cell_groupings output differs from what you describe.

Main parameters are as follows:
cutoff=0.1, window_length= 20, out_dir='./res', cluster_by_groups=FALSE,denoise=T,analysis_mode='subclusters',HMM_type='i6',HMM=TRUE

So, are there any compatibility problems with the new version of infercnv?

infercnv::run error - Error in if (run_arguments$HMM) { : argument is of length zero

Hi there,

I'm having an issue with the infercnv::run code. I'm getting this error with the exact code shown:

Also, for 10x data, should the cutoff be 0.1 as suggested by inferCNV?

Thanks

Error running with cluster_by_groups set to FALSE

Hi,

Thanks for this awesome tool. I encounter the following error when I set cluster_by_groups to FALSE:

Error in if (runif(1) <= padj) { : missing value where TRUE/FALSE needed

This is my infercnv code:
infercnv_obj = infercnv::run(infercnv_obj,
                             cutoff=0.1,
                             out_dir='output',
                             cluster_by_groups=FALSE,
                             denoise=TRUE,
                             HMM=TRUE)

Any ideas why this might be the case?

Thank you!

Uphyloplot2 with different samples

Hello all and thanks for developing this tool,

If I have different tumor samples from the same patient (e.g. pre-treatment and at relapse), would it be possible to apply uphyloplot2 to the merged samples (diagnosis and relapse integrated) instead of to each one individually? Would that be recommended?

Thanks

How to choose the tree length, and can one use a partitioning method other than random trees?

Dear developers,

First, thank you for this very useful tool.
Second, can you explain how one should choose the length of the tree?

Moreover, could you elaborate on why we should use random trees during the infercnv run, and whether Leiden could produce reliable results with your tool?
I know the main difference between those methods is the resolution gamma (Leiden) versus the p-value threshold (random_trees), but tuning those parameters should lead to similar results, so why random trees?

Best,
Andy

Missing subclones

I ran the Python command, but for some reason the branches and CNV_files contain no information for some subclones. For example, the unique subclone names in the cell groupings were:
[1] "malignant_HPT1Pat1.malignant_HPT1Pat1.1.1.1.1"
[2] "malignant_HPT1Pat1.malignant_HPT1Pat1.1.1.1.2"
[3] "malignant_HPT1Pat1.malignant_HPT1Pat1.1.1.2.1"
[4] "malignant_HPT1Pat1.malignant_HPT1Pat1.1.1.2.2"
[5] "malignant_HPT1Pat1.malignant_HPT1Pat1.1.2.1.1"
[6] "malignant_HPT1Pat1.malignant_HPT1Pat1.1.2.1.2"
[7] "malignant_HPT1Pat1.malignant_HPT1Pat1.1.2.2.1"
[8] "malignant_HPT1Pat1.malignant_HPT1Pat1.1.2.2.2"

but the CNV_files contain only:
1,0.0
1.1,0.0,B
1.1.1,50.25716385011021,C
1.1.2,5.6208670095518,D
1.2,0.0,E
1.2.1,9.184423218221896,F
1.2.2,34.937545922116094,G

There never seem to be any cluster names with all four levels.

Using Uphyloplot2 with CaSpER data

Hello!
First of all, thank you for making this wonderful and useful tool. I am currently working with CaSpER, and I was wondering how I should process the final CaSpER object (the one obtained after running the runCaSpER function) to obtain a file suitable for the Uphyloplot2 Python script.
Thanks in advance.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 178: invalid start byte

Hi there,
Thanks for sharing this tool! I am getting this error when trying to apply it to my inferCNV output:

(upuphyloplot2) -bash-4.2$ python uphyloplot2.py -I /icgc/dkfzlsdf/analysis/OE0519_projects/chptumor/marla/CNVproject/infer_CNV/Uphyloplot2/Inputs
UPhyloplot2 version 2.2
Traceback (most recent call last):
  File "/home/m221r/.conda/envs/upuphyloplot2/lib/python3.8/site-packages/uphyloplot2/uphyloplot2.py", line 239, in <module>
    main()
  File "/home/m221r/.conda/envs/upuphyloplot2/lib/python3.8/site-packages/uphyloplot2/uphyloplot2.py", line 35, in main
    for x, line in enumerate(groupings_file):
  File "/home/m221r/.conda/envs/upuphyloplot2/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 178: invalid start byte

I installed uphyloplot2 in a conda virtual environment using either:
a) conda install -c amcruz uphyloplot2 (from https://anaconda.org/amcruz/uphyloplot2, version 2.2)
b) pip install git+https://github.com/harbourlab/uphyloplot2.git#egg=uphyloplot2 (version 2.3)

After installing with a), I could locate an uphyloplot2.py file in the following directory: /home/user/.conda/envs/upuphyloplot2/lib/python3.8/site-packages/uphyloplot2
However, running the python uphyloplot2.py command gave me the error above.

After installing with b), I could not find the uphyloplot2.py file at all.
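To see where an installed copy of a package lives, the interpreter of the active environment can be asked directly through importlib. This assumes the installed package is importable under the name uphyloplot2, which is not confirmed for version 2.3; the helper name package_dirs is hypothetical.

```python
import importlib.util
from pathlib import Path


def package_dirs(name):
    """Directories a top-level module or package would be imported from ([] if not found)."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return []  # not importable from this interpreter/environment
    if spec.submodule_search_locations:  # regular or namespace package
        return [str(Path(p)) for p in spec.submodule_search_locations]
    return [str(Path(spec.origin).parent)]  # single-file module
```

Running print(package_dirs("uphyloplot2")) with the environment's python shows the directory to look in; pip show -f uphyloplot2 lists the installed files as well.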

Do you know what is causing this error and why I cannot find the uphyloplot2.py file in the 2.3 installation?
Any help would be highly appreciated. Thanks

Best
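The traceback shows the UnicodeDecodeError being raised while iterating over the groupings file, which means the file opened at that point contains bytes that are not valid UTF-8. A quick way to check a whole input directory, including hidden files that a file browser may not show (macOS .DS_Store files are one example of binary files that can end up in such folders), is sketched below. This is a diagnostic guess, not a confirmed cause, and the helper name non_utf8_files is hypothetical.

```python
from pathlib import Path


def non_utf8_files(input_dir):
    """Map each regular file in input_dir that is not valid UTF-8 to the offset of its first bad byte."""
    bad = {}
    for path in sorted(Path(input_dir).iterdir()):  # iterdir() also returns hidden files
        if not path.is_file():
            continue
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError as err:
            bad[path.name] = err.start
    return bad
```

An entry reporting offset 178 would point at the file behind the error above; moving it out of the input directory, or re-saving it as UTF-8 text, would be the next thing to try.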

Running uphyloplot2 with multiple samples

When I have multiple samples, do I have to run inferCNV sample by sample to reproduce the manual annotations shown in Fig. 2c of the NC paper? Otherwise, how can I get the precise CNVs of each sample?
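If inferCNV was run once on all samples, the cell_group_name values in the uphyloplot2 test data start with the annotation group (e.g. Retinoblastoma.Retinoblastoma_sRetinoblastoma.1.1.1.1), so, assuming the same holds for a run with cluster_by_groups=TRUE, a joint cell_groupings file could be split into one file per sample before plotting. This is a hypothetical helper under that assumption; whether per-sample trees taken from a joint run match separate per-sample runs is not established here.

```python
import csv
from collections import defaultdict


def split_by_annotation_group(lines):
    """Split cell_groupings rows by the text before the first '.' in cell_group_name.

    lines: an open file or list of lines with a 'cell_group_name<TAB>cell' header
    (assumed format). Returns {group: [(cell_group_name, cell), ...]}.
    """
    per_group = defaultdict(list)
    for row in csv.DictReader(lines, delimiter="\t"):
        name = row["cell_group_name"]
        per_group[name.split(".", 1)[0]].append((name, row["cell"]))
    return dict(per_group)


def write_groupings(rows, path):
    """Write (cell_group_name, cell) rows back out in the same tab-separated layout."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["cell_group_name", "cell"])
        writer.writerows(rows)
```

For example, iterate over split_by_annotation_group(open_file).items() and call write_groupings(rows, f"{group}.cell_groupings") for each sample.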
