harbourlab / uphyloplot2
Draw phylogenetic trees of tumor evolution
Hi harbourlab,
Many thanks for providing this useful tool! I have two questions when trying to include it in my analysis. It would be great if you could help me with it.
Any information would be useful. Many thanks in advance!
Hi, thanks for this tool ;) A quick question. I already ran inferCNV with HMM = T, using the subclustering method that is currently the "best" option, which is Leiden.
io2 = infercnv::run(io1,
                    cutoff=0.1,
                    out_dir="cutoff0_1_res0_000375_HMM",
                    cluster_by_groups=F,
                    HMM=T,
                    analysis_mode='subclusters',
                    tumor_subcluster_partition_method='leiden',
                    leiden_resolution=0.000375,
                    denoise=T,
                    sd_amplifier=2,
                    #up_to_step = 15,
                    resume_mode = TRUE,
                    num_threads=14
                    )
I used this for some downstream analysis, but I now want to make a phylogenetic tree, ideally without re-running all prior analysis steps.
Your tool looks good.
I ran the test fine. But on my data, it fails.
Here is a snippet from test, and from my data:
Test:
cell_group_name cell
Retinoblastoma.Retinoblastoma_sRetinoblastoma.1.1.1.1 GATTCAGAGACGCAAC
Retinoblastoma.Retinoblastoma_sRetinoblastoma.1.1.1.1 GGCCGATCAAGTTCTG
Retinoblastoma.Retinoblastoma_sRetinoblastoma.1.1.1.1 AACTCTTAGACGCTTT
Retinoblastoma.Retinoblastoma_sRetinoblastoma.1.1.1.1 CACATTTGTACAGCAG
Mine:
cell_group_name cell
all_observations.all_observations_s1 AAAGTGAAGTGGAAGA
all_observations.all_observations_s1 AACAACCTCAGTCTTT
all_observations.all_observations_s1 AACCACAGTTTGGGTT
all_observations.all_observations_s1 AACGGGACAAGCGCAA
For mine, I removed the reference cells and also stripped prefixes such as REL_ and TN_ from the cell names. Nonetheless, my result is:
,100.0
Where the test is like:
1,0.0
1.1,0.0,B
1.1.1,0.0,C
1.1.1.1,15.776699029126213,D
1.1.1.2,16.74757281553398,E
1.1.2,0.0,F
1.1.2.1,15.29126213592233,G
1.1.2.2,27.9126213592233,H
1.2,0.0,I
1.2.1,0.0,J
1.2.1.1,9.466019417475728,K
1.2.1.2,6.553398058252427,L
I guess the difference relates to the .1.1.1.1 etc. format, whereas I have .1-15.
Do you know if there is some way I can get around this, or a way I could convert my file into that format?
It would be great to know, as it would be very useful to be able to run this tool with the Leiden approach.
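The two snippets above differ in the suffix of cell_group_name: the test file ends in a dotted path such as .1.1.1.1, while the Leiden output ends in _s1. Below is a minimal Python sketch for checking which style a cell_groupings file uses. It assumes a whitespace-separated file with a header row, as in the snippets; the function names are hypothetical and are not part of UPhyloplot2.

```python
import re

# A trailing run of at least two dot-separated integers, e.g. ".1.1.1.1".
DOTTED_PATH = re.compile(r"\.(\d+(?:\.\d+)+)$")


def dotted_path(label):
    """Return the trailing dotted path of a group label, or None if absent."""
    match = DOTTED_PATH.search(label)
    return match.group(1) if match else None


def cells_per_group(lines):
    """Count cells per group label, skipping the header and malformed lines."""
    counts = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) < 2:
            continue
        counts[fields[0]] = counts.get(fields[0], 0) + 1
    return counts


if __name__ == "__main__":
    sample = [
        "cell_group_name cell",
        "Retinoblastoma.Retinoblastoma_sRetinoblastoma.1.1.1.1 GATTCAGAGACGCAAC",
        "all_observations.all_observations_s1 AAAGTGAAGTGGAAGA",
    ]
    for label in cells_per_group(sample):
        print(label, "->", dotted_path(label))
```

Labels for which dotted_path returns None carry no branching information in their name, so a tree cannot be read off the label alone.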
I am new to inferCNV and am trying to understand how the program works. The way I have it in my head right now is that the reference cells' expression distribution (the average expression?) is subtracted from both the normal cell expression and the tumor cell expression, leaving the normal cell expression nearly void and the tumor cell expression showing the CNVs more clearly if they exist.
My question is: if there is more than one set of reference cells, how are they averaged? Is each of their distributions compared with the tumor cells separately, or are their averages averaged again and then subtracted from the respective reference cell expressions and the tumor cell expressions?
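To make the two alternatives in this question concrete, here is a small Python sketch for a single gene. It only illustrates the two options as worded above (subtracting each reference group's mean separately, or subtracting the mean of the group means). It makes no claim about which one inferCNV implements, and all names and numbers are made up for illustration.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def subtract_each_group(tumor, reference_groups):
    """Option 1: subtract every reference group's mean separately.

    Returns one list of residuals per reference group.
    """
    return {
        name: [x - mean(cells) for x in tumor]
        for name, cells in reference_groups.items()
    }


def subtract_mean_of_means(tumor, reference_groups):
    """Option 2: average the group means, then subtract that single value."""
    baseline = mean([mean(cells) for cells in reference_groups.values()])
    return [x - baseline for x in tumor]


if __name__ == "__main__":
    # Hypothetical expression values of one gene in two reference groups.
    refs = {"T_cells": [1.0, 3.0], "B_cells": [4.0, 6.0, 8.0]}
    tumor = [5.0, 7.0]
    print(subtract_each_group(tumor, refs))
    print(subtract_mean_of_means(tumor, refs))
```

With these numbers the group means are 2.0 and 6.0, so the two options give visibly different residuals for the same tumor cells.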
Oh, I also have another problem: an error occurred at Step 18:
STEP 18: Run Bayesian Network Model on HMM predicted CNV's
INFO [2020-09-12 21:08:32] Initializing new MCM InferCNV Object.
INFO [2020-09-12 21:08:32] validating infercnv_obj
Error in (function (cl, name, valueClass) :
assignment of an object of class “logical” is not valid for @‘cnv_regions’ in an object of class “MCMC_inferCNV”; is(value, "factor") is not TRUE
The data is 10x Genomics scRNA-seq, run with cutoff 0.1 and cluster_by_groups = FALSE. However, the tumor cells in the cell annotation file are still grouped (e.g. TN57_01, TN57_02, TN57_03), in case that causes any issues.
Thank you in advance!
Hello,
Thank you for providing this very useful tool. I am wondering how/where I can find the information about q and p needed to generate a heatmap similar to figure 2b in your paper, and also how to add the information about LOH in the clonality trees?
Thank you!
"...\uphyloplot2-master\uphyloplot2.py", line 166, in main
if len(data_row[0].split(".")) > longest_tree:
IndexError: list index out of range
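The expression on the traceback line, data_row[0], raises IndexError when data_row is an empty list. One way this can happen (an assumption, since the rest of uphyloplot2.py is not shown here) is a blank line in a CSV file, which Python's csv.reader returns as an empty row. A minimal sketch of the same depth computation that skips such rows, with a hypothetical function name:

```python
import csv
import io


def longest_tree_depth(csv_text):
    """Return the largest number of dot-separated levels in the first column.

    Blank lines (empty rows) and rows with an empty first field are skipped
    instead of being indexed.
    """
    longest_tree = 0
    for data_row in csv.reader(io.StringIO(csv_text)):
        if not data_row or not data_row[0].strip():
            continue
        depth = len(data_row[0].split("."))
        if depth > longest_tree:
            longest_tree = depth
    return longest_tree


if __name__ == "__main__":
    text = "1,0.0\n\n1.1,0.0,B\n1.1.1,0.0,C\n"
    print(longest_tree_depth(text))
```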
Hi team,
Thanks for the great package!
I am attempting to generate phylogenetic trees with subclustering info from infercnv, but it looks like my .cell_groupings file has a different subcluster nomenclature. When I run infercnv with the same sample data and code that's described in the Uphyloplot2 tutorial, the subcluster names in the .cell_groupings file are "all_observations.all_observations_s1" and so on. I am not seeing any groups labeled with the "all_observations.all_observations.1.1.1.1" format. Same pattern when I use my own data. Any idea why this might be?
Also, I was thinking of trying to use the dendrogram file instead of the groupings, but I am not seeing the "newick_input.py" file anywhere in the Uphyloplot2 directory. Does this need to be downloaded from elsewhere?
Thank you!
Best,
Kaleab
Hi there,
thank you for creating this awesome script. Unfortunately, I ran into some problems with creating the trees. I have 10x data from 10 different patients and performed the downstream analyses with Seurat.
When I run inferCNV, I get heatmaps that are very consistent with FISH data and CNVs called from WES. When I use UPhyloplot2, all trees look the same, although there are clear differences between the samples.
Might it be a problem that I downsampled to 100 cells? Raw data from each sample contains more than 10,000 cells.
I ran inferCNV with HMM and analysis_mode = "subclusters" and a logistic noise filter.
Thanks in advance
Max
I can't find the HMM_CNV_predictions.HMMi6.rand_trees.hmm_mode-subclusters.Pnorm_0.5.cell_groupings file in the inferCNV output. I wonder if I used the wrong parameters in the inferCNV step. The code for the inferCNV step is as follows:
infercnv_obj <- infercnv::CreateInfercnvObject(raw_counts_matrix=exprMatrix,
                                               gene_order_file=mm_geneLocate,
                                               annotations_file=cellAnnota,
                                               ref_group_names=c("control"))
infercnv_obj = infercnv::run(infercnv_obj,
                             cutoff=0.1,
                             out_dir='inferCNV/positive_1',
                             cluster_by_groups=TRUE,
                             denoise=TRUE,
                             HMM=TRUE,
                             num_threads=30)
sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] infercnv_1.2.1
Hi,
Thanks for developing this tool. I was wondering how to use it with input from copykat instead of infercnv?
Could you give me a hand?
Thanks!
"cell_groupings" files are no longer generated as outputs when running infercnv::run() (and none of the standard output files match the format required for generating a tree).
Hi. Thanks for your great work!
I have recently been working with infercnv (version 1.9.1). After running inferCNV as described in the manual, the cell_groupings result differed from what you describe.
The main parameters were as follows:
cutoff=0.1, window_length=20, out_dir='./res', cluster_by_groups=FALSE, denoise=T, analysis_mode='subclusters', HMM_type='i6', HMM=TRUE
So, are there any compatibility problems with the new version of infercnv?
Hi,
Thanks for this awesome tool. I encounter the following error when I set cluster_by_groups to FALSE:
Error in if (runif(1) <= padj) { : missing value where TRUE/FALSE needed
This is my infercnv code:
infercnv_obj = infercnv::run(infercnv_obj,
                             cutoff=0.1,
                             out_dir='output',
                             cluster_by_groups=FALSE,
                             denoise=TRUE,
                             HMM=TRUE)
Any ideas why this might be the case?
Thank you!
Hello all and thanks for developing this tool,
In the case of having different tumors from the same patient (such as a pre-treatment sample and a relapse sample), would it be possible to apply uphyloplot2 to the merged samples (diagnostic + relapse integrated) instead of to each one individually? Would that be recommended?
Thanks
Dear developers,
First, thank you for this very useful tool.
Second, can you explain how one should choose the length of the tree?
Moreover, can you elaborate on why we should use random trees during the infercnv run, and whether Leiden could produce reliable results with your tool?
I know that the main difference between those methods is the resolution gamma (Leiden) versus the p-value threshold (random_trees), but tuning those parameters should lead to similar results, so why random trees?
Best,
Andy
I ran the Python command, but for some reason the branches and CNV_files outputs contain no information for some subclones. For example, the unique subclone names from the cell groupings were:
[1] "malignant_HPT1Pat1.malignant_HPT1Pat1.1.1.1.1"
[2] "malignant_HPT1Pat1.malignant_HPT1Pat1.1.1.1.2"
[3] "malignant_HPT1Pat1.malignant_HPT1Pat1.1.1.2.1"
[4] "malignant_HPT1Pat1.malignant_HPT1Pat1.1.1.2.2"
[5] "malignant_HPT1Pat1.malignant_HPT1Pat1.1.2.1.1"
[6] "malignant_HPT1Pat1.malignant_HPT1Pat1.1.2.1.2"
[7] "malignant_HPT1Pat1.malignant_HPT1Pat1.1.2.2.1"
[8] "malignant_HPT1Pat1.malignant_HPT1Pat1.1.2.2.2"
but in the CNV_files there is only
1,0.0
1.1,0.0,B
1.1.1,50.25716385011021,C
1.1.2,5.6208670095518,D
1.2,0.0,E
1.2.1,9.184423218221896,F
1.2.2,34.937545922116094,G
There never seem to be cluster names with all four digits.
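A quick way to list exactly which nodes are absent is to expand every subclone path into its ancestors and compare the result with the IDs in the CNV file. The Python sketch below assumes that each dotted suffix (for example 1.1.1.1) should appear as a node ID in the output, as it does in the test output quoted earlier in this thread. The helper names are hypothetical and not part of UPhyloplot2.

```python
def ancestor_ids(path):
    """Expand '1.2.1.1' into ['1', '1.2', '1.2.1', '1.2.1.1']."""
    parts = path.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def missing_nodes(leaf_paths, reported_ids):
    """Return node IDs implied by the leaf paths but absent from reported_ids."""
    expected = set()
    for path in leaf_paths:
        expected.update(ancestor_ids(path))
    return sorted(expected - set(reported_ids))


if __name__ == "__main__":
    # Dotted suffixes of the eight subclone names listed above.
    leaves = ["1.1.1.1", "1.1.1.2", "1.1.2.1", "1.1.2.2",
              "1.2.1.1", "1.2.1.2", "1.2.2.1", "1.2.2.2"]
    # First column of the CNV file shown above.
    reported = ["1", "1.1", "1.1.1", "1.1.2", "1.2", "1.2.1", "1.2.2"]
    for node in missing_nodes(leaves, reported):
        print("missing:", node)
```

For the data above, every missing node is a four-level leaf, which matches the observation that no four-digit cluster names are written.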
Hello!
First of all thank you for making this wonderful and useful tool. I am currently working with CaSpER, and I was wondering how I should process the CaSpER final object (the one obtained after running the runCaSpER function) in order to obtain a file suitable for the Uphyloplot2 Python algorithm.
Thanks in advance.
Hi there,
Thanks for sharing this tool! I am getting this error when trying to apply it to my inferCNV output:
(upuphyloplot2) -bash-4.2$ python uphyloplot2.py -I /icgc/dkfzlsdf/analysis/OE0519_projects/chptumor/marla/CNVproject/infer_CNV/Uphyloplot2/Inputs
UPhyloplot2 version 2.2
Traceback (most recent call last):
File "/home/m221r/.conda/envs/upuphyloplot2/lib/python3.8/site-packages/uphyloplot2/uphyloplot2.py", line 239, in <module>
main()
File "/home/m221r/.conda/envs/upuphyloplot2/lib/python3.8/site-packages/uphyloplot2/uphyloplot2.py", line 35, in main
for x, line in enumerate(groupings_file):
File "/home/m221r/.conda/envs/upuphyloplot2/lib/python3.9/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 178: invalid start byte
I installed uphyloplot2 in a conda virtual environment using:
a) conda install -c amcruz uphyloplot2 --> from:https://anaconda.org/amcruz/uphyloplot2 (version2.2)
b) pip install git+https://github.com/harbourlab/uphyloplot2.git#egg=uphyloplot2 (version 2.3)
After installation of a) I could locate a uphyloplot2.py file in the following directory: /home/user/.conda/envs/upuphyloplot2/lib/python3.8/site-packages/uphyloplot2
But running the python uphyloplot2.py command gave me the above error.
After installation of b) I could not find the uphyloplot2.py file at all.
Do you know what is causing this error and why I cannot find the uphyloplot2.py file in the 2.3 installation?
Any help would be highly appreciated. Thanks
Best
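The traceback shows the failure occurring while a file from the input directory is read as UTF-8 text. One possible cause (an assumption, not confirmed in this thread) is a file in the Inputs directory that is not plain text, such as a hidden or binary file. The Python sketch below lists the files in a directory that cannot be decoded as UTF-8, together with the offset of the first offending byte. The function name is hypothetical.

```python
import os
import sys


def non_utf8_files(directory):
    """Return (file name, byte offset) for each file that is not valid UTF-8."""
    bad = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as handle:
            data = handle.read()
        try:
            data.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the position of the first undecodable byte.
            bad.append((name, err.start))
    return bad


if __name__ == "__main__":
    # Usage: python check_inputs.py /path/to/Inputs
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, offset in non_utf8_files(target):
        print(f"{name}: invalid UTF-8 at byte {offset}")
```

Any file reported here would trigger the same kind of UnicodeDecodeError when opened in text mode with the UTF-8 codec.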
When I have multiple samples, do I have to run infercnv sample by sample to obtain the precise manual annotations shown in Fig. 2C of the NC paper? Otherwise, how can I get the precise CNVs of each sample?