
cbg-ethz / bnpc


Bayesian non-parametric clustering (BnpC) of binary data with missing values and uneven error rates

License: MIT License

Python 100.00%
binary-data clustering genotyping mcmc split-merge

bnpc's People

Contributors

dependabot[bot], nbmueller, pepebonet


bnpc's Issues

How can I map my cells to a previously generated cluster?

Hi
There is a publication with clusters generated by BnpC, and each cluster is associated with a function. How can I map my data to these existing clusters? (I don't want to use my data to generate any new clusters, because I want to map my data to their functions.)

This is equivalent to the following situation:
I have 11 samples. I want to use 10 of them to generate clusters, and then map the 11th sample to the clusters generated from those 10 samples.

If the current version of BnpC does not support this, could you let me know which part of your code needs to be changed?
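Whether BnpC can do this directly is not answered in this thread, so the following is only a sketch of one generic approach, not BnpC's own method: keep each published cluster genotype fixed and assign the new cell to the cluster under which its observed entries are most likely, given assumed false-positive and false-negative rates. The function names `cell_log_likelihood` and `map_cell`, the default error rates, and the requirement that the new cell's columns follow the same mutation order as the published genotypes are all assumptions.

```python
import math
from typing import Dict, List

MISSING = 3  # BnpC input convention: 3 (or an empty element) marks a missing value


def cell_log_likelihood(cell: List[int], genotype: List[int],
                        fp_rate: float, fn_rate: float) -> float:
    """Log-likelihood of one observed cell under a binary cluster genotype.

    Assumed (illustrative) error model: a true 0 is read as 1 with probability
    fp_rate, a true 1 is read as 0 with probability fn_rate. Missing entries
    contribute nothing.
    """
    if not (0.0 < fp_rate < 1.0 and 0.0 < fn_rate < 1.0):
        raise ValueError("error rates must lie strictly between 0 and 1")
    if len(cell) != len(genotype):
        raise ValueError("cell and genotype must cover the same mutations")
    ll = 0.0
    for observed, true in zip(cell, genotype):
        if observed == MISSING:
            continue
        if true == 1:
            p = 1.0 - fn_rate if observed == 1 else fn_rate
        else:
            p = fp_rate if observed == 1 else 1.0 - fp_rate
        ll += math.log(p)
    return ll


def map_cell(cell: List[int], cluster_genotypes: Dict[str, List[int]],
             fp_rate: float = 0.001, fn_rate: float = 0.2) -> str:
    """Return the name of the existing cluster with the highest likelihood (hypothetical helper)."""
    return max(
        cluster_genotypes,
        key=lambda name: cell_log_likelihood(
            cell, cluster_genotypes[name], fp_rate, fn_rate),
    )


# Usage: two hypothetical published clusters over the same four mutations.
clusters = {"A": [1, 1, 0, 0], "B": [0, 0, 1, 1]}
print(map_cell([1, MISSING, 0, 0], clusters))  # A
```

Because missing entries are skipped rather than imputed, a cell with many missing values is assigned using only the mutations it actually covers; the error rates should ideally match those estimated for the original data.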

Help with the interpretation of heatmap

Hello BnpC team,

Thanks again for a very nice tool. I have a small question about the interpretation of the heatmap (genoCluster_posterior_mean.png) generated by BnpC. Since the heatmap has no legend, I am wondering what 0, - and x represent in the image, and whether the red color means a mutation or the event responsible for the clonal formation.

I look forward to your answer and thank you so much for your help in advance.

Best,
Monica

Support for heterozygous/homozygous genotype categories?

Greetings,

I found BnpC while testing out infSCITE and think it might help us decipher our SCS data. I have an initial question: according to the docs, "All matrix entries must be of the following: 0|1|3/" ", where 0 indicates the absence of a mutation, 1 the presence, and a 3 or empty element a missing value."

I'm interested in running our categorical genotype data, which is very similar to your input requirements:

Our input                    BnpC input
0 - reference                0: absence of a mutation
1 - heterozygous mutation    1: presence of a mutation
2 - homozygous mutation      1: presence of a mutation
3 - unknown                  3 or empty element: missing value

Are there any facilities for, or plans to include, the heterozygous/homozygous genotype distinction in BnpC?

Thanks!
JP
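One possible workaround within the input rules quoted above, which discards the zygosity information, is to collapse heterozygous and homozygous calls into "present" before running BnpC. A minimal sketch, assuming the genotype matrix is already loaded as lists of integer codes; the mapping table and the helper name `to_bnpc_matrix` are hypothetical.

```python
from typing import Dict, List

# Hypothetical mapping from the categorical codes listed above to BnpC's
# binary input codes (0 = absent, 1 = present, 3 = missing).
CATEGORY_TO_BNPC: Dict[int, int] = {
    0: 0,  # reference -> absence of a mutation
    1: 1,  # heterozygous -> presence (zygosity is discarded)
    2: 1,  # homozygous -> presence (zygosity is discarded)
    3: 3,  # unknown -> missing value
}


def to_bnpc_matrix(rows: List[List[int]]) -> List[List[int]]:
    """Convert a categorical genotype matrix into BnpC's 0/1/3 encoding."""
    converted = []
    for r, row in enumerate(rows):
        new_row = []
        for value in row:
            if value not in CATEGORY_TO_BNPC:
                raise ValueError(f"unexpected genotype code {value!r} in row {r}")
            new_row.append(CATEGORY_TO_BNPC[value])
        converted.append(new_row)
    return converted


print(to_bnpc_matrix([[0, 1, 2, 3], [2, 2, 0, 3]]))
# [[0, 1, 1, 3], [1, 1, 0, 3]]
```

Rejecting unknown codes instead of silently mapping them keeps stray values (for example a 4 from a different caller) from being treated as valid genotypes.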

Extraction of row order from genoCluster_posterior_mean.png?

Hi Nico,

Thank you so much for your reply and help with the interpretation.
I just realized that the row order shown in the cluster_posterior_mean image does not match the row order of the input data. Is there a way to extract this information? I would like to take a closer look at the mutations behind each clone (cluster), and I think this is only possible by tracing them back to the input data.

Thank you very much again and I look forward to your reply.

Best,
Monica.
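If the cluster label of each input row is available in input order (for example, read from BnpC's assignment output, whose exact layout is not described in this thread), the rows can be traced back to the input without relying on the heatmap. A minimal sketch with hypothetical helper names; it does not necessarily reproduce the row order BnpC uses when drawing genoCluster_posterior_mean.png.

```python
from typing import Dict, List, Sequence


def rows_by_cluster(row_names: Sequence[str],
                    labels: Sequence[int]) -> Dict[int, List[str]]:
    """Group input row names by cluster label, keeping input order inside each group."""
    if len(row_names) != len(labels):
        raise ValueError("need exactly one cluster label per input row")
    groups: Dict[int, List[str]] = {}
    for name, label in zip(row_names, labels):
        groups.setdefault(label, []).append(name)
    return groups


def cluster_sorted_order(labels: Sequence[int]) -> List[int]:
    """Input-row indices ordered by cluster label; sorted() is stable, so ties keep input order."""
    return sorted(range(len(labels)), key=lambda i: labels[i])


names = ["cell1", "cell2", "cell3", "cell4"]
labels = [2, 1, 2, 1]  # hypothetical cluster label for each input row
print(rows_by_cluster(names, labels))  # {2: ['cell1', 'cell3'], 1: ['cell2', 'cell4']}
print(cluster_sorted_order(labels))    # [1, 3, 0, 2]
```

The indices from `cluster_sorted_order` can be used to reorder both the data rows and their IDs together, so the labels always stay attached to the right rows.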

Heatmap with the phylogeny tree

Dear BnpC team,

Thank you so much for the quick fix. I am wondering whether there is a way to display the phylogenetic tree alongside the heatmap (genoCluster_posterior_mean.png), or to export the tree in Newick format. Thanks a lot in advance.

Monica.
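Whether BnpC can export Newick is not stated in this thread. If the tree is available as parent-child relations (for instance, after parsing the edges of a Graphviz .gv file yourself), serializing it to Newick is a short recursion. A minimal sketch; the function name `to_newick` and the parent-to-children dictionary format are assumptions, and branch lengths are omitted.

```python
from typing import Dict, List


def to_newick(children: Dict[str, List[str]], root: str) -> str:
    """Serialize a rooted tree, given as a parent -> children mapping, to Newick.

    Node names are written unquoted, so they must not contain Newick
    punctuation such as ( ) , : ; or spaces.
    """
    def render(node: str) -> str:
        kids = children.get(node, [])
        if not kids:
            return node
        return "(" + ",".join(render(kid) for kid in kids) + ")" + node

    return render(root) + ";"


# Hypothetical tree: the root has clone A and clone B; B has subclones C and D.
tree = {"root": ["A", "B"], "B": ["C", "D"]}
print(to_newick(tree, "root"))  # (A,(C,D)B)root;
```

Internal nodes keep their names after the closing parenthesis, which most Newick readers accept as internal node labels.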

Understanding the output of BnpC

Hi,
I am using BnpC to cluster a single-cell dataset. After running it, I want to know the cluster number of each cell, which I believe is provided in the "assignment.txt" file, and the mutations of each cluster.
Can you please help me understand which file gives the mutations of each cluster? Also, please confirm whether "assignment.txt" is the file that indicates the cluster number of each cell. I used the following command to run BnpC:

python ../BnpC/run_BnpC.py filename.tsv -pp 0.75 0.75 -o ./bnpc_results/

Thank you,
Ritu
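Independently of which output file holds BnpC's estimated cluster genotypes, a rough per-cluster summary can be computed directly from the input matrix once each cell's cluster label is known. The sketch below uses a plain majority vote over observed entries, which is a swapped-in technique and not BnpC's posterior genotype estimate; it assumes a cells x mutations matrix of 0/1/3 codes and one label per cell in input order. The name `cluster_consensus` and the 0.5 threshold are hypothetical.

```python
from typing import Dict, List, Sequence

MISSING = 3  # BnpC input convention for a missing value


def cluster_consensus(matrix: Sequence[Sequence[int]],
                      labels: Sequence[int],
                      threshold: float = 0.5) -> Dict[int, List[int]]:
    """Majority-vote genotype per cluster from a cells x mutations 0/1/3 matrix.

    A mutation is called 1 when the fraction of 1s among the cluster's observed
    entries exceeds `threshold`, 0 otherwise, and MISSING when no cell in the
    cluster has an observation for it.
    """
    if len(matrix) != len(labels):
        raise ValueError("need exactly one cluster label per cell")
    n_mut = len(matrix[0]) if matrix else 0
    ones: Dict[int, List[int]] = {}
    observed: Dict[int, List[int]] = {}
    for row, label in zip(matrix, labels):
        ones_row = ones.setdefault(label, [0] * n_mut)
        obs_row = observed.setdefault(label, [0] * n_mut)
        for j, value in enumerate(row):
            if value == MISSING:
                continue
            obs_row[j] += 1
            ones_row[j] += value  # value is 0 or 1 here
    consensus: Dict[int, List[int]] = {}
    for label in ones:
        consensus[label] = [
            MISSING if n == 0 else int(k / n > threshold)
            for k, n in zip(ones[label], observed[label])
        ]
    return consensus


cells = [[1, 0, MISSING], [1, MISSING, MISSING], [0, 1, MISSING]]
print(cluster_consensus(cells, [0, 0, 1]))  # {0: [1, 0, 3], 1: [0, 1, 3]}
```

Comparing this summary against BnpC's own genotype output is a quick sanity check that cell labels and input rows were matched correctly.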

How can I generate a tree file?

According to the README, BnpC can generate an output tree file (or did I misunderstand?), but when I run this:
run_BnpC.py snp.input.txt -tr test.gv

I get the error below; it seems that BnpC is trying to read "test.gv":
Traceback (most recent call last):
  File "BnpC/run_BnpC.py", line 296, in <module>
    main(args)
  File "BnpC/run_BnpC.py", line 291, in main
    generate_output(args, results, data, data_names)
  File "BnpC/run_BnpC.py", line 232, in generate_output
    io.save_tree_plots(
  File "BnpC/libs/dpmmIO.py", line 226, in save_tree_plots
    pl.color_tree_nodes(
  File "BnpC/libs/plotting.py", line 324, in color_tree_nodes
    with open(tree_file, 'r') as f_in:
FileNotFoundError: [Errno 2] No such file or directory: 'test.gv'

IndexError with Default Parameter Settings

Hi,

I encountered some index errors when testing BnpC with the default parameter settings. Here is the command I used to run the program: python run_BnpC.py example_data/data.csv. Here is the error I received:

  in get_mean_hierarchy_assignment
    params[i] += params_full[step][cl_ids[step]]
IndexError: index 67 is out of bounds for axis 0 with size 7
Please let me know if additional information is required to troubleshoot this issue.

Thank you!

genoCluster_posterior_mean.png does not display the correct row IDs

Hi Bnpc support,

I used row IDs and cell IDs in the input, and the resulting heatmap "genoCluster_posterior_mean.png" shows the row IDs in exactly the same order as the input. However, the image clearly shows that the rows themselves were sorted and reordered. In other words, the rows are no longer in input order, yet the row ID labels still follow the input order, so the labels do not match the rows. Am I doing something wrong? Do you happen to have a solution for this?

Thanks a lot in advance.

Best,
Monica.

IndexError: list index out of range

Hello,

I tried to run the tool with Python 3.9 using the command python run_BnpC.py example_data/data.csv,
and it returned the following error. I would appreciate any help with running this tool. Thank you very much, and I look forward to your reply.

Traceback (most recent call last):
  File "/Volumes/Monica_data_folder/bnpc_software/BnpC/run_BnpC.py", line 295, in <module>
    main(args)
  File "/Volumes/Monica_data_folder/bnpc_software/BnpC/run_BnpC.py", line 287, in main
    results = mcmc.get_results()
  File "/Volumes/Monica_data_folder/bnpc_software/BnpC/libs/MCMC.py", line 65, in get_results
    if not 'burn_in' in results[0]:
IndexError: list index out of range

How to determine mutations responsible for different clones?

Hello BnpC team,

Thank you so much for the wonderful tool. I have a small question: how could I determine the mutations that are solely responsible for the formation of each clone? This would help us in further analysis.

Thanks again, and I look forward to your reply.

Monica.
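One simple definition, which may or may not match what BnpC's output is meant to convey: a mutation is specific to a clone if that clone's genotype carries it and no other clone's genotype does. Given per-clone binary genotypes (from BnpC's output or a majority vote over the input), such cluster-private mutations can be listed as below. The helper name `private_mutations` is hypothetical. Note that in a clonal phylogeny a clone is usually also characterized by mutations it shares with its descendants, which this sketch deliberately does not report.

```python
from typing import Dict, List


def private_mutations(genotypes: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Mutation indices present (1) in one cluster and absent (0) in all others.

    A mutation that is missing (3) in any other cluster is not reported as
    private, because its absence there cannot be confirmed.
    """
    result: Dict[str, List[int]] = {}
    for name, geno in genotypes.items():
        others = [g for other, g in genotypes.items() if other != name]
        result[name] = [
            j for j, value in enumerate(geno)
            if value == 1 and all(g[j] == 0 for g in others)
        ]
    return result


# Hypothetical cluster genotypes over four mutations (0/1, 3 = missing).
genotypes = {"A": [1, 1, 0, 3], "B": [0, 1, 1, 0], "C": [0, 1, 0, 0]}
print(private_mutations(genotypes))  # {'A': [0], 'B': [2], 'C': []}
```

The returned indices refer to mutation columns in input order, so they can be mapped back to the mutation names of the original matrix.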

How to deal with "too much missing data"?

Hi BnpC support,

According to your published paper, your tool is the best at handling missing data (more than 20% missing).

We now have targeted-sequencing snDNA samples from FFPE, which have a lot of missing data due to random DNA fragmentation. (We cannot use regular filter thresholds to reduce the percentage of missing data; otherwise, we would not retain any cells.) With such a high percentage of missing data (~50%), we get too many singleton clusters, which certainly do not make sense. What are the best parameter settings to avoid this issue?
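This thread does not say which BnpC parameters help here. As a first diagnostic that is independent of BnpC, it may help to check how the missingness is distributed, for example whether the singleton clusters coincide with cells that have very few observed entries. A minimal sketch, assuming the matrix is loaded as rows of codes (integers or strings, with an empty string for an empty element); the helper names `is_missing` and `missing_fractions` are hypothetical.

```python
from typing import Dict, Sequence, Union

Entry = Union[int, str, None]


def is_missing(value: Entry) -> bool:
    """BnpC input convention: 3 or an empty element marks a missing value."""
    return value is None or str(value).strip() in {"3", ""}


def missing_fractions(matrix: Sequence[Sequence[Entry]]) -> Dict[str, object]:
    """Fraction of missing entries per row, per column, and overall."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    if n_rows == 0 or n_cols == 0:
        raise ValueError("matrix must be non-empty")
    flags = [[is_missing(v) for v in row] for row in matrix]
    per_row = [sum(row) / n_cols for row in flags]
    per_col = [sum(row[j] for row in flags) / n_rows for j in range(n_cols)]
    overall = sum(map(sum, flags)) / (n_rows * n_cols)
    return {"per_row": per_row, "per_col": per_col, "overall": overall}


stats = missing_fractions([[0, 3, 1], [3, "", 1]])
print(stats["per_col"], stats["overall"])  # [0.5, 1.0, 0.0] 0.5
```

Sorting cells by their `per_row` value and comparing that ranking with the singleton clusters gives a quick indication of whether low coverage, rather than genuine genotype differences, is driving them.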
