Comments (14)

caravagn avatar caravagn commented on August 18, 2024

Hi @tomouellette. I usually perform some pre-processing, like removing mutations with VAF < 5%. There is also a mathematical reason for it, which follows from the power-law definition of the tail. The fit on the right suffers from the lack of a clear clonal peak. May I ask what data these are and what sample purity you are using here?
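A minimal sketch of that pre-processing step, assuming mutations are available as (alternate reads, depth) pairs. The function name `filter_low_vaf` and the input layout are hypothetical and are not part of the mobster API; only the 5% threshold comes from the comment above.

```python
def filter_low_vaf(mutations, min_vaf=0.05):
    """Keep mutations whose VAF is at or above min_vaf.

    mutations: iterable of (alt_reads, depth) pairs (hypothetical layout).
    Returns a list of (alt_reads, depth, vaf) tuples.
    """
    kept = []
    for alt_reads, depth in mutations:
        if depth <= 0:
            # Skip sites with no coverage, where the VAF is undefined.
            continue
        vaf = alt_reads / depth
        if vaf >= min_vaf:
            kept.append((alt_reads, depth, vaf))
    return kept


# Toy input: two low-VAF sites, three retained sites, one uncovered site.
example = [(2, 100), (4, 100), (6, 100), (12, 100), (48, 100), (0, 0)]
filtered = filter_low_vaf(example)
for alt_reads, depth, vaf in filtered:
    print(alt_reads, depth, round(vaf, 3))
```

The filtered values would then be passed to the fit in place of the raw VAF spectrum.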

from mobster.

caravagn avatar caravagn commented on August 18, 2024

Any update, @tomouellette?

tomouellette avatar tomouellette commented on August 18, 2024

Hi @caravagn,

Sorry! Missed the first notification. This is a fit to a synthetic VAF distribution with no clonal mutations. Purity is 100%. In general though, I tend to observe this issue in any sample, empirical or synthetic, where no clonal peak is present. As you note, to stay in line with the Pareto distribution mixture, removing mutations with VAF < 5% solves this issue in both empirical and synthetic samples.

caravagn avatar caravagn commented on August 18, 2024

Oh I see now. The tool always considers at least one Beta distribution, so it must fit it to something. Arguably that would be the set of clonal mutations. Can I ask if you ever saw real data without clonal mutations?

caravagn avatar caravagn commented on August 18, 2024

If I remember correctly, at some point I tried to allow for K=0. Can you try to run it without Beta distributions and see if it explodes? :)

tomouellette avatar tomouellette commented on August 18, 2024

Hi @caravagn,

Sounds good! I will try to wrangle up some IDs from empirical samples for you when I get a chance. I think PCAWG 07531318-87e8-4db8-aa61-9b93597d063b may be one example. I haven't run MOBSTER on this one though, so I can give it a go.

As a side note, I've run mobster on roughly 2-3 million synthetic samples and have a bit of data there if you're interested. I'm wrapping up a current project, so I haven't looked too deeply into the general areas where misfitting occurs. Note that in this case I had to slightly reduce the default settings to make it computationally feasible.

caravagn avatar caravagn commented on August 18, 2024

Hi @tomouellette, sure. If there is something interesting to chat about (regarding simulated data, results etc.), maybe we should arrange a call to discuss, so let me know. In the meantime I will have a look at 07531318-87e8-4db8-aa61-9b93597d063b with @Militeee as well.

caravagn avatar caravagn commented on August 18, 2024

@tomouellette We had a look at 07531318-87e8-4db8-aa61-9b93597d063b from PCAWG. That sample has a purity of 20%, which means that the peak you see on the left is not the tail but just the clonal peak. Therefore the correct MOBSTER setup to fit such a sample is K=1 (one clonal cluster) and no tail. At that purity you simply cannot see any subclonal mutation, including the tail.
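As a rough check of the purity argument, here is a sketch of where the clonal peak is expected to sit. It assumes a heterozygous clonal mutation in a diploid region of both tumour and normal cells, which is an assumption not stated in this thread; the function name `expected_clonal_vaf` and its parameters are hypothetical.

```python
def expected_clonal_vaf(purity, tumour_copy_number=2, multiplicity=1,
                        normal_copy_number=2):
    """Expected VAF of a clonal mutation in a mixed tumour/normal sample.

    Assumes the mutation sits on `multiplicity` copies in every tumour cell
    and is absent from normal cells (illustrative assumptions).
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    mutant_copies = purity * multiplicity
    total_copies = (purity * tumour_copy_number
                    + (1.0 - purity) * normal_copy_number)
    return mutant_copies / total_copies


# At 20% purity the diploid heterozygous clonal peak sits near VAF 0.10,
# while at 100% purity it sits near VAF 0.50.
peak_low_purity = expected_clonal_vaf(0.20)
peak_pure = expected_clonal_vaf(1.0)
print(round(peak_low_purity, 3), round(peak_pure, 3))
```

Under these assumptions a clonal peak around VAF 0.10 can look like a tail pushed against the left edge of the spectrum, which is consistent with the K=1, no-tail setup suggested above.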

Don't you agree?

tomouellette avatar tomouellette commented on August 18, 2024

@caravagn Yes, totally agree! I was (wrongly) looking at the DKFZ purity estimate when I cited that sample, and have only now looked at the consensus purity estimate. So it may only be a theoretical/computational issue. If a sample does come up though, I will send an ID.

caravagn avatar caravagn commented on August 18, 2024

@tomouellette Having no clonal mutations makes little sense biologically speaking, unless the tumour-triggering mutations happen in the first divisions of the embryo.

tomouellette avatar tomouellette commented on August 18, 2024

@caravagn I definitely agree, but I always like to keep the door open for weird technical/biological issues. For example, imagine the normal biopsy was taken too close to the tumour and the two shared 95%+ of the same clonal mutations. With certain filtering steps, you would then remove most of these sites in some cases, leading to a sparse clonal peak in the sample (this is a protocol issue, but I imagine it's a rare possibility).

Militeee avatar Militeee commented on August 18, 2024

@tomouellette I actually agree about that: having almost no clonal mutations is definitely something that can arise (rarely) from technical errors. But, in my opinion, this is a situation that should ideally be handled before using MOBSTER.

Referring to your example, fitting the mixture with calls from a wrong reference is definitely something you want to avoid (even if you get a statistically sound distribution). For instance, you could check that the normal sample does not contain driver mutations.

We have indeed developed a package (still unpublished) to perform QC automatically before going on with the deconvolution, so maybe it can help you filter low-quality cases.

Taking that into consideration, one of the future improvements of MOBSTER will probably include a sort of prior variance for the clonal Beta mean around its theoretical value, so that it cannot completely overlap with the tail (and hopefully just goes to zero if there are no clonal mutations, as in the second panel of your figure). Still, I would not trust a sample with 0 clonal mutations.

The point about cutting the left side of the VAF spectrum is actually very interesting, and it is something we had to do several times. We are still thinking about choosing the cutting point automatically, but it seems to depend heavily on the coverage, the sequencing technology, and the calling algorithm (still, cutting somewhere in the range from 0.05 to 0.1 usually works well).

tomouellette avatar tomouellette commented on August 18, 2024

Very true, good points! On a side note, I found this equation based on a binomial variance to work reasonably well for trimming tails in a data-driven fashion (where alt is the number of alternate reads required to call a mutation and depth is the mean sequencing depth).

caravagn avatar caravagn commented on August 18, 2024

Oh interesting, @tomouellette. The binomial variance np(1-p) applies with n equal to depth and p equal to alt/depth in your formula. What inequality is imposed on the variance (or maybe on some quantiles of the data) to get to that inequality?
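To make the quantities concrete, here is a small sketch of the binomial variance with n = depth and p = alt/depth. It does not reproduce the trimming formula itself, which is not shown in this thread; the function name `binomial_alt_read_stats` and the example values alt=2, depth=100 are purely illustrative.

```python
import math


def binomial_alt_read_stats(alt, depth):
    """Binomial summary for alt-read counts with n = depth, p = alt / depth.

    Returns (p, variance, sd_reads, sd_vaf), where variance = n * p * (1 - p)
    is on the read-count scale and sd_vaf rescales the standard deviation
    to the VAF scale by dividing by depth.
    """
    if depth <= 0 or not 0 <= alt <= depth:
        raise ValueError("require depth > 0 and 0 <= alt <= depth")
    p = alt / depth
    variance = depth * p * (1.0 - p)
    sd_reads = math.sqrt(variance)
    sd_vaf = sd_reads / depth
    return p, variance, sd_reads, sd_vaf


# Illustrative values only: 2 alternate reads required at mean depth 100.
p, variance, sd_reads, sd_vaf = binomial_alt_read_stats(alt=2, depth=100)
print(round(p, 3), round(variance, 3), round(sd_reads, 3), round(sd_vaf, 4))
```

Any data-driven cutoff would then be some function of these quantities, for example a bound on the variance or on a quantile, which is exactly the open question above.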

As a side note, problems of extreme contamination that lead to spillover of clonal mutations from tumour samples can be checked with our in-preparation tool TINC.
