Model selection failure when sparse low frequency mutations present (mobster, closed)

tomouellette commented on August 18, 2024

Model selection failure when sparse low frequency mutations present

from mobster.

Comments (14)

caravagn commented on August 18, 2024

Hi @tomouellette. I usually perform some pre-processing, like removing mutations with VAF < 5%. There is also a mathematical reason for this, which follows from the definition of the power-law tail. The fit on the right suffers from the lack of a clear clonal peak. May I ask what data these are and what sample purity you are using here?
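A minimal sketch of that pre-processing step, written in Python for illustration only. MOBSTER itself is an R package, and this is not its API: the `Mutation` record and the `drop_low_vaf` helper are hypothetical. It keeps calls with VAF >= 0.05, which is the same as removing those with VAF < 5%:

```python
from dataclasses import dataclass


@dataclass
class Mutation:
    # Hypothetical record type; field names are illustrative, not MOBSTER's input format.
    chrom: str
    pos: int
    vaf: float


def drop_low_vaf(mutations, min_vaf=0.05):
    """Keep only mutations whose VAF is at least `min_vaf` (default 5%)."""
    return [m for m in mutations if m.vaf >= min_vaf]


if __name__ == "__main__":
    calls = [
        Mutation("1", 1000, 0.02),
        Mutation("1", 2000, 0.06),
        Mutation("2", 3000, 0.48),
    ]
    kept = drop_low_vaf(calls)
    print([m.pos for m in kept])  # [2000, 3000]
```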

caravagn commented on August 18, 2024

Any heads up @tomouellette?

tomouellette commented on August 18, 2024

Hi @caravagn,

Sorry! I missed the first notification. This is a fit to a synthetic VAF distribution with no clonal mutations. Purity is 100%. In general, though, I tend to observe this issue in any sample, empirical or synthetic, where no clonal peak is present. As you note, removing mutations with VAF < 5%, which keeps the data consistent with the Pareto distribution in the mixture, solves the issue in both empirical and synthetic samples.

caravagn commented on August 18, 2024

Oh, I see now. The tool always considers at least one Beta distribution, so it must fit it to something. Arguably, that would be the set of clonal mutations. Can I ask whether you have ever seen real data without clonal mutations?

caravagn commented on August 18, 2024

If I remember correctly, at some point I tried to allow for K=0. Can you try running it without Beta distributions and see if it explodes? :)

tomouellette commented on August 18, 2024

Hi @caravagn,

Sounds good! I will try to dig up some IDs from empirical samples for you when I get a chance. I think PCAWG 07531318-87e8-4db8-aa61-9b93597d063b may be one example; I haven't run MOBSTER on this one yet, but I can give it a go.

As a side note, I've run mobster on roughly 2-3 million synthetic samples and have some data on that if you're interested (note: to make this computationally feasible, I had to slightly reduce the default settings). I'm wrapping up a current project, so I haven't looked too deeply into the general areas where misfitting occurs.

caravagn commented on August 18, 2024

Hi @tomouellette, sure, if there is something interesting to chat about (regarding simulated data, results, etc.), maybe we should arrange a call and discuss it; let me know. In the meantime, I will have a look at 07531318-87e8-4db8-aa61-9b93597d063b with @Militeee as well.

caravagn commented on August 18, 2024

@tomouellette We had a look at 07531318-87e8-4db8-aa61-9b93597d063b from PCAWG. That sample has purity 20%, which means that the peak you see on the left is not the tail but the clonal peak. Therefore, the correct MOBSTER setup to fit such a sample is K=1 (one clonal cluster) and no tail: at that purity you simply cannot see any subclonal mutation, including the tail.

Don't you agree?
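For context on why a 20%-purity sample shows its clonal peak at low VAF, here is a minimal sketch of the standard expected-VAF relation for a bulk sample. It assumes a heterozygous mutation (multiplicity 1) and no copy-number change (diploid tumour and diploid normal cells); the function name `expected_vaf` and its defaults are illustrative, not part of MOBSTER:

```python
def expected_vaf(purity, ccf=1.0, multiplicity=1, tumour_cn=2, normal_cn=2):
    """Expected VAF of a mutation in an impure bulk sample.

    purity: fraction of tumour cells in the sample (0-1)
    ccf: cancer cell fraction carrying the mutation (1.0 = clonal)
    multiplicity: mutated copies per tumour cell carrying the mutation
    tumour_cn / normal_cn: total copy number in tumour and normal cells
    """
    mutated_copies = purity * ccf * multiplicity
    total_copies = purity * tumour_cn + (1 - purity) * normal_cn
    return mutated_copies / total_copies


if __name__ == "__main__":
    # Diploid, heterozygous mutations.
    print(round(expected_vaf(0.20), 3))           # clonal peak at 20% purity: 0.1
    print(round(expected_vaf(0.20, ccf=0.5), 3))  # subclone at 50% CCF: 0.05
    print(round(expected_vaf(1.00), 3))           # clonal peak in a pure sample: 0.5
```

Under these assumptions the clonal peak sits near VAF 0.10, and a subclone carried by half of the cancer cells would sit near 0.05, right at the usual low-VAF cut-off. This is consistent with the point that subclonal structure is essentially invisible at this purity.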

tomouellette commented on August 18, 2024

@caravagn Yes, totally agree! I was (wrongly) looking at the DKFZ purity estimate when I cited that sample (I have just looked at the consensus purity estimate now). Then it may only be a theoretical/computational issue. If such a sample does come up, though, I will send an ID.

caravagn commented on August 18, 2024

@tomouellette Having no clonal mutations makes little sense biologically, unless the tumour-triggering mutations happen in the first divisions of the embryo.

tomouellette commented on August 18, 2024

@caravagn I definitely agree, but I always like to keep the door open for weird technical/biological issues. For example, imagine the normal biopsy was taken too close to the tumour and the two shared more than 95% of the same clonal mutations. With certain filtering steps, you would then remove most of these sites, leaving a sparse clonal peak in the tumour sample (this is a protocol issue, and I imagine it's a rare possibility).

Militeee commented on August 18, 2024

@tomouellette I actually agree with that: having almost no clonal mutations is definitely something that can (rarely) arise from technical errors. But, in my opinion, this is a situation that should ideally be handled before using MOBSTER.

Referring to your example, fitting the mixture with calls from a wrong reference is definitely something you want to avoid (even if you get a statistically sound distribution). For instance, you could have checked that the normal sample does not contain driver mutations.

We have indeed developed a package (still unpublished) to perform QC automatically before going on with the deconvolution, so maybe it can help you filter low-quality cases.

Taking that into consideration, one of the future improvements of MOBSTER will probably include a sort of prior variance for the clonal beta mean around its theoretical value, so that it cannot completely overlap with the tail (and hopefully just goes to zero if there are no clonal mutations as in the second panel of your figure). Still, I would not trust a sample with 0 clonal mutations.

The point about cutting the left side of the VAF spectrum is actually very interesting, and it is something we have had to do several times. We are still thinking about choosing the cut-off point automatically, but it seems to depend heavily on the coverage, the sequencing technology, and the calling algorithm (still, cutting somewhere in the range 0.05 to 0.1 usually works well).

tomouellette commented on August 18, 2024

Very true, good points! On a side note, I found this equation, based on the binomial variance, to work reasonably well for trimming tails in a data-driven fashion (where alt is the number of alternate reads required to call a mutation and depth is the mean sequencing depth):

$\mathrm{VAF}_{\mathrm{lower}} \geq \frac{alt}{depth} + 2\,\frac{\sqrt{depth \cdot \frac{alt}{depth}\left(1 - \frac{alt}{depth}\right)}}{depth}$
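A direct transcription of the formula above into Python, as a sketch. `vaf_lower_bound` is a hypothetical helper name, and `depth` is taken to be the mean depth of the sample, as described:

```python
import math


def vaf_lower_bound(alt, depth):
    """Data-driven lower VAF cut-off.

    alt: minimum number of alternate reads required to call a mutation
    depth: mean sequencing depth
    Returns p + 2 * sqrt(depth * p * (1 - p)) / depth, with p = alt / depth.
    """
    if depth <= 0 or not 0 <= alt <= depth:
        raise ValueError("need depth > 0 and 0 <= alt <= depth")
    p = alt / depth  # minimum callable VAF
    return p + 2 * math.sqrt(depth * p * (1 - p)) / depth


if __name__ == "__main__":
    for alt, depth in [(3, 30), (5, 100)]:
        print(alt, depth, round(vaf_lower_bound(alt, depth), 4))
```

With 3 required alternate reads at 30x mean depth this gives a cut-off of about 0.21, while 5 reads at 100x gives about 0.094. This illustrates how strongly the cut-off depends on coverage.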

caravagn commented on August 18, 2024

Oh, interesting @tomouellette. The binomial variance np(1-p) applies, with n equal to depth and p equal to alt/depth in your formula. What inequality is imposed on the variance (or perhaps on some quantiles of the data) to arrive at that bound?

As a side note, problems of extreme contamination that lead to spillover of clonal mutations from tumour samples can be checked with our in-preparation tool TINC.
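On the question about the bound: writing $p = alt/depth$ and $n = depth$, the cut-off proposed earlier in the thread simplifies as follows. One possible reading, offered as an editorial interpretation rather than something stated by either participant, is that it places the cut-off two binomial standard errors above the minimum callable VAF $p$:

$$
\begin{aligned}
\mathrm{VAF}_{\mathrm{lower}} &\geq p + 2\,\frac{\sqrt{n\,p\,(1-p)}}{n} \\
&= p + 2\sqrt{\frac{p\,(1-p)}{n}}
\end{aligned}
$$

Here $\sqrt{p(1-p)/n}$ is the standard deviation of the observed fraction $X/n$ when $X \sim \mathrm{Bin}(n, p)$, so the factor 2 roughly matches the upper end of a normal-approximation 95% interval ($z \approx 1.96$).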
