Comments (14)
Hi @tomouellette. I usually perform some pre-processing, like removing mutations with VAF < 5%. There is also a mathematical reason to it, which is implied by the tail power law definition. The fit on the right suffers from the lack of a clear clonal peak. May I ask what data are these and what sample purity are you using here?
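A minimal sketch of the pre-filter described above, written in Python for illustration only. The function name `filter_low_vaf`, the record layout, and the example values are assumptions and are not part of mobster. One way to read the mathematical reason is that a Pareto density is only defined above a positive lower bound, so the tail component needs a minimum VAF.

```python
def filter_low_vaf(mutations, min_vaf=0.05):
    """Keep only mutations whose VAF is at least min_vaf (hypothetical helper)."""
    return [m for m in mutations if m["vaf"] >= min_vaf]


# Made-up example records; a real input would come from a variant caller.
mutations = [
    {"id": "m1", "vaf": 0.02},
    {"id": "m2", "vaf": 0.05},
    {"id": "m3", "vaf": 0.31},
    {"id": "m4", "vaf": 0.48},
]

kept = filter_low_vaf(mutations)
print([m["id"] for m in kept])
```

The threshold is a parameter because, as discussed later in the thread, the appropriate cut depends on the data.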
from mobster.
Any heads up @tomouellette?
Hi @caravagn,
Sorry! Missed the first notification. This is a fit to a synthetic VAF distribution with no clonal mutations. Purity is 100%. In general, though, I tend to observe this issue in any sample, empirical or synthetic, where no clonal peak is present. As you note, removing mutations with VAF < 5%, to stay in line with the Pareto distribution mixture, solves this issue in both empirical and synthetic samples.
Oh, I see now. The tool always considers at least one Beta distribution, so it must fit it to something. Arguably that would be the set of clonal mutations. Can I ask if you have ever seen real data without clonal mutations?
If I remember correctly, at some point I tried to allow for K=0. Can you try to run it without Beta distributions and see if it explodes? :)
Hi @caravagn,
Sounds good! I will try to wrangle up some IDs from empirical samples for you when I get a chance. I think PCAWG 07531318-87e8-4db8-aa61-9b93597d063b may be one example; I haven't run MOBSTER on it yet, but I can give it a go.
As a side note, I've run mobster on roughly 2-3 million synthetic samples and have a bit of data there if you're interested. I'm wrapping up a current project, so I haven't looked too deeply into the general areas where misfitting occurs. (Note: in this case, I had to slightly reduce the default settings to make it computationally feasible.)
Hi @tomouellette, sure if there is something interesting to chat about (regarding simulated data, results etc) maybe we should arrange a call and discuss, let me know... in the meantime I will have a look at 07531318-87e8-4db8-aa61-9b93597d063b with @Militeee as well.
@tomouellette We had a look at 07531318-87e8-4db8-aa61-9b93597d063b from PCAWG, that sample has purity 20%, which means that the peak you see on the left is not the tail but just the clonal peak. Therefore the correct MOBSTER setup to fit such a sample is K=1 (one clonal cluster) and no tail - at that purity you simply cannot see any subclonal mutation, including the tail.
Don't you agree?
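The purity argument above can be checked with a short calculation. This is a sketch in Python under stated assumptions: the function name `expected_clonal_vaf` is hypothetical, and the defaults assume a heterozygous mutation (one mutated copy) in a region that is diploid in both tumour and normal cells.

```python
def expected_clonal_vaf(purity, multiplicity=1, tumour_cn=2, normal_cn=2):
    """Expected VAF of a clonal mutation given tumour purity.

    Assumes `multiplicity` mutated copies per tumour cell and the given
    total copy numbers in tumour and normal cells (illustrative defaults).
    """
    mutated_copies = multiplicity * purity
    total_copies = purity * tumour_cn + (1.0 - purity) * normal_cn
    return mutated_copies / total_copies


for purity in (0.20, 1.00):
    print(f"purity={purity:.2f} expected clonal VAF={expected_clonal_vaf(purity):.2f}")
```

Under these assumptions a 20% pure sample has its clonal peak near VAF 0.10, which is the low-VAF region where one might otherwise expect to see a tail.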
@caravagn Yes, totally agree! I was (wrongly) looking only at the DKFZ purity estimate when I cited that sample; I have just checked the consensus purity estimate. So it may be only a theoretical/computational issue. If such a sample does come up, though, I will send an ID.
@tomouellette Having no clonal mutations makes little sense biologically, unless the tumour-triggering mutations happen in the first divisions of the embryo.
@caravagn I definitely agree, but I always like to keep the door open for weird technical/biological issues. For example, imagine the normal biopsy was taken too close to the tumour and the two shared 95%+ of the same clonal mutations. With certain filtering steps you would then remove most of these sites, leading to a sparse clonal peak in the sample (this is a protocol issue, but I imagine it's a rare possibility).
@tomouellette I actually agree about that, having almost no clonal mutations is definitely something that can arise (rarely) from technical errors. But, in my opinion, this is a situation that should be ideally managed before using MOBSTER.
Referring to your example, fitting the mixture with calls from a wrong reference is definitely something you want to avoid (even if you get a statistically sound distribution). For instance, you could have checked that the normal sample does not contain driver mutations.
We have indeed developed a package (still unpublished) to perform QC automatically before going on with the deconvolution, so maybe it can help you filter low-quality cases.
Taking that into consideration, one of the future improvements of MOBSTER will probably include a sort of prior variance for the clonal beta mean around its theoretical value, so that it cannot completely overlap with the tail (and hopefully just goes to zero if there are no clonal mutations as in the second panel of your figure). Still, I would not trust a sample with 0 clonal mutations.
The point about cutting the left side of the VAF spectrum is actually very interesting, and it is something we have had to do several times. We are still thinking about how to choose the cutting point automatically, but it seems to depend heavily on the coverage, the sequencing technology, and the calling algorithm (still, a cut somewhere between 0.05 and 0.1 usually works well).
Very true, good points! On a side note, I found this equation, based on a binomial variance, to work reasonably well for trimming tails in a data-driven fashion (where alt is the number of alternate reads required to call a mutation and depth is the mean sequencing depth).
Oh interesting @tomouellette. The binomial variance np(1-p) applies with n equal to depth in your formula and p equal to alt/depth. What inequality is imposed on the variance (or maybe on some quantiles of the data) to get to that inequality?
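For reference, the binomial quantities mentioned here can be computed as in the sketch below. It is written in Python for illustration, the function name `binomial_read_stats` and the example numbers are assumptions, and it only evaluates the variance np(1-p); the trimming formula itself is not reproduced in this thread, so no cutoff is derived.

```python
import math


def binomial_read_stats(alt, depth):
    """Return (p, variance, standard deviation) of the alternate read count.

    Uses n = depth and p = alt / depth, as described in the comment above.
    """
    p = alt / depth
    variance = depth * p * (1.0 - p)  # binomial variance n * p * (1 - p)
    return p, variance, math.sqrt(variance)


# Illustrative values only: 5 alternate reads at a mean depth of 100.
p, variance, sd = binomial_read_stats(alt=5, depth=100)
print(f"p={p:.3f} variance={variance:.2f} sd={sd:.2f} reads")
```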
As a side note, problems of extreme contamination that lead to spillover of clonal mutations from tumour samples can be checked with our in-preparation tool TINC.