Add statistical tests against existing simulator (tstrait, CLOSED)

tskit-dev commented on July 26, 2024

Add statistical tests against existing simulator

from tstrait.

Comments (15)

daikitag commented on July 26, 2024

I compared tstrait with AlphaSimR, and the results are consistent once the simulated genetic values are standardized (AlphaSimR standardizes the simulated genetic values so that they match the input mean and variance exactly).
Comparison with PhenotypeSimulator sounds pretty complicated, as we need to input kinship to simulate effect sizes (https://hannahvmeyer.github.io/PhenotypeSimulator/reference/geneticBgEffects.html), and their environmental noise simulation does not use heritability values (https://hannahvmeyer.github.io/PhenotypeSimulator/reference/noiseBgEffects.html).
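
For reference, the standardization step mentioned above can be sketched roughly as follows. This is only an illustration in plain Python, not AlphaSimR's or tstrait's actual code; the function name `standardize` and the use of the population variance are assumptions.

```python
import random
import statistics


def standardize(values, mean=0.0, var=1.0):
    """Rescale values so their sample mean and (population) variance
    match the requested targets exactly.

    Hypothetical helper: the choice of population variance is an assumption.
    """
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize values with zero variance")
    scale = (var ** 0.5) / sd
    return [mean + (v - m) * scale for v in values]


# Made-up "genetic values" with arbitrary location and scale.
rng = random.Random(1)
raw = [rng.gauss(3.0, 2.0) for _ in range(200)]
scaled = standardize(raw, mean=0.0, var=1.0)
```

Once both simulators' outputs have gone through the same rescaling, differences in location and scale no longer show up in the comparison, only differences in shape.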

daikitag commented on July 26, 2024

I did the simulation by using the out of Africa model in stdpopsim, and it seems that the simulated phenotypes are slightly different when they are from different populations:

I guess we can repeat the above with the out of Africa model using a high narrow-sense heritability, so that the simulation results won't be dominated by normal noise.
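
To make the heritability point concrete, here is a small sketch of how the noise variance follows from h2 under the usual definition h2 = V_G / (V_G + V_E). The function names are hypothetical and this is not tstrait's code; it only shows why a high h2 keeps the noise from dominating.

```python
import random
import statistics


def noise_variance(var_g, h2):
    """Environmental variance implied by h2 = V_G / (V_G + V_E)."""
    if not 0 < h2 <= 1:
        raise ValueError("h2 must be in (0, 1]")
    return var_g * (1 - h2) / h2


def add_noise(genetic_values, h2, rng):
    """Add normal environmental noise sized to give heritability h2."""
    var_g = statistics.pvariance(genetic_values)
    sd_e = noise_variance(var_g, h2) ** 0.5
    return [g + rng.gauss(0.0, sd_e) for g in genetic_values]


rng = random.Random(3)
genetic = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # made-up genetic values
low_h2 = add_noise(genetic, 0.1, rng)   # noise variance is about 9x V_G
high_h2 = add_noise(genetic, 0.9, rng)  # noise variance is about V_G / 9
```

With h2 = 0.1 most of the phenotypic variance is noise, so any population differences in genetic values would be hard to see; with h2 = 0.9 the phenotypes track the genetic values closely.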

jeromekelleher commented on July 26, 2024

See the msprime verification.py script, which does lots of this type of thing, and also the section in the msprime developer docs.

daikitag commented on July 26, 2024

I am thinking about comparing tstrait with SLiM and AlphaSimR. I am planning to simulate traits and genetic information from those packages, obtain the tree sequence from the simulated genetic information, and simulate traits by using tstrait. I think that would be a good comparison, as SLiM and AlphaSimR are widely used packages.

jeromekelleher commented on July 26, 2024

If you do that though, you'll have to take into account the different processes that generate the ARG in the first place, won't you?

Much simpler if you take a given ARG, and then generate traits either using tstrait, or one of the methods that are based on input sequences.

These totally don't have to be big examples, the R methods that read in the full VCF are fine.

daikitag commented on July 26, 2024

@jeromekelleher
I'm thinking about comparing tstrait with AlphaSimR, as they use the exact same algorithm that we are using (see Third Step: Assign Quantitative Trait Nucleotide Effects, ... in https://acsess.onlinelibrary.wiley.com/doi/10.3835/plantgenome2016.02.0013), and they have written code to incorporate a tree sequence file into their simulation algorithm (https://github.com/gaynorr/AlphaSimR_Examples/blob/master/misc/msprime.R).
I imagine that AlphaSimR will obtain results similar to ours, as I took a lot of ideas from them.

I'm thinking about simulating a small tree sequence in msprime, putting it into AlphaSimR as a founder population, and simulating the phenotype information just from that founder population (I'm not planning to do any genetic simulation with AlphaSimR). What do you think about this?
Given the popularity of AlphaSimR, I think we only need to compare tstrait with AlphaSimR.

jeromekelleher commented on July 26, 2024

To me this sounds more complicated than exporting to vcf, but if you think we can do what we need with AlphaSim, then great.

daikitag commented on July 26, 2024

I will upload the comparison code soon.

gregorgorjanc commented on July 26, 2024

Comparison with PhenotypeSimulator sounds pretty complicated, as we need to input kinship to simulate effect sizes

Yeah, don’t go into the kinship abyss

daikitag commented on July 26, 2024

I added a comparison with AlphaSimR in #108. I used various parameter combinations, and the QQ-plots look amazing.

Many thanks to @jeromekelleher and @gregorgorjanc for valuable suggestions.

daikitag commented on July 26, 2024

I think it is now safe to say that tstrait's simulated genetic values have the correct properties (otherwise the simulation through AlphaSimR would be incorrect, which is very unlikely).

daikitag commented on July 26, 2024

I should also note that the QQ-plot is not producing a straight line simply because the effect sizes are simulated from a normal distribution. For example, if scaling is not done on tstrait's genetic values (scaling is conducted in AlphaSimR), we observe a strange QQ-plot. Thus, we can say that the simulated genetic values from tstrait and AlphaSimR have similar distributions.
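
As a rough illustration of what the QQ comparison checks, the sketch below pairs up empirical quantiles of two samples and measures how far the points fall from the identity line. The samples are made-up normal draws, not output from tstrait or AlphaSimR, and the helper names are hypothetical.

```python
import random
import statistics


def qq_points(sample_a, sample_b, n=20):
    """Pair the empirical quantiles of two samples (the points of a QQ-plot)."""
    qa = statistics.quantiles(sample_a, n=n)
    qb = statistics.quantiles(sample_b, n=n)
    return list(zip(qa, qb))


def max_qq_deviation(points):
    """Largest distance of any QQ point from the identity line y = x."""
    return max(abs(a - b) for a, b in points)


rng = random.Random(42)
a = [rng.gauss(0.0, 1.0) for _ in range(5000)]
b = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # same distribution as a
c = [rng.gauss(0.0, 3.0) for _ in range(5000)]  # same shape, different scale

same = max_qq_deviation(qq_points(a, b))
diff = max_qq_deviation(qq_points(a, c))
```

Here `same` stays small while `diff` is large: two normal samples only line up on y = x when their scales agree, which is why the scaling step matters for the comparison.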

daikitag commented on July 26, 2024

The statistical tests can now be added to tstrait by using verification.py. All tests must be a subclass of the Test class defined in https://github.com/tskit-dev/tstrait/blob/main/verification.py#L38, and the test method names must start with "test_". The code was largely adapted from msprime/verification.py; please see its documentation for details.
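
For anyone unfamiliar with the pattern, here is a minimal stand-alone sketch of the "subclass Test, name methods test_*" convention. It is not the actual contents of verification.py: the base class body, `ExampleComparison`, and the `run` method are hypothetical stand-ins.

```python
import inspect


class Test:
    """Hypothetical stand-in for the base class in verification.py."""

    def _get_tests(self):
        # Collect bound methods whose names start with "test_".
        return [
            (name, method)
            for name, method in inspect.getmembers(self, predicate=inspect.ismethod)
            if name.startswith("test_")
        ]

    def run(self):
        results = {}
        for name, method in self._get_tests():
            method()
            results[name] = "ok"
        return results


class ExampleComparison(Test):
    def test_values_sum_to_zero(self):
        values = [-1.0, 0.0, 1.0]
        assert sum(values) == 0

    def helper(self):
        # Not picked up: the name does not start with "test_".
        raise RuntimeError("helpers are never run as tests")


results = ExampleComparison().run()
```

Only `test_values_sum_to_zero` runs; `helper` is ignored because of its name.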

daikitag commented on July 26, 2024

We should be testing our results statistically by comparing them against other simulators. There's no reason we can't do this at the small scale.

It shouldn't be too hard to use e.g. PhenotypeSimulator on some simple simulations, and qqplot the results as a comparison with our results.

So, we would do something like:

Simulate a small tree sequence using msprime for say, 10 samples, and then for n replicates each do:

Compute phenotypes for each of the 10 individuals under a given model to get a distribution per-individual using tstrait

Export the data to VCF and do the same thing with an external tool.

We can then compare the per-individual distributions as qqplots, which should be meaningful.

This will take some work to do, but is an important validation step.

I did this in pull request #132, but I'm not sure this validation step is a good test, considering that all individuals will have similar normal-looking distributions. So even if I use two different individuals to produce a QQ-plot, the results match.
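
A toy version of the replicate scheme, to show where the normal-looking distributions come from. This assumes fixed genotypes and fresh normal effect sizes in every replicate; the genotype matrix is made up and the function names are hypothetical, so this is not the code in the pull request.

```python
import random
import statistics


def genetic_values(genotypes, effects):
    """Genetic value of each individual: sum over sites of genotype * effect."""
    return [sum(g * b for g, b in zip(row, effects)) for row in genotypes]


def replicate_distributions(genotypes, num_replicates, rng):
    """Redraw effect sizes each replicate and collect values per individual."""
    num_sites = len(genotypes[0])
    per_individual = [[] for _ in genotypes]
    for _ in range(num_replicates):
        effects = [rng.gauss(0.0, 1.0) for _ in range(num_sites)]
        for i, value in enumerate(genetic_values(genotypes, effects)):
            per_individual[i].append(value)
    return per_individual


# Rows are individuals, columns are causal sites (allele counts 0, 1 or 2).
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1]]
dists = replicate_distributions(genotypes, 2000, random.Random(7))
variances = [statistics.pvariance(d) for d in dists]
```

Under these assumptions each individual's value is a sum of independent normals, so its distribution is normal with mean 0 and variance equal to the sum of its squared genotypes (5, 2 and 5 here). After standardization the individuals become hard to tell apart on a QQ-plot, which fits the observation above.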

Instead, I would like to propose a validation test similar to the one simplePHENOTYPES did (https://github.com/samuelbfernandes/simplePHENOTYPES/blob/master/vignettes/Supplementary.pdf), and examine whether the simulation output of tstrait is exactly the same as the simulation output of other packages.
I think putting a similar notebook in the paper as supplementary material will definitely strengthen our claim that the simulation output of tstrait is correct, as a single exact test ensures that the tstrait algorithm is producing correct results.

I would like to propose that we completely ignore the QQ-plot comparison, and we instead do the following:

Exact test with AlphaSimR for a single trait
Exact test with AlphaSimR for pleiotropic trait
Exact test with simplePHENOTYPES for a single trait
Exact test with simplePHENOTYPES for pleiotropic trait
Exact test with ARG-Needle simulation for a single trait
Exact test with ARG-Needle simulation for a single trait with frequency dependence

For these exact tests, we will simulate traits and effect sizes using an external program, and then put those effect sizes into the tstrait package to check that the simulated output matches exactly.
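
The idea of the exact test can be sketched like this. The genotypes, effect sizes and "external" values are hypothetical placeholders for what an external simulator would report, and `genetic_values` stands in for the computation under test; none of this is the real test code.

```python
import math


def genetic_values(genotypes, effect_sizes):
    """Genetic value of each individual from fixed, externally supplied effects."""
    return [sum(g * b for g, b in zip(row, effect_sizes)) for row in genotypes]


def exact_match(ours, theirs, abs_tol=1e-12):
    """True when both outputs agree value by value up to floating point error."""
    return len(ours) == len(theirs) and all(
        math.isclose(x, y, rel_tol=1e-9, abs_tol=abs_tol)
        for x, y in zip(ours, theirs)
    )


# Hypothetical inputs and outputs taken from an external simulator.
genotypes = [[0, 1, 2], [1, 0, 1], [2, 2, 0]]
external_effects = [0.5, -1.25, 2.0]
external_values = [2.75, 2.5, -1.5]

ours = genetic_values(genotypes, external_effects)
matches = exact_match(ours, external_values)
```

Because the effect sizes are fixed rather than redrawn, there is no randomness left in the comparison, so any mismatch points to a difference in how genetic values are computed.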

What do you think about this @jeromekelleher ?

daikitag commented on July 26, 2024

The exact tests are performed here (#140), but we plan to add further tests to examine the complete pipeline of the simulation framework.
