Giter Site home page Giter Site logo

Comments (15)

daikitag avatar daikitag commented on July 26, 2024 1

I managed to compare tstrait with AlphaSimR, and we managed to observe a consistency with the results after standardizing the simulated genetic values (AlphaSimR standardizes the simulated genetic values to make sure that it is perfectly the same with the input mean and variance).
Comparison with PhenotypeSimulator sounds pretty complicated, as we need to input kinship to simulate effect sizes (https://hannahvmeyer.github.io/PhenotypeSimulator/reference/geneticBgEffects.html) and their environmental noise simulation is not using heritability values (https://hannahvmeyer.github.io/PhenotypeSimulator/reference/noiseBgEffects.html).

from tstrait.

daikitag avatar daikitag commented on July 26, 2024 1

I did the simulation by using the out of Africa model in stdpopsim, and it seems that the simulated phenotypes are slightly different when they are from different populations:
image
image

I guess we can repeat the above with the out of Africa model by using a high narrow-sense heritability, such that the simulation results won't be dominated by normal nosie.

from tstrait.

jeromekelleher avatar jeromekelleher commented on July 26, 2024

See the msprime verification.py script which does lots of this type of thing, and also the section in the msprime developer docs

from tstrait.

daikitag avatar daikitag commented on July 26, 2024

I am thinking about comparing tstrait with SLiM and AlphaSimR. I am planning to simulate traits and genetic information from those packages, obtain the tree sequence from the simulated genetic information, and simulate traits by using tstrait. I think that would be a good comparison, as SLiM and AlphaSimR are widely used packages.

from tstrait.

jeromekelleher avatar jeromekelleher commented on July 26, 2024

If you do that though, you'll have to take into account the different processes that generate the ARG in the first place wont you?

Much simpler if you take a given ARG, and then generate traits either using tstrait, or one of the methods that are based on input sequences.

These totally don't have to be big examples, the R methods that read in the full VCF are fine.

from tstrait.

daikitag avatar daikitag commented on July 26, 2024

@jeromekelleher
I'm thinking about comparing tstrait with AlphaSimR, as they are using the exact same algorithm that we are using (see Third Step: Assign Quantitative Trait Nucleotide Effects, ... in https://acsess.onlinelibrary.wiley.com/doi/10.3835/plantgenome2016.02.0013), and they have written the codes to incorporate tree sequence file into their simulation algorithm (https://github.com/gaynorr/AlphaSimR_Examples/blob/master/misc/msprime.R).
I imagine that AlphaSimR will obtain similar results as us, as I had taken a lot of ideas from them.

I'm thinking about simulating a small tree sequence in msprime, put it into AlphaSimR as a founder population, and simulate the phenotype information just from that founder population (I'm not planning to do any genetic simulation with AlphaSimR), but what do you think about this?
Given the popularity of AlphaSimR, I think we only need to compare tstrait with AlphaSimR.

from tstrait.

jeromekelleher avatar jeromekelleher commented on July 26, 2024

To me this sounds more complicated than exporting to vcf, but if you think we can do what we need with AlphaSim, then great.

from tstrait.

daikitag avatar daikitag commented on July 26, 2024

I will upload the comparison codes soon.

from tstrait.

gregorgorjanc avatar gregorgorjanc commented on July 26, 2024

Comparison with PhenotypeSimulator sounds pretty complicated, as we need to input kinship to simulate effect sizes

Yeah, don’t go into the kinship abyss

from tstrait.

daikitag avatar daikitag commented on July 26, 2024

I added a comparison with AlphaSimR in #108. I used various parameter combinations, and the QQ-plots look amazing
image
image

Many thanks to @jeromekelleher and @gregorgorjanc for valuable suggestions.

from tstrait.

daikitag avatar daikitag commented on July 26, 2024

I think now it is safe to say that tstrait's simulated genetic values are having the correct properties (or otherwise the simulation through AlphaSimR is incorrect, which is very unlikely).

from tstrait.

daikitag avatar daikitag commented on July 26, 2024

I should also note that the QQ-plot is not producing a straight line simply because the effect sizes are simulated from a normal distribution. For example, if scaling is not done in tstrait's genetic values (scaling is conducted in AlphaSimR), we will observe a strange QQ-plot. Thus, we can say that the simulated genetic values from both tstrait and AlphaSimR are having similar distributions.

image

from tstrait.

daikitag avatar daikitag commented on July 26, 2024

The statistical tests can be now added to tstrait by using verification.py. All tests must be a subclass of the Test class defined in https://github.com/tskit-dev/tstrait/blob/main/verification.py#L38, and the test methods must start with ''test_''. The codes were largely adapted from msprime/verification.py, and please see its documentation for details.

from tstrait.

daikitag avatar daikitag commented on July 26, 2024

We should be testing our results statistically by comparing them against other simulators. There's no reason we can do this at the small scale.

It shouldn't be too hard to use e.g. PhenotypeSimulator on some simple simulations, and qqplot the results as a comparison with our results.

So, we would do something like:

  1. Simulate a small tree sequence using msprime for say, 10 samples, and then for n replicates each do:
  2. Compute phenotypes for each of the 10 individuals under a given model to get a distribution per-individual using tstrait
  3. Export the data to VCF and do the same thing with an external too/ We can then compare the per-individual distributions as qqplots, which should be meaningful.

This will take some work to do, but is an important validation step.

I did this in Pull request #132, but I'm not sure if this validation step would be a good test, considering that all individuals will have a similar normal-looking distributions. So even if I use two different individuals to produce a QQ-plot, the results match.

compare_tstrait_phenotype_ind_1_AlphaSimR_phenotype_ind_2

Instead, I would like to propose a similar validation test that simplePHENOTYPES did https://github.com/samuelbfernandes/simplePHENOTYPES/blob/master/vignettes/Supplementary.pdf, and examine if the simulation output of tstrait is exactly the same as the simulation output of other packages.
I think putting a similar Notebook as this inside the paper as a supplementary material will definitely strengthen our claim that the simulation output of tstrait is correct, as a single exact test ensures that the tstrait algorithm is producing correct results.

I would like to propose that we completely ignore the QQ-plot comparison, and we instead do the following:

  1. Exact test with AlphaSimR for a single trait
  2. Exact test with AlphaSimR for pleiotropic trait
  3. Exact test with simplePHENOTYPES for a single trait
  4. Exact test with simplePHENOTYPES for pleiotropic trait
  5. Exact test with ARG-Needle simulation for a single trait
  6. Exact test with ARG-Needle simulation for a single trait with frequency dependence

For these exact test, we will be simulating traits + effect sizes by using an extrenal program, and we will put those effect sizes into the tstrait package to see if the simulated output will exactly match.

What do you think about this @jeromekelleher ?

from tstrait.

daikitag avatar daikitag commented on July 26, 2024

The exact tests are performed here (#140), but we plan to add further tests to examine the complete pipeline of the simulation framework.

from tstrait.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.