I compared tstrait with AlphaSimR, and the results were consistent once the simulated genetic values were standardized (AlphaSimR standardizes the simulated genetic values so that they exactly match the input mean and variance).
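To make the standardization step concrete, here is a minimal sketch that rescales a set of genetic values so that their mean and variance exactly equal the requested targets. The helper `standardize_genetic_values` is hypothetical and is not the AlphaSimR or tstrait API. Using the population variance (divisor n) rather than the sample variance (divisor n - 1) is also an assumption made here.

```python
import statistics


def standardize_genetic_values(values, target_mean, target_var):
    """Rescale values so their mean and population variance equal the targets.

    Hypothetical helper for illustration only; the divisor n (not n - 1)
    used by statistics.pstdev is an assumption.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize constant genetic values")
    scale = target_var ** 0.5 / sd
    # Centre on the observed mean, rescale the spread, then shift to the target mean.
    return [target_mean + (v - mean) * scale for v in values]


# Raw genetic values for five individuals (illustrative numbers only).
raw = [0.3, -1.2, 2.5, 0.0, 0.9]
std = standardize_genetic_values(raw, target_mean=1.0, target_var=4.0)
print(statistics.fmean(std), statistics.pvariance(std))
```

Because both tools pin the first two moments this way, any remaining disagreement between them has to show up in the shape of the distribution, which is what the QQ-plots test.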
Comparison with PhenotypeSimulator sounds pretty complicated, as we need to input kinship to simulate effect sizes (https://hannahvmeyer.github.io/PhenotypeSimulator/reference/geneticBgEffects.html), and their environmental noise simulation does not use heritability values (https://hannahvmeyer.github.io/PhenotypeSimulator/reference/noiseBgEffects.html).
I ran the simulation using the out-of-Africa model in stdpopsim, and the simulated phenotypes seem to differ slightly between populations.
I guess we can repeat the above with the out-of-Africa model using a high narrow-sense heritability, so that the simulation results aren't dominated by the normally distributed environmental noise.
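The reason a high heritability helps can be sketched as follows, assuming the common construction in which environmental noise is drawn from a normal distribution with variance V_E = V_G(1 - h2)/h2, so that V_G / (V_G + V_E) = h2. The function `add_environmental_noise` is hypothetical; the exact noise model used by tstrait should be checked against its documentation.

```python
import random
import statistics


def add_environmental_noise(genetic_values, h2, seed=None):
    """Return phenotypes = genetic value + Gaussian environmental noise.

    Assumes V_E = V_G * (1 - h2) / h2, so the expected share of phenotypic
    variance explained by genetics is h2. Hypothetical helper.
    """
    if not 0 < h2 <= 1:
        raise ValueError("h2 must be in (0, 1]")
    rng = random.Random(seed)
    var_g = statistics.pvariance(genetic_values)
    sd_e = (var_g * (1 - h2) / h2) ** 0.5  # zero noise when h2 == 1
    return [g + rng.gauss(0.0, sd_e) for g in genetic_values]


rng = random.Random(1)
genetic = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
for h2 in (0.3, 0.9):
    pheno = add_environmental_noise(genetic, h2, seed=2)
    # Fraction of phenotypic variance that is genetic; should be close to h2.
    print(h2, round(statistics.pvariance(genetic) / statistics.pvariance(pheno), 2))
```

With h2 = 0.3 the noise variance is more than twice the genetic variance, so small genetic differences between populations are easily masked; with h2 close to 1 they dominate the phenotype.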
See the msprime verification.py script, which does a lot of this kind of thing, and also the relevant section in the msprime developer docs.
I am thinking about comparing tstrait with SLiM and AlphaSimR. I plan to simulate traits and genetic data with those packages, obtain the tree sequence from the simulated genetic data, and then simulate traits on it with tstrait. I think that would be a good comparison, as SLiM and AlphaSimR are widely used packages.
If you do that, though, you'll have to take into account the different processes that generate the ARG in the first place, won't you?
It's much simpler to take a given ARG and then generate traits either with tstrait or with one of the methods that work from input sequences.
These don't have to be big examples at all; the R methods that read in the full VCF are fine.
@jeromekelleher
I'm thinking about comparing tstrait with AlphaSimR, as it uses exactly the same algorithm that we use (see "Third Step: Assign Quantitative Trait Nucleotide Effects" in https://acsess.onlinelibrary.wiley.com/doi/10.3835/plantgenome2016.02.0013), and its authors have written code to load a tree sequence file into their simulation (https://github.com/gaynorr/AlphaSimR_Examples/blob/master/misc/msprime.R).
I expect AlphaSimR to give results similar to ours, as I took a lot of ideas from it.
My plan is to simulate a small tree sequence in msprime, load it into AlphaSimR as a founder population, and simulate phenotypes from that founder population only (I'm not planning to do any genetic simulation with AlphaSimR). What do you think about this?
Given the popularity of AlphaSimR, I think we only need to compare tstrait with AlphaSimR.
To me this sounds more complicated than exporting to VCF, but if you think we can do what we need with AlphaSim, then great.
I will upload the comparison codes soon.
> Comparison with PhenotypeSimulator sounds pretty complicated, as we need to input kinship to simulate effect sizes
Yeah, don't go into the kinship abyss.
I added a comparison with AlphaSimR in #108. I used various parameter combinations, and the QQ-plots look amazing.
Many thanks to @jeromekelleher and @gregorgorjanc for valuable suggestions.
I think it is now safe to say that tstrait's simulated genetic values have the correct properties (otherwise the simulation in AlphaSimR would also have to be incorrect, which is very unlikely).
I should also note that the QQ-plot is not a straight line merely because the effect sizes are simulated from a normal distribution. For example, if tstrait's genetic values are not scaled (AlphaSimR does scale them), the QQ-plot looks strange. Thus, we can say that the simulated genetic values from tstrait and AlphaSimR have similar distributions.
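The point that a normal shape alone does not give a straight line can be illustrated with a minimal QQ comparison. The sketch below pairs empirical quantiles of two samples (a simple floor-index quantile, chosen for brevity) and reports the largest distance from the line y = x. The functions `qq_pairs` and `max_deviation_from_identity` are hypothetical, and the samples are synthetic stand-ins for standardized and unstandardized genetic values, not output from either tool.

```python
import random


def qq_pairs(sample_a, sample_b, num_quantiles=99):
    """Return (quantile of a, quantile of b) pairs for a QQ-plot.

    Uses a simple floor-index empirical quantile on the sorted samples;
    points close to y = x suggest the samples share a distribution.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    pairs = []
    for i in range(1, num_quantiles + 1):
        p = i / (num_quantiles + 1)
        pairs.append((a[int(p * (len(a) - 1))], b[int(p * (len(b) - 1))]))
    return pairs


def max_deviation_from_identity(pairs):
    """Largest vertical distance between a QQ point and the line y = x."""
    return max(abs(x - y) for x, y in pairs)


rng = random.Random(42)
tool_a = [rng.gauss(0, 1) for _ in range(5000)]        # standardized values
tool_b = [rng.gauss(0, 1) for _ in range(5000)]        # standardized values
unscaled = [3 * rng.gauss(0, 1) for _ in range(5000)]  # normal, but wrong scale

print(max_deviation_from_identity(qq_pairs(tool_a, tool_b)))    # small
print(max_deviation_from_identity(qq_pairs(tool_a, unscaled)))  # large
```

Both pairs of samples are normal, yet only the pair with matching scale falls on y = x, which is why a straight QQ-plot is informative about more than the effect-size distribution.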
Statistical tests can now be added to tstrait through verification.py. Every test must be a subclass of the Test class defined in https://github.com/tskit-dev/tstrait/blob/main/verification.py#L38, and test method names must start with `test_`. The code was largely adapted from msprime's verification.py, so please see its documentation for details.
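The subclass-plus-prefix convention can be pictured with a minimal, self-contained runner. This is a hypothetical sketch of the pattern only: the `Test` class below is a stand-in, not the actual class in verification.py, and the real script's discovery and output handling may differ.

```python
import inspect


class Test:
    """Hypothetical stand-in for the Test base class in verification.py."""


class ExampleTest(Test):
    """Example subclass; only methods whose names start with 'test_' are run."""

    def test_variance_is_non_negative(self):
        values = [0.5, -1.0, 2.0]
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) >= 0

    def helper(self):  # not collected: name does not start with "test_"
        raise RuntimeError("helper should not be run as a test")


def run_tests(base=Test):
    """Instantiate each direct subclass of base and call its test_ methods."""
    results = {}
    for cls in base.__subclasses__():
        instance = cls()
        for name, method in inspect.getmembers(instance, predicate=inspect.ismethod):
            if name.startswith("test_"):
                results[f"{cls.__name__}.{name}"] = method()
    return results


print(run_tests())
```

Keying discovery on a name prefix means new checks can be added by writing a method, without registering it anywhere.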
We should be testing our results statistically by comparing them against other simulators. There's no reason we can't do this at a small scale.
It shouldn't be too hard to use, e.g., PhenotypeSimulator on some simple simulations and QQ-plot its results against ours.
So, we would do something like:
- Simulate a small tree sequence using msprime for, say, 10 samples, and then, for each of n replicates:
  - Compute phenotypes for each of the 10 individuals under a given model using tstrait, to get a distribution per individual
  - Export the data to VCF and do the same thing with an external tool
We can then compare the per-individual distributions as QQ-plots, which should be meaningful.
This will take some work to do, but is an important validation step.
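The replicate loop above can be sketched in plain Python. Assumptions, all labeled: the genotype matrix is made up (in practice it would come from an msprime tree sequence, e.g. via tskit's `TreeSequence.genotype_matrix()` after summing each individual's two haplotypes, and the external tool would read the file written by `TreeSequence.write_vcf()`); the trait model (normal effect sizes, noise variance set from h2) is an assumption rather than tstrait's exact model; and both functions are hypothetical.

```python
import random
import statistics

# Hypothetical diploid dosage matrix (sites x individuals), standing in for a
# small msprime tree sequence; entries are causal-allele counts 0, 1 or 2.
GENOTYPES = [
    [0, 1, 2, 0, 1],
    [1, 1, 0, 2, 0],
    [2, 0, 1, 1, 1],
    [0, 2, 1, 0, 2],
    [1, 0, 0, 1, 2],
]


def simulate_phenotypes(genotypes, num_causal, h2, rng):
    """One replicate: pick causal sites, draw N(0, 1) effects, add noise."""
    num_inds = len(genotypes[0])
    causal = rng.sample(range(len(genotypes)), num_causal)
    effects = {site: rng.gauss(0.0, 1.0) for site in causal}
    genetic = [
        sum(effects[s] * genotypes[s][j] for s in causal) for j in range(num_inds)
    ]
    sd_e = (statistics.pvariance(genetic) * (1 - h2) / h2) ** 0.5
    return [g + rng.gauss(0.0, sd_e) for g in genetic]


def per_individual_distributions(genotypes, num_causal, h2, num_reps, seed):
    """Collect num_reps phenotypes for each individual across replicates."""
    rng = random.Random(seed)
    dists = [[] for _ in genotypes[0]]
    for _ in range(num_reps):
        for j, value in enumerate(simulate_phenotypes(genotypes, num_causal, h2, rng)):
            dists[j].append(value)
    return dists


dists = per_individual_distributions(GENOTYPES, num_causal=2, h2=0.5, num_reps=1000, seed=7)
print([round(statistics.fmean(d), 2) for d in dists])
```

Each list in `dists` is one individual's distribution; running the same loop with the external tool gives the second sample for each per-individual QQ-plot.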
I did this in pull request #132, but I'm not sure this validation step is a good test, since all individuals have similar, normal-looking distributions. Even if I use two different individuals to produce a QQ-plot, the results match.
Instead, I would like to propose a validation test similar to the one simplePHENOTYPES did (https://github.com/samuelbfernandes/simplePHENOTYPES/blob/master/vignettes/Supplementary.pdf), checking whether tstrait's simulation output is exactly the same as the output of other packages.
I think including a similar notebook in the paper as supplementary material would strengthen our claim that tstrait's simulation output is correct, as an exact test directly checks that the tstrait algorithm produces correct results.
I would like to propose that we drop the QQ-plot comparison entirely and instead do the following:
- Exact test with AlphaSimR for a single trait
- Exact test with AlphaSimR for pleiotropic trait
- Exact test with simplePHENOTYPES for a single trait
- Exact test with simplePHENOTYPES for pleiotropic trait
- Exact test with ARG-Needle simulation for a single trait
- Exact test with ARG-Needle simulation for a single trait with frequency dependence
For these exact tests, we will simulate traits and effect sizes with an external program, feed those effect sizes into tstrait, and check whether the simulated output matches exactly.
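The shape of such an exact test can be sketched with two independent computations of genetic values from the same fixed effect sizes. Everything here is hypothetical: the two functions merely play the roles of "external program" and "tstrait", the genotypes and effects are invented, and the frequency-dependent scaling beta * [2p(1 - p)]^alpha is an assumed form of the frequency-dependence model, to be checked against the ARG-Needle and tstrait documentation. A tiny tolerance is used because floating-point sums in different orders need not be bit-identical.

```python
import math

GENOTYPES = [  # hypothetical dosage matrix: sites x individuals
    [0, 1, 2, 0],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
]
EXTERNAL_EFFECTS = {0: 0.5, 2: -1.25}  # hypothetical causal site -> effect size


def scaled_effects(genotypes, effects, alpha=None, ploidy=2):
    """Optionally apply frequency dependence beta * [2p(1-p)]**alpha (assumed form)."""
    if alpha is None:
        return dict(effects)
    out = {}
    for site, beta in effects.items():
        p = sum(genotypes[site]) / (ploidy * len(genotypes[site]))
        if not 0 < p < 1:
            raise ValueError("frequency scaling needs a polymorphic site")
        out[site] = beta * (2 * p * (1 - p)) ** alpha
    return out


def genetic_values_by_site(genotypes, effects):
    """Role of one tool: accumulate each causal site's contribution in turn."""
    values = [0.0] * len(genotypes[0])
    for site, beta in effects.items():
        for j, dosage in enumerate(genotypes[site]):
            values[j] += beta * dosage
    return values


def genetic_values_by_individual(genotypes, effects):
    """Role of the other tool: sum over causal sites for each individual."""
    return [
        math.fsum(beta * genotypes[site][j] for site, beta in effects.items())
        for j in range(len(genotypes[0]))
    ]


for alpha in (None, -1.0):
    effects = scaled_effects(GENOTYPES, EXTERNAL_EFFECTS, alpha)
    a = genetic_values_by_site(GENOTYPES, effects)
    b = genetic_values_by_individual(GENOTYPES, effects)
    print(alpha, all(math.isclose(x, y, rel_tol=0, abs_tol=1e-12) for x, y in zip(a, b)))
```

Unlike the QQ-plot comparison, a single mismatching individual makes this check fail, which is what makes it a stronger test.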
What do you think about this @jeromekelleher ?
The exact tests are performed here (#140), but we plan to add further tests to examine the complete pipeline of the simulation framework.