greatapes_sims's Issues
Don't remove msprime mutations in the recapitation region
so if you keep the initial slim generation in the tree sequence
ie don't simplify
then you can always tell which muts happened in the slim period versus in the recapitation period
so you should only thin mutations in the slim period
so only mutations for which the parent of the edge on which the mutation occurred lived since the start of the slim period
can efficiently get that out of the tables
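A rough sketch of that table lookup, using plain Python lists in place of the real tables. The assumptions are mine: edges are `(left, right, parent, child)` tuples, each mutation is `(position, node)` with `node` being the node directly below the mutation, and times are measured in generations ago. The function name and `slim_start_time` are hypothetical.

```python
from collections import defaultdict


def slim_period_mutations(mutations, edges, node_time, slim_start_time):
    """Return indices of mutations that sit on an edge whose parent node
    lived during the SLiM period (time ago <= slim_start_time).

    Mutations above a root have no edge and are never returned.
    """
    # Index edges by child so each mutation only scans its own node's edges.
    edges_by_child = defaultdict(list)
    for left, right, parent, child in edges:
        edges_by_child[child].append((left, right, parent))

    keep = []
    for i, (position, node) in enumerate(mutations):
        for left, right, parent in edges_by_child.get(node, ()):
            if left <= position < right:
                if node_time[parent] <= slim_start_time:
                    keep.append(i)
                break  # at most one edge covers this position for this child
    return keep


# Toy example: nodes 0 and 1 are samples, node 2 lived in the SLiM period,
# node 3 is a recapitated ancestor older than the SLiM start.
node_time = [0, 0, 50, 500]
edges = [(0, 100, 2, 0), (0, 100, 2, 1), (0, 100, 3, 2)]
mutations = [(10, 0), (20, 2), (30, 3)]
thinnable = slim_period_mutations(mutations, edges, node_time, slim_start_time=100)
```

Only the indices in `thinnable` would be candidates for thinning; everything else is left alone.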
fix the way you do recapitation
need to specify a demography object now
shifting coordinates in exon tsv
greatapes_sims/scripts/py/chr.snake
Line 200 in 4c9d548
you are doing it wrong. the shift should be based on the start coordinate, not the first start of the exon bed file.
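A minimal sketch of the intended shift, reading "start coordinate" as the start of the simulated region (an assumption on my part). `shift_exons` and `region_start` are hypothetical names, not what chr.snake uses.

```python
def shift_exons(exons, region_start):
    """Shift (start, end) exon intervals so that region_start maps to 0."""
    return [(start - region_start, end - region_start) for start, end in exons]


exons = [(1500, 1600), (2000, 2300)]

# Intended: shift by the region start, so the first exon keeps its offset.
shifted = shift_exons(exons, region_start=1000)

# What the issue describes as wrong: shifting by the first exon start,
# which silently moves the first exon to position 0.
wrong = shift_exons(exons, region_start=exons[0][0])
```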
reconciling rec map in hapmap with actual chr sizes
The recombination maps from HapMap do not span the full chromosome, but I am using the actual chromosome sizes to set simulations.
Deal with this discrepancy in windowed.snake. Check that last position in rec map is the end of the chr, if not, add another line with rec = 0.
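Something like the following could do the check, as a sketch only. It assumes the map is already loaded as `(position, rate)` rows sorted by position, and that `chrom_length` is in the same coordinate system as the map positions; `pad_rec_map` is a hypothetical helper name.

```python
def pad_rec_map(rows, chrom_length):
    """Append a (chrom_length, 0.0) row if the map stops before the chr end.

    rows: list of (position, rate) tuples sorted by position.
    Note: whether the appended zero rate covers the gap depends on whether a
    row's rate applies to the interval before or after its position.
    """
    if not rows:
        raise ValueError("empty recombination map")
    last_position = rows[-1][0]
    if last_position > chrom_length:
        raise ValueError("recombination map extends past the chromosome end")
    if last_position == chrom_length:
        return list(rows)
    return list(rows) + [(chrom_length, 0.0)]


padded = pad_rec_map([(100, 1.2), (900, 0.0)], chrom_length=1000)
```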
rerun stats for the real data (update pxpy branch overlap figure)
convertToSubstitution=T?
Just worried about what happens to substitutions. Are those kept in the tree sequences even if convertToSubstitution=T in SLiM?
a
greatapes_sims/scripts/py/chr.snake
Line 200 in 4c9d548
aa
stats script should be aware of windowing during simulation step
modify stats script to retrieve the corresponding genomic window that was simulated, so that the output windows reflect the positions in the chromosome
snakemake won't work for more than one param comb
as it is right now, the edge rules are taking lists of file names, but snakemake does not run each rule once per element in the list, instead it puts everything in the input placeholder.
work in progress...
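One likely direction: Snakemake runs a rule once per distinct wildcard assignment of its output pattern, so the edge rules would take a single wildcard-based file name and a target rule would request the full list. The plain Python sketch below only mimics what Snakemake's `expand` produces for such a target list; the pattern and wildcard names are made up for illustration.

```python
from itertools import product


def expand_targets(pattern, **wildcards):
    """Fill `pattern` with every combination of the given wildcard values."""
    keys = list(wildcards)
    value_lists = [wildcards[key] for key in keys]
    return [
        pattern.format(**dict(zip(keys, combo)))
        for combo in product(*value_lists)
    ]


# Hypothetical output pattern: one file per (edge, sim_id) combination.
targets = expand_targets(
    "output/{edge}_{sim_id}.trees", edge=["a", "b"], sim_id=[0, 1]
)
```

Each rule then declares a single `{edge}_{sim_id}` output, and Snakemake works out one job per requested target.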
change default args in overlay.py
Instead of
    def f(a=""): ...
I should be doing:
    def f(a=None):
        if a is None:
            a = ...  # insert default argument
exons table in union_recap_mut.py includes exons in the padded region
this means that the mutation rate map is not constructed properly for simulations which are broken up. this is not a problem bc all the simulations in the paper were simulated in full.
specifying more than one replicate per parameter combo will break things
note the table that holds all the mappings from IDs to parameters does not have a rep column. if you add it, then a bunch of stuff downstream may break (most notably the pipeline to do variable mutation maps in neutral sims).
no biggie bc we might not even run replicates?
vectorized genomic elements
greatapes_sims/scripts/slim/recipe_sel_greatapes.slim
Lines 26 to 48 in ac70724
since SLiM 3.3 you can define genomic elements with vectors of start and end positions. It should be faster than the loop.
ploidy being set as 1
see cc6c9cd
because we can have more than one subpopulation per tree sequence and we want to calculate some stats for each subpop individually, I need to subset my genotype matrix.
it is possible to get the indices belonging to a population for each haploid genome using tskit with ts.samples(population=i), but not for diploid genomes. If I were to treat the genotype matrix with ploidy=2, I wouldn't be able to do the subsetting this way.
ploidy should not matter for the stats we are using so far (pi and dxy), but could be an issue with other stats.
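For reference, a sketch of the haploid subsetting with plain lists standing in for the real arrays. One detail worth keeping in mind: `ts.samples(population=i)` returns node IDs, while genotype matrix columns follow the order of `ts.samples()`, so the IDs need to be mapped to column positions unless the two happen to coincide. The helper names below are hypothetical.

```python
def column_indices(all_samples, pop_samples):
    """Map sample node IDs of one population to genotype-matrix columns.

    all_samples: node IDs in column order (what ts.samples() would give).
    pop_samples: node IDs of one population (ts.samples(population=i)).
    """
    column_of = {node: j for j, node in enumerate(all_samples)}
    return [column_of[node] for node in pop_samples]


def subset_columns(genotypes, columns):
    """Keep only the given columns of a variants-by-samples matrix."""
    return [[row[j] for j in columns] for row in genotypes]


all_samples = [4, 5, 6, 7]      # haploid sample nodes, in column order
pop1_samples = [6, 7]           # nodes belonging to population 1
genotypes = [[0, 1, 1, 0],
             [1, 0, 0, 1]]
pop1_genotypes = subset_columns(genotypes, column_indices(all_samples, pop1_samples))
```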
handle the rescaling factor the same way stdpopsim does
modify slim script to do the right thing when rescaling:
https://github.com/popsim-consortium/stdpopsim/blob/master/stdpopsim/slim_engine.py#L71
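As a reminder of what that rescaling looks like, here is a sketch for a scaling factor Q. This is my recollection of the convention described at the link above and should be checked against it before use: sizes and times are divided by Q, mutation rates and selection coefficients are multiplied by Q, and the recombination rate r becomes (1 - (1 - 2r)^Q) / 2. The function name and the rounding choices are my own.

```python
def rescale_parameters(Q, pop_size, generations, mutation_rate,
                       recombination_rate, selection_coeff):
    """Rescale simulation parameters by a factor Q (assumed convention)."""
    return {
        "pop_size": max(1, round(pop_size / Q)),
        "generations": round(generations / Q),
        "mutation_rate": mutation_rate * Q,
        # Probability of an odd number of crossovers in Q generations.
        "recombination_rate": (1 - (1 - 2 * recombination_rate) ** Q) / 2,
        "selection_coeff": selection_coeff * Q,
    }


scaled = rescale_parameters(
    Q=10, pop_size=10000, generations=5000,
    mutation_rate=1e-8, recombination_rate=1e-8, selection_coeff=0.01,
)
```

For small r the recombination term is close to r * Q, but it stays below 0.5 however large Q gets.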
make SLiM check if any intermediate tree seq has been saved - and restart from there
I should figure out how to make SLiM check whether any intermediate file has been saved and, if so, restart from there.
The stakes haven’t been too high yet to make me want to implement this, but if we are going to do whole chrom sims it might be worth it.
It shouldn’t be too hard though.
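One way this could work from the pipeline side, sketched in Python: pick the most recent intermediate file before launching SLiM and hand its path and generation to the script (for example as constants defined with `-d`), so the recipe can load it instead of starting over. The `_gen<N>.trees` naming scheme below is hypothetical, not what the recipe currently writes.

```python
import re

# Hypothetical naming scheme for intermediate files: <prefix>_gen<N>.trees
CHECKPOINT_RE = re.compile(r"_gen(\d+)\.trees$")


def latest_checkpoint(filenames):
    """Return (generation, filename) of the newest checkpoint, or None."""
    best = None
    for name in filenames:
        match = CHECKPOINT_RE.search(name)
        if match is None:
            continue  # not an intermediate file
        generation = int(match.group(1))
        if best is None or generation > best[0]:
            best = (generation, name)
    return best


found = latest_checkpoint(
    ["sim_gen100.trees", "sim_gen2500.trees", "sim_final.trees"]
)
```

If `found` is `None`, the simulation starts from scratch as it does now.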
stats are being calculated from all individuals in tree sequence
As of right now, the genotype matrix is being computed from the full tree sequence, not a subsample of it. To be fair, we should be sampling a random set of samples from the tree to calculate the stats on.
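A small sketch of the subsampling step, assuming the sample node IDs have already been grouped by population (for instance from `ts.samples(population=i)`). `subsample` and its arguments are hypothetical names; the seed is there so the same subset can be reproduced.

```python
import random


def subsample(samples_by_pop, n, seed=None):
    """Draw up to n sample nodes per population without replacement."""
    rng = random.Random(seed)
    return {
        pop: sorted(rng.sample(nodes, min(n, len(nodes))))
        for pop, nodes in samples_by_pop.items()
    }


samples_by_pop = {0: list(range(0, 20)), 1: list(range(20, 25))}
chosen = subsample(samples_by_pop, n=10, seed=42)
```

The chosen nodes could then be used to simplify the tree sequence or passed as sample sets to the stats, so that the genotype matrix is only built for the subsample.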
set simplification interval for high N sims
review union and tskit stats
@petrelharp, can you please take a quick look at my code for unioning, recapitating, mutating and getting some stats from the simulations?
https://github.com/kr-colab/greatapes_sims/blob/master/scripts/processing/union_and_stats.py#L1
and
https://github.com/kr-colab/greatapes_sims/blob/master/scripts/processing/helper_functions.py#L1