tstrait-paper's Issues
Change node labels to letters in illustration.
It's confusing to have 0 mean two different things (node 0 and individual 0). A simple way to fix this is to set the node labels to lowercase alphabetical letters, as we do in the ARG paper.
Change time of node 6 or 7
It's potentially distracting to have nodes 6 and 7 at exactly the same time. Change one of them to be slightly earlier or later.
Related to #10
Martin et al simulation not a good comparison
Our version of the Martin et al. simulation has diverged substantially from theirs, and also does some very inefficient things.
It needs to be rewritten to follow their approach closely.
Update time scaling fig
Change Fig S1 to just show the time to simulate a trait with 1000 causal sites for the stdpopsim ARGs.
Drop the Martin et al. comparison, as it's not really worth the trouble and we don't have space to discuss it in the text.
Mention extension to dominance in the conclusion
We assume the additive model, but we should also support non-additive models in the longer term.
Papers to cite:
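To make the extension concrete: the additive model computes an individual's genetic value as the sum of per-site effect sizes weighted by allele counts, and a dominance extension adds a per-site deviation for heterozygotes. A minimal numpy sketch (function and variable names here are illustrative, not tstrait's API):

```python
import numpy as np

def genetic_value(genotypes, beta, d=None):
    # Additive genetic value: g_i = sum_j beta_j * x_ij,
    # where x_ij is the allele count (0, 1, or 2) at causal site j.
    g = genotypes @ beta
    if d is not None:
        # Dominance extension: add a per-site deviation d_j for heterozygotes.
        g = g + (genotypes == 1) @ d
    return g

x = np.array([[0, 1], [2, 1]])  # allele counts: 2 individuals x 2 causal sites
beta = np.array([0.5, -0.2])
print(genetic_value(x, beta))                          # additive only
print(genetic_value(x, beta, d=np.array([0.1, 0.1])))  # with dominance
```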
Change simulation result dataframes to long format
Simulation time results currently are in wide format, like
,1e6_50Mb,2e6_50Mb,3e6_50Mb,4e6_50Mb,5e6_50Mb,1e6_100Mb,2e6_100Mb,3e6_100Mb,4e6_100Mb,5e6_100Mb,1e6_200Mb,2e6_200Mb,3e6_200Mb,4e6_200Mb,5e6_200Mb
0,77.64282770000864,179.63335699995514,230.23984719999135,343.7499600999872,423.4201375999837,75.93955379998079,173.95170759997563,254.86329439998372,330.529352499987,392.6764542000019,92.2...
1,72.61392149998574,181.0281139999861,263.9906612999621,356.80779799999436,446.1329022999853,75.12469440000132,146.49259350000648,276.3649019000004,364.99122720002197,399.3250003999565,77.5...
2,70.11137739999685,184.03285519999918,294.038197899994,348.7073199999868,482.82663069997216,77.93682020000415,174.49476670002332,235.45707329997094,306.8637148999842,406.5505961999879,74.4...
3,81.93802689999575,169.1525683000218,229.7734829999972,358.95320299995365,454.1488188999938,64.41976179997437,174.47186769999098,228.89124909997918,318.7389976999839,467.2518087000353,90.1...
4,81.39597090001917,179.43859279999742,234.41665630001808,360.4629126000218,448.3900854999665,68.80955529998755,167.84331939998083,265.9424035999691,314.8192218999611,380.4535507999826,75.9...
5,79.5875359000056,162.72084680001717,242.5002392999595,328.1800843999954,455.24361139995744,72.51941160002025,170.0330126999761,273.7956803999841,320.2541406999808,459.5114398000296,81.465...
6,85.30113890004577,188.09864710003603,248.51294749998488,360.8818011999829,479.57706929999404,74.97351540002273,159.52908700000262,254.10992680001073,346.2119941000128,407.9319478000398,81...
7,87.57253509998554,168.307615800004,269.3018477000296,307.2869950000313,402.21905590000097,77.65944620000664,165.3033509000088,246.15262670000084,367.09767170000123,492.6369316999917,81.51...
8,77.49477789999219,150.29511330003152,251.10237269999925,345.2701656999998,436.22386979998555,82.3944512999733,175.3935331000248,244.92631290003192,335.08269339997787,345.27137590001803,82...
9,70.17078769998625,187.60311570001068,265.39845320000313,351.7046144999913,480.65730829996755,75.47531160002109,151.07406130002346,250.4223663000157,374.0576356999809,432.41990810004063,...
This is very unwieldy for downstream work. Store them in long format instead:
sample_size,sequence_length,random_seed,num_traits,cpu_time
1000000,50000000,12345,1000,10.234
....
These can then be processed much more easily with Pandas (e.g. using group-bys to average over replicates).
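As a sketch, the existing wide table can be melted into this long format and averaged with a group-by. Column labels and values below are an abbreviated, illustrative excerpt of the real CSV, where each column encodes "{sample_size}_{sequence_length}":

```python
import io
import pandas as pd

# Illustrative excerpt of the wide-format timing CSV.
wide_csv = """rep,1e6_50Mb,2e6_50Mb,1e6_100Mb
0,77.64,179.63,75.94
1,72.61,181.03,75.12
"""
wide = pd.read_csv(io.StringIO(wide_csv))

# Melt to long format, then split the column label into its two parameters.
long = wide.melt(id_vars="rep", var_name="params", value_name="cpu_time")
long[["sample_size", "sequence_length"]] = long["params"].str.split("_", expand=True)
long = long.drop(columns="params")

# Averaging over replicates is now a one-line group-by.
mean_times = long.groupby(["sample_size", "sequence_length"])["cpu_time"].mean()
print(mean_times)
```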
The file naming is also opaque; if we are splitting results across multiple files, use a consistent numbering scheme.
Get perf numbers for large ARGs
Rename the time-num-causal.ipynb notebook to time-large-args.ipynb and just get the times that we need for the text.
Add "Genotype" column to figure bottom table
It would be helpful to include the genotypes for the three individuals as another column; they will be A|A, T|T and T|A (assuming we use phased VCF-like notation, which people will presumably find familiar).
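For reference, building that column from per-haplotype alleles is a one-liner; the tuples below are just the three genotypes named above:

```python
# Phased VCF-like genotype strings for the three individuals in the figure;
# each tuple holds the alleles of the two haplotypes.
haplotypes = [("A", "A"), ("T", "T"), ("T", "A")]
genotype_column = ["|".join(pair) for pair in haplotypes]
print(genotype_column)  # ['A|A', 'T|T', 'T|A']
```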
Report RAM usage as well as time
I think the scaling figure would be much more interesting if we reported RAM usage in the second panel.
Then, we just run one simulation on the French-Canadian dataset and report its time and memory usage in the text, and do the same for the 1000G tree.
That's all we need.
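One minimal way to capture both numbers from a single run, assuming a Unix platform (`ru_maxrss` is kilobytes on Linux, bytes on macOS). `run_with_metrics` is a hypothetical helper, and the workload here is a stand-in for the actual simulation call:

```python
import resource
import time

def run_with_metrics(func, *args, **kwargs):
    # Measure wall-clock time, then read the peak RSS of the current
    # process (Unix only; units of ru_maxrss are platform-dependent).
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    peak_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return result, elapsed, peak_rss

# Stand-in workload; a real run would wrap the trait simulation itself.
_, secs, rss = run_with_metrics(sum, range(1_000_000))
print(f"time: {secs:.3f}s, peak RSS: {rss}")
```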
Remove unused files
We still have some unused files lying around.
Can we confirm that:
- All code used to generate everything is in the "notebooks" directory
- Running these notebooks creates the correct files used in the LaTeX document
Anything that is not used should be deleted.
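A quick way to check is to scan for asset files whose names never appear in the LaTeX source. The paths `paper.tex` and `figures` in the usage example are assumptions about the repo layout:

```python
from pathlib import Path

def find_unused(tex_path, asset_dir):
    # Return files under asset_dir whose basename never appears in the
    # LaTeX source, i.e. candidates for deletion.
    tex = Path(tex_path).read_text()
    return [f for f in sorted(Path(asset_dir).iterdir()) if f.name not in tex]

# Example (adjust paths to the repo layout):
# print(find_unused("paper.tex", "figures"))
```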
Duplicate figures
We have two copies of the same figure with the same caption.
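Byte-identical copies can be found by hashing file contents, a sketch like the one below (note this would miss re-exported figures that render identically but differ in bytes):

```python
import hashlib
from pathlib import Path

def duplicate_groups(directory):
    # Group files by the SHA-256 of their contents; any group with more
    # than one member is a set of byte-identical duplicates.
    by_hash = {}
    for f in sorted(Path(directory).iterdir()):
        if f.is_file():
            digest = hashlib.sha256(f.read_bytes()).hexdigest()
            by_hash.setdefault(digest, []).append(f)
    return [group for group in by_hash.values() if len(group) > 1]
```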
Tree QTL paper published