daikitag / tstrait-paper

TeX 69.14% Jupyter Notebook 30.01% Python 0.65% Makefile 0.20%

tstrait-paper's People

Contributors

Watchers

Forkers

jeromekelleher gertjanbisschop

tstrait-paper's Issues

Change node labels to letters in illustration.

It's confusing to have 0 mean two different things (node 0 and individual 0). A simple fix is to set the node labels to lowercase letters, like we do in the ARG paper.

Change time of node 6 or 7

It's potentially distracting to have nodes 6 and 7 at exactly the same time. Change one of them to be slightly earlier or later.

Related to #10

Martin et al simulation not a good comparison

Our version of the Martin et al simulation is essentially completely different, and also does some very inefficient things.

It needs to be rewritten to closely follow their approach.

Update time scaling fig

Change Fig. S1 to show only the time to simulate a trait with 1000 causal sites for the stdpopsim ARGs.

Drop the Martin et al comparison, as it's not really worth the trouble and we don't have space to discuss it in the text.

Mention extension to dominance in the conclusion

We assume the additive model, but we should also support non-additive models in the longer term.

Papers to cite:

https://www.science.org/doi/full/10.1126/science.abn8455

Change simulation result dataframes to long format

Simulation time results are currently stored in wide format, like:

,1e6_50Mb,2e6_50Mb,3e6_50Mb,4e6_50Mb,5e6_50Mb,1e6_100Mb,2e6_100Mb,3e6_100Mb,4e6_100Mb,5e6_100Mb,1e6_200Mb,2e6_200Mb,3e6_200Mb,4e6_200Mb,5e6_200Mb
0,77.64282770000864,179.63335699995514,230.23984719999135,343.7499600999872,423.4201375999837,75.93955379998079,173.95170759997563,254.86329439998372,330.529352499987,392.6764542000019,92.2>
1,72.61392149998574,181.0281139999861,263.9906612999621,356.80779799999436,446.1329022999853,75.12469440000132,146.49259350000648,276.3649019000004,364.99122720002197,399.3250003999565,77.5>
2,70.11137739999685,184.03285519999918,294.038197899994,348.7073199999868,482.82663069997216,77.93682020000415,174.49476670002332,235.45707329997094,306.8637148999842,406.5505961999879,74.4>
3,81.93802689999575,169.1525683000218,229.7734829999972,358.95320299995365,454.1488188999938,64.41976179997437,174.47186769999098,228.89124909997918,318.7389976999839,467.2518087000353,90.1>
4,81.39597090001917,179.43859279999742,234.41665630001808,360.4629126000218,448.3900854999665,68.80955529998755,167.84331939998083,265.9424035999691,314.8192218999611,380.4535507999826,75.9>
5,79.5875359000056,162.72084680001717,242.5002392999595,328.1800843999954,455.24361139995744,72.51941160002025,170.0330126999761,273.7956803999841,320.2541406999808,459.5114398000296,81.465>
6,85.30113890004577,188.09864710003603,248.51294749998488,360.8818011999829,479.57706929999404,74.97351540002273,159.52908700000262,254.10992680001073,346.2119941000128,407.9319478000398,81>
7,87.57253509998554,168.307615800004,269.3018477000296,307.2869950000313,402.21905590000097,77.65944620000664,165.3033509000088,246.15262670000084,367.09767170000123,492.6369316999917,81.51>
8,77.49477789999219,150.29511330003152,251.10237269999925,345.2701656999998,436.22386979998555,82.3944512999733,175.3935331000248,244.92631290003192,335.08269339997787,345.27137590001803,82>
9,70.17078769998625,187.60311570001068,265.39845320000313,351.7046144999913,480.65730829996755,75.47531160002109,151.07406130002346,250.4223663000157,374.0576356999809,432.41990810004063,

This is very unwieldy for downstream work. Store them in long format instead:

sample_size,sequence_length,random_seed,num_traits,cpu_time
1000000,50000000,12345,1000,10.234
....

These can then be processed with Pandas much more easily (e.g. using group-bys to average over replicates).

The file naming is also opaque. If we are splitting results across multiple files, the numbering scheme should be consistent.
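A minimal sketch of the conversion, assuming that wide column labels such as `1e6_50Mb` encode the sample size and the sequence length in megabases, and that each row index is a replicate number. The `replicate` column is a stand-in introduced here for illustration; the wide table does not record `random_seed` or `num_traits`, so those columns are not reconstructed. The values are a rounded subset of the table above.

```python
import io

import pandas as pd

# Two replicates and three columns copied (rounded) from the wide table.
wide_csv = """,1e6_50Mb,2e6_50Mb,1e6_100Mb
0,77.64,179.63,75.94
1,72.61,181.03,75.12
"""

wide = pd.read_csv(io.StringIO(wide_csv), index_col=0)
wide.index.name = "replicate"  # assumed meaning of the row index

# One row per (replicate, configuration) pair.
long = wide.reset_index().melt(
    id_vars="replicate", var_name="config", value_name="cpu_time"
)

# Split labels such as "1e6_50Mb" into numeric columns
# (assumes "Mb" means megabases).
parts = long["config"].str.split("_", expand=True)
long["sample_size"] = parts[0].astype(float).astype(int)
long["sequence_length"] = (
    parts[1].str.replace("Mb", "", regex=False).astype(int) * 1_000_000
)
long = long[["sample_size", "sequence_length", "replicate", "cpu_time"]]

# Average over replicates with a group-by.
mean_time = (
    long.groupby(["sample_size", "sequence_length"])["cpu_time"]
    .mean()
    .reset_index()
)
print(mean_time)
```

Writing `long` out with `long.to_csv(path, index=False)` would give one file that every downstream notebook can read in the same way.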

That's all we need

Remove unused files

We still have some unused files lying around.

Can we confirm that:

All code used to generate everything is in the "notebooks" directory
Running these notebooks creates the correct files used in the LaTeX document

Anything that is not used should be deleted.

Duplicate figures

We have two copies of the same figure with the same caption.

Tree QTL paper published

Update refs

https://www.sciencedirect.com/science/article/abs/pii/S0002929723003956

Get perf numbers for large ARGs

Add "Genotype" column to figure bottom table

Report RAM usage as well as time
