tstrait-paper's People

Contributors

daikitag, jeromekelleher

tstrait-paper's Issues

Change node labels to letters in illustration.

It's confusing to have 0 mean two different things (node 0 and individual 0). A simple fix is to set the node labels to lowercase letters, as we do in the ARG paper.

Change time of node 6 or 7

It's potentially distracting to have nodes 6 and 7 at exactly the same time. Change the time of one of them to be slightly more or less.

Related to #10

Martin et al simulation not a good comparison

Our version of the Martin et al simulation is essentially completely different, and also does some very inefficient things.

It needs to be rewritten to closely follow their approach.

Update time scaling fig

Change Fig. S1 to show only the time to simulate a trait with 1000 causal sites for the stdpopsim ARGs.

Drop the Martin et al comparison, as it's not really worth the trouble and we don't have space to discuss it in the text.

Change simulation result dataframes to long format

Simulation time results currently are in wide format, like

,1e6_50Mb,2e6_50Mb,3e6_50Mb,4e6_50Mb,5e6_50Mb,1e6_100Mb,2e6_100Mb,3e6_100Mb,4e6_100Mb,5e6_100Mb,1e6_200Mb,2e6_200Mb,3e6_200Mb,4e6_200Mb,5e6_200Mb
0,77.64282770000864,179.63335699995514,230.23984719999135,343.7499600999872,423.4201375999837,75.93955379998079,173.95170759997563,254.86329439998372,330.529352499987,392.6764542000019,92.2>
1,72.61392149998574,181.0281139999861,263.9906612999621,356.80779799999436,446.1329022999853,75.12469440000132,146.49259350000648,276.3649019000004,364.99122720002197,399.3250003999565,77.5>
2,70.11137739999685,184.03285519999918,294.038197899994,348.7073199999868,482.82663069997216,77.93682020000415,174.49476670002332,235.45707329997094,306.8637148999842,406.5505961999879,74.4>
3,81.93802689999575,169.1525683000218,229.7734829999972,358.95320299995365,454.1488188999938,64.41976179997437,174.47186769999098,228.89124909997918,318.7389976999839,467.2518087000353,90.1>
4,81.39597090001917,179.43859279999742,234.41665630001808,360.4629126000218,448.3900854999665,68.80955529998755,167.84331939998083,265.9424035999691,314.8192218999611,380.4535507999826,75.9>
5,79.5875359000056,162.72084680001717,242.5002392999595,328.1800843999954,455.24361139995744,72.51941160002025,170.0330126999761,273.7956803999841,320.2541406999808,459.5114398000296,81.465>
6,85.30113890004577,188.09864710003603,248.51294749998488,360.8818011999829,479.57706929999404,74.97351540002273,159.52908700000262,254.10992680001073,346.2119941000128,407.9319478000398,81>
7,87.57253509998554,168.307615800004,269.3018477000296,307.2869950000313,402.21905590000097,77.65944620000664,165.3033509000088,246.15262670000084,367.09767170000123,492.6369316999917,81.51>
8,77.49477789999219,150.29511330003152,251.10237269999925,345.2701656999998,436.22386979998555,82.3944512999733,175.3935331000248,244.92631290003192,335.08269339997787,345.27137590001803,82>
9,70.17078769998625,187.60311570001068,265.39845320000313,351.7046144999913,480.65730829996755,75.47531160002109,151.07406130002346,250.4223663000157,374.0576356999809,432.41990810004063,

This is very unwieldy for downstream work. Store them in long format instead:

sample_size,sequence_length,random_seed,num_traits,cpu_time
1000000,50000000,12345,1000,10.234
....

These can then be processed with Pandas much more easily (e.g., using group-bys to average over replicates).

The file naming is also opaque: if we are splitting results across multiple files, be consistent in the numbering scheme.
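As a rough sketch of the conversion, the snippet below reshapes a small excerpt of the wide table into long format and averages over replicates. It assumes that a column label such as `1e6_50Mb` encodes a sample size of 1e6 and a sequence length of 50 Mb, and that the row index is the replicate number. Both are guesses from the labels, and the `replicate` column stands in for the `random_seed` column proposed above. The values are rounded from the first two rows of the table.

```python
import io

import pandas as pd

# Excerpt of the wide-format results (values rounded from the table above).
wide_csv = """,1e6_50Mb,2e6_50Mb,1e6_100Mb
0,77.64,179.63,75.94
1,72.61,181.03,75.12
"""

wide = pd.read_csv(io.StringIO(wide_csv), index_col=0)
wide.index.name = "replicate"  # assumption: each row is one replicate

# One row per (replicate, column label) pair.
long = wide.reset_index().melt(
    id_vars="replicate", var_name="label", value_name="cpu_time"
)

# Assumption: labels have the form "<sample_size>_<length>Mb".
parts = long["label"].str.extract(r"^(?P<sample_size>[^_]+)_(?P<length_mb>\d+)Mb$")
long["sample_size"] = parts["sample_size"].astype(float).astype(int)
long["sequence_length"] = parts["length_mb"].astype(int) * 1_000_000
long = long[["sample_size", "sequence_length", "replicate", "cpu_time"]]

# Average over replicates with a group-by.
mean_times = long.groupby(
    ["sample_size", "sequence_length"], as_index=False
)["cpu_time"].mean()
print(mean_times)
```

Writing `long` out with `long.to_csv(path, index=False)` would give a file in the layout proposed above.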

Add "Genotype" column to figure bottom table

It would be helpful to include the genotypes for the three individuals as another column. They will be A|A, T|T and T|A (assuming we use phased VCF-like notation, which people will presumably find familiar).

Report RAM usage as well as time

I think the scaling figure would be much more interesting if we reported RAM usage in the second panel.

Then we just run one simulation on the French Canadian dataset and report its time and memory usage in the text, and do the same for the 1000G tree.

That's all we need.
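One possible way to record both numbers is sketched below using only the Python standard library. The `measure` helper and `placeholder_workload` are hypothetical names that stand in for the real simulation call. The `resource` module is Unix-only, and `ru_maxrss` is reported in KiB on Linux and in bytes on macOS. Because it is the peak resident set size over the lifetime of the process, each simulation would need to run in a fresh process for the figure to be meaningful.

```python
import resource
import sys
import time


def measure(func, *args, **kwargs):
    """Run func and return (result, cpu_seconds, peak_rss_bytes)."""
    start = time.process_time()
    result = func(*args, **kwargs)
    cpu_seconds = time.process_time() - start
    # Peak resident set size of this process so far.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in KiB on Linux and in bytes on macOS.
    scale = 1 if sys.platform == "darwin" else 1024
    return result, cpu_seconds, peak * scale


def placeholder_workload(n):
    # Hypothetical stand-in for the trait simulation being timed.
    return sum(i * i for i in range(n))


result, cpu_seconds, peak_rss_bytes = measure(placeholder_workload, 100_000)
print(f"cpu_time={cpu_seconds:.3f}s peak_rss={peak_rss_bytes / 2**20:.1f} MiB")
```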

Remove unused files

We still have some unused files lying around.

Can we confirm that:

  • All code used to generate everything is in the "notebooks" directory
  • Running these notebooks creates the correct files used in the LaTeX document

Anything that is not used should be deleted.
