razzo's Introduction

razzo

Branch    Travis CI     AppVeyor      Codecov
master    Build Status  Build status  codecov.io
develop   Build Status  Build status  codecov.io
giovanni  Build Status  Build status  codecov.io
richel    Build Status  Build status  codecov.io

Research project by Giovanni Laudanno and Richel J.C. Bilderbeek.

Primary tasks:

  • Giovanni Laudanno: making each step right
  • Richel J.C. Bilderbeek: big picture, software architecture, testing, continuous integration

The research project uses these GitHub repos:

Roadmap

Project stages

  • Ignition: prepare to do the experiment badly, e.g. short MCMC chains, few replicates, etc.
  • Launch: prepare to do the experiment correctly, in line with manuscript
  • Flight: running the experiment, maintaining the process
  • Land: write down results

Function overview

See doc.

Installation

See doc/install.md.

Package dependencies

Package   Travis CI     Codecov
babette   Build Status  codecov.io
beautier  Build Status  codecov.io
beastier  Build Status  codecov.io
mauricer  Build Status  codecov.io
mbd       Build Status  codecov.io
tracerer  Build Status  codecov.io

Data

Data DOI
razzo v1.0 DOI
razzo_article v1.0 DOI
razzo_project DOI

Image attribution

From https://commons.wikimedia.org/wiki/File:RocketX.png

The image comes from http://wpclipart.com, which ONLY features public domain images and provides extensive source information on their "Legal" page: http://www.wpclipart.com/legal.html [Public domain]

razzo's People

Contributors

giappo, richelbilderbeek

Forkers

gagan0123 anneokk

razzo's Issues

'raz_create_mbd_tree' must create an MBD tree

raz_create_mbd_tree must create an MBD tree.

Labeled # TODO: Issue #8: actually create an MBD tree and save it:

test_that("use", {

  # ...

  # No tree present yet
  testit::assert(!file.exists(mbd_tree_filename))

  # Create tree
  raz_create_mbd_tree(parameters, mbd_tree_filename)

  # TODO: Issue #8: actually create an MBD tree and save it
  if (1 == 2) {
    expect_true(file.exists(mbd_tree_filename))
  }
})

Run experiment on razzo_project

razzo_project contains all the scripts for the full experiment. It runs part of the experiment on Travis and shows the results using `find .`.

This Issue is important in showing the progress of the project:

  • 1_install_razzo.sh
  • 2_create_parameter_files.sh
  • 3_create_input_files.sh
  • 4_create_posterior_files.sh
  • 5_create_nltt_files.sh
  • 6_create_marg_lik_files.sh
  • 7_create_nltt_stats_file.sh
  • 8_create_esses_file.sh
  • 9_create_marg_liks_file.sh
  • 10_create_result_fig_1_file.sh

Add parameter for site model

Also upgrade the function that uses a site model.

testit::assert(parameters$site_model %in% c("jc69", "gtr"))
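
A minimal sketch of how such a check could give a readable error message. The helper name check_site_model is hypothetical and not part of razzo; only the two allowed values come from the assert above.

```r
# Hypothetical helper, not part of razzo: checks the site model name
check_site_model <- function(site_model) {
  valid_site_models <- c("jc69", "gtr")
  if (length(site_model) != 1 || !site_model %in% valid_site_models) {
    stop(
      "'site_model' must be one of: ",
      paste(valid_site_models, collapse = ", ")
    )
  }
  invisible(site_model)
}

parameters <- list(site_model = "jc69")
check_site_model(parameters$site_model) # passes silently
```

A missing site_model (NULL) is also rejected, because its length is zero.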

Add 'raz_create_bd_alignment' function

test_that("use", {

  # Work from a folder
  folder_name <- tempdir()

  # Create the parameter files
  raz_create_parameters_files(folder_name)
  sub_folder_name <- "1"
  parameters_filename <- file.path(folder_name, sub_folder_name, "parameters.csv")
  bd_alignment_filename <- file.path(folder_name, sub_folder_name, "bd.fasta")

  # TODO: Issue #15: Add 'raz_create_bd_alignment'
  if (1 == 2) {
    raz_create_bd_alignment(parameters_filename)
    expect_true(file.exists(bd_alignment_filename))
  }
})

Remove Nested Sampling output

When doing a nested sampling run, it creates two files:

  • [tmpname].posterior.log
  • [tmpname].posterior.trees

babette should delete these somehow.
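
One possible approach, sketched in base R. It assumes the leftover files end in .posterior.log and .posterior.trees and live in a known folder; the helper name remove_ns_output is hypothetical and not part of babette.

```r
# Hypothetical helper, not part of babette: removes the files that a
# nested sampling run leaves behind in 'folder_name'
remove_ns_output <- function(folder_name) {
  leftovers <- list.files(
    path = folder_name,
    pattern = "\\.posterior\\.(log|trees)$",
    full.names = TRUE
  )
  removed <- file.remove(leftovers)
  # Return the names of the files that were actually removed
  invisible(leftovers[removed])
}

# Usage on a scratch folder with two leftovers and one file to keep
folder_name <- tempfile()
dir.create(folder_name)
file.create(
  file.path(folder_name, c("x.posterior.log", "x.posterior.trees", "mbd.log"))
)
remove_ns_output(folder_name)
remaining <- list.files(folder_name)
```

Only the files matching the two suffixes are deleted; other output, such as mbd.log here, is left alone.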

`raz_create_inference_files` must create the inference files

Labeled with # TODO: Issue #4:

  • Can only be done after #3

context("raz_create_inference_files")

test_that("use", {

  # Work from a folder
  folder_name <- tempdir()

  # Create the parameter files
  raz_create_parameters_files(folder_name)
  sub_folder_name <- "1"
  parameters_filename <- file.path(folder_name, sub_folder_name, "parameters.csv")
  # Work on the parameter file and create two FASTA files
  input_filenames <- raz_create_input_files(parameters_filename)
  mbd_fasta_filename <- file.path(folder_name, sub_folder_name, "mbd.fasta")

  # Do inference on the first FASTA file
  inference_filenames <- raz_create_inference_files(mbd_fasta_filename)

  mbd_trees_filename <- file.path(folder_name, sub_folder_name, "mbd.trees")
  mbd_log_filename <- file.path(folder_name, sub_folder_name, "mbd.log")
  mbd_mar_lik_filename <- file.path(folder_name, sub_folder_name, "mbd_mar_lik.csv")

  expect_true(mbd_trees_filename %in% inference_filenames)
  expect_true(mbd_log_filename %in% inference_filenames)
  expect_true(mbd_mar_lik_filename %in% inference_filenames)

  # TODO: Issue #4
  if (1 == 2) {
    expect_true(file.exists(mbd_trees_filename))
    expect_true(file.exists(mbd_log_filename))
    expect_true(file.exists(mbd_mar_lik_filename))
  }
})

Prepare babette for CRAN

As agreed with Rampal, Thursday 25th (and Friday 26th), I will work on babette instead of razzo.

Add 'raz_create_mbd_alignment' function

Added the test:

test_that("use", {

  # Work from a folder
  folder_name <- tempdir()

  # Create the parameter files
  raz_create_parameters_files(folder_name)
  sub_folder_name <- "1"
  parameters_filename <- file.path(folder_name, sub_folder_name, "parameters.csv")
  mbd_alignment_filename <- file.path(folder_name, sub_folder_name, "mbd.fasta")

  # TODO: Issue #14: Add 'raz_create_mbd_alignment'
  if (1 == 2) {
    raz_create_mbd_alignment(parameters_filename)
    expect_true(file.exists(mbd_alignment_filename))
  }
})

'raz_create_parameters_files' must create a parameter file

Let raz_create_parameters_files create one parameter file. The file's name is already known 👍

Labeled the test with # TODO: Issue #2:

test_that("use", {
  folder_name <- tempdir()
  filenames <- raz_create_parameters_files(folder_name = folder_name)
  expect_true(length(filenames) >= 1)

  # TODO: Issue #2
  if (1 == 2) {
    expect_true(all(file.exists(filenames)))
  }
})

Think: too much conditioning is bad

Note to self/us that I/we need to take into consideration.

From Stadler, Tanja. "How can we improve accuracy of macroevolutionary rate estimates?." Systematic Biology 62.2 (2012): 321-329.

  1. Too much conditioning is bad! In general, if we
    condition on more quantities (e.g., n in addition to
    age as in the last point), we take away information
    in the data (by conditioning on it), and thus the
    estimates become less precise. In the extreme case
    of conditioning on all speciation times $t_1, \ldots, t_{n-1}$
    of the phylogeny, each parameter combination has
    the same likelihood ($f(T \mid t_0, t_1, \ldots, t_{n-1}, n, \lambda, \mu) = 1$),
    thus no information is left in the data.

Add `raz_est_marg_lik`

Function to calculate the marginal log likelihood. It relies on babette, but babette does not yet return an ESS.

Fix style

It is a good idea to follow a coding style, e.g. the one recommended by Hadley Wickham.

The file tests/testthat/test-style_TODO.R can test this.

It is labeled with skip("TODO, Issue #20: Fix package style").

Update demo vignette for scripted use

With the simplified interface, the vignette can now do a full run.

  • Create parameter files
  • Create MBD trees
  • Create twin BD trees
  • Create MBD alignments
  • Create BD alignments
  • Create MBD posterior files
  • Create BD posterior files
  • Calculate nLTT statistics for MBD
  • Calculate nLTT statistics for BD
  • Show figure 1
  • Measure Effective Sample Sizes
  • Compare marginal likelihoods
  • What is the effect of MBD on the error?

Remove global variables

I think the goodpractice package will inform us, but in general one should not use global variables:

my_global_variable <<- 42 # No!

It increases the complexity of the code super-exponentially. There are exceptions, but these do not apply here.
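
A small sketch of the usual alternative: return values and pass them as arguments. All function names here are made up for illustration.

```r
# Instead of writing to a global variable from inside a function ...
#   set_answer <- function() { my_global_variable <<- 42 }
# ... return the value and let the caller decide where to store it
create_answer <- function() {
  42
}

# A function that needs the value receives it as an argument
double_answer <- function(answer) {
  2 * answer
}

answer <- create_answer()
doubled <- double_answer(answer)
```

Each function can now be tested on its own, as its result depends only on its arguments.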

Make simpler to add a parameter to the model

Adding one parameter to the model is very complex: at least 5 or 6 functions need to change every time. I suggest simplifying the overall structure. Maybe we can use tricks such as exploiting raz_get_param_names.
Simplify simplify simplify.
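
A sketch of one way to do this, under the assumption that all parameter names can be listed in a single place. The functions below are hypothetical stand-ins (the real raz_get_param_names may look different) and the default values are made up.

```r
# Hypothetical: the only place where parameter names and defaults are listed
get_default_params <- function() {
  list(lambda = 0.2, mu = 0.1, nu = 1.0)
}

# Every other function derives the names from the defaults
get_param_names <- function() {
  names(get_default_params())
}

# Example of a function that needs no change when a parameter is added
check_param_names <- function(params) {
  missing_names <- setdiff(get_param_names(), names(params))
  if (length(missing_names) > 0) {
    stop("Missing parameter(s): ", paste(missing_names, collapse = ", "))
  }
  invisible(params)
}

check_param_names(list(lambda = 0.3, mu = 0.0, nu = 2.0)) # passes silently
```

With this layout, adding a parameter means adding one element to get_default_params.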

'raz_create_mbd_tree' should create trees with the expected number of speciation events

After fixing #63 ('Fix mbd package')

(note: I just put this here now, as I think we agree on this, else leave a comment below 👍)

raz_create_mbd_tree creates an MBD tree for some parameters.

As far as I can see, we'll pick nu to have 0, 1, 2, 4 or 8 expected triggered speciation events. raz_create_mbd_tree should keep only those trees that have experienced that number of triggered speciation events.
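
Keeping only trees with a given number of events amounts to rejection sampling. Below is a minimal sketch; sim_tree and count_events are placeholders for the real simulation and counting functions, and the fake versions only draw a Poisson-distributed count.

```r
# Keeps simulating until a tree has exactly 'n_expected' triggered
# speciation events. 'sim_tree' and 'count_events' are placeholders
# for the real simulation and counting functions.
create_tree_with_n_events <- function(
  sim_tree, count_events, n_expected, max_tries = 1000
) {
  for (i in seq_len(max_tries)) {
    tree <- sim_tree()
    if (count_events(tree) == n_expected) {
      return(tree)
    }
  }
  stop("No tree with ", n_expected, " events after ", max_tries, " tries")
}

# Stand-in simulator: a 'tree' is just a list holding a
# Poisson-distributed number of events
set.seed(42)
fake_sim_tree <- function() {
  list(n_events = stats::rpois(n = 1, lambda = 2))
}
fake_count_events <- function(tree) {
  tree$n_events
}
tree <- create_tree_with_n_events(
  fake_sim_tree, fake_count_events, n_expected = 2
)
```

The max_tries limit is a design choice to avoid an endless loop when the requested number of events is very unlikely for the chosen parameters.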

Make build clean.

When building the package, weird messages appear.
They are of two different kinds:
One of these is related to some plot function somewhere that I can't really find.
The other one is related to some inconsistency with the documentation, probably related to the vignette.

Make babette work inside the function "raz_create_inference_files.R"

If you try to run "raz_create_inference_files", you will get the error below when it reaches this call.
We need to make it work.

posterior <- babette::bbt_run(
  fasta_filenames = fasta_filename,
  mcmc = beautier::create_mcmc_nested_sampling(
    chain_length = chain_length,
    store_every = sample_interval,
    sub_chain_length = sub_chain_length
  ),
  site_models = site_model,
  clock_models = clock_model,
  tree_priors = beautier::create_bd_tree_prior(),
  mrca_priors = beautier::create_mrca_prior(
    alignment_id = beautier::get_alignment_id(fasta_filename),
    taxa_names = beautier::get_taxa_names(fasta_filename),
    is_monophyletic = TRUE,
    mrca_distr = beautier::create_normal_distr(
      mean  = beautier::create_mean_param(value = crown_age),
      sigma = beautier::create_sigma_param(value = 0.001)
    )
  ),
  rng_seed = rng_seed,
  beast2_output_trees_filenames = trees_filename, # Will create it
  beast2_output_log_filename = log_filename, # Will create it
  verbose = FALSE
)

Error in check_input_filename_validity(input_filename = input_filename, :
'input_filename' must be a valid BEAST2 XML file. File 'C:\Users\P274829\AppData\Local\Temp\RtmpsNRU71\beast2_22d04c7b7984.xml' is not a valid BEAST2 fileFALSE
In addition: Warning messages:
1: In system2(cmds[1], args = cmds[-1], stdout = TRUE, stderr = TRUE) :
running command '"C:\Program Files\Java\jre1.8.0_181\bin\java.exe" -jar "C:\Users\P274829\AppData\Local\BEAST\lib\beast.jar" -validate "C:\Users\P274829\AppData\Local\Temp\RtmpsNRU71\beast2_22d04c7b7984.xml"' had status 1
2: In system2(cmds[1], args = cmds[-1], stdout = TRUE, stderr = TRUE) :
running command '"C:\Program Files\Java\jre1.8.0_181\bin\java.exe" -jar "C:\Users\P274829\AppData\Local\BEAST\lib\beast.jar" -validate "C:\Users\P274829\AppData\Local\Temp\RtmpsNRU71\beast2_22d04c7b7984.xml"' had status 1

'raz_create_nltt_file' must create an nLTT file

Labeled with # TODO: Issue #5

  • Can only be done after #4

test_that("use", {

  # Work from a folder
  folder_name <- tempdir()

  # Create the parameter files
  raz_create_parameters_files(folder_name)
  sub_folder_name <- "1"
  parameters_filename <- file.path(folder_name, sub_folder_name, "parameters.csv")
  # Work on the parameter file and create two FASTA files
  input_filenames <- raz_create_input_files(parameters_filename)
  mbd_fasta_filename <- file.path(folder_name, sub_folder_name, "mbd.fasta")
  mbd_tree_filename <- file.path(folder_name, sub_folder_name, "mbd.tree")
  # Do inference on the first MBD trees
  inference_filenames <- raz_create_inference_files(
    fasta_filename = mbd_fasta_filename
  )
  mbd_trees_filename <- file.path(folder_name, sub_folder_name, "mbd.trees")

  # Start real work
  nltt_filename <- raz_create_nltt_file(
    trees_filename = mbd_trees_filename
  )
  expect_equal(file.path(folder_name, sub_folder_name, "mbd_nltts.csv"), nltt_filename)

  # TODO: Issue #5: 'raz_create_nltt_file' must create an nLTT file
  if (1 == 2) {
    expect_true(file.exists(nltt_filename))
  }
})

`raz_create_bd_tree` must create a BD tree and save it

Labeled with # TODO: Issue #9: actually create a BD tree and save it

test_that("use", {

  bd_tree_filename <- tempfile()

  # No tree present yet
  testit::assert(!file.exists(bd_tree_filename))

  # Create tree
  raz_create_bd_tree(
    init_speciation_rate = 0.1,
    init_extinction_rate = 0.1,
    mbd_tree = ape::rcoal(4),
    bd_tree_filename
  )

  # TODO: Issue #9: actually create a BD tree and save it
  if (1 == 2) {
    expect_true(file.exists(bd_tree_filename))
  }
})

'raz_create_input_files' must create the four true/nature/input files

raz_create_input_files must create the four true/nature/input files.

Labeled with # TODO: Issue #3:

  • Can only be done after #2

context("raz_create_input_files")

test_that("use", {
  # Work from a folder
  folder_name <- tempdir()

  # Create the parameter files
  raz_create_parameters_files(folder_name)
  sub_folder_name <- "1"
  parameters_filename <- file.path(folder_name, sub_folder_name, "parameters.csv")

  # Work on the parameter file
  input_filenames <- raz_create_input_files(parameters_filename)

  # Expect four files to be created
  mbd_fasta_filename <- file.path(folder_name, sub_folder_name, "mbd.fasta")
  mbd_tree_filename <- file.path(folder_name, sub_folder_name, "mbd.tree")
  bd_fasta_filename <- file.path(folder_name, sub_folder_name, "bd.fasta")
  bd_tree_filename <- file.path(folder_name, sub_folder_name, "bd.tree")

  # TODO: Issue #3
  if (1 == 2) {
    expect_true(file.exists(mbd_fasta_filename))
    expect_true(file.exists(mbd_tree_filename))
    expect_true(file.exists(bd_fasta_filename))
    expect_true(file.exists(bd_tree_filename))
  }
})

'raz_open_parameters_file' must open the parameter file

raz_open_parameters_file is a stub that does not open a parameter file at all.

raz_open_parameters_file <- function(filename)
{
  # TODO: actually read the file
  if (1 == 2) {
    testit::assert(file.exists(filename))
  }
  # ...
  parameters
}

It does pass the tests, as it sets lambda to 1.0:

context("raz_open_parameters_file")

test_that("use", {

  folder_name <- tempdir()
  filenames <- raz_create_parameters_files(folder_name = folder_name)
  filename <- filenames[1]
  parameters <- raz_open_parameters_file(filename)
  expect_true(parameters$lambda > 0.0)
})
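
For reference, a minimal sketch of what reading (and writing) a parameter file could look like. The function names and the one-row CSV layout are assumptions, not the razzo implementation.

```r
# Hypothetical stand-in: saves a named list as a one-row CSV file
save_parameters <- function(parameters, filename) {
  utils::write.csv(
    as.data.frame(parameters), file = filename, row.names = FALSE
  )
  invisible(filename)
}

# Hypothetical stand-in: reads the CSV file back into a named list
open_parameters <- function(filename) {
  if (!file.exists(filename)) {
    stop("File '", filename, "' not found")
  }
  as.list(utils::read.csv(filename, stringsAsFactors = FALSE))
}

filename <- tempfile(fileext = ".csv")
save_parameters(list(lambda = 0.2, mu = 0.1, site_model = "jc69"), filename)
parameters <- open_parameters(filename)
```

A test can then compare the values read with the values written, instead of only checking that lambda is positive.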

Add parameter for clock model

Also upgrade the function that uses a clock model.

# s: strict
# rln: relaxed log-normal
testit::assert(parameters$clock_model %in% c("s", "rln"))

DDD::L2phylo gives unclear error

In the function raz_create_bd_tree, one of the last lines converts the BD L matrix to a phylogeny:

> DDD::L2phylo(bd_l_matrix)

This gives the unclear error:

Error in `[<-.data.frame`(`*tmp*`, j, 1:3, value = numeric(0)) : 
  replacement has 0 items, need 3

I've added a fake return value until this is fixed:

  # TODO: Issue #43: DDD::L2phylo gives unclear error
  if (1 == 2) {
    return(DDD::L2phylo(bd_l_matrix))
  }
  # FAKE
  mbd_tree

Yup, for now it simply returns the MBD tree 🌈
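
Until the cause is found, the error could at least be made more informative. The sketch below wraps a conversion function and adds the dimensions of the input to the error message. The wrapper name is hypothetical, and the failing converter is a stand-in that only mimics the message above; in real use, convert would be DDD::L2phylo.

```r
# Calls 'convert(l_matrix)'; on failure, re-throws the error with the
# dimensions of the input added
convert_with_context <- function(l_matrix, convert) {
  tryCatch(
    convert(l_matrix),
    error = function(e) {
      stop(
        "Cannot convert L matrix with ", nrow(l_matrix), " rows and ",
        ncol(l_matrix), " columns: ", conditionMessage(e),
        call. = FALSE
      )
    }
  )
}

# Stand-in converter that always fails, to show the resulting message
failing_convert <- function(l_matrix) {
  stop("replacement has 0 items, need 3")
}
l_matrix <- matrix(0, nrow = 1, ncol = 4)
msg <- tryCatch(
  convert_with_context(l_matrix, failing_convert),
  error = function(e) conditionMessage(e)
)
```

This does not fix the conversion itself; it only makes it easier to see which input triggers the failure.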

raz_create_mbd_alignment must calculate its mutation rate

AFAICS, raz_create_mbd_alignment is perfectly capable of calculating its mutation rate.

At the moment, it uses a mutation rate from an input file:

raz_create_mbd_alignment <- function(
  parameters, mbd_tree
) {
  # ...
  mutation_rate <- parameters$mbd_mutation_rate

  # ...
}

  • The mutation rate should be calculated as described in the comments in the code and/or in the article
  • The mutation rate can be removed from parameter file creation

Remove unused code

I think we are on the right track and don't need the older code.

I suggest getting rid of the older/unused code, as it only slows us down.

Create razzo_project GitHub

  • Should have folder 'scripts'
  • Has a script called create_parameter_files.sh that calls razzo::create_parameter_files
  • Displays the folder structure on Travis

Keep test output clean

When I run the tests, they produce output about optimizing parameters. 'Silence is golden' means that if everything is OK, nothing should be printed. We cannot modify testthat in that respect, but the verbosity of the ML optimization can and should be removed.

Loading razzo
Loading required package: testthat
Testing razzo
✔ | OK F W S | Context
✔ |  1       | raz_create_bd_alignment
⠹ |  3       | raz_create_bd_treeYou are optimizing lambda0 mu0 
You are fixing lambda1 mu1 
Optimizing the likelihood - this may take a while. 
The loglikelihood for the inital parameter values is -110.4957 

 Maximum likelihood parameter estimates: lambda0: 0.029187, mu0: 0.347394, lambda1: 0.000000, mu1: 0.000000:  
 Maximum loglikelihood: -108.230334 
✔ |  4       | raz_create_bd_tree [1.0 s]
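
If the optimization function has no option to be quiet, its printed output can be discarded at the call site. A minimal sketch in base R, where noisy_optimize is a stand-in for the real ML optimization and run_quietly is a hypothetical helper:

```r
# Stand-in for a function that prints progress, like the ML optimization
noisy_optimize <- function() {
  cat("You are optimizing lambda0 mu0\n")
  cat("Optimizing the likelihood - this may take a while.\n")
  list(lambda0 = 0.03, mu0 = 0.35)
}

# Runs 'f', discards what it prints to standard output, returns its result
run_quietly <- function(f) {
  utils::capture.output(result <- f())
  result
}

estimates <- run_quietly(noisy_optimize)
```

Note that capture.output only catches text written to standard output (e.g. by cat or print). Messages and warnings would need suppressMessages or suppressWarnings.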

Prefer the first word of a function being a verb

Following the literature, prefer function names in which the first word is a verb.

Now                               Suggest
raz_tempdir                       raz_?
raz_standard_parameters           raz_create_def_params
raz_standard_parameters_interval  Put as default values in raz_create_params
raz_source                        Just use CTRL+L ('Source all') instead

I am open to dropping the raz_ prefix, but that would be a different Issue.

Create 'raz_create_params'

The function raz_create_params creates parameters that are named and checked. Although it is a trivial function, it ensures that we use correct parameter values.

Marked with TODO, Issue #19.

  • tests/testthat/test-raz_create_params.R tests it, but these tests are skipped
  • raz_create_params should call it (instead of using c to combine all values)
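
A sketch of what 'named and checked' could mean in practice. The argument names (lambda, mu, nu) and the rule that each must be one non-negative number are assumptions for illustration, not the actual razzo interface.

```r
# Hypothetical check: 'x' must be one number of at least 'min_value'
check_number <- function(x, name, min_value) {
  if (!is.numeric(x) || length(x) != 1 || is.na(x) || x < min_value) {
    stop("'", name, "' must be one number of at least ", min_value)
  }
}

# Sketch only: returns a named list after checking every value
create_params_sketch <- function(lambda, mu, nu) {
  check_number(lambda, "lambda", min_value = 0.0)
  check_number(mu, "mu", min_value = 0.0)
  check_number(nu, "nu", min_value = 0.0)
  list(lambda = lambda, mu = mu, nu = nu)
}

params <- create_params_sketch(lambda = 0.2, mu = 0.1, nu = 1.0)
```

Compared to combining values with c, a named list keeps each value retrievable by name (params$lambda) and the checks fail early on a wrong value.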
