richelbilderbeek / razzo

Research project by Giovanni Laudanno and Richel J.C. Bilderbeek

License: GNU General Public License v3.0

R 95.01% Shell 4.99%

razzo's Introduction

razzo

Branch
`master`
`develop`
`giovanni`
`richel`

Research project by Giovanni Laudanno and Richel J.C. Bilderbeek.

Primary tasks:

Giovanni Laudanno: making each step right
Richel J.C. Bilderbeek: big picture, software architecture, testing, continuous integration

The research project uses these GitHub repos:

razzo: R code
raztr: razzo results of a test run
razzo_project: bash scripts to run and analyse an experiment
razzo_article: scientific manuscript (private GitHub for now)
razzo_pilot_results: results of the pilot runs

Roadmap

Project stages

Ignition: prepare to do the experiment badly, e.g. short MCMC chains, few replicates, etc.
Launch: prepare to do the experiment correctly, in line with manuscript
Flight: running the experiment, maintaining the process
Land: write down results

Function overview

See doc.

Installation

See doc/install.md.

Package dependencies

Package
babette
beautier
beastier
mauricer
mbd
tracerer

Data

Data	DOI
razzo v1.0
razzo_article v1.0
razzo_project

Image attribution

From https://commons.wikimedia.org/wiki/File:RocketX.png

Image comes from http://wpclipart.com, which ONLY features public domain images and provides extensive source information on their "Legal" page: http://www.wpclipart.com/legal.html [Public domain]


razzo's Issues

Fix Travis and GitHub conflict

GitHub is blocking Travis for too many queries. Can this be solved?

'raz_create_mbd_tree' must create an MBD tree

raz_create_mbd_tree must create an MBD tree.

Labeled # TODO: Issue #8: actually create an MBD tree and save it:

test_that("use", {

  # ...

  # No tree present yet
  testit::assert(!file.exists(mbd_tree_filename))

  # Create tree
  raz_create_mbd_tree(parameters, mbd_tree_filename)

  # TODO: Issue #8: actually create an MBD tree and save it
  if (1 == 2) {
    expect_true(file.exists(mbd_tree_filename))
  }
})

Run experiment on razzo_project

razzo_project contains all the scripts for the full experiment. It will run part of the experiment on Travis and show the results using `find .`.

This Issue is important in showing the progress of the project:

Make razzo work with scripts

razzo will be called from scripts.

Build status here.

Add parameter for site model

Also upgrade the function that uses a site model.

testit::assert(parameters$site_model %in% c("jc69", "gtr"))
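
A minimal sketch of how such a check could be wrapped in a helper that gives a readable error message, using base R only. The function name check_site_model is hypothetical and not part of razzo; the two allowed values are taken from the assertion above.

```r
# Hypothetical helper, not part of razzo: checks the site model name
# stored in a parameter list
check_site_model <- function(parameters) {
  allowed <- c("jc69", "gtr")
  if (is.null(parameters$site_model)) {
    stop("'parameters' must have an element named 'site_model'")
  }
  if (length(parameters$site_model) != 1 ||
      !parameters$site_model %in% allowed) {
    stop(
      "'site_model' must be one of: ", paste(allowed, collapse = ", "),
      ". Actual value: ", parameters$site_model
    )
  }
  invisible(parameters)
}

check_site_model(list(site_model = "jc69")) # Valid, so nothing is printed
```

The function that uses the site model could call this helper first, so that an invalid value fails early with a clear message.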

Add 'raz_create_bd_alignment' function

test_that("use", {

  # Work from a folder
  folder_name <- tempdir()

  # Create the parameter files
  raz_create_parameters_files(folder_name)
  sub_folder_name <- "1"
  parameters_filename <- file.path(folder_name, sub_folder_name, "parameters.csv")
  bd_alignment_filename <- file.path(folder_name, sub_folder_name, "bd.fasta")

  # TODO: Issue #15: Add 'raz_create_bd_alignment'
  if (1 == 2) {
    raz_create_bd_alignment(parameters_filename)
    expect_true(file.exists(bd_alignment_filename))
  }
})

Remove Nested Sampling output

When doing a nested sampling run, it creates two files:

[tmpname].posterior.log
[tmpname].posterior.trees

babette should delete these somehow.
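
A minimal sketch of such a cleanup in base R, assuming the two files are written to a known folder and end in .posterior.log and .posterior.trees. The function name remove_ns_output is hypothetical; this is not how babette does it.

```r
# Hypothetical helper: delete the extra files left by a nested sampling run.
# Assumes the files are in 'folder_name' and end in
# '.posterior.log' or '.posterior.trees'
remove_ns_output <- function(folder_name) {
  filenames <- list.files(
    path = folder_name,
    pattern = "\\.posterior\\.(log|trees)$",
    full.names = TRUE
  )
  file.remove(filenames)
  # Return the names of the deleted files, without printing them
  invisible(filenames)
}
```

Calling remove_ns_output(tempdir()) after a run would remove only the files matching that pattern and leave the regular .log and .trees files in place.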

`raz_create_inference_files` must create the inference files

Labeled with # TODO: Issue #4:

Can only be done after #3

context("raz_create_inference_files")

test_that("use", {

  # Work from a folder
  folder_name <- tempdir()

  # Create the parameter files
  raz_create_parameters_files(folder_name)
  sub_folder_name <- "1"
  parameters_filename <- file.path(folder_name, sub_folder_name, "parameters.csv")
  # Work on the parameter file and create two FASTA files
  input_filenames <- raz_create_input_files(parameters_filename)
  mbd_fasta_filename <- file.path(folder_name, sub_folder_name, "mbd.fasta")

  # Do inference on the first FASTA file
  inference_filenames <- raz_create_inference_files(mbd_fasta_filename)

  mbd_trees_filename <- file.path(folder_name, sub_folder_name, "mbd.trees")
  mbd_log_filename <- file.path(folder_name, sub_folder_name, "mbd.log")
  mbd_mar_lik_filename <- file.path(folder_name, sub_folder_name, "mbd_mar_lik.csv")

  expect_true(mbd_trees_filename %in% inference_filenames)
  expect_true(mbd_log_filename %in% inference_filenames)
  expect_true(mbd_mar_lik_filename %in% inference_filenames)

  # TODO: Issue #4
  if (1 == 2) {
    expect_true(file.exists(mbd_trees_filename))
    expect_true(file.exists(mbd_log_filename))
    expect_true(file.exists(mbd_mar_lik_filename))
  }
})

Prepare babette for CRAN

As agreed with Rampal, Thursday 25th (and Friday 26th), I will work on babette instead of razzo.

Fix mbd's master

This commit broke master. This should be fixed ASAP.

I will try to fix it myself first; otherwise I will assign it to Giovanni.

Add 'raz_create_mbd_alignment' function

Added the test:

test_that("use", {

  # Work from a folder
  folder_name <- tempdir()

  # Create the parameter files
  raz_create_parameters_files(folder_name)
  sub_folder_name <- "1"
  parameters_filename <- file.path(folder_name, sub_folder_name, "parameters.csv")
  mbd_alignment_filename <- file.path(folder_name, sub_folder_name, "mbd.fasta")

  # TODO: Issue #14: Add 'raz_create_mbd_alignment'
  if (1 == 2) {
    raz_create_mbd_alignment(parameters_filename)
    expect_true(file.exists(mbd_alignment_filename))
  }
})

'raz_create_parameters_files' must create a parameter file

Let raz_create_parameters_files create one parameter file. The file's name is already known 👍

Labeled the test with # TODO: Issue #2:

test_that("use", {
  folder_name <- tempdir()
  filenames <- raz_create_parameters_files(folder_name = folder_name)
  expect_true(length(filenames) >= 1)

  # TODO: Issue #2
  if (1 == 2) {
    expect_true(all(file.exists(filenames)))
  }
})

Document 'raz_utilities.R'

Build will fail on undocumented functions.

Think: too much conditioning is bad

Note to self/us that I/we need to take into consideration.

From Stadler, Tanja. "How can we improve accuracy of macroevolutionary rate estimates?." Systematic Biology 62.2 (2012): 321-329.

Too much conditioning is bad! In general, if we
condition on more quantities (e.g., n in addition to
age as in the last point), we take away information
in the data (by conditioning on it), and thus the
estimates become less precise. In the extreme case
of conditioning on all speciation times $t_1, \ldots, t_{n-1}$
of the phylogeny, each parameter combination has
the same likelihood ($f(T \mid t_0, t_1, \ldots, t_{n-1}, n, \lambda, \mu) = 1$),
thus no information is left in the data.

raz_create_bd_alignment must calculate its own mutation rate

See #48, which is this Issue but for the MBD alignment.

Add `raz_est_marg_lik`

Function to calculate the marginal log likelihood. It relies on babette, yet babette does not return an ESS yet.

Fix style

It is a good idea to follow a coding style, e.g. the one recommended by Hadley Wickham.

The file tests/testthat/test-style_TODO.R can test this.

It is labeled with skip("TODO, Issue #20: Fix package style").

Update demo vignette for scripted use

With the simplified interface, the vignette can now do a full run.

Remove global variables

I think the goodpractice package will inform us, but in general one should not use global variables:

my_global_variable <<- 42 # No!

It increases the complexity of the code super-exponentially. There are exceptions, but these do not apply here.
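
A minimal sketch of the alternative: pass the value in as an argument and return the result. The function and argument names are made up for illustration.

```r
# Instead of reading a global variable ...
#   my_global_variable <<- 42
#   add_answer <- function(x) x + my_global_variable
# ... pass the value in explicitly
add_answer <- function(x, answer) {
  x + answer
}

add_answer(x = 1, answer = 42)
```

The function now depends only on its arguments, so it can be tested without setting up any state first.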

`mbd` gives unclear error message

This happens when calling with soc = NULL.

Check that the twin BD tree creation makes sense

Check the twin BD tree.

Make simpler to add a parameter to the model

Adding one parameter to the model is very complex: at least 5 or 6 functions need to be changed every time. I suggest simplifying the overall structure. Maybe we can use some tricks, such as exploiting raz_get_param_names.
Simplify, simplify, simplify.

there is no package called 'cli'

Error in loadNamespace(name) : there is no package called 'cli'

'raz_create_mbd_tree' should create trees with the expected number of speciation events

After fixing #63 ('Fix mbd package')

(note: I just put this here now, as I think we agree on this, else leave a comment below 👍 )

raz_create_mbd_tree creates an MBD tree for some parameters.

As far as I can see, I think we'll pick nu to have 0, 1, 2, 4 or 8 expected triggered speciation events. raz_create_mbd_tree should only keep those trees that have experienced that number of triggered speciation events.
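
Keeping only the trees with the wanted number of events amounts to rejection sampling. A minimal sketch in base R, under the assumption that a simulation reports its number of triggered speciation events: sim_until_n_triggered, the element name n_triggered and the stand-in simulation fake_sim are all hypothetical and do not use the mbd package.

```r
# Hypothetical sketch: keep simulating until the result has the wanted
# number of triggered speciation events.
# 'sim_fun' is any function that returns a list with an element
# 'n_triggered'; it stands in for the real MBD simulation
sim_until_n_triggered <- function(sim_fun, n_wanted, max_tries = 10000) {
  for (i in seq_len(max_tries)) {
    result <- sim_fun()
    if (result$n_triggered == n_wanted) {
      return(result)
    }
  }
  stop(
    "No simulation with ", n_wanted, " triggered events in ",
    max_tries, " tries"
  )
}

# Stand-in simulation: draws the number of events from a Poisson distribution
set.seed(42)
fake_sim <- function() list(n_triggered = stats::rpois(n = 1, lambda = 2))
result <- sim_until_n_triggered(fake_sim, n_wanted = 4)
```

The max_tries limit prevents an endless loop when the wanted number of events is very unlikely for the chosen parameters.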

Make build clean.

When building the package, weird messages appear.
They are of two different kinds:
One is related to a plot function somewhere that I cannot find.
The other is related to an inconsistency in the documentation, probably in the vignette.

razzo must provide testing resources

Testing is a bit cumbersome now, as for every pipeline step, the steps before it need to be done first.

An alternative approach would be to use some pre-created files in inst/extdata and call these similar to beautier::get_beautier_path
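
A minimal sketch of such a path function in base R, assuming the pre-created files are stored in the package's inst/extdata folder. The name get_pkg_path and its arguments are hypothetical; it only mirrors the idea behind beautier::get_beautier_path, not its implementation.

```r
# Hypothetical sketch of a function that returns the full path to a
# pre-created testing file.
# Assumes the files are stored in the 'inst/extdata' folder of the package
get_pkg_path <- function(filename, package = "razzo") {
  full <- system.file("extdata", filename, package = package)
  # 'system.file' returns an empty string if the file is not found
  if (full == "") {
    stop("'", filename, "' not found in package '", package, "'")
  }
  full
}
```

A test for a later pipeline step could then start from, for example, get_pkg_path("parameters.csv"), where the filename is again only an illustration.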

Make babette work inside the function "raz_create_inference_files.R"

If you try to run "raz_create_inference_files", you will get the error shown below the following call.
We need to make it work.

posterior <- babette::bbt_run(
  fasta_filenames = fasta_filename,
  mcmc = beautier::create_mcmc_nested_sampling(
    chain_length = chain_length,
    store_every = sample_interval,
    sub_chain_length = sub_chain_length
  ),
  site_models = site_model,
  clock_models = clock_model,
  tree_priors = beautier::create_bd_tree_prior(),
  mrca_priors = beautier::create_mrca_prior(
    alignment_id = beautier::get_alignment_id(fasta_filename),
    taxa_names = beautier::get_taxa_names(fasta_filename),
    is_monophyletic = TRUE,
    mrca_distr = beautier::create_normal_distr(
      mean  = beautier::create_mean_param(value = crown_age),
      sigma = beautier::create_sigma_param(value = 0.001)
    )
  ),
  rng_seed = rng_seed,
  beast2_output_trees_filenames = trees_filename, # Will create it
  beast2_output_log_filename = log_filename, # Will create it
  verbose = FALSE
)

Error in check_input_filename_validity(input_filename = input_filename, :
'input_filename' must be a valid BEAST2 XML file. File 'C:\Users\P274829\AppData\Local\Temp\RtmpsNRU71\beast2_22d04c7b7984.xml' is not a valid BEAST2 fileFALSE
In addition: Warning messages:
1: In system2(cmds[1], args = cmds[-1], stdout = TRUE, stderr = TRUE) :
running command '"C:\Program Files\Java\jre1.8.0_181\bin\java.exe" -jar "C:\Users\P274829\AppData\Local\BEAST\lib\beast.jar" -validate "C:\Users\P274829\AppData\Local\Temp\RtmpsNRU71\beast2_22d04c7b7984.xml"' had status 1
2: In system2(cmds[1], args = cmds[-1], stdout = TRUE, stderr = TRUE) :
running command '"C:\Program Files\Java\jre1.8.0_181\bin\java.exe" -jar "C:\Users\P274829\AppData\Local\BEAST\lib\beast.jar" -validate "C:\Users\P274829\AppData\Local\Temp\RtmpsNRU71\beast2_22d04c7b7984.xml"' had status 1

Decide if we need all the create_params functions (one for each param) or we can just do fine with one

I copied the functions from babette. There is one create_params function for each parameter, but in the end I feel this is overkill and only increases the complexity of the package. I advise keeping only one function.

'raz_create_nltt_file' must create an nLTT file

Labeled with # TODO: Issue #5

Can only be done after #4

test_that("use", {

  # Work from a folder
  folder_name <- tempdir()

  # Create the parameter files
  raz_create_parameters_files(folder_name)
  sub_folder_name <- "1"
  parameters_filename <- file.path(folder_name, sub_folder_name, "parameters.csv")
  # Work on the parameter file and create two FASTA files
  input_filenames <- raz_create_input_files(parameters_filename)
  mbd_fasta_filename <- file.path(folder_name, sub_folder_name, "mbd.fasta")
  mbd_tree_filename <- file.path(folder_name, sub_folder_name, "mbd.tree")
  # Do inference on the first MBD trees
  inference_filenames <- raz_create_inference_files(
    fasta_filename = mbd_fasta_filename
  )
  mbd_trees_filename <- file.path(folder_name, sub_folder_name, "mbd.trees")

  # Start real work
  nltt_filename <- raz_create_nltt_file(
    trees_filename = mbd_trees_filename
  )
  expect_equal(file.path(folder_name, sub_folder_name, "mbd_nltts.csv"), nltt_filename)

  # TODO: Issue #5: 'raz_create_nltt_file' must create an nLTT file
  if (1 == 2) {
    expect_true(file.exists(nltt_filename))
  }
})

Asymmetry in saving and loading parameter files

`raz_create_bd_tree` must create a BD tree and save it

Labeled with # TODO: Issue #9: actually create a BD tree and save it

test_that("use", {

  bd_tree_filename <- tempfile()

  # No tree present yet
  testit::assert(!file.exists(bd_tree_filename))

  # Create tree
  raz_create_bd_tree(
    init_speciation_rate = 0.1,
    init_extinction_rate = 0.1,
    mbd_tree = ape::rcoal(4),
    bd_tree_filename
  )

  # TODO: Issue #9: actually create a BD tree and save it
  if (1 == 2) {
    expect_true(file.exists(bd_tree_filename))
  }
})

'raz_create_input_files' must create the four true/nature/input files

raz_create_input_files must create the four true/nature/input files.

Labeled with # TODO: Issue #3:

Can only be done after #2

context("raz_create_input_files")

test_that("use", {
  # Work from a folder
  folder_name <- tempdir()

  # Create the parameter files
  raz_create_parameters_files(folder_name)
  sub_folder_name <- "1"
  parameters_filename <- file.path(folder_name, sub_folder_name, "parameters.csv")

  # Work on the parameter file
  input_filenames <- raz_create_input_files(parameters_filename)

  # Expect four files to be created
  mbd_fasta_filename <- file.path(folder_name, sub_folder_name, "mbd.fasta")
  mbd_tree_filename <- file.path(folder_name, sub_folder_name, "mbd.tree")
  bd_fasta_filename <- file.path(folder_name, sub_folder_name, "bd.fasta")
  bd_tree_filename <- file.path(folder_name, sub_folder_name, "bd.tree")

  # TODO: Issue #3
  if (1 == 2) {
    expect_true(file.exists(mbd_fasta_filename))
    expect_true(file.exists(mbd_tree_filename))
    expect_true(file.exists(bd_fasta_filename))
    expect_true(file.exists(bd_tree_filename))
  }
})

Fix babette possible bug in truncating output in longer MCMC runs

razzo will need the complete MCMC run's output.

As reported here

Remove all '1 == 2' and TODO

When all is supposedly done, remove all if (1 == 2) statements and check all TODOs.
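
A minimal sketch of how the leftovers could be listed from R, using base R only. The function name find_leftovers is hypothetical, and the search assumes the markers are written literally as '1 == 2' and 'TODO'.

```r
# Hypothetical helper: list all lines in the R files of a folder that
# still contain '1 == 2' or 'TODO'
find_leftovers <- function(folder_name = ".") {
  filenames <- list.files(
    folder_name, pattern = "\\.R$", recursive = TRUE, full.names = TRUE
  )
  hits <- character(0)
  for (filename in filenames) {
    lines <- readLines(filename, warn = FALSE)
    line_numbers <- grep("1 == 2|TODO", lines)
    if (length(line_numbers) > 0) {
      # Format each hit as 'filename:line number: line content'
      hits <- c(
        hits,
        paste0(filename, ":", line_numbers, ": ", lines[line_numbers])
      )
    }
  }
  hits
}
```

When run from the package root, an empty result from find_leftovers() would indicate that this Issue can be closed.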

'raz_open_parameters_file' must open the parameter file

raz_open_parameters_file is a stub that does not open a parameter file at all.

raz_open_parameters_file <- function(filename)
{
  # TODO: actually read the file
  if (1 == 2) {
    testit::assert(file.exists(filename))
  }
  # ...
  parameters
}

It does pass the tests, as it sets lambda to 1.0:

context("raz_open_parameters_file")

test_that("use", {

  folder_name <- tempdir()
  filenames <- raz_create_parameters_files(folder_name = folder_name)
  filename <- filenames[1]
  parameters <- raz_open_parameters_file(filename)
  expect_true(parameters$lambda > 0.0)
})
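
A minimal sketch of what actually reading the file could look like, paired with a matching save function so that saving and loading are symmetric. It uses base R only and assumes every parameter is a single value stored as one CSV column; the names save_params and open_params are hypothetical and the real file layout may differ.

```r
# Hypothetical sketch: save and load a parameter list symmetrically.
# Assumes every parameter is a single value, stored as one CSV column
save_params <- function(parameters, filename) {
  utils::write.csv(
    as.data.frame(parameters), file = filename, row.names = FALSE
  )
}

open_params <- function(filename) {
  if (!file.exists(filename)) {
    stop("File '", filename, "' not found")
  }
  as.list(utils::read.csv(filename, stringsAsFactors = FALSE))
}

filename <- tempfile(fileext = ".csv")
save_params(list(lambda = 0.2, mu = 0.1), filename)
parameters <- open_params(filename)
```

With a round trip like this, a test can check that the loaded values equal the saved ones, instead of only checking that lambda is positive.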

Add parameter for clock model

Also upgrade the function that uses a clock model.

# s: strict
# rln: relaxed log-normal
testit::assert(parameters$clock_model %in% c("s", "rln"))

Fix develop

DDD::L2phylo gives unclear error

In the function raz_create_bd_tree, one of the last lines is to convert the BD L matrix to a phylogeny:

> DDD::L2phylo(bd_l_matrix)

This gives the unclear error:

Error in `[<-.data.frame`(`*tmp*`, j, 1:3, value = numeric(0)) : 
  replacement has 0 items, need 3

I've added a fake return value until this is fixed:

  # TODO: Issue #43: DDD::L2phylo gives unclear error
  if (1 == 2) {
    return(DDD::L2phylo(bd_l_matrix))
  }
  # FAKE
  mbd_tree

Yup, for now it simply returns the MBD tree 🌈

Fix babette bug from create_rln_clock_model

As reported here

Call 'DDD::bd_ML' with which condition?

raz_create_bd_tree uses DDD::bd_ML with a certain conditioning.

Check if it is indeed the right one. Document that in the article.

Check that the inference of lambda and mu provided by bd_ML is decent.

bd_ML sometimes gives us back weird values for lambda and mu. Let's make sure they are right.

phylogeny must not contain extant species

From vignette:

Error in pirouette::sim_alignment(phylogeny = bd_tree, sequence_length = NULL, : phylogeny must not contain extant species

raz_create_mbd_alignment must calculate its mutation rate

AFAICS, raz_create_mbd_alignment is perfectly capable of calculating its mutation rate.

At the moment, it uses a mutation rate from an input file:

raz_create_mbd_alignment <- function(
  parameters, mbd_tree
) {
  # ...
  mutation_rate <- parameters$mbd_mutation_rate

  # ...
}

The mutation rate should be calculated as described in the comments in the code and/or in the article
The mutation rate can then be removed from parameter file creation

Remove unused code

I think we are on the right track and don't need the older code.

I suggest to get rid of the older/unused code, as it only slows us down.

Make babette work under Windows again

When the time is right 👍

Create razzo_project GitHub

Should have folder 'scripts'
Has a script called create_parameter_files.sh that calls razzo::create_parameter_files
Displays the folder structure on Travis

Keep test output clean

When I run the tests, it produces output about optimizing parameters. Silence is Golden means that if everything is OK, it should say nothing. We cannot modify testthat in that aspect, but the verbosity of the ML optimization can and should be removed.

Loading razzo
Loading required package: testthat
Testing razzo
✔ | OK F W S | Context
✔ |  1       | raz_create_bd_alignment
⠹ |  3       | raz_create_bd_treeYou are optimizing lambda0 mu0 
You are fixing lambda1 mu1 
Optimizing the likelihood - this may take a while. 
The loglikelihood for the inital parameter values is -110.4957 

 Maximum likelihood parameter estimates: lambda0: 0.029187, mu0: 0.347394, lambda1: 0.000000, mu1: 0.000000:  
 Maximum loglikelihood: -108.230334 
✔ |  4       | raz_create_bd_tree [1.0 s]
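
A minimal sketch of one way to silence such output from the calling side, assuming the messages are written to standard output (for example with cat). The helper run_silently and the stand-in function noisy_sqrt are hypothetical; output written with message() or warning() would not be captured this way.

```r
# Run 'fun' while discarding everything it prints to standard output.
# The return value of 'fun' is kept
run_silently <- function(fun, ...) {
  utils::capture.output(result <- fun(...))
  result
}

# Stand-in for a verbose optimization, such as the one shown above
noisy_sqrt <- function(x) {
  cat("Optimizing the likelihood - this may take a while.\n")
  sqrt(x)
}

run_silently(noisy_sqrt, 4)
```

If the optimization function offers its own way to switch off the output, using that would be preferable to capturing it.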

Prefer the first word of a function being a verb

Following the literature, prefer function names whose first word is a verb.

Now	Suggest
`raz_tempdir`	`raz_?`
`raz_standard_parameters`	`raz_create_def_params`
`raz_standard_parameters_interval`	Put as default values in `raz_create_params`
`raz_source`	Just use CTRL+L ('Source all') instead

I am open to dropping the raz_ prefix, but that would be a different Issue.

row names were found from a short variable and have been discarded

Message from raz_create_bd_tree_file, triggered in demo vignette.

Create 'raz_create_params'

The function raz_create_params creates parameters that are named and checked. Although a trivial function, it ensures that we use correct parameter values.

Marked with TODO, Issue #19.

tests/testthat/test-raz_create_params.R tests it, but these tests are skipped
raz_create_params should call it (instead of using c to combine all values)
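
A minimal sketch of what a function that creates named and checked parameters could look like, using base R only. The function name create_params, the three parameters and their lower bounds are assumptions for illustration; the real raz_create_params may take different arguments.

```r
# Hypothetical sketch of a function that creates named and checked
# parameters. The parameter names and bounds are assumptions
create_params <- function(lambda, mu, nu) {
  check_one_number <- function(x, name, min_value) {
    if (!is.numeric(x) || length(x) != 1 || is.na(x) || x < min_value) {
      stop("'", name, "' must be one number of at least ", min_value)
    }
  }
  check_one_number(lambda, "lambda", 0.0)
  check_one_number(mu, "mu", 0.0)
  check_one_number(nu, "nu", 0.0)
  # A named list, so values are accessed by name instead of by position
  list(lambda = lambda, mu = mu, nu = nu)
}

parameters <- create_params(lambda = 0.2, mu = 0.1, nu = 1.0)
```

Compared to combining the values with c, a named list makes it harder to mix up the order of the parameters, and invalid values fail at creation time.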