theopannetier / comrad Goto Github PK

An R package to simulate an individual-based model with diversity-dependent diversification emerging from competition

License: GNU General Public License v3.0

R 35.80% C++ 1.23% HTML 62.97%

comrad's Introduction

comrad: A more mechanistic approach to diversity-dependent diversification

Branch
`master`
`develop`
`brute_force`*

*see the Installation section

Description

comrad is an individual-based model of diversity-dependent diversification, where branching (and by extension, speciation), extinction and equilibrium diversity emerge from Lotka-Volterra like competition between individuals.

This package contains the algorithm for the model and functions used to run it, extract, analyse and visualise its output, including the phylogenetic tree of the community.

An overview of the model and the main features of the package can be found in the vignette.

The package accompanies the third chapter of my doctoral dissertation, and the corresponding manuscript, currently in prep. The functions it contains were used to produce the results and figures presented through the paper.

Installation

I maintain two versions of the package. The default version, suitable for all applications that do not include running the simulation for large communities, can be installed from the master branch:

remotes::install_github("TheoPannetier/comrad")

An alternative version, which lives on branch brute_force, is more suitable for running simulations with large communities ($N > 10^4$). Indeed, because the pairwise effects of competition must be calculated at every generation, the number of calculations grows quadratically with the size of the community, and the simulation may take a very long time to complete in such cases.

A solution has been implemented by @HHildenbrandt to address this using parallel- (via xsmid) and multi-processing (via OpenMP), which greatly reduces the time the simulation takes to process the whole community. This version of the package lives on branch `brute_force`, and can be installed with

remotes::install_github("TheoPannetier/comrad", ref = "brute_force")

*To best exploit these solutions, compilation is optimised for the local system on which comrad is run via the compiler flags -march=native, -mtune=native. These flags are not portable, so this version of the package cannot pass the CRAN check.

In case of doubt over which version is installed, the user can call has_brute_force_opt(), which evaluates to FALSE for the main version and to TRUE for the brute_force option.

Contributing / Bug report / Support

For questions or bug reports, please open an Issue
For contributions, please make a branch for yourself and open a pull request when you are done

comrad's People

Contributors

Stargazers

Watchers

Forkers

thijsjanzen bradduthie

comrad's Issues

Allow simulation to start from pre-existing community

I would like to be able to run the simulation starting from a point where the simulation previously exited, e.g. with a community read from an .csv file.

The file from which the initial community was read from, if different from default, should also be recorded in metadata.

#19 Get comrad to branch

So far, it seems that the population evolves and occupies the whole niche space, but no branching appears to be produced.
Instead, the trait values spread continuously along the spectrum of possible values

This is surprising, the parameter values I've used are the same as in the original publication.
Maybe I haven't run the simulation for enough generations, but I rather suspect that this has to do with the way I sample mutations, which is quite different from how Pontarp proceeded.

#3 Make carrying capacity parameters consistent

carr_cap parameters are currently provided as a single vector (carr_cap_pars) to fitness(), but as separate elements to carr_cap.

Make this consistent across the two functions and tests.

#6 Make fitness function population-level

fitness() returns the fitness of a single individual and compares its trait with the rest of the population.

For the generation step perspective, it would be simpler and more efficient to instead return the fitness for the entire population.

#8 Write faster effective pop size function

Current get_eff_pop_sizes sums comp_coeff() for each individual against each other individual in the population.

comp_coeff() is symmetric - comp_coeff(z_i, z_j) == comp_coeff(z_j z_i) so we don't need to compute all the possible interactions - just the unique one, which would then make get_eff_pop_sizes loop over less values and make the fitness computation faster.

The first step is to extract all unique competing trait combinations in a population, using either expand.grid() or combn()

#10 test eff_pop_size

get_eff_pop_sizes is missing tests for ordinary cases and abuse cases

#13 Consider Rcpp

If speed proves unsufficient, one resort is to throw the main computations in C++ and call them with Rcpp

library(Rcpp)
sourceCpp("MyFunc.cpp")
system.time (output <- myFunc(df))

where a minimal MyFunc.cpp would be

#include <rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
void myFunc() {
 return 0;
}

Write tests for `convert_sim_to_phylo()`

Simulated communities produced with run_simulation() can be converted to ape-style phylo class objects with assemble_phylo_tbl() + convert_to_newick() + ape::read.tree(), all piped in convert_sim_to_phylo(). I'm not confident this will always work as intended so let's experiment with a bit of TDD.

#18 Plot the distribution of a trait in a generation

Better resolution at the expense of the whole picture!

#25 Treat multiple simultaneous speciation events

As of now, apply_speciation() can only apply one speciation event per species per generation. That is there are multiple gaps in the trait values of a species, only one will result in speciation.

Although this is no ideal, this should not be an issue for the simulation, because an untreated speciation event will be caught at the next generation.
In other, phylogenetic words, polytomies are treated as soft polytomies.

Make fitness function #1

The fitness function links the trait value of a focal individual to the number of offspring it will produces, based on values of traits of other individuals in the population, and the overall fitness of the trait without competition, as defined by the carrying capacity.

#12 Fix LaTeX blocks

Mathematical exponents such as exponents and greater that don't appear like in LaTeX in the compiled docs.

[Feature test] Link to a line of code

Say I don't like the apply_mutations.R and feel that there should be a large blank
maybe,
say,
four lines?
before line 33.

#7 Pick consistent model variable and parameter names

Dilemma: descriptive variable names, or concise names reflecting conventions?
E.g.

n_eff or eff_pop_size ?
sigma_alpha or competition_width; or the current trade-off sigma_comp ?
trait_ind and traits_pop or z_ind and z_pop ?

Implement species identity & speciation

#21
comrad can now produce some lovely branches, but those are still just cluster of individuals. I want to produce phlyogenies, so I need to track lineages.

I will use the species definition from the original paper (Pontarp et al. 2012).

#11 Make testarg_bounds

testarg_bounds(arg, min, max) asserts that the value(s) of arg are between min and max

#5 Implement generation step

Pseudocode
— For t in 1:nb_gen
—— For each individual

Get fitness G(z)
Draw the nb of offspring from Pois(G(z))
——— For each offspring
Apply mutation
——— done

—— done
Replace current generation with new generation
— done

Fix develop Travis build

https://travis-ci.org/github/TheoPannetier/comrad/builds/722712768

tarivs y u do dis ;-;

Automate read_comrad_tbl skip

The nb of lines of metadata in the simulation output of comrad keeps changing across versions.
So far I specify the numebr of lines that should be skipped via an argument, but this causes frequent bugs and is annoying.
Is there any regexp magic I could use to automatically find where the table starts?

#14 Solve anticipated merge conflict

I made a mistake and worked on master on last commits. Now back on develop but I forgot to merge master into develop now, so I anticipate a merge conflict when I do it.

#9 test fitness

get_fitness is missing tests for ordinary cases

#17 Plot population growth over generations

Has the population indeed hit carrying capacity?

#24 Add tests around the simulation

I had to rush to 0.11.0 lately, so comrad is now looking handsome but is quite fragile, I expect. It's time to write some tests!

#26 Change species identity

charlatan::ch_taxonomic_epithet() is great fun, but unfortunately there is a pretty high chance of drawing twice the same name for different species.
I need an alternative that provides random, unique identifiers for new species.

#16 Plot trait values over generations

It would be nice to visualize all those .csv's!

Add vignette?

Salut @TheoPannetier,

I checked this package and I am missing a vignette that illustrates what the package does. Do you consider adding it?

Cheers, Richel

#4 Solve the issue of NaN fitness

As of 0.3.0, fitness() can evaluate to a NaN if both n_eff and k take value 0.

This can happen when an individual takes a trait value that is very far from the optimal value and from the trait values of other individuals in the population; in which case carr_cap() evaluates to 0 (poor fitness) and comp_coeff() evaluates to 0 for all individuals (as there is no individual nearby in the trait space).

This is currently caught before fitness() evaluation, but still crashes the simulation.
The error can be called by running fitness(trait_ind = 30, traits_pop = rep(0, 1000)).

#2 Nest numeric testargs

testarg_pos, testarg_prop, testarg_not_zero should call testarg_num to assert that the provided argument is a numeric in the first place.

Nesting testargs in this way is not compatible with a single argument to testargs, because the latter relies on substitute which does not do great in arguments passed from upper functions.

For now, the solution was to unnest the statements, but finding a way to make both features compatible would save a few lines of code in every function.

Some attempts can be found in testargs_test.

#15 Fix carrying capacity

The population currently reaches a size of around 2500 individuals, vs the K = 1000 specified carrying capacity. That makes an overshoot of x2.5!