This package is not yet useful. We're working hard to make it useful.
To install dependencies, use the associated conda environment file:
conda env create -f environment.yml
conda activate libsbn
However, you also need to install platform-specific compiler packages as follows.
- If you are on Linux, use
conda install -y gxx_linux-64
- If you are on OS X, use
conda install -y clangxx_osx-64
After that, running `make` builds the code and runs the tests.
On OS X, the build process also modifies the conda environment so that `DYLD_LIBRARY_PATH` points to where BEAGLE is installed.
If you then get an error about BEAGLE being missing, run `conda activate libsbn` again to pick up the change.
- (Optional) If you modify the lexer and parser, call `make bison`. This assumes that you have installed Bison > 3.4 (`conda install -c conda-forge bison`).
- (Optional) If you modify the test preparation scripts, call `make prep`. This assumes that you have installed ete3 (`conda install -c etetoolkit ete3`).
libsbn is written in C++14.
We want the code to be:
- correct, so we write tests
- efficient in an algorithmic sense, so we consider algorithms carefully
- clear to read and understand, so we write code with readers in mind and use code standards
- fast, so we do profiling to find and eliminate bottlenecks
- robust, so we use immutable data structures and safe C++ practices
- simple and beautiful, so we keep the code as minimal and DRY as we can without letting it get convoluted or over-technical
Also let's:
- Prefer a functional style: return new values rather than modifying variables in place. Thanks to return value optimization and move semantics, this usually carries no performance penalty.
- Use RAII. No `new`.
- Avoid classic/raw pointers except as const parameters to functions.
- Prefer descriptive variable names and simple coding practices to code comments. If that means long identifier names, that's fine! If you can't make the code's use and operation inherently obvious, please write documentation.
- Prefer GitHub issues to TODO comments in code.
- Always use curly braces for the body of conditionals and loops, even if they are one line.
- Prefer types of known size, such as `uint32_t`, to types that can vary across architectures.
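Several of these guidelines can be seen together in a small sketch. The function names (`DoubledValues`, `SumOrZero`, `MakeUnitCounts`) are hypothetical illustrations, not part of libsbn, and reading "const parameters" as pointer-to-const parameters is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <memory>
#include <numeric>
#include <vector>

// Hypothetical helpers illustrating the guidelines; not part of libsbn.

// Functional style: build and return a new vector instead of filling an
// out-parameter. The return is elided or moved, so no deep copy is made.
std::vector<uint32_t> DoubledValues(const std::vector<uint32_t>& values) {
  std::vector<uint32_t> doubled;
  doubled.reserve(values.size());
  for (const uint32_t value : values) {
    // Precondition: doubling must not overflow a uint32_t.
    assert(value <= std::numeric_limits<uint32_t>::max() / 2);
    doubled.push_back(2 * value);
  }
  return doubled;
}

// A raw pointer appears only as a non-owning pointer-to-const parameter.
// Braces are used even for the one-line body of the conditional.
uint32_t SumOrZero(const std::vector<uint32_t>* values) {
  if (values == nullptr) {
    return 0;
  }
  return std::accumulate(values->begin(), values->end(), uint32_t{0});
}

// RAII: the unique_ptr owns the heap allocation and frees it automatically,
// so neither `new` nor `delete` appears in the code.
std::unique_ptr<std::vector<uint32_t>> MakeUnitCounts(uint32_t count) {
  return std::make_unique<std::vector<uint32_t>>(count, 1);
}
```

For example, `SumOrZero(MakeUnitCounts(4).get())` returns 4, and the temporary `unique_ptr` releases its vector at the end of that expression without any explicit cleanup.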
The C++ Core Guidelines are the authority for how to write C++, and we will follow them. For issues not covered by these guidelines (especially naming conventions), we will use the Google C++ Style Guide to the letter. We use cpplint to check some aspects of this.
There are certainly violations of these guidelines in the code, so fix them when you see them!
Code gets formatted using clang-format. See the Makefile for the invocation.
Add a test for every new feature.
- Code changes start by raising an issue proposing the changes, which often leads to a discussion
- Make a branch associated with the issue, named with the issue number and a description, such as `4-efficiency-improvements` for a branch associated with issue #4 about efficiency improvements
- If you have another branch to push for the same issue (perhaps a fresh, alternate start), you can just name them consecutively: `4-1-blah`, `4-2-etc`, and so on
- Push code to that branch
- Once the code is ready to merge, open a pull request
- Code review on GitHub
- Squash and merge, closing the issue via the squash and merge commit message
- Delete branch
PCSS stands for parent-child subsplit.
They are represented as bitsets in three equal-sized "chunks", which are sub-bitsets.
For example, `100011001` is composed of the chunks `100`, `011`, and `001`.
If the taxa are x0, x1, and x2, then this means the parent subsplit is (x0, x1x2) and the child subsplit is (x1, x2).
- The first chunk is called the "uncut parent" because it is not further split apart by the child subsplit.
- The second chunk is called the "cut parent" because it is further split apart by the child subsplit.
- The third chunk is called the "child", and it is well defined relative to the cut parent: the other part of the child subsplit is the cut parent setminus the child.
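To make the chunk vocabulary concrete, here is a minimal sketch that decodes the example `100011001`. It assumes a plain `std::string` of `'0'`/`'1'` characters with the leftmost bit standing for x0, which need not match libsbn's own bitset type; `PCSSChunks`, `SplitIntoChunks`, `SetMinus`, `ChildSubsplit`, and `TaxaOf` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical types and helpers for illustration; libsbn's own PCSS code may
// differ. Bitsets are strings of '0'/'1', and bit i stands for taxon x<i>.
struct PCSSChunks {
  std::string uncut_parent;  // first chunk: not split further by the child
  std::string cut_parent;    // second chunk: split further by the child
  std::string child;         // third chunk: one part of the child subsplit
};

// Splits a PCSS bitset into its three equal-sized chunks.
PCSSChunks SplitIntoChunks(const std::string& pcss) {
  assert(pcss.size() % 3 == 0);
  const std::size_t chunk_size = pcss.size() / 3;
  return {pcss.substr(0, chunk_size), pcss.substr(chunk_size, chunk_size),
          pcss.substr(2 * chunk_size, chunk_size)};
}

// Set difference of two equal-length bitsets: bits set in a but not in b.
std::string SetMinus(const std::string& a, const std::string& b) {
  assert(a.size() == b.size());
  std::string result(a.size(), '0');
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i] == '1' && b[i] == '0') {
      result[i] = '1';
    }
  }
  return result;
}

// The child subsplit: (cut parent setminus child, child).
std::pair<std::string, std::string> ChildSubsplit(const PCSSChunks& chunks) {
  return {SetMinus(chunks.cut_parent, chunks.child), chunks.child};
}

// Names the taxa in a bitset, e.g. "011" -> "x1x2".
std::string TaxaOf(const std::string& bitset) {
  std::string taxa;
  for (std::size_t i = 0; i < bitset.size(); ++i) {
    if (bitset[i] == '1') {
      taxa += "x" + std::to_string(i);
    }
  }
  return taxa;
}
```

On `100011001`, `SplitIntoChunks` yields `100`, `011`, and `001`, and `ChildSubsplit` yields the pair `010` and `001`, which `TaxaOf` names x1 and x2.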