phylovi / bito

Python-interface C++ library for Bayesian phylogenetics via optimization

Home Page: https://phylovi.github.io/bito/

License: GNU General Public License v3.0

Makefile 0.10% Python 5.98% C++ 92.67% Yacc 0.15% LLVM 0.12% Terra 0.01% Jupyter Notebook 0.10% CMake 0.28% Turing 0.58% Raku 0.01%

bito's People

Contributors

4ment, annakooperberg, chrisjenningsshaffer, christiaanjs, davidrich27, ejisaac, hrnasif, junseonghwan, lucyyang01, matsen, mdkarcher, msuchard, ognian-, shokiami, tanviganapathy

bito's Issues

Rename tree indices to ids (or something).

Currently we have the name "index" appear in the SBN parameter indexer, and also in the Node class. I think this is confusing.

So, I propose changing the "index" in Node to "id".

Objections? Alternate proposals?

Reorganize code

  • The naming of libsbn.cpp and libsbn.hpp is confusing: rename libsbn.cpp to pylibsbn.cpp
  • build.cpp is vaguely named
  • functions vs methods
  • The Newick functions are a mess

SymbolVectorOf segfaults when run from Python

Uncommenting inst.prepare_beagle_instance() in test_instance.py, and adding some print statements

      std::cout << "before\n";
      SymbolVector symbols =
          SymbolVectorOf(alignment.at(iter->second), symbol_table);
      std::cout << "after\n";

shows that this is a problem with SymbolVectorOf, which seems pretty innocuous.

Everything runs fine when run from C++ in the libsbn.hpp test 🤷‍♂️ ...

Implement a bitarray class

So, it turns out that std::bitset requires its length to be known at compile time. 😬

There is std::vector<bool>, but it doesn't have all the nice operator syntax such as ~.

So I'll wrap it and add these features.
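
As a minimal sketch of the wrapper described above: the class name BitArray, its method names, and the choice of operators are assumptions made for illustration, not the interface that ended up in the library. It stores its bits in a std::vector<bool>, so the length is chosen at run time, and adds ~, & and | on top.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical wrapper around std::vector<bool>; names are illustrative only.
class BitArray {
 public:
  // The length is a run-time value, unlike std::bitset<N>.
  explicit BitArray(std::size_t size, bool initial_value = false)
      : bits_(size, initial_value) {}

  std::size_t size() const { return bits_.size(); }
  bool operator[](std::size_t i) const { return bits_[i]; }
  // Bounds-checked assignment of a single bit.
  void set(std::size_t i, bool value = true) { bits_.at(i) = value; }

  // Bitwise complement, returned as a new array.
  BitArray operator~() const {
    BitArray result(*this);
    result.bits_.flip();
    return result;
  }

  // Elementwise AND; both operands must have the same length.
  BitArray operator&(const BitArray& other) const {
    assert(size() == other.size());
    BitArray result(size());
    for (std::size_t i = 0; i < size(); ++i) {
      result.bits_[i] = bits_[i] && other.bits_[i];
    }
    return result;
  }

  // Elementwise OR; both operands must have the same length.
  BitArray operator|(const BitArray& other) const {
    assert(size() == other.size());
    BitArray result(size());
    for (std::size_t i = 0; i < size(); ++i) {
      result.bits_[i] = bits_[i] || other.bits_[i];
    }
    return result;
  }

  bool operator==(const BitArray& other) const { return bits_ == other.bits_; }

  // Renders the bits as a string of '0' and '1', lowest index first.
  std::string ToString() const {
    std::string str;
    for (bool bit : bits_) str.push_back(bit ? '1' : '0');
    return str;
  }

 private:
  std::vector<bool> bits_;
};
```

With this in place, an expression such as a & ~b reads the same way it would with std::bitset, while the size can come from the number of taxa in the input.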

Add contributors list

Hello everyone-- it's time to acknowledge all of your contributors.

Shall we move development to a separate organization? If so, what shall we name it?

My suggestion would be viphylo or variationalphylo. Or perhaps VariationalPhylo? Ideas? Votes?

implement a tree class

Here are some initial ideas about how to implement a tree class.

Because we are sampling rooted trees and not changing them, the trees can be rooted and immutable. Because we'll want to efficiently store a lot of them, it makes sense to have the implementation separate out extra information such as branch length. If we combine this with a preference for composition over inheritance and RAII we have a simple design in which the core tree data structure is an array of shared_ptrs to descendants, along with a unique integer identifier. These can then get combined with other information via the integer index, such as an unordered_map to map from branches to branch lengths if they are needed.
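
A rough sketch of that design, under stated assumptions: the names Node, NodePtr, Leaf, Join and BranchLengthMap are chosen here for illustration and may not match the eventual code. Each node holds only an integer id and a vector of shared_ptrs to its children, everything is const after construction, and branch lengths live in a separate unordered_map keyed by node id.

```cpp
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical immutable tree node: an id plus shared pointers to children.
class Node {
 public:
  using NodePtr = std::shared_ptr<const Node>;
  using NodePtrVec = std::vector<NodePtr>;

  Node(std::size_t id, NodePtrVec children)
      : id_(id), children_(std::move(children)) {}

  std::size_t Id() const { return id_; }
  const NodePtrVec& Children() const { return children_; }
  bool IsLeaf() const { return children_.empty(); }

  // Counts the leaves below (and including) this node by recursion.
  std::size_t LeafCount() const {
    if (IsLeaf()) return 1;
    std::size_t count = 0;
    for (const auto& child : children_) count += child->LeafCount();
    return count;
  }

  // Factory helpers so that nodes are only ever handled through NodePtr.
  static NodePtr Leaf(std::size_t id) {
    return std::make_shared<Node>(id, NodePtrVec{});
  }
  static NodePtr Join(std::size_t id, NodePtrVec children) {
    return std::make_shared<Node>(id, std::move(children));
  }

 private:
  const std::size_t id_;
  const NodePtrVec children_;
};

// Extra per-branch information is kept outside the tree, keyed by the id of
// the node below the branch.
using BranchLengthMap = std::unordered_map<std::size_t, double>;
```

Because nodes are immutable and reference counted, two trees that share a subtree can point at the same Node objects, which is one way the storage cost of a large sample could be kept down.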

After thinking this out I realized that this is quite similar to what we did in pplacer with stree and gtree.

I'm sure several things here will blow up in my face as I start playing around with implementations. If you can think of some of those things, please comment!

Expose branch lengths to Numpy

One initial way external modelling/inference tools could interact with libsbn is to use it to calculate the phylogenetic likelihood. Doing this in Numpy-based tools (e.g. PyMC3) requires:

  • Exposing branch lengths as a mutable array
  • Exposing topology structure in an array-based data structure (e.g. an array of parent indices)
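
To make the second bullet concrete, here is a small sketch of a parent-index representation. It assumes node ids are the integers 0 to n-1 and that the topology is given as per-node child lists; the function name ParentIdVector and the use of -1 for the root are illustrative choices, not an existing libsbn interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: children[i] lists the ids of the children of node i.
// Returns a vector whose entry i is the id of the parent of node i, with -1
// marking the root (the only node that is nobody's child).
std::vector<int> ParentIdVector(
    const std::vector<std::vector<std::size_t>>& children) {
  std::vector<int> parents(children.size(), -1);
  for (std::size_t parent = 0; parent < children.size(); ++parent) {
    for (std::size_t child : children[parent]) {
      assert(child < children.size());
      parents[child] = static_cast<int>(parent);
    }
  }
  return parents;
}
```

For the topology ((0,1)3,2)4 this gives the array 3, 3, 4, 4, -1. Branch lengths could be stored the same way, as a flat vector of doubles indexed by node id, so that both pieces are contiguous buffers of the kind an array library expects.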

Parallelize likelihood computation

At first I thought that we could do this simply with C++'s built-in parallel loops, etc., but we'll need to think a little about how that will work.

If we want k threads I'd think that we'd want to initialize k BEAGLE instances and then send trees to the various instances on various threads. I think this is worth the programming effort?
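
One possible shape for this, as a sketch only: FakeInstance below is a stand-in for a per-thread likelihood engine and makes no real BEAGLE calls, a tree is represented by a single double, and the function name is hypothetical. The point is the structure: k instances, k threads, and each thread only ever touches its own instance and its own slots of the result vector.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in for one likelihood engine (e.g. one BEAGLE instance). Hypothetical:
// a real instance would hold partials, the substitution model and so on.
struct FakeInstance {
  double LogLikelihood(double tree_stand_in) const { return -tree_stand_in; }
};

// Computes one value per tree using thread_count threads, each with its own
// instance. Worker w handles trees w, w + k, w + 2k, ... so no two threads
// write to the same element of results.
std::vector<double> ParallelLogLikelihoods(const std::vector<double>& trees,
                                           std::size_t thread_count) {
  assert(thread_count > 0);
  std::vector<FakeInstance> instances(thread_count);
  std::vector<double> results(trees.size());
  std::vector<std::thread> workers;
  for (std::size_t worker = 0; worker < thread_count; ++worker) {
    workers.emplace_back([&, worker] {
      for (std::size_t i = worker; i < trees.size(); i += thread_count) {
        results[i] = instances[worker].LogLikelihood(trees[i]);
      }
    });
  }
  for (auto& thread : workers) thread.join();
  return results;
}
```

A strided split is the simplest assignment; if trees differ a lot in cost, a shared work queue would balance the load better at the price of some synchronization.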

Test out sharing vectors between C++ and Python

Our current design involves doing computation in C++ (e.g. branch length gradients) and then using Python to modify things like the conditional subsplit distributions.

Let's try to test this sort of thing out by:

  • allocating a vector in C++
  • modifying it using Python

Reinstate topology counting and subsplit support estimation

The transition from Nodes to Trees broke the logic of collapsing trees on topology. So I'm ripping it out for now in favor of a simple tree vector.

  • reinstate tests in build.hpp

Here are some bits that may be useful:

TreeCollection::TreeCollectionPtr Driver::ParseFile(const std::string &fname) {
  Tree::TreePtr tree;
  yy::parser parser_instance(*this);

  parser_instance.set_debug_level(trace_parsing_);

  std::ifstream in(fname.c_str());
  if (!in) {
    std::cerr << "Cannot open the File : " << fname << std::endl;
    abort();
  }
  std::string line;
  unsigned int line_number = 1;
  auto trees = std::make_shared<TreeCollection::TreePtrCounter>();
  while (std::getline(in, line)) {
    // Set the Bison location line number properly so we get useful error
    // messages.
    location_.initialize(nullptr, line_number);
    line_number++;
    if (line.size() > 0) {
      tree = ParseString(&parser_instance, line);
      auto search = trees->find(tree);
      if (search == trees->end()) {
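        // Insert outside of assert so that it still runs when NDEBUG is set.
        bool inserted = trees->insert(std::make_pair(tree, 1)).second;
        assert(inserted);
        (void)inserted;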
      } else {
        search->second++;
      }
    }
  }
  in.close();
  return std::make_shared<TreeCollection>(trees, this->TagTaxonMap());
}

syntactic sugar for maps

Right now we write

assert(tag_to_branch_number.insert(std::make_pair(k, v)).second);

for insertion. That's a little cumbersome. Getting elements is a little verbose too, though there at least the verbosity comes with the benefit of writing a good error message.

We could have a function template (https://en.cppreference.com/w/cpp/language/function_template) that would do the former. Or we could have a templated wrapper class?
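
A sketch of the function-template option, with the caveat that the name SafeInsert and the choice to throw rather than assert are assumptions made here for illustration. Throwing also sidesteps the fact that an insertion placed inside assert() is compiled out when NDEBUG is defined.

```cpp
#include <stdexcept>
#include <unordered_map>

// Hypothetical helper: inserts (key, value) into any map-like container and
// throws if the key was already present, leaving the old value untouched.
template <class Map>
void SafeInsert(Map& map, const typename Map::key_type& key,
                const typename Map::mapped_type& value) {
  if (!map.emplace(key, value).second) {
    throw std::runtime_error("SafeInsert: key already present.");
  }
}

// Usage example with an illustrative map type and made-up contents.
inline std::unordered_map<int, int> ExampleTagToBranchNumber() {
  std::unordered_map<int, int> tag_to_branch_number;
  SafeInsert(tag_to_branch_number, 7, 0);
  SafeInsert(tag_to_branch_number, 9, 1);
  return tag_to_branch_number;
}
```

The call site shrinks to SafeInsert(tag_to_branch_number, k, v), and the same template works for std::map and std::unordered_map without a wrapper class.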

Unfucking the tree containers

TreeCollection now uses a counter, which doesn't match with our desired use case of trees with branch lengths. We want to count topologies.
