Giter Site home page Giter Site logo

molloy-lab / tree-qmc-v2 Goto Github PK

View Code? Open in Web Editor NEW
1.0 1.0 0.0 67.52 MB

Species tree estimation from gene trees via quartets

Home Page: https://doi.org/10.1101/gr.277629.122

License: MIT License

C++ 81.42% Python 7.03% Makefile 0.18% R 11.37%
gene-trees phylogenetics quartets species-trees

tree-qmc-v2's Introduction

STOP:

We recommend using the latest version of TREE-QMC so that you can advantage of the quartet weighting options proposed by Zhang & Mirarab, Mol Biol Evol, 2022. The original algorithm for unweighted quartets can also be called from the latest TREE-QMC method using the following command:

./treeqmc --fast -i <input gene trees> -o <output species tree>

TREE-QMC

TREE-QMC is a quartet-based method for estimating species trees from gene trees, like the popular method ASTRAL. To learn more about TREE-QMC, check out Han & Molloy, Genome Res, 2023.

Acknowledgements

TREE-QMC is based on the Quartet Max Cut (QMC) framework introduced by Sagi Snir and Satish Rao; see Snir & Rao, IEEE/ACM TCBB, 2010 and Avni, Cohen, & Snir, Syst Biol, 2015.

TREE-QMC uses MQLib for its max cut heuristic; see Dunning, Gupta, & Silberholz, INFORMS Journal on Computing, 2018.

Tutorial

We recommend working through this tutorial.

Build

To build TREE-QMC, use commands:

git clone https://github.com/molloy-lab/TREE-QMC.git
cd TREE-QMC
cd MQLib
make
cd ..
g++ -std=c++11 -O2 -I MQLib/include -o TREE-QMC src/*.cpp MQLib/bin/MQLib.a

Usage

To run TREE-QMC, use command:

./TREE-QMC -i <input file> -o <output file name>

To see the TREE-QMC usage options, use command:

./TREE-QMC -h

The output should be

TREE-QMC version 2.0.0
COMMAND: ./TREE-QMC -h 
=================================== TREE-QMC ===================================
This is version 2.0.0 of TREe Embedded Quartet Max Cut (TREE-QMC).

USAGE:
./TREE-QMC (-i|--input) <input file> [(-o|--output) <output file>]
           [(--polyseed) <integer>] [(--maxcutseed) <integer>]
           [(-n|--normalize) <normalization scheme>]
           [(-x|--execution) <execution mode>]
           [(-v|--verbose) <verbose mode>] [-h|--help]

OPTIONS:
[-h|--help]
        Prints this help message.
(-i|--input) <input file>
        Name of file containing gene trees in newick format (required)
        IMPORTANT: current implementation of TREE-QMC requires that the input
        gene trees are unrooted and binary. Thus, TREE-QMC suppresses roots
        and randomly refines polytomies during a preprocessing phase; the
        resulting trees are written to "<input file>.refined".
[(-o|--output) <output file>]
        Name of file for writing output species tree (default: stdout)
[(--polyseed) <integer>]
        Seeds random number generator with <integer> prior to arbitrarily
        resolving polytomies. If <integer> is set to -1, system time is used;
        otherwise, <integer> should be positive (default: 12345).
[(--maxcutseed) <integer>]
        Seeds random number generator with <integer> prior to calling the max
        cut heuristic but after the preprocessing phase. If <integer> is set to
        -1, system time is used; otherwise, <integer> should be positive
        (default: 1).
[(-n|--normalize) <normalization scheme>]
        Initially, each quartet is weighted by the number of input gene
        trees that induce it. At each step in the divide phase of wQMC and
        TREE-QMC, the input quartets are modified with artificial taxa. We
        introduce two normalization schemes for artificial taxa and find
        that they improve empirical performance of TREE-QMC in a simulation
        study. The best scheme is run by default. See paper for details.
        -n 0: none
        -n 1: uniform
        -n 2: non-uniform (default)
[(-x|--execution) <execution mode>]
        TREE-QMC uses an efficient algorithm that operates directly on the
        input gene trees by default. The naive algorithm, which operates on a
        set of quartets weighted based on the input gene trees, is also
        implemented for testing purposes.
        -x 0: run efficient algorithm (default)
        -x 1: run naive algorithm
[(-v|--verbose) <verbose mode>]
        -v 0: write no subproblem information (default)
        -v 1: write CSV with subproblem information (subproblem ID, parent
              problem ID, depth of recursion, total # of taxa, # of artifical
              taxa, species names)
        -v 2: write CSV with subproblem information (info from v1 plus # of
              of elements in f, # of pruned elements in f, # of zeroes in f)
[--shared <use shared taxon data structure to normalize quartet weights>]
        Do NOT use unless there are no missing data!!!

Contact: Post issue to Github (https://github.com/molloy-lab/TREE-QMC/)
        or email Yunheng Han ([email protected]) & Erin Molloy ([email protected])

If you use TREE-QMC in your work, please cite:
  Han and Molloy, 2023, "Improving quartet graph construction for scalable
  and accurate species tree estimation from gene trees," Genome Research,
  http:doi.org/10.1101/gr.277629.122.
================================================================================

tree-qmc-v2's People

Contributors

yhhan19 avatar ekmolloy avatar

Stargazers

Maddie Bonanno avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.