
Run and compare algorithms for phylogenetic reconciliation and super-reconciliation

License: GNU General Public License v3.0

phylogenetics reconciliation maximum-parsimony synteny


superrec2

superrec2 is a software package enabling researchers to run and compare algorithms for phylogenetic reconciliation and super-reconciliation.

Installation

superrec2 can be installed through pip. Python ⩾3.11 is required.

$ pip install superrec2
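A quick post-installation check can be scripted with the standard library alone. This is a hypothetical helper, not part of superrec2: environment_problems is an illustrative name, and the sketch assumes that the installed distribution is called superrec2, as in the pip command above.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def environment_problems() -> list[str]:
    """Return a list of human-readable problems with the current environment."""
    problems: list[str] = []
    # superrec2 requires Python 3.11 or later.
    if sys.version_info < (3, 11):
        problems.append(f"Python {sys.version.split()[0]} is too old, 3.11+ is required")
    try:
        # Look up the installed distribution by the name used with pip.
        version("superrec2")
    except PackageNotFoundError:
        problems.append("superrec2 is not installed in this environment")
    return problems
```

An empty list means the environment looks ready to run superrec2.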
Installing from Git (for development)

You first need to install Hatch, which is used by superrec2 to manage virtual environments and to build and publish packages.

Then clone the repository and use Hatch to install dependencies and start a development shell.

$ git clone https://github.com/UdeM-LBIT/superrec2
$ cd superrec2
$ hatch shell

The following commands are useful during development. Before committing to the main branch, make sure that the unit tests pass and that the code has been reformatted and linted.

Command                 Task
hatch run dev:test      Run all unit tests
hatch run dev:lint      Check the code using Ruff
hatch run dev:format    Reformat the code using Black
hatch build             Build distributable packages
hatch publish           Publish distributable packages to PyPI

Usage

Preparing the input

To submit a reconciliation problem to one of the algorithms of this package, the first step is to prepare an input file containing the desired set of species, genes (or syntenies), and their phylogenetic trees. Input files are JSON objects containing the following keys:

  • object_tree: Newick string specifying the gene or synteny tree (unnamed ancestral nodes are automatically named O0, O1, O2, and so on, with indices increasing in pre-order),
  • species_tree: Newick string specifying the species tree (unnamed ancestral nodes are automatically named S0, S1, S2, and so on, with indices increasing in pre-order),
  • leaf_object_species: dictionary associating each leaf of the gene (or synteny) tree with its corresponding leaf species in the species tree,
  • leaf_syntenies (optional, only for super-reconciliations): dictionary associating each leaf of the synteny tree with its synteny, given as an array of genes.
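The automatic naming rule for unnamed ancestral nodes can be illustrated with a small sketch. It is not superrec2's own parser: parse_newick, name_unnamed and to_newick are hypothetical helpers that only handle topology-only Newick strings (no branch lengths or NHX comments), and they assume the counter advances only on unnamed ancestral nodes, which agrees with the names O0, O1 and S0 in the solution file produced by superrec2 reconcile for the example in this README.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    children: list["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse a topology-only Newick string such as '((x_1,x_2),y_1);'."""
    text = text.strip().removesuffix(";")
    pos = 0

    def subtree() -> Node:
        nonlocal pos
        node = Node()
        if text[pos] == "(":
            pos += 1  # consume "("
            node.children.append(subtree())
            while text[pos] == ",":
                pos += 1  # consume ","
                node.children.append(subtree())
            pos += 1  # consume ")"
        start = pos
        # The label (possibly empty) runs until the next structural character.
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        node.name = text[start:pos]
        return node

    return subtree()


def name_unnamed(root: Node, prefix: str) -> None:
    """Name unnamed ancestral nodes prefix0, prefix1, ... in pre-order (root first)."""
    counter = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if not node.name and node.children:
            node.name = f"{prefix}{counter}"
            counter += 1
        # Push children in reverse so that the leftmost child is visited next.
        stack.extend(reversed(node.children))


def to_newick(root: Node) -> str:
    """Serialize a tree back to a Newick string with a trailing ';'."""

    def render(node: Node) -> str:
        if node.children:
            return "(" + ",".join(render(c) for c in node.children) + ")" + node.name
        return node.name

    return render(root) + ";"
```

For example, parsing the object tree of the example input and naming it with prefix O yields ((x_1,x_2)O1,y_1)O0;, and naming the species tree with prefix S yields (X,Y)S0;.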

Below is an example input for some fictional species and syntenies:

{
    "object_tree": "((x_1,x_2),y_1);",
    "species_tree": "(X,Y);",
    "leaf_object_species": {
        "x_1": "X", "x_2": "X", "y_1": "Y"
    },
    "leaf_syntenies": {
        "x_1": ["g1", "g2", "g3"],
        "x_2": ["g1", "g3", "g4"],
        "y_1": ["g1", "g2", "g3", "g4"]
    }
}
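Before running an algorithm, it can help to check that an input file is internally consistent. The sketch below assumes topology-only Newick strings in which every leaf label directly follows "(" or ","; leaf_names and check_input are hypothetical helpers, and superrec2 may perform its own, different validation.

```python
import json
import re

# The example input from this README, parsed from JSON.
EXAMPLE = json.loads(
    """
    {
        "object_tree": "((x_1,x_2),y_1);",
        "species_tree": "(X,Y);",
        "leaf_object_species": {"x_1": "X", "x_2": "X", "y_1": "Y"},
        "leaf_syntenies": {
            "x_1": ["g1", "g2", "g3"],
            "x_2": ["g1", "g3", "g4"],
            "y_1": ["g1", "g2", "g3", "g4"]
        }
    }
    """
)


def leaf_names(newick: str) -> set[str]:
    """Leaf labels are the names that directly follow "(" or "," in the string."""
    return set(re.findall(r"(?<=[(,])[^(),;:\[\]]+", newick))


def check_input(data: dict) -> list[str]:
    """Return a list of consistency problems found in an input object."""
    problems = []
    object_leaves = leaf_names(data["object_tree"])
    species_leaves = leaf_names(data["species_tree"])
    mapping = data["leaf_object_species"]
    for leaf in sorted(object_leaves - set(mapping)):
        problems.append(f"object leaf {leaf!r} has no species")
    for leaf, species in sorted(mapping.items()):
        if species not in species_leaves:
            problems.append(f"{leaf!r} is mapped to unknown species {species!r}")
    syntenies = data.get("leaf_syntenies")  # optional key
    if syntenies is not None:
        for leaf in sorted(object_leaves - set(syntenies)):
            problems.append(f"object leaf {leaf!r} has no synteny")
    return problems
```

The example input passes these checks; mapping a leaf to a species absent from the species tree, or leaving a leaf unmapped, is reported.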

A complete input example including some Class 1 CRISPR-Cas systems is also available. (This is the input used for the RECOMB-CG 2022 publication.)

Running reconciliation algorithms

To run a reconciliation algorithm on a given input file, use the superrec2 reconcile command. The basic usage of the command is as follows:

$ superrec2 reconcile --input data/example.in.json --output example.out.json superdtl
Multifurcation resolutions: 100%|##########################################| 1/1 [00:00<00:00, 285.91it/s]
Minimum cost: 2

The --input flag gives the path to the input file and the --output flag gives the path where the results are written. The last argument selects the reconciliation algorithm, here superdtl (run superrec2 reconcile --help to list the available algorithms). The program prints the minimum cost of a solution and writes one of the solutions to the output file, which then contains the following object (edited for readability):

{
    "input": {
        "object_tree": "((x_1,x_2)O1,y_1)O0;",
        "species_tree": "(X,Y)S0;",
        "leaf_object_species": { (repeated from above) },
        "costs": {
            "SPECIATION": 0, "DUPLICATION": 1, "HORIZONTAL_TRANSFER": 1,
            "FULL_LOSS": 1, "SEGMENTAL_LOSS": 1
        },
        "leaf_syntenies": { (repeated from above) }
    },
    "object_species": {
        "O0": "S0", "O1": "X", "x_1": "X", "x_2": "X", "y_1": "Y"
    },
    "syntenies": {
        "O0": ["g1", "g2", "g3", "g4"],
        "O1": ["g1", "g2", "g3", "g4"],
        "x_1": ["g1", "g2", "g3"],
        "x_2": ["g1", "g3", "g4"],
        "y_1": ["g1", "g2", "g3", "g4"]
    },
    "ordered": false
}

Of particular interest are the object_species key, which contains the computed reconciliation (mapping of synteny tree nodes to species tree nodes), and the syntenies key, which contains the labeling of synteny tree nodes with syntenic content.
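Because the output stores the synteny of every node, the genes that are absent from a node relative to its parent can be read off directly. This sketch is illustrative only: parent_map, missing_genes and load_solution are hypothetical helpers that assume a Newick string in which every node is named, as in the output above. A missing gene does not correspond one-to-one to a loss event, since a segmental duplication may copy only part of the parent synteny, so this is not superrec2's own event accounting.

```python
import json


def load_solution(path: str) -> dict:
    """Read a solution file written by superrec2 reconcile."""
    with open(path) as file:
        return json.load(file)


def parent_map(newick: str) -> dict[str, str]:
    """Map each node label to its parent's label (every node must be named)."""
    parents: dict[str, str] = {}
    frames: list[list[str]] = [[]]  # children collected for each open "("
    closed: list[str] | None = None  # children of a just-closed group, awaiting its label
    label = ""
    for char in newick.strip():
        if char in "(),;":
            if label or closed is not None:
                # A node has just ended: attach its pending children, then record it.
                for child in closed or []:
                    parents[child] = label
                closed = None
                frames[-1].append(label)
                label = ""
            if char == "(":
                frames.append([])
            elif char == ")":
                closed = frames.pop()
        else:
            label += char
    return parents


def missing_genes(solution: dict) -> dict[str, list[str]]:
    """For each non-root node, list the parent's genes absent from the node's synteny."""
    parents = parent_map(solution["input"]["object_tree"])
    syntenies = solution["syntenies"]
    return {
        child: sorted(set(syntenies[parent]) - set(syntenies[child]))
        for child, parent in parents.items()
    }
```

On the example solution, this reports g4 as absent from x_1 and g2 as absent from x_2, and nothing for O1 or y_1.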

Additional options are available to generate all possible solutions and set the individual event costs (for compatible algorithms); please run superrec2 reconcile --help for details.

Generating reconciliation diagrams

From a solution

The superrec2 draw command can be used to visualize and inspect solutions generated by reconciliation algorithms. The basic usage of the command is as follows:

$ superrec2 draw --input example.out.json --output example.out.pdf

This generates the following diagram, representing the reconciliation result:

Reconciliation diagram generated from the output above
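To render every solution in a directory, the draw command can be called in a loop. This is a hypothetical sketch (draw_commands and draw_all are not part of superrec2) that assumes solution files end in .out.json, as in the example above, and writes each PDF next to its solution file.

```python
import subprocess
from collections.abc import Iterable
from pathlib import Path


def draw_commands(paths: Iterable[Path]) -> list[list[str]]:
    """Build one `superrec2 draw` invocation per *.out.json solution file."""
    commands = []
    for solution in sorted(paths):
        if not solution.name.endswith(".out.json"):
            continue  # skip inputs, PDFs and unrelated files
        pdf = solution.with_suffix(".pdf")  # example.out.json -> example.out.pdf
        commands.append(
            ["superrec2", "draw", "--input", str(solution), "--output", str(pdf)]
        )
    return commands


def draw_all(directory: Path) -> None:
    """Render every solution found directly inside `directory`."""
    for command in draw_commands(directory.glob("*.out.json")):
        subprocess.run(command, check=True)
```

Separating command construction from execution keeps the file-naming logic easy to inspect before any diagram is drawn.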

From manual input

The superrec2 draw command can also plot any reconciliation, not only one produced by a reconciliation algorithm. To do so, write a JSON description of the reconciliation you want to plot, for example:

{
    "input": {
        "object_tree": "((x_1,x_2)O1,y_1)O0;",
        "species_tree": "(X,Y)S0;"
    },
    "object_species": {
        "O0": "S0", "O1": "X", "x_1": "X", "x_2": "X", "y_1": "Y"
    },
    "syntenies": {
        "O0": ["g1", "g2", "g3", "g4"],
        "O1": ["g1", "g2", "g3", "g4"],
        "x_1": ["g1", "g2", "g3"],
        "x_2": ["g1", "g3", "g4"],
        "y_1": ["g1", "g2", "g3", "g4"]
    }
}

Drawing this document produces the same diagram as in the previous section. Compared with the solution file shown earlier, it omits the leaf_object_species, costs and leaf_syntenies input keys as well as the ordered key: this document only describes a reconciliation to draw and is not the result of a reconciliation computation.
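A solution file can be reduced to such a drawing-only document by keeping just the keys used in the example above. The sketch assumes those keys are sufficient for drawing, as the example suggests; to_drawing_document and write_drawing_document are hypothetical helpers, and syntenies is copied only when present (assumed to be optional, for example for plain reconciliations).

```python
import json


def to_drawing_document(solution: dict) -> dict:
    """Keep only the keys that the drawing-only example above uses."""
    document = {
        "input": {
            "object_tree": solution["input"]["object_tree"],
            "species_tree": solution["input"]["species_tree"],
        },
        "object_species": dict(solution["object_species"]),
    }
    if "syntenies" in solution:  # assumed optional
        document["syntenies"] = {
            node: list(genes) for node, genes in solution["syntenies"].items()
        }
    return document


def write_drawing_document(solution_path: str, drawing_path: str) -> None:
    """Read a solution file and write the reduced drawing document as JSON."""
    with open(solution_path) as source:
        solution = json.load(source)
    with open(drawing_path, "w") as target:
        json.dump(to_drawing_document(solution), target, indent=4)
```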

Adding color

To distinguish parts of the object tree in the generated diagram, you can color a subtree by setting a color attribute on its root node, written as an NHX comment in the Newick string of the object tree (here with the hexadecimal RGB value 0000FF):

 {
     "input": {
-        "object_tree": "((x_1,x_2)O1,y_1)O0;",
+        "object_tree": "((x_1,x_2)O1[&&NHX:color=0000FF],y_1)O0;",
         "species_tree": "(X,Y)S0;"
     },
     "object_species": {
         "O0": "S0", "O1": "X", "x_1": "X", "x_2": "X", "y_1": "Y"
     },
     "syntenies": {
         "O0": ["g1", "g2", "g3", "g4"],
         "O1": ["g1", "g2", "g3", "g4"],
         "x_1": ["g1", "g2", "g3"],
         "x_2": ["g1", "g3", "g4"],
         "y_1": ["g1", "g2", "g3", "g4"]
     }
 }

With this change, the subtree rooted at O1, whose root undergoes a duplication event, is drawn in blue.

Reconciliation diagram generated from the output above
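Editing the Newick string by hand is error-prone for larger trees, so the color attribute can also be inserted programmatically. The add_color helper below is hypothetical; it assumes that node labels contain none of the characters ( ) , ; : [ ], that the node to color is not the only node of the tree, and that colors are six-digit hexadecimal RGB values without a leading #, as in 0000FF above.

```python
import re

HEX_COLOR = re.compile(r"[0-9A-Fa-f]{6}")


def add_color(newick: str, node: str, color: str) -> str:
    """Return `newick` with an NHX color attribute attached to the node `node`."""
    if not HEX_COLOR.fullmatch(color):
        raise ValueError(f"expected six hexadecimal digits, got {color!r}")
    # A label starts right after "(", "," or ")" and ends before ",", ")", ";", ":" or "[".
    pattern = rf"(?<=[(),]){re.escape(node)}(?=[,);:\[])"
    colored, count = re.subn(
        pattern, lambda match: f"{match.group(0)}[&&NHX:color={color}]", newick
    )
    if count != 1:
        raise ValueError(f"expected exactly one node named {node!r}, found {count}")
    return colored
```

The lookbehind and lookahead ensure that coloring O1 does not also match a longer label such as O10.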

References

  • M. Goodman, J. Czelusniak, G. W. Moore, A. E. Romero-Herrera, and G. Matsuda, “Fitting the gene lineage into its species lineage, a parsimony strategy illustrated by cladograms constructed from globin sequences,” Systematic Biology, vol. 28, no. 2, 1979-06, doi: 10.1093/sysbio/28.2.132.
  • A. Tofigh, M. Hallett, and J. Lagergren, “Simultaneous identification of duplications and lateral gene transfers,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 2, 2011-03, doi: 10.1109/tcbb.2010.14.
  • M. S. Bansal, E. J. Alm, and M. Kellis, “Efficient algorithms for the reconciliation problem with gene duplication, horizontal transfer and loss,” Bioinformatics, vol. 28, no. 12, 2012-06, doi: 10.1093/bioinformatics/bts225.
  • M. Delabre, N. El-Mabrouk, K. T. Huber, M. Lafond, V. Moulton, E. Noutahi, and M. S. Castellanos, “Evolution through segmental duplications and losses: a super-reconciliation approach,” Algorithms for Molecular Biology, vol. 15, Art. no. 12, 2020-05, doi: 10.1186/s13015-020-00171-4.
  • Y. Anselmetti, M. Delabre, and N. El-Mabrouk, “Reconciliation with Segmental Duplication, Transfer, Loss and Gain,” RECOMB-CG 2022, Lecture Notes in Computer Science, vol. 13234, 2022-06, doi: 10.1007/978-3-031-06220-9_8.
