rodriados / museqa

Multiple Sequence Aligner using hybrid parallel computing

License: GNU General Public License v3.0

Makefile 0.94% C++ 56.20% Cuda 31.32% Shell 3.43% Python 6.81% C 0.77% Perl 0.54%

cuda cpp hybrid-parallelism

museqa's Introduction

Hello, I am Rodrigo!

Welcome to my GitHub profile!
I am a software developer and languages enthusiast from Brazil.
I am always open to collaborating on new projects and innovative ideas.
Here you will find the projects I've been working on lately.

museqa's Issues

Improve installation process

Currently, the project cannot be easily installed on a new machine, and it depends entirely on relative paths to locate the scripts that the entrypoint runs.

A basic Docker installation was provided in #25, but the container did not serve its purpose: it was not easy to use, and the project could not run inside it. It was therefore removed in 3adae10.

make install and make uninstall targets should be provided for installing the project on the user's system and uninstalling it.

Introduce middlewares on pipeline modules

Museqa's pipeline modularity is one of the fundamental pieces of its framework. A module must therefore be extensible enough for its functionality to be customized for each particular use case, without the need to create a whole new module.

This issue proposes a pipeline::middleware that can be attached to a pipeline::module. Each middleware implements a run method that calls a next method, which passes execution down the chain to the following middleware or, at the end of the chain, to the wrapped module.
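A minimal C++ sketch of this chain design follows. It is not museqa's API: the `pipeline::io` payload type, the `module` and `middleware` interfaces, `wrapped_module` and the example `trace` middleware are hypothetical names, and `next` is modelled as a `std::function`.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace pipeline
{
    // Hypothetical stand-in for the data flowing through a pipeline.
    using io = std::vector<std::string>;

    // Hypothetical module interface; museqa's real pipeline::module may differ.
    struct module
    {
        virtual ~module() = default;
        virtual io run(const io& input) const = 0;
    };

    // A middleware receives the input and a `next` callable that continues the
    // chain with the following middleware or, at the end, the wrapped module.
    struct middleware
    {
        using next_fn = std::function<io(const io&)>;
        virtual ~middleware() = default;
        virtual io run(const io& input, const next_fn& next) const = 0;
    };

    // Example middleware: records when it is entered and when it is left.
    struct trace : middleware
    {
        explicit trace(std::string name) : m_name(std::move(name)) {}

        io run(const io& input, const next_fn& next) const override
        {
            io log = input;
            log.push_back(m_name + ":before");
            log = next(log);                      // hand over to the rest of the chain
            log.push_back(m_name + ":after");
            return log;
        }

        std::string m_name;
    };

    // A module wrapped by an ordered list of middlewares; it is itself a module.
    class wrapped_module : public module
    {
      public:
        wrapped_module(std::shared_ptr<const module> inner,
                       std::vector<std::shared_ptr<const middleware>> chain)
          : m_inner(std::move(inner)), m_chain(std::move(chain)) {}

        io run(const io& input) const override { return call(0, input); }

      private:
        // Runs middleware `i`, or the wrapped module once the chain is exhausted.
        io call(std::size_t i, const io& input) const
        {
            if(i == m_chain.size())
                return m_inner->run(input);
            return m_chain[i]->run(input, [this, i](const io& in) { return call(i + 1, in); });
        }

        std::shared_ptr<const module> m_inner;
        std::vector<std::shared_ptr<const middleware>> m_chain;
    };
}
```

With middlewares `a` and `b` wrapping a module, execution enters `a`, then `b`, then the module, and unwinds back through `b` and `a`, so each middleware can act both before and after the module it wraps. Because `wrapped_module` is itself a `module`, wrapped modules can also be nested.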

msarun should take care of error handling

Since problems were found when using threads to handle errors, msarun should become the component responsible for error handling.

This can be achieved by running a shell script on each cluster node (via mpirun inside msarun) and passing messages between these scripts. Each script shall end its process as needed. This will also resolve #1.

Install Docker and CI integrations

A Docker image with the environment required by the project should be created. This would make the project easier to use, ship, and install.

Once a Docker container is ready, continuous integration should also be set up so that the project is tested automatically after each change.

Implement gap extension penalties in pairwise module

The pairwise alignment heuristic should treat gap extensions differently from gap openings. Biologically, a few long gaps are much more likely to occur than many short ones.

Thus, gap penalties should be smaller if an already existing gap is being extended, and higher when a new gap is being introduced into the alignment.
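This scheme is the affine gap model, usually implemented with Gotoh's three-matrix extension of Needleman-Wunsch. The sketch below computes only the optimal global score, without traceback. The scoring values (match 1, mismatch -1, open 4, extend 1) and the name `affine_score` are illustrative assumptions, not museqa's scoring scheme.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Global alignment score with affine gaps (Gotoh). A gap of length k costs
// open + (k - 1) * extend; with extend < open, one long gap is cheaper than
// several short gaps covering the same number of positions.
int affine_score(const std::string& a, const std::string& b,
                 int match = 1, int mismatch = -1, int open = 4, int extend = 1)
{
    const int neg = std::numeric_limits<int>::min() / 4;   // "minus infinity" that cannot overflow
    const std::size_t n = a.size(), m = b.size();

    // M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap; Y: b[j-1] aligned to a gap.
    std::vector<std::vector<int>> M(n + 1, std::vector<int>(m + 1, neg));
    auto X = M, Y = M;
    M[0][0] = 0;
    for(std::size_t i = 1; i <= n; ++i) X[i][0] = -open - static_cast<int>(i - 1) * extend;
    for(std::size_t j = 1; j <= m; ++j) Y[0][j] = -open - static_cast<int>(j - 1) * extend;

    for(std::size_t i = 1; i <= n; ++i)
        for(std::size_t j = 1; j <= m; ++j) {
            const int s = a[i - 1] == b[j - 1] ? match : mismatch;
            M[i][j] = s + std::max({M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]});
            // Opening a gap costs `open`; continuing the same gap costs only `extend`.
            X[i][j] = std::max({M[i - 1][j] - open, X[i - 1][j] - extend, Y[i - 1][j] - open});
            Y[i][j] = std::max({M[i][j - 1] - open, Y[i][j - 1] - extend, X[i][j - 1] - open});
        }
    return std::max({M[n][m], X[n][m], Y[n][m]});
}
```

With these values, aligning AAAA against AA scores -3: two matches plus a single gap of length 2 (cost 4 + 1), whereas two separate one-position gaps would cost 8.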

Add new input sequence parsers

There are many file formats for storing biological sequences such as DNA and RNA. Formats for which parsers could be implemented include:

NBRF/PIR
EMBL/SwissProt
Clustal
GDE
GCG/MSF
RSF
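As an illustration of what one of these parsers involves, the sketch below reads NBRF/PIR records. It assumes the usual layout of that format: a `>` header holding a two-letter sequence type and an identifier separated by `;`, one free-text description line, then the sequence, possibly over several lines, terminated by `*`. `pir_record` and `parse_pir` are hypothetical names, and malformed input is not diagnosed.

```cpp
#include <cctype>
#include <istream>
#include <sstream>   // std::istringstream, handy for feeding in-memory text
#include <string>
#include <vector>

// One NBRF/PIR entry, e.g.
//   >P1;EXAMPLE          <- '>' + sequence type + ';' + identifier
//   an example entry     <- description line
//   MKVLLA GH*           <- sequence (whitespace ignored), ended by '*'
struct pir_record
{
    std::string type;          // e.g. "P1" (protein), "DL" (linear DNA)
    std::string id;
    std::string description;
    std::string sequence;
};

std::vector<pir_record> parse_pir(std::istream& in)
{
    std::vector<pir_record> records;
    std::string line;
    while(std::getline(in, line)) {
        if(line.empty() || line[0] != '>')
            continue;                                   // skip anything before a header

        pir_record rec;
        const auto semicolon = line.find(';');
        rec.type = line.substr(1, semicolon - 1);
        rec.id = semicolon == std::string::npos ? "" : line.substr(semicolon + 1);
        std::getline(in, rec.description);

        // Accumulate sequence characters until the terminating '*'.
        bool done = false;
        while(!done && std::getline(in, line))
            for(char c : line) {
                if(c == '*') { done = true; break; }
                if(!std::isspace(static_cast<unsigned char>(c)))
                    rec.sequence += c;
            }
        records.push_back(rec);
    }
    return records;
}
```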

Avoid creation of zombie processes

When running the project on a cluster, zombie processes have been observed. This must be avoided.

Probable causes for this problem:

The watchdog does not kill all processes when execution ends on error.
MPI does not finalize all processes when execution is over.
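For the first probable cause, note that a process that has exited stays a zombie until its parent collects its status with `waitpid`, so a watchdog that kills children should also reap them. The POSIX sketch below assumes the watchdog keeps the PIDs of the processes it spawned; `terminate_and_reap` is a hypothetical helper, not existing museqa code. A child that ignores SIGTERM would block the wait, so a real watchdog might escalate to SIGKILL after a timeout.

```cpp
#include <cerrno>
#include <cstddef>
#include <signal.h>     // kill, SIGTERM
#include <sys/types.h>  // pid_t
#include <sys/wait.h>   // waitpid
#include <unistd.h>     // fork, used when spawning the tracked children
#include <vector>

// Asks every tracked child to terminate, then waits for each one so the
// kernel can release its process-table entry. Returns how many were reaped.
std::size_t terminate_and_reap(const std::vector<pid_t>& children)
{
    for(pid_t pid : children)
        kill(pid, SIGTERM);

    std::size_t reaped = 0;
    for(pid_t pid : children) {
        pid_t result;
        do {
            result = waitpid(pid, nullptr, 0);       // blocks until `pid` terminates
        } while(result == -1 && errno == EINTR);   // retry if interrupted by a signal
        if(result == pid)
            ++reaped;
    }
    return reaped;
}
```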

Avoid GPU memory allocation by master node in pairwise module

When the pairwise module executes the hybrid Needleman algorithm, the cluster's master node tries to allocate memory on, and transfer data to, a GPU. However, the master node may not have access to a GPU and thus should not try to interact with one.

Introduce assert guards

There are plenty of cases where one must check whether a given condition holds. An assert-like statement would be useful for these cases, but such checks are currently written as verbose, hard-to-read blocks like the following:

#if defined(msa_compile_cython) && !defined(msa_compile_cuda)
    if(static_cast<unsigned>(offset) >= getSize())
        throw Exception("buffer offset out of range");
#endif

A more elegant solution would be to introduce exception guards, which check that a condition holds and throw an exception otherwise:

msa::guard(static_cast<unsigned>(offset) < getSize(), "buffer offset out of range");
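A minimal sketch of such a guard, assuming it throws an exception type of its own. `msa::exception` here is a hypothetical stand-in for the project's `Exception` class, and the guard receives the condition that must hold (for instance `offset < getSize()`).

```cpp
#include <stdexcept>
#include <string>

namespace msa
{
    // Hypothetical exception type standing in for the project's Exception class.
    class exception : public std::runtime_error
    {
      public:
        using std::runtime_error::runtime_error;
    };

    // Throws msa::exception carrying `message` unless `condition` holds.
    // The body could be wrapped in the same preprocessor condition as the
    // original check, so guards compile to nothing where exceptions are
    // unavailable (exceptions are not supported in CUDA device code).
    inline void guard(bool condition, const std::string& message)
    {
        if(!condition)
            throw exception(message);
    }
}
```

Because `guard` is a function, its condition is still evaluated even when the body is compiled out; a macro would be needed to avoid that cost entirely.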

Improvements to the pair generation algorithm

Currently, the pair generation algorithm, performed as an initialization step of the pairwise module, is naive: every compute node, as well as the master node, must generate all sequence pairs before aligning any of them, even though most of the generated pairs are not relevant to the node generating them.

This can be optimized by using OEIS sequence A002024 (the sequence in which each n appears n times), which allows a pair to be computed directly from its linear index. Each node would then generate only the pairs it needs for its own execution of the pairwise module.
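A002024 has the closed form a(n) = floor(sqrt(2n) + 1/2). If the pairs (i, j) with j < i are numbered from 0 in the order (1,0), (2,0), (2,1), (3,0), ..., then pair number k has i = a(k + 1) and j = k - i(i - 1)/2. The sketch below assumes that ordering and that each node receives a contiguous index range; the function names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <utility>
#include <vector>

// OEIS A002024 (n appears n times): a(n) = floor(sqrt(2n) + 1/2), for n >= 1.
// The floating-point estimate is corrected with integer arithmetic, using the
// fact that a(n) is the unique k with k(k-1)/2 < n <= k(k+1)/2.
std::uint64_t a002024(std::uint64_t n)
{
    auto k = static_cast<std::uint64_t>(std::floor(std::sqrt(2.0 * static_cast<double>(n)) + 0.5));
    while(k * (k + 1) / 2 < n) ++k;
    while(k > 1 && k * (k - 1) / 2 >= n) --k;
    return k;
}

// Pair number `index` (from 0) in the order (1,0), (2,0), (2,1), (3,0), ...
std::pair<std::uint64_t, std::uint64_t> pair_at(std::uint64_t index)
{
    const std::uint64_t i = a002024(index + 1);
    const std::uint64_t j = index - i * (i - 1) / 2;
    return {i, j};
}

// Generates only the pairs whose indices fall in [begin, end), i.e. the slice
// assigned to one node, instead of every pair.
std::vector<std::pair<std::uint64_t, std::uint64_t>> pairs_in_range(std::uint64_t begin, std::uint64_t end)
{
    std::vector<std::pair<std::uint64_t, std::uint64_t>> pairs;
    pairs.reserve(end > begin ? end - begin : 0);
    for(std::uint64_t index = begin; index < end; ++index)
        pairs.push_back(pair_at(index));
    return pairs;
}
```

With N sequences there are N(N - 1)/2 pairs in total; a node assigned the indices [begin, end) calls `pairs_in_range(begin, end)` and never materializes the rest.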

Implement unit tests for the phylogeny module

Unit tests must be implemented for the phylogeny module. At the moment, unit tests exist for only one heuristic step, namely the pairwise module.

Unit tests for phylogeny are needed as well, to make sure the module keeps behaving as expected as the code base changes.

Standardization of Reflection using Loophole and Reflector

When the reflected object contains an array, the Reflection module produces different results depending on whether the Loophole or the Reflector back-end is used. Consider:

struct Object
{
    int v[3];
};

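To access the second element of v, the Loophole back-end uses ReflectionTuple<Object>::get<1>(), exposing each array element as a separate member, whereas the Reflector back-end uses ReflectionTuple<Object>::get<0>()[1], exposing the whole array as a single member.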

The first interface is preferred.

Allow multiple alignments per GPU block in hybrid-needleman algorithm

Currently, when executing the pairwise module with the hybrid-needleman algorithm, a GPU block can process only a single alignment per kernel call. This is inefficient, and the cost grows as the number of sequences increases.

Each kernel call allocates a large amount of device memory and then transfers a large volume of data between host and device memory. To reduce this overhead, blocks should be allowed to process more than one alignment pair per kernel call whenever enough device memory is available.
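A host-side sketch of this batching: size each kernel call by the device memory that is actually free, and let each block iterate over several pairs with a block-stride loop instead of handling exactly one. `bytes_per_alignment` is an assumed per-pair memory estimate (it must be non-zero) and all three helpers are hypothetical; inside a CUDA kernel the same loop would start at `blockIdx.x` and advance by `gridDim.x`.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// How many pairs one kernel call can take, given the free device memory.
std::size_t pairs_per_call(std::size_t total_pairs, std::size_t bytes_per_alignment,
                           std::size_t free_device_bytes)
{
    return std::min(total_pairs, free_device_bytes / bytes_per_alignment);
}

// Number of kernel calls needed when each call takes `per_call` (> 0) pairs.
std::size_t kernel_calls(std::size_t total_pairs, std::size_t per_call)
{
    return (total_pairs + per_call - 1) / per_call;
}

// Pairs handled by `block` in a call covering pairs [first, first + count):
// block b takes pairs b, b + blocks, b + 2 * blocks, ... (block-stride loop).
std::vector<std::size_t> pairs_for_block(std::size_t block, std::size_t blocks,
                                         std::size_t first, std::size_t count)
{
    std::vector<std::size_t> pairs;
    for(std::size_t p = block; p < count; p += blocks)
        pairs.push_back(first + p);
    return pairs;
}
```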

Improve time measurements

Instead of timing only whole heuristic stages, it would be useful to report execution times broken down by category, such as time spent on communication (I/O), time spent on GPU processing, and total execution time.
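One way to report times per category rather than per stage is to accumulate durations under named categories. The sketch below is hypothetical (`stratified_timer` and the category names are not museqa's); it uses `std::chrono::steady_clock`, which is monotonic and therefore suited to measuring intervals.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Accumulates wall-clock time per category ("communication", "gpu", "total", ...).
class stratified_timer
{
  public:
    using clock = std::chrono::steady_clock;
    using duration = std::chrono::duration<double>;   // seconds

    // RAII helper: adds the time between its construction and destruction.
    class scope
    {
      public:
        scope(stratified_timer& timer, std::string category)
          : m_timer(timer), m_category(std::move(category)), m_start(clock::now()) {}
        ~scope() { m_timer.add(m_category, clock::now() - m_start); }
        scope(const scope&) = delete;
        scope& operator=(const scope&) = delete;

      private:
        stratified_timer& m_timer;
        std::string m_category;
        clock::time_point m_start;
    };

    void add(const std::string& category, duration elapsed) { m_totals[category] += elapsed; }

    // Total seconds recorded for `category`, or 0 if nothing was recorded.
    double seconds(const std::string& category) const
    {
        const auto it = m_totals.find(category);
        return it == m_totals.end() ? 0.0 : it->second.count();
    }

  private:
    std::map<std::string, duration> m_totals;
};
```

A stage could then open a `scope` named "communication" around each message exchange and another named "gpu" around each kernel call, while a single `scope` named "total" spans the whole run.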

Adjust guide-tree rerooting when using phylogeny's njoining algorithm

The phylogeny module's njoining algorithm does not currently perform the correct logic at its final step. It keeps executing the same node-joining logic until only one node is left in the star tree, and that final node becomes the guide tree's root.

The correct behavior is to stop this joining logic when three nodes are left in the star tree and then run a tree-rerooting algorithm to find the best guide tree.
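In standard neighbor-joining, the three nodes a, b and c left on the star tree are connected to one internal node, and since d_ab = l_a + l_b (and likewise for the other pairs), the three branch lengths follow directly from the distances. The sketch below computes only that final step; the rerooting algorithm to run afterwards is not specified by the issue and is not shown, and `join_last_three` is a hypothetical name.

```cpp
// Branch lengths from the final internal node to the last three nodes.
struct star_branches
{
    double a, b, c;
};

// Solves l_a + l_b = d_ab, l_a + l_c = d_ac, l_b + l_c = d_bc.
star_branches join_last_three(double d_ab, double d_ac, double d_bc)
{
    return {
        (d_ab + d_ac - d_bc) / 2,
        (d_ab + d_bc - d_ac) / 2,
        (d_ac + d_bc - d_ab) / 2,
    };
}
```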

Hybrid needleman wrong results

A bug has been found in the hybrid Needleman algorithm: whenever the algorithm tries to reuse the last calculated column, the first thread (and only the first) reads an invalid value from global memory.

The cause for this bug is yet to be determined.

Sequential neighbor-joining algorithm crashes when running in parallel

When running the phylogeny module with a distributed sequential version of the neighbor-joining algorithm, a crash occurs when the number of sequences to align is small.

The thrown exception reports a buffer offset out of range, which suggests that a slave node is trying to do work when none has been assigned to it.

A possible fix is to limit the number of workers according to the number of nodes in the algorithm's star tree.
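A sketch of that fix: never activate more workers than there are units of work, and give each active worker a contiguous, non-empty share. What counts as a unit (a node of the star tree, a candidate pair, a distance-matrix row) is an assumption here, as are the `split_work` and `work_range` names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct work_range
{
    std::size_t begin, end;   // half-open range [begin, end) of work units
};

// Splits `units` of work among at most `workers` workers. Workers beyond the
// number of units receive nothing, so no range is ever empty.
std::vector<work_range> split_work(std::size_t units, std::size_t workers)
{
    const std::size_t active = std::min(workers, units);
    std::vector<work_range> ranges;
    for(std::size_t w = 0, begin = 0; w < active; ++w) {
        // Spread the remainder over the first `units % active` workers.
        const std::size_t size = units / active + (w < units % active ? 1 : 0);
        ranges.push_back({begin, begin + size});
        begin += size;
    }
    return ranges;
}
```

For example, with 3 units and 8 workers only 3 workers are activated, one unit each, instead of 5 workers being handed offsets past the end of the data.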

Improve const-qualifier consistency and enforcement

There are many const-qualifier inconsistencies throughout the project's classes and files. For instance, the following code should not compile:

const Buffer<int> buffer {10};
for(size_t i = 0; i < buffer.getSize(); ++i)
    buffer[i] = i;

If buffer is const-qualified, then its contents should not be modifiable after initialization. A const buffer is therefore only useful when it is copy- or move-constructed from another buffer.
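The usual way to enforce this is to overload element access on const-ness, with the `const` overload returning a `const T&`. The class below is a simplified, hypothetical `Buffer`, not museqa's; it keeps only what is needed to show that writing through a `const Buffer` is rejected while copy and move construction still work.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>

template <typename T>
class Buffer
{
  public:
    explicit Buffer(std::size_t size) : m_size(size), m_data(new T[size]()) {}

    // Deep copy, so a const Buffer can be initialized from an already-filled one.
    Buffer(const Buffer& other) : Buffer(other.m_size)
    {
        std::copy(other.m_data.get(), other.m_data.get() + m_size, m_data.get());
    }

    // Move: takes over the storage and leaves `other` empty.
    Buffer(Buffer&& other) noexcept
      : m_size(std::exchange(other.m_size, 0)), m_data(std::move(other.m_data)) {}

    // Mutable access is only available on non-const buffers...
    T& operator[](std::size_t i) { return m_data[i]; }
    // ...while const buffers hand out read-only references.
    const T& operator[](std::size_t i) const { return m_data[i]; }

    std::size_t getSize() const { return m_size; }

  private:
    std::size_t m_size;
    std::unique_ptr<T[]> m_data;
};

static_assert(std::is_same<decltype(std::declval<const Buffer<int>&>()[0]), const int&>::value,
              "element access on a const Buffer must be read-only");
```

With these overloads, the loop shown in the issue fails to compile, because `buffer[i]` on a `const Buffer<int>` yields a `const int&`.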

museqa host find: runs the hostfinder and updates the user's .hostfile automatically.
museqa host add <host>: adds a host to the user's .hostfile.
museqa [options] run <file>: runs on <file>, passing the given options to mpirun.