LASV_lineage_pred

Overview

Lassa virus lineage prediction based on random forest.

This repository is one of three containing the supporting data and code for the manuscript. All three are listed here:

Project Repositories

Jupyter Notebooks on Google Colab

Prediction Pipeline Overview

Phylogenetic Pipeline

Running the Pipeline

This pipeline relies on Nextstrain for gene extraction and alignment. Install Nextstrain first by following its installation guide, and make sure the nextstrain command is available in your terminal.
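Before continuing, it can help to confirm that the command is actually on your PATH. The following is a minimal sketch using only the Python standard library; the helper name command_available is hypothetical and not part of the pipeline.

```python
import shutil


def command_available(name):
    """Return True if an executable called `name` is found on the PATH."""
    return shutil.which(name) is not None


# Illustrative check only; the pipeline itself does not run this.
if not command_available("nextstrain"):
    print("nextstrain was not found on PATH; install it before running the pipeline")
```

This only checks that the executable can be found; it does not verify that the Nextstrain runtime is set up correctly.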

Clone this repository (or download it as a zip file and unzip it):

git clone https://github.com/JoiRichi/LASV_lineage_pred.git

Enter the Nextstrain shell in the root directory of the pipeline. Note: you must enter the Nextstrain shell each time you want to use the pipeline.

nextstrain shell .

When the shell is active, run the pipeline using:

snakemake -s predict_lineage.smk --cores 5  # change the number of cores as needed
# To re-run the pipeline from scratch, add -F: snakemake -s predict_lineage.smk --cores 5 -F
# Refer to the Snakemake documentation for further help.

Upon completion, go to the pipeline's visuals folder and open the HTML files in a browser.

Model training

The data preprocessing is described here: LASV_ML_Manuscript_Data. The training process is described here: Notebook Link.

Customization

This pipeline can process multiple FASTA files, each containing multiple sequences. It is recommended to concatenate multiple FASTA files into one; however, this is not required, especially if the files belong to different projects. By default, the pipeline finds all files with the .fasta extension in the raw_data folder and searches them for LASV GPC sequences. You can either move your FASTA files into this folder (recommended) or set raw_seq_folder in the config.yaml file to the path of the folder containing your sequences.
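The recommended concatenation step could be sketched as follows. This is an illustration under stated assumptions, not part of the pipeline: the function name concatenate_fasta and the folder names in the usage note are hypothetical, and the only FASTA convention relied on is that each record starts with a ">" header line.

```python
from pathlib import Path


def concatenate_fasta(source_dir, output_file, pattern="*.fasta"):
    """Concatenate all FASTA files in source_dir into output_file.

    Returns the number of records (header lines starting with '>') written.
    """
    source_dir = Path(source_dir)
    output_file = Path(output_file)
    # List the inputs before the output is created, and never read the output itself.
    sources = sorted(
        p for p in source_dir.glob(pattern)
        if p.resolve() != output_file.resolve()
    )
    n_records = 0
    with output_file.open("w") as out:
        for path in sources:
            last_line = "\n"
            with path.open() as handle:
                for line in handle:
                    if line.startswith(">"):
                        n_records += 1
                    out.write(line)
                    last_line = line
            # Make sure the next file's first header starts on its own line.
            if not last_line.endswith("\n"):
                out.write("\n")
    return n_records
```

For example, concatenate_fasta("my_sequences", "raw_data/combined.fasta") would write one combined file into raw_data from a separate, hypothetical my_sequences folder. Keeping the originals outside raw_data avoids the same sequences being picked up twice, since the pipeline reads every .fasta file in that folder.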

So that Snakemake can keep track of which files have already been checked, intermediary files are created for every file checked, even if it contains no GPC sequences. In that case, the intermediary files are empty.
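To see which inputs yielded no GPC sequences, you could list the empty intermediary files. The sketch below assumes that "empty" means zero bytes and that you know which folder holds the intermediary files; that folder name is not given here, so it is passed in as an argument. The helper name find_empty_files is hypothetical.

```python
from pathlib import Path


def find_empty_files(folder):
    """Return the sorted names of zero-byte files directly inside `folder`."""
    return sorted(
        p.name for p in Path(folder).iterdir()
        if p.is_file() and p.stat().st_size == 0
    )
```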

Important Outputs

At the end of the run, the predictions folder contains CSV files with the predictions per sample. Visualizations of the predictions are in the visuals folder; open the HTML files in a browser. The images are high quality and interactive: hover over them to see more information.
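For a quick look at the prediction files without opening each one, something like the following could be used. It is a minimal sketch using only the standard library; the column names of the CSV files are not documented here, so the code simply reports whatever header each file has. The function name summarize_predictions is hypothetical.

```python
import csv
from pathlib import Path


def summarize_predictions(predictions_dir="predictions"):
    """Map each CSV file name to (header columns, number of data rows)."""
    summary = {}
    for path in sorted(Path(predictions_dir).glob("*.csv")):
        with path.open(newline="") as handle:
            reader = csv.reader(handle)
            header = next(reader, [])  # an empty file gives an empty header
            n_rows = sum(1 for _ in reader)
        summary[path.name] = (header, n_rows)
    return summary
```

Calling summarize_predictions() from the root directory of the pipeline would return one entry per CSV file, which is an easy way to confirm that every input produced predictions.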

For further details, please refer to the respective notebooks and repositories linked above. You can also leave a comment for help regarding the pipeline.
