ronboger / protein-vec

This project forked from tymor22/protein-vec

Repository for Protein-Vec, a protein embedding mixture of experts model

License: BSD 3-Clause "New" or "Revised" License

Protein-Vec: Repository for the Protein-Vec mixture of experts model

Here are instructions for how to use Protein-Vec.

First, clone the GitHub repository:

git clone https://github.com/tymor22/protein-vec.git

Install the required packages by running the following from within the protein-vec directory:

pip install .

pip install seaborn faiss-gpu jupyter notebook

Download Protein-Vec mixture of experts model and each of the Aspect-Vec (expert) models

Now download all of the Aspect-Vec models and the Protein-Vec model with the following command (approximately 3 GB in total):

wget https://users.flatironinstitute.org/thamamsy/public_www/protein_vec_models.gz

Unzip this archive of models with the following command:

tar -zxvf protein_vec_models.gz
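
The download carries a .gz extension, but the tar -zxvf command above treats it as a gzip-compressed tar archive. As an optional check, the following minimal sketch lists the top-level entries of such an archive, so you can confirm that it unpacks to a directory named protein_vec_models, which is the name the mv command below expects. The function name top_level_entries is made up for this illustration and is not part of Protein-Vec.

```python
import tarfile


def top_level_entries(archive_path):
    """Return the sorted top-level names inside a gzip-compressed tar archive.

    Hypothetical helper for illustration only; not part of Protein-Vec.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()

    top_level = set()
    for name in names:
        # Ignore empty components and a leading "./" that some archives use.
        parts = [part for part in name.split("/") if part not in ("", ".")]
        if parts:
            top_level.add(parts[0])
    return sorted(top_level)
```

Calling top_level_entries("protein_vec_models.gz") from the download directory returns the names that tar will create there. The same check applies to any other archive that is extracted with tar -zxvf.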

Now move this directory of models into the src_run directory of the repository you just cloned. The code uses relative paths, so the models must be placed there. The command below assumes you run it from the directory that contains protein-vec.

mv protein_vec_models protein-vec/src_run/

Download Protein-Vec lookup database

In order to perform Protein-Vec search, you will need to read from a Protein-Vec lookup database. Download this lookup database and the corresponding metadata with the following command:

wget https://users.flatironinstitute.org/thamamsy/public_www/protein_vec_embeddings.gz

Now unzip it with the following command:

tar -zxvf protein_vec_embeddings.gz

Move the extracted protein_vec_embeddings directory into the src_run directory as well:

mv protein_vec_embeddings protein-vec/src_run/
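
Because the code relies on relative paths, a misplaced directory is an easy mistake to make. The sketch below checks that both directories from the mv commands above are present under src_run. It assumes the repository was cloned into a directory named protein-vec, and the helper name missing_dirs is hypothetical.

```python
from pathlib import Path

# Directory names taken from the mv commands in this README.
EXPECTED_DIRS = ("protein_vec_models", "protein_vec_embeddings")


def missing_dirs(repo_root="protein-vec"):
    """Return the expected directories that are absent from <repo_root>/src_run.

    Hypothetical helper for illustration only; not part of Protein-Vec.
    """
    src_run = Path(repo_root) / "src_run"
    return [name for name in EXPECTED_DIRS if not (src_run / name).is_dir()]
```

An empty list from missing_dirs() means both directories are where the relative paths expect them. Any name in the list still needs to be moved into src_run.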

Tutorial

For a step-by-step tutorial on using Protein-Vec, follow the notebook “gh_encode_and_search_new_proteins.ipynb” in the src_run directory.

In this notebook, you will learn how to encode proteins using Protein-Vec, and visualize/cluster those proteins. You will also learn how to search using Protein-Vec.
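
To give a rough idea of what searching a lookup database of embeddings involves, here is a minimal, self-contained sketch of nearest-neighbor search by cosine similarity. It is not the Protein-Vec API and does not reproduce the notebook. The vectors, protein IDs, function names, and the choice of cosine similarity are all assumptions made for illustration. The faiss-gpu package installed earlier suggests that the actual search uses FAISS, which does this kind of lookup far more efficiently on large databases.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, database, k=3):
    """Return the k (id, score) pairs most similar to the query, best first.

    `database` maps an identifier to its embedding vector. This is a
    brute-force scan for illustration, not the method used by Protein-Vec.
    """
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy two-dimensional "embeddings" with made-up identifiers.
toy_database = {"P1": [1.0, 0.0], "P2": [0.0, 1.0], "P3": [0.7, 0.7]}
top_hits = search([1.0, 0.1], toy_database, k=2)
```

In this toy example the query points almost along the first axis, so P1 ranks first and P3 second. Real embeddings have many more dimensions, but the ranking principle is the same.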

The tutorial notebook needs a dataset with sequences and other metadata fields. You can download the UniProt data for this as follows:

wget https://users.flatironinstitute.org/thamamsy/public_www/uniprotkb_AND_reviewed_true_2023_07_03.tsv

Create the data/ directory inside src_run/ and move the dataset file into it. The paths below are relative to the protein-vec directory, so run these commands from there:

mkdir src_run/data/

mv uniprotkb_AND_reviewed_true_2023_07_03.tsv src_run/data/
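
Before opening the notebook, it can be useful to look at which columns the dataset provides. The sketch below reads the header and the first few rows of a tab-separated file using only the standard library. It makes no assumption about the column names in the UniProt file, and the helper name preview_tsv is hypothetical.

```python
import csv
from itertools import islice


def preview_tsv(path, n_rows=5):
    """Return (column_names, rows), where rows holds the first n_rows records as dicts.

    Hypothetical helper for illustration only; not part of Protein-Vec.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(islice(reader, n_rows))
        return reader.fieldnames, rows
```

For example, preview_tsv("src_run/data/uniprotkb_AND_reviewed_true_2023_07_03.tsv"), run from the protein-vec directory, returns the column names together with the first five records.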

Contributors

nowittynamesleft, tymor22, ronboger
