
peptide-map's Introduction

Hello! Thank you for your interest in this work. We hope you find it valuable. The contents of this repository are described below. See requirements.txt for the required Python modules. Instructions:

If you only have a list of peptide sequences without other information, it is still recommended to use '1 Data Curation Step 1- Filter by Library Design' before moving on to '3 Make Map by Dim Reduction'. Otherwise, the resulting sequence space map may just separate your peptides based on library design.

========== All data from our publication is here in this repository in these folders:
-------	1 Data Curation Step 1- Filter by Library Design
	This Jupyter notebook will filter your list of peptides by regular expressions to isolate peptides that fit a specific design (e.g., X12K). It is highly recommended to use this even if you only have a list of peptides, and just skip the last filtering step. Otherwise, you may end up with a sequence space map that just separates your peptides based on library design.
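
As a rough illustration of filtering by library design, the sketch below keeps only peptides that match a regular expression. It assumes that "X12K" means twelve variable canonical residues followed by a C-terminal lysine. The names `CANONICAL`, `X12K_PATTERN`, and `filter_by_design` are hypothetical and are not taken from the notebook.

```python
import re

# Assumption: "X12K" = twelve variable canonical residues, then a C-terminal lysine.
CANONICAL = "ACDEFGHIKLMNPQRSTVWY"
X12K_PATTERN = re.compile(rf"[{CANONICAL}]{{12}}K")


def filter_by_design(peptides, pattern=X12K_PATTERN):
    """Keep only the peptides whose full sequence matches the design pattern."""
    return [p for p in peptides if pattern.fullmatch(p)]


peptides = [
    "ACDEFGHIKLMNK",  # 12 canonical residues + K: kept
    "ACDEFGHIKLMNA",  # does not end in K: dropped
    "ACDEFGHIKK",     # too short: dropped
    "ACDEFGHIKLMXK",  # contains a non-canonical letter: dropped
]
kept = filter_by_design(peptides)
print(kept)
```

Using `fullmatch` rather than `search` means that longer peptides which merely contain the motif are also rejected.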

-------	2 Data Curation Step 2- Remove Seq Isomers
	This Jupyter notebook is AS-MS specific, completely optional, and will rigorously remove sequence isomers from the dataset.
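
A minimal sketch of the idea, assuming that "sequence isomers" are peptides with the same amino acid composition in a different order, and that every member of such a group should be dropped. The notebook's own criteria may be stricter. The function name `remove_sequence_isomers` is hypothetical.

```python
from collections import defaultdict


def remove_sequence_isomers(peptides):
    """Drop every peptide that shares its residue composition with another sequence."""
    groups = defaultdict(set)
    for pep in peptides:
        # Sorting the residues gives a key that ignores residue order.
        groups["".join(sorted(pep))].add(pep)
    return [pep for pep in peptides if len(groups["".join(sorted(pep))]) == 1]


peptides = ["ACDK", "CADK", "GGGK", "AAWK"]
unique = remove_sequence_isomers(peptides)
print(unique)
```

Here "ACDK" and "CADK" have identical compositions, so both are removed.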

-------	3 Make Map by Dim Reduction
	This is the main jupyter notebook of all this work.
	It takes a list of peptide sequences, encodes them with whichever encoding method you choose, and performs dimensionality reduction to prepare the sequence space map. Random peptides sampled from your library can be added to improve interpretability.
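
The encode-then-reduce flow can be sketched as follows. This is an assumption-laden illustration, not the notebook's code: it uses one-hot encoding as the encoding method and a plain SVD-based PCA in NumPy, and the names `one_hot_encode` and `pca_2d` are hypothetical.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def one_hot_encode(peptides):
    """Encode equal-length peptides as flat one-hot vectors (length * 20 columns)."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    encoded = np.zeros((len(peptides), length * len(ALPHABET)))
    for row, pep in enumerate(peptides):
        for pos, aa in enumerate(pep):
            encoded[row, pos * len(ALPHABET) + INDEX[aa]] = 1.0
    return encoded


def pca_2d(matrix):
    """Project the rows onto the first two principal axes using an SVD."""
    centered = matrix - matrix.mean(axis=0)
    # The rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


peptides = ["ACDK", "ACEK", "WYFK", "WYLK"]
encoded = one_hot_encode(peptides)
coords = pca_2d(encoded)
print(coords.shape)
```

In this toy input the first principal axis separates the two "AC..." peptides from the two "WY..." peptides, which is the kind of grouping a sequence space map is meant to reveal.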

-------	4 Scout Cluster Detection Parameters
	After making your sequence space map, you may see apparent clusters. Herein, Python files are provided to scan the parameters of DBSCAN and AgglomerativeClustering to detect your apparent clusters based on the density of points in the clusters.
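
A parameter scan of this kind might look like the sketch below, which assumes scikit-learn's `DBSCAN` and synthetic 2-D coordinates standing in for a real map. The helper `scan_dbscan` and the parameter grid are hypothetical, and the provided Python files may scan differently.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def scan_dbscan(coords, eps_values, min_samples_values):
    """Return (eps, min_samples, n_clusters, n_noise) for every parameter pair."""
    results = []
    for eps in eps_values:
        for min_samples in min_samples_values:
            labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(coords)
            n_clusters = len(set(labels) - {-1})  # DBSCAN labels noise as -1
            n_noise = int(np.sum(labels == -1))
            results.append((eps, min_samples, n_clusters, n_noise))
    return results


# Synthetic stand-in for map coordinates: two tight, well-separated blobs.
rng = np.random.default_rng(0)
coords = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.1, size=(50, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.1, size=(50, 2)),
])
results = scan_dbscan(coords, eps_values=[0.2, 0.5, 1.0], min_samples_values=[5, 10])
for row in results:
    print(row)
```

Reading the table of cluster counts and noise counts side by side is one way to pick parameters that recover the clusters visible on the map without absorbing scattered points.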

-------	5 Detect Cluster on Map and Logo
	Last, this notebook will take the scouted parameters from (4) and prepare a figure-quality plot of your detected clusters.
	It will also isolate a consensus sequence, logo plot, and centroid sequence(s) from each detected cluster.
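
For intuition, a consensus and a centroid sequence for one cluster can be computed as in the sketch below. It assumes equal-length peptides, takes the consensus as the most common residue per position, and takes the centroid as the member with the smallest total Hamming distance to the others. Both definitions are assumptions and may differ from those in the notebook, and the function names are hypothetical.

```python
from collections import Counter


def consensus_sequence(peptides):
    """Most common residue at each position; ties go to the residue seen first."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*peptides))


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def centroid_sequence(peptides):
    """Cluster member with the smallest summed Hamming distance to all members."""
    return min(peptides, key=lambda p: sum(hamming(p, q) for q in peptides))


cluster = ["ACDK", "ACEK", "GCDK"]
consensus = consensus_sequence(cluster)
centroid = centroid_sequence(cluster)
print(consensus, centroid)
```

The consensus need not be a sequence that was actually observed, whereas the centroid always is, which is why both are useful.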

-------	Extra Pieces
	Herein is a dedicated logo plot script prepared using Logomaker.

This repository uses Jupyter 6.4.8 (Python 3.9.12), but see requirements.txt for all module-specific details.
Also, additional details are provided for specialized modules at the beginning of each notebook.
Installation of all modules, including a fresh install of Anaconda, should take less than 1 hour and may take less than 10 minutes.

Please follow the annotated Jupyter notebook for further instructions on how to use this code. Any laptop can run this code in <1 hour, and the expected output is shown in the notebook.

Note about reproducibility: PCA is deterministic and will always produce the same output. However, UMAP is stochastic and varies slightly from computer to computer. To limit this variation, use the data provided (which has been randomized once) and keep the random seed set.

peptide-map's People

Contributors

josephsbrown1


peptide-map's Issues

How to deal with sequences of unequal length?

This is awesome work.
I am trying to encode some peptides and generate the plots using
Make_Map_Encode_Peptide_and_Dim_Reduce.ipynb
When I tried with my sequences I found that there shouldn't be any duplicate sequences in the input. So, adding

df = df.drop_duplicates(subset=['Peptide'])

is probably a good idea.
Now, everything works if I trim the sequences to the same length before processing, but if the length of the sequences is different, I get

File ~/anaconda3/envs/EsmFold-jupyter/lib/python3.9/site-packages/pandas/core/internals/blocks.py:1428, in Block.setitem(self, indexer, value, using_cow)
   1425     if isinstance(casted, np.ndarray) and casted.ndim == 1 and len(casted) == 1:
   1426         # NumPy 1.25 deprecation: https://github.com/numpy/numpy/pull/10615
   1427         casted = casted[0, ...]
-> 1428     values[indexer] = casted
   1429 return self

ValueError: could not broadcast input array from shape (32,) into shape (34,)

Have you seen this error?
I would be really grateful for any suggestions.
Amin.
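
One possible workaround for unequal lengths, not confirmed by the repository authors, is to remove duplicates and then right-pad every sequence to the longest length before encoding. The sketch below assumes a pad character of "-" and that the chosen encoder can handle it (for example, by mapping it to an all-zero vector). The helper `pad_peptides` is hypothetical.

```python
def pad_peptides(peptides, pad_char="-"):
    """Drop duplicate sequences (keeping order) and right-pad to a common length."""
    unique = list(dict.fromkeys(peptides))  # order-preserving de-duplication
    max_len = max(len(p) for p in unique)
    return [p.ljust(max_len, pad_char) for p in unique]


peptides = ["ACDK", "ACDK", "ACDEFK"]
padded = pad_peptides(peptides)
print(padded)
```

With every sequence the same length, each encoded row has the same number of columns, which is the condition the broadcast error above is complaining about. Trimming to a common length, as already tried, is the alternative when padding is not acceptable.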
