surrogate quantitative interpretability for deepnets

License: MIT License

Python 100.00%
attribution-methods explainable-ai model-interpretability motif-analysis regulatory-genomics surrogate-modelling

SQUID: interpreting sequence-based deep learning models for regulatory genomics

SQUID (Surrogate Quantitative Interpretability for Deepnets) is a Python suite to interpret sequence-based deep learning models for regulatory genomics data with domain-specific surrogate models. For installation instructions, tutorials, and documentation, please refer to the SQUID website, https://squid-nn.readthedocs.io/. For an extended discussion of this approach and its applications, please refer to our paper:

  • Seitz, E.E., McCandlish, D.M., Kinney, J.B., and Koo, P.K. Interpreting cis-regulatory mechanisms from genomic deep neural networks using surrogate models. Nat Mach Intell (2024). https://doi.org/10.1038/s42256-024-00851-5

Installation:

With Anaconda sourced, create a new environment via the command line:

conda create --name squid python==3.7.2

Next, activate this environment via conda activate squid, and install the following packages:

pip install squid-nn
pip install logomaker
pip install mavenn --upgrade

Finally, when you are done using the environment, always exit via conda deactivate.

Notes

SQUID has been tested on Mac and Linux operating systems. Typical installation time on a normal computer is less than 5 minutes.

If you have any issues installing SQUID, please see:

For issues installing MAVE-NN, please see:

Older DNNs may require inference via Tensorflow 1.x or related packages not supported by MAVE-NN. Users will need to run SQUID piecewise within separate environments:

  1. Tensorflow 1.x environment for generating in silico MAVE data
  2. Tensorflow 2.x and Python>=3.7.2 environment for training MAVE-NN surrogate models

An example of this workflow using BPNet is provided in the examples/ folder.
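
The hand-off between the two environments can be as simple as writing the in silico MAVE data to disk in the first environment and reading it back in the second. The sketch below illustrates that idea with NumPy only; it is not taken from the BPNet example, and the helper names (`save_mave`, `load_mave`), the array names, and the file name `mave_handoff.npz` are hypothetical.

```python
import os
import tempfile

import numpy as np


def save_mave(path, x_mut, y_mut):
    """Environment 1 (e.g. Tensorflow 1.x): store mutagenized sequences and oracle predictions."""
    np.savez_compressed(path, x_mut=x_mut, y_mut=y_mut)


def load_mave(path):
    """Environment 2 (e.g. Tensorflow 2.x): reload the arrays before fitting a surrogate."""
    with np.load(path) as data:
        return data["x_mut"], data["y_mut"]


# Placeholder data standing in for real in silico MAVE output:
# 100 one-hot sequences of length 20 and one scalar prediction per sequence.
rng = np.random.default_rng(0)
x_mut = np.eye(4, dtype=np.uint8)[rng.integers(0, 4, size=(100, 20))]
y_mut = rng.normal(size=100)

handoff_path = os.path.join(tempfile.mkdtemp(), "mave_handoff.npz")
save_mave(handoff_path, x_mut, y_mut)          # run in the first environment
x_loaded, y_loaded = load_mave(handoff_path)   # run in the second environment
```

Because the file holds only plain arrays, neither environment needs the other's Tensorflow version installed.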

Usage:

SQUID provides a simple interface that takes as input a sequence-based deep-learning model (e.g., a DNN), which is used as an oracle to generate an in silico MAVE dataset representing a localized region of sequence space. The MAVE dataset can then be fit using a domain-specific surrogate model, with the resulting parameters visualized to reveal the cis-regulatory mechanisms driving model performance.
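
The three stages described above (oracle, in silico MAVE, surrogate fit) can be illustrated with a conceptual NumPy sketch. It deliberately does not use the SQUID or MAVE-NN APIs, which are covered in the documentation linked above. The oracle is a hypothetical additive scoring function rather than a trained DNN, the surrogate is an ordinary least-squares additive model, and all function names and parameter values are assumptions made for illustration.

```python
import numpy as np

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as an (L, 4) one-hot array."""
    idx = np.array([ALPHABET.index(c) for c in seq])
    return np.eye(4)[idx]


def toy_oracle(x_batch, weights):
    """Hypothetical stand-in for a trained DNN: one additive score per sequence."""
    return np.einsum("nlc,lc->n", x_batch, weights)


def random_mutagenesis(x_wt, num_seqs, mut_rate, rng):
    """Sample sequences near x_wt by mutating each position with probability mut_rate."""
    seq_len = x_wt.shape[0]
    wt_idx = x_wt.argmax(axis=1)
    idx = np.tile(wt_idx, (num_seqs, 1))
    mask = rng.random((num_seqs, seq_len)) < mut_rate
    # Shift by 1..3 (mod 4) so a mutated position always differs from wild type.
    shifts = rng.integers(1, 4, size=(num_seqs, seq_len))
    idx = np.where(mask, (idx + shifts) % 4, idx)
    return np.eye(4)[idx]


def fit_additive_surrogate(x_mut, y):
    """Least-squares fit of y ~ theta_0 + sum over positions of theta[l, base]."""
    n, seq_len, n_chars = x_mut.shape
    design = np.hstack([np.ones((n, 1)), x_mut.reshape(n, seq_len * n_chars)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:].reshape(seq_len, n_chars)


rng = np.random.default_rng(0)
x_wt = one_hot("ACGTACGTAC")                 # sequence of interest
true_w = rng.normal(size=x_wt.shape)         # hidden parameters of the toy oracle

x_mut = random_mutagenesis(x_wt, num_seqs=2000, mut_rate=0.1, rng=rng)
y = toy_oracle(x_mut, true_w)                # in silico MAVE measurements
theta0, theta = fit_additive_surrogate(x_mut, y)
y_hat = theta0 + np.einsum("nlc,lc->n", x_mut, theta)
```

In this toy setting the oracle is itself additive, so the surrogate reproduces its predictions on the sampled library. The fitted `theta` matrix is determined only up to per-position offsets that can be absorbed into `theta0`, so it should be compared with `true_w` only after both are put in a common gauge. A real DNN is generally not additive, which is why the choice of surrogate model matters in practice.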

Examples

Google Colab examples for applying SQUID on previously-published deep learning models are available at the following links:

Python script examples are provided in the examples/ folder for locally running SQUID and exporting outputs to file. Additional dependencies for these examples may be required and are outlined at the top of each script. Examples include:

In addition, the squid-manuscript repository contains examples to reproduce results in the manuscript, including the application of SQUID on other DNNs such as ENFORMER.

Expected run time for the "Variant effect (local) prediction with DeepSTARR–Kipoi" demo (above) is 4 minutes using a Google Colab V100 GPU.

Citation:

If this code is useful in your work, please cite our paper.

@article{seitz2023_squid,
	author = {Evan E Seitz and David M McCandlish and Justin B Kinney and Peter K Koo},
	title = {Interpreting cis-regulatory mechanisms from genomic deep neural networks using surrogate models},
	year = {2024},
	doi = {10.1038/s42256-024-00851-5},
	URL = {https://doi.org/10.1038/s42256-024-00851-5},
	journal = {Nature Machine Intelligence}
}

License:

Copyright (C) 2022–2023 Evan Seitz, David McCandlish, Justin Kinney, Peter Koo

The software, code sample and their documentation made available on this website could include technical or other mistakes, inaccuracies or typographical errors. We may make changes to the software or documentation made available on its web site at any time without prior notice. We assume no responsibility for errors or omissions in the software or documentation available from its web site. For further details, please see the LICENSE file.
