regulatory-epigenome-dl's Introduction

Deep learning based model for regulatory epigenome

I am writing my master's thesis on learning DNA sequence activity, such as chromatin accessibility, and there is a lot of research on this topic. Inspired by two great paper collections, papers-for-molecular-design-using-DL and Machine-learning-for-proteins, I decided to create my own. My aim is to understand these papers better and share what I learn. I hope this helps people who are interested in this field.

Updating ...

Menu

Reviews
Predicting chromatin accessibility from sequence
Predicting gene expression from sequence
Predicting TF binding from sequence
Genomic Foundation Models
DL-based enhancer design
Datasets

Reviews

Chromatin accessibility and the regulatory epigenome.
Sandy L. Klemm, Zohar Shipony, William J. Greenleaf.
Nature Reviews Genetics, January 2019.
[10.1038/s41576-018-0089-8]

Predicting chromatin accessibility from sequence

Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks.
David R. Kelley, Jasper Snoek, and John L. Rinn.
Genome Research, May 2016.
[10.1101/gr.200535.115][github code]

Integrating regulatory DNA sequence and gene expression to predict genome-wide chromatin accessibility across cellular contexts.
Surag Nair, Daniel S Kim, Jacob Perricone, Anshul Kundaje.
Bioinformatics, July 2019.
[10.1093/bioinformatics/btz352][github code]

scBasset: sequence-based modeling of single-cell ATAC-seq using convolutional neural networks.
Han Yuan, David R. Kelley.
Nature Methods, August 2022.
[10.1038/s41592-022-01562-8][github code]

Bias factorized, base-resolution deep learning models of chromatin accessibility reveal cis-regulatory sequence syntax, transcription factor footprints and regulatory variants.
Anusri Pampari, Anna Shcherbina, Anshul Kundaje.
Manuscript in preparation.
[github code]

EpiGePT: a Pretrained Transformer model for epigenomics.
Zijing Gao, Qiao Liu, Wanwen Zeng, Rui Jiang, Wing Hung Wong.
bioRxiv, February 2024.
[10.1101/2023.07.15.549134][github code][online web]

Predicting gene expression from sequence

Sequential regulatory activity prediction across chromosomes with convolutional neural networks.
David R. Kelley, Yakir A. Reshef, Maxwell Bileschi, David Belanger, Cory Y. McLean and Jasper Snoek.
Genome Research, March 2018.
[10.1101/gr.227819.117][github code]

Effective gene expression prediction from sequence by integrating long-range interactions.
Žiga Avsec, Vikram Agarwal, Daniel Visentin, Joseph R. Ledsam, Agnieszka Grabska-Barwinska, Kyle R. Taylor, Yannis Assael, John Jumper, Pushmeet Kohli & David R. Kelley.
Nature Methods, October 2021.
[10.1038/s41592-021-01252-x][github code]

Predicting TF binding from sequence

Base-resolution models of transcription-factor binding reveal soft motif syntax.
Žiga Avsec, Melanie Weilert, Avanti Shrikumar, Sabrina Krueger, Amr Alexandari, Khyati Dalal, Robin Fropf, Charles McAnany, Julien Gagneur, Anshul Kundaje, Julia Zeitlinger.
Nature Genetics, February 2021.
[10.1038/s41588-021-00782-6][github code]

Genomic Foundation Models

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome.
Yanrong Ji, Zhihan Zhou, Han Liu, Ramana V Davuluri.
Bioinformatics, August 2021.
[10.1093/bioinformatics/btab083][github code][hugging face]

DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome.
Zhihan Zhou, Yanrong Ji, Weijian Li, Pratik Dutta, Ramana Davuluri, Han Liu.
arXiv, June 2023.
[10.48550/arXiv.2306.15006][github code][hugging face]

The Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics.
Hugo Dalla-Torre, Liam Gonzalez, Javier Mendoza-Revilla, Nicolas Lopez Carranza, Adam Henryk Grzywaczewski, Francesco Oteri, Christian Dallago, Evan Trop, Bernardo P. de Almeida, Hassan Sirelkhatim, Guillaume Richard, Marcin Skwark, Karim Beguir, Marie Lopez, Thomas Pierrot.
bioRxiv, September 2023.
[10.1101/2023.01.11.523679][github code][hugging face]

HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution.
Eric Nguyen, Michael Poli, Marjan Faizi, Armin Thomas, Callum Birch-Sykes, Michael Wornow, Aman Patel, Clayton Rabideau, Stefano Massaroli, Yoshua Bengio, Stefano Ermon, Stephen A. Baccus, Chris Ré.
arXiv, June 2023.
[10.48550/arXiv.2306.15794][github code][hugging face]

DL-based enhancer design

Cell-type-directed design of synthetic enhancers.
Ibrahim I. Taskiran, Katina I. Spanier, Hannah Dickmänken, Niklas Kempynck, Alexandra Pančíková, Eren Can Ekşi, Gert Hulselmans, Joy N. Ismail, Koen Theunis, Roel Vandepoel, Valerie Christiaens, David Mauduit & Stein Aerts.
Nature, December 2023.
[10.1038/s41586-023-06936-2]

DNA-Diffusion: Leveraging Generative Models for Controlling Chromatin Accessibility and Gene Expression via Synthetic Regulatory Elements.
Lucas Ferreira DaSilva, Simon Senan, Zain Munir Patel, Aniketh Janardhan Reddy, Sameer Gabbita, Zach Nussbaum, César Miguel Valdez Córdova, Aaron Wenteler, Noah Weber, Tin M. Tunjic, Talha Ahmad Khan, Zelun Li, Cameron Smith, Matei Bejan, Lithin Karmel Louis, Paola Cornejo, Will Connell, Emily S. Wong, Wouter Meuleman, Luca Pinello.
bioRxiv, February 2024.
[10.1101/2024.02.01.578352][github code]

Datasets

DNaseI Hypersensitivity sites (DNase-seq)

# ENCODE
wget -r ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeAwgDnaseUniform

# Roadmap
wget -r -A "*DNase.hotspot.fdr0.01.peaks.bed.gz" http://egg2.wustl.edu/roadmap/data/byFileType/peaks/consolidated/narrowPeak
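
As a quick sanity check after a recursive download, the number of peaks per file can be tallied. The following is a minimal sketch, not part of the original collection: `count_peaks` is a hypothetical helper name, and it assumes each downloaded file is a gzipped text file with one peak per line. By default `wget -r` saves files under a directory named after the host (here `egg2.wustl.edu`).

```shell
# Hypothetical helper: print "<file name><TAB><line count>" for every
# gzipped file under a directory whose name matches a glob pattern.
# Assumption: one peak per line, no header lines.
count_peaks() {
  dir="$1"
  pattern="${2:-*.gz}"   # default: every .gz file under the directory
  find "$dir" -type f -name "$pattern" | sort | while read -r f; do
    # grep -c '' counts lines, including a final line without a newline
    n=$(gzip -dc "$f" | grep -c '')
    printf '%s\t%s\n' "$(basename "$f")" "$n"
  done
}
```

For example, `count_peaks egg2.wustl.edu "*DNase.hotspot.fdr0.01.peaks.bed.gz"` would list each Roadmap peak file with its peak count.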

DNase-seq + ChIP-seq + CAGE

wget https://storage.googleapis.com/131k/sample_wigs.txt
wget https://storage.googleapis.com/131k/l131k_w128.bed
wget https://storage.googleapis.com/131k/l131k_w128.h5
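
Before training on these files, it can help to check that the intervals in the BED file have the expected length. This is a minimal sketch under stated assumptions: `bed_length_table` is a hypothetical helper name, and the file is assumed to follow the standard BED convention (whitespace-separated columns, start in column 2, end in column 3), which has not been verified against `l131k_w128.bed` itself.

```shell
# Hypothetical helper: tabulate interval lengths (end - start) in a BED file.
# Output: "<length><TAB><number of intervals>", sorted by length.
# Assumption: standard BED columns (chrom, start, end, ...).
bed_length_table() {
  awk '!/^(#|track|browser)/ && NF >= 3 { print $3 - $2 }' "$1" \
    | sort -n \
    | uniq -c \
    | awk '{ print $2 "\t" $1 }'
}
```

Running `bed_length_table l131k_w128.bed` prints one row per distinct interval length, so a file of uniformly sized windows yields a single row.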

scATAC-seq
Buenrostro_2018

Contributors

haoranhuang22
