dl-paper-implementation's Introduction

What you learn from papers always feels shallow; to truly understand something, you have to code it.

Purpose

  1. Minimal Practice
  2. Project Notes
  3. Optimization
  4. Algorithm Competition

Basic

1. CNN

Model Link Paper Code
ResNet Deep Residual Learning for Image Recognition
InceptionV3 Rethinking the Inception Architecture for Computer Vision
InceptionV4 Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
MobileNet MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
EfficientNet EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Residual Attention Network Residual Attention Network for Image Classification
Non-deep Networks Non-deep Networks

2. RNN

Model Link Paper Code
LSTM Long Short-Term Memory
BiLSTM Bidirectional Recurrent Neural Networks
GRU Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

3. Transformer

Model Link Paper Code
Transformer Attention Is All You Need
BERT BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
GPT-3 Language Models are Few-Shot Learners
ViT An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

4. Generation

Model Link Paper Code
GAN Generative Adversarial Networks
pix2pix Image-to-Image Translation with Conditional Adversarial Networks
CycleGAN Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
VAE Auto-Encoding Variational Bayes
DDPM Denoising Diffusion Probabilistic Models
Guided Diffusion Diffusion Models Beat GANs on Image Synthesis
DALL·E 2 Hierarchical Text-Conditional Image Generation with CLIP Latents

5. Multimodal

Model Link Paper Code
CLIP Learning Transferable Visual Models From Natural Language Supervision (Connecting Text and Images)
ViLT ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
SimVLM SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
ALBEF Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
VLMo VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
BLIP BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
CyCLIP CyCLIP: Cyclic Contrastive Language-Image Pretraining
+MAE Training Vision-Language Transformers from Captions Alone
VLMixer VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix

Project

1. Object Detection

Model Link Paper Code
R-CNN Rich feature hierarchies for accurate object detection and semantic segmentation
Faster R-CNN Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
YOLOv3 You Only Look Once: Unified, Real-Time Object Detection
DETR End-to-End Object Detection with Transformers

3. Audio-visual

Model Link Paper Code
SyncNet Out of Time: Automated Lip Sync in the Wild
Wav2Lip A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild
