Vision | |||
2014 | VAE | Kingma and Welling | [✓] Training on MNIST [✓] Encoder output visualization [✓] Decoder output visualization |
2015 | CAM | Zhou et al. | [✓] Application to GoogleNet |
2016 | Gatys et al., 2016 | Gatys et al. | [✓] Application to VGGNet-19 |
YOLO | Redmon et al. | [✗] Training on VOC 2012 [✗] Class probability map [✗] Ground truth visualization on grid |
|
DCGAN | Radford et al. | [✓] Training on CelebA at 64 × 64 [✓] Sampling [✓] Latent space interpolation |
|
Noroozi et al., 2016 | Noroozi et al. | [✓] Architecture [✓] Chromatic aberration [✓] Permutation set |
|
Zhang et al., 2016 | Zhang et al. | [✓] Empirical probability distribution [✗] Color space |
|
2014 / 2017 | Conditional GAN / WGAN-GP | Mirza et al. / Gulrajani et al. | [✓] Training on MNIST |
2016 / 2017 | PixelCNN / VQ-VAE | Oord et al. / Oord et al. | [✓] Training on Fashion MNIST [✓] Training on CIFAR-10 |
2017 | Pix2Pix | Isola et al. | [✓] Training on Google Maps [✓] Training on Facades [✗] Inference on larger resolution |
CycleGAN | Zhu et al. | [✓] Training on Monet to photo [✓] Training on Van Gogh to photo [✓] Training on Cezanne to photo [✓] Training on Ukiyo-e to photo [✓] Training on Horse to zebra [✓] Training on Summer to winter |
|
Noroozi et al., 2017 | Noroozi et al. | [✓] Contrastive loss | |
2018 | PGGAN | Karras et al. | [✓] Training on CelebA-HQ at 512 × 512 |
DeepLab v3 | Chen et al. | [✓] Training on VOC 2012 [✓] Prediction on VOC 2012 validation set [✓] Average mIoU |
|
PixelLink | Deng et al. | [✓] Architecture [✓] Instance-balanced cross entropy loss [✓] Post-processing |
|
RotNet | Gidaris et al. | [✓] Attention map visualization | |
2020 | STEFANN | Roy et al. | [✓] FANnet architecture [✓] Training FANnet on Google Fonts [✓] Custom Google Fonts dataset [✓] Average SSIM |
DDPM | Ho et al. | [✓] Training on CelebA at 32 × 32 [✓] Training on CelebA at 64 × 64 [✓] Denoising process visualization [✓] Linear interpolation sampling [✓] Coarse-to-fine sampling |
|
DDIM | Song et al. | [✓] Sampling [✓] Spherical interpolation sampling [✓] Interpolation on grid sampling [✓] Truncated normal |
|
ViT | Dosovitskiy et al. | [✓] Training on CIFAR-10 [✓] Training on CIFAR-100 [✓] Attention Roll-out [✓] Position embedding similarity [✓] Position embedding interpolation Extra: [✓] CutOut [✓] Hide-and-Seek [✓] CutMix |
|
SimCLR | Chen et al. | [✓] Normalized temperature-scaled cross entropy loss [✓] Data augmentation [✓] Pixel intensity histogram |
|
DETR | Carion et al. | [✓] Architecture [✗] Batch normalization freezing [✗] Data preparation [✗] Training on COCO 2017 |
|
2021 | Improved DDPM | Nichol and Dhariwal | [✓] Cosine diffusion schedule |
Classifier-Guidance | Dhariwal and Nichol | [✗] AdaGN [✗] BigGAN Upsample/Downsample [✗] Improved DDPM sampling [✗] Conditional/Unconditional models [✗] Super-resolution model [✗] Interpolation |
|
ILVR | Choi et al. | [✓] Sampling from single reference [✓] Sampling from various scale factors [✓] Sampling from various conditioning range |
|
SDEdit | Meng et al. | [✓] User input stroke simulation | |
MAE | He et al. | [✓] MAE architecture for pre-training [✗] MAE architecture for self-supervised learning [✗] Training on ImageNet-1K [✗] Fine-tuning [✗] Linear probing |
|
Copy-Paste | Ghiasi et al. | [✓] Large scale jittering [✓] Copy-Paste (within mini-batch) [✗] Gaussian filter |
|
ViViT | Arnab et al. | ||
2022 | CFG | Ho et al. | |
Language | |||
2017 | Transformer | Vaswani et al. | [✓] Architecture [✓] Position encoding visualization |
2019 | BERT | Devlin et al. | [✓] BookCorpus data pre-processing [✓] Architecture [✓] Masked language modeling [✓] SQuAD data pre-processing [✓] SWAG data pre-processing |
Sentence-BERT | Reimers et al. | [✓] Classification loss [✓] Regression loss [✓] Contrastive loss [✓] STSb data pre-processing [✓] WikiSection data pre-processing [✗] NLI data pre-processing |
|
RoBERTa | Liu et al. | [✓] BookCorpus data pre-processing [✓] Masked language modeling [✗] BookCorpus data pre-processing (SEGMENT-PAIR + NSP) [✗] BookCorpus data pre-processing (SENTENCE-PAIR + NSP) [✓] BookCorpus data pre-processing (FULL-SENTENCES) [✗] BookCorpus data pre-processing (DOC-SENTENCES) |
|
Vision-Language | |||
2021 | CLIP | Radford et al. | [✓] Training on Flickr8k + Flickr30k [✓] Zero-shot classification on ImageNet1k (mini) [✓] Linear classification on ImageNet1k (mini) |
kimrass / bert: 'BERT' (Devlin et al., 2019) implementation from scratch in PyTorch