techzzt / pytorch-paper-implement

Python 100.00%

pytorch-paper-implement's Introduction

Pytorch

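This repository implements and documents models built on image data.

The goal is to study model architectures, from the foundational models through to the most recent ones. (Uploads planned for 2022.)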

Classification

VGGNet (2014)
- Very Deep Convolutional Networks for Large-Scale Image Recognition. Karen Simonyan, Andrew Zisserman
GoogLeNet (2014)
- Going Deeper with Convolutions Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich
ResNet (2015)
- Deep Residual Learning for Image Recognition Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
DenseNet (2016)
- Densely Connected Convolutional Networks Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger
Xception (2016)
- Xception: Deep Learning with Depthwise Separable Convolutions François Chollet
ResNeXt (2017)
- Aggregated Residual Transformations for Deep Neural Networks Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He
SEResNet (2017)
- Squeeze-and-Excitation Networks Jie Hu, Li Shen, Samuel Albanie, Gang Sun, Enhua Wu

Generative Model

GAN (2014)

Model Summary Table

ConvNet	Dataset	Published In
VGGNet	STL10	ICLR2015
GoogLeNet	STL10	CVPR2015
ResNet	STL10	CVPR2016
DenseNet	-	CVPR2017
ResNeXt	CIFAR10	CVPR2017
SEResNet	-	CVPR2018

Table of Contents

CodeLab
Computer Vision
Natural Language Processing
Tabular Data
Time-Series
Reinforcement Learning
Audio Data
Multi-modality
Extra
Pytorch Accelerator

Computer Vision

Classification

Model Soup [Jaehyuk Heo]
Point cloud classification with PointNet [Hyeongwon Kang]
Involutional neural networks [Subin Kim]
Image classification with Vision Transformer [Jaehyuk Heo]
Video Classification with Transformers + Video Vision Transformer [Hyeongwon Kang]

Self-Supervised Learning

Semi-supervised image classification using contrastive pretraining with SimCLR [Subin Kim]
Self-supervised contrastive learning with SimSiam [Jaehyuk Heo]
Supervised Contrastive Learning [Subin Kim]

Image Denoising

Convolutional autoencoder for image denoising [Jeongseob Kim]

Segmentation

Point cloud segmentation with PointNet [Hyeongwon Kang]
Image segmentation with a U-Net-like architecture [Jeongseob Kim]

Object Detection

Object Detection with RetinaNet [Jaehyuk Heo]

Knowledge Distillation

Knowledge Distillation [Jaehyuk Heo]

Retrieval

Metric learning for image similarity search [Jaehyuk Heo]
Image similarity estimation using a Siamese Network with a triplet loss [Yonggi Jeong]

OCR

OCR model for reading Captchas [Subin Kim]

Augmentation

RandAugment for Image Classification for Improved Robustness [Yonggi Jeong]
CutMix data augmentation for image classification [Jaehyuk Heo]

Clustering

Semantic Image Clustering [Yonggi Jeong]

Depth Estimation

Monocular depth estimation [Hyeongwon Kang]

Attribution Methods

Grad-CAM class activation visualization [Jaehyuk Heo]
Model interpretability with Integrated Gradients [Jaehyuk Heo]
Visualizing what convnets learn [Jaehyuk Heo]

Optimizer

Gradient Centralization for Better Training Performance [Jaehyuk Heo]

Adapter

Finetuning ViT with LoRA [Jaehyuk Heo]

Generative Models

Variational AutoEncoder [Jaehyuk Heo]
DCGAN to generate face images [Hyeongwon Kang]
Neural style transfer [Subin Kim]
Deep Dream [Jaehyuk Heo]
Conditional GAN [Yonggi Jeong]
CycleGAN [Yonggi Jeong]
PixelCNN [Jeongseob Kim]
Density estimation using Real NVP [Jeongseob Kim]
Non-linear Independent Component Estimation (NICE) [Jeongseob Kim]
Diffusion generative model(Tutorials) [Jeongseob Kim]
Diffusion generative model(Examples - Swiss-roll, MNIST, F-MNIST, CELEBA) [Jeongseob Kim]
Score based generative model(Tutorials) [Jeongseob Kim]

Adversarial Attacks

Fast Gradient Sign Method [Jaehyuk Heo]
Projected Gradient Descent [Jaehyuk Heo]

Adversarial Detection

Detecting Adversarial Examples from Sensitivity Inconsistency of Spatial-Transform Domain [Jaehyuk Heo]

Anomaly Detection

PatchCore: Towards Total Recall in Industrial Anomaly Detection [Jaehyuk Heo]
MemSeg: A semi-supervised method for image surface defect detection using differences and commonalities [Jaehyuk Heo]

Natural Language Processing

Classification

Text classification with Switch Transformer [Subin Kim]
Text classification with Transformer [Yookyung Kho]
Bidirectional LSTM on IMDB [Jeongseob Kim]

Generation

Text generation with a miniature GPT [Subin Kim]
Sequence to sequence learning for performing number addition [Yookyung Kho]
Character-level recurrent sequence-to-sequence model [Jeongseob Kim]
English-to-Spanish translation with a sequence-to-sequence Transformer [Yookyung Kho]

Question Answering

Question Answering with Hugging Face Transformers [Yookyung Kho]
Text Extraction with BERT [Jaehyuk Heo]

Pretrained Language Model

End-to-end Masked Language Modeling with BERT [Subin Kim]

Named Entity Recognition

Named Entity Recognition using Transformers [Subin Kim]

Natural Language Inference

Semantic Similarity with BERT [Jaehyuk Heo]

Table MRC

Table Pre-training with TapasForMaskedLM [Yookyung Kho]

Tutorial

TorchText introduction [Jeongseob Kim]

Tabular Data

Classification

Classification with Gated Residual and Variable Selection Networks [Hyeongwon Kang]
Structured data learning with TabTransformer [Hyeongwon Kang]

Recommendation

Collaborative Filtering for Movie Recommendations [Hyeongwon Kang]
A Transformer-based recommendation system [Hyeongwon Kang]

Anomaly Detection

Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection [Sunwoo Kim]

Time-Series

Anomaly Detection

Timeseries anomaly detection using an Autoencoder [Hyeongwon Kang]

Classification

Timeseries classification with a Transformer model [Hyeongwon Kang]

Forecasting

Timeseries forecasting for weather prediction [Hyeongwon Kang]

Reinforcement Learning

Actor Critic Method [Hyeongwon Kang]
Deep Deterministic Policy Gradient (DDPG) [Hyeongwon Kang]
Deep Q-Learning for Atari Breakout [Hyeongwon Kang]
Proximal Policy Optimization [Hyeongwon Kang]

Audio Data

Recognition

Speaker Recognition [Subin Kim]

Multi-modality

Vision-Language

Multimodal entailment [Yookyung Kho]
Natural language image search with a Dual Encoder [Subin Kim]

Extra

Distributions_TFP_Pyro [Jeongseob Kim]

Pytorch Accelerator

Huggingface Accelerator [Jaehyuk Heo]
Automatic Mixed Precision [Jaehyuk Heo]
Gradient Accumulation [Jaehyuk Heo]
Distributed Data Parallel [Jaehyuk Heo]

pytorch-paper-implement's Issues

[6] CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features

CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features

Yun, Sangdoo, et al. "Cutmix: Regularization strategy to train strong classifiers with localizable features." Proceedings of the IEEE/CVF international conference on computer vision. 2019.

Figure 1: Class activation mapping (CAM) [52] visualizations on ‘Saint Bernard’ and ‘Miniature Poodle’ samples using various augmentation techniques

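Proposes a learning method based on mixed image data to improve regularization and robustness.
The algorithm trains on inputs built by pasting a patch from an image with a different label into the original image, at a given area ratio.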


for each iteration do
    input, target = get_minibatch(dataset)
    if mode == training then
        # pair every sample with another sample from the same minibatch
        input_s, target_s = shuffle_minibatch(input, target)
        lambda = Unif(0,1)
        # box center
        r_x = Unif(0,W)
        r_y = Unif(0,H)
        # box size: the box covers a (1 - lambda) share of the image
        r_w = W * Sqrt(1 - lambda)
        r_h = H * Sqrt(1 - lambda)
        # box corners, clipped to the image
        x1 = Round(Clip(r_x - r_w / 2, min = 0))
        x2 = Round(Clip(r_x + r_w / 2, max = W))
        y1 = Round(Clip(r_y - r_h / 2, min = 0))
        y2 = Round(Clip(r_y + r_h / 2, max = H))
        input[:, :, x1:x2, y1:y2] = input_s[:, :, x1:x2, y1:y2]
        # recompute lambda from the clipped box area
        lambda = 1 - (x2-x1)*(y2-y1)/(W*H)
        target = lambda * target + (1 - lambda) * target_s
    end if
    output = model_forward(input)
    loss = compute_loss(output, target)
    model_update()
end for
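
The pseudocode above can be turned into a small runnable sketch. The version below is an illustration in plain Python under stated assumptions, not the paper's reference code: it mixes a single pair of single-channel images stored as nested lists, and the names `rand_bbox` and `cutmix_pair` are hypothetical. It assumes the box size is W * sqrt(1 - lambda) by H * sqrt(1 - lambda), so that the pasted box covers a (1 - lambda) share of the image before clipping.

```python
import math
import random


def rand_bbox(width, height, lam, rng):
    """Sample a box covering roughly (1 - lam) of the image, clipped to the image."""
    cut_w = width * math.sqrt(1.0 - lam)
    cut_h = height * math.sqrt(1.0 - lam)
    cx = rng.uniform(0, width)    # box center
    cy = rng.uniform(0, height)
    x1 = int(round(max(cx - cut_w / 2, 0)))
    x2 = int(round(min(cx + cut_w / 2, width)))
    y1 = int(round(max(cy - cut_h / 2, 0)))
    y2 = int(round(min(cy + cut_h / 2, height)))
    return x1, y1, x2, y2


def cutmix_pair(img_a, img_b, target_a, target_b, rng):
    """Paste a random box from img_b into img_a and mix the one-hot targets."""
    height, width = len(img_a), len(img_a[0])
    lam = rng.uniform(0, 1)
    x1, y1, x2, y2 = rand_bbox(width, height, lam, rng)
    mixed = [row[:] for row in img_a]
    for y in range(y1, y2):
        mixed[y][x1:x2] = img_b[y][x1:x2]
    # Recompute lambda from the clipped box so the labels match the pixels.
    lam = 1.0 - (x2 - x1) * (y2 - y1) / (width * height)
    target = [lam * a + (1.0 - lam) * b for a, b in zip(target_a, target_b)]
    return mixed, target, lam


rng = random.Random(0)
img_a = [[0.0] * 8 for _ in range(8)]   # toy "class 0" image, all zeros
img_b = [[1.0] * 8 for _ in range(8)]   # toy "class 1" image, all ones
mixed, target, lam = cutmix_pair(img_a, img_b, [1.0, 0.0], [0.0, 1.0], rng)
pasted_share = sum(map(sum, mixed)) / 64
print(lam, pasted_share, target)
```

Because lambda is recomputed after clipping, the share of pasted pixels always equals the weight given to the second label.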

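Earlier methods either blend the original image with a same-sized image of another label (y_B) or mask out a region of the image; the proposed method instead replaces part of the image with a patch from an image of another class.
This applies regional dropout while still using the full image region and mixing both the images and the labels.
In the paper the cropping region is sampled from a uniform distribution, and CAM visualizations of the salient regions show activation only on the area that matches the label being classified.

When the label mixing ratio is varied, the method reaches meaningful performance on ImageNet, which indicates that CutMix augmentation works effectively for regularization and robustness.
By setting aside what the original label and the mixed-in label have in common and learning only from the key regions of the original label, the model can pick up features that earlier models did not learn (fur texture, color, and so on).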

[5] X-ViT: High Performance Linear Vision Transformer without Softmax

X-ViT: High Performance Linear Vision Transformer without Softmax

Figure 3. X-ViT module

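Proposes a ViT architecture for computer vision tasks that minimizes the complexity of the conventional self-attention (SA) algorithm.
X-ViT, the algorithm proposed in the paper, removes the nonlinearity (the softmax) from conventional SA.
Despite changing only a few lines of existing ViT code, it improves ImageNet Top-1 accuracy over the Swin and DeiT models.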
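
To make the complexity argument concrete: once the softmax is gone, the attention product can be regrouped from (Q K^T) V to Q (K^T V), which avoids the n x n score matrix. The sketch below only demonstrates that regrouping on toy values in plain Python. It is not the X-ViT module itself; any normalization the paper applies in place of the softmax is left out, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Swap rows and columns of a nested-list matrix."""
    return [list(col) for col in zip(*a)]


def attention_quadratic(q, k, v):
    """(Q K^T) V: builds an n x n score matrix, so cost grows with n^2."""
    return matmul(matmul(q, transpose(k)), v)


def attention_linear(q, k, v):
    """Q (K^T V): builds a d x d matrix first, so cost grows linearly with n."""
    return matmul(q, matmul(transpose(k), v))


# n = 4 tokens, d = 2 channels (toy values)
q = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]
k = [[0.2, 0.1], [0.4, 0.3], [0.6, 0.5], [0.8, 0.7]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

out_quadratic = attention_quadratic(q, k, v)
out_linear = attention_linear(q, k, v)
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(out_quadratic, out_linear)
    for a, b in zip(row_a, row_b)
)
print(max_diff)
```

Both groupings give the same output up to floating-point error; only the size of the intermediate matrix changes.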

Figure 1. Top-1 accuracy vs. Model capacity

[10] Do Vision Transformers See Like Convolutional Neural Networks?

Do Vision Transformers See Like Convolutional Neural Networks?

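A paper that compares the feature-map representations of CNNs and ViTs to describe the advantages ViT models have in terms of representation.
Unlike CNN models, which reflect locality, ViTs capture both global and local information, and the paper confirms that global information is preserved as the blocks get deeper.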

[1] Masked Autoencoders Are Scalable Vision Learners

Masked Autoencoders Are Scalable Vision Learners

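The goal is to build a representation that retains as much image information as possible when a large share of the patches is masked.
The encoder uses only the patches that still contain image information (the visible patches), while the decoder receives information for all patches.
The encoder is a ViT that takes the flattened patches as input, and the decoder uses them together with positional encodings.
(The decoder uses a simple architecture for pixel prediction.)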
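
The split between encoder and decoder inputs described above can be sketched as follows. This is a minimal illustration in plain Python, assuming 16 patches and a mask ratio of 0.75 as example values; the function names, the string stand-in for the encoder output, and the `[MASK]` placeholder are all hypothetical.

```python
import random


def random_masking(num_patches, mask_ratio, rng):
    """Return sorted indices of the visible and the masked patches."""
    order = list(range(num_patches))
    rng.shuffle(order)
    num_keep = int(num_patches * (1 - mask_ratio))
    return sorted(order[:num_keep]), sorted(order[num_keep:])


def build_decoder_input(encoded_visible, visible_idx, num_patches, mask_token):
    """Put encoded visible patches back in place and fill the rest with a mask token."""
    slots = [mask_token] * num_patches
    for idx, feat in zip(visible_idx, encoded_visible):
        slots[idx] = feat
    return slots


rng = random.Random(0)
num_patches = 16
visible_idx, masked_idx = random_masking(num_patches, mask_ratio=0.75, rng=rng)

# Stand-in for the ViT encoder: it only ever sees the visible patches.
encoded_visible = [f"enc(patch {i})" for i in visible_idx]
decoder_input = build_decoder_input(encoded_visible, visible_idx, num_patches, "[MASK]")
print(visible_idx)
print(decoder_input)
```

The encoder processes 4 of the 16 patches here, while the decoder sees a full-length sequence in which the other 12 positions hold the mask token.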

The paper runs experiments with different mask ratios.

Unlike the text tokens used by BERT in NLP, a single image token carries spatially continuous information, so a masked patch could be recovered from its neighbors; to prevent this, masking is applied at random and at a higher ratio than usual.

The encoder plays the central role, and because the encoder and decoder can be separated, how well the encoder builds the latent representation appears to be the key factor in performance.

Paper: He, Kaiming, et al. "Masked autoencoders are scalable vision learners." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

[9] A TRANSFORMER-BASED FRAMEWORK FOR MULTIVARIATE TIME SERIES REPRESENTATION LEARNING

A TRANSFORMER-BASED FRAMEWORK FOR MULTIVARIATE TIME SERIES REPRESENTATION LEARNING

Zerveas, George, et al. "A transformer-based framework for multivariate time series representation learning." Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2021.

Proposes an algorithm that applies a Transformer to unsupervised representation learning on multivariate time series.
Pre-training: predicts masked inputs using unlabeled time series data.
- Binary noise mask: masked segments are generated from the transition probabilities of a Markov chain.
- Masking contiguous segments and predicting them strengthens the learned representation.
Fine-tuning: freezes the weights of the pre-trained model and solves a classification task on labeled time series.
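
A two-state Markov chain is one way to produce the contiguous masked segments described above. The sketch below is a plain-Python illustration under assumptions: the function name is hypothetical, and the mean masked-segment length (3) and mask ratio (0.15) are example values chosen here. The transition probabilities are set so that the long-run share of masked steps equals the mask ratio.

```python
import random


def geometric_noise_mask(length, mean_mask_len, mask_ratio, rng):
    """Return a list of 0/1 flags, where 1 marks a masked time step."""
    p_leave_masked = 1.0 / mean_mask_len
    p_leave_visible = p_leave_masked * mask_ratio / (1.0 - mask_ratio)
    masked = rng.random() < mask_ratio    # start from the stationary distribution
    mask = []
    for _ in range(length):
        mask.append(1 if masked else 0)
        leave = p_leave_masked if masked else p_leave_visible
        if rng.random() < leave:
            masked = not masked           # switch state, ending the current segment
    return mask


rng = random.Random(0)
mask = geometric_noise_mask(length=10000, mean_mask_len=3, mask_ratio=0.15, rng=rng)
masked_share = sum(mask) / len(mask)
print(mask[:40], masked_share)
```

For a multivariate series, one such mask would be drawn independently per variable, so that each variable has its own masked segments.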

[11] How do vision transformers work?

How do vision transformers work?

A paper that explains and demonstrates why ViTs perform well.
Explains why multi-head self-attention (MSA), the most important component of the architecture, is effective.
Describes the factors that strengthen ViT representations.

[7] Self Supervision for Attention Networks

Self Supervision for Attention Networks

Patro, Badri N., et al. "Self supervision for attention networks." Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2021.

Points out the limitations of existing attention mechanisms and shows that introducing self-supervision improves their performance.

Figure 2. Generation of Surrogate Attention Maps for visual attention based network.

Attention maps are extracted with an attention module, and the method can be applied regardless of the underlying model.
The inputs are an image and a question (text modality), and attention is computed from the outputs of the model suited to each modality (CNN, LSTM).
The model extracts probability values (attention probabilities) over different regions of the input, and the attention distribution computed from them is used for supervision.
With this supervision the model can focus more on salient regions, which makes accuracy gains possible.

Comparing the proposed attention map with the original attention (without masking) shows that the surrogate attention attends to distinctive regions in finer detail.

[8] Generative Adversarial Transformer

Generative Adversarial Transformer

github: https://github.com/dorarad/gansformer

A paper that proposes a GAN + Transformer architecture to strengthen long-range information in the latent space for image interpolation.

[12] Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting

Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting

Proposes a model built on auto-correlation and time-delay aggregation.
It captures dependencies within the time series through correlation and extracts information through aggregation.
The decoder separates the data into seasonal and trend components and models each.

The Fast Fourier Transform (FFT) keeps the computation efficient.
The FFT is also used to compute the auto-correlation of the rolled (time-shifted) series during time-delay aggregation.
The auto-correlation step yields period-based dependencies, which combine the auto-correlation scores with the similarity of the sub-series gathered by time-delay aggregation.
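
The role of the FFT can be illustrated with the identity that the circular auto-correlation of a series equals the inverse FFT of its power spectrum. The sketch below is a plain-Python illustration under assumptions: a small recursive radix-2 FFT stands in for a library FFT, the series length must be a power of two, and the toy series and function names are hypothetical. It checks the FFT route against a direct computation on rolled copies of the series and picks the lag with the highest score.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(spectrum):
    """Inverse FFT via the conjugate trick."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spectrum])]


def autocorr_fft(series):
    """Circular auto-correlation for every lag, as IFFT(|FFT(x)|^2)."""
    spectrum = fft([complex(v) for v in series])
    power = [s * s.conjugate() for s in spectrum]
    return [v.real for v in ifft(power)]


def autocorr_direct(series):
    """Same quantity computed lag by lag on rolled copies of the series."""
    n = len(series)
    return [sum(series[t] * series[(t + lag) % n] for t in range(n)) for lag in range(n)]


series = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]   # toy series with period 4
r_fft = autocorr_fft(series)
r_direct = autocorr_direct(series)
best_lag = max(range(1, len(series)), key=lambda lag: r_fft[lag])
print(best_lag)
```

The direct version costs O(n^2) for all lags, whereas the FFT route costs O(n log n), and the best nonzero lag recovers the period of the toy series.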

[4] How to Understand Masked Autoencoders

[2] GAIN: Missing Data Imputation using Generative Adversarial Nets

GAIN: Missing Data Imputation using Generative Adversarial Nets

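A method that uses a generative model to impute missing values in tabular data.

To tell the model where values are missing, the input data is not used alone; a hint matrix and a mask matrix are supplied as model inputs alongside it.

The generator takes the data matrix (original values), a random matrix (random values imputed into the missing positions), and the mask matrix (which marks the missing positions). The discriminator then takes the imputed matrix, in which the missing positions are filled with estimates, together with a hint matrix that records whether the mask information is revealed for each position. The discriminator outputs an estimated mask matrix covering every position, and training computes the cross-entropy loss between it and the original mask matrix.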
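
The matrices involved can be sketched in plain Python as follows. This is only an illustration of the data flow under assumptions, with no networks and no training: missing values are stored as `None`, the generator is replaced by a hypothetical stand-in that predicts 0.5 everywhere, and the hint is built by revealing each mask entry with probability `hint_rate` and writing 0.5 elsewhere, which is one common way to construct it.

```python
import random


def build_mask(data):
    """1 where a value is observed, 0 where it is missing (None)."""
    return [[0 if x is None else 1 for x in row] for row in data]


def generator_input(data, mask, rng):
    """Keep observed values and put random noise into the missing positions."""
    return [[x if m == 1 else rng.uniform(0.0, 0.01) for x, m in zip(row, mrow)]
            for row, mrow in zip(data, mask)]


def impute(data, mask, generated):
    """Observed entries stay as they are; missing entries take the generator output."""
    return [[x if m == 1 else g for x, m, g in zip(row, mrow, grow)]
            for row, mrow, grow in zip(data, mask, generated)]


def build_hint(mask, hint_rate, rng):
    """Reveal each mask entry with probability hint_rate; hide the rest as 0.5."""
    return [[float(m) if rng.random() < hint_rate else 0.5 for m in mrow]
            for mrow in mask]


rng = random.Random(0)
data = [[0.2, None, 0.7], [None, 0.5, 0.9]]
mask = build_mask(data)
noisy = generator_input(data, mask, rng)

# Hypothetical stand-in for the trained generator: it predicts 0.5 everywhere.
generated = [[0.5 for _ in row] for row in noisy]
imputed = impute(data, mask, generated)
hint = build_hint(mask, hint_rate=0.9, rng=rng)
print(imputed)
print(hint)
```

The discriminator would receive `imputed` and `hint` and try to reproduce `mask`, which is the target of the cross-entropy loss.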

[3] AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE

AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE

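A paper that applies the Transformer architecture proposed in "Attention Is All You Need" to vision tasks.
It overcomes the limitations of earlier attempts to apply self-attention to vision tasks and proposes the Vision Transformer (ViT), a Transformer model built on the self-attention used in NLP.
The image is split into patches and fed in as a sequence (the input is built from patch embeddings plus position embeddings).
The key components are a CLS token that provides token information as in BERT, a classification head, position embeddings (which keep each embedding's position information), and a Transformer (which applies multi-head self-attention).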
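
The patch-to-sequence step described above can be sketched in plain Python. This is a simplified illustration under assumptions: a single-channel 4 x 4 toy image, a patch size of 2 that divides the image evenly, and hypothetical function names. In an actual ViT each flattened patch would also be linearly projected and a learned position embedding would be added; here the position is kept as a plain index.

```python
def patchify(image, patch_size):
    """Split an H x W image (nested lists) into flattened square patches."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [image[top + dy][left + dx]
                     for dy in range(patch_size) for dx in range(patch_size)]
            patches.append(patch)
    return patches


def build_sequence(patches, cls_token):
    """Prepend the CLS token and pair every token with its position index."""
    tokens = [cls_token] + patches
    return list(enumerate(tokens))


image = [[row * 4 + col for col in range(4)] for row in range(4)]   # 4 x 4 toy image
patches = patchify(image, patch_size=2)
sequence = build_sequence(patches, cls_token="[CLS]")
print(patches)
print(sequence[0])
```

A 4 x 4 image with 2 x 2 patches yields 4 patch tokens, so the Transformer sees a sequence of length 5 once the CLS token is included.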

Unlike typical CNN-based models, ViT has no built-in spatial inductive bias.
It therefore needs a large amount of data to learn the relationships in the data robustly; only the MLP layers are local and translation-equivariant, while the self-attention layers are global.

The proposed model is pre-trained in several sizes (Base, Large, Huge), and classification is then performed with a single linear layer.
After pre-training on JFT, the model is transferred to downstream tasks (the larger the pre-training dataset, the better the results; for smaller datasets, regularization was tuned: weight decay, dropout, label smoothing).
