boostcampaitech6 / level2-3-cv-finalproject-cv-07

level2-3-cv-finalproject-cv-07 created by GitHub Classroom

License: MIT License

Shell 0.32% Python 99.68%

level2-3-cv-finalproject-cv-07's Introduction

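Lightweight modeling for computationally efficient crowd counting

Target model

PET (Point-Query Quadtree for Crowd Counting, Localization, and More), a Transformer-based SOTA crowd counting model

Dataset

ShanghaiTech A

Provided resources

A trained crowd counting model as the baseline, together with the dataset it was trained on
A tool for measuring model accuracy
A tool for measuring inference speed

Task

Modify the layers, convolution blocks, and other components of the model to improve inference speed on CPU/GPU while maintaining the baseline model's performance

Approach

Only structural changes to the given model are allowed; changing how the model is trained is not

Period

2024.02.26 ~ 2024.03.27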

Project goal

Redesign the layers and blocks of PET (Chengxin Liu et al., ICCV 2023), a Transformer-based SOTA crowd counting model, to improve inference speed on CPU/GPU while preserving accuracy, measured by MAE (Mean Absolute Error), as much as possible
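MAE is the accuracy metric used throughout this README. As a minimal sketch of how it is computed for crowd counting, the function below averages the absolute difference between predicted and ground-truth counts per image. The function name and the sample counts are hypothetical and are not taken from the repository's evaluation tool.

```python
def mean_absolute_error(predicted_counts, true_counts):
    """Average absolute difference between predicted and true crowd counts."""
    if len(predicted_counts) != len(true_counts):
        raise ValueError("both lists must have the same length")
    if not true_counts:
        raise ValueError("at least one image is required")
    total = sum(abs(p - t) for p, t in zip(predicted_counts, true_counts))
    return total / len(true_counts)


# Hypothetical per-image counts, not taken from ShanghaiTech A.
predicted = [102.0, 48.0, 310.0]
actual = [100, 50, 300]
print(mean_absolute_error(predicted, actual))
```

A lower MAE means the predicted head counts are closer to the annotations on average.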

Team members and roles

김한규	민하은	이하연	심유승	안채연	강동기

강동기: Restructured low-impact backbone layers; improved the transformer encoder and decoder structure (1x1 convolution, batch norm, fewer layers)
김한규: Improved the transformer encoder layer structure (FastViT)
민하은: Replaced the backbone with vgg13_bn; improved the transformer encoder layer structure (PoolFormer)
심유승: Redesigned the encoder block (optimized the number of blocks, optimized the window size, adjusted the FFN)
안채연: Replaced the backbone with vgg11_bn; improved the transformer encoder layer structure (depthwise)
이하연: Replaced the backbone with mobilenet; changed the encoder progressive window sizes; added parameter sharing across the transformer encoder and decoder; improved the transformer encoder layer structure (PoolFormer)

Installation and usage

git clone https://github.com/boostcampaitech6/level2-3-cv-finalproject-cv-07.git
cd level2-3-cv-finalproject-cv-07
mkdir data
# Download the ShanghaiTech A data from https://paperswithcode.com/dataset/shanghaitech and place it in the data directory

pip install -r requirements.txt

# To train
sh ./train.sh
# To switch to a different transformer implementation,
# change the line `from .prog_win_transformer import build_encoder, build_decoder` in ./models/transformer/__init__.py

# To evaluate
sh ./evel.sh

🛠️Methodology

Lightweight backbone

Backbone replacement: mobilenet_v3, vgg11_bn, vgg13_bn
Backbone layer removal: identify and remove BatchNorm layers with low impact

Lightweight encoder

PoolFormer

Replace the self-attention operation in PET's encoder with pooling
Pooling is cheap to compute and still acts as the token mixer
Use cross-channel pooling to integrate information across feature maps
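To illustrate how pooling can stand in for self-attention as a token mixer, here is a minimal pure-Python sketch of the spatial pooling mixer described in the PoolFormer paper: 3x3 average pooling with stride 1, minus the input itself. It works on a single 2D feature map and is only an assumption about the general idea; the repository's encoder, including its cross-channel pooling variant, may be implemented differently.

```python
def avg_pool_3x3(feature_map):
    """3x3 average pooling, stride 1; border cells average valid neighbors only."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            window = [
                feature_map[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
            ]
            pooled[r][c] = sum(window) / len(window)
    return pooled


def pooling_token_mixer(feature_map):
    """PoolFormer-style mixer: pooled neighborhood minus the token itself."""
    pooled = avg_pool_3x3(feature_map)
    return [
        [pooled[r][c] - value for c, value in enumerate(row)]
        for r, row in enumerate(feature_map)
    ]


# Hypothetical 3x3 feature map with a single active token in the center.
toy_map = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
for row in pooling_token_mixer(toy_map):
    print(row)
```

Unlike attention, this mixer has no learnable weights and its cost grows linearly with the number of tokens.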

Depthwise

Use depthwise convolution to compensate for the accuracy drop observed with PoolFormer
Improves accuracy without introducing latency overhead
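The sketch below shows, under simplifying assumptions, why a depthwise convolution is cheap: each channel is filtered by its own kernel and channels are never mixed, so the weight count scales with the number of channels rather than its square. The helper names, the valid (no padding) convolution, and the 256-channel example are illustrative assumptions, not the repository's layer.

```python
def conv2d_single(feature_map, kernel):
    """Valid (no padding) 2D cross-correlation of one map with one kernel."""
    k = len(kernel)
    out_h = len(feature_map) - k + 1
    out_w = len(feature_map[0]) - k + 1
    return [
        [
            sum(feature_map[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def depthwise_conv2d(channel_maps, kernels):
    """Filter each channel with its own kernel; channels are never mixed."""
    if len(channel_maps) != len(kernels):
        raise ValueError("need exactly one kernel per channel")
    return [conv2d_single(m, k) for m, k in zip(channel_maps, kernels)]


def weight_counts(channels, kernel_size):
    """Weights (biases excluded) of a standard vs. a depthwise convolution."""
    standard = channels * channels * kernel_size ** 2
    depthwise = channels * kernel_size ** 2
    return standard, depthwise


# Hypothetical sizes: 256 channels and a 3x3 kernel.
print(weight_counts(256, 3))
```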

Component redesign

Optimize the number of encoder blocks
Optimize the window size
Adjust the FFN

Lightweight encoder and decoder

Simplify the linear layers and layer norms in the encoder and decoder
Replace linear layers with 1x1 convolutions
Adjust the feed-forward network

TOP 3

By MAE

Method	Experiment	Best MAE	Inference time
encoder reduction	encoder layer x2 + [(32,16),(8,4)]	approx. 6.22% decrease (50.49 → 47.35)	6.92 ms decrease (63.95 → 57.03)
poolformer	Enc win size 1/4 + attn x2 -> pooling x2	approx. 0.30% decrease (50.13 → 49.98)	5.6 ms decrease (63.95 → 58.35)
depthwise	1 depthwise encoder layer [(8,4)]	approx. 0.02% increase (52.39 → 52.40)	8.29 ms decrease (65.62 → 57.33)

By inference time

Method	Experiment	Best MAE	Inference time
encoder reduction	encoder layer x2 + [(32,16),(8,4)] + FFN removed	approx. 1.78% increase (50.49 → 51.39)	9.19 ms decrease (63.95 → 54.76)
poolformer	layer reduction + pooling (x2) [(32,16),(8,4)]	approx. 6.98% increase (50.13 → 53.63)	6.98 ms decrease (63.95 → 56.97)
depthwise	depthX1_attnX1	approx. 2.04% increase (52.39 → 53.46)	9.27 ms decrease (63.95 → 54.68)
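The percentages and millisecond savings in the tables above follow from simple arithmetic on the baseline and new values. As a small worked example, the snippet below recomputes two of the rows; the helper names are hypothetical, and the numbers are the ones reported in the tables.

```python
def relative_change_percent(baseline, value):
    """Signed change of value relative to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def absolute_reduction(baseline, value):
    """How much smaller value is than baseline, in the same unit."""
    return baseline - value


# encoder reduction + FFN removed: MAE 50.49 -> 51.39, time 63.95 -> 54.76 ms
final_mae_change = relative_change_percent(50.49, 51.39)
final_latency_saving_ms = absolute_reduction(63.95, 54.76)
print(f"MAE change: {final_mae_change:+.2f}%")                      # +1.78%
print(f"Inference time saved: {final_latency_saving_ms:.2f} ms")    # 9.19 ms

# encoder reduction (best MAE): 50.49 -> 47.35
print(f"MAE change: {relative_change_percent(50.49, 47.35):+.2f}%")  # -6.22%
```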

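Final model

Reduced the number of encoder layers from 4 to 2
Changed the encoder window sizes from [(32,16),(16,8)] to [(32,16),(8,4)]
Removed the FFN from the encoder and decoder

Method	Experiment	Best MAE	Inference time
encoder reduction	encoder layer x2 + [(32,16),(8,4)] + FFN removed	approx. 1.78% increase (50.49 → 51.39)	9.19 ms decrease (63.95 → 54.76)

MAE degraded by only about 1.78% while inference time dropped by a substantial 9.19 ms, so this configuration ranked within the top 3 for both MAE and inference time.

With an MAE of 51.39 and an inference time of 54.76 ms, it was selected as the final model.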

Citation

If you find this work helpful for your research, please consider citing:

@InProceedings{liu2023pet,
  title={Point-Query Quadtree for Crowd Counting, Localization, and More},
  author={Liu, Chengxin and Lu, Hao and Cao, Zhiguo and Liu, Tongliang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2023}
}

level2-3-cv-finalproject-cv-07's Issues

:sparkles: feat : Add percentile output to measure_inference_time

Description

Update the code so that percentiles are reported alongside the mean and standard deviation when measuring inference time
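As a rough sketch of what such a summary could look like, the function below reports the mean, the sample standard deviation, and selected percentiles of a list of latency samples using only the standard library. The function name `summarize_latency`, the chosen percentiles, and the sample data are assumptions for illustration; this is not the code of `measure_inference_time`.

```python
import statistics


def summarize_latency(samples_ms, percentiles=(50, 90, 95, 99)):
    """Mean, sample standard deviation, and percentiles of latency samples."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns 99 cut points; index p - 1 is the p-th percentile.
    cut_points = statistics.quantiles(samples_ms, n=100, method="inclusive")
    summary = {
        "mean": statistics.fmean(samples_ms),
        "std": statistics.stdev(samples_ms),
    }
    for p in percentiles:
        summary[f"p{p}"] = cut_points[p - 1]
    return summary


# Hypothetical latency samples in milliseconds.
samples = [54.1, 55.3, 54.8, 60.2, 54.6, 55.0, 71.4, 54.9]
for name, value in summarize_latency(samples).items():
    print(f"{name}: {value:.2f} ms")
```

Percentiles such as p95 and p99 expose occasional slow runs that the mean and standard deviation tend to hide.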

:sparkles: feat : Add the PET base model

Description

Add the PET base model code

🔨 chore : Set up the commit template

Description

Write the .commit_template file

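:sparkles: feat : Replace the linear layers and LN with 1x1 Conv and BN when combining pooling and DW Conv

Description

A way to further reduce latency once the self-attention in the Transformer encoder has been replaced with pooling and DW Conv
Replacing the linear layers and LN with 1x1 Conv and BN removes the reshape that is otherwise needed when going from pooling to a linear layer or from DW Conv to a linear layer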
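The reason this swap is safe is that a linear layer applied to every token computes the same thing as a 1x1 convolution applied to every pixel; only the memory layout differs. The pure-Python sketch below checks that equivalence on toy data and makes the extra reshape steps explicit. All names and values here are hypothetical and chosen only for illustration.

```python
def linear(tokens, weight, bias):
    """Apply y = W x + b to each token; tokens has shape (N, C_in)."""
    return [
        [sum(w * x for w, x in zip(w_row, token)) + b
         for w_row, b in zip(weight, bias)]
        for token in tokens
    ]


def conv1x1(feature_map, weight, bias):
    """1x1 convolution on a channels-first map of shape (C_in, H, W)."""
    c_in = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    return [
        [
            [sum(weight[o][i] * feature_map[i][r][c] for i in range(c_in)) + bias[o]
             for c in range(width)]
            for r in range(height)
        ]
        for o in range(len(weight))
    ]


def to_tokens(feature_map):
    """The reshape a linear layer needs: (C, H, W) -> (H*W, C)."""
    height, width = len(feature_map[0]), len(feature_map[0][0])
    return [
        [channel[r][c] for channel in feature_map]
        for r in range(height)
        for c in range(width)
    ]


def to_feature_map(tokens, height, width):
    """The inverse reshape: (H*W, C) -> (C, H, W)."""
    return [
        [[tokens[r * width + c][o] for c in range(width)] for r in range(height)]
        for o in range(len(tokens[0]))
    ]


# Hypothetical toy values: 2 input channels, 3 output channels, a 2x2 map.
feature_map = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
weight = [[1, 0], [0, 1], [1, -1]]
bias = [0, 10, 0]

via_linear = to_feature_map(linear(to_tokens(feature_map), weight, bias), 2, 2)
via_conv = conv1x1(feature_map, weight, bias)
print(via_linear == via_conv)
```

The convolution path never leaves the (C, H, W) layout, which is the layout that pooling and DW Conv already produce.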

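:sparkles: feat : Integrate inference time measurement with wandb

Description

The inference time measurement needs to be updated so that its results can be visualized and logged to wandb

🔨 chore : Add gitignore

Add the .gitignore file

🔨 chore : Set up GitHub issue and PR templates

Set up the issue and PR templates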

:sparkles: feat : Add parameter sharing code

Description

A way to reduce the number of parameters by sharing them across the Transformer's encoder and decoder layers

ref: Lessons on Parameter Sharing across Layers in Transformers
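As a toy illustration of the idea, the sketch below builds a stack of layers either as independent objects or as one object reused at every depth, then counts the unique parameters. It shows only the simplest scheme, in which all layers share one set of weights; the referenced paper discusses other assignment strategies, and the repository's implementation may differ. `ToyLayer`, `build_stack`, and the sizes are hypothetical.

```python
class ToyLayer:
    """Stand-in for a transformer layer holding one weight matrix and a bias."""

    def __init__(self, d_model):
        self.weight = [[0.0] * d_model for _ in range(d_model)]
        self.bias = [0.0] * d_model

    def num_params(self):
        return len(self.weight) * len(self.weight[0]) + len(self.bias)


def build_stack(num_layers, d_model, share):
    """Return a list of layers; with share=True every entry is the same object."""
    if share:
        layer = ToyLayer(d_model)
        return [layer] * num_layers
    return [ToyLayer(d_model) for _ in range(num_layers)]


def count_unique_params(stack):
    """Count parameters once per distinct layer object."""
    unique_layers = {id(layer): layer for layer in stack}
    return sum(layer.num_params() for layer in unique_layers.values())


# Hypothetical sizes: 4 layers with a model width of 8.
print(count_unique_params(build_stack(4, 8, share=False)))
print(count_unique_params(build_stack(4, 8, share=True)))
```

Sharing reduces the stored parameters but not the compute per forward pass, since every layer position is still executed.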

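:bug: fix : commit template setup error when running init.sh

Symptom

Running the init.sh script fails with an error saying that the key does not contain a section for global.
The option prefix before global and wait is an em dash rather than a double hyphen. Please change both as shown below.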

git config --global commit.template ./.commit_template
git config --global core.editor "code --wait"

Steps to reproduce

Run the init.sh script.

bash init.sh

Expected result

The commit template is configured and the pre-commit setup completes.

Log

error: key does not contain a section: —global
error: key does not contain a section: —global
Fin git config
Requirement already satisfied: pre-commit in /opt/conda/lib/python3.10/site-packages (3.6.2)
Requirement already satisfied: cfgv>=2.0.0 in /opt/conda/lib/python3.10/site-packages (from pre-commit) (3.4.0)
Requirement already satisfied: identify>=1.0.0 in /opt/conda/lib/python3.10/site-packages (from pre-commit) (2.5.35)
Requirement already satisfied: nodeenv>=0.11.1 in /opt/conda/lib/python3.10/site-packages (from pre-commit) (1.8.0)
Requirement already satisfied: pyyaml>=5.1 in /opt/conda/lib/python3.10/site-packages (from pre-commit) (6.0)
Requirement already satisfied: virtualenv>=20.10.0 in /opt/conda/lib/python3.10/site-packages (from pre-commit) (20.25.0)
Requirement already satisfied: setuptools in /opt/conda/lib/python3.10/site-packages (from nodeenv>=0.11.1->pre-commit) (60.2.0)
Requirement already satisfied: distlib<1,>=0.3.7 in /opt/conda/lib/python3.10/site-packages (from virtualenv>=20.10.0->pre-commit) (0.3.8)
Requirement already satisfied: filelock<4,>=3.12.2 in /opt/conda/lib/python3.10/site-packages (from virtualenv>=20.10.0->pre-commit) (3.13.1)
Requirement already satisfied: platformdirs<5,>=3.9.1 in /opt/conda/lib/python3.10/site-packages (from virtualenv>=20.10.0->pre-commit) (4.2.0)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[https://github.com/pre-commit/pre-commit-hooks] already up to date!
[https://github.com/psf/black] already up to date!
pre-commit installed at .git/hooks/pre-commit

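:sparkles: feat : Replace PoolFormer's linear layers and LN with 1x1 Conv and BN

Description

A way to further reduce latency once the self-attention in the Transformer encoder has been replaced with pooling
Replacing the linear layers and LN with 1x1 Conv and BN removes the reshape that is otherwise needed when going from pooling to a linear layer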