bcaitech1 / p4-fr-sorry-math-but-love-you

A math-formula image recognition project that took first place in a competition hosted by NAVER CONNECT boostcamp AI Tech.

Python 100.00%
scene-text-recognition optical-character-recognition pytorch artificial-intelligence

p4-fr-sorry-math-but-love-you's Introduction

๐Ÿ†์ˆ˜์‹ ์ธ์‹: To be Modeler and Beyond!

Contents

    🧐Task Description

    🏆Project Result

    ⚙Installation

    🤝Collaboration Tools

Task Description

Subject

The subject of this competition was converting images of math formulas into LaTeX-formatted text. LaTeX is a format for writing papers and technical documents and is widely used in the natural sciences. Unlike general optical character recognition (OCR), formula recognition requires multi-line recognition.

Unlike ordinary sentences, formulas require an understanding of multi-dimensional relationships, such as the numerator and denominator of a fraction or the bounds of a limit. Formula recognition can therefore be viewed as an OCR problem based on multi-line recognition rather than the usual single-line recognition. From this point of view, formula recognition is a task distinct from conventional OCR.

Data

  • Training data: 50,000 printed formula images and 50,000 handwritten formula images, 100,000 formula images in total

  • Test data: 6,000 printed formula images and 6,000 handwritten formula images

Metric

  • Evaluation metric: 0.9 × sentence accuracy + 0.1 × (1 - word error rate)

  • Sentence Accuracy: the fraction of all inference results whose formula exactly matches the ground truth.

  • Word Error Rate (WER): a measure of how many word insertions, deletions, and substitutions are needed in total to edit the inference result so that it matches the ground truth.
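As a rough illustration of the metric above, the following sketch computes the combined score in plain Python. It is not the competition's official scorer: splitting formulas on whitespace, and normalizing the edit count by the total number of reference tokens, are assumptions made here, and the function names are hypothetical.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Token-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def sentence_accuracy(refs: List[str], hyps: List[str]) -> float:
    """Fraction of predictions whose token sequence equals the ground truth."""
    correct = sum(r.split() == h.split() for r, h in zip(refs, hyps))
    return correct / len(refs)


def word_error_rate(refs: List[str], hyps: List[str]) -> float:
    """Total edits divided by total reference tokens (assumed normalization)."""
    edits = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    tokens = sum(len(r.split()) for r in refs)
    return edits / tokens


def final_score(refs: List[str], hyps: List[str]) -> float:
    """0.9 x sentence accuracy + 0.1 x (1 - WER), as stated in the README."""
    return 0.9 * sentence_accuracy(refs, hyps) + 0.1 * (1.0 - word_error_rate(refs, hyps))
```

Because sentence accuracy carries 90% of the weight, a prediction that is wrong by a single token loses almost all of its credit, which is why the exact-match rate dominates the leaderboard score.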

Project Result

  • 1st place out of 12 teams

  • Public LB Score: 0.8574 / Private LB Score: 0.6288

  • The first-place solution presentation is available here.

  • Example of formula recognition results

Installation

# clone repository
git clone https://github.com/bcaitech1/p4-fr-sorry-math-but-love-you.git

# install necessary tools
pip install -r requirements.txt

Dataset Structure

[dataset]/
├── gt.txt
├── tokens.txt
└── images/
    ├── *.jpg
    ├── ...
    └── *.jpg

Code Structure

[code]
├── configs/ # configuration files
├── data_tools/ # modules for dataset
├── networks/ # modules for model architecture
├── postprocessing/ # modules for postprocessing during inference
├── schedulers/ # scheduler for learning rate, teacher forcing ratio
├── utils/ # useful utilities
├── inference_modules/ # modules for inference
├── train_modules/ # modules for train
├── README.md
├── requirements.txt
├── train.py
└── inference.py

Command Line Interface

Train

Train with single optimizer

$ python train.py --train_type single_opt --config_file './configs/EfficientSATRN.yaml'

Train with two optimizers for encoder and decoder

$ python train.py --train_type dual_opt --config_file './configs/EfficientSATRN.yaml'

Knowledge distillation training

$ python train.py --train_type distillation --config_file './configs/LiteSATRN.yaml' --teacher_ckpt 'TEACHER-MODEL_CKPT_PATH'

Train with the Weights & Biases logging tool

$ python train.py --train_type single_opt --project_name <PROJECTNAME> --exp_name <EXPNAME> --config_file './configs/EfficientSATRN.yaml'

Arguments

train_type (str): training mode
  • 'single_opt': trains with a single optimizer.
  • 'dual_opt': trains with separate optimizers assigned to the encoder and the decoder.
  • 'distillation': trains with knowledge distillation.
config_file (str): path to the configuration file of the model to train
  • Model configurations differ by architecture; an example can be found here.
  • The trainable models are EfficientSATRN, EfficientASTER, SwinTRN, and LiteSATRN.
teacher_ckpt (str): path to the teacher model checkpoint to load for knowledge distillation training
project_name (str): (optional) project name to use when logging training with Weights & Biases
exp_name (str): (optional) experiment name to use when logging training with Weights & Biases

Inference

Inference with single model

$ python inference.py --inference_type single --checkpoint <MODELPATH.pth>

Ensemble inference

$ python inference.py --inference_type ensemble --checkpoint <MODEL1PATH.pth> <MODEL2PATH.pth> ...

Arguments

inference_type (str): inference mode
  • single: loads a single model and runs inference.
  • ensemble: loads several models and runs ensemble inference.
checkpoint (str): path of the model to load
  • For ensemble inference, list the model paths as follows.

    --checkpoint <MODELPATH_1.pth> <MODELPATH_2.pth> <MODELPATH_3.pth> ...
max_sequence (int): maximum generation length when generating a formula (default: 230)
batch_size (int): batch size (default: 32)
decode_type (str): decoding method
  • 'greedy': decodes with greedy decoding.
  • 'beam': decodes with beam search.
decoding_manager (bool): whether to use DecodingManager
tokens_path (str): path to the token file
  • NOTE. Only used when DecodingManager is enabled.
max_cache (int): number of batches of encoder outputs to store temporarily during ensemble ('ensemble') inference
  • NOTE. Higher values speed up inference but temporarily take up more storage space.
file_path (str): path to the data to run inference on
output_dir (str): directory in which to save inference results (default: './result/')

Collaboration Tools


Github Issues

Github Discussions

Github Pull Requests

Experiments Logging(W&B)

Who Are We?


고지형

김준철

김형민

송누리

이주영

최준구

p4-fr-sorry-math-but-love-you's People

Contributors

ahaampo5, doritos0812, ilovemyminutes, jun9choi, lala-chick, nureesong

p4-fr-sorry-math-but-love-you's Issues

Refactoring

No change to the training pipeline itself, but improve readability by reorganizing modules and similar cleanups

Attention - RGB3 + size(256,256)

width, height = 256
rgb = 3
data_proportions = 1.0
batch_size = 32
num_workers = 8
23 minutes per epoch

SATRN - CLAHE experiment

Seed: 21
Data Proportion: 1.0
Data Split Type: Stratified 5-fold
Validation Data: fold 3
Epoch: 50
rgb: 3
scheduler: CustomCosine
optimizer: AdamW
train transforms: CLAHE

Paper review - Z Wang, Jyn-Charn Liu, 2019, Translating Math Formula Images to LaTeX Sequences Using Deep Neural Networks with Sequence-level Training

  • Understand the inputs and outputs of the encoder and decoder
  • Identify the challenging parts of the formula recognition task and the main focus of this model
  • Paper link: https://arxiv.org/pdf/1801.03530.pdf

SATRN - Pre-training with IM2LATEX

  • Check whether initializing the weights by pre-training on the IM2LATEX data has a positive effect
  • Use only samples made up entirely of the tokens given in the competition
  • Train on a small subset first; if performance looks better than without pre-training, train on all of the data

Add W&B logging

Add W&B to compare learning curves across models efficiently during the train phase

Pretrained-BERT for SATRN

  • Part of SATRN's encoder and its decoder borrow the Transformer architecture; could they be replaced with a BERT model?
  • If so, it should be possible to pre-train (NSP, MLM) on the given ground truth and then use the result as the decoder
  • Study the SATRN architecture, revisit the BERT architecture, and then check feasibility!

Develop a post-processing pipeline using a BERT model

Develop a model that re-corrects the output of a formula recognizer such as SATRN

  • Input: the formula generated by the formula recognizer
  • Output: the corrected formula
  • GT for model training: the text GT (formula) of the given data
  • Requirement: pre-training a BERT model on LaTeX tokens
  • Notes
    • 'Regenerating' the formula produced by the recognizer risks lowering the score, because even results the recognizer predicted correctly are exposed to the risk of regeneration
    • A kind of confidence measure is therefore needed, so that regeneration is applied only to low-confidence inputs. In other words, the post-processing model should not touch outputs that are judged to be already correct
    • The most realistic approach at the moment seems to be the BLEU score
      • Assuming that 'the smaller the difference between the recognizer's output and the post-processing model's regenerated output, the more likely the recognizer's output is correct', measure the BLEU score between the two. If it is at or above a certain threshold, take the recognizer's output as the final result; if it is below, take the post-processing model's regenerated output as the final result
      • The threshold will probably have to be found empirically + EDA
      • If there is another way to assign a confidence, it would be fine to try that too
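The BLEU-gated selection described in these notes could be sketched as follows. This is a minimal illustration, not the team's post-processing code: the smoothed sentence-level BLEU is a simplified stand-in written here for self-containment, and the function names and the default threshold of 0.8 are hypothetical (the notes say the threshold still has to be found empirically).

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference: List[str], hypothesis: List[str], max_n: int = 4) -> float:
    """Sentence-level BLEU with add-one smoothing on each n-gram precision."""
    if not reference or not hypothesis:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        ref_counts = ngram_counts(reference, n)
        hyp_counts = ngram_counts(hypothesis, n)
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        log_precision += math.log((overlap + 1) / (total + 1)) / max_n
    # Brevity penalty: hypotheses shorter than the reference are penalized.
    brevity = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return brevity * math.exp(log_precision)


def select_final(recognizer_out: str, corrector_out: str, threshold: float = 0.8) -> str:
    """Keep the recognizer's formula when the corrector largely agrees with it."""
    score = sentence_bleu(recognizer_out.split(), corrector_out.split())
    return recognizer_out if score >= threshold else corrector_out
```

The gate treats agreement between the two models as a proxy for confidence, so a correct recognizer output is only replaced when the corrector disagrees strongly.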

Separate learning rates for the encoder and decoder

Depending on the teacher forcing schedule, it looks worthwhile to give the decoder and the encoder separate learning rates

  • Training tends to go better when teacher forcing is kept high early on
  • Guess: the higher the teacher forcing ratio, the larger the share of learning done by the encoder's CNN; the lower it is, the larger the share done by the decoder's RNN
  • The performance gain from setting teacher forcing high early on may mean that how well the encoder (CNN) weights settle early in training matters a great deal
  • So it is worth trying a training run that treats the decoder, the encoder, and the teacher forcing schedule separately. Roughly as follows
    • Early phase: high teacher forcing & high decoder lr & low encoder lr
    • Late phase: low teacher forcing & low decoder lr & high encoder lr (the original note repeats "decoder lr" here, presumably a typo)
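One way to express the two-phase plan above is a schedule that interpolates between an early and a late setting. The sketch below is only an illustration: all numeric values are hypothetical, linear interpolation is an arbitrary choice, and the last item of the late phase is read as the encoder learning rate, which is an assumption about the note's intent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseSetting:
    teacher_forcing_ratio: float
    encoder_lr: float
    decoder_lr: float


# Hypothetical endpoint values, chosen only to illustrate the plan.
EARLY = PhaseSetting(teacher_forcing_ratio=1.0, encoder_lr=1e-4, decoder_lr=5e-4)
LATE = PhaseSetting(teacher_forcing_ratio=0.3, encoder_lr=5e-4, decoder_lr=1e-4)


def interpolate(start: float, end: float, progress: float) -> float:
    """Linear blend that returns exactly `start` at 0.0 and `end` at 1.0."""
    return (1.0 - progress) * start + progress * end


def setting_for_epoch(epoch: int, num_epochs: int) -> PhaseSetting:
    """Move from the early setting to the late setting over the whole run."""
    progress = epoch / max(num_epochs - 1, 1)
    return PhaseSetting(
        teacher_forcing_ratio=interpolate(
            EARLY.teacher_forcing_ratio, LATE.teacher_forcing_ratio, progress
        ),
        encoder_lr=interpolate(EARLY.encoder_lr, LATE.encoder_lr, progress),
        decoder_lr=interpolate(EARLY.decoder_lr, LATE.decoder_lr, progress),
    )
```

In a PyTorch training loop, the two learning rates could be written into separate optimizers (or separate parameter groups) at the start of each epoch, while the teacher forcing ratio is passed to the decoder's forward call.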

BEAM-SEARCH

Implement beam search, which generates better output than the existing best-path generation (greedy decoding)

Reference.

  • Pytorch-seq2seq: includes beam search implementation code
  • Beam Search in NLP: a clear summary of the beam search concept
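To make the difference from greedy decoding concrete, here is a small self-contained beam search over a toy model. It is a sketch, not the repository's decoder: `step_fn`, the token names, and the probability table are all hypothetical, and no length normalization is applied.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[List[str], float]  # (tokens, cumulative log-probability)


def beam_search(step_fn: Callable[[List[str]], Dict[str, float]],
                beam_width: int = 3, max_len: int = 10,
                sos: str = "<SOS>", eos: str = "<EOS>") -> Hypothesis:
    """Keep the `beam_width` best partial sequences at every decoding step."""
    beams: List[Hypothesis] = [([sos], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            for token, logp in step_fn(tokens).items():
                candidates.append((tokens + [token], score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            (finished if tokens[-1] == eos else beams).append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses that hit max_len without emitting EOS
    return max(finished, key=lambda c: c[1])


def toy_step(tokens: List[str]) -> Dict[str, float]:
    """Toy next-token distribution in which the greedy first choice is a trap."""
    table = {
        ("<SOS>",): {"a": 0.6, "b": 0.4},
        ("<SOS>", "a"): {"x": 0.5, "y": 0.5},
        ("<SOS>", "b"): {"z": 0.9, "w": 0.1},
    }
    probs = table.get(tuple(tokens), {"<EOS>": 1.0})
    return {token: math.log(p) for token, p in probs.items()}
```

With `beam_width=1` the search reduces to greedy decoding and commits to `a`, whose best continuation has probability 0.30; with `beam_width=2` it also keeps `b` alive and finds the `b z` path with probability 0.36.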

SEED - Try applying BERT

Paper review - XIAOXUE CHEN, 2020, Text Recognition in the Wild: A Survey : problem definition and proposed solutions for STR

Our task resembles Scene Text Recognition (STR) more than plain OCR.
What we need to look at is Text Recognition + NLP.

Problems to be solved in STR - Issues

  1. Script identification
    : Because the input is image data, recognizing characters as characters can be difficult in itself.
  2. Text enhancement
    : To raise the recognition rate, it helps to restore low-quality text, increase the text resolution, reduce text distortion, or remove the background.
  3. Text tracking
    : As covered in the lecture, tracking character positions consistently is an important problem. The model must be able to tell real text apart from background or noise that gets recognized as text.
  4. Ability to use NLP
    : In the end it is the sequential model that interprets the features produced by the CNN backbone and outputs sequence data, so how the sequential model is used must be treated as an important problem.

Solution for Issue 1
: As Jihyeong (지형) said in the peer session, training progress could probably be checked by visualizing the receptive field.

Solution for Issue 2
: Using a generative model, which the paper reports as highly effective, or the Adaptive threshold, Median blur, and Closing operations that Juyeong (주영) brought in, look like good options.
Among generative models, De-GAN is the most promising. A fine-tuning approach needs to be worked out.

Solution for Issue 3
: This should be discussed further after running training.

Solution for Issue 4
: Trying a variety of methodologies used in NLP would also be a worthwhile attempt.

Ensemble

Run a model ensemble
- Start with hard voting

EDA - Tokens

  1. Count the frequency of each token type and distribute 1/5 to every fold
  2. Analyze correlations between tokens (categorization)
    Assign group labels according to whether a formula contains complex
    constructs (int, lim, frac, etc.), then use stratified sampling
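The group-labelling idea in item 2 could look like the sketch below: each formula gets a label describing which complex tokens it contains, and each label group is then spread evenly over the folds. The token list, the whitespace tokenization, and the deterministic round-robin assignment are assumptions made for illustration; a real split would normally shuffle within each group first.

```python
from collections import defaultdict
from typing import Dict, List

# Illustrative list taken from the note (int, lim, frac); extend as needed.
COMPLEX_TOKENS = ("\\int", "\\lim", "\\frac")


def group_label(latex: str) -> str:
    """Bit string marking which complex tokens appear in the formula."""
    tokens = set(latex.split())
    return "".join("1" if t in tokens else "0" for t in COMPLEX_TOKENS)


def stratified_folds(formulas: List[str], n_folds: int = 5) -> List[int]:
    """Return a fold index per formula, balancing every label group across folds."""
    by_group: Dict[str, List[int]] = defaultdict(list)
    for idx, formula in enumerate(formulas):
        by_group[group_label(formula)].append(idx)
    fold_of = [0] * len(formulas)
    for indices in by_group.values():
        for position, idx in enumerate(indices):
            fold_of[idx] = position % n_folds
    return fold_of
```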

SATRN - Dataset 0.1 ratio Basic (for comparison)

Experiments on the full dataset (1.0) take too long and are too heavy, so to compare other settings at a 0.1 ratio, first run the basic code unchanged on the 0.1 dataset

EDA - Image resizing issue

Jihyeong (지형): resize + padding → square image
Juncheol (준철): apply different transforms and resizing depending on the aspect_ratio range
Since we used multiscale training in object detection, could it be applied here as well? → Probably not easy.
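The "resize + padding → square image" idea can be sketched as a small geometry helper that keeps the aspect ratio and pads the shorter side. The function name is hypothetical and the default target of 256 simply mirrors the 256×256 experiments listed in these notes; the helper only computes sizes and leaves the actual image operations to whatever library is in use.

```python
from typing import Tuple


def square_resize_plan(width: int, height: int, target: int = 256
                       ) -> Tuple[Tuple[int, int], Tuple[int, int, int, int]]:
    """Return ((new_w, new_h), (left, top, right, bottom)) for a target x target canvas."""
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_w, pad_h = target - new_w, target - new_h
    left, top = pad_w // 2, pad_h // 2
    # Any odd leftover pixel goes to the right / bottom edge.
    return (new_w, new_h), (left, top, pad_w - left, pad_h - top)
```

For a wide 512×128 formula image this yields a 256×64 resize with 96 pixels of padding above and below, so the formula is not stretched vertically.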

EDA - Preprocessing

Convert to a binary image & clean up background noise
(grayscale, Fourier transform, sharpening)
If the aspect_ratio value is above some threshold, treat the image as rotated by 90 degrees and rotate it back
→ But is this possible at inference time? Need to check the baseline.

SATRN - RGB 3 experiment

Seed: 21
Data Proportion: 1.0
Data Split Type: Stratified 5-fold
Validation Data: fold 3
Epoch: 50
rgb: 3
scheduler: CustomCosine
optimizer: AdamW

SATRN - FAST EXP

Data proportion: 0.1
RGB - which RGB setting works better
Usable batch size
Vocab - whether the vocab size has a large effect

SATRN - Size256, RGB3 experiment

Seed: 21
height, width : 256
Data Proportion: 1.0
Data Split Type: JC validation
Validation Data: fold 4
Epoch: 50
rgb: 3
batch size: 8
scheduler: CustomCosine
optimizer: Adam
