RCLSTR

[ACMMM 2023] Relational Contrastive Learning for Scene Text Recognition


Introduction

This repository is an official implementation of RCLSTR.

Getting Started

1. Environment Setup

Base Environments

Python >= 3.8
CUDA == 11.0
PyTorch == 1.7.1

Step-by-step installation instructions

a. Create a conda virtual environment and activate it.

conda create -n RCLSTR python=3.8 -y
conda activate RCLSTR

b. Install PyTorch and torchvision following the official instructions.

pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

c. Install other packages.

pip install lmdb pillow nltk natsort fire tensorboard tqdm imgaug einops
pip install numpy==1.22.3
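
After installation, a quick sanity check (a minimal stdlib-only sketch, not part of the repository; the package list mirrors the pip commands above, noting that pillow imports as PIL) confirms everything resolves in the activated environment:

```python
import importlib.util

# Import names for the packages installed by the steps above.
REQUIRED = ["torch", "torchvision", "lmdb", "PIL", "nltk", "natsort",
            "fire", "tqdm", "imgaug", "einops", "numpy"]

def missing_packages(names):
    """Return the subset of module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]
```

An empty result from `missing_packages(REQUIRED)` means the environment is ready; any name it returns needs reinstalling.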

2. Data Preparation

Download datasets

We use the SynthText (ST) training dataset from STR-Fewer-Labels. Download the datasets from baiduyun (password: px16).

Data folder structure

data_CVPR2021
├── training
│   └── label
│       └── synth
├── validation
│   ├── 1.SVT
│   ├── 2.IIIT
│   ├── 3.IC13
│   ├── 4.IC15
│   ├── 5.COCO
│   ├── 6.RCTW17
│   ├── 7.Uber
│   ├── 8.ArT
│   ├── 9.LSVT
│   ├── 10.MLT19
│   └── 11.ReCTS
└── evaluation
    └── benchmark
        ├── CUTE80
        ├── IC03_867
        ├── IC13_1015
        ├── IC15_2077
        ├── IIIT5k_3000
        ├── SVT
        └── SVTP

Link the dataset into both the pretrain and evaluation directories:

cd pretrain
ln -s /path/to/data_CVPR2021 data_CVPR2021
cd ../evaluation
ln -s /path/to/data_CVPR2021 data_CVPR2021
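
Before launching training, a small helper (hypothetical, not part of the repository; the paths are taken from the folder layout above) can confirm the symlinked tree is complete:

```python
from pathlib import Path

# Key sub-directories from the data folder structure shown above.
EXPECTED = [
    "training/label/synth",
    "validation",
    "evaluation/benchmark",
]

def missing_dirs(root, expected=EXPECTED):
    """Return the expected sub-directories that are absent under root."""
    root = Path(root)
    return [d for d in expected if not (root / d).is_dir()]
```

`missing_dirs("data_CVPR2021")` returns an empty list when the tree matches the layout above.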

TPS model weights

For the TPS module, we use the pretrained TPS model weights from STR-Fewer-Labels. Please download the TPS model weights from baiduyun (password: px16) and put them in pretrain/TPS_model.
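
To catch a missing file before a failed run, a one-line check (a stdlib sketch; the path mirrors the --useTPS argument used by the pretraining commands) confirms the weights landed in the right place:

```python
from pathlib import Path

def tps_weights_present(repo_root="."):
    """True if the TPS checkpoint sits where the pretrain commands expect it."""
    path = Path(repo_root) / "pretrain" / "TPS_model" / "TRBA-Baseline-synth.pth"
    return path.is_file()
```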

3. Pretraining and Decoder Evaluation

Pretrain

The RCLSTR method comprises a regularization module (reg), a hierarchical module (hier), and a cross-hierarchy consistency module (con).

Pretrain the baseline SeqMoCo model:

cd pretrain
CUDA_VISIBLE_DEVICES=0,1,2,3 python main_moco.py   \
--model_name TRBA  \
--exp_name SeqMoCo   \
--lr 0.0015   \
--batch-size 32   \
--dist-url 'tcp://localhost:10002' \
--multiprocessing-distributed \
--world-size 1 \
--rank 0   \
--data data_CVPR2021/training/label/synth  \
--data-format lmdb  \
--light_aug   \
--instance_map window   \
--epochs 5   \
--useTPS ./TPS_model/TRBA-Baseline-synth.pth \
--loss_setting consistent \
--frame_weight 0 \
--frame_alpha 0 \
--word_weight 0 \
--word_alpha 0
Pretrain SeqMoCo with the reg module:

cd pretrain
CUDA_VISIBLE_DEVICES=0,1,2,3 python main_moco.py   \
--model_name TRBA  \
--exp_name SeqMoCo_reg   \
--lr 0.0015   \
--batch-size 32   \
--dist-url 'tcp://localhost:10002' \
--multiprocessing-distributed \
--world-size 1 \
--rank 0   \
--data data_CVPR2021/training/label/synth  \
--data-format lmdb  \
--light_aug   \
--instance_map window   \
--epochs 5   \
--useTPS ./TPS_model/TRBA-Baseline-synth.pth \
--loss_setting consistent \
--permutation \
--frame_weight 0 \
--frame_alpha 0 \
--word_weight 0 \
--word_alpha 0
Pretrain SeqMoCo with the reg and hier modules:

cd pretrain
CUDA_VISIBLE_DEVICES=0,1,2,3 python main_moco.py   \
--model_name TRBA  \
--exp_name SeqMoCo_reg_hier   \
--lr 0.0015   \
--batch-size 32   \
--dist-url 'tcp://localhost:10002' \
--multiprocessing-distributed \
--world-size 1 \
--rank 0   \
--data data_CVPR2021/training/label/synth  \
--data-format lmdb  \
--light_aug   \
--instance_map window   \
--epochs 5   \
--useTPS ./TPS_model/TRBA-Baseline-synth.pth \
--loss_setting consistent \
--permutation 
Pretrain SeqMoCo with the reg, hier, and con modules:

cd pretrain
CUDA_VISIBLE_DEVICES=0,1,2,3 python main_moco.py   \
--model_name TRBA  \
--exp_name SeqMoCo_reg_hier_con   \
--lr 0.0015   \
--batch-size 32   \
--dist-url 'tcp://localhost:10002' \
--multiprocessing-distributed \
--world-size 1 \
--rank 0   \
--data data_CVPR2021/training/label/synth  \
--data-format lmdb  \
--light_aug   \
--instance_map window   \
--epochs 5   \
--useTPS ./TPS_model/TRBA-Baseline-synth.pth \
--loss_setting consistent \
--permutation \
--multi_level_consistent global2local \
--multi_level_ins 0
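
All of the commands above optimize a MoCo-style contrastive objective over sequence features. As a rough NumPy illustration of the InfoNCE loss at the core of such pretraining (a generic sketch, not the repository's implementation; the variable names and the 0.07 temperature are assumptions):

```python
import numpy as np

def info_nce(q, k_pos, queue, temperature=0.07):
    """InfoNCE loss for one query against its positive key and a queue of negatives."""
    q = q / np.linalg.norm(q)
    k_pos = k_pos / np.linalg.norm(k_pos)
    queue = queue / np.linalg.norm(queue, axis=1, keepdims=True)
    # Logit 0 is the positive pair; the rest come from the negative queue.
    logits = np.concatenate(([q @ k_pos], queue @ q)) / temperature
    logits -= logits.max()  # numerical stability before the log-sum-exp
    return -logits[0] + np.log(np.exp(logits).sum())
```

The loss is small when the query matches its positive key and large otherwise; the reg, hier, and con modules add relational and cross-hierarchy consistency terms on top of this base objective.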

Feature representation evaluation

Train an attention-based decoder to evaluate the learned feature representations:

cd evaluation
CUDA_VISIBLE_DEVICES=0 python train_new.py \
--model_name TRA \
--exp_name TRA_reg_hier_con \
--saved_model ../pretrain/SeqMoCo_reg_hier_con/checkpoint_0004.pth.tar \
--select_data synth \
--batch_size 256 \
--Aug light
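
--saved_model points at a MoCo-style checkpoint, which stores the query encoder under a DistributedDataParallel prefix; decoder training typically strips that prefix before loading the weights. A stdlib sketch (the module.encoder_q. prefix follows the MoCo convention and is an assumption about this repository's checkpoints):

```python
def strip_moco_prefix(state_dict, prefix="module.encoder_q."):
    """Keep only query-encoder weights and drop the DDP/MoCo prefix from their keys."""
    return {k[len(prefix):]: v for k, v in state_dict.items() if k.startswith(prefix)}
```

For example, a key like module.encoder_q.conv.weight becomes conv.weight, while momentum-encoder and queue entries are discarded.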

Main Results


TODO

  • Support ViT

Acknowledgements

We thank the great works and open-source codebases that this project builds on, in particular STR-Fewer-Labels and MoCo.

Citation

If you find our method useful for your research, please cite:

@misc{zhang2023relational,
      title={Relational Contrastive Learning for Scene Text Recognition}, 
      author={Jinglei Zhang and Tiancheng Lin and Yi Xu and Kai Chen and Rui Zhang},
      year={2023},
      eprint={2308.00508},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
