Turning a CLIP Model into a Scene Text Detector

This repository is built upon mmocr 0.4.0.

NightTime-ArT Dataset

NightTime-ArT dataset, collected from ArT, can be downloaded from here.

Usage

Environment

cuda 11.1
torch=1.8.0
torchvision=0.9.0
timm=0.4.12
mmcv-full=1.3.17
mmseg=0.20.2
mmdet=2.19.1
mmocr=0.4.0

The code is based on mmocr. Please first install mmcv-full and mmocr by following the official mmocr guidelines.
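
Because the versions above are pinned fairly tightly, it can help to confirm what is actually installed before training. The following is a minimal sketch using only the Python standard library; the distribution names are assumptions (in particular, "mmseg" is assumed to be installed under the PyPI name "mmsegmentation"), and the helper find_mismatches is hypothetical, not part of this repository.

```python
from importlib import metadata

# Pinned versions copied from the environment list above. The distribution
# names are assumptions: "mmseg" is assumed to be published as "mmsegmentation".
PINNED = {
    "torch": "1.8.0",
    "torchvision": "0.9.0",
    "timm": "0.4.12",
    "mmcv-full": "1.3.17",
    "mmsegmentation": "0.20.2",
    "mmdet": "2.19.1",
    "mmocr": "0.4.0",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for every package that differs.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Local build tags such as "1.8.0+cu111" are treated as matching "1.8.0".
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every pinned package was found at the expected version. The CUDA version is not checked here.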

Dataset

Please follow the official mmocr guidelines to prepare the datasets.
Then configure the dataset paths in ocrclip/configs/_base_/det_datasets.
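
As a rough illustration, a dataset config in the style used by mmocr 0.4.0 might look like the sketch below. The data_root value and the annotation file names are placeholders chosen for illustration; the actual files in ocrclip/configs/_base_/det_datasets may differ, so treat this as a guide to which fields to edit rather than as the repository's config.

```python
# Sketch of a detection dataset config in the mmocr 0.4.0 style.
# data_root and the annotation file names are placeholders, not values
# taken from this repository.
dataset_type = 'IcdarDataset'
data_root = 'data/icdar2015'  # hypothetical dataset location

train = dict(
    type=dataset_type,
    ann_file=f'{data_root}/instances_training.json',
    img_prefix=f'{data_root}/imgs',
    pipeline=None)

test = dict(
    type=dataset_type,
    ann_file=f'{data_root}/instances_test.json',
    img_prefix=f'{data_root}/imgs',
    pipeline=None)

# Lists consumed by the model config when building the train/test datasets.
train_list = [train]
test_list = [test]
```

In a layout like this, changing data_root is usually enough when the dataset follows the directory structure from the mmocr preparation guide.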

Pre-trained CLIP Models

Download the pre-trained CLIP model (RN50.pt) and save it to the pretrained folder.
Then set the path to the pre-trained CLIP model in the config file as follows:

model = dict(
    pretrained='xxx/ocrclip/pretrained/RN50.pt',
)

Pretraining & Training & Evaluation

To pretrain the TCM model on SynthText/Synth150k, please configure the corresponding dataset path, then run:

bash dist_train.sh configs/textdet/xxnet/xxx.py 8

To fine-tune the TCM model from a pretrained checkpoint, set load_from in the config to the pretrained checkpoint path, then run:

bash dist_train.sh configs/textdet/xxnet/xxx.py 8
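
In MMCV-style configs, load_from is a top-level variable in the config file. A minimal fragment, assuming a placeholder checkpoint path that you replace with the output of your own pretraining run:

```python
# Top-level entry in the fine-tuning config.
# The path is a placeholder; point it at your pretrained checkpoint.
load_from = '/path/to/pretrained_checkpoint.pth'
```

This only initializes the model weights from the checkpoint; the fine-tuning schedule starts from the beginning.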

To evaluate a checkpoint, run:

bash dist_test.sh configs/textdet/xxnet/xxx.py /path/to/checkpoint 1 --eval hmean-iou
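
The hmean-iou metric reports the harmonic mean of precision and recall, where a detection counts as correct when it overlaps a ground-truth region by a sufficient IoU. The sketch below shows only the final arithmetic, assuming the matching step has already produced the counts; the function names are hypothetical and this is not the evaluation code used by mmocr.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def detection_scores(num_matched: int, num_pred: int, num_gt: int):
    """Compute (precision, recall, hmean) from match counts.

    num_matched: detections matched to a ground-truth region
    num_pred:    total detections produced by the model
    num_gt:      total ground-truth regions
    """
    precision = num_matched / num_pred if num_pred else 0.0
    recall = num_matched / num_gt if num_gt else 0.0
    return precision, recall, hmean(precision, recall)
```

For example, detection_scores(9, 10, 12) gives a precision of 0.9 and a recall of 0.75, and the hmean falls between the two, closer to the lower value.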

Results

Method	Data	F-measure	Model
TCM-DB	TD	88.8%	config weights
TCM-DB	IC15	88.8%	config weights
TCM-DB	CTW	85.1%	config
TCM-DB	TT	85.9%	config

TODO

Add FastTCM
Migration from mmocr 0.4.0 to mmocr 1.0.0
Refactor and clean code
Release domain adaptation setting

Citation

If you find this project helpful for your research, please consider citing the paper:

@inproceedings{Yu2023TurningAC,
  title={Turning a CLIP Model into a Scene Text Detector},
  author={Wenwen Yu and Yuliang Liu and Wei Hua and Deqiang Jiang and Bo Ren and Xiang Bai},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
  year={2023}
}

License

This project is released under the CC-BY-NC 4.0 license. See LICENSE for more details.

Acknowledgements

This project is partially based on MMOCR, CLIP, and DenseCLIP. Thanks to their authors for the great work.
