a1004123217 / tcm

This project is forked from wenwenyu/tcm

Turning a CLIP Model into a Scene Text Detector (CVPR2023)

Home Page: https://arxiv.org/abs/2302.14338

License: Other

Turning a CLIP Model into a Scene Text Detector

This repository is built upon mmocr 0.4.0.

NightTime-ArT Dataset

NightTime-ArT dataset, collected from ArT, can be downloaded from here.

Usage

Environment

  • cuda 11.1
  • torch=1.8.0
  • torchvision=0.9.0
  • timm=0.4.12
  • mmcv-full=1.3.17
  • mmseg=0.20.2
  • mmdet=2.19.1
  • mmocr=0.4.0
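
A quick way to confirm that an environment matches the versions listed above is to compare them against the installed package metadata. The following is a minimal sketch, not part of this repository. The pip distribution names are assumptions (for example, mmseg is usually published as `mmsegmentation`), and the helper names `installed_version` and `check_environment` are hypothetical.

```python
from importlib import metadata

# Pins copied from the list above. The pip distribution names are an
# assumption (e.g. mmseg is usually published as "mmsegmentation").
EXPECTED = {
    "torch": "1.8.0",
    "torchvision": "0.9.0",
    "timm": "0.4.12",
    "mmcv-full": "1.3.17",
    "mmsegmentation": "0.20.2",
    "mmdet": "2.19.1",
    "mmocr": "0.4.0",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected=EXPECTED, lookup=installed_version):
    """Return {name: (wanted, found)} for every package that does not match."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        # Local build tags such as "1.8.0+cu111" are treated as a match.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in check_environment().items():
        print(f"{name}: expected {wanted}, found {found}")
```

An empty result means every listed package is installed at the listed version. The CUDA version is not checked here.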

The code is based on mmocr. Please first install mmcv-full and mmocr following the official guidelines (mmocr).

Dataset

Pre-trained CLIP Models

  • Download the pre-trained CLIP models (RN50.pt) and save them to the pretrained folder.
  • Set the path to the pre-trained CLIP model in the config file:
model = dict(
    pretrained='xxx/ocrclip/pretrained/RN50.pt',
)
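
Because a wrong checkpoint path typically only surfaces once training starts, it can help to verify the file before launching a job. This is a minimal sketch under the assumption that the weights are stored as `RN50.pt` in a local `pretrained` folder. `resolve_clip_checkpoint` is a hypothetical helper, not something this repository provides.

```python
from pathlib import Path


def resolve_clip_checkpoint(pretrained_dir, filename="RN50.pt"):
    """Return the absolute checkpoint path, or raise if the file is missing."""
    path = Path(pretrained_dir).expanduser().resolve() / filename
    if not path.is_file():
        raise FileNotFoundError(f"CLIP checkpoint not found: {path}")
    return str(path)
```

The returned string can then be copied into the `pretrained` field of the config.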

Pretraining & Training & Evaluation

To pretrain the TCM model on SynthText/Synth150k, please configure the corresponding dataset path, then run:

bash dist_train.sh configs/textdet/xxnet/xxx.py 8

To fine-tune the TCM model from a pretrained model, set load_from in the config to the pretrained checkpoint path, then run:

bash dist_train.sh configs/textdet/xxnet/xxx.py 8
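
For reference, the fine-tuning setup described above might look like the following config fragment. It is a sketch only: the checkpoint path is a hypothetical placeholder, and it assumes the standard mmcv 1.x runner fields `load_from` and `resume_from`.

```python
# Hypothetical fine-tuning fragment; replace the path with your own checkpoint.
load_from = 'work_dirs/tcm_pretrain/latest.pth'  # assumed path to the pretrained weights

# Leave resume_from unset: load_from restores model weights only, while
# resume_from would also restore the optimizer state and epoch counter.
resume_from = None
```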

To evaluate a checkpoint, run:

bash dist_test.sh configs/textdet/xxnet/xxx.py /path/to/checkpoint 1 --eval hmean-iou
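
The hmean-iou metric reports precision, recall, and their harmonic mean (hmean), which is presumably the F-measure listed under Results. As a reminder of how the three numbers relate, here is a minimal sketch. The function name `hmean` is illustrative, and it does not reproduce the IoU matching that mmocr performs to obtain precision and recall.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (the F-measure)."""
    if precision + recall == 0:
        # Both values are zero, so the harmonic mean is defined as zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

The harmonic mean is pulled toward the lower of the two values, so a detector cannot score well by maximizing only precision or only recall.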

Results

Method   Data   F-measure   Model
TCM-DB   TD     88.8%       config, weights
TCM-DB   IC15   88.8%       config, weights
TCM-DB   CTW    85.1%       config
TCM-DB   TT     85.9%       config

TODO

  • Add FastTCM
  • Migration from mmocr 0.4.0 to mmocr 1.0.0
  • Refactor and clean code
  • Release domain adaptation setting

Citation

If you find this project helpful for your research, please consider citing the paper:

@inproceedings{Yu2023TurningAC,
  title={Turning a CLIP Model into a Scene Text Detector},
  author={Wenwen Yu and Yuliang Liu and Wei Hua and Deqiang Jiang and Bo Ren and Xiang Bai},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
  year={2023}
}

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for more details.

Acknowledgements

This project is partially based on MMOCR, CLIP, and DenseCLIP. Thanks to the authors for their great work.