
Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation

Home Page: https://arxiv.org/pdf/2111.02394

License: Apache License 2.0



FAST


This repository is the official implementation of FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation.

Text Detection

News

  • Mar 11, 2024: 🚀🚀 FAST has been integrated into docTR, a seamless, high-performing & accessible library for OCR-related tasks.
  • Jan 10, 2023: 🚀 Code and models are released.
  • Dec 06, 2022: Code and models of FAST will be released in this repository.

Zero-shot Video Text Detection Demo

FAST.mp4

Catalog

  • TensorRT implementation
  • Code and models
  • Initialization

Abstract

We propose an accurate and efficient scene text detection framework, termed FAST (i.e., Faster Arbitrarily-Shaped Text detector). Unlike recent advanced text detectors, whose complicated post-processing and hand-crafted network architectures result in low inference speed, FAST has two new designs. (1) We design a minimalist kernel representation (with only a 1-channel output) to model text of arbitrary shape, as well as a GPU-parallel post-processing step that assembles text lines with negligible time overhead. (2) We search for a network architecture tailored to text detection, leading to more powerful features than most networks that are searched for image classification. Benefiting from these two designs, FAST achieves an excellent trade-off between accuracy and efficiency on several challenging datasets, including Total-Text, CTW1500, ICDAR 2015, and MSRA-TD500. For example, FAST-T yields 81.6% F-measure at 152 FPS on Total-Text, outperforming the previous fastest method by 1.7 points and 70 FPS in terms of accuracy and speed. With TensorRT optimization, the inference speed can be further accelerated to over 600 FPS.

Method

[Figure: overview of the FAST method]
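The abstract describes a 1-channel kernel map followed by a post-processing step that assembles text lines. The sketch below illustrates one plausible reading of that idea in plain Python: label the connected components of a binary kernel mask, then dilate the labels to recover full text regions. It is an assumption-laden illustration, not the repository's implementation (which compiles dedicated post-processing code); the function names `label_components` and `dilate_labels`, the 4-connectivity, and the 3x3 window are all hypothetical choices made for this example.

```python
from collections import deque


def label_components(mask):
    """Label 4-connected foreground regions of a binary grid (list of lists)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and labels[y][x] == 0:
                # Start a new component and flood-fill it breadth-first.
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and labels[ny][nx] == 0:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def dilate_labels(labels, size=3):
    """Max-filter the label map with a size x size window (stride 1).

    Where two dilated regions overlap, the larger label id wins.
    """
    h, w = len(labels), len(labels[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = max(
                labels[ny][nx]
                for ny in range(max(0, y - r), min(h, y + r + 1))
                for nx in range(max(0, x - r), min(w, x + r + 1))
            )
    return out


# Toy 1-channel kernel mask containing two shrunken text kernels.
kernel_mask = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

labels, count = label_components(kernel_mask)
text_map = dilate_labels(labels, size=3)
```

Both steps are per-pixel operations over a fixed neighborhood, which is the kind of work that maps well onto a GPU; the pure-Python loops here only make the logic explicit.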

Usage

Installation

First, clone the repository locally:

git clone https://github.com/czczup/FAST

Then, install PyTorch 1.1.0+, torchvision 0.3.0+, and other requirements:

# for python3 (training and testing)
pip install editdistance
pip install Polygon3
pip install pyclipper
pip install Cython
pip install mmcv
pip install prefetch_generator
pip install scipy
pip install yacs
pip install tqdm
pip install opencv-python==4.6.0.66

# for python2 (evaluation)
# the evaluation code is from pan_pp.pytorch
pip2 install numpy==1.10
pip2 install scipy==1.2.2
pip2 install polygon2

Finally, compile the post-processing code:

# build pse, pa, and ccl algorithms
sh ./compile.sh
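After installing the Python 3 requirements above, a quick check can confirm that each package is importable. The snippet below is a minimal sketch, not part of the repository: the helper `find_missing` is hypothetical, and the import names are assumptions (several differ from the pip package names, for example `Polygon3` is assumed to import as `Polygon` and `opencv-python` as `cv2`).

```python
import importlib.util

# Mapping of pip package name -> assumed import name.
REQUIRED_MODULES = {
    "editdistance": "editdistance",
    "Polygon3": "Polygon",
    "pyclipper": "pyclipper",
    "Cython": "Cython",
    "mmcv": "mmcv",
    "prefetch_generator": "prefetch_generator",
    "scipy": "scipy",
    "yacs": "yacs",
    "tqdm": "tqdm",
    "opencv-python": "cv2",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the pip names of packages whose import name cannot be found."""
    missing = []
    for package, module in modules.items():
        try:
            spec = importlib.util.find_spec(module)
        except (ImportError, ValueError):
            spec = None
        if spec is None:
            missing.append(package)
    return missing


if __name__ == "__main__":
    missing = find_missing()
    print("missing packages:", ", ".join(missing) if missing else "none")
```

The check only looks up each module without importing it, so it does not verify versions or that the compiled post-processing extensions were built.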

Dataset

Please refer to dataset/README.md for dataset preparation.

Training

First, please download the pretrained checkpoints:

mkdir pretrained/
cd pretrained/
wget https://github.com/czczup/FAST/releases/download/release/fast_tiny_ic17mlt_640.pth
wget https://github.com/czczup/FAST/releases/download/release/fast_small_ic17mlt_640.pth
wget https://github.com/czczup/FAST/releases/download/release/fast_base_ic17mlt_640.pth
cd ../

Then, run the following command for training:

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py <config>

For example:

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py config/fast/tt/fast_base_tt_800_finetune_ic17mlt.py

Testing

Evaluate a single checkpoint

python test.py <config> <checkpoint> --ema
cd eval/
sh eval_{DATASET}.sh

For example:

python test.py config/fast/tt/fast_base_tt_800_finetune_ic17mlt.py download/fast_base_tt_800_finetune_ic17mlt.pth --ema
cd eval/
sh eval_tt.sh

It should give:

Precision:_0.900048239267_______/Recall:_0.851633393829/Hmean:_0.875171745978
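The Hmean in this output is the harmonic mean of precision and recall, which is also the F-measure reported in the result tables. As a small sanity check, the sketch below parses a summary line in the format shown above and recomputes the value. The helper names `parse_eval_line` and `harmonic_mean` are hypothetical, and the regular expression assumes the exact `Key:_value` layout of this line.

```python
import re

EVAL_LINE = "Precision:_0.900048239267_______/Recall:_0.851633393829/Hmean:_0.875171745978"


def parse_eval_line(line):
    """Extract precision, recall and hmean from an evaluation summary line."""
    values = {}
    for key in ("Precision", "Recall", "Hmean"):
        match = re.search(key + r":_([0-9.]+)", line)
        if match is None:
            raise ValueError("missing field: " + key)
        values[key.lower()] = float(match.group(1))
    return values


def harmonic_mean(precision, recall):
    """F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


scores = parse_eval_line(EVAL_LINE)
recomputed = harmonic_mean(scores["precision"], scores["recall"])
print(f"reported Hmean:   {scores['hmean']:.6f}")
print(f"recomputed Hmean: {recomputed:.6f}")
```

The same formula reproduces the table entries: for FAST-B-800 on Total-Text, precision 90.0 and recall 85.2 give an F-measure of about 87.5.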

Evaluate all checkpoints in one folder

python test_all.py <config> <checkpoint-dir> --dataset [{tt/ctw/ic15/msra}] --start-ep 1 --end-ep 60 --ema

Evaluate the speed

python test.py <config> --report-speed

For example:

python test.py config/fast/tt/fast_base_tt_800_finetune_ic17mlt.py --report-speed

Visualization

Run the following script to visualize the prediction results:

python visualize.py --dataset [{tt/ctw/ic15/msra}] --show-gt
  • This script will load the predictions in outputs/ and plot them on images.

  • The visualized results will be saved in visual/.

  • Left is the ground truth and right is the prediction.


Model Zoo

IC17-MLT Pretrained FAST Models

| Model | Backbone | Pretrain | Resolution | #Params | Config | Download |
| --- | --- | --- | --- | --- | --- | --- |
| FAST-T | TextNet-T | ImageNet-1K | 640x640 | 8.5M | config | ckpt / log |
| FAST-S | TextNet-S | ImageNet-1K | 640x640 | 9.7M | config | ckpt / log |
| FAST-B | TextNet-B | ImageNet-1K | 640x640 | 10.6M | config | ckpt / log |
| FAST-T | TextNet-T | - | 640x640 | 8.5M | - | ckpt |
| FAST-S | TextNet-S | - | 640x640 | 9.7M | - | ckpt |
| FAST-B | TextNet-B | - | 640x640 | 10.6M | - | ckpt |
  • We provide the IC17-MLT pretrained weights with and without ImageNet pretraining.

Results on Total-Text

| Method | Backbone | Precision | Recall | F-measure | FPS | Config | Download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| FAST-T-448 | TextNet-T | 86.5 | 77.2 | 81.6 | 152.8 | config | ckpt / log |
| FAST-T-512 | TextNet-T | 87.3 | 80.0 | 83.5 | 131.1 | config | ckpt / log |
| FAST-T-640 | TextNet-T | 87.1 | 81.4 | 84.2 | 95.5 | config | ckpt / log |
| FAST-S-512 | TextNet-S | 88.3 | 81.7 | 84.9 | 115.5 | config | ckpt / log |
| FAST-S-640 | TextNet-S | 89.1 | 81.9 | 85.4 | 85.3 | config | ckpt / log |
| FAST-B-512 | TextNet-B | 89.6 | 82.4 | 85.8 | 93.2 | config | ckpt / log |
| FAST-B-640 | TextNet-B | 89.9 | 83.2 | 86.4 | 67.5 | config | ckpt / log |
| FAST-B-800 | TextNet-B | 90.0 | 85.2 | 87.5 | 46.0 | config | ckpt / log |

Results on CTW1500

| Method | Backbone | Precision | Recall | F-measure | FPS | Config | Download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| FAST-T-512 | TextNet-T | 85.5 | 77.9 | 81.5 | 129.1 | config | ckpt / log |
| FAST-S-512 | TextNet-S | 85.6 | 78.7 | 82.0 | 112.9 | config | ckpt / log |
| FAST-B-512 | TextNet-B | 85.7 | 80.2 | 82.9 | 92.6 | config | ckpt / log |
| FAST-B-640 | TextNet-B | 87.8 | 80.9 | 84.2 | 66.5 | config | ckpt / log |

Results on ICDAR 2015

| Method | Backbone | Precision | Recall | F-measure | FPS | Config | Download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| FAST-T-736 | TextNet-T | 86.0 | 77.9 | 81.7 | 60.9 | config | ckpt / log |
| FAST-S-736 | TextNet-S | 86.3 | 79.8 | 82.9 | 53.9 | config | ckpt / log |
| FAST-B-736 | TextNet-B | 88.0 | 81.7 | 84.7 | 42.7 | config | ckpt / log |
| FAST-B-896 | TextNet-B | 89.2 | 83.6 | 86.3 | 31.8 | config | ckpt / log |
| FAST-B-1280 | TextNet-B | 89.7 | 84.6 | 87.1 | 15.7 | config | ckpt / log |

Results on MSRA-TD500

| Method | Backbone | Precision | Recall | F-measure | FPS | Config | Download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| FAST-T-512 | TextNet-T | 91.1 | 78.8 | 84.5 | 137.2 | config | ckpt / log |
| FAST-T-736 | TextNet-T | 88.1 | 81.9 | 84.9 | 79.6 | config | ckpt / log |
| FAST-S-736 | TextNet-S | 91.6 | 81.7 | 86.4 | 72.0 | config | ckpt / log |
| FAST-B-736 | TextNet-B | 92.1 | 83.0 | 87.3 | 56.8 | config | ckpt / log |

Citation

If this work is helpful for your research, please consider citing the following BibTeX entry.

@misc{chen2021fast,
  title={FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation}, 
  author={Zhe Chen and Jiahao Wang and Wenhai Wang and Guo Chen and Enze Xie and Ping Luo and Tong Lu},
  year={2021},
  eprint={2111.02394},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

License

This project is released under the Apache 2.0 license.
