MASTER-TensorFlow

A TensorFlow reimplementation of "MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" (Pattern Recognition 2021). This project differs from our original implementation, which builds on FastOCR, the company's private codebase. A PyTorch reimplementation is also available in the MASTER-pytorch repository, and its performance is almost identical. (PS: the logo is inspired by Master Oogway in Kung Fu Panda.)

News

  • 2021/07: MASTER-mmocr, a reimplementation of MASTER in mmocr. @Jiaquan Ye
  • 2021/07: TableMASTER-mmocr, the 2nd-place solution of the ICDAR 2021 Competition on Scientific Literature Parsing Task B, based on MASTER. @Jiaquan Ye
  • 2021/07: The talk can be found here (in Chinese).
  • 2021/05: Savior, which aims to provide a simple, lightweight, fast-to-integrate, pipelined deployment framework for RPA, now integrates MASTER for captcha recognition. @Tao Luo
  • 2021/04: The slides can be found here.

Honors based on MASTER

Introduction

MASTER is a self-attention based scene text recognizer that (1) encodes not only the input-output attention but also self-attention, which captures feature-feature and target-target relationships inside the encoder and decoder, (2) learns an intermediate representation that is more powerful and more robust to spatial distortion, and (3) offers better training and evaluation efficiency. The overall architecture is shown below.
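To make the distinction between self-attention and input-output attention concrete, here is a minimal pure-Python sketch of plain scaled dot-product attention. It is an illustration only, not the code in this repo: it leaves out the learned projections, multiple heads, and the multi-aspect non-local block that MASTER actually uses, and the toy vectors are made up.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    scale = math.sqrt(len(keys[0]))
    output = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Each output row is a weighted average of the value vectors.
        output.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return output


# Self-attention: queries, keys and values all come from the same sequence
# (feature-feature in the encoder, target-target in the decoder).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
self_attended = attention(features, features, features)

# Input-output attention: decoder positions query the encoder features.
targets = [[0.5, 0.5]]
cross_attended = attention(targets, features, features)
```

In both cases the output has one row per query, so self-attention preserves the sequence length while input-output attention produces one context vector per target position.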

This repo contains the following features.

  • Multi-gpu Training
  • Greedy Decoding
  • Single image inference
  • Evaluation on IIIT5K
  • Convert Checkpoint to SavedModel format
  • Refactored code to be more TensorFlow-style and more consistent with graph mode
  • TensorFlow Serving support
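For readers unfamiliar with greedy decoding, the idea can be sketched in a few lines of plain Python. This is a simplified illustration and not the repo's decoder: `step_fn`, the toy vocabulary, and the token ids are all hypothetical stand-ins for the real model and its character set.

```python
def greedy_decode(step_fn, sos_id, eos_id, max_len):
    """Pick the highest-scoring token at every step until EOS or max_len."""
    tokens = [sos_id]
    for _ in range(max_len):
        scores = step_fn(tokens)  # one score per vocabulary entry
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the start token


# Hypothetical toy "model" that always spells c-a-t and then stops.
VOCAB = ["<sos>", "<eos>", "c", "a", "t"]
SCRIPT = [2, 3, 4, 1]  # ids of c, a, t, <eos>


def toy_step(prefix):
    target = SCRIPT[len(prefix) - 1]
    return [1.0 if i == target else 0.0 for i in range(len(VOCAB))]


decoded = greedy_decode(toy_step, sos_id=0, eos_id=1, max_len=10)
text = "".join(VOCAB[i] for i in decoded)
```

Greedy decoding keeps only the single best token at each step, which makes it cheap but means it cannot recover from an early mistake the way a beam search could.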

Preparation

Installing tensorflow-gpu with conda is highly recommended.

Python 3.7 is preferred.

pip install -r requirements.txt

Dataset

I use Clovaai's MJ training split for training.

Please check src/dataset/benchmark_data_generator.py for details.

The evaluation datasets are real scene text datasets. You can download them directly from here.
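As a rough orientation for the data format: to my knowledge, the LMDB files released with Clovaai's benchmark store a `num-samples` entry plus one `image-%09d` and one `label-%09d` entry per sample, with indices starting at 1. The sketch below assumes that layout and the third-party `lmdb` package. The helper names are hypothetical, and src/dataset/benchmark_data_generator.py remains the reference for what this repo actually does.

```python
NUM_SAMPLES_KEY = b"num-samples"  # assumed key holding the dataset size


def sample_keys(index):
    """Return the (image_key, label_key) pair for a 1-based sample index."""
    if index < 1:
        raise ValueError("sample indices start at 1")
    return (
        f"image-{index:09d}".encode(),
        f"label-{index:09d}".encode(),
    )


def read_sample(lmdb_dir, index):
    """Read one raw (image_bytes, label) pair from an LMDB directory."""
    import lmdb  # imported lazily so sample_keys works without the package

    env = lmdb.open(lmdb_dir, readonly=True, lock=False,
                    readahead=False, meminit=False)
    try:
        with env.begin(write=False) as txn:
            image_key, label_key = sample_keys(index)
            image_bytes = txn.get(image_key)  # still encoded (e.g. JPEG/PNG)
            label = txn.get(label_key).decode("utf-8")
    finally:
        env.close()
    return image_bytes, label
```

Calling `read_sample("path/to/lmdb", 1)` would return the encoded image bytes and the transcript of the first sample, provided the directory follows the assumed layout.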

Training

# training from scratch
python train.py -c [your_config].yaml

# resume training from last checkpoint
python train.py -c [your_config].yaml -r

# finetune with some checkpoint
python train.py -c [your_config].yaml -f [checkpoint]

Eval

Because I changed how the GCB block is used, the released weights may not be compatible with HEAD. If you want to test the model, please use commit https://github.com/jiangxiluning/MASTER-TF/commit/85f9217af8697e41aefe5121e580efa0d6d04d92

Currently, you can download the checkpoint from here with code o6g9, or from Google Drive. This checkpoint was trained on MJ and selected for its best performance on the IIIT5K dataset. Below is a comparison between the PyTorch version and the TensorFlow version.

Framework    Dataset   Word Accuracy   Training Details
PyTorch      MJ        85.05%          3 × V100, 4 epochs, batch size 3 * 128
TensorFlow   MJ        85.53%          2 × 2080 Ti, 4 epochs, batch size 2 * 50
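Word accuracy here means the fraction of images whose predicted transcript matches the ground truth exactly. A minimal sketch of that metric is shown below. Case-insensitive matching is an assumption on my part (it is common for scene text benchmarks), and the normalization in eval_iiit5k.py may differ.

```python
def word_accuracy(predictions, references, case_sensitive=False):
    """Fraction of predictions that exactly match their reference word."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = 0
    for pred, ref in zip(predictions, references):
        if not case_sensitive:
            pred, ref = pred.lower(), ref.lower()
        correct += pred == ref  # a single wrong character fails the word
    return correct / len(references)


# Made-up example transcripts, not real model output.
preds = ["hello", "World", "cat"]
refs = ["hello", "world", "cab"]
insensitive = word_accuracy(preds, refs)
sensitive = word_accuracy(preds, refs, case_sensitive=True)
```

With these toy inputs two of three words match when case is ignored and one of three when it is not, which shows how much the normalization choice can move the reported number.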

Please download the checkpoint and model config from here with code o6g9 and unzip the archive. You can then reproduce this metric by running:

python eval_iiit5k.py --ckpt [checkpoint file] --cfg [model config] -o [output dir] -i [iiit5k lmdb test dataset]

The checkpoint file argument should be ${where you unzip}/backup/512_8_3_3_2048_2048_0.2_0_Adam_mj_my/checkpoints/OCRTransformer-Best

Tensorflow Serving

For TensorFlow Serving, you should use the SavedModel format. I provide test cases that show how to convert a checkpoint to a SavedModel and how to use it.

pytest -s tests/test_units::test_savedModel  # see the test case test_savedModel in tests/test_units
pytest -s tests/test_units::test_loadModel  # calls decode to run inference and get the predicted transcript and logits
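Once a SavedModel is loaded by a TensorFlow Serving instance, it can be queried through the standard REST predict endpoint (`/v1/models/<name>:predict`). The sketch below only builds and sends such a request with the Python standard library. The model name `master`, the input layout (a nested list of pixel values), and any signature name are assumptions; check the exported signature of your SavedModel before relying on them.

```python
import json
import urllib.request


def build_predict_request(image, model_name, host="localhost", port=8501,
                          signature_name=None):
    """Build the URL and JSON body for a TensorFlow Serving REST predict call."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    payload = {"instances": [image]}  # row format: one entry per example
    if signature_name is not None:
        payload["signature_name"] = signature_name
    return url, json.dumps(payload).encode("utf-8")


def send_predict_request(url, body, timeout=10.0):
    """POST the request and return the decoded JSON response."""
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Hypothetical 2x2 RGB image; a real input must match the exported signature.
dummy_image = [[[0, 0, 0], [255, 255, 255]],
               [[255, 255, 255], [0, 0, 0]]]
url, body = build_predict_request(dummy_image, model_name="master")
```

Calling `send_predict_request(url, body)` against a running server returns a dictionary whose `predictions` field holds the model outputs. The builder is kept separate from the sender so the payload can be inspected without a server.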

Citations

If you find MASTER useful, please cite our paper:

@article{Lu2021MASTER,
  title={{MASTER}: Multi-Aspect Non-local Network for Scene Text Recognition},
  author={Ning Lu and Wenwen Yu and Xianbiao Qi and Yihao Chen and Ping Gong and Rong Xiao and Xiang Bai},
  journal={Pattern Recognition},
  year={2021}
}

License

This project is licensed under the MIT License. See LICENSE for more details.

Acknowledgements

Thanks to the authors and their repo:
