
ssd-text_detection's Introduction

SSD-text detection: Text Detector

This is a modified SSD model for text detection.

Compared to Faster R-CNN, SSD is much faster. In my experiments, SSD needs only about 0.05 s per image.

Disclaimer

This is a re-implementation of MXNet SSD. The official repository is available here. The arXiv paper is available here.

Getting started

  • Build MXNet: Make sure the extra operators for this example are enabled, and follow the official instructions here.

Train the model

I adapted the original SSD to train on SynthText and ICDAR. Other datasets can be supported by adding a subclass derived from the class Imdb in dataset/imdb.py. See dataset/pascal_voc.py for an example.
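As a rough illustration of what such a subclass might look like, here is a minimal sketch. The stand-in Imdb base class, its attribute names, and the label row layout [class_id, xmin, ymin, xmax, ymax] are assumptions about the interface, and MyTextDataset is a hypothetical name; check dataset/imdb.py and dataset/pascal_voc.py for the real contract before relying on it.

```python
import os


class Imdb(object):
    """Stand-in for the base class in dataset/imdb.py (interface assumed)."""

    def __init__(self, name):
        self.name = name
        self.classes = []
        self.num_classes = 0
        self.image_set_index = []
        self.num_images = 0
        self.labels = []

    def image_path_from_index(self, index):
        raise NotImplementedError

    def label_from_index(self, index):
        raise NotImplementedError


class MyTextDataset(Imdb):
    """Hypothetical dataset with a single 'text' class."""

    def __init__(self, root, annotations):
        # annotations: {image file name: [(xmin, ymin, xmax, ymax), ...]}
        # Box coordinates are assumed to be normalized to [0, 1].
        super(MyTextDataset, self).__init__("my_text")
        self.root = root
        self.classes = ["text"]
        self.num_classes = len(self.classes)
        self.image_set_index = sorted(annotations)
        self.num_images = len(self.image_set_index)
        # One row per box: [class_id, xmin, ymin, xmax, ymax].
        self.labels = [
            [[0.0] + [float(v) for v in box] for box in annotations[name]]
            for name in self.image_set_index
        ]

    def image_path_from_index(self, index):
        return os.path.join(self.root, self.image_set_index[index])

    def label_from_index(self, index):
        return self.labels[index]
```

In the real repository the subclass would import Imdb from dataset/imdb.py instead of redefining it, and would load annotations from the dataset's own ground-truth files.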

  • Download the converted pretrained vgg16_reduced model here, and unzip the .param and .json files into the model/ directory (the default location).

For good performance, first train the model on SynthText, which is a large dataset (about 40 GB), and then fine-tune it on ICDAR. To apply the model to other applications, you can fine-tune it on any dataset.

  • Download the SynthText dataset here, and extract it into data.

SSD needs the size of every image, but SynthText is so large that reading each image with OpenCV to get its size at the start of every training run would take too long. So I use 'read_size.py' (data/synthtext_img_size) to create an h5py file, 'size.h5', that stores the sizes of all images. Copy this file to the extracted folder 'SynthText'.

  • Start training:
python train_synthtext.py

Fine tune the model

  • Download the ICDAR challenge 2 dataset here, and extract it into data.

  • Start training:

python train_icdar.py --finetune N

Replace 'N' with the integer that identifies the saved model from your SynthText training run.

Try the demo

  • After training, you can try your model on test images. I provide two demos (demo.py and demo_savefig.py). demo.py visualizes the detection results, while demo_savefig.py saves them as images.


  • Run demo.py
# play with examples:
python demo.py --epoch 0 --images ./data/demo/test.jpg --thresh 0.5
  • Check python demo.py --help for more options.
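To illustrate what a threshold such as --thresh 0.5 typically does, here is a minimal sketch of score filtering. It assumes each detection is a row of the form [class_id, score, xmin, ymin, xmax, ymax] with a negative class id marking an empty slot; that layout and the function name are assumptions, not taken from demo.py.

```python
def filter_detections(detections, thresh=0.5):
    """Keep detections whose confidence score is above thresh."""
    kept = []
    for det in detections:
        class_id, score = det[0], det[1]
        # A negative class id is assumed to mark a suppressed or empty slot.
        if class_id >= 0 and score > thresh:
            kept.append(det)
    return kept
```

Raising the threshold shows fewer, more confident boxes; lowering it shows more boxes at the cost of more false positives.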

When running demo_savefig.py, pass the path to the folder of test images with --images.

  • Run demo_savefig.py
# play with examples:
python demo_savefig.py --epoch 0 --images ./data/demo/test --thresh 0.5

Contributors

oyxhust
