masao-taketani / fots_ocr

TensorFlow Implementation of FOTS, Fast Oriented Text Spotting with a Unified Network.

License: GNU General Public License v3.0

ocr tensorflow scene-text-recognition deep-learning computer-vision image-recognition

fots_ocr's Introduction

FOTS: Fast Oriented Text Spotting with a Unified Network

I am still working on this repo. Updates and detailed instructions are coming soon!

Table of Contents

  • TensorFlow Versions
  • Other Requirements
  • Trained Models
  • Datasets
  • Train
  • Test
  • References

TensorFlow Versions

As of now, the pre-training code has been tested on TensorFlow 1.12, 1.14, and 1.15. I may implement a TensorFlow 2.x version in the future.
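
A quick way to confirm your environment matches one of the tested versions (a generic check, nothing repo-specific):

import tensorflow as tf
print(tf.__version__)  # expect 1.12.x, 1.14.x, or 1.15.x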

Other Requirements

GCC >= 6

Trained Models

Datasets

Train

Pre-train with SynthText

  1. Download the pre-trained ResNet-50 from the TensorFlow-Slim image classification model library page and place it in the ckpt/resnet_v1_50 directory.
cd ckpt/resnet_v1_50
wget http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz
tar -zxvf resnet_v1_50_2016_08_28.tar.gz
rm resnet_v1_50_2016_08_28.tar.gz
  2. Download the Synth800k dataset and place it in the data/SynthText/ directory to pre-train the whole net.

  3. Transform (pre-process) the SynthText data into the ICDAR data format (a sketch of what this conversion produces follows these steps).

python data_provider/SynthText2ICDAR.py
  4. Train with SynthText for 10 epochs (with 1 GPU).
python train.py \
  --max_steps=715625 \
  --gpu_list='0' \
  --checkpoint_path=ckpt/synthText_10eps/ \
  --pretrained_model_path=ckpt/resnet_v1_50/resnet_v1_50.ckpt \
  --training_img_data_dir=data/SynthText/ \
  --training_gt_data_dir=data/SynthText/ \
  --icdar=False
  5. Visualize pre-training progress with TensorBoard.
tensorboard --logdir=ckpt/synthText_10eps/
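
For reference, a minimal sketch of what the conversion in step 3 produces (this is not the repo's data_provider/SynthText2ICDAR.py; the gt.mat key names come from the SynthText release, and the output follows the ICDAR convention of one "x1,y1,x2,y2,x3,y3,x4,y4,transcription" line per word):

import numpy as np
import scipy.io as sio

gt = sio.loadmat("data/SynthText/gt.mat")
img_names, word_bbs, texts = gt["imnames"][0], gt["wordBB"][0], gt["txt"][0]

for name, bbs, txt in zip(img_names, word_bbs, texts):
    words = " ".join(t.strip() for t in txt).split()
    bbs = np.asarray(bbs).reshape(2, 4, -1)   # (2, 4, num_words), even for a single word
    lines = []
    for i, word in enumerate(words):
        quad = bbs[:, :, i].T.reshape(-1)     # x1,y1,...,x4,y4, clockwise from top-left
        lines.append(",".join(str(int(round(v))) for v in quad) + "," + word)
    # write `lines` to one ICDAR-style gt txt file per image (named after `name`)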

Finetune with ICDAR 2015, ICDAR 2017 MLT or ICDAR 2013

(If you are using the pre-trained model, place all of its files in ckpt/synthText_10eps/.)

  • Combine the ICDAR data before training (the resulting layout is sketched after this list).

    1. Place the ICDAR data under the tmp/ folder.
    2. Run the following script to combine the data.
    python combine_ICDAR_data.py --year [year of ICDAR to train (13, 15, or 17)]

  • ICDAR 2017 MLT / pre-finetuning for ICDAR 2013 or ICDAR 2015 (text detection task only)

    • Train the pre-trained model with 9,000 images from the ICDAR 2017 MLT training and validation datasets (with 1 GPU).
    python train.py \
      --gpu_list='0' \
      --checkpoint_path=ckpt/ICDAR17MLT/ \
      --pretrained_model_path=ckpt/synthText_10eps/ \
      --train_stage=0 \
      --training_img_data_dir=data/ICDAR17MLT/imgs/ \
      --training_gt_data_dir=data/ICDAR17MLT/gts/
    
  • ICDAR 2015

    • Train the model with 1,000 images from the ICDAR 2015 training dataset and 229 images from the ICDAR 2013 training dataset (with 1 GPU).
    python train.py \
      --gpu_list='0' \
      --checkpoint_path=ckpt/ICDAR15/ \
      --pretrained_model_path=ckpt/ICDAR17MLT/ \
      --training_img_data_dir=data/ICDAR15+13/imgs/ \
      --training_gt_data_dir=data/ICDAR15+13/gts/
    
  • ICDAR 2013 (horizontal text only)

    • Train the model with 229 images from the ICDAR 2013 training dataset (with 1 GPU).
    python train.py \
      --gpu_list='0' \
      --checkpoint_path=ckpt/ICDAR13/ \
      --pretrained_model_path=ckpt/ICDAR17MLT/ \
      --training_img_data_dir=data/ICDAR13/imgs/ \
      --training_gt_data_dir=data/ICDAR13/gts/
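
Based on the flags in the commands above (and not verified against combine_ICDAR_data.py), the combined data is expected to end up in a layout roughly like:

data/
  ICDAR17MLT/
    imgs/    (training + validation images)
    gts/     (ICDAR-format gt_*.txt files)
  ICDAR15+13/
    imgs/
    gts/
  ICDAR13/
    imgs/
    gts/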
    

Test

Place some images in the test_imgs/ directory and specify a trained checkpoint path to see the test results.

python test.py --test_data_path test_imgs/ --checkpoint_path [checkpoint path]

References

fots_ocr's People

Contributors

masao-taketani, yu20103983


fots_ocr's Issues

How to choose lambda correctly?

Hi, why did you choose to set lambda to 1 when calculating the total loss?
I know that in the FOTS paper they set lambda to 1, but in other FOTS repos the value is often 0.01.
Do you know what ranges the values of the recognition loss and the detection loss typically fall in?
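
For reference, the total objective in FOTS is a simple weighted sum of the two branch losses, and the paper sets the weight to 1. A minimal sketch (the variable names below are illustrative, not this repo's):

total_loss = detection_loss + lambda_recog * recognition_loss  # FOTS paper: lambda_recog = 1; some reimplementations use 0.01 to balance the two terms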

Recognition Error

  1. We are adding Korean characters to config.py in order to do Korean OCR. Does FOTS support multi-language features?

  2. Even though we completed the recognition test by following the GitHub README, there seems to be a problem in detection, and we don't know what is causing the error.

How to support ICDAR 2017

Hi, I like your project and I trained it on ICDAR 2017, but I ran into a problem.

reading file error: data\ICDAR17MLT\gts\gt_ICDAR17MLT_img_199.txt
ثوار
substring not found

The ICDAR 2017 data contains multiple languages, such as Arabic and Korean. I know I should probably modify config.py, but I'm not clear on how to do it. Could you please give me any suggestions?
Thanks a lot.
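
One common workaround for this class of error (a sketch only; CHAR_VECTOR and encode_label are hypothetical names, not necessarily what this repo's config.py uses): either extend the character set with the extra scripts, or skip words containing unsupported characters so label encoding treats them as "don't care" instead of raising "substring not found" from str.index().

CHAR_VECTOR = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"  # extend with Arabic/Korean characters to recognize them

def encode_label(text):
    # Words with characters outside the charset are returned as None so the
    # data loader can mark them as ignored regions rather than crashing.
    if any(c not in CHAR_VECTOR for c in text):
        return None
    return [CHAR_VECTOR.index(c) for c in text]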

Restore error

When I try to restore training using the previous checkpoint, I get the following error:

continue training from previous checkpoint
ERROR:tensorflow:Couldn't match files for checkpoint [my checkpoint path]/model.ckpt-20041
E1126 14:28:09.314913 139785224988480 checkpoint_management.py:346] Couldn't match files for checkpoint [my checkpoint path]/model.ckpt-20041
...
File "train.py", line 216, in main
saver.restore(sess, ckpt)
File "[python path]/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1277, in restore
raise ValueError("Can't load save_path when it is None.")
ValueError: Can't load save_path when it is None.

How can I fix it? If I check the checkpoint directory after training is complete, only checkpoint, events.out.tfevents.1606212446, model.ckpt-20041.data-00000-of-00001, model.ckpt-20041.index, and model.ckpt-20041.meta are created. Even when an absolute path is used for the first training run, multiple checkpoint files are created but are not restored.

Or am I misusing the following?
python train.py --gpu_list='0' --checkpoint_path=[check point path (absolute path)]/ --training_img_data_dir=[images path]/ --training_gt_data_dir=[gts path]/
(I set FLAG.Restore to True in train.py)
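
For what it's worth, the "Couldn't match files for checkpoint" warning usually means the path recorded inside the checkpoint directory's plain-text checkpoint file no longer matches the files on disk (for example after moving the directory or mixing relative and absolute paths), so tf.train.latest_checkpoint() returns None and saver.restore() receives None. A diagnostic sketch using standard TF 1.x APIs (not code from this repo):

import tensorflow as tf

ckpt_dir = "ckpt/ICDAR15/"  # example; use your own checkpoint_path

# Shows the model_checkpoint_path recorded in the plain-text `checkpoint` file.
print(tf.train.get_checkpoint_state(ckpt_dir))

# None if the recorded path does not resolve to existing .index/.data files.
# Editing the `checkpoint` file so it points at the existing model.ckpt-20041
# (with a path that is valid from where train.py runs) lets restore proceed.
print(tf.train.latest_checkpoint(ckpt_dir))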

CTC Loss label length is zero with ICDAR 2013

My error is

2020-11-15 22:28:43.704077: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at ctc_loss_op.cc:168 : Invalid argument: Labels length is zero in batch 25
Traceback (most recent call last):
File "/home/nplab6/anaconda3/envs/tf115-cu100/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/home/nplab6/anaconda3/envs/tf115-cu100/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/home/nplab6/anaconda3/envs/tf115-cu100/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Labels length is zero in batch 25
[[{{node CTCLoss}}]]
(1) Invalid argument: Labels length is zero in batch 25
[[{{node CTCLoss}}]]
[[gradients/Mean_3_grad/Shape/_595]]
0 successful operations.
0 derived errors ignored.

This error ("Labels length is zero") occurs when using ICDAR 2013.

An ICDAR 2013 ground-truth line for recognition training looks like 72,271,382,312,"Greenstead" (four coordinate values),

but ICDAR 2015 and 2017 lines have eight coordinate values.

Why is the format different between ICDAR 2013 and ICDAR 2015/2017?
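
ICDAR 2013 ground truth is axis-aligned (left, top, right, bottom), while ICDAR 2015/2017 use four clockwise corner points. A minimal sketch (hypothetical helper, not from this repo) of expanding the 4-value format to the 8-value format; a likely source of the zero-length labels is a parser that expects eight coordinates and therefore ends up with empty transcriptions for ICDAR 2013 lines.

def rect_to_quad(left, top, right, bottom):
    # Expand an axis-aligned ICDAR 2013 box to the 8-value clockwise
    # quadrilateral used by ICDAR 2015/2017: x1,y1,x2,y2,x3,y3,x4,y4.
    return [left, top, right, top, right, bottom, left, bottom]

# Example: 72,271,382,312,"Greenstead" -> 72,271,382,271,382,312,72,312,"Greenstead"
print(rect_to_quad(72, 271, 382, 312))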

tf version

Using your pretrained model and restoring the weights, I get the error: Key Conv/biases/ExponentialMovingAverage not found in checkpoint [[node save/RestoreV2
Which version of TensorFlow are you using?
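
The missing ExponentialMovingAverage key usually means the restore graph is requesting EMA shadow variables that the checkpoint was saved without (or variable naming changed across TF versions). A sketch of the usual TF 1.x pattern (the decay value is illustrative, not necessarily this repo's):

import tensorflow as tf

# Map checkpoint EMA shadow names onto the graph's model variables when the
# checkpoint contains them; fall back to a plain Saver when it does not.
variable_averages = tf.train.ExponentialMovingAverage(0.997)
saver = tf.train.Saver(variable_averages.variables_to_restore())
# saver = tf.train.Saver()  # use this instead if the checkpoint has no EMA keys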
