masao-taketani / fots_ocr

TensorFlow Implementation of FOTS, Fast Oriented Text Spotting with a Unified Network.

License: GNU General Public License v3.0

ocr tensorflow scene-text-recognition deep-learning computer-vision image-recognition

fots_ocr's Introduction

FOTS: Fast Oriented Text Spotting with a Unified Network

I am still working on this repo. Updates and detailed instructions are coming soon!

Table of Contents

  • TensorFlow Versions
  • Other Requirements
  • Trained Models
  • Datasets
  • Train
  • Test
  • References

TensorFlow Versions

As of now, the pre-training code has been tested on TensorFlow 1.12, 1.14, and 1.15. I may implement a TensorFlow 2.x version in the future.
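
A quick way to confirm your environment matches one of the tested versions (a generic check, nothing repo-specific):

import tensorflow as tf
print(tf.__version__)  # expect 1.12.x, 1.14.x, or 1.15.x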

Other Requirements

GCC >= 6

Trained Models

Datasets

Train

Pre-train with SynthText

  1. Download the pre-trained ResNet-50 from the TensorFlow-Slim image classification model library page and place it in the ckpt/resnet_v1_50 directory.
cd ckpt/resnet_v1_50
wget http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz
tar -zxvf resnet_v1_50_2016_08_28.tar.gz
rm resnet_v1_50_2016_08_28.tar.gz
  2. Download the Synth800k dataset and place it in the data/SynthText/ directory to pre-train the whole net.

  3. Transform (pre-process) the SynthText data into the ICDAR data format (a sketch of what this conversion produces follows these steps).

python data_provider/SynthText2ICDAR.py
  4. Train with SynthText for 10 epochs (with 1 GPU).
python train.py \
  --max_steps=715625 \
  --gpu_list='0' \
  --checkpoint_path=ckpt/synthText_10eps/ \
  --pretrained_model_path=ckpt/resnet_v1_50/resnet_v1_50.ckpt \
  --training_img_data_dir=data/SynthText/ \
  --training_gt_data_dir=data/SynthText/ \
  --icdar=False
  5. Visualize pre-training progress with TensorBoard.
tensorboard --logdir=ckpt/synthText_10eps/
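
For reference, a minimal sketch of what the conversion in step 3 produces (this is not the repo's data_provider/SynthText2ICDAR.py; the gt.mat key names come from the SynthText release, and the output follows the ICDAR convention of one "x1,y1,x2,y2,x3,y3,x4,y4,transcription" line per word):

import numpy as np
import scipy.io as sio

gt = sio.loadmat("data/SynthText/gt.mat")
img_names, word_bbs, texts = gt["imnames"][0], gt["wordBB"][0], gt["txt"][0]

for name, bbs, txt in zip(img_names, word_bbs, texts):
    words = " ".join(t.strip() for t in txt).split()
    bbs = np.asarray(bbs).reshape(2, 4, -1)   # (2, 4, num_words), even for a single word
    lines = []
    for i, word in enumerate(words):
        quad = bbs[:, :, i].T.reshape(-1)     # x1,y1,...,x4,y4, clockwise from top-left
        lines.append(",".join(str(int(round(v))) for v in quad) + "," + word)
    # write `lines` to one ICDAR-style gt txt file per image (named after `name`)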

Finetune with ICDAR 2015, ICDAR 2017 MLT or ICDAR 2013

(If you are using the pre-trained model, place all of its files in ckpt/synthText_10eps/.)

  • Combine the ICDAR data before training (the resulting layout is sketched after this list).

    1. Place the ICDAR data under the tmp/ folder.
    2. Run the following script to combine the data.
    python combine_ICDAR_data.py --year [year of ICDAR to train (13, 15, or 17)]

  • ICDAR 2017 MLT / pre-finetuning for ICDAR 2013 or ICDAR 2015 (text detection task only)

    • Train the pre-trained model with 9,000 images from the ICDAR 2017 MLT training and validation datasets (with 1 GPU).
    python train.py \
      --gpu_list='0' \
      --checkpoint_path=ckpt/ICDAR17MLT/ \
      --pretrained_model_path=ckpt/synthText_10eps/ \
      --train_stage=0 \
      --training_img_data_dir=data/ICDAR17MLT/imgs/ \
      --training_gt_data_dir=data/ICDAR17MLT/gts/
    
  • ICDAR 2015

    • Train the model with 1,000 images from the ICDAR 2015 training dataset and 229 images from the ICDAR 2013 training dataset (with 1 GPU).
    python train.py \
      --gpu_list='0' \
      --checkpoint_path=ckpt/ICDAR15/ \
      --pretrained_model_path=ckpt/ICDAR17MLT/ \
      --training_img_data_dir=data/ICDAR15+13/imgs/ \
      --training_gt_data_dir=data/ICDAR15+13/gts/
    
  • ICDAR 2013 (horizontal text only)

    • Train the model with 229 images from the ICDAR 2013 training dataset (with 1 GPU).
    python train.py \
      --gpu_list='0' \
      --checkpoint_path=ckpt/ICDAR13/ \
      --pretrained_model_path=ckpt/ICDAR17MLT/ \
      --training_img_data_dir=data/ICDAR13/imgs/ \
      --training_gt_data_dir=data/ICDAR13/gts/
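
Based on the flags in the commands above (and not verified against combine_ICDAR_data.py), the combined data is expected to end up in a layout roughly like:

data/
  ICDAR17MLT/
    imgs/    (training + validation images)
    gts/     (ICDAR-format gt_*.txt files)
  ICDAR15+13/
    imgs/
    gts/
  ICDAR13/
    imgs/
    gts/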
    

Test

Place some images in the test_imgs/ directory and specify a trained checkpoint path to see the test results.

python test.py --test_data_path test_imgs/ --checkpoint_path [checkpoint path]

References

fots_ocr's People

Contributors

masao-taketani, yu20103983


fots_ocr's Issues

How to choose lambda correctly?

Hi, why did you choose to set lambda to 1 when calculating the total loss?
I know that in the FOTS paper they set lambda to 1, but in other FOTS repos the value is often 0.01.
Do you know what ranges the values of the recognition loss and the detection loss typically fall in?
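
For reference, the total objective in FOTS is a simple weighted sum of the two branch losses, and the paper sets the weight to 1. A minimal sketch (the variable names below are illustrative, not this repo's):

total_loss = detection_loss + lambda_recog * recognition_loss  # FOTS paper: lambda_recog = 1; some reimplementations use 0.01 to balance the two terms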

Recognition Error

  1. We are adding Korean characters to config.py in order to do Korean OCR. Does FOTS support multi-language features?

  2. Even though we completed the recognition test by following the GitHub README, there seems to be a problem in detection, and we don't know what is causing the error.

How to support ICDAR 2017

Hi, I like your project and I trained it on ICDAR 2017, but I ran into a problem.

reading file error: data\ICDAR17MLT\gts\gt_ICDAR17MLT_img_199.txt
ثوار
substring not found

The ICDAR 2017 data contains multiple languages, such as Arabic and Korean. I know I should probably modify config.py, but I'm not clear on how to do it. Could you please give me any suggestions?
Thanks a lot.
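
One common workaround for this class of error (a sketch only; CHAR_VECTOR and encode_label are hypothetical names, not necessarily what this repo's config.py uses): either extend the character set with the extra scripts, or skip words containing unsupported characters so label encoding treats them as "don't care" instead of raising "substring not found" from str.index().

CHAR_VECTOR = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"  # extend with Arabic/Korean characters to recognize them

def encode_label(text):
    # Words with characters outside the charset are returned as None so the
    # data loader can mark them as ignored regions rather than crashing.
    if any(c not in CHAR_VECTOR for c in text):
        return None
    return [CHAR_VECTOR.index(c) for c in text]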

Restore error

When I try to restore training using the previous checkpoint, I get the following error:

continue training from previous checkpoint
ERROR:tensorflow:Couldn't match files for checkpoint [my checkpoint path]/model.ckpt-20041
E1126 14:28:09.314913 139785224988480 checkpoint_management.py:346] Couldn't match files for checkpoint [my checkpoint path]/model.ckpt-20041
...
File "train.py", line 216, in main
saver.restore(sess, ckpt)
File "[python path]/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1277, in restore
raise ValueError("Can't load save_path when it is None.")
ValueError: Can't load save_path when it is None.

How can I fix it? If I check the checkpoint directory after training is complete, only checkpoint, events.out.tfevents.1606212446, model.ckpt-20041.data-00000-of-00001, model.ckpt-20041.index, and model.ckpt-20041.meta are created. Even when an absolute path is used for the first training run, multiple checkpoint files are created but are not restored.

Or am I misusing the following?
python train.py --gpu_list='0' --checkpoint_path=[check point path (absolute path)]/ --training_img_data_dir=[images path]/ --training_gt_data_dir=[gts path]/
(I set FLAG.Restore to True in train.py)
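
For what it's worth, the "Couldn't match files for checkpoint" warning usually means the path recorded inside the checkpoint directory's plain-text checkpoint file no longer matches the files on disk (for example after moving the directory or mixing relative and absolute paths), so tf.train.latest_checkpoint() returns None and saver.restore() receives None. A diagnostic sketch using standard TF 1.x APIs (not code from this repo):

import tensorflow as tf

ckpt_dir = "ckpt/ICDAR15/"  # example; use your own checkpoint_path

# Shows the model_checkpoint_path recorded in the plain-text `checkpoint` file.
print(tf.train.get_checkpoint_state(ckpt_dir))

# None if the recorded path does not resolve to existing .index/.data files.
# Editing the `checkpoint` file so it points at the existing model.ckpt-20041
# (with a path that is valid from where train.py runs) lets restore proceed.
print(tf.train.latest_checkpoint(ckpt_dir))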

CTC Loss label length is zero with ICDAR 2013

My error is

2020-11-15 22:28:43.704077: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at ctc_loss_op.cc:168 : Invalid argument: Labels length is zero in batch 25
Traceback (most recent call last):
File "/home/nplab6/anaconda3/envs/tf115-cu100/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/home/nplab6/anaconda3/envs/tf115-cu100/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/home/nplab6/anaconda3/envs/tf115-cu100/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Labels length is zero in batch 25
[[{{node CTCLoss}}]]
(1) Invalid argument: Labels length is zero in batch 25
[[{{node CTCLoss}}]]
[[gradients/Mean_3_grad/Shape/_595]]
0 successful operations.
0 derived errors ignored.

This error ("Labels length is zero") occurs when using ICDAR 2013.

An ICDAR 2013 ground-truth line for recognition training looks like 72,271,382,312,"Greenstead" (four coordinate values),

but ICDAR 2015 and 2017 lines have eight coordinate values.

Why is the format different between ICDAR 2013 and ICDAR 2015/2017?
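
ICDAR 2013 ground truth is axis-aligned (left, top, right, bottom), while ICDAR 2015/2017 use four clockwise corner points. A minimal sketch (hypothetical helper, not from this repo) of expanding the 4-value format to the 8-value format; a likely source of the zero-length labels is a parser that expects eight coordinates and therefore ends up with empty transcriptions for ICDAR 2013 lines.

def rect_to_quad(left, top, right, bottom):
    # Expand an axis-aligned ICDAR 2013 box to the 8-value clockwise
    # quadrilateral used by ICDAR 2015/2017: x1,y1,x2,y2,x3,y3,x4,y4.
    return [left, top, right, top, right, bottom, left, bottom]

# Example: 72,271,382,312,"Greenstead" -> 72,271,382,271,382,312,72,312,"Greenstead"
print(rect_to_quad(72, 271, 382, 312))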

tf version

Using your pretrained model and restoring the weights, I get the error: Key Conv/biases/ExponentialMovingAverage not found in checkpoint [[node save/RestoreV2
Which version of TensorFlow are you using?
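
The missing ExponentialMovingAverage key usually means the restore graph is requesting EMA shadow variables that the checkpoint was saved without (or variable naming changed across TF versions). A sketch of the usual TF 1.x pattern (the decay value is illustrative, not necessarily this repo's):

import tensorflow as tf

# Map checkpoint EMA shadow names onto the graph's model variables when the
# checkpoint contains them; fall back to a plain Saver when it does not.
variable_averages = tf.train.ExponentialMovingAverage(0.997)
saver = tf.train.Saver(variable_averages.variables_to_restore())
# saver = tf.train.Saver()  # use this instead if the checkpoint has no EMA keys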
