
fots_tf's Introduction

Fast Oriented Text Spotting with a Unified Network

Update

[08/17/2019] A new version has been released; please check out the 'dev' branch (link).

Introduction

This is an implementation of FOTS: Fast Oriented Text Spotting with a Unified Network

Install

  • Python 2
  • TensorFlow
  • OpenCV

Model

A model pretrained on Synth800k for 6 epochs and fine-tuned on ICDAR15: BaiduYun link (key: 0aky) or GitHub link (thanks to harish2704). If you encounter problems, refer to #16.

Train

python2 multigpu_train.py --gpu_list=gpu_id --training_data_path=/path/to/trainset/

Line 824 in icdar.py should also be changed to point to the path of your annotation files.
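For reference, the change usually just points the ground-truth lookup at your own annotation files. Below is a hypothetical sketch of that kind of helper; the actual variable and function names around line 824 of icdar.py may differ.

import os

# Hypothetical helper illustrating the usual ICDAR naming convention:
# an image img_1.jpg has an annotation file gt_img_1.txt.
def get_gt_path(image_path, gt_dir='/path/to/annotations/'):
    base = os.path.splitext(os.path.basename(image_path))[0]
    return os.path.join(gt_dir, 'gt_' + base + '.txt')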

Test

python2 eval.py --gpu_list=gpu_id --test_data_path=/path/to/testset/ --checkpoint_path=checkpoints/

Examples

(Example detection result images image_1, image_2, and image_3 are shown in the original repository.)

Differences from paper

  • Without OHEM
  • Pretrained on Synth800k for 6 epochs instead of 10
  • Fine-tuned on ICDAR15 only without ICDAR2017 MLT
  • It currently reaches only an F-score of 56 on the ICDAR2015 test set; more training tricks are needed

Reference

FOTS: Fast Oriented Text Spotting with a Unified Network, Xuebo Liu et al., CVPR 2018.

fots_tf's People

Contributors

pay20y


fots_tf's Issues

About STN

Will the result of the STN be the same as cv2.warpAffine?

Problems training on my own data

Running a model trained on my own dataset, which has 7,321 classes (Chinese characters),
gives the output below:
1072 text boxes before nms
./demo_images/BizLicenseOCR2.jpg : net 410630ms, restore 1ms, nms 10ms
[timing] 410.6401104927063

very slow !!!

Question about the number of training steps

How many training steps would the author recommend when training on the ICDAR13-15 dataset (229 training images)? Training is also quite slow: on a GPU I can only run a few tens of thousands of steps overnight. What could be the reason?

Key not found in the checkpoint

Hi, when I run training I encounter this problem:

NOTFoundError:key recog/rnn/bidirectional_rnn/fw/lstm_cell/biases/exponentialMovingAverage not found in checkpoint

Can anyone guide me? Or is the checkpoint still usable?

loading and testing trained model

Hi
I have trained the model on custom data,
and I get these files in the checkpoints directory:

model.ckpt-28281.data-00000-of-00001
model.ckpt-28281.index
model.ckpt-28281.meta

but in eval.py, when I manually change the path on line 284 to one of these files, I get a not-found error or a data-loss error.

Am I doing something wrong here? Which of these files can I use for inference?

EDIT:

And even when the script runs, I get no predictions: the text files in the output folder are empty and there are no predictions drawn in the saved image.

Any ideas?

Thanks
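Not an answer from the maintainer, but a common cause of this error with TF 1.x checkpoints is passing one of the .data/.index/.meta files instead of the checkpoint prefix. A minimal sketch using the paths from this issue:

import tensorflow as tf

saver = tf.train.Saver()  # after the model graph has been built, as eval.py does
with tf.Session() as sess:
    # restore by prefix, not by the individual .data/.index/.meta files
    saver.restore(sess, 'checkpoints/model.ckpt-28281')
    # or pick the newest checkpoint recorded in checkpoints/checkpoint:
    # saver.restore(sess, tf.train.latest_checkpoint('checkpoints/'))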

Out Of Memory when training on images with many bounding boxes.

Hello,

First, thank you for your work on FOTS_TF; I can get pretty good results with it.
But when training on my own dataset, which has a lot of text in each image, OOM crashes occur.
I saw that in main_test.py there is a batch-size limit of 32 bounding boxes when running the recognition network.
Is there any way to apply a similar limit to recognition during training?
I know that I can reduce the batch size, but this bug forces me to use a batch size of 2 where I want 4 or more.
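A minimal sketch of the kind of chunking being asked about, mirroring the 32-box limit mentioned for main_test.py; run_recognition is a hypothetical stand-in for whatever runs the recognition branch, not code from this repository:

def recognize_in_chunks(rois, run_recognition, max_boxes=32):
    # Run the recognition branch on at most max_boxes RoIs at a time to cap
    # peak memory; results are concatenated in the original order.
    results = []
    for start in range(0, len(rois), max_boxes):
        results.extend(run_recognition(rois[start:start + max_boxes]))
    return results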

data = next(dg)  # the program stops responding at this step while debugging

Hi, I'm running main_train.py in debug mode with the ICDAR2015 dataset. From stepping through the code, the images load fine,

dg = data_generator.get_batch(input_images_dir=FLAGS.training_data_dir,
                              input_gt_dir=FLAGS.training_gt_data_dir,
                              num_workers=FLAGS.num_readers,
                              input_size=FLAGS.input_size,
                              batch_size=FLAGS.batch_size_per_gpu)
start = time.time()
for step in range(FLAGS.max_steps):
    data = next(dg)  # debugging stops responding at this step

Training SynthText

Hi
I am also trying to train on SynthText, but the problem I am facing is that SynthText gives word-level bounding-box annotations as 4 corner points, whereas ICDAR gives them as 8 values per box. How did you manage to solve this?

Thanks.
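A minimal sketch of one way to bridge the two formats, assuming SynthText's usual layout of a 2 x 4 corner array per word; this is not code from this repository:

import numpy as np

def wordbb_to_icdar_line(word_bb, text):
    # word_bb: shape (2, 4), row 0 = x coordinates, row 1 = y coordinates
    pts = np.asarray(word_bb).T.reshape(-1)  # x1,y1,x2,y2,x3,y3,x4,y4
    return ','.join(str(int(round(v))) for v in pts) + ',' + text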

About Finetune Dataset?

Hello, is your final model fine-tuned only on the 1,000 training images of ICDAR 2015, or on ICDAR 2015 plus the 229 ICDAR 2013 training images, as the paper says?

about box_widths in data_utils.py

I don't understand why you divide text_polyses by 4 (x1, y1, x2, y2, x3, y3, x4, y4 = text_polyses[i] / 4), or what this line of code does: width_box = math.ceil(8 * box_w / box_h). What is the meaning of 8 * box_w / box_h?
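A likely reading, not confirmed by the maintainer: the shared feature map is 1/4 of the input resolution, so polygon coordinates are divided by 4, and RoIRotate fixes the RoI height to 8 feature-map pixels, so the width is scaled to preserve the box's aspect ratio:

import math

def roi_width(box_w, box_h, roi_height=8):
    # width of the RoI on the shared feature map, keeping the aspect ratio
    # with a fixed height of 8
    return int(math.ceil(roi_height * box_w / box_h))

# e.g. a 128 x 32 box in the image becomes 32 x 8 on the 1/4-resolution
# feature map, so the RoI width stays 32
assert roi_width(128 / 4.0, 32 / 4.0) == 32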

Does the CTC loss affect the gradients of the segmentation CNN during training?

Does the CTC loss affect the gradients of the segmentation CNN during training? I could not find a clear answer to this question in the paper. If it does not, are the shared features somehow detached from the segmentation CNN during the RoI rotate computation, before going into the RNN? (I am not an expert in TensorFlow.)

Quick question regarding training annotation

Hi, I was wondering what annotation tool you recommend for labelling the training images. I've noticed the provided training examples are in .txt format; what do the values within the .txt files represent, and how can I recreate this for my own dataset? Thank you for your help.
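For reference, the sample .txt files appear to follow the ICDAR15-style convention: one text instance per line, the four corner coordinates listed clockwise from the top-left, then the transcription, with ### marking unreadable text. An illustrative line (values made up):

x1,y1,x2,y2,x3,y3,x4,y4,transcription
377,117,463,117,465,130,378,130,SHOP
374,155,409,155,409,170,374,170,###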

How to use the makefile on Windows 10?

I am a novice learning from the demo you provided. I don't know how to use the makefile in the lanms folder; can you help me? (I use Windows 10.)

Model freezing

Have you tried freezing the trained model from checkpoints to a .pb file (converting the .ckpt, .meta, and .index files inside the checkpoints folder into a frozen .pb model) in an end-to-end way?
Thank you in advance.
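Not tested on this repository, but the standard TF 1.x recipe is to restore the checkpoint, fold the variables into constants, and serialize the GraphDef. The output node names below are placeholders, not this repository's actual node names:

import tensorflow as tf

output_nodes = ['score_map', 'geometry_map', 'recognition_logits']  # placeholders
saver = tf.train.import_meta_graph('checkpoints/model.ckpt-28281.meta')
with tf.Session() as sess:
    saver.restore(sess, 'checkpoints/model.ckpt-28281')
    frozen = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, output_nodes)
    with tf.gfile.GFile('fots_frozen.pb', 'wb') as f:
        f.write(frozen.SerializeToString())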

Training data cannot be loaded correctly

I used the training set of the ICDAR2019 Chinese signboard recognition task; after preprocessing, its format matches train_samples. But when running the program I get the error at inp_dict = {input_images: data[0],
TypeError: 'NoneType' object is not subscriptable
I can't quite follow the generator code. My understanding is that the txt and jpg files are read, but no data is produced. Do you know roughly which part is going wrong?

The text recognition part of inference always gives wrong results

Hi author!
When running your source code I found the following problem: during eval, the text in the image is located reasonably well, but the recognition results are very poor, similar to a previous issue where the results were always characters like A and E.
I assumed the recognition branch's pretrained weights were the problem, so I continued training on the pretrained model with just three images; the recognition loss went down and the printed recognition results were correct.
Then I tested the retrained model on those same training images, and the recognition results still did not improve. The cnn_feature values also look normal when inspected separately, so I am quite confused and asking for your help.
Looking forward to your reply, thank you!

reason for vocab.txt

Hi

I need some clarification. Could you please tell me how the vocab.txt file is used?
Does the BK-tree do some sort of similarity search on the words in the file?
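Not the maintainer's answer, but the usual role of a vocabulary file plus a BK-tree in text spotting is lexicon correction: snap the raw recognition output to its nearest vocab.txt entry by edit distance, with the BK-tree only speeding up that nearest-neighbour search. A minimal brute-force sketch of the same idea:

def edit_distance(a, b):
    # classic Levenshtein DP, one row at a time
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def correct(word, vocab):
    # vocab: list of words read from vocab.txt
    return min(vocab, key=lambda v: edit_distance(word, v))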

Input to reshape is a tensor with 6 values, but the requested shape has 12

I am attempting to retrain the model but am receiving the error posted below.

I have tried using the ICDAR2015 training data to ensure the formatting is appropriate.
I'm using tensorflow-gpu 1.13.1 and CUDA 10.

Regarding the shape I feed in, the error shows 12, which reflects the annotation file having two lines; an annotation file with 4 lines would give shape 24.

Has anyone else had this issue, and any ideas on fixing it?

Thanks!

Traceback (most recent call last):
  File "multigpu_train.py", line 203, in <module>
    tf.app.run()
  File "/home/lookdeep/.venv/FOTS_TF_PY2/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "multigpu_train.py", line 169, in main
    dl, rl, tl,  _ = sess.run([d_loss, r_loss, total_loss, train_op], feed_dict=inp_dict)
  File "/home/lookdeep/.venv/FOTS_TF_PY2/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/lookdeep/.venv/FOTS_TF_PY2/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/lookdeep/.venv/FOTS_TF_PY2/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/lookdeep/.venv/FOTS_TF_PY2/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 6 values, but the requested shape has 12
	 [[node RoIrotate/Reshape (defined at /home/lookdeep/gits/FOTS_TF/module/stn/transformer.py:49) ]]
	 [[node transpose (defined at /home/lookdeep/gits/FOTS_TF/module/Recognition_branch.py:94) ]]

Caused by op u'RoIrotate/Reshape', defined at:
  File "multigpu_train.py", line 203, in <module>
    tf.app.run()
  File "/home/lookdeep/.venv/FOTS_TF_PY2/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "multigpu_train.py", line 81, in main
    f_score, f_geometry, recognition_logits = build_graph(input_images, input_transform_matrix, input_box_masks, input_box_widths, input_seq_len)
  File "multigpu_train.py", line 36, in build_graph
    pad_rois = roi_rotate_part.roi_rotate_tensor_pad(shared_feature, input_transform_matrix, input_box_masks, input_box_widths)
  File "/home/lookdeep/gits/FOTS_TF/module/RoI_rotate.py", line 99, in roi_rotate_tensor_pad
    trans_feature_map = transformer(tile_feature_maps, transform_matrixs)
  File "/home/lookdeep/gits/FOTS_TF/module/stn/transformer.py", line 49, in spatial_transformer_network
    theta = tf.reshape(theta, [B, 2, 3])
  File "/home/lookdeep/.venv/FOTS_TF_PY2/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 7179, in reshape
    "Reshape", tensor=tensor, shape=shape, name=name)
  File "/home/lookdeep/.venv/FOTS_TF_PY2/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/lookdeep/.venv/FOTS_TF_PY2/local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/lookdeep/.venv/FOTS_TF_PY2/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/lookdeep/.venv/FOTS_TF_PY2/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Input to reshape is a tensor with 6 values, but the requested shape has 12
	 [[node RoIrotate/Reshape (defined at /home/lookdeep/gits/FOTS_TF/module/stn/transformer.py:49) ]]
	 [[node transpose (defined at /home/lookdeep/gits/FOTS_TF/module/Recognition_branch.py:94) ]]
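For what it's worth, the arithmetic in the message matches the observation above: RoIRotate reshapes the fed affine matrices to [B, 2, 3], i.e. 6 values per box, so a ground-truth file with two boxes requests 2 * 6 = 12 values while only one 6-value matrix arrived. The number of fed transform matrices has to match the number of boxes.

# 6 values per box (a flattened 2 x 3 affine matrix)
boxes_expected = 2
values_requested = boxes_expected * 2 * 3   # 12, as in the error message
values_fed = 1 * 2 * 3                      # 6, one matrix actually provided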

input_box_nums: data[6].shape[0]

Hi again

I am trying to run this on the ICDAR 2015 data. I have downloaded the data and changed the data paths in icdar.py, config.py, etc.

When I run multigpu_train.py I get the above-mentioned error.

  File "multigpu_train.py", line 172, in <module>
    tf.app.run()
   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
     _sys.exit(main(argv))
  File "multigpu_train.py", line 138, in main
     input_box_nums: data[6].shape[0],
AttributeError: 'list' object has no attribute 'shape'

This is what my data[6] looks like

     [array([0, 0, 0, 0]), array([1]), array([2]), array([3, 3, 3]), array([4, 4]), array([5]), array([6]), array([7, 7]),  array([8]), array([9]), array([10, 10, 10, 10, 10, 10]), array([11, 11, 11]), array([12, 12]), array([13, 13]), array([14, 14]), array([15])])

Any suggestions would be really helpful. Thanks in advance.
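One possible workaround, untested against this repository and assuming the feed only needs the total number of boxes: data[6] here is a plain Python list of per-box index arrays, so it has no .shape attribute, but the total count can be computed directly.

# sum the lengths of the per-box index arrays instead of reading .shape[0]
input_box_nums_value = sum(len(m) for m in data[6])
# equivalently: np.concatenate(data[6]).shape[0]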

ImportError: No module named stn

Thanks for sharing your work, but I think your recent commits break the code.

When I run multigpu_train.py I get the error above.

TypeError: 'NoneType' object is not subscriptable

@Pay20Y
Note that I installed tensorflow-gpu 1.13.2.
A sample of the files I used, from ICDAR2017: training.zip
I have changed the paths in these two lines in data_generator.py:

tf.app.flags.DEFINE_string('training_data_dir', default='./training/img/', help='training images dir')
tf.app.flags.DEFINE_string('training_gt_data_dir', default='./training/label/', help='training gt dir')

When I run:

python3 main_train.py --gpu_list='0' --learning_rate=0.0001 --train_stage=0 --training_data_dir_ic17='./training/img/' training_gt_dir_ic17='./training/label'

(full log.txt attached)
I get this error:

ice: 0, name: GeForce GTX 1070 Ti, pci bus id: 0000:2e:00.0, compute capability: 6.1)
Generator use 10 batches for buffering, this may take a while, you can tune this yourself.
Process Process-1:
Traceback (most recent call last):
  File "/home/home/anaconda3/envs/fotsd/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/home/anaconda3/envs/fotsd/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/home/p2/FOTS_TF-dev/data_provider/data_enqueuer.py", line 53, in data_generator_task
    generator_output = next(self._generator)
  File "/home/home/p2/FOTS_TF-dev/data_provider/data_generator.py", line 35, in generator
    if imfn.split(".")[0][-1] == '2':
IndexError: string index out of range
Traceback (most recent call last):
  File "main_train.py", line 198, in <module>
    tf.app.run()
  File "/home/home/anaconda3/envs/fotsd/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "main_train.py", line 152, in main
    inp_dict = {input_images: data[0],
TypeError: 'NoneType' object is not subscriptable

No multiple GPU training?

When running multigpu_train.py, only one GPU is used even though multiple GPUs are detected.

Is multi-GPU training actually implemented?

Recognition is very slow when there are many boxes

When I use an image with 284 text instances, RoI extraction and recognition take a lot of time, and if there are more than 300 boxes the code OOMs because recognition runs without a batch-size limit.

I use a 1080 Ti GPU; the timing information is as follows (I print more detail):

6947 text boxes before nms
284 text boxes after nms
pic/2.jpg : get_det_socremap 37ms, nms 1091ms, roi_net 560ms, rec_net 10622ms, all_net 12.34272s
[timing] 12.40138s

Why so slow?

Find 1 images
6055 text boxes before nms
test/screenshot.png : detect 3372ms, restore 5ms, nms 147ms, recog 4796ms
[timing] 8.196640491485596

(screenshot attached)

Problems training with a Chinese dataset

When I train with the ICDAR2019 signboard data, the detection loss converges to 0.015 but the recognition loss stops decreasing at around 20. I have tried adjusting the learning rate without effect. I added many simplified and traditional Chinese characters to the char list in config. Is the high loss due to the Chinese dataset? Is the recognition loss too high? The output of my own ground-truth processing looks like this:
(screenshots of the gt processing output and the recognition loss curve attached)

Problems loading labels when training SynthText800

Hello, may I ask: do you need to convert the gt.mat file when training SynthText800? Your load_annotation code reads txt files, so how did you extract text_polys and text_labels from the gt.mat file and save them as txt? Thank you.
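Not the maintainer's pipeline, but a minimal sketch of the usual way to dump SynthText's gt.mat to per-image txt files, assuming the standard imnames / wordBB / txt fields:

import numpy as np
import scipy.io as sio

gt = sio.loadmat('gt.mat', squeeze_me=True, struct_as_record=False)
num_images = len(np.atleast_1d(gt['imnames']))
for i in range(num_images):
    word_bb = np.asarray(gt['wordBB'][i])        # 2 x 4 x N, or 2 x 4 when N == 1
    if word_bb.ndim == 2:
        word_bb = word_bb[:, :, None]
    words = ' '.join(np.atleast_1d(gt['txt'][i])).split()
    with open('gt_%d.txt' % i, 'w') as f:
        for k, w in enumerate(words[:word_bb.shape[2]]):
            pts = word_bb[:, :, k].T.reshape(-1)  # x1,y1,...,x4,y4
            f.write(','.join(str(int(round(v))) for v in pts) + ',' + w + '\n')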

What special measures should be taken during training?

Training for the same number of steps, but with batch_size_per_gpu in multigpu_train.py set to a small value (such as 1 or 2), the resulting model does not recognize digits as well as your pretrained model: digit localization is inaccurate and digits are easily recognized as English letters. Is there any special processing needed to improve digit recognition, or which parameters should be changed to improve it?

Missing characters in recognition results

Hello,

FOTS drops repeated characters when recognizing runs of identical characters. For example, "ll" is recognized as "l" and "0000" as "0"; text images of the same size without such runs are recognized correctly, e.g. "1234". The output is already wrong before CTC decoding: for runs of identical characters, apart from a single character, the blank probability is far higher than the character probability everywhere else. Do you know why this happens?

Thank you
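For context, under the standard CTC collapse rule consecutive duplicates are merged before blanks are removed, so "ll" can only be decoded if the network emits a blank-separated peak for each "l"; if it does not learn to do so, runs of identical characters collapse to a single character. A minimal sketch of the collapse rule (not this repository's decoder):

def ctc_collapse(frame_labels, blank=0):
    # merge consecutive duplicates, then drop blanks
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

assert ctc_collapse([3, 0, 3]) == [3, 3]   # 'l', blank, 'l' -> "ll"
assert ctc_collapse([3, 3, 3]) == [3]      # 'l', 'l', 'l'   -> "l"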

progress?

Hi

Any luck on ICDAR? Did your score improve?

Do you need any help?

Thanks
