
fots_tf's Introduction

Fast Oriented Text Spotting with a Unified Network

Update

[08/17/2019] A new version has been released; please check out the 'dev' branch (link).

Introduction

This is an implementation of FOTS: Fast Oriented Text Spotting with a Unified Network

Install

  • Python 2
  • TensorFlow
  • OpenCV

Model

A model pretrained on Synth800k for 6 epochs and fine-tuned on ICDAR15: BaiduYun link (key: 0aky) or GitHub link (thanks to harish2704). If you encounter problems, refer to #16.

Train

python2 multigpu_train.py --gpu_list=gpu_id --training_data_path=/path/to/trainset/

Line 824 in icdar.py should also be changed to point to the path of your annotation files.
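For reference, the change usually just points the ground-truth lookup at your own annotation files. Below is a hypothetical sketch of that kind of helper; the actual variable and function names around line 824 of icdar.py may differ.

import os

# Hypothetical helper illustrating the usual ICDAR naming convention:
# an image img_1.jpg has an annotation file gt_img_1.txt.
def get_gt_path(image_path, gt_dir='/path/to/annotations/'):
    base = os.path.splitext(os.path.basename(image_path))[0]
    return os.path.join(gt_dir, 'gt_' + base + '.txt')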

Test

python2 eval.py --gpu_list=gpu_id --test_data_path=/path/to/testset/ --checkpoint_path=checkpoints/

Examples

(Example detection result images image_1, image_2, and image_3 are shown in the original repository.)

Differences from paper

  • Without OHEM
  • Pretrained on Synth800k for 6 epochs instead of 10
  • Fine-tuned on ICDAR15 only without ICDAR2017 MLT
  • It currently reaches only an F-score of 56 on the ICDAR2015 test set; more training tricks are needed

Reference

FOTS: Fast Oriented Text Spotting with a Unified Network, Xuebo Liu et al., CVPR 2018.

fots_tf's People

Contributors

pay20y


fots_tf's Issues

About STN

Will the result of the STN be the same as cv2.warpAffine?

Problems training on my own data

Running a model trained on my own dataset, which has 7,321 classes (Chinese characters),
gives the output below:
1072 text boxes before nms
./demo_images/BizLicenseOCR2.jpg : net 410630ms, restore 1ms, nms 10ms
[timing] 410.6401104927063

very slow !!!

Question about the number of training steps

How many training steps would the author recommend when training on the ICDAR13-15 dataset (229 training images)? Training is also quite slow: on a GPU I can only run a few tens of thousands of steps overnight. What could be the reason?

Key not found in the checkpoint

Hi, when I run training I encounter this problem:

NOTFoundError:key recog/rnn/bidirectional_rnn/fw/lstm_cell/biases/exponentialMovingAverage not found in checkpoint

Can anyone guide me? Or is the checkpoint still usable?

loading and testing trained model

Hi
I have trained the model on custom data,
and I get these files in the checkpoints directory:

model.ckpt-28281.data-00000-of-00001
model.ckpt-28281.index
model.ckpt-28281.meta

but in eval.py, when I manually change the path on line 284 to one of these files, I get a not-found error or a data-loss error.

Am I doing something wrong here? Which of these files can I use for inference?

EDIT:

And even when the script runs, I get no predictions: the text files in the output folder are empty and there are no predictions drawn in the saved image.

Any ideas?

Thanks
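Not an answer from the maintainer, but a common cause of this error with TF 1.x checkpoints is passing one of the .data/.index/.meta files instead of the checkpoint prefix. A minimal sketch using the paths from this issue:

import tensorflow as tf

saver = tf.train.Saver()  # after the model graph has been built, as eval.py does
with tf.Session() as sess:
    # restore by prefix, not by the individual .data/.index/.meta files
    saver.restore(sess, 'checkpoints/model.ckpt-28281')
    # or pick the newest checkpoint recorded in checkpoints/checkpoint:
    # saver.restore(sess, tf.train.latest_checkpoint('checkpoints/'))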

Out Of Memory when training on images with many bounding boxes.

Hello,

First, thank you for your work on FOTS_TF; I can get pretty good results with it.
But when training on my own dataset, which has a lot of text in each image, OOM crashes occur.
I saw that in main_test.py there is a batch-size limit of 32 bounding boxes when running the recognition network.
Is there any way to apply a similar limit to recognition during training?
I know that I can reduce the batch size, but this bug forces me to use a batch size of 2 where I want 4 or more.
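A minimal sketch of the kind of chunking being asked about, mirroring the 32-box limit mentioned for main_test.py; run_recognition is a hypothetical stand-in for whatever runs the recognition branch, not code from this repository:

def recognize_in_chunks(rois, run_recognition, max_boxes=32):
    # Run the recognition branch on at most max_boxes RoIs at a time to cap
    # peak memory; results are concatenated in the original order.
    results = []
    for start in range(0, len(rois), max_boxes):
        results.extend(run_recognition(rois[start:start + max_boxes]))
    return results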

data = next(dg)  # the program stops responding at this step while debugging

Hi, I'm running main_train.py in debug mode with the ICDAR2015 dataset. From stepping through the code, the images load fine,

dg = data_generator.get_batch(input_images_dir=FLAGS.training_data_dir,
                              input_gt_dir=FLAGS.training_gt_data_dir,
                              num_workers=FLAGS.num_readers,
                              input_size=FLAGS.input_size,
                              batch_size=FLAGS.batch_size_per_gpu)
start = time.time()
for step in range(FLAGS.max_steps):
    data = next(dg)  # debugging stops responding at this step

Training SynthText

Hi
I am also trying to train on SynthText, but the problem I am facing is that SynthText gives word-level bounding-box annotations as 4 corner points, whereas ICDAR gives them as 8 values per box. How did you manage to solve this?

Thanks.
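A minimal sketch of one way to bridge the two formats, assuming SynthText's usual layout of a 2 x 4 corner array per word; this is not code from this repository:

import numpy as np

def wordbb_to_icdar_line(word_bb, text):
    # word_bb: shape (2, 4), row 0 = x coordinates, row 1 = y coordinates
    pts = np.asarray(word_bb).T.reshape(-1)  # x1,y1,x2,y2,x3,y3,x4,y4
    return ','.join(str(int(round(v))) for v in pts) + ',' + text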

About Finetune Dataset?

Hello, is your final model fine-tuned only on the 1,000 training images of ICDAR 2015, or on ICDAR 2015 plus the 229 ICDAR 2013 training images, as the paper says?

about box_widths in data_utils.py

I don't understand why you divide text_polyses by 4 (x1, y1, x2, y2, x3, y3, x4, y4 = text_polyses[i] / 4), or what this line of code does: width_box = math.ceil(8 * box_w / box_h). What is the meaning of 8 * box_w / box_h?
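A likely reading, not confirmed by the maintainer: the shared feature map is 1/4 of the input resolution, so polygon coordinates are divided by 4, and RoIRotate fixes the RoI height to 8 feature-map pixels, so the width is scaled to preserve the box's aspect ratio:

import math

def roi_width(box_w, box_h, roi_height=8):
    # width of the RoI on the shared feature map, keeping the aspect ratio
    # with a fixed height of 8
    return int(math.ceil(roi_height * box_w / box_h))

# e.g. a 128 x 32 box in the image becomes 32 x 8 on the 1/4-resolution
# feature map, so the RoI width stays 32
assert roi_width(128 / 4.0, 32 / 4.0) == 32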

Does the CTC loss affect the gradients of the segmentation CNN during training?

Does the CTC loss affect the gradients of the segmentation CNN during training? I could not find a clear answer to this question in the paper. If it does not, are the shared features somehow detached from the segmentation CNN during the RoI rotate computation, before going into the RNN? (I am not an expert in TensorFlow.)

Quick question regarding training annotation

Hi, I was wondering what annotation tool you recommend for labelling the training images. I've noticed the provided training examples are in .txt format; what do the values within the .txt files represent, and how can I recreate this for my own dataset? Thank you for your help.
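For reference, the sample .txt files appear to follow the ICDAR15-style convention: one text instance per line, the four corner coordinates listed clockwise from the top-left, then the transcription, with ### marking unreadable text. An illustrative line (values made up):

x1,y1,x2,y2,x3,y3,x4,y4,transcription
377,117,463,117,465,130,378,130,SHOP
374,155,409,155,409,170,374,170,###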

How to use the makefile on Windows 10?

I am a novice learning from the demo you provided. I don't know how to use the makefile in the lanms folder; can you help me? (I use Windows 10.)

Model freezing

Have you tried freezing the trained model from checkpoints to a .pb file (converting the .ckpt, .meta, and .index files inside the checkpoints folder into a frozen .pb model) in an end-to-end way?
Thank you in advance.
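Not tested on this repository, but the standard TF 1.x recipe is to restore the checkpoint, fold the variables into constants, and serialize the GraphDef. The output node names below are placeholders, not this repository's actual node names:

import tensorflow as tf

output_nodes = ['score_map', 'geometry_map', 'recognition_logits']  # placeholders
saver = tf.train.import_meta_graph('checkpoints/model.ckpt-28281.meta')
with tf.Session() as sess:
    saver.restore(sess, 'checkpoints/model.ckpt-28281')
    frozen = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, output_nodes)
    with tf.gfile.GFile('fots_frozen.pb', 'wb') as f:
        f.write(frozen.SerializeToString())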

Training data cannot be loaded correctly

I used the training set of the ICDAR2019 Chinese signboard recognition task; after preprocessing, its format matches train_samples. But when running the program I get the error at inp_dict = {input_images: data[0],
TypeError: 'NoneType' object is not subscriptable
I can't quite follow the generator code. My understanding is that the txt and jpg files are read, but no data is produced. Do you know roughly which part is going wrong?

The text recognition part of inference always gives wrong results

Hi author!
When running your source code I found the following problem: during eval, the text in the image is located reasonably well, but the recognition results are very poor, similar to a previous issue where the results were always characters like A and E.
I assumed the recognition branch's pretrained weights were the problem, so I continued training on the pretrained model with just three images; the recognition loss went down and the printed recognition results were correct.
Then I tested the retrained model on those same training images, and the recognition results still did not improve. The cnn_feature values also look normal when inspected separately, so I am quite confused and asking for your help.
Looking forward to your reply, thank you!

reason for vocab.txt

Hi

I need some clarification. Could you please tell me how the vocab.txt file is used?
Does the BK-tree do some sort of similarity search on the words in the file?
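Not the maintainer's answer, but the usual role of a vocabulary file plus a BK-tree in text spotting is lexicon correction: snap the raw recognition output to its nearest vocab.txt entry by edit distance, with the BK-tree only speeding up that nearest-neighbour search. A minimal brute-force sketch of the same idea:

def edit_distance(a, b):
    # classic Levenshtein DP, one row at a time
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def correct(word, vocab):
    # vocab: list of words read from vocab.txt
    return min(vocab, key=lambda v: edit_distance(word, v))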

Input to reshape is a tensor with 6 values, but the requested shape has 12

I am attempting to retrain the model but am receiving the error posted below.

I have tried using the ICDAR2015 training data to ensure the formatting is appropriate.
I'm using tensorflow-gpu 1.13.1 and CUDA 10.

Regarding the shape I feed in, the error shows 12, which reflects the annotation file having two lines; an annotation file with 4 lines would give shape 24.

Has anyone else had this issue, and any ideas on fixing it?

Thanks!

Traceback (most recent call last):
  File "multigpu_train.py", line 203, in <module>
    tf.app.run()
  File "/home/lookdeep/.venv/FOTS_TF_PY2/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "multigpu_train.py", line 169, in main
    dl, rl, tl,  _ = sess.run([d_loss, r_loss, total_loss, train_op], feed_dict=inp_dict)
  File "/home/lookdeep/.venv/FOTS_TF_PY2/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/lookdeep/.venv/FOTS_TF_PY2/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/lookdeep/.venv/FOTS_TF_PY2/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/lookdeep/.venv/FOTS_TF_PY2/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 6 values, but the requested shape has 12
	 [[node RoIrotate/Reshape (defined at /home/lookdeep/gits/FOTS_TF/module/stn/transformer.py:49) ]]
	 [[node transpose (defined at /home/lookdeep/gits/FOTS_TF/module/Recognition_branch.py:94) ]]

Caused by op u'RoIrotate/Reshape', defined at:
  File "multigpu_train.py", line 203, in <module>
    tf.app.run()
  File "/home/lookdeep/.venv/FOTS_TF_PY2/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "multigpu_train.py", line 81, in main
    f_score, f_geometry, recognition_logits = build_graph(input_images, input_transform_matrix, input_box_masks, input_box_widths, input_seq_len)
  File "multigpu_train.py", line 36, in build_graph
    pad_rois = roi_rotate_part.roi_rotate_tensor_pad(shared_feature, input_transform_matrix, input_box_masks, input_box_widths)
  File "/home/lookdeep/gits/FOTS_TF/module/RoI_rotate.py", line 99, in roi_rotate_tensor_pad
    trans_feature_map = transformer(tile_feature_maps, transform_matrixs)
  File "/home/lookdeep/gits/FOTS_TF/module/stn/transformer.py", line 49, in spatial_transformer_network
    theta = tf.reshape(theta, [B, 2, 3])
  File "/home/lookdeep/.venv/FOTS_TF_PY2/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 7179, in reshape
    "Reshape", tensor=tensor, shape=shape, name=name)
  File "/home/lookdeep/.venv/FOTS_TF_PY2/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/lookdeep/.venv/FOTS_TF_PY2/local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/lookdeep/.venv/FOTS_TF_PY2/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/lookdeep/.venv/FOTS_TF_PY2/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Input to reshape is a tensor with 6 values, but the requested shape has 12
	 [[node RoIrotate/Reshape (defined at /home/lookdeep/gits/FOTS_TF/module/stn/transformer.py:49) ]]
	 [[node transpose (defined at /home/lookdeep/gits/FOTS_TF/module/Recognition_branch.py:94) ]]
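For what it's worth, the arithmetic in the message matches the observation above: RoIRotate reshapes the fed affine matrices to [B, 2, 3], i.e. 6 values per box, so a ground-truth file with two boxes requests 2 * 6 = 12 values while only one 6-value matrix arrived. The number of fed transform matrices has to match the number of boxes.

# 6 values per box (a flattened 2 x 3 affine matrix)
boxes_expected = 2
values_requested = boxes_expected * 2 * 3   # 12, as in the error message
values_fed = 1 * 2 * 3                      # 6, one matrix actually provided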

input_box_nums: data[6].shape[0]

Hi again

I am trying to run this on the ICDAR 2015 data. I have downloaded the data and changed the data paths in icdar.py, config.py, etc.

When I run multigpu_train.py I get the above-mentioned error.

  File "multigpu_train.py", line 172, in <module>
    tf.app.run()
   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
     _sys.exit(main(argv))
  File "multigpu_train.py", line 138, in main
     input_box_nums: data[6].shape[0],
AttributeError: 'list' object has no attribute 'shape'

This is what my data[6] looks like

     [array([0, 0, 0, 0]), array([1]), array([2]), array([3, 3, 3]), array([4, 4]), array([5]), array([6]), array([7, 7]),  array([8]), array([9]), array([10, 10, 10, 10, 10, 10]), array([11, 11, 11]), array([12, 12]), array([13, 13]), array([14, 14]), array([15])])

Any suggestions would be really helpful. Thanks in advance.
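One possible workaround, untested against this repository and assuming the feed only needs the total number of boxes: data[6] here is a plain Python list of per-box index arrays, so it has no .shape attribute, but the total count can be computed directly.

# sum the lengths of the per-box index arrays instead of reading .shape[0]
input_box_nums_value = sum(len(m) for m in data[6])
# equivalently: np.concatenate(data[6]).shape[0]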

ImportError: No module named stn

Thanks for sharing your work, but I think your recent commits break the code.

When I run multigpu_train.py I get the error above.

TypeError: 'NoneType' object is not subscriptable

@Pay20Y
Note that I installed tensorflow-gpu 1.13.2.
A sample of the files I used, from ICDAR2017: training.zip
I have changed the paths in these two lines in data_generator.py:

tf.app.flags.DEFINE_string('training_data_dir', default='./training/img/', help='training images dir')
tf.app.flags.DEFINE_string('training_gt_data_dir', default='./training/label/', help='training gt dir')

When I run:

python3 main_train.py --gpu_list='0' --learning_rate=0.0001 --train_stage=0 --training_data_dir_ic17='./training/img/' training_gt_dir_ic17='./training/label'

(full log.txt attached)
I get this error:

ice: 0, name: GeForce GTX 1070 Ti, pci bus id: 0000:2e:00.0, compute capability: 6.1)
Generator use 10 batches for buffering, this may take a while, you can tune this yourself.
Process Process-1:
Traceback (most recent call last):
  File "/home/home/anaconda3/envs/fotsd/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/home/anaconda3/envs/fotsd/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/home/p2/FOTS_TF-dev/data_provider/data_enqueuer.py", line 53, in data_generator_task
    generator_output = next(self._generator)
  File "/home/home/p2/FOTS_TF-dev/data_provider/data_generator.py", line 35, in generator
    if imfn.split(".")[0][-1] == '2':
IndexError: string index out of range
Traceback (most recent call last):
  File "main_train.py", line 198, in <module>
    tf.app.run()
  File "/home/home/anaconda3/envs/fotsd/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "main_train.py", line 152, in main
    inp_dict = {input_images: data[0],
TypeError: 'NoneType' object is not subscriptable

No multiple GPU training?

When running multigpu_train.py, only one GPU is used even though multiple GPUs are detected.

Is multi-GPU training actually implemented?

Recognition is very slow when there are many boxes

When I use an image with 284 text instances, RoI extraction and recognition take a lot of time, and if there are more than 300 boxes the code OOMs because recognition runs without a batch-size limit.

I use a 1080 Ti GPU; the timing information is as follows (I print more detail):

6947 text boxes before nms
284 text boxes after nms
pic/2.jpg : get_det_socremap 37ms, nms 1091ms, roi_net 560ms, rec_net 10622ms, all_net 12.34272s
[timing] 12.40138s

Why so slow?

Find 1 images
6055 text boxes before nms
test/screenshot.png : detect 3372ms, restore 5ms, nms 147ms, recog 4796ms
[timing] 8.196640491485596

(screenshot attached)

Problems training with a Chinese dataset

When I train with the ICDAR2019 signboard data, the detection loss converges to 0.015 but the recognition loss stops decreasing at around 20. I have tried adjusting the learning rate without effect. I added many simplified and traditional Chinese characters to the char list in config. Is the high loss due to the Chinese dataset? Is the recognition loss too high? The output of my own ground-truth processing looks like this:
(screenshots of the gt processing output and the recognition loss curve attached)

Problems loading labels when training SynthText800

Hello, may I ask: do you need to convert the gt.mat file when training SynthText800? Your load_annotation code reads txt files, so how did you extract text_polys and text_labels from the gt.mat file and save them as txt? Thank you.
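Not the maintainer's pipeline, but a minimal sketch of the usual way to dump SynthText's gt.mat to per-image txt files, assuming the standard imnames / wordBB / txt fields:

import numpy as np
import scipy.io as sio

gt = sio.loadmat('gt.mat', squeeze_me=True, struct_as_record=False)
num_images = len(np.atleast_1d(gt['imnames']))
for i in range(num_images):
    word_bb = np.asarray(gt['wordBB'][i])        # 2 x 4 x N, or 2 x 4 when N == 1
    if word_bb.ndim == 2:
        word_bb = word_bb[:, :, None]
    words = ' '.join(np.atleast_1d(gt['txt'][i])).split()
    with open('gt_%d.txt' % i, 'w') as f:
        for k, w in enumerate(words[:word_bb.shape[2]]):
            pts = word_bb[:, :, k].T.reshape(-1)  # x1,y1,...,x4,y4
            f.write(','.join(str(int(round(v))) for v in pts) + ',' + w + '\n')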

What special measures should be taken during training?

Training for the same number of steps, but with batch_size_per_gpu in multigpu_train.py set to a small value (such as 1 or 2), the resulting model does not recognize digits as well as your pretrained model: digit localization is inaccurate and digits are easily recognized as English letters. Is there any special processing needed to improve digit recognition, or which parameters should be changed to improve it?

Missing characters in recognition results

Hello,

FOTS drops repeated characters when recognizing runs of identical characters. For example, "ll" is recognized as "l" and "0000" as "0"; text images of the same size without such runs are recognized correctly, e.g. "1234". The output is already wrong before CTC decoding: for runs of identical characters, apart from a single character, the blank probability is far higher than the character probability everywhere else. Do you know why this happens?

Thank you
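For context, under the standard CTC collapse rule consecutive duplicates are merged before blanks are removed, so "ll" can only be decoded if the network emits a blank-separated peak for each "l"; if it does not learn to do so, runs of identical characters collapse to a single character. A minimal sketch of the collapse rule (not this repository's decoder):

def ctc_collapse(frame_labels, blank=0):
    # merge consecutive duplicates, then drop blanks
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

assert ctc_collapse([3, 0, 3]) == [3, 3]   # 'l', blank, 'l' -> "ll"
assert ctc_collapse([3, 3, 3]) == [3]      # 'l', 'l', 'l'   -> "l"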

progress?

Hi

Any luck on ICDAR? Did your score improve?

Do you need any help?

Thanks
