
pixel_link's People

Contributors

dengdan, ewanlee

pixel_link's Issues

A question about training icdar2015

Thank you for sharing :)

I tried using your network to train a model on icdar2015. I made the dataset 'icdar2015_train.tfrecord' with the script 'icdar2015_to_tfrecords.py', then used './scripts/train.sh' to train the model. But warnings like this appeared:

2018-05-03 08:20:52.938117: W tensorflow/core/kernels/draw_bounding_box_op.cc:122] Bounding box (-117,-490,-52,-323) is completely outside the image and will not be drawn.
2018-05-03 08:20:52.938501: W tensorflow/core/kernels/draw_bounding_box_op.cc:122] Bounding box (-60,-439,-33,-372) is completely outside the image and will not be drawn.
2018-05-03 08:20:52.938528: W tensorflow/core/kernels/draw_bounding_box_op.cc:122] Bounding box (-141,214,-81,484) is completely outside the image and will not be drawn.

Why did this happen? What shall I do? Respectfully waiting for a reply~

I have a few questions.

I have a few questions, noted below.

  1. Have you tried training with the SynthText dataset? If yes, did the benchmark improve?
  2. Your model uses axis-aligned rectangles rather than 8-coordinate bounding boxes. Would performance improve if we used tight bounding boxes?

Thank you!

Can I change the code from py2.7 to py 3.6?

Would there be an issue if I convert the Python code from 2.7 to 3.6 using automated 2to3 translation?
Also, I wish to train the model on the COCO dataset. I have prepared the data in ICDAR 2015 format; would there be any issues there? I am asking because of the many reported issues concerning training the model.

no module named util

Thanks to the author for sharing; I am still at the beginner stage.
Same as the earlier question: when running test.sh, I get:
Traceback (most recent call last):
File "test_pixel_link.py", line 8, in
from datasets import dataset_factory
File "/home/zht/study/PixelLink/pixel_link-master/datasets/dataset_factory.py", line 2, in
from datasets import dataset_utils
File "/home/zht/study/PixelLink/pixel_link-master/datasets/dataset_utils.py", line 21, in
import util
ImportError: No module named 'util'
Like that questioner, I cannot find where the util file is. Please advise, and thanks.
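
A hedged workaround, assuming util resolves from the author's separate pylib checkout (a traceback in the "test error!!" issue below points at pylib/src/util/proc.py): put pylib/src on the Python path before running the scripts. The checkout location below is hypothetical.

import sys
sys.path.insert(0, '/path/to/pylib/src')  # hypothetical pylib checkout location

import util  # should now resolve to pylib/src/util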

Bounding box (-157,-92,-101,53) is completely outside the image and will not be drawn

Hello friends,
My OS is Ubuntu 16.04 LTS. When I run the command "./script/train 0 8" on the icdar2015 dataset, I found output like this:
2018-05-06 16:24:15.192086: W tensorflow/core/kernels/draw_bounding_box_op.cc:116] Bounding box (-157,-92,-101,53) is completely outside the image and will not be drawn.

2018-05-06 16:24:15.192152: W tensorflow/core/kernels/draw_bounding_box_op.cc:116] Bounding box (-211,3,-195,34) is completely outside the image and will not be drawn.

2018-05-06 16:24:15.192159: W tensorflow/core/kernels/draw_bounding_box_op.cc:116] Bounding box (-153,196,-112,266) is completely outside the image and will not be drawn.

After checking the code, I found that when preprocessing the image (line 130 in train_pixel_link.py), the function distorted_bounding_box_crop() is called, but bboxes_filter_overlap() is suppressed by the parameter assign_value=LABEL_IGNORE. So these outside bboxes are not filtered, and this output is generated.

The question I want to ask is: why not filter out the bboxes that lie outside the cropped picture?
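
For illustration, a minimal sketch (not the repo's code; the function name is hypothetical) of what dropping fully-outside boxes after a crop could look like, as opposed to keeping them with an ignore label:

import numpy as np

def drop_outside_bboxes(bboxes, labels):
    # bboxes: (N, 4) float array [ymin, xmin, ymax, xmax] in the cropped
    # image's normalized coordinates, so values may fall outside [0, 1]
    inside = ((bboxes[:, 2] > 0.0) & (bboxes[:, 3] > 0.0) &
              (bboxes[:, 0] < 1.0) & (bboxes[:, 1] < 1.0))
    return bboxes[inside], labels[inside]

The assign_value=LABEL_IGNORE path presumably keeps such boxes so partially-cropped text is ignored in the loss rather than treated as background; the drawing warnings themselves appear harmless.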

train with other data set

I want to train this network with my own dataset, but I can't: a bounding box error appears.
How do I solve it?

InvalidArgumentError (see above for traceback): All bounding box coordinates must be in [0.0, 1.0]: 2.9401994
[[Node: ssd_preprocessing_train/distorted_bounding_box_crop/SampleDistortedBoundingBox = SampleDistortedBoundingBox[T=DT_INT32, area_range=[0.1, 1], aspect_ratio_range=[0.5, 2], max_attempts=200, min_object_covered=0.1, seed=0, seed2=0, use_image_if_no_bounding_boxes=true, _device="/job:localhost/replica:0/task:0/cpu:0"](ssd_preprocessing_train/distorted_bounding_box_crop/Shape_1, ssd_preprocessing_train/distorted_bounding_box_crop/ExpandDims)]]
[[Node: clone_0/fifo_queue_Dequeue/_325 = _HostRecvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_481_clone_0/fifo_queue_Dequeue", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"]]
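
The value 2.9401994 reached SampleDistortedBoundingBox, which expects coordinates normalized to [0.0, 1.0], so the tfrecords most likely contain raw pixel coordinates, or coordinates divided by the wrong image dimension. A minimal sketch of the normalization step when generating records (illustrative only; names are hypothetical):

import numpy as np

def normalize_polygon(xs, ys, img_w, img_h):
    # convert pixel coordinates to [0, 1] and clamp stray values
    xs = np.clip(np.asarray(xs, dtype=np.float32) / img_w, 0.0, 1.0)
    ys = np.clip(np.asarray(ys, dtype=np.float32) / img_h, 0.0, 1.0)
    return xs, ys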

The loss fluctuates very irregularly

INFO:tensorflow:global step 368468: loss = 1.0245 (0.416 sec/step)
INFO:tensorflow:global step 368469: loss = 1.0627 (0.404 sec/step)
INFO:tensorflow:global step 368470: loss = 1.0460 (0.399 sec/step)
INFO:tensorflow:global step 368471: loss = 0.8222 (0.410 sec/step)
INFO:tensorflow:global step 368472: loss = 1.0190 (0.408 sec/step)
INFO:tensorflow:global step 368473: loss = 0.7397 (0.410 sec/step)
INFO:tensorflow:global step 368474: loss = 0.7372 (0.412 sec/step)
INFO:tensorflow:global step 368475: loss = 0.7329 (0.412 sec/step)
INFO:tensorflow:global step 368476: loss = 1.0552 (0.420 sec/step)
INFO:tensorflow:global step 368477: loss = 0.7700 (0.398 sec/step)
INFO:tensorflow:global step 368478: loss = 0.6954 (0.418 sec/step)
INFO:tensorflow:global step 368479: loss = 0.7515 (0.397 sec/step)
INFO:tensorflow:global step 368480: loss = 1.2682 (0.411 sec/step)
INFO:tensorflow:global step 368481: loss = 0.9167 (0.398 sec/step)
INFO:tensorflow:global step 368482: loss = 1.2476 (0.416 sec/step)
INFO:tensorflow:global step 368483: loss = 0.9134 (0.419 sec/step)
INFO:tensorflow:global step 368484: loss = 0.6969 (0.414 sec/step)
INFO:tensorflow:global step 368485: loss = 0.8940 (0.407 sec/step)
INFO:tensorflow:global step 368486: loss = 0.9226 (0.394 sec/step)
INFO:tensorflow:global step 368487: loss = 1.0456 (0.420 sec/step)
INFO:tensorflow:global step 368488: loss = 0.6929 (0.406 sec/step)
INFO:tensorflow:global step 368489: loss = 0.9335 (0.411 sec/step)
INFO:tensorflow:global step 368490: loss = 0.8754 (0.410 sec/step)
INFO:tensorflow:global step 368491: loss = 0.9203 (0.411 sec/step)
INFO:tensorflow:global step 368492: loss = 1.0077 (0.420 sec/step)
INFO:tensorflow:Recording summary at step 368492.
INFO:tensorflow:global step 368493: loss = 0.8308 (0.588 sec/step)
INFO:tensorflow:global step 368494: loss = 0.6966 (0.422 sec/step)
INFO:tensorflow:global step 368495: loss = 0.9326 (0.411 sec/step)
INFO:tensorflow:global step 368496: loss = 0.7347 (0.421 sec/step)
INFO:tensorflow:global step 368497: loss = 0.8080 (0.419 sec/step)
INFO:tensorflow:global step 368498: loss = 0.8667 (0.409 sec/step)
INFO:tensorflow:global step 368499: loss = 1.2679 (0.407 sec/step)
INFO:tensorflow:global step 368500: loss = 0.8499 (0.418 sec/step)

the loss changes too much

INFO:tensorflow:global step 6420: loss = 5.4209 (0.596 sec/step)
INFO:tensorflow:global step 6430: loss = 5.3637 (0.334 sec/step)
INFO:tensorflow:global step 6440: loss = 9.0471 (0.274 sec/step)
INFO:tensorflow:global step 6450: loss = 6377.4712 (0.464 sec/step)
INFO:tensorflow:global step 6460: loss = 5869.8906 (0.447 sec/step)
INFO:tensorflow:global step 6470: loss = 1870.6338 (0.513 sec/step)
INFO:tensorflow:global step 6480: loss = 711.0534 (0.274 sec/step)
INFO:tensorflow:global step 6490: loss = 13.2399 (0.453 sec/step)
INFO:tensorflow:global step 6500: loss = 8.3379 (0.381 sec/step)
INFO:tensorflow:global step 6510: loss = 5.8521 (0.346 sec/step)
.........
INFO:tensorflow:global step 10830: loss = 146732.0312 (0.392 sec/step)
INFO:tensorflow:global step 10840: loss = 104813.1641 (0.487 sec/step)
INFO:tensorflow:global step 10850: loss = 47788.4492 (0.803 sec/step)
INFO:tensorflow:global step 10860: loss = 40910.5898 (0.389 sec/step)
INFO:tensorflow:global step 10870: loss = 19970.1309 (0.450 sec/step)
INFO:tensorflow:global step 10880: loss = 41234.4492 (0.281 sec/step)
.........
INFO:tensorflow:global step 11480: loss = 3617.8975 (0.323 sec/step)
INFO:tensorflow:global step 11490: loss = 13358.6592 (0.528 sec/step)
INFO:tensorflow:global step 11500: loss = 22579.4043 (0.419 sec/step)
INFO:tensorflow:global step 11510: loss = 16897972224.0000 (0.362 sec/step)
INFO:tensorflow:global step 11520: loss = 18114068.0000 (0.509 sec/step)
INFO:tensorflow:global step 11530: loss = 6474723.0000 (0.454 sec/step)
INFO:tensorflow:global step 11540: loss = 231045.1250 (0.264 sec/step)
INFO:tensorflow:global step 11620: loss = 186.8571 (0.364 sec/step)
INFO:tensorflow:global step 11630: loss = 132.7600 (1.336 sec/step)
INFO:tensorflow:global step 11640: loss = 89.9768 (0.917 sec/step)

The loss is too large and unstable. I tried reducing the learning rate, but that still did not solve it.
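
Not from this thread, but a common remedy for loss spikes of this size is gradient clipping in addition to a lower learning rate. A minimal TF 1.x sketch, assuming optimizer, loss, and global_step already exist and no gradients are None:

import tensorflow as tf

# assumes `optimizer`, `loss`, and `global_step` are already defined
grads_and_vars = optimizer.compute_gradients(loss)
grads, variables = zip(*grads_and_vars)
clipped, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)  # 5.0 is an arbitrary cap
train_op = optimizer.apply_gradients(zip(clipped, variables),
                                     global_step=global_step)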

GPU OOM problem

Hi,
I tried to train the model on the icdar2015 dataset with a GTX 1080 with 8 GB of memory, but failed with a GPU out-of-memory error, even when I set the batch size to 1.
What should I do to solve this problem? Is there some error in how I run the code, or do I just need a GPU with more memory?
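
One thing worth ruling out (an assumption, not a confirmed fix for this model) is TensorFlow reserving the whole GPU up front; TF 1.x can be told to allocate memory on demand, and the repo's own --gpu_memory_fraction flag caps the reservation:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory as needed
# or cap the fraction TF may claim up front:
# config.gpu_options.per_process_gpu_memory_fraction = 0.8
sess = tf.Session(config=config)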

Detect multi-line text in one box?

I have retrained the model on my own dataset, and it works better than EAST. But there is a problem: sometimes the model detects multiple lines in one box, like this:
image

No module named util

Thanks to the author for sharing.
But when I run test.sh, I get this error:
Traceback (most recent call last):
File "test_pixel_link.py", line 8, in
from datasets import dataset_factory
File "/home/wen/TextDetection/pixel_link/datasets/dataset_factory.py", line 2, in
from datasets import dataset_utils
File "/home/wen/TextDetection/pixel_link/datasets/dataset_utils.py", line 21, in
import util
ImportError: No module named util
I really cannot find the util file either; please advise.

Name: <unknown>, Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]

The following problem appeared while training on the icdar2015 dataset:

2018-05-04 10:24:37.968168: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
2018-05-04 10:24:37.968299: W tensorflow/core/kernels/queue_base.cc:303] _1_icdar2015_prefetch_queue/prefetch_queue/fifo_queue: Skipping cancelled dequeue attempt with queue not closed
2018-05-04 10:24:37.968330: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
2018-05-04 10:24:37.968377: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
2018-05-04 10:24:37.968536: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
[[Node: icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample = ParseExample[Ndense=4, Nsparse=13, Tdense=[DT_STRING, DT_STRING, DT_STRING, DT_INT64], dense_shapes=[[], [], [], [3]], sparse_types=[DT_INT64, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](icdar2015_data_provider/ParseSingleExample/ExpandDims, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/names, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_0, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_1, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_2, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_3, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_4, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_5, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_6, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_7, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_8, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_9, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_10, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_11, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_12, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/dense_keys_0, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/dense_keys_1, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/dense_keys_2, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/dense_keys_3, icdar2015_data_provider/ParseSingleExample/ParseExample/Reshape, icdar2015_data_provider/ParseSingleExample/ParseExample/Reshape, icdar2015_data_provider/ParseSingleExample/ParseExample/Reshape_2, icdar2015_data_provider/ParseSingleExample/ParseExample/Const)]]
2018-05-04 10:24:37.968875: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
INFO:tensorflow:Caught OutOfRangeError. Stopping Training.
INFO:tensorflow:Finished training! Saving model to disk.
2018-05-04 10:24:38.059918: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
2018-05-04 10:24:38.059918: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
2018-05-04 10:24:38.059997: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
2018-05-04 10:24:38.059918: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
2018-05-04 10:24:38.060078: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
2018-05-04 10:24:38.060118: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
2018-05-04 10:24:38.060124: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
2018-05-04 10:24:38.060203: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
2018-05-04 10:24:38.059947: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
2018-05-04 10:24:38.060279: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
2018-05-04 10:24:38.059918: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
2018-05-04 10:24:38.060363: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
2018-05-04 10:24:38.060378: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
2018-05-04 10:24:38.060445: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
2018-05-04 10:24:38.060463: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
2018-05-04 10:24:38.060474: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
2018-05-04 10:24:38.060204: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
2018-05-04 10:24:38.060476: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
2018-05-04 10:24:38.060576: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
2018-05-04 10:24:38.060204: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
Traceback (most recent call last):
File "train_pixel_link.py", line 293, in
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train_pixel_link.py", line 289, in main
train(train_op)
File "train_pixel_link.py", line 278, in train
session_config = sess_config
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 775, in train
sv.stop(threads, close_summary_writer=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 792, in stop
stop_grace_period_secs=self._stop_grace_secs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/queue_runner_impl.py", line 238, in _run
enqueue_callable()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1231, in _single_operation_run
target_list_as_strings, status, None)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in exit
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Name: , Key: image/shape, Index: 0. Number of int64 values != expected. Values size: 2 but output shape: [3]
[[Node: icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample = ParseExample[Ndense=4, Nsparse=13, Tdense=[DT_STRING, DT_STRING, DT_STRING, DT_INT64], dense_shapes=[[], [], [], [3]], sparse_types=[DT_INT64, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](icdar2015_data_provider/ParseSingleExample/ExpandDims, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/names, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_0, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_1, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_2, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_3, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_4, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_5, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_6, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_7, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_8, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_9, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_10, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_11, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/sparse_keys_12, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/dense_keys_0, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/dense_keys_1, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/dense_keys_2, icdar2015_data_provider/ParseSingleExample/ParseExample/ParseExample/dense_keys_3, icdar2015_data_provider/ParseSingleExample/ParseExample/Reshape, icdar2015_data_provider/ParseSingleExample/ParseExample/Reshape, icdar2015_data_provider/ParseSingleExample/ParseExample/Reshape_2, icdar2015_data_provider/ParseSingleExample/ParseExample/Const)]]
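
The parser's dense_shapes entry [3] means image/shape must carry three int64 values (height, width, channels), while the failing records hold only two. A plausible cause is grayscale images whose numpy shape is (h, w); a hedged sketch of a guard on the writing side, where `image` stands for the decoded numpy array of the example being written:

import numpy as np

shape = np.asarray(image.shape, dtype=np.int64)
if shape.size == 2:            # grayscale: add an explicit channel dim
    shape = np.append(shape, 1)
assert shape.size == 3, shape  # must be (height, width, channels)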

save model as a pb file

Has anyone successfully saved the model as a pb file?
when I try to inspect the pbtxt graph with
bazel-bin/tensorflow/tools/graph_transforms/summarize_graph --in_graph=graph.pbtxt
I got

Found 351 possible outputs: (name=icdar2015_data_provider/parallel_read/filenames/filenames_EnqueueMany, op=QueueEnqueueManyV2) (name=icdar2015_data_provider/parallel_read/filenames/filenames_Close, op=QueueCloseV2) (name=icdar2015_data_provider/parallel_read/filenames/filenames_Close_1, op=QueueCloseV2) (name=icdar2015_data_provider/parallel_read/common_queue_enqueue, op=QueueEnqueueV2) (name=icdar2015_data_provider/parallel_read/common_queue_Close, op=QueueCloseV2) (name=icdar2015_data_provider/parallel_read/common_queue_Close_1, op=QueueCloseV2) (name=icdar2015_data_provider/Reshape_1, op=Reshape) (name=icdar2015_data_provider/Reshape_3, op=Reshape) (name=icdar2015_data_provider/case/cond/switch_t, op=Identity) (name=icdar2015_data_provider/case/cond/cond_jpeg/switch_t, op=Identity) (name=icdar2015_data_provider/case/cond/cond_jpeg/decode_image/cond_jpeg/cond_png/switch_t, op=Identity) (name=ssd_preprocessing_train/cond/switch_f, op=Identity) (name=ssd_preprocessing_train/cond/random_rotate90/strided_slice, op=StridedSlice) (name=ssd_preprocessing_train/cond/random_rotate90/strided_slice_1, op=StridedSlice) (name=ssd_preprocessing_train/cond/random_rotate90/rot90/cond/switch_f, op=Identity) (name=ssd_preprocessing_train/cond/random_rotate90/rot90/cond/cond/switch_f, op=Identity) (name=ssd_preprocessing_train/cond/random_rotate90/rot90/cond/cond/cond/switch_f, op=Identity) (name=ssd_preprocessing_train/distorted_bounding_box_crop/cond/switch_t, op=Identity) (name=ssd_preprocessing_train/resize_image/unstack, op=Unpack) (name=icdar2015_batch/batch/fifo_queue_enqueue, op=QueueEnqueueV2) (name=icdar2015_batch/batch/fifo_queue_Close, op=QueueCloseV2) (name=icdar2015_batch/batch/fifo_queue_Close_1, op=QueueCloseV2) (name=icdar2015_prefetch_queue/prefetch_queue/fifo_queue_enqueue, op=QueueEnqueueV2) (name=icdar2015_prefetch_queue/prefetch_queue/fifo_queue_Close, op=QueueCloseV2) (name=icdar2015_prefetch_queue/prefetch_queue/fifo_queue_Close_1, op=QueueCloseV2) (name=clone_0/pixel_cls/strided_slice, op=StridedSlice) (name=clone_0/pixel_link/strided_slice, op=StridedSlice) (name=clone_0/strided_slice_3, op=StridedSlice) (name=clone_0/strided_slice_4, op=StridedSlice) (name=count_warning/read, op=Identity) (name=clone_0/add_1, op=Add) (name=clone_0/gradients/clone_0/truediv_grad/tuple/control_dependency_1, op=Identity) (name=clone_0/gradients/clone_0/conv1/conv1_1/kernel/Regularizer/l2_regularizer_grad/tuple/control_dependency, op=Identity) (name=clone_0/gradients/clone_0/conv1/conv1_2/kernel/Regularizer/l2_regularizer_grad/tuple/control_dependency, op=Identity) (name=clone_0/gradients/clone_0/conv2/conv2_1/kernel/Regularizer/l2_regularizer_grad/tuple/control_dependency, op=Identity) (name=clone_0/gradients/clone_0/conv2/conv2_2/kernel/Regularizer/l2_regularizer_grad/tuple/control_dependency, op=Identity) (name=clone_0/gradients/clone_0/conv3/conv3_1/kernel/Regularizer/l2_regularizer_grad/tuple/control_dependency, op=Identity) (name=clone_0/gradients/clone_0/conv3/conv3_2/kernel/Regularizer/l2_regularizer_grad/tuple/control_dependency, op=Identity) (name=clone_0/gradients/clone_0/conv3/conv3_3/kernel/Regularizer/l2_regularizer_grad/tuple/control_dependency, op=Identity) (name=clone_0/gradients/clone_0/conv4/conv4_1/kernel/Regularizer/l2_regularizer_grad/tuple/control_dependency, op=Identity) (name=clone_0/gradients/clone_0/conv4/conv4_2/kernel/Regularizer/l2_regularizer_grad/tuple/control_dependency, op=Identity) 
(name=clone_0/gradients/clone_0/conv4/conv4_3/kernel/Regularizer/l2_regularizer_grad/tuple/control_dependency, op=Identity) (name=clone_0/gradients/clone_0/conv5/conv5_1/kernel/Regularizer/l2_regularizer_grad/tuple/control_dependency, op=Identity) (name=clone_0/gradients/clone_0/conv5/conv5_2/kernel/Regularizer/l2_regularizer_grad/tuple/control_dependency, op=Identity) (name=clone_0/gradients/clone_0/conv5/conv5_3/kernel/Regularizer/l2_regularizer_grad/tuple/control_dependency, op=Identity) (name=clone_0/gradients/clone_0/fc6/kernel/Regularizer/l2_regularizer_grad/tuple/control_dependency, op=Identity) (name=clone_0/gradients/clone_0/fc7/kernel/Regularizer/l2_regularizer_grad/tuple/control_dependency, op=Identity) (name=clone_0/gradients/clone_0/pixel_cls/score_from_fc7/kernel/Regularizer/l2_regularizer_grad/tuple/control_dependency, op=Identity) (name=clone_0/gradients/clone_0/pixel_cls/score_from_conv5_3/kernel/Regularizer/l2_regularizer_grad/tuple/control_dependency, op=Identity) (name=clone_0/gradients/clone_0/pixel_cls/score_from_conv4_3/kernel/Regularizer/l2_regularizer_grad/tuple/control_dependency, op=Identity) (name=clone_0/gradients/clone_0/pixel_cls/score_from_conv3_3/kernel/Regularizer/l2_regularizer_grad/tuple/control_dependency, op=Identity) (name=clone_0/gradients/clone_0/pixel_link/score_from_fc7/kernel/Regularizer/l2_regularizer_grad/tuple/control_dependency, op=Identity) (name=clone_0/gradients/clone_0/pixel_link/score_from_conv5_3/kernel/Regularizer/l2_regularizer_grad/tuple/control_dependency, op=Identity) (name=clone_0/gradients/clone_0/pixel_link/score_from_conv4_3/kernel/Regularizer/l2_regularizer_grad/tuple/control_dependency, op=Identity) (name=clone_0/gradients/clone_0/pixel_link/score_from_conv3_3/kernel/Regularizer/l2_regularizer_grad/tuple/control_dependency, op=Identity) (name=clone_0/gradients/clone_0/pixel_cls_loss/mul_1_grad/tuple/control_dependency_1, op=Identity) (name=clone_0/gradients/clone_0/pixel_link_loss/mul_1_grad/tuple/control_dependency, op=Identity) (name=clone_0/gradients/clone_0/pixel_cls_loss/truediv_grad/tuple/control_dependency_1, op=Identity) (name=clone_0/gradients/clone_0/pixel_link_loss/cond/Merge_1_grad/tuple/control_dependency, op=Identity) (name=clone_0/gradients/clone_0/pixel_link_loss/mul_grad/tuple/control_dependency_1, op=Identity) (name=clone_0/gradients/clone_0/pixel_cls_loss/mul_grad/tuple/control_dependency_1, op=Identity) (name=clone_0/gradients/clone_0/pixel_link_loss/cond/truediv_1_grad/tuple/control_dependency_1, op=Identity) (name=clone_0/gradients/clone_0/pixel_link_loss/cond/Merge_grad/tuple/control_dependency, op=Identity) (name=clone_0/gradients/clone_0/pixel_link_loss/cond/truediv_grad/tuple/control_dependency_1, op=Identity) (name=clone_0/gradients/zeros_like, op=ZerosLike) (name=clone_0/gradients/clone_0/pixel_link_loss/cond/mul_3_grad/tuple/control_dependency_1, op=Identity) (name=clone_0/gradients/clone_0/pixel_link_loss/cond/mul_1_grad/tuple/control_dependency_1, op=Identity) (name=clone_0/gradients/zeros_like_1, op=ZerosLike) (name=clone_0/gradients/clone_0/conv1/conv1_1/Conv2D_grad/tuple/control_dependency, op=Identity) (name=conv1/conv1_1/weights/Momentum/read, op=Identity) (name=conv1/conv1_1/biases/Momentum/read, op=Identity) (name=conv1/conv1_2/weights/Momentum/read, op=Identity) (name=conv1/conv1_2/biases/Momentum/read, op=Identity) (name=conv2/conv2_1/weights/Momentum/read, op=Identity) (name=conv2/conv2_1/biases/Momentum/read, op=Identity) (name=conv2/conv2_2/weights/Momentum/read, op=Identity) 
(name=conv2/conv2_2/biases/Momentum/read, op=Identity) (name=conv3/conv3_1/weights/Momentum/read, op=Identity) (name=conv3/conv3_1/biases/Momentum/read, op=Identity) (name=conv3/conv3_2/weights/Momentum/read, op=Identity) (name=conv3/conv3_2/biases/Momentum/read, op=Identity) (name=conv3/conv3_3/weights/Momentum/read, op=Identity) (name=conv3/conv3_3/biases/Momentum/read, op=Identity) (name=conv4/conv4_1/weights/Momentum/read, op=Identity) (name=conv4/conv4_1/biases/Momentum/read, op=Identity) (name=conv4/conv4_2/weights/Momentum/read, op=Identity) (name=conv4/conv4_2/biases/Momentum/read, op=Identity) (name=conv4/conv4_3/weights/Momentum/read, op=Identity) (name=conv4/conv4_3/biases/Momentum/read, op=Identity) (name=conv5/conv5_1/weights/Momentum/read, op=Identity) (name=conv5/conv5_1/biases/Momentum/read, op=Identity) (name=conv5/conv5_2/weights/Momentum/read, op=Identity) (name=conv5/conv5_2/biases/Momentum/read, op=Identity) (name=conv5/conv5_3/weights/Momentum/read, op=Identity) (name=conv5/conv5_3/biases/Momentum/read, op=Identity) (name=fc6/weights/Momentum/read, op=Identity) (name=fc6/biases/Momentum/read, op=Identity) (name=fc7/weights/Momentum/read, op=Identity) (name=fc7/biases/Momentum/read, op=Identity) (name=pixel_cls/score_from_fc7/weights/Momentum/read, op=Identity) (name=pixel_cls/score_from_fc7/biases/Momentum/read, op=Identity) (name=pixel_cls/score_from_conv5_3/weights/Momentum/read, op=Identity) (name=pixel_cls/score_from_conv5_3/biases/Momentum/read, op=Identity) (name=pixel_cls/score_from_conv4_3/weights/Momentum/read, op=Identity) (name=pixel_cls/score_from_conv4_3/biases/Momentum/read, op=Identity) (name=pixel_cls/score_from_conv3_3/weights/Momentum/read, op=Identity) (name=pixel_cls/score_from_conv3_3/biases/Momentum/read, op=Identity) (name=pixel_link/score_from_fc7/weights/Momentum/read, op=Identity) (name=pixel_link/score_from_fc7/biases/Momentum/read, op=Identity) (name=pixel_link/score_from_conv5_3/weights/Momentum/read, op=Identity) (name=pixel_link/score_from_conv5_3/biases/Momentum/read, op=Identity) (name=pixel_link/score_from_conv4_3/weights/Momentum/read, op=Identity) (name=pixel_link/score_from_conv4_3/biases/Momentum/read, op=Identity) (name=pixel_link/score_from_conv3_3/weights/Momentum/read, op=Identity) (name=pixel_link/score_from_conv3_3/biases/Momentum/read, op=Identity) (name=cond/switch_t, op=Identity) (name=cond/switch_f, op=Identity) (name=cond/Merge, op=Merge) (name=conv1/conv1_1/weights/ExponentialMovingAverage/cond/switch_t, op=Identity) (name=conv1/conv1_1/weights/ExponentialMovingAverage/cond/switch_f, op=Identity) (name=cond_1/switch_t, op=Identity) (name=cond_1/switch_f, op=Identity) (name=cond_1/Merge, op=Merge) (name=conv1/conv1_1/biases/ExponentialMovingAverage/cond/switch_t, op=Identity) (name=conv1/conv1_1/biases/ExponentialMovingAverage/cond/switch_f, op=Identity) (name=cond_2/switch_t, op=Identity) (name=cond_2/switch_f, op=Identity) (name=cond_2/Merge, op=Merge) (name=conv1/conv1_2/weights/ExponentialMovingAverage/cond/switch_t, op=Identity) (name=conv1/conv1_2/weights/ExponentialMovingAverage/cond/switch_f, op=Identity) (name=cond_3/switch_t, op=Identity) (name=cond_3/switch_f, op=Identity) (name=cond_3/Merge, op=Merge) (name=conv1/conv1_2/biases/ExponentialMovingAverage/cond/switch_t, op=Identity) (name=conv1/conv1_2/biases/ExponentialMovingAverage/cond/switch_f, op=Identity) (name=cond_4/switch_t, op=Identity) (name=cond_4/switch_f, op=Identity) (name=cond_4/Merge, op=Merge) 
(name=conv2/conv2_1/weights/ExponentialMovingAverage/cond/switch_t, op=Identity) (name=conv2/conv2_1/weights/ExponentialMovingAverage/cond/switch_f, op=Identity) (name=cond_5/switch_t, op=Identity) (name=cond_5/switch_f, op=Identity) (name=cond_5/Merge, op=Merge) (name=conv2/conv2_1/biases/ExponentialMovingAverage/cond/switch_t, op=Identity) (name=conv2/conv2_1/biases/ExponentialMovingAverage/cond/switch_f, op=Identity) (name=cond_6/switch_t, op=Identity) (name=cond_6/switch_f, op=Identity) (name=cond_6/Merge, op=Merge) (name=conv2/conv2_2/weights/ExponentialMovingAverage/cond/switch_t, op=Identity) (name=conv2/conv2_2/weights/ExponentialMovingAverage/cond/switch_f, op=Identity) (name=cond_7/switch_t, op=Identity) (name=cond_7/switch_f, op=Identity) (name=cond_7/Merge, op=Merge) (name=conv2/conv2_2/biases/ExponentialMovingAverage/cond/switch_t, op=Identity)

which stops me from saving it as a pb file.
Is there a way to save this model as a pb file?
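
summarize_graph reports hundreds of candidate outputs because the exported pbtxt is the training graph, full of queue, gradient, and moving-average ops. Freezing to the actual inference outputs avoids that; a minimal TF 1.x sketch, where the checkpoint path and output node names are placeholders to be replaced with your own:

import tensorflow as tf
from tensorflow.python.framework import graph_util

def freeze_checkpoint(ckpt, output_nodes, out_path='pixel_link.pb'):
    saver = tf.train.import_meta_graph(ckpt + '.meta', clear_devices=True)
    with tf.Session() as sess:
        saver.restore(sess, ckpt)
        graph_def = tf.get_default_graph().as_graph_def()
        # bake the variables in and prune the graph to the listed outputs
        frozen = graph_util.convert_variables_to_constants(
            sess, graph_def, output_nodes)
        with tf.gfile.GFile(out_path, 'wb') as f:
            f.write(frozen.SerializeToString())

# hypothetical node names; inspect your graph for the real ones
# freeze_checkpoint('model.ckpt-73018', ['pixel_cls/scores', 'pixel_link/scores'])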

hi

When I run the test code, I receive a util error:
Traceback (most recent call last):
File "test_pixel_link.py", line 8, in
from datasets import dataset_factory
File "/home/mvl/pixel_link-master/datasets/dataset_factory.py", line 2, in
from datasets import dataset_utils
File "/home/mvl/pixel_link-master/datasets/dataset_utils.py", line 21, in
import util
ImportError: No module named util

please help me

Can you explain this function in detail?

def decode_image_by_join(pixel_scores, link_scores, 
                 pixel_conf_threshold, link_conf_threshold):
    # Excerpt from pixel_link.py; assumes numpy imported as np and the
    # repo's get_neighbours() / is_valid_cord() helpers in scope.
    # Threshold the per-pixel text score and the per-neighbour link score.
    pixel_mask = pixel_scores >= pixel_conf_threshold
    link_mask = link_scores >= link_conf_threshold
    # Coordinates (y, x) of all positive pixels (a list under Python 2).
    points = zip(*np.where(pixel_mask))
    h, w = np.shape(pixel_mask)
    # Union-find forest: every positive pixel starts as its own root (-1).
    group_mask = dict.fromkeys(points, -1)
    def find_parent(point):
        return group_mask[point]
        
    def set_parent(point, parent):
        group_mask[point] = parent
        
    def is_root(point):
        return find_parent(point) == -1
    
    def find_root(point):
        # Walk up the forest to the root of this point's group.
        root = point
        update_parent = False
        while not is_root(root):
            root = find_parent(root)
            update_parent = True
        
        # Path compression, for acceleration of find_root.
        if update_parent:
            set_parent(point, root)
            
        return root
        
    def join(p1, p2):
        # Union: merge the two groups containing p1 and p2.
        root1 = find_root(p1)
        root2 = find_root(p2)
        
        if root1 != root2:
            set_parent(root1, root2)
        
    def get_all():
        # Assign each distinct root a 1-based instance id and paint it
        # onto an integer mask (0 = background).
        root_map = {}
        def get_index(root):
            if root not in root_map:
                root_map[root] = len(root_map) + 1
            return root_map[root]
        
        mask = np.zeros_like(pixel_mask, dtype = np.int32)
        for point in points:
            point_root = find_root(point)
            bbox_idx = get_index(point_root)
            mask[point] = bbox_idx
        return mask
    
    # Join by link: a positive pixel is merged with a neighbour when the
    # link towards that neighbour is positive and the neighbour itself is
    # a positive pixel.
    for point in points:
        y, x = point
        neighbours = get_neighbours(x, y)
        for n_idx, (nx, ny) in enumerate(neighbours):
            if is_valid_cord(nx, ny, w, h):
#                 reversed_neighbours = get_neighbours(nx, ny)
#                 reversed_idx = reversed_neighbours.index((x, y))
                link_value = link_mask[y, x, n_idx]# and link_mask[ny, nx, reversed_idx]
                pixel_cls = pixel_mask[ny, nx]
                if link_value and pixel_cls:
                    join(point, (ny, nx))

    # Finally, collapse the groups into an instance-id mask; this is how
    # the function ends in the repo.
    mask = get_all()
    return mask

@dengdan @BowieHsu @GodOfSmallThings

util module

The util module is not included in this repository, which is why 'No module named util' occurs.

Will resnet50 or pvanet work better?

I tried resnet50 and pvanet as base nets; however, vgg16 is still the best one. I wonder whether I did something wrong in the training process? Have you tried other base nets?

test error!!

(tf3_w) ??@thinkstation:~/w/pixel_link$ ./scripts/test.sh 0 ~/w/pixel_link/pixel_link_vgg_2s/conv2_2/model.ckpt-73018.data-00000-of-00001 ~/w/pixel_link/datasets/data
++ set -e
++ export CUDA_VISIBLE_DEVICES=0
++ CUDA_VISIBLE_DEVICES=0
++ python test_pixel_link.py --checkpoint_path=/home/??/w/pixel_link/pixel_link_vgg_2s/conv2_2/model.ckpt-73018.data-00000-of-00001 --dataset_dir=/home/??/w/pixel_link/datasets/data --gpu_memory_fraction=-1
INFO:tensorflow:loading config.py from /home/??/w/pixel_link/pixel_link_vgg_2s/conv2_2/config.py
test_pixel_link_on_icdar2015
Traceback (most recent call last):
  File "test_pixel_link.py", line 176, in <module>
    tf.app.run()
  File "/home/??/anaconda2/envs/tf3_w/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "test_pixel_link.py", line 171, in main
    config_initialization()
  File "test_pixel_link.py", line 79, in config_initialization
    util.proc.set_proc_name('test_pixel_link_on'+ '_' + FLAGS.dataset_name)
  File "/home/??/w/pixel_link/pylib/src/util/proc.py", line 21, in set_proc_name
    setproctitle.setproctitle(name)
UnboundLocalError: local variable 'setproctitle' referenced before assignment

Hello, I got the error above during testing and I don't know how to solve it. Could you help me?
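
The UnboundLocalError suggests proc.py swallows a failed import and then uses the name anyway; a guess at the shape of the bug (unverified against pylib's actual source) and the obvious fix:

# presumed pattern in pylib/src/util/proc.py
def set_proc_name(name):
    try:
        import setproctitle
    except ImportError:
        pass  # the failure is silently ignored...
    setproctitle.setproctitle(name)  # ...so this line raises UnboundLocalError

# likely fix: install the missing dependency
#   pip install setproctitle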

No loss output during training

Hello, I'd like to ask about something. I first tried training with the icdar data and there was no problem; the loss printed normally. Then I generated tfrecords from my own data and trained on them, but no loss is printed during training; only the model checkpoints keep updating. What could be going on?

Loss is not improving

Hi @dengdan @BowieHsu

I have been training the model on the COCO dataset for a couple of days, and yet the loss is not improving. I used the default parameters and trained with the batch size set to 8. Looking forward to your suggestions. Please find the screenshot for reference:

Best,
Vishal

image

A problem during training

Hello, I generated data in tfrecord format from another dataset, but then training just hangs; it keeps printing:
INFO:tensorflow:Starting Session.
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:global_step/sec: 0
What could be the reason?

Memory keeps rising?

Training failed halfway with an out-of-memory error. After restarting training, I noticed that RAM usage keeps rising while GPU memory stays unchanged. Where might the problem be?

can't download

Hi,
I can't download the pretrained model from Baidu.
Could you upload it to Google Drive or Dropbox, please?

Can you provide the settings that reach 85.x% accuracy?

@dengdan
I tried to reproduce the result you reported, which is 85%. However, I failed to reach it; I get at most 83% in my experiments. Can you provide the settings (e.g. the hyperparameters you used for pretraining and finetuning) for both the SynthText and ICDAR datasets?

Thank you.

A problem I met when trying to train on the ICDAR2017 dataset

When I try to train the model on the ICDAR2017 dataset, I get the problem below. I wonder how I can fix this? I used the ICDAR2015 conversion scripts to convert the dataset; am I wrong there? Thanks! :)

INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, All bounding box coordinates must be in [0.0, 1.0]: -0.00061273575
[[Node: ssd_preprocessing_train/distorted_bounding_box_crop/SampleDistortedBoundingBox = SampleDistortedBoundingBox[T=DT_INT32, area_range=[0.1, 1], aspect_ratio_range=[0.5, 2], max_attempts=200, min_object_covered=0.1, seed=0, seed2=0, use_image_if_no_bounding_boxes=true, _device="/job:localhost/replica:0/task:0/cpu:0"](ssd_preprocessing_train/distorted_bounding_box_crop/Shape_1, ssd_preprocessing_train/distorted_bounding_box_crop/ExpandDims)]]

Caused by op u'ssd_preprocessing_train/distorted_bounding_box_crop/SampleDistortedBoundingBox', defined at:
File "train_pixel_link.py", line 293, in
tf.app.run()
File "/data/app/smallhuang/anaconda2/envs/pixel_link/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train_pixel_link.py", line 287, in main
batch_queue = create_dataset_batch_queue(dataset)
File "train_pixel_link.py", line 136, in create_dataset_batch_queue
is_training = True)
File "/data/app/smallhuang/pixel_link/preprocessing/ssd_vgg_preprocessing.py", line 480, in preprocess_image
data_format=data_format)
File "/data/app/smallhuang/pixel_link/preprocessing/ssd_vgg_preprocessing.py", line 380, in preprocess_for_train
area_range = AREA_RANGE)
File "/data/app/smallhuang/pixel_link/preprocessing/ssd_vgg_preprocessing.py", line 246, in distorted_bounding_box_crop
use_image_if_no_bounding_boxes=True)
File "/data/app/smallhuang/anaconda2/envs/pixel_link/lib/python2.7/site-packages/tensorflow/python/ops/gen_image_ops.py", line 989, in sample_distorted_bounding_box
name=name)
File "/data/app/smallhuang/anaconda2/envs/pixel_link/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/data/app/smallhuang/anaconda2/envs/pixel_link/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/data/app/smallhuang/anaconda2/envs/pixel_link/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1228, in init
self._traceback = _extract_stack()
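
Here the offending coordinate is -0.00061..., i.e. marginally below zero, which often comes from rounding in the annotations. Clamping during tfrecord conversion sidesteps it (the same illustrative caveat as in the earlier bounding-box issue; xs and ys stand for the normalized polygon coordinates being written):

import numpy as np

# clamp normalized coordinates into [0, 1] before writing the example
xs = np.clip(xs, 0.0, 1.0)
ys = np.clip(ys, 0.0, 1.0)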

Models trained on long text datasets

Could anyone share links to PixelLink models trained on TD500-train + HUST-TR400?
This is mentioned in the paper under Sec. 5.4 on detecting long texts.

Why does the loss become nan?

Why does the loss become nan after some steps?

INFO:tensorflow:global step 98468: loss = nan (0.416 sec/step)
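
A generic TF 1.x debugging aid (not something this repo ships) is to wrap the loss so the run fails with a named error at the first bad value, which narrows down where the NaN appears:

import tensorflow as tf

# fail fast as soon as the loss goes NaN or Inf
loss = tf.check_numerics(loss, 'loss became NaN or Inf')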

Can't reach the results in the paper

Hi,
@DelightRun
I used my own dataset for training, but the test results are not good. I have trained for 20,000 steps, and the loss stops decreasing (it plateaus at 0.6).
Can you give some suggestions for improvement to get better results? Which parameters need to be changed?

image

Thank you.
@comzyh

Want to see your loss curve...

I am trying to implement this paper in PyTorch.
It is done except for the loss weight; I wish to reach your performance in Table 5, row 3 ("without instance balance").
But my loss keeps hovering at a high level for hours, behaving like the plain FCN of Table 5, row 2, predicting all links to neighbour pixels as 1.
So could you show your loss curve or training log? Or any suggestions?

When training on my dataset, the loss = 0.00000

I converted my dataset from VOC format to ICDAR2015 format, then used icdar2015_to_tfrecord.py to convert it to a tfrecord file. I use this file to train pixel_link, but the loss is always 0.000. What's wrong with it? Thanks.

Help please!

When I run the test code in a Windows 10 terminal, I receive a UnicodeDecodeError:

E:\文本检测之pixel_link\pixel_link-master\pixel_link-master>python test_pixel_link.py
Traceback (most recent call last):
File "test_pixel_link.py", line 8, in
from datasets import dataset_factory
File "E:\文本检测之pixel_link\pixel_link-master\pixel_link-master\datasets\dataset_factory.py", line 3, in
from datasets import dataset_utils
File "E:\文本检测之pixel_link\pixel_link-master\pixel_link-master\datasets\dataset_utils.py", line 24, in
import util
File "E:\文本检测之pixel_link\pixel_link-master\pixel_link-master\util_init_.py", line 5, in
import plt
File "E:\文本检测之pixel_link\pixel_link-master\pixel_link-master\util\plt.py", line 12, in
import matplotlib.pyplot as plt
File "D:\Python27\lib\site-packages\matplotlib\pyplot.py", line 73, in
from matplotlib.backends import pylab_setup
File "D:\Python27\lib\site-packages\matplotlib\backends_init_.py", line 18, in
line for line in traceback.format_stack()
File "D:\Python27\lib\site-packages\matplotlib\backends_init_.py", line 20, in
if not line.startswith(' File "<frozen importlib._bootstrap'))
File "D:\Python27\lib\encodings\utf_8.py", line 17, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xce in position 11: invalid continuation byte

Please help me!

[Question] Does the training step need a pretrained model?

Does the training process with the SynthText dataset need the pretrained model from https://github.com/ZJULearning/pixel_link#download-the-pretrained-model, or not?
I want to train price detection for postage stamps.
A sample image looks something like this:
image

I tested with the pretrained model and the result is good, but I want to detect only the price area on the stamp (the short, largest region on the stamp, like 100, 50, 30, 10; mostly 3 characters only).
Can I ignore the text size length check in https://github.com/ZJULearning/pixel_link/blob/master/datasets/synthtext_to_tfrecords.py#L134 and retrain?
Sorry for my bad English; thanks for reading.

Getting no results on ICDAR-15

I have followed every step as given, and yet I cannot get a single detection on the ICDAR-15 test dataset. Any tips on what I might be doing wrong? I tried tweaking the parameters of test_pixel_link_on_any_image.py and test_pixel_link.py, and tried both models. The code runs without any error and generates the text and zip files as well.

If there are any gotchas or things I'm missing, please let me know.

Can you provide ch4_test_localization_transcription_gt?

Hello sir, I can't find this file on the ICDAR2015 website, but since I want to run your code to convert the data into tfrecords, the file ch4_test_localization_transcription_gt is needed by datasets/icdar2015_to_tfrecords.py at line 67. Could you please provide this file? I would appreciate it a lot.

The two thresholds on pixels and link

Can you tell me the meaning of the two thresholds on pixels and links? I don't know the effect of changing each of them. Hoping for your advice.
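
These are the two arguments of decode_image_by_join, quoted in an earlier issue: pixel_conf_threshold decides which pixels count as text at all, and link_conf_threshold decides which neighbouring text pixels are merged into one instance. A hypothetical call (the 0.8/0.8 values are placeholders, not recommended settings):

# pixel_scores: (h, w) text scores; link_scores: (h, w, 8) neighbour link scores
instance_mask = decode_image_by_join(
    pixel_scores, link_scores,
    pixel_conf_threshold=0.8,  # a pixel is text if its score >= this
    link_conf_threshold=0.8)   # neighbours are merged if the link score >= this

Raising the pixel threshold removes weak text regions entirely, while raising the link threshold splits loosely connected regions into more instances.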
