wenmuzhou / pan.pytorch

An unofficial PyTorch implementation of PAN (PSENet2): Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

License: Apache License 2.0

Python 19.45% Makefile 0.05% C++ 80.49% Objective-C 0.02%

pan.pytorch's Introduction

Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

Requirements

pytorch 1.1+
torchvision 0.3+
pyclipper
opencv3
gcc 4.9+

Download

PAN_resnet18_FPEM_FFM and PAN_resnet18_FPEM_FFM on ICDAR2015:

The updated models (resnet18: 78.8, shufflenetv2: 72.4, lr: 1e-3) are not the best models.

google drive

Data Preparation

train: prepare a text file in which each line has the following format, using '\t' as the separator

/path/to/img.jpg path/to/label.txt
...
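A minimal sketch of how such a list file could be generated. The function name `write_train_list`, the `*.jpg` glob, and the `gt_<image name>.txt` pairing rule are assumptions for illustration (the pairing mimics ICDAR2015 naming); adjust them to your own dataset.

```python
from pathlib import Path


def write_train_list(img_dir, label_dir, out_file):
    """Write one 'image<TAB>label' line per image that has a matching label file.

    Hypothetical helper, not part of the repository.
    """
    img_dir, label_dir = Path(img_dir), Path(label_dir)
    lines = []
    for img_path in sorted(img_dir.glob("*.jpg")):
        # Assumed naming rule: img_1.jpg pairs with gt_img_1.txt.
        label_path = label_dir / f"gt_{img_path.stem}.txt"
        if label_path.exists():
            lines.append(f"{img_path}\t{label_path}")
    text = "\n".join(lines) + ("\n" if lines else "")
    Path(out_file).write_text(text, encoding="utf-8")
    return len(lines)


# Example call (paths are placeholders):
# write_train_list("/path/to/img", "/path/to/gt", "train.txt")
```

The return value is the number of pairs written, which makes it easy to notice an empty list before training starts.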

val: use a folder with the following layout

img/ stores the images
gt/ stores the ground-truth files

Train

Set train_data_path and val_data_path in config.json.
Use the following command to start training:

python3 train.py

Test

eval.py evaluates a model on the test dataset.

Set model_path, img_path, gt_path, and save_path in eval.py.
Use the following command to run the evaluation:

python3 eval.py

Predict

predict.py runs inference on a single image.

Set model_path and img_path in predict.py.
Use the following command to predict:

python3 predict.py

The project is still under development.

Performance

ICDAR 2015

Trained only on the ICDAR2015 dataset.

Method	image size (short side)	learning rate	Precision (%)	Recall (%)	F-measure (%)	FPS
paper (resnet18)	736	x	x	x	80.4	26.1
my (ShuffleNetV2+FPEM_FFM+PSE expansion)	736	1e-3	81.72	66.73	73.47	24.71 (P100)
my (resnet18+FPEM_FFM+PSE expansion)	736	1e-3	84.93	74.09	79.14	21.31 (P100)
my (resnet50+FPEM_FFM+PSE expansion)	736	1e-3	84.23	76.12	79.96	14.22 (P100)
my (ShuffleNetV2+FPEM_FFM+PSE expansion)	736	1e-4	75.14	57.34	65.04	24.71 (P100)
my (resnet18+FPEM_FFM+PSE expansion)	736	1e-4	83.89	69.23	75.86	21.31 (P100)
my (resnet50+FPEM_FFM+PSE expansion)	736	1e-4	85.29	75.1	79.87	14.22 (P100)
my (resnet18+FPN+PSE expansion)	736	1e-3	76.50	74.70	75.59	14.47 (P100)
my (resnet50+FPN+PSE expansion)	736	1e-3	71.82	75.73	73.72	10.67 (P100)
my (resnet18+FPN+PSE expansion)	736	1e-4	74.19	72.34	73.25	14.47 (P100)
my (resnet50+FPN+PSE expansion)	736	1e-4	78.96	76.27	77.59	10.67 (P100)
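The F-measure column is consistent with the usual harmonic mean of precision and recall. A small sketch, assuming that standard definition (the table itself does not state the formula), reproduces two of the rows:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# ShuffleNetV2+FPEM_FFM row at lr 1e-3
print(round(f_measure(81.72, 66.73), 2))  # 73.47
# resnet18+FPEM_FFM row at lr 1e-3
print(round(f_measure(84.93, 74.09), 2))  # 79.14
```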

examples

todo

MobileNet backbone
ShuffleNet backbone

reference

If this repository helps you, please star it. Thanks.


pan.pytorch's Issues

The order of label coordinates in different datasets

I noticed that the author changes the coordinates of the ICDAR2015 dataset in util.py with the "order_points_clockwise" function. Could you kindly explain the reason for this?
As far as I know, the ICDAR2015 labels are already in clockwise order, so why did you apply this function?
I read the Total-Text dataset paper and could not find any description of clockwise labels. Do we need to make the same kind of change for the Total-Text dataset?
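For readers unfamiliar with the idea, here is a minimal sketch of what ordering polygon points clockwise can look like. This is not the repository's `order_points_clockwise`; the name `order_clockwise_sketch` and the angle-sorting approach are assumptions for illustration, and the approach only suits convex shapes such as quadrilateral boxes.

```python
import math


def order_clockwise_sketch(points):
    """Return points in clockwise order (image coordinates, y pointing down),
    starting from the point with the smallest x + y.

    Hypothetical helper; works for convex polygons only.
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    # With y pointing down, increasing atan2 angle walks clockwise on screen.
    ordered = sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    start = min(range(len(ordered)), key=lambda i: ordered[i][0] + ordered[i][1])
    return ordered[start:] + ordered[:start]


# A box given in arbitrary order
print(order_clockwise_sketch([(10, 0), (0, 5), (0, 0), (10, 5)]))
# [(0, 0), (10, 0), (10, 5), (0, 5)]
```

Normalizing the order this way makes the result independent of how an annotation tool happened to list the corners.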

FPEM is implemented incorrectly

The paper uses separable convolution, but you use regular convolution.
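To show why the distinction matters, the sketch below compares weight counts for a regular convolution and a depthwise separable one (depthwise k x k followed by pointwise 1 x 1). The 128-channel width is only an illustrative assumption, and biases are ignored.

```python
def regular_conv_params(c_in, c_out, k=3):
    """Weights in a regular k x k convolution (no bias)."""
    return k * k * c_in * c_out


def separable_conv_params(c_in, c_out, k=3):
    """Weights in a depthwise k x k conv plus a pointwise 1 x 1 conv (no bias)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


print(regular_conv_params(128, 128))    # 147456
print(separable_conv_params(128, 128))  # 17536
```

Under these assumptions the separable variant needs roughly 8 times fewer weights per layer.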

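The segmentation network's output width and height are 1/4 of the original image. Are the training labels also 1/4 of the original image size? If so, do the bbox coordinates obtained at prediction time need to be multiplied by 4 to get the actual text regions?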

Finetune checkpoint

There is a parameter "finetune_checkpoint". I would like to know how it works here. Does it freeze any of the initial layers, and if so, how can that be controlled?

GPU memory is full, but utilization is low

batch_size is set to 4, pin_memory=true, num_workers=8.
GPU utilization stays at around 1%. Has anyone else seen this?

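Predicted similarity vectors

Could you explain what the predicted similarity vectors actually represent? Why do they have 4 channels? I did not quite understand this part. Thanks.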

num_samples=0

@WenmuZhou Thank you for your hard work,

I am trying to train on ICDAR2015. When running train.py I get an error.
My config.json
My training & testing list, and file tree.

The error message:

(final) home@home-desktop:~/p2/PAN.pytorch-master$ python train.py
Traceback (most recent call last):
  File "train.py", line 33, in <module>
    main(config)
  File "train.py", line 18, in main
    train_loader, eval_loader = get_dataloader(config['data_loader']['type'], config['data_loader']['args'])
  File "/home/home/p2/PAN.pytorch-master/data_loader/__init__.py", line 98, in get_dataloader
    num_workers=module_args['loader']['num_workers'])
  File "/home/home/anaconda3/envs/final/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 213, in __init__
    sampler = RandomSampler(dataset)
  File "/home/home/anaconda3/envs/final/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 94, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0
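The traceback shows that RandomSampler received a dataset of length 0. One plausible cause, stated here as an assumption rather than a diagnosis of this report, is that no line of the training list yields a valid image/label pair (for example, a space instead of '\t' as separator, or paths that do not exist). A hypothetical checker such as `load_train_list` below can confirm this before training:

```python
import os


def load_train_list(list_file):
    """Return (image, label) pairs from a tab-separated list file.

    Hypothetical helper: lines without exactly one tab, or with missing
    files, are skipped, so an empty result explains num_samples=0.
    """
    samples = []
    with open(list_file, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 2:
                continue  # wrong separator or malformed line
            img_path, label_path = parts
            if os.path.exists(img_path) and os.path.exists(label_path):
                samples.append((img_path, label_path))
    return samples


# Example call (path is a placeholder):
# print(len(load_train_list("train.txt")))
```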

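Did the author notice poor boundaries when running this?

As the title says. I found that the boundaries in the final results are fairly poor and shrink inward a little. I am not sure whether I trained for too few iterations, but 600 epochs is too long. I currently train for only about 50 epochs, and 600 would probably take about a week to produce a result.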

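Is there a pretrained model available for testing the model's efficiency?

I would like to test detection quality and speed on long text. Is there somewhere I can download a pretrained model?

Could a trained model be released?

About accuracy and parameters in the Performance section

The res+FPN+PSE expansion entries under Performance should correspond to PSENet, right? But the accuracy gap is fairly large: the F1 scores are 77 versus 80-plus. Is there a difference in parameter details, or does the implementation include some trick that only suits PAN?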

KeyError: 'metrics'

I am facing this error: KeyError: 'metrics'
trainer/trainer.py, line 169

decode_np hangs; what is going on? Specifically, it is the pse_cpp function

Hello, what does get_points in your pse do?

Hmm... I did not quite understand it.

MobileNet model?

Are there any plans to release a trained MobileNet-based model?

Weights?

Thanks for sharing the code. Are there any trained weights available for this work?

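Do the similarity vectors need four channels?

The paper does not state the number of channels of the similarity vectors. Judging from the loss formula, one channel seems to be enough.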

Segmentation fault (core dumped)

When I run predict.py:

[shakey@xiaoi-778 PAN]$ python predict.py
make: Entering directory `/opt/shakey/deep-learning/PAN/post_processing'
make: `pse.so' is up to date.
make: Leaving directory `/opt/shakey/deep-learning/PAN/post_processing'
self。gpu 1
ininstance True
torch.cuda.is True
self。gpu 1
ininstance True
torch.cuda.is True
device: cuda:0
Segmentation fault (core dumped)

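Hello author, thanks for sharing the code. Does this reimplementation use PA clustering for reconstruction, or only PSE reconstruction?

Hello author, thanks for sharing the code. Does this reimplementation use PA clustering for reconstruction, or only PSE reconstruction? @WenmuZhou

Why can't I load the model when I want to predict on images?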

make: Entering directory '/home/git_repo/PAN.pytorch/post_processing'
make: 'pse.so' is up to date.
make: Leaving directory '/home/git_repo/PAN.pytorch/post_processing'
Backend Qt5Agg is interactive backend. Turning interactive mode on.
QXcbConnection: XCB error: 145 (Unknown), sequence: 171, resource id: 0, major code: 139 (Unknown), minor code: 20
device: cuda:0

Then it stops and does not return anything.

The loss does not decrease

Great work. Before your update, I tried to train PAN, but the loss was still high at the end of training. Does the current version support effective training and inference?

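About the PAN loss

When computing agg_dis_loss:
text_num = gt_text_i.max().item()+1
What does this statement mean? It is meant to count the number of text instances, right?
But isn't gt_text_i made of pixels with value 0 or 1? Then wouldn't this statement always give 2?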
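The expression can be illustrated without PyTorch. Whether `gt_text_i` in the repository is a binary mask or an instance-id map is not stated here, so the sketch below simply shows what `max() + 1` yields in each case; `count_labels` and both toy maps are hypothetical.

```python
def count_labels(label_map):
    """Mimic `label_map.max() + 1` for a 2-D list of integer labels."""
    return max(max(row) for row in label_map) + 1


# Case 1: binary mask (0 = background, 1 = text)
binary_map = [[0, 1, 1],
              [0, 0, 1]]
# Case 2: instance-id map (0 = background, 1..3 = separate text instances)
instance_map = [[0, 1, 1],
                [2, 0, 3]]

print(count_labels(binary_map))    # 2
print(count_labels(instance_map))  # 4
```

So the result is always 2 only if the map is binary; with per-instance ids it equals the number of instances plus one for the background.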

Compilation problems on Windows 10

Many syntax problems.

The recompiled pse.so reports a Segmentation fault when used

/usr/lib64/python3.6/site-packages/torch/nn/functional.py:2479: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
Segmentation fault

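Cannot train on the GPU

@WenmuZhou Could you tell me why the GPU cannot be started during training?

/path/to/img.jpg path/to/label.txt: does anyone know the exact format of the training dataset?

I want to run the code, but I do not understand the exact format of the training data.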

from post_processing import decode_np as decode ImportError: cannot import name 'decode_np'

Hello, I cannot find decode_np in post_processing.

OCR integration

@WenmuZhou Are you considering adding an ocr model?

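Some problems with the C++ pse

For some images the C++ pse step hangs: no error is reported, it just stays stuck. Switching to pypse runs normally. There is a bug in the pypse code: line 25 should be changed to for i in range(label_values): (it was for i in label_values:), because label_values is an int.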
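The reported bug can be reproduced in isolation. The snippet below is a standalone illustration, not the pypse source; the value 3 is arbitrary.

```python
label_values = 3  # an int, as described in the report

# Iterating directly over an int raises TypeError.
try:
    for i in label_values:
        pass
except TypeError as exc:
    print("iterating over an int fails:", exc)

# The proposed fix: iterate over range(label_values) instead.
visited = [i for i in range(label_values)]
print(visited)  # [0, 1, 2]
```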

Exception: ZIP entry not valid

/utils/cal_recall/rrc_evaluation_funcs.py", line 102, in load_folder_file
raise Exception('ZIP entry not valid: %s' % name)
Exception: ZIP entry not valid: res_100104531.txt

This error is reported during both validation and eval. What does it mean, and how should I fix it?

Some problems with pse.cpp

Thanks for sharing. I have a question: when testing with predict (using the resnet18 model, on macOS), I get the following error. Could you give some advice? Best wishes!

pse.cpp:49:29: error: variable-sized object may not be initialized
        float kernel_vector[label_num][5] = {0};
                            ^~~~~~~~~
1 error generated.
make: *** [pse.so] Error 1
Traceback (most recent call last):
  File "/Users/abelleon/Documents/project/PAN.pytorch-master/predict.py", line 13, in <module>
    from post_processing import decode
  File "/Users/abelleon/Documents/project/PAN.pytorch-master/post_processing/__init__.py", line 17, in <module>
    raise RuntimeError('Cannot compile pse: {}'.format(BASE_DIR))
RuntimeError: Cannot compile pse: /Users/abelleon/Documents/project/PAN.pytorch-master/post_processing

recall: 0.000000, precision: 0.000000, f1: 0.000000

Like some other commenters, I am also getting an F1 score of 0 when testing on the ICDAR 2015 dataset.

Pretrained model

Hi,
The provided checkpoint does not perform very well. Can you share the best checkpoints or help reproduce those results?

How can curved text be detected?

Text-line detection

@WenmuZhou @guochengzhen Hi there,
If I want to train a text-line detection model, what should be my configurations?

Batch image detection & Exporting the detected boxes

@WenmuZhou Thank you for your hard work,

It would be great if you would add:

A script to run detection/inference on multiple images.
An option to export the detected boxes, perhaps as a .txt file containing x1,y1,x2,y2,x3,y3,x4,y4
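A minimal sketch of what the two requested features could look like. Everything here is hypothetical: `save_boxes`, `export_folder`, the `detect_fn` callable (any function that maps an image path to a list of 4-point boxes), the `*.jpg` glob, and the `res_<image name>.txt` output naming are assumptions, not existing repository code.

```python
from pathlib import Path


def save_boxes(boxes, out_path):
    """Write one 'x1,y1,x2,y2,x3,y3,x4,y4' line per 4-point box."""
    with open(out_path, "w", encoding="utf-8") as f:
        for box in boxes:
            coords = [str(int(round(v))) for point in box for v in point]
            f.write(",".join(coords) + "\n")


def export_folder(detect_fn, img_dir, out_dir):
    """Run detect_fn on every .jpg in img_dir and save its boxes to out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for img_path in sorted(Path(img_dir).glob("*.jpg")):
        boxes = detect_fn(str(img_path))  # hypothetical detector callable
        save_boxes(boxes, out_dir / f"res_{img_path.stem}.txt")
```

Passing the detector in as a callable keeps the export logic independent of how the model is loaded.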

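Very dense text can cause memory overflow

Very dense text leads to memory overflow, which crashes the program.

Hello, is there a model available?

Importing the C++ pse

Why is from .pse import pse_cpp, get_points, get_num placed inside the decode function as a local import rather than a global import? In another PSENet project I saw it imported globally. I am not very familiar with C++.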

subprocess.call in post_processing

File "/Users/liubowen/Downloads/PAN.pytorch-master/post_processing/__init__.py", line 17, in <module>
raise RuntimeError('Cannot compile pse: {}'.format(BASE_DIR))
RuntimeError: Cannot compile pse: /Users/liubowen/Downloads/PAN.pytorch-master/post_processing

Is there a solution? Please!!

Import error

from .pse import pse_cpp, get_points, get_num

ModuleNotFoundError: No module named 'post_processing.pse'

How can I fix this problem?
Thanks!

About the output of segmentation_head

I would like to ask why the output of the FPEM_FFM layer has 6 channels:
self.out_conv = nn.Conv2d(in_channels=conv_out * 4, out_channels=6, kernel_size=1)

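[Question] FPEM module

Hello. In your FPEM implementation there are self.add_up and self.add_down.
These two parts up-sample or down-sample feature maps of different sizes, add them, and then fuse the result. I think the fusion module instance should be different for each fusion step rather than the same instance every time. For example, in Up-scale Enhancement, the add_up applied after c5 and c4 are down-sampled and added, and the add_up applied after c4 and c3 are down-sampled and added, should be two different module instances. The two add_up blocks have the same structure, but their parameters should not be shared, and the paper does not seem to say that these parameters are shared. So my understanding is that each add_up part should create its own nn.Sequential, and likewise for the down-sample path. I am not sure whether I am right or whether I have misunderstood.
Sorry, I am not good with words and find it hard to express what I mean.

Precision and recall are both 0 in testing

I ran into the following problem during training:
test: recall: 0.000000, precision: 0.000000, f1: 0.000000
How can I solve this? Thanks!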