
pytorchocr's Introduction

PytorchOCR

Convert models from PaddleOCR to PytorchOCR.

Model alignment information

Environment

  • torch: 2.0.1
  • paddle: 2.5.1
  • System: Windows 10, CPU

Directory notes

  • The ppocr directory is only used for code conversion and will be removed once all models have been converted
  • padiff is the weight conversion tool and will be removed once all models have been converted

Alignment list

Note: models not listed below have not yet been verified.

Model downloads

Baidu Cloud: link: https://pan.baidu.com/s/17NVg9VSBmrDmbX5MmubZgQ?pwd=ppdz  extraction code: ppdz

PP series

| Model | Aligned | Alignment error | Config |
| --- | --- | --- | --- |
| ch_PP-OCRv4_rec_distill | X | config mismatch | config |
| ch_PP-OCRv4_rec_teacher | Y | 1.4605024e-10 | config |
| ch_PP-OCRv4_rec_student | Y | 3.6277156e-06 | config |
| ch_PP-OCRv4_det_student | Y | 0 | config |
| ch_PP-OCRv4_det_teacher | Y | maps 7.811429e-07, cbn_maps 1.0471307e-06 | config |
| ch_PP-OCRv4_det_cml | Y | Student_res 0.0, Student2_res 0.0, Teacher_maps 1.1398747e-06, Teacher_cbn_maps 1.2791393e-06 | config |
| ch_PP-OCRv3_rec | Y | 4.615016e-11 | config |
| ch_PP-OCRv3_rec_distillation.yml | Y | Teacher_head_out_res 7.470646e-10, Student_head_out_res 4.615016e-11 | config |
| ch_PP-OCRv3_det_student | Y | 1.766314e-07 | config |
| ch_PP-OCRv3_det_cml | Y | Student_res 1.766314e-07, Student2_res 3.1212483e-07, Teacher_res 8.829421e-08 | config |
| ch_PP-OCRv3_det_dml | Y | ok | config |
| cls_mv3 | Y | 5.9604645e-08 | config |

Recognition models

| Model | Aligned | Alignment error | Config |
| --- | --- | --- | --- |
| rec_mv3_none_none_ctc | Y | 2.114354e-09 | config |
| rec_r34_vd_none_none_ctc | Y | 3.920279e-08 | config |
| rec_mv3_none_bilstm_ctc | Y | 1.1861777e-09 | config |
| rec_r34_vd_none_bilstm_ctc | Y | 1.9336952e-08 | config |
| rec_mv3_tps_bilstm_ctc | Y | 1.1886948e-09 | config |
| rec_r34_vd_tps_bilstm_ctc | N | 0.0035705192 | config |
| rec_mv3_tps_bilstm_att | Y | 1.8528418e-09 | config |
| rec_r34_vd_tps_bilstm_att | N | 0.0006942689 | config |
| rec_r31_sar | Y | 7.348353e-08 | config |
| rec_mtb_nrtr | N | res_0 8.64, res_1 0.13501492 | config |
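
The alignment error above is presumably the difference between the Paddle model's output and the converted PyTorch model's output on the same input. A minimal sketch of such a check is shown below; this is not the padiff tool, and the input shape, the dumped output files, and the max-absolute-difference metric are assumptions:

import numpy as np

# Hypothetical alignment check: feed the same input to the Paddle model and the
# converted PyTorch model, dump both outputs, and compare them.
x = np.random.rand(1, 3, 48, 320).astype("float32")  # assumed rec input shape

# paddle_out = paddle_model(paddle.to_tensor(x)).numpy()         # run in the Paddle env
# torch_out = torch_model(torch.from_numpy(x)).detach().numpy()  # run in the PyTorch env
paddle_out = np.load("paddle_out.npy")  # placeholder files holding the dumped outputs
torch_out = np.load("torch_out.npy")

print("alignment error:", np.abs(paddle_out - torch_out).max())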

TODO

Features:

  • End-to-end inference
  • det inference
  • rec inference
  • cls inference
  • Export to ONNX
  • ONNX inference
  • TensorRT inference
  • Training, evaluation, and testing

Usage

Data preparation

Refer to PaddleOCR.

train

# single GPU
CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml

# multiple GPUs
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nnodes=1 --nproc_per_node=4 tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml

eval

CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.checkpoints=xxx.pth

infer

python tools/infer_rec.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model=xxx.pth

export

python tools/export.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model=xxx.pth

This exports the model to ONNX format by default (TorchScript export has not been tested), and also exports the pre-processing and post-processing parameters.
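
For reference, a minimal sketch of running an exported ONNX model with onnxruntime is shown below. The file name model.onnx, the input shape, and the dummy input are assumptions; the real pre-processing comes from the parameters exported alongside the model:

import numpy as np
import onnxruntime as ort

# Load the exported model and run one forward pass on a placeholder input.
sess = ort.InferenceSession("path/to/rec/export_dir/model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name

img = np.random.rand(1, 3, 48, 320).astype(np.float32)  # stand-in for a preprocessed image
outputs = sess.run(None, {input_name: img})
print(outputs[0].shape)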

predict

# det + cls + rec
python .\tools\infer\predict_system.py --det_model_dir=path/to/det/export_dir  --cls_model_dir=path/to/cls/export_dir  --rec_model_dir=path/to/rec/export_dir  --image_dir=doc/imgs/1.jpg --use_angle_cls=true

# det
python .\tools\infer\predict_det.py --det_model_dir=path/to/det/export_dir --image_dir=doc/imgs/1.jpg

# cls
python .\tools\infer\predict_cls.py --cls_model_dir=path/to/cls/export_dir --image_dir=doc/imgs/1.jpg

# rec
python tools/infer/predict_rec.py --rec_model_dir=path/to/rec/export_dir --image_dir=doc/imgs_words/en/word_1.png

ref:

  1. https://github.com/PaddlePaddle/PaddleOCR
  2. https://github.com/frotms/PaddleOCR2Pytorch

pytorchocr's People

Contributors

afterimagex, bourne-m, dependabot[bot], ebreak, jinreejing, light201212, morestart, novioleo, wenmuzhou, wushilian, yuantangliang, zenfsheng


pytorchocr's Issues

New error type: AttributeError

Hello, a new error has appeared:
The model does start running, but the error appears right after the first saving point.
Also, the accuracy value stays at 0.

[screenshots: error2first, error_log2]

Hoping for an answer! Many thanks!

RuntimeError: CUDA out of memory when training the recognition model

Could anyone advise why GPU memory seems to keep growing when I try to train the recognition model, until it hits RuntimeError: CUDA out of memory? How can this be solved?
(An old 1080 card, bs=16; at the start it needs a little over 3 GB of memory, then it gradually climbs.)
2020-09-10 06:55:41,145 - torchocr - INFO - [0/200] - [7650/85482] - lr:0.001 - loss:0.8423 - acc:0.2500 - norm_edit_dis:0.8858 - time:3.2507
2020-09-10 06:55:44,284 - torchocr - INFO - [0/200] - [7660/85482] - lr:0.001 - loss:0.8961 - acc:0.1875 - norm_edit_dis:0.8449 - time:3.1395
2020-09-10 06:55:47,402 - torchocr - INFO - [0/200] - [7670/85482] - lr:0.001 - loss:0.5101 - acc:0.5625 - norm_edit_dis:0.9381 - time:3.1169
2020-09-10 06:55:49,011 - torchocr - ERROR - Traceback (most recent call last):
File "tools/rec_train.py", line 237, in train
loss_dict['loss'].backward()
File "/opt/conda/lib/python3.7/site-packages/torch/tensor.py", line 185, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/opt/conda/lib/python3.7/site-packages/torch/autograd/init.py", line 127, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 232.00 MiB (GPU 0; 7.93 GiB total capacity; 6.42 GiB already allocated; 110.19 MiB free; 7.22 GiB reserved in total by PyTorch)
Exception raised from malloc at /pytorch/c10/cuda/CUDACachingAllocator.cpp:272 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f19c980e1e2 in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x1e64b (0x7f19c9a6464b in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: + 0x1f464 (0x7f19c9a65464 in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: + 0x1faa1 (0x7f19c9a65aa1 in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #4: at::native::empty_cuda(c10::ArrayRef, c10::TensorOptions const&, c10::optionalc10::MemoryFormat) + 0x11e (0x7f19cc78c52e in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #5: + 0xf51329 (0x7f19cabc8329 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #6: + 0xf6b157 (0x7f19cabe2157 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0x10e9c7d (0x7f1a0194cc7d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #8: + 0x10e9f97 (0x7f1a0194cf97 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #9: at::empty(c10::ArrayRef, c10::TensorOptions const&, c10::optionalc10::MemoryFormat) + 0xfa (0x7f1a01a57a1a in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #10: at::native::empty_like(at::Tensor const&, c10::TensorOptions const&, c10::optionalc10::MemoryFormat) + 0x49e (0x7f1a016d5c3e in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #11: + 0x12880c1 (0x7f1a01aeb0c1 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #12: + 0x12c3863 (0x7f1a01b26863 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #13: at::empty_like(at::Tensor const&, c10::TensorOptions const&, c10::optionalc10::MemoryFormat) + 0x101 (0x7f1a01a3ab31 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #14: at::native::contiguous(at::Tensor const&, c10::MemoryFormat) + 0x89 (0x7f1a016f2469 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #15: + 0x1290470 (0x7f1a01af3470 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #16: + 0x12c351f (0x7f1a01b2651f in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #17: at::Tensor::contiguous(c10::MemoryFormat) const + 0xe8 (0x7f1a01b912e8 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #18: at::Tensor at::native::(anonymous namespace)::host_softmax_backward<at::native::(anonymous namespace)::LogSoftMaxBackwardEpilogue, true>(at::Tensor const&, at::Tensor const&, long, bool) + 0x14b (0x7f19cc01826b in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #19: at::native::log_softmax_backward_cuda(at::Tensor const&, at::Tensor const&, long, at::Tensor const&) + 0x65a (0x7f19cc0026da in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #20: + 0xf3efa0 (0x7f19cabb5fa0 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #21: + 0x11141d6 (0x7f1a019771d6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #22: at::_log_softmax_backward_data(at::Tensor const&, at::Tensor const&, long, at::Tensor const&) + 0x119 (0x7f1a01a05649 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #23: + 0x2ec639f (0x7f1a0372939f in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #24: + 0x11141d6 (0x7f1a019771d6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #25: at::_log_softmax_backward_data(at::Tensor const&, at::Tensor const&, long, at::Tensor const&) + 0x119 (0x7f1a01a05649 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #26: torch::autograd::generated::LogSoftmaxBackward::apply(std::vector<at::Tensor, std::allocatorat::Tensor >&&) + 0x1d7 (0x7f1a035a5057 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #27: + 0x3375bb7 (0x7f1a03bd8bb7 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #28: torch::autograd::Engine::evaluate_function(std::shared_ptrtorch::autograd::GraphTask&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptrtorch::autograd::ReadyQueue const&) + 0x1400 (0x7f1a03bd4400 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #29: torch::autograd::Engine::thread_main(std::shared_ptrtorch::autograd::GraphTask const&) + 0x451 (0x7f1a03bd4fa1 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #30: torch::autograd::Engine::thread_init(int, std::shared_ptrtorch::autograd::ReadyQueue const&, bool) + 0x89 (0x7f1a03bcd119 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #31: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptrtorch::autograd::ReadyQueue const&, bool) + 0x4a (0x7f1a1136dc8a in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #32: + 0xc70f (0x7f1a10a3070f in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #33: + 0x76ba (0x7f1a1454a6ba in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #34: clone + 0x6d (0x7f1a1428041d in /lib/x86_64-linux-gnu/libc.so.6)
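
For what it's worth, steadily growing GPU memory during training is often caused by accumulating the loss tensor itself, which keeps every autograd graph alive, rather than its Python value. A minimal, self-contained sketch of the pattern (not code from this repo) is shown below:

import torch
import torch.nn as nn

# Toy training loop illustrating the difference between accumulating the loss
# tensor (retains the graph, memory keeps growing) and accumulating loss.item().
model = nn.Linear(10, 5).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

running_loss = 0.0
for step in range(100):
    x = torch.randn(16, 10, device="cuda")
    y = torch.randint(0, 5, (16,), device="cuda")
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    # running_loss += loss        # keeps the whole graph alive across iterations
    running_loss += loss.item()   # detaches to a Python float instead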

Hi! Some bugs, and I don't know what causes them.

2020-07-20 01:40:12,632 - torchocr - INFO - Training...
2020-07-20 01:40:12,632 - torchocr - INFO - train dataset has 10000 samples,1250 in dataloader
2020-07-20 01:40:12,632 - torchocr - INFO - eval dataset has 10000 samples,10000 in dataloader
2020-07-20 01:40:12,765 - torchocr - ERROR - Traceback (most recent call last):
File "tools/det_train.py", line 205, in train
for i, batch_data in enumerate(train_loader): # traverse each batch in the epoch
File "/home/wqzhaha/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/home/wqzhaha/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
return self._process_data(data)
File "/home/wqzhaha/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
data.reraise()
File "/home/wqzhaha/anaconda3/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
cv2.error: Caught error in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/wqzhaha/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/wqzhaha/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/wqzhaha/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/media/wqzhaha/TOSHIBA EXT/ocr/PytorchOCR/torchocr/datasets/DetDataSet.py", line 104, in __getitem__
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
cv2.error: OpenCV(4.2.0) /io/opencv/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'

You can try running the training yourself; this problem shows up with both torch 1.0.1 and torch 1.5.0, and with both of the datasets I tried.

Training error

Hello!
I followed the tutorial steps exactly and get an error about net.module (the model doesn't have a module attribute, does it?).
Also, logging keeps reporting ./weights/resnet50_vd.pth not exists. Is it really enough to just put resnet50 into weights?
I keep feeling that something is missing from your instructions.

Thanks, looking forward to an answer!

KeyError when starting training: mapping not found in the dictionary

Hello, I am training with the code from your crnn.pytorch repository, but a KeyError occurs. The training log is shown below:
[screenshot: error_log]

The dataset is one I prepared myself; the label file format is as follows:

img/word_20347.png 中國國際航空公司
img/word_20351.png 头等舱
img/word_20363.png 北京市科普教育基地

Following your readme, I used gen_key.py to generate a dictionary file from the labels above (the dictionary contents are omitted here).

The yaml file I used is as follows (it is actually the imagedataset_None_VGG_RNN_CTC.yaml file from your repository):
{'name': 'crnn', 'base': ['config/image_dataset.yaml'], 'arch': {'type': 'Model', 'trans': {'type': 'None', 'input_size': [32, 320], 'num_fiducial': 20}, 'backbone': {'type': 'VGG', 'conv_type': 'BasicConv'}, 'neck': {'type': 'RNNDecoder', 'hidden_size': 256}, 'head': {'type': 'CTC'}}, 'loss': {'type': 'CTCLoss', 'blank': 0}, 'optimizer': {'type': 'Adam', 'args': {'lr': 0.001}}, 'lr_scheduler': {'type': 'StepLR', 'args': {'step_size': 30, 'gamma': 0.1}}, 'trainer': {'seed': 2, 'gpus': [0], 'epochs': 10, 'log_iter': 10, 'resume_checkpoint': '', 'finetune_checkpoint': '', 'output_dir': 'output', 'tensorboard': True}, 'dataset': {'alphabet': 'digit.txt', 'train': {'dataset': {'type': 'ImageDataset', 'args': {'data_path': [['path/train.txt']], 'data_ratio': [1.0], 'pre_processes': [{'type': 'Resize', 'args': {'img_h': 32, 'img_w': 120, 'pad': True, 'random_crop': False}}], 'transforms': [{'type': 'ToTensor', 'args': {}}], 'img_mode': 'RGB', 'ignore_chinese_punctuation': True, 'remove_blank': True}}, 'loader': {'batch_size': 16, 'shuffle': True, 'pin_memory': False, 'num_workers': 6}}, 'validate': {'dataset': {'type': 'ImageDataset', 'args': {'data_path': ['path/val.txt'], 'pre_processes': [{'type': 'Resize', 'args': {'img_h': 32, 'img_w': 120, 'pad': True, 'random_crop': False}}], 'transforms': [{'type': 'ToTensor', 'args': {}}], 'img_mode': 'RGB', 'ignore_chinese_punctuation': True, 'remove_blank': True}}, 'loader': {'batch_size': 4, 'shuffle': True, 'pin_memory': False, 'num_workers': 6}}}}

Also, I am running on a non-Chinese operating system; could a text-encoding issue be the cause of the error?
Hoping for your reply, thank you!
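
A KeyError at training start is typically raised when a label contains a character that is missing from the dictionary. A minimal sketch of a coverage check is shown below; check_alphabet_coverage is a hypothetical helper, not part of this repo, and it assumes one dictionary character per line plus "image_path<separator>label" label lines:

def check_alphabet_coverage(alphabet_path, label_path):
    # Collect every character that appears in the labels but not in the dictionary.
    with open(alphabet_path, encoding="utf-8") as f:
        alphabet = set(f.read().splitlines())
    missing = set()
    with open(label_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split(maxsplit=1)
            if len(parts) != 2:
                continue
            _, label = parts
            missing.update(ch for ch in label if ch not in alphabet)
    return missing

print(check_alphabet_coverage("digit.txt", "path/train.txt"))  # paths taken from the yaml above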

Is step=60 for the learning-rate schedule reasonable?

Hello,
the learning rate is decayed every 60 epochs by default.
Training on a Chinese dataset, the loss does not come down during the first 60 epochs and the validation accuracy stays around 30.
Is step=60 reasonable? Was this value chosen because it has been validated?

Recall on ICDAR2015 dataset

training setting:
icdar2015
resnet-50(pre-trained)

result:
the best model on icdar2015 validation set achieves recall=75.9%, precision=~88.xx%, hmean=~81.xx%

question:
recall = 75.9% vs 79.9% (reported in this repo) vs 82.7% (reported in the official paper)

The recall is a little lower than both the value reported in this repo and the one in the official paper. Is this normal?

An approach for detecting blurry text

Hi, I have read a lot of your source code and learned a great deal.
I'd like to ask you a question: I have an awkward task that requires recognising blurry characters.
Learning the features of blurry characters is really difficult, so I have the opposite idea:
train the model on a large number of high-resolution samples so that it does not generalise, and then at inference time also feed it high-resolution images to recognise the text (my task allows such an inference environment). In theory, blurry characters will be recognised as something strange, so I can compare the recognition result with the original text to decide whether blurry characters are present.

Since my task only ever has a single sample at prediction time (it only needs to judge whether an invoice contains blurry or misprinted characters, so generalisation and robustness are not a concern), do you think this approach is feasible?

Some bugs when converting CRNN to MNN

Hi,
have you tried converting CRNN to MNN?
I am finding some problems with the conversion, mainly in the LSTM part; the LSTM-to-MNN conversion seems to be wrong, which makes the MNN model's output incorrect.

CRNN baseline attempt

Hello, and thank you for the excellent work. The CRNN in this repo currently seems to be unusable: I ran a simple Chinese recognition experiment, but acc stays at 0 in the output and the loss jumps around. Is there a relatively stable recognition branch I could use as a reference to hunt for the bug, or do you have any suggestions about what I should pay attention to?

2020-07-14 09:45:39,163 - torchocr - INFO - [80/200] - [1300/1587] - lr:8.271806125530277e-28 - loss:4.5751 - acc:0.0312 - norm_edit_dis:0.2261 - time:51.0078
2020-07-14 09:46:29,372 - torchocr - INFO - [80/200] - [1400/1587] - lr:8.271806125530277e-28 - loss:4.9632 - acc:0.0000 - norm_edit_dis:0.1499 - time:50.2079
2020-07-14 09:47:19,655 - torchocr - INFO - [80/200] - [1500/1587] - lr:8.271806125530277e-28 - loss:5.0862 - acc:0.0000 - norm_edit_dis:0.1370 - time:50.2833
2020-07-14 09:48:54,075 - torchocr - INFO - [81/200] - [100/1587] - lr:4.1359030627651385e-28 - loss:4.6698 - acc:0.0000 - norm_edit_dis:0.2278 - time:50.8783
2020-07-14 09:49:43,349 - torchocr - INFO - [81/200] - [200/1587] - lr:4.1359030627651385e-28 - loss:4.7732 - acc:0.0000 - norm_edit_dis:0.1903 - time:49.2733
2020-07-14 09:50:32,552 - torchocr - INFO - [81/200] - [300/1587] - lr:4.1359030627651385e-28 - loss:4.7276 - acc:0.0156 - norm_edit_dis:0.2363 - time:49.2031
2020-07-14 09:51:22,197 - torchocr - INFO - [81/200] - [400/1587] - lr:4.1359030627651385e-28 - loss:5.2084 - acc:0.0156 - norm_edit_dis:0.1432 - time:49.6451
2020-07-14 09:52:12,403 - torchocr - INFO - [81/200] - [500/1587] - lr:4.1359030627651385e-28 - loss:4.4954 - acc:0.0469 - norm_edit_dis:0.2290 - time:50.2050
2020-07-14 09:53:04,682 - torchocr - INFO - [81/200] - [600/1587] - lr:4.1359030627651385e-28 - loss:4.8725 - acc:0.0000 - norm_edit_dis:0.1751 - time:52.2787
2020-07-14 09:53:54,543 - torchocr - INFO - [81/200] - [700/1587] - lr:4.1359030627651385e-28 - loss:4.5594 - acc:0.0000 - norm_edit_dis:0.2187 - time:49.8607

Training accuracy also stays at 0

The accuracy is 0 with both the lmdb and the textline dataset formats. I'm at a loss.
For example, the image below:
[screenshot]

On Linux, do the img_path entries at the front of the textline file have to be absolute paths?
For example, something like /home/user/Pytorch-master/imageset.*.png?
Hoping for an answer, thanks!

CTPN

Hi, interesting work.
Are there plans to implement CTPN?

Error when training rec with mjsynth

First I converted the dataset with the script you provide; the converted dataset looks like this:

mnt/ramdisk/max/90kDICT32px/2697/6/466_MONIKER_49537.jpg MONIKER
mnt/ramdisk/max/90kDICT32px/2697/6/465_Ecclesiastics_24500.jpg Ecclesiastics

Training the text recognition model on the converted dataset with the following command produces random segmentation faults or memory errors:

python ./tools/rec_train.py --config ./config/rec_train_config_self.py

Part of rec_train_config_self.py is shown below (I only changed the paths to the dictionary, training set and validation set):

config.loss = {
    'type': 'CTCLoss',
    'blank_idx': 0,
}
config.dataset = {
    # 'alphabet': r'path/dic.txt',
    'alphabet': r'torchocr/datasets/alphabets/enAlphaNumPunc90.txt',
    # 'alphabet': r'torchocr/datasets/alphabets/digit.txt',
    'train': {
        'dataset': {
            'type': 'RecTextLineDataset',
            'file': r'path/mnt/ramdisk/max/90kDICT32px/annotation_train_other.txt',
            'input_h': 32,
            'mean': 0.5,
            'std': 0.5,
            'augmentation': False,
        },
        'loader': {
            'type': 'DataLoader',  # to use the torch dataloader, just change this to DataLoader
            'batch_size': 4,
            'shuffle': True,
            'num_workers': 1,
            'collate_fn': {
                'type': 'RecCollateFn',
                'img_w': 120
            }
        }
    },
    'eval': {
        'dataset': {
            'type': 'RecTextLineDataset',
            'file': r'path/mnt/ramdisk/max/90kDICT32px/annotation_val_other.txt',
            'input_h': 32,
            'mean': 0.5,
            'std': 0.5,
            'augmentation': False,
        },
        'loader': {
            'type': 'RecDataLoader',
            'batch_size': 4,
            'shuffle': False,
            'num_workers': 1,
            'collate_fn': {
                'type': 'RecCollateFn',
                'img_w': 120
            }
        }
    }
}

After running the training command above, random errors occur. The log is as follows (I added a print after each of your lines of code to locate the error):

train loop: 1

0
[INFO] end zero_grad
[INFO] end forward
[INFO] loss: tensor(6.5425, grad_fn=)
[INFO] end loss
[INFO] end backward
[INFO] end clip_grad_norm_
[INFO] end optimizer step
1
[INFO] end zero_grad
[INFO] end forward
[INFO] loss: tensor(nan, grad_fn=)
[INFO] end loss
[INFO] end backward
[INFO] end clip_grad_norm_
[INFO] end optimizer step
2
[INFO] end zero_grad
[INFO] end forward
[INFO] loss: tensor(nan, grad_fn=)
[INFO] end loss
[INFO] end backward
[INFO] end clip_grad_norm_
[INFO] end optimizer step
3
[INFO] end zero_grad
[INFO] end forward
[INFO] loss: tensor(nan, grad_fn=)
[INFO] end loss
[INFO] end backward
[INFO] end clip_grad_norm_
[INFO] end optimizer step
4
[INFO] end zero_grad
[INFO] end forward
[INFO] loss: tensor(nan, grad_fn=)
[INFO] end loss
[INFO] end backward
[INFO] end clip_grad_norm_
[INFO] end optimizer step
5
[INFO] end zero_grad
[INFO] end forward
[INFO] loss: tensor(nan, grad_fn=)
[INFO] end loss
Segmentation fault (core dumped)

Hoping for your reply and guidance. Thank you very much!
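
As a side note, one common cause of nan CTC losses like those in the log above is a target sequence that is longer than the number of output time steps the model produces. A minimal, self-contained illustration using the standard torch.nn.CTCLoss (not this repo's wrapper) is shown below:

import torch
import torch.nn as nn

# When the target is longer than the input, no valid CTC alignment exists, so the
# loss is infinite and the gradients become nan. zero_infinity=True zeroes such
# losses (and their gradients) instead of poisoning training.
T, C, N = 10, 37, 2                      # time steps, classes (incl. blank), batch size
log_probs = torch.randn(T, N, C).log_softmax(2).requires_grad_()
targets = torch.randint(1, C, (N, 20))   # target length 20 > T=10: infeasible
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 20, dtype=torch.long)

print(nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths))                      # inf
print(nn.CTCLoss(blank=0, zero_infinity=True)(log_probs, targets, input_lengths, target_lengths))  # 0.0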

How do I change the GPU id?

As the title says: how do I change the GPU id? It currently defaults to 'cuda:0', and changing it to any other id raises an error...

A great project

  • Suggest making the character set a variable, so it is easy to reference.
  • Add a label filter.
  • Add a character-feature statistics function to set the training aspect ratio dynamically.

error: random_crop_data.py


poly [[1533.5558 3912.3967]
[1289.3611 3954.4272]
[1296.0596 3993.3525]
[1540.2543 3951.322 ]]
2020-09-23 15:54:25,165 - torchocr - ERROR - Traceback (most recent call last):
File "tools/det_train.py", line 211, in train
for i, batch_data in enumerate(train_loader): # traverse each batch in the epoch
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 363, in next
data = self._next_data()
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 989, in _next_data
return self._process_data(data)
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
data.reraise()
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/nlp/lmy/PytorchOCR/torchocr/datasets/DetDataSet.py", line 103, in getitem
data = self.apply_pre_processes(data)
File "/home/nlp/lmy/PytorchOCR/torchocr/datasets/DetDataSet.py", line 92, in apply_pre_processes
data = aug(data)
File "/home/nlp/lmy/PytorchOCR/torchocr/datasets/det_modules/random_crop_data.py", line 54, in call
poly = ((poly - (crop_x, crop_y)) * scale).tolist()
TypeError: unsupported operand type(s) for -: 'list' and 'tuple'

crop_x,crop_y (0, 0)
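
The traceback points at poly being a plain Python list, so subtracting the (crop_x, crop_y) tuple fails. A minimal sketch of a possible fix (an assumption about the data format, not a verified patch) is to convert it to a NumPy array before the arithmetic:

import numpy as np

# Reproduce the failing expression with the values from the log, using an ndarray
# so that the broadcasted subtraction and scaling work.
poly = [[1533.5558, 3912.3967],
        [1289.3611, 3954.4272],
        [1296.0596, 3993.3525],
        [1540.2543, 3951.322]]
crop_x, crop_y, scale = 0, 0, 1.0

poly = ((np.array(poly) - (crop_x, crop_y)) * scale).tolist()
print(poly)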


MobileNetV3 model question

Hello, why is the MobileNetV3 model I get from training on icdar2015 with the pretrained MobileNetV3_large_x0_5.pth more than 21 MB, and still over 7 MB after removing gradient information, which is roughly twice the size of the model in your Baidu drive? How did you train such a small model? Thanks!
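
For comparison, checkpoint size differences of this kind usually come from optimizer/scheduler state saved alongside the weights. A minimal sketch of stripping a checkpoint down to the bare state dict is shown below; the checkpoint key names are assumptions, not verified against this repo:

import torch

# Keep only the model weights; optimizer and scheduler state usually accounts for
# most of the extra file size.
ckpt = torch.load("checkpoint.pth", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)  # handle both wrapped and raw checkpoints
torch.save(state_dict, "model_slim.pth")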

[Task claiming] Stage 1: reproducing PaddleOCR

This is the task-claiming issue for stage 1 of reproducing PaddleOCR. If you are interested in a module, @ me and the task will be added.
backbone:

  • Implement mbv3 (detection + recognition), able to load PaddleOCR models, with diff below 1e-6 @novioleo @WenmuZhou
  • Implement res34 (detection + recognition), able to load PaddleOCR models, with diff below 1e-6 @novioleo @WenmuZhou

neck:

head:

  • Implement DB @WenmuZhou
  • Implement EAST
  • Implement the attention head for recognition

rec:

  • Implement CRNN (mb and res34), able to load PaddleOCR models @novioleo

Paddle's dynamic LSTM has no direct equivalent in PyTorch, so there will be some differences here.

det:

utils:

train:

data:

demo:

Other resources:

  • Collect corpora for different languages
  • Collect and archive different fonts for different languages

Height becomes 0 after maxpool

self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)

if __name__ == '__main__':
    import torch
    from torchsummary import summary

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    net = MobileNetV3(3, scale=0.5, model_name='small').to(device)
    # print(net)
    summary(net, input_size=(3, 32, 320))

This raises an error; changing the pooling to self.pool = nn.MaxPool2d(kernel_size=(1, 2), stride=(1, 2), padding=0) lets it run.

DBNet loss does not move on the iFLYTEK dataset

I trained on the iFLYTEK dataset using the author's data-conversion script,
downloaded and used the resnet50 pretrained model,
and left the rest of the model configuration unchanged.
During training the loss stays at 1.000 from start to finish, and loss_shrink and loss_threshold stay at 0.
I'd like to ask what could cause this.

At prediction time I tried many images; none of them produce any points and the predictions are all empty.
I hope someone can give me some pointers, thank you very much!

Parameter initialization?

Why use the preset parameter initialization rather than something else, for example He initialization?
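
For reference, switching to He initialization is a one-liner per layer type; a minimal sketch (not the repo's actual initialization scheme) is shown below:

import torch.nn as nn

def init_weights(m):
    # Kaiming/He initialization for conv layers, zeros for their biases.
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
model.apply(init_weights)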

TypeError: 'NoneType' object is not callable

2020-07-16 09:45:12,213 - torchocr - ERROR - Traceback (most recent call last):
File "tools/det_train.py", line 205, in train
for i, batch_data in enumerate(train_loader): # traverse each batch in the epoch
File "/home/mlp/python/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 336, in next
return self._process_next_batch(batch)
File "/home/mlp/python/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 357, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
TypeError: Traceback (most recent call last):
File "/home/mlp/python/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
TypeError: 'NoneType' object is not callable

How to use minimum_2stage_inference.py in the tools folder

Hello! I have been following this project for a while. Right now det_infer/det_train and rec_infer/rec_train all run on my machine, but I cannot get minimum_2stage_inference.py to run. How to feed an image into this script and get the detection boxes and recognition results has been puzzling me. Could you provide a readme for the 2-stage inference? Many thanks, and I hope to hear from you!

Error when reading 8-bit images

Hello,
when det_train.py encounters an image with a bit depth of 8, it raises an error.

For an 8-bit image, im = cv2.imread(data['img_path'], 1 if self.img_mode != 'GRAY' else 0)
returns None, and then:

cv2.error: OpenCV(4.4.0) /tmp/pip-req-build-kne9u3r2/opencv/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'

Location of the failing code:

    def __getitem__(self, index):
        # try:
        data = copy.deepcopy(self.data_list[index])
        print(data['img_path'])
        im = cv2.imread(data['img_path'], 1 if self.img_mode != 'GRAY' else 0)  
        if self.img_mode == 'RGB':
            im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)   # error raised here
        data['img'] = im
        data['shape'] = [im.shape[0], im.shape[1]]
        data = self.apply_pre_processes(data)

How can this be solved?
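
A minimal sketch of a defensive read is shown below (an assumption, not a verified patch for this repo): skip files OpenCV cannot decode and normalise palette/grayscale/alpha images to 3-channel BGR before the BGR-to-RGB conversion.

import cv2

def read_bgr(img_path):
    # Load with the original channel layout, then normalise to 3-channel BGR.
    im = cv2.imread(img_path, cv2.IMREAD_UNCHANGED)
    if im is None:
        return None                       # caller should skip this sample
    if im.ndim == 2:                      # 8-bit grayscale / palette image decoded to 2D
        im = cv2.cvtColor(im, cv2.COLOR_GRAY2BGR)
    elif im.shape[2] == 4:                # drop the alpha channel
        im = cv2.cvtColor(im, cv2.COLOR_BGRA2BGR)
    return im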

Accuracy stays at 0 when training with rec_train.py

Hello, I am training on the icdar 2017 dataset and the accuracy stays at 0. I use the rec_train_config.py config without major changes, only the batch_size and the paths to the alphabet and the dataset. The training log looks like this:
[screenshot]
While training I stepped into RecMetric.py to inspect the model output predictions, and the predicted results are all tensors of 0, 0, 0 ... 0.

The training data is annotated as follows (image path and text content separated by \t):

E:\DataSets\icdar2017rctw\icdar2017rctw\recognition\train\image_0_0.jpg 金氏眼镜
E:\DataSets\icdar2017rctw\icdar2017rctw\recognition\train\image_0_1.jpg 创于1989
E:\DataSets\icdar2017rctw\icdar2017rctw\recognition\train\image_0_2.jpg 城建店

The alphabet contains 5529 characters in total; adding the blank, the final number of classes n_class is set to 5530:
[screenshot]

In addition, the settings in the config:
2020-11-05 15:10:33,337 - torchocr - INFO - {'exp_name': 'CRNN', 'train_options': {'resume_from': '', 'third_party_name': '', 'checkpoint_save_dir': './output/CRNN/checkpoint', 'device': 'cuda:0', 'epochs': 200, 'fine_tune_stage': ['backbone', 'neck', 'head'], 'print_interval': 20, 'val_interval': 3000, 'ckpt_save_type': 'HighestAcc', 'ckpt_save_epoch': 4}, 'SEED': 927, 'optimizer': {'type': 'Adam', 'lr': 0.001, 'weight_decay': 0.0001}, 'lr_scheduler': {'type': 'StepLR', 'step_size': 60, 'gamma': 0.1}, 'model': {'type': 'RecModel', 'backbone': {'type': 'ResNet', 'layers': 18}, 'neck': {'type': 'PPaddleRNN'}, 'head': {'type': 'CTC', 'n_class': 5530}, 'in_channels': 3}, 'loss': {'type': 'CTCLoss', 'blank_idx': 0}, 'dataset': {'alphabet': 'E:/pro/ncnn_ocr/models/keys.txt', 'train': {'dataset': {'type': 'RecTextLineDataset', 'file': 'E:/pro/chineseocr-master/train/ocr/txt/icdar2017Backup.txt', 'input_h': 32, 'mean': 0.5, 'std': 0.5, 'augmentation': False}, 'loader': {'type': 'DataLoader', 'batch_size': 4, 'shuffle': True, 'num_workers': 1, 'collate_fn': {'type': 'RecCollateFn', 'img_w': 120}}}, 'eval': {'dataset': {'type': 'RecTextLineDataset', 'file': 'E:/pro/chineseocr-master/train/ocr/txt/2017valBackup.txt', 'input_h': 32, 'mean': 0.5, 'std': 0.5, 'augmentation': False}, 'loader': {'type': 'RecDataLoader', 'batch_size': 4, 'shuffle': False, 'num_workers': 1, 'collate_fn': {'type': 'RecCollateFn', 'img_w': 120}}}}}

Hoping for a reply, thanks!

Question about text recognition

Hello. I trained CRNN with resnet18 as the backbone; training behaves normally and inference works fine as well. But with res50 or res34 as the backbone, the loss never comes down and acc stays at 0. Could there be a bug in the training code?

Question about CTCLabelConverter

Would the condition added in decode (shown in the screenshot below) prevent words containing repeated letters from being decoded (for example the 'l' in 'Hello')?
Or does the CTC algorithm assume there is a blank between every pair of characters?
I don't quite understand this, hoping for an explanation.
[screenshot]
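
For context, standard CTC greedy decoding first collapses consecutive repeats and then removes blanks, so a repeated letter only survives when the network emits a blank between the two occurrences. A minimal illustration (not this repo's CTCLabelConverter) is shown below:

def ctc_greedy_decode(indices, charset, blank=0):
    # Skip blanks and skip any index equal to the previous one (collapse repeats).
    out = []
    prev = None
    for idx in indices:
        if idx != blank and idx != prev:
            out.append(charset[idx])
        prev = idx
    return "".join(out)

charset = {1: 'H', 2: 'e', 3: 'l', 4: 'o'}
print(ctc_greedy_decode([1, 2, 3, 3, 4], charset))     # 'Helo'  -> repeats collapsed
print(ctc_greedy_decode([1, 2, 3, 0, 3, 4], charset))  # 'Hello' -> blank separates the two l's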

Error at prediction time

When I use the pretrained model provided here and call det_infer for prediction, it reports the following error:

KeyError: 'cfg'
Is it that the pretrained model does not contain cfg?

datalaoder issues

When loading data, labels with an invalid length and characters that are not in the character set should be filtered out; otherwise some hard-to-debug mistakes can occur during experiments.

  • When the length of an input label is zero, a division-by-zero error occurs when computing the edit distance at the beginning of training

acc_dict = metric(output, batch_data['label'])

for (pred, pred_conf), target in zip(preds_str, labels):
    norm_edit_dis += Levenshtein.distance(pred, target) / max(len(pred), len(target))
    show_str.append(f'{pred} -> {target}')

  • Labels longer than the configured maximum length also need to be filtered; at present I did not find a relevant setting in the config file

# todo: add filtering of over-long labels
# if len(label) > config.max_len:
#     # print(f'The length of the label is longer than max_length: length
#     # {len(label)}, {label} in dataset {self.root}')
#     continue

'dataset': {
    'type': 'RecLmdbDataset',
    'file': r'path/lmdb/train',  # LMDB dataset path
    'input_h': 32,
    'mean': 0.5,
    'std': 0.5,
    'augmentation': False,
},
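
A minimal sketch of the filtering described above is shown below; filter_samples and safe_norm_edit_dis are hypothetical helpers, not part of this repo:

import Levenshtein  # same package used in the metric snippet above

def filter_samples(samples, alphabet, max_len):
    # Drop empty labels, over-long labels, and labels with characters outside the alphabet.
    kept = []
    for img_path, label in samples:
        if not label or len(label) > max_len:
            continue
        if any(ch not in alphabet for ch in label):
            continue
        kept.append((img_path, label))
    return kept

def safe_norm_edit_dis(pred, target):
    # Guard the divisor so empty pred/target pairs cannot divide by zero.
    return Levenshtein.distance(pred, target) / max(len(pred), len(target), 1)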
