
pytorchocr's Introduction

PytorchOCR

Convert models from PaddleOCR to PytorchOCR.

Model alignment information

Environment

  • torch: 2.0.1
  • paddle: 2.5.1
  • System: Windows 10, CPU

Directory notes

  • The ppocr directory is only used for code conversion and will be removed once all models have been converted
  • padiff is the weight conversion tool and will be removed once all models have been converted

Alignment list

Note: models not listed below have not yet been verified.

Model downloads

Baidu Cloud: link: https://pan.baidu.com/s/17NVg9VSBmrDmbX5MmubZgQ?pwd=ppdz  extraction code: ppdz

PP series

| Model | Aligned | Alignment error | Config |
| --- | --- | --- | --- |
| ch_PP-OCRv4_rec_distill | X | config mismatch | config |
| ch_PP-OCRv4_rec_teacher | Y | 1.4605024e-10 | config |
| ch_PP-OCRv4_rec_student | Y | 3.6277156e-06 | config |
| ch_PP-OCRv4_det_student | Y | 0 | config |
| ch_PP-OCRv4_det_teacher | Y | maps 7.811429e-07, cbn_maps 1.0471307e-06 | config |
| ch_PP-OCRv4_det_cml | Y | Student_res 0.0, Student2_res 0.0, Teacher_maps 1.1398747e-06, Teacher_cbn_maps 1.2791393e-06 | config |
| ch_PP-OCRv3_rec | Y | 4.615016e-11 | config |
| ch_PP-OCRv3_rec_distillation.yml | Y | Teacher_head_out_res 7.470646e-10, Student_head_out_res 4.615016e-11 | config |
| ch_PP-OCRv3_det_student | Y | 1.766314e-07 | config |
| ch_PP-OCRv3_det_cml | Y | Student_res 1.766314e-07, Student2_res 3.1212483e-07, Teacher_res 8.829421e-08 | config |
| ch_PP-OCRv3_det_dml | Y | ok | config |
| cls_mv3 | Y | 5.9604645e-08 | config |

Recognition models

| Model | Aligned | Alignment error | Config |
| --- | --- | --- | --- |
| rec_mv3_none_none_ctc | Y | 2.114354e-09 | config |
| rec_r34_vd_none_none_ctc | Y | 3.920279e-08 | config |
| rec_mv3_none_bilstm_ctc | Y | 1.1861777e-09 | config |
| rec_r34_vd_none_bilstm_ctc | Y | 1.9336952e-08 | config |
| rec_mv3_tps_bilstm_ctc | Y | 1.1886948e-09 | config |
| rec_r34_vd_tps_bilstm_ctc | N | 0.0035705192 | config |
| rec_mv3_tps_bilstm_att | Y | 1.8528418e-09 | config |
| rec_r34_vd_tps_bilstm_att | N | 0.0006942689 | config |
| rec_r31_sar | Y | 7.348353e-08 | config |
| rec_mtb_nrtr | N | res_0 8.64, res_1 0.13501492 | config |
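
The alignment error above is presumably the difference between the Paddle model's output and the converted PyTorch model's output on the same input. A minimal sketch of such a check is shown below; this is not the padiff tool, and the input shape, the dumped output files, and the max-absolute-difference metric are assumptions:

import numpy as np

# Hypothetical alignment check: feed the same input to the Paddle model and the
# converted PyTorch model, dump both outputs, and compare them.
x = np.random.rand(1, 3, 48, 320).astype("float32")  # assumed rec input shape

# paddle_out = paddle_model(paddle.to_tensor(x)).numpy()         # run in the Paddle env
# torch_out = torch_model(torch.from_numpy(x)).detach().numpy()  # run in the PyTorch env
paddle_out = np.load("paddle_out.npy")  # placeholder files holding the dumped outputs
torch_out = np.load("torch_out.npy")

print("alignment error:", np.abs(paddle_out - torch_out).max())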

TODO

Features:

  • End-to-end inference
  • det inference
  • rec inference
  • cls inference
  • Export to ONNX
  • ONNX inference
  • TensorRT inference
  • Training, evaluation, and testing

Usage

Data preparation

Refer to PaddleOCR.

train

# single GPU
CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml

# multiple GPUs
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nnodes=1 --nproc_per_node=4 tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml

eval

CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.checkpoints=xxx.pth

infer

python tools/infer_rec.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model=xxx.pth

export

python tools/export.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model=xxx.pth

This exports the model to ONNX format by default (TorchScript export has not been tested), and also exports the pre-processing and post-processing parameters.
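
For reference, a minimal sketch of running an exported ONNX model with onnxruntime is shown below. The file name model.onnx, the input shape, and the dummy input are assumptions; the real pre-processing comes from the parameters exported alongside the model:

import numpy as np
import onnxruntime as ort

# Load the exported model and run one forward pass on a placeholder input.
sess = ort.InferenceSession("path/to/rec/export_dir/model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name

img = np.random.rand(1, 3, 48, 320).astype(np.float32)  # stand-in for a preprocessed image
outputs = sess.run(None, {input_name: img})
print(outputs[0].shape)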

predict

# det + cls + rec
python .\tools\infer\predict_system.py --det_model_dir=path/to/det/export_dir  --cls_model_dir=path/to/cls/export_dir  --rec_model_dir=path/to/rec/export_dir  --image_dir=doc/imgs/1.jpg --use_angle_cls=true

# det
python .\tools\infer\predict_det.py --det_model_dir=path/to/det/export_dir --image_dir=doc/imgs/1.jpg

# cls
python .\tools\infer\predict_cls.py --cls_model_dir=path/to/cls/export_dir --image_dir=doc/imgs/1.jpg

# rec
python tools/infer/predict_rec.py --rec_model_dir=path/to/rec/export_dir --image_dir=doc/imgs_words/en/word_1.png

ref:

  1. https://github.com/PaddlePaddle/PaddleOCR
  2. https://github.com/frotms/PaddleOCR2Pytorch

pytorchocr's People

Contributors

afterimagex, bourne-m, dependabot[bot], ebreak, jinreejing, light201212, morestart, novioleo, wenmuzhou, wushilian, yuantangliang, zenfsheng


pytorchocr's Issues

New error type: AttributeError

Hello, a new error has appeared:
The model does start running, but the error appears right after the first saving point.
Also, the accuracy value stays at 0.

[screenshots: error2first, error_log2]

Hoping for an answer! Many thanks!

RuntimeError: CUDA out of memory when training the recognition model

Could anyone advise why GPU memory seems to keep growing when I try to train the recognition model, until it hits RuntimeError: CUDA out of memory? How can this be solved?
(An old 1080 card, bs=16; at the start it needs a little over 3 GB of memory, then it gradually climbs.)
2020-09-10 06:55:41,145 - torchocr - INFO - [0/200] - [7650/85482] - lr:0.001 - loss:0.8423 - acc:0.2500 - norm_edit_dis:0.8858 - time:3.2507
2020-09-10 06:55:44,284 - torchocr - INFO - [0/200] - [7660/85482] - lr:0.001 - loss:0.8961 - acc:0.1875 - norm_edit_dis:0.8449 - time:3.1395
2020-09-10 06:55:47,402 - torchocr - INFO - [0/200] - [7670/85482] - lr:0.001 - loss:0.5101 - acc:0.5625 - norm_edit_dis:0.9381 - time:3.1169
2020-09-10 06:55:49,011 - torchocr - ERROR - Traceback (most recent call last):
File "tools/rec_train.py", line 237, in train
loss_dict['loss'].backward()
File "/opt/conda/lib/python3.7/site-packages/torch/tensor.py", line 185, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/opt/conda/lib/python3.7/site-packages/torch/autograd/init.py", line 127, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 232.00 MiB (GPU 0; 7.93 GiB total capacity; 6.42 GiB already allocated; 110.19 MiB free; 7.22 GiB reserved in total by PyTorch)
Exception raised from malloc at /pytorch/c10/cuda/CUDACachingAllocator.cpp:272 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f19c980e1e2 in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x1e64b (0x7f19c9a6464b in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: + 0x1f464 (0x7f19c9a65464 in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: + 0x1faa1 (0x7f19c9a65aa1 in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #4: at::native::empty_cuda(c10::ArrayRef, c10::TensorOptions const&, c10::optionalc10::MemoryFormat) + 0x11e (0x7f19cc78c52e in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #5: + 0xf51329 (0x7f19cabc8329 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #6: + 0xf6b157 (0x7f19cabe2157 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0x10e9c7d (0x7f1a0194cc7d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #8: + 0x10e9f97 (0x7f1a0194cf97 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #9: at::empty(c10::ArrayRef, c10::TensorOptions const&, c10::optionalc10::MemoryFormat) + 0xfa (0x7f1a01a57a1a in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #10: at::native::empty_like(at::Tensor const&, c10::TensorOptions const&, c10::optionalc10::MemoryFormat) + 0x49e (0x7f1a016d5c3e in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #11: + 0x12880c1 (0x7f1a01aeb0c1 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #12: + 0x12c3863 (0x7f1a01b26863 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #13: at::empty_like(at::Tensor const&, c10::TensorOptions const&, c10::optionalc10::MemoryFormat) + 0x101 (0x7f1a01a3ab31 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #14: at::native::contiguous(at::Tensor const&, c10::MemoryFormat) + 0x89 (0x7f1a016f2469 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #15: + 0x1290470 (0x7f1a01af3470 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #16: + 0x12c351f (0x7f1a01b2651f in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #17: at::Tensor::contiguous(c10::MemoryFormat) const + 0xe8 (0x7f1a01b912e8 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #18: at::Tensor at::native::(anonymous namespace)::host_softmax_backward<at::native::(anonymous namespace)::LogSoftMaxBackwardEpilogue, true>(at::Tensor const&, at::Tensor const&, long, bool) + 0x14b (0x7f19cc01826b in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #19: at::native::log_softmax_backward_cuda(at::Tensor const&, at::Tensor const&, long, at::Tensor const&) + 0x65a (0x7f19cc0026da in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #20: + 0xf3efa0 (0x7f19cabb5fa0 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #21: + 0x11141d6 (0x7f1a019771d6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #22: at::_log_softmax_backward_data(at::Tensor const&, at::Tensor const&, long, at::Tensor const&) + 0x119 (0x7f1a01a05649 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #23: + 0x2ec639f (0x7f1a0372939f in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #24: + 0x11141d6 (0x7f1a019771d6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #25: at::_log_softmax_backward_data(at::Tensor const&, at::Tensor const&, long, at::Tensor const&) + 0x119 (0x7f1a01a05649 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #26: torch::autograd::generated::LogSoftmaxBackward::apply(std::vector<at::Tensor, std::allocatorat::Tensor >&&) + 0x1d7 (0x7f1a035a5057 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #27: + 0x3375bb7 (0x7f1a03bd8bb7 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #28: torch::autograd::Engine::evaluate_function(std::shared_ptrtorch::autograd::GraphTask&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptrtorch::autograd::ReadyQueue const&) + 0x1400 (0x7f1a03bd4400 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #29: torch::autograd::Engine::thread_main(std::shared_ptrtorch::autograd::GraphTask const&) + 0x451 (0x7f1a03bd4fa1 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #30: torch::autograd::Engine::thread_init(int, std::shared_ptrtorch::autograd::ReadyQueue const&, bool) + 0x89 (0x7f1a03bcd119 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #31: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptrtorch::autograd::ReadyQueue const&, bool) + 0x4a (0x7f1a1136dc8a in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #32: + 0xc70f (0x7f1a10a3070f in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #33: + 0x76ba (0x7f1a1454a6ba in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #34: clone + 0x6d (0x7f1a1428041d in /lib/x86_64-linux-gnu/libc.so.6)
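
For what it's worth, steadily growing GPU memory during training is often caused by accumulating the loss tensor itself, which keeps every autograd graph alive, rather than its Python value. A minimal, self-contained sketch of the pattern (not code from this repo) is shown below:

import torch
import torch.nn as nn

# Toy training loop illustrating the difference between accumulating the loss
# tensor (retains the graph, memory keeps growing) and accumulating loss.item().
model = nn.Linear(10, 5).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

running_loss = 0.0
for step in range(100):
    x = torch.randn(16, 10, device="cuda")
    y = torch.randint(0, 5, (16,), device="cuda")
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    # running_loss += loss        # keeps the whole graph alive across iterations
    running_loss += loss.item()   # detaches to a Python float instead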

Hi! Some bugs, and I don't know what causes them.

2020-07-20 01:40:12,632 - torchocr - INFO - Training...
2020-07-20 01:40:12,632 - torchocr - INFO - train dataset has 10000 samples,1250 in dataloader
2020-07-20 01:40:12,632 - torchocr - INFO - eval dataset has 10000 samples,10000 in dataloader
2020-07-20 01:40:12,765 - torchocr - ERROR - Traceback (most recent call last):
File "tools/det_train.py", line 205, in train
for i, batch_data in enumerate(train_loader): # traverse each batch in the epoch
File "/home/wqzhaha/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/home/wqzhaha/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
return self._process_data(data)
File "/home/wqzhaha/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
data.reraise()
File "/home/wqzhaha/anaconda3/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
cv2.error: Caught error in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/wqzhaha/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/wqzhaha/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/wqzhaha/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/media/wqzhaha/TOSHIBA EXT/ocr/PytorchOCR/torchocr/datasets/DetDataSet.py", line 104, in __getitem__
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
cv2.error: OpenCV(4.2.0) /io/opencv/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'

You can try running the training yourself; this problem shows up with both torch 1.0.1 and torch 1.5.0, and with both of the datasets I tried.

Training error

Hello!
I followed the tutorial steps exactly and get an error about net.module (the model doesn't have a module attribute, does it?).
Also, logging keeps reporting ./weights/resnet50_vd.pth not exists. Is it really enough to just put resnet50 into weights?
I keep feeling that something is missing from your instructions.

Thanks, looking forward to an answer!

KeyError when starting training: mapping not found in the dictionary

Hello, I am training with the code from your crnn.pytorch repository, but a KeyError occurs. The training log is shown below:
[screenshot: error_log]

The dataset is one I prepared myself; the label file format is as follows:

img/word_20347.png 中國國際航空公司
img/word_20351.png 头等舱
img/word_20363.png 北京市科普教育基地

Following your readme, I used gen_key.py to generate a dictionary file from the labels above (the dictionary contents are omitted here).

The yaml file I used is as follows (it is actually the imagedataset_None_VGG_RNN_CTC.yaml file from your repository):
{'name': 'crnn', 'base': ['config/image_dataset.yaml'], 'arch': {'type': 'Model', 'trans': {'type': 'None', 'input_size': [32, 320], 'num_fiducial': 20}, 'backbone': {'type': 'VGG', 'conv_type': 'BasicConv'}, 'neck': {'type': 'RNNDecoder', 'hidden_size': 256}, 'head': {'type': 'CTC'}}, 'loss': {'type': 'CTCLoss', 'blank': 0}, 'optimizer': {'type': 'Adam', 'args': {'lr': 0.001}}, 'lr_scheduler': {'type': 'StepLR', 'args': {'step_size': 30, 'gamma': 0.1}}, 'trainer': {'seed': 2, 'gpus': [0], 'epochs': 10, 'log_iter': 10, 'resume_checkpoint': '', 'finetune_checkpoint': '', 'output_dir': 'output', 'tensorboard': True}, 'dataset': {'alphabet': 'digit.txt', 'train': {'dataset': {'type': 'ImageDataset', 'args': {'data_path': [['path/train.txt']], 'data_ratio': [1.0], 'pre_processes': [{'type': 'Resize', 'args': {'img_h': 32, 'img_w': 120, 'pad': True, 'random_crop': False}}], 'transforms': [{'type': 'ToTensor', 'args': {}}], 'img_mode': 'RGB', 'ignore_chinese_punctuation': True, 'remove_blank': True}}, 'loader': {'batch_size': 16, 'shuffle': True, 'pin_memory': False, 'num_workers': 6}}, 'validate': {'dataset': {'type': 'ImageDataset', 'args': {'data_path': ['path/val.txt'], 'pre_processes': [{'type': 'Resize', 'args': {'img_h': 32, 'img_w': 120, 'pad': True, 'random_crop': False}}], 'transforms': [{'type': 'ToTensor', 'args': {}}], 'img_mode': 'RGB', 'ignore_chinese_punctuation': True, 'remove_blank': True}}, 'loader': {'batch_size': 4, 'shuffle': True, 'pin_memory': False, 'num_workers': 6}}}}

Also, I am running on a non-Chinese operating system; could a text-encoding issue be the cause of the error?
Hoping for your reply, thank you!
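
A KeyError at training start is typically raised when a label contains a character that is missing from the dictionary. A minimal sketch of a coverage check is shown below; check_alphabet_coverage is a hypothetical helper, not part of this repo, and it assumes one dictionary character per line plus "image_path<separator>label" label lines:

def check_alphabet_coverage(alphabet_path, label_path):
    # Collect every character that appears in the labels but not in the dictionary.
    with open(alphabet_path, encoding="utf-8") as f:
        alphabet = set(f.read().splitlines())
    missing = set()
    with open(label_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split(maxsplit=1)
            if len(parts) != 2:
                continue
            _, label = parts
            missing.update(ch for ch in label if ch not in alphabet)
    return missing

print(check_alphabet_coverage("digit.txt", "path/train.txt"))  # paths taken from the yaml above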

Is step=60 for the learning-rate schedule reasonable?

Hello,
the learning rate is decayed every 60 epochs by default.
Training on a Chinese dataset, the loss does not come down during the first 60 epochs and the validation accuracy stays around 30.
Is step=60 reasonable? Was this value chosen because it has been validated?

Recall on ICDAR2015 dataset

training setting:
icdar2015
resnet-50(pre-trained)

result:
the best model on icdar2015 validation set achieves recall=75.9%, precision=~88.xx%, hmean=~81.xx%

question:
recall = 75.9% vs 79.9% (reported in this repo) vs 82.7% (reported in the official paper)

The recall is a little lower than both the value reported in this repo and the one in the official paper. Is this normal?

An approach for detecting blurry text

Hi, I have read a lot of your source code and learned a great deal.
I'd like to ask you a question: I have an awkward task that requires recognising blurry characters.
Learning the features of blurry characters is really difficult, so I have the opposite idea:
train the model on a large number of high-resolution samples so that it does not generalise, and then at inference time also feed it high-resolution images to recognise the text (my task allows such an inference environment). In theory, blurry characters will be recognised as something strange, so I can compare the recognition result with the original text to decide whether blurry characters are present.

Since my task only ever has a single sample at prediction time (it only needs to judge whether an invoice contains blurry or misprinted characters, so generalisation and robustness are not a concern), do you think this approach is feasible?

Some bugs when converting CRNN to MNN

Hi,
have you tried converting CRNN to MNN?
I am finding some problems with the conversion, mainly in the LSTM part; the LSTM-to-MNN conversion seems to be wrong, which makes the MNN model's output incorrect.

CRNN baseline attempt

Hello, and thank you for the excellent work. The CRNN in this repo currently seems to be unusable: I ran a simple Chinese recognition experiment, but acc stays at 0 in the output and the loss jumps around. Is there a relatively stable recognition branch I could use as a reference to hunt for the bug, or do you have any suggestions about what I should pay attention to?

2020-07-14 09:45:39,163 - torchocr - INFO - [80/200] - [1300/1587] - lr:8.271806125530277e-28 - loss:4.5751 - acc:0.0312 - norm_edit_dis:0.2261 - time:51.0078
2020-07-14 09:46:29,372 - torchocr - INFO - [80/200] - [1400/1587] - lr:8.271806125530277e-28 - loss:4.9632 - acc:0.0000 - norm_edit_dis:0.1499 - time:50.2079
2020-07-14 09:47:19,655 - torchocr - INFO - [80/200] - [1500/1587] - lr:8.271806125530277e-28 - loss:5.0862 - acc:0.0000 - norm_edit_dis:0.1370 - time:50.2833
2020-07-14 09:48:54,075 - torchocr - INFO - [81/200] - [100/1587] - lr:4.1359030627651385e-28 - loss:4.6698 - acc:0.0000 - norm_edit_dis:0.2278 - time:50.8783
2020-07-14 09:49:43,349 - torchocr - INFO - [81/200] - [200/1587] - lr:4.1359030627651385e-28 - loss:4.7732 - acc:0.0000 - norm_edit_dis:0.1903 - time:49.2733
2020-07-14 09:50:32,552 - torchocr - INFO - [81/200] - [300/1587] - lr:4.1359030627651385e-28 - loss:4.7276 - acc:0.0156 - norm_edit_dis:0.2363 - time:49.2031
2020-07-14 09:51:22,197 - torchocr - INFO - [81/200] - [400/1587] - lr:4.1359030627651385e-28 - loss:5.2084 - acc:0.0156 - norm_edit_dis:0.1432 - time:49.6451
2020-07-14 09:52:12,403 - torchocr - INFO - [81/200] - [500/1587] - lr:4.1359030627651385e-28 - loss:4.4954 - acc:0.0469 - norm_edit_dis:0.2290 - time:50.2050
2020-07-14 09:53:04,682 - torchocr - INFO - [81/200] - [600/1587] - lr:4.1359030627651385e-28 - loss:4.8725 - acc:0.0000 - norm_edit_dis:0.1751 - time:52.2787
2020-07-14 09:53:54,543 - torchocr - INFO - [81/200] - [700/1587] - lr:4.1359030627651385e-28 - loss:4.5594 - acc:0.0000 - norm_edit_dis:0.2187 - time:49.8607

Training accuracy also stays at 0

The accuracy is 0 with both the lmdb and the textline dataset formats. I'm at a loss.
For example, the image below:
[screenshot]

On Linux, do the img_path entries at the front of the textline file have to be absolute paths?
For example, something like /home/user/Pytorch-master/imageset.*.png?
Hoping for an answer, thanks!

CTPN

Hi, interesting work.
Are there plans to implement CTPN?

Error when training rec with mjsynth

First I converted the dataset with the script you provide; the converted dataset looks like this:

mnt/ramdisk/max/90kDICT32px/2697/6/466_MONIKER_49537.jpg MONIKER
mnt/ramdisk/max/90kDICT32px/2697/6/465_Ecclesiastics_24500.jpg Ecclesiastics

Training the text recognition model on the converted dataset with the following command produces random segmentation faults or memory errors:

python ./tools/rec_train.py --config ./config/rec_train_config_self.py

Part of rec_train_config_self.py is shown below (I only changed the paths to the dictionary, training set and validation set):

config.loss = {
    'type': 'CTCLoss',
    'blank_idx': 0,
}
config.dataset = {
    # 'alphabet': r'path/dic.txt',
    'alphabet': r'torchocr/datasets/alphabets/enAlphaNumPunc90.txt',
    # 'alphabet': r'torchocr/datasets/alphabets/digit.txt',
    'train': {
        'dataset': {
            'type': 'RecTextLineDataset',
            'file': r'path/mnt/ramdisk/max/90kDICT32px/annotation_train_other.txt',
            'input_h': 32,
            'mean': 0.5,
            'std': 0.5,
            'augmentation': False,
        },
        'loader': {
            'type': 'DataLoader',  # to use the torch dataloader, just change this to DataLoader
            'batch_size': 4,
            'shuffle': True,
            'num_workers': 1,
            'collate_fn': {
                'type': 'RecCollateFn',
                'img_w': 120
            }
        }
    },
    'eval': {
        'dataset': {
            'type': 'RecTextLineDataset',
            'file': r'path/mnt/ramdisk/max/90kDICT32px/annotation_val_other.txt',
            'input_h': 32,
            'mean': 0.5,
            'std': 0.5,
            'augmentation': False,
        },
        'loader': {
            'type': 'RecDataLoader',
            'batch_size': 4,
            'shuffle': False,
            'num_workers': 1,
            'collate_fn': {
                'type': 'RecCollateFn',
                'img_w': 120
            }
        }
    }
}

After running the training command above, random errors occur. The log is as follows (I added a print after each of your lines of code to locate the error):

train loop: 1

0
[INFO] end zero_grad
[INFO] end forward
[INFO] loss: tensor(6.5425, grad_fn=)
[INFO] end loss
[INFO] end backward
[INFO] end clip_grad_norm_
[INFO] end optimizer step
1
[INFO] end zero_grad
[INFO] end forward
[INFO] loss: tensor(nan, grad_fn=)
[INFO] end loss
[INFO] end backward
[INFO] end clip_grad_norm_
[INFO] end optimizer step
2
[INFO] end zero_grad
[INFO] end forward
[INFO] loss: tensor(nan, grad_fn=)
[INFO] end loss
[INFO] end backward
[INFO] end clip_grad_norm_
[INFO] end optimizer step
3
[INFO] end zero_grad
[INFO] end forward
[INFO] loss: tensor(nan, grad_fn=)
[INFO] end loss
[INFO] end backward
[INFO] end clip_grad_norm_
[INFO] end optimizer step
4
[INFO] end zero_grad
[INFO] end forward
[INFO] loss: tensor(nan, grad_fn=)
[INFO] end loss
[INFO] end backward
[INFO] end clip_grad_norm_
[INFO] end optimizer step
5
[INFO] end zero_grad
[INFO] end forward
[INFO] loss: tensor(nan, grad_fn=)
[INFO] end loss
Segmentation fault (core dumped)

Hoping for your reply and guidance. Thank you very much!
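
As a side note, one common cause of nan CTC losses like those in the log above is a target sequence that is longer than the number of output time steps the model produces. A minimal, self-contained illustration using the standard torch.nn.CTCLoss (not this repo's wrapper) is shown below:

import torch
import torch.nn as nn

# When the target is longer than the input, no valid CTC alignment exists, so the
# loss is infinite and the gradients become nan. zero_infinity=True zeroes such
# losses (and their gradients) instead of poisoning training.
T, C, N = 10, 37, 2                      # time steps, classes (incl. blank), batch size
log_probs = torch.randn(T, N, C).log_softmax(2).requires_grad_()
targets = torch.randint(1, C, (N, 20))   # target length 20 > T=10: infeasible
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 20, dtype=torch.long)

print(nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths))                      # inf
print(nn.CTCLoss(blank=0, zero_infinity=True)(log_probs, targets, input_lengths, target_lengths))  # 0.0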

How do I change the GPU id?

As the title says: how do I change the GPU id? It currently defaults to 'cuda:0', and changing it to any other id raises an error...

A great project

  • Suggest making the character set a variable, so it is easy to reference.
  • Add a label filter.
  • Add a character-feature statistics function to set the training aspect ratio dynamically.

error: random_crop_data.py


poly [[1533.5558 3912.3967]
[1289.3611 3954.4272]
[1296.0596 3993.3525]
[1540.2543 3951.322 ]]
2020-09-23 15:54:25,165 - torchocr - ERROR - Traceback (most recent call last):
File "tools/det_train.py", line 211, in train
for i, batch_data in enumerate(train_loader): # traverse each batch in the epoch
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 363, in next
data = self._next_data()
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 989, in _next_data
return self._process_data(data)
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
data.reraise()
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/nlp/lmy/PytorchOCR/torchocr/datasets/DetDataSet.py", line 103, in getitem
data = self.apply_pre_processes(data)
File "/home/nlp/lmy/PytorchOCR/torchocr/datasets/DetDataSet.py", line 92, in apply_pre_processes
data = aug(data)
File "/home/nlp/lmy/PytorchOCR/torchocr/datasets/det_modules/random_crop_data.py", line 54, in call
poly = ((poly - (crop_x, crop_y)) * scale).tolist()
TypeError: unsupported operand type(s) for -: 'list' and 'tuple'

crop_x,crop_y (0, 0)
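
The traceback points at poly being a plain Python list, so subtracting the (crop_x, crop_y) tuple fails. A minimal sketch of a possible fix (an assumption about the data format, not a verified patch) is to convert it to a NumPy array before the arithmetic:

import numpy as np

# Reproduce the failing expression with the values from the log, using an ndarray
# so that the broadcasted subtraction and scaling work.
poly = [[1533.5558, 3912.3967],
        [1289.3611, 3954.4272],
        [1296.0596, 3993.3525],
        [1540.2543, 3951.322]]
crop_x, crop_y, scale = 0, 0, 1.0

poly = ((np.array(poly) - (crop_x, crop_y)) * scale).tolist()
print(poly)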


MobileNetV3 model question

Hello, why is the MobileNetV3 model I get from training on icdar2015 with the pretrained MobileNetV3_large_x0_5.pth more than 21 MB, and still over 7 MB after removing gradient information, which is roughly twice the size of the model in your Baidu drive? How did you train such a small model? Thanks!
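
For comparison, checkpoint size differences of this kind usually come from optimizer/scheduler state saved alongside the weights. A minimal sketch of stripping a checkpoint down to the bare state dict is shown below; the checkpoint key names are assumptions, not verified against this repo:

import torch

# Keep only the model weights; optimizer and scheduler state usually accounts for
# most of the extra file size.
ckpt = torch.load("checkpoint.pth", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)  # handle both wrapped and raw checkpoints
torch.save(state_dict, "model_slim.pth")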

[Task claiming] Stage 1: reproducing PaddleOCR

This is the task-claiming issue for stage 1 of reproducing PaddleOCR. If you are interested in a module, @ me and the task will be added.
backbone:

  • Implement mbv3 (detection + recognition), able to load PaddleOCR models, with diff below 1e-6 @novioleo @WenmuZhou
  • Implement res34 (detection + recognition), able to load PaddleOCR models, with diff below 1e-6 @novioleo @WenmuZhou

neck:

head:

  • Implement DB @WenmuZhou
  • Implement EAST
  • Implement the attention head for recognition

rec:

  • Implement CRNN (mb and res34), able to load PaddleOCR models @novioleo

Paddle's dynamic LSTM has no direct equivalent in PyTorch, so there will be some differences here.

det:

utils:

train:

data:

demo:

Other resources:

  • Collect corpora for different languages
  • Collect and archive different fonts for different languages

Height becomes 0 after maxpool

self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)

if __name__ == '__main__':
    import torch
    from torchsummary import summary

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    net = MobileNetV3(3, scale=0.5, model_name='small').to(device)
    # print(net)
    summary(net, input_size=(3, 32, 320))

This raises an error; changing the pooling to self.pool = nn.MaxPool2d(kernel_size=(1, 2), stride=(1, 2), padding=0) lets it run.

DBNet loss does not move on the iFLYTEK dataset

I trained on the iFLYTEK dataset using the author's data-conversion script,
downloaded and used the resnet50 pretrained model,
and left the rest of the model configuration unchanged.
During training the loss stays at 1.000 from start to finish, and loss_shrink and loss_threshold stay at 0.
I'd like to ask what could cause this.

At prediction time I tried many images; none of them produce any points and the predictions are all empty.
I hope someone can give me some pointers, thank you very much!

Parameter initialization?

Why use the preset parameter initialization rather than something else, for example He initialization?
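
For reference, switching to He initialization is a one-liner per layer type; a minimal sketch (not the repo's actual initialization scheme) is shown below:

import torch.nn as nn

def init_weights(m):
    # Kaiming/He initialization for conv layers, zeros for their biases.
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
model.apply(init_weights)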

TypeError: 'NoneType' object is not callable

2020-07-16 09:45:12,213 - torchocr - ERROR - Traceback (most recent call last):
File "tools/det_train.py", line 205, in train
for i, batch_data in enumerate(train_loader): # traverse each batch in the epoch
File "/home/mlp/python/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 336, in next
return self._process_next_batch(batch)
File "/home/mlp/python/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 357, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
TypeError: Traceback (most recent call last):
File "/home/mlp/python/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
TypeError: 'NoneType' object is not callable

How to use minimum_2stage_inference.py in the tools folder

Hello! I have been following this project for a while. Right now det_infer/det_train and rec_infer/rec_train all run on my machine, but I cannot get minimum_2stage_inference.py to run. How to feed an image into this script and get the detection boxes and recognition results has been puzzling me. Could you provide a readme for the 2-stage inference? Many thanks, and I hope to hear from you!

Error when reading 8-bit images

Hello,
when det_train.py encounters an image with a bit depth of 8, it raises an error.

For an 8-bit image, im = cv2.imread(data['img_path'], 1 if self.img_mode != 'GRAY' else 0)
returns None, and then:

cv2.error: OpenCV(4.4.0) /tmp/pip-req-build-kne9u3r2/opencv/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'

Location of the failing code:

    def __getitem__(self, index):
        # try:
        data = copy.deepcopy(self.data_list[index])
        print(data['img_path'])
        im = cv2.imread(data['img_path'], 1 if self.img_mode != 'GRAY' else 0)  
        if self.img_mode == 'RGB':
            im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)   # error raised here
        data['img'] = im
        data['shape'] = [im.shape[0], im.shape[1]]
        data = self.apply_pre_processes(data)

How can this be solved?
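
A minimal sketch of a defensive read is shown below (an assumption, not a verified patch for this repo): skip files OpenCV cannot decode and normalise palette/grayscale/alpha images to 3-channel BGR before the BGR-to-RGB conversion.

import cv2

def read_bgr(img_path):
    # Load with the original channel layout, then normalise to 3-channel BGR.
    im = cv2.imread(img_path, cv2.IMREAD_UNCHANGED)
    if im is None:
        return None                       # caller should skip this sample
    if im.ndim == 2:                      # 8-bit grayscale / palette image decoded to 2D
        im = cv2.cvtColor(im, cv2.COLOR_GRAY2BGR)
    elif im.shape[2] == 4:                # drop the alpha channel
        im = cv2.cvtColor(im, cv2.COLOR_BGRA2BGR)
    return im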

Accuracy stays at 0 when training with rec_train.py

Hello, I am training on the icdar 2017 dataset and the accuracy stays at 0. I use the rec_train_config.py config without major changes, only the batch_size and the paths to the alphabet and the dataset. The training log looks like this:
[screenshot]
While training I stepped into RecMetric.py to inspect the model output predictions, and the predicted results are all tensors of 0, 0, 0 ... 0.

The training data is annotated as follows (image path and text content separated by \t):

E:\DataSets\icdar2017rctw\icdar2017rctw\recognition\train\image_0_0.jpg 金氏眼镜
E:\DataSets\icdar2017rctw\icdar2017rctw\recognition\train\image_0_1.jpg 创于1989
E:\DataSets\icdar2017rctw\icdar2017rctw\recognition\train\image_0_2.jpg 城建店

The alphabet contains 5529 characters in total; adding the blank, the final number of classes n_class is set to 5530:
[screenshot]

In addition, the settings in the config:
2020-11-05 15:10:33,337 - torchocr - INFO - {'exp_name': 'CRNN', 'train_options': {'resume_from': '', 'third_party_name': '', 'checkpoint_save_dir': './output/CRNN/checkpoint', 'device': 'cuda:0', 'epochs': 200, 'fine_tune_stage': ['backbone', 'neck', 'head'], 'print_interval': 20, 'val_interval': 3000, 'ckpt_save_type': 'HighestAcc', 'ckpt_save_epoch': 4}, 'SEED': 927, 'optimizer': {'type': 'Adam', 'lr': 0.001, 'weight_decay': 0.0001}, 'lr_scheduler': {'type': 'StepLR', 'step_size': 60, 'gamma': 0.1}, 'model': {'type': 'RecModel', 'backbone': {'type': 'ResNet', 'layers': 18}, 'neck': {'type': 'PPaddleRNN'}, 'head': {'type': 'CTC', 'n_class': 5530}, 'in_channels': 3}, 'loss': {'type': 'CTCLoss', 'blank_idx': 0}, 'dataset': {'alphabet': 'E:/pro/ncnn_ocr/models/keys.txt', 'train': {'dataset': {'type': 'RecTextLineDataset', 'file': 'E:/pro/chineseocr-master/train/ocr/txt/icdar2017Backup.txt', 'input_h': 32, 'mean': 0.5, 'std': 0.5, 'augmentation': False}, 'loader': {'type': 'DataLoader', 'batch_size': 4, 'shuffle': True, 'num_workers': 1, 'collate_fn': {'type': 'RecCollateFn', 'img_w': 120}}}, 'eval': {'dataset': {'type': 'RecTextLineDataset', 'file': 'E:/pro/chineseocr-master/train/ocr/txt/2017valBackup.txt', 'input_h': 32, 'mean': 0.5, 'std': 0.5, 'augmentation': False}, 'loader': {'type': 'RecDataLoader', 'batch_size': 4, 'shuffle': False, 'num_workers': 1, 'collate_fn': {'type': 'RecCollateFn', 'img_w': 120}}}}}

Hoping for a reply, thanks!

Question about text recognition

Hello. I trained CRNN with resnet18 as the backbone; training behaves normally and inference works fine as well. But with res50 or res34 as the backbone, the loss never comes down and acc stays at 0. Could there be a bug in the training code?

Question about CTCLabelConverter

Would the condition added in decode (shown in the screenshot below) prevent words containing repeated letters from being decoded (for example the 'l' in 'Hello')?
Or does the CTC algorithm assume there is a blank between every pair of characters?
I don't quite understand this, hoping for an explanation.
[screenshot]
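
For context, standard CTC greedy decoding first collapses consecutive repeats and then removes blanks, so a repeated letter only survives when the network emits a blank between the two occurrences. A minimal illustration (not this repo's CTCLabelConverter) is shown below:

def ctc_greedy_decode(indices, charset, blank=0):
    # Skip blanks and skip any index equal to the previous one (collapse repeats).
    out = []
    prev = None
    for idx in indices:
        if idx != blank and idx != prev:
            out.append(charset[idx])
        prev = idx
    return "".join(out)

charset = {1: 'H', 2: 'e', 3: 'l', 4: 'o'}
print(ctc_greedy_decode([1, 2, 3, 3, 4], charset))     # 'Helo'  -> repeats collapsed
print(ctc_greedy_decode([1, 2, 3, 0, 3, 4], charset))  # 'Hello' -> blank separates the two l's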

Error at prediction time

When I use the pretrained model provided here and call det_infer for prediction, it reports the following error:

KeyError: 'cfg'
Is it that the pretrained model does not contain cfg?

datalaoder issues

When loading data, labels with an invalid length and characters that are not in the character set should be filtered out; otherwise some hard-to-debug mistakes can occur during experiments.

  • When the length of an input label is zero, a division-by-zero error occurs when computing the edit distance at the beginning of training

acc_dict = metric(output, batch_data['label'])

for (pred, pred_conf), target in zip(preds_str, labels):
    norm_edit_dis += Levenshtein.distance(pred, target) / max(len(pred), len(target))
    show_str.append(f'{pred} -> {target}')

  • Labels longer than the configured maximum length also need to be filtered; at present I did not find a relevant setting in the config file

# todo: add filtering of over-long labels
# if len(label) > config.max_len:
#     # print(f'The length of the label is longer than max_length: length
#     # {len(label)}, {label} in dataset {self.root}')
#     continue

'dataset': {
    'type': 'RecLmdbDataset',
    'file': r'path/lmdb/train',  # LMDB dataset path
    'input_h': 32,
    'mean': 0.5,
    'std': 0.5,
    'augmentation': False,
},
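
A minimal sketch of the filtering described above is shown below; filter_samples and safe_norm_edit_dis are hypothetical helpers, not part of this repo:

import Levenshtein  # same package used in the metric snippet above

def filter_samples(samples, alphabet, max_len):
    # Drop empty labels, over-long labels, and labels with characters outside the alphabet.
    kept = []
    for img_path, label in samples:
        if not label or len(label) > max_len:
            continue
        if any(ch not in alphabet for ch in label):
            continue
        kept.append((img_path, label))
    return kept

def safe_norm_edit_dis(pred, target):
    # Guard the divisor so empty pred/target pairs cannot divide by zero.
    return Levenshtein.distance(pred, target) / max(len(pred), len(target), 1)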
