chineseocr / chineseocr

yolo3+ocr

License: MIT License

Python 63.13% Shell 0.07% JavaScript 24.44% HTML 2.41% Dockerfile 0.48% Jupyter Notebook 4.46% CSS 5.02%

yolo3 chinese-text-detect chinese-ocr opencv-dnn darknet-text-detect idcard trainticket ocr

chineseocr's Introduction

This project implements detection and recognition of Chinese text in natural scenes, based on yolo3 and crnn.

Optimized darknet version: https://github.com/chineseocr/darknet-ocr.git

Training code (master branch)

OCR training dataset

OCR CTC training dataset (archive password: chineseocr)
Baidu Netdisk: https://pan.baidu.com/s/1UcUKUUELLwdM29zfbztzdw (extraction code: atwn)
gofile: http://gofile.me/4Nlqh/uT32hAjbx (password: https://github.com/chineseocr/chineseocr)

Features

Environment setup

GPU deployment: see setup.md
CPU deployment: see setup-cpu.md

Download and build darknet (building darknet can be skipped if you use opencv dnn or keras yolo3 directly)

git clone https://github.com/pjreddie/darknet.git 
mv darknet chineseocr/
## To build with GPU and cuDNN support, edit the Makefile
#GPU=1
#CUDNN=1
#OPENCV=0
#OPENMP=0
make

Edit darknet/python/darknet.py, line 48
root = '/root/'  ## directory that contains chineseocr
lib = CDLL(root+"chineseocr/darknet/libdarknet.so", RTLD_GLOBAL)

Download the model files

Model file locations:

Baidu Netdisk: https://pan.baidu.com/s/1gTW9gwJR6hlwTuyB6nCkzQ
Other links: http://gofile.me/4Nlqh/fNHlWzVWo
Copy all files in the folder into the models directory.

Model conversion (optional)

pytorch ocr to keras ocr

python tools/pytorch_to_keras.py  -weights_path models/ocr-dense.pth -output_path models/ocr-dense-keras.h5

darknet to keras

python tools/darknet_to_keras.py -cfg_path models/text.cfg -weights_path models/text.weights -output_path models/text.h5

keras to darknet

python tools/keras_to_darknet.py -cfg_path models/text.cfg -weights_path models/text.h5 -output_path models/text.weights

Model selection

See config.py.

Build the docker image

## Download the Anaconda3 Python installer (https://repo.anaconda.com/archive/Anaconda3-2019.03-Linux-x86_64.sh) and place it in the chineseocr directory
## Build the image
docker build -t chineseocr .
## Start the service
docker run -d -p 8080:8080 chineseocr /root/anaconda3/bin/python app.py

Start the web service

cd chineseocr  ## enter the chineseocr directory
python app.py 8080  ## 8080 is the port number; any port can be used

Access the service

http://127.0.0.1:8080/ocr

Recognition results

References

yolo3 https://github.com/pjreddie/darknet.git
crnn https://github.com/meijieru/crnn.pytorch.git
ctpn https://github.com/eragonruan/text-detection-ctpn
CTPN https://github.com/tianzhi0549/CTPN
keras yolo3 https://github.com/qqwweee/keras-yolo3.git
darknet/keras model conversion reference: https://www.cnblogs.com/shouhuxianjian/p/10567201.html
Language model implementation https://github.com/lukhy/masr


chineseocr's Issues

keep_inds=nms(np.hstack((text_proposals, scores)), TEXT_PROPOSALS_NMS_THRESH)

Hello, the CPU version tests fine, but with the GPU version I keep getting this error:
Traceback (most recent call last):
File "backend_demo.py", line 32, in <module>
leftAdjust=True,rightAdjust=True,alph=0.1)
File "/home/hufangjian/train_ticket/chineseocr/model.py", line 193, in model
text_recs,tmp = text_detect(**config)
File "/home/hufangjian/train_ticket/chineseocr/model.py", line 42, in text_detect
MIN_NUM_PROPOSALS)
File "/home/hufangjian/train_ticket/chineseocr/detector/detectors.py", line 65, in detect
keep_inds=nms(np.hstack((text_proposals, scores)), TEXT_PROPOSALS_NMS_THRESH)## nms: filter duplicate boxes
File "/root/anaconda3/envs/chineseocr/lib/python3.6/site-packages/numpy/core/shape_base.py", line 291, in hstack
return _nx.concatenate(arrs, 0)
ValueError: all the input arrays must have same number of dimensions
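
A note on the traceback above: np.hstack concatenates along axis 0 only when its first input is 1-D, so the error suggests that text_proposals and scores arrived with different numbers of dimensions (for instance an empty 1-D proposals array next to 2-D scores). The issue does not confirm the cause. The following is a minimal sketch of a defensive stacking helper; the name stack_boxes_and_scores is hypothetical and not part of the project.

```python
import numpy as np

def stack_boxes_and_scores(text_proposals, scores):
    """Return an (N, 5) array [x1, y1, x2, y2, score], including when N == 0.

    Hypothetical helper: forces both inputs to 2-D before stacking so that
    np.hstack never sees arrays with mismatched dimensions.
    """
    boxes = np.asarray(text_proposals, dtype=np.float32).reshape(-1, 4)
    scores = np.asarray(scores, dtype=np.float32).reshape(-1, 1)
    return np.hstack((boxes, scores))
```

With this shape guarantee, an empty detection result yields a (0, 5) array instead of raising.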

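The Dockerfile cannot find the corresponding requirements.txt

How do I generate images of specific text?

How can I generate regular samples with a specific font on a specific background image, for example a business license's yellow background with SimSun (宋体) at font size 20 and the text parallel to the background?

How did the author label the yolo3 training data for text? Were there two classes?

As in the title. Thanks.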

Couldn't open file: ../chineseocr/models/text.names

ipython app.py 8080
Error:
Loading weights from /root/ocr/chineseocr/models/text.weights...Done!
Couldn't open file: ../chineseocr/models/text.names
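text.names and text.weights are both in the models folder, yet it reports that text.names cannot be read.
Question: which part of the code that reads this file should I change?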
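
The path in the error message is relative ("../chineseocr/models/text.names"), so it is resolved against the current working directory rather than against the project. The issue does not say which file holds that path. As a sketch under that assumption, a relative model path can be turned into an absolute one and checked before it is handed to darknet; absolute_model_path is a hypothetical helper.

```python
import os

def absolute_model_path(project_root, filename, models_dir="models"):
    """Build an absolute path to a model file and report whether it exists.

    Hypothetical helper: project_root is assumed to be the chineseocr
    checkout directory.
    """
    path = os.path.abspath(os.path.join(project_root, models_dir, filename))
    return path, os.path.isfile(path)
```

For example, absolute_model_path("/root/ocr/chineseocr", "text.names") returns the full path together with a flag showing whether the file is actually there.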

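A question about using the keras model

Hello, I converted the model to keras using the address given on git and tested it in image mode with yolo_video.py, the test script that comes with the conversion code (I only looked at the text-region results). The results appear to be wrong: there is essentially no correct bbox. I don't know which step of my procedure went wrong.

How can I resolve this error that occurs when fine-tuning the model "ocr.pth"?

Training loss is very low, but recognition accuracy is almost 0

I trained on digits with crnn.pytorch, and the loss during training got very low: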
888--22---6--11--00--66---44-99---4--11--11--11--00--66--66---33- => 8261064941110663 , gt: 8261064941110663
99----66--6---5--00--33--88--88--22--11--22--33--88--00--44---44- => 9665038821238044 , gt: 9665038821238044
777---55-77--00---5--66--00--99--22--99--77--11--66--55--99---99- => 7570560929716599 , gt: 7570560929716599
000--99---44--8--22--55--33--33--33--44--88--99--00--77--00---66- => 0948253334890706 , gt: 0948253334890706
555---5--22---8--11--11--88--77--55--11--22--33--77--11--99---99- => 5528118751237199 , gt: 5528118751237199
444--22---5--99--66--77--55--55--99--44--55--99--33--33--66---55- => 4259675594593365 , gt: 4259675594593365
999--11--77--00---5--55--77--99--11--22--22--22--77--11--66---99- => 9170557912227169 , gt: 9170557912227169
000--22--22--00--00--77--55--22--00--88--33--11--33--11--77---66- => 0220075208313176 , gt: 0220075208313176
444---66--44--8--66--11--77--00--33--00--11--55--99--00--99---44- => 4648617030159094 , gt: 4648617030159094
777--99--99--77--88--11--22--55---4--44--66--11--22--11--33---55- => 7997812544612135 , gt: 7997812544612135
epoch:273,step:640961,Test loss:0.00031879107700660825,accuracy:1.0,train loss:9.42973485962284e-07

But when I use the trained model for recognition, the results look like this:
/root/number_ocr_8_3/ocr/chineseocr/test2/0 0 1 0 2 0 9 0 0 9 2 7 2 0 6 9_31.jpg
[{'cx': 133.5, 'cy': 19.5, 'text': '012229', 'w': 266, 'h': 23.0, 'degree': 0.0}]
/root/number_ocr_8_3/ocr/chineseocr/test2/0 0 0 7 6 9 2 9 1 9 6 5 0 6 3 7_931.jpg
[{'cx': 133.5, 'cy': 19.5, 'text': '1217', 'w': 266, 'h': 21.0, 'degree': 0.0}]
Folks, where does the problem come from?
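
For readers unfamiliar with the training log format above: the left-hand string is the raw per-timestep CTC output and the right-hand string is the decoded text. A minimal sketch of greedy CTC collapsing, assuming "-" is the blank symbol as the log suggests (the function name is illustrative):

```python
def ctc_greedy_collapse(raw, blank="-"):
    """Collapse a raw CTC string: merge repeated symbols, then drop blanks."""
    decoded = []
    previous = None
    for symbol in raw:
        # Keep a symbol only when it differs from the previous timestep
        # and is not the blank.
        if symbol != previous and symbol != blank:
            decoded.append(symbol)
        previous = symbol
    return "".join(decoded)
```

Applied to the first log line, this reproduces the decoded value 8261064941110663 shown next to it.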

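A small question

cd chineseocr
sh setup.sh #(cpu sh setup-cpu.sh)

The last line of setup.sh is pushd detector/utils && sh make-for-cpu.sh && popd
setup.sh is the GPU version, so shouldn't that line be
pushd detector/utils && sh make.sh && popd ?

How was crnn trained? Do I need to write my own training code?

How do I generate a training dataset for Chinese OCR?

Hello, I would like to ask how a training dataset for Chinese OCR can be generated.

Model error

I modified app.py to detect and recognize text in video. For some videos, the following error appears after running for a while: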
File "main_chineseocr.py", line 31, in TEXT
POST(img)
File "main_chineseocr.py", line 49, in POST
leftAdjust=True,rightAdjust=True,alph=0.1)
File "/home/import/djj/chineseocr-master/model.py", line 227, in model
result = crnnRec(np.array(img),newBox,ifIm,leftAdjust,rightAdjust,alph)
File "/home/import/djj/chineseocr-master/model.py", line 85, in crnnRec
simPred = crnnOcr(partImg_)## recognized text
File "/home/import/djj/chineseocr-master/crnn/crnn.py", line 61, in crnnOcr
preds = model(image)
File "/home/import/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/import/djj/chineseocr-master/crnn/models/crnn.py", line 78, in forward
conv = utils.data_parallel(self.cnn, input, self.ngpu)
File "/home/import/djj/chineseocr-master/crnn/models/utils.py", line 12, in data_parallel
output = model(input)
File "/home/import/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/import/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 67, in forward
input = module(input)
File "/home/import/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/import/anaconda3/lib/python3.6/site-packages/torch/nn/modules/pooling.py", line 143, in forward
self.return_indices)
File "/home/import/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 334, in max_pool2d
ret = torch._C._nn.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: Given input size: (128x16x1). Calculated output size: (128x8x0). Output size is too small at /pytorch/torch/lib/THCUNN/generic/SpatialDilatedMaxPooling.cu:69

I don't know what the cause is.
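
The RuntimeError above reports an input of 128x16x1 and a computed output of 128x8x0, which means the feature map reaching the pooling layer was only 1 pixel wide. The standard pooling output-size formula reproduces those numbers if the layer is a 2x2 max pool with stride 2, which is an assumption here since the layer definition is not shown. A small sketch:

```python
def pool_output_size(size, kernel, stride, padding=0, dilation=1):
    """Output length of a pooling layer along one dimension (floor mode)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

# Assumed layer: kernel 2, stride 2, no padding.
height_out = pool_output_size(16, kernel=2, stride=2)  # height 16 -> 8
width_out = pool_output_size(1, kernel=2, stride=2)    # width 1 -> 0, too small
```

Under that assumption, a possible workaround is to skip (or pad) cropped text boxes that are too narrow before passing them to crnnOcr; whether this matches the cause in the reported videos is unconfirmed.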

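It seems unable to recognize single-line images

The image is below

Hello, when will this project be republished?

I shared your project on my website, and recently many people have asked me why the project was deleted. When will it be back up? I would like to be able to explain to them. The link is: https://www.ptorch.com/news/158.html

If I choose fixed-length or variable-length training, should prediction use the matching setting?

Hello! When two clients send requests at the same time, i.e. when tasks run in parallel, the following error occurs. Is this expected?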

Traceback (most recent call last):
File "/Users/karmanzeng/anaconda3/envs/chineseocr/lib/python3.6/site-packages/web/application.py", line 257, in process
return self.handle()
File "/Users/karmanzeng/anaconda3/envs/chineseocr/lib/python3.6/site-packages/web/application.py", line 248, in handle
return self._delegate(fn, self.fvars, args)
File "/Users/karmanzeng/anaconda3/envs/chineseocr/lib/python3.6/site-packages/web/application.py", line 488, in _delegate
return handle_class(cls)
File "/Users/karmanzeng/anaconda3/envs/chineseocr/lib/python3.6/site-packages/web/application.py", line 466, in handle_class
return tocall(*args)
File "/Users/karmanzeng/PycharmProjects/chineseocr-master/app.py", line 71, in POST
leftAdjust=True, rightAdjust=True, alph=0.1)
File "/Users/karmanzeng/PycharmProjects/chineseocr-master/model.py", line 214, in model
text_recs,tmp = text_detect(**config)
File "/Users/karmanzeng/PycharmProjects/chineseocr-master/model.py", line 47, in text_detect
MIN_NUM_PROPOSALS)
File "/Users/karmanzeng/PycharmProjects/chineseocr-master/detector/detectors.py", line 70, in detect
text_lines=self.text_proposal_connector.get_text_lines(text_proposals, scores, size)## merge text lines
File "/Users/karmanzeng/PycharmProjects/chineseocr-master/detector/text_proposal_connector.py", line 29, in get_text_lines
tp_groups=self.group_text_proposals(text_proposals, scores, im_size)##find the text line
File "/Users/karmanzeng/PycharmProjects/chineseocr-master/detector/text_proposal_connector.py", line 12, in group_text_proposals
graph=self.graph_builder.build_graph(text_proposals, scores, im_size)
File "/Users/karmanzeng/PycharmProjects/chineseocr-master/detector/text_proposal_graph_builder.py", line 88, in build_graph
boxes_table[int(box[0])].append(index)
IndexError: list index out of range
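
The issue does not establish why concurrent requests fail. If the cause is shared state inside the detection model (an assumption), one common mitigation is to serialize calls into the model with a lock so that only one request runs inference at a time. A minimal sketch; run_serialized is a hypothetical wrapper, not part of app.py:

```python
import threading

# One process-wide lock guarding the (assumed non-thread-safe) model.
_model_lock = threading.Lock()

def run_serialized(fn, *args, **kwargs):
    """Call fn while holding the model lock, so calls never overlap."""
    with _model_lock:
        return fn(*args, **kwargs)
```

In a request handler this would wrap the inference call, for example run_serialized(model, img, ...), at the cost of handling requests one at a time.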

How was the data annotated?

As in the title.

Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

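With many images to detect, an out-of-memory problem appears after running for a while
Process finished with exit code 137 (interrupted by signal 9: SIGKILL)
Does the code need to be improved so that detection can run in batches?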
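
Batching, as asked about above, can be sketched independently of the project code: process a fixed number of image paths at a time and release references between batches. Whether this resolves the reported SIGKILL is not confirmed by the issue; handle_batch is a hypothetical callback that runs detection on a list of paths.

```python
import gc

def iter_batches(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

def process_in_batches(paths, handle_batch, batch_size=16):
    """Run handle_batch on each batch and collect its results."""
    results = []
    for batch in iter_batches(paths, batch_size):
        results.extend(handle_batch(batch))
        gc.collect()  # prompt collection of per-batch temporaries
    return results
```

Only the per-image results are kept across batches, so decoded images and intermediate arrays can be freed as soon as each batch finishes.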

How fast is it?

Corrupted image for 3090 Corrupted image for 3092......

crnn.pytorch's result is correct but different from chineseocr

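About GRU overfitting

Hello, I currently have about 500,000 images. With a convolutional network followed directly by a fully connected output there is some overfitting, but validation accuracy still reaches over 80. However, if I add a bidirectional GRU (256 units, dropout 0.5) between the convolutional layers and the final fully connected output, the overfitting becomes very severe: the more I train, the higher the validation loss gets. Is this because there is too little data?

How can I add training data on top of your trained model?

I found in testing that characters such as ⅡⅢⅣⅤⅥⅠ are hard to recognize. How can I add training for these kinds of characters on top of your model?

How to recognize spaces between characters

How can spaces between characters be recognized, especially between English words? The CRNN recognition results are all run together. Please advise.

A question about the yolo3 and crnn training data

Hello! Is the data used to train the yolov3 text detection model and the crnn recognition model public? Or are there public Chinese corpora that can be used to train these two models? Thanks!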

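Questions about YOLOv3 text detection and CRNN training

Last night I set up a virtual machine and, after a lot of fiddling, finally got it running. I tried recognizing VAT invoices and the results are really quite good. However, I found one problem: fields that are adjacent left-to-right get merged, for example the address and the password area in the image below, and also the amount, tax rate, and tax amount. Is this a problem with YOLOv3 text detection? (From the YOLOv3 paper, isn't it prone to missing small text regions, since unlike EAST text detection it does not upsample to the original image size, or possibly half of it, I forget, before regression?) Or does your later processing merge adjacent regions? I forgot to mention that I scaled the image to 1024×768 rather than your original 608×608.

Could you briefly describe how you trained text detection with YOLOv3? Is the input an image and the target the bounding rectangles of all the text (i.e. the same as training an ordinary YOLO)?
Also, what samples did you train on? (Self-generated?)
Roughly how many samples were used to train text detection?
By the way, what are your crnn training samples like? Are they similar to https://github.com/senlinuc/caffe_ocr, all images of fixed length?
Do the training fonts vary in size? (I tried some fields on VAT invoices where the leading characters are much larger than the following ones; can your crnn model recognize them all well?)
How many tens of thousands of training samples do you have?

Hello, when will the model be uploaded?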

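Is this CPU speed normal?

I set up the environment following the tutorial
The previous version: 1.6 min per image
This new version: 2.5 min per image

Is something wrong somewhere?
There is also this error

although it does not affect usage

Front-end page displays abnormally; clicking Browse does nothing

Backend log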
10.0.0.72:60398 - - [14/Sep/2018 15:25:17] "HTTP/1.1 GET /ocr" - 200 OK
10.0.0.72:60399 - - [14/Sep/2018 15:25:17] "HTTP/1.1 GET /static/js/jquery.js" - 304 Not Modified
10.0.0.72:60398 - - [14/Sep/2018 15:25:17] "HTTP/1.1 GET /static/css/bootstrap.min.css" - 304 Not Modified
10.0.0.72:60400 - - [14/Sep/2018 15:25:17] "HTTP/1.1 GET /static/js/jquery.form.js" - 304 Not Modified
10.0.0.72:60401 - - [14/Sep/2018 15:25:17] "HTTP/1.1 GET /static/js/helps.js" - 304 Not Modified
10.0.0.72:60402 - - [14/Sep/2018 15:25:17] "HTTP/1.1 GET /static/img/loading.gif" - 304 Not Modified
10.0.0.72:60420 - - [14/Sep/2018 15:25:32] "HTTP/1.1 GET /ocr" - 200 OK
10.0.0.72:60420 - - [14/Sep/2018 15:25:33] "HTTP/1.1 GET /css/zzsc.css" - 404 Not Found
10.0.0.72:60424 - - [14/Sep/2018 15:25:33] "HTTP/1.1 GET /favicon.ico" - 404 Not Found

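About images containing two fonts

Hello, I tried using chineseocr to recognize images of lottery tickets and found that recognition accuracy for the numbers is very low while accuracy for Chinese characters is high, possibly because the digits and the Chinese characters on the ticket are in two different fonts. For cases like lottery tickets with two or more fonts, how can the recognition rate be improved?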

AttributeError: module 'darknet' has no attribute 'load_net'

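Does cv2.dnn support GPU?

Does the cv2.dnn module support GPU when loading darknet models? I have read online that it does not. During recognition, can multiple GPUs process the same image? Would that improve speed?

Detection on the first image is extremely slow, taking over 250 seconds; afterwards recognition is normal

Hello, my GPU is a GTX1060 6G. I configured the environment per the instructions (pytorch 0.2.0 raised errors, so I switched to 0.3.0) and it runs successfully. But the first image loads, or rather runs, very slowly, taking over 250 seconds; subsequent images are normal, about 0.5 to 2 s each.

Debugging showed that the problem is in darknet_detect.py, at line 25 of detect_np(), in the call dn.predict_image(net, im); calling this function from darknet takes over 200 seconds.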

def detect_np(net, meta, image, thresh=.5, hier_thresh=.5, nms=.45):
    im = array_to_image(image)
    num = dn.c_int(0)
    pnum = dn.pointer(num)
    dn.predict_image(net, im)
    dets = dn.get_network_boxes(net, im.w, im.h, thresh, hier_thresh, None, 0, pnum)
    num = pnum[0]
    if (nms): dn.do_nms_obj(dets, num, meta.classes, nms)
    res = []
    for j in range(num):
        for i in range(meta.classes):
            if dets[j].prob[i] > 0:
                b = dets[j].bbox
                res.append((meta.names[i], dets[j].prob[i], (b.x, b.y, b.w, b.h)))
    res = sorted(res, key=lambda x: -x[1])
    dn.free_detections(dets, num)
    return res

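I then wondered whether darknet itself was the problem, so I reconfigured it and also ran its bundled example darknet.py without any problem. Has the author encountered this, or do you have any suggestions? Thanks!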
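
The issue leaves the cause of the slow first call open. A common way to keep a one-time initialization cost away from the first real request is to run a dummy prediction at startup. This does not remove the delay; it only moves it. A minimal sketch, where predict stands for any callable such as a wrapper around detect_np (a hypothetical usage):

```python
import time

def warm_up(predict, dummy_input, runs=1):
    """Call predict on a dummy input and return the elapsed time of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(dummy_input)
        timings.append(time.perf_counter() - start)
    return timings
```

Comparing the first and later timings also gives a quick way to confirm that only the first call is slow.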

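About variable-length training data

Hello, is imgW in train.py set to 256 based on the length of 10 pure Chinese characters? How should imgW be set for mixed text containing Chinese, English, and digits?

Will the yolo3 training code be released? Besides being faster than ctpn, does yolo3 improve accuracy?

Will the yolo3 training code be released?
Besides being faster than ctpn, does yolo3 improve accuracy?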

how long to train a model

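Hello, will the crnn training code be released?

How can recognition speed be improved?

Recognizing images such as a food business license still takes 9 seconds on a server (with one GPU). Is there anything that can be improved for recognition speed, or any ideas?

Error in CPU mode, no GPU

Using the CPU-mode setup and make,
running "ipython app.py 8080"
raises "AssertionError:
Found no NVIDIA driver on your system"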
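
This assertion comes from PyTorch when code requests CUDA on a machine without an NVIDIA driver. Which line of the project triggers it is not shown in the issue. As a sketch, device selection can be guarded with torch.cuda.is_available(); pick_device is a hypothetical helper, and the import is wrapped so the snippet also runs where torch is not installed.

```python
def pick_device(prefer_gpu=True):
    """Return "cuda" only when PyTorch is installed and reports a usable GPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if prefer_gpu and torch.cuda.is_available():
        return "cuda"
    return "cpu"
```

Calls such as model.cuda() would then be made only when pick_device() returns "cuda".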

how much datasets need!~

Any other link for model download?

Hi,
I live outside China and I am not able to properly download the model weights from pan.baidu.com
is there any other links that I can use? If it's no trouble to you, could you upload it in this repo?

Thank you

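Why is recognition poor on cropped news images?

Hello, I tested the trained model directly and found that recognition accuracy on news headlines is very low, only a little over 30. Is there a solution?

How should I adapt things to call the text.weights text detection model with opencv?

Following this link, I got the yolov3-tiny.weights model running with OpenCV's dnn
https://github.com/spmallick/learnopencv/blob/master/ObjectDetection-YOLO/object_detection_yolo.cpp
But simply replacing the model path with text.weights produces no text-region coordinates. The forward pass does complete, net.forward(outs, getOutputsNames(net)); there are just no results, and I don't know why.
Also, isn't the direct output of your text.weights the text bounding rectangles? The detector in your code seems to do a lot of post-processing. Could you briefly explain the whole pipeline? (What does the yolov3 text.weights output, and what exactly has to be done afterwards to get the bounding rectangles of the detected text?)

Your text_detect has many parameters, and I don't know how to pass them in with OpenCV
def text_detect(img,
                MAX_HORIZONTAL_GAP=30,
                MIN_V_OVERLAPS=0.6,
                MIN_SIZE_SIM=0.6,
                TEXT_PROPOSALS_MIN_SCORE=0.7,
                TEXT_PROPOSALS_NMS_THRESH=0.3,
                TEXT_LINE_NMS_THRESH = 0.3,
                MIN_RATIO=1.0,
                LINE_MIN_SCORE=0.8,
                TEXT_PROPOSALS_WIDTH=5,
                MIN_NUM_PROPOSALS=1
                ):
2) PS: calling your text.weights with opencv's dnn is really quite fast. With a 1024*768 input image it takes only about 1 s in CPU mode on my ordinary laptop.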

ref:
https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.cpp
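
For context on the post-processing question: in the generic YOLO samples linked above, each output row has the layout [cx, cy, w, h, objectness, class scores...] with coordinates normalized to the image size, and rows are kept when their best class score passes a threshold. Whether text.weights follows the same layout, and what the project's detector does afterwards, is not stated in the issue, so the following is only a sketch of that generic decoding step.

```python
def decode_yolo_rows(rows, img_w, img_h, conf_threshold=0.5):
    """Convert generic YOLO output rows into pixel-space boxes.

    Each row is assumed to be [cx, cy, w, h, objectness, class scores...]
    with coordinates normalized to 0..1 (an assumption about the model).
    Returns (left, top, width, height, score, class_id) tuples.
    """
    boxes = []
    for row in rows:
        cx, cy, w, h = row[:4]
        class_scores = list(row[5:])
        score = max(class_scores)
        if score < conf_threshold:
            continue
        left = (cx - w / 2) * img_w
        top = (cy - h / 2) * img_h
        boxes.append((left, top, w * img_w, h * img_h,
                      score, class_scores.index(score)))
    return boxes
```

Non-maximum suppression would normally follow, and for text any grouping of the surviving boxes into lines would come after that.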

net = cv2.dnn.readNetFromDarknet(yoloCfg, yoloWeights)
cv2.error: OpenCV(3.4.1) /opt/conda/conda-bld/opencv-suite_1530789967746/work/modules/dnn/src/darknet/darknet_io.cpp:503: error: (-212) Unknown layer type: shortcut in function ReadDarknetFromCfgFile