chineseocr / chineseocr


yolo3+ocr

License: MIT License

Python 63.13% Shell 0.07% JavaScript 24.44% HTML 2.41% Dockerfile 0.48% Jupyter Notebook 4.46% CSS 5.02%
yolo3 chinese-text-detect chinese-ocr opencv-dnn darknet-text-detect idcard trainticket ocr

chineseocr's Introduction

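This project uses YOLOv3 and CRNN to detect and recognize Chinese text in natural scenes.

Training code (master branch)

OCR training dataset

OCR CTC training dataset (archive password: chineseocr)
Baidu Netdisk: https://pan.baidu.com/s/1UcUKUUELLwdM29zfbztzdw (extraction code: atwn)
gofile: http://gofile.me/4Nlqh/uT32hAjbx (password: https://github.com/chineseocr/chineseocr)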

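Features

  • Text orientation detection at 0, 90, 180, and 270 degrees (dnn/tensorflow supported)
  • Text detection with darknet, OpenCV dnn, or Keras; training with darknet or Keras
  • Variable-length OCR training (English, mixed Chinese and English); CRNN and dense OCR recognition and training; new PyTorch-to-Keras model conversion code (tools/pytorch_to_keras.py)
  • Model conversion: darknet to Keras, Keras to darknet, PyTorch to Keras
  • Structured data recognition for ID cards and train tickets
  • New CNN+CTC model; OCR can be called through the DNN module, averaging under 0.02 s per single-line image
  • CPU version acceleration
  • OCR recognition based on a user dictionary
  • New language model to correct OCR results
  • Real-time recognition on Raspberry Pi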

Environment setup

GPU deployment: see setup.md
CPU deployment: see setup-cpu.md

Download and build darknet (skip the darknet build if you use OpenCV dnn or Keras yolo3 directly)

git clone https://github.com/pjreddie/darknet.git 
mv darknet chineseocr/
## To build with GPU and cuDNN support, edit the Makefile
#GPU=1
#CUDNN=1
#OPENCV=0
#OPENMP=0
make 

Edit line 48 of darknet/python/darknet.py:
root = '/root/'  ## directory that contains chineseocr
lib = CDLL(root+"chineseocr/darknet/libdarknet.so", RTLD_GLOBAL)
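
The hardcoded root can also be derived at runtime instead of edited by hand. A minimal sketch, assuming a hypothetical CHINESEOCR_ROOT environment variable (not part of the project) that falls back to the '/root/' value shown in the original line:

```python
import os

def libdarknet_path(root=None):
    """Return the expected location of libdarknet.so under <root>/chineseocr/darknet.

    CHINESEOCR_ROOT is a hypothetical variable used only for this sketch.
    """
    if root is None:
        root = os.environ.get("CHINESEOCR_ROOT", "/root/")
    return os.path.join(root, "chineseocr", "darknet", "libdarknet.so")

# In darknet.py the result would replace the string passed to CDLL, e.g.
# lib = CDLL(libdarknet_path(), RTLD_GLOBAL)
```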

Download the model files

Model file location:

Model conversion (optional)

PyTorch OCR to Keras OCR

python tools/pytorch_to_keras.py  -weights_path models/ocr-dense.pth -output_path models/ocr-dense-keras.h5

darknet to Keras

python tools/darknet_to_keras.py -cfg_path models/text.cfg -weights_path models/text.weights -output_path models/text.h5

Keras to darknet

python tools/keras_to_darknet.py -cfg_path models/text.cfg -weights_path models/text.h5 -output_path models/text.weights

Model selection

See config.py.

Build the Docker image

## Download the Anaconda3 Python installer (https://repo.anaconda.com/archive/Anaconda3-2019.03-Linux-x86_64.sh) and place it in the chineseocr directory
## Build the image
docker build -t chineseocr .
## Start the service
docker run -d -p 8080:8080 chineseocr /root/anaconda3/bin/python app.py

Start the web service

cd chineseocr  ## enter the chineseocr directory
python app.py 8080  ## 8080 is the port number; any port can be used

Access the service

http://127.0.0.1:8080/ocr

Recognition results

References

  1. yolo3 https://github.com/pjreddie/darknet.git
  2. crnn https://github.com/meijieru/crnn.pytorch.git
  3. ctpn https://github.com/eragonruan/text-detection-ctpn
  4. CTPN https://github.com/tianzhi0549/CTPN
  5. keras yolo3 https://github.com/qqwweee/keras-yolo3.git
  6. darknet/Keras model conversion reference https://www.cnblogs.com/shouhuxianjian/p/10567201.html
  7. Language model implementation https://github.com/lukhy/masr

chineseocr's People

Contributors

callinglove, wenlihaoyu, wsxqyws, zergmk2

chineseocr's Issues

keep_inds=nms(np.hstack((text_proposals, scores)), TEXT_PROPOSALS_NMS_THRESH)

Hello, the CPU version tests fine, but the GPU version keeps raising this error:
Traceback (most recent call last):
File "backend_demo.py", line 32, in <module>
leftAdjust=True,rightAdjust=True,alph=0.1)
File "/home/hufangjian/train_ticket/chineseocr/model.py", line 193, in model
text_recs,tmp = text_detect(**config)
File "/home/hufangjian/train_ticket/chineseocr/model.py", line 42, in text_detect
MIN_NUM_PROPOSALS)
File "/home/hufangjian/train_ticket/chineseocr/detector/detectors.py", line 65, in detect
keep_inds=nms(np.hstack((text_proposals, scores)), TEXT_PROPOSALS_NMS_THRESH)##nms: filter duplicate boxes
File "/root/anaconda3/envs/chineseocr/lib/python3.6/site-packages/numpy/core/shape_base.py", line 291, in hstack
return _nx.concatenate(arrs, 0)
ValueError: all the input arrays must have same number of dimensions
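
The `concatenate(arrs, 0)` frame in this traceback is the branch `np.hstack` takes when its first argument is 1-D, so one plausible cause (an assumption, not confirmed in the thread) is that `text_proposals` arrived empty or flattened. A minimal sketch of a guard that forces both inputs to 2-D before stacking; `stack_proposals` is a hypothetical helper, and the four-column box layout is assumed:

```python
import numpy as np

def stack_proposals(text_proposals, scores):
    """Return an (N, 5) array of [box..., score] rows, including when N == 0.

    Assumes each proposal has four coordinates; this is not project code.
    """
    boxes = np.asarray(text_proposals, dtype=np.float32).reshape(-1, 4)
    column = np.asarray(scores, dtype=np.float32).reshape(-1, 1)
    if boxes.shape[0] != column.shape[0]:
        raise ValueError("got %d boxes but %d scores"
                         % (boxes.shape[0], column.shape[0]))
    return np.hstack((boxes, column))

print(stack_proposals([[0, 0, 10, 10]], [0.9]).shape)  # (1, 5)
print(stack_proposals([], []).shape)                   # (0, 5)
```

With an empty result the caller can skip NMS entirely rather than fail inside `hstack`.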

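How do I generate images of specific text?

How can I generate regular samples with a specific font on a specific background image? For example: a business license with a yellow background, SimSun font, size 20, with the text parallel to the background.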

Couldn't open file: ../chineseocr/models/text.names

ipython app.py 8080
Error:
Loading weights from /root/ocr/chineseocr/models/text.weights...Done!
Couldn't open file: ../chineseocr/models/text.names
text.names and text.weights are both in the models folder, yet text.names cannot be read.
Question: where should I change the code that reads this file?
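
The path in the message is relative, so where it points depends on the working directory of the process. A small sketch of how '../chineseocr/models/text.names' resolves from two directories; the directories are taken from the log, and `resolve_from` is a name made up for illustration:

```python
import posixpath

def resolve_from(cwd, relative_path):
    """Show the absolute path a relative path maps to when the process runs in cwd."""
    return posixpath.normpath(posixpath.join(cwd, relative_path))

REL = "../chineseocr/models/text.names"
print(resolve_from("/root/ocr/chineseocr", REL))  # /root/ocr/chineseocr/models/text.names
print(resolve_from("/root/ocr", REL))             # /root/chineseocr/models/text.names
```

If the second case matches the setup, launching app.py from inside the chineseocr directory, or making the configured path absolute, are two things to try.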

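A question about using the Keras model

Hello author, I converted the model to Keras using the link given on GitHub, then tested it in image mode with yolo_video.py, the test script that ships with the conversion project (I only looked at the text-region results). The results appear to be wrong: almost none of the bounding boxes are correct. I am not sure which step I got wrong.

Training loss is very low, but recognition accuracy is almost 0

I trained on digits with crnn.pytorch, and the training loss became very low: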
888--22---6--11--00--66---44-99---4--11--11--11--00--66--66---33- => 8261064941110663 , gt: 8261064941110663
99----66--6---5--00--33--88--88--22--11--22--33--88--00--44---44- => 9665038821238044 , gt: 9665038821238044
777---55-77--00---5--66--00--99--22--99--77--11--66--55--99---99- => 7570560929716599 , gt: 7570560929716599
000--99---44--8--22--55--33--33--33--44--88--99--00--77--00---66- => 0948253334890706 , gt: 0948253334890706
555---5--22---8--11--11--88--77--55--11--22--33--77--11--99---99- => 5528118751237199 , gt: 5528118751237199
444--22---5--99--66--77--55--55--99--44--55--99--33--33--66---55- => 4259675594593365 , gt: 4259675594593365
999--11--77--00---5--55--77--99--11--22--22--22--77--11--66---99- => 9170557912227169 , gt: 9170557912227169
000--22--22--00--00--77--55--22--00--88--33--11--33--11--77---66- => 0220075208313176 , gt: 0220075208313176
444---66--44--8--66--11--77--00--33--00--11--55--99--00--99---44- => 4648617030159094 , gt: 4648617030159094
777--99--99--77--88--11--22--55---4--44--66--11--22--11--33---55- => 7997812544612135 , gt: 7997812544612135
epoch:273,step:640961,Test loss:0.00031879107700660825,accuracy:1.0,train loss:9.42973485962284e-07

But when I run recognition with the trained model, the results look like this:
/root/number_ocr_8_3/ocr/chineseocr/test2/0 0 1 0 2 0 9 0 0 9 2 7 2 0 6 9_31.jpg
[{'cx': 133.5, 'cy': 19.5, 'text': '012229', 'w': 266, 'h': 23.0, 'degree': 0.0}]
/root/number_ocr_8_3/ocr/chineseocr/test2/0 0 0 7 6 9 2 9 1 9 6 5 0 6 3 7_931.jpg
[{'cx': 133.5, 'cy': 19.5, 'text': '1217', 'w': 266, 'h': 21.0, 'degree': 0.0}]
Does anyone know where the problem lies?
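
In the training log, the left side of each `=>` is the raw per-timestep output and the right side is the collapsed string. A minimal sketch of that greedy CTC collapse (merge repeats, then drop blanks), assuming '-' is the blank symbol as the log suggests; `ctc_collapse` is a hypothetical name, not the crnn.pytorch API:

```python
def ctc_collapse(raw, blank="-"):
    """Greedy CTC decoding of a raw label string: merge repeats, then drop blanks."""
    decoded = []
    previous = None
    for ch in raw:
        # Emit a character only when it differs from the previous timestep
        # and is not the blank symbol.
        if ch != previous and ch != blank:
            decoded.append(ch)
        previous = ch
    return "".join(decoded)

raw = "888--22---6--11--00--66---44-99---4--11--11--11--00--66--66---33-"
print(ctc_collapse(raw))  # 8261064941110663
```

Running the same collapse on the raw inference output is one way to check whether decoding or the input preprocessing differs between training and recognition.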

A small question

cd chineseocr
sh setup.sh #(cpu sh setup-cpu.sh)

The last line of setup.sh is pushd detector/utils && sh make-for-cpu.sh && popd
setup.sh is the GPU version, so shouldn't that line be changed to
pushd detector/utils && sh make.sh && popd ?

Model error

I modified app.py to detect and recognize text in video. For some videos, the following error appears after running for a while:
File "main_chineseocr.py", line 31, in TEXT
POST(img)
File "main_chineseocr.py", line 49, in POST
leftAdjust=True,rightAdjust=True,alph=0.1)
File "/home/import/djj/chineseocr-master/model.py", line 227, in model
result = crnnRec(np.array(img),newBox,ifIm,leftAdjust,rightAdjust,alph)
File "/home/import/djj/chineseocr-master/model.py", line 85, in crnnRec
simPred = crnnOcr(partImg_)##识别的文本
File "/home/import/djj/chineseocr-master/crnn/crnn.py", line 61, in crnnOcr
preds = model(image)
File "/home/import/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/import/djj/chineseocr-master/crnn/models/crnn.py", line 78, in forward
conv = utils.data_parallel(self.cnn, input, self.ngpu)
File "/home/import/djj/chineseocr-master/crnn/models/utils.py", line 12, in data_parallel
output = model(input)
File "/home/import/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/import/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 67, in forward
input = module(input)
File "/home/import/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/import/anaconda3/lib/python3.6/site-packages/torch/nn/modules/pooling.py", line 143, in forward
self.return_indices)
File "/home/import/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 334, in max_pool2d
ret = torch._C._nn.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: Given input size: (128x16x1). Calculated output size: (128x8x0). Output size is too small at /pytorch/torch/lib/THCUNN/generic/SpatialDilatedMaxPooling.cu:69

I don't know what causes this.
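
The final RuntimeError reports an input of 128x16x1 and an output of 128x8x0, which means the feature map reaching the pooling layer is one column wide. A sketch of the standard pooling output-size formula, assuming a 2x2 kernel with stride 2 and no padding (values inferred from the sizes in the message, not read from the model definition):

```python
def pool_output_size(size, kernel=2, stride=2, padding=0, dilation=1):
    """Output length along one axis of a max-pooling layer (floor mode)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

print(pool_output_size(16))  # 8
print(pool_output_size(1))   # 0
```

A text crop that becomes too narrow after resizing would produce this; skipping or padding such crops before calling crnnOcr is one possible workaround.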

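Hello author! When two clients send requests at the same time, that is, when tasks run in parallel, the following error appears. Is this expected?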

Traceback (most recent call last):
File "/Users/karmanzeng/anaconda3/envs/chineseocr/lib/python3.6/site-packages/web/application.py", line 257, in process
return self.handle()
File "/Users/karmanzeng/anaconda3/envs/chineseocr/lib/python3.6/site-packages/web/application.py", line 248, in handle
return self._delegate(fn, self.fvars, args)
File "/Users/karmanzeng/anaconda3/envs/chineseocr/lib/python3.6/site-packages/web/application.py", line 488, in _delegate
return handle_class(cls)
File "/Users/karmanzeng/anaconda3/envs/chineseocr/lib/python3.6/site-packages/web/application.py", line 466, in handle_class
return tocall(*args)
File "/Users/karmanzeng/PycharmProjects/chineseocr-master/app.py", line 71, in POST
leftAdjust=True, rightAdjust=True, alph=0.1)
File "/Users/karmanzeng/PycharmProjects/chineseocr-master/model.py", line 214, in model
text_recs,tmp = text_detect(**config)
File "/Users/karmanzeng/PycharmProjects/chineseocr-master/model.py", line 47, in text_detect
MIN_NUM_PROPOSALS)
File "/Users/karmanzeng/PycharmProjects/chineseocr-master/detector/detectors.py", line 70, in detect
text_lines=self.text_proposal_connector.get_text_lines(text_proposals, scores, size)##合并文本行
File "/Users/karmanzeng/PycharmProjects/chineseocr-master/detector/text_proposal_connector.py", line 29, in get_text_lines
tp_groups=self.group_text_proposals(text_proposals, scores, im_size)##find the text line
File "/Users/karmanzeng/PycharmProjects/chineseocr-master/detector/text_proposal_connector.py", line 12, in group_text_proposals
graph=self.graph_builder.build_graph(text_proposals, scores, im_size)
File "/Users/karmanzeng/PycharmProjects/chineseocr-master/detector/text_proposal_graph_builder.py", line 88, in build_graph
boxes_table[int(box[0])].append(index)
IndexError: list index out of range
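
If the detector keeps state that is shared between requests (an assumption; the thread does not confirm it), one low-effort way to test that hypothesis is to serialize calls into the model with a lock. `run_serialized` is a hypothetical wrapper, not part of app.py:

```python
import threading

_model_lock = threading.Lock()

def run_serialized(fn, *args, **kwargs):
    """Call fn while holding a process-wide lock so only one request runs the model at a time."""
    with _model_lock:
        return fn(*args, **kwargs)

# Hypothetical use inside the request handler:
# result = run_serialized(model.model, img, leftAdjust=True, rightAdjust=True, alph=0.1)
```

If the IndexError disappears under the lock, shared state is the likely culprit; the cost is that requests are processed one at a time.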

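About GRU overfitting

Hello, I have about 500,000 images. A convolutional network followed directly by a fully connected output overfits somewhat, but still reaches over 80% accuracy on the validation set. However, if I add a bidirectional GRU (256 hidden units, dropout 0.5) between the convolutional layers and the final fully connected output, the overfitting becomes severe, and the validation loss even rises the longer I train. Is this because there is too little data?

A question about the training data for yolo3 and crnn

Hello! Is the data used to train the YOLOv3 text detection model and the CRNN recognition model public? Or are there public Chinese corpora that could be used to train these two models? Thanks!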

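Questions about YOLOv3 text detection and CRNN training

  1. I spent a long time last night setting up a virtual machine and finally got it running. I tried recognizing a VAT invoice and the results are really quite good. One problem, though: horizontally adjacent fields get merged, for example the address and the password area in the image below, and also the amount, tax rate, and tax amount. Is this a problem with YOLOv3 text detection? (From the YOLOv3 paper, does it tend to miss small text regions, since unlike EAST text detection it does not upsample back to the original image size, or perhaps half of it, for regression?) Or does your post-processing merge adjacent regions? I forgot to mention that I scaled the image to 1024×768 rather than your original 608×608.

  2. Could you briefly describe how you trained text detection with YOLOv3? Is the input an image and the target the bounding rectangles of all the text, the same as training an ordinary YOLO?
    What samples did you train on (self-generated)?
    Roughly how many samples did you use to train text detection?

  3. What do your CRNN training samples look like? Are they similar to https://github.com/senlinuc/caffe_ocr, all fixed-length images?
    Do the training fonts vary in size? (On the VAT invoice, in some fields the leading characters are much larger than the later ones; does your CRNN model recognize those well?)
    How many tens of thousands of training samples do you have?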

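Is this CPU speed normal?

I set up the environment following the tutorial.
The previous version took 1.6 min per image.
The new version takes 2.5 min per image.
[image]
Is something wrong?
There is also this error:
[image]
although it does not affect usage.

The front-end page renders incorrectly, and clicking Browse does nothing

Backend log: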
10.0.0.72:60398 - - [14/Sep/2018 15:25:17] "HTTP/1.1 GET /ocr" - 200 OK
10.0.0.72:60399 - - [14/Sep/2018 15:25:17] "HTTP/1.1 GET /static/js/jquery.js" - 304 Not Modified
10.0.0.72:60398 - - [14/Sep/2018 15:25:17] "HTTP/1.1 GET /static/css/bootstrap.min.css" - 304 Not Modified
10.0.0.72:60400 - - [14/Sep/2018 15:25:17] "HTTP/1.1 GET /static/js/jquery.form.js" - 304 Not Modified
10.0.0.72:60401 - - [14/Sep/2018 15:25:17] "HTTP/1.1 GET /static/js/helps.js" - 304 Not Modified
10.0.0.72:60402 - - [14/Sep/2018 15:25:17] "HTTP/1.1 GET /static/img/loading.gif" - 304 Not Modified
10.0.0.72:60420 - - [14/Sep/2018 15:25:32] "HTTP/1.1 GET /ocr" - 200 OK
10.0.0.72:60420 - - [14/Sep/2018 15:25:33] "HTTP/1.1 GET /css/zzsc.css" - 404 Not Found
10.0.0.72:60424 - - [14/Sep/2018 15:25:33] "HTTP/1.1 GET /favicon.ico" - 404 Not Found

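Images that contain two fonts

Hello, I tried using chineseocr to recognize lottery ticket images. Recognition accuracy for the numbers is very low while accuracy for the Chinese characters is high, possibly because the digits and the Chinese characters on the ticket use two different fonts. How can I improve recognition accuracy when an image contains two or more fonts, as lottery tickets do?

Does cv2.dnn support GPU?

Does the cv2.dnn module support GPU when loading a darknet model? I have read online that it does not. During recognition, can multiple GPUs process the same image, and would that be faster?

Detection on the first image is extremely slow (over 250 seconds); after that, recognition is normal

Hello author, my GPU is a GTX 1060 6 GB. I configured the environment as instructed (pytorch 0.2.0 raised errors, so I switched to 0.3.0) and it runs successfully. However, the first image loads, or rather runs, very slowly, taking over 250 seconds. Subsequent images are normal, at about 0.5 to 2 s each.

After debugging, I found that the problem is in darknet_detect.py: at line 25 of detect_np(), the call dn.predict_image(net, im) into darknet takes more than 200 seconds.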

def detect_np(net, meta, image, thresh=.5, hier_thresh=.5, nms=.45):
    im = array_to_image(image)
    num = dn.c_int(0)
    pnum = dn.pointer(num)
    dn.predict_image(net, im)
    dets = dn.get_network_boxes(net, im.w, im.h, thresh, hier_thresh, None, 0, pnum)
    num = pnum[0]
    if (nms): dn.do_nms_obj(dets, num, meta.classes, nms)
    res = []
    for j in range(num):
        for i in range(meta.classes):
            if dets[j].prob[i] > 0:
                b = dets[j].bbox
                res.append((meta.names[i], dets[j].prob[i], (b.x, b.y, b.w, b.h)))
    res = sorted(res, key=lambda x: -x[1])
    dn.free_detections(dets, num)
    return res

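I then wondered whether darknet itself was the problem, so I reconfigured it and ran its bundled example darknet.py without any issue. Has the author run into this, or are there any suggestions? Thanks!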
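
To separate a one-time warm-up cost from the per-image cost, each call can be timed individually. A sketch with a hypothetical `time_calls` helper; `fn` stands in for a closure around `detect_np` and is not project code:

```python
import time

def time_calls(fn, repeats=3):
    """Run fn several times and return the wall-clock duration of each call in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations

# Hypothetical use: time_calls(lambda: detect_np(net, meta, image), repeats=3)
```

If only the first duration is large, running one throwaway detection at service start-up would hide the delay from the first real request, although it does not explain the delay itself.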

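About variable-length training data

Hello, in train.py imgW is set to 256. Is that based on the length of 10 pure Chinese characters? How should imgW be set for text that mixes Chinese, English, and digits?

How can recognition be sped up?

Recognizing an image such as a food business license still takes 9 seconds on a server with one GPU. Is there anything that can be improved to speed up recognition, or any ideas?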

Error in CPU mode (no GPU)

I used the CPU-mode setup and make,
then ran "ipython app.py 8080",
which fails with "AssertionError: Found no NVIDIA driver on your system".
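
PyTorch raises this assertion when code requests CUDA on a machine without an NVIDIA driver. A sketch of a guard built on `torch.cuda.is_available()`; the name of the project's own GPU switch is not shown in this thread, so `want_gpu` is a placeholder:

```python
def cuda_usable():
    """Return True only if PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())

def pick_device(want_gpu):
    """Fall back to the CPU when a GPU is requested but not available."""
    return "cuda" if want_gpu and cuda_usable() else "cpu"

print(pick_device(False))  # cpu
```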

Any other link for model download?

Hi,
I live outside China and cannot reliably download the model weights from pan.baidu.com.
Are there any other links I can use? If it's no trouble, could you upload the weights to this repo?

Thank you

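How should I call the text.weights text detection model with OpenCV?

Following this link, I got the yolov3-tiny.weights model running with OpenCV's dnn module:
https://github.com/spmallick/learnopencv/blob/master/ObjectDetection-YOLO/object_detection_yolo.cpp
But when I simply replace the model path with text.weights, I get no text-region coordinates. The forward pass does complete, net.forward(outs, getOutputsNames(net)); there are just no results, and I don't know why.
Also, isn't the direct output of your text.weights the bounding rectangles of the text? Your detector code seems to do a lot of post-processing. Could you briefly explain the whole pipeline (what the YOLOv3 text.weights outputs, and what has to be done afterwards to obtain the bounding rectangles of the detected text)?

Your text_detect takes many parameters, and I don't know how to pass them in with OpenCV:
def text_detect(img,
                MAX_HORIZONTAL_GAP=30,
                MIN_V_OVERLAPS=0.6,
                MIN_SIZE_SIM=0.6,
                TEXT_PROPOSALS_MIN_SCORE=0.7,
                TEXT_PROPOSALS_NMS_THRESH=0.3,
                TEXT_LINE_NMS_THRESH = 0.3,
                MIN_RATIO=1.0,
                LINE_MIN_SCORE=0.8,
                TEXT_PROPOSALS_WIDTH=5,
                MIN_NUM_PROPOSALS=1
                ):
2) P.S. Calling your text.weights through OpenCV's dnn is really quite fast: with a 1024*768 input image it takes only about 1 s in CPU mode on my ordinary laptop.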

ref:
https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.cpp
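
In OpenCV's YOLO samples, each output row is laid out as [center_x, center_y, width, height, objectness, class scores...], with coordinates normalized to the input image. A sketch of turning such rows into pixel boxes, assuming text.weights follows that layout. The default threshold mirrors TEXT_PROPOSALS_MIN_SCORE from the text_detect signature, and `decode_yolo_rows` is a hypothetical helper. The resulting boxes may still be narrow proposals that the project's detector merges into text lines, which this sketch does not attempt:

```python
def decode_yolo_rows(rows, img_w, img_h, min_score=0.7):
    """Convert normalized YOLO output rows into (left, top, width, height, score) pixel boxes."""
    boxes = []
    for row in rows:
        score = max(row[5:])  # best class score, as in OpenCV's sample post-processing
        if score < min_score:
            continue
        box_w = row[2] * img_w
        box_h = row[3] * img_h
        left = int(round(row[0] * img_w - box_w / 2))
        top = int(round(row[1] * img_h - box_h / 2))
        boxes.append((left, top, int(round(box_w)), int(round(box_h)), float(score)))
    return boxes

rows = [
    [0.5, 0.5, 0.25, 0.125, 0.9, 0.75],  # kept
    [0.1, 0.1, 0.10, 0.100, 0.2, 0.10],  # dropped: score below threshold
]
print(decode_yolo_rows(rows, img_w=1024, img_h=512))  # [(384, 224, 256, 64, 0.75)]
```

Printing the raw maximum score per row before thresholding is a quick way to see whether the network produced nothing at all or whether the sample's confidence threshold filtered everything out.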

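Results on English documents are not ideal

Many thanks to the author for open-sourcing such a good project. I tested the text detection part: it works well on Chinese, but much of the English text is not detected. Is this because the training set contains little English?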
007_d
icdar13_118_d

cv2.dnn.readNetFromDarknet Error

net = cv2.dnn.readNetFromDarknet(yoloCfg, yoloWeights)
cv2.error: OpenCV(3.4.1) /opt/conda/conda-bld/opencv-suite_1530789967746/work/modules/dnn/src/darknet/darknet_io.cpp:503: error: (-212) Unknown layer type: shortcut in function ReadDarknetFromCfgFile
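
The message shows OpenCV 3.4.1's darknet parser rejecting the `shortcut` layer type that YOLOv3 configs use. YOLOv3 loading is generally reported to need OpenCV 3.4.2 or later; treat that threshold as an assumption to verify against the OpenCV release notes. A sketch of a version check with hypothetical helper names:

```python
def version_tuple(version):
    """Parse a version string such as '3.4.1' or '4.5.5-dev' into a tuple of ints."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)

def supports_shortcut(version, minimum=(3, 4, 2)):
    """Compare an OpenCV version string against the assumed minimum for YOLOv3 configs."""
    return version_tuple(version) >= minimum

print(supports_shortcut("3.4.1"))  # False
# Hypothetical use: supports_shortcut(cv2.__version__)
```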

Error when training on a self-made training set

In the val function of crnn_main.py, the line preds=preds.squeeze(2) raises: RuntimeError: Dimension out of range (expected to be in range of [-2, 1], but got 2)
