
tr's Introduction

tr - Text Recognition

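An offline text recognition SDK for scanned documents. The core code is written entirely in C++, and a Python interface is provided.

Build environment: Ubuntu 16.04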


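CRNN For Text With Multiple Lines ⭐

Combining CRNN with a Transformer Encoder/Decoder lets CRNN recognize multi-line text. Annotation no longer requires bounding boxes for individual text lines, which greatly reduces the workload for annotators and developers. The approach suits scenarios such as curved text.
If the images you need to recognize resemble the ones below and existing OCR tools cannot handle them, multi-line CRNN is worth a try.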


Early preview: crnn_for_text_with_multiple_lines


CRNN with Transformer

https://github.com/myhub/tr/tree/master/v2.8

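  • Uses a backbone network from the currently popular YOLO family
  • Adds a lightweight Transformer Encoder to improve the model's ability to correct errors from context
  • Reduces reliance on real samples: the training set contains only a little over 100 real samples

Install: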

pip install tr==2.8.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
Note: accuracy varies between versions, and a newer version is not necessarily more accurate.
To install an older version:
pip install tr==2.8.1

Installation on 64-bit Windows:
pip install tr==2.8.6 -i https://pypi.org/simple/

Example:

import tr
crnn = tr.CRNN()                                # initialize the text-line recognition network
chars, scores = crnn.run("imgs/line.png")       # recognize a text line
print("".join(chars))                           # print the result
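The recognition output can be post-processed before it is printed. The following is a minimal sketch, assuming that `chars` and `scores` are parallel sequences with one confidence value per character (the example above does not state this explicitly). The helper `mask_low_confidence`, its `threshold` and `placeholder` parameters, and the sample data are hypothetical and not part of tr.

```python
def mask_low_confidence(chars, scores, threshold=0.5, placeholder="?"):
    """Join chars into a string, replacing low-confidence characters.

    Assumption: chars and scores have the same length, and scores[i]
    is the confidence of chars[i].
    """
    if len(chars) != len(scores):
        raise ValueError("chars and scores must have the same length")
    return "".join(
        ch if score >= threshold else placeholder
        for ch, score in zip(chars, scores)
    )


# Hypothetical stand-in for the values returned by crnn.run("imgs/line.png")
sample_chars = ["t", "e", "x", "t"]
sample_scores = [0.99, 0.97, 0.31, 0.95]
print(mask_low_confidence(sample_chars, sample_scores))
```

Marking uncertain characters rather than dropping them keeps the string length stable, which makes manual review easier.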

GUI screenshot recognition

# Requires the PyQt5 and PIL dependencies
python -m tr.gui

Release notes

  • C++ interface support
  • Added Python 2 support
  • Removed the opencv-python and Pillow dependencies to simplify deployment
  • Multithreading support

Requirements

  • Python 2 or Python 3, with numpy installed
  • Windows, CentOS 6, and ARM are not supported

Install

  • Method 1
git clone https://github.com/myhub/tr.git
cd ./tr
sudo python setup.py install
  • Method 2
sudo pip install git+https://github.com/myhub/tr.git@master

Test

python2 demo.py               # Python 2 compatibility test
python3 test.py               # visualization test
python3 test-multi-thread.py  # multithreading test
python3 test_crnn_pyqt5.py    # screenshot recognition

Related projects

  • For web-based access, see TrWebOCR

Python Example

import tr

# detect text lines, return list of (cx, cy, width, height, angle)
print(tr.detect("imgs/web.png", tr.FLAG_RECT))

# detect text lines with angle, return list of (cx, cy, width, height, angle)
print(tr.detect("imgs/id_card.jpeg", tr.FLAG_ROTATED_RECT))

# recognize text line, return (text, confidence)
print(tr.recognize("imgs/line.png"))

# detect and recognize text lines with angle, return list of ((cx, cy, width, height, angle), text, confidence)
print(tr.run("imgs/id_card.jpeg"))
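The result format described in the comments above, a list of `((cx, cy, width, height, angle), text, confidence)`, lends itself to simple post-processing. The sketch below filters results by confidence, sorts them into a rough reading order, and converts a box to corner coordinates. It assumes image coordinates with `cy` growing downward, and `rect_to_bounds` ignores the angle, so it is only meaningful for unrotated boxes. The function names, the `min_confidence` parameter, and the sample data are hypothetical and not part of tr.

```python
def to_reading_order(results, min_confidence=0.0):
    """Drop low-confidence entries, then sort top-to-bottom, left-to-right.

    Each entry is ((cx, cy, width, height, angle), text, confidence).
    Sorting uses the box centre only, so this is a rough ordering.
    """
    kept = [r for r in results if r[2] >= min_confidence]
    return sorted(kept, key=lambda r: (r[0][1], r[0][0]))


def rect_to_bounds(rect):
    """Convert (cx, cy, width, height, angle) to (left, top, right, bottom).

    Assumption: the angle is ignored, so this is valid only for
    axis-aligned boxes.
    """
    cx, cy, width, height, _angle = rect
    return (cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2)


# Hypothetical stand-in for the output of tr.run("imgs/id_card.jpeg")
sample_results = [
    ((50, 80, 40, 10, 0), "second line", 0.90),
    ((50, 20, 40, 10, 0), "first line", 0.80),
    ((10, 50, 5, 5, 0), "noise", 0.10),
]
for rect, text, confidence in to_reading_order(sample_results, min_confidence=0.5):
    print(rect_to_bounds(rect), text, confidence)
```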

C++ Example

tr_init(0, 0, "crnn.bin", NULL);

#define MAX_WIDTH		512
int unicode[MAX_WIDTH];
float prob[MAX_WIDTH]; 

auto ws = tr_recognize(0, (void *)"line.png", 0, 0, 0, unicode, prob, MAX_WIDTH);

tr_release(0);
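
The C++ call above fills an integer array and a float array, which then need to be turned into text. The decoding step can be sketched in Python as follows, assuming that the returned `ws` is the number of valid entries and that `unicode` holds Unicode code points (neither is confirmed by the example). The helper `decode_code_points` and the sample buffers are hypothetical.

```python
def decode_code_points(code_points, probs, count):
    """Build a string from the first `count` code points.

    Assumptions: `count` is the number of valid entries (like `ws` above),
    and each entry of `code_points` is a Unicode code point.
    Returns the decoded text and the matching per-character probabilities.
    """
    if count <= 0:
        return "", []
    text = "".join(chr(cp) for cp in code_points[:count])
    return text, list(probs[:count])


# Hypothetical buffers, padded with zeros like a fixed-size C array
sample_unicode = [0x4E2D, 0x6587, 0, 0]
sample_prob = [0.9, 0.8, 0.0, 0.0]
text, probs = decode_code_points(sample_unicode, sample_prob, 2)
print(text, probs)
```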

Demo results

