vinairesearch / dict-guided

Dictionary-guided Scene Text Recognition (CVPR-2021)

License: GNU Affero General Public License v3.0

Python 87.41% C++ 4.71% Cuda 6.81% Dockerfile 0.19% Shell 0.88%
scene-text-recognition ocr text-recognition vintext dictionary-guided dataset cvpr cvpr2021 scene-text-detection-recognition

dict-guided's Introduction

Table of Contents
  1. Introduction
  2. Dataset
  3. Getting Started
  4. Training & Evaluation
  5. Acknowledgement

Dictionary-guided Scene Text Recognition

  • We propose a novel dictionary-guided scene text recognition approach that can be used to improve many state-of-the-art models.
  • We also introduce a new benchmark dataset (namely, VinText) for Vietnamese scene text recognition.
Figure (architecture.png): Comparison between the traditional approach and our proposed approach.

Details of the dataset construction, model architecture, and experimental results can be found in the following paper:

@inproceedings{m_Nguyen-etal-CVPR21,
      author = {Nguyen Nguyen and Thu Nguyen and Vinh Tran and Triet Tran and Thanh Ngo and Thien Nguyen and Minh Hoai},
      title = {Dictionary-guided Scene Text Recognition},
      year = {2021},
      booktitle = {Proceedings of the {IEEE} Conference on Computer Vision and Pattern Recognition (CVPR)},
    }

Please CITE our paper whenever our dataset or model implementation is used to help produce published results or incorporated into other software.


Dataset

We introduce ✨ VinText, a new dataset for Vietnamese scene text recognition.

By downloading this dataset, USER agrees:

  • to use this dataset for research or educational purposes only;
  • not to distribute this dataset, or any part of it, in any original or modified form;
  • and to cite our paper whenever this dataset is employed to help produce published results.
| Name    | #imgs | #text instances | Examples    |
|---------|-------|-----------------|-------------|
| VinText | 2000  | About 56000     | example.png |

Details about the ✨ VinText dataset can be found in our paper. Download the converted dataset to try it with our model:

| Dataset variant   | Input format                       | Download link |
|-------------------|------------------------------------|---------------|
| Original          | x1,y1,x2,y2,x3,y3,x4,y4,TRANSCRIPT | Download here |
| Converted dataset | COCO format                        | Download here |
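As an unofficial illustration of the original input format, each annotation line can be parsed as follows (the sample line is taken from an issue below):

```python
# Unofficial illustration: parse one line of the original VinText format
# "x1,y1,x2,y2,x3,y3,x4,y4,TRANSCRIPT".
def parse_vintext_line(line):
    parts = line.strip().split(",", 8)  # split on the first 8 commas only,
                                        # so transcripts containing commas survive
    coords = list(map(int, parts[:8]))
    transcript = parts[8]
    polygon = list(zip(coords[0::2], coords[1::2]))  # [(x1, y1), ..., (x4, y4)]
    return polygon, transcript

print(parse_vintext_line("118,15,147,15,148,46,118,46,LƯỢNG"))
# ([(118, 15), (147, 15), (148, 46), (118, 46)], 'LƯỢNG')
```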

VinText

Extract the data and copy the folders to datasets/:

datasets
├───vintext
│   ├───test.json
│   ├───train.json
│   ├───train_images
│   └───test_images
└───evaluation
    └───gt_vintext.zip

Getting Started

Requirements
  • python=3.7
  • torch==1.4.0
  • detectron2==0.2
Installation
conda create -n dict-guided -y python=3.7
conda activate dict-guided
conda install -y pytorch torchvision cudatoolkit=10.0 -c pytorch
python -m pip install ninja yacs cython matplotlib tqdm opencv-python shapely scipy tensorboardX pyclipper Polygon3 weighted-levenshtein editdistance

# Install Detectron2
python -m pip install detectron2==0.2 -f \
  https://dl.fbaipublicfiles.com/detectron2/wheels/cu100/torch1.4/index.html
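A quick sanity check of the environment (a minimal sketch; exact version strings depend on your setup):

```python
# Minimal sanity check for the environment (versions shown reflect the
# requirements above; yours may differ).
import torch
import detectron2

print(torch.__version__)          # expected: 1.4.0
print(detectron2.__version__)     # expected: 0.2
print(torch.cuda.is_available())  # should be True for the cu100 wheels above
```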

Check out the code and install:

git clone https://github.com/nguyennm1024/dict-guided.git
cd dict-guided
python setup.py build develop
Download the VinText pre-trained model.
Usage

Prepare folders

mkdir sample_input
mkdir sample_output

Copy your images to sample_input/. Output images will be written to sample_output/:

python demo/demo.py --config-file configs/BAText/VinText/attn_R_50.yaml --input sample_input/ --output sample_output/ --opts MODEL.WEIGHTS path-to-trained_model-checkpoint
Figure (qualitative results.png): Qualitative results on VinText.

Training and Evaluation

Training

For training, we employed the pre-trained model tt_attn_R_50 from the ABCNet repository for initialization.

python tools/train_net.py --config-file configs/BAText/VinText/attn_R_50.yaml MODEL.WEIGHTS path_to_tt_attn_R_50_checkpoint

Example:

python tools/train_net.py --config-file configs/BAText/VinText/attn_R_50.yaml MODEL.WEIGHTS ./tt_attn_R_50.pth

The trained model will be saved in the folder output/batext/vintext/, which is then used for evaluation.

Evaluation

python tools/train_net.py --eval-only --config-file configs/BAText/VinText/attn_R_50.yaml MODEL.WEIGHTS path_to_trained_model_checkpoint

Example:

python tools/train_net.py --eval-only --config-file configs/BAText/VinText/attn_R_50.yaml MODEL.WEIGHTS ./output/batext/vintext/trained_model.pth

Acknowledgement

This repository is built based on ABCNet.

dict-guided's People

Contributors

nguyennm1024

dict-guided's Issues

What is Li in the paper?

l(y, v) is the NLLLoss.

In your paper: "You first convert the list of negative log likelihood values into a probability distribution."

But in your code it is softmax(1/loss), not softmax(-loss)? Can you explain this to me?
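For concreteness, the two normalizations being compared look like this in isolation (an illustrative sketch, not code from this repository):

```python
import torch
import torch.nn.functional as F

# Hypothetical NLL values l(y, v) for three dictionary candidates.
nll = torch.tensor([0.5, 1.0, 4.0])

p_paper = F.softmax(-nll, dim=0)      # softmax(-loss), as the paper describes
p_code = F.softmax(1.0 / nll, dim=0)  # softmax(1/loss), as observed in the code

# Both favor low-loss candidates, but they are not equivalent:
# 1/loss diverges as loss -> 0 and flattens out for large losses.
print(p_paper)
print(p_code)
```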

Predict on CPU

Hi,
I am currently using this repository to predict text locations. However, an error occurs when I attempt to run prediction on the CPU, because I do not have a GPU.

[02/18 23:18:40 detectron2]: Arguments: Namespace(confidence_threshold=0.5, config_file='configs/BAText/VinText/attn_R_50.yaml', input=['sample_input/'], opts=['MODEL.WEIGHTS', 'model_0033999.pth'], output='sample_output/', video_input=None, webcam=False)
WARNING [02/18 23:18:40 d2.config.compat]: Config 'configs/BAText/VinText/attn_R_50.yaml' has no VERSION. Assuming it to be compatible with latest v2.
Traceback (most recent call last):
File "demo/demo.py", line 74, in
demo = VisualizationDemo(cfg)
File "/home/lynguyenminh/Workspace/Projects/dict-guided/dict-guided/demo/predictor.py", line 36, in init
self.predictor = DefaultPredictor(cfg)
File "/home/lynguyenminh/Workspace/Projects/dict-guided/venv/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 282, in init
self.model = build_model(self.cfg)
File "/home/lynguyenminh/Workspace/Projects/dict-guided/venv/lib/python3.8/site-packages/detectron2/modeling/meta_arch/build.py", line 23, in build_model
model.to(torch.device(cfg.MODEL.DEVICE))
File "/home/lynguyenminh/Workspace/Projects/dict-guided/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 899, in to
return self._apply(convert)
File "/home/lynguyenminh/Workspace/Projects/dict-guided/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 570, in _apply
module._apply(fn)
File "/home/lynguyenminh/Workspace/Projects/dict-guided/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 570, in _apply
module._apply(fn)
File "/home/lynguyenminh/Workspace/Projects/dict-guided/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 593, in _apply
param_applied = fn(param)
File "/home/lynguyenminh/Workspace/Projects/dict-guided/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 897, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File "/home/lynguyenminh/Workspace/Projects/dict-guided/venv/lib/python3.8/site-packages/torch/cuda/init.py", line 214, in _lazy_init
torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

Does it support prediction on the CPU?
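A common workaround for detectron2-based projects (untested against this repository; note also the separate issue below about hard-coded .to("cuda") calls in adet/modeling/attn_predictor.py) is to override the device in the config before the model is built:

```python
# Untested sketch of the standard detectron2 workaround: force CPU inference
# by overriding MODEL.DEVICE before DefaultPredictor builds the model
# (e.g. inside demo/demo.py after the config is loaded).
from adet.config import get_cfg  # same helper demo/demo.py imports

cfg = get_cfg()
cfg.merge_from_file("configs/BAText/VinText/attn_R_50.yaml")
cfg.MODEL.WEIGHTS = "model_0033999.pth"  # checkpoint path from the report above
cfg.MODEL.DEVICE = "cpu"                 # default is "cuda"; avoids the driver error
```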

How many iterations until E2E_RESULTS becomes non-zero?

Hi,

I would like to know how many iterations you need to train until E2E_RESULTS becomes non-zero. I've trained for about 32k iters and it's still 0:

[11/09 21:43:50] d2.evaluation.testing INFO: copypaste: Task: E2E_RESULTS
[11/09 21:43:50] d2.evaluation.testing INFO: copypaste: precision,recall,hmean
[11/09 21:43:50] d2.evaluation.testing INFO: copypaste: 0.0000,0.0000,0.0000
[11/09 21:43:50] d2.evaluation.testing INFO: copypaste: Task: DETECTION_ONLY_RESULTS
[11/09 21:43:50] d2.evaluation.testing INFO: copypaste: precision,recall,hmean
[11/09 21:43:50] d2.evaluation.testing INFO: copypaste: 0.9037,0.7955,0.8461

run demo.py error

When I run this command:
python demo/demo.py --config-file configs/BAText/VinText/attn_R_50.yaml --input sample_input/ --output sample_output/ --opts MODEL.WEIGHTS ./trained_model.pth
an error occurred:
Traceback (most recent call last):
File "demo/demo.py", line 10, in
from adet.config import get_cfg
File "/usr/lml/dict-guided-main/adet/__init__.py", line 1, in
from adet import modeling
File "/usr/lml/dict-guided-main/adet/modeling/__init__.py", line 2, in
from .backbone import build_fcos_resnet_fpn_backbone
File "/usr/lml/dict-guided-main/adet/modeling/backbone/__init__.py", line 1, in
from .dla import build_fcos_dla_fpn_backbone
File "/usr/lml/dict-guided-main/adet/modeling/backbone/dla.py", line 13, in
from detectron2.layers import ShapeSpec
File "/root/anaconda3/envs/dict-guided/lib/python3.7/site-packages/detectron2/layers/__init__.py", line 3, in
from .deform_conv import DeformConv, ModulatedDeformConv
File "/root/anaconda3/envs/dict-guided/lib/python3.7/site-packages/detectron2/layers/deform_conv.py", line 10, in
from detectron2 import _C
ImportError: /root/anaconda3/envs/dict-guided/lib/python3.7/site-packages/detectron2/_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: THPVariableClass
I am wondering how to debug this?

Important fix for cpu usage in adet/modeling/attn_predictor.py

Hello again, Mr. author,
There is a bug I came across when using only the CPU for prediction.
My approach was to change every .to("cuda") into .to(rois.device), since I want to stay consistent with your code anyway.
After that, run python setup.py install -v again... and voilà, it worked!

Thanks for reading this.
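Since the screenshots are not reproduced here, the fix presumably amounts to something like the following sketch (function and tensor names other than rois are placeholders):

```python
import torch

# Before (hard-coded device; crashes on CPU-only machines):
#   hidden = torch.zeros(batch_size, hidden_dim).to("cuda")

# After (inherit the device from a tensor that already lives where the
# model runs; `rois` matches the variable named in the fix above):
def make_hidden(rois, batch_size, hidden_dim):
    return torch.zeros(batch_size, hidden_dim, device=rois.device)
```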

Only train text recognition

Hi,
Thank you for your work. I really want to try it.
However, I have a dataset of already-cropped word images, so I no longer need detection and only want to recognize them. I want to try your recognition model on its own. Is there any way I can do that? Could you give me some suggestions on how to do it?

I appreciate your help.

Missing evaluation data

The dataset at this link should follow the described format, right?

|Converted dataset| [COCO format](https://cocodataset.org/#format-data) |[Download here](https://drive.google.com/file/d/1AXl2iOTvLtMG8Lg2iU6qVta8VuWSXyns/view?usp=sharing)|

datasets
├───vintext
│   ├───test.json
│   ├───train.json
│   ├───train_images
│   └───test_images
└───evaluation
    └───gt_vintext.zip

It looks like the evaluation/gt_vintext.zip file is missing.

ImportError: libtorch_cpu.so: cannot open shared object file: No such file or directory

Using /usr/local/envs/dict-guided/lib/python3.7/site-packages
Finished processing dependencies for AdelaiDet==0.2.0
Traceback (most recent call last):
  File "demo/demo.py", line 10, in <module>
    from adet.config import get_cfg
  File "/usr/local/envs/dict-guided/lib/python3.7/site-packages/AdelaiDet-0.2.0-py3.7-linux-x86_64.egg/adet/__init__.py", line 1, in <module>
    from adet import modeling
  File "/usr/local/envs/dict-guided/lib/python3.7/site-packages/AdelaiDet-0.2.0-py3.7-linux-x86_64.egg/adet/modeling/__init__.py", line 3, in <module>
    from .batext import BAText
  File "/usr/local/envs/dict-guided/lib/python3.7/site-packages/AdelaiDet-0.2.0-py3.7-linux-x86_64.egg/adet/modeling/batext/__init__.py", line 1, in <module>
    from .batext import BAText
  File "/usr/local/envs/dict-guided/lib/python3.7/site-packages/AdelaiDet-0.2.0-py3.7-linux-x86_64.egg/adet/modeling/batext/batext.py", line 5, in <module>
    from adet.layers import DFConv2d, IOULoss
  File "/usr/local/envs/dict-guided/lib/python3.7/site-packages/AdelaiDet-0.2.0-py3.7-linux-x86_64.egg/adet/layers/__init__.py", line 1, in <module>
    from .bezier_align import BezierAlign
  File "/usr/local/envs/dict-guided/lib/python3.7/site-packages/AdelaiDet-0.2.0-py3.7-linux-x86_64.egg/adet/layers/bezier_align.py", line 2, in <module>
    from adet import _C
ImportError: libtorch_cpu.so: cannot open shared object file: No such file or directory

I ran the command following your instructions:
python demo/demo.py --config-file configs/BAText/VinText/attn_R_50.yaml --input sample_input/ --output sample_output/ --opts MODEL.WEIGHTS path-to-trained_model-checkpoint

But I got this error. How can I fix it? Thank you.

Vintext-COCO format

When I visualized the dataset, I found that each image is missing one annotation. Can you provide a new COCO-format file?

cannot import name '_C' from 'adet'

Hi guys,
I'm trying to run your demo on Colab. I installed detectron2 and ran setup.py following your guide, but I get this error:
Traceback (most recent call last):
File "demo.py", line 10, in
from adet.config import get_cfg
File "/content/drive/.shortcut-targets-by-id/1llJY-zZKWqIvj_P5Do1sm_P_no82K__h/VinText/dict-guided/adet/__init__.py", line 1, in
from adet import modeling
File "/content/drive/.shortcut-targets-by-id/1llJY-zZKWqIvj_P5Do1sm_P_no82K__h/VinText/dict-guided/adet/modeling/__init__.py", line 3, in
from .batext import BAText
File "/content/drive/.shortcut-targets-by-id/1llJY-zZKWqIvj_P5Do1sm_P_no82K__h/VinText/dict-guided/adet/modeling/batext/__init__.py", line 1, in
from .batext import BAText
File "/content/drive/.shortcut-targets-by-id/1llJY-zZKWqIvj_P5Do1sm_P_no82K__h/VinText/dict-guided/adet/modeling/batext/batext.py", line 5, in
from adet.layers import DFConv2d, IOULoss
File "/content/drive/.shortcut-targets-by-id/1llJY-zZKWqIvj_P5Do1sm_P_no82K__h/VinText/dict-guided/adet/layers/__init__.py", line 1, in
from .bezier_align import BezierAlign
File "/content/drive/.shortcut-targets-by-id/1llJY-zZKWqIvj_P5Do1sm_P_no82K__h/VinText/dict-guided/adet/layers/bezier_align.py", line 2, in
from adet import _C
ImportError: cannot import name '_C' from 'adet' (/content/drive/.shortcut-targets-by-id/1llJY-zZKWqIvj_P5Do1sm_P_no82K__h/VinText/dict-guided/adet/__init__.py)

Can you help me? Thank you very much!

Create custom dataset

I have created a dataset, and the annotations follow this format:
118,15,147,15,148,46,118,46,LƯỢNG
149,9,165,9,165,43,150,43,TỐT
167,9,180,9,179,43,167,42,ĐỂ

How can I convert it to a COCO format like the author's?
{"licenses": [], "info": {}, "categories": [{"id": 1, "name": "text", "supercategory": "beverage", "keypoints": ["mean", "xmin", "x2", "x3", "xmax", "ymin", "y2", "y3", "ymax", "cross"]}], "images": [{"coco_url": "", "date_captured": "", "file_name": "im1201.jpg", "flickr_url": "", "id": 1201, "license": 0, "width": 800, "height": 600}, {"coco_url": "", "date_captured": "", "file_name": "im1202.jpg", "flickr_url": "", "id": 1202, "license": 0, "width": 460, "height": 275}, {"coco_url": "", "date_captured": "", "file_name": "im1203.jpg", "flickr_url": "", "id": 1203, "license": 0, "width": 289, "height": 512}, {"coco_url": "", "date_captured": "", "file_name": "im1204.jpg", "flickr_url": "", "id": 1204, "license": 0, "width": 3264, "height": 2448}, {"coco_url": "", "date_captured": "", "file_name": "im1205.jpg", "flickr_url": "", "id": 1205, "license": 0, "width": 800, "height": 600}, {"coco_url": "", "date_captured": "", "file_name": "im1206.jpg", "flickr_url": "", "id": 1206, "license": 0, "width": 800, "height": 400}, {"coco_url": "", "date_captured": "", "file_name": "im1207.jpg", "flickr_url": "", "id": 1207, "license": 0, "width": 1024, "height": 681}, {"coco_url": "", "date_captured": "", "file_name": "im1208.jpg", "flickr_url": "", "id": 1208, "license": 0, "width": 800, "height": 600}, {"coco_url": "", "date_captured": "", "file_name": "im1209.jpg", "flickr_url": "", "id": 1209, "license": 0, "width": 640, .......

ParSeq

Hello,
Thank you for your wonderful work.
Can we use your framework with other models? For example, how can I use dictionary guidance with the SOTA ParSeq algorithm for text recognition? Thank you.
