
UTRNet: High-Resolution Urdu Text Recognition In Printed Documents (ICDAR'23)

Home Page: https://abdur75648.github.io/UTRNet/

License: Other

Language: Python 100.00%
Topics: document-analysis, high-resolution, hrnet, icdar, icdar2023, ocr, scene-text-recognition, text-detection, text-recognition, unet


UTRNet: High-Resolution Urdu Text Recognition

UTRNet Website | arXiv | SpringerLink | Demo

Official Implementation of the paper "UTRNet: High-Resolution Urdu Text Recognition In Printed Documents"

The Poster:

(Poster image: P2 49-poster)

Using This Repository

Environment

  • Python 3.7
  • PyTorch 1.9.1+cu111
  • torchvision 0.10.1+cu111
  • CUDA 11.4

Installation

  1. Clone the repository
git clone https://github.com/abdur75648/UTRNet-High-Resolution-Urdu-Text-Recognition.git
  2. Install the requirements
conda create -n urdu_ocr python=3.7
conda activate urdu_ocr
pip3 install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html
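
Optionally, you can verify that the installed versions match the environment listed above (this quick check is an addition here, not part of the original instructions):

python3 -c "import torch, torchvision; print(torch.__version__, torchvision.__version__, torch.cuda.is_available())"
# Expected to print roughly: 1.9.1+cu111 0.10.1+cu111 True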

Running the code

  1. Training
python3 train.py --train_data path/to/LMDB/data/folder/train/ --valid_data path/to/LMDB/data/folder/val/ --FeatureExtraction HRNet --SequenceModeling DBiLSTM --Prediction CTC --exp_name UTRNet-Large --num_epochs 100 --batch_size 8

  2. Testing
CUDA_VISIBLE_DEVICES=0 python test.py --eval_data path/to/LMDB/data/folder/test/ --FeatureExtraction HRNet --SequenceModeling DBiLSTM --Prediction CTC --saved_model saved_models/UTRNet-Large/best_norm_ED.pth
  3. Character-wise Accuracy Testing
  • To create a character-wise accuracy table in a CSV file, run the following command
CUDA_VISIBLE_DEVICES=0 python3 char_test.py --eval_data path/to/LMDB/data/folder/test/ --FeatureExtraction HRNet --SequenceModeling DBiLSTM --Prediction CTC  --saved_model saved_models/UTRNet-Large/best_norm_ED.pth
  • Visualize the result by running char_test_vis
  4. Reading individual images
  • To read a single image, run the following command (a sketch for reading a whole folder of images follows this list)
CUDA_VISIBLE_DEVICES=0 python3 read.py --image_path path/to/image.png --FeatureExtraction HRNet --SequenceModeling DBiLSTM --Prediction CTC  --saved_model saved_models/UTRNet-Large/best_norm_ED.pth
  5. Visualisation of Saliency Maps
  • To visualize the saliency maps for an input image, run the following command
python3 vis_salency.py --FeatureExtraction HRNet --SequenceModeling DBiLSTM --Prediction CTC --saved_model saved_models/UTRNet-Large/best_norm_ED.pth --vis_dir vis_feature_maps --image_path path/to/image.png
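
The commands above operate on a single image at a time. If you need to run recognition over a whole folder of images, one simple approach (a minimal sketch, not part of this repository; the script name, image folder, and glob pattern are placeholders) is to shell out to read.py with the same flags shown above:

# batch_read.py - illustrative helper, not part of this repository.
# Runs read.py on every PNG in a folder, using the flags shown above.
import subprocess
from pathlib import Path

IMAGE_DIR = Path("path/to/images")  # folder of line/word images (placeholder)
MODEL = "saved_models/UTRNet-Large/best_norm_ED.pth"

for img in sorted(IMAGE_DIR.glob("*.png")):
    subprocess.run([
        "python3", "read.py",
        "--image_path", str(img),
        "--FeatureExtraction", "HRNet",
        "--SequenceModeling", "DBiLSTM",
        "--Prediction", "CTC",
        "--saved_model", MODEL,
    ], check=True)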

Dataset

  1. Create your own lmdb dataset
pip3 install fire
python create_lmdb_dataset.py --inputPath data/ --gtFile data/gt.txt --outputPath result/train

The structure of the data folder is as below.

data
├── gt.txt
└── test
    ├── word_1.png
    ├── word_2.png
    ├── word_3.png
    └── ...

Each line of gt.txt should have the form {imagepath}\t{label}\n, i.e. a tab-separated image path and label.
For example (a small helper script for generating this file is sketched after the example):

test/word_1.png label1
test/word_2.png label2
test/word_3.png label3
...
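
If your labels are available programmatically, a small script along the following lines can write gt.txt in the tab-separated format described above. This is only an illustrative sketch (the script name, the labels dictionary, and the output path are placeholders), not part of this repository.

# make_gt.py - illustrative helper, not part of this repository.
# Writes gt.txt with one "{imagepath}\t{label}\n" line per image.
labels = {
    "test/word_1.png": "label1",
    "test/word_2.png": "label2",
    "test/word_3.png": "label3",
}

with open("data/gt.txt", "w", encoding="utf-8") as f:
    for image_path, label in labels.items():
        f.write(f"{image_path}\t{label}\n")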

Downloads

Trained Models

  1. UTRNet-Large
  2. UTRNet-Small

Datasets

  1. UTRSet-Real
  2. UTRSet-Synth
  3. IIITH (Updated) (Original)
  4. UPTI (Source)
  5. UrduDoc - Will be made available subject to the execution of a no-cost license agreement. Please contact the authors for the same.

Text Detection (Supplementary)

The text detection inference code & model based on ContourNet are available here. As mentioned in the paper, it may be integrated with UTRNet for combined text detection + recognition and hence an end-to-end Urdu OCR pipeline.

Synthetic Data Generation using Urdu-Synth (Supplementary)

The UTRSet-Synth dataset was generated using a custom-designed robust synthetic data generation module - Urdu Synth.

End-To-End Urdu OCR Webtool

This tool was developed by integrating UTRNet (https://abdur75648.github.io/UTRNet/) with a text detection model (YoloV8 finetuned on UrduDoc) for end-to-end Urdu OCR.

The application is deployed on Hugging Face Spaces and is available for a live demo. You can access it here. If you prefer to run it locally, you can clone its repository and follow the instructions given there - Repo.

Note: This version of the application uses a YoloV8 model for text detection. The original version of UTRNet uses ContourNet for this purpose. However, due to deployment issues, we have opted for YoloV8 in this demo. While YoloV8 is as accurate as ContourNet, it offers the advantages of faster processing and greater efficiency.
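
For reference, the overall flow of such a detection + recognition pipeline could look roughly like the sketch below. This is not the webtool's actual code: it assumes the ultralytics package for YOLOv8 inference, a placeholder detection checkpoint name (urdu_yolov8.pt), and a hypothetical recognize_line() wrapper around UTRNet's recognition step (e.g. the logic in read.py).

# Rough end-to-end pipeline sketch (illustrative only, not the webtool's code).
from ultralytics import YOLO   # assumes the ultralytics package is installed
from PIL import Image

def recognize_line(crop):
    """Hypothetical wrapper around UTRNet recognition (e.g. the logic in read.py)."""
    raise NotImplementedError

detector = YOLO("urdu_yolov8.pt")  # placeholder path to a YOLOv8 model finetuned on UrduDoc
page = Image.open("page.png").convert("RGB")

results = detector(page)           # detect text-line boxes on the page
for box in results[0].boxes.xyxy.tolist():
    x1, y1, x2, y2 = (int(v) for v in box)
    crop = page.crop((x1, y1, x2, y2))
    print(recognize_line(crop))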


Updates

  • 01/01/21 - Project Initiated
  • 21/11/22 - Abstract accepted at WIDAFIL-ICFHR 2022
  • 12/12/22 - Repository Created
  • 20/12/22 - Results Updated
  • 19/04/23 - Paper accepted at ICDAR 2023
  • 23/08/23 - Poster presentation at ICDAR 2023
  • 31/08/23 - Webtool made available
  • 31/01/24 - Updated Webtool (with YoloV8) made available via HuggingFace here

Acknowledgements

Contact

Note

This is an official repository of the project. The copyright of the dataset, code & models belongs to the authors. They are for research purposes only and must not be used for any other purpose without the authors' explicit permission.

Citation

If you use the code/dataset, please cite the following paper:

@InProceedings{10.1007/978-3-031-41734-4_19,
  author    = "Rahman, Abdur and Ghosh, Arjun and Arora, Chetan",
  editor    = "Fink, Gernot A. and Jain, Rajiv and Kise, Koichi and Zanibbi, Richard",
  title     = "UTRNet: High-Resolution Urdu Text Recognition in Printed Documents",
  booktitle = "Document Analysis and Recognition - ICDAR 2023",
  year      = "2023",
  publisher = "Springer Nature Switzerland",
  address   = "Cham",
  pages     = "305--324",
  isbn      = "978-3-031-41734-4",
  doi       = "10.1007/978-3-031-41734-4_19"
}

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, for noncommercial (academic & research) purposes only, and must not be used for any other purpose without the authors' explicit permission.


utrnet-high-resolution-urdu-text-recognition's Issues

The used Synthetic text dataset

Thanks for the excellent work and for sharing the code. How was the synthetic text prepared (the one that you created)? Is it possible to share that code as well, so that we can create similar data for other languages?

Code & Dataset

Hi,
Great work on Urdu OCR, @abdur75648!
It would be great if you could tell us when all the code & datasets mentioned in the UTRNet paper will be released.
Thanks in advance

Issue in read.py

Hello,
Thank you for the code. I tried to run this line to read an image and get the output in a .txt file, but I didn't get any Urdu text output.

CUDA_VISIBLE_DEVICES=0 python3 read.py --image_path path/to/image.png --FeatureExtraction HRNet --SequenceModeling DBiLSTM --Prediction CTC --saved_model saved_models/UTRNet-Large/best_norm_ED.pth

This is the output I received:


CUDA_VISIBLE_DEVICES=0 python3 read.py --image_path images/page_2.png --FeatureExtraction HRNet --SequenceModeling DBiLSTM --Prediction CTC  --saved_model saved_models/best_norm_ED.pth
Device :  cuda
model input parameters 32 400 20 1 32 256 182 100 HRNet DBiLSTM CTC
Loaded pretrained model from saved_models/best_norm_ED.pth
5

Error while loading the bigger model weights

When I try to replace the small model weights used in the HF demo with the larger model weights mentioned in the GitHub repo, it throws the following error:

size mismatch for SequenceModeling.0.rnn.weight_ih_l0: copying a param with shape torch.Size([1024, 32]) from checkpoint, the shape in current model is torch.Size([1024, 512]).
size mismatch for SequenceModeling.0.rnn.weight_ih_l0_reverse: copying a param with shape torch.Size([1024, 32]) from checkpoint, the shape in current model is torch.Size([1024, 512]).

Do I need to modify any code to make it work?

Issue with pip install -r requirements.txt

I have tried many times, but this is where I get stuck.
I have checked !apt-get install git-lfs:
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
git-lfs is already the newest version (3.0.2-1ubuntu0.2).
0 upgraded, 0 newly installed, 0 to remove and 38 not upgraded.
ISSUE
Using cached opencv-contrib-python-4.5.1.48.tar.gz (148.8 MB)
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Installing build dependencies ... error
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
