fudanvi / fudanocr

334.0 6.0 61.0 137.83 MB

A toolbox of scene text super-resolution and recognition

Python 100.00% Jupyter Notebook 0.01%

fudanocr's Introduction

FudanOCR

This toolbox contains the implementations of the following papers:

Scene Text Segmentation with Text-Focused Transformers [Yu et al., ACM MM-23]
Weakly-Supervised Text Instance Segmentation [Zu et al., ACM MM-23]
Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning [Yu et al., ICCV-23 (Oral)]
Orientation-Independent Chinese Text Recognition in Scene Images [Yu et al., IJCAI-23]
Towards Accurate Video Text Spotting with Text-wise Semantic Reasoning [Zu et al., IJCAI-23]
Chinese Character Recognition with Augmented Character Profile Matching [Zu et al., ACM MM-22]
Text Gestalt: Stroke-Aware Scene Text Image Super-Resolution [Chen et al., AAAI-22]
Zero-Shot Chinese Character Recognition with Stroke-Level Decomposition [Chen et al., IJCAI-21]
Scene Text Telescope: Text-Focused Scene Image Super-Resolution [Chen et al., CVPR-21]

The README.md file in each folder contains instructions for running the code.

fudanocr's Issues

ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'C:\\tmp\\build\\80754af9\\cffi_1613246439577\\work'

When I run the command "pip install -r requement.txt", this error occurs.
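A path such as `C:\tmp\build\...\work` typically appears when a requirements file was exported from a conda environment, where entries take the form `name @ file:///...`. The issue does not show the file, so this is an assumption. Under that assumption, a minimal sketch that reduces such entries to bare package names (the function name and the sample lines are hypothetical) could look like this:

```python
def strip_local_paths(lines):
    """Drop ' @ file://...' suffixes so pip fetches packages from the index."""
    cleaned = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if " @ file://" in line:
            # Keep only the package name that precedes the direct reference.
            line = line.split(" @ ", 1)[0]
        cleaned.append(line)
    return cleaned


# Illustrative input; the second entry and its version are made up.
example = [
    "cffi @ file:///C:/tmp/build/80754af9/cffi_1613246439577/work",
    "numpy==1.19.2",
    "",
]
print(strip_local_paths(example))
```

The cleaned lines can be written to a new file and passed to `pip install -r`. Dropping the local path also drops the pinned build, so the installed version may differ from the authors' environment.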

Code must be executed in SCREEN!

Hello, what does "Code must be executed in SCREEN!" mean in your stroke decomposition code? I keep getting errors. **** Experience Name: [0605] YOUR_EXPERIMENT_NAME****

I want to run inference on text image to get bigger text, what would be the recommended way to do that?

The scene-text-telescope is provided with TBSRN weights. What would be the best way to use that for LR image inference?

How to train with my dataset in Text Gestalt?

Hi! Thanks for sharing! I want to train with my own dataset, but I noticed that the TextZoom dataset and the data-loading method use hr_img and lr_img of the same size. My dataset is stored as image files, with HR images of 128×128 and LR images of 64×64. Should I resize my dataset, or train the model directly? Does this affect training accuracy and results? @JingyeChen

RecursionError: maximum recursion depth exceeded

When I am running the stroke-level-decomposition code,
I got the following errors. Could you please give any advice for solving the problem?
Thanks in advance!

Traceback (most recent call last):
  File "C:\Users\Desktop\project\FudanOCR\stroke-level-decomposition\train.py", line 205, in <module>
    data = dataloader.next()
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 521, in __next__
    data = self._next_data()
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index) # may raise StopIteration
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataset.py", line 308, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "C:\Users\Desktop\project\FudanOCR\stroke-level-decomposition\data\lmdbReader.py", line 73, in __getitem__
    return self[random.randint(0, len(self)-1)]
  File "C:\Users\Desktop\project\FudanOCR\stroke-level-decomposition\data\lmdbReader.py", line 73, in __getitem__
    return self[random.randint(0, len(self)-1)]
  File "C:\Users\Desktop\project\FudanOCR\stroke-level-decomposition\data\lmdbReader.py", line 73, in __getitem__
    return self[random.randint(0, len(self)-1)]
  [Previous line repeated 982 more times]
  File "C:\Users\Desktop\project\FudanOCR\stroke-level-decomposition\data\lmdbReader.py", line 63, in __getitem__
    img = Image.open(buf)
  File "C:\ProgramData\Anaconda3\lib\site-packages\PIL\Image.py", line 3016, in open
    im = _open_core(fp, filename, prefix, formats)
  File "C:\ProgramData\Anaconda3\lib\site-packages\PIL\Image.py", line 3002, in _open_core
    im = factory(fp, filename)
  File "C:\ProgramData\Anaconda3\lib\site-packages\PIL\JpegImagePlugin.py", line 798, in jpeg_factory
    im = JpegImageFile(fp, filename)
  File "C:\ProgramData\Anaconda3\lib\site-packages\PIL\ImageFile.py", line 121, in __init__
    self._open()
  File "C:\ProgramData\Anaconda3\lib\site-packages\PIL\JpegImagePlugin.py", line 379, in _open
    handler(self, i)
  File "C:\ProgramData\Anaconda3\lib\site-packages\PIL\JpegImagePlugin.py", line 64, in APP
    n = i16(self.fp.read(2)) - 2
  File "C:\ProgramData\Anaconda3\lib\site-packages\PIL\_binary.py", line 81, in i16be
    return unpack_from(">H", c, o)[0]
RecursionError: maximum recursion depth exceeded while calling a Python object
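
The traceback shows line 73 of lmdbReader.py calling `self[random.randint(0, len(self)-1)]` again and again, so each failed read adds another stack frame until Python's recursion limit is reached. A minimal sketch of the same fallback written as a bounded loop follows. The class, its `_load` helper, and the use of `None` for an unreadable sample are hypothetical stand-ins, not the repository's code.

```python
import random


class RetryingDataset:
    """Toy dataset whose __getitem__ retries in a loop instead of recursing."""

    def __init__(self, samples, max_retries=10):
        self.samples = samples
        self.max_retries = max_retries

    def __len__(self):
        return len(self.samples)

    def _load(self, index):
        # Hypothetical loader: None marks a sample that fails to decode.
        sample = self.samples[index]
        if sample is None:
            raise IOError(f"cannot decode sample {index}")
        return sample

    def __getitem__(self, index):
        for _ in range(self.max_retries):
            try:
                return self._load(index)
            except IOError:
                # Pick another index and try again without growing the stack.
                index = random.randint(0, len(self) - 1)
        raise RuntimeError(f"no readable sample after {self.max_retries} retries")


dataset = RetryingDataset([None, "img-a", None, "img-b"], max_retries=50)
print(dataset[0])
```

With a retry limit, a dataset in which most samples cannot be read fails with a clear message instead of a RecursionError, which points more directly at the underlying data problem.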

Can I get the input images from before they were converted to LMDB?

Could you share the Chinese decomposition model?

@JingyeChen @AImpressionist Thanks for sharing; I am very interested in your work. I can only find the English decomposition model. Could you share the Chinese decomposition model with me? Thank you very much!

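How do I test on handwritten and Chinese text?

Hello, thank you very much for open-sourcing such great work. Your paper mentions that SR for handwritten and Chinese text is supported. How should I test this? I look forward to your reply. Thanks!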

【accuracy】I trained your model; at epoch 350 the accuracy is 56.45%, 41.81%, 33.51% with CRNN

Is that correct?
The paper reports 59.6%, 47.1%, 35.3% with CRNN. Could you describe your training details? That would be a tremendous help!
Thank you!

Scene Text Telescope: pre-trained model log is missing

super_resolution.py

Do we have to change these paths (folder_GT and folder_GN) in super_resolution.py?

Also, can we compute the PSNR and SSIM of all images and store them in a CSV file? How would we do that?
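
One way to approach the CSV question is to collect one metric row per image and write the rows with Python's `csv` module. The sketch below computes PSNR on flat sequences of 8-bit pixel values in plain Python. The function names and the file name `demo_0.png` are hypothetical, SSIM is left out, and this is not the repository's own metric code.

```python
import csv
import io
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB for two equal-length pixel sequences."""
    if len(reference) != len(estimate):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def write_metrics(rows, handle):
    """Write (image name, PSNR) rows as CSV to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["image", "psnr_db"])
    for name, value in rows:
        writer.writerow([name, f"{value:.2f}"])


# Toy 4-pixel "images"; real code would flatten the HR and SR arrays.
hr = [10, 20, 30, 40]
sr = [12, 18, 30, 44]
buffer = io.StringIO()
write_metrics([("demo_0.png", psnr(hr, sr))], buffer)
print(buffer.getvalue())
```

Replacing the `io.StringIO` buffer with `open("metrics.csv", "w", newline="")` would write the same rows to disk. An SSIM column could be added in the same way once a function that returns it is available.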

【block】class TBSRN: what do the blocks in tbsrn.py mean? Thank you very much!

def forward:
    ......
    for i in range(self.srb_nums + 1):
        block[str(i + 2)] = getattr(self, 'block%d' % (i + 2))(block[str(i + 1)])

    block[str(self.srb_nums + 3)] = getattr(self, 'block%d' % (self.srb_nums + 3)) \
        ((block['1'] + block[str(self.srb_nums + 2)]))
    output = torch.tanh(block[str(self.srb_nums + 3)])

First question:
A block is the basic unit of TBSRN, right?
My guess:
In the TBSRN class, does block1 mean conv1?
Do block2-7 mean TBSRN-n, where n = srb_nums? But the initial parameter srb_nums equals 5.
Does block8 mean the subsampling block?

Second question:
In def forward:
How is the upsampling block implemented?
I only see this code: block[str(self.srb_nums + 3)] = getattr(self, 'block%d' % (self.srb_nums + 3)) \ ((block['1'] + block[str(self.srb_nums + 2)]))
Thank you!
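
The quoted `forward` only shows how the numbered blocks are chained, so the indexing can be traced without knowing what each block contains. The sketch below reproduces that wiring with plain functions in place of network layers. `ToyNet` and its add-one blocks are hypothetical, the final `torch.tanh` is omitted, and nothing here says what the real blocks compute.

```python
class ToyNet:
    """Mimics the numbered-block wiring of the quoted forward() with numbers."""

    def __init__(self, srb_nums=5):
        self.srb_nums = srb_nums
        self.calls = []  # records the order in which blocks run
        # Register block1 .. block(srb_nums + 3) as attributes.
        for n in range(1, srb_nums + 4):
            setattr(self, "block%d" % n, self._make_block(n))

    def _make_block(self, n):
        def block(x):
            self.calls.append(n)
            return x + 1  # stand-in for a real layer
        return block

    def forward(self, x):
        block = {"1": self.block1(x)}
        # i = 0..srb_nums fills block2 .. block(srb_nums + 2), each fed by its predecessor.
        for i in range(self.srb_nums + 1):
            block[str(i + 2)] = getattr(self, "block%d" % (i + 2))(block[str(i + 1)])
        # The last block receives block1's output added to the deepest output.
        last = self.srb_nums + 3
        block[str(last)] = getattr(self, "block%d" % last)(
            block["1"] + block[str(self.srb_nums + 2)]
        )
        return block[str(last)]


net = ToyNet(srb_nums=5)
result = net.forward(0)
print(net.calls, result)
```

With `srb_nums = 5` the loop runs six times and fills block2 through block7, and block8 is applied to the sum of the block1 and block7 outputs, which is a long skip connection around the middle blocks.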

How do I convert HWDB data sets to LMDB format?

Hello, I have tried many methods but failed to convert the HWDB dataset. Could you please explain how to convert HWDB to LMDB?

I would appreciate it very much.

What should YOUR_MODEL be during testing? Please give an example

CUDA_VISIBLE_DEVICES=GPU_NUM python main.py --batch_size=16 --STN --exp_name EXP_NAME --text_focus --resume YOUR_MODEL --test --test_data_dir ./dataset/mydata/test
Hello, what exactly should be filled in for YOUR_MODEL? I have tried many times without getting it to run.

Scene Text Telescope: missing log file

The link to the log file (https://drive.google.com/file/d/19M5twD_cUAq88YuENPpR_7hIvLULb6mF/view?usp=sharin) in the STT readme does not appear to be a valid google drive link.

Output images?

Hey,
How can we visualize the output images after the demo command?

Can I train a new model with my own dataset using SceneText Telescope?

Hi authors, I would like to train my model with my own dataset. How can I do that? Thank you.

【Demo】RuntimeError: Sizes of tensors must match except in dimension 2. Got 1024 and 8192 (The offending index is 0)

Should I create a demo directory inside the scene-text-telescope directory?
Then run: python main.py --batch_size=16 --STN --exp_name EXP_NAME --text_focus --demo --demo_dir ./demo

RuntimeError: Sizes of tensors must match except in dimension 2. Got 1024 and 8192 (The offending index is 0)

Should I put a picture or data.mdb in the demo directory?
Thank you.

About the pretrained transformer?

Hello, what is the structure of the pretrained transformer? Thanks!

Where can I get files named 'train_1000' and 'test_1000'?

I don't understand why I should use split().

for dataset_root in config['train_dataset'].split(',')
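
A likely reason for `split(',')` is that one configuration string can then name several dataset folders. The sketch below assumes that `config['train_dataset']` is a comma-separated string of paths. The paths are made up and the real configuration file may differ.

```python
config = {"train_dataset": "./data/train_a,./data/train_b"}  # hypothetical paths

# One string becomes a list of roots, so each dataset can be opened in turn.
dataset_roots = [root.strip() for root in config["train_dataset"].split(",")]
for dataset_root in dataset_roots:
    print("would load", dataset_root)

# A single path without commas still works: split returns a one-element list.
print("./data/only".split(","))
```

Because a string without commas yields a one-element list, the same loop handles both the single-dataset and the multi-dataset case.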

Test accuracy is always 0

paper

Hello, where is the supplementary material for this paper? I have some questions about its content. Thanks!

A question about the TextZoom Dataset

I have a question about the dataset in the Scene Text Telescope paper: is the resize done in the code, or is it already done when the dataset is prepared?

stroke-level-decomposition dataset problem

Could you please provide the data sets that you have used and processed? I would appreciate it if it could be provided.

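How should large images be handled? Do I need to split them into smaller tiles myself and process them one by one?

Our product needs to super-resolve video of screen-sharing images, to improve the clarity of shared desktop images at low bitrates. How should images at resolutions such as 1080p be handled?
Also, how can super-resolution training for Chinese be added? By adding Chinese to the dataset? If so, how?

Thanks!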
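
Whether tiling is the recommended approach is not answered in this thread. If a large frame is split manually, the tile coordinates can be enumerated as in the sketch below. The function name and the 256×64 tile size are chosen only for illustration and are not taken from the repository.

```python
def tile_boxes(width, height, tile_w, tile_h):
    """Return (left, top, right, bottom) boxes covering the image, clipped at the edges."""
    boxes = []
    for top in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            boxes.append(
                (left, top, min(left + tile_w, width), min(top + tile_h, height))
            )
    return boxes


boxes = tile_boxes(1920, 1080, 256, 64)  # tile size chosen only for illustration
print(len(boxes), boxes[0], boxes[-1])
```

Tiles on the right and bottom edges come out smaller than the rest, so they would need padding or resizing before being passed to a model that expects a fixed input size.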

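About y_pred: how is the stroke sequence predicted by the transformer split back into individual characters?

For example, in the English stroke dictionary, EU and FJ both have the sequence 21117. How is it decided whether the result is EU or FJ?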

Dataset is not opening

I was trying to open the .mdb dataset files, but I could not access and read them. I have MS access, and I tried other options too. Could you help me out?

Demo will not run due to tensor size mismatch

The demo won't run for me due to a tensor size mismatch. I'm calling it as:

python main.py --demo --demo_dir='./demo/' --resume='./checkpoint/louis/model_best.pth' --STN --mask --exp_name louis

The mismatch is arising on line 367 of interfaces/super_resolution.py at:

            images_sr = model(images_lr)

The traceback reports "got 1024 and 8192 (The offending index is 0)":

Namespace(STN=True, arch='tbsrn', batch_size=None, demo=True, demo_dir='./demo/', exp_name='louis', hd_u=32, mask=True, mixed=False, rec='crnn', resume='./checkpoint/louis/model_best.pth', srb=5, syn=False, test=False, test_data_dir='./dataset/mydata/test/easy', text_focus=False)
loading pre-trained model from ./checkpoint/louis/model_best.pth 
Total Parameters 3220992
loading pretrained crnn model from ./dataset/mydata/crnn.pth
  0%|          | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 40, in <module>
    main(config, args)
  File "main.py", line 13, in main
    Mission.demo()
  File "/home/louis/dev/sr/FudanOCR/scene-text-telescope/interfaces/super_resolution.py", line 367, in demo
    images_sr = model(images_lr)
  File "/home/louis/miniconda3/envs/sttsr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/louis/miniconda3/envs/sttsr/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/louis/miniconda3/envs/sttsr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/louis/dev/sr/FudanOCR/scene-text-telescope/model/tbsrn.py", line 221, in forward
    block[str(i + 2)] = getattr(self, 'block%d' % (i + 2))(block[str(i + 1)])
  File "/home/louis/miniconda3/envs/sttsr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/louis/dev/sr/FudanOCR/scene-text-telescope/model/tbsrn.py", line 255, in forward
    residual = self.feature_enhancer(residual)
  File "/home/louis/miniconda3/envs/sttsr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/louis/dev/sr/FudanOCR/scene-text-telescope/model/tbsrn.py", line 85, in forward
    conv_feature = torch.cat([conv_feature, position2d],1) # batch, 128(64+64), 32, 128
RuntimeError: Sizes of tensors must match except in dimension 2. Got 1024 and 8192 (The offending index is 0)

This pretrained model is the one provided in Dropbox (have you switched to a different model since uploading perhaps?)
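
The failing line concatenates a feature map with a positional encoding, and the message says one of them has 1024 positions and the other 8192. Since 1024 = 16 × 64, one possible reading, not confirmed in this thread, is that the positional encoding was built for 16×64 inputs and the demo images reached the model at a larger size. Under that assumption, a small check before the forward pass (the function name is hypothetical) would make the cause visible earlier:

```python
def check_positions(height, width, expected_positions=1024):
    """Raise early if an input's H*W would not match the positional encoding length."""
    positions = height * width
    if positions != expected_positions:
        raise ValueError(
            f"input has {height}x{width}={positions} positions, "
            f"expected {expected_positions}"
        )
    return positions


print(check_positions(16, 64))  # 16 * 64 = 1024, matches
try:
    check_positions(64, 128)    # 64 * 128 = 8192, one size that gives the reported count
except ValueError as err:
    print(err)
```

If the check fails on the demo images, resizing them to the expected input size before calling the model would be the next thing to try.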

Chinese dataset question

Does english_decomposition.txt need to be converted into chinese_decomposition.txt, given that my train_data is Chinese text? If so, how should I generate chinese_decomposition.txt? Thanks.

Unable to load the trained checkpoints

Hey, can you help me with the trained checkpoints? I am not able to load checkpoint.pth.
What is the command to resume from the checkpoints? Please help me solve the issue.

lmdb.Error: ./data/mydata/train_data.mdb: Not a directory

Hello, my experiment keeps hitting this error: lmdb.Error: ./data/mydata/train_data.mdb: Not a directory. Have you come across this? How can I solve it? I would appreciate your help.
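
By default, `lmdb.open()` treats its path as a directory that holds `data.mdb` and `lock.mdb`, so passing the path of a single file produces "Not a directory". Passing `subdir=False` tells LMDB that the path is the database file itself. The sketch below picks between the two cases. The helper name is hypothetical and the temporary file is an empty placeholder, not a real database.

```python
import os
import tempfile


def lmdb_open_kwargs(path):
    """Choose arguments for lmdb.open() depending on whether path is a file or a folder."""
    if os.path.isfile(path):
        # A single database file: LMDB must not treat the path as a directory.
        return {"path": path, "subdir": False}
    # A folder that contains data.mdb and lock.mdb (LMDB's default layout).
    return {"path": path, "subdir": True}


folder = tempfile.mkdtemp()
single_file = os.path.join(folder, "train_data.mdb")
open(single_file, "wb").close()  # empty placeholder, not a real database

print(lmdb_open_kwargs(folder)["subdir"])
print(lmdb_open_kwargs(single_file)["subdir"])

# With the lmdb package installed, the result would be used as:
# env = lmdb.open(readonly=True, lock=False, **lmdb_open_kwargs(path))
```

The other option is to keep the default and point the training configuration at the folder that contains `data.mdb`, which is the layout LMDB creates when it writes a database.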

If you

RuntimeError: Sizes of tensors must match except in dimension 2. Got 1024 and 8192 (The offending index is 0)

Would you please help me solve this error? I don't understand what I am doing wrong.

The results don't look right

I ran the test following the README and got the results below. The environment was set up as required.

Checkpoint does not match the model

I ran the command: python main.py --demo --demo_dir='./demo/' --resume='./checkpoint/model_best.pth' --STN --mask --exp_name thor --batch_size=16 --text_focus
but got an error saying the checkpoint does not match the model. Could you check again?

Traceback (most recent call last):
  File "main.py", line 40, in <module>
    main(config, args)
  File "main.py", line 13, in main
    Mission.demo()
  File "/home/thorpham/Documents/challenge/super-resolution/FudanOCR/scene-text-telescope/interfaces/super_resolution.py", line 347, in demo
    model_dict = self.generator_init()
  File "/home/thorpham/Documents/challenge/super-resolution/FudanOCR/scene-text-telescope/interfaces/base.py", line 184, in generator_init
    model.load_state_dict(torch.load(self.resume)['state_dict_G'])
  File "/home/thorpham/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1045, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for TBSRN:
    size mismatch for block1.0.weight: copying a param with shape torch.Size([64, 3, 9, 9]) from checkpoint, the shape in current model is torch.Size([64, 4, 9, 9]).
    size mismatch for block8.1.weight: copying a param with shape torch.Size([3, 64, 9, 9]) from checkpoint, the shape in current model is torch.Size([4, 64, 9, 9]).
    size mismatch for block8.1.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([4]).
    size mismatch for stn_head.stn_convnet.0.0.weight: copying a param with shape torch.Size([32, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 4, 3, 3]).
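
Every mismatch in this message is between 3 channels in the checkpoint and 4 in the current model, and the command passes `--mask`. That pattern suggests the checkpoint was trained without the extra mask channel, although the thread does not confirm it. Listing the shape differences before loading makes such a pattern easier to spot. The sketch below uses plain dictionaries in place of real state dicts, and the function name is hypothetical.

```python
def shape_mismatches(checkpoint_shapes, model_shapes):
    """List (name, checkpoint shape, model shape) for parameters whose shapes differ."""
    mismatches = []
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches.append((name, tuple(ckpt_shape), tuple(model_shape)))
    return mismatches


# Shapes copied from two lines of the error message above.
checkpoint = {"block1.0.weight": (64, 3, 9, 9), "block8.1.bias": (3,)}
model = {"block1.0.weight": (64, 4, 9, 9), "block8.1.bias": (4,)}
for name, saved, current in shape_mismatches(checkpoint, model):
    print(f"{name}: checkpoint {saved} vs model {current}")
```

With real PyTorch objects, each dictionary could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`. If the mask channel is indeed the cause, running the demo without `--mask` would be the first thing to try.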

【time】How long does it take to train?

【moran, aster】How do I set the recognition head to MORAN or ASTER?
And do I need to retrain every time the recognition head is changed?

Thank you

TBSRN link is broken

The TBSRN link on Google Drive may have expired.

About weight_cross_entropy

This paper is quite impressive research.

I have a question about implementation.

In this paper, weight cross-entropy (Content-Aware Module) is employed for calculating loss between the prediction of SR image and gt. However, the code seems to calculate loss between the prediction of HR image and gt.

FudanOCR/scene-text-telescope/loss/text_focus_loss.py

Line 97 in 342eb1c

recognition_loss = weight_cross_entropy(hr_pred, text_gt)

where is chinese_decomposition.txt ?

Great job! Thanks for sharing the code.
I have one question. I see that english_decomposition.txt is in the Baidu link, but I want to use pretrain-transformer-stroke-decomposition-chinese.pth and no chinese_decomposition.txt is available. Could you please tell me how to use pretrain-transformer-stroke-decomposition-chinese.pth?

How to convert my own dataset into lmdb

Can you provide the code for converting my own dataset to LMDB? Thank you in advance.
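
No conversion script appears in this thread. As a rough sketch, an LMDB text-recognition dataset is often stored as `image-%09d` and `label-%09d` entries plus a `num-samples` count. That key layout is an assumption here and should be checked against the repository's lmdbReader.py before use. Both function names are hypothetical, and `lmdb` is a third-party package.

```python
def build_cache(samples):
    """Map LMDB keys to byte values for a list of (image_bytes, label) pairs."""
    cache = {}
    for index, (image_bytes, label) in enumerate(samples, start=1):
        cache[b"image-%09d" % index] = image_bytes
        cache[b"label-%09d" % index] = label.encode("utf-8")
    cache[b"num-samples"] = str(len(samples)).encode("utf-8")
    return cache


def write_lmdb(output_dir, cache, map_size=1 << 30):
    """Write the cache into an LMDB environment (requires `pip install lmdb`)."""
    import lmdb  # imported here so build_cache can be tried without the package

    env = lmdb.open(output_dir, map_size=map_size)
    with env.begin(write=True) as txn:
        for key, value in cache.items():
            txn.put(key, value)
    env.close()


# Fake image bytes stand in for the contents of real image files.
cache = build_cache([(b"\xff\xd8fake-jpeg", "hello"), (b"\xff\xd8fake-jpeg", "世界")])
print(sorted(cache))
```

For real data, each `image_bytes` value would be the raw content of an image file read in binary mode, and `write_lmdb("./my_lmdb", cache)` would then create the database folder.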

About the printed artistic dataset

Hi, can you release the printed artistic characters dataset? Or where can I download these font files?

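How do I obtain the "Right-shifted Stroke-level label S'gt" for my own data?

@AImpressionist @JingyeChen Hello, first of all thank you very much for sharing such excellent work. I have two questions:
1. What does the Right-shifted Stroke-level label S'gt in the paper mean? I understand splitting character labels into stroke labels, but how is the right-shifted label obtained?
2. I want to train with my own data. How do I obtain the "Right-shifted Stroke-level label S'gt"? My data is only intended for super-resolution and has no character recognition labels. I look forward to your reply. Thanks.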
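
In sequence decoders trained with teacher forcing, a "right-shifted" label usually means the ground-truth sequence with a start token placed in front and the last token dropped, so the decoder input at each position holds the previous target. The sketch below assumes the paper uses the term in this standard sense. The token ids and the function name are hypothetical.

```python
SOS = 0  # hypothetical start-of-sequence id
EOS = 9  # hypothetical end-of-sequence id


def shift_right(target, sos=SOS):
    """Build the decoder input: start token followed by all but the last target token."""
    return [sos] + list(target[:-1])


stroke_target = [2, 1, 1, 1, 7, EOS]  # illustrative stroke ids followed by the end token
decoder_input = shift_right(stroke_target)
print(decoder_input)
```

Because the stroke-level label is derived from the character label, data without any text annotation would not provide it under this reading.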

Checkpoint experiment directory is cleaned (all models destroyed) upon testing

Firstly there isn't full usage guidance for this so I'm making some assumptions based on the TextZoom repo this was adapted from, please advise if I'm better off using it differently!

To train the model I ran:

python main.py --batch_size=32 --STN --mask --exp_name louis --text_focus

which successfully put checkpoint.pth and model_best.pth in checkpoints/louis/, however when I went to test these it printed

Clean the old checkpoint louis

and then complained:

FileNotFoundError: [Errno 2] No such file or directory: './checkpoint/louis/checkpoint.pth'

...so the saved model state was destroyed by trying to test it... Am I calling it wrong somehow? I suspect this should only be done when training (not testing) a model of the same experiment name as a pre-existing one.
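
The behaviour described above fits a cleanup step that runs whenever an experiment name already exists, whether or not the run is a training run. A minimal sketch of a guard that wipes the folder only when training starts is shown below. The function name and its `training` flag are hypothetical and are not taken from the repository.

```python
import os
import shutil
import tempfile


def prepare_checkpoint_dir(root, exp_name, training):
    """Return the experiment folder, wiping an old one only when starting training."""
    ckpt_dir = os.path.join(root, exp_name)
    if training and os.path.isdir(ckpt_dir):
        print("Clean the old checkpoint", exp_name)
        shutil.rmtree(ckpt_dir)
    os.makedirs(ckpt_dir, exist_ok=True)
    return ckpt_dir


root = tempfile.mkdtemp()
ckpt_dir = prepare_checkpoint_dir(root, "louis", training=True)
open(os.path.join(ckpt_dir, "checkpoint.pth"), "wb").close()  # pretend a model was saved

# Testing must leave the saved file in place.
prepare_checkpoint_dir(root, "louis", training=False)
print(os.listdir(ckpt_dir))
```

Until such a guard exists, copying the checkpoint out of the experiment folder before testing, or testing under a different experiment name, would avoid losing the trained weights.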

Error occurs when running the test

This error happens and I don't know how to fix it. Can you help me solve this problem?

pre-trained model pth

Hi, I want to download your two model_best.pth files.
But I couldn't download them because I don't have a Baidu ID and password.
So, could you upload the two model_best.pth files to Google Drive?

I have downloaded only one file, pretrain_trasformer_stroke_decomposition.pth.

Thank you!
