fudanocr's People

Contributors

aaai22anonymous, aimpressionist, hyangyu, jingyechen, xyzu1996

fudanocr's Issues

Code must be executed in **SCREEN**!

Hello, what does "Code must be executed in SCREEN!" mean in your stroke decomposition code? I keep getting this error: **** Experience Name: [0605] YOUR_EXPERIMENT_NAME****

How to train with my own dataset in Text Gestalt?

Hi, thanks for sharing! I want to train with my own dataset, but I noticed that the TextZoom dataset and the data-loading method use hr_img and lr_img of the same size. My dataset is stored as image files, with HR images at 128*128 and LR images at 64*64. Should I resize my dataset, or can I train the model directly? Does this affect training accuracy and results? @JingyeChen

RecursionError: maximum recursion depth exceeded

When I am running the stroke-level-decomposition code,
I got the following errors. Could you please give any advice for solving the problem?
Thanks in advance!

Traceback (most recent call last):
  File "C:\Users\Desktop\project\FudanOCR\stroke-level-decomposition\train.py", line 205, in <module>
    data = dataloader.next()
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 521, in __next__
    data = self._next_data()
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index) # may raise StopIteration
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataset.py", line 308, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "C:\Users\Desktop\project\FudanOCR\stroke-level-decomposition\data\lmdbReader.py", line 73, in __getitem__
    return self[random.randint(0, len(self)-1)]
  File "C:\Users\Desktop\project\FudanOCR\stroke-level-decomposition\data\lmdbReader.py", line 73, in __getitem__
    return self[random.randint(0, len(self)-1)]
  File "C:\Users\Desktop\project\FudanOCR\stroke-level-decomposition\data\lmdbReader.py", line 73, in __getitem__
    return self[random.randint(0, len(self)-1)]
  [Previous line repeated 982 more times]
  File "C:\Users\Desktop\project\FudanOCR\stroke-level-decomposition\data\lmdbReader.py", line 63, in __getitem__
    img = Image.open(buf)
  File "C:\ProgramData\Anaconda3\lib\site-packages\PIL\Image.py", line 3016, in open
    im = _open_core(fp, filename, prefix, formats)
  File "C:\ProgramData\Anaconda3\lib\site-packages\PIL\Image.py", line 3002, in _open_core
    im = factory(fp, filename)
  File "C:\ProgramData\Anaconda3\lib\site-packages\PIL\JpegImagePlugin.py", line 798, in jpeg_factory
    im = JpegImageFile(fp, filename)
  File "C:\ProgramData\Anaconda3\lib\site-packages\PIL\ImageFile.py", line 121, in __init__
    self._open()
  File "C:\ProgramData\Anaconda3\lib\site-packages\PIL\JpegImagePlugin.py", line 379, in _open
    handler(self, i)
  File "C:\ProgramData\Anaconda3\lib\site-packages\PIL\JpegImagePlugin.py", line 64, in APP
    n = i16(self.fp.read(2)) - 2
  File "C:\ProgramData\Anaconda3\lib\site-packages\PIL\_binary.py", line 81, in i16be
    return unpack_from(">H", c, o)[0]
RecursionError: maximum recursion depth exceeded while calling a Python object
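
The traceback shows lmdbReader.py line 73 calling itself with a new random index nearly a thousand times, which suggests that every sample it tried failed to load and each failure triggered another recursive retry. A minimal sketch of a bounded retry loop that surfaces the underlying error instead of exhausting the stack is shown below. The class RetryingDataset, the decode function, and the max_retries parameter are hypothetical stand-ins, not code from the repository.

```python
import random


class RetryingDataset:
    """Wraps indexable records and retries a bounded number of times."""

    def __init__(self, samples, loader, max_retries=10):
        self.samples = samples          # raw records (stand-in for LMDB entries)
        self.loader = loader            # callable that decodes one record; may raise
        self.max_retries = max_retries

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        last_error = None
        for _ in range(self.max_retries):
            try:
                return self.loader(self.samples[index])
            except Exception as err:    # decoding failed: remember why, try another index
                last_error = err
                index = random.randint(0, len(self) - 1)
        raise RuntimeError(
            f"no readable sample after {self.max_retries} attempts"
        ) from last_error


def decode(record):
    # Hypothetical decoder: treats None as a corrupted record.
    if record is None:
        raise ValueError("corrupted record")
    return record.upper()


good = RetryingDataset(["a", "b"], decode)
bad = RetryingDataset([None, None], decode)
```

With this shape, a dataset in which nothing can be decoded fails quickly with the original exception attached, which is usually easier to diagnose than a RecursionError.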

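How do I test on handwritten and Chinese text?

Hello, thank you very much for open-sourcing such great work. Your paper says that SR for handwritten text and Chinese is supported. How should I test this? Looking forward to your reply, thanks!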

super_resolution.py

image

Do we have to change these paths (folder_GT and folder_GN) in super_resolution.py?

Also, can we compute the PSNR and SSIM of all images and store them in a CSV file? How would we do that?
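
As a rough illustration of the CSV part of this question, the sketch below computes PSNR for one image pair and writes it to a CSV stream using only the standard library. The function names psnr and write_metrics, the column names, and the toy 2x2 images are assumptions made for the example. The repository's own metric code is not shown here, and SSIM could be added as another column in the same way.

```python
import csv
import io
import math


def psnr(img_a, img_b, max_value=255.0):
    """PSNR in dB between two equally sized images given as nested lists of pixels."""
    flat_a = [p for row in img_a for p in row]
    flat_b = [p for row in img_b for p in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)
    if mse == 0:
        return float("inf")             # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def write_metrics(rows, stream):
    """Write (name, psnr) rows to any text stream as CSV."""
    writer = csv.writer(stream)
    writer.writerow(["image", "psnr_db"])
    for name, value in rows:
        writer.writerow([name, f"{value:.4f}"])


# Toy 2x2 grayscale images standing in for real SR / HR pairs.
hr = [[0, 0], [0, 0]]
sr = [[255, 0], [0, 0]]
buffer = io.StringIO()
write_metrics([("demo.png", psnr(sr, hr))], buffer)
```

To write to disk instead, the same write_metrics call can be given a file opened with open(path, "w", newline="").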

【block】 What does "block" mean in class TBSRN in the tbsrn.py file? Thank you very much!

def forward:
    ......
    for i in range(self.srb_nums + 1):
        block[str(i + 2)] = getattr(self, 'block%d' % (i + 2))(block[str(i + 1)])

    block[str(self.srb_nums + 3)] = getattr(self, 'block%d' % (self.srb_nums + 3)) \
        ((block['1'] + block[str(self.srb_nums + 2)]))
    output = torch.tanh(block[str(self.srb_nums + 3)])

First question:
"block" means a basic unit of TBSRN, right?
My guess:
In class TBSRN, does block1 mean conv1?
Do block2-7 mean TBSRN-n, where n = srb_nums? But the initial value of the srb_nums parameter is 5.
Does block8 mean the subsampling block?


Second question:
In def forward, how is the upsampling block implemented?
I only saw this code: block[str(self.srb_nums + 3)] = getattr(self, 'block%d' % (self.srb_nums + 3)) \ ((block['1'] + block[str(self.srb_nums + 2)]))
Thank you!
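
The index arithmetic in the quoted loop can be traced without PyTorch. The sketch below only reproduces which block consumes which output for a given srb_nums. It says nothing about what each block contains, which would have to be checked in the class's __init__. The helper name trace_forward is hypothetical, and block1 is assumed to be computed in the elided part of forward.

```python
def trace_forward(srb_nums=5):
    """Return the order in which blocks are applied, mirroring the quoted loop."""
    calls = []
    for i in range(srb_nums + 1):
        # block{i+2} consumes the output of block{i+1}
        calls.append((f"block{i + 2}", [f"block{i + 1}"]))
    # The last block consumes the sum of block1 and block{srb_nums+2} (a skip connection).
    calls.append((f"block{srb_nums + 3}", ["block1", f"block{srb_nums + 2}"]))
    return calls


for name, inputs in trace_forward():
    print(name, "<-", " + ".join(inputs))
```

With srb_nums = 5 the loop runs six times and produces block2 through block7, and block8 is then applied to block1 + block7 before the final tanh.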

What should YOUR_MODEL be during testing? Please give an example

CUDA_VISIBLE_DEVICES=GPU_NUM python main.py --batch_size=16 --STN --exp_name EXP_NAME --text_focus --resume YOUR_MODEL --test --test_data_dir ./dataset/mydata/test
Hello, what exactly should I fill in for YOUR_MODEL? I have tried many times and could not get it to run.

Output images?

Hey,
How can we visualize the output images after running the demo command?

paper

Hello, where is the supplementary material for this paper? I have some questions about its content. Thanks!

A question about the TextZoom Dataset

I have a question about the dataset in the Scene Text Telescope paper: is the resize here done in the code, or was it already done when the dataset was prepared?
Dataset is not opening

I was trying to open the .mdb dataset files, but I could not access or read them. I have MS Access, and I tried other options too. Could you help me out?

Demo will not run due to tensor size mismatch

The demo won't run for me due to a tensor size mismatch. I'm calling it as:

python main.py --demo --demo_dir='./demo/' --resume='./checkpoint/louis/model_best.pth' --STN --mask --exp_name louis

The mismatch is arising on line 367 of interfaces/super_resolution.py at:

            images_sr = model(images_lr)

The traceback reports "got 1024 and 8192 (The offending index is 0)":

Namespace(STN=True, arch='tbsrn', batch_size=None, demo=True, demo_dir='./demo/', exp_name='louis', hd_u=32, mask=True, mixed=False, rec='crnn', resume='./checkpoint/louis/model_best.pth', srb=5, syn=False, test=False, test_data_dir='./dataset/mydata/test/easy', text_focus=False)
loading pre-trained model from ./checkpoint/louis/model_best.pth 
Total Parameters 3220992
loading pretrained crnn model from ./dataset/mydata/crnn.pth
  0%|          | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 40, in <module>
    main(config, args)
  File "main.py", line 13, in main
    Mission.demo()
  File "/home/louis/dev/sr/FudanOCR/scene-text-telescope/interfaces/super_resolution.py", line 367, in demo
    images_sr = model(images_lr)
  File "/home/louis/miniconda3/envs/sttsr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/louis/miniconda3/envs/sttsr/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/louis/miniconda3/envs/sttsr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/louis/dev/sr/FudanOCR/scene-text-telescope/model/tbsrn.py", line 221, in forward
    block[str(i + 2)] = getattr(self, 'block%d' % (i + 2))(block[str(i + 1)])
  File "/home/louis/miniconda3/envs/sttsr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/louis/dev/sr/FudanOCR/scene-text-telescope/model/tbsrn.py", line 255, in forward
    residual = self.feature_enhancer(residual)
  File "/home/louis/miniconda3/envs/sttsr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/louis/dev/sr/FudanOCR/scene-text-telescope/model/tbsrn.py", line 85, in forward
    conv_feature = torch.cat([conv_feature, position2d],1) # batch, 128(64+64), 32, 128
RuntimeError: Sizes of tensors must match except in dimension 2. Got 1024 and 8192 (The offending index is 0)

This pretrained model is the one provided in Dropbox (have you switched to a different model since uploading perhaps?)

Chinese dataset question

Does english_decomposition.txt need to be converted into chinese_decomposition.txt, since my training data is Chinese text? If so, how should I generate chinese_decomposition.txt? Thanks.

Unable to load the trained checkpoints

Hey, can you help me with the trained checkpoints? I am not able to load checkpoint.pth.
What is the command to resume from a checkpoint? Please help me solve this issue.

The results don't look right

I tested following the README; the results are shown below. The environment was set up as required.
image

Checkpoint does not match model

I ran the command: python main.py --demo --demo_dir='./demo/' --resume='./checkpoint/model_best.pth' --STN --mask --exp_name thor --batch_size=16 --text_focus
but got an error saying the checkpoint does not match the model. Could you check again?

Traceback (most recent call last):
  File "main.py", line 40, in <module>
    main(config, args)
  File "main.py", line 13, in main
    Mission.demo()
  File "/home/thorpham/Documents/challenge/super-resolution/FudanOCR/scene-text-telescope/interfaces/super_resolution.py", line 347, in demo
    model_dict = self.generator_init()
  File "/home/thorpham/Documents/challenge/super-resolution/FudanOCR/scene-text-telescope/interfaces/base.py", line 184, in generator_init
    model.load_state_dict(torch.load(self.resume)['state_dict_G'])
  File "/home/thorpham/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1045, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for TBSRN:
size mismatch for block1.0.weight: copying a param with shape torch.Size([64, 3, 9, 9]) from checkpoint, the shape in current model is torch.Size([64, 4, 9, 9]).
size mismatch for block8.1.weight: copying a param with shape torch.Size([3, 64, 9, 9]) from checkpoint, the shape in current model is torch.Size([4, 64, 9, 9]).
size mismatch for block8.1.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([4]).
size mismatch for stn_head.stn_convnet.0.0.weight: copying a param with shape torch.Size([32, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 4, 3, 3]).
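
Every mismatch reported above is 3 channels in the checkpoint against 4 in the current model. One possible reading, which is an assumption and not confirmed by the source, is that the checkpoint was trained on 3-channel input while the --mask option in the command builds a 4-channel model. A small sketch for listing such mismatches before calling load_state_dict follows. The helper find_shape_mismatches is hypothetical and works on plain dictionaries of shapes, which with a real checkpoint could be built as {k: tuple(v.shape) for k, v in state_dict.items()}.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for parameters whose shapes differ."""
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Shapes copied from the error message above.
checkpoint = {"block1.0.weight": (64, 3, 9, 9), "block8.1.bias": (3,)}
model = {"block1.0.weight": (64, 4, 9, 9), "block8.1.bias": (4,)}
report = find_shape_mismatches(checkpoint, model)

for name, (ckpt_shape, model_shape) in report.items():
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

If the report shows the same channel difference on every input and output layer, comparing the flags used at training time with those used for the demo is a reasonable next step.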

【time】 How long does it take to train?

【time】 How long does it take to train?


【moran, aster】 How do I set the recognition head to MORAN or ASTER?
Do I need to retrain every time the recognition head is changed?

Thank you

About weight_cross_entropy

This paper is quite impressive research.

I have a question about implementation.

In this paper, weight cross-entropy (Content-Aware Module) is employed for calculating loss between the prediction of SR image and gt. However, the code seems to calculate loss between the prediction of HR image and gt.

recognition_loss = weight_cross_entropy(hr_pred, text_gt)

where is chinese_decomposition.txt ?

Great job! Thanks for sharing the code.
I have one question. I found english_decomposition.txt in the Baidu link, but I want to use pretrain-transformer-stroke-decomposition-chinese.pth and no chinese_decomposition.txt is available. Could you please tell me how to use pretrain-transformer-stroke-decomposition-chinese.pth?

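How do I obtain the "Right-shifted Stroke-level label S'gt" for my own data?

@AImpressionist @JingyeChen Hello, first of all thank you very much for sharing such excellent work. I would like to ask two questions:
1. What does the Right-shifted Stroke-level label S'gt in the paper mean? I understand splitting character labels into stroke labels, but how is the right-shifted label obtained?
2. I want to train on my own data. How do I obtain the "Right-shifted Stroke-level label S'gt"? My data is for super-resolution only and has no character recognition labels. Looking forward to your reply, thank you.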
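
In sequence-to-sequence decoders trained with teacher forcing, a "right-shifted" target usually means the ground-truth sequence with a start token prepended and the last element dropped, so that the decoder sees position t-1 when predicting position t. Whether the paper follows exactly this convention is an assumption here. The function right_shift, the start token value 0, and the stroke ids below are all hypothetical.

```python
def right_shift(labels, start_token=0):
    """Prepend a start token and drop the last element (decoder input for teacher forcing)."""
    return [start_token] + list(labels[:-1])


stroke_labels = [3, 1, 4, 2, 9]   # hypothetical stroke ids; 9 is an assumed end token
decoder_input = right_shift(stroke_labels)
print(decoder_input)
```

This only shows how a shifted sequence is derived from an existing stroke-level label. It does not address data that has no text annotation at all.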

Checkpoint experiment directory is cleaned (all models destroyed) upon testing

Firstly there isn't full usage guidance for this so I'm making some assumptions based on the TextZoom repo this was adapted from, please advise if I'm better off using it differently!

To train the model I ran:

python main.py --batch_size=32 --STN --mask --exp_name louis --text_focus

which successfully put checkpoint.pth and model_best.pth in checkpoints/louis/, however when I went to test these it printed

Clean the old checkpoint louis

and then complained:

FileNotFoundError: [Errno 2] No such file or directory: './checkpoint/louis/checkpoint.pth'

...so the saved model state was destroyed by trying to test it... Am I calling it wrong somehow? I suspect this should only be done when training (not testing) a model of the same experiment name as a pre-existing one.
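
The behaviour suspected above, cleaning only when a fresh training run reuses an existing experiment name, could be guarded roughly as follows. This is a sketch under stated assumptions, not the repository's code: the function prepare_checkpoint_dir and its training and resume parameters are hypothetical, and the demonstration runs in a temporary directory.

```python
import os
import shutil
import tempfile


def prepare_checkpoint_dir(path, training, resume=None):
    """Create the experiment directory; wipe it only for a fresh training run."""
    if training and resume is None and os.path.isdir(path):
        print(f"Clean the old checkpoint {os.path.basename(path)}")
        shutil.rmtree(path)
    os.makedirs(path, exist_ok=True)
    return path


# Demonstration in a throwaway directory.
root = tempfile.mkdtemp()
exp_dir = os.path.join(root, "louis")
os.makedirs(exp_dir)
checkpoint = os.path.join(exp_dir, "checkpoint.pth")
with open(checkpoint, "w") as handle:
    handle.write("weights")

prepare_checkpoint_dir(exp_dir, training=False)      # testing: files must survive
survived_test = os.path.exists(checkpoint)
prepare_checkpoint_dir(exp_dir, training=True)       # fresh training: directory is reset
survived_train = os.path.exists(checkpoint)
shutil.rmtree(root)
```

Until something like this is in place, copying the checkpoint files out of the experiment directory before testing avoids losing them.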

pre-trained model pth

Hi, I want to download your two model_best.pth files,
but I can't because I don't have a Baidu ID and password.
Could you upload the two model_best.pth files to Google Drive?

So far I have only been able to download pretrain_trasformer_stroke_decomposition.pth.

Thank you!

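Could you provide the segmentation masks needed for TR-PSNR/TR-SSIM?

Hello, the custom metrics TR-PSNR/TR-SSIM proposed in your paper require segmentation masks of the text regions, but the downloaded data does not seem to include them. Could you provide the masks used to compute the metrics, or the specific training parameters of the UNet that produces the masks? (Neither the paper nor the supplementary material fully describes the training environment and parameters.)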
