yizt / crnn.pytorch

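A CRNN implementation for horizontal and vertical Chinese text recognition, with pretrained horizontal and vertical models trained on more than 30,000 Chinese characters. Feel free to follow, try it out, and report issues.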

License: Apache License 2.0

Python 100.00%
ocr crnn vertical-text-recognition text-recognition

crnn.pytorch's Issues

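Question, part 2

Sorry, I'm back with more questions.
image
image

Is the 125 in the second formula a computed value, that is, the length left after the convolutions shrink the 512-pixel width? Or is it a coincidence that both are 125?

My other question: can this 125 be a different number? If it is changed, do the two places still have to agree (for example, by reducing the maximum length of 512 and then computing a new value from it)?

I'm asking because I want to reimplement your PyTorch code in MXNet, but MXNet's CTCLoss expects different input arguments.
I tuned it for a while without working out what to change, and the loss I get is very strange: either extremely small or extremely large. Everything else seems about the same; only the way the loss is applied differs.
So I wanted to understand your implementation better, which led to the questions above. I hope you can reply. Thanks.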
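
Whether 125 is derived or coincidental can be checked with the standard output-width formula for convolution and pooling layers, floor((W + 2p - k) / s) + 1, applied layer by layer. The sketch below uses a hypothetical layer stack (not the repository's actual network) that happens to map a width of 512 to 125; the real kernel, stride, and padding values would have to be read from the model. If the input length given to the CTC loss is the CNN's output width, then changing the maximum image width means recomputing that value and updating both places together.

```python
def conv_out_width(width, kernel, stride=1, padding=0):
    """Output width of one conv or pooling layer along the width axis."""
    return (width + 2 * padding - kernel) // stride + 1


# Hypothetical (kernel, stride, padding) triples along the width axis.
# These are illustrative only and are NOT taken from crnn.pytorch.
HYPOTHETICAL_LAYERS = [
    (3, 1, 1),  # 3-wide conv, "same" padding: 512 -> 512
    (2, 2, 0),  # 2-wide max pool, stride 2:   512 -> 256
    (3, 1, 1),  # 3-wide conv, "same" padding: 256 -> 256
    (2, 2, 0),  # 2-wide max pool, stride 2:   256 -> 128
    (4, 1, 0),  # 4-wide conv, no padding:     128 -> 125
]


def sequence_length(width, layers=HYPOTHETICAL_LAYERS):
    """Width of the feature map (the CTC time dimension) for an input width."""
    for kernel, stride, padding in layers:
        width = conv_out_width(width, kernel, stride, padding)
    return width


print(sequence_length(512))
print(sequence_length(256))
```

With this stack, halving the maximum width does not simply halve the sequence length, which is why the value is better recomputed than guessed.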

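I ran into a problem when using fontutils.py to take the union over my fonts

That gave me a file similar to your word.txt, but when looking up classes with idx = [chars[c] for c in text] I got a KeyError for digits. On checking, the digits in my saved word.txt are Windows-1252 encoded, while my system uses UTF-8 throughout, which is why this happens. Have you run into this?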
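
A small diagnostic can show exactly which characters fail the lookup. ASCII digits have identical bytes in Windows-1252 and UTF-8, so a KeyError on digits may point to something else, such as full-width digits or a byte-order mark at the start of the file. The sketch below reads a character file with an explicit encoding and lists the characters of a string that are missing from the dictionary. The function names are illustrative and not part of the repository.

```python
import unicodedata


def load_chars(path, encoding="utf-8"):
    """Read a character file with an explicit encoding and map char -> index."""
    with open(path, encoding=encoding) as f:
        text = f.read()
    # Drop line breaks and a possible byte-order mark left by some editors.
    text = text.replace("\n", "").replace("\r", "").lstrip("\ufeff")
    return {c: i for i, c in enumerate(text)}


def missing_chars(text, chars):
    """List (char, code point, Unicode name) for characters not in the dictionary."""
    return [(c, f"U+{ord(c):04X}", unicodedata.name(c, "UNKNOWN"))
            for c in text if c not in chars]


# In-memory demo: the second character is a full-width zero, not ASCII "0".
chars = {c: i for i, c in enumerate("0123456789")}
print(missing_chars("2０21", chars))
```

Writing word.txt with `open(path, "w", encoding="utf-8")` as well keeps the file independent of the platform's default encoding.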

I tried training on the CPU and got an error. How can I fix it? WeChat: nlanguage. py -3 train.py --direction horizontal

L:\trocr\crnn.pytorch-master>py -3 train.py --direction horizontal
Namespace(batch_size=64, device='cpu', direction='horizontal', dist_backend='nccl', dist_url='env://', distributed=False, epochs=90, init_epoch=0, local_rank=0, lr=0.01, lr_gamma=0.1, lr_step_size=30, momentum=0.9, output_dir='./output', sync_bn=False, weight_decay=1e-05, workers=4, world_size=1)
0%| | 0/47902 [00:02<?, ?it/s]
Traceback (most recent call last):
File "", line 1, in
Traceback (most recent call last):
File "train.py", line 192, in
train(arguments)
File "train.py", line 138, in train
loss = train_one_epoch(model, criterion, optimizer, data_loader, device, epoch, args)
File "train.py", line 65, in train_one_epoch
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 116, in spawn_main
for image, target, input_len, target_len in tqdm(data_loader):
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\site-packages\tqdm\std.py", line 1165, in iter
exitcode = _main(fd, parent_sentinel)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
for obj in iterable:
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 359, in iter
return self._get_iterator()
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 305, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 918, in init
w.start()
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\context.py", line 327, in _Popen
return Popen(process_obj)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\popen_spawn_win32.py", line 93, in init
reduction.dump(process_obj, to_child)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'Font' object
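
The final TypeError suggests that, with workers=4, the DataLoader pickles the dataset for each worker process (Windows starts workers with spawn), and the dataset holds font objects that cannot be pickled. Running with zero workers is the usual first thing to try, assuming the script exposes the workers value shown in the Namespace as a command-line option. An alternative is to keep the unpicklable objects out of the pickled state and build them lazily inside each process. The sketch below shows that pattern with a stand-in resource (a `threading.Lock`, which is also unpicklable). The class and `load_resource` are hypothetical, not part of crnn.pytorch.

```python
import pickle
import threading


def load_resource(path):
    # Stand-in for loading a font. A Lock is used only because it,
    # like the 'Font' object in the traceback, cannot be pickled.
    return threading.Lock()


class LazyResourceDataset:
    def __init__(self, paths):
        self.paths = list(paths)   # plain strings pickle fine
        self._resources = None     # built on first use, once per process

    @property
    def resources(self):
        if self._resources is None:
            self._resources = [load_resource(p) for p in self.paths]
        return self._resources

    def __getstate__(self):
        # Never send already-loaded resources to another process.
        state = self.__dict__.copy()
        state["_resources"] = None
        return state

    def __len__(self):
        return len(self.paths)


dataset = LazyResourceDataset(["a.ttf", "b.ttf"])
_ = dataset.resources                          # loaded in the parent process
clone = pickle.loads(pickle.dumps(dataset))    # roughly what spawn does per worker
print(len(clone.resources))                    # rebuilt lazily in the copy
```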

I put 10 Chinese characters in all words.txt and tried running generator.py, which raised the error below. How can I fix it? WeChat: nlanguage
Traceback (most recent call last):
File "F:/pycharm2020.2/crnn.pytorch_generator/generator.py", line 227, in
test_image_gen('horizontal')
File "F:/pycharm2020.2/crnn.pytorch_generator/generator.py", line 207, in test_image_gen
im, indices, target_len = gen.gen_image()
File "F:/pycharm2020.2/crnn.pytorch_generator/generator.py", line 158, in gen_image
text = np.random.choice(FONT_CHARS_DICT[font_path], target_len)
File "mtrand.pyx", line 908, in numpy.random.mtrand.RandomState.choice
ValueError: 'a' cannot be empty unless no samples are taken
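
This error is raised when the population passed to `np.random.choice` is empty, which here would mean `FONT_CHARS_DICT[font_path]` holds no characters for the chosen font. One plausible cause, assuming the dictionary stores the overlap between each font's glyphs and the characters in the word list, is that some font supports none of the 10 characters. A minimal sketch of a guard that drops such fonts before sampling follows. The dictionary contents and function names are hypothetical, and the standard `random` module stands in for NumPy.

```python
import random


def usable_fonts(font_chars):
    """Keep only fonts that can render at least one target character."""
    return {path: chars for path, chars in font_chars.items() if chars}


def sample_text(font_chars, length, rng=random):
    """Pick a usable font, then sample `length` characters it supports."""
    fonts = usable_fonts(font_chars)
    if not fonts:
        raise ValueError("no font supports any character in the word list")
    font_path = rng.choice(sorted(fonts))
    text = "".join(rng.choices(fonts[font_path], k=length))
    return font_path, text


# Hypothetical data: the second font covers none of the target characters.
font_chars = {"fonts/a.ttf": ["你", "好", "世", "界"], "fonts/b.ttf": []}
print(sample_text(font_chars, 5))
```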

about trained model

Hello, I did not find your training data, can you share the trained model?

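A question

I changed what all_word returns in the Word object. My target characters are only English letters and digits, plus a few simple symbols such as comma, space, and minus. After training for 200 epochs,
I used the model for prediction and found that, whatever image I recognize (leaving aside whether the final result is correct), the predicted label has many 0s before and after each character.
For example:
my image contains hello world
and the recognized result looks like h0e0l0l0o0 0w0o0r0l0d0.
I then changed what all_word returns so that a space is the first character: -, 123456... (previously 0123456...abc...)
The retrained model behaves the same way, except that the 0s became spaces.
hello world turns into
h e l l o w o r l d
Why does this happen, and why does it depend on the first character of all_word?
Screenshot of the current code:
image

Another question: how can I add evaluation on a validation set during training, along with metrics such as accuracy? (At the moment there is only a single loss, with no distinction between training loss and validation loss.)

I hope you can reply. Many thanks!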
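
The pattern h0e0l0l0o0 is what raw CTC output looks like when the blank class is printed as a character. PyTorch's `nn.CTCLoss` uses index 0 as the blank by default, so whatever sits first in the alphabet is treated as the blank, which would explain why the filler changed from 0 to a space. Assuming that is the cause, the first alphabet slot should hold a placeholder rather than a real character, and decoding should collapse repeats and drop blanks. Below is a minimal sketch of greedy decoding, plus an exact-match accuracy that could be computed on a validation set. The alphabet and function names are illustrative.

```python
BLANK = 0  # default blank index of nn.CTCLoss


def ctc_greedy_decode(indices, alphabet, blank=BLANK):
    """Collapse consecutive repeats, then drop blanks."""
    chars = []
    previous = blank
    for idx in indices:
        if idx != previous and idx != blank:
            chars.append(alphabet[idx])
        previous = idx
    return "".join(chars)


def sequence_accuracy(predictions, targets):
    """Fraction of samples whose decoded text matches the label exactly."""
    if not targets:
        return 0.0
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Slot 0 is a placeholder for the blank; real characters start at index 1.
alphabet = ["<blank>"] + list("0123456789abcdefghijklmnopqrstuvwxyz ,-")

# Per-time-step argmax for "hello": a blank separates the two l's.
steps = [alphabet.index(c) if c else BLANK
         for c in ["h", "", "e", "", "l", "l", "", "l", "", "o"]]
print(ctc_greedy_decode(steps, alphabet))
```

Running the same decode over a held-out set after each epoch and passing the results to `sequence_accuracy` gives a validation metric separate from the training loss.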

number of dims don't match in permute

Hi,
I get the following error:
90 x = self.cnn(x) # [B,512,W/16,1]
91 x = torch.squeeze(x, 3) # [B,512,W]
---> 92 x = x.permute([0, 2, 1]) # [B,W,512]
93 x, h1 = self.rnn1(x)
94 x, h2 = self.rnn2(x, h1)

RuntimeError: number of dims don't match in permute

Is it because the images cropped by my earlier CTPN step are too thin? A slightly larger image works.

Image sizes: (2581, 276, 3)
(2580, 283, 3)
(2545, 257, 3)
(2058, 321, 3)
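
`torch.squeeze(x, 3)` removes dimension 3 only when its size is 1. If that dimension is still larger than 1 after the CNN, the tensor stays 4-D and a three-axis `permute` fails with this message. One explanation consistent with the error is that the images are not resized to the short-side length the network expects before inference. The sketch below computes a resize target that fixes the short side while keeping the aspect ratio. The value 32 is an assumption about what the model expects and should be checked against the repository.

```python
def resize_target(height, width, short_side=32):
    """Return (new_height, new_width) with the short side scaled to `short_side`."""
    if height <= 0 or width <= 0:
        raise ValueError("image dimensions must be positive")
    scale = short_side / min(height, width)
    new_h = max(short_side, round(height * scale))
    new_w = max(short_side, round(width * scale))
    return new_h, new_w


# Shapes reported in the issue, read here as (height, width, channels).
# The function treats the two spatial axes symmetrically, so the order
# does not change which side ends up at 32.
for h, w, _ in [(2581, 276, 3), (2580, 283, 3), (2545, 257, 3), (2058, 321, 3)]:
    print((h, w), "->", resize_target(h, w))
```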

Running train.py directly on Windows 10 fails. WeChat: nlanguage. ForkingPickler(file, protocol).dump(obj) TypeError: cannot pickle 'Font' object

C:\Users\Ni\AppData\Local\Programs\Python\Python38\python.exe F:/pycharm2020.2/crnn.pytorch_generator/train_Sentence.py
Namespace(batch_size=32, device='cuda', direction='horizontal', dist_backend='nccl', dist_url='env://', distributed=False, epochs=1, init_epoch=0, local_rank=0, lr=0.01, lr_gamma=0.1, lr_step_size=30, momentum=0.9, output_dir='./output', sync_bn=False, weight_decay=1e-05, workers=4, world_size=1)
0%| | 0/95804 [00:47<?, ?it/s]
Traceback (most recent call last):
File "", line 1, in
Traceback (most recent call last):
File "F:/pycharm2020.2/crnn.pytorch_generator/train_Sentence.py", line 195, in
train(arguments)
File "F:/pycharm2020.2/crnn.pytorch_generator/train_Sentence.py", line 138, in train
loss = train_one_epoch(model, criterion, optimizer, data_loader, device, epoch, args)
File "F:/pycharm2020.2/crnn.pytorch_generator/train_Sentence.py", line 65, in train_one_epoch
for image, target, input_len, target_len in tqdm(data_loader):
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\site-packages\tqdm\std.py", line 1165, in iter
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 126, in _main
for obj in iterable:
self = reduction.pickle.load(from_parent)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 359, in iter
EOFError: Ran out of input
return self._get_iterator()
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 305, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 918, in init
w.start()
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\context.py", line 327, in _Popen
return Popen(process_obj)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\popen_spawn_win32.py", line 93, in init
reduction.dump(process_obj, to_child)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'Font' object

Process finished with exit code 1

training data generation

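Hi, great code. Thanks for sharing. While training, I ran into something I have a question about.
In the data generation code, at line 180 of generator.py, text has to be generated randomly. But the logic there loads characters from all the font files rather than using the dictionary passed in when the Generator is initialized (self.alpha). This could make it impossible to switch to a different character set.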

`

def gen_image(self):
    idx = np.random.randint(len(self.max_len_list))
    image = self.gen_background()
    image = image.astype(np.uint8)
    target_len = int(np.random.uniform(self.min_len, self.max_len_list[idx], size=1))

    # Randomly choose a size and a font
    size_idx = np.random.randint(len(self.font_size_list))
    font_idx = np.random.randint(len(self.font_path_list))
    font = self.font_list[size_idx][font_idx]
    font_path = self.font_path_list[font_idx]
    # Randomly choose target_len characters from those the selected font can render
    text = np.random.choice(FONT_CHARS_DICT[font_path], target_len)
    text = ''.join(text)

`

Consider using:
text = random.choices(self.alpha[1:], k=target_len)

instead of

text = np.random.choice(FONT_CHARS_DICT[font_path], target_len)

But I'm not sure whether some characters might then be missing from the font file.
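
One way to address that concern is to sample from the overlap of the generator's alphabet and the characters the chosen font can render, so a custom character set is respected and no missing glyphs are drawn. A minimal sketch, assuming (as the self.alpha[1:] slice suggests) that index 0 of the alphabet is reserved, and using hypothetical data in place of FONT_CHARS_DICT:

```python
import random


def sample_from_alphabet(alpha, font_chars, length, rng=random):
    """Sample `length` characters that are both in the alphabet and in the font."""
    supported = set(font_chars)
    # Skip alpha[0], assumed here to be the reserved (blank) slot.
    candidates = [c for c in alpha[1:] if c in supported]
    if not candidates:
        raise ValueError("font supports no character from the alphabet")
    return "".join(rng.choices(candidates, k=length))


alpha = ["<blank>"] + list("0123456789-")   # hypothetical custom character set
font_chars = list("0123456789abc")          # hypothetical glyphs in one font
print(sample_from_alphabet(alpha, font_chars, 8))
```

In this example the minus sign is in the alphabet but not in the font, so it is never sampled for that font; a font with no overlap at all raises an error instead of producing unrenderable text.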
