yizt / crnn.pytorch


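A CRNN implementation for recognizing horizontal and vertical Chinese text. It provides pretrained horizontal and vertical recognition models trained on more than 30,000 Chinese characters. You are welcome to follow the project, try it out, and report problems.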

License: Apache License 2.0

Python 100.00%

ocr crnn vertical-text-recognition text-recognition


crnn.pytorch's Issues

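Question (2)

Sorry, I'm back with another question.

Is the second 125 computed from a formula? That is, is it the length that a width of 512 shrinks to after the convolutions, or is it a coincidence that both values are 125?

My other question: can this 125 be a different number? If I change it, do the two places still have to stay consistent (for example, reducing the maximum length of 512 and then computing a new value from it)?

I'm asking because I want to reimplement your PyTorch implementation in MXNet, but MXNet's CTCLoss expects different input arguments.
I tried for a while and couldn't work out how to adapt it. The loss I get is very strange, either extremely small or extremely large. Everything else seems about the same; the difference is in how this loss is applied.
So I wanted to understand your implementation better and ran into the questions above. I hope to hear back from you. Thanks.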
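For anyone checking the same relationship, the number of time steps can be derived from the input width with the usual convolution output-length arithmetic. The sketch below is only an illustration: the layer list is hypothetical and is not the repository's actual network, so it will not reproduce the 125 discussed above. The point is that the input length handed to the CTC loss should agree with the number of time steps the network really produces for that width.

```python
def conv_out_len(n, kernel, stride=1, padding=0):
    """Output length of a conv/pool layer along one axis (floor division)."""
    return (n + 2 * padding - kernel) // stride + 1


def sequence_length(width, layers):
    """Apply each (kernel, stride, padding) layer in turn to the width axis."""
    for kernel, stride, padding in layers:
        width = conv_out_len(width, kernel, stride, padding)
    return width


# Hypothetical stack for illustration only, NOT the repository's network.
HYPOTHETICAL_LAYERS = [
    (3, 1, 1),  # 3-wide conv with padding 1 keeps the width
    (2, 2, 0),  # 2-wide pooling with stride 2 halves the width
    (3, 1, 1),
    (2, 2, 0),
]

steps = sequence_length(512, HYPOTHETICAL_LAYERS)
print(steps)
```

If the maximum width is changed, recomputing the value this way (with the real layer parameters) keeps the two places consistent.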

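I ran into a problem after using fontutils.py to take the union of the characters in my fonts

This gave me a file similar to your word.txt, but when looking up class indices with idx = [chars[c] for c in text], I get a KeyError for digits. I looked into it and found that the digits in my saved word.txt are Windows-1252 encoded, while my system uses UTF-8, which is why this happens. Have you run into this?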
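One way to rule out platform-dependent defaults is to pass the encoding explicitly whenever the dictionary file is written or read. The sketch below assumes one character per line, which may not match the real word.txt layout, and `load_chars` is a hypothetical helper rather than a function from the repository. Since ASCII digits are byte-identical in Windows-1252 and UTF-8, it may also be worth checking for a byte-order mark or full-width digits; reading with `utf-8-sig` strips a BOM if one is present.

```python
import os
import tempfile


def load_chars(path, encoding="utf-8-sig"):
    """Read one character per line and map each character to its index."""
    with open(path, "r", encoding=encoding) as f:
        chars = [line.rstrip("\n") for line in f]
    chars = [c for c in chars if c]  # skip empty lines
    return {c: i for i, c in enumerate(chars)}


# Demo: write a small dictionary file with a BOM, then read it back.
path = os.path.join(tempfile.mkdtemp(), "word.txt")
with open(path, "w", encoding="utf-8-sig") as f:
    f.write("0\n1\n中\n文\n")

chars = load_chars(path)
idx = [chars[c] for c in "1中0"]
print(idx)
```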

I tried training on CPU and got an error. How can I fix it? WeChat: nlanguage. py -3 train.py --direction horizontal

L:\trocr\crnn.pytorch-master>py -3 train.py --direction horizontal
Namespace(batch_size=64, device='cpu', direction='horizontal', dist_backend='nccl', dist_url='env://', distributed=False, epochs=90, init_epoch=0, local_rank=0, lr=0.01, lr_gamma=0.1, lr_step_size=30, momentum=0.9, output_dir='./output', sync_bn=False, weight_decay=1e-05, workers=4, world_size=1)
0%| | 0/47902 [00:02<?, ?it/s]
Traceback (most recent call last):
File "", line 1, in
Traceback (most recent call last):
File "train.py", line 192, in
train(arguments)
File "train.py", line 138, in train
loss = train_one_epoch(model, criterion, optimizer, data_loader, device, epoch, args)
File "train.py", line 65, in train_one_epoch
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 116, in spawn_main
for image, target, input_len, target_len in tqdm(data_loader):
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\site-packages\tqdm\std.py", line 1165, in iter
exitcode = _main(fd, parent_sentinel)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
for obj in iterable:
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 359, in iter
return self._get_iterator()
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 305, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 918, in init
w.start()
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\context.py", line 327, in _Popen
return Popen(process_obj)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\popen_spawn_win32.py", line 93, in init
reduction.dump(process_obj, to_child)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'Font' object
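The traceback shows the failure happening while a DataLoader worker process is started: on Windows, worker processes are spawned, so the dataset object is pickled and sent to each worker, and the font object it holds cannot be pickled. Loading data in the main process (zero workers, if the script exposes the `workers` value shown in the Namespace as an option) avoids the pickling step. Another option is to keep only the font path in the object and create the font lazily. The sketch below illustrates that pattern with a stand-in: `LazyResourceDataset` is hypothetical, and a `threading.Lock` plays the role of the unpicklable font.

```python
import pickle
import threading


class LazyResourceDataset:
    """Stand-in dataset that holds an unpicklable resource, created on demand."""

    def __init__(self, font_path):
        self.font_path = font_path  # plain string, safe to pickle
        self._font = None           # created on first access

    def _load(self):
        # Placeholder for the real loading call (e.g. building a font object
        # from self.font_path); a lock is used here because it cannot be pickled.
        return threading.Lock()

    @property
    def font(self):
        if self._font is None:
            self._font = self._load()
        return self._font

    def __getstate__(self):
        # Drop the unpicklable handle; each process rebuilds it lazily.
        state = self.__dict__.copy()
        state["_font"] = None
        return state


ds = LazyResourceDataset("fonts/example.ttf")  # hypothetical path
_ = ds.font                                     # resource now exists in this process
clone = pickle.loads(pickle.dumps(ds))          # succeeds despite the lock
print(clone.font_path)
```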

What is the difference between the three txt dictionary files under data?

How could I use it for Latin characters?

Is there any way to make the model predict Latin characters?

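Recognition of printed, digit-only text is not very good. Is there a fix?

After binarizing printed-text images, digit recognition is not very good. Is there a way to improve it?

Can horizontal and vertical text in the same picture be detected at the same time?

Some characters in the character set, such as \u3000, cannot be displayed in Python and raise an error. How can I remove them? WeChat: nlanguage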

File "F:\pycharm2020.2\crnn\utils\aftertreatment.py", line 26, in
text = [self.dict[char] for [char] in text]
KeyError: ' '
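A possible way to handle this is to clean the character set once and to skip unknown characters when encoding. The sketch below assumes whitespace characters (including U+3000, the ideographic space) are not wanted as classes; if spaces should be recognized, they would need to stay in the dictionary instead. `clean_charset` and `encode` are hypothetical helpers, not functions from the repository.

```python
def clean_charset(chars):
    """Drop whitespace and non-printable characters and duplicates, keeping order."""
    seen = set()
    kept = []
    for c in chars:
        if c.isspace() or not c.isprintable():
            continue  # e.g. ' ' or '\u3000'
        if c not in seen:
            seen.add(c)
            kept.append(c)
    return kept


def encode(text, char_to_idx):
    """Map characters to indices, silently skipping characters not in the dict."""
    return [char_to_idx[c] for c in text if c in char_to_idx]


raw = ["中", "\u3000", "文", " ", "中", "a"]
cleaned = clean_charset(raw)
char_to_idx = {c: i for i, c in enumerate(cleaned)}
print(cleaned, encode("中 文", char_to_idx))
```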

How did you solve the problem of some fonts failing to render?

How can semantic information between characters be learned when training on a randomly generated dataset?

Hello, I have a question. The model here is trained on a randomly generated dataset: every sentence is random, and the characters within a sentence are unrelated to each other. Does the network make use of semantic information between characters? If it does not, is this purely a classification problem?

I put 10 Chinese characters in all words.txt and tried to run generator.py, which raised the following error. How can I fix it? WeChat: nlanguage

I put 10 Chinese characters in all words.txt and tried to run generator.py, which raised the following error. How can I fix it? WeChat: nlanguage
Traceback (most recent call last):
File "F:/pycharm2020.2/crnn.pytorch_generator/generator.py", line 227, in
test_image_gen('horizontal')
File "F:/pycharm2020.2/crnn.pytorch_generator/generator.py", line 207, in test_image_gen
im, indices, target_len = gen.gen_image()
File "F:/pycharm2020.2/crnn.pytorch_generator/generator.py", line 158, in gen_image
text = np.random.choice(FONT_CHARS_DICT[font_path], target_len)
File "mtrand.pyx", line 908, in numpy.random.mtrand.RandomState.choice
ValueError: 'a' cannot be empty unless no samples are taken

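Can I modify the output layer of the pretrained model and then do transfer learning?

Hi author, 30,000+ output classes is a bit much; I think the commonly used characters would be enough for me. Is it better to add an extra fully connected layer and train that, or to modify the pretrained model's fully connected layer and train it?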

about trained model

Hello, I did not find your training data, can you share the trained model?

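Question

I changed what all_word of the Word object returns. My target characters are only English letters and digits, plus a few simple symbols such as comma, space, and minus. After training for 200 epochs,
I used the model for prediction and found that, whatever image I recognize (leaving aside whether the final result is correct), the resulting label has many 0s before and after each character.
For example:
my image contains hello world
and the recognized result is something like h0e0l0l0o0 0w0o0r0l0d0.
I then changed what all_word returns so that the space is the first character:  -, 123456... (originally 0123456...abc...)
The model trained after that behaves similarly, except that the 0s become spaces:
hello world becomes
h e l l o w o r l d
Why does this happen, and why does it depend on the first character of all_word?
Screenshot of the current code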
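The symptom described here is consistent with the CTC blank being printed instead of removed. CTC reserves one class index for the blank (torch.nn.CTCLoss uses index 0 by default), so if the first entry of the alphabet sits at index 0, every blank frame decodes to that character. Whether that is what happens in this code is an assumption; the sketch below only shows the usual greedy decoding rule, with a made-up alphabet whose index 0 is a placeholder for the blank.

```python
from itertools import groupby


def ctc_greedy_decode(indices, alphabet, blank=0):
    """Collapse consecutive repeats, then drop the blank index."""
    collapsed = [k for k, _ in groupby(indices)]
    return "".join(alphabet[i] for i in collapsed if i != blank)


# Index 0 is reserved for the blank; real characters start at index 1.
alphabet = ["<blank>"] + list("helo wrd")

# Per-frame argmax for "hello": the blank between the two l's keeps them apart.
frames = [1, 1, 0, 2, 0, 3, 3, 0, 3, 0, 4]
print(ctc_greedy_decode(frames, alphabet))
```

Under this convention the label indices used for training also start at 1, so that no real character shares the blank's index.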

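Another question: how can I add evaluation on a validation set during training, along with metrics such as accuracy (acc)? (Right now there is only a single loss, with no distinction between training loss and validation loss.)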
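As a rough sketch of what such metrics could look like, the functions below compute exact-match sequence accuracy and a character error rate from decoded predictions and ground-truth strings. They are hypothetical helpers, not part of the repository, and they assume the model outputs have already been decoded to text on a held-out validation set.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def evaluate(preds, targets):
    """Return sequence accuracy and character error rate over a validation set."""
    exact = sum(p == t for p, t in zip(preds, targets))
    dist = sum(edit_distance(p, t) for p, t in zip(preds, targets))
    total_chars = sum(len(t) for t in targets)
    return {"seq_acc": exact / len(targets), "cer": dist / total_chars}


metrics = evaluate(["hello", "wor1d"], ["hello", "world"])
print(metrics)
```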

I hope to hear back from you. Many thanks!

How to use my own training samples

Hello, I have a training set of my own
image format:
demo_0.jpg
样本_1.jpg
...

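Thanks for sharing the approach

How is the training data generated?

Hello author, how did you generate your training data? Could you share the data generation code?

What accuracy can you roughly reach on the validation set?

Hi, thank you for the great code. But I have run into some problems. I just downloaded the code and ran it in distributed training mode,

How do I train on my own dataset, and how should the data format and storage directory be configured?

As in the title.

What function can I use to detect Traditional Chinese characters?

This can't be converted to ONNX for testing, can it?

This can't be converted to ONNX, can it?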

Training fails after replacing your all_word.txt file with my own dictionary

indices = np.array([self.alpha.index(c) for c in text])
ValueError: substring not found
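`str.index` raises "substring not found" when the character is not in the string, so this error suggests that the text being encoded contains at least one character missing from the alphabet. A small check like the one below can report which characters are missing before training starts. `missing_chars` and `encode_strict` are hypothetical helpers, not functions from the repository.

```python
def missing_chars(text, alphabet):
    """Characters that occur in text but not in the alphabet, sorted."""
    return sorted(set(text) - set(alphabet))


def encode_strict(text, alphabet):
    """Encode text to indices, failing with a readable message on unknown characters."""
    unknown = missing_chars(text, alphabet)
    if unknown:
        raise ValueError(f"characters not in alphabet: {unknown!r}")
    lookup = {c: i for i, c in enumerate(alphabet)}
    return [lookup[c] for c in text]


alphabet = "abc"
print(missing_chars("abcx", alphabet))
print(encode_strict("cab", alphabet))
```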

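No permission for the download directory

Hello, this is the error shown when downloading from Baidu Cloud. Has the link expired?

Single-GPU and multi-GPU setup question

A question: does eth0 here refer to the IP of the first network card?

What's your training environment?

For example, the PyTorch version?
You can use conda to export your Python env.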

number of dims don't match in permute

Hi author,
the error is as follows:
90 x = self.cnn(x) # [B,512,W/16,1]
91 x = torch.squeeze(x, 3) # [B,512,W]
---> 92 x = x.permute([0, 2, 1]) # [B,W,512]
93 x, h1 = self.rnn1(x)
94 x, h2 = self.rnn2(x, h1)

RuntimeError: number of dims don't match in permute

Is it because the images cropped by my upstream CTPN program are too thin? A somewhat larger image works.

Image sizes: (2581, 276, 3)
(2580, 283, 3)
(2545, 257, 3)
(2058, 321, 3)
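One possible cause: `torch.squeeze(x, 3)` only removes dimension 3 when its size is exactly 1, so if the CNN output does not collapse to 1 along that axis the tensor stays 4-D and the 3-index permute fails. Resizing each crop so that the axis across the text line has the fixed size the network expects would avoid that. The sketch below only computes the target size; the value 32 is an assumption, not a number confirmed by the repository, and `target_size` is a hypothetical helper.

```python
def target_size(height, width, fixed=32, direction="horizontal"):
    """Return (new_height, new_width) with the axis across the text scaled to `fixed`."""
    if direction == "horizontal":
        # Horizontal text: fix the height, scale the width proportionally.
        scale = fixed / height
        return fixed, max(1, round(width * scale))
    # Vertical text: fix the width, scale the height proportionally.
    scale = fixed / width
    return max(1, round(height * scale)), fixed


# First crop from the report above, treated as (height, width) of vertical text.
print(target_size(2581, 276, direction="vertical"))
```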

Running train.py directly on Windows 10 raises an error. WeChat: nlanguage ForkingPickler(file, protocol).dump(obj) TypeError: cannot pickle 'Font' object

Running train.py directly on Windows 10 raises an error. WeChat: nlanguage

C:\Users\Ni\AppData\Local\Programs\Python\Python38\python.exe F:/pycharm2020.2/crnn.pytorch_generator/train_Sentence.py
Namespace(batch_size=32, device='cuda', direction='horizontal', dist_backend='nccl', dist_url='env://', distributed=False, epochs=1, init_epoch=0, local_rank=0, lr=0.01, lr_gamma=0.1, lr_step_size=30, momentum=0.9, output_dir='./output', sync_bn=False, weight_decay=1e-05, workers=4, world_size=1)
0%| | 0/95804 [00:47<?, ?it/s]
Traceback (most recent call last):
File "", line 1, in
Traceback (most recent call last):
File "F:/pycharm2020.2/crnn.pytorch_generator/train_Sentence.py", line 195, in
train(arguments)
File "F:/pycharm2020.2/crnn.pytorch_generator/train_Sentence.py", line 138, in train
loss = train_one_epoch(model, criterion, optimizer, data_loader, device, epoch, args)
File "F:/pycharm2020.2/crnn.pytorch_generator/train_Sentence.py", line 65, in train_one_epoch
for image, target, input_len, target_len in tqdm(data_loader):
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\site-packages\tqdm\std.py", line 1165, in iter
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 126, in _main
for obj in iterable:
self = reduction.pickle.load(from_parent)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 359, in iter
EOFError: Ran out of input
return self._get_iterator()
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 305, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 918, in init
w.start()
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\context.py", line 327, in _Popen
return Popen(process_obj)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\popen_spawn_win32.py", line 93, in init
reduction.dump(process_obj, to_child)
File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'Font' object

Process finished with exit code 1

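Loss stays at inf when continuing training from the pretrained model?

Has anyone continued training from the author's model? Why is my loss always inf after training? Lowering the learning rate didn't help either.

training data generation

Hi, great code. Thanks for sharing. I have a question about something I noticed during training.
It concerns the data generation code, at line 180 of generator.py, where random text is generated. The logic there loads characters from all of the font files rather than using the dictionary passed in when the Generator is initialized (self.alpha). This may make it impossible to switch to a different character set.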

def gen_image(self):
    idx = np.random.randint(len(self.max_len_list))
    image = self.gen_background()
    image = image.astype(np.uint8)
    target_len = int(np.random.uniform(self.min_len, self.max_len_list[idx], size=1))

    # randomly choose a size and a font
    size_idx = np.random.randint(len(self.font_size_list))
    font_idx = np.random.randint(len(self.font_path_list))
    font = self.font_list[size_idx][font_idx]
    font_path = self.font_path_list[font_idx]
    # randomly choose target_len characters from the visible characters of the selected font
    text = np.random.choice(FONT_CHARS_DICT[font_path], target_len)
    text = ''.join(text)

I am considering using:
text = random.choices(self.alpha[1:], k=target_len)

to replace

text = np.random.choice(FONT_CHARS_DICT[font_path], target_len)

But I don't know whether some characters might be missing from the font file.
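A middle ground between the two lines would be to sample from the characters that are both in the chosen alphabet and drawable by the selected font, and to fail clearly when that set is empty (sampling from an empty list is what triggers errors such as "'a' cannot be empty"). The sketch below is a standalone illustration: `sample_text` is hypothetical, the per-font character list stands in for FONT_CHARS_DICT[font_path], and skipping index 0 of the alphabet follows the `self.alpha[1:]` slice above, on the assumption that index 0 is reserved.

```python
import random


def sample_text(alphabet, font_chars, length, rng=random):
    """Sample `length` characters that are in the alphabet and supported by the font."""
    supported = set(font_chars)
    candidates = [c for c in alphabet if c in supported]
    if not candidates:
        raise ValueError("font supports none of the alphabet characters")
    return "".join(rng.choices(candidates, k=length))


alphabet = " 0123456789中文"          # index 0 assumed to be reserved
font_chars = list("0123456789abc")    # stand-in for one font's visible characters
text = sample_text(alphabet[1:], font_chars, 8, random.Random(0))
print(text)
```

With this approach a font that lacks part of the alphabet still contributes samples, just without the characters it cannot draw.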