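yizt / crnn.pytorch
A CRNN implementation for recognizing horizontal and vertical Chinese text, with pretrained horizontal and vertical recognition models trained on more than 30,000 Chinese characters. Follows, trial use, and issue reports are welcome.
License: Apache License 2.0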
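Was the second 125 in the formula computed, i.e. is it the length that the 512-pixel width shrinks to after the convolutions? Or is it a coincidence that both are 125?
Another question: can this 125 be a different number? If I change it, do the two places have to stay consistent? (For example, reduce the maximum length of 512 and then compute a new value from it.)
I ask because I am trying to reimplement your PyTorch implementation in MXNet, but MXNet's CTCLoss expects different input arguments.
I have been tuning it for a while and cannot work out what to change. The loss I get is very strange, either very small or very large. Everything else seems roughly the same; the difference is in how the loss is applied.
So I wanted to understand your implementation better and ran into the questions above. I hope to get your reply. Thanks.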
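One way to check whether a value like 125 is derived rather than coincidental is to trace the image width through the network's downsampling layers. The sketch below uses the standard convolution/pooling output-size formula. The layer list is hypothetical and chosen only for illustration; it is not taken from this repo. The resulting width is the number of time steps the network emits, and the input length handed to a CTC loss cannot exceed it.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a conv/pool layer along one axis (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1

# Hypothetical width-wise (kernel, stride, padding) for each downsampling layer.
HYPOTHETICAL_LAYERS = [(2, 2, 0), (2, 2, 0)]

def sequence_length(width, layers=HYPOTHETICAL_LAYERS):
    """Width of the feature map after all layers, i.e. the number of time steps."""
    for kernel, stride, padding in layers:
        width = conv_out(width, kernel, stride, padding)
    return width

print(sequence_length(512))  # 128 for two stride-2 poolings
```

Replacing the hypothetical layer list with the real width-wise kernel, stride, and padding of each layer shows which sequence length a given maximum width produces.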
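Then I ended up with a file similar to your word.txt, but when looking up classes with idx = [chars[c] for c in text], I get a KeyError for digits. I later found that the digits in my saved word.txt are Windows-1252 encoded, while my system uses UTF-8 throughout, which causes this. Have you run into this?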
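A possible way to rule out this kind of mismatch, sketched under two assumptions: that the character file is meant to be UTF-8, and that the failing digits are compatibility forms such as full-width digits. The helper names are hypothetical.

```python
import unicodedata

def normalize_text(text):
    """Fold compatibility characters (e.g. full-width digits) to their plain forms."""
    return unicodedata.normalize("NFKC", text)

def load_chars(path):
    """Read the character file as UTF-8 (dropping a BOM if present) and index it."""
    with open(path, encoding="utf-8-sig") as f:
        content = normalize_text(f.read())
    chars = {}
    for c in content:
        # Skip line breaks and duplicates; keep first-seen order.
        if c not in "\r\n" and c not in chars:
            chars[c] = len(chars)
    return chars

def encode(text, chars):
    """Map text to class indices after applying the same normalization."""
    return [chars[c] for c in normalize_text(text)]
```

Applying the same normalization to both the character file and the label text keeps the two sides comparable, whichever form the digits were saved in.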
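Opening a separate issue: do you have any suggestions for outputting prob (the prediction probabilities)?
The model download link does not open.
Is there another link available?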
I tried training on the CPU and got an error. How can I fix it? WeChat: nlanguage. py -3 train.py --direction horizontal
L:\trocr\crnn.pytorch-master>py -3 train.py --direction horizontal
Namespace(batch_size=64, device='cpu', direction='horizontal', dist_backend='nccl', dist_url='env://', distributed=False, epochs=90, init_epoch=0, local_rank=0, lr=0.01, lr_gamma=0.1, lr_step_size=30, momentum=0.9, output_dir='./output', sync_bn=False, weight_decay=1e-05, workers=4, world_size=1)
0%| | 0/47902 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
Traceback (most recent call last):
  File "train.py", line 192, in <module>
    train(arguments)
  File "train.py", line 138, in train
    loss = train_one_epoch(model, criterion, optimizer, data_loader, device, epoch, args)
  File "train.py", line 65, in train_one_epoch
    for image, target, input_len, target_len in tqdm(data_loader):
  File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\site-packages\tqdm\std.py", line 1165, in __iter__
    for obj in iterable:
  File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 359, in __iter__
    return self._get_iterator()
  File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 305, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 918, in __init__
    w.start()
  File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'Font' object
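The `TypeError: cannot pickle 'Font' object` line suggests that, with the spawn start method used on Windows, the DataLoader workers have to pickle the dataset, and the dataset holds font objects that cannot be pickled. Two options that are commonly tried are running the loader with zero workers, or creating the unpicklable objects lazily inside each process. Below is a minimal sketch of the second idea. A thread lock stands in for the font object, and the class and attribute names are hypothetical, not from this repo.

```python
import pickle
import threading

class LazyResourceDataset:
    """Keeps only picklable configuration; builds the heavy resource on first use."""

    def __init__(self, font_path):
        self.font_path = font_path   # plain string, safe to pickle
        self._resource = None        # built lazily, once per process

    def _load(self):
        # Stand-in for loading a font: a Lock is unpicklable, like the Font object.
        return threading.Lock()

    @property
    def resource(self):
        if self._resource is None:
            self._resource = self._load()
        return self._resource

    def __getstate__(self):
        # Drop the unpicklable resource before the object is sent to a worker.
        state = self.__dict__.copy()
        state["_resource"] = None
        return state

dataset = LazyResourceDataset("hypothetical_font.ttf")
_ = dataset.resource                          # resource now exists in the parent
clone = pickle.loads(pickle.dumps(dataset))   # pickling still succeeds
print(clone.font_path, clone._resource)       # hypothetical_font.ttf None
```

The copy that arrives in a worker rebuilds its own resource the first time `resource` is accessed, so nothing unpicklable ever crosses the process boundary.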
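Is there any way to make the model predict Latin alphabets?
After binarizing printed-text images, digit recognition is not very good. Is there a way to fix this?
Some characters in the character set, such as \u3000, cannot be displayed in Python and raise an error. How can I remove them? WeChat: nlanguage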
File "F:\pycharm2020.2\crnn\utils\aftertreatment.py", line 26, in
text = [self.dict[char] for [char] in text]
KeyError: ' '
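One way to handle characters such as `\u3000` (the ideographic space), sketched under the assumption that whitespace and non-printable characters should simply be left out of both the character set and the labels. If spaces need to be recognized, they would have to stay in the character set instead. The function names are hypothetical.

```python
def clean_charset(chars):
    """Drop whitespace and non-printable characters, keeping order and uniqueness."""
    seen = set()
    kept = []
    for c in chars:
        if c.isspace() or not c.isprintable() or c in seen:
            continue
        seen.add(c)
        kept.append(c)
    return "".join(kept)

def clean_label(text, charset):
    """Remove characters that the cleaned character set cannot represent."""
    allowed = set(charset)
    return "".join(c for c in text if c in allowed)

charset = clean_charset("中\u3000文 a\tb中")
print(repr(charset))                            # '中文ab'
print(clean_label("中 文\u3000ab!", charset))     # 中文ab
```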
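Hello, I have a question. The model is trained on a randomly generated dataset: every sentence is random, and the characters within a sentence are unrelated to each other. Does the network use any semantic information between characters? If not, is this purely a classification problem?
I put 10 Chinese characters in all words.txt and tried running generator.py, which raised the following error. How can I fix it? WeChat: nlanguage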
Traceback (most recent call last):
File "F:/pycharm2020.2/crnn.pytorch_generator/generator.py", line 227, in <module>
test_image_gen('horizontal')
File "F:/pycharm2020.2/crnn.pytorch_generator/generator.py", line 207, in test_image_gen
im, indices, target_len = gen.gen_image()
File "F:/pycharm2020.2/crnn.pytorch_generator/generator.py", line 158, in gen_image
text = np.random.choice(FONT_CHARS_DICT[font_path], target_len)
File "mtrand.pyx", line 908, in numpy.random.mtrand.RandomState.choice
ValueError: 'a' cannot be empty unless no samples are taken
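To the maintainer: 30,000+ output classes is a lot, and I think common characters would be enough for me. Is it better to add an extra fully connected layer and train that, or to modify the pretrained model's fully connected layer and train it?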
Hello, I did not find your training data, can you share the trained model?
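I changed what all_word returns in the Word object. My target characters are only English letters and digits plus a few simple symbols such as comma, space, and minus. After training for 200 epochs,
I used this model for prediction and found that, whatever image is recognized (leaving aside whether the final result is correct), the result label has many 0s before and after each character.
For example:
My image contains hello world
The recognized result is something like h0e0l0l0o0 0w0o0r0l0d0.
Then I tried changing what all_word returns, putting the space in the first character position:  -, 123456... (originally 0123456...abc...)
The model trained after that shows a similar result, except that the 0s became spaces.
hello world becomes
h e l l o w o r l d
Why does this happen, and why is it related to the first character of all_word?
Screenshot of the current code
Another question: how do I add validation on a validation set during training, along with metrics such as accuracy (acc)? (Right now there is only one loss, with no distinction between training loss and validation loss.)
I hope to get your reply. Many thanks!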
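The dependence on the first character is consistent with how CTC works: one class index is reserved as the "blank" symbol, and in PyTorch's `CTCLoss` that index defaults to 0. If a real character also sits at index 0, or decoding does not remove blanks, the blank shows up in the output as that character. Below is a minimal greedy-decoding sketch, assuming index 0 is the blank and real characters start at index 1. The alphabet is illustrative, and whether this matches the repo's own decoding is not confirmed here.

```python
from itertools import groupby

BLANK = 0  # assumed blank index, matching the PyTorch CTCLoss default

def ctc_greedy_decode(indices, alphabet, blank=BLANK):
    """Collapse repeated indices, then drop blanks and map the rest to characters."""
    collapsed = [k for k, _ in groupby(indices)]
    return "".join(alphabet[i] for i in collapsed if i != blank)

def sequence_accuracy(predictions, targets):
    """Fraction of predictions that match their target string exactly."""
    if not targets:
        return 0.0
    hits = sum(p == t for p, t in zip(predictions, targets))
    return hits / len(targets)

# Position 0 is a placeholder for the blank; real characters start at 1.
alphabet = ["<blank>"] + list("helo wrd")
# Per-time-step argmax for "hello": the blank between the two l's keeps them distinct.
steps = [1, 1, 0, 2, 0, 3, 3, 0, 3, 4, 0]
print(ctc_greedy_decode(steps, alphabet))  # hello
```

Running `sequence_accuracy` over decoded predictions on a held-out set is one simple way to get an accuracy figure alongside the loss.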
Hello, I have my own training set.
Image format:
demo_0.jpg
样本_1.jpg
...
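Thanks for any ideas.
Hello author, how was your training data generated? Could you share the data generation code?
As in the title.
This can't be converted to ONNX, can it?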
indices = np.array([self.alpha.index(c) for c in text])
ValueError: substring not found
A question: does eth0 here refer to the IP of the first network card?
For example, the PyTorch version?
You can use conda to export your Python env.
Hi, maintainer,
The error is as follows:
90 x = self.cnn(x) # [B,512,W/16,1]
91 x = torch.squeeze(x, 3) # [B,512,W]
---> 92 x = x.permute([0, 2, 1]) # [B,W,512]
93 x, h1 = self.rnn1(x)
94 x, h2 = self.rnn2(x, h1)
RuntimeError: number of dims don't match in permute
Is it because the images cropped by my upstream CTPN program are too thin? A larger image works.
Image sizes: (2581, 276, 3)
(2580, 283, 3)
(2545, 257, 3)
(2058, 321, 3)
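A likely explanation, going by the shape comments in the snippet: `torch.squeeze(x, 3)` only removes dimension 3 when its size is exactly 1. If the CNN output is larger than 1 along that axis, the tensor stays 4-D and the 3-index `permute` fails. The sketch below mimics that shape logic in plain Python. The shapes are made up for illustration and are not measured from this model.

```python
def squeeze_shape(shape, dim):
    """Shape after torch.squeeze(x, dim): the axis is removed only if its size is 1."""
    if shape[dim] == 1:
        return shape[:dim] + shape[dim + 1:]
    return shape

def permute_shape(shape, order):
    """Shape after x.permute(order); the number of indices must equal the rank."""
    if len(order) != len(shape):
        raise RuntimeError("number of dims don't match in permute")
    return tuple(shape[i] for i in order)

ok = squeeze_shape((1, 512, 40, 1), 3)     # collapsed axis has size 1
print(permute_shape(ok, (0, 2, 1)))        # (1, 40, 512)

bad = squeeze_shape((1, 512, 40, 2), 3)    # axis has size 2, so the shape stays 4-D
try:
    permute_shape(bad, (0, 2, 1))
except RuntimeError as err:
    print("fails:", err)
```

If this is the cause, resizing each crop to the input size the model was trained on before inference (an assumption about the fix) would keep that axis at 1.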
Running train.py directly on Windows 10 raises an error. WeChat: nlanguage
C:\Users\Ni\AppData\Local\Programs\Python\Python38\python.exe F:/pycharm2020.2/crnn.pytorch_generator/train_Sentence.py
Namespace(batch_size=32, device='cuda', direction='horizontal', dist_backend='nccl', dist_url='env://', distributed=False, epochs=1, init_epoch=0, local_rank=0, lr=0.01, lr_gamma=0.1, lr_step_size=30, momentum=0.9, output_dir='./output', sync_bn=False, weight_decay=1e-05, workers=4, world_size=1)
0%| | 0/95804 [00:47<?, ?it/s]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
Traceback (most recent call last):
  File "F:/pycharm2020.2/crnn.pytorch_generator/train_Sentence.py", line 195, in <module>
    train(arguments)
  File "F:/pycharm2020.2/crnn.pytorch_generator/train_Sentence.py", line 138, in train
    loss = train_one_epoch(model, criterion, optimizer, data_loader, device, epoch, args)
  File "F:/pycharm2020.2/crnn.pytorch_generator/train_Sentence.py", line 65, in train_one_epoch
    for image, target, input_len, target_len in tqdm(data_loader):
  File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\site-packages\tqdm\std.py", line 1165, in __iter__
    for obj in iterable:
  File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 359, in __iter__
    return self._get_iterator()
  File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 305, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 918, in __init__
    w.start()
  File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\Ni\AppData\Local\Programs\Python\Python38\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'Font' object
Process finished with exit code 1
Has anyone continued training from the maintainer's model? Why is my loss always INF after training? Lowering the learning rate did not help.
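One common cause of an infinite CTC loss, offered as a possibility rather than a diagnosis, is a sample whose label cannot fit into the available time steps. CTC needs at least one time step per label character, plus one blank between each pair of identical adjacent characters. Below is a small check, assuming labels are sequences of class indices.

```python
def min_ctc_steps(target):
    """Minimum time steps CTC needs: label length plus one blank per adjacent repeat."""
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats

def is_ctc_feasible(input_len, target):
    """True if a label of this shape can be aligned within input_len time steps."""
    return input_len >= min_ctc_steps(target)

print(min_ctc_steps([5, 5, 7]))        # 4
print(is_ctc_feasible(3, [5, 5, 7]))   # False
print(is_ctc_feasible(4, [5, 5, 7]))   # True
```

Running this over the training labels can reveal samples that are too long for the model's output length. PyTorch's `CTCLoss` also has a `zero_infinity` option that zeroes out such losses instead of propagating them.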
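Hi, great code. Thanks for sharing. During training I ran into something I have a question about.
The question is about the data generation code. At line 180 of generator.py, text has to be generated randomly. However, the logic there loads all characters from all font files instead of using the dictionary passed in when the Generator is initialized (self.alpha). This may make it impossible to switch to a different character set.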
`
def gen_image(self):
    idx = np.random.randint(len(self.max_len_list))
    image = self.gen_background()
    image = image.astype(np.uint8)
    target_len = int(np.random.uniform(self.min_len, self.max_len_list[idx], size=1))
    # Randomly choose a size and a font
    size_idx = np.random.randint(len(self.font_size_list))
    font_idx = np.random.randint(len(self.font_path_list))
    font = self.font_list[size_idx][font_idx]
    font_path = self.font_path_list[font_idx]
    # Randomly choose target_len characters from those visible in the selected font
    text = np.random.choice(FONT_CHARS_DICT[font_path], target_len)
    text = ''.join(text)
`
Consider using:
text = random.choices(self.alpha[1:], k=target_len)
in place of
text = np.random.choice(FONT_CHARS_DICT[font_path], target_len)
However, I am not sure whether some characters might be missing from the font file.
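A possible middle ground between the two lines, sketched under the assumption that `FONT_CHARS_DICT[font_path]` lists the characters a font can render and that `alpha[0]` is a reserved blank: sample only from characters that are both in the alphabet and supported by the chosen font. The data below is made up for illustration.

```python
import random

def sample_text(alpha, font_chars, target_len, rng=random):
    """Pick target_len characters present in both the alphabet and the font."""
    supported = set(font_chars)
    # Skip index 0, which is assumed to be the blank placeholder.
    candidates = [c for c in alpha[1:] if c in supported]
    if not candidates:
        raise ValueError("font supports none of the alphabet's characters")
    return "".join(rng.choices(candidates, k=target_len))

alpha = " 0123456789abc"             # hypothetical alphabet, blank placeholder first
font_chars = list("0123456789xyz")   # hypothetical characters the font can render
text = sample_text(alpha, font_chars, 6)
print(text)
```

The explicit check for an empty candidate list gives a clearer message than letting the sampling call fail on an empty sequence.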