mingtzge / 2019-ccf-bdci-ocr-mczj-ocr-identificationidelement
Competition source code from the 天晨破晓 team, winner of the Best Innovative Exploration Award at the 2019 CCF-BDCI competition and champion of the OCR-based ID card element extraction task.
License: MIT License
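Hi, when running the test code I hit the error below. The JSON file was downloaded from Baidu Netdisk, and it also looks broken in the preview. The other two JSON files load without problems. What might be the cause? Thanks!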
`UnicodeDecodeError Traceback (most recent call last)
in ()
2 unit_json = "./data_correction_and_generate_csv_file/data/unit.json" # issuing authority database
3 id_json = "./data_correction_and_generate_csv_file/data/repitle_address_extract.json" # address database
----> 4 unit_id_json = json.load(open(id_json, "r", encoding="utf-8"))
~/anaconda3/envs/amazonei_tensorflow_p36/lib/python3.6/json/__init__.py in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
294
295 """
--> 296 return loads(fp.read(),
297 cls=cls, object_hook=object_hook,
298 parse_float=parse_float, parse_int=parse_int,
~/anaconda3/envs/amazonei_tensorflow_p36/lib/python3.6/codecs.py in decode(self, input, final)
319 # decode input (taking the buffer into account)
320 data = self.buffer + input
--> 321 (result, consumed) = self._buffer_decode(data, self.errors, final)
322 # keep undecoded input until the next call
323 self.buffer = data[consumed:]
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8d in position 21908442: invalid start byte`
If applicable, could you advise how to modify the code? Many thanks.
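One way to narrow this down, sketched under assumptions: byte 0x8d is not a valid UTF-8 start byte, so the file is either stored in a different encoding or was damaged during download. The helpers below (`peek_bytes` and `load_json_with_fallback` are hypothetical names, not part of the repository) show the raw bytes around the failing offset and try a fallback encoding. `gb18030` is only a guess for a Chinese-language dataset; nothing in the thread confirms the file's encoding.

```python
import json


def peek_bytes(path, position, window=16):
    """Return the raw bytes surrounding a byte offset, for manual inspection."""
    start = max(0, position - window)
    with open(path, "rb") as fh:
        fh.seek(start)
        return fh.read(2 * window)


def load_json_with_fallback(path, encodings=("utf-8", "gb18030")):
    """Try each encoding in turn; re-raise the last error if none works."""
    with open(path, "rb") as fh:
        raw = fh.read()
    last_error = None
    for encoding in encodings:
        try:
            return json.loads(raw.decode(encoding))
        except (UnicodeDecodeError, json.JSONDecodeError) as err:
            last_error = err
    raise last_error
```

If every encoding fails, or the bytes near the reported position look like random binary, a truncated or corrupted download is the more likely explanation and re-downloading the file is the safer fix.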
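Baidu Netdisk links expire easily, so anyone who needs the data should add me as a Baidu Netdisk friend. My account is 1713983016@@qq.com, and I will share the data through friend sharing. Please use this note when adding me: "2019 CCF BDCI OCR 赛题数据"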
My CUDA is version 10, which does not support tensorflow-gpu 1.12.0, so I installed tensorflow-gpu 1.13.1 instead. When I run mainprocess.py, the line 'saver = tf.train.Saver()' in test_crnn_jmz.py under recognize_process raises "OutOfRangeError: Read less bytes than requested". How should I handle this?
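Hello,
I would like to fine-tune this CRNN model on my own dataset. Is the char_map.json used in training a dictionary file that I generate myself, simply by listing the characters that appear in my training data?
Also, is the annotation file just a label file in the format "path/xxx.jpg label text", with the label text not converted to indices through the dictionary file?
I would like to confirm that this data format is correct. Thanks!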
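A minimal sketch of the format described in the question, not a confirmed description of the repository's training pipeline. It assumes each annotation line is `path/xxx.jpg label text` split at the first space, and that the dictionary is a plain character-to-index mapping; `parse_annotation_line` and `build_char_map` are hypothetical helper names.

```python
import json


def parse_annotation_line(line):
    """Split 'path/xxx.jpg label text' at the first space into (path, label)."""
    path, _, label = line.strip().partition(" ")
    return path, label.strip()


def build_char_map(annotation_lines):
    """Map every distinct character in the labels to a stable integer index."""
    chars = set()
    for line in annotation_lines:
        if not line.strip():
            continue  # skip blank lines
        _, label = parse_annotation_line(line)
        chars.update(label)
    return {ch: idx for idx, ch in enumerate(sorted(chars))}


if __name__ == "__main__":
    sample = ["imgs/0001.jpg 南京市", "imgs/0002.jpg 北京市"]
    print(json.dumps(build_char_map(sample), ensure_ascii=False))
```

Sorting the characters keeps the indices reproducible across runs, which matters if a trained model is later reloaded with a regenerated dictionary.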
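All the samples provided by the competition are watermarked. To use a GAN to generate watermark-free images, you first need watermark-free labels, and they must differ from the watermarked versions only in the watermark, with everything else in the same position. How did you obtain such data?
As in the title.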
Is something wrong with the recognition model under model_save? Running mytest_crnn.py always fails to load the model:
021-02-01 09:46:44.007098: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at save_restore_tensor.cc:175 : Data loss: Unable to open table file /recognize_process/model_save: Failed precondition: ./recognize_process/model_save; Is a directory: perhaps your file is in a different file format and you need to use a different restore operator?
Traceback (most recent call last):
File ".local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File ".local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file r/recognize_process/model_save: Failed precondition: /recognize_process/model_save; Is a directory: perhaps your file is in a different file format and you need to use a different restore operator?
[[{{node save/RestoreV2}}]]
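The message says the path handed to the restore op is a directory. In TensorFlow 1.x, `saver.restore(sess, save_path)` expects a checkpoint prefix (for example `model_save/name.ckpt`, which sits next to `name.ckpt.index` and `name.ckpt.data-*` files), not the folder that contains them. The standard-library sketch below lists candidate prefixes in a folder; `find_checkpoint_prefixes` is a hypothetical helper, and it assumes the usual `.index` file naming of the V2 checkpoint format.

```python
import os


def find_checkpoint_prefixes(model_dir):
    """List checkpoint prefixes in model_dir by stripping the '.index' suffix."""
    suffix = ".index"
    prefixes = []
    for name in sorted(os.listdir(model_dir)):
        if name.endswith(suffix):
            prefixes.append(os.path.join(model_dir, name[: -len(suffix)]))
    return prefixes
```

One of the returned prefixes can then be passed to `saver.restore`. When the folder also contains a `checkpoint` state file, `tf.train.latest_checkpoint(model_dir)` returns the most recent prefix directly. An empty result suggests the weight files were never downloaded.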
Which model did you use for text localization? Could you provide it?
Do you plan to port this to mobile later?
NameError Traceback (most recent call last)
~/code/OCR/main_process.py in ()
77 # remove watermarks, then crop and process the images
78 watermask_handler = WatermarkRemover(args, header_dir, cut_twisted_save_path)
---> 79 watermask_handler.watermask_remover_run()
80 recognize_image_path = os.path.join(header_dir, "test_data_preprocessed")
81 recognize_txt_path = os.path.join(header_dir, "test_data_txts")
~/code/OCR/watermask_remover_and_split_data/watermask_process.py in watermask_remover_run(self)
222 self.gan_gen_result()
223 if not args.no_rec_img: # paste the de-watermarked patches back into the original images
--> 224 print("running rec_img ....")
225 self.recover_origin_img()
226 self.ori_img_path = self.recover_image_dir # point the original image path at the de-watermarked images
~/code/OCR/watermask_remover_and_split_data/watermask_process.py in recover_origin_img(self)
168 result_dir = os.path.join(self.gan_result_dir, self.pixel_mode, "test_latest", "images")
169 if not os.path.exists(result_dir):
--> 170 print("not exists gan result dir")
171 exit(0)
172 result_img_names = os.listdir(result_dir)
NameError: name 'exit' is not defined
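One plausible cause, offered as an assumption: the bare `exit` name is injected by Python's `site` module for interactive use and is not guaranteed to be visible inside imported modules in every environment (the traceback layout here looks like IPython or Jupyter). `sys.exit` is always available after `import sys`. A minimal sketch of the same directory check, with `require_dir` as a hypothetical helper name:

```python
import os
import sys


def require_dir(path, message="not exists gan result dir"):
    """Exit the process with a non-zero status if the directory is missing."""
    if not os.path.isdir(path):
        print(message, file=sys.stderr)
        sys.exit(1)
    return path
```

Note that the underlying problem in this traceback is still the missing GAN result directory; swapping in `sys.exit` only makes the script stop cleanly instead of raising `NameError`.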
Hi, same as the title. The link to the model data has been blocked by Baidu. Could you upload the data again?
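Which method did your team use for text detection?
The issuing authority entries in the project's unit.json do not look right. Taking '南京市栖霞区公安局' as an example, the issuing authority actually printed on the back of the ID card is '南京市公安局栖霞分局'. Is there some additional conversion logic?
Are there any other download channels? Thanks~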
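Hello, when I test with my own ID card images, the twist module flips images that were originally upright. I suspect the template images used for comparison are not general enough, so I would like to generate my own.
How are the template files generated? Is the gaussian_blur function in twist_part sufficient, or are further steps such as grayscale conversion and sharpening needed?
Thanks!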
The watermask_remover_and_split_data part does not seem to have a main function, so it cannot be executed.
As in the title.
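I get the error "unable to access 'https://github.com/Mingtzge/2019-CCF-BDCI-OCR-MCZJ-OCR-IdentificationIDElement/': Failed to connect to github.com port 443: Timed out". I looked up some material but could not fully resolve it, and downloading the zip archive directly leaves the project files incomplete.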
Hey, you mentioned that you made some modifications to the Pix2pix model in this repo.
After roughly comparing your repo with the original Pix2Pix model, I found the modifications below:
--add_contrast
in base_options.py
networks.py
Are there any other modifications to the Pix2Pix part of your repo? You also said you used data augmentation. Could you add more detail about that?
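git lfs cannot download the recognition model. Could you provide a Baidu Netdisk link for easier download?
Could the author share the repitle_address_extract.json file? It does not seem to be downloadable either. Thanks!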
I got a similar task recently, but there are 20838 characters in my dictionary. I have been training the recognition model for almost 2 weeks, and I guess it will take at least another 2 weeks. It's my first time using CRNN, so I'm confused: is it normal for training to take this long? (BTW: I use a single GPU.)
Hi,
What dataset did you use to train the CGAN to erase watermarks? I tested a full image, but the data you generate is not a full image; it is just a small crop that contains only the watermark.
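Hello, the Baidu Netdisk link has expired. Could you provide a new download address? I have already added you on Baidu Netdisk.
Why does the recognition model have no ckpt file?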
Hi,
It shows 25.6 MB but what I can see are 3 lines, could you please add a download link for this file?
version https://git-lfs.github.com/spec/v1
oid sha256:f6021cc1b6e451a4572698eedf9a48cfad2c14fdd392e6d9fcc8644541816f00
size 26857155
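The three lines above are a Git LFS pointer file, not the real data: the repository stores only this stub, and the `size` field (26857155 bytes, about 25.6 MiB) describes the actual object. A small sketch for detecting such stubs before trying to load them as JSON or model weights; `parse_lfs_pointer` is a hypothetical helper name and the check is deliberately loose.

```python
def parse_lfs_pointer(text):
    """Return {'version', 'oid', 'size'} if text is a Git LFS pointer, else None."""
    fields = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(" ")
        if not sep:
            return None  # pointer lines are always 'key value'
        fields[key] = value.strip()
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        return None
    if "oid" not in fields or "size" not in fields:
        return None
    if not fields["size"].isdigit():
        return None
    fields["size"] = int(fields["size"])
    return fields
```

A file that parses as a pointer has not been downloaded yet, so errors raised while reading it (decode errors, truncated checkpoints) say nothing about the real content.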
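Hello~
The template images for ID card element extraction are missing from template_imgs. Could you provide them? I would like to study the template-matching part of the code.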
When I run test.py in pytorch-CycleGAN-and-pix2pix to predict and load the model from latest_net_G.pth, it raises AttributeError: 'Sequential' object has no attribute 'model'
def __patch_instance_norm_state_dict(self, state_dict, module, keys, i=0):
    """Fix InstanceNorm checkpoints incompatibility (prior to 0.4)"""
    key = keys[i]
    if i + 1 == len(keys):  # at the end, pointing to a parameter/buffer
        if module.__class__.__name__.startswith('InstanceNorm') and \
                (key == 'running_mean' or key == 'running_var'):
            if getattr(module, key) is None:
                state_dict.pop('.'.join(keys))
        if module.__class__.__name__.startswith('InstanceNorm') and \
                (key == 'num_batches_tracked'):
            state_dict.pop('.'.join(keys))
    else:
        self.__patch_instance_norm_state_dict(state_dict, getattr(module, key), keys, i + 1)
Hi, I am trying to run the code, but this LFS file cannot be downloaded. Is there any workaround? Thanks!
git clone https://github.com/Mingtzge/2019-CCF-BDCI-OCR-MCZJ-OCR-IdentificationIDElement.git
Cloning into '2019-CCF-BDCI-OCR-MCZJ-OCR-IdentificationIDElement'...
remote: Enumerating objects: 150, done.
remote: Counting objects: 100% (150/150), done.
remote: Compressing objects: 100% (135/135), done.
remote: Total 305 (delta 38), reused 97 (delta 11), pack-reused 155
Receiving objects: 100% (305/305), 1.82 MiB | 32.72 MiB/s, done.
Resolving deltas: 100% (56/56), done.
Downloading data_correction_and_generate_csv_file/data/repitle_address_extract.json (27 MB)
Error downloading object: data_correction_and_generate_csv_file/data/repitle_address_extract.json (f6021cc): Smudge error: Error downloading data_correction_and_generate_csv_file/data/repitle_address_extract.json (f6021cc1b6e451a4572698eedf9a48cfad2c14fdd392e6d9fcc8644541816f00): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
Errors logged to /home/ec2-user/SageMaker/2019-CCF-BDCI-OCR-MCZJ-OCR-IdentificationIDElement/.git/lfs/logs/20200218T054034.97028677.log
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: data_correction_and_generate_csv_file/data/repitle_address_extract.json: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'
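The log shows the clone itself succeeded and only the LFS smudge step failed because the repository is over its LFS quota. A possible workaround, sketched here rather than confirmed by the maintainers, is to clone with `GIT_LFS_SKIP_SMUDGE=1` so Git leaves LFS files as small pointer stubs and the checkout completes. `clone_skipping_lfs` is a hypothetical helper; the `runner` parameter exists only so the command can be inspected without running it.

```python
import os
import subprocess


def clone_skipping_lfs(url, dest, runner=subprocess.run):
    """Clone a repository while leaving LFS files as small pointer files."""
    env = dict(os.environ, GIT_LFS_SKIP_SMUDGE="1")
    return runner(["git", "clone", url, dest], env=env, check=True)
```

This only gets the source tree checked out. The large files themselves (such as repitle_address_extract.json) still have to come from another channel, for example the Baidu Netdisk share mentioned in this thread.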