paddlepaddle / vimer

483.0 22.0 91.0 40.54 MB

A repository of visual pre-training foundation models

Python 99.30% Makefile 0.01% Cython 0.55% Shell 0.14%

vimer's Introduction

# VIMER: a repository of visual pre-training foundation models

CAE

A general-purpose self-supervised visual pre-training model

StrucTexT

A pre-trained model for structured OCR with field-level multimodal feature enhancement

UFO

A pre-trained model for unified feature representation

UMS

A pre-trained model for product image-text representation with unified multi-source information modeling

vimer's Issues

layout_analysis with structtextv2

How do I run layout analysis with the v2 version? The code does not seem to include a task for it, but the README appears to say that this task is supported.

FileNotFoundError: [Errno 2] No such file or directory: 'product1m_test_image_features.npy'

Could you provide the necessary files? I am running into many problems and would appreciate some guidance on getting UMS to run. Respect!

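Types of the 100 million document images used for StrucTexT

A few questions:
1. Are they real-world photos, or PDF-like images?
2. If they are real-world photos, have they been rectified with a perspective transform, or are they still skewed?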

When will the StrucText V2 end-to-end information extraction training code be released?

StrucText V2 Pre-training

Hi,

Do you plan to release the pre-training code? Thanks!

How do I fine-tune the structext model on a custom dataset? Can you provide training instructions?

Why is the implementation of Relationship Extraction Module inconsistent with the description in the paper?

First of all, thank you for sharing this amazing work!
I ran into some confusion while reading the code and hope the authors can clarify it.
The paper describes the Relationship Extraction Module as follows:

However, the implementation at https://github.com/PaddlePaddle/VIMER/blob/main/StrucTexT/external/linking/modules/model.py#L101 is different: it is a linear transformation of the absolute value of the difference between the features of the two nodes.

Is there any special consideration here?
Thanks again to the authors for reading and helping!
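
For readers without the code open, the operation described in this issue (a linear layer applied to the absolute difference of two node features) might look roughly like the sketch below. This is a minimal NumPy illustration and not the repository's implementation: the function name `abs_diff_link_scores`, the shapes, and the `weight` and `bias` parameters are all assumptions made for the example.

```python
import numpy as np


def abs_diff_link_scores(node_feats, weight, bias):
    """Score every ordered node pair from |h_i - h_j| with one linear layer.

    node_feats: (N, D) array of node features (hypothetical input).
    weight:     (D,) array; bias: float (hypothetical parameters).
    Returns an (N, N) array of raw link scores.
    """
    # Broadcast to (N, N, D): entry [i, j] holds |h_i - h_j|.
    diff = np.abs(node_feats[:, None, :] - node_feats[None, :, :])
    # Linear transformation over the feature axis gives one score per pair.
    return diff @ weight + bias


rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))   # 4 nodes, 8-dimensional features
w = rng.normal(size=8)
scores = abs_diff_link_scores(feats, w, bias=0.5)
```

One property worth noting in this sketch: because the absolute difference is symmetric, the score for the pair (i, j) equals the score for (j, i), and every diagonal entry equals the bias. Whether that symmetry is intended for key-to-value linking is exactly the kind of question the issue raises.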

Is end-to-end StrucTexT v2 better than LayoutXLM? When will the StrucTexT v2 training source code be released?

As in the title.

Is there a plan to release the training script for the UFO models?

First of all, thank you for sharing this amazing work!
Do you plan to release the training script for the multi-task models? I'm most interested in the details of how the models are trained.

CAE is missing encoder_weight.pd and decoder_weight.pd

CAE is missing encoder_weight.pd and decoder_weight.pd. Could you provide download links?

Thanks to the authors for the excellent work. Could you release the StructText training code? Thank you very much!

Are semantic analysis and key information extraction supported?

As in the title.

Entity Linking Inference Model for StrucText V2

Do you have an Entity Linking inference model for StrucText V2 available for download?

Can anyone share the inference code?

A question about table recognition in StrucText v2

What do link_up, link_down, link_left, and link_right represent in the table recognition code of StrucText v2?
link_up = link_probs[:, 0:1, :, :]
link_down = link_probs[:, 1:2, :, :]
link_left = link_probs[:, 2:3, :, :]
link_right = link_probs[:, 3:4, :, :]
Looking forward to your answer!
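
The thread does not say what the four maps mean, but the mechanics of the slicing itself can be shown. The sketch below assumes `link_probs` is a 4-D array laid out as (batch, channel, height, width) with four channels, which the quoted snippet suggests but does not confirm; the random array is a stand-in for the real model output, and NumPy is used here in place of the framework's tensors.

```python
import numpy as np

# Hypothetical stand-in for the model output:
# batch of 2, 4 link channels, 3x5 spatial map.
link_probs = np.random.default_rng(0).random((2, 4, 3, 5))

# A range slice such as 0:1 selects one channel but keeps the channel axis,
# so each map has shape (N, 1, H, W).
link_up = link_probs[:, 0:1, :, :]
link_down = link_probs[:, 1:2, :, :]
link_left = link_probs[:, 2:3, :, :]
link_right = link_probs[:, 3:4, :, :]

# An integer index would drop the channel axis instead: shape (N, H, W).
link_up_squeezed = link_probs[:, 0, :, :]

# Concatenating the four maps along the channel axis restores the original.
restored = np.concatenate([link_up, link_down, link_left, link_right], axis=1)
```

Keeping the singleton channel axis is a common choice when the maps are later broadcast against or concatenated with other (N, C, H, W) tensors.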

StrucText V2 does not run; there are many bugs

Running the following script in v2 produces many problems:
python -u ./tools/eval.py --config_file=configs/end2end_ocr/ocr_funsd_base.json --task_type=end2end_ocr --label_path=./data/funsd/dataset/testing_data/otations --image_path=./data/funsd/dataset/testing_data/images --weights_path=StrucTexT_v2_end2end_ie_base.pdparams

Missing modules
from tasks.text_spotting_db.recg_head import RecgHead
from tasks.text_spotting_db.dataset import LabelConverter
I searched the whole repository and found no text_spotting_db; the name is wrong. It should be:
from tasks.end2end_ocr.recg_head import RecgHead
from tasks.end2end_ocr.dataset import LabelConverter
Missing code
Traceback (most recent call last):
  File "./tools/eval.py", line 103, in <module>
    eval(config)
  File "./tools/eval.py", line 87, in eval
    model = Model(model_config, eval_config['feed_names'])
  File "/home/xiaxy/VIMER/StrucTexT/v2/src/tasks/end2end_ocr/model.py", line 150, in __init__
    self.db_loss = DBLoss()
NameError: name 'DBLoss' is not defined
I also could not find a definition of DBLoss(). Is the code complete? How is this inference supposed to be run?
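
When chasing import errors like the ones above, it can help to check which module paths resolve before running the full evaluation. The sketch below uses only the standard library; the helper name `module_available` is hypothetical, and it assumes the script is run from a directory where the `tasks` package is importable (the traceback suggests `StrucTexT/v2/src`, but that is an inference, not something the thread states).

```python
import importlib.util


def module_available(dotted_name):
    """Return True if `dotted_name` can be located on the import path.

    find_spec imports parent packages while resolving a dotted name,
    but it does not import the target module itself.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # Raised (as ModuleNotFoundError) when a parent package is missing.
        return False


# The two import paths mentioned in this issue.
candidates = [
    "tasks.text_spotting_db.recg_head",
    "tasks.end2end_ocr.recg_head",
]
for name in candidates:
    print(name, "->", "found" if module_available(name) else "missing")
```

This only tells you whether a module file exists; it will not catch a missing name inside a module, such as the undefined `DBLoss` in the traceback.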

Entity linking scores significantly better than the numbers in the paper

Hi, I noticed that the FUNSD entity linking scores reported in this repo are higher than the numbers in the paper. For example, "StrucTexT-chn&eng base" is 0.7045, while "StrucTexT-eng base (paper)" is 0.4410. Could you let me know what contributes to the improvement? Or is anything wrong with the original paper's approach? Thanks!

Why is the linking label {"head": (row, row+1), "tail": (col, col+1)}?

I have read the code and am confused by the snippet below,
from utils/metrics/rescore_metric.py, lines 38~45:

for row in range(rows):
    for col in range(cols):
        if label_b[row, col] == 1:
            rel = {"head": (row, row+1),
                   "tail": (col, col+1),

Why should we set "head": (row, row+1) rather than "head": (row)?
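
One possible reading, offered as an assumption rather than a confirmed answer from the authors, is that each pair is a half-open span `[start, end)` covering a single entity index, so `(row, row+1)` denotes exactly the entity `row`. The sketch below reproduces only the `head` and `tail` fields from the quoted loop; the remaining fields of the original dictionary are cut off in the issue and are not reconstructed here. The function name `relations_from_label_matrix` is hypothetical.

```python
import numpy as np


def relations_from_label_matrix(label_b):
    """Turn a binary link matrix into a list of head/tail span records."""
    rows, cols = label_b.shape
    relations = []
    for row in range(rows):
        for col in range(cols):
            if label_b[row, col] == 1:
                # Each entity index is stored as a (start, end) pair.
                relations.append({"head": (row, row + 1),
                                  "tail": (col, col + 1)})
    return relations


# Entity 0 links to entity 1, and entity 2 links to entity 0.
label_b = np.array([[0, 1, 0],
                    [0, 0, 0],
                    [1, 0, 0]])
rels = relations_from_label_matrix(label_b)

# Under the half-open reading, a span expands back to a single index.
head_indices = list(range(*rels[0]["head"]))
```

If that reading is right, the span form would keep the record format compatible with metrics that compare multi-token spans, even though each span here covers one entity.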

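Many dependency implementations are missing

While reading the code I found that the implementations of many classes and functions are missing, for example NameAdapter and a number of others. I hope the authors can fill them in. Thanks.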

When will the structext training code be released?

finetune Structext for information extraction on funsd dataset

Hi,

Are you planning to provide a fine-tuning example for information extraction?

StrucTexT Base for FUNSD labeling model loading problem

Hi, I am trying to run the inference code on the FUNSD labeling task, but I am getting a loading problem. Can anyone comment on this?

in load
    load_result = pickle.load(f, encoding='latin1')
MemoryError

Is the prediction code for a single image open-sourced?

@linan142857 Is the prediction code for a single image open-sourced? Thanks.

Is there a StrucText V2 pre-trained model without the downstream task layers?

As in the title.

How much Chinese and English data in total is needed for pre-training? What kind of data does it include?

License?

Hi! Nice work on https://github.com/PaddlePaddle/VIMER/tree/main/StrucTexT/v2.
What is the license of the pretrained model and the code?
Is it the Apache License (the same as the Paddle and PaddleOCR repos)?

CC: @zhouwei25

How to visualize the inference after running the script in "Infer fine-tuned models"

Hi VIMER Team, I am trying to visualize the predictions using the script from "Infer fine-tuned models", but it only returns the metrics of the model itself. Could you please guide me on how to produce a visualization like the demo picture at the bottom of the README? Thanks!