vimer's Introduction

VIMER: repository of visual pre-training foundation models

CAE

A general-purpose self-supervised visual pre-training model

StrucTexT

A pre-trained model for structured OCR, enhanced with field-level multimodal features

UFO

A pre-trained model for unified feature representation

UMS

A pre-trained model for product image-text representation with unified multi-source information modeling

vimer's People

Contributors

hanshumin001, linan142857, qsjack, raindrops2sea, rogeryu123, weiquanwa, welleast, xiteng01, yipeng-sun, zhwesky2010

vimer's Issues

Dataset

[image]
How is this kind of dataset annotated? Is it one-to-many?

layout_analysis with structtextv2

How do I run layout analysis with the v2 version? The code does not seem to include a task that supports it, but the README appears to say that this task is supported.

Annotation tool

Which tool was used to annotate the XFUND and FUNSD datasets?

Why is the implementation of Relationship Extraction Module inconsistent with the description in the paper?

First of all, thank you for sharing this amazing work!
I ran into some confusion while reading the code and hope the authors can clarify it.
The paper describes the Relationship Extraction Module as follows:
[image: description from the paper]

However, the implementation at https://github.com/PaddlePaddle/VIMER/blob/main/StrucTexT/external/linking/modules/model.py#L101 is different: it applies a linear transformation to the absolute value of the difference between the features of the two nodes.
[image: code screenshot]
Is there a particular reason for this?
Thanks again for your time and help!
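
For readers following this question, here is a minimal sketch of the operation described above: a linear transformation applied to the absolute difference of two node feature vectors, computed for every node pair. It is not the repository's code. NumPy stands in for the actual framework, and the function name, `weight`, `bias`, and all shapes are assumptions chosen for illustration.

```python
import numpy as np

def pairwise_abs_diff_scores(node_feats, weight, bias):
    """Score every ordered node pair (i, j) from |h_i - h_j|.

    node_feats: (N, D) array, one feature vector per node.
    weight:     (D, C) array, hypothetical linear-layer weights.
    bias:       (C,) array, hypothetical linear-layer bias.
    Returns an (N, N, C) array of pair scores.
    """
    # Broadcast to (N, N, D): entry [i, j] holds |h_i - h_j|.
    diff = np.abs(node_feats[:, None, :] - node_feats[None, :, :])
    # Apply the same linear map to every pair.
    return diff @ weight + bias

# Illustrative inputs only: 4 nodes, 8-dim features, 2 output classes.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
w = rng.normal(size=(8, 2))
b = np.zeros(2)
scores = pairwise_abs_diff_scores(feats, w, b)
```

One property of this form is that it is symmetric: the score for (i, j) equals the score for (j, i), because the absolute difference ignores the order of the two nodes.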

Question about table recognition in StrucTexT v2

In the table recognition code of StrucTexT v2, what do link_up, link_down, link_left, and link_right represent?
link_up = link_probs[:, 0:1, :, :]
link_down = link_probs[:, 1:2, :, :]
link_left = link_probs[:, 2:3, :, :]
link_right = link_probs[:, 3:4, :, :]
Looking forward to your reply!
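
The meaning of the four maps is not answered here, but the mechanics of the slicing can be shown. The sketch below assumes `link_probs` has a (batch, channel, height, width) layout with four channels; NumPy stands in for the framework tensor, and the shapes are made up for illustration.

```python
import numpy as np

# Hypothetical tensor: batch of 2, 4 channels, 3x3 spatial grid.
link_probs = np.arange(2 * 4 * 3 * 3, dtype=np.float32).reshape(2, 4, 3, 3)

# Each slice selects one channel and keeps the channel axis (size 1).
link_up = link_probs[:, 0:1, :, :]
link_down = link_probs[:, 1:2, :, :]
link_left = link_probs[:, 2:3, :, :]
link_right = link_probs[:, 3:4, :, :]

# Indexing with a plain integer would drop the channel axis instead.
dropped_axis = link_probs[:, 0, :, :]

print(link_up.shape, dropped_axis.shape)
```

Using `0:1` rather than `0` keeps each map four-dimensional, so the four maps can later be concatenated along the channel axis to recover the original tensor.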

StrucTexT v2 does not run; many bugs

Running the following script in v2 produces many problems:
python -u ./tools/eval.py --config_file=configs/end2end_ocr/ocr_funsd_base.json --task_type=end2end_ocr --label_path=./data/funsd/dataset/testing_data/otations --image_path=./data/funsd/dataset/testing_data/images --weights_path=StrucTexT_v2_end2end_ie_base.pdparams

  1. Missing module
    from tasks.text_spotting_db.recg_head import RecgHead
    from tasks.text_spotting_db.dataset import LabelConverter
    I searched the entire repository and text_spotting_db does not exist; the name is wrong. It should be:
    from tasks.end2end_ocr.recg_head import RecgHead
    from tasks.end2end_ocr.dataset import LabelConverter
  2. Missing code
    Traceback (most recent call last):
    File "./tools/eval.py", line 103, in <module>
    eval(config)
    File "./tools/eval.py", line 87, in eval
    model = Model(model_config, eval_config['feed_names'])
    File "/home/xiaxy/VIMER/StrucTexT/v2/src/tasks/end2end_ocr/model.py", line 150, in __init__
    self.db_loss = DBLoss()
    NameError: name 'DBLoss' is not defined
    I could not find a definition of DBLoss() either. Is the code complete? How is this inference supposed to be run?
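
To tell which of the two import paths in item 1 actually resolves in a given checkout, a small diagnostic like the following may help. It is a generic sketch using only the standard library; `module_available` is a hypothetical helper name, and the result depends on running it with the same working directory and `sys.path` that `tools/eval.py` uses.

```python
import importlib.util

def module_available(dotted_name):
    """Return True if `dotted_name` can be located on the current sys.path."""
    try:
        # For dotted names, find_spec imports the parent packages first.
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or has no usable spec.
        return False

candidates = [
    "tasks.text_spotting_db.recg_head",  # path used by the failing import
    "tasks.end2end_ocr.recg_head",       # path suggested in the report
]
for name in candidates:
    print(name, module_available(name))
```

This only confirms whether a module can be found. It does not address the missing `DBLoss` definition in item 2.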

Entity linking scores significantly better than the numbers in the paper

Hi, I noticed that the FUNSD entity linking scores reported in this repo are higher than the numbers in the paper. For example, "StrucTexT-chn&eng base" is 0.7045 and "StrucTexT-eng base (paper)" is 0.4410. Could you let me know what contributes to the improvement? Or is there something wrong with the original paper's approach? Thanks!

Why is the linking label {"head": (row, row+1), "tail": (col, col+1)}?

I have read the code and am confused by the snippet below, from
utils/metrics/rescore_metric.py, lines 38~45:

for row in range(rows):
    for col in range(cols):
        if label_b[row, col] == 1:
            rel = {"head": (row, row+1),
                   "tail": (col, col+1),

Why is "head" set to (row, row+1) rather than just (row)?
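
One possible reading, which is an assumption and not confirmed by the repository, is that "head" and "tail" are half-open (start, end) spans, so a single index `row` is written as (row, row+1). The self-contained sketch below reproduces the loop on a small hand-made matrix; `relations_from_matrix` and `labels` are hypothetical names used only for illustration.

```python
def relations_from_matrix(label_matrix):
    """Turn a binary relation matrix into a list of head/tail span dicts."""
    relations = []
    for row, row_values in enumerate(label_matrix):
        for col, value in enumerate(row_values):
            if value == 1:
                relations.append({
                    # Assumed half-open span: covers index `row` only.
                    "head": (row, row + 1),
                    "tail": (col, col + 1),
                })
    return relations

# Hand-made example: entity 0 links to entity 1, entity 2 links to entity 0.
labels = [
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
]
rels = relations_from_matrix(labels)

# Under the half-open reading, each span expands back to a single index.
head_indices = [list(range(*rel["head"])) for rel in rels]
```

Under this reading, a pair-based format lets single-index and multi-index entities share one representation, which would explain why a bare `row` is not used.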

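Testing StrucTexT v2.0 on aistudio.baidu.com: unable to download the RVL-CDIP document image classification data files

Following https://aistudio.baidu.com/aistudio/datasetdetail/147611, I created a project and saw that data already contains 4 pre-loaded model files. I then installed the requirements as described in the README. When I tried to download the RVL-CDIP document image classification files, the download failed, apparently because the files are hosted on Google Docs. For files hosted at addresses that cannot be reached, could the maintainers store them in advance somewhere downloadable on aistudio.baidu.com?

This project is very close to real-world application scenarios and is very promising, but the missing documentation and examples make it hard to get involved. I hope the maintainers take this seriously. With a complete test environment and detailed documentation, as projects like PaddleOCR have, contributors would surely be far more enthusiastic. I strongly hope the maintainers will first provide a test example on aistudio.baidu.com. Many thanks!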

Question about the model structure of UFO

Does anyone have details of the UFO network model structure? Could you share them for reference? Thank you!

Object detection

Is there complete fine-tuning and inference code for object detection?

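Question about UFO supernet training

Hi, after finishing supernet training, I found that the saved models are task-granular: one model is saved under each task's model directory. This differs from the supernet parameter structure shown in extract_task_specific_model.py. Is there an additional step that merges all the task models? If so, could you share the merging procedure or code?
Looking forward to your reply.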

How to pre-train the detection task?

How do I pre-train on detection tasks? Are downstream detection tasks supported, and is there any documentation I can refer to? Thank you!
