Training dataset, about chinese-clip (6 comments, CLOSED)

ofa-sys commented on May 21, 2024
Training dataset

from chinese-clip.

Comments (6)

diyiiyiii commented on May 21, 2024

Thanks!

yangapku commented on May 21, 2024

Hello, regarding your questions about the dataset: we have already described it in some detail in our technical report. Paper link: https://arxiv.org/pdf/2211.01335.pdf

ywdong commented on May 21, 2024

Could you provide a download link for these 200 million image-text pairs?

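yangapku commented on May 21, 2024

@ywdong Hello, and thank you for your interest in our work! For now we do not offer a way to download the 200 million images as a single package. The bulk of this data (~180 million) comes from two public datasets, Wukong and the Chinese subset of LAION-5B, both of which can already be downloaded publicly from their image URLs. We recommend the open-source img2dataset tool provided by the LAION team; we used it ourselves and it is very efficient. Packaging and redistributing the images directly could also raise copyright issues with the original datasets and the original images. We hope you understand.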
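For anyone following the img2dataset recommendation above, here is a minimal sketch of how a download could be configured through its Python API. It is not the configuration the Chinese-CLIP authors used: the metadata format, the column names (`url`, `caption`), the image size, and the worker counts are all assumptions for illustration and need to be matched to the Wukong or LAION metadata files you actually have.

```python
from typing import Any, Dict


def build_download_kwargs(url_list: str, output_folder: str) -> Dict[str, Any]:
    """Collect keyword arguments for img2dataset.download.

    Every value except the two paths is an illustrative assumption,
    not a setting taken from the Chinese-CLIP project.
    """
    return {
        "url_list": url_list,           # file or folder holding the URL metadata
        "input_format": "parquet",      # assumed metadata format
        "url_col": "url",               # assumed name of the URL column
        "caption_col": "caption",       # assumed name of the caption column
        "output_folder": output_folder,
        "output_format": "webdataset",  # write tar shards instead of loose files
        "image_size": 256,              # assumed resize target
        "processes_count": 8,           # assumed number of worker processes
        "thread_count": 64,             # assumed number of download threads
    }


def run_download(kwargs: Dict[str, Any]) -> None:
    """Start the download; requires `pip install img2dataset`."""
    # Imported lazily so build_download_kwargs works without the package.
    from img2dataset import download

    download(**kwargs)
```

Calling `run_download(build_download_kwargs("metadata/", "images/"))` would start the download; both paths here are hypothetical placeholders.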

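liaoxijuneu commented on May 21, 2024

Hello, and thanks for sharing the open-source Chinese model. I have two questions about how the Wukong and VG data were used: 1. For Wukong, did you add both the training set and the test set to training? 2. For VG, is the input the box region paired with its text, or the whole image paired with the text?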

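yangapku commented on May 21, 2024

@liaoxijuneu Hello, and thank you for your recognition of our work! For Wukong, we used only the training set and did not include its test set. For the machine-translated VG data, we concatenate the region description texts of an image (the machine-translated versions) and pair the result with the whole image as a single image-text pair. The amount of VG data is very small compared with the Wukong and LAION sources.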
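The VG pairing described above can be sketched roughly as follows. This is only an illustration of the idea, not the authors' preprocessing code: the reply does not say which separator is used, in what order the regions are joined, or whether duplicates are removed, so the `sep` parameter, the input tuple layout, and the function name `build_vg_pairs` are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_vg_pairs(
    regions: Iterable[Tuple[str, str]], sep: str = " "
) -> List[Tuple[str, str]]:
    """Turn (image_id, region_caption) rows into one (image_id, text) pair per image.

    Region captions of the same image are joined in input order with `sep`
    (an assumed separator); empty captions are skipped.
    """
    captions: Dict[str, List[str]] = defaultdict(list)
    for image_id, text in regions:
        text = text.strip()
        if text:
            captions[image_id].append(text)
    # One whole-image pair per image, in order of first appearance.
    return [(image_id, sep.join(texts)) for image_id, texts in captions.items()]
```

Each returned pair would then be matched with the full image for that `image_id`, rather than with a cropped box.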
