Comments (6)

Yang-QW avatar Yang-QW commented on June 9, 2024

image
I think I found where batch_size is set. I hope it can be added to the configuration file in the future.

from data-juicer.

BeachWang avatar BeachWang commented on June 9, 2024

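Hi, the batching here is mainly meant for the case where a mapper generates multiple samples from one sample, so the results have to be wrapped into a batch when they are returned. Currently only mappers support batching, and the input batch size is fixed at 1. It would indeed be more reasonable for every type of OP to support batching, and the batch size should be exposed to users. However, users may need to take the overhead of building batches into account: if the speedup from a batched OP is not enough to cover that overhead, processing may end up slower.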
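To make the "overhead of building batches" concrete, here is a minimal pure-Python sketch of packing row-wise samples into a columnar batch (a dict of lists) and unpacking them again. The names `to_batch` and `from_batch` are hypothetical and are not taken from Data-Juicer; the sketch only assumes that a batch is represented as a dict of equal-length lists.

```python
from typing import Any, Dict, List

Sample = Dict[str, Any]
Batch = Dict[str, List[Any]]


def to_batch(samples: List[Sample]) -> Batch:
    """Pack row-wise samples into one columnar batch (dict of lists)."""
    if not samples:
        return {}
    keys = samples[0].keys()
    return {key: [sample[key] for sample in samples] for key in keys}


def from_batch(batch: Batch) -> List[Sample]:
    """Unpack a columnar batch back into row-wise samples."""
    if not batch:
        return []
    keys = list(batch.keys())
    size = len(batch[keys[0]])
    return [{key: batch[key][i] for key in keys} for i in range(size)]


rows = [{"text": "a"}, {"text": "b"}]
packed = to_batch(rows)        # one dict holding a list per field
restored = from_batch(packed)  # back to one dict per sample
```

Both conversions touch every field of every sample, so this work is paid in addition to the OP itself. It is only worth paying when the OP gains something from seeing several samples at once.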


sherrytonger avatar sherrytonger commented on June 9, 2024

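> Hi, the batching here is mainly meant for the case where a mapper generates multiple samples from one sample, so the results have to be wrapped into a batch when they are returned. Currently only mappers support batching, and the input batch size is fixed at 1. It would indeed be more reasonable for every type of OP to support batching, and the batch size should be exposed to users. However, users may need to take the overhead of building batches into account: if the speedup from a batched OP is not enough to cover that overhead, processing may end up slower.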

What does the overhead of batching consist of? Memory usage?


HYLcool avatar HYLcool commented on June 9, 2024

> What does the overhead of batching consist of? Memory usage?

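Yes, memory is one factor. With the same degree of parallelism, a larger batch size means more data is being processed at the same time, so memory usage may be higher.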

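At present most Filter operators can only process samples one at a time, so a larger batch size leaves relatively little room for speedup. If memory and other resources allow, increasing the parallelism np is the better option.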
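As a rough illustration of the memory point, the number of samples held in memory at once can be approximated as the parallelism times the batch size. The sketch below is a back-of-the-envelope estimate only; the helper names and the fixed `bytes_per_sample` are assumptions for illustration, not anything measured from or provided by Data-Juicer.

```python
def in_flight_samples(num_proc: int, batch_size: int) -> int:
    """Approximate number of samples being processed at the same time."""
    return num_proc * batch_size


def estimated_peak_bytes(num_proc: int, batch_size: int,
                         bytes_per_sample: int) -> int:
    """Crude memory estimate that assumes every sample has the same size."""
    return in_flight_samples(num_proc, batch_size) * bytes_per_sample


# Same parallelism, larger batch: more samples resident at once.
small = in_flight_samples(num_proc=4, batch_size=1)
large = in_flight_samples(num_proc=4, batch_size=1000)
peak = estimated_peak_bytes(num_proc=8, batch_size=100, bytes_per_sample=2048)
```

Under this estimate, raising np and raising the batch size cost the same memory per extra in-flight sample, but for an OP that still handles samples one by one only the extra processes add throughput.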

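In addition, some Mappers are batched OPs mainly because they are used for data augmentation or data generation. Unlike an ordinary Mapper's 1->1 mapping, they need a 1->N mapping, and we use batching to support this new type.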
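The 1->N case can be sketched in plain Python as a function that takes a columnar batch and returns a batch with more rows than it received. `augment_batched` and the "variant" suffix are hypothetical stand-ins for a real augmentation step; this is not Data-Juicer's implementation, only an illustration of why the return value has to be a batch.

```python
from typing import Any, Dict, List

Batch = Dict[str, List[Any]]


def augment_batched(batch: Batch, num_variants: int = 3) -> Batch:
    """Emit `num_variants` output rows for every input row in the batch."""
    out: Batch = {key: [] for key in batch}
    for i, text in enumerate(batch["text"]):
        for k in range(num_variants):
            # Copy every field of the source row, then overwrite the text.
            for key in batch:
                out[key].append(batch[key][i])
            out["text"][-1] = f"{text} (variant {k})"
    return out


# Input batch of size 1, as described in the thread.
single = {"text": ["hello"], "source": ["demo"]}
expanded = augment_batched(single, num_variants=3)
```

A per-sample function can only return one sample, so it cannot express this expansion. Returning a dict of lists lets the output row count differ from the input row count.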


github-actions avatar github-actions commented on June 9, 2024

This issue is marked as stale because there has been no activity for 21 days. Remove the stale label or add a new comment, or this issue will be closed in 3 days.


github-actions avatar github-actions commented on June 9, 2024

Close this stale issue.
