How do I load a checkpoint saved with save_checkpoint from the finetune script after fine-tuning with accelerate and DeepSpeed ZeRO stage 3? (moss, closed)

openmoss commented on May 18, 2024
I fine-tuned a model with accelerate and DeepSpeed ZeRO stage 3. After saving the checkpoint with save_checkpoint from the finetune script, how should I load it?


Comments (12)

ChawDoe commented on May 18, 2024


Have you solved this problem?


ChawDoe commented on May 18, 2024


I'm running into the same problem.


rurubaobao commented on May 18, 2024

I have this problem too. How did you solve it?


Salierioo commented on May 18, 2024


Salierioo commented on May 18, 2024

> Have you solved this problem?
>
> I'm running into the same problem.

I've posted my solution in a reply below, in case you haven't solved it already.


rurubaobao commented on May 18, 2024

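Hi, after merging the checkpoint I end up with a bit over 60 GB, but the official model is only a bit over 30 GB. Did you load the 60+ GB model directly?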
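The roughly 2x gap is what the data type alone predicts: as its name says, zero_to_fp32.py writes float32 weights (4 bytes per parameter), while a release stored in float16 uses 2 bytes per parameter. The sketch below does that arithmetic. The parameter count of about 16 billion is an assumption for illustration (roughly the size of the MOSS moon models), not a figure taken from this thread, and the estimate ignores small overheads such as file headers.

```python
# Rough size of the raw weights from parameter count and bytes per parameter.
# ASSUMPTION: ~16e9 parameters, used only for illustration.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def checkpoint_size_gb(num_params: int, dtype: str) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes), ignoring overheads."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    num_params = 16_000_000_000  # assumed parameter count
    for dtype in ("float32", "float16"):
        print(f"{dtype}: ~{checkpoint_size_gb(num_params, dtype):.0f} GB")
```

So a merged fp32 file of about twice the official size is expected rather than a sign of a broken merge; it simply needs about twice the disk space and RAM to load.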


Salierioo commented on May 18, 2024


Salierioo commented on May 18, 2024


usun1997 commented on May 18, 2024

I'm stuck on `where expected condition to be a boolean tensor, but got a tensor with dtype Half` and completely lost. I used zero_to_fp32.py to convert the checkpoint .pt files into a single pytorch_model.bin of over 60 GB, then changed every file name in pytorch_model.bin.index to pytorch_model.bin. Running inference raises this error.
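The message comes from torch.where, which requires a boolean condition; here the condition, presumably an attention mask, arrived as a float16 tensor. One plausible cause, not confirmed in this thread: zero_to_fp32.py restores model buffers as float32 (the script calls `.float()` on them), so a boolean causal-mask buffer stored in the checkpoint becomes a float tensor, and loading with a float16 dtype then casts it to Half. The sketch below casts such entries back to bool in the merged state dict. The key suffix "causal_mask" is an assumption based on CodeGen-style attention code; check the actual buffer names in your checkpoint first.

```python
import torch


def restore_bool_masks(state_dict, suffix="causal_mask"):
    """Cast mask tensors that were turned into floats back to torch.bool.

    ASSUMPTION: the boolean buffers are the keys ending with `suffix`;
    "causal_mask" is a guess based on CodeGen-style attention code, so
    inspect the keys of your own checkpoint before relying on it.
    Modifies `state_dict` in place and returns the list of converted keys.
    """
    converted = []
    for key, tensor in list(state_dict.items()):
        if key.endswith(suffix) and tensor.dtype != torch.bool:
            # The original mask only holds 0/1 values, so "!= 0" recovers it exactly.
            state_dict[key] = tensor != 0
            converted.append(key)
    return converted
```

For example, load the merged file with `torch.load("pytorch_model.bin", map_location="cpu")`, pass the dict to `restore_bool_masks`, and write it back with `torch.save` before loading the model. If the returned list is empty, no key matched the assumed suffix: either the buffer has a different name or this is not the cause.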


mayurou commented on May 18, 2024

> I used zero_to_fp32.py to convert the checkpoint .pt files into a single pytorch_model.bin of over 60 GB, then changed every file name in pytorch_model.bin.index to pytorch_model.bin

It won't run! I did the same thing: converted the checkpoint .pt files into a 60+ GB pytorch_model.bin with zero_to_fp32.py, then changed every file name in pytorch_model.bin.index to pytorch_model.bin. That gives me `TypeError: expected str, bytes or os.PathLike object, not NoneType`.
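The manual index edit described in these two comments can be scripted so that no entry is missed. The sketch below rewrites every entry of the index's weight_map to point at the single merged file, then lists any referenced file that does not exist in the directory. The file names are assumptions (the usual Hugging Face names pytorch_model.bin.index.json and pytorch_model.bin), and whether this is related to the TypeError above is not established here. Another option worth trying is to move the index file out of the directory, so that loaders looking for a single weights file pick up pytorch_model.bin directly.

```python
import json
from pathlib import Path


def point_index_to_single_file(model_dir, index_name="pytorch_model.bin.index.json",
                               weights_name="pytorch_model.bin"):
    """Rewrite a sharded-checkpoint index so every tensor maps to one file.

    ASSUMPTION: standard Hugging Face file names; adjust if yours differ.
    Returns the set of files referenced by the rewritten index.
    """
    index_path = Path(model_dir) / index_name
    index = json.loads(index_path.read_text(encoding="utf-8"))
    # Every parameter name now points at the single merged weights file;
    # other top-level fields such as "metadata" are left untouched.
    index["weight_map"] = {name: weights_name for name in index["weight_map"]}
    index_path.write_text(json.dumps(index, indent=2), encoding="utf-8")
    return set(index["weight_map"].values())


def missing_files(model_dir, referenced):
    """Return the referenced weight files that do not exist in model_dir."""
    return sorted(f for f in referenced if not (Path(model_dir) / f).is_file())
```

Run `missing_files(model_dir, point_index_to_single_file(model_dir))`; an empty list means every file the index mentions is present.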


Salierioo commented on May 18, 2024

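It's been quite a while since I did the fine-tuning, so I may be misremembering, but I really don't recall manually writing an .index.json. I just followed the documentation, loaded the model directly, and didn't run into any problems.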


lmc8133 commented on May 18, 2024

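I found the solution in the DeepSpeed documentation in accelerate's GitHub repo. Here is the link: https://github.com/huggingface/accelerate/blob/e60f3cab7a54a5519bf8f200fa1c998ce46e75bb/docs/source/usage_guides/deepspeed.mdx (see the "Saving and loading" section). In short, run the Python script that save_checkpoint writes alongside the checkpoint to merge the checkpoint shards (this step needs plenty of disk space and RAM). Once it has produced pytorch_model.bin, you can load the model with from_pretrained.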


This link is easier to read:
https://huggingface.co/docs/accelerate/usage_guides/deepspeed#saving-and-loading
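The merge step described above can be driven from Python. DeepSpeed copies a zero_to_fp32.py script into the checkpoint directory when it saves a ZeRO checkpoint, and the accelerate guide runs it from inside that directory as `zero_to_fp32.py . pytorch_model.bin`. The sketch below builds and runs that command. The positional arguments follow the 2023-era script; newer DeepSpeed releases may expect an output directory instead of a file, so check `python zero_to_fp32.py --help` first.

```python
import subprocess
import sys
from pathlib import Path


def zero_to_fp32_command(output_file="pytorch_model.bin"):
    """Command for the zero_to_fp32.py script saved next to the ZeRO shards.

    Meant to run with cwd set to the checkpoint directory, mirroring the
    accelerate guide: `zero_to_fp32.py . pytorch_model.bin`.
    """
    return [sys.executable, "zero_to_fp32.py", ".", output_file]


def consolidate_zero_checkpoint(checkpoint_dir, output_file="pytorch_model.bin"):
    """Merge the sharded ZeRO-3 checkpoint into one fp32 state-dict file.

    Needs enough disk space and CPU RAM for the full fp32 model.
    Returns the path of the merged file.
    """
    checkpoint_dir = Path(checkpoint_dir)
    if not (checkpoint_dir / "zero_to_fp32.py").is_file():
        raise FileNotFoundError(f"zero_to_fp32.py not found in {checkpoint_dir}")
    # check=True raises CalledProcessError if the script fails.
    subprocess.run(zero_to_fp32_command(output_file), cwd=checkpoint_dir, check=True)
    return checkpoint_dir / output_file
```

Afterwards, put the merged pytorch_model.bin in a directory that also holds the base model's config.json, tokenizer files and custom modeling code, then load it with `AutoModelForCausalLM.from_pretrained(that_dir, trust_remote_code=True)`; the trust_remote_code flag assumes MOSS's custom modeling code, as with the official checkpoints.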

