🐛 Describe the bug When using dcp.save or dcp.async_save as torch


[DCP] DCP load for non-tensor values about pytorch HOT 3 OPEN

ultranity commented on May 20, 2024


Comments (3)

ultranity commented on May 20, 2024

Also, there is currently no direct way to load a full DCP-saved state dict the way torch.load does. I found a dcp_to_torch_save implementation in dcp.format_utils; maybe we could make this more accessible by adding a function like dcp_to_torch_save that just returns the state_dict? For example:

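import os
from typing import Union

from torch.distributed.checkpoint import FileSystemReader
from torch.distributed.checkpoint.metadata import STATE_DICT_TYPE
from torch.distributed.checkpoint.state_dict_loader import _load_state_dict

# Private helper; its location in dcp.format_utils is assumed from the
# dcp_to_torch_save implementation and may differ between PyTorch versions.
from torch.distributed.checkpoint.format_utils import _EmptyStateDictLoadPlanner


def dcp_load(
    dcp_checkpoint_dir: Union[str, os.PathLike],
    no_dist: bool = True,
) -> STATE_DICT_TYPE:
    """
    Given a directory containing a DCP checkpoint, this function will load it into a
    state dict and return it as torch.load does.

    Args:
        dcp_checkpoint_dir: Directory containing the DCP checkpoint.
        no_dist: If True, load without coordinating across ranks.

    .. warning::
        To avoid OOM, it's recommended to only run this function on a single rank.
    """
    # Start from an empty dict; the planner rebuilds its contents from the checkpoint.
    sd: STATE_DICT_TYPE = {}

    _load_state_dict(
        sd,
        storage_reader=FileSystemReader(dcp_checkpoint_dir),
        planner=_EmptyStateDictLoadPlanner(),
        no_dist=no_dist,
    )
    return sd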

Using this method, we can verify that dcp.async_save does save the other values (state_dict['epoch'] == 20 in that case), but dcp.load does not update the dict.
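Until dcp.load handles these values itself, one possible workaround is to copy the non-tensor entries from a fully loaded state dict back into the dict that dcp.load was given. The sketch below is only an illustration of that idea, not part of DCP: merge_non_tensor_values is a hypothetical helper, the two dicts are stand-ins for a real state dict and for the result of the dcp_load proposed above, and treating any object with a copy_ method as a tensor is an assumption.

```python
from typing import Any, Dict


def merge_non_tensor_values(target: Dict[str, Any], loaded: Dict[str, Any]) -> Dict[str, Any]:
    """Copy plain Python values from `loaded` into `target`, recursing into nested dicts.

    Objects exposing a `copy_` method (as torch.Tensor does) are skipped, on the
    assumption that the in-place load has already filled them.
    """
    for key, value in loaded.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            # Descend so nested non-tensor values are updated as well.
            merge_non_tensor_values(target[key], value)
        elif not hasattr(value, "copy_"):
            # Immutable values such as ints cannot be updated in place,
            # so the dict entry itself is reassigned.
            target[key] = value
    return target


# Stand-ins: the dict passed to dcp.load, and the dict returned by the proposed dcp_load.
state_dict = {"epoch": 0, "extra": {"lr": 0.1}}
loaded = {"epoch": 20, "extra": {"lr": 0.01}}
merge_non_tensor_values(state_dict, loaded)
```

After the call, state_dict carries the saved epoch and learning rate, while any tensor-like entries are left exactly as the in-place load produced them.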


fegin commented on May 20, 2024

cc @LucasLLC


LucasLLC commented on May 20, 2024

Looking into this, thanks @ultranity

