Comments (7)

LiheYoung commented on September 24, 2024

Hi, following unimatch.py, you need to wrap the model with DDP before calling load_state_dict:

model = DeepLabV3Plus(cfg)
model.cuda()
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank], broadcast_buffers=False,
                                                  output_device=local_rank, find_unused_parameters=False)
model.load_state_dict(checkpoint['model'])

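xiaoqiang-lu commented on September 24, 2024

Hello, a new problem has come up:

[image]

It seems that when a model trained with distributed training is loaded on a single GPU, the checkpoint needs to have been saved with model.module.state_dict().
I noticed that the original version of the code saved the model with model.module.state_dict(), whereas the current version uses model.state_dict().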

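LiheYoung commented on September 24, 2024

If the checkpoint was saved directly with model.state_dict() (the state_dict keys contain "module"), you need to wrap the model with DDP before loading. If it was saved with model.module.state_dict() (the keys do not contain "module"), you can load it directly. The error above occurs because DDP has not been set up; you can configure it as in unimatch.py: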

rank, world_size = setup_distributed(port=args.port)
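As an alternative for single-GPU loading, the key difference described above can be handled by renaming the keys instead of wrapping the model. The following is a minimal sketch, not code from the repository: the helper name strip_module_prefix and the variable path are hypothetical, and it assumes every wrapped key starts with exactly "module.", which is the prefix DDP adds.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading `prefix` removed from each key.

    Keys that do not start with `prefix` are copied unchanged, so the helper
    is safe to call on checkpoints saved with model.module.state_dict().
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Hypothetical single-GPU usage without DDP (not executed here; `path` is
# a placeholder for the checkpoint file):
#   checkpoint = torch.load(path, map_location="cpu")
#   model = DeepLabV3Plus(cfg)
#   model.load_state_dict(strip_module_prefix(checkpoint["model"]))
```

With this approach no process group is needed, since the unwrapped model receives keys in the form it expects.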

xiaoqiang-lu commented on September 24, 2024

Hello, after adding the above:

[image]

the following error occurs:

[image]

LiheYoung commented on September 24, 2024

You need to add this:

parser.add_argument('--local_rank', default=0, type=int)

Also, make sure to launch the script the same way as in train.sh.
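For context, a sketch of how the argument might be parsed so that it works with more than one launcher. This is an assumption-laden illustration, not the repository's code: the function name parse_args is hypothetical, the --port argument is inferred from args.port above, and the fallback relies on torchrun exporting LOCAL_RANK as an environment variable while torch.distributed.launch passes --local_rank (or --local-rank in newer PyTorch releases) on the command line.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the distributed-launch arguments (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    # Accept both spellings; fall back to the LOCAL_RANK environment
    # variable, then to 0 for a plain single-process run.
    parser.add_argument('--local_rank', '--local-rank', dest='local_rank',
                        default=int(os.environ.get('LOCAL_RANK', 0)), type=int)
    # Assumed from `args.port` in the thread; the default is a guess.
    parser.add_argument('--port', default=None, type=int)
    return parser.parse_args(argv)


if __name__ == '__main__':
    args = parse_args()
    print(args.local_rank, args.port)
```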

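xiaoqiang-lu commented on September 24, 2024

Thank you for the answer. This seems to circle back to building the model in distributed mode, whereas I am specifying a single GPU. I solved the problem by replacing torch.nn.parallel.DistributedDataParallel with torch.nn.DataParallel, and the script now launches directly.

Thank you for your patient guidance, and best wishes for your research!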

LiheYoung commented on September 24, 2024

Sounds good~
