
Comments (3)

keisukefukuda commented on August 17, 2024

Hi @Fhrozen, thanks for trying ChainerMN.
First, which ChainerMN version are you using? (1.3 or master?)

Regarding the NCCL issue: NCCL is effectively a black-box library, and the error message "unhandled system error" carries little information. I will discuss it with my colleagues anyway.

On the second issue, please try the Open MPI 2.x series (2.1.3 would be a good choice). Open MPI 3.x has a bug in GPU-Direct communication (issue number 3792 in the open-mpi/ompi repository, as noted in our issue #221).
Judging from the first line of the error messages, you seem to be using the UCX BTL component. The Open MPI bug is in the openib component, so I'm not sure that changing versions will really fix the issue, but it is worth trying.
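To confirm which Open MPI series is actually in use before and after switching, a minimal sketch like the following may help. It assumes the library reports itself with a string of the form "Open MPI vX.Y.Z, ..."; the helper names `open_mpi_version` and `is_suggested_series` and the sample string are hypothetical, made up for illustration.

```python
import re


def open_mpi_version(library_version):
    """Extract (major, minor, patch) from an Open MPI version string, or None."""
    # Assumption: the string contains "Open MPI vX.Y.Z".
    match = re.search(r"Open MPI v(\d+)\.(\d+)\.(\d+)", library_version)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_suggested_series(version):
    """True when the parsed version belongs to the 2.x series suggested above."""
    return version is not None and version[0] == 2


# Hypothetical sample string, used only to exercise the parser.
sample = "Open MPI v3.0.0, package: Open MPI example"
version = open_mpi_version(sample)
print(version, is_suggested_series(version))
```

If mpi4py is installed, `MPI.Get_library_version()` can supply the string to parse instead of the sample.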

Thanks,
Keisuke

from chainermn.

keisukefukuda commented on August 17, 2024

Also, could you try again with the environment variable NCCL_DEBUG=INFO set?
Its output is often not very useful, but it is better than nothing in this case.
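One possible way to set the variable for every rank is sketched below. It assumes the job is launched with Open MPI's `mpiexec`, whose `-x` option exports an environment variable to the launched processes. The script name `train_mnist.py`, the process count, and the helper names are hypothetical.

```python
import os


def nccl_debug_env(base_env=None, level="INFO"):
    """Return a copy of the environment with NCCL_DEBUG set to the given level."""
    env = dict(os.environ if base_env is None else base_env)
    env["NCCL_DEBUG"] = level
    return env


def mpiexec_command(script, n_procs=2):
    """Build an Open MPI launch command that forwards NCCL_DEBUG to every rank."""
    # "-x NCCL_DEBUG" exports the variable from the launching environment.
    return ["mpiexec", "-n", str(n_procs), "-x", "NCCL_DEBUG", "python", script]


env = nccl_debug_env()
cmd = mpiexec_command("train_mnist.py")  # hypothetical script name
print(" ".join(cmd))
# To actually launch: subprocess.run(cmd, env=env, check=True)
```

The launch itself is left as a comment so that the sketch can be read and run without an MPI installation.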

Thanks.
Keisuke


Fhrozen commented on August 17, 2024

@keisukefukuda Thank you for your support.
I am using the current version from pip, 1.3.
I am now running some tests with NCCL and MPI to check the full functionality of the latter. I am also running some tests with Open MPI 2.x. I will update you as soon as I get any new information.

