
Support MPI distributed training · llm.c (OPEN, 6 comments)

karpathy commented on June 12, 2024
Support MPI distributed training


Comments (6)

Yiltan commented on June 12, 2024

I regularly write MPI code, so this shouldn't be too complicated to implement. I've started to look through the CPU version to get started. However, I do have questions regarding the ML side.

There are a few options I can see:

  1. Data parallelism, using MPI_Allreduce to average gradients (see the sketch after this comment)
    - I think we would do this around here:
    - https://github.com/karpathy/llm.c/blob/master/train_gpt2.c#L906C1-L906C5
  2. Tensor parallelism (similar to llama.cpp)
  3. Model parallelism

Is there a preference for how this should be scaled with MPI? If option 2 or 3 seems like the better choice, do you have a suggestion as to where in the code I should dig in?
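
For option 1, here is a minimal sketch of what the gradient averaging could look like, assuming llm.c's convention of holding all gradients in one contiguous buffer (grads_memory); the function name and call site are hypothetical, not an actual patch:

```c
// Hypothetical data-parallel step (option 1): after each rank has run its
// local backward pass, sum the gradients across ranks and divide by the
// world size, so every rank applies an identical optimizer update.
#include <mpi.h>
#include <stddef.h>

void average_gradients(float *grads, size_t num_parameters) {
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    // In-place element-wise sum across all ranks.
    // Note: the count argument is an int, which is fine for GPT-2 124M.
    MPI_Allreduce(MPI_IN_PLACE, grads, (int)num_parameters,
                  MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
    for (size_t i = 0; i < num_parameters; i++) {
        grads[i] /= (float)world_size;
    }
}
```

This would presumably sit between the backward pass and the optimizer update in the training loop, around the line linked above.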


karpathy commented on June 12, 2024

Sounds great! I expect to get started on the backward pass sometime over the weekend, most likely.
(I spent today still optimizing the forward pass.)
Once we have the backward pass, getting data-parallel training in will be super awesome.


chadbrewbaker commented on June 12, 2024

I have this in mind for the Mojo target issue, which is really about having the Makefile support composability like the one for llama.cpp. We could probably copy most of what llama.cpp has so the build uses mpicc. We would still need to write the MPI code.
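
A hypothetical sketch of that composability, where USE_MPI is an assumed compile-time define (e.g. a Makefile target that builds with mpicc -DUSE_MPI), so the single-process build stays untouched:

```c
/* Sketch only: USE_MPI is an assumed flag, not an existing llm.c define. */
#ifdef USE_MPI
#include <mpi.h>
#endif

int main(int argc, char **argv) {
    int rank = 0, world_size = 1;  // single-process defaults
#ifdef USE_MPI
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
#endif

    // ... the training loop is unchanged; only gradient sync is conditional ...

#ifdef USE_MPI
    MPI_Finalize();
#endif
    return 0;
}
```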


karpathy commented on June 12, 2024

definitely! but this is pretty far down the line; i think we first need to get the 1-GPU version to be super solid.


chadbrewbaker commented on June 12, 2024

I would target MPI-2, as MPI-IO is all you need and it is the most widely supported.
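
As an illustration of the MPI-IO point, each rank could read its own shard of the tokenized training file directly, with no root process fanning data out. This is a sketch under assumed names; the flat file layout and int32 token type are assumptions:

```c
// Hypothetical MPI-IO shard read: rank r reads tokens
// [r * tokens_per_rank, (r+1) * tokens_per_rank) from a flat binary file.
#include <mpi.h>
#include <stdint.h>
#include <stddef.h>

void read_token_shard(const char *path, int32_t *tokens, size_t tokens_per_rank) {
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, path, MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
    // Each rank seeks to its own byte offset and reads its slice.
    MPI_Offset offset = (MPI_Offset)rank * tokens_per_rank * sizeof(int32_t);
    MPI_File_read_at(fh, offset, tokens, (int)tokens_per_rank,
                     MPI_INT32_T, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
}
```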


Yiltan commented on June 12, 2024

[screenshot: llm.c training run output]

The MPI version of this is mostly working at this point. I've tested it on up to 8 nodes, and it reduces training time by many hours.

@karpathy Do you still have interest in an NCCL version? If so, are there any multi-GPU resources that you could share?
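
For what it's worth, the standard pattern for an NCCL version is to bootstrap NCCL with MPI: rank 0 creates a unique id, MPI broadcasts it, and each rank joins the NCCL communicator. A sketch (illustrative, not code from this repo; assumes one GPU per rank on a single node):

```c
#include <mpi.h>
#include <nccl.h>
#include <cuda_runtime.h>

ncclComm_t init_nccl_from_mpi(void) {
    int rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Rank 0 creates the NCCL id; MPI broadcasts it to everyone.
    ncclUniqueId id;
    if (rank == 0) ncclGetUniqueId(&id);
    MPI_Bcast(&id, sizeof(id), MPI_BYTE, 0, MPI_COMM_WORLD);

    cudaSetDevice(rank);  // simplification: one GPU per rank, single node
    ncclComm_t comm;
    ncclCommInitRank(&comm, world_size, id, rank);
    return comm;  // gradient averaging then uses ncclAllReduce on-device
}
```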

