brettkoonce / accelerate

This project is a fork of huggingface/accelerate.


A simple way to train and use PyTorch models with multi-GPU, TPU, and mixed-precision support.

License: Apache License 2.0


accelerate's Introduction




Run your *raw* PyTorch training script on any kind of device

Easy to integrate

🤗 Accelerate was created for PyTorch users who like to write the training loop of PyTorch models but are reluctant to write and maintain the boilerplate code needed to use multi-GPUs/TPU/fp16.

🤗 Accelerate abstracts exactly and only the boilerplate code related to multi-GPUs/TPU/fp16 and leaves the rest of your code unchanged.

Here is an example:

  import torch
  import torch.nn.functional as F
  from datasets import load_dataset
+ from accelerate import Accelerator

+ accelerator = Accelerator()
- device = 'cpu'
+ device = accelerator.device

  model = torch.nn.Transformer().to(device)
  optim = torch.optim.Adam(model.parameters())

  dataset = load_dataset('my_dataset')
  data = torch.utils.data.DataLoader(dataset, shuffle=True)

+ model, optim, data = accelerator.prepare(model, optim, data)

  model.train()
  for epoch in range(10):
      for source, targets in data:
          source = source.to(device)
          targets = targets.to(device)

          optim.zero_grad()

          output = model(source)
          loss = F.cross_entropy(output, targets)

-         loss.backward()
+         accelerator.backward(loss)

          optim.step()

As you can see in this example, by adding five lines to any standard PyTorch training script, you can now run it in any kind of single or distributed node setting (single CPU, single GPU, multi-GPUs and TPUs), as well as with or without mixed precision (fp16).

In particular, the same code can then be run without modification on your local machine for debugging or in your training environment.
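For example, assuming the script above is saved as my_script.py (a placeholder name), the very same file can be run single-process for debugging and then launched distributed, unchanged:

  python my_script.py               # plain single-process run, e.g. for CPU debugging
  accelerate launch my_script.py    # distributed run (see the Launching script section below)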

🤗 Accelerate even handles the device placement for you (which requires a few more changes to your code, but is safer in general), so you can simplify your training loop even further:

  import torch
  import torch.nn.functional as F
  from datasets import load_dataset
+ from accelerate import Accelerator

+ accelerator = Accelerator()
- device = 'cpu'

- model = torch.nn.Transformer().to(device)
+ model = torch.nn.Transformer()
  optim = torch.optim.Adam(model.parameters())

  dataset = load_dataset('my_dataset')
  data = torch.utils.data.DataLoader(dataset, shuffle=True)

+ model, optim, data = accelerator.prepare(model, optim, data)

  model.train()
  for epoch in range(10):
      for source, targets in data:
-         source = source.to(device)
-         targets = targets.to(device)

          optim.zero_grad()

          output = model(source)
          loss = F.cross_entropy(output, targets)

-         loss.backward()
+         accelerator.backward(loss)

          optim.step()

Launching script

🤗 Accelerate also provides an optional CLI tool that allows you to quickly configure and test your training environment before launching the scripts. No need to remember how to use torch.distributed.launch or to write a specific launcher for TPU training! On your machine(s) just run:

accelerate config

and answer the questions asked. This will generate a config file that will be used automatically to properly set the default options when doing

accelerate launch my_script.py --args_to_my_script

For instance, here is how you would run the GLUE example on the MRPC task (from the root of the repo):

accelerate launch examples/nlp_example.py

Why should I use 🤗 Accelerate?

You should use 🤗 Accelerate when you want to easily run your training scripts in a distributed environment without having to give up full control over your training loop. This is not a high-level framework above PyTorch, just a thin wrapper, so you don't have to learn a new library. In fact, the whole API of 🤗 Accelerate is in one class, the Accelerator object.

Why shouldn't I use 🤗 Accelerate?

You shouldn't use 🤗 Accelerate if you don't want to write a training loop yourself. There are plenty of high-level libraries above PyTorch that will offer you that; 🤗 Accelerate is not one of them.

Installation

This repository is tested on Python 3.6+ and PyTorch 1.4.0+.

You should install 🤗 Accelerate in a virtual environment. If you're unfamiliar with Python virtual environments, check out the user guide.

First, create a virtual environment with the version of Python you're going to use and activate it.

Next, you will need to install PyTorch: refer to the official installation page for the install command specific to your platform. 🤗 Accelerate can then be installed using pip as follows:

pip install accelerate
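
Put together, the installation might look like this on Linux or macOS (a minimal sketch, assuming python3 with the built-in venv module; swap the pip install torch line for the platform-specific command from the PyTorch installation page):

  python3 -m venv .env             # create the virtual environment
  source .env/bin/activate         # activate it
  pip install torch                # or the platform-specific command from pytorch.org
  pip install accelerate           # install 🤗 Accelerate itself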

Supported integrations

  • CPU only
  • single GPU
  • multi-GPU on one node (machine)
  • multi-GPU on several nodes (machines)
  • TPU
  • FP16 with native AMP (apex on the roadmap); see the sketch below
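
For instance, mixed precision is switched on when constructing the Accelerator. A minimal sketch (note that the keyword argument has changed across releases: early versions used fp16=True while later ones use mixed_precision='fp16', so check the version you have installed):

  from accelerate import Accelerator

  # Enable native AMP mixed precision; on early releases the flag was fp16=True.
  accelerator = Accelerator(fp16=True)
  # Everything else (prepare, backward, step) stays exactly as in the examples above.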

