
fine-tuning-instability

Fine-Tuning Instability for Large Language Models

This repository contains the implementation of the AdamWL2SP optimizer, as described in this blog post. AdamWL2SP is the adaptive moment estimation (Adam) optimizer with decoupled weight decay and $L^2$-SP regularization.
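Concretely, $L^2$-SP regularization penalizes the distance of the fine-tuned weights $\theta$ from the pre-trained weights $\theta_0$, rather than from the origin as plain weight decay does. As a sketch, assuming the $L^2$-SP term is decoupled from the gradient in the same way AdamW decouples weight decay (the symbols $\eta_t$, $\lambda$, $\lambda_{\mathrm{SP}}$, $\hat{m}_t$, $\hat{v}_t$ are ours, not necessarily the code's), each step looks like

$$
\theta_{t+1} \;=\; \theta_t \;-\; \eta_t \left( \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \;+\; \lambda\,\theta_t \;+\; \lambda_{\mathrm{SP}}\,\bigl(\theta_t - \theta_0\bigr) \right),
$$

where $\hat{m}_t$ and $\hat{v}_t$ are Adam's bias-corrected first and second moment estimates.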

The optimizer is implemented in src/transformers_fine_tuning/optim/adamwl2sp.py and is based on the PyTorch implementation of AdamW.
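For intuition, here is a minimal, self-contained sketch (not the repository's actual code) of what one such step looks like in PyTorch; the names `w0` and `sp_lambda` and all hyperparameter values are illustrative assumptions:

```python
import torch

# Illustrative sketch only: one hand-rolled AdamW-style step with an extra
# decoupled L2-SP term pulling the weights back toward their pre-trained values.
torch.manual_seed(0)
w = torch.randn(4, requires_grad=True)      # fine-tuned weights
w0 = w.detach().clone()                     # frozen copy of the pre-trained weights
lr, beta1, beta2, eps = 1e-3, 0.9, 0.999, 1e-8
weight_decay, sp_lambda = 1e-2, 1e-2        # placeholder values, assumed names

exp_avg = torch.zeros_like(w)               # first moment estimate
exp_avg_sq = torch.zeros_like(w)            # second moment estimate

loss = (w ** 2).sum()                       # stand-in training loss
loss.backward()

with torch.no_grad():
    exp_avg.mul_(beta1).add_(w.grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(w.grad, w.grad, value=1 - beta2)
    denom = exp_avg_sq.sqrt().add_(eps)     # bias correction omitted for brevity
    w.mul_(1 - lr * weight_decay)           # decoupled weight decay, as in AdamW
    w.add_(w - w0, alpha=-lr * sp_lambda)   # decoupled L2-SP pull toward w0
    w.addcdiv_(exp_avg, denom, value=-lr)   # Adam update
```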

src/transformers_fine_tuning/transformers/trainer_optimizer_init.py defines a subclass of Trainer from the 🤗 transformers library that makes it easy to plug in custom optimizers such as our AdamWL2SP. It is not strictly necessary, but we prefer this design to using Trainer directly. (For background, see the transformers issue Passing optimizer to Trainer constructor does not work #18635.)
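A hypothetical usage sketch follows; the class name `TrainerOptimizerInit`, its `optimizer_init` keyword, and AdamWL2SP's constructor arguments are assumptions for illustration, with the actual definitions in trainer_optimizer_init.py and adamwl2sp.py. The idea is to hand the trainer a callable that builds the optimizer from the model once the model is in place, instead of passing a pre-built optimizer instance (the pitfall discussed in issue #18635):

```python
from transformers import TrainingArguments
# Module paths match the repository layout; the imported names and signatures
# below are assumptions for illustration.
from transformers_fine_tuning.optim.adamwl2sp import AdamWL2SP
from transformers_fine_tuning.transformers.trainer_optimizer_init import TrainerOptimizerInit

def optimizer_init(model):
    # Keyword arguments are placeholders, not the experiments' hyperparameters.
    return AdamWL2SP(model.parameters(), lr=2e-5)

# `model`, `train_dataset`, and `eval_dataset` are assumed to be prepared
# elsewhere (e.g., ALBERT on RTE, as in fine-tune.py).
trainer = TrainerOptimizerInit(
    model=model,
    args=TrainingArguments(output_dir="output"),
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    optimizer_init=optimizer_init,   # assumed parameter name
)
trainer.train()
```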

The example script fine-tune.py demonstrates how to use our code to fine-tune ALBERT on the RTE task, with optimizers such as AdamW from torch or our custom AdamWL2SP. The hyperparameters are set to the same values used in our experiments. The model, optimizer, task, random seeds and hyperparameters can all be changed by setting the appropriate global variables in the script.
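For orientation, here is a minimal sketch of the kind of setup fine-tune.py performs, using standard 🤗 transformers and datasets APIs. The checkpoint albert-base-v2 and all hyperparameter values are illustrative only and are not the settings used in our experiments:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "albert-base-v2"   # illustrative; the experiments may use another ALBERT variant

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# RTE is a sentence-pair task: premise ("sentence1") / hypothesis ("sentence2").
raw = load_dataset("glue", "rte")

def tokenize(batch):
    return tokenizer(batch["sentence1"], batch["sentence2"],
                     truncation=True, padding="max_length", max_length=128)

encoded = raw.map(tokenize, batched=True)
# encoded["train"] / encoded["validation"] would then be passed to the trainer.

# Either the stock torch optimizer ...
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
# ... or the repository's AdamWL2SP (constructor arguments assumed):
# from transformers_fine_tuning.optim.adamwl2sp import AdamWL2SP
# optimizer = AdamWL2SP(model.parameters(), lr=2e-5)
```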

The script runs on CPU, on GPU via CUDA, and on TPU via torch_xla, with optional concurrency when multiple TPU cores are available.
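Device selection of this kind can be sketched as follows; this is illustrative only, not the script's exact logic, and `xm.xla_device()` comes from torch_xla:

```python
import torch

# Prefer an XLA (TPU) device when torch_xla is installed, then CUDA, then CPU.
try:
    import torch_xla.core.xla_model as xm
    device = xm.xla_device()
except ImportError:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print(f"Training on: {device}")
```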

Note that this is not the actual script used to run our experiments; that script performs additional tracking of metrics. For a given seed, the results will not reproduce those reported in the blog post. However, a series of fine-tuning runs with fine-tune.py should produce qualitatively similar results.

Install

The Python module itself, transformers_fine_tuning, lives in the subdirectory src/transformers_fine_tuning.

To prepare an environment for running the example script fine-tune.py, clone this repository and run

source setup.sh

in a console; this installs the dependencies and sets up the required environment variables.

Usage

For fine-tuning without concurrency, simply run:

python fine-tune.py

The script will automatically use a TPU if one is available.

In multi-core TPU environments, training can be run concurrently. For example, if 8 cores are available:

python transformers/examples/pytorch/xla_spawn.py --num_cores 8 fine-tune.py

Note that for fixed random seeds, a concurrent training run will not replicate a non-concurrent training run.
