
lorax's Introduction

Lorax: LoRA for JAX functions

This is a JAX transform which implements LoRA: Low-Rank Adaptation of Large Language Models. LoRA replaces operations like Wx with (W + BA)x where A and B are skinny rectangular matrices. You can then train only A and B, and leave W frozen, which dramatically reduces the amount of memory needed for things like optimizer states.
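To see why the factorization helps, here is a toy NumPy sketch (illustrative only; the dimensions and names are made up). Note that the low-rank update can be applied without ever materializing the full W-sized matrix BA, and that zero-initializing B makes the adapted model start out identical to the frozen one:

```python
import numpy as np

# Toy illustration of the LoRA decomposition (NumPy stand-in for JAX arrays).
d, r = 8, 2                      # full dimension and LoRA rank, r << d
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))      # frozen pretrained weight
A = rng.normal(size=(r, d))      # trainable "down" projection
B = np.zeros((d, r))             # trainable "up" projection, zero-initialized
x = rng.normal(size=(d,))

# (W + BA)x can be computed without materializing the d x d update:
full = (W + B @ A) @ x
factored = W @ x + B @ (A @ x)
assert np.allclose(full, factored)

# Trainable parameters drop from d*d to 2*d*r
print(d * d, 2 * d * r)  # prints: 64 32
```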

Lorax should work on most JAX models. I did my testing with my own models, which use Haiku, and you can find an example of applying it to a HuggingFace Flax model in the examples directory (examples/).

Installation

pip install jax-lorax

Changelog

0.2.0

  • Replaced backend with Qax
  • Overhauled API to simplify usage (No more need to separately handle frozen/tunable params)

Running tests

Install dev dependencies:

git clone https://github.com/davisyoshida/lorax.git
cd lorax
pip install poetry
poetry install

Run tests:

pytest tests.py

Minimal example

Lorax lets you take model code that wasn't written with LoRA in mind and transform it so that it supports LoRA. For example, consider the following MLP code:

import jax
import jax.numpy as jnp

import optax

def model(params, x):
    """My model, written in the dark ages before LoRA, using gratuitous amounts of VRAM when trained"""
    for massive_w in params:
        x = jax.nn.relu(x @ massive_w)
    return jnp.sum(x)

dim = 5000

# Initialize about 3 GB of params
params = [jax.random.normal(jax.random.PRNGKey(i), (dim, dim)) / (dim ** 0.5) for i in range(30)]
optimizer = optax.adam(learning_rate=3e-4)

# OOM on 7GB GPU :(
opt_state = optimizer.init(params)

The optimizer states are way too expensive, but applying Lorax lets you train just a pair of rank-64 factors (5000 x 64 and 64 x 5000) for each original weight.
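A rough back-of-the-envelope calculation (plain Python, using the sizes from the snippet above) shows where the savings come from. Adam keeps two extra buffers, the first and second moment, for every trainable parameter:

```python
# Back-of-the-envelope memory math for the example above (float32 = 4 bytes).
n_layers, dim, rank = 30, 5000, 64

full_params = n_layers * dim * dim
# Adam stores two extra buffers (first and second moment) per trainable param.
full_adam_bytes = 2 * full_params * 4

lora_params = n_layers * 2 * dim * rank   # one A and one B per weight
lora_adam_bytes = 2 * lora_params * 4

print(f'{full_adam_bytes / 1e9:.1f} GB vs {lora_adam_bytes / 1e6:.0f} MB')
# prints: 6.0 GB vs 154 MB
```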

First import lorax and transform your model:

import lorax

# Transform the model code
lora_model = lorax.lora(model)

Next initialize the new LoRA parameters:

# Tell LoRA what to use as the small dimension of B and A
rank_constraint = 64
lora_spec = [rank_constraint for param in params]

# Initialize a set of LoRA factors for each parameter
lora_params = lorax.init_lora(param_tree=params, spec=lora_spec, rng=jax.random.PRNGKey(0))

# The transformed model has the same call signature, but it can now handle parameters
# of type lorax.LoraWeight
lora_model(lora_params, jnp.ones((dim,)))

# Wrap the optimizer so it will freeze parameters not marked as trainable by the spec
optimizer = lorax.wrap_optimizer(optimizer, lora_spec)

# Now the optimizer can be used just like normal
opt_state = optimizer.init(lora_params)

That's it for the Lorax-specific stuff. The wrapped lora_model function is just an ordinary JAX function, and the LoraWeight instances are pytrees.
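"Pytree" here means the container is registered with JAX so that tree utilities (and therefore grads and optimizers) see through it to its array leaves. A minimal sketch of what such a registration looks like, using a hypothetical stand-in class rather than lorax's actual LoraWeight:

```python
import jax
import jax.numpy as jnp

# Hypothetical stand-in for lorax.LoraWeight, just to illustrate the pytree idea.
@jax.tree_util.register_pytree_node_class
class FakeLoraWeight:
    def __init__(self, w, a, b):
        self.w, self.a, self.b = w, a, b

    def tree_flatten(self):
        # children (traversed by JAX) and static aux data (none here)
        return (self.w, self.a, self.b), None

    @classmethod
    def tree_unflatten(cls, aux, children):
        return cls(*children)

lw = FakeLoraWeight(jnp.zeros((4, 4)), jnp.zeros((2, 4)), jnp.zeros((4, 2)))

# tree_map sees the three arrays as leaves, so transforms and optimizers
# can operate on them without special-casing the container.
shifted = jax.tree_util.tree_map(lambda x: x + 1, lw)
print(type(shifted).__name__, float(shifted.w[0, 0]))
```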

# Normal update function:
@jax.jit
def update_fn(lora_params, opt_state, x):
    grad_fn = jax.value_and_grad(lora_model)
    loss, grad = grad_fn(lora_params, x)

    updates, new_opt_state = optimizer.update(grad, opt_state, params=lora_params)
    updated_params = optax.apply_updates(lora_params, updates)
    return loss, new_opt_state, updated_params

Now for some dummy data and the training loop:

x = jax.random.normal(jax.random.PRNGKey(0), (dim,))
for i in range(10):
    loss, opt_state, lora_params = update_fn(lora_params, opt_state, x)
    print(f'Step: {i} loss: {loss:.4e}') # Number goes down!
# Step: 0 loss: 6.6614e-02
# Step: 1 loss: 4.4402e-02
# Step: 2 loss: 3.0241e-02
# Step: 3 loss: 1.8457e-02
# Step: 4 loss: 1.2326e-02
# Step: 5 loss: 8.8878e-03
# Step: 6 loss: 6.0599e-03
# Step: 7 loss: 4.3899e-03
# Step: 8 loss: 3.0839e-03
# Step: 9 loss: 2.2423e-03

Number goes down! We can now merge the trained LoRA params with the frozen params, and use them with the unmodified model:

lora_output = lora_model(lora_params, x)

# Now we merge the params to get params usable in the original model
merged_params = lorax.merge_params(lora_params)
orig_model_output = model(merged_params, x)

# Verify that the model outputs are the same
print(f'Difference between split and merged outputs: {orig_model_output - lora_output:.3e}')
# Difference between split and merged outputs: 1.164e-10

See examples/huggingface_gpt2.py for an example applying Lorax to a realistic model.

lorax's People

Contributors

davisyoshida
lorax's Issues

ValueError: safe_zip() argument 2 is shorter than argument 1

Thank you for your work. However, I encountered a bug when running the simple example.

The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/wenbo/Documents/data/miniconda3/envs/octo/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/wenbo/Documents/data/miniconda3/envs/octo/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/wenbo/.vscode/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/main.py", line 39, in
cli.main()
File "/home/wenbo/.vscode/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
run()
File "/home/wenbo/.vscode/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
runpy.run_path(target, run_name="main")
File "/home/wenbo/.vscode/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
return _run_module_code(code, init_globals, run_name,
File "/home/wenbo/.vscode/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/home/wenbo/.vscode/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
exec(code, run_globals)
File "/media/wenbo/12T/manipulation_project/lorax/examples/simple.py", line 36, in
lora_model(lora_params, jnp.ones((dim,)))
File "/home/wenbo/Documents/data/miniconda3/envs/octo/lib/python3.10/site-packages/qax/implicit/implicit_array.py", line 59, in implicit_f
outs_flat = f_wrapped.call_wrapped(*flat_args)
File "/home/wenbo/Documents/data/miniconda3/envs/octo/lib/python3.10/site-packages/jax/_src/linear_util.py", line 192, in call_wrapped
ans = self.f(*args, **dict(self.params, **kwargs))
File "/media/wenbo/12T/manipulation_project/lorax/examples/simple.py", line 10, in model
x = jax.nn.relu(x @ massive_w)
File "/home/wenbo/Documents/data/miniconda3/envs/octo/lib/python3.10/site-packages/jax/_src/numpy/array_methods.py", line 265, in deferring_binary_op
return binary_op(*args)
File "/home/wenbo/Documents/data/miniconda3/envs/octo/lib/python3.10/site-packages/qax/implicit/implicit_array.py", line 302, in process_primitive
outs = _default_handlers[primitive.name](primitive, *vals, params=params)
File "/home/wenbo/Documents/data/miniconda3/envs/octo/lib/python3.10/site-packages/qax/implicit/implicit_array.py", line 401, in _handle_pjit
outs = primitive.bind(*subfuns, *flat_inputs, **bind_params)
ValueError: safe_zip() argument 2 is shorter than argument 1

How to use lorax in python3.8

Nice work!
I'd like to use lorax with Python 3.8, but qax 0.2.0 requires Python 3.10. Is there a way to use lorax with Python 3.8?
Please help me!

lorax with haiku

Awesome work!

I'm trying to use Lorax together with Haiku. However, given Haiku's design, I'm not sure whether this is feasible. I hope you can provide some assistance.

Here is my first part for preparing a linear module.

import numbers
from typing import Union, Sequence
import haiku as hk
import jax.numpy as jnp
import numpy as np

class Linear(hk.Module):
  """Protein folding specific Linear module.

  This differs from the standard Haiku Linear in a few ways:
    * It supports inputs and outputs of arbitrary rank
    * Initializers are specified by strings
  """

  def __init__(self,
               num_output: Union[int, Sequence[int]],
               initializer: str = 'linear',
               num_input_dims: int = 1,
               use_bias: bool = True,
               bias_init: float = 0.,
               precision = None,
               name: str = 'linear'):
    """Constructs Linear Module.

    Args:
      num_output: Number of output channels. Can be tuple when outputting
          multiple dimensions.
      initializer: What initializer to use, should be one of {'linear', 'relu',
        'zeros'}
      num_input_dims: Number of dimensions from the end to project.
      use_bias: Whether to include trainable bias
      bias_init: Value used to initialize bias.
      precision: What precision to use for matrix multiplication, defaults
        to None.
      name: Name of module, used for name scopes.
    """
    super().__init__(name=name)
    if isinstance(num_output, numbers.Integral):
      self.output_shape = (num_output,)
    else:
      self.output_shape = tuple(num_output)
    self.initializer = initializer
    self.use_bias = use_bias
    self.bias_init = bias_init
    self.num_input_dims = num_input_dims
    self.num_output_dims = len(self.output_shape)
    self.precision = precision

  def __call__(self, inputs):
    """Connects Module.

    Args:
      inputs: Tensor with at least num_input_dims dimensions.

    Returns:
      output of shape [...] + num_output.
    """

    num_input_dims = self.num_input_dims

    if self.num_input_dims > 0:
      in_shape = inputs.shape[-self.num_input_dims:]
    else:
      in_shape = ()

    weight_init = get_initializer_scale(self.initializer, in_shape)

    in_letters = 'abcde'[:self.num_input_dims]
    out_letters = 'hijkl'[:self.num_output_dims]

    weight_shape = in_shape + self.output_shape
    weights = hk.get_parameter('weights', weight_shape, inputs.dtype,
                               weight_init)

    equation = f'...{in_letters}, {in_letters}{out_letters}->...{out_letters}'

    output = jnp.einsum(equation, inputs, weights, precision=self.precision)

    if self.use_bias:
      bias = hk.get_parameter('bias', self.output_shape, inputs.dtype,
                              hk.initializers.Constant(self.bias_init))
      output += bias

    return output

These are some of my attempts; obtaining LoraWeight instances through lorax.init_lora went smoothly.

def _model(x, n_out):
    module = Linear(num_output=n_out, name='linear')
    return module(x)
model = hk.transform(_model)

n_in = 5000
n_out = 3000
dummy_x = jnp.ones((n_in,))
rng_key = jax.random.PRNGKey(42)
params = model.init(rng=rng_key, x=dummy_x, n_out=n_out)

import lorax
from lorax.constants import LORA_FREEZE, LORA_FULL

def decision_fn(path, param):
    if 'bias' in path:
        print(f'Fully finetuning param {path}')
        return LORA_FULL
    dim = 32
    print(f'Using LoRA with dim={dim} for param {path}')
    return dim

lora_spec = lorax.simple_spec(params, decision_fn=decision_fn, tune_vectors=True)
lora_params = lorax.init_lora(params, lora_spec, jax.random.PRNGKey(42))

However, obtaining the lora_model and utilizing the lora_params did not work for me.

# code roughly like this:
lora_model = lorax.lora(model)
out = lora_model(params=lora_params, x=dummy_x, rng=jax.random.PRNGKey(42), n_out=n_out)

Looking forward to your response!:)

Predicting LoRA weights

I would like to use a separate neural network to predict LoRA weights for a main neural network, while training both networks at the same time. How can I manipulate the pytrees to achieve this, if it is possible at all?

How to save the parameters non-destructively?

Hi, thank you for this very useful library.
How to save the LoRA parameters?
I need to implement training checkpointing, so I don't want to merge the parameters in a way that can't be unmerged. When using msgpack on the LoRA weights, I get an error: TypeError: can not serialize 'LoraWeight' object.
I guess it would be possible to export the dictionary as a regular PyTree of JAX arrays and then import the weights again as LoraWeight instances. However, I am not sure how to write this.
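The idea in the last sentence can be sketched as a round-trip between a LoRA container and a plain dict of arrays. This uses a hypothetical stand-in class, not lorax's actual LoraWeight API, so treat it as a pattern rather than a working recipe:

```python
# Sketch: convert LoRA containers to plain dicts for serialization, and back.
# `SimpleLora` is a hypothetical stand-in for lorax.LoraWeight; the real class
# may have different attributes, so adapt the field names accordingly.
from dataclasses import dataclass

@dataclass
class SimpleLora:
    w: list   # frozen full-rank weight (an array in practice)
    a: list   # LoRA "down" factor
    b: list   # LoRA "up" factor

def to_saveable(p):
    # A dict of plain arrays is something msgpack / np.savez can handle.
    return {'w': p.w, 'a': p.a, 'b': p.b}

def from_saveable(d):
    # Rebuild the container, keeping the factors unmerged.
    return SimpleLora(w=d['w'], a=d['a'], b=d['b'])

p = SimpleLora(w=[[1.0]], a=[[0.0]], b=[[0.0]])
restored = from_saveable(to_saveable(p))
assert restored == p  # round-trip preserves the unmerged factors
```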

Equinox support?

This looks neat! I'm just curious about supporting Equinox as a possible backend neural network library.

This is typically called as:

model = eqx.nn.MLP(...)
model(data)

but this can still be thought of in an init/apply paradigm if you want it to:

init = eqx.nn.MLP
apply = eqx.nn.MLP.__call__

params = init(...)
apply(params, data)

c.f. also this example

So I'm guessing this should be straightforward/elegant to support.

(I'll own up to the fact that I'm discussing compatibility with one of my own projects here!)

Integration into EasyLM

Hi! Cool project. I wonder how hard it would be to implement an integration of this library into something like @young-geng's EasyLM. That would make using lorax really easy as all the training would be handled by EasyLM.
