
Update README and examples (git-theta, closed)

craffel commented on August 14, 2024
Update README and examples

from git-theta.

Comments (3)

muqeeth commented on August 14, 2024

git-theta

git extension for collaborative, continual, and communal model development.

How to use this repository?

LFS installation

Download the Git LFS package from the Git LFS website. Linux users should download the amd64 version from the list of assets on the website.

Getting started

Clone the repository:

git clone https://github.com/r-three/git-theta.git

Install the packages by running:

cd git-theta
pip install -e .

Installing git theta

You can initialize git theta in the root directory of the codebase to track code and models as follows:

git theta install

After a successful installation, the following lines are added to the .gitconfig file in the user's home directory:

[filter "lfs"]
        smudge = git-lfs smudge -- %f
        required = true
        clean = git-lfs clean -- %f
[filter "theta"]
        clean = git-theta-filter clean %f
        smudge = git-theta-filter smudge %f
        required = true

Example Usage

First, initialize git in the root directory of the codebase:

git init

To start tracking the model with git theta, run:

git theta track {path_to_model_checkpoint}

The above command adds the following lines to the .gitattributes file in the root directory of the repository:

".git_theta/{path_to_model_checkpoint}/**/params/[0-9]*" filter=lfs diff=lfs merge=lfs -text
{path_to_model_checkpoint} filter=theta
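
As a rough illustration, the two entries above can be produced from the checkpoint path with a simple template. The helper name `gitattributes_lines` is hypothetical and is not part of git-theta; only the two line patterns come from the text above.

```python
def gitattributes_lines(checkpoint_path: str) -> list[str]:
    """Return the two .gitattributes entries described above (hypothetical helper)."""
    return [
        # Parameter files under .git_theta are handled by Git LFS.
        f'".git_theta/{checkpoint_path}/**/params/[0-9]*" '
        "filter=lfs diff=lfs merge=lfs -text",
        # The checkpoint file itself goes through the theta filter.
        f"{checkpoint_path} filter=theta",
    ]


for line in gitattributes_lines("pytorch_model.bin"):
    print(line)
```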

Once the model is tracked, stage any changes made to it by running:

git theta add {path_to_model_checkpoint}

This creates a .git_theta/{path_to_model_checkpoint} folder in the root directory of the codebase.

The model parameters are stored in the tensorstore format inside this folder, one entry per parameter name. For example, for a parameter named decoder.block.0.layer.0.SelfAttention.k.weight in a checkpoint at pytorch_model.bin, the corresponding parameters are stored under .git_theta/pytorch_model.bin/decoder.block.0.layer.0.SelfAttention.k.weight.
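
The mapping from a parameter name to its location can be sketched as follows. This is a minimal illustration of the layout described above; `parameter_path` is a hypothetical helper, not a git-theta function.

```python
from pathlib import PurePosixPath

GIT_THETA_DIR = ".git_theta"  # directory name taken from the description above


def parameter_path(checkpoint_path: str, parameter_name: str) -> PurePosixPath:
    """Return where a parameter would be stored under .git_theta (hypothetical helper)."""
    return PurePosixPath(GIT_THETA_DIR) / checkpoint_path / parameter_name


path = parameter_path(
    "pytorch_model.bin", "decoder.block.0.layer.0.SelfAttention.k.weight"
)
print(path)
```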

At this point, run git status. You should see all the .git_theta/{path_to_model_checkpoint}/{parameter_name} files under "Changes to be committed", along with the model checkpoint file and the .gitattributes file.

After adding the model checkpoint, use git add to stage any other modified code or text files. You can then commit the changes and push to the remote.

The remote will contain the .git_theta/{path_to_model_checkpoint} folder, in which the actual parameters are stored as LFS objects rather than as raw values. A metadata file describing each parameter (shape, dtype, and hash) is stored inside .git_theta/{path_to_model_checkpoint}/{parameter_name} on the remote. The model checkpoint path itself, as seen on the remote, is a file containing the hash, shape, and type of each key in the checkpoint.
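
To make the metadata idea concrete, here is a minimal sketch of a per-parameter record holding shape, dtype, and hash. The record layout, the use of SHA-256, and the `parameter_metadata` helper are all assumptions for illustration; the actual git-theta format and hashing scheme may differ.

```python
import hashlib
import json
import struct


def parameter_metadata(values, shape):
    """Build a small metadata record for one parameter (hypothetical layout)."""
    # Serialize the flat list of floats as little-endian float32 bytes.
    raw = struct.pack(f"<{len(values)}f", *values)
    return {
        "shape": list(shape),
        "dtype": "float32",
        # SHA-256 is an assumption; any stable content hash would serve here.
        "hash": hashlib.sha256(raw).hexdigest(),
    }


meta = parameter_metadata([0.0, 1.0, 2.0, 3.0], shape=(2, 2))
print(json.dumps(meta, indent=2))
```

Because the record depends only on the parameter's contents, two identical parameters produce identical metadata, which is what allows changes to be detected without reading the parameter values themselves.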

TBA

git diff on the model checkpoint will identify which parameter groups are modified or added or removed.
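
One way to picture this kind of diff is to compare two mappings from parameter name to content hash. The sketch below is only an illustration under that assumption; `diff_parameter_groups` is a hypothetical helper and does not reflect how git-theta implements its diff.

```python
def diff_parameter_groups(old: dict, new: dict) -> dict:
    """Compare two {parameter_name: hash} mappings (hypothetical helper)."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # A parameter group is modified when it exists in both but its hash differs.
    modified = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"added": added, "removed": removed, "modified": modified}


old = {"encoder.weight": "aaa", "decoder.weight": "bbb", "head.bias": "ccc"}
new = {"encoder.weight": "aaa", "decoder.weight": "ddd", "adapter.weight": "eee"}
print(diff_parameter_groups(old, new))
```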

git merge will assume that all merges to the checkpoint (i.e. to parameter group files) result in merge conflicts and offer various possible automated merging strategies that can be tried and vetted.
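
The idea of choosing among automated strategies for a conflicting parameter group can be sketched with a small registry. The three strategies shown (keep ours, keep theirs, average) and all function names are assumptions made for illustration; the text above does not specify which strategies git-theta offers.

```python
def take_ours(ours, theirs):
    """Keep the parameter values from the current branch."""
    return list(ours)


def take_theirs(ours, theirs):
    """Keep the parameter values from the branch being merged in."""
    return list(theirs)


def average(ours, theirs):
    """Average the two versions element by element."""
    if len(ours) != len(theirs):
        raise ValueError("parameter groups must have the same number of elements")
    return [(a + b) / 2 for a, b in zip(ours, theirs)]


# Hypothetical registry of strategies a user could try and then vet.
STRATEGIES = {"ours": take_ours, "theirs": take_theirs, "average": average}


def resolve(ours, theirs, strategy):
    """Resolve one conflicting parameter group with the named strategy."""
    return STRATEGIES[strategy](ours, theirs)


print(resolve([1.0, 3.0], [3.0, 5.0], "average"))
```

Keeping each strategy as a plain function makes it easy to try several on the same conflict and evaluate the resulting checkpoints before committing to one.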

git checkout to a commit will construct a checkpoint based on the contents of .git_theta/<model_checkpoint_name> at that commit.


craffel commented on August 14, 2024

Make a pull request please (after updating the name).


craffel commented on August 14, 2024

Outline

  • Brief overview of what git-theta is
    • How and why it's different from treating the checkpoint as a blob of data and what it supports with lots of links
  • Somewhat comprehensive usage example
    • Use the example from the paper
  • Specific usage examples
    • Parameter-efficient updates
      • Enumerate the different ways that we support this
    • Performing a merge
    • Trying out a different version of a model on a branch
  • Extending git-theta
    • Adding update types (talk to Muqeeth)
    • Adding merge methods
    • Adding checkpoint formats
  • Why do I need git-theta (how the internals work)
    • Why not git-lfs
    • How we handle parameter-efficient updates
    • How we handle merging
    • How we do hashing
