

nkandpa2 commented on August 14, 2024

This is one possible solution:

When a user runs git theta add <checkpoint>, the checkpoint's updated parameter groups get saved under .git_theta/<path to model>, and the command then internally runs git add <checkpoint>. However, if the user accidentally runs git add <checkpoint> themselves, the updated parameter groups never get saved under .git_theta/<path to model>, leaving the checkpoint file and its representation in .git_theta/<path to model> inconsistent. We can check for this inconsistency in the clean filter: when the checkpoint file and the directory disagree, the clean filter can fail (so that the file does not get git add-ed) and log a message telling the user to use git theta add instead.
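A clean filter receives the working-tree contents of a file on stdin and writes the content to be stored on stdout. If the filter driver is configured with filter.<driver>.required = true, a non-zero exit is treated by git as a filter failure, so the file is not staged. The sketch below only illustrates that refusal path; is_consistent and to_metadata are hypothetical placeholders for git-theta's real consistency check and metadata serialization.

```python
import sys
from typing import Callable, Tuple


def run_clean_filter(
    path: str,
    contents: bytes,
    is_consistent: Callable[[str, bytes], bool],
    to_metadata: Callable[[bytes], bytes],
) -> Tuple[int, bytes]:
    """Return (exit_code, stdout_bytes) for one clean-filter invocation.

    `is_consistent` and `to_metadata` are hypothetical stand-ins for the
    real consistency check and metadata serialization.
    """
    if not is_consistent(path, contents):
        # Refuse: nothing on stdout, an explanation on stderr, non-zero exit.
        print(
            f"git-theta: {path} is out of sync with .git_theta/{path}; "
            f"use `git theta add {path}` instead of `git add {path}`.",
            file=sys.stderr,
        )
        return 1, b""
    # Consistent: emit the cleaned (metadata) representation as usual.
    return 0, to_metadata(contents)
```

A real entry point would read sys.stdin.buffer, write the returned bytes to sys.stdout.buffer, and exit with the returned code; git can pass the file's path via %f in the filter command line.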

Doing this naively would be expensive for large models, since it would require loading the model parameters into memory twice: once from the checkpoint file and once from the .git_theta/<path to model> directory. Instead, we can store the model metadata file produced by the clean filter, which contains the shape, type, and hash of each parameter group, directly in the .git_theta/<path to model> directory. Then there is no need to load the parameters stored in the directory at all; we only need to check that the hashes match.
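A minimal sketch of the hash-only check, assuming SHA-256 as the hash and a hypothetical ParamGroup container holding a raw buffer; git-theta's actual metadata format and hash function may differ. Only the small metadata dicts are compared, so the parameters recorded under .git_theta/ are never loaded.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Tuple

Metadata = Dict[str, Dict[str, object]]


@dataclass(frozen=True)
class ParamGroup:
    """Hypothetical in-memory parameter group: raw buffer plus its layout."""
    shape: Tuple[int, ...]
    dtype: str
    data: bytes


def group_metadata(group: ParamGroup) -> Dict[str, object]:
    # Keep only shape, dtype and a content hash; the values themselves are dropped.
    return {
        "shape": list(group.shape),
        "dtype": group.dtype,
        "hash": hashlib.sha256(group.data).hexdigest(),
    }


def model_metadata(groups: Dict[str, ParamGroup]) -> Metadata:
    return {name: group_metadata(g) for name, g in groups.items()}


def hashes_consistent(current: Metadata, stored: Metadata) -> bool:
    """True if both sides have the same groups with the same hashes."""
    if current.keys() != stored.keys():
        return False
    return all(current[name]["hash"] == stored[name]["hash"] for name in current)
```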

Saving the metadata file under .git_theta/<path to model> also helps with #60, since it makes it simple to see which parameter groups have changed.
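With per-group hashes stored, reporting what changed reduces to comparing two metadata dicts. A sketch, assuming the metadata maps each parameter-group name to a dict with a "hash" entry (a hypothetical layout):

```python
from typing import Dict, List, Tuple

Metadata = Dict[str, Dict[str, object]]


def diff_metadata(old: Metadata, new: Metadata) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, modified) parameter-group names."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # A group present on both sides counts as modified when its hash differs.
    modified = sorted(
        name
        for name in old.keys() & new.keys()
        if old[name].get("hash") != new[name].get("hash")
    )
    return added, removed, modified
```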

from git-theta.

muqeeth commented on August 14, 2024

The quick solution is to keep a metadata file in the .git_theta/<path-to-model> directory. When the user runs git add <checkpoint>, git calls the clean filter. Inside the clean filter, we check whether the metadata file exists. If it does, we compare its contents with the metadata generated from the current checkpoint. If they differ, or if the metadata file doesn't exist, we throw an error telling the user to run git theta add <checkpoint> instead.
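A sketch of that check, assuming the metadata is stored as JSON under a hypothetical file name metadata.json (the thread does not name the file) and that the current checkpoint's metadata has already been computed as a dict:

```python
import json
from pathlib import Path
from typing import Optional

# Hypothetical file name; not specified in the discussion.
METADATA_NAME = "metadata.json"


def load_stored_metadata(repo_root: Path, model_path: str) -> Optional[dict]:
    """Read .git_theta/<model_path>/metadata.json, or None if it is missing."""
    meta_file = repo_root / ".git_theta" / model_path / METADATA_NAME
    if not meta_file.exists():
        return None
    return json.loads(meta_file.read_text())


def check_against_stored(repo_root: Path, model_path: str, current: dict) -> None:
    """Raise if the stored metadata is missing or differs from `current`."""
    stored = load_stored_metadata(repo_root, model_path)
    if stored is None or stored != current:
        raise RuntimeError(
            f"{model_path} has changed since the last `git theta add`; "
            f"run `git theta add {model_path}` instead of `git add`."
        )
```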

One bad case: if the user runs git add <checkpoint> without any modifications to the checkpoint, we don't throw an error but simply overwrite the staged files with the same contents as before.

When I implement this, git status and git diff fail when run after modifying the checkpoint, because they also call the clean filter and there is now a mismatch between the checkpoint and the metadata inside .git_theta/<path-to-model>.

blester125 commented on August 14, 2024

we can directly store the model metadata file (produced by the clean filter) containing the shape, type, and hash of each parameter group.

We talked about how you can't run git add within a clean filter, but are other git commands allowed? Instead of keeping a copy of the metadata file in the .git_theta/ dir, we could use git to look at the value of the checkpoint file at HEAD, which would be the metadata version. Would this be easier, and would it avoid any divergence in state between the checked-in metadata file under .git_theta/ and the one checked in as the replacement for the checkpoint file?
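One way to read the committed (cleaned) version of a file is git cat-file blob <rev>:<path>, which prints the stored blob without running the smudge filter. A sketch, assuming the path is relative to the repository root; whether it is safe to call git from inside a running clean filter is exactly the open question above and is not settled here.

```python
import subprocess
from typing import List, Optional


def committed_blob_command(path: str, rev: str = "HEAD") -> List[str]:
    # "<rev>:<path>" names the blob at that revision; an empty rev gives
    # ":<path>", which refers to the staged (index) version instead.
    return ["git", "cat-file", "blob", f"{rev}:{path}"]


def read_committed_metadata(
    path: str, rev: str = "HEAD", cwd: Optional[str] = None
) -> Optional[bytes]:
    """Return the stored contents of `path` at `rev`, or None if absent."""
    result = subprocess.run(
        committed_blob_command(path, rev), cwd=cwd, capture_output=True
    )
    if result.returncode != 0:
        return None  # e.g. no commits yet, or the file is new at this revision
    return result.stdout
```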

nkandpa2 commented on August 14, 2024

@muqeeth @blester125

The major issue with this idea is that the clean filter isn't just run upon git add (git status and git diff run it too), so checking for consistency between the model metadata file and the .git_theta/ directory in the clean filter won't work. An alternative solution would be to do this consistency check inside a git pre-commit hook.

Implementation-wise, we could just make a script bin/git-theta-check (or something like that). The script would:

  1. create a git.Repo object
  2. check the files being committed by looking at the git index with repo.index.entries
  3. check whether any of them are being tracked by git-theta by looking at entries in .gitattributes
  4. for any files being tracked by git-theta, verify that the checkpoint file contents and the .git_theta/ directory are consistent

If they are inconsistent, we should abort the commit and tell the user to run git theta add <model> before trying to commit again.
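A sketch of such a check script. It swaps the GitPython calls proposed above (git.Repo, repo.index.entries) for the plain git CLI via subprocess, and it asks git check-attr for the filter attribute rather than parsing .gitattributes by hand. The driver name "theta" and the is_consistent callback are assumptions. A pre-commit hook that exits non-zero aborts the commit.

```python
import subprocess
import sys
from typing import Callable, List


def staged_paths() -> List[str]:
    # Files added, copied, modified or renamed in the index; deletions skipped.
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACMR", "-z"],
        capture_output=True, check=True,
    ).stdout.decode()
    return [p for p in out.split("\0") if p]


def parse_check_attr(output: str, driver: str = "theta") -> List[str]:
    """Parse `git check-attr filter -- ...` lines of the form "<path>: filter: <value>"."""
    tracked = []
    for line in output.splitlines():
        path, _, value = line.rpartition(": filter: ")
        if path and value == driver:
            tracked.append(path)
    return tracked


def theta_tracked(paths: List[str], driver: str = "theta") -> List[str]:
    if not paths:
        return []
    # Note: unusual paths may be quoted in this output; -z would be more robust.
    out = subprocess.run(
        ["git", "check-attr", "filter", "--", *paths],
        capture_output=True, check=True, text=True,
    ).stdout
    return parse_check_attr(out, driver)


def pre_commit(is_consistent: Callable[[str], bool]) -> int:
    """Return the hook's exit code: 1 (abort) if any tracked model is out of sync."""
    bad = [p for p in theta_tracked(staged_paths()) if not is_consistent(p)]
    for p in bad:
        print(f"git-theta: {p} is out of sync with .git_theta/; "
              f"run `git theta add {p}` and commit again.", file=sys.stderr)
    return 1 if bad else 0
```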

This script would just need to be called from .git/hooks/pre-commit so that git runs the check just before committing. We would add this call to .git/hooks/pre-commit when the user runs git theta track <model>.
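A sketch of how git theta track might install that call, assuming the hypothetical git-theta-check command above is on PATH. It appends to an existing hook instead of overwriting it, and uses "|| exit 1" so a failed check aborts the commit even if other commands follow. An existing hook that exits early would still skip the appended line.

```python
import stat
from pathlib import Path

# Hypothetical command name taken from the proposal above.
HOOK_LINE = "git-theta-check || exit 1"


def install_pre_commit_hook(git_dir: Path) -> Path:
    """Ensure <git_dir>/hooks/pre-commit runs git-theta-check, keeping any existing hook."""
    hook = git_dir / "hooks" / "pre-commit"
    hook.parent.mkdir(parents=True, exist_ok=True)
    text = hook.read_text() if hook.exists() else "#!/bin/sh\n"
    if HOOK_LINE not in text.splitlines():
        if not text.endswith("\n"):
            text += "\n"
        text += HOOK_LINE + "\n"
    hook.write_text(text)
    # Git only runs hooks that are executable.
    hook.chmod(hook.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook
```

Calling it twice leaves a single check line, so git theta track can run it for every tracked model.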

nkandpa2 commented on August 14, 2024

Not needed after #114
