
Comments (16)

zoeqevans commented on August 14, 2024

February 2nd, 2022


Summary of work the previous week

[Anisha] - Went through the ONNX docs, Protobuf docs, and the onnx-surgery repo. Played around with renaming parts of the ONNX model graph, specifically the weights/initializers. Could not figure out how initializers are given random numeric names. Also explored using git for ONNX models.

Meeting Summary

We ought to start writing down what functions we need to exist:
Take in an ONNX graph, and rename the initializers (with the mapping as an argument); see the sketch after this list.
Specify a parameter (or subset of parameters), and a value to update it to. (What is the best way to refer to this parameter?)
Apply changes to the computational graph. If we have two ONNX graphs, one with an additional layer compared to the other, how do we add that layer into the other model?
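
A minimal sketch of the renaming function, assuming the standard onnx Python helpers (onnx.load / onnx.save); the mapping and file names here are made up for illustration:

```python
import onnx

def rename_initializers(model: onnx.ModelProto, mapping: dict) -> onnx.ModelProto:
    """Rename initializers in-place according to mapping (old name -> new name)."""
    for init in model.graph.initializer:
        if init.name in mapping:
            init.name = mapping[init.name]
    # Nodes refer to initializers by name, so rewrite their inputs/outputs too.
    for node in model.graph.node:
        node.input[:] = [mapping.get(n, n) for n in node.input]
        node.output[:] = [mapping.get(n, n) for n in node.output]
    return model

model = rename_initializers(onnx.load("model.onnx"), {"983": "conv3.kernel"})
onnx.save(model, "model.renamed.onnx")
```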

Do model building libraries name nodes / layers? It would be nice to ask for the kernel of the 3rd convolution layer and change the value, as opposed to asking for parameter 983.

Tenghao put together an ONNX explainer document.

Does / can ONNX explicitly represent a hierarchical block structure? Maybe, since there are visualization tools that take ONNX models and output a graph of the model.
Should the person doing the training identify ahead of time how they’re going to update the model, or should the diff be automatically calculated (git style)?
Tell our system “I updated these parameters in this way”, and then our system stores the update. E.g. “I did a sparse update on layer 2”.

How do we describe the sequence of changes to the model in a way that is somewhat human readable?

For example, Varun & Yi-Lin write a paper on Fisher Weighted updates. If you train a model using their method, you should then produce some kind of “message” to tell the VCS, at some level of abstraction, what you did.

E.g. if I do a sparse update W' = W + S, it may not be obvious from looking at W' and W what S was. (In the case of sparse updates it actually is obvious, since S = W' - W is itself sparse, but low-rank updates may be slightly harder to detect.) Permuting the rows is another tricky example.

One goal might be: give me two ONNX checkpoints, and I'll give you the minimal patch to get from one to the other (a first cut is sketched below).
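
As a first cut, a dense version of that patch is just the parameter-wise difference between the two checkpoints; detecting sparse or low-rank structure in the result is the harder, open part. A minimal numpy sketch, assuming each checkpoint is a name-to-array dict:

```python
import numpy as np

def dense_patch(old: dict, new: dict) -> dict:
    """Per-parameter deltas; old[k] + patch[k] reconstructs new[k]."""
    assert old.keys() == new.keys(), "checkpoints must share parameter names"
    return {k: new[k] - old[k] for k in old}

def apply_patch(params: dict, patch: dict) -> dict:
    return {k: v + patch.get(k, 0.0) for k, v in params.items()}

old = {"w": np.zeros((2, 2)), "b": np.zeros(2)}
new = {"w": np.eye(2), "b": np.ones(2)}
restored = apply_patch(old, dense_patch(old, new))
assert all(np.allclose(restored[k], new[k]) for k in new)
```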

Git has a system for representing, applying, and reverting diffs. It also has a system for calculating diffs. Those do not need to be built at the same time (though they are obviously complementary).

How does git internally represent and apply diffs / patches?

Action Items for Next Week

Put together a "User Story" kind of document that covers the different use cases we want to support, operations, etc.
Borrow ideas from onnx-surgery and see how many operations we can and want to implement.


zoeqevans commented on August 14, 2024

February 9th, 2022


Summary of work the previous week

:(

Meeting Summary

Added a bunch of issues to the tracker regarding necessary operations for a "patch" storage format for models.

Discussed patch generation from a model, and decided that it doesn't need to be part of this system.

Tried to understand git internals.

Action Items for Next Week

Assigned issues.


vishalathreya commented on August 14, 2024

February 23rd, 2022


Meeting Summary

Revisited the doc for "Software components of a version control system for models"
Discussed the need for a CLI as part of the tool
Anisha walked through Colab code for fetching/updating node names

Names are compulsory for ONNX initializers but not for intermediate nodes.
Need a strategy to "fill in" missing node names (e.g. inferring a name from the node's associated initializer?) since names are needed for graph surgery, adding/removing layers, etc. One simple fallback is sketched below.
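
A minimal sketch of one such fallback, assuming deterministic op-type-based names are acceptable (the naming scheme is an assumption, not an ONNX convention):

```python
import onnx

def fill_missing_node_names(model: onnx.ModelProto) -> None:
    """Give every unnamed node a deterministic name like 'Conv_1', 'Conv_2', ..."""
    counts = {}
    for node in model.graph.node:
        if not node.name:
            counts[node.op_type] = counts.get(node.op_type, 0) + 1
            node.name = f"{node.op_type}_{counts[node.op_type]}"
```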

Action Items for Next Week

Continue working on POC functionalities (updating nodes, initializers, etc)


zoeqevans commented on August 14, 2024

March 2nd, 2022

Meeting Summary

(Monty & Anisha) Discussion of model diff update file formats.

  • Instead of using Pickle, could use NPZ or similar to restrict the type of storable data to float tensors (and their keys), preventing the arbitrary-code-execution issues that come with unpickling (see the sketch below).
  • Instead of separating updates into index and content files, just store the content file, and provide functionality to quickly show a human-readable version of it.
  • Try using the GitHub issue itself as your notebook while working on something, so your thoughts are more visible to the group.
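
A minimal sketch of the NPZ idea (the parameter name is hypothetical). Unlike unpickling, np.load with the default allow_pickle=False never executes code from the file:

```python
import numpy as np

# An update is just a mapping from parameter names to float tensors.
update = {"layer2.weight": np.zeros((128, 128), dtype=np.float32)}
np.savez_compressed("update.npz", **update)

# Loading is safe by default: allow_pickle=False rejects object arrays.
with np.load("update.npz") as f:
    restored = {name: f[name] for name in f.files}
```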

(Vishal) Discussion of Model Surgery

  • Outlined three step process for generation and application of model updates, and added functionality to handle shape mismatch between old and new initializer values.
  • How does the name of the model weight get surfaced to the diff file? From the user? In general, it seems we want the update type information from the user, and the initializer name to be generated automatically (since users won't know the names of each initializer).
  • Training code that the user runs will have to interface with our library (not the user).
  • Potentially this requires the user to use a wrapper around TensorFlow / PyTorch... how realistic is this?

Next Week Action Items

  • Check in the code we've written so far
  • Write a system diagram to get a group understanding of what the system flow should be
  • Document a schema for the diff file format
  • Take a look at new file formats to use instead of pickling to store updates
  • Clarify with Colin how exactly we want to generate our diff files (generating from models A & B, versus generating on the fly as we train)
  • Come up with a way to order and organize diff files
  • Write a function to generate the dense-update diff between two models


vishalathreya commented on August 14, 2024

March 9th, 2022

Meeting Summary

  • Anisha raised a pull request for #11
  • Code review will use GitHub comments initially
  • Settled on using the Python Black formatter (https://github.com/psf/black)
  • Vishal and Anisha discussed the High Level System Diagram details (#9)
  • Clarified questions mentioned in the doc linked in the issue

Clarifications from previous meeting

  • We aren't worried about the algorithm behind generating the diffs. For the POC, we will use mock diffs after finalizing the format.
  • Our library/tool will be called by the user after their training code runs, irrespective of the framework they're using (PyTorch/TF/MXNet, etc.). The user specifies which params were updated and with which update type (sparse/dense/low-rank). Hence, there is no need to generate the diff between two ONNX files, which is expensive and non-trivial.

Next Week Action Items

  • Set up CI/CD to auto-run tests on pull requests
  • Use the Python Black formatter
  • Iron out the POC diff file format


anisham197 commented on August 14, 2024

March 23rd, 2022

Work from previous week

  • Anisha checked in some code for ONNX functions
  • Monty checked in code for creating and applying a diff
  • Vishal looked into Github Actions to set up CI

Meeting Summary

  • Monty gave a walkthrough of the difftools code
  • Discussed CI options for the repo. GitHub Actions is paid for private repos.

Next Week Action Item

  • Vishal: Check-in code for 'set-weights' function.
  • Colin and Vishal: Resolve how to proceed with CI for private repo
  • Monty and Anisha: Figure out how a user would interface with the ModelUpdate function written in difftools

Future Action Items

  • Finalise Project Structure
  • Implement VCS Operations
  • Implement File System
  • Move from pickle to other file format for diff files


zoeqevans commented on August 14, 2024

March 30th, 2022


Work from previous week

  • Clarified several issues around the design of the diff files.
  • Pushed code to resolve those issues.
  • Identified concerns around file system architecture for storing diffs.

Meeting Summary

  • Discussed storage mechanisms for sparse arrays, and in general the considerations around converting all input to numpy, given that numpy does not natively support sparse arrays (a simple workaround is sketched below).
  • Decided on an initially dense training run, followed by a sparse (FISH Mask, or even a random mask) training run on a different dataset (maybe flipped-MNIST).
  • Decided to build the system as a Python library, and punt the consideration of a command-line tool (since any such tool will be based on the underlying library).
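
Since numpy has no native sparse type (scipy.sparse only covers 2-D matrices), one simple workaround is a coordinate encoding built from plain arrays, which also stores cleanly in NPZ; a minimal sketch:

```python
import numpy as np

def to_sparse(delta: np.ndarray) -> dict:
    """Encode a mostly-zero update as (shape, flat indices, nonzero values)."""
    flat = delta.ravel()
    idx = np.flatnonzero(flat)
    return {"shape": np.asarray(delta.shape), "indices": idx, "values": flat[idx]}

def from_sparse(rec: dict) -> np.ndarray:
    flat = np.zeros(int(np.prod(rec["shape"])), dtype=rec["values"].dtype)
    flat[rec["indices"]] = rec["values"]
    return flat.reshape(tuple(rec["shape"]))

delta = np.zeros((4, 4))
delta[1, 2] = 0.5
assert np.array_equal(from_sparse(to_sparse(delta)), delta)
```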

Next Week Action Item

  • Monty & Anisha: Setup the dense + sparse training run.
  • Monty & Anisha: Decide on a preliminary diff-storage architecture to facilitate the above.
  • Monty & Anisha: Demonstrate ability to move between two commits in the above case, and compare storage requirements to checkpoints.


anisham197 commented on August 14, 2024

April 6th, 2022


Work from previous week

  • Monty worked on setting up sparse training run. Ran into some blockers.

Meeting Summary

  • PyTorch has an export-to-ONNX method, but does not support importing an ONNX model.
  • Need to use a third-party library, onnx2pytorch, that is not fully fleshed out. The library has disclaimers about using a batch size > 1 and about fine-tuning and training of converted models being untested.
  • When using the above library to import an ONNX model, the parameter names change and some additional parameter names appear, so the model is not represented by a unique computational graph. These name changes are problematic because they prevent setting weight names correctly (the round trip is sketched below).
  • We cannot pivot to training in TensorFlow or another framework with better ONNX support, as we want to support models built using PyTorch.
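
A minimal round trip that exhibits the issue, assuming onnx2pytorch's ConvertModel entry point (the toy model is made up):

```python
import torch
import onnx
from onnx2pytorch import ConvertModel  # pip install onnx2pytorch

net = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2))
torch.onnx.export(net, torch.zeros(1, 4), "model.onnx", do_constant_folding=False)

imported = ConvertModel(onnx.load("model.onnx"))

# The two name sets differ, which breaks name-based weight updates:
print(sorted(name for name, _ in net.named_parameters()))
print(sorted(name for name, _ in imported.named_parameters()))
```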

Questions to Consider

  • Why is the parameter name changing? Is it performing some optimization when importing?
  • Can we set a flag to not perform optimizations? E.g. do_constant_folding=False when exporting?
  • What are the additional names we see? Can we locate the code in the library where these changes happen?
  • Do we look into other formats aside from ONNX? Can we find a workaround that resolves the name changes?

Next Week Action Item

  • Monty & Anisha: Setup the dense + sparse training run.
  • Vishal: Implement underlying filesystem to support version control


anisham197 commented on August 14, 2024

April 13th, 2022


Work from previous week

  • Anisha worked on understanding FISH Mask implementation to use in the sparse training run.
  • Vishal worked on implementing underlying filesystem.

Meeting Summary

  • Vishal walked through the checkout functionality implementation: diff files are indexed as a doubly linked list, with links to the previous diff and the next diff (a sketch follows these notes).

  • Subversion stores intermediate snapshots in addition to the initial diff and the latest diff. Do we want to implement that, given that each snapshot would mean storing millions of params? Perhaps allow the user to decide when to snapshot.

  • The onnx2pytorch library changes the parameter names every time it's used to import the model into PyTorch, hence no stable format. Setting do_constant_folding=False did not help.

  • Need to look into whether the onnx2pytorch name change issue exists in older versions of PyTorch. Have not yet isolated the code in the library that creates this issue.

  • Can use a name mapping function as a hack to change the names back to the original names.

  • Can also look into Tensorstore (no computational graphs) or other alternatives instead of ONNX
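
A minimal sketch of the doubly linked diff chain, assuming additive dense diffs (which makes walking backward a simple subtraction); all names here are placeholders:

```python
from dataclasses import dataclass

@dataclass
class DiffNode:
    diff: dict                     # parameter name -> numpy delta
    prev: "DiffNode | None" = None
    next: "DiffNode | None" = None

def step_forward(params: dict, node: DiffNode) -> dict:
    """Apply one diff to move the checkout one commit forward."""
    return {k: v + node.diff.get(k, 0.0) for k, v in params.items()}

def step_backward(params: dict, node: DiffNode) -> dict:
    """Subtract one diff to move the checkout one commit back."""
    return {k: v - node.diff.get(k, 0.0) for k, v in params.items()}
```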

Next Week Action Item

  • Monty & Anisha: Investigate ONNX issues and implement sparse training run
  • Vishal: Check in code that implements underlying filesystem to support version control


anisham197 commented on August 14, 2024

April 20th, 2022


Work from previous week

  • Monty checked in code that does a sparse training run by handpicking parameter groups to train.

Meeting Summary

  • There are two items in the imported model's state dict for each parameter in the original model, but aside from the names they seem to be copies of each other. For the sparse training run, we went ahead and selected one of the copies to update.

Action Items for Next Week

  • Vishal: Check in code that implements underlying filesystem to support version control
  • Anisha: Implement high-level functions for git-actions and tie together PoC
  • Anisha: Document design decisions of the PoC

Action Items to have a finished PoC

  • Need to test that picking one set of params from the duplicated param names when importing into PyTorch works with other architectures.
  • Fix the name change issue in the onnx2pytorch library. Alternatively, write a name mapping function to map changed names back to the original names in the imported model, so that we can correctly identify which weights were updated and store them in the diff file.
  • Reorganize the code into a project structure and write tests
  • Tie all the pieces together
  1. Define high-level functions for the various git-like actions (a rough shape is sketched below).
  2. Implement a sparse training run and, using the high-level functions, communicate the parameter updates to our VCS system.
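
A rough, hypothetical shape for those high-level functions; the names and signatures below are placeholders to anchor discussion, not a settled interface:

```python
class ModelRepo:
    """Placeholder API tying the PoC pieces together."""

    def init(self, checkpoint: dict) -> None:
        """Store the base checkpoint as the root of the diff chain."""
        ...

    def commit(self, diff: dict, message: str) -> int:
        """Append a diff file to the chain and return its commit id."""
        ...

    def checkout(self, commit_id: int) -> dict:
        """Replay diffs from the base checkpoint up to commit_id."""
        ...
```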


zoeqevans commented on August 14, 2024

April 27th, 2022

Meeting Summary

  • Think about when we should freeze the project and start writing a write-up.
  • Monty has built a PoC to show checkout / diff application.
  • Caught everyone up to speed on globally / locally dense / sparse updates.
  • Vishal has pushed long-range checkout / diff application code that abstracts Monty's experiment above.
  • Anisha wrote a novel's worth of meeting notes :)
  • Discussion of checkpoint-storage benefits versus more exciting benefits, like sharing updates and merging models.

Action Items for Next Week

  • Marry the current example code with the checkout functionality, and include necessary Git abstractions.
  • Write something to capture and reflect on what we've done.


anisham197 commented on August 14, 2024

May 4th, 2022


Meeting Summary

  • Monty wrote code to perform multiple rounds of training and output diffs in sequence
  • Vishal used Monty's example and wrote code to travel forward through multiple diffs
  • Anisha started working on the design decisions doc
  • Currently an init function does not exist; still need to test travelling backward through diffs, and to define a checkout function that doesn't require specifying a checkpoint file
  • Discussed finding a better use case for a demo, e.g. sequentially training parameter groups

Action Items for Next Week

  • Marry the current example code with the init and checkout functionality, and include necessary Git abstractions.
  • Complete writing design decisions doc


craffel commented on August 14, 2024

9/7/22

  • What are people interested in working on
    • Tara: Help out wherever she can, smaller projects
    • Nikhil: Missed the lab meeting so wanted to know what people were working on
    • Haokun: Working on an existing project exploring why conditional computation is not working; past work on intermediate task training is kind of an inspiration
    • Anisha: Last semester we made progress in thinking about what a system might look like, but likes the idea of having an actual model and treating ourselves as end-users
    • Vishal: Wants to continue working on the version control system but is also interested in the model
    • Muqeeth: The interesting thing is the scale of the project: having a model that works on 1,000 tasks, what challenges happen when we get to that scale, with lots of brainstorming on design choices
  • Existing work on version control system
    • Have a base model checkpoint (originally ONNX)
    • Training code can communicate what parameters were changed
    • Interacts with an API that creates a diff file
    • Diff file can be committed to the version control system itself
    • Existing Python API for checking out and applying a particular version and creating diff files
  • Next steps for version control system
    • Different checkpoint format?
      • Would basically be a key/value representation
      • ONNX was designed mainly for inference; it was not designed for changing the model
    • Integrate into git?
      • We could also be a wrapper around git
  • Brainstorming model
    • Is it important to have learned routing in the model?
      • Probably doesn't hurt
      • Question about whether we need to update the router when we add a new adapter/expert
    • What should the base model be?
      • T5 or T0
    • What should the adapter format be?
      • TBD, possibly IA3
    • At what level should we route?
  • External people


craffel commented on August 14, 2024

9/14/22

  • VCS backbone
    • Nikhil: Impression is that git is surprisingly extensible
    • Git-LFS as an example
      • Git is bad at tracking large files because if you minimally change the binary it duplicates the entirety of the file
      • Video on how git LFS was implemented: https://www.youtube.com/watch?v=w-037RcHjAA&t=589s
      • LFS has a server that stores all the LFS files; git only stores the hash of the large file
      • When you do a pull, you get the hash and metadata history but only the current version of the file
      • With LFS you first do git lfs track, which adds attributes to the .gitattributes file that designate filter=lfs
      • LFS just adds a blurb to the gitconfig and hot-swaps commands for files that are tracked by LFS (filter lfs); see the config sketch at the end of these notes
      • This might mean that we can designate a checkpoint file as being tracked by our system, and then re-route commands on those files to our system
      • LFS also creates hooks that can run custom code in certain situations
    • What would we do?
      • Use git to simultaneously track the code for creating the computational graph and the model checkpoint
      • Probably assume the end-user is responsible for making sure the things are in sync
    • Action items:
      • Everyone watch the video
      • Consider forking LFS for core functionality?
      • Other systems: git annex, dvc
      • Write a hello world system that just circumvents git on diff/add/commit or something
      • Vishal will look into universal checkpoint format/backend
      • Anisha/Nikhil will look into using git but both are busy
  • Tasks/datasets to consider
    • Talked about promptsource, spreadsheet from T0, etc.
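
For reference, the LFS plumbing described above looks roughly like this; the final *.ckpt / filter=theta line is a hypothetical analog for our system, not something that exists:

```
# .gitattributes entry written by `git lfs track "*.bin"`
*.bin filter=lfs diff=lfs merge=lfs -text

# git config blurb written by `git lfs install`
[filter "lfs"]
    clean = git-lfs clean -- %f
    smudge = git-lfs smudge -- %f
    required = true

# Hypothetical analog: route checkpoint files through our own filter
*.ckpt filter=theta
```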


craffel commented on August 14, 2024

9/21/22

  • Git backend
    • Haven't set up a hello world
    • Nikhil trying to understand smudge and clean
    • clean happens when you move something into the staging area; this is where the small diff file gets created
    • smudge happens when files are checked out of the repository, turning the stored diff back into the full file
    • These could be the only commands that get filtered
    • Make a diagram/plan of what happens when you run different commands
    • Most useful thing when trying to set up this system: examples of the end API we want
    • Action items:
      • Write down an API/what git should do under certain operations
      • In doing so, figure out if there are any places where our assumptions break
  • Model
    • Assume the model is a text-to-text model that has a bunch of adapters and a learned router
    • Open experimental questions:
      • What expert/adapter architecture should we use?
        • Compare different parameter-efficient adapter methods to feedforward network-style adapters/experts
      • If we can use factorizable adapters (LoRA, IA3, etc.) then can we do a soft mixture of experts?
      • Pre-training
        • Should we do unsupervised pre-training -> multitask fine-tuning or multitask pre-training?
        • If we do multitask pre-training, should we allocate adapters for the pre-training task?
        • Should we just start with an existing pre-trained model?
      • Task generalization
        • If we apply the model to a new task for which there is no adapter, can it perform that task? If so, how does it do it?
        • Comparing no updates vs. adding a new adapter
        • Does performance get better as you add new tasks?
      • If you add a new adapter, can you avoid updating the router?
        • Can you basically add a new "key" vector (class vector) to the router without updating the router?
        • If not, are there any router architectures that would encourage this behavior?
      • Should we try to have separate domain/task/language routers?
      • How should we actually train the router?
        • If we're going to do supervised router training, is there a best way to do that (e.g. semi-supervised being better than supervised, or other regularizers)?
      • Should we try other router architectures?
        • Define the router in a more flexible way
      • Should we route examples, tokens, or tasks?
        • Task routing is from here: Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference
    • Determine an evaluation setting
      • What to use for held-out tasks?
        • RAFT
        • BIG bench
        • Held-out stuff from T0, FLAN, whatever...
        • Ideally we should have some generation tasks


craffel commented on August 14, 2024

9/28/22

  • Git API
    • Nikhil wrote down a not-very-comprehensive version
    • Important filters are clean and smudge
      • clean runs a program whenever a thing is added
      • smudge runs a program when something is checked out
    • Two workflows: one where the checkpoint is updated in-place, and one where a .diff file is produced during training (a toy clean filter is sketched below)
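
As a toy sketch of how this could plug into git: a clean filter receives the staged file on stdin, and git stores whatever the filter writes to stdout. Everything below, including the pointer format and the .model_vcs/ directory, is hypothetical:

```python
#!/usr/bin/env python
# Toy `clean` filter: stash the real checkpoint contents locally and hand
# git a tiny pointer line instead, LFS-style.
import hashlib
import pathlib
import sys

data = sys.stdin.buffer.read()
digest = hashlib.sha256(data).hexdigest()

store = pathlib.Path(".model_vcs")
store.mkdir(exist_ok=True)
(store / digest).write_bytes(data)

sys.stdout.write(f"model-vcs-pointer {digest}\n")
```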

