
Work done with Logan Riggs, who wrote the original replication notebook. Thanks to Pierre Peigne for the data-generating code and to Lee Sharkey for answering questions.

# Sparse Coding

`python replicate_toy_models.py` runs code which replicates the first half of the post Taking features out of superposition with sparse autoencoders.
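
At its core, that replication works roughly as in the sketch below: generate data as sparse combinations of ground-truth feature directions squeezed into a lower-dimensional space (superposition), then train an overcomplete autoencoder with an L1 sparsity penalty to pull the features back out. This is only an illustrative sketch; the names, sizes, and hyperparameters here are assumptions, not the ones used in `replicate_toy_models.py`.

```python
# Illustrative sketch of the toy superposition + sparse autoencoder setup.
# Sizes and hyperparameters are assumptions, not those of replicate_toy_models.py.
import torch
import torch.nn as nn

n_feats, n_dims, n_samples = 512, 64, 5_000

# Sparse ground-truth features, each assigned a random direction in a
# lower-dimensional space (the "superposition").
feature_dirs = torch.randn(n_feats, n_dims)
feature_dirs /= feature_dirs.norm(dim=-1, keepdim=True)
coeffs = torch.rand(n_samples, n_feats) * (torch.rand(n_samples, n_feats) < 0.02)
data = coeffs @ feature_dirs  # [n_samples, n_dims]

# Overcomplete autoencoder with a ReLU code and an L1 penalty on the code.
class SparseAutoencoder(nn.Module):
    def __init__(self, n_dims, dict_size):
        super().__init__()
        self.encoder = nn.Linear(n_dims, dict_size)
        self.decoder = nn.Linear(dict_size, n_dims, bias=False)

    def forward(self, x):
        codes = torch.relu(self.encoder(x))
        return self.decoder(codes), codes

sae = SparseAutoencoder(n_dims, dict_size=1024)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coef = 1e-3

for step in range(500):
    recon, codes = sae(data)
    loss = ((recon - data) ** 2).mean() + l1_coef * codes.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The learned decoder directions can then be compared against `feature_dirs` (e.g. by mean max cosine similarity) to see how many of the ground-truth features were recovered.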

`run.py` contains a more flexible set of functions for generating datasets using Pile10k and then running sparse coding on the activations of real models, including gpt-2-small and custom models.
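
For orientation, extracting an activation dataset from gpt-2-small looks roughly like the sketch below. This is not the interface of `run.py`; the dataset name (`NeelNanda/pile-10k`), the layer choice, and the sequence length are assumptions made for illustration.

```python
# Illustrative sketch only; run.py is the real entry point and differs in details.
import torch
from datasets import load_dataset
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")  # gpt-2-small
model.eval()

# "Pile10k" is assumed here to be the NeelNanda/pile-10k dataset on Hugging Face.
dataset = load_dataset("NeelNanda/pile-10k", split="train")

activations = []
with torch.no_grad():
    for example in dataset.select(range(100)):
        tokens = tokenizer(example["text"], return_tensors="pt",
                           truncation=True, max_length=256)
        out = model(**tokens, output_hidden_states=True)
        # Residual stream after the second transformer block (layer choice is arbitrary here).
        activations.append(out.hidden_states[2].squeeze(0))

activation_dataset = torch.cat(activations)  # [n_tokens, 768]
```

The resulting `activation_dataset` can then be fed to the same kind of sparse autoencoder training loop as in the toy sketch above.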

The repo also contains utils for running this code on vast.ai instances, which can speed up these sweeps.

## Automatic Interpretation

Currently this uses OpenAI's automated-interpretability repo, but it can't be installed directly as a package, so for now the repo works by cloning automated-interpretability and saving its neuron-explainer/neuron_explainer package at the top level of this repo under the name neuron_explainer.
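
A minimal sketch of that workaround is below; the repository URL and directory layout are assumptions about the current state of OpenAI's repo, so adjust the paths if the upstream structure has changed.

```bash
# Sketch of the manual install workaround; paths assume the current layout of
# openai/automated-interpretability (neuron-explainer/neuron_explainer inside it).
git clone https://github.com/openai/automated-interpretability
cp -r automated-interpretability/neuron-explainer/neuron_explainer ./neuron_explainer
```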

## Training a custom small transformer

The next part of the sparse coding work uses a very small transformer to do some early tests using sparse autoencoders to find features. There doesn't appear to be an open-source model of this kind, and the original model is proprietary, so below are the instructions I followed to create a similar small transformer.

Make sure you have >200GB of disk space. Tested using a vast.ai RTX 3090 and the pytorch:latest Docker image.

```bash
git clone https://github.com/karpathy/nanoGPT
cd nanoGPT
python -m venv .env
source .env/bin/activate
apt install -y build-essential
pip install torch numpy transformers datasets tiktoken wandb tqdm
```

Change `config/train_gpt2.py` to have:

```python
import time
wandb_project = 'sparsecode'
wandb_run_name = 'supertiny-' + str(time.time())
n_layer = 6       # (same as train_shakespeare and Lee's work)
n_embd = 16       # (same as Lee's)
n_head = 8        # (needs to divide n_embd)
dropout = 0.2     # (used in shakespeare_char)
block_size = 256  # (just to make faster?)
batch_size = 64
```
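
For a rough sense of scale, a back-of-envelope parameter count for this config (assuming GPT-2's 50,257-token BPE vocabulary, nanoGPT's 4x MLP width, and weight tying between the embedding and the output head, and ignoring biases and layernorms) shows the model is tiny and dominated by its embedding matrix:

```python
# Back-of-envelope parameter count for the config above (assumptions noted in the text).
n_layer, n_embd, block_size, vocab = 6, 16, 256, 50257

embedding = vocab * n_embd + block_size * n_embd       # token + position embeddings
attn      = n_embd * 3 * n_embd + n_embd * n_embd      # qkv + output projections
mlp       = n_embd * 4 * n_embd + 4 * n_embd * n_embd  # up + down projections
per_layer = attn + mlp

total = embedding + n_layer * per_layer
print(f"~{total:,} parameters, {embedding:,} of them in the embeddings")
# ~826,640 parameters, 808,208 of them in the embeddings
```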

To set up the dataset run:

```bash
python data/openwebtext/prepare.py
```

Then, if using multiple GPUs, run:

```bash
torchrun --standalone --nproc_per_node={N_GPU} train.py config/train_gpt2.py
```

Otherwise, simply run:

```bash
python train.py config/train_gpt2.py
```
