jlindsey15 / automatic-circuit-discovery

This project is forked from arthurconmy/automatic-circuit-discovery.

License: MIT License

Automatic Circuit DisCovery

This is the accompanying code to the paper "Towards Automated Circuit Discovery for Mechanistic Interpretability".

  • ⚡ To run ACDC, see acdc/main.py or this Colab notebook
  • 🔧 To see how to edit edges in the computational graphs of models, see notebooks/editing_edges.py or this Colab notebook
  • ❇️ To understand the low-level implementation of fully editable computational graphs, see this Colab notebook or notebooks/implementation_demo.py

This library builds upon the abstractions (HookPoints and standardised HookedTransformers) from TransformerLens 🔎

Installation

First, install the system dependencies for either Mac or Linux.

Then install ACDC with Python 3.8+ and Poetry:

git clone https://github.com/ArthurConmy/Automatic-Circuit-Discovery.git
cd Automatic-Circuit-Discovery
poetry env use 3.10      # Or be inside a conda or venv environment
                         # Python 3.10 is recommended but use any Python version >= 3.8
poetry install
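Before running poetry env use, you can confirm that an interpreter meets the 3.8+ requirement. This is a minimal sketch, not part of the repository: the helper name python_is_supported is hypothetical, and it assumes the interpreter is reachable as python3 unless you pass another name.

```shell
# Hypothetical helper (not in the repo): succeeds if the given interpreter is
# Python 3.8 or newer. Defaults to python3, which is an assumption; pass the
# interpreter you intend to give to `poetry env use` instead.
python_is_supported() {
  "${1:-python3}" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 8) else 1)'
}

if python_is_supported; then
  echo "OK: $(python3 --version 2>&1)"
else
  echo "ACDC needs Python >= 3.8; found $(python3 --version 2>&1)" >&2
fi
```

For example, python_is_supported python3.10 checks the interpreter recommended above.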

System Dependencies

๐Ÿง Ubuntu Linux

sudo apt-get update && sudo apt-get install libgl1-mesa-glx graphviz build-essential graphviz-dev

You may also need sudo apt-get install python3.x-dev, where x is your Python minor version (also see the issue and pygraphviz installation troubleshooting).
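To find the right package name, you can ask the interpreter for its version. This sketch assumes python3 is the interpreter you use for ACDC and that your distribution follows the Debian/Ubuntu pythonX.Y-dev naming; it only prints the apt-get command rather than running it.

```shell
# Build the Debian/Ubuntu dev-headers package name (e.g. python3.10-dev)
# from the version of the python3 on PATH (assumed to be the one ACDC uses).
PY_VERSION=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
DEV_PKG="python${PY_VERSION}-dev"

# Print, rather than run, the install command so you can check it first.
echo "sudo apt-get install ${DEV_PKG}"
```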

๐ŸŽ Mac OS X

On Mac, you need to tell pip (run inside Poetry) where the Graphviz headers and libraries are:

brew install graphviz
export CFLAGS="-I$(brew --prefix graphviz)/include"
export LDFLAGS="-L$(brew --prefix graphviz)/lib"

Reproducing results

To reproduce the Pareto frontier of KL divergence against number of edges for ACDC runs, run python experiments/launch_induction.py. Similarly, python experiments/launch_sixteen_heads.py and python subnetwork_probing/train.py were used to generate individual data points for the other methods; see each script's CLI help for its options. All three commands can log runs to wandb. We use notebooks/roc_plot_generator.py to process data from the wandb runs into JSON files (see experiments/results/plots_data/Makefile for the commands), and notebooks/make_plotly_plots.py to produce plots from these JSON files.
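The reproduction steps can be outlined as a dry-run shell script. This is a sketch, not a script shipped with the repository: the run helper is hypothetical and only prints each command unless RUN=1 is set, the --help flags assume the scripts' CLI help is reachable that way, and the roc_plot_generator.py step stays a comment because its exact invocations live in experiments/results/plots_data/Makefile.

```shell
# Hypothetical dry-run helper: print each command, and only execute it
# when RUN=1 is set (run from the repository root).
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = 1 ]; then
    "$@"
  fi
}

# 1. ACDC runs for the KL-divergence vs. number-of-edges Pareto frontier
run python experiments/launch_induction.py
# 2. Data points for the other methods; check each script's CLI options first
run python experiments/launch_sixteen_heads.py --help
run python subnetwork_probing/train.py --help
# 3. wandb runs -> JSON files: the notebooks/roc_plot_generator.py commands
#    are listed in experiments/results/plots_data/Makefile
# 4. JSON files -> plots
run python notebooks/make_plotly_plots.py
```

Keeping the default as a dry run means the outline is safe to execute just to read the order of steps.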

Tests

From the root directory, run

pytest -vvv -m "not slow"

This selects only the tests not marked as slow. The slow tests take a long time; they are worth running occasionally, but not every time.

You can run the slow tests with

pytest -s -m slow
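Both commands rely on pytest's -m marker expressions. The self-contained demo below (assuming pytest is installed; the directory and file names are made up) shows the mechanism: a test decorated with @pytest.mark.slow is deselected by -m "not slow" and selected by -m slow. The --collect-only flag lists the selected tests without running them, so the same flag added to either command above previews what it would run.

```shell
# Build a throwaway project with one fast and one slow test.
demo=$(mktemp -d)
cat > "$demo/pytest.ini" <<'EOF'
[pytest]
markers =
    slow: long-running test
EOF
cat > "$demo/test_demo.py" <<'EOF'
import pytest


def test_fast():
    assert 1 + 1 == 2


@pytest.mark.slow
def test_slow():
    assert sum(range(10)) == 45
EOF

# --collect-only -q prints one node ID (file::test) per selected test.
FAST=$(cd "$demo" && pytest --collect-only -q -m "not slow" | grep '::')
SLOW=$(cd "$demo" && pytest --collect-only -q -m slow | grep '::')
echo "not slow -> $FAST"
echo "slow     -> $SLOW"
```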

Contributing

We welcome issues where the code is unclear!

If your PR affects the main demo, rerun

chmod +x experiments/make_notebooks.sh
./experiments/make_notebooks.sh

to automatically turn main.py into a working demo notebook and check that no errors arise. The files converted here must consist only of #%% [markdown] cells (markdown only) and #%% cells (code).
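To catch unsupported cell markers before running the conversion script, a grep-based check along these lines may help. It is a hypothetical helper, not part of experiments/make_notebooks.sh, and it simply enforces the rule stated above: every line that starts a cell must be exactly #%% or #%% [markdown].

```shell
# Hypothetical pre-check (not part of the repo): fail if a file contains a
# cell marker other than "#%%" or "#%% [markdown]", e.g. "# %%" or "#%% [raw]".
check_cells() {
  bad_lines=$(grep -nE '^#[[:space:]]*%%' "$1" \
    | grep -vE '^[0-9]+:#%%( \[markdown\])?[[:space:]]*$')
  if [ -n "$bad_lines" ]; then
    printf 'Unsupported cell markers in %s:\n%s\n' "$1" "$bad_lines" >&2
    return 1
  fi
}
```

For example, run check_cells acdc/main.py before ./experiments/make_notebooks.sh; it prints the offending line numbers and returns a non-zero status if any marker breaks the rule.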

Citing ACDC

If you use ACDC, please reach out! You can reference the work as follows:

@misc{conmy2023automated,
      title={Towards Automated Circuit Discovery for Mechanistic Interpretability}, 
      author={Arthur Conmy and Augustine N. Mavor-Parker and Aengus Lynch and Stefan Heimersheim and Adri{\`a} Garriga-Alonso},
      year={2023},
      eprint={2304.14997},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

TODO

Mostly finished TODO list

[ x ] Make the TransformerLens install use Neel's code, not my PR

[ x ] Add hook_mlp_in to TransformerLens and delete hook_resid_mid (and test to ensure no bad things?)

[ x ] Delete arthur-try-merge-tl references from the repo

[ x ] Make notebook on abstractions

[ ? ] Fix huge edge sizes in Induction Main example and change that occurred

[ x ] Find a better way to deal with versioning in the Colab installs...

[ ] Neuron-level experiments

[ ] Position-level experiments

[ ] Edge gradient descent experiments

[ ] Implement the circuit breaking paper

[ x ] tracr and other dependencies better managed

[ ? ] Make SP tests work (many are outdated, so skipped) and check that SubnetworkProbing installs properly (no __init__.py files!!!)

[ ? ] Make the 9 tests that also fail on TransformerLens main pass

[ x ] Remove Codebase under construction