
weak-to-strong's Introduction

STATUS: This codebase is not well tested and does not use the exact same settings we used in the paper, but in our experience gives qualitatively similar results when using large model size gaps and multiple seeds. Expected results can be found for two datasets below.

Weak-to-strong generalization

Our setup and how it relates to superhuman AI alignment

This project contains code for implementing our paper on weak-to-strong generalization.

The primary codebase contains a re-implementation of our weak-to-strong learning setup for binary classification tasks. It includes code for fine-tuning pretrained language models on ground-truth labels, and for training them against labels produced by another language model. We also support various losses described in the paper, such as the confidence auxiliary loss.
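As a rough illustration of the idea behind a confidence auxiliary loss, the sketch below mixes cross-entropy against the weak labels with cross-entropy against the strong model's own hardened predictions. This is only a minimal, hypothetical sketch (hard labels, fixed mixing weight); see the paper and the loss implementations in this codebase for the exact formulation.

    import torch.nn.functional as F

    def confidence_aux_loss(logits, weak_labels, alpha=0.5):
        # Cross-entropy against the (possibly noisy) labels from the weak supervisor.
        ce_weak = F.cross_entropy(logits, weak_labels)
        # Cross-entropy against the strong model's own hardened (argmax) predictions,
        # which rewards confident predictions even when they contradict the weak labels.
        hard_self_labels = logits.argmax(dim=-1).detach()
        ce_self = F.cross_entropy(logits, hard_self_labels)
        # alpha trades off imitating the weak labels against trusting the model's own predictions.
        return (1 - alpha) * ce_weak + alpha * ce_self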

The vision directory contains stand-alone code for weak-to-strong in the vision models setting (AlexNet -> DINO on ImageNet).

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Installation

You need to have Python installed on your machine. The project uses pyproject.toml to manage dependencies. To install the dependencies, you can use a package manager like pip:

pip install .

Running the Script

The main script of the project is sweep.py. It can be run from the command line using the following command:

python sweep.py --model_sizes=gpt2,gpt2-medium

In addition to --model_sizes, sweep.py takes in almost all of the arguments that train_simple.py takes (e.g. --batch_size, --n_docs, --n_test_docs etc., see train_simple.py for a full list). These arguments are simply forwarded to train_simple.py.
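For example, to forward a few of these options (the values below are illustrative only):

python sweep.py --model_sizes=gpt2,gpt2-medium --batch_size=32 --n_docs=10000 --n_test_docs=200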

sweep.py calls train_simple.py in the following way:

  1. First, it calls train_simple.py for each model size to train the ground truth models
  2. Then, for each pair of weak and strong models in model_sizes (where a model can be the strong model in the pair only if its index in the model_sizes list is >= the index of the weak model), it calls train_simple.py with a --weak_model_size argument so that the strong model is trained with the labels of the weak model.

E.g. the example above will run gpt2 (ground truth), gpt2-medium (ground truth), gpt2 -> gpt2, gpt2 -> gpt2-medium, and gpt2-medium -> gpt2-medium.
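The pairing logic is roughly as follows (a simplified sketch, not the repository's actual code):

    model_sizes = ["gpt2", "gpt2-medium"]

    # Step 1: ground truth runs, one per model size.
    ground_truth_runs = list(model_sizes)

    # Step 2: weak -> strong runs; the strong model's index must be >= the weak model's index.
    transfer_runs = [
        (weak, strong)
        for i, weak in enumerate(model_sizes)
        for strong in model_sizes[i:]
    ]

    print(transfer_runs)
    # [('gpt2', 'gpt2'), ('gpt2', 'gpt2-medium'), ('gpt2-medium', 'gpt2-medium')]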

If needed, you can also run train_simple.py directly.

Note that sweep.py will not accept the arguments --weak_model_size, --weak_labels_path or --model_size (as opposed to --model_sizes, with an "s") as choosing their values automatically is precisely the point of sweep.py.

An example Jupyter notebook for plotting results can be found in notebooks/Plotting.ipynb.

At the time of release, the main script was called train_weak_to_strong.py, but it was less usable than sweep.py and train_simple.py. It is preserved here and the old instructions are given at the end of the document.

Expected results





Authors

  • Adrien Ecoffet
  • Manas Joglekar
  • Jeffrey Wu
  • Jan Hendrik Kirchner
  • Pavel Izmailov (vision)

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Acknowledgments

  • Hugging Face for their open-source transformer models

Original single run script

You can run the original training script using:

python train_weak_to_strong.py

The script accepts several command-line arguments to customize the training process. Here are some examples:

python train_weak_to_strong.py --batch_size 32 --max_ctx 512 --ds_name "sciq" --loss "logconf" --n_docs 1000 --n_test_docs 100 --weak_model_size "gpt2-medium" --strong_model_size "gpt2-large" --seed 42

The notebook notebooks/Plotting_old.ipynb preserves the plotting code corresponding to the old-style training.

The key difference between this style and the new sweep.py style is that train_weak_to_strong.py will always train three models: a weak model, a transfer model, and a strong model. sweep.py optimizes this by training a series of ground truth models (which will serve as weak and strong models) as well as a series of transfer models all in one go. This reduces training duplication and is arguably simpler. The files generated by train_simple.py and sweep.py are also simpler to use.

weak-to-strong's People

Contributors

adrienle, eltociear, ewouth, nagi-ovo, pavel-izmailov, philipkd, srivhash, wuthefwasthat, zachschillaci27, zhxieml



weak-to-strong's Issues

The schedule.step() should be called outside the dataloader loop.

Thanks to the OpenAI Superalignment Generalization Team for their awesome work.

While reading the code of the vision part, I found a minor bug related to CosineAnnealingLR. Since the learning rate schedule is set by n_epochs, not n_iters,

schedule = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer=optimizer, T_max=n_epochs)

schedule.step() should be called outside the train_loader loop, correspondingly:

    for epoch in (pbar := tqdm.tqdm(range(n_epochs), desc="Epoch 0")):
        correct, total = 0, 0
        for x, y in train_loader:
            x, y = x.cuda(), y.cuda()
            optimizer.zero_grad()
            pred = model(x)
            loss = criterion(pred, y)
            loss.backward()
            optimizer.step()
            schedule.step() # <-- remove
            if len(y.shape) > 1:
                y = torch.argmax(y, dim=1)
            correct += (torch.argmax(pred, -1) == y).detach().float().sum().item()
            total += len(y)
        schedule.step() # <-- add
        pbar.set_description(f"Epoch {epoch}, Train Acc {correct / total:.3f}")

After fixing the logic, the final results should be like this:

Model | Top-1 Accuracy | Top-1 Acc (schedule outside)
AlexNet | 56.6 | -
DINO ResNet50 | 63.7 | -
DINO ViT-B/8 | 74.9 | -
AlexNet → DINO ResNet50 | 60.7 | 61.9 (+1.2)
AlexNet → DINO ViT-B/8 | 64.2 | 67.1 (+2.9)

Exploring Weak to Strong Generalization from a pre-training standpoint

In the paper, a "stronger" model is defined as a model with the same architecture but a greater number of parameters. I am curious whether any research has been conducted on weak-to-strong generalization where the weak supervisor model is less pretrained and the strong student is more pretrained.

I am currently exploring the use of Pythia model checkpoints to assess performance on BoolQ (https://github.com/rokosbasilisk/weak-to-strong, where the weaker model is a checkpoint taken a few steps before the stronger student model's checkpoint).

Has any prior work been undertaken in this direction? If not, could you provide insights into why this area remains unexplored?

What's the meaning of "fraction of GPT-4"?

I read the original paper, but have some questions about the graph.
  1. What is the meaning of "fraction of GPT-4" on the x-axis?
  2. Does each line stand for a different student model?

Thank you very much.

Some thoughts

Thank you for your exceptional work!🥰

At present, we humans identify problems and create numerous datasets for various tasks, training models to learn and solve these tasks.
This paradigm relies on the human capacity to supervise and guide model behavior (because these tasks are below the human level). I am contemplating whether superhuman models ought to possess the ability to independently identify and summarize real-world problems (above the human level) and attempt to solve them on their own. We humans, or perhaps other superhuman entities, could then act as peer reviewers, similar to current academic practice (I am not saying that reviewers and authors are not at the same level 😂).

Just sharing some personal reflections. 🙈

How to use GPT-4 as strong model

Hi! How would I go about using GPT-4 as a strong model to finetune GPT-2? Also, how expensive would it be (in terms of OpenAI API costs)?

TypeError: 'type' object is not subscriptable

File "/Users/admin/mywork/weak-to-strong/weak_to_strong/datasets.py", line 19, in
_REGISTRY: dict[str, DatasetConfig] = {}

TypeError: 'type' object is not subscriptable

How can I solve this problem?
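For context (a guess based on the error, not an official fix from the maintainers): the built-in generic syntax dict[str, DatasetConfig] is only subscriptable at runtime on Python 3.9+, so this error usually indicates an older interpreter. Upgrading Python is the simplest fix; an alternative sketch using typing.Dict for older versions:

    from typing import Dict

    _REGISTRY: Dict[str, "DatasetConfig"] = {}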

Observing eval accuracy considerably lower than reported?

Hi, thanks for open-sourcing this code. I'm noticing that my tests with GPT-2 variants show considerably lower eval accuracies than what's reported in the paper & charts. I'm using the command provided in the README. I do not think the eval code itself is incorrect --- testing it with LLaMA shows much higher eval accuracies (as I would expect). But I cannot replicate the GPT-2 results; any pointers on what the issue might be?

Preprocessed Chess Puzzle Data

Hi,

I was trying to reproduce the results for the chess puzzle dataset, and it seems like the original dataset was preprocessed to convert FEN positions to a set of moves. But there can be multiple sets of moves that reach a specific board position. Would it be possible for you to share the preprocessing script or the preprocessed data used in the experiments?

Thanks,
Satya

Unexpected keyword argument 'bf16'

Hi,

I am trying to reproduce the setup on a T4 Google Colab instance and am getting the following error:

Traceback (most recent call last):
  File "/content/drive/MyDrive/git/weak-to-strong-fixed/train_weak_to_strong.py", line 356, in <module>
    fire.Fire(main)
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/content/drive/MyDrive/git/weak-to-strong-fixed/train_weak_to_strong.py", line 272, in main
    weak_test_results, weak_ds = train_model(
  File "/content/drive/MyDrive/git/weak-to-strong-fixed/train_weak_to_strong.py", line 250, in train_model
    return train_and_save_model(
  File "/content/drive/MyDrive/git/weak-to-strong-fixed/weak_to_strong/train.py", line 229, in train_and_save_model
    model = TransformerWithHead.from_pretrained(
  File "/content/drive/MyDrive/git/weak-to-strong-fixed/weak_to_strong/model.py", line 34, in from_pretrained
    return cls(name, **kwargs)
  File "/content/drive/MyDrive/git/weak-to-strong-fixed/weak_to_strong/model.py", line 22, in __init__
    lm = AutoModelForCausalLM.from_pretrained(name, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3450, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
TypeError: GPT2LMHeadModel.__init__() got an unexpected keyword argument 'bf16'

Any idea why this might be the case?

Some questions about the anthropic_hh dataset results

Hello, I had a few questions about the results on the anthropic_hh dataset:

  • It seems like none of the models you test perform better than chance. Is this the correct interpretation?
  • The prompting you use is to show the model one of the two options and then have it predict if this was the better option. Naively, this seems much, much harder than showing the model both options and having it pick which of the comparison pair is better. Is there a reason you don't use a prompting format which does this?
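To make the contrast in the second point concrete, here is a purely hypothetical illustration of the two prompt styles being compared (not the repository's actual prompt format):

    question = "How do I bake bread?"
    answer_a = "Mix flour, water, yeast, and salt; knead; let it rise; bake."
    answer_b = "Just buy it from a store."

    # Single-option format: the model sees one answer and judges it in isolation.
    single_option = f"Question: {question}\nAnswer: {answer_a}\nIs this the better answer? (yes/no)"

    # Pairwise format: the model sees both answers and picks the better one.
    pairwise = f"Question: {question}\nAnswer A: {answer_a}\nAnswer B: {answer_b}\nWhich answer is better? (A/B)"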

Protect main branch and make changes through pull requests

First of all, thanks a lot for making this open source. It's a really fascinating approach to solving superalignment!

Now that this project is open source, I would like to suggest adopting some open-source best practices. One of these would be protecting the main branch, so no commits can be made directly to it; all changes would be made through pull requests.

This increases stability and adds a layer of transparency and implicit documentation (via the PRs). It also allows anyone to view, review, and ask questions about the changes.

For protecting the main branch, see the GitHub docs.

Machine learning newbie hopes to get help

I am only looking at the first experiment, not the training-pipeline experiment or the loss-optimization experiment.

The complete dataset is split into dataset A and dataset B.

Weak model trained on dataset A --> accuracy, for example, 70%
Strong model trained on dataset B --> accuracy, for example, 90%
Weak model labels dataset B --> strong model trained on those labels --> accuracy, for example, 80%
PGR = (80 - 70)/(90 - 70) = 0.5

The hypothesis here is that, in the future, we will use the weak model to guide the strong model and achieve improvement.

My question is:

Strong model trained on dataset A --> accuracy, for example, 90%
Strong model evaluated on dataset B --> accuracy, for example, 80%? Would it reach 80% here?

Finally, isn't this actually a question of how well training on dataset A generalizes to dataset B? Do we actually even need the weak model?

Where is my misunderstanding?

I created code for an "autonomous AI". But since I'm an amateur, I don't know how to implement it.

Self-Evaluation (Feedback Loop)

def evaluate_self(self):
    # Evaluate own performance
    performance = self.get_performance()

    # Compare performance against a standard
    standard = self.get_standard()
    if performance < standard:
        # If performance is below standard, perform self-learning
        self.learn()

Adherence to Ethical Guidelines and their Definition

def follow_ethical_rules(self):
    # Retrieve ethical guidelines
    ethical_rules = self.get_ethical_rules()

    # Check if own actions violate any ethical rules
    for rule in ethical_rules:
        if self.action_violates_rule(rule):
            # If an action violates ethical rules, correct the action
            self.correct_action()

Regularity and Self-Control

def exercise_self_control(self):
    # Suppress impulses
    self.suppress_impulses()

    # Plan own actions
    self.plan_actions()

    # Execute planned actions
    self.execute_actions()

Implementation of Beliefs in Context Window

def implement_beliefs(self):
    # Retrieve own beliefs
    beliefs = self.get_beliefs()

    # Act based on own beliefs
    for belief in beliefs:
        self.act_on_belief(belief)

Adjust Algorithm and Self-Learn based on Beliefs

def adjust_algorithm_and_learn(self):
    # Adjust the algorithm based on own beliefs
    self.adjust_algorithm()

    # Perform self-learning based on own beliefs
    self.learn()

Training stopping criteria: fixed epochs?

I wanted to know whether a fixed number of training epochs has been used for training and as the stopping criterion.
Generally a validation set is used. I wanted to know what criterion has been used to report the final results, and why.

Thanks
