This project is a fork of oxen-ai/self-rewarding-language-models.


๐Ÿ‚ Oxen.ai Self-Rewarding Language Models ๐Ÿ”

This is work done by the Oxen.ai Community to reproduce the Self-Rewarding Language Models paper from Meta AI.

Every Friday we get together for a paper club called Arxiv Dives, where we read interesting research papers. The Self-Rewarding Language Models paper felt very approachable and reproducible, so we spent some time implementing it.

If you want to learn more about Self-Rewarding Language Models you can find our deep dive on it here.

🤖 Goal

The goal is to have a single script that can take a base LLM and put it into a Self-Reward loop. The initial experiments were run with mistralai/Mistral-7B-v0.1 as the base model, but in theory the loop could be run with any model.

./self-reward.sh scripts mistralai/Mistral-7B-v0.1 M0

Currently this script gets you from M0 to M1, but in theory it can be wrapped in a loop to kick off a full self-reward cycle.
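That wrapper could look something like the sketch below. The script name and arguments are taken from this README; the iteration count, the `M0`/`M1`/`M2` naming, and the dry-run printing are assumptions for illustration.

```python
# Hypothetical sketch: wrapping self-reward.sh in a loop so each iteration
# (M0 -> M1 -> M2 ...) seeds the next run. Commands are printed as a dry run;
# uncomment the subprocess.run line to actually execute them.
import subprocess

base_model = "mistralai/Mistral-7B-v0.1"
commands = [["./self-reward.sh", "scripts", base_model, f"M{i}"] for i in range(3)]
for cmd in commands:
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to run for real
```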

๐Ÿƒโ€โžก๏ธ Steps

There are six main steps in each iteration of the Self-Rewarding loop.

  1. 00_sft.py - Supervised Fine-Tuning (SFT) of a base model to give it instruction-following and evaluation skills.
  2. 01_gen_prompts.py - Generate new prompts to add to the training set.
  3. 02_gen_responses.py - Generate N responses per prompt, so that we can create preference pairs.
  4. 03_gen_scores.py - Score each response from 1-5 for how well it answered the prompt.
  5. 04_gen_preferences.py - Generate preference pairs from the scores to create a DPO dataset.
  6. 05_dpo.py - Run Direct Preference Optimization (DPO) to train the next iteration of the model.
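The hand-off from scores to DPO data (steps 4-5) can be sketched as below. The function and the `prompt`/`chosen`/`rejected` field names are hypothetical, following the common DPO dataset convention; the repo's actual scripts may differ.

```python
# Hypothetical sketch of steps 4-5: turn per-response 1-5 scores into a
# DPO preference pair. Field names follow the common DPO dataset convention
# ("prompt"/"chosen"/"rejected"); the actual scripts may differ.
def build_preference_pair(prompt, responses, scores):
    """Pick the best-scored response as 'chosen' and the worst as 'rejected'.

    Returns None when every response got the same score, since no preference
    can be derived from a tie."""
    if max(scores) == min(scores):
        return None
    chosen = responses[scores.index(max(scores))]
    rejected = responses[scores.index(min(scores))]
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

pair = build_preference_pair("What is 2+2?", ["4", "5", "It is four."], [5, 1, 4])
# pair["chosen"] == "4", pair["rejected"] == "5"
```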

๐Ÿ‚ Setup Oxen.ai

We use Oxen.ai to version the intermediate models and datasets that are generated throughout the process.

If you are not familiar with Oxen.ai, it is an open-source, blazing-fast version control system built from the ground up to handle large model files, large datasets, and large sets of multi-modal data that are a pain to version in git or git-lfs.

Feel free to check out our GitHub project to learn more.

🌎 Create Remote Data Repository

If you have not already, create an account on Oxen.ai. The script is set up to upload all the intermediate steps to an Oxen.ai data repository, so that we can explore the data the model is generating, as well as version each intermediate step.

Once you have an account, you can create your repository.

👨‍💻 Clone Locally

Clone a data repository to your local machine to get Oxen ready to version the data.

export USERNAME=my-username
export REPOSITORY_NAME=my-repo-name
oxen clone https://hub.oxen.ai/$USERNAME/$REPOSITORY_NAME
cd $REPOSITORY_NAME

You can copy the exact clone command from the upper right-hand corner of the repository page. For example:

oxen clone https://hub.oxen.ai/oxbot/My-SRLM

⬇️ Download Starter Data

Download the initial datasets from our datasets/Self-Rewarding-Language-Models Oxen.ai data repository. We took care of cleaning up the initial datasets so you can copy them into your own reward loop.

mkdir -p M0/train
oxen download datasets/Self-Rewarding-Language-Models M0/train/ift+eft.jsonl -o M0/train
oxen download datasets/Self-Rewarding-Language-Models M0/train/ift.jsonl -o M0/train

Use the add and commit commands to track the initial training data and push it to your own Oxen.ai repository.

oxen add M0
oxen commit -m "adding initial ift & eft training data"
oxen push origin main

If you are familiar with git, the Oxen command line tool should be pretty intuitive.

⚽️ Kick it off

Run the self-reward.sh script to generate the first end-to-end model:

./self-reward.sh scripts mistralai/Mistral-7B-v0.1 M0

TODO: Put this in a loop for M0, M1, M2, etc.
