fine-tuning-language-models-from-human-preferences-pytorch-implementation's Introduction

lm-human-preferences

This repo is a PyTorch 2.0 rewrite of lm-human-preferences, the original TensorFlow v1 code for the paper Fine-Tuning Language Models from Human Preferences.

Because I only have a single GPU with 24 GB of VRAM, I also made a number of changes to avoid OutOfMemoryError. Search the code for "OutOfMemoryError" to see the details.

Setup

Create a Python environment and install the packages listed in requirements.txt. My environment is Python 3.9.18 with PyTorch 2.1.2.

Download the pretrained checkpoint shared by OpenAI, then run saved_models/prepare_pytorch_checkpoint.py to convert it to a PyTorch checkpoint. TensorFlow 2.x is needed to load the downloaded TensorFlow checkpoint. This checkpoint corresponds to the language model $\rho$ that is used to initialize the reward and policy models in the paper.
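As a rough illustration of what such a conversion involves, here is a minimal sketch. It is not the contents of saved_models/prepare_pytorch_checkpoint.py. The function names `tf_name_to_torch_key` and `convert_checkpoint` are hypothetical, and the variable naming scheme (`model/h0/attn/c_attn/w` and similar) and the target key names are assumptions that may not match the model classes in this repo. Tensor shapes are copied as-is, so any reshaping the PyTorch model needs is not handled here.

```python
import re

# Assumed suffix convention: w/b for dense layers, g/b for layer norm.
_SUFFIXES = {"w": "weight", "b": "bias", "g": "weight"}


def tf_name_to_torch_key(tf_name):
    """Map an assumed TF variable name to a PyTorch-style state-dict key."""
    parts = tf_name.split("/")
    if parts and parts[0] == "model":
        parts = parts[1:]  # drop the top-level variable scope
    if len(parts) == 1:
        # Embedding tables such as 'wte' and 'wpe' carry no suffix.
        return parts[0] + ".weight"
    key = []
    for part in parts[:-1]:
        match = re.fullmatch(r"h(\d+)", part)
        # Split a block name like 'h0' into 'h' and '0'.
        key.extend(["h", match.group(1)] if match else [part])
    key.append(_SUFFIXES.get(parts[-1], parts[-1]))
    return ".".join(key)


def convert_checkpoint(tf_ckpt_dir, out_path):
    """Read every variable from a TF checkpoint and save it with torch.save."""
    # Imported lazily so the name mapping above can be used on its own.
    import tensorflow as tf
    import torch

    state_dict = {}
    for name, _shape in tf.train.list_variables(tf_ckpt_dir):
        array = tf.train.load_variable(tf_ckpt_dir, name)  # NumPy array
        state_dict[tf_name_to_torch_key(name)] = torch.from_numpy(array)
    torch.save(state_dict, out_path)
    return state_dict
```

For example, `tf_name_to_torch_key("model/h0/attn/c_attn/w")` gives `"h.0.attn.c_attn.weight"`. Keeping the name mapping separate from the I/O makes it easy to check the mapping without TensorFlow installed.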

OpenAI's books dataset link is broken, so I use the bookcorpus dataset hosted by Hugging Face. To prepare the books dataset, run datasets/books.py.
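For orientation, reading bookcorpus through the Hugging Face `datasets` library could look roughly like the sketch below. This is not what datasets/books.py does. The helper names `iter_book_texts` and `load_bookcorpus_texts` and the `min_chars` and `limit` parameters are hypothetical, and whether `load_dataset("bookcorpus")` works without extra arguments depends on the installed `datasets` version.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def iter_book_texts(records: Iterable[Dict[str, str]], min_chars: int = 1) -> Iterator[str]:
    """Yield stripped text from records that have a 'text' field, skipping short ones."""
    for record in records:
        text = record["text"].strip()
        if len(text) >= min_chars:
            yield text


def load_bookcorpus_texts(limit: int = 1000) -> List[str]:
    """Return up to `limit` cleaned text samples from bookcorpus."""
    # Imported lazily so iter_book_texts works without the library installed.
    from datasets import load_dataset

    # Streaming avoids downloading the whole corpus before iterating.
    dataset = load_dataset("bookcorpus", split="train", streaming=True)
    return list(islice(iter_book_texts(dataset), limit))
```

The filtering helper takes any iterable of records, so it can be tried on a few hand-written dictionaries before touching the real corpus.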

Run

Repository: wuwowuyi / fine-tuning-language-models-from-human-preferences-pytorch-implementation
