The Zero-Shot Replication Framework is a minimal environment designed to replicate zero-shot results from past academic papers. It currently supports OpenAI, Anthropic, and HuggingFace models for generating completions on various datasets, and provides tools for handling, evaluating, and storing those completions.
| Category | gpt-3.5-turbo-0301 | gpt-3.5-turbo-0613 | claude-2 | gpt-4-0314 | gpt-4-0613 | wizard-coder-34b | gpt-4 Baseline | Sources |
|---|---|---|---|---|---|---|---|---|
| Standard Bench | | | | | | | | |
| HumanEval | 67.0 | 61.5 | 65.2 | 86.0 | 84.1 | 70.7 | 67.0 | [1] |
| HumanEval+ | 59.1 | 54.2 | 54.9 | 80.5 | 74.4 | 60.3 | N/A | |
| MATH | 35.4 | 37.2 | 17.6 | 51.6 | 50.3 | N/A | 42.2 | [3] |
| LeetCodeSparks | | | | | | | | [1,2] |
| Easy | 60.0 | 76.2 | 52.4 | 76.2 | 61.2 | 38.1 | 68.2-75.6 | [1,2]* |
| Medium | 15.0 | 22.0 | 9.8 | 19.5 | 31.7 | 12.2 | 26.7-40.0 | [1,2]* |
| Hard | 0.0 | 0.0 | 0.0 | 4.6 | 13.6 | 0.0 | 6.6-10.7 | [1,2]* |
| LeetCode100 | | | | | | | | |
| Easy | 83.0 | 80.0 | 73.0 | 91.0 | 88.0 | 71.0 | N/A | |
| Medium | 16.0 | 16.0 | 16.0 | 26.0 | 21.0 | 9.0 | N/A | |
| Hard | 1.0 | 3.0 | 2.0 | 6.0 | 6.0 | 2.0 | N/A | |
*The gpt-4 LeetCodeSparks baseline is approximate, as the referenced reports do not list the precise set of LeetCode problems used. We define 'LeetCodeSparks' as the 84 problems used for the human-evaluation measurement mentioned in [2].

'LeetCode100' is a dataset we introduce of 100 recent easy, medium, and hard LeetCode problems (numbers 2554-2818), recent enough that they are expected to be out of sample for the models evaluated.
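Since LeetCode100 has exactly 100 problems, each reported score is simply the count of problems solved, read as a percentage. A trivial sketch of that reading:

```python
def pass_rate(num_solved: int, num_problems: int = 100) -> float:
    """Score as reported in the table: percentage of problems solved."""
    return 100.0 * num_solved / num_problems

print(pass_rate(91))  # gpt-4-0314 on LeetCode100 Easy -> 91.0
```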
- Easy configuration of models and parameters.
- Ability to choose datasets to run on.
- Extensibility through a pluggable problem generator.
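The pluggable problem generator mentioned above can be pictured with a minimal sketch. The names here (`ProblemGenerator`, `ToyGenerator`, `generate_problems`) are hypothetical stand-ins, not the framework's actual interface, which this README does not show:

```python
# Hypothetical sketch of a pluggable problem generator; the real
# interface in the framework may differ.
from abc import ABC, abstractmethod
from typing import Iterator, Tuple


class ProblemGenerator(ABC):
    """Yields (task_id, prompt) pairs for a dataset."""

    @abstractmethod
    def generate_problems(self) -> Iterator[Tuple[str, str]]:
        ...


class ToyGenerator(ProblemGenerator):
    """A one-problem dataset, standing in for e.g. a HumanEval loader."""

    def generate_problems(self) -> Iterator[Tuple[str, str]]:
        yield ("toy/0", 'def add(a, b):\n    """Return a + b."""\n')


for task_id, prompt in ToyGenerator().generate_problems():
    print(task_id)  # -> toy/0
```

Any new dataset then only needs to supply its own generator; the rest of the completion/evaluation pipeline stays unchanged.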
- Python >= 3.10 and < 3.12
- Poetry for package management
- anthropic: 0.3.10
- astunparse: 1.6.3
- black: ^23.3.0
- evalplus: ^0.1.6
- numpy: ^1.25.2
- openai: 0.27.8
- pandas: ^2.0.3
- python-dotenv: ^1.0.0
- python-leetcode: 1.2.1
- automata (optional extra)
- transformers: ^4.32.0
- torch: 1.13.1
- accelerate: ^0.22.0
- sentencepiece: ^0.1.99
- protobuf: ^4.24.1
- flake8: 6.1.0
- isort: 5.12.0
- mypy: ^1.5.1
- pre-commit: ^3.3.3
- sourcery: ^1.6.0
- types-requests: ^2.31.0.2
- types-attrs: ^19.1.0
- yapf: 0.40.1
Make sure you have Poetry installed, then clone the repository and install the dependencies:

```bash
# Clone the repository and its submodules
git clone https://github.com/your-username/zero-shot-replication.git
cd zero-shot-replication
git submodule update --init --recursive

# Install the dependencies (use `poetry install -E automata` to include automata)
poetry install

# Copy the example environment file, then edit .env to add your OpenAI API key, etc.
cp .env.example .env

# Optional: if developing, install the pre-commit hooks
# pre-commit install

# Optional: if using automata, add the repo as a submodule
# git submodule add -f https://github.com/emrgnt-cmplxty/zero-shot-replication.git zero_shot_replication/automata
```
You can run the zero-shot replication by executing the `runner.py` file with various command-line arguments:

```bash
poetry run python runner.py --provider openai --dataset human-eval --model gpt-4-0613 --temperature 0.7
```
- `--provider`: Which provider to use for zero-shot completions (default: "openai").
- `--dataset`: Which dataset to run on (default: "human-eval").
- `--model`: Model name to load from the provider (default: "gpt-3.5-turbo").
- `--temperature`: Temperature parameter for the provided model (default: 0.7).
- `--output_file_name`: Filename to use in place of the default output file name.
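These flags compose naturally into parameter sweeps. The sketch below only builds the corresponding `poetry run` command lines as argv lists (it does not execute anything), using just the flags documented above:

```python
# Builds runner.py invocations for a small temperature sweep.
# Only constructs argv lists; nothing is executed.
import shlex


def build_command(provider="openai", dataset="human-eval",
                  model="gpt-3.5-turbo", temperature=0.7):
    """Return one zero-shot run as an argv list (e.g. for subprocess.run)."""
    return [
        "poetry", "run", "python", "runner.py",
        "--provider", provider,
        "--dataset", dataset,
        "--model", model,
        "--temperature", str(temperature),
    ]


for t in (0.2, 0.7):
    print(shlex.join(build_command(model="gpt-4-0613", temperature=t)))
```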
To see the exact commands run to generate the reported results, check out the `commands.md` file.
This project is licensed under the Apache-2.0 License.