Zero-Shot Replication Framework

Overview

The Zero-Shot Replication Framework is a minimal environment for replicating zero-shot results from past academic papers. It currently generates completions with OpenAI, Anthropic, and HuggingFace models across several datasets, and provides tools for handling, evaluating, and storing those completions.

Results (all models accessed on 08/24-08/25, 2023)

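| Category | gpt-3.5-turbo-0301 | gpt-3.5-turbo-0613 | claude-2 | gpt-4-0314 | gpt-4-0613 | wizard-coder-34b | gpt-4 Baseline | Sources |
|---|---|---|---|---|---|---|---|---|
| Standard Bench | | | | | | | | |
| HumanEval | 67.0 | 61.5 | 65.2 | 86.0 | 84.1 | 70.7 | 67.0 | [1] |
| HumanEval+ | 59.1 | 54.2 | 54.9 | 80.5 | 74.4 | 60.3 | N/A | |
| MATH | 35.4 | 37.2 | 17.6 | 51.6 | 50.3 | N/A | 42.2 | [3] |
| LeetCodeSparks | | | | | | | | [1,2] |
| Easy | 60.0 | 76.2 | 52.4 | 76.2 | 61.2 | 38.1 | 68.2-75.6 | [1,2]* |
| Medium | 15.0 | 22.0 | 9.8 | 19.5 | 31.7 | 12.2 | 26.7-40.0 | [1,2]* |
| Hard | 0.0 | 0.0 | 0.0 | 4.6 | 13.6 | 0.0 | 6.6-10.7 | [1,2]* |
| LeetCode100 | | | | | | | | |
| Easy | 83.0 | 80.0 | 73.0 | 91.0 | 88.0 | 71.0 | N/A | |
| Medium | 16.0 | 16.0 | 16.0 | 26.0 | 21.0 | 9.0 | N/A | |
| Hard | 1.0 | 3.0 | 2.0 | 6.0 | 6.0 | 2.0 | N/A | |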

*The gpt-4 LeetCodeSparks baseline is approximate because the referenced reports do not give a precise list of the LeetCode problems used. We define 'LeetCodeSparks' as the 84 problems used for the human evaluation measurement mentioned in [2].

'LeetCode100' is a dataset we introduce of 100 recent easy, medium, and hard LeetCode problems, which we expect to be out-of-sample for the evaluated models. The problems are numbered in the range 2554-2818.

Features

  • Easy configuration of models and parameters.
  • Ability to choose datasets to run on.
  • Extensibility through a pluggable problem generator.

Requirements

  • Python >= 3.10 and < 3.12
  • Poetry for package management
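
The Python version constraint above can be checked before installing. The following is a minimal sketch using only the standard library; the function name `is_supported` is illustrative and is not part of the project.

```python
import sys

# Supported range taken from the Requirements list: >= 3.10 and < 3.12.
MIN_VERSION = (3, 10)
MAX_VERSION_EXCLUSIVE = (3, 12)


def is_supported(version_info=sys.version_info) -> bool:
    """Return True when the (major, minor) pair falls in the supported range."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    status = "supported" if is_supported() else "unsupported"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status}")
```

Comparing only the major and minor components keeps every 3.10.x and 3.11.x patch release inside the range.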

Minimum Dependencies

  • anthropic: 0.3.10
  • astunparse: 1.6.3
  • black: ^23.3.0
  • evalplus: ^0.1.6
  • numpy: ^1.25.2
  • openai: 0.27.8
  • pandas: ^2.0.3
  • python-dotenv: ^1.0.0
  • python-leetcode: 1.2.1

Extra Dependencies

  • automata
  • transformers: ^4.32.0
  • torch: 1.13.1
  • accelerate: ^0.22.0
  • sentencepiece: ^0.1.99
  • protobuf: ^4.24.1

Dev Dependencies

  • flake8: 6.1.0
  • isort: 5.12.0
  • mypy: ^1.5.1
  • pre-commit: ^3.3.3
  • sourcery: ^1.6.0
  • types-requests: ^2.31.0.2
  • types-attrs: ^19.1.0
  • yapf: 0.40.1

Installation

Make sure you have Poetry installed, then clone the repository and install the dependencies.

git clone https://github.com/your-username/zero-shot-replication.git
cd zero-shot-replication
git submodule update --init --recursive
poetry install # To also install automata, run: poetry install -E automata
cp .env.example .env # Copy the example environment file
# Edit the .env file to add your OpenAI API key, etc.


# Optional

# If developing, install the pre-commit hooks
# pre-commit install 

# If using automata, install the repo
# git submodule add -f https://github.com/emrgnt-cmplxty/zero-shot-replication.git zero_shot_replication/automata
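
After editing the .env file, it can help to confirm that the expected keys are actually present before starting a run. The sketch below is an assumption-laden illustration: the names `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` are the variables the respective client libraries conventionally read, but the names this project expects should be taken from .env.example. It also assumes the .env values have already been loaded into the process environment, for example with python-dotenv's `load_dotenv()`.

```python
import os
from typing import Iterable, Mapping

# Assumed variable names; check .env.example for the names the project expects.
ASSUMED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_keys(
    required: Iterable[str], env: Mapping[str, str] = os.environ
) -> list[str]:
    """Return the required names that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys(ASSUMED_KEYS)
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        print("All assumed keys are set.")
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary.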

Usage

Run a zero-shot replication by executing runner.py with the command-line arguments described below.

poetry run python runner.py --provider openai --dataset human-eval --model gpt-4-0613 --temperature 0.7
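
To repeat the command above for several models, the argument list can be assembled programmatically. This is a minimal sketch: `build_command` is a hypothetical helper, not part of the project, and it uses only the flags documented in this README. The model names are taken from the results table; whether runner.py accepts each of them unchanged is an assumption.

```python
import shlex


def build_command(provider, dataset, model, temperature=0.7):
    """Build the argv list for one runner.py invocation (hypothetical helper)."""
    return [
        "poetry", "run", "python", "runner.py",
        "--provider", provider,
        "--dataset", dataset,
        "--model", model,
        "--temperature", str(temperature),
    ]


if __name__ == "__main__":
    # Print one shell-ready command per model instead of executing it.
    for model in ("gpt-3.5-turbo-0613", "gpt-4-0613"):
        print(shlex.join(build_command("openai", "human-eval", model)))
```

Each list can be passed directly to `subprocess.run` to execute the run rather than print it.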

Command-Line Arguments

  • --provider: Which provider to use for zero-shot completions (default: "openai").
  • --dataset: Which dataset to run on (default: "human-eval").
  • --model: Model name to load from the provider (default: "gpt-3.5-turbo").
  • --temperature: Temperature parameter for the provided model (default: 0.7).
  • --output_file_name: File name that overrides the default output file name.
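
The arguments and defaults listed above could be declared with `argparse` roughly as follows. This is an illustrative sketch only, not the project's runner.py, which may define its parser differently; `build_parser` is a hypothetical name, and the `float` type for the temperature and the `None` default for the output file name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags with their documented defaults."""
    parser = argparse.ArgumentParser(description="Zero-shot replication runner (sketch)")
    parser.add_argument("--provider", default="openai",
                        help="Provider to use for zero-shot completions")
    parser.add_argument("--dataset", default="human-eval",
                        help="Dataset to run on")
    parser.add_argument("--model", default="gpt-3.5-turbo",
                        help="Model name to load from the provider")
    # Assumption: the temperature is parsed as a float.
    parser.add_argument("--temperature", type=float, default=0.7,
                        help="Temperature parameter for the model")
    # Assumption: no override is represented as None.
    parser.add_argument("--output_file_name", default=None,
                        help="File name that overrides the default output file name")
    return parser


if __name__ == "__main__":
    # Parse an explicit list so the demo does not depend on sys.argv.
    args = build_parser().parse_args(["--model", "gpt-4-0613"])
    print(vars(args))
```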

To see the exact commands run to generate the reported results, see the commands.md file.

License

This project is licensed under the Apache-2.0 License.

Sources

[1] GPT-4 Technical Report

[2] Sparks of Artificial General Intelligence

[3] Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
