benchmarks-v0's Introduction

This repo is used to run various AI benchmarks on Open Interpreter.

GAIA and SWE-bench are currently supported.


Setup

  1. Make sure the required software is installed on your computer. The command in step 3 uses git, Python, and Docker.
  2. Run Docker.

  3. Copy-paste the following into your terminal:

git clone https://github.com/OpenInterpreter/benchmarks.git \
  && cd benchmarks \
  && python -m venv .venv \
  && source .venv/bin/activate \
  && python -m pip install -r requirements.txt \
  && docker build -t worker . \
  && python setup.py
  4. Enter your Hugging Face token.

Running Benchmarks

This section assumes:

  • benchmarks (downloaded via git in the previous section) is set as the current working directory.
  • You've activated the virtualenv with the installed prerequisite packages.
  • If using an OpenAI model, your OPENAI_API_KEY environment variable is set with a valid OpenAI API key.
  • If using a Groq model, your GROQ_API_KEY environment variable is set with a valid Groq API key.
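A quick way to confirm the API key assumptions above before starting a run is a small check script. This is a minimal sketch, not part of the repo: the `REQUIRED_KEYS` mapping and the `missing_key` helper are hypothetical names introduced here for illustration, and only the two environment variable names come from the list above.

```python
import os

# Hypothetical mapping (not from the repo): provider name -> the environment
# variable that the assumptions above say must be set for that provider.
REQUIRED_KEYS = {"openai": "OPENAI_API_KEY", "groq": "GROQ_API_KEY"}


def missing_key(provider, environ=None):
    """Return the name of the unset variable for `provider`, or None if it is set."""
    environ = os.environ if environ is None else environ
    var = REQUIRED_KEYS[provider]
    # An empty string is treated the same as an unset variable.
    return None if environ.get(var) else var


if __name__ == "__main__":
    for provider in REQUIRED_KEYS:
        var = missing_key(provider)
        status = "set" if var is None else f"missing ({var})"
        print(f"{provider}: {status}")
```

The check only tests that a variable is non-empty; it cannot tell whether the key is actually valid for the provider.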

Note: To run GAIA, you must first accept the conditions for accessing its files and content on Hugging Face.

Example: gpt-3.5-turbo, first 16 GAIA tasks, 8 docker containers

This command will output a file called output.csv containing the results of the benchmark.

python run_benchmarks.py \
  --command gpt35turbo \
  --ntasks 16 \
  --nworkers 8
  • --command gpt35turbo: Replace gpt35turbo with any existing key in the commands Dict in commands.py. Defaults to gpt35turbo.
  • --ntasks 16: Grabs the first 16 GAIA tasks to run. Defaults to all 165 GAIA validation tasks.
  • --nworkers 8: Number of Docker containers to run at once. Defaults to the default value of max_workers used when constructing a ThreadPoolExecutor.
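The way these three flags and their defaults interact can be sketched as follows. This is an illustration under stated assumptions, not the actual contents of run_benchmarks.py: `parse_args`, `run_all`, and the stand-in task list are hypothetical, and only the flag names, the `gpt35turbo` default, and the use of `ThreadPoolExecutor` come from the descriptions above. Passing `max_workers=None` lets the executor choose its own default, which in Python 3.8 through 3.12 is `min(32, os.cpu_count() + 4)`.

```python
import argparse
from concurrent.futures import ThreadPoolExecutor


def parse_args(argv=None):
    """Parse the three flags described above (hypothetical re-creation)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--command", default="gpt35turbo")
    parser.add_argument("--ntasks", type=int, default=None)    # None = all tasks
    parser.add_argument("--nworkers", type=int, default=None)  # None = executor default
    return parser.parse_args(argv)


def run_all(tasks, run_one, ntasks=None, nworkers=None):
    """Run `run_one` over the first `ntasks` tasks with `nworkers` threads."""
    selected = tasks[:ntasks]  # tasks[:None] keeps every task
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(run_one, selected))


if __name__ == "__main__":
    args = parse_args(["--ntasks", "16", "--nworkers", "8"])
    fake_tasks = list(range(165))  # stand-in for the 165 GAIA validation tasks
    results = run_all(fake_tasks, lambda t: t * 2, args.ntasks, args.nworkers)
    print(len(results))
```

In the real script each task would presumably start a Docker container rather than call a lambda; the thread pool only bounds how many run concurrently.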

Troubleshooting

  • ModuleNotFoundError: No module named '_lzma' when running example.
  • ModuleNotFoundError: No module named 'pkg_resources' when running example.
    • Refer to this Stack Overflow post for now.
    • Open Interpreter should probably list setuptools as a dependency, or switch to a module in Python's standard library.

benchmarks-v0's Issues

Unable to run

When I run

python run_benchmarks.py \
  --command gpt35turbo \
  --ntasks 16 \
  --nworkers 8

I get this error

Traceback (most recent call last):
  File "/Users/mike/oi-benchmarks/run_benchmarks.py", line 6, in <module>
    import gaia
  File "/Users/mike/oi-benchmarks/gaia.py", line 3, in <module>
    from datasets import load_dataset
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/datasets/__init__.py", line 43, in <module>
    from .arrow_dataset import Dataset
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 65, in <module>
    from .arrow_reader import ArrowReader
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/datasets/arrow_reader.py", line 30, in <module>
    from .download.download_config import DownloadConfig
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/datasets/download/__init__.py", line 10, in <module>
    from .streaming_download_manager import StreamingDownloadManager
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/datasets/download/streaming_download_manager.py", line 21, in <module>
    from ..filesystems import COMPRESSION_FILESYSTEMS
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/datasets/filesystems/__init__.py", line 16, in <module>
    from .s3filesystem import S3FileSystem  # noqa: F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/datasets/filesystems/s3filesystem.py", line 1, in <module>
    import s3fs
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/s3fs/__init__.py", line 1, in <module>
    from .core import S3FileSystem, S3File
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/s3fs/core.py", line 29, in <module>
    import aiobotocore.session
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/aiobotocore/session.py", line 1, in <module>
    from botocore import UNSIGNED, translate
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/botocore/translate.py", line 16, in <module>
    from botocore.utils import merge_dicts
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/botocore/utils.py", line 37, in <module>
    import botocore.httpsession
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/botocore/httpsession.py", line 22, in <module>
    from urllib3.util.ssl_ import (
ImportError: cannot import name 'DEFAULT_CIPHERS' from 'urllib3.util.ssl_' (/opt/homebrew/anaconda3/lib/python3.11/site-packages/urllib3/util/ssl_.py)
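
One plausible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: every path is under Anaconda's site-packages rather than the .venv created during setup, and `DEFAULT_CIPHERS` was removed from `urllib3.util.ssl_` in urllib3 2.0, so an older botocore may be running against urllib3 2.x outside the virtualenv. The sketch below prints which interpreter is active and which versions are installed; `major_version` and `report` are hypothetical helper names.

```python
import sys
from importlib import metadata


def major_version(version):
    """Return the leading integer of a version string such as '2.0.7'."""
    return int(version.split(".")[0])


def report(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # If this path is not inside .venv, the virtualenv is not active.
    print("interpreter:", sys.executable)
    for package in ("urllib3", "botocore"):
        print(package, report(package))
    urllib3_version = report("urllib3")
    if urllib3_version and major_version(urllib3_version) >= 2:
        print("urllib3 2.x is installed; older botocore releases expect urllib3 1.x")
```

If the interpreter path is outside .venv, activating the virtualenv from the setup step and rerunning is the first thing to try.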
