lupantech / chameleon-llm
Codes for "Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models".
Home Page: https://chameleon-llm.github.io
License: Apache License 2.0
The example on the main page does not seem to work.
Am I missing something?
pip install -r requirements.txt
cd run_scienceqa
python run.py \
> --model chameleon \
> --label chameleon_gpt4 \
> --policy_engine gpt-4 \
> --kr_engine gpt-4 \
> --qg_engine gpt-4 \
> --sg_engine gpt-4 \
> --test_split test \
> --test_number -1
Traceback (most recent call last):
File "/home/marc/code/chameleon-llm/run_scienceqa/run.py", line 11, in <module>
from utilities import *
File "/home/marc/code/chameleon-llm/utilities.py", line 4, in <module>
import func_timeout
ModuleNotFoundError: No module named 'func_timeout'
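A likely fix (assuming the missing dependency is simply the func_timeout package from PyPI, which does not appear to be pinned in requirements.txt) is to install it manually:

pip install func_timeout

After that, the import in utilities.py should resolve.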
../results/scienceqa/chameleon_chatgpt_minitest.json
Result file exists: ../results/scienceqa/chameleon_chatgpt_minitest.json
Count: 100, Correct: 44, Wrong: 56
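(That works out to 44% accuracy on this 100-example minitest run.)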
I noticed that bing_search results were generated during the Chameleon run. Could you open-source this part of the data?
"bing_file": "./data/scienceqa/bing_responses.json"
Hi @lupantech, thank you for your excellent work.
I observed inconsistent accuracies on the minitest set. Specifically, I got acc_average values of 49.29 for gpt-3.5-turbo and 46.93 for Llama-2-7b, while gpt-3.5's reported test-set accuracy is 79.93.
Upon analyzing the "true_false" values in chameleon_chatgpt_test_cache.jsonl for the pids that appear in the minitest set, I calculated an accuracy of 0.7948.
Could you help to clarify this discrepancy or share your minitest evaluation results, if available?
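For reference, here is a minimal sketch of that cross-check. The cache schema is an assumption on my side: one JSON object per line with "pid" and "true_false" fields, and a hypothetical minitest_pids.json listing the minitest problem ids.

import json

# Hypothetical file listing the minitest problem ids; adjust to your data layout.
minitest_pids = set(json.load(open("minitest_pids.json")))

correct = total = 0
with open("chameleon_chatgpt_test_cache.jsonl") as f:
    for line in f:
        record = json.loads(line)  # assumed fields: "pid", "true_false"
        if record["pid"] in minitest_pids:
            total += 1
            correct += bool(record["true_false"])

print(f"minitest accuracy from cache: {correct / total:.4f}")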
Thank you for your work. When running on the TabMWP dataset, I found that some examples execute very slowly; is there any way to speed them up?
It takes only 4-5 hours to complete on the ScienceQA dataset, but it can take 20 times longer on TabMWP.
Looking forward to hearing from you, thanks!
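One possible mitigation is to put a hard time limit on each generated program, so a single slow or non-terminating program cannot stall the whole run. This is only a sketch, not the repository's actual implementation: it assumes generated programs are executed with exec() and store their result in a variable named ans, and it reuses the func_timeout package that utilities.py already imports.

from func_timeout import func_timeout, FunctionTimedOut

def run_program(program: str, timeout_s: float = 5.0):
    # Execute a generated program with a hard time limit.
    env = {}
    try:
        func_timeout(timeout_s, exec, args=(program, env))
        return env.get("ans")  # assumed convention: programs assign the answer to "ans"
    except FunctionTimedOut:
        return None  # count the example as failed instead of waiting indefinitely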
Hi, thanks for your awesome work.
I use GPT-3.5 Turbo as my model. When I run run.py in run_tabmwp, the program_generator generates a program for the first question in TabMWP.
Question description:
Hannah baked cookies each day for a bake sale. How many more cookies did Hannah bake on Saturday than on Sunday?
Your sample program:
cookies_baked = {"Friday": 163, "Saturday": 281, "Sunday": 263}
ans = cookies_baked["Saturday"] - cookies_baked["Sunday"]
I'd like to ask what could be causing my generated programs to look so unreasonable, even though I run with exactly the same parameters as you did. This happens frequently on other TabMWP data as well; essentially all of the programs I generate are unreasonable, resulting in an extremely low average accuracy. The parameters I used are:
python run.py \
--model chameleon \
--label chameleon_chatgpt \
--test_split test \
--policy_engine gpt-3.5-turbo \
--rl_engine gpt-3.5-turbo \
--cl_engine gpt-3.5-turbo \
--tv_engine gpt-3.5-turbo \
--kr_engine gpt-3.5-turbo \
--sg_engine gpt-3.5-turbo \
--pg_engine gpt-3.5-turbo \
--test_number 1000 \
--rl_cell_threshold 18 \
--cl_cell_threshold 18
Hi, thanks for your great work. I want to ask whether this method could be applied to open-ended VQA tasks, where a free-form answer is needed instead of choosing from a given list of options.
The update_modules function in run_scienceqa/model.py contains default_modules = eval(["solution_generator", "answer_generator"]), which calls eval() on a list. Since eval() only accepts a string (or code object), this raises a TypeError.
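A likely fix, assuming the intent was simply to use the list as-is, is to drop the eval():

# The default is already a list literal; eval() only accepts strings or code objects.
default_modules = ["solution_generator", "answer_generator"]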
Hi authors,
Why is use_caption disabled by default? And did you use the captions for the results reported in the paper?
Thanks a lot!
Hi,
I am writing to request an update on the release of the code for the Image Captioner and Text Detector modules, as promised in the project's README:
"For the current version, the results for the Image Captioner and Text Detector are off-the-shelf and stored in data/scienceqa/captions.json and data/scienceqa/ocrs.json, respectively. The live calling of these two modules is coming soon!"
As an eager user of the project, I'm excited to explore and utilize these modules' functionality, so I'd like to ask about the current status of the code release.