lupantech / chameleon-llm
Codes for "Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models".
Home Page: https://chameleon-llm.github.io
License: Apache License 2.0
The example on the main page does not seem to work.
Am I missing something?
pip install -r requirements.txt
cd run_scienceqa
python run.py \
> --model chameleon \
> --label chameleon_gpt4 \
> --policy_engine gpt-4 \
> --kr_engine gpt-4 \
> --qg_engine gpt-4 \
> --sg_engine gpt-4 \
> --test_split test \
> --test_number -1
Traceback (most recent call last):
File "/home/marc/code/chameleon-llm/run_scienceqa/run.py", line 11, in <module>
from utilities import *
File "/home/marc/code/chameleon-llm/utilities.py", line 4, in <module>
import func_timeout
ModuleNotFoundError: No module named 'func_timeout'
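A likely fix (assuming the missing dependency is simply the func_timeout package from PyPI, which does not appear to be pinned in requirements.txt) is to install it manually:

pip install func_timeout

After that, the import in utilities.py should resolve.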
../results/scienceqa/chameleon_chatgpt_minitest.json
Result file exists: ../results/scienceqa/chameleon_chatgpt_minitest.json
Count: 100, Correct: 44, Wrong: 56
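(That works out to 44% accuracy on this 100-example minitest run.)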
I noticed that bing_search results were generated during the Chameleon run. Could you open-source this part of the data?
"bing_file": "./data/scienceqa/bing_responses.json"
Hi @lupantech, thank you for your excellent work.
I observed inconsistent accuracies on the minitest set. Specifically, I got acc_average values of 49.29 for gpt-3.5-turbo and 46.93 for Llama-2-7b, while gpt-3.5's reported test-set accuracy is 79.93.
Upon analyzing the "true_false" values in chameleon_chatgpt_test_cache.jsonl for the pids that appear in the minitest set, I calculated an accuracy of 0.7948.
Could you help to clarify this discrepancy or share your minitest evaluation results, if available?
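For reference, here is a minimal sketch of that cross-check. The cache schema is an assumption on my side: one JSON object per line with "pid" and "true_false" fields, and a hypothetical minitest_pids.json listing the minitest problem ids.

import json

# Hypothetical file listing the minitest problem ids; adjust to your data layout.
minitest_pids = set(json.load(open("minitest_pids.json")))

correct = total = 0
with open("chameleon_chatgpt_test_cache.jsonl") as f:
    for line in f:
        record = json.loads(line)  # assumed fields: "pid", "true_false"
        if record["pid"] in minitest_pids:
            total += 1
            correct += bool(record["true_false"])

print(f"minitest accuracy from cache: {correct / total:.4f}")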
Thank you for your work. When running on the TabMWP dataset, I found that some examples execute very slowly; is there any way to speed them up?
It takes only 4-5 hours to complete on the ScienceQA dataset, but it can take 20 times longer on TabMWP.
Looking forward to hearing from you, thanks!
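One possible mitigation is to put a hard time limit on each generated program, so a single slow or non-terminating program cannot stall the whole run. This is only a sketch, not the repository's actual implementation: it assumes generated programs are executed with exec() and store their result in a variable named ans, and it reuses the func_timeout package that utilities.py already imports.

from func_timeout import func_timeout, FunctionTimedOut

def run_program(program: str, timeout_s: float = 5.0):
    # Execute a generated program with a hard time limit.
    env = {}
    try:
        func_timeout(timeout_s, exec, args=(program, env))
        return env.get("ans")  # assumed convention: programs assign the answer to "ans"
    except FunctionTimedOut:
        return None  # count the example as failed instead of waiting indefinitely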
Hi, thanks for your awesome work.
I use GPT-3.5 Turbo as my model. When I run run.py in run_tabmwp, the program_generator generates a program for the first question in TabMWP.
Question description:
Hannah baked cookies each day for a bake sale. How many more cookies did Hannah bake on Saturday than on Sunday?
Your sample program:
cookies_baked = {"Friday": 163, "Saturday": 281, "Sunday": 263}
ans = cookies_baked["Saturday"] - cookies_baked["Sunday"]
I'd like to ask what could be causing my generated programs to look so unreasonable, even though I run with exactly the same parameters as you did. This happens frequently on other TabMWP data as well; essentially all of the programs I generate are unreasonable, resulting in an extremely low average accuracy. The parameters I used are:
python run.py \
--model chameleon \
--label chameleon_chatgpt \
--test_split test \
--policy_engine gpt-3.5-turbo \
--rl_engine gpt-3.5-turbo \
--cl_engine gpt-3.5-turbo \
--tv_engine gpt-3.5-turbo \
--kr_engine gpt-3.5-turbo \
--sg_engine gpt-3.5-turbo \
--pg_engine gpt-3.5-turbo \
--test_number 1000 \
--rl_cell_threshold 18 \
--cl_cell_threshold 18
Hi, thanks for your great work. I want to ask whether this method could be applied to open-ended VQA tasks, where a free-form answer is needed instead of choosing from a given list of options.
The update_modules function in run_scienceqa/model.py contains default_modules = eval(["solution_generator", "answer_generator"]), which calls eval() on a list. Since eval() only accepts a string (or code object), this raises a TypeError.
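A likely fix, assuming the intent was simply to use the list as-is, is to drop the eval():

# The default is already a list literal; eval() only accepts strings or code objects.
default_modules = ["solution_generator", "answer_generator"]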
Hi authors,
Why is use_caption disabled by default? And did you use the captions for the results reported in the paper?
Thanks a lot!
Hi,
I am writing to request an update on the release of the code for the Image Captioner and Text Detector modules, as promised in the project's README:
"For the current version, the results for the Image Captioner and Text Detector are off-the-shelf and stored in data/scienceqa/captions.json and data/scienceqa/ocrs.json, respectively. The live calling of these two modules is coming soon!"
As an eager user of the project, I'm excited to explore and utilize these modules' functionality, so I'd like to ask about the current status of the code release.