Comments (12)
Same trend on tasks 2000-3000 (each cell is avg reward / success rate):
| Method | Tasks 2000-2100 | Tasks 2000-3000 |
|---|---|---|
| ReAct | 0.5345 / 0.28 | 0.5735 / 0.328 |
| Act | 0.674 / 0.38 | 0.67 / 0.352 |
from react.
Hi, can you show your code and an example trajectory?
I'm using this notebook with the Azure API, so I changed the llm function to call GPT-3.5.
The final result:
Can you show some trajectories?
Hi, I'm trying to run ReAct with GPT-3.5-Turbo on the HotpotQA dataset using the provided Jupyter notebook, but I only get 0.182 accuracy. Is that a reasonable result? It is much lower than the result shown in the paper.
Hi, I got similar results. I think the low score comes from the size of GPT-3.5-Turbo and the alignment tax. :-)
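For anyone comparing numbers in this thread, it may help to confirm how the accuracy is computed. The sketch below assumes the 0.182 figure is an exact-match score with the usual SQuAD-style answer normalization (lowercase, strip punctuation, drop articles, collapse whitespace). The function names are illustrative and not taken from the notebook.

```python
import re
import string


def normalize_answer(text):
    """Lowercase, strip punctuation, drop English articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """True when the normalized prediction equals the normalized gold answer."""
    return normalize_answer(prediction) == normalize_answer(gold)


def accuracy(predictions, golds):
    """Fraction of questions answered with an exact match."""
    if len(predictions) != len(golds) or not golds:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(exact_match(p, g) for p, g in zip(predictions, golds))
    return hits / len(golds)
```

If a chat model tends to answer in full sentences rather than short spans, a strict exact-match score like this can drop even when the answer is correct, so it is worth checking a few trajectories by hand.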
In fact, ReAct now scores lower than simply letting GPT-3.5 reason directly. Why does this happen?
Can you show some trajectories? Also, try the original text-davinci-002 and see whether its scores become lower too.
It looks like we observed the same phenomenon on at least a subset of tasks in the WebShop benchmark.
We ran ReAct and Act using the official code on WebShop tasks 2000-2100 with gpt-3.5-turbo-instruct.
The results are:
- ReAct: 0.5345 avg reward, 0.28 success rate
- Act: 0.674 avg reward, 0.38 success rate
You can find the raw trajectories here
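The two numbers reported per method above can be recomputed from the per-task rewards in the trajectories. This is a minimal sketch, assuming each task yields one reward in [0, 1] and that a task counts as a success only when its reward is exactly 1 (the usual WebShop convention). The function name is hypothetical.

```python
def summarize_rewards(rewards):
    """Return (average reward, success rate) for a list of per-task rewards."""
    if not rewards:
        raise ValueError("rewards must not be empty")
    n = len(rewards)
    avg_reward = sum(rewards) / n
    # Assumption: success means a perfect score on the task.
    success_rate = sum(1 for r in rewards if r == 1) / n
    return avg_reward, success_rate
```

For example, `summarize_rewards([1, 0.5, 0, 1])` gives an average reward of 0.625 and a success rate of 0.5. With only 100 tasks, a single task moves the success rate by 0.01, so the larger 2000-3000 range is a steadier basis for comparison.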
Here is the running log of ReAct with GPT-4; it still gets a lower result (GPT-4 gets 0.33).
https://github.com/Luoyang144/share/blob/main/gpt4_hotpot_react.log
Interesting. Is it only on HotpotQA, or on more tasks? Also, maybe check whether the text-davinci-002 result is reproducible?
https://github.com/Luoyang144/share/blob/main/gpt4_hotpot_react.log cannot be opened.
text-davinci-002 is no longer available.
This link should be accessible now: https://github.com/Luoyang144/share/blob/main/gpt4_hotpot_react.log
My hypothesis is that models released after text-davinci-002 may have been tuned on trajectories similar to Act. In addition, domains like QA have intuitive tools, and tasks like HotpotQA have intuitive reasoning patterns. On more out-of-distribution domains and tasks (e.g., WebShop or AlfWorld), reasoning should still improve decision-making generalization and transparency.
Closing this for now, but let me know if there are more findings or analysis on this.