
few-shot-nl2sql-with-prompting's Introduction

Few-shot-NL2SQL-with-prompting

Table of contents

dataset

To reproduce the results reported in the paper, please download the Spider dataset from the link below and create a data directory containing the tables.json and dev.json files.

Spider dataset: https://drive.google.com/uc?export=download&id=1TqleXec_OykOYFREKKtschzY29dUcVAQ
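
A minimal download sketch in Python, for convenience: it assumes the gdown package (not listed in requirements.txt) and that the link resolves to the usual Spider zip archive with tables.json and dev.json inside a top-level spider/ folder; adjust the file names and paths if your archive differs.

import pathlib
import shutil
import zipfile

import gdown  # assumption: extra dependency, install with `pip3 install gdown`

URL = "https://drive.google.com/uc?export=download&id=1TqleXec_OykOYFREKKtschzY29dUcVAQ"

# Download the archive from Google Drive (the output file name is an assumption).
gdown.download(URL, "spider.zip", quiet=False)

# Extract the archive and copy the two files the README asks for into ./data/.
with zipfile.ZipFile("spider.zip") as zf:
    zf.extractall("spider_raw")

data_dir = pathlib.Path("data")
data_dir.mkdir(exist_ok=True)
for name in ("tables.json", "dev.json"):
    # assumption: the archive unpacks to a top-level "spider/" folder
    shutil.copy(pathlib.Path("spider_raw") / "spider" / name, data_dir / name)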

setup

To run this project, use the following commands:

$ pip3 install -r requirements.txt
$ echo "Start running DIN-SQL.py"
$ python3 DIN-SQL.py --dataset ./data/ --output predicted_sql.txt
$ echo "Finished running DIN-SQL.py"

citation

@article{pourreza2023din,
  title={DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction},
  author={Pourreza, Mohammadreza and Rafiei, Davood},
  journal={arXiv preprint arXiv:2304.11015},
  year={2023}
}

few-shot-nl2sql-with-prompting's People

Contributors

mohammadrezapourreza


few-shot-nl2sql-with-prompting's Issues

Running DIN-SQL on a new dataset.

Hey,

Great paper.

Which files need to be created, and in which directories, to run the pipeline on a new dataset?

Your help is much appreciated.

Thanks,
Saud
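
For reference, the README's data/ directory follows the Spider layout, so a new dataset presumably needs at least a tables.json (schema description) and a dev.json (questions) in Spider format. The sketch below is a hypothetical minimal example of those two files; DIN-SQL.py may read only a subset of the standard Spider fields, so check the script for exactly which keys it uses.

import json
import pathlib

# Hypothetical single-table schema in Spider's tables.json format.
tables = [{
    "db_id": "my_db",
    "table_names_original": ["users"],
    "table_names": ["users"],
    "column_names_original": [[-1, "*"], [0, "id"], [0, "name"]],
    "column_names": [[-1, "*"], [0, "id"], [0, "name"]],
    "column_types": ["text", "number", "text"],
    "primary_keys": [1],
    "foreign_keys": [],
}]

# Hypothetical questions in Spider's dev.json format. The gold "query" may be
# optional for prediction, but keep it if you also want to run evaluation.
dev = [{
    "db_id": "my_db",
    "question": "How many users are there?",
    "query": "SELECT count(*) FROM users",
}]

data_dir = pathlib.Path("data")
data_dir.mkdir(exist_ok=True)
with open(data_dir / "tables.json", "w") as f:
    json.dump(tables, f, indent=2)
with open(data_dir / "dev.json", "w") as f:
    json.dump(dev, f, indent=2)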

Database schema

Hello, I would like to ask about the text-to-SQL experiments:
is each natural language question tied to a specific database schema (i.e., the question is asked with that schema in mind),
or is the question independent of the schema, so that a matching schema has to be retrieved first and then placed into the corresponding prompt?

How to run DIN-SQL with GPT-3.5/Davinci

Hi Mohammadreza,

I wanted to run the script with GPT-3.5 as I don't have access to GPT-4 (on the waitlist). After changing the model variable in GPT4_generation to gpt-3.5-turbo and running the script, I'm getting the following error:

openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 4504 tokens. Please reduce the length of the messages.

My understanding is that we need to reduce the prompt size, since GPT-3.5 doesn't support the same number of tokens as GPT-4 (8192). The interesting thing is that Davinci has roughly the same context size as GPT-3.5 (about 4096 tokens), so the question boils down to figuring out how to run the script with Davinci, which was one of the LLMs used to test DIN-SQL in the paper.

Thanks for your help.
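
Not an answer from the repo, but a generic way to diagnose this is to count each module prompt's tokens with tiktoken and see how far over the 4K limit it goes; the usual fixes are dropping some few-shot demonstrations from the longest prompts or switching to a longer-context model such as gpt-3.5-turbo-16k. A rough sketch (tiktoken is an extra dependency):

import tiktoken  # assumption: install with `pip3 install tiktoken`

def count_tokens(prompt: str, model: str = "gpt-3.5-turbo") -> int:
    """Rough token count for a prompt (ignores the small per-message chat overhead)."""
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(prompt))

# Hypothetical usage: `prompt` would be one of the module prompts built in DIN-SQL.py.
prompt = "### few-shot schema-linking prompt goes here ..."
n = count_tokens(prompt)
limit = 4097  # gpt-3.5-turbo / text-davinci-003 context window
if n > limit:
    print(f"Prompt is {n - limit} tokens over the limit; drop some demonstrations "
          f"or use a longer-context model such as gpt-3.5-turbo-16k.")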

About the Exec Acc in your paper

I see that Liu et al. report an execution accuracy of 70.1 in (Liu et al., 2023a), but your paper reports 60.1. Is this a mistake? Did you use the same evaluation code for execution accuracy?

Codex

Your work is very helpful. Could you upload the dev-set results from testing the model with Codex?

Release BIRD Dev predicted sqls

Hi, I am interested in your promising work and am working on benchmarking NL2SQL systems.

Could you release the predicted SQL queries (as a file) for the BIRD dev set?

zero-shot and few-shot prompt

Thanks for sharing your DIN-SQL prompt!
I'm also curious about your zero-shot and few-shot prompts; if you don't mind, could you share those as well?
Thanks!

License?

Thank you for your project. However, I'm unable to find its license — would it be possible for you to add one so the terms of use are clear? https://choosealicense.com/ might be a good way for you to find a license that works for your project. Thanks in advance.

How to pick the examples for demonstrations?

Thank you for the research and the code.
I'm trying this method on another dataset and want to write similar demonstrations for all the prompts.
I would like to know: when you wrote the demonstrations, did you pick examples from the dataset randomly, or with some strategy?

How long is your predict script's runtime?

The Spider Submission Guideline says "Please restrict your predict script's runtime to under two hours and up to two models per submission."
I have tried your code, and it takes about 1 hour to predict 100+ SQL queries, so predicting the whole dev set would take well over 2 hours.
I wonder how you handle this, or whether Spider's 2-hour restriction matters.
Thanks a lot!
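
Not from the repo, but one common way to fit such a time budget is to issue the per-question calls concurrently, subject to OpenAI rate limits. A rough sketch, where generate_sql is a hypothetical wrapper around the pipeline's four prompting steps:

import json
from concurrent.futures import ThreadPoolExecutor

def generate_sql(question: dict) -> str:
    # Hypothetical wrapper: run the DIN-SQL prompting steps for one question
    # and return the final SQL string; replace with the repo's actual logic.
    return "SELECT 1"

# Assumes the Spider-format dev file from the README's data/ directory.
with open("data/dev.json") as f:
    questions = json.load(f)

# A few workers is usually enough; too many will just trip OpenAI rate limits.
with ThreadPoolExecutor(max_workers=4) as pool:
    predictions = list(pool.map(generate_sql, questions))

with open("predicted_sql.txt", "w") as f:
    f.write("\n".join(predictions))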

Question about the input size

Hi, interesting paper and great work! I tried to run the code but ran into an input-length problem: when I run the schema-linking prompt to get the schema links, the entire prompt is too long to feed to the model. I'm wondering whether I'm running the code incorrectly. By the way, I hit this issue with both GPT-3.5-turbo and Vicuna-13B.

Cost Estimates

Hey, great paper. I'm curious whether you have done cost estimates for running a query. How does the cost vary between easy, non-nested, and nested queries? My assumption is that answering one question hits the LLM 4 times; is that correct?
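
For a back-of-the-envelope estimate (not from the paper): the pipeline has four modules (schema linking, classification and decomposition, SQL generation, self-correction), so one call per module does match the assumption of four LLM hits per question, and the cost is roughly the summed prompt and completion tokens of those calls times the model's per-token prices. A hypothetical sketch with placeholder prices and token counts; substitute OpenAI's current pricing and your measured prompt sizes:

# Placeholder per-1K-token prices in USD; check OpenAI's current price list.
PRICE_PROMPT_PER_1K = 0.03
PRICE_COMPLETION_PER_1K = 0.06

# Hypothetical average (prompt tokens, completion tokens) per module call.
calls = {
    "schema_linking":  (3000, 200),
    "classification":  (2500, 100),
    "sql_generation":  (3500, 150),
    "self_correction": (1500, 150),
}

cost_per_question = sum(
    p / 1000 * PRICE_PROMPT_PER_1K + c / 1000 * PRICE_COMPLETION_PER_1K
    for p, c in calls.values()
)
# Spider dev has 1034 questions.
print(f"~${cost_per_question:.3f} per question, ~${cost_per_question * 1034:.0f} for the full dev set")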

Cost of each experiment

Hi,
Thank you for providing your code.
It would be helpful if you could include, in the paper, the OpenAI API cost associated with each of your experiments.
That way, others can get a rough estimate of the costs involved.

Thanks

accuracy only 0.659 when using gpt-3.5

Since GPT-4 is expensive and Codex is deprecated, I used gpt-3.5 to test this method and got the following scores:

                        easy     medium   hard     extra    all
count                   248      446      174      166      1034
=====================  EXECUTION ACCURACY  =====================
execution               0.742    0.720    0.546    0.488    0.659

This is much worse than GPT-4 or Codex. Do you have any ideas for improving it?

No codex

Is this project implemented only by invoking GPT-4?

I did not find any implementation using Codex in this project; is that part not open source?
