ryanzhumich / editsql
License: MIT License
After running any test model, I get this error in valid_use_predicted_queries_predictions.json.eval instead of the answer. Can I get some help?
usage: evaluation_sqa.py [-h] [--gold GOLD] [--pred PRED] [--db DB]
[--table TABLE] [--etype ETYPE]
evaluation_sqa.py: error: unrecognized arguments: Drive/editsql/output_temp.txt
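In this trace the leftover token is Drive/editsql/output_temp.txt, which suggests an unquoted Google Drive path containing a space (e.g. "My Drive/..."): the shell splits it into two tokens, and argparse rejects the second. A small sketch of the failure mode (the paths are hypothetical):

```python
import argparse

parser = argparse.ArgumentParser(prog="evaluation_sqa.py")
parser.add_argument("--pred")

# Unquoted path with a space: the shell delivers it as two tokens, so
# "Drive/editsql/output_temp.txt" is left over as an unrecognized argument.
unquoted = ["--pred", "My", "Drive/editsql/output_temp.txt"]
args, extra = parser.parse_known_args(unquoted)
print(extra)  # ['Drive/editsql/output_temp.txt']

# Quoting the path on the command line keeps it as one token and parses cleanly.
quoted = ["--pred", "My Drive/editsql/output_temp.txt"]
args = parser.parse_args(quoted)
print(args.pred)  # 'My Drive/editsql/output_temp.txt'
```

If that is the cause, wrapping every Drive path in the shell script in double quotes should make the error go away.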
I am trying to run this model using the test_spider_editsql.sh
command, and I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'processed_data_spider_removefrom/train.pkl'
The test scripts look for pickle files that are not present in the repo. Could you please look into this?
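Those .pkl files are preprocessing caches that a preprocessing step is expected to generate before training or testing; they are not shipped in the repo. The general load-or-build pattern looks like this (paths and builder are illustrative, not the repo's actual code):

```python
import os
import pickle

def load_or_build(cache_path, build_fn):
    """Load a preprocessed dataset from a pickle cache, building it on first use."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = build_fn()  # e.g. tokenize and index the raw JSON files
    os.makedirs(os.path.dirname(cache_path) or ".", exist_ok=True)
    with open(cache_path, "wb") as f:
        pickle.dump(data, f)
    return data

# First call builds and writes the cache; later calls just load it.
examples = load_or_build("processed_data_demo/train.pkl",
                         lambda: [{"utterance": "how many dorms?"}])
```

So the fix is usually to run the repo's preprocessing step first, so that the expected processed_data_* directory is populated before the test scripts look for it.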
We are concerned that hard SQL, such as queries with table JOINs, may be the limit for industrial application.
Thank you very much.
If editsql predicts the values in SQL conditions, please point out where this happens in the code.
If not, how should we implement it?
I ran run_cosql_cdseq2seq.sh following the README, but I don't get a log file like logs/logs_cosql_editsql/log.txt. What should I change?
While running the run_sparc_editsql.sh script on my added data, I am facing the following issue:
read_data_json data/sparc/train_no_value.json 3034
read_data_json data/sparc/dev_no_value.json 423
continue
OOV! Doc_name
continue
and I am not able to infer a SQL query directly from the question.
Any suggestions? Thanks.
Line 186 in a381be8
I'm trying to run run_cosql_cdseq2seq.sh, but there is no glove.840B.300d.txt file in this git project. How can I get this file?
Hi,
I am trying to run run_sparc_editsql.sh and test_sparc_editsql.sh. I have already installed all the requirements as suggested by the GitHub README and by the following page: https://towardsdatascience.com/natural-language-to-sql-use-it-on-your-own-database-d4cd5784d081. I am using Colab in order to work with a GPU.
I downloaded the glove.840B.300d.txt file and changed the path in run_sparc_editsql.sh and in test_sparc_editsql.sh.
When I run bash run_sparc_editsql.sh, I initially get the following error: No such file or directory: 'logs_sparc_editsql/valid_use_predicted_queries_predictions.json', so I copied the given file from logs/logs_sparc_editsql into the generated logs_sparc_editsql folder. Then I ran run_sparc_editsql.sh again, and this time I get the following message:
File "/content/gdrive/My path/editsql/model/model.py", line 91, in
embedding = np.array([float(val) for val in l_split[-embedding_size:]])
ValueError: could not convert string to float: 'squeeze2'
Moreover, if I then try to run test_sparc_editsql.sh, I get the following error:
RuntimeError: cuda runtime error (38) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:51
(But I checked, and CUDA seems to be available.)
What can I do to solve these problems and successfully run the code?
Thank you in advance.
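The 'squeeze2' failure happens while parsing the GloVe file: a word token ends up where 300 floats are expected, which usually indicates a truncated or corrupted download. A sketch that scans the lines for this before training (check_glove_lines is a hypothetical helper, not part of the repo):

```python
def check_glove_lines(lines, embedding_size=300):
    """Return (line_number, token) pairs for lines whose tail
    is not `embedding_size` parseable floats."""
    bad = []
    for i, line in enumerate(lines, 1):
        parts = line.rstrip("\n").split(" ")
        try:
            [float(v) for v in parts[-embedding_size:]]
        except ValueError:
            bad.append((i, parts[0]))
    return bad

# Usage on the real file:
#   with open("glove.840B.300d.txt", encoding="utf-8") as f:
#       print(check_glove_lines(f))   # should print [] for an intact download
print(check_glove_lines(["hello 0.1 0.2 0.3", "broken 0.1 squeeze2 0.3"],
                        embedding_size=3))  # [(2, 'broken')]
```

If the scan reports malformed lines, re-downloading the embeddings (the full file is about 5 GB unzipped) is the usual fix.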
Hi, I was wondering if you support beam search decoding. I tried to check the code for that but could not find any relevant parts. If you support it, would you please point me to the relevant part in the code?
Thanks,
Excuse me, I find that during model training, the remove-from corpus is used,
but I wonder whether this loses corpus information and causes errors.
For example, the first utterance of interaction 35 is:
"utterance": "how many dorms have a TV Lounge ?",
and the original sql is
"select count ( * ) from dorm as t1 join has_amenity as t2 on t1 . dormid = t2 . dormid join dorm_amenity as t3 on t2 . amenid = t3 . amenid where t3 . amenity_name = value"
after removefrom, it becomes:
"select count ( * ) where dorm_amenity.amenity_name = value"
It seems that the removefrom sequence loses a lot of information and is not equal to the original SQL sequence.
So my question is whether the original SQL and the removefrom SQL are equivalent (I mean, whether they can be converted one-to-one). And has anyone (a paper or article) done this the same way before?
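Whether the two forms are one-to-one convertible depends on the FROM clause being recoverable from the mentioned columns plus the schema's foreign keys; that is the idea behind reconstructing the joins at postprocessing time. A toy illustration of that reconstruction (the schema and helper are hypothetical, not the repo's actual algorithm):

```python
def infer_from_clause(sql_no_from, foreign_keys):
    """Rebuild a FROM clause from the tables mentioned as `table.column` tokens.

    `foreign_keys` maps a frozenset of two table names to the join condition
    linking them. Toy illustration of the 'removefrom' postprocessing idea.
    """
    tables = sorted({tok.split(".")[0] for tok in sql_no_from.split() if "." in tok})
    clause, prev = ["from", tables[0]], tables[0]
    for t in tables[1:]:
        clause += ["join", t, "on", foreign_keys[frozenset((prev, t))]]
        prev = t
    return " ".join(clause)

fks = {frozenset(("dorm", "has_amenity")): "dorm.dormid = has_amenity.dormid"}
print(infer_from_clause("select dorm.dorm_name where has_amenity.amenid = 1", fks))
# from dorm join has_amenity on dorm.dormid = has_amenity.dormid
```

Note that when a bridge table (like has_amenity in your example) is never mentioned in the removefrom SQL, the join path has to be searched for in the foreign-key graph, which is exactly where information loss or ambiguity can creep in.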
Hi,
Thanks for the code.
I ran the test_sparc_editsql.sh script, now I see
valid gold-passing STRING_ACCURACY: 53.78
in log.txt
Is this the same as EditSQL (use gold query) | 53.4?
Also, can you please tell me where the interaction match accuracy is calculated?
Thanks a lot
Hi! I just read the paper of your EditSQL. Awesome work! By the way, when will the code of EditSQL be released?
Thank you for open sourcing your baselines and the awesome dataset :) I have a question about the performance of CD-seq2seq.
In the paper and on the leaderboard, the result is as follows:
But in the README.md file, I find the result higher (almost 5% better on the dev question match) than the reported one.
So I want to know which one is correct. Or does the latest result in the README follow the setting in the paper? Thanks for your reply.
What type of license do you have for this code? Is it Apache 2.0?
Hello,
I find that all the values in the input questions are replaced by 1 in the SQL query.
For example, if my input is (these examples are just illustrative, not real):
"Get the top 10 samples in column_x"
This top-10 query is translated to "SELECT column_x FROM table ORDER BY column_x DESC LIMIT 1"
So it replaced "top 10" with "top 1".
Also in conditions:
"List a column X where column Y < 0.1"
Is translated to
"SELECT column_x FROM table WHERE column_y < 1"
Again, 0.1 is replaced by 1.
Do you have a mechanism to fix this?
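A common post-hoc workaround for the anonymized values is to copy literals from the question back into the predicted SQL. A rough heuristic sketch (fill_values is hypothetical and assumes placeholders match question numbers left to right):

```python
import re

def fill_values(question, predicted_sql, placeholder="1"):
    """Replace placeholder literals in the predicted SQL with numbers
    copied from the question, left to right. Purely heuristic."""
    numbers = re.findall(r"\d+(?:\.\d+)?", question)
    out, i = [], 0
    for tok in predicted_sql.split():
        if tok == placeholder and i < len(numbers):
            out.append(numbers[i])
            i += 1
        else:
            out.append(tok)
    return " ".join(out)

print(fill_values("List column X where column Y < 0.1",
                  "SELECT column_x FROM table WHERE column_y < 1"))
# SELECT column_x FROM table WHERE column_y < 0.1
```

A real implementation would also need to handle string literals and cases where the number 1 appears legitimately in the SQL, so treat this only as a starting point.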
I was able to train/test on the given Spider and "spider_data_removefrom" sets, but I want to test the model on my own question-query pairs. I tried editing the queries in dev.json, but the predictions remain the same as for the queries in the original dev.json. Can you help me out?
Hi,
Can you please give some instructions on how to retrain?
Thanks
Hi, I am getting this error when I try to run run_sparc_editsql.sh
FileNotFoundError: [Errno 2] No such file or directory: 'logs_sparc_editsql/valid_use_predicted_queries_predictions.json'
Also a warning prior to that:
Warning: arguments already exist in logs_sparc_editsql/args.log
I am following your steps in Colab. Everything is installed and placed in the mentioned folders.
When I ran the command "git lfs clone https://github.com/ryanzhumich/editsql", there was an error:
And the trained model logs/logs_spider_editsql/save_12 is a txt file:
What is the reason?
Can anyone help me? Thanks a lot!
Given the name, I was wondering whether the current SOTA on Spider (RYANSQL + BERT, 12 November) is your work? And if so, was it based on the editsql model?
Many thanks for satisfying my curiosity :-)
Hi,
I was wondering whether you could provide guidance on repeating your experiment.
I cannot generate the final output to evaluate the results.
It seems the "eval_step" function in the model has been deleted.
So how can we generate the final SQL output by inputting dev.json?
I can only get 23.60% valid gold-passing STRING_ACCURACY without BERT.
Maybe I made some mistakes, or the number may improve under the Spider exact match evaluation method.
So I was wondering what I should do to get the same results as in your paper.
Thank you.
I want to reproduce the SParC results on my server. I only downloaded the code and ran run.sh following the README. The model 'saved_42' is chosen to get the final result.
But the final result is only 46.0, which is lower than the results in the paper. Although I repeated it many times, I get similar results like 45.8 or 45.5.
Maybe the experimental environments are different? Or do you use different parameters, like the learning rate or something?
I hope you can give me some suggestions.
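For what it's worth, run-to-run gaps of a point or two are often just seed variance; pinning every RNG seed makes runs comparable. A minimal stdlib sketch (the PyTorch-specific calls are noted only in comments, since they are framework-dependent):

```python
import os
import random

def set_seed(seed=42):
    """Seed Python's RNGs for repeatable runs.
    With PyTorch you would additionally call torch.manual_seed(seed),
    torch.cuda.manual_seed_all(seed), and set
    torch.backends.cudnn.deterministic = True."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)

set_seed(42)
a = [random.random() for _ in range(3)]
set_seed(42)
b = [random.random() for _ in range(3)]
print(a == b)  # True: identical sequences after re-seeding
```

Even with seeds pinned, some CUDA kernels are nondeterministic, so small residual variance across machines is expected.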
I am retraining the model using run_spider_editsql.sh on the Spider dataset.
These are the logs created
Original number of training utterances: 8642
Actual number of used training examples: 8642
(Shortened by output limit of 200)
Number of steps per epoch: 8642
Batch size: 1
Epoch: 0
train epoch loss: 0.044776277615753715
train final gold-passing LOSS: 29.60
train final gold-passing TOKEN_ACCURACY: 92.11
train final gold-passing STRING_ACCURACY: 57.00
valid gold-passing LOSS: 322.41
valid gold-passing TOKEN_ACCURACY: 50.00
valid gold-passing STRING_ACCURACY: 0.00
countdown: 9
Epoch: 1
train epoch loss: 0.01979759320021786
train final gold-passing LOSS: 19.82
train final gold-passing TOKEN_ACCURACY: 94.59
train final gold-passing STRING_ACCURACY: 64.00
valid gold-passing LOSS: 534.82
valid gold-passing TOKEN_ACCURACY: 50.00
valid gold-passing STRING_ACCURACY: 0.00
learning rate coefficient: 0.8
countdown: 8
Epoch: 2
train epoch loss: 0.012703387576976938
train final gold-passing LOSS: 13.85
train final gold-passing TOKEN_ACCURACY: 96.08
train final gold-passing STRING_ACCURACY: 74.00
valid gold-passing LOSS: 363.45
valid gold-passing TOKEN_ACCURACY: 50.00
valid gold-passing STRING_ACCURACY: 0.00
countdown: 7
........
It ran for 10 epochs, but the valid gold-passing TOKEN_ACCURACY and valid gold-passing STRING_ACCURACY remain constant at 50.00 and 0.00 respectively.
Right now, when I put just the question text, without the "sql", "query", "query_toks" & "query_toks_no_value" keys, into dev.json, I get an error when running run_spider.sh and test_spider.sh.
I want to understand how I can make predictions (in an inference environment) on my own sample questions.
Can someone please help me out here?
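Since the loader expects the gold-query fields to exist, one workaround for pure inference is to give each of your questions dummy gold fields (e.g. "select 1"), run the normal test script, and ignore the reported accuracies. A sketch of building such a record; the field names follow the public Spider JSON format, but treat the exact required set as an assumption and mirror a real dev.json record:

```python
import json

def make_dev_entry(question, db_id, dummy_query="select 1"):
    """Build a Spider-style dev.json record with placeholder gold fields,
    so the data loader runs even though no real gold SQL exists."""
    toks = dummy_query.split()
    return {
        "db_id": db_id,
        "question": question,
        "question_toks": question.split(),
        "query": dummy_query,
        "query_toks": toks,
        "query_toks_no_value": toks,
    }

entries = [make_dev_entry("How many dorms have a TV lounge?", "dorm_1")]
print(json.dumps(entries, indent=2))  # this list is what gets written as dev.json
```

The reported metrics will of course be meaningless against the dummy gold; only the predicted SQL in the output file matters.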
For example, I only have data in a simple question/query/table format.
How should it be preprocessed in order to train my own model?
The repo I downloaded works; however, it is too complicated for me to figure out how to process my own data to train a model and predict on my test data.
Thank you.
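For the interaction-style datasets (SParC/CoSQL), each training example is an interaction holding a list of utterances, so a flat question/query pair becomes a one-turn interaction. A rough sketch of that conversion; the field names here are guesses modeled on the released JSON, so verify them against data/sparc/train.json before use:

```python
def to_interaction(pairs, database_id):
    """Wrap flat (question, sql) pairs as single-turn interactions.
    Field names are assumptions; check them against the released data."""
    return [
        {
            "database_id": database_id,
            "interaction": [{"utterance": q, "sql": s}],
            "final": {"utterance": q, "sql": s},
        }
        for q, s in pairs
    ]

data = to_interaction([("how many dorms?", "select count ( * ) from dorm")], "dorm_1")
```

Your table definitions would still need to be written into the tables.json schema file for the database_id to resolve.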
Hello! I'm now trying to run this code on the CoSQL dataset but have met some problems.
I use the default parameters, and my 16 GB GPU runs out of memory. I have tried some adjustments, but they didn't work. For example, batch_size is set to 16 by default, and I still run out of memory even after changing it to 1.
(After briefly reading the code, I found that this parameter is overwritten when interaction_level is set; the effective batch size is always 1, so changing batch_size has no effect.)
Could you please share anything you know, like how to reduce the GPU memory requirements, or point out any mistakes I have made?
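Because batching happens per interaction, memory grows with the number of turns being encoded rather than with batch_size. If you are retraining, one mitigation is to cap how many previous turns are fed to the encoder; a toy sketch (truncate_history is a hypothetical helper, not a repo function):

```python
def truncate_history(interaction, max_turns=3):
    """Keep only the `max_turns` most recent utterances of an interaction,
    bounding the encoder input (and thus activation memory) per step."""
    return interaction[-max_turns:]

turns = ["u1", "u2", "u3", "u4", "u5"]
print(truncate_history(turns, max_turns=3))  # ['u3', 'u4', 'u5']
```

This trades some context for memory; whether the accuracy cost is acceptable depends on how much the dataset relies on long histories.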
Hi,
I notice that in the training script for Spider, interaction_level was set to true.
Should I run Spider (which actually has no interactions) with this setting?
Since the current setting is:
a) batch_size was set to 1 in run.py (L48)
b) interaction_level is true
Is batch training available at the interaction level using schema_interaction_model?
Thanks in advance!