ryanzhumich / editsql
License: MIT License
After running any test model, I get this error in valid_use_predicted_queries_predictions.json.eval instead of the answer. Can I get some help?
usage: evaluation_sqa.py [-h] [--gold GOLD] [--pred PRED] [--db DB]
[--table TABLE] [--etype ETYPE]
evaluation_sqa.py: error: unrecognized arguments: Drive/editsql/output_temp.txt
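In this trace the leftover token is Drive/editsql/output_temp.txt, which suggests an unquoted Google Drive path containing a space (e.g. "My Drive/..."): the shell splits it into two tokens, and argparse rejects the second. A small sketch of the failure mode (the paths are hypothetical):

```python
import argparse

parser = argparse.ArgumentParser(prog="evaluation_sqa.py")
parser.add_argument("--pred")

# Unquoted path with a space: the shell delivers it as two tokens, so
# "Drive/editsql/output_temp.txt" is left over as an unrecognized argument.
unquoted = ["--pred", "My", "Drive/editsql/output_temp.txt"]
args, extra = parser.parse_known_args(unquoted)
print(extra)  # ['Drive/editsql/output_temp.txt']

# Quoting the path on the command line keeps it as one token and parses cleanly.
quoted = ["--pred", "My Drive/editsql/output_temp.txt"]
args = parser.parse_args(quoted)
print(args.pred)  # 'My Drive/editsql/output_temp.txt'
```

If that is the cause, wrapping every Drive path in the shell script in double quotes should make the error go away.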
I am trying to run this model using the test_spider_editsql.sh
command, and I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'processed_data_spider_removefrom/train.pkl'
The test scripts look for pickle files that are not present in the repo. Could you please look into this?
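Those .pkl files are preprocessing caches that a preprocessing step is expected to generate before training or testing; they are not shipped in the repo. The general load-or-build pattern looks like this (paths and builder are illustrative, not the repo's actual code):

```python
import os
import pickle

def load_or_build(cache_path, build_fn):
    """Load a preprocessed dataset from a pickle cache, building it on first use."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = build_fn()  # e.g. tokenize and index the raw JSON files
    os.makedirs(os.path.dirname(cache_path) or ".", exist_ok=True)
    with open(cache_path, "wb") as f:
        pickle.dump(data, f)
    return data

# First call builds and writes the cache; later calls just load it.
examples = load_or_build("processed_data_demo/train.pkl",
                         lambda: [{"utterance": "how many dorms?"}])
```

So the fix is usually to run the repo's preprocessing step first, so that the expected processed_data_* directory is populated before the test scripts look for it.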
We are concerned that hard SQL, such as queries with table JOINs, may be the limit for industrial application.
Thank you very much.
If editsql predicts the values in SQL conditions, please point out where this happens in the code.
If not, how should we implement it?
I ran run_cosql_cdseq2seq.sh following the README, but I don't get a log file like logs/logs_cosql_editsql/log.txt. What should I change?
While running the run_sparc_editsql.sh script on my added data, I am facing the following issue:
read_data_json data/sparc/train_no_value.json 3034
read_data_json data/sparc/dev_no_value.json 423
continue
OOV! Doc_name
continue
and I am not able to infer a SQL query directly from the question.
Any suggestions? Thanks.
Line 186 in a381be8
I'm trying to run run_cosql_cdseq2seq.sh, but there is no glove.840B.300d.txt file in this git project. How can I get this file?
Hi,
I am trying to run run_sparc_editsql.sh and test_sparc_editsql.sh. I have already installed all the requirements as suggested by the GitHub README and by the following page: https://towardsdatascience.com/natural-language-to-sql-use-it-on-your-own-database-d4cd5784d081. I am using Colab in order to work with a GPU.
I downloaded the glove.840B.300d.txt file and changed the path in run_sparc_editsql.sh and in test_sparc_editsql.sh.
When I run bash run_sparc_editsql.sh, I initially get the following error: No such file or directory: 'logs_sparc_editsql/valid_use_predicted_queries_predictions.json', so I copied the given file from logs/logs_sparc_editsql into the generated logs_sparc_editsql folder. Then I ran run_sparc_editsql.sh again, and this time I get the following message:
File "/content/gdrive/My path/editsql/model/model.py", line 91, in
embedding = np.array([float(val) for val in l_split[-embedding_size:]])
ValueError: could not convert string to float: 'squeeze2'
Moreover, if I then try to run test_sparc_editsql.sh, I get the following error:
RuntimeError: cuda runtime error (38) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:51
(But I checked, and CUDA seems to be available.)
What can I do to solve these problems and successfully run the code?
Thank you in advance.
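The 'squeeze2' failure happens while parsing the GloVe file: a word token ends up where 300 floats are expected, which usually indicates a truncated or corrupted download. A sketch that scans the lines for this before training (check_glove_lines is a hypothetical helper, not part of the repo):

```python
def check_glove_lines(lines, embedding_size=300):
    """Return (line_number, token) pairs for lines whose tail
    is not `embedding_size` parseable floats."""
    bad = []
    for i, line in enumerate(lines, 1):
        parts = line.rstrip("\n").split(" ")
        try:
            [float(v) for v in parts[-embedding_size:]]
        except ValueError:
            bad.append((i, parts[0]))
    return bad

# Usage on the real file:
#   with open("glove.840B.300d.txt", encoding="utf-8") as f:
#       print(check_glove_lines(f))   # should print [] for an intact download
print(check_glove_lines(["hello 0.1 0.2 0.3", "broken 0.1 squeeze2 0.3"],
                        embedding_size=3))  # [(2, 'broken')]
```

If the scan reports malformed lines, re-downloading the embeddings (the full file is about 5 GB unzipped) is the usual fix.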
Hi, I was wondering if you support beam search decoding. I tried to check the code for that but could not find any relevant parts. If you support it, would you please point me to the relevant part in the code?
Thanks,
Excuse me, I find that during model training, the remove-from corpus is used,
but I wonder whether this loses corpus information and causes errors.
For example, the first utterance of interaction 35 is:
"utterance": "how many dorms have a TV Lounge ?",
and the original sql is
"select count ( * ) from dorm as t1 join has_amenity as t2 on t1 . dormid = t2 . dormid join dorm_amenity as t3 on t2 . amenid = t3 . amenid where t3 . amenity_name = value"
after removefrom, it becomes:
"select count ( * ) where dorm_amenity.amenity_name = value"
It seems that the removefrom sequence loses a lot of information and is not equal to the original SQL sequence.
So my question is whether the original SQL and the removefrom SQL are equivalent (I mean, whether they can be converted one-to-one). And has anyone (a paper or article) done this the same way before?
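Whether the two forms are one-to-one convertible depends on the FROM clause being recoverable from the mentioned columns plus the schema's foreign keys; that is the idea behind reconstructing the joins at postprocessing time. A toy illustration of that reconstruction (the schema and helper are hypothetical, not the repo's actual algorithm):

```python
def infer_from_clause(sql_no_from, foreign_keys):
    """Rebuild a FROM clause from the tables mentioned as `table.column` tokens.

    `foreign_keys` maps a frozenset of two table names to the join condition
    linking them. Toy illustration of the 'removefrom' postprocessing idea.
    """
    tables = sorted({tok.split(".")[0] for tok in sql_no_from.split() if "." in tok})
    clause, prev = ["from", tables[0]], tables[0]
    for t in tables[1:]:
        clause += ["join", t, "on", foreign_keys[frozenset((prev, t))]]
        prev = t
    return " ".join(clause)

fks = {frozenset(("dorm", "has_amenity")): "dorm.dormid = has_amenity.dormid"}
print(infer_from_clause("select dorm.dorm_name where has_amenity.amenid = 1", fks))
# from dorm join has_amenity on dorm.dormid = has_amenity.dormid
```

Note that when a bridge table (like has_amenity in your example) is never mentioned in the removefrom SQL, the join path has to be searched for in the foreign-key graph, which is exactly where information loss or ambiguity can creep in.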
Hi,
Thanks for the code.
I ran the test_sparc_editsql.sh script, now I see
valid gold-passing STRING_ACCURACY: 53.78
in log.txt
Is this the same as EditSQL (use gold query) | 53.4?
Also, can you please tell me where the interaction match accuracy is calculated?
Thanks a lot
Hi! I just read the paper of your EditSQL. Awesome work! By the way, when will the code of EditSQL be released?
Thank you for open sourcing your baselines and the awesome dataset :) I have a question about the performance of CD-seq2seq.
In the paper and on the leaderboard, the result is as follows:
But in the README.md file, I find the result higher (almost 5% better on the dev question match) than the reported one.
So I want to know which one is correct. Or does the latest result in the README follow the setting in the paper? Thanks for your reply.
What type of license do you have for this code? Is it Apache 2.0?
Hello,
I find that all the values in the input questions are replaced by 1 in the SQL query.
For example, if my input is (these examples are just illustrative, not real):
"Get the top 10 samples in column_x"
This top-10 query is translated to "SELECT column_x FROM table ORDER BY column_x DESC LIMIT 1"
So it replaced "top 10" with "top 1".
Also in conditions:
"List a column X where column Y < 0.1"
Is translated to
"SELECT column_x FROM table WHERE column_y < 1"
Again, 0.1 is replaced by 1.
Do you have a mechanism to fix this?
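A common post-hoc workaround for the anonymized values is to copy literals from the question back into the predicted SQL. A rough heuristic sketch (fill_values is hypothetical and assumes placeholders match question numbers left to right):

```python
import re

def fill_values(question, predicted_sql, placeholder="1"):
    """Replace placeholder literals in the predicted SQL with numbers
    copied from the question, left to right. Purely heuristic."""
    numbers = re.findall(r"\d+(?:\.\d+)?", question)
    out, i = [], 0
    for tok in predicted_sql.split():
        if tok == placeholder and i < len(numbers):
            out.append(numbers[i])
            i += 1
        else:
            out.append(tok)
    return " ".join(out)

print(fill_values("List column X where column Y < 0.1",
                  "SELECT column_x FROM table WHERE column_y < 1"))
# SELECT column_x FROM table WHERE column_y < 0.1
```

A real implementation would also need to handle string literals and cases where the number 1 appears legitimately in the SQL, so treat this only as a starting point.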
I was able to train/test on the given Spider and "spider_data_removefrom" sets, but I want to test the model on my own question-query pairs. I tried editing the queries in dev.json, but the predictions remain the same as for the queries in the original dev.json. Can you help me out?
Hi,
Can you please give some instructions on how to retrain?
Thanks
Hi, I am getting this error when I try to run run_sparc_editsql.sh
FileNotFoundError: [Errno 2] No such file or directory: 'logs_sparc_editsql/valid_use_predicted_queries_predictions.json'
Also a warning prior to that:
Warning: arguments already exist in logs_sparc_editsql/args.log
I am following your steps in Colab. Everything is installed and placed in the mentioned folders.
When I ran the command "git lfs clone https://github.com/ryanzhumich/editsql", there was an error:
And the trained model logs/logs_spider_editsql/save_12 is a txt file:
What is the reason?
Can anyone help me? Thanks a lot!
Given the name, I was wondering whether the current SOTA on Spider (RYANSQL + BERT, 12 November) is your work? And if so, was it based on the editsql model?
Many thanks for satisfying my curiosity :-)
Hi,
I was wondering whether you could provide guidance on repeating your experiment.
I cannot generate the final output to evaluate the results.
It seems the "eval_step" function in the model has been deleted.
So how can we generate the final SQL output by inputting dev.json?
I can only get 23.60% valid gold-passing STRING_ACCURACY without BERT.
Maybe I made some mistakes, or the number may improve under the Spider exact match evaluation method.
So I was wondering what I should do to get the same results as in your paper.
Thank you.
I want to reproduce the SParC results on my server. I only downloaded the code and ran run.sh following the README. The model 'saved_42' is chosen to get the final result.
But the final result is only 46.0, which is lower than the results in the paper. Although I repeated it many times, I get similar results like 45.8 or 45.5.
Maybe the experimental environments are different? Or do you use different parameters, like the learning rate or something?
I hope you can give me some suggestions.
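For what it's worth, run-to-run gaps of a point or two are often just seed variance; pinning every RNG seed makes runs comparable. A minimal stdlib sketch (the PyTorch-specific calls are noted only in comments, since they are framework-dependent):

```python
import os
import random

def set_seed(seed=42):
    """Seed Python's RNGs for repeatable runs.
    With PyTorch you would additionally call torch.manual_seed(seed),
    torch.cuda.manual_seed_all(seed), and set
    torch.backends.cudnn.deterministic = True."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)

set_seed(42)
a = [random.random() for _ in range(3)]
set_seed(42)
b = [random.random() for _ in range(3)]
print(a == b)  # True: identical sequences after re-seeding
```

Even with seeds pinned, some CUDA kernels are nondeterministic, so small residual variance across machines is expected.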
I am retraining the model using run_spider_editsql.sh on the Spider dataset.
These are the logs created
Original number of training utterances: 8642
Actual number of used training examples: 8642
(Shortened by output limit of 200)
Number of steps per epoch: 8642
Batch size: 1
Epoch: 0
train epoch loss: 0.044776277615753715
train final gold-passing LOSS: 29.60
train final gold-passing TOKEN_ACCURACY: 92.11
train final gold-passing STRING_ACCURACY: 57.00
valid gold-passing LOSS: 322.41
valid gold-passing TOKEN_ACCURACY: 50.00
valid gold-passing STRING_ACCURACY: 0.00
countdown: 9
Epoch: 1
train epoch loss: 0.01979759320021786
train final gold-passing LOSS: 19.82
train final gold-passing TOKEN_ACCURACY: 94.59
train final gold-passing STRING_ACCURACY: 64.00
valid gold-passing LOSS: 534.82
valid gold-passing TOKEN_ACCURACY: 50.00
valid gold-passing STRING_ACCURACY: 0.00
learning rate coefficient: 0.8
countdown: 8
Epoch: 2
train epoch loss: 0.012703387576976938
train final gold-passing LOSS: 13.85
train final gold-passing TOKEN_ACCURACY: 96.08
train final gold-passing STRING_ACCURACY: 74.00
valid gold-passing LOSS: 363.45
valid gold-passing TOKEN_ACCURACY: 50.00
valid gold-passing STRING_ACCURACY: 0.00
countdown: 7
........
It ran for 10 epochs, but the valid gold-passing TOKEN_ACCURACY and valid gold-passing STRING_ACCURACY remain constant at 50.00 and 0.00 respectively.
Right now, when I put just the question text, without the "sql", "query", "query_toks" & "query_toks_no_value" keys, into dev.json, I get an error when running run_spider.sh and test_spider.sh.
I want to understand how I can make predictions (in an inference environment) on my own sample questions.
Can someone please help me out here?
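Since the loader expects the gold-query fields to exist, one workaround for pure inference is to give each of your questions dummy gold fields (e.g. "select 1"), run the normal test script, and ignore the reported accuracies. A sketch of building such a record; the field names follow the public Spider JSON format, but treat the exact required set as an assumption and mirror a real dev.json record:

```python
import json

def make_dev_entry(question, db_id, dummy_query="select 1"):
    """Build a Spider-style dev.json record with placeholder gold fields,
    so the data loader runs even though no real gold SQL exists."""
    toks = dummy_query.split()
    return {
        "db_id": db_id,
        "question": question,
        "question_toks": question.split(),
        "query": dummy_query,
        "query_toks": toks,
        "query_toks_no_value": toks,
    }

entries = [make_dev_entry("How many dorms have a TV lounge?", "dorm_1")]
print(json.dumps(entries, indent=2))  # this list is what gets written as dev.json
```

The reported metrics will of course be meaningless against the dummy gold; only the predicted SQL in the output file matters.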
For example, I only have data in a simple question/query/table format.
How should it be preprocessed in order to train my own model?
The repo I downloaded works; however, it is too complicated for me to figure out how to process my own data to train a model and predict on my test data.
Thank you.
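For the interaction-style datasets (SParC/CoSQL), each training example is an interaction holding a list of utterances, so a flat question/query pair becomes a one-turn interaction. A rough sketch of that conversion; the field names here are guesses modeled on the released JSON, so verify them against data/sparc/train.json before use:

```python
def to_interaction(pairs, database_id):
    """Wrap flat (question, sql) pairs as single-turn interactions.
    Field names are assumptions; check them against the released data."""
    return [
        {
            "database_id": database_id,
            "interaction": [{"utterance": q, "sql": s}],
            "final": {"utterance": q, "sql": s},
        }
        for q, s in pairs
    ]

data = to_interaction([("how many dorms?", "select count ( * ) from dorm")], "dorm_1")
```

Your table definitions would still need to be written into the tables.json schema file for the database_id to resolve.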
Hello! I'm now trying to run this code on the CoSQL dataset but have met some problems.
I use the default parameters, and my 16 GB GPU runs out of memory. I have tried some adjustments, but they didn't work. For example, batch_size is set to 16 by default, and I still run out of memory even after changing it to 1.
(After briefly reading the code, I found that this parameter is overwritten when interaction_level is set; the effective batch size is always 1, so changing batch_size has no effect.)
Could you please share anything you know, like how to reduce the GPU memory requirements, or point out any mistakes I have made?
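Because batching happens per interaction, memory grows with the number of turns being encoded rather than with batch_size. If you are retraining, one mitigation is to cap how many previous turns are fed to the encoder; a toy sketch (truncate_history is a hypothetical helper, not a repo function):

```python
def truncate_history(interaction, max_turns=3):
    """Keep only the `max_turns` most recent utterances of an interaction,
    bounding the encoder input (and thus activation memory) per step."""
    return interaction[-max_turns:]

turns = ["u1", "u2", "u3", "u4", "u5"]
print(truncate_history(turns, max_turns=3))  # ['u3', 'u4', 'u5']
```

This trades some context for memory; whether the accuracy cost is acceptable depends on how much the dataset relies on long histories.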
Hi,
I notice that in the training script for Spider, interaction_level was set to true.
Should I run Spider (which actually has no interactions) with this setting?
Since the current setting is:
a) batch_size was set to 1 in run.py (L48)
b) interaction_level is true
Is batch training available at the interaction level using schema_interaction_model?
Thanks in advance!