Comments (14)
Awesome, I came up with something similar over the weekend. Thanks! So I used what I wrote on the climbing DB:
Input question: How tall is the mountain from Kenya
Output query: `SELECT mountain.Height FROM mountain WHERE mountain.Country = 'terminal'`
This happens when using other DBs, too. How can I get the where clause to insert the correct where value? Thank you!
from rat-sql.
@gouldju1 I found there is a parameter called `include_literals` in `./configs/spider/nl2code-base.libsonnet`. By default it is set to `false`, which causes the where-clause value `val1` to become `Terminal` instead of the real value from the input question. It should be set to `true` if you want where-clause values included in your training data.
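For reference, the flag sits among the encoder-preprocessing options; a rough sketch of the relevant excerpt (the exact nesting in your checkout may differ):

```jsonnet
// configs/spider/nl2code-base.libsonnet (excerpt, structure approximate)
{
  model+: {
    encoder_preproc+: {
      // false by default; set to true so where-clause literals are kept in training data
      include_literals: true,
    },
  },
}
```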
I'd really like to see this, too.
I'll push an end-to-end sample sometime next week. Here's an approximate untested snippet (adapting the code from `infer.py`) if you want to run inference against one of the preprocessed Spider schemas:

```python
import json
import os

import _jsonnet
import torch  # needed for torch.device / torch.no_grad below

from ratsql.commands.infer import Inferer
from ratsql.datasets.spider import SpiderItem
from ratsql.utils import registry

# exp_config_path, root_dir, model_dir, checkpoint_step must point at your experiment
exp_config = json.loads(_jsonnet.evaluate_file(exp_config_path))
model_config_path = os.path.join(root_dir, exp_config["model_config"])
model_config_args = exp_config.get("model_config_args")
infer_config = json.loads(
    _jsonnet.evaluate_file(model_config_path,
                           tla_codes={'args': json.dumps(model_config_args)}))

inferer = Inferer(infer_config)
inferer.device = torch.device("cpu")
model = inferer.load_model(model_dir, checkpoint_step)
dataset = registry.construct('dataset', inferer.config['data']['val'])
for _, schema in dataset.schemas.items():
    model.preproc.enc_preproc._preprocess_schema(schema)

def question(q, db_id):
    spider_schema = dataset.schemas[db_id]
    data_item = SpiderItem(
        text=None,  # intentionally None -- should be ignored when the tokenizer is set correctly
        code=None,
        schema=spider_schema,
        orig_schema=spider_schema.orig,
        orig={"question": q}
    )
    model.preproc.clear_items()
    enc_input = model.preproc.enc_preproc.preprocess_item(data_item, None)
    preproc_data = enc_input, None
    with torch.no_grad():
        return inferer._infer_one(model, data_item, preproc_data,
                                  beam_size=1, use_heuristic=True)
```
Here's an example running `python3 run.py eval experiments/spider-glove-run.jsonnet`:

```
"inferred_code": "SELECT singer.Name FROM singer JOIN singer_in_concert ON singer.Singer_ID = singer_in_concert.Singer_ID WHERE singer.Age = 'terminal'"
```

Here's the full output in `step-xxxxx.infer`:
```json
{
  "index": 38,
  "beams": [
    {
      "orig_question": "What are the names of the singers who performed in a concert in 2014?",
      "model_output": {
        "_type": "sql",
        "select": {
          "_type": "select",
          "is_distinct": false,
          "aggs": [
            {
              "_type": "agg",
              "agg_id": {"_type": "NoneAggOp"},
              "val_unit": {
                "_type": "Column",
                "col_unit1": {
                  "_type": "col_unit",
                  "agg_id": {"_type": "NoneAggOp"},
                  "col_id": 9,
                  "is_distinct": false
                }
              }
            }
          ]
        },
        "sql_where": {
          "_type": "sql_where",
          "where": {
            "_type": "Eq",
            "val_unit": {
              "_type": "Column",
              "col_unit1": {
                "_type": "col_unit",
                "agg_id": {"_type": "NoneAggOp"},
                "col_id": 13,
                "is_distinct": false
              }
            },
            "val1": {"_type": "Terminal"}
          }
        },
        "sql_groupby": {"_type": "sql_groupby"},
        "sql_orderby": {"_type": "sql_orderby", "limit": false},
        "sql_ieu": {"_type": "sql_ieu"},
        "from": {
          "_type": "from",
          "table_units": [
            {"_type": "Table", "table_id": 1},
            {"_type": "Table", "table_id": 3}
          ],
          "conds": {
            "_type": "Eq",
            "val_unit": {
              "_type": "Column",
              "col_unit1": {
                "_type": "col_unit",
                "agg_id": {"_type": "NoneAggOp"},
                "col_id": 8,
                "is_distinct": false
              }
            },
            "val1": {
              "_type": "ColUnit",
              "c": {
                "_type": "col_unit",
                "agg_id": {"_type": "NoneAggOp"},
                "col_id": 21,
                "is_distinct": false
              }
            }
          }
        }
      },
      "inferred_code": "SELECT singer.Name FROM singer JOIN singer_in_concert ON singer.Singer_ID = singer_in_concert.Singer_ID WHERE singer.Age = 'terminal'",
      "score": -2.1817409903378575
    }
  ]
}
```
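The `step-xxxxx.infer` file can also be read back programmatically to pull out just the predicted SQL; a minimal sketch, assuming each record has the `beams` / `inferred_code` / `score` shape shown above:

```python
def best_inferred_code(record):
    """Return the SQL string of the highest-scoring beam in one infer record."""
    beams = record.get("beams", [])
    if not beams:
        return None
    best = max(beams, key=lambda b: b["score"])  # scores are log-probs; higher is better
    return best["inferred_code"]

# One record shaped like the output above
record = {
    "index": 38,
    "beams": [{
        "orig_question": "What are the names of the singers who performed in a concert in 2014?",
        "inferred_code": "SELECT singer.Name FROM singer JOIN singer_in_concert ON singer.Singer_ID = singer_in_concert.Singer_ID WHERE singer.Age = 'terminal'",
        "score": -2.1817409903378575,
    }],
}
print(best_inferred_code(record))  # prints the SQL of the top-scoring beam
```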
Hi, I don't think any of the major models on the Spider leaderboard supports where-value prediction. After I get this running, maybe I will publish my end-to-end model.
> How tall is the mountain from Kenya

@gouldju1 can you share with us your code that takes a question and returns the SQL query?
Thank you.
Hi @gouldju1 @DevanshChoubey @mellahysf, are you able to directly generate SQL queries from text?
Yes @Akshaysharma29
Thank you @alexpolozov.
I used your code and it works for me.
Hi,
How can I run inference (adapting this code #17 (comment)) against a new database schema? (The `question` method should take, in this case, a .sql file as the second argument.)
Thanks
I adapted the code above to "work" with a new (non-Spider) schema. It runs without errors, but the predicted SQL is, at this point, not very impressive.
I had to separate loading the model from loading the dataset, since `inferer.load()` infers model dims from the dataset, resulting in errors when using a new single-schema dataset.
```python
import json
import os

import _jsonnet
import torch

from ratsql.commands.infer import Inferer
from ratsql.datasets.spider import SpiderItem
from ratsql.utils import registry

db_id = '<id of new database>'
root_dir = '<path to rat-sql>'
exp_config_path = '<path to spider-bert-run.jsonnet>'
model_dir = '<path to model>'
checkpoint_step = 16100  # whatever checkpoint you want to use

data_conf = {
    'db_path': '<path to new database>',
    'name': '<name of new dataset>',
    'paths': ['<path to dev.json for new dataset>'],  # dev.json has to be generated using spider's preprocessing methods
    'tables_paths': ['<path to tables.json for new dataset>'],  # tables.json has to be generated using spider's preprocessing methods
}

exp_config = json.loads(_jsonnet.evaluate_file(exp_config_path))  # data_path: '<path to spider/>',
model_config_path = os.path.join(root_dir, exp_config["model_config"])
model_config_args = exp_config.get("model_config_args")
infer_config = json.loads(
    _jsonnet.evaluate_file(model_config_path,
                           tla_codes={'args': json.dumps(model_config_args)}))

inferer = Inferer(infer_config)
inferer.device = torch.device("cpu")
model = inferer.load_model(model_dir, checkpoint_step)  # load the model according to the spider dataset
dataset = registry.construct('dataset', data_conf)  # load the new dataset (not spider)
for _, schema in dataset.schemas.items():
    model.preproc.enc_preproc._preprocess_schema(schema)

def question(q, db_id):
    schema = dataset.schemas[db_id]
    data_item = SpiderItem(
        text=None,  # intentionally None -- should be ignored when the tokenizer is set correctly
        code=None,
        schema=schema,
        orig_schema=schema.orig,
        orig={"question": q}
    )
    model.preproc.clear_items()
    enc_input = model.preproc.enc_preproc.preprocess_item(data_item, None)
    preproc_data = enc_input, None
    with torch.no_grad():
        return inferer._infer_one(model, data_item, preproc_data,
                                  beam_size=1, use_heuristic=True)
```
Thank you @kalleknast, it works for me.
> I found there is a parameter called `include_literals` ... It should be set to `true` if you want where-clause values included in your training data.

this doesn't seem to work...
It seems that the original rat-sql does not fill in the `terminal` values at all. The closest one can get is to use NatSQL to fill in the values. However, the authors of NatSQL only generated the files for the Spider dataset, and it's unclear how one would go about generating corresponding NatSQL files for their own dataset.
Unfortunately, this makes RAT-SQL not very useful in practice, which is a shame since the model actually performs pretty decently otherwise.
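As a crude stopgap, one can post-process the predicted SQL and substitute the `'terminal'` placeholder with a literal pulled from the question. This is my own workaround sketch, not part of RAT-SQL or NatSQL; it just grabs the first number or non-sentence-initial capitalized token from the question:

```python
def fill_terminal(sql, question):
    """Naively replace the 'terminal' placeholder with a literal guessed from the question."""
    tokens = question.split()
    # Candidate literals: numbers, or capitalized words that aren't sentence-initial.
    candidates = [t.strip("?.,!") for t in tokens[1:]
                  if t[0].isupper() or t.strip("?.,!").isdigit()]
    if not candidates:
        return sql  # nothing plausible found; leave the placeholder as-is
    return sql.replace("'terminal'", "'%s'" % candidates[0], 1)

print(fill_terminal(
    "SELECT mountain.Height FROM mountain WHERE mountain.Country = 'terminal'",
    "How tall is the mountain from Kenya"))
# → SELECT mountain.Height FROM mountain WHERE mountain.Country = 'Kenya'
```

This obviously breaks on questions with several literals or lowercase values; it only illustrates that value filling can be bolted on after decoding.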