sqlnet's Issues

ValueError: all input arrays must have the same shape

Traceback (most recent call last):
File "extract_vocab.py", line 62, in
emb_array = np.stack(embs, axis=0)
File "C:\python\Anaconda3\lib\site-packages\numpy\core\shape_base.py", line 347, in stack
raise ValueError('all input arrays must have the same shape')

OS: Windows 10 64-bit
Python: 3.6.1
cmd: python extract_vocab.py
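
A likely cause (an assumption, not confirmed in this issue): some GloVe lines parse into vectors of the wrong length, e.g. when a token itself contains spaces, so np.stack sees mixed shapes. A minimal defensive-parsing sketch:

    import numpy as np

    N_word = 300  # expected dimension for glove.42B.300d
    embs = []
    with open('glove/glove.42B.300d.txt', encoding='utf-8') as inf:
        for line in inf:
            parts = line.rstrip().split(' ')
            try:
                # Take the last N_word fields so a token containing
                # spaces does not shift the vector boundary.
                vec = np.asarray(parts[-N_word:], dtype=np.float32)
            except ValueError:
                continue  # skip malformed lines
            if vec.shape == (N_word,):
                embs.append(vec)

    emb_array = np.stack(embs, axis=0)  # every row now has shape (N_word,)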

Problem when using it with dataset other than WikiSQL

While running train.py with a dataset other than WikiSQL, I get the following error:

Traceback (most recent call last):
File "train.py", line 128, in
sql_data, table_data, TRAIN_ENTRY)
File "/[email protected]#0/sqlnet/utils.py", line 146, in epoch_train
gt_where=gt_where_seq, gt_cond=gt_cond_seq, gt_sel=gt_sel_seq)
File "/[email protected]#0/sqlnet/model/sqlnet.py", line 141, in forward
gt_where, gt_cond, reinforce=reinforce)
File "/opt/conda/envs/python2.7/lib/python2.7/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/[email protected]#0/sqlnet/model/modules/sqlnet_condition_predict.py", line 253, in forward
cond_str_score[b, :, :, num:] = -100
IndexError: too many indices for tensor of dimension 3

Can anyone help?

How to use it with DB other than WikiSQL

I would like to understand how to use it with a database other than WikiSQL. I'm new to ML and would like to use it for querying attendance data. Can you please provide instructions to implement it?

about "order matters" problem

Hi, different condition orders can produce the same query results, and our goal is just the query result. Why does condition order affect performance then? I don't understand.
Thanks

dataset error: sqlite3.ProgrammingError, sqlalchemy.exc.ProgrammingError:

Hi, when I run 'python test.py --ca' to get execution results, it fails at 'print("Dev execution acc: {}".format(epoch_exec_acc(model, BATCH_SIZE, val_sql_data, val_table_data, DEV_DB)))'.
The error is like this:
Error closing cursor
Traceback (most recent call last):
File "/data/anaconda/envs/py35/lib/python3.5/site-packages/sqlalchemy/engine/result.py", line 1268, in fetchone
row = self._fetchone_impl()
File "/data/anaconda/envs/py35/lib/python3.5/site-packages/sqlalchemy/engine/result.py", line 1148, in _fetchone_impl
return self.cursor.fetchone()
sqlite3.ProgrammingError: Cannot operate on a closed database.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/data/anaconda/envs/py35/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1325, in _safe_close_cursor
cursor.close()
sqlite3.ProgrammingError: Cannot operate on a closed database.
Traceback (most recent call last):
File "/data/anaconda/envs/py35/lib/python3.5/site-packages/sqlalchemy/engine/result.py", line 1268, in fetchone
row = self._fetchone_impl()
File "/data/anaconda/envs/py35/lib/python3.5/site-packages/sqlalchemy/engine/result.py", line 1148, in _fetchone_impl
return self.cursor.fetchone()
sqlite3.ProgrammingError: Cannot operate on a closed database.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "test.py", line 83, in
model, BATCH_SIZE, test_sql_data, test_table_data, TEST_DB)))
File "/data/home/naturallanguage/text2sql/sqlnet/utils.py", line 178, in epoch_exec_acc
ret_gt = engine.execute(tid, sql_gt['sel'], sql_gt['agg'], sql_gt['conds'])
File "/data/home/naturallanguage/text2sql/sqlnet/lib/dbengine.py", line 25, in execute
table_info = self.db.query('SELECT sql from sqlite_master WHERE tbl_name = :name', name=table_id).all()[0].sql.replace('\n','')
File "/data/anaconda/envs/py35/lib/python3.5/site-packages/records.py", line 195, in all
rows = list(self)
File "/data/anaconda/envs/py35/lib/python3.5/site-packages/records.py", line 126, in iter
yield next(self)
File "/data/anaconda/envs/py35/lib/python3.5/site-packages/records.py", line 136, in next
nextrow = next(self._rows)
File "/data/anaconda/envs/py35/lib/python3.5/site-packages/records.py", line 365, in
row_gen = (Record(cursor.keys(), row) for row in cursor)
File "/data/anaconda/envs/py35/lib/python3.5/site-packages/sqlalchemy/engine/result.py", line 946, in iter
row = self.fetchone()
File "/data/anaconda/envs/py35/lib/python3.5/site-packages/sqlalchemy/engine/result.py", line 1276, in fetchone
e, None, None, self.cursor, self.context
File "/data/anaconda/envs/py35/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1458, in _handle_dbapi_exception
util.raise_from_cause(sqlalchemy_exception, exc_info)
File "/data/anaconda/envs/py35/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 296, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "/data/anaconda/envs/py35/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 276, in reraise
raise value.with_traceback(tb)
File "/data/anaconda/envs/py35/lib/python3.5/site-packages/sqlalchemy/engine/result.py", line 1268, in fetchone
row = self._fetchone_impl()
File "/data/anaconda/envs/py35/lib/python3.5/site-packages/sqlalchemy/engine/result.py", line 1148, in _fetchone_impl
return self.cursor.fetchone()
sqlalchemy.exc.ProgrammingError: (sqlite3.ProgrammingError) Cannot operate on a closed database. (Background on this error at: http://sqlalche.me/e/f405)

Any hint on how to solve this? Many thanks!
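
A sketch of one possible workaround, assuming the root cause is records' lazy row generator being consumed after the underlying cursor is closed: fetch the schema eagerly with the stdlib sqlite3 module instead of going through records/SQLAlchemy (the function name and structure are illustrative, not dbengine.py's actual code):

    import sqlite3

    def table_schema(db_path, table_id):
        # Open, query, and fully fetch before closing, so no lazy
        # cursor outlives the connection.
        conn = sqlite3.connect(db_path)
        try:
            cur = conn.execute(
                'SELECT sql FROM sqlite_master WHERE tbl_name = ?',
                (table_id,))
            row = cur.fetchone()
            return row[0].replace('\n', '') if row else None
        finally:
            conn.close()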

Column Slots Equation representation

Hi Xiaojun Xu,

I'm trying to understand the equation below from your paper, used to find the number of columns in the WHERE condition:

P_{#col}(K|Q) = softmax(U_1^{#col} tanh(U_2^{#col} E_{Q|Q}))_i

Can you please explain E_{Q|Q} here?

Thanks,
Niyas

Help!! Don't have CUDA

Hi, this package uses the GPU (CUDA) for processing, which is not available on my server. Can you please guide me on how to use it without a GPU?

P.S. I'm working on a critical project and would be really grateful for early help.

Thanks & Regards,
Manas
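
A sketch of a CPU fallback, assuming the scripts hard-code a GPU flag somewhere (as train.py appears to): derive the flag from the runtime and route tensors through a helper.

    import torch

    GPU = torch.cuda.is_available()  # False on machines without CUDA

    def maybe_cuda(x):
        # Move a tensor/module to the GPU only when one is present.
        return x.cuda() if GPU else x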

Assertion `cur_target >= 0 && cur_target < n_classes' failed

I am getting an error at the following line:

loss += self.CE(sel_score, sel_truth_var)

Error:
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at c:\new-builder_3\win-wheel\pytorch\aten\src\thnn\generic/ClassNLLCriterion.c:93
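
This assertion means some target index handed to CrossEntropyLoss falls outside [0, n_classes). A small diagnostic sketch (names are illustrative, not from the repo):

    import torch

    def check_targets(scores, targets):
        # scores: (batch, n_classes) tensor; targets: (batch,) LongTensor.
        n_classes = scores.size(-1)
        bad = (targets < 0) | (targets >= n_classes)
        if bad.any():
            print('out-of-range targets:', targets[bad].tolist(),
                  '(n_classes = %d)' % n_classes)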

extract_vocab.py : all input arrays must have the same shape

When executing extract_vocab.py, it raised this error:

(base) C:\Users\Albel\Documents\SQLNet>python extract_vocab.py
Loading from original dataset
Loading data from %s data/train_tok.jsonl
Loading data from %s data/train_tok.tables.jsonl
Loading data from %s data/dev_tok.jsonl
Loading data from %s data/dev_tok.tables.jsonl
Loading data from %s data/test_tok.jsonl
Loading data from %s data/test_tok.tables.jsonl
Loading word embedding from %s glove/glove.42B.300d.txt
Length of word vocabulary: %d 1917495
Length of used word vocab: %s 39936
Traceback (most recent call last):
File "extract_vocab.py", line 62, in
emb_array = np.stack(embs, axis=0)
File "C:\Anaconda3\lib\site-packages\numpy\core\shape_base.py", line 353, in stack
raise ValueError('all input arrays must have the same shape')
ValueError: all input arrays must have the same shape

Memory Issue

While running the extract_vocab.py file, memory usage is pushed to 98%. I'm scared to continue running the script, so I had to stop it.
Can anyone help?
@xiaojunxu what should I do to deal with this?
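
One way to tame memory, sketched under the assumption that load_word_emb builds a dict over the full 1.9M-word GloVe vocabulary: keep only vectors for words that actually occur in the dataset.

    def load_word_emb_subset(fname, used_words):
        # used_words: a set of tokens collected from questions/columns.
        emb = {}
        with open(fname, encoding='utf-8') as inf:
            for line in inf:
                tok, _, rest = line.partition(' ')
                if tok in used_words:
                    emb[tok] = [float(x) for x in rest.split()]
        return emb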

Error in Train.py

Seems that train.py generates errors. Are there any prerequisites?

(base) C:\Users\Albel\Documents\SQLNet>python train.py --ca
Loading from original dataset
Loading data from %s data/train_tok.jsonl
Loading data from %s data/train_tok.tables.jsonl
Loading data from %s data/dev_tok.jsonl
Loading data from %s data/dev_tok.tables.jsonl
Loading data from %s data/test_tok.jsonl
Loading data from %s data/test_tok.tables.jsonl
Loading word embedding from %s glove/glove.42B.300d.txt
Using fixed embedding
Traceback (most recent call last):
File "train.py", line 57, in
gpu=GPU, trainable_emb = args.train_emb)
File "C:\Users\Albel\Documents\SQLNet\sqlnet\model\sqlnet.py", line 43, in init
self.agg_pred = AggPredictor(N_word, N_h, N_depth, use_ca=use_ca)
File "C:\Users\Albel\Documents\SQLNet\sqlnet\model\modules\aggregator_predict.py", line 18, in init
dropout=0.3, bidirectional=True)
File "C:\Anaconda3\lib\site-packages\torch\nn\modules\rnn.py", line 425, in init
super(LSTM, self).init('LSTM', *args, **kwargs)
File "C:\Anaconda3\lib\site-packages\torch\nn\modules\rnn.py", line 52, in init
w_ih = Parameter(torch.Tensor(gate_size, layer_input_size))
TypeError: new() received an invalid combination of arguments - got (float, int), but expected one of:

  • (torch.device device)
  • (torch.Storage storage)
  • (Tensor other)
  • (tuple of ints size, torch.device device)
    didn't match because some of the arguments have invalid types: (float, int)
  • (object data, torch.device device)
    didn't match because some of the arguments have invalid types: (float, int)
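
A likely cause under Python 3 (an assumption; the repo targets Python 2): / performs true division, so an expression like N_h/2 produces a float hidden size, which eventually reaches torch.Tensor(gate_size, layer_input_size) as a float. A sketch of the fix is to force integer division:

    import torch.nn as nn

    N_word, N_h = 300, 100

    # Python 2: N_h/2 == 50 (int).  Python 3: N_h/2 == 50.0 (float),
    # which breaks torch.Tensor(...) inside nn.LSTM.  Use // instead.
    lstm = nn.LSTM(input_size=N_word, hidden_size=N_h // 2,
                   num_layers=2, batch_first=True,
                   dropout=0.3, bidirectional=True)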

Aggregation Prediction uses select column from the ground truth

The code snippet below appears to use the ground-truth SELECT column position in evaluation as well as training.

gt_sel_seq = [x[1] for x in ans_seq]

col_name_len, col_len, col_num, gt_sel=gt_sel)

chosen_sel_idx = torch.LongTensor(gt_sel)

Don't you think the model should predict the SELECT column, instead of being given the ground-truth column, before aggregation prediction? In fact, the SELECT column will not be given at prediction time when applying this in the real world.

You already make a selection prediction, so that output could be fed into the aggregation prediction.
As a result, the evaluation numbers might be wrong when compared with the original paper, Seq2SQL.

Why is over-fitting the biggest problem for the WHERE clause?

Epoch 300 @ 2018-03-25 16:22:07.151084
 Loss = 0.16111447376294669
 Train acc_qm: 0.957164404223
   breakdown result: [0.99918375 0.99655754 0.96026972]
 Dev acc_qm: 0.579147369671
   breakdown result: [0.880418   0.89882437 0.70383565]
 Best val acc = (0.8990618691366821, 0.9055931599572498, 0.7138107113169457), on epoch (3, 48, 259) individually

@xiaojunxu
Thank you!!

Error in rnn.py file

After making a few changes in the utils.py file, as well as changing from Python 2 to Python 3, I am getting an error in the rnn.py file when I try to run train.py. The error states: "TypeError: super(type, obj): obj must be an instance or subtype of type". I changed "super(LSTM, self).__init__('LSTM', *args, **kwargs)" to the two lines below in the __init__ function,
self.as_super = super(LSTM, self)
self.as_super.__init__('LSTM', *args, **kwargs)
but to no avail.
Any help would be appreciated.
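
For reference, a sketch assuming the subclass lives in your own module rather than in an edited copy of torch's rnn.py: in Python 3 the zero-argument super() binds to the enclosing class automatically, which avoids stale class references after module reloads or a 2-to-3 conversion.

    import torch.nn as nn

    class MyLSTM(nn.LSTM):
        def __init__(self, *args, **kwargs):
            # Zero-argument form; equivalent to super(MyLSTM, self).
            super().__init__(*args, **kwargs)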

Other datasets

Has anyone been able to test this model with other datasets like IMDB or SENLIDB? If yes, could you please guide me on how the files need to be prepared?

Results for Logical Form Accuracy

Hi

I'm just curious about why the results for logical form accuracy are not included in Table 1 of https://arxiv.org/pdf/1711.04436.pdf, even though the text mentions that SQLNet outperforms Seq2SQL by 10-13 points. Can you please explain?

And can you please help me find the code that is used to calculate logical form accuracy?

IndexError: too many indices for array

Hi,

I got the following error while training (python train.py --ca):
(screenshot: error_sqlnet)
I guess that in utils.py, line 145, loss.data.cpu().numpy() is an empty array.

Can you please let us know how to resolve this issue?
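
One plausible cause (an assumption, not confirmed here): since PyTorch 0.4 a scalar loss is a 0-dim tensor, so indexing loss.data.cpu().numpy()[0] raises exactly this IndexError. A version-safe accessor sketch for utils.py:

    def loss_value(loss):
        # .item() exists from PyTorch 0.4 on; fall back for older builds.
        if hasattr(loss, 'item'):
            return loss.item()
        return loss.data.cpu().numpy()[0]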

Resources and runtimes

Hi,

I was wondering whether you could give some information about the resources you used and the runtimes you achieved.

  • How many and what kind of GPUs did you use?
  • What runtimes did you obtain for the normal dataset and for the toy dataset (for debugging)?

Best regards,
Sebastian

Errors during training: help needed!

Python: 3.6
OS: Windows 10

Dear all,

I tried to figure out what is going wrong, but due to my limited knowledge I'm still facing some issues:

1/ First: without changing anything in the code, I receive this error:

(base) C:\Users\albel\Documents\SQLNet>python train.py --ca
Loading from original dataset
Loading data from data/train_tok.jsonl
Loading data from data/train_tok.tables.jsonl
Loading data from data/dev_tok.jsonl
Loading data from data/dev_tok.tables.jsonl
Loading data from data/test_tok.jsonl
Loading data from data/test_tok.tables.jsonl
Loading word embedding from glove/glove.42B.300d.txt
Using fixed embedding
Using column attention on aggregator predicting
Using column attention on selection predicting
Using column attention on where predicting
C:\Users\albel\Documents\SQLNet\sqlnet\model\modules\aggregator_predict.py:55: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
Init dev acc_qm: 0.0
  breakdown on (agg, sel, where): [0.09250683 0.17895737 0.        ]
Epoch 1 @ 2018-08-20 14:06:54.446966
Traceback (most recent call last):
  File "train.py", line 128, in <module>
    sql_data, table_data, TRAIN_ENTRY))
  File "C:\Users\albel\Documents\SQLNet\sqlnet\utils.py", line 144, in epoch_train
    loss = model.loss(score, ans_seq, pred_entry, gt_where_seq)
  File "C:\Users\albel\Documents\SQLNet\sqlnet\model\sqlnet.py", line 152, in loss
    data = torch.from_numpy(np.array(agg_truth))
TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: double, float, float16, int64, int32, and uint8.

2/ Second: when I force dtype=float32 (I also tried the other dtypes), I'm still getting another error. Whatever I do to force the type of the "data" variable, I still get errors.

(base) C:\Users\albel\Documents\SQLNet>python train.py --ca
Loading from original dataset
Loading data from data/train_tok.jsonl
Loading data from data/train_tok.tables.jsonl
Loading data from data/dev_tok.jsonl
Loading data from data/dev_tok.tables.jsonl
Loading data from data/test_tok.jsonl
Loading data from data/test_tok.tables.jsonl
Loading word embedding from glove/glove.42B.300d.txt
Using fixed embedding
Using column attention on aggregator predicting
Using column attention on selection predicting
Using column attention on where predicting

Init dev acc_qm: 0.0
  breakdown on (agg, sel, where): [0.03811899 0.14772592 0.        ]
Epoch 1 @ 2018-08-20 13:58:02.098906
Traceback (most recent call last):
  File "train.py", line 128, in <module>
    sql_data, table_data, TRAIN_ENTRY))
  File "C:\Users\albel\Documents\SQLNet\sqlnet\utils.py", line 144, in epoch_train
    loss = model.loss(score, ans_seq, pred_entry, gt_where_seq)
  File "C:\Users\albel\Documents\SQLNet\sqlnet\model\sqlnet.py", line 152, in loss
    data = torch.from_numpy(np.array(agg_truth, dtype=np.float32))
TypeError: float() argument must be a string or a number, not 'map'

Can someone guide me on how to solve this? Thanks in advance.
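
The 'map' in the second message suggests (my reading, not confirmed) that agg_truth is a lazy Python 3 map object: np.array() wraps it in a 0-d object array, which torch.from_numpy() rejects whatever dtype you force. A sketch of the fix is to materialize the iterator first:

    import numpy as np
    import torch

    agg_truth = map(int, [0, 2, 1])  # stand-in for the upstream map() result

    # In Python 3, map() is lazy; np.array(map_obj) is a 0-d object array.
    agg_truth = list(agg_truth)
    # CE-loss targets should be integer class indices (int64).
    data = torch.from_numpy(np.array(agg_truth, dtype=np.int64))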

Help needed

Is there anyone who can share workable code, or at least a trained model? I went through the whole installation and I'm ending up with errors.
Thanks

Addition of tensors of different size

I am having an issue adding two tensors of different sizes. What could be a possible solution?

In seq2sql_condition_predict_rl.py (4D tensor addition)
cond_score = self.cond_out( self.cond_out_h(h_enc_expand) +self.cond_out_g(g_s_expand) ).squeeze()

In selection_predict_rl.py (3D tensor addition)
sel_score = self.sel_out( self.sel_out_K(K_sel_expand) + self.sel_out_col(e_col) ).squeeze()

The error is like: "The size of tensor a (26) must match the size of tensor b (15) at non-singleton dimension 0".
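
If the mismatch were in a "content" dimension, broadcasting via singleton dimensions would fix it; a mismatch at dimension 0 (26 vs 15), though, usually means the two tensors come from different batch slicings and must be fixed upstream. A toy illustration of the broadcasting case (shapes are made up):

    import torch

    a = torch.randn(26, 1, 64)   # e.g. a per-question encoding
    b = torch.randn(1, 15, 64)   # e.g. a per-column encoding
    c = a + b                    # broadcasts to (26, 15, 64)
    print(c.shape)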

Test time still using ground-truth SQL?

Hi, thanks for providing the source code of your model. There is one thing I am not quite sure about. Is the model using part of the ground-truth SQL query as input to the aggregator predictor during dev/test time?

The test calls the epoch_acc() function in sqlnet/utils.py, where the code is still feeding gt_sel_seq to the model. I think dev/test time should first generate the columns in the SELECT clause, and feed that result as input to the aggregator.

    q_seq, col_seq, col_num, ans_seq, query_seq, gt_cond_seq, raw_data = to_batch_seq(sql_data, table_data, perm, st, ed, ret_vis_data=True)
    raw_q_seq = [x[0] for x in raw_data]
    raw_col_seq = [x[1] for x in raw_data]
    query_gt, table_ids = to_batch_query(sql_data, perm, st, ed)
    gt_sel_seq = [x[1] for x in ans_seq]
    score = model.forward(q_seq, col_seq, col_num,
            pred_entry, gt_sel = gt_sel_seq)
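
A sketch of the two-stage evaluation being suggested, assuming (as in the snippet above) that model.forward returns (agg_score, sel_score, cond_score) and accepts a gt_sel keyword: predict the SELECT column first, then condition the remaining predictions on the prediction rather than on gt_sel_seq.

    # Stage 1: run only the SELECT-column predictor.
    sel_score = model.forward(q_seq, col_seq, col_num, (False, True, False))[1]
    pred_sel_seq = sel_score.argmax(dim=-1).tolist()  # argmax per question

    # Stage 2: feed the *predicted* column instead of the ground truth.
    score = model.forward(q_seq, col_seq, col_num,
            pred_entry, gt_sel = pred_sel_seq)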

Results not as good as in paper

Hi Xiaojun,

I trained the model without changing any hyperparameter's value. (python train.py --ca)

When executing the test.py, I obtain the following accuracy scores:

Dev acc_qm: 0.584253651585;
  breakdown on (agg, sel, where): [0.90048688 0.91307446 0.68459803]
Dev execution acc: 0.654435340221
Test acc_qm: 0.571671495151;
  breakdown on (agg, sel, where): [0.90212873 0.90370324 0.67092833]
Test execution acc: 0.641768484696

These results are several points below the ones reported in your paper.

Although you do not report Acc_qm and Acc_ex for your model when the word embedding isn't allowed to train, you mention in section 4.3 that the improvement is about 2 points when training the word embedding.
After subtracting these 2 points from the results reported in Table 1, there is still a 2-3 point difference between my results and yours.

My question is:
Are the results reported in the paper the best ones you obtained after running the whole training procedure multiple times? In that case, were the results obtained on average closer to mine or to yours? How many times did you run the training procedure to obtain those results?

Thanks,
Thomas

Trained model

Hello! I am studying SQLNet and first I would like to congratulate you for the great work you have done in this paper. I got your code from https://github.com/xiaojunxu/SQLNet but I could not run the tests using your trained model, since the "saved_model" folder is empty. Could you please share the trained model? Thank you!!

Tokenization script

Hi @xiaojunxu

Could you upload your tokenization script?
The reason is that I sometimes found differences between "question" and "query_tok".
For example, in the 25th entry of dev_tok.jsonl:

  • "question": "What is the district when the total amount of trees is smaller than 150817.6878461314 and amount of old trees is 1,928 (1.89%)?",
  • However, in "query_tok": ["SELECT", "district", "WHERE", "total", "amount", "of", "trees", "LT", "150817.687846", "AND", "amount", "of", "old", "trees", "EQL", "1,928", "(", "1.89", "%", ")"],

You can see that the float number is somehow different.
So, if possible, I would like to modify the tokenization script.

Thanks!

Prediction model

Do you think it would be possible to write a predict.py file to live-test results based on the trained model?
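
A rough sketch of what such a predict.py could do; the tokenization is naive and the model.forward/gen_query signatures are assumptions based on test.py, so details may need adjusting:

    # Assumes `model` is a SQLNet instance with trained weights loaded
    # the same way test.py loads them.
    pred_entry = (True, True, True)              # predict (agg, sel, cond)
    raw_q = 'What is the district?'
    q_seq = [raw_q.lower().rstrip('?').split()]  # naive tokenization
    col_seq = [[['district'], ['total', 'amount', 'of', 'trees']]]
    col_num = [2]

    score = model.forward(q_seq, col_seq, col_num, pred_entry)
    sql = model.gen_query(score, q_seq, col_seq,
            [raw_q], [['district', 'total amount of trees']], pred_entry)
    print(sql)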

Issue in running python extract_vocab.py

Error while loading the GloVe word embeddings.

Logs:
Loading from original dataset
Loading data from data/train_tok.jsonl
Loading data from data/train_tok.tables.jsonl
Loading data from data/dev_tok.jsonl
Loading data from data/dev_tok.tables.jsonl
Loading data from data/test_tok.jsonl
Loading data from data/test_tok.tables.jsonl
Loading word embedding from glove/glove.42B.300d.txt
Traceback (most recent call last):
File "extract_vocab.py", line 23, in
use_small=USE_SMALL)
File "C:\Users\SQLNet\sqlnet\utils.py
", line 274, in load_word_emb
for idx, line in enumerate(inf):
File "C:\Users\miniconda3\lib\encodings\cp1252.py", line 23, in dec
ode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2438: cha
racter maps to
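
The cp1252 codec in the traceback suggests the file is opened with the Windows default encoding, while GloVe files are UTF-8. A sketch of the likely fix (my assumption about how load_word_emb opens the file):

    # Pass the encoding explicitly so Windows does not fall back to cp1252.
    file_name = 'glove/glove.42B.300d.txt'
    with open(file_name, encoding='utf-8') as inf:
        for idx, line in enumerate(inf):
            pass  # parse each embedding line as before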

permission denied 'word2idx.json' ?

When using python extract_vocab.py, there is an error:
error 13: permission denied 'word2idx.json'

When I unzip the glove.XX.zip, there is no file named word2idx.json. Would you please tell me how to deal with such an error?

condition accuracy

Can anyone please assist? I ran the code, but the condition accuracy is not computing; I am currently getting the results below for both SQLNet and Seq2SQL.
best_cond_acc = init_acc[1][2] gives 0.0, and I'm trying to debug, but I can't find where the error is.
Init dev acc_qm: 0.0
breakdown on (agg, sel, where): [0.046875 0.125 0. ]
Thank you
