microsoft / rat-sql

A relation-aware semantic parsing model from English to SQL

Home Page: https://arxiv.org/abs/1911.04942

License: MIT License

Dockerfile 0.42% Jsonnet 3.93% Python 95.65%
program-synthesis semantic-parsing nlp question-answering dbqa nl2sql transformers

rat-sql's Introduction

RAT-SQL

This repository contains code for the ACL 2020 paper "RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers".

If you use RAT-SQL in your work, please cite it as follows:

@inproceedings{rat-sql,
    title = "{RAT-SQL}: Relation-Aware Schema Encoding and Linking for Text-to-{SQL} Parsers",
    author = "Wang, Bailin and Shin, Richard and Liu, Xiaodong and Polozov, Oleksandr and Richardson, Matthew",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    pages = "7567--7578"
}

Changelog

2020-08-14:

  • The Docker image now inherits from a CUDA-enabled base image.
  • Clarified memory and dataset requirements on the image.
  • Fixed the issue where token IDs were not converted to word-piece IDs for BERT value linking.

Usage

Step 1: Download third-party datasets & dependencies

Download the datasets: Spider and WikiSQL. For Spider, make sure to download the 08/03/2020 version or newer. Unpack the datasets somewhere outside this project to create the following directory structure:

/path/to/data
├── spider
│   ├── database
│   │   └── ...
│   ├── dev.json
│   ├── dev_gold.sql
│   ├── tables.json
│   ├── train_gold.sql
│   ├── train_others.json
│   └── train_spider.json
└── wikisql
    ├── dev.db
    ├── dev.jsonl
    ├── dev.tables.jsonl
    ├── test.db
    ├── test.jsonl
    ├── test.tables.jsonl
    ├── train.db
    ├── train.jsonl
    └── train.tables.jsonl

To work with the WikiSQL dataset, clone its evaluation scripts into this project:

mkdir -p third_party
git clone https://github.com/salesforce/WikiSQL third_party/wikisql

Step 2: Build and run the Docker image

We have provided a Dockerfile that sets up the entire environment for you. It assumes that you mount the datasets downloaded in Step 1 as a volume at /mnt/data in the running container. Thus, the environment setup for RAT-SQL is:

docker build -t ratsql .
docker run --rm -m4g -v /path/to/data:/mnt/data -it ratsql

Note that the image requires at least 4 GB of RAM to run preprocessing. By default, Docker Desktop for Mac and Docker Desktop for Windows run containers with 2 GB of RAM. The -m4g switch overrides it; alternatively, you can increase the default limit in the Docker Desktop settings.

If you prefer to set up and run the codebase without Docker, follow the steps in Dockerfile one by one. Note that this repository requires Python 3.7 or higher and a JVM to run Stanford CoreNLP.
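
For the CoreNLP dependency specifically, the path and URL the code expects can be read off the error message quoted in the issues below; roughly (a sketch, not an official setup script):

mkdir -p third_party
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip
unzip stanford-corenlp-full-2018-10-05.zip -d third_party/

so that CoreNLP ends up at third_party/stanford-corenlp-full-2018-10-05.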

Step 3: Run the experiments

Every experiment has its own config file in experiments. The pipeline of working with any model version or dataset is:

python run.py preprocess experiment_config_file  # Step 3a: preprocess the data
python run.py train experiment_config_file       # Step 3b: train a model
python run.py eval experiment_config_file        # Step 3c: evaluate the results

Use the following experiment config files to reproduce our results:

  • Spider, GloVe version: experiments/spider-glove-run.jsonnet
  • Spider, BERT version (requires a GPU with at least 16 GB of memory): experiments/spider-bert-run.jsonnet
  • WikiSQL, GloVe version: experiments/wikisql-glove-run.jsonnet
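
For example, a full run of the Spider GloVe configuration is:

python run.py preprocess experiments/spider-glove-run.jsonnet
python run.py train experiments/spider-glove-run.jsonnet
python run.py eval experiments/spider-glove-run.jsonnet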

The exact model accuracy may vary by ±2% depending on the random seed; see the paper for details.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

rat-sql's People

Contributors

alexpolozov, berlino, crafterkolyan, microsoftopensource, soyoscarrh-microsoft


rat-sql's Issues

AttributeError: 'CoreNLP' object has no attribute 'client'

Hi, I built the environment without Docker and want to ask which version of "corenlp" I should install. In "Using CoreNLP within other programming languages and packages" I can't find the proper "corenlp" package. I would appreciate it if somebody could help me out with this problem.

WARNING <class 'ratsql.models.enc_dec.EncDecModel.Preproc'>: superfluous {'name': 'EncDec'}
DB connections: 100%|████████████████████████| 166/166 [00:00<00:00, 205.93it/s]
train section: 0%| | 0/8659 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/yuhai/workspace/nl2sql/rat-sql/run.py", line 109, in <module>
    main()
  File "/home/yuhai/workspace/nl2sql/rat-sql/run.py", line 73, in main
    preprocess.main(preprocess_config)
  File "/data/yuhai/nl2sql/rat-sql/ratsql/commands/preprocess.py", line 53, in main
    preprocessor.preprocess()
  File "/data/yuhai/nl2sql/rat-sql/ratsql/commands/preprocess.py", line 34, in preprocess
    self.model_preproc.add_item(item, section, validation_info)
  File "/data/yuhai/nl2sql/rat-sql/ratsql/models/enc_dec.py", line 43, in add_item
    self.enc_preproc.add_item(item, section, enc_info)
  File "/data/yuhai/nl2sql/rat-sql/ratsql/models/spider/spider_enc.py", line 168, in add_item
    preprocessed = self.preprocess_item(item, validation_info)
  File "/data/yuhai/nl2sql/rat-sql/ratsql/models/spider/spider_enc.py", line 193, in preprocess_item
    question, question_for_copying = self._tokenize_for_copying(item.text, item.orig['question'])
  File "/data/yuhai/nl2sql/rat-sql/ratsql/models/spider/spider_enc.py", line 239, in _tokenize_for_copying
    return self.word_emb.tokenize_for_copying(unsplit)
  File "/data/yuhai/nl2sql/rat-sql/ratsql/resources/pretrained_embeddings.py", line 67, in tokenize_for_copying
    ann = corenlp.annotate(text, self.corenlp_annotators)
  File "/data/yuhai/nl2sql/rat-sql/ratsql/resources/corenlp.py", line 44, in annotate
    _singleton = CoreNLP()
  File "/data/yuhai/nl2sql/rat-sql/ratsql/resources/corenlp.py", line 20, in __init__
    self.client = corenlp.CoreNLPClient()
AttributeError: module 'corenlp' has no attribute 'CoreNLPClient'
Exception ignored in: <function CoreNLP.__del__ at 0x7fa50c979680>
Traceback (most recent call last):
  File "/data/yuhai/nl2sql/rat-sql/ratsql/resources/corenlp.py", line 23, in __del__
    self.client.stop()
AttributeError: 'CoreNLP' object has no attribute 'client'

Process finished with exit code 1
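
A note for readers with the same problem: as far as I can tell, the code expects the corenlp module installed by the stanford-corenlp pip package (which is what provides corenlp.CoreNLPClient); a different package that also installs a corenlp module produces exactly this AttributeError. A quick sanity check (a sketch):

import corenlp

print(corenlp.__file__)                   # shows which installed package the module comes from
print(hasattr(corenlp, 'CoreNLPClient'))  # should print True with the right package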

embedding of the edges

In the paper, you only say that each edge is a learned embedding for its relation. Can you explain in detail how the embedding is obtained? GloVe, a BiLSTM, or the similarity of the two nodes? Thank you.
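
For readers with the same question: the relation embedding is neither GloVe nor a BiLSTM output nor a node similarity; each discrete relation type indexes a plain learned embedding table, trained end to end, that biases the attention scores and values (relation-aware self-attention, as described in the paper). A minimal single-head PyTorch sketch of that mechanism, as an illustration of the idea only, not the repository's code:

import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationAwareAttention(nn.Module):
    def __init__(self, hidden, num_relation_types):
        super().__init__()
        self.q = nn.Linear(hidden, hidden)
        self.k = nn.Linear(hidden, hidden)
        self.v = nn.Linear(hidden, hidden)
        # The "edge embeddings": one learned vector per relation type,
        # one table biasing the keys and one biasing the values.
        self.rel_k = nn.Embedding(num_relation_types, hidden)
        self.rel_v = nn.Embedding(num_relation_types, hidden)
        self.scale = hidden ** 0.5

    def forward(self, x, relations):
        # x: (n, hidden) node states; relations: (n, n) long tensor of relation ids
        q, k, v = self.q(x), self.k(x), self.v(x)
        rk = self.rel_k(relations)   # (n, n, hidden)
        rv = self.rel_v(relations)   # (n, n, hidden)
        # score_ij = q_i . (k_j + rk_ij) / sqrt(d)
        scores = (q.unsqueeze(1) * (k.unsqueeze(0) + rk)).sum(-1) / self.scale
        attn = F.softmax(scores, dim=-1)
        # z_i = sum_j attn_ij * (v_j + rv_ij)
        return (attn.unsqueeze(-1) * (v.unsqueeze(0) + rv)).sum(1)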

There seems to be a bug in run.py on eval_output_path

In run.py line 102,
eval.main(eval_config)
the eval_output_path is replaced inside eval.main (eval.py line 25):
args.output.replace('__LOGDIR__', real_logdir)

However, in run.py line 104, res_json = json.load(open(eval_output_path)), the eval_output_path is not replaced.

I got a FileNotFoundError during eval on line 104 and I wonder if this is the reason.

Thanks!
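
For anyone hitting the same FileNotFoundError: a minimal sketch of the fix the report implies, assuming real_logdir is in scope in run.py at that point (names follow the issue, unverified against the file):

# run.py, around the quoted line 104 (sketch of the suggested fix):
eval_output_path = args.output.replace('__LOGDIR__', real_logdir)
res_json = json.load(open(eval_output_path))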

Bert-large model not attaining ~65% accuracy even after training till 52k timesteps!

We are using a P100 and 25 GB of RAM to train the BERT-large model.
When we ran the default code with bs=6 and num_batch_accumulated=4, we got a CUDA out-of-memory error.
We therefore changed it to bs=2 and num_batch_accumulated=8, since you said anything between 16 and 24 would perform similarly (the effective batch size is bs × num_batch_accumulated: 24 for the default, 16 for our setting).
But now, after training to 52,000 timesteps, the maximum accuracy we got is ~59.6%, at the 44,000th timestep.
Is it taking more time because we changed the batch size? Or is there anything else we are missing?

RESULT at 48000 and 52000 timestep:

Loading model from logdir/bert_run/bs=2,lr=7.4e-04,bert_lr=3.0e-06,end_lr=0e0,att=1/model_checkpoint-00048000
DB connections: 100% 166/166 [02:31<00:00, 1.10it/s]
100% 1034/1034 [05:45<00:00, 2.99it/s]
DB connections: 100% 166/166 [00:00<00:00, 448.81it/s]
Wrote eval results to logdir/bert_run/bs=2,lr=7.4e-04,bert_lr=3.0e-06,end_lr=0e0,att=1/ie_dirs/bert_run_true_1-step48000.eval
48000 0.5638297872340425

Loading model from logdir/bert_run/bs=2,lr=7.4e-04,bert_lr=3.0e-06,end_lr=0e0,att=1/model_checkpoint-00052000
DB connections: 100% 166/166 [00:00<00:00, 443.91it/s]
100% 1034/1034 [05:31<00:00, 3.12it/s]
DB connections: 100% 166/166 [00:00<00:00, 467.06it/s]
Wrote eval results to logdir/bert_run/bs=2,lr=7.4e-04,bert_lr=3.0e-06,end_lr=0e0,att=1/ie_dirs/bert_run_true_1-step52000.eval
52000 0.586073500967118

Error during eval

Hi,
I'm trying to run your model, but during eval I'm getting the following error:

WARNING <class 'ratsql.models.enc_dec.EncDecModel.Preproc'>: superfluous {'name': 'EncDec'}
WARNING <class 'ratsql.models.enc_dec.EncDecModel'>: superfluous {'decoder_preproc': {'grammar': {'clause_order': None, 'end_with_from': True, 'factorize_sketch': 2, 'include_literals': False, 'infer_from_conditions': True, 'name': 'spider', 'output_from': True, 'use_table_pointer': True}, 'max_count': 5000, 'min_freq': 4, 'save_path': '/data/spider/nl2code-glove,cv_link=true', 'use_seq_elem_rules': True}, 'encoder_preproc': {'compute_cv_link': True, 'compute_sc_link': True, 'count_tokens_in_word_emb_for_vocab': True, 'db_path': '/data/spider/database', 'fix_issue_16_primary_keys': True, 'include_table_name_in_column': False, 'max_count': 5000, 'min_freq': 4, 'save_path': '/data/spider/nl2code-glove,cv_link=true', 'word_emb': {'kind': '42B', 'lemmatize': True, 'name': 'glove'}}}
Loading model from logdir/glove_run/bs=20,lr=7.4e-04,end_lr=0e0,att=0/model_checkpoint-00030100
DB connections: 100%|███████████████████████████████████████████████████| 166/166 [00:02<00:00, 56.41it/s]
Traceback (most recent call last):
  File "run.py", line 109, in <module>
    main()
  File "run.py", line 91, in main
    infer.main(infer_config)
  File "/root/rat-sql/ratsql/commands/infer.py", line 163, in main
    inferer.infer(model, output_path, args)
  File "/root/rat-sql/ratsql/commands/infer.py", line 69, in infer
    assert len(orig_data) == len(preproc_data)
AssertionError

orig_data is a SpiderDataset, len: 2

preproc_data is a ZippedDataset, len: 1034

Even 16GB isn't enough???

Hi,
I tried training the model on a P5000 and a V100, each with 16 GB of memory, and still got this error:
after 100 steps, it goes out of memory with the current config.

[2020-07-26T09:46:17] Logging to logdir/bert_run/bs=6,lr=7.4e-04,bert_lr=3.0e-06,end_lr=0e0,att=1
Loading model from logdir/bert_run/bs=6,lr=7.4e-04,bert_lr=3.0e-06,end_lr=0e0,att=1/model_checkpoint
[2020-07-26T09:46:46] Step 100 stats, train: loss = 157.97793579101562
[2020-07-26T09:46:54] Step 100 stats, val: loss = 187.46903228759766
[2020-07-26T09:47:08] Step 100: loss=180.5266
Traceback (most recent call last):
  File "run.py", line 109, in <module>
    main()
  File "run.py", line 77, in main
    train.main(train_config)
  File "/notebooks/rat-sql/ratsql/commands/train.py", line 274, in main
    trainer.train(config, modeldir=args.logdir)
  File "/notebooks/rat-sql/ratsql/commands/train.py", line 192, in train
    norm_loss.backward()
  File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 118, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.90 GiB total capacity; 14.80 GiB already allocated; 3.50 MiB free; 533.26 MiB cached)

preprocess.main(preprocess_config)

The problem is as follows:
WARNING <class 'ratsql.models.enc_dec.EncDecModel.Preproc'>: superfluous {'name': 'EncDec'}
Traceback (most recent call last):
  File "/home/pl/rat-sql-master/run.py", line 109, in <module>
    main()
  File "/home/pl/rat-sql-master/run.py", line 73, in main
    preprocess.main(preprocess_config)
  File "/home/pl/rat-sql-master/ratsql/commands/preprocess.py", line 53, in main
    preprocessor.preprocess()
  File "/home/pl/rat-sql-master/ratsql/commands/preprocess.py", line 30, in preprocess
    data = registry.construct('dataset', self.config['data'][section])
  File "/home/pl/rat-sql-master/ratsql/utils/registry.py", line 33, in construct
    lookup(kind, config),
  File "/home/pl/rat-sql-master/ratsql/utils/registry.py", line 28, in lookup
    return _REGISTRY[kind][name]
KeyError: 'spider'

I think the problem may be caused by this call in the registry.py file: signature = inspect.signature(callable)

import torch

class SpiderDataset(torch.utils.data.Dataset):
    def __init__(self, paths, tables_paths, db_path, demo_path=None, limit=None):
        pass

if __name__ == '__main__':
    import inspect
    b = inspect.signature(SpiderDataset)
    for i, j in b.parameters.items():
        print(i, j, j.kind)

This prints:

args *args VAR_POSITIONAL
kwds **kwds VAR_KEYWORD

I think this should print the paths, tables_paths, and db_path arguments, but instead it sees *args and **kwds. I don't know how to fix it.

Asking for Codalab worksheets

Hi,

Can someone please share their Codalab worksheets with me, so I can see how it works and how to evaluate officially on the Spider challenge?

Thanks.

preprocessing issue

Previous related issue
#21

My command line output:

DB connections: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 166/166 [00:00<00:00, 297.03it/s]
train section:   1%|█▏                                                                                                      | 99/8659 [00:02<03:46, 37.76it/s]
100 sample done at c= 100
train section:   1%|█▏                                                                                                      | 99/8659 [00:02<04:07, 34.59it/s]
DB connections: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 166/166 [00:00<00:00, 281.66it/s]
val section:   9%|█████████▉                                                                                                | 97/1034 [00:02<00:28, 32.90it/s]
100 sample done at c= 200
val section:   9%|█████████▉                                                                                                | 97/1034 [00:02<00:22, 41.70it/s]
87 words in vocab
Exception ignored in: <function CoreNLP.__del__ at 0x7efde4998560>
Traceback (most recent call last):
  File "/app/ratsql/resources/corenlp.py", line 24, in __del__
  File "/root/.local/lib/python3.7/site-packages/corenlp/client.py", line 83, in stop
  File "/opt/conda/lib/python3.7/subprocess.py", line 1790, in kill
AttributeError: 'NoneType' object has no attribute 'SIGKILL'

I have also tried increasing Docker memory to 8 GB.
Any suggestions?

can't build dockerfile

When I use Docker to build the Dockerfile, this error always occurs:

(screenshot of the error omitted)

Can someone help me resolve this problem? Thanks.

Asking for pre-trained models!

Hi! Hope everybody is doing well.
Is it possible for you to share the pre-trained models?
Thank you for your consideration!

build Dockerfile causes segmentation fault

Hi, while trying to build the Dockerfile using docker build -t ratsql ., the process keeps failing due to a segmentation fault while trying to download BERT (step 7).
Has anyone else encountered this issue?

Thanks!

Step 7/14 : RUN python -c "from transformers import BertModel; BertModel.from_pretrained('bert-large-uncased-whole-word-masking')"
 ---> Running in 43837f926306
To use data.metrics please install scikit-learn. See https://scikit-learn.org/stable/index.html
Segmentation fault (core dumped)
The command '/bin/sh -c python -c "from transformers import BertModel; BertModel.from_pretrained('bert-large-uncased-whole-word-masking')"' returned a non-zero code: 139

add() argument 'other' must be tensor, not tuple

when I run:
def debug(self, model, sliced_data, output):
    for i, item in enumerate(tqdm.tqdm(sliced_data)):
        (_, history), = model.compute_loss([item], debug=True)
        output.write(
            json.dumps({
                'index': i,
                'history': history,
            }) + '\n')
        output.flush()

def compute_loss(self, enc_input, example, desc_enc, debug):
    if not (self.enumerate_order and self.training):
        mle_loss = self.compute_mle_loss(enc_input, example, desc_enc, debug)
    else:
        mle_loss = self.compute_loss_from_all_ordering(enc_input, example, desc_enc, debug)

    if self.use_align_loss:
        align_loss = self.compute_align_loss(desc_enc, example)
        return mle_loss + align_loss  # <- the line where the error occurs
    return mle_loss

This error appeared. I don't know how to fix it.
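
A hedged guess at the cause, reading the snippets above: with debug=True, compute_mle_loss appears to return a (loss, history) tuple, so mle_loss + align_loss adds a tuple to a tensor. A sketch of a guard for the tail of compute_loss (names follow the snippet; not verified against the repository):

    if self.use_align_loss:
        align_loss = self.compute_align_loss(desc_enc, example)
        if isinstance(mle_loss, tuple):  # debug=True path: (loss, history)
            loss, history = mle_loss
            return loss + align_loss, history
        return mle_loss + align_loss
    return mle_loss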

Error in RATSQL BERT

When I train the model using BERT, it runs until the 80,990th step. But when I run the eval command, I run into an exception: 'Attempting to infer on untrained model'. Kindly help me resolve this issue; any help will be really appreciated.

Docker error

I successfully created the Docker image the first time, but after deleting the image and building it again, it got stuck at

Step 7/14 : RUN python -c "from transformers import BertModel; BertModel.from_pretrained('bert-large-uncased-whole-word-masking')"
 ---> Running in deca10b47ffe
To use data.metrics please install scikit-learn. See https://scikit-learn.org/stable/index.html

and after some time, it gives the error below:

OSError: Couldn't reach server at 'https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-whole-word-masking-pytorch_model.bin' to download pretrained weights.

Can someone suggest how to solve this issue?

Edit

To remove the Docker image I used the command below:
sudo docker image prune -a

After about 25,700 steps, the loss value suddenly gets larger and larger. The loss is still very large now. Is this normal?

[2020-07-12T13:57:41] Logging to logdir/bert_run/bs=8,lr=7.4e-04,bert_lr=3.0e-06,end_lr=0e0,att=1

[2020-07-18T04:40:38] Step 25500 stats, train: loss = 0.04672800004482269
[2020-07-18T04:40:52] Step 25500 stats, val: loss = 5.44137978553772

[2020-07-18T05:30:57] Step 25600 stats, train: loss = 0.010738465120084584
[2020-07-18T05:31:12] Step 25600 stats, val: loss = 5.063877463340759

[2020-07-18T06:20:57] Step 25700 stats, train: loss = 0.05373691872227937
[2020-07-18T06:21:09] Step 25700 stats, val: loss = 5.3940101861953735

[2020-07-18T06:49:23] Step 25800 stats, train: loss = 7.563784122467041
[2020-07-18T06:49:31] Step 25800 stats, val: loss = 10.999245643615723

[2020-07-18T07:21:47] Step 25900 stats, train: loss = 12.75868844985962
[2020-07-18T07:22:03] Step 25900 stats, val: loss = 16.12211561203003

[2020-07-18T08:15:55] Step 26000 stats, train: loss = 8.206974983215332
[2020-07-18T08:16:11] Step 26000 stats, val: loss = 12.26950979232788

[2020-07-18T09:09:56] Step 26100 stats, train: loss = 78.75397872924805
[2020-07-18T09:10:12] Step 26100 stats, val: loss = 94.27817153930664
......
[2020-07-20T04:33:23] Step 32000 stats, train: loss = 100.70531845092773
[2020-07-20T04:33:39] Step 32000 stats, val: loss = 119.73884582519531

[2020-07-20T05:24:20] Step 32100 stats, train: loss = 97.69664764404297
[2020-07-20T05:24:34] Step 32100 stats, val: loss = 117.05315780639648

[2020-07-20T06:20:09] Step 32200 stats, train: loss = 104.20828628540039
[2020-07-20T06:20:25] Step 32200 stats, val: loss = 123.8116683959961

assert next_choices is not None ERROR, preceded by a SIGKILL error which was apparently fixed

After training with the command:

python run.py train experiments/wikisql-glove-run.jsonnet

and getting through 3990 epochs:

[2020-11-19T18:51:02] Step 39990: loss=0.8703

I tried next step:

python run.py eval experiments/wikisql-glove-run.jsonnet

but I got the following error:

Loading model from logdir/glove_run/model_checkpoint-00030100
0%| | 0/8421 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "run.py", line 109, in <module>
    main()
  File "run.py", line 91, in main
    infer.main(infer_config)
  File "/app/ratsql/commands/infer.py", line 163, in main
    inferer.infer(model, output_path, args)
  File "/app/ratsql/commands/infer.py", line 71, in infer
    output, args.use_heuristic)
  File "/app/ratsql/commands/infer.py", line 86, in _inner_infer
    decoded = self._infer_one(model, orig_item, preproc_item, beam_size, output_history, use_heuristic)
  File "/app/ratsql/commands/infer.py", line 98, in _infer_one
    model, data_item, preproc_item, beam_size=beam_size, max_steps=1000, from_cond=False)
  File "/app/ratsql/models/spider/spider_beam_search.py", line 59, in beam_search_with_heuristics
    assert next_choices is not None
AssertionError

In logdir/glove_run I have:

drwxr-xr-x. 2 root root 54 Nov 20 10:34 ie_dirs
lrwxrwxrwx. 1 root root 25 Nov 19 18:51 model_checkpoint -> model_checkpoint-00040000
-rw-r--r--. 1 root root 142281149 Nov 19 18:51 model_checkpoint-00040000
-rw-r--r--. 1 root root 240073 Nov 19 18:51 log.txt
-rw-r--r--. 1 root root 142281149 Nov 19 18:10 model_checkpoint-00039100
-rw-r--r--. 1 root root 142281149 Nov 19 17:24 model_checkpoint-00038100
-rw-r--r--. 1 root root 142281149 Nov 19 16:38 model_checkpoint-00037100

and so on.

Please advise: what went wrong?

Previously, training threw an error related to the fact that SIGKILL was not recognized. Following a fix from the internet, I replaced it with SIGTERM and a conditional to check whether the object has the method.

Multi GPU training?

Hi

I have started experimenting with training this model, and it seems it can make use of a large batch size, but even then the training times are quite long.

Are there any plans to make this project multi-GPU using DataParallel or DistributedDataParallel?

Sam
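
For anyone experimenting with this in the meantime, the stock single-node wrapper is torch.nn.DataParallel; a minimal sketch (untested against this codebase, and RAT-SQL's custom batch objects may not split along dim 0, so treat it only as a starting point):

import torch
import torch.nn as nn

def wrap_for_multi_gpu(model: nn.Module) -> nn.Module:
    # Replicates the module across all visible GPUs and splits each
    # input batch along dim 0 at forward time.
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)
    return model.to('cuda' if torch.cuda.is_available() else 'cpu')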

Problem with training the model with BERT

Hi,

I trained the model using GloVe and it works fine.

But when I try to train the model with BERT (python run.py train experiments/spider-bert-run.jsonnet), the process is killed and shows me only this line: To use data.metrics please install scikit-learn. See https://scikit-learn.org/stable/index.html

What's the problem?

Note that I have a GPU with its driver installed.

root@XXXXXXXXXX:/app# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

But when I check whether CUDA is available (in the container), it isn't!

root@XXXXXXXXX:/app# python
Python 3.7.7 (default, Mar 23 2020, 22:36:06)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.

import torch
print(torch.cuda.is_available())
False
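
One thing worth checking: the docker run command in the README passes no GPU flags, so CUDA is invisible inside the container. With the NVIDIA Container Toolkit installed on the host, adding --gpus all should expose the GPU (a suggestion, not an instruction from the authors):

docker run --rm --gpus all -m4g -v /path/to/data:/mnt/data -it ratsql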

Error during training (killed in step0)

Hi there,

I'm trying to run your model with BERT, but during training the process is killed right at Step 0:

root@7e709b6634de:/app# python run.py train experiments/spider-bert-run.jsonnet
To use data.metrics please install scikit-learn. See https://scikit-learn.org/st able/index.html
[2020-10-21T08:05:13] Logging to logdir/bert_run/bs=6,lr=7.4e-04,bert_lr=3.0e-06 ,end_lr=0e0,att=1
[2020-10-21T08:06:48] Step 0 stats, train: loss = 161.07755279541016
[2020-10-21T08:07:34] Step 0 stats, val: loss = 195.53395080566406
[2020-10-21T08:08:48] Step 0: loss=186.9427
Killed

@alexpolozov
@berlino
@DevanshChoubey
@MuriloSchaefer

how to conduct oracle experiment

Hi, in your paper you say "For “oracle sketch”, at every grammar nonterminal the decoder is forced to choose the correct production so the final SQL sketch exactly matches that of the ground truth".

I don't understand how the decoder is forced, and I want to ask how to conduct the oracle experiment. I would appreciate it if you could explain it in detail. Thanks a lot~
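
One way to read that sentence (a sketch of the idea, not the authors' code): during inference, whenever the decoder scores the candidate productions at a grammar nonterminal, mask out every production except the one the ground-truth AST applies at that node, while leaving all other choices to the model:

import torch

def force_gold_production(logits: torch.Tensor, gold_idx: int) -> torch.Tensor:
    # logits: scores over the candidate productions at one nonterminal.
    # "Oracle sketch": only the ground-truth production stays selectable,
    # so the decoded SQL sketch matches the gold sketch exactly.
    mask = torch.full_like(logits, float('-inf'))
    mask[gold_idx] = 0.0
    return logits + mask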

Issue during preprocessing: 'CoreNLP' object has no attribute 'client'

Facing this during preprocessing.
Command: python run.py preprocess experiments/spider-glove-run.jsonnet.
Someone, please help.

DB connections: 100%|████████████████████████| 166/166 [00:00<00:00, 326.78it/s]
train section:   0%|                                   | 0/8659 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "run.py", line 109, in <module>
    main()
  File "run.py", line 73, in main
    preprocess.main(preprocess_config)
  File "/app/ratsql/commands/preprocess.py", line 53, in main
    preprocessor.preprocess()
  File "/app/ratsql/commands/preprocess.py", line 34, in preprocess
    self.model_preproc.add_item(item, section, validation_info)
  File "/app/ratsql/models/enc_dec.py", line 43, in add_item
    self.enc_preproc.add_item(item, section, enc_info)
  File "/app/ratsql/models/spider/spider_enc.py", line 168, in add_item
    preprocessed = self.preprocess_item(item, validation_info)
  File "/app/ratsql/models/spider/spider_enc.py", line 193, in preprocess_item
    question, question_for_copying = self._tokenize_for_copying(item.text, item.orig['question'])
  File "/app/ratsql/models/spider/spider_enc.py", line 239, in _tokenize_for_copying
    return self.word_emb.tokenize_for_copying(unsplit)
  File "/app/ratsql/resources/pretrained_embeddings.py", line 67, in tokenize_for_copying
    ann = corenlp.annotate(text, self.corenlp_annotators)
  File "/app/ratsql/resources/corenlp.py", line 45, in annotate
    _singleton = CoreNLP()
  File "/app/ratsql/resources/corenlp.py", line 20, in __init__
    Landing page: https://stanfordnlp.github.io/CoreNLP/''')
Exception: Please install Stanford CoreNLP and put it at /app/third_party/stanford-corenlp-full-2018-10-05.

                Direct URL: http://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip
                Landing page: https://stanfordnlp.github.io/CoreNLP/
Exception ignored in: <function CoreNLP.__del__ at 0x7f87b0e83cb0>
Traceback (most recent call last):
  File "/app/ratsql/resources/corenlp.py", line 24, in __del__
    self.client.stop()
AttributeError: 'CoreNLP' object has no attribute 'client'

Loss starts to increase during BERT model training

Hi, I'm trying to reproduce your results with the BERT model. After ~14,000 training steps, the loss started to increase. I tried rerunning, but it didn't help. Have you faced this problem? This situation looks similar to #3 and #7.

Log:

[2020-07-28T15:25:54] Logging to logdir/bert_run/bs=6,lr=7.4e-04,bert_lr=3.0e-06,end_lr=0e0,att=1
...
[2020-07-29T13:59:21] Step 14100 stats, train: loss = 1.1323808431625366
[2020-07-29T13:59:27] Step 14100 stats, val: loss = 3.3228100538253784
...
[2020-07-29T14:08:51] Step 14200 stats, train: loss = 0.9168887138366699
[2020-07-29T14:08:57] Step 14200 stats, val: loss = 3.5443124771118164
...
[2020-07-29T14:18:30] Step 14300 stats, train: loss = 2.303567111492157
[2020-07-29T14:18:37] Step 14300 stats, val: loss = 4.652050733566284
...
[2020-07-29T14:28:01] Step 14400 stats, train: loss = 95.80101776123047
[2020-07-29T14:28:08] Step 14400 stats, val: loss = 112.55300903320312

Do the question ids need converting after computing cv_link in BERT mode?

It's a nice idea to encode the question and schema using RAT, and thanks for sharing the well-designed code.

I have a doubt about cell-value linking in BERT mode from reading your code.

In ratsql/models/spider/spider_enc.py, SpiderEncoderBertPreproc.preprocess_item (lines 670 to 683), I see that you compute the sc link and then the cv link.

In the sc-linking procedure you call Bertokenize.bert_schema_linking, which uses normalized_pieces to compute the schema linking and converts question (token) ids back to piece ids using idx_map.

But in the cv-linking procedure you only link question_bert_tokens.normalized_pieces with item.schema, without converting the ids back.

Is that a bug? I'm hoping for your response.
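
If I read the code the same way, the missing step would be applying the same idx_map remapping to the cv link, which is what the 2020-08-14 changelog entry above ("token IDs were not converted to word-piece IDs for BERT value linking") appears to describe. A schematic sketch, assuming the link dicts are keyed by 'question_id,column_id' strings as in the sc-link code:

def remap_question_ids(link: dict, idx_map: dict) -> dict:
    # Convert token-level question indices in a linking dict to word-piece
    # indices, mirroring what bert_schema_linking does for the schema link.
    remapped = {}
    for match_type, matches in link.items():
        remapped[match_type] = {}
        for key, tag in matches.items():
            q_id, col_id = key.split(',')
            remapped[match_type][f'{idx_map[int(q_id)]},{col_id}'] = tag
    return remapped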

Unexpected error while running eval script

Hi,

We encountered an unexpected error while running the eval command. Has anyone met the same issue?

Command

python run.py eval experiments/spider-bert-run.jsonnet

And here is the detail error stack.

Traceback (most recent call last):
  File "run.py", line 109, in <module>
    main()
  File "run.py", line 104, in main
    res_json = json.load(open(eval_output_path))
FileNotFoundError: [Errno 2] No such file or directory: '__LOGDIR__/ie_dirs/bert_run_true_1-step30100.eval'

We are using the latest code; the commit hash is f2e0033.

Could not enable GPU for Bert based model

I am training the BERT-based RAT-SQL model in Google Colab Pro. It takes roughly 55 minutes for 10 steps. After training started, a warning popped up in the Colab notebook: "WARNING: you are connected to a GPU runtime but not utilizing the GPU". After debugging the code, it shows that the GPU is not enabled for BERT training. What can be done to enable CUDA? And please tell me what a good environment setup for training the BERT-based model is.

Running Error: Segmentation fault (core dumped)

Thanks for your work, but when I ran the program I hit some errors.
When I execute
python run.py train experiments/spider-bert-run.jsonnet
I get a Segmentation fault (core dumped).
I found that it happens when the code reaches rat-sql/ratsql/utils/random_state.py,
line 12: self.torch_cpu_state = torch.get_rng_state()

I use gdb to debug, the log is shown below:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fa00000561d in ?? ()
(gdb) where
#0  0x00007fa00000561d in ?? ()
#1  0x00007fa069add1d9 in c10::detail::LogAPIUsageFakeReturn(std::string const&) () from /app/miniconda3_docker/envs/ratsql/lib/python3.7/site-packages/torch/lib/libc10.so
#2  0x00007fa069ace17d in c10::TensorImpl::TensorImpl(c10::Storage&&, c10::TensorTypeSet, caffe2::TypeMeta const&, c10::optional<c10::Device>) () from /app/miniconda3_docker/envs/ratsql/lib/python3.7/site-packages/torch/lib/libc10.so
#3  0x00007fa069acec2e in c10::TensorImpl::TensorImpl(c10::Storage&&, c10::TensorTypeSet) () from /app/miniconda3_docker/envs/ratsql/lib/python3.7/site-packages/torch/lib/libc10.so
#4  0x00007fa06bae0ff7 in at::Tensor at::detail::make_tensor<c10::TensorImpl, c10::intrusive_ptr<c10::StorageImpl, c10::detail::intrusive_target_default_null_type<c10::StorageImpl> >, c10::TensorTypeId>(c10::intrusive_ptr<c10::StorageImpl, c10::detail::intrusive_target_default_null_type<c10::StorageImpl> >&&, c10::TensorTypeId&&) () from /app/miniconda3_docker/envs/ratsql/lib/python3.7/site-packages/torch/lib/libtorch.so
#5  0x00007fa06bad2cf8 in at::native::empty_cpu(c10::ArrayRef<long>, c10::TensorOptions const&, c10::optional<c10::MemoryFormat>) () from /app/miniconda3_docker/envs/ratsql/lib/python3.7/site-packages/torch/lib/libtorch.so
#6  0x00007fa06bcb76fb in at::CPUType::(anonymous namespace)::empty(c10::ArrayRef<long>, c10::TensorOptions const&, c10::optional<c10::MemoryFormat>) ()
   from /app/miniconda3_docker/envs/ratsql/lib/python3.7/site-packages/torch/lib/libtorch.so
#7  0x00007fa0b1b37592 in torch::empty(c10::ArrayRef<long>, c10::TensorOptions const&, c10::optional<c10::MemoryFormat>) () from /app/miniconda3_docker/envs/ratsql/lib/python3.7/site-packages/torch/lib/libtorch_python.so
#8  0x00007fa0b1b973b6 in THPGenerator_getState(THPGenerator*, _object*) () from /app/miniconda3_docker/envs/ratsql/lib/python3.7/site-packages/torch/lib/libtorch_python.so
#9  0x0000561d32553ea1 in _PyMethodDef_RawFastCallKeywords ()
#10 0x0000561d3255414f in _PyMethodDescr_FastCallKeywords ()
#11 0x0000561d325affa9 in _PyEval_EvalFrameDefault ()
#12 0x0000561d3255307b in _PyFunction_FastCallKeywords ()
#13 0x0000561d325afe6e in _PyEval_EvalFrameDefault ()
#14 0x0000561d324f206b in _PyFunction_FastCallDict ()
#15 0x0000561d32508a03 in _PyObject_Call_Prepend ()
#16 0x0000561d3254baaa in slot_tp_init ()
#17 0x0000561d32554298 in _PyObject_FastCallKeywords ()
#18 0x0000561d325aff56 in _PyEval_EvalFrameDefault ()
#19 0x0000561d324f1059 in _PyEval_EvalCodeWithName ()
#20 0x0000561d324f2134 in _PyFunction_FastCallDict ()
#21 0x0000561d32508a03 in _PyObject_Call_Prepend ()
#22 0x0000561d3254baaa in slot_tp_init ()
#23 0x0000561d32554298 in _PyObject_FastCallKeywords ()
#24 0x0000561d325b06b2 in _PyEval_EvalFrameDefault ()
#25 0x0000561d324f206b in _PyFunction_FastCallDict ()
#26 0x0000561d32508a03 in _PyObject_Call_Prepend ()
#27 0x0000561d3254baaa in slot_tp_init ()
#28 0x0000561d32554298 in _PyObject_FastCallKeywords ()
#29 0x0000561d325aff56 in _PyEval_EvalFrameDefault ()
#30 0x0000561d3255307b in _PyFunction_FastCallKeywords ()
#31 0x0000561d325afe6e in _PyEval_EvalFrameDefault ()
#32 0x0000561d3255307b in _PyFunction_FastCallKeywords ()
#33 0x0000561d325aba66 in _PyEval_EvalFrameDefault ()
#34 0x0000561d324f1059 in _PyEval_EvalCodeWithName ()
#35 0x0000561d324f1f24 in PyEval_EvalCodeEx ()
#36 0x0000561d324f1f4c in PyEval_EvalCode ()
#37 0x0000561d3260aa14 in run_mod ()
#38 0x0000561d32613f11 in PyRun_FileExFlags ()
#39 0x0000561d32614104 in PyRun_SimpleFileExFlags ()
#40 0x0000561d32615bbd in pymain_main.constprop ()
#41 0x0000561d32615e30 in _Py_UnixMain ()
#42 0x00007fa1065d9b97 in __libc_start_main (main=0x561d324d1d20 <main>, argc=4, argv=0x7ffeccd7fc68, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffeccd7fc58) at ../csu/libc-start.c:310
#43 0x0000561d325bb052 in _start () at ../sysdeps/x86_64/elf/start.S:103

Thanks.

Loss does not drop

Hi,

I'm trying to run your model but during training the loss does not drop.

Here is part of the loss log.

(qiusi) qiusi@mewtwo:~/rat-sql$ python run.py preprocess experiments/spider-glove-run.jsonnet
WARNING <class 'ratsql.models.enc_dec.EncDecModel.Preproc'>: superfluous {'name': 'EncDec'}
DB connections: 100%|████████████████████████████████████████████| 166/166 [00:01<00:00, 84.90it/s]
train section: 100%|███████████████████████████████████████████| 8659/8659 [07:43<00:00, 18.69it/s]
DB connections: 100%|███████████████████████████████████████████| 166/166 [00:01<00:00, 115.15it/s]
val section: 100%|█████████████████████████████████████████████| 1034/1034 [00:47<00:00, 21.98it/s]

(qiusi) qiusi@mewtwo:~/rat-sql$ python run.py train experiments/spider-glove-run.jsonnet
[2020-07-11T20:53:06] Logging to logdir/glove_run/bs=20,lr=7.4e-04,end_lr=0e0,att=0
Loading model from logdir/glove_run/bs=20,lr=7.4e-04,end_lr=0e0,att=0/model_checkpoint
[2020-07-11T20:55:38] Step 10: loss=178.1385
[2020-07-11T20:57:36] Step 20: loss=185.5836
[2020-07-11T20:59:29] Step 30: loss=165.1267
[2020-07-11T21:01:16] Step 40: loss=175.4593
[2020-07-11T21:03:07] Step 50: loss=191.4991
[2020-07-11T21:05:07] Step 60: loss=198.2391
[2020-07-11T21:07:04] Step 70: loss=204.6865
[2020-07-11T21:08:56] Step 80: loss=156.1282
[2020-07-11T21:10:47] Step 90: loss=158.8194
[2020-07-11T21:12:44] Step 100 stats, train: loss = 159.02596282958984
[2020-07-11T21:13:02] Step 100 stats, val: loss = 188.73794555664062
[2020-07-11T21:13:13] Step 100: loss=195.0599
[2020-07-11T21:15:03] Step 110: loss=166.1000
[2020-07-11T21:16:56] Step 120: loss=160.7225
[2020-07-11T21:18:48] Step 130: loss=172.8267
[2020-07-11T21:20:38] Step 140: loss=200.5286
[2020-07-11T21:22:29] Step 150: loss=194.8727
[2020-07-11T21:24:21] Step 160: loss=211.9967
[2020-07-11T21:26:17] Step 170: loss=215.9024
[2020-07-11T21:28:10] Step 180: loss=196.7601
[2020-07-11T21:30:00] Step 190: loss=186.0729
[2020-07-11T21:32:02] Step 200 stats, train: loss = 159.02596282958984
[2020-07-11T21:32:19] Step 200 stats, val: loss = 188.73794555664062
[2020-07-11T21:32:29] Step 200: loss=175.3226
[2020-07-11T21:34:22] Step 210: loss=176.8896
[2020-07-11T21:36:17] Step 220: loss=229.3411
[2020-07-11T21:38:08] Step 230: loss=183.6960
[2020-07-11T21:39:59] Step 240: loss=201.4256
[2020-07-11T21:41:50] Step 250: loss=171.4176

The loss remains unchanged across the two eval-on-train reports, as if the model has not been updated.

I did not use Docker but installed the related dependencies.
Could the error be caused by this?
Thank you!!!

I ran the GloVe model and got a max accuracy of only 57.3%

Thanks for sharing the well-designed code.

I have run the GloVe model and got the eval results below. However, all of my checkpoints (after 30,000 steps) are below 57.5% accuracy, which is far from the leaderboard results.

I get warnings about EncDec during training and eval. I was wondering whether this is the reason for my bad results?

Eval results (repeated EncDec "superfluous" warnings, progress bars, and "Loading model from ..." lines trimmed; each pair is step / exact-match accuracy):

30100 0.5319148936170213
31100 0.5531914893617021
32100 0.5483558994197292
33100 0.5502901353965184
34100 0.5493230174081238
35100 0.5493230174081238
36100 0.5667311411992263
37100 0.5676982591876208
38100 0.5725338491295938
39100 0.5609284332688588
40000 0.5657640232108317

RATSQL v2 vs v3?

Hey, I was looking at the leaderboard, and it is not clear what changed between version 2 and version 3.

RATSQL v3 + BERT (DB content used)   69.7 (dev) | 65.6 (test)
RATSQL v2 + BERT (DB content used)   62.7 (dev) | 57.2 (test)

Can you elaborate on what changes were made?
Thank you.

constants in SQL

How are constants selected during AST generation?

initial state of the decoder

Hi, I looked into the code of RAT-SQL. It seems the initial state of the decoder is zero rather than the final encoder state of the question.

Am I understanding this part correctly? Is there a specific reason you don't follow the traditional seq2seq setup, where the decoder's initial state is the encoder's final state over the question?
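For reference, a minimal sketch of the two initialization strategies being contrasted. The class and parameter names here (DecoderInit, enc_hidden_size, and so on) are illustrative only, not RAT-SQL's actual API:

import torch
import torch.nn as nn

# Sketch of the two decoder-initialization strategies discussed above.
class DecoderInit(nn.Module):
    def __init__(self, enc_hidden_size, dec_hidden_size, use_encoder_state=True):
        super().__init__()
        self.use_encoder_state = use_encoder_state
        self.dec_hidden_size = dec_hidden_size
        # Projects the encoder's final state into the decoder's hidden size.
        self.proj = nn.Linear(enc_hidden_size, dec_hidden_size)

    def forward(self, enc_final_state):
        # enc_final_state: (batch, enc_hidden_size), the question's last encoder state.
        if self.use_encoder_state:
            # "Traditional" seq2seq: seed the decoder with the encoded question.
            return torch.tanh(self.proj(enc_final_state))
        # Zero initialization, as the question reports RAT-SQL doing; the decoder
        # then obtains all question/schema context via attention over encoder outputs.
        return enc_final_state.new_zeros(enc_final_state.size(0), self.dec_hidden_size)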

bug: lr_scheduler does not work after restarting from checkpoint

Hi, I think I've found a tricky bug.

The line

last_step = saver.restore(modeldir, map_location=self.device)

when it actually loads from a checkpoint file (this does not happen when starting a fresh training run, since no checkpoint exists yet), breaks the connection between the optimizer and the lr_scheduler. It ends up calling load_state_dict on torch.optim.Optimizer, which in these lines
https://github.com/pytorch/pytorch/blob/ee77ccbb6da4e2efd83673e798acf7081bc03564/torch/optim/optimizer.py#L155-L157
creates new references to the param groups. The same happens in current PyTorch: https://github.com/pytorch/pytorch/blob/ec6de6a697668e594a3f1d49e9a87a7c94b6164b/torch/optim/optimizer.py#L185-L187

This can be fixed by adding lr_scheduler.param_groups = optimizer.param_groups after calling saver.restore, which is not pretty at all. Maybe there is a better fix?
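For concreteness, here is a minimal, self-contained reproduction of the stale-reference problem; TinyScheduler is a stand-in written for illustration that, like ratsql's custom scheduler, caches optimizer.param_groups at construction time:

import torch

# One parameter, one optimizer.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.Adam(params, lr=1e-3)

class TinyScheduler:
    def __init__(self, param_groups):
        self.param_groups = param_groups  # cached reference

    def update_lr(self, lr):
        for group in self.param_groups:
            group['lr'] = lr

scheduler = TinyScheduler(optimizer.param_groups)

# Simulate restoring from a checkpoint: load_state_dict rebuilds param_groups
# as a *new* list of new dicts, orphaning the scheduler's cached reference.
optimizer.load_state_dict(optimizer.state_dict())

scheduler.update_lr(5e-4)
print(optimizer.param_groups[0]['lr'])  # still 1e-3: the update went nowhere

# The workaround proposed above: re-point the scheduler after restoring.
scheduler.param_groups = optimizer.param_groups
scheduler.update_lr(5e-4)
print(optimizer.param_groups[0]['lr'])  # now 5e-4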

Best,
Anton

Error during training

Hi there,

I'm trying to run your model. I managed to complete the preprocess step, but during training I get the following error:

root@1ea4af5e6c45:/app# python run.py preprocess experiments/spider-bert-run.jsonnet
To use data.metrics please install scikit-learn. See https://scikit-learn.org/stable/index.html
WARNING <class 'ratsql.models.enc_dec.EncDecModel.Preproc'>: superfluous {'name': 'EncDec'}
DB connections: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 166/166 [00:00<00:00, 559.67it/s]
train section: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8659/8659 [21:49<00:00,  6.61it/s]
DB connections: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 166/166 [00:00<00:00, 795.36it/s]
val section: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1034/1034 [07:16<00:00,  2.37it/s]


root@1ea4af5e6c45:/app# python run.py train experiments/spider-bert-run.jsonnet
To use data.metrics please install scikit-learn. See https://scikit-learn.org/stable/index.html
[2020-07-09T16:29:33] Logging to logdir/bert_run/bs=8,lr=7.4e-04,bert_lr=3.0e-06,end_lr=0e0,att=1
[2020-07-09T16:30:44] Step 0 stats, train: loss = 157.97794342041016
Traceback (most recent call last):
  File "run.py", line 109, in <module>
    main()
  File "run.py", line 77, in main
    train.main(train_config)
  File "/app/ratsql/commands/train.py", line 274, in main
    trainer.train(config, modeldir=args.logdir)
  File "/app/ratsql/commands/train.py", line 184, in train
    num_eval_items=self.train_config.num_eval_items)
  File "/app/ratsql/commands/train.py", line 225, in _eval_model
    batch_res = model.eval_on_batch(eval_batch)
  File "/app/ratsql/models/enc_dec.py", line 109, in eval_on_batch
    mean_loss = self.compute_loss(batch).item()
  File "/app/ratsql/models/enc_dec.py", line 79, in _compute_loss_enc_batched
    loss = self.decoder.compute_loss(enc_input, dec_output, enc_state, debug)
  File "/app/ratsql/models/nl2code/decoder.py", line 510, in compute_loss
    mle_loss = self.compute_mle_loss(enc_input, example, desc_enc, debug)
  File "/app/ratsql/models/nl2code/decoder.py", line 636, in compute_mle_loss
    type_info = self.ast_wrapper.singular_types[node['_type']]
TypeError: 'NoneType' object is not subscriptable

I'm following the instructions in the readme file.

I saw that training this model requires a GPU with at least 16 GB of memory, and I only have 8 GB. However, I was expecting an out-of-memory error rather than this TypeError. Am I missing something?

How to run RAT-SQL in Google Colab?

As the title says, I want to run RAT-SQL in Google Colab, but I don't know how to set up the environment, since Google Colab doesn't support Docker.

Thank you.

Preprocessing problem

Hi,

When I run the preprocessing script (python run.py preprocess experiments/spider-glove-run.jsonnet), it fails with the following error:

WARNING <class 'ratsql.models.enc_dec.EncDecModel.Preproc'>: superfluous {'name': 'EncDec'}
DB connections: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 166/166 [00:25<00:00, 6.48it/s]
train section: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 8659/8659 [1:17:47<00:00, 1.86it/s]
DB connections: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 166/166 [09:33<00:00, 3.45s/it]
val section: 20%|█████████████████████▊ | 211/1034 [3:10:35<16:09:18, 70.67s/it]
val section: 63%|████████████████████████████████████████████████████████████████████▏ | 653/1034 [7:03:05<5:04:30, 47.95s/it]
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 421, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 416, in _make_request
    httplib_response = conn.getresponse()
  File "/opt/conda/lib/python3.7/http/client.py", line 1344, in getresponse
    response.begin()
  File "/opt/conda/lib/python3.7/http/client.py", line 306, in begin
    version, status, reason = self._read_status()
  File "/opt/conda/lib/python3.7/http/client.py", line 267, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/opt/conda/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 720, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/opt/conda/lib/python3.7/site-packages/urllib3/util/retry.py", line 400, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/opt/conda/lib/python3.7/site-packages/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 423, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 331, in _raise_timeout
    self, url, "Read timed out. (read timeout=%s)" % timeout_value
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='localhost', port=9000): Read timed out. (read timeout=30.0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run.py", line 109, in <module>
    main()
  File "run.py", line 73, in main
    preprocess.main(preprocess_config)
  File "/app/ratsql/commands/preprocess.py", line 53, in main
    preprocessor.preprocess()
  File "/app/ratsql/commands/preprocess.py", line 34, in preprocess
    self.model_preproc.add_item(item, section, validation_info)
  File "/app/ratsql/models/enc_dec.py", line 43, in add_item
    self.enc_preproc.add_item(item, section, enc_info)
  File "/app/ratsql/models/spider/spider_enc.py", line 168, in add_item
    preprocessed = self.preprocess_item(item, validation_info)
  File "/app/ratsql/models/spider/spider_enc.py", line 193, in preprocess_item
    question, question_for_copying = self._tokenize_for_copying(item.text, item.orig['question'])
  File "/app/ratsql/models/spider/spider_enc.py", line 239, in _tokenize_for_copying
    return self.word_emb.tokenize_for_copying(unsplit)
  File "/app/ratsql/resources/pretrained_embeddings.py", line 67, in tokenize_for_copying
    ann = corenlp.annotate(text, self.corenlp_annotators)
  File "/app/ratsql/resources/corenlp.py", line 46, in annotate
    return _singleton.annotate(text, annotators, output_format, properties)
  File "/app/ratsql/resources/corenlp.py", line 28, in annotate
    result = self.client.annotate(text, annotators, output_format, properties)
  File "/root/.local/lib/python3.7/site-packages/corenlp/client.py", line 225, in annotate
    r = self._request(text.encode('utf-8'), properties)
  File "/root/.local/lib/python3.7/site-packages/corenlp/client.py", line 192, in _request
    timeout=(self.timeout*2)/1000)
  File "/opt/conda/lib/python3.7/site-packages/requests/api.py", line 116, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/conda/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='localhost', port=9000): Read timed out. (read timeout=30.0)
val section:  63%|████████████████████████████████████████████████████████████████████▏ | 653/1034 [7:10:21<4:11:05, 39.54s/it]
Exception ignored in: <function CoreNLP.__del__ at 0x7f2f490c8320>
Traceback (most recent call last):
  File "/app/ratsql/resources/corenlp.py", line 24, in __del__
  File "/root/.local/lib/python3.7/site-packages/corenlp/client.py", line 83, in stop
  File "/opt/conda/lib/python3.7/subprocess.py", line 1790, in kill
AttributeError: 'NoneType' object has no attribute 'SIGKILL'

Process getting killed after running 100%

I am trying to run the GloVe model on the Spider dataset. When I run the preprocess step, it reaches 100% and then the process is killed. I am using my personal laptop, running on CPU, and I am following the instructions in the Readme file.
[Screenshot attached: terminal output, 2020-08-04, 11:07 PM]

Why is the GPU not used when training the BERT model?

Hi,
When training the BERT model, I didn't change the default settings file, but only 250 steps have completed in nearly 2 days. Why is it so slow?
Also, when I run the nvidia-smi command, no running processes are detected; the output is attached below.

python run.py train experiments/spider-bert-run.jsonnet
[2020-08-03T12:37:20] Logging to logdir/bert_run/bs=2,lr=7.4e-04,bert_lr=3.0e-06,end_lr=0e0,att=1
[2020-08-03T12:44:43] Step 0 stats, train: loss = 158.00450134277344
[2020-08-03T12:55:02] Step 0 stats, val: loss = 187.51132202148438
^[[A^[[B[2020-08-03T13:06:20] Step 0: loss=147.2463
[2020-08-03T14:54:06] Step 10: loss=164.6095
[2020-08-03T16:26:34] Step 20: loss=207.7960
[2020-08-03T17:57:49] Step 30: loss=148.4615
[2020-08-03T19:31:26] Step 40: loss=204.2443
[2020-08-03T21:05:35] Step 50: loss=113.9567
[2020-08-03T22:38:39] Step 60: loss=147.5295
[2020-08-04T00:10:32] Step 70: loss=112.1358
[2020-08-04T01:43:07] Step 80: loss=208.5774
[2020-08-04T03:14:14] Step 90: loss=161.5002
[2020-08-04T04:54:04] Step 100 stats, train: loss = 130.48414611816406
[2020-08-04T05:01:46] Step 100 stats, val: loss = 155.1061248779297
[2020-08-04T05:10:42] Step 100: loss=139.9932
[2020-08-04T06:45:24] Step 110: loss=97.4097
[2020-08-04T08:20:06] Step 120: loss=125.1138
[2020-08-04T09:54:11] Step 130: loss=184.3925
[2020-08-04T11:23:59] Step 140: loss=181.3083
[2020-08-04T12:58:00] Step 150: loss=106.3904
[2020-08-04T14:30:41] Step 160: loss=124.2398
[2020-08-04T16:01:37] Step 170: loss=90.2176
[2020-08-04T17:55:44] Step 180: loss=128.8443
[2020-08-04T19:14:49] Step 190: loss=124.6035
[2020-08-04T20:38:50] Step 200 stats, train: loss = 112.74446868896484
[2020-08-04T20:47:12] Step 200 stats, val: loss = 133.75208282470703
[2020-08-04T20:54:27] Step 200: loss=161.0717
[2020-08-04T22:16:30] Step 210: loss=182.9842
[2020-08-04T23:40:16] Step 220: loss=154.8580
[2020-08-05T01:00:57] Step 230: loss=184.8499
[2020-08-05T02:22:38] Step 240: loss=137.9579
[2020-08-05T04:04:16] Step 250: loss=91.6928

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro GV100        On   | 00000000:18:00.0 Off |                    0 |
| 29%   35C    P2    24W / 250W |      0MiB / 32508MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro GV100        On   | 00000000:3B:00.0 Off |                    0 |
| 29%   37C    P2    25W / 250W |      0MiB / 32508MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Quadro GV100        On   | 00000000:86:00.0 Off |                    0 |
| 29%   36C    P2    25W / 250W |      1MiB / 32508MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Quadro GV100        On   | 00000000:AF:00.0 Off |                    0 |
| 29%   36C    P2    25W / 250W |      0MiB / 32508MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
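As a quick diagnostic (not part of the repo), one can check whether PyTorch sees the GPUs at all from inside the training environment; 0 MiB of usage on every card suggests the job is running on CPU:

import torch

# Diagnostic sketch: confirm PyTorch's view of the GPUs. If is_available()
# returns False, training silently falls back to CPU, which would explain
# minutes per step with BERT.
print(torch.cuda.is_available())   # expect True
print(torch.cuda.device_count())   # expect 4 for the machine above
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # e.g. "Quadro GV100"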

Question about match foreign key

Hi!
Here are some more detailed questions about the code. When constructing the relations, there is a match_foreign_key(cls, desc, col, table) method in spider_enc_modules.py. At the end of the method (line 705 of spider_enc_modules.py) it returns:

return desc['column_to_table'][str(col)] == foreign_table

I think it should instead be:

return table == foreign_table
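To make the two alternatives concrete, here is a hypothetical paraphrase of the method (not copied from the repo); desc['foreign_keys'] and desc['column_to_table'] are assumed to be lookup tables keyed by stringified column IDs:

# Hypothetical paraphrase of match_foreign_key, for discussion only.
def match_foreign_key(cls, desc, col, table):
    foreign_key_for = desc['foreign_keys'].get(str(col))   # assumed structure
    if foreign_key_for is None:
        return None
    foreign_table = desc['column_to_table'][str(foreign_key_for)]
    # As quoted above, the method compares the table that owns `col`
    # against the foreign key's target table:
    #     return desc['column_to_table'][str(col)] == foreign_table
    # The suggestion is to compare the candidate `table` argument instead:
    return table == foreign_table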
Thanks!
