
guotong1988 / nl2sql-rule

184 stars · 7 watchers · 48 forks · 6.48 MB

Content Enhanced BERT-based Text-to-SQL Generation https://arxiv.org/abs/1910.07179

Languages: Python 60.86%, Jupyter Notebook 39.14%
Topics: nl2sql, pytorch, semantic-parsing, nlp, bert, knowledge-representation, knowledge, deep-learning, text2sql, rule-inject-to-model

nl2sql-rule's Introduction

NL2SQL-RULE


Content Enhanced BERT-based Text-to-SQL Generation https://arxiv.org/abs/1910.07179

Motivation

Rationale: incorporating database design rules into text-to-SQL generation:

  1. We use the matching information between the table cells and the question string to construct a vector whose length equals the question length. This question vector mainly improves WHERE-VALUE inference, because it injects the knowledge that an answer cell and its corresponding table header are bound together: once we locate the answer cell, we have also located the answer column that contains it.

  2. We use the matching information between all the table headers and the question string to construct a vector whose length equals the number of table headers. This header vector mainly improves WHERE-COLUMN inference. (A minimal sketch of both constructions follows this list.)
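A minimal, illustrative sketch of the two rule-based vectors, assuming simple exact-match rules. The repo's real feature construction lives in data_and_model/output_entity.py and uses finer-grained integer codes, as visible in the "bertindex_knowledge" values of the Data section below.

def build_knowledge(question_tok, headers, rows):
    q_vec = [0] * len(question_tok)  # one entry per question token
    h_vec = [0] * len(headers)       # one entry per table header
    question = " ".join(question_tok).lower()

    # Header names mentioned in the question flag the corresponding column.
    for j, h in enumerate(headers):
        if h.lower() in question:
            h_vec[j] = max(h_vec[j], 1)

    # Cell values mentioned in the question flag both the matched question
    # tokens (the likely WHERE value) and the cell's column (the likely
    # WHERE column), binding value and column together.
    for row in rows:
        for j, cell in enumerate(row):
            cell_toks = str(cell).lower().split()
            n = len(cell_toks)
            if n == 0:
                continue
            for i in range(len(question_tok) - n + 1):
                if [t.lower() for t in question_tok[i:i + n]] == cell_toks:
                    for k in range(i, i + n):
                        q_vec[k] = 1
                    h_vec[j] = 2
    return q_vec, h_vec

On the South Australia record shown below, this sketch already reproduces header_knowledge = [2, 0, 0, 2, 0, 1]; the repo's question vector additionally appears to distinguish header mentions and span boundaries (codes such as 4 and 3).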

Requirements

python 3.6

torch 1.1.0

Run

Step-1

Data preparation: download the original data (https://drive.google.com/file/d/1iJvsf38f16el58H4NPINQ7uzal5-V4v4 or https://download.csdn.net/download/guotong1988/13008037) and put it in the data_and_model directory.

Then run data_and_model/output_entity.py, which generates the knowledge files (e.g. train_knowledge.jsonl).

Step-2

Train and evaluate: run train.py.

Results on BERT-Base-Uncased without Execution-Guided-Decoding

Model | Dev logical form accuracy | Dev execution accuracy | Test logical form accuracy | Test execution accuracy
SQLova | 80.6 | 86.5 | 80.0 | 85.5
Our Method | 84.3 | 90.3 | 83.7 | 89.2

Data

An example data record:

{
	"table_id": "1-1000181-1",
	"phase": 1,
	"question": "Tell me what the notes are for South Australia ",
	"question_tok": ["Tell", "me", "what", "the", "notes", "are", "for", "South", "Australia"],
	"sql": {
		"sel": 5,
		"conds": [
			[3, 0, "SOUTH AUSTRALIA"]
		],
		"agg": 0
	},
	"query": {
		"sel": 5,
		"conds": [
			[3, 0, "SOUTH AUSTRALIA"]
		],
		"agg": 0
	},
	"wvi_corenlp": [
		[7, 8]
	],
	"bertindex_knowledge": [0, 0, 0, 0, 4, 0, 0, 1, 3],
	"header_knowledge": [2, 0, 0, 2, 0, 1]
}
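Reading this record: bertindex_knowledge holds one code per question token (9 entries for 9 tokens) and header_knowledge one code per column (6 entries for 6 headers); wvi_corenlp appears to mark the WHERE-value token span (an inference from this example, not an authoritative spec). A quick sanity check:

question_tok = ["Tell", "me", "what", "the", "notes", "are", "for", "South", "Australia"]
bertindex_knowledge = [0, 0, 0, 0, 4, 0, 0, 1, 3]
header = ["State/territory", "Text/background colour", "Format", "Current slogan", "Current series", "Notes"]
header_knowledge = [2, 0, 0, 2, 0, 1]

# One knowledge code per question token and per table column.
assert len(bertindex_knowledge) == len(question_tok)
assert len(header_knowledge) == len(header)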

The Corresponding Table:

{
	"id": "1-1000181-1",
	"header": ["State/territory", "Text/background colour", "Format", "Current slogan", "Current series", "Notes"],
	"rows": [
		["Australian Capital Territory", "blue/white", "Yaa·nna", "ACT · CELEBRATION OF A CENTURY 2013", "YIL·00A", "Slogan screenprinted on plate"],
		["New South Wales", "black/yellow", "aa·nn·aa", "NEW SOUTH WALES", "BX·99·HI", "No slogan on current series"],
		["New South Wales", "black/white", "aaa·nna", "NSW", "CPX·12A", "Optional white slimline series"],
		["Northern Territory", "ochre/white", "Ca·nn·aa", "NT · OUTBACK AUSTRALIA", "CB·06·ZZ", "New series began in June 2011"],
		["Queensland", "maroon/white", "nnn·aaa", "QUEENSLAND · SUNSHINE STATE", "999·TLG", "Slogan embossed on plate"],
		["South Australia", "black/white", "Snnn·aaa", "SOUTH AUSTRALIA", "S000·AZD", "No slogan on current series"],
		["Victoria", "blue/white", "aaa·nnn", "VICTORIA - THE PLACE TO BE", "ZZZ·562", "Current series will be exhausted this year"]
	]
}

Trained model

https://drive.google.com/open?id=18MBm9qzobTBgWPZlpA2EErCQtsMhlTN2

Reference

https://github.com/naver/sqlova

nl2sql-rule's People

Contributors

guotong1988
nl2sql-rule's Issues

ERROR when running train.py (RuntimeError: Error(s) in loading state_dict for Seq2SQL_v1)

Hi,
I tried to run python3 train.py --trained --bert_type_abb uS, but it gives me this error: RuntimeError: Error(s) in loading state_dict for Seq2SQL_v1:

Details of the execution are below:

XXXX@YYYY:/mnt/c/users/administrateur/desktop/sqlova$ python3 train.py --trained --bert_type_abb uS

BERT-type: uncased_L-12_H-768_A-12
Batch_size = 32
BERT parameters:
learning rate: 1e-05
Fine-tune BERT: False
vocab size: 30522
hidden_size: 768
num_hidden_layer: 12
num_attention_heads: 12
hidden_act: gelu
intermediate_size: 3072
hidden_dropout_prob: 0.1
attention_probs_dropout_prob: 0.1
max_position_embeddings: 512
type_vocab_size: 2
initializer_range: 0.02
Load pre-trained parameters.
Seq-to-SQL: the number of final BERT layers to be used: 2
Seq-to-SQL: the size of hidden dimension = 100
Seq-to-SQL: LSTM encoding layer size = 2
Seq-to-SQL: dropout rate = 0.3
Seq-to-SQL: learning rate = 0.001
Traceback (most recent call last):
File "train.py", line 741, in <module>
path_model_bert=path_model_bert, path_model=path_model)
File "train.py", line 196, in get_models
model.load_state_dict(res['model'])
File "/home/ysfmell/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 830, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Seq2SQL_v1:
size mismatch for scp.W_att.weight: copying a param with shape torch.Size([103, 105]) from checkpoint, the shape in current model is torch.Size([100, 100]).
size mismatch for scp.W_att.bias: copying a param with shape torch.Size([103]) from checkpoint, the shape in current model is torch.Size([100]).
size mismatch for scp.W_c.weight: copying a param with shape torch.Size([100, 105]) from checkpoint, the shape in current model is torch.Size([100, 100]).
size mismatch for scp.W_hs.weight: copying a param with shape torch.Size([100, 103]) from checkpoint, the shape in current model is torch.Size([100, 100]).
size mismatch for sap.W_att.weight: copying a param with shape torch.Size([103, 105]) from checkpoint, the shape in current model is torch.Size([100, 100]).
size mismatch for sap.W_att.bias: copying a param with shape torch.Size([103]) from checkpoint, the shape in current model is torch.Size([100]).
size mismatch for sap.sa_out.0.weight: copying a param with shape torch.Size([100, 105]) from checkpoint, the shape in current model is torch.Size([100, 100]).
size mismatch for wnp.W_att_h.weight: copying a param with shape torch.Size([1, 103]) from checkpoint, the shape in current model is torch.Size([1, 100]).
size mismatch for wnp.W_hidden.weight: copying a param with shape torch.Size([200, 103]) from checkpoint, the shape in current model is torch.Size([200, 100]).
size mismatch for wnp.W_cell.weight: copying a param with shape torch.Size([200, 103]) from checkpoint, the shape in current model is torch.Size([200, 100]).
size mismatch for wnp.W_att_n.weight: copying a param with shape torch.Size([1, 105]) from checkpoint, the shape in current model is torch.Size([1, 100]).
size mismatch for wnp.wn_out.0.weight: copying a param with shape torch.Size([100, 105]) from checkpoint, the shape in current model is torch.Size([100, 100]).
size mismatch for wcp.W_att.weight: copying a param with shape torch.Size([103, 105]) from checkpoint, the shape in current model is torch.Size([100, 100]).
size mismatch for wcp.W_att.bias: copying a param with shape torch.Size([103]) from checkpoint, the shape in current model is torch.Size([100]).
size mismatch for wcp.W_c.weight: copying a param with shape torch.Size([100, 105]) from checkpoint, the shape in current model is torch.Size([100, 100]).
size mismatch for wcp.W_hs.weight: copying a param with shape torch.Size([100, 103]) from checkpoint, the shape in current model is torch.Size([100, 100]).
size mismatch for wvp.W_att.weight: copying a param with shape torch.Size([103, 105]) from checkpoint, the shape in current model is torch.Size([100, 100]).
size mismatch for wvp.W_att.bias: copying a param with shape torch.Size([103]) from checkpoint, the shape in current model is torch.Size([100]).
size mismatch for wvp.W_c.weight: copying a param with shape torch.Size([100, 105]) from checkpoint, the shape in current model is torch.Size([100, 100]).
size mismatch for wvp.W_hs.weight: copying a param with shape torch.Size([100, 103]) from checkpoint, the shape in current model is torch.Size([100, 100]).
size mismatch for wvp.wv_out.0.weight: copying a param with shape torch.Size([100, 405]) from checkpoint, the shape in current model is torch.Size([100, 400]).
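A plausible explanation (inferred from the log, not confirmed in the thread): the command was run from a vanilla sqlova checkout (note the .../desktop/sqlova path), whose layers expect plain 100-dimensional inputs, while the released checkpoint was trained with this repo's extra knowledge features, which widen some layers to 103/105 dimensions. A small sketch to inspect which variant a checkpoint matches (the file name here is hypothetical):

# Hedged diagnostic: print the attention-layer shapes stored in the checkpoint.
# 103/105 dims indicate the knowledge-augmented model; 100 indicates plain SQLova.
import torch

res = torch.load("model_best.pt", map_location="cpu")  # hypothetical checkpoint name
for name, param in res["model"].items():
    if "W_att" in name:
        print(name, tuple(param.shape))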

missing training data

Hey,

Could you verify the dataset?

'./data_and_model/train_tok.jsonl'

'./data_and_model/train_knowledge.jsonl'

Are these the same? Your code reads train_tok, but that file is missing.

Updates on Large Models.

Hi, are there any updates on a fine-tuned model for the uncased_L-24_H-1024_A-16 BERT model?

Provide better instructions to run

Hi,

I downloaded and tried to run it as per the current documentation, but couldn't get anything to run easily. Can you provide better instructions/documentation on how to run on a fresh system?

i.e., what libraries I need, etc.

Implemented train for testing on my dataset.

corenlp.client.PermanentlyFailedException: Timed out waiting for service to come alive.
The model asked me to type a question; when I typed one, the above error popped up.
Do you know why this might happen? Please help.

PS: I am calling the infer function, and changed the args --do_train to False, --do_infer to True, --infer_loop to True, and --EG to True; in the infer block I changed show_table to True and show_answer_only to True.

Thanks
Bill
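A general observation (not verified against this repo): this exception comes from the Stanford CoreNLP client failing to start or reach its Java server. It usually means Java is missing, the CORENLP_HOME environment variable does not point at a CoreNLP distribution, or no server is listening on the expected port.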

train.py

NL2SQL-RULE>python train.py
BERT-type: uncased_L-12_H-768_A-12
Batch_size = 8
BERT parameters:
learning rate: 1e-05
Fine-tune BERT: True
vocab size: 30522
hidden_size: 768
num_hidden_layer: 12
num_attention_heads: 12
hidden_act: gelu
intermediate_size: 3072
hidden_dropout_prob: 0.1
attention_probs_dropout_prob: 0.1
max_position_embeddings: 512
type_vocab_size: 2
initializer_range: 0.02
Load pre-trained parameters.
Seq-to-SQL: the number of final BERT layers to be used: 2
Seq-to-SQL: the size of hidden dimension = 100
Seq-to-SQL: LSTM encoding layer size = 2
Seq-to-SQL: dropout rate = 0.3
Seq-to-SQL: learning rate = 0.001
Traceback (most recent call last):
File "train.py", line 726, in <module>
dset_name="train")
File "train.py", line 221, in train
for iB, t in enumerate(train_loader):
File "C:\Users...\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 278, in __iter__
return _MultiProcessingDataLoaderIter(self)
File "C:\Users...\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 682, in __init__
w.start()
File "C:\Users...\Anaconda3\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:\Users...\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users...\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users...\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
reduction.dump(process_obj, to_child)
File "C:\Users...\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'get_loader_wikisql.<locals>.<lambda>'

...NL2SQL-RULE>Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users...\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Users...\Anaconda3\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
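This is a known Windows multiprocessing pitfall rather than something specific to this repo: with the spawn start method, DataLoader workers must pickle the collate_fn, and a lambda created inside get_loader_wikisql cannot be pickled. A hedged workaround sketch (with a dummy dataset standing in for the WikiSQL examples) is to set num_workers=0 so loading stays in the main process:

import torch
from torch.utils.data import DataLoader

# Dummy stand-in for the list of WikiSQL example dicts used by get_loader_wikisql.
train_data = [{"question": "q1"}, {"question": "q2"}, {"question": "q3"}]

train_loader = DataLoader(
    dataset=train_data,
    batch_size=2,
    shuffle=True,
    num_workers=0,           # 0 = load in the main process; nothing needs pickling
    collate_fn=lambda x: x,  # identity collate (as in SQLova); unpicklable with workers > 0 on Windows
)

for batch in train_loader:
    print(batch)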

Why no use of BertTokenizer?

Hello,
why is there no use of BertTokenizer, and what is the advantage of the custom BasicTokenizer?

Best Regards!

Problem while running train.py

Hello, I tried to clone your repo and run it on my local system. However, when I try a plain python train.py or the command mentioned in the sqlova repository, I get the following error:

python train.py

BERT-type: uncased_L-12_H-768_A-12
Batch_size = 32
BERT parameters:
learning rate: 1e-05
Fine-tune BERT: False
Traceback (most recent call last):
  File "train.py", line 703, in <module>
    model, model_bert, tokenizer, bert_config = get_models(args, BERT_PT_PATH)
  File "train.py", line 164, in get_models
    args.no_pretraining)
  File "train.py", line 124, in get_bert
    bert_config.print_status()
AttributeError: 'BertConfig' object has no attribute 'print_status'

When I comment out bert_config.print_status(), I get another error, as follows:

BERT-type: uncased_L-12_H-768_A-12
Batch_size = 32
BERT parameters:
learning rate: 1e-05
Fine-tune BERT: False
Traceback (most recent call last):
  File "train.py", line 703, in <module>
    model, model_bert, tokenizer, bert_config = get_models(args, BERT_PT_PATH)
  File "train.py", line 164, in get_models
    args.no_pretraining)
  File "train.py", line 126, in get_bert
    model_bert = BertModel(bert_config)
TypeError: __init__() missing 2 required positional arguments: 'is_training' and 'input_ids'

Any solution to this?
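A plausible cause (an assumption, not confirmed in the thread): BertConfig/BertModel are being imported from an installed package such as pytorch_pretrained_bert, whose API differs, rather than from the BERT code bundled with the repo (this repo follows SQLova's layout, whose bundled bert/modeling.py provides print_status and a BertModel(config) constructor). A quick check, run from the repo root (the module path bert.modeling is assumed from that layout):

# Hedged diagnostic: confirm which BERT implementation train.py would import.
import inspect
from bert.modeling import BertConfig

print(inspect.getfile(BertConfig))          # should point inside the cloned repo, not site-packages
print(hasattr(BertConfig, "print_status"))  # True for the bundled SQLova-style version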

"wvi_corenlp" ?

How are you getting the position of the keyword, and how are you determining that a particular word is the keyword?
Please describe wvi_corenlp.
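For what it's worth, in the data example in the README above, wvi_corenlp looks like the inclusive [start, end] token indices of the WHERE-clause value inside question_tok (a reading inferred from that example, not stated by the author):

question_tok = ["Tell", "me", "what", "the", "notes", "are", "for", "South", "Australia"]
wvi = [7, 8]  # inclusive start/end indices of the WHERE-value span
print(question_tok[wvi[0]:wvi[1] + 1])  # ['South', 'Australia'] -> condition value "SOUTH AUSTRALIA"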

Two extra features

In your paper you mention introducing two extra features: a match vector between the question and the table cells, and a match vector between the question and the columns. To understand this better, I tried to find the corresponding code in your repository, but I couldn't locate it. Could you please tell me which script contains this code?

Killed - python train.py - CPU

BERT-type: uncased_L-12_H-768_A-12
Batch_size = 8
BERT parameters:
learning rate: 1e-05
Fine-tune BERT: True
vocab size: 30522
hidden_size: 768
num_hidden_layer: 12
num_attention_heads: 12
hidden_act: gelu
intermediate_size: 3072
hidden_dropout_prob: 0.1
attention_probs_dropout_prob: 0.1
max_position_embeddings: 512
type_vocab_size: 2
initializer_range: 0.02
Load pre-trained parameters.
Seq-to-SQL: the number of final BERT layers to be used: 2
Seq-to-SQL: the size of hidden dimension = 100
Seq-to-SQL: LSTM encoding layer size = 2
Seq-to-SQL: dropout rate = 0.3
Seq-to-SQL: learning rate = 0.001
Killed
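For context, a bare "Killed" with no traceback usually means the operating system's out-of-memory killer terminated the process; on CPU, reducing the batch size (here 8) or using a machine with more RAM is the usual remedy (a general observation, not a repo-specific fix).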

Key differences between NL2SQL-BERT and SQLova?

  1. What is the main difference between this and SQLova models?
  2. Can this model perform tasks on Multiple Tables or join operations?
  3. How do I run inference after converting my data into a format similar to the training data?
  4. What is the SOTA for this text-to-SQL task?

Testing

How do I test with a custom CSV and question text to generate SQL queries?
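One hedged starting point (not the author's tooling): convert the CSV into a WikiSQL-style table record matching the format shown in the Data section above, so it can serve as a test table; questions would additionally need records like the data example (question_tok, knowledge fields, etc.). File names here are illustrative:

# Hedged sketch: CSV -> WikiSQL-style table record (format as shown above).
import csv
import json

def csv_to_table(csv_path, table_id):
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    return {"id": table_id, "header": rows[0], "rows": rows[1:]}

with open("ctable.tables.jsonl", "w") as out:  # jsonl name follows the inference issue below
    out.write(json.dumps(csv_to_table("my_table.csv", "my-table-1")) + "\n")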

ERROR when running train (test method only)

Hi,

Can you help, please?

When I try to run train.py, specifically the test method, it gives me the error below:

BERT-type: uncased_L-24_H-1024_A-16
Batch_size = 32
BERT parameters:
learning rate: 1e-05
Fine-tune BERT: False
vocab size: 30522
hidden_size: 1024
num_hidden_layer: 24
num_attention_heads: 16
hidden_act: gelu
intermediate_size: 4096
hidden_dropout_prob: 0.1
attention_probs_dropout_prob: 0.1
max_position_embeddings: 512
type_vocab_size: 2
initializer_range: 0.02
Load pre-trained parameters.
Seq-to-SQL: the number of final BERT layers to be used: 2
Seq-to-SQL: the size of hidden dimension = 100
Seq-to-SQL: LSTM encoding layer size = 2
Seq-to-SQL: dropout rate = 0.3
Seq-to-SQL: learning rate = 0.001
Traceback (most recent call last):
File "train.py", line 709, in <module>
path_model_bert=path_model_bert, path_model=path_model)
File "train.py", line 187, in get_models
model_bert.load_state_dict(res['model_bert'])
File "/home/ysfmell/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 830, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for BertModel:
Missing key(s) in state_dict: "encoder.layer.12.attention.self.query.weight", "encoder.layer.12.attention.self.query.bias", "encoder.layer.12.attention.self.key.weight", ... [the full set of attention, intermediate, output, and LayerNorm parameters for encoder layers 12 through 23].
size mismatch for embeddings.word_embeddings.weight: copying a param with shape torch.Size([30522, 768]) from checkpoint, the shape in current model is torch.Size([30522, 1024]).
size mismatch for embeddings.position_embeddings.weight: copying a param with shape torch.Size([512, 768]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
size mismatch for embeddings.token_type_embeddings.weight: copying a param with shape torch.Size([2, 768]) from checkpoint, the shape in current model is torch.Size([2, 1024]).
size mismatch for embeddings.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for embeddings.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for encoder.layer.0.attention.self.query.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
[... the same pattern of 16 mismatches per layer (attention query/key/value and output dense weights/biases 768 vs 1024, LayerNorm gamma/beta 768 vs 1024, intermediate dense 3072x768 vs 4096x1024, output dense 768x3072 vs 1024x4096) repeats for encoder layers 0 through 11 ...]
size mismatch for pooler.dense.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for pooler.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
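A likely cause (inferred from the shapes, not confirmed by the author): every mismatch pairs a 768/3072-dimensional tensor from the checkpoint with a 1024/4096-dimensional tensor in the model, i.e. the released checkpoint is BERT-Base while this run was configured for uncased_L-24_H-1024_A-16 (BERT-Large). Re-running with the Base configuration, e.g. python3 train.py --trained --bert_type_abb uS as in the first issue above, should match the checkpoint.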

How much time to train this model

Hi, you iterate 200 times over the whole training process, but when I ran your code, one epoch took me several hours.

evaluate how to use

Hello, may I ask how you use do_infer, and how can data such as ctable_ftable1 be obtained?
Thank you!

Test Data

If I have a SQL table, how can I convert it to test_data and test_table to find answers to a question about that table? Can you please help me with this issue?

problem with knowledge when running inference

Thanks for the pretrained model! I was trying to run inference using python3 ./train.py --trained --do_infer, but got this error:

Traceback (most recent call last):
  File "./train.py", line 799, in <module>
    beam_size=1, show_table=False, show_answer_only=False
  File "./train.py", line 632, in infer
    beam_size=beam_size)
  File "/base/sqlova/model/nl2sql/wikisql_models.py", line 115, in beam_forward
    knowledge=knowledge, knowledge_header=knowledge_header)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/base/sqlova/model/nl2sql/wikisql_models.py", line 562, in forward
    knowledge = [k + (mL_n - len(k)) * [0] for k in knowledge]
TypeError: 'NoneType' object is not iterable

I've placed something in data_and_model/ctable.tables.jsonl and data_and_model/ctable.db. I've placed a ctable_knowledge.jsonl file in a few places, but I don't see how it would be read. What am I doing wrong?
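For reference, the crash happens because the model receives knowledge=None when no knowledge file is loaded for the inference tables. A hedged stopgap (an assumption, not the author's fix) is to feed all-zero knowledge vectors of the right lengths, meaning "no rule match anywhere", which at least lets inference run, at some cost in accuracy:

# Hedged stopgap sketch: default all-zero knowledge vectors per question/table.
def default_knowledge(question_tok_batch, header_batch):
    knowledge = [[0] * len(toks) for toks in question_tok_batch]    # one code per question token
    knowledge_header = [[0] * len(hdrs) for hdrs in header_batch]   # one code per column
    return knowledge, knowledge_header

k, kh = default_knowledge([["show", "me", "rows"]], [["col_a", "col_b"]])
print(k, kh)  # [[0, 0, 0]] [[0, 0]]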
