This is the code we used to conduct the experiments for this paper.
It allows you to group tokens in SQL corpora, which are commonly co-located, for easier prediction. The tools even allow to restrict the grouping to neighbors in the AST. Below you can see an illustration of an encoded example sentence.
If you want to use BPE encodings for your own projects on SQL data, please check the little doc on it.
If you are using torchtext you might also find some functions in utils.py
helpful.
mars-wei / sqlbpe Goto Github PK
View Code? Open in Web Editor NEWThis project forked from samuelgabriel/sqlbpe
The implementation for the paper `Byte-Pair Encoding for Text-to-SQL Generation`.
Home Page: https://arxiv.org/abs/1910.08962