
Dynamic Meta-Embeddings for Improved Sentence Representations

Code and models for the paper Dynamic Meta-Embeddings for Improved Sentence Representations (https://arxiv.org/abs/1804.07983).

Requirements

  • Python 2.7 or 3.6+
  • PyTorch >= 0.4.1
  • torchtext >= 0.2.3
  • torchvision >= 0.2.1
  • spaCy >= 2.0.11
  • NumPy >= 1.14.0
  • jsonlines
  • tqdm
  • six

Getting started

Downloading the data

First, put the pre-trained embeddings and pre-processed datasets in place. For the embeddings, run

python get_embeddings.py --embeds fasttext,glove

(This example downloads fastText and GloVe. The available embeddings are fasttext, fasttext_wiki, fasttext_opensubtitles, fasttext_torontobooks, glove, levy_bow2 and imagenet.)

For the Flickr30k dataset, run

python get_flickr30k.py --flickr30k_root './data/flickr30k' --batch_size 32

where --batch_size sets the batch size for image feature extraction and --flickr30k_root points to the Flickr30k root folder, which must contain dataset_flickr30k.json and an images subfolder holding all images.
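
For reference, the expected layout of the root folder is roughly as follows (individual image file names elided):

./data/flickr30k/
├── dataset_flickr30k.json
└── images/
    └── …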

For the SNLI, MultiNLI and SST-2 datasets, run get_snli.py, get_multinli.py and get_sst2.py, respectively.
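
Assuming the scripts take no required arguments and write to the default locations (an assumption; check each script's flags), that is:

python get_snli.py
python get_multinli.py
python get_sst2.py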

The downloaded embeddings and datasets will be placed in ./data/embeddings and ./data/datasets, respectively.

Training the models

You can then train a model by running train.py:

python train.py [arguments...]

The available arguments are:

  --name NAME           experiment name
  --task {snli,multinli,allnli,sst2,flickr30k}
                        task to train the model on
  --datasets_root DATASETS_ROOT
                        root path to dataset files
  --embeds_root EMBEDS_ROOT
                        root path to embedding files
  --savedir SAVEDIR     root path to checkpoint and caching files
  --batch_sz BATCH_SZ   minibatch size
  --clf_dropout CLF_DROPOUT
                        dropout in classifier MLP
  --early_stop_patience EARLY_STOP_PATIENCE
                        patience in early stopping criterion
  --grad_clip GRAD_CLIP
                        gradient clipping threshold
  --lr LR               learning rate
  --lr_min LR_MIN       minimal learning rate
  --lr_shrink LR_SHRINK
                        learning rate decaying factor
  --max_epochs MAX_EPOCHS
                        maximal number of epochs
  --optimizer {adam,sgd}
                        optimizer
  --resume_from RESUME_FROM
                        checkpoint file to resume training from (default is
                        the one with same experiment name)
  --scheduler_patience SCHEDULER_PATIENCE
                        patience in learning rate scheduler
  --seed SEED           random seed
  --attnnet {none,no_dep_softmax,dep_softmax,no_dep_gating,dep_gating}
                        the attention type
  --emb_dropout EMB_DROPOUT
                        the dropout in embedder
  --proj_embed_sz PROJ_EMBED_SZ
                        dimension of projected embeddings (default is the
                        smallest dimension out of all embeddings)
  --embeds EMBEDS       pre-trained embedding names
  --mixmode {cat,proj_sum}
                        method of combining embeddings
  --nonlin {none,relu}  nonlinearity in embedder
  --rnn_dim RNN_DIM     dimension of RNN sentence encoder
  --fc_dim FC_DIM       hidden layer size in classifier MLP
  --img_cropping {1c,rc}
                        image cropping method (1c/rc: center/random cropping)
                        in image caption retrieval task
  --img_feat_dim IMG_FEAT_DIM
                        image feature size in image caption retrieval task
  --margin MARGIN       margin in ranking loss for image caption retrieval
                        task

Here is an example of training an SNLI model using fastText and GloVe embeddings:

python train.py --task snli \
--datasets_root data/datasets --embeds_root data/embeddings --savedir checkpoints \
--embeds fasttext,glove --mixmode proj_sum --attnnet no_dep_softmax \
--nonlin relu --rnn_dim 128 --fc_dim 128 \
--optimizer adam --lr 0.0004 --lr_min 0.00008 --batch_sz 64 --emb_dropout 0.2 --clf_dropout 0.2
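
Training can also be resumed from a checkpoint via --resume_from; per the help text above, it defaults to the checkpoint with the same experiment name. A hypothetical invocation (the checkpoint path and .pt extension are assumptions, not the repo's actual naming scheme):

python train.py --name snli_dme --task snli --savedir checkpoints \
--resume_from checkpoints/snli_dme.pt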

Allowing more types of embeddings

To use new types of embeddings in training, put the embedding files into data/embeddings, then add one tuple per new embedding type to the embeddings list in dme/embeddings.py. Each tuple provides the embedding id, the embedding filename, the dimensionality, a description and, optionally, a download URL.
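
For illustration, a hypothetical entry might look like this (the id, filename and description are made up; check the field order against the existing tuples in dme/embeddings.py):

embeddings = [
    # ... existing entries ...
    # (id, filename, dimensionality, description, download URL or None)
    ('my_embeds', 'my_embeds.300d.txt', 300, 'custom 300-d word vectors', None),
]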

Pre-trained models

SNLI

--batch_sz 64 --clf_dropout 0.2 --lr 0.0004 --lr_min 0.00008 --emb_dropout 0.2 --proj_embed_sz 256 --embeds fasttext,glove --rnn_dim 512 --fc_dim 1024

DME (Accuracy: 86.9096%) / CDME (Accuracy: 86.6042%)
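
Assuming DME corresponds to --attnnet no_dep_softmax and CDME to --attnnet dep_softmax (an assumption consistent with the training example above, not something stated here), the SNLI DME setting would be reproduced with:

python train.py --task snli --attnnet no_dep_softmax \
--batch_sz 64 --clf_dropout 0.2 --lr 0.0004 --lr_min 0.00008 --emb_dropout 0.2 \
--proj_embed_sz 256 --embeds fasttext,glove --rnn_dim 512 --fc_dim 1024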

MultiNLI

--batch_sz 64 --clf_dropout 0.2 --lr 0.0004 --lr_min 0.00008 --emb_dropout 0.2 --proj_embed_sz 256 --embeds fasttext,glove --rnn_dim 512 --fc_dim 1024

DME (Accuracy: 74.3084%) / CDME (Accuracy: 74.7152%)

SST2

--batch_sz 64 --clf_dropout 0.5 --lr 0.0004 --lr_min 0.00005 --emb_dropout 0.5 --proj_embed_sz 256 --embeds fasttext,glove --rnn_dim 512 --fc_dim 512

DME (Accuracy: 89.5113%) / CDME (Accuracy: 88.1933%)

Flickr30k

--batch_sz 128 --clf_dropout 0.1 --early_stop_patience 5 --lr 0.0003 --lr_min 0.00005 --scheduler_patience 1 --emb_dropout 0.1 --proj_embed_sz 256 --embeds fasttext,imagenet --rnn_dim 1024 --fc_dim 512 --img_cropping rc

DME (Cap/Img R@1=47.3/33.12, R@10=80.9/73.44) / CDME (Cap/Img R@1=48.2/34.5, R@10=82.3/73.58)

AllNLI

--batch_sz 64 --clf_dropout 0.2 --lr 0.0004 --lr_min 0.00008 --emb_dropout 0.2 --proj_embed_sz 256 --embeds fasttext,glove --rnn_dim 2048 --fc_dim 1024

DME (Accuracy: 80.2757%) / CDME (Accuracy: 80.4742%)

Reference

Please cite the following paper if you find this code useful in your research:

D. Kiela, C. Wang, K. Cho, Dynamic Meta-Embeddings for Improved Sentence Representations, EMNLP 2018.

@inproceedings{kiela2018dynamic,
  title={Dynamic Meta-Embeddings for Improved Sentence Representations},
  author={Kiela, Douwe and Wang, Changhan and Cho, Kyunghyun},
  booktitle={Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  address={Brussels, Belgium},
  year={2018}
}

License

This code is licensed under CC-BY-NC 4.0.

We use the SNLI, MultiNLI, SST and Flickr30k datasets in the experiments. Please check their websites for license and citation information.

Contact

This repo is maintained by Changhan Wang ([email protected]) and Douwe Kiela ([email protected]).


Issues

Removing UNK words from sentences? Why?

@kahne

DME/train.py

Line 67 in a3217ee

text_field = data.Field(include_lengths=True, init_token='<s>', eos_token='</s>', preprocessing=filter_by_emb_vocab)

Why filter out words based on the combined embedding vocabulary in the preprocessing function, instead of replacing them with <unk>?

For example:

Sentence: hide new secretions from the parental units

If the word secretions is not in emb_vocab, the sentence will be converted into hide new from the parental units; I think the correct sentence should be hide new <unk> from the parental units.

Support Python 3

The readme mentions Python 2, which is almost obsolete. Please upgrade to and formally support Python 3.7, which is the current version. Python 2 code cannot be readily integrated into modern codebases.

Mismatch between the attention formula in the paper and the implementation

Hi, thanks for sharing the code! It really helped me understand the paper.
I have a question about how you calculate the attention score, specifically about what the softmax is applied over.
[Screenshot of the attention formula from the paper: alpha_{i,j} = g({w'_{i,j}}_{i=1..s})]

From this, it seems that the softmax function g takes as input the word vectors FROM TIME STEP 1 TO S, meaning that the alpha distribution is over the SEQUENCE, not over the NUMBER OF PRETRAINED VECTORS.

DME/dme/embedders.py

Lines 183 to 203 in 97631c4

def forward(self, words):
    projected = [self.projectors[name](self.embedders[name](words)) for name in self.emb_names]
    if self.args.attnnet == 'none':
        out = sum(projected)
    else:
        projected_cat = torch.cat([p.unsqueeze(2) for p in projected], 2)
        s_len, b_size, _, emb_dim = projected_cat.size()
        attn_input = projected_cat
        if self.args.attnnet.startswith('dep_'):
            attn_input = attn_input.view(s_len, b_size * self.n_emb, -1)
            self.m_attn = self.attn_1(self.attn_0(attn_input)[0])
            self.m_attn = self.m_attn.view(s_len, b_size, self.n_emb)
        elif self.args.attnnet.startswith('no_dep_'):
            self.m_attn = self.attn_1(self.attn_0(attn_input)).squeeze(3)
        if self.args.attnnet.endswith('_gating'):
            self.m_attn = torch.sigmoid(self.m_attn)
        elif self.args.attnnet.endswith('_softmax'):
            self.m_attn = F.softmax(self.m_attn, dim=2)

However, line 203 takes the softmax over dim=2, which is the NUMBER OF PRETRAINED VECTORS (self.n_emb).

I am a little confused about the mismatch here. I think the formula in the paper should be revised. Am I missing something? Please help me!

SNLI checkpoint incompatible

I believe the DME checkpoint you provide is incompatible with the model constructed from SNLI data for task=snli. Mainly, the vocabulary mismatches, so the checkpoint's outermost (embedding) layers cannot be loaded into the constructed model. Of course, we could train our own model using the script, but I thought I should mention this in case more people are facing the same issue.

Code for CDME?

Thanks for this great repo. Is the code for CDME going to be available?

Link to arXiv

I see that the readme links to the corresponding arXiv paper. Please update the link to point to the HTML abstract, i.e. to https://arxiv.org/abs/1804.07983, and not to the PDF. Some readers only want to read the abstract; if they want the PDF, they can easily click the PDF link on the arXiv abstract page.
