
bidaf-pytorch's Introduction

BiDAF-pytorch

Re-implementation of BiDAF (Bidirectional Attention Flow for Machine Comprehension, Minjoon Seo et al., ICLR 2017) in PyTorch.

Results

Dataset: SQuAD v1.1

Model (single)       EM (%, dev)   F1 (%, dev)
Re-implementation    64.8          75.7
Baseline (paper)     67.7          77.3

Development Environment

  • OS: Ubuntu 16.04 LTS (64bit)
  • GPU: Nvidia Titan Xp
  • Language: Python 3.6.2
  • PyTorch: 0.4.0

Requirements

Please install the library requirements specified in requirements.txt first.

torch==0.4.0
nltk==3.2.4
tensorboardX==0.8
torchtext==0.2.3

Execution

python run.py --help

usage: run.py [-h] [--char-dim CHAR_DIM]
          [--char-channel-width CHAR_CHANNEL_WIDTH]
          [--char-channel-size CHAR_CHANNEL_SIZE]
          [--dev-batch-size DEV_BATCH_SIZE] [--dev-file DEV_FILE]
          [--dropout DROPOUT] [--epoch EPOCH] [--gpu GPU]
          [--hidden-size HIDDEN_SIZE] [--learning-rate LEARNING_RATE]
          [--print-freq PRINT_FREQ] [--train-batch-size TRAIN_BATCH_SIZE]
          [--train-file TRAIN_FILE] [--word-dim WORD_DIM]

optional arguments:
  -h, --help            show this help message and exit
  --char-dim CHAR_DIM
  --char-channel-width CHAR_CHANNEL_WIDTH
  --char-channel-size CHAR_CHANNEL_SIZE
  --dev-batch-size DEV_BATCH_SIZE
  --dev-file DEV_FILE
  --dropout DROPOUT
  --epoch EPOCH
  --gpu GPU
  --hidden-size HIDDEN_SIZE
  --learning-rate LEARNING_RATE
  --print-freq PRINT_FREQ
  --train-batch-size TRAIN_BATCH_SIZE
  --train-file TRAIN_FILE
  --word-dim WORD_DIM
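
For example, a training run with non-default settings might look like the following (the flag names come from the help output above; the values are only illustrative, not recommended settings):

python run.py --epoch 12 --train-batch-size 60 --dev-batch-size 100 --gpu 0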

bidaf-pytorch's People

Contributors

dependabot[bot], galsang, mottled233, saravsak


bidaf-pytorch's Issues

GPU memory issues

First of all, fantastic work on this implementation of BiDAF, very compact and readable!

I am, however, having strange trouble with GPU memory consumption. With a train batch size of 10, a dev batch size of 50, and a context threshold of 400, it uses up to 10 GB of memory during training. Full disclosure: I'm using a version of SQuAD machine-translated (via Google Translate) into a different language, but with the context threshold set to 400 I don't expect this to make a significant difference. There is also no linear relationship at all between batch size and memory consumption; for example, I can almost fit a train batch size of 20 without running out of memory (my card has 12 GB of memory). Any idea what might cause this behaviour? Were you able to train the network at batch sizes 60/100 with 12 GB of GPU RAM?
EDIT: For reference, I have been able to train BiDAF on the same dataset and the same hardware using the authors' TF implementation at batch size 50.

I also noticed a couple of minor issues:

  1. Not specifying a dimension for torch.squeeze() in the forward function of the model causes a crash with batch size 1. I am far from a PyTorch expert, so I can't say what best practice is, but to me it seems good to always specify the dimension argument to avoid these kinds of issues. :)
  2. If the maximum word length across all questions in a batch is less than the char channel width, the forward function crashes. I solved this by adding the following function as a postprocessing function to the CHAR_NESTING field in the SQuAD data module, which simply pads the characters (a sketch of how to hook it up follows the function):
def char_postprocessing(batch, vocab):
    # Pad every word's character ids on both sides so that each word is long
    # enough for the char-CNN kernel (pad_length = 2 assumes a kernel width of 5).
    padded_batch = []
    pad_length = 2

    for chars in batch:
        pad_front = [vocab.stoi['<pad>'] for _ in range(pad_length)]
        pad_behind = [vocab.stoi['<pad>'] for _ in range(pad_length)]
        padded_batch.append(pad_front + chars + pad_behind)

    return padded_batch
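
For completeness, attaching it to the field might look roughly like this (a sketch assuming the legacy torchtext Field API; the other field arguments are illustrative, and older torchtext versions pass a third train argument to postprocessing):

CHAR_NESTING = data.Field(batch_first=True,
                          tokenize=list,
                          postprocessing=char_postprocessing)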

Some questions about the implementation in the utils package

Hi!
I'm a beginner in PyTorch and deep learning, and I'm now considering implementing the BiDAF model myself. I think your code is very clean, and I love it!
Still, I don't understand why we need to re-implement the Linear module and the LSTM layer ourselves. Are the modules implemented in torch.nn not good enough? Or is it just more convenient for modifying the hyper-parameters?
Thanks a lot!

In the att_flow_layer of the BiDAF model

Hi, I have just started learning QA models, and thank you so much for sharing this.
I found that the attention you wrote is a little different from the original paper:
on line 141 of model.py
s = self.att_weight_c(c).expand(-1, -1, q_len) + \
    self.att_weight_q(q).permute(0, 2, 1).expand(-1, c_len, -1) + \
    cq
However, the paper uses [h; u; h ◦ u], which is 6d after concatenation; that looks different from your formulation above.
Does it make a difference?
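
For what it's worth, the two forms are algebraically equivalent: splitting the paper's 6d weight vector w into [w_c; w_q; w_cq] turns w^T [h; u; h ◦ u] into w_c^T h + w_q^T u + w_cq^T (h ◦ u), which is exactly the three-term sum above (with cq supplying the w_cq^T (h ◦ u) part). A tiny numerical sanity check, with illustrative sizes:

import torch

d2 = 10                          # 2d, the size of the contextual embeddings
h = torch.randn(d2)              # one context position
u = torch.randn(d2)              # one question position
w = torch.randn(3 * d2)          # the paper's 6d similarity weight vector
w_c, w_q, w_cq = w.split(d2)     # the three slices used by the implementation

paper = w @ torch.cat([h, u, h * u])
implementation = w_c @ h + w_q @ u + w_cq @ (h * u)
print(torch.allclose(paper, implementation))   # True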

What is the purpose of the file ema.py?

Hi, thank you for this clear implementation of BiDAF. I am a beginner in PyTorch, so I am confused about the purpose of ema.py. My guess is that one function is to save the parameters that are trainable during training, but I don't understand the update method. Could you please explain why you use this in the implementation? Thank you again.

def update(self, name, x):
    assert name in self.shadow
    new_average = (1.0 - self.mu) * x + self.mu * self.shadow[name]
    self.shadow[name] = new_average.clone()
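
For context, the update formula above is an exponential moving average (EMA) of the weights; such a class is typically used along these lines (a minimal sketch of the usual pattern; the register method and the loops are assumptions, not necessarily the exact API in ema.py):

# Register each trainable parameter once, before training starts.
for name, param in model.named_parameters():
    if param.requires_grad:
        ema.register(name, param.data)

# After every optimizer step, move the shadow copy toward the new weights.
for name, param in model.named_parameters():
    if param.requires_grad:
        ema.update(name, param.data)

# At evaluation time, the shadow (averaged) weights are copied into a copy of
# the model, which usually generalises slightly better than the raw weights.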

Why not use PyTorch LSTM and Linear

Hi Taeuk! Thank you for implementing BiDAF in such an elegant way! Why do you use your own versions of LSTM and Linear in utils.nn instead of the PyTorch LSTM and Linear?

Some confusion about the Contextual Embedding Layer

Hi, thanks for sharing this reproduction code. I am a little confused about the Contextual Embedding Layer. There seem to be two possible designs: one uses the same BiLSTM for both the passage and the question, and the other uses separate BiLSTMs for the passage and the question. You use the first design, sharing the same BiLSTM, is that right?

Some confusion about the Char Emb Layer in the BiDAF class

Hi. Thank you for sharing your code. It has been a great help and I have learnt a lot from it.

I'm a little confused about padding_idx=1 in self.char_emb. In my understanding, this parameter means that when the embedding layer receives a 1, it outputs a zero vector.
Is there any special effect for the character corresponding to index 1?
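
For reference, index 1 is what torchtext assigns to the <pad> token by default (index 0 is <unk>), and nn.Embedding's padding_idx simply fixes that embedding row at zero and excludes it from gradient updates. A quick check:

import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=10, embedding_dim=4, padding_idx=1)
print(emb(torch.tensor([1])))   # the row for index 1 is all zeros
# That row also never receives gradient updates during training.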

The other question is about the CharCNN. I glanced at the official code, which is implemented in TF, and found that there is a ReLU after the CNN. I tried applying the ReLU in my code, but the result decreased by 5%. So why didn't you use it? Was this an oversight, or is there another reason?

Error: RawField object has no attribute 'is_target'

Thanks for your sharing; your code is readable! When I cloned the project and tried to run it, I got the error "RawField object has no attribute 'is_target'". I'm a noob; can you tell me what the problem might be?
Thank you!

can't optimize a non-leaf Tensor

Thanks for your sharing, but I encounter the following error when putting the parameters into the optimizer.
Could you help me resolve this problem? Thanks!

NLTK error


Resource 'tokenizers/punkt/PY3/english.pickle' not found.
Please use the NLTK Downloader to obtain the resource: >>>
nltk.download()
Searched in:
- '/home/sunshen/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- ''
I am connected to the server remotely; how can I solve this problem?
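
For reference, the punkt tokenizer data can be fetched non-interactively from a Python shell, which also works over SSH on a headless server (the default download directory is shown in the comment; this is a sketch, not project code):

import nltk

# Downloads to the default ~/nltk_data; pass download_dir to choose another path.
nltk.download('punkt')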

RuntimeError: Expected object of backend CUDA but got backend CPU for argument #3 'index' (a small bug when running on GPU)

There is a small bug when running this code on the GPU, caused by the torchtext version!

When I run this code on the GPU, I unfortunately get:

RuntimeError: Expected object of backend CUDA but got backend CPU for argument #3 'index'

So I guess the tensor may not be on the GPU, and there is also this warning:

The `device` argument should be set by using `torch.device` or passing a string as an argument. This behavior will be deprecated soon and currently defaults to CPU.

I changed the device parameter as follows, and it works now.

# Build a torch.device so the iterators place batches directly on the GPU.
device = torch.device(f"cuda:{args.gpu}" if torch.cuda.is_available() else "cpu")
self.train_iter, self.dev_iter = \
    data.BucketIterator.splits((self.train, self.dev),
                               batch_sizes=[args.train_batch_size, args.dev_batch_size],
                               device=device,
                               sort_key=lambda x: len(x.c_word))

Maybe you could change it accordingly (●'◡'●)

CUDA out of memory will be fixed when PyTorch is updated to 1.0

Thanks for your code, it's very clear!
When I first cloned the code and ran it on my GTX 1060 6 GB GPU, it ran out of memory almost immediately. I tried deleting some intermediate variables in the BiDAF model, but it had no effect. Finally, I updated PyTorch to version 1.0 and ran it again: nvidia-smi showed only about 4 GB of GPU memory in use, even after raising train_batch_size to 128.
So cool!

Why change the char_dim and word_len dimensions and then use Conv2d?

Around lines 82~88 in model.py:
# (batch * seq_len, 1, char_dim, word_len)
x = x.view(-1, self.args.char_dim, x.size(2)).unsqueeze(1)
# (batch * seq_len, char_channel_size, 1, conv_len) -> (batch * seq_len, char_channel_size, conv_len)
x = self.char_conv(x).squeeze()
Why do we need to change the dimensions first? Why not use Conv1d directly?
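
For comparison, the Conv2d trick above (a single input channel with a kernel spanning the full char_dim height) is mathematically equivalent to a Conv1d that treats char_dim as the channel dimension; a minimal sketch with illustrative sizes:

import torch
import torch.nn as nn

batch_seq, char_dim, word_len = 32, 8, 16        # batch * seq_len, char embedding dim, word length
char_channel_size, char_channel_width = 100, 5   # number of filters, kernel width

x = torch.randn(batch_seq, char_dim, word_len)

# Conv2d formulation as in model.py: a 1-channel "image" of height char_dim,
# with a kernel that covers the full height.
conv2d = nn.Conv2d(1, char_channel_size, (char_dim, char_channel_width))
out2d = conv2d(x.unsqueeze(1)).squeeze(2)        # (batch_seq, char_channel_size, conv_len)

# Conv1d formulation: the char embedding dimensions act as input channels.
conv1d = nn.Conv1d(char_dim, char_channel_size, char_channel_width)
out1d = conv1d(x)                                # (batch_seq, char_channel_size, conv_len)

print(out2d.shape, out1d.shape)  # identical shapes; with shared weights the values match too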

Predictor?

Thank you for your nice implementation. The training went well.
I have been trying to build a predictor using your model, but I keep running into problems that I cannot resolve on my own.

Can you make a predictor out of your current code, to achieve something like
answer = predictor(context, question)

Thanks in advance
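
In case it helps, decoding an answer span from start/end probability distributions is usually done roughly like this (a sketch only; p1, p2, and max_span_len are assumptions about the interface, not code from this repository):

import torch

def decode_best_span(p1, p2, max_span_len=15):
    # p1, p2: (c_len,) start/end probability distributions for one example.
    # Pick (i, j) maximising p1[i] * p2[j] subject to i <= j < i + max_span_len.
    score = torch.ger(p1, p2)                                   # outer product, (c_len, c_len)
    score = torch.triu(score) - torch.triu(score, diagonal=max_span_len)
    start = score.max(dim=1)[0].argmax().item()
    end = score[start].argmax().item()
    return start, end                                           # token indices of the answer span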
