
mocha-pytorch's Issues

Questions about MonotonicAttention.soft

Is the attention returned by MonotonicAttention.soft() a probability distribution?

It seems not. The following code:

import torch

from attention import MonotonicAttention

monotonic = MonotonicAttention().cuda()

batch_size = 1
sequence_length = 5
enc_dim, dec_dim = 10, 10
prev_attention = None
for t in range(5):
    encoder_outputs = torch.randn(batch_size, sequence_length, enc_dim).cuda()
    decoder_h = torch.randn(batch_size, dec_dim).cuda()
    attention = monotonic.soft(encoder_outputs, decoder_h, previous_alpha=prev_attention)
    prev_attention = attention
    # probability distribution ?
    print(torch.sum(attention, dim=-1).detach().cpu().numpy())

returns:

[1.]
[0.0550258]
[0.00664481]
[0.00043618]
[4.0174375e-05]

If it were a probability distribution like softmax, every row should sum to 1, shouldn't it? As a consequence, my alignments look like this:
[alignment image]

So my questions are:

  • Is the attention returned by MonotonicAttention.soft() a probability distribution?
  • If not, is it possible to convert it to one?
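
If a proper distribution is needed downstream, one workaround (just a sketch, not something the repo provides) is to renormalize each row of the returned attention, guarding against an all-zero row:

import torch

# Hypothetical helper, not part of mocha-pytorch: rescale each attention row
# so it sums to 1; eps avoids division by zero when a row is entirely zero.
def normalize_attention(attention, eps=1e-8):
    return attention / (attention.sum(dim=-1, keepdim=True) + eps)

Note that this changes the semantics of the monotonic attention weights, so whether it is appropriate depends on how the alignment is used.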

Implementation of `safe_cumprod`

The cumprod in the MoChA paper is defined to be exclusive, while safe_cumprod in this repo is not. Shouldn't it be:

def safe_cumprod(self, x, exclusive=False):
    """Numerically stable cumulative product by cumulative sum in log-space"""
    bsz = x.size(0)
    logsum = torch.cumsum(torch.log(torch.clamp(x, min=1e-20, max=1)), dim=1)
    if exclusive:
        # Shift right: prepend log(1) = 0 and drop the last entry, so position i
        # holds the product over x[:, :i], excluding x[:, i] itself.
        logsum = torch.cat([torch.zeros(bsz, 1).to(logsum), logsum], dim=1)[:, :-1]
    return torch.exp(logsum)

And in the soft() method of MonotonicAttention:

cumprod_1_minus_p = self.safe_cumprod(1 - p_select, exclusive=True)
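
As a sanity check, here is what the exclusive shift does on a toy input, using the same log-space trick as the corrected safe_cumprod above (expected results shown in the comments):

import torch

# For x = [a, b, c]:
#   inclusive cumprod -> [a, a*b, a*b*c]
#   exclusive cumprod -> [1, a,   a*b]
x = torch.tensor([[0.5, 0.4, 0.2]])

inclusive = torch.cumprod(x, dim=1)  # [[0.5, 0.2, 0.04]]

logsum = torch.cumsum(torch.log(torch.clamp(x, min=1e-20, max=1)), dim=1)
exclusive = torch.exp(
    torch.cat([torch.zeros(x.size(0), 1), logsum], dim=1)[:, :-1]
)  # [[1.0, 0.5, 0.2]]

print(inclusive)
print(exclusive)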

safe_cumprod still causes NaN grad

I tried this MonotonicAttention in my seq2seq model, which works well with vanilla attention. After training for a while, it still ran into NaN gradients. I checked which parameters had NaN gradients: they are all parameters upstream of MonotonicAttention's output. When I removed the safe_cumprod operation, training worked fine, so I think there may be a problem there. Has anyone else tried MonotonicAttention, and what was your experience?
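
For anyone debugging the same thing, here is a generic PyTorch sketch (not specific to this repo) to locate where NaN gradients first show up:

import torch

# Make backward report the forward op that produced a NaN/Inf.
torch.autograd.set_detect_anomaly(True)

# Hypothetical helper: call after loss.backward() to list parameters
# whose gradients contain NaN.
def report_nan_grads(model):
    for name, param in model.named_parameters():
        if param.grad is not None and torch.isnan(param.grad).any():
            print(f"NaN gradient in {name}")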

Something Wrong in Energy

I think
energy = self.tanh(self.W(encoder_outputs) + self.V(decoder_h).repeat(sequence_length, 1) + self.b)
should be written as
energy = self.tanh(self.W(encoder_outputs) + self.V(decoder_h).repeat(1,sequence_length).reshape(batch_size*sequence_length,-1) + self.b)
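If I read the Energy code correctly, encoder_outputs is flattened to shape (batch_size * sequence_length, -1) before the addition, so the decoder term has to repeat each example's vector sequence_length times in a row; .repeat(sequence_length, 1) instead tiles the whole batch, so the rows only line up when batch_size == 1. A small standalone check of the two orderings (toy shapes chosen for illustration, not the repo's code):

import torch

batch_size, sequence_length, dim = 2, 3, 4

# Row i of decoder_h is filled with the value i, so the ordering is easy to track.
decoder_h = torch.arange(batch_size).float().view(batch_size, 1).expand(batch_size, dim)

# Original: tiles the whole batch -> rows ordered [h0, h1, h0, h1, h0, h1]
tiled = decoder_h.repeat(sequence_length, 1)

# Proposed: repeats each row before reshaping -> rows ordered [h0, h0, h0, h1, h1, h1],
# matching encoder_outputs flattened from (batch, seq, dim) to (batch*seq, dim)
repeated = decoder_h.repeat(1, sequence_length).reshape(batch_size * sequence_length, -1)

print(tiled[:, 0])     # tensor([0., 1., 0., 1., 0., 1.])
print(repeated[:, 0])  # tensor([0., 0., 0., 1., 1., 1.])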
