Comments (12)

thushv89 commented on June 25, 2024

@ziadloo ,

Thanks for pointing these out. I'll have a go through and make the necessary changes.

@John-8704 ,

This is something I will be working on soon. Yes, since LSTM returns two states, it might not work properly.

thushv89 commented on June 25, 2024

@John-8704 ,

I had a look at your code. The problem is that you are using None as the length of your inputs. This layer can currently handle only one None dimension in its input. Try the following, for example; it'll work.

# Encoder with a fixed number of timesteps
timesteps = 15
encoder_inputs = Input(shape=(timesteps,))

# Decoder with attention, also with a fixed number of timesteps
timesteps = 10
decoder_inputs = Input(shape=(timesteps,))

PS: If you can request this as a feature, I can start working on it at some point. :)

thushv89 commented on June 25, 2024

@OmniaZayed ,

Regarding your question: I think you're right, encoder_inf_states seems to be redundant. You should be able to use encoder_inf_out as you suggested, but I'll need to double-check whether there was a reason I did it this way. I suspect this is just a mistake; I'll update the issue if I find anything.
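i.e., the inference-time call would presumably become something like this (just sketching the suggested change, using the names from your question):

attn_inf_out, attn_inf_states = attn_layer([encoder_inf_out, decoder_inf_out])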

user06039 commented on June 25, 2024

@ziadloo Thanks for these fixes. Also, this implementation uses a GRU; if I were to use an LSTM instead, do I need to make any changes?

I mean, GRU(return_state=True) returns only one hidden state, whereas LSTM(return_state=True) returns both a hidden state and a cell state. I couldn't figure it out and was wondering if you could help me out.
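To illustrate what I mean, here is a standalone snippet (not from this repo):

from tensorflow.keras.layers import Input, GRU, LSTM

x = Input(shape=(10, 32))
# GRU(return_state=True) gives back (sequence outputs, final hidden state)
gru_out, gru_state = GRU(64, return_sequences=True, return_state=True)(x)
# LSTM(return_state=True) gives back (sequence outputs, final hidden state, final cell state)
lstm_out, lstm_h, lstm_c = LSTM(64, return_sequences=True, return_state=True)(x)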

thushv89 commented on June 25, 2024

@John-8704

I am currently looking at using AttentionLayer with LSTMs. Can you post the error you're getting? Technically, it should work out of the box, because you should be using the output sequence of the LSTM layer (which is a single tensor) and not the states of the LSTM.

    encoder_gru = GRU(hidden_size, return_sequences=True, return_state=True, name='encoder_gru')
    encoder_out, encoder_state = encoder_gru(encoder_inputs)

    # Set up the decoder GRU, using `encoder_states` as initial state.
    decoder_gru = GRU(hidden_size, return_sequences=True, return_state=True, name='decoder_gru')
    decoder_out, decoder_state = decoder_gru(decoder_inputs, initial_state=encoder_state)

    # Attention layer
    attn_layer = AttentionLayer(name='attention_layer')
    attn_out, attn_states = attn_layer([encoder_out, decoder_out])
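For an LSTM, the same structure should work; a minimal sketch of the adaptation (reusing the names above; the only differences are unpacking the extra cell state and passing both states to the decoder, while the attention layer still only sees the two output sequences):

    encoder_lstm = LSTM(hidden_size, return_sequences=True, return_state=True, name='encoder_lstm')
    encoder_out, encoder_h, encoder_c = encoder_lstm(encoder_inputs)

    # Set up the decoder LSTM, using both encoder states as the initial state.
    decoder_lstm = LSTM(hidden_size, return_sequences=True, return_state=True, name='decoder_lstm')
    decoder_out, decoder_h, decoder_c = decoder_lstm(decoder_inputs, initial_state=[encoder_h, encoder_c])

    # Attention layer operates on the output sequences only
    attn_layer = AttentionLayer(name='attention_layer')
    attn_out, attn_states = attn_layer([encoder_out, decoder_out])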

user06039 commented on June 25, 2024

@thushv89

This is the error that I've been getting when I try to use bidirectional LSTMs in place of GRUs. Were you able to get LSTMs working with the attention layer?

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-55-bf8ea5683d13> in <module>
      1 nmt_model, enc_outputs, dec_outputs = seq2seq(input_vocab, target_vocab, embed_size,
      2                                               units, en_embedding_matrix,
----> 3                                               fr_embedding_matrix)
      4 nmt_model.summary()

<ipython-input-54-3c3245d693ea> in seq2seq(input_vocab, target_vocab, embed_size, lstm_units, en_embedding_matrix, fr_embedding_matrix)
     10                                        lstm_units, enc_states)
     11 
---> 12     attention_scores, dec_context = attention(enc_outputs, dec_outputs)
     13     # output layer
     14     dense1 = Dense(dec_context.get_shape()[-1], activation='elu', name='dense1')

<ipython-input-53-56fd806996ff> in attention(enc_outputs, dec_outputs)
     38     # attention weights
     39     attn_layer = AttentionLayer(name='attention_layer')
---> 40     attn_out, attn_energy = attn_layer([enc_outputs, dec_outputs])
     41     # context
     42     dec_context = Concatenate(name='dec_context')([attn_out, dec_outputs])

~/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py in __call__(self, inputs, *args, **kwargs)
    840                     not base_layer_utils.is_in_eager_or_tf_function()):
    841                   with auto_control_deps.AutomaticControlDependencies() as acd:
--> 842                     outputs = call_fn(cast_inputs, *args, **kwargs)
    843                     # Wrap Tensors in `outputs` in `tf.identity` to avoid
    844                     # circular dependencies.

~/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/autograph/impl/api.py in wrapper(*args, **kwargs)
    235       except Exception as e:  # pylint:disable=broad-except
    236         if hasattr(e, 'ag_error_metadata'):
--> 237           raise e.ag_error_metadata.to_exception(e)
    238         else:
    239           raise

TypeError: in converted code:

   <ipython-input-52-1b23d5a92cb2>:32 energy_step  *
        W_a_dot_s = K.reshape(K.dot(reshaped_enc_outputs, self.W_a), (-1, en_seq_len, en_hidden))
    /home/john76/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/keras/backend.py:4035 rnn
        input_time_zero, tuple(initial_states) + tuple(constants))
    /home/john76/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/keras/backend.py:2737 reshape
        return array_ops.reshape(x, shape)
    /home/john76/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/ops/array_ops.py:131 reshape
        result = gen_array_ops.reshape(tensor, shape, name)
    /home/john76/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_array_ops.py:8117 reshape
        "Reshape", tensor=tensor, shape=shape, name=name)
    /home/john76/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py:530 _apply_op_helper
        raise err
    /home/john76/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py:527 _apply_op_helper
        preferred_dtype=default_dtype)
    /home/john76/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1296 internal_convert_to_tensor
        ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
    /home/john76/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py:286 _constant_tensor_conversion_function
        return constant(v, dtype=dtype, name=name)
    /home/john76/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py:227 constant
        allow_broadcast=True)
    /home/john76/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py:265 _constant_impl
        allow_broadcast=allow_broadcast))
    /home/john76/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/tensor_util.py:545 make_tensor_proto
        "supported type." % (type(values), values))

    TypeError: Failed to convert object of type <class 'tuple'> to Tensor. Contents: (-1, None, 200). Consider casting elements to a supported type.

thushv89 commented on June 25, 2024

@John-8704 Can you also post the full model you have in place?

user06039 commented on June 25, 2024

@thushv89 This is the link to the notebook I'm working on; the dataset is English-French sentences, which I've also uploaded here: https://github.com/John-8704/code

A small note: in my notebook I initialized the fake states the same way as in your code; it's exactly the same as attention.py from your repo. I think the error might be related to this, but I'm not sure.

def create_inital_state(inputs, hidden_size):
    # We are not using initial states, but need to pass something to the K.rnn function
    fake_state = K.zeros_like(inputs)  # <= (batch_size, enc_seq_len, latent_dim)
    fake_state = K.sum(fake_state, axis=[1, 2])  # <= (batch_size)
    fake_state = K.expand_dims(fake_state)  # <= (batch_size, 1)
    fake_state = K.tile(fake_state, [1, hidden_size])  # <= (batch_size, latent_dim)
    return fake_state

fake_state_c = create_inital_state(encoder_out_seq, encoder_out_seq.shape[-1])
fake_state_e = create_inital_state(encoder_out_seq, encoder_out_seq.shape[1]) 

If I instead initialize the fake states as ziadloo suggested above,

fake_state_e = K.zeros_like(K.placeholder(shape=(decoder_out_seq.shape[0], 1)))
fake_state_c = K.zeros_like(K.placeholder(shape=(decoder_out_seq.shape[0], 1)))

then I get the error I already shared in my previous comment.

user06039 commented on June 25, 2024

@thushv89 Any update on why this error might be occurring? I couldn't figure out why I'm getting it. Is there any temporary fix?

user06039 commented on June 25, 2024

@thushv89 Thanks for the help. Also, which of these compute_output_shape versions is the correct one to use?

Original:

    def compute_output_shape(self, input_shape):
        """ Outputs produced by the layer """
        return [
            tf.TensorShape((input_shape[1][0], input_shape[1][1], input_shape[1][2])),
            tf.TensorShape((input_shape[1][0], input_shape[1][1], input_shape[0][1]))
        ]

Suggested:

    def compute_output_shape(self, input_shape):
        """ Outputs produced by the layer """
        return [
            tf.TensorShape((input_shape[1][0], input_shape[1][1], input_shape[0][2])),
            tf.TensorShape((input_shape[1][0], input_shape[1][1], input_shape[0][1]))
        ]
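As a quick sanity check on my side, I tried plain dot-product attention with made-up tensors (not the layer's actual scoring, just to see which dimension each output should carry):

import tensorflow as tf

batch, en_steps, en_hidden = 4, 15, 64
de_steps, de_hidden = 10, 32

encoder_out = tf.random.normal((batch, en_steps, en_hidden))   # input_shape[0]
decoder_out = tf.random.normal((batch, de_steps, de_hidden))   # input_shape[1]

# Project the decoder outputs to the encoder dimension before scoring
scores = tf.matmul(tf.keras.layers.Dense(en_hidden)(decoder_out), encoder_out, transpose_b=True)
energies = tf.nn.softmax(scores)            # (batch, de_steps, en_steps)
context = tf.matmul(energies, encoder_out)  # (batch, de_steps, en_hidden)

print(context.shape)   # (4, 10, 64) -> last dimension is en_hidden, i.e. input_shape[0][2]
print(energies.shape)  # (4, 10, 15) -> last dimension is en_steps, i.e. input_shape[0][1]

That seems to line up with the suggested version (the context vector's last dimension is the encoder hidden size), but I may be missing something.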

OmniaZayed commented on June 25, 2024

Hi @thushv89,

I didn't really get why you initialized a new Input layer for the encoder_inf_states here.
Why didn't you compute the attention between encoder_inf_out and decoder_inf_out instead of:
attn_inf_out, attn_inf_states = attn_layer([encoder_inf_states, decoder_inf_out])

Any comment?

OmniaZayed commented on June 25, 2024

@thushv89

It would be great if you could clarify the role of K.rnn in computing the energy output and the context vector here and here.

Thanks!
