Comments (12)
@ziadloo,
Thanks for pointing these out. I'll go through them and make the necessary changes.
@John-8704,
This is something I will be working on soon. Yes, since LSTM returns two states, it might not work properly.
@John-8704,
I had a look at your code. The problem is that you are using None as the length of your inputs. This layer can currently handle only one None dimension in its input. Try the following, for example; it'll work.
timesteps = 15
encoder_inputs = Input(shape=(timesteps,))
# Decoder with attention
timesteps = 10
decoder_inputs = Input(shape=(timesteps,))
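For contrast, a minimal sketch of the failing case (my illustration, assuming the same Input setup): passing None as the sequence length leaves the layer with two unknown dimensions, batch and time, which its internal reshape currently can't handle.
# Works: only the batch dimension is unknown
encoder_inputs = Input(shape=(15,))    # tensor shape (None, 15)
# Fails (currently): both batch and time are unknown
encoder_inputs = Input(shape=(None,))  # tensor shape (None, None)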
PS: If you request this as a feature, I can start working on it at some point. :)
Regarding your question, I think you're right: encoder_inf_states
seems to be redundant. You should be able to use encoder_inf_out
as you suggested. I'll need to double-check whether there was a reason I did this, but I suspect it's just a mistake. I'll update the issue if I find anything.
@ziadloo Thanks for these fixes. Also, this implementation uses GRU; if I were to use LSTM, do I need to make any changes? I mean, GRU(return_state=True) returns only one hidden state, whereas LSTM(return_state=True) returns one hidden state and one cell state. I couldn't figure it out and was wondering if you could help me out.
@John-8704
I am currently looking at using AttentionLayer for LSTMs. Can you post the error you're getting? Technically, it should work out of the box, because you should be using the output of the LSTM layer (which is a single tensor), not the states of the LSTM.
encoder_gru = GRU(hidden_size, return_sequences=True, return_state=True, name='encoder_gru')
encoder_out, encoder_state = encoder_gru(encoder_inputs)
# Set up the decoder GRU, using `encoder_state` as the initial state.
decoder_gru = GRU(hidden_size, return_sequences=True, return_state=True, name='decoder_gru')
decoder_out, decoder_state = decoder_gru(decoder_inputs, initial_state=encoder_state)
# Attention layer
attn_layer = AttentionLayer(name='attention_layer')
attn_out, attn_states = attn_layer([encoder_out, decoder_out])
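For LSTM, here's a minimal sketch of the same wiring (my adaptation, not code from the repo): LSTM(return_state=True) returns (output, state_h, state_c), so you unpack three values and pass both states to the decoder, while the attention layer still consumes only the sequence outputs.
# Hypothetical LSTM variant: unpack the extra cell state
encoder_lstm = LSTM(hidden_size, return_sequences=True, return_state=True, name='encoder_lstm')
encoder_out, enc_state_h, enc_state_c = encoder_lstm(encoder_inputs)
decoder_lstm = LSTM(hidden_size, return_sequences=True, return_state=True, name='decoder_lstm')
decoder_out, dec_state_h, dec_state_c = decoder_lstm(decoder_inputs, initial_state=[enc_state_h, enc_state_c])
# The attention layer only needs the sequence outputs, so this part is unchanged
attn_layer = AttentionLayer(name='attention_layer')
attn_out, attn_states = attn_layer([encoder_out, decoder_out])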
This is the error that I've been getting when I try to use bidirectional LSTMs in place of GRUs. Were you able to get LSTMs working with the attention layer?
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-55-bf8ea5683d13> in <module>
1 nmt_model, enc_outputs, dec_outputs = seq2seq(input_vocab, target_vocab, embed_size,
2 units, en_embedding_matrix,
----> 3 fr_embedding_matrix)
4 nmt_model.summary()
<ipython-input-54-3c3245d693ea> in seq2seq(input_vocab, target_vocab, embed_size, lstm_units, en_embedding_matrix, fr_embedding_matrix)
10 lstm_units, enc_states)
11
---> 12 attention_scores, dec_context = attention(enc_outputs, dec_outputs)
13 # output layer
14 dense1 = Dense(dec_context.get_shape()[-1], activation='elu', name='dense1')
<ipython-input-53-56fd806996ff> in attention(enc_outputs, dec_outputs)
38 # attention weights
39 attn_layer = AttentionLayer(name='attention_layer')
---> 40 attn_out, attn_energy = attn_layer([enc_outputs, dec_outputs])
41 # context
42 dec_context = Concatenate(name='dec_context')([attn_out, dec_outputs])
~/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py in __call__(self, inputs, *args, **kwargs)
840 not base_layer_utils.is_in_eager_or_tf_function()):
841 with auto_control_deps.AutomaticControlDependencies() as acd:
--> 842 outputs = call_fn(cast_inputs, *args, **kwargs)
843 # Wrap Tensors in `outputs` in `tf.identity` to avoid
844 # circular dependencies.
~/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/autograph/impl/api.py in wrapper(*args, **kwargs)
235 except Exception as e: # pylint:disable=broad-except
236 if hasattr(e, 'ag_error_metadata'):
--> 237 raise e.ag_error_metadata.to_exception(e)
238 else:
239 raise
TypeError: in converted code:
<ipython-input-52-1b23d5a92cb2>:32 energy_step *
W_a_dot_s = K.reshape(K.dot(reshaped_enc_outputs, self.W_a), (-1, en_seq_len, en_hidden))
/home/john76/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/keras/backend.py:4035 rnn
input_time_zero, tuple(initial_states) + tuple(constants))
/home/john76/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/keras/backend.py:2737 reshape
return array_ops.reshape(x, shape)
/home/john76/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/ops/array_ops.py:131 reshape
result = gen_array_ops.reshape(tensor, shape, name)
/home/john76/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_array_ops.py:8117 reshape
"Reshape", tensor=tensor, shape=shape, name=name)
/home/john76/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py:530 _apply_op_helper
raise err
/home/john76/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py:527 _apply_op_helper
preferred_dtype=default_dtype)
/home/john76/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1296 internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
/home/john76/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py:286 _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
/home/john76/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py:227 constant
allow_broadcast=True)
/home/john76/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py:265 _constant_impl
allow_broadcast=allow_broadcast))
/home/john76/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/tensor_util.py:545 make_tensor_proto
"supported type." % (type(values), values))
TypeError: Failed to convert object of type <class 'tuple'> to Tensor. Contents: (-1, None, 200). Consider casting elements to a supported type.
@John-8704 Can you also post the full model you have in place?
@thushv89 This is the link to the notebook I'm working on. Also, the dataset is English-French sentences, which I've also uploaded here: https://github.com/John-8704/code
A small note: in my notebook I initialized the fake states the same way as your code; it's exactly the same as attention.py from your repo. I think the error might be related to this, but I'm not sure.
def create_inital_state(inputs, hidden_size):
    # We are not using initial states, but need to pass something to the K.rnn function
    fake_state = K.zeros_like(inputs)                   # <= (batch_size, enc_seq_len, latent_dim)
    fake_state = K.sum(fake_state, axis=[1, 2])         # <= (batch_size,)
    fake_state = K.expand_dims(fake_state)              # <= (batch_size, 1)
    fake_state = K.tile(fake_state, [1, hidden_size])   # <= (batch_size, hidden_size)
    return fake_state
fake_state_c = create_inital_state(encoder_out_seq, encoder_out_seq.shape[-1])  # (batch_size, latent_dim)
fake_state_e = create_inital_state(encoder_out_seq, encoder_out_seq.shape[1])   # (batch_size, en_seq_len)
If I instead initialize the fake states following ziadloo's suggestion above,
fake_state_e = K.zeros_like(K.placeholder(shape=(decoder_out_seq.shape[0], 1)))
fake_state_c = K.zeros_like(K.placeholder(shape=(decoder_out_seq.shape[0], 1)))
then I get the error like the one above, which I already shared in my previous comment.
@thushv89 Any update on why the error might be occurring? I couldn't figure it out. Is there a temporary fix?
@thushv89 Thanks for the help. Also, which output shape is the correct one to use?
Original:
def compute_output_shape(self, input_shape):
    """ Outputs produced by the layer """
    return [
        tf.TensorShape((input_shape[1][0], input_shape[1][1], input_shape[1][2])),
        tf.TensorShape((input_shape[1][0], input_shape[1][1], input_shape[0][1]))
    ]
Suggested:
def compute_output_shape(self, input_shape):
    """ Outputs produced by the layer """
    return [
        tf.TensorShape((input_shape[1][0], input_shape[1][1], input_shape[0][2])),
        tf.TensorShape((input_shape[1][0], input_shape[1][1], input_shape[0][1]))
    ]
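For what it's worth, a quick shape check under the usual Bahdanau-attention reading (my assumption, not confirmed in the thread): the context vector is a weighted sum of the encoder outputs, so its last dimension should be the encoder hidden size, which points to the suggested version.
# Assumed shapes (illustration, not from the repo):
# encoder_out = input_shape[0]: (batch, en_seq, en_hidden)
# decoder_out = input_shape[1]: (batch, de_seq, de_hidden)
# context:  (batch, de_seq, en_hidden) -> last dim is input_shape[0][2]
# energies: (batch, de_seq, en_seq)    -> last dim is input_shape[0][1]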
Hi @thushv89,
I didn't really get why you initialized a new Input layer for the encoder_inf_states here. Why didn't you compute the attention between the encoder_inf_out and the decoder_inf_out, instead of:
attn_inf_out, attn_inf_states = attn_layer([encoder_inf_states, decoder_inf_out])
Any comment?
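(For reference, the alternative being suggested here, which matches the earlier reply that encoder_inf_states seems redundant, would be:)
attn_inf_out, attn_inf_states = attn_layer([encoder_inf_out, decoder_inf_out])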
It would be great if you could clarify the role of K.rnn in computing the energy output and the context vector here and here. Thanks!
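For background, here's a toy sketch of what K.rnn does (my illustration; the layer's actual energy/context step functions are more involved): it walks the time axis of its input, calling a step function once per timestep and threading the states through, which is how the layer produces one energy vector and one context vector per decoder step.
import numpy as np
import tensorflow as tf
import tensorflow.keras.backend as K

def step(x_t, states):
    # x_t: (batch, features), the slice for a single timestep
    return x_t * 2.0, states  # (per-step output, unchanged states)

inputs = tf.constant(np.random.rand(4, 7, 3), dtype=tf.float32)  # (batch, time, feat)
fake_state = tf.zeros((4, 3))
last_out, all_outs, _ = K.rnn(step, inputs, initial_states=[fake_state])
print(all_outs.shape)  # (4, 7, 3): one output per input timestep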