
Comments (8)

visionscaper commented on May 30, 2024

Hi @felixhao28, you have likely encountered the same Keras issue as reported here and here. Can you post your model code?


felixhao28 commented on May 30, 2024

I have worked around this issue; I can now evaluate/predict and save weights as expected. I am sorry I cannot share the code directly due to company policy, but here is the idea:

  1. Variables are created when a layer is constructed, and connected to a graph when the layer is called on inputs. You need to construct the layers on the CPU and connect them on each GPU to avoid duplicating parameters (there should be only one copy); see the sketch after this list.
  2. In order to do that, I cannot use composite models. Instead of getting a model from model_generator, I get the output tensors given pre-constructed layers (LSTM, Dense, Embedding, etc.) and input tensors (slice_i in your case).
  3. Remove fixes (if any) for multi_gpu_model, such as model.__setattr__('callback_model', single_model). You don't need them anymore.
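
To make point 1 concrete, here is a minimal sketch (my illustration, not felixhao28's actual code) of a Keras layer that is constructed once and called on several tensors, sharing a single set of weights across all calls:

from keras.layers import Input, Dense

dense = Dense(8, name='shared')  # one layer instance, one set of parameters
a = Input(shape=(4,))
b = Input(shape=(4,))
ya = dense(a)  # each call connects the shared layer into the graph
yb = dense(b)
assert len(dense.weights) == 2  # still just one kernel and one bias

(Strictly speaking, Keras 2 creates the weight variables lazily on the layer's first call, in build(), rather than at construction; that detail may be relevant to the TensorBoard observation further down in this thread.)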

I came up with my solution because I have experience building multi-GPU models in TensorFlow. Since Keras is only a wrapper around the TensorFlow backend, I figured they should work in the same manner.

One downside is that you no longer have direct access to the single_model. In your case, that is model = model_generator(sub_batch_size). So model.summary() now looks a little messy.


visionscaper commented on May 30, 2024

Hi @felixhao28,

Thanks for your input. Unfortunately, it is very cryptic to me; it is a shame you cannot share your solution. I don't need to know your particular model, just the code skeleton of your multi-GPU solution would be enough.

You need to construct the layers on the CPU and connect them on each GPU to avoid duplicating parameters (there should be only one copy).

I understand from point 2 that you create the outputs and the inputs within tf.device('/cpu:0') (?). Then what? How do you 'connect them in each GPU'?

I also don't recognise the model.__setattr__('callback_model', single_model) code; where did you see that? (Keras's multi_gpu_utils.py does not contain this code either.)

It would be great if I could adapt your solution for this repo; if you could explain in more detail (or provide some (pseudo) code), that would be very helpful.

Thanks!

Edit
PS: Are you sure that you haven't effectively created two separate models, one on each GPU, each having its own weights (so, not shared)?

What I tried to do myself earlier, which is very similar to what you are describing, is to create a generator that produces an output tensor by connecting the provided inputs to predefined layers (see the sketch after this list):

  1. Per GPU, I use the generator to create a graph connecting the given sliced inputs to the predefined layers, which produces an output tensor.
  2. These output tensors are then combined as done before in the script.
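
In pseudo-code, that generator pattern looks roughly like this (a sketch with illustrative names, not this repo's actual API):

def generate_output(predefined_layers, input_tensor):
    # Connect this GPU's sliced input through the shared, predefined layers.
    x = input_tensor
    for layer in predefined_layers:
        x = layer(x)
    return x  # the output tensor for this replica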

What happens, though, is that I get an error indicating that layers with the same name exist in the final model. This is obvious, because I reused the layers on all GPUs!

Is this similar to what you have done? How did you solve the issue of having the same layer names in each per-GPU graph?


felixhao28 commented on May 30, 2024

Unfortunately, it is very cryptic to me

English is not my first language, but at least code speaks for itself. This is a minimal example of my "solution".

import tensorflow as tf
from keras.layers import Input, LSTM, Dense, concatenate
from keras.models import Model

# Example dimensions
batch_size = 4000   # full batch, split evenly across the GPUs
rnn_size = 512
output_dim1, output_dim2 = 512, 512
gpus = 4
n_outputs = 2

# Initialize variables on CPU: constructing the layers here gives one
# shared copy of the parameters, reused by every GPU replica below.
with tf.device('/cpu:0'):
    full_batch_inputs = [Input(name='full_input',
                               batch_shape=(batch_size, None, rnn_size))]
    rnn_layer = LSTM(rnn_size, name='rnn', stateful=True)
    # One softmax head per output (the original snippet passed a list of
    # dims to a single Dense, which Keras does not accept).
    output_layers = [Dense(output_dim1, name='output_1', activation='softmax'),
                     Dense(output_dim2, name='output_2', activation='softmax')]

all_outputs = [[] for _ in range(n_outputs)]

for i in range(gpus):
    with tf.device('/gpu:%d' % i):
        with tf.name_scope('replica_%d' % i):
            inputs = []
            # Retrieve this GPU's slice of each input.
            # SliceBatch is the batch-slicing layer from this repo.
            for inp in full_batch_inputs:
                slice_i = SliceBatch(batch_size, gpus, i)(inp)
                inputs.append(slice_i)
            # Connect the computational graph: calling the shared layers
            # here places the ops (but not the weights) on this GPU.
            rnn_out = rnn_layer(inputs[0])
            outputs = [layer(rnn_out) for layer in output_layers]
            # Save the outputs for merging back together later
            for o in range(n_outputs):
                all_outputs[o].append(outputs[o])

# Merge the per-GPU outputs on CPU along the batch axis
with tf.device('/cpu:0'):
    merged = [concatenate(outputs, axis=0) for outputs in all_outputs]
    model = Model(full_batch_inputs, merged)

I also don't recognise the model.__setattr__('callback_model', single_model) code

It was required for the vanilla multi_gpu_model to make use of the ModelCheckpoint callback and some other callbacks that only work on a non-composite model. It does not appear in your code or in the Keras source; I wrote it just for people who might get here from googling. See the sketch below.
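
For those readers, the workaround looked roughly like this (a sketch for the vanilla keras.utils.multi_gpu_model; single_model is the non-parallel template model):

from keras.utils import multi_gpu_model

parallel_model = multi_gpu_model(single_model, gpus=4)
# Point callbacks such as ModelCheckpoint at the single-GPU template, so
# they save and restore the shared weights instead of the composite model:
parallel_model.__setattr__('callback_model', single_model)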

Are you sure that you haven't effectively created two separate models, one on each GPU, each having its own weights (so, not shared)?

I am not 100 percent sure, but it appears not. I checked the number of parameters Keras reported against the graph TensorBoard reported:

full_batch_size = 4000
input_dim = 512
rnn_size = 512
gpus = 4
#             kernel      recurrent_kernel     bias
2099200 == 512 * 512 * 4 + 512 * 512 * 4  +  512 * 4
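
The count is easy to reproduce (a quick check assuming Keras 2; the summary should report the same 2,099,200 parameters for the LSTM):

from keras.layers import Input, LSTM
from keras.models import Model

inp = Input(batch_shape=(4000, None, 512))
out = LSTM(512, stateful=True)(inp)
Model(inp, out).summary()  # Total params: 2,099,200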

[Screenshot]

[Screenshot: replica_1]

[Screenshot: replica_0]

You can see that replica_1 reads data from replica_0, which hosts the actual matrix for the embedding (which, on second thought, is weird, because I explicitly created the embedding matrix on the CPU).

The saved model (hdf5) is somehow significantly larger than a stateless counterpart. I took a brief look at it and noticed that it saves the training parameters for Adam as well. I am pretty sure that is the real culprit behind the large size. At least it makes continuing training easier.
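
If the Adam slots are indeed the culprit, the file can be shrunk by skipping the optimizer state (a sketch assuming the Keras 2 save API):

model.save('model_small.h5', include_optimizer=False)  # drop the optimizer slots
model.save_weights('model_weights.h5')                 # or save the weights only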

What happens, though, is that I get an error indicating that layers with the same name exist in the final model. This is obvious, because I reused the layers on all GPUs!

I did exactly the same and got the same warning at first. That's why I emphasized creating the layers before connecting them.


visionscaper commented on May 30, 2024

Hi @felixhao28, thanks so much for taking the time to answer all my questions! I'm also not a native English speaker, so no worries there :).

In my attempt to fix this, as I mentioned in my previous response, I seem to be doing exactly what you are doing. I'm going to try to find out what the difference is between our solutions and report back.

Cheers,

Freddy


visionscaper commented on May 30, 2024

@felixhao28 I found my issue! The solution I made earlier is exactly the same as yours. I also instantiated my RNN layers beforehand; what I did not do, however, was pre-instantiate layers such as RepeatVector. Since I gave these layers a name, the same name was used on each GPU, causing the name-duplication error.

When I removed the naming of these layers, my system worked, albeit with separate instances of RepeatVector (and similar layers) per GPU; see the sketch below.
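
That fix, as a minimal sketch (timesteps and encoded_i are hypothetical stand-ins for the real model's values, and tf is assumed to be TensorFlow):

from keras.layers import RepeatVector

for i in range(gpus):
    with tf.device('/gpu:%d' % i):
        # Parameter-less layers may be instantiated per replica; leaving
        # them unnamed lets Keras auto-generate a unique name for each.
        repeated_i = RepeatVector(timesteps)(encoded_i)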

Would you agree that layers that don't have any trainable parameters or internal state can be duplicated across all GPUs without issues (thus not one instance of a layer for all GPUs)?

I will update the repo to use output generators instead of model generators.

Will also keep you posted!


visionscaper commented on May 30, 2024

@felixhao28, by the way, do you have any theory as to why this works and composing models does not?


felixhao28 commented on May 30, 2024

Would you agree that layers that don't have any trainable parameters or internal state can be duplicated across all GPUs without issues?

I agree, and I think it is the only way to do data-parallel training. Ideally, every layer should read its parameters from a single copy in memory and somehow become parameter-less.

do you have any theory as to why this works and composing models does not?

I would need to dig into the Keras source code to understand this. My wild guess is the lazy style of creating the train/test/predict functions in the Model class, which means part of the computational graph is not properly connected under the TF device scope.

