Comments (16)

smdshakeelhassan commented on May 24, 2024

This is my meta file, in case you wish to run the code on your system:
File

albertz commented on May 24, 2024

You just need to load the NativeLstm2 op. E.g. you could use this code:

from returnn.TFNativeOp import make_op
from returnn.NativeOp import NativeLstm2
make_op(NativeLstm2)

Or, if you want to use the existing, already compiled library, you can simply do:

tf.load_op_library(so_filename)

The library is usually automatically compiled and saved somewhere like:
/var/tmp/$USER/returnn_tf_cache/ops/NativeLstm2/945a92e691/NativeLstm2.so.
You should see that in the log.

You can also explicitly use the tool tools/compile_native_op.py.
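Putting this together, a rough sketch of the whole loading step (the cache path and the hash directory are machine-specific, so check your Returnn log for the actual location; the meta filename is just an example):

import tensorflow as tf

# Option 1: let Returnn compile (if needed) and register the native op.
from returnn.TFNativeOp import make_op
from returnn.NativeOp import NativeLstm2
make_op(NativeLstm2)

# Option 2 (instead of option 1): load the already compiled library directly.
# tf.load_op_library("/var/tmp/<user>/returnn_tf_cache/ops/NativeLstm2/<hash>/NativeLstm2.so")
# For a training graph, also load GradOfNativeLstm2.so from its own cache directory.

# Afterwards, importing the meta graph should find the NativeLstm2 op.
saver = tf.train.import_meta_graph("model.238.meta")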

albertz commented on May 24, 2024

Alternatively, you can rewrite the config so that it does not use nativelstm2 (use e.g. standardlstm or basiclstm instead). Then import the model, save the computation graph, and also save the checkpoint. That computation graph and checkpoint should work as-is. Note that this LSTM implementation will be much slower, though.
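For illustration, the relevant change in the network definition of the config would be sth like this (the layer name and the other options here are just an example, not the exact values from that config):

network = {
    # Before: fused native LSTM kernel; loading the graph then requires the NativeLstm2 op.
    # "lstm0_fw": {"class": "rec", "unit": "nativelstm2", "direction": 1, "n_out": 1024, "from": ["source"]},
    # After: pure TensorFlow LSTM implementation; much slower, but needs no custom op.
    "lstm0_fw": {"class": "rec", "unit": "standardlstm", "direction": 1, "n_out": 1024, "from": ["source"]},
}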

albertz commented on May 24, 2024

This is not really related to returnn-experiments. This issue belongs in the main Returnn repo.

albertz commented on May 24, 2024

Also, this is not really Returnn-related. It is probably better asked on StackOverflow, where many more people will likely be able to help.

smdshakeelhassan commented on May 24, 2024

Hey. Thanks for your quick reply. I tried both methods you suggested, viz.

from returnn.TFNativeOp import make_op
from returnn.NativeOp import NativeLstm2
make_op(NativeLstm2)

and

tf.load_op_library("/tmp/ubuntu/returnn_tf_cache/ops/NativeLstm2/814f190c43/NativeLstm2.so")
tf.load_op_library("/tmp/ubuntu/returnn_tf_cache/ops/GradOfNativeLstm2/e6dcab98a5/GradOfNativeLstm2.so")

in two separate runs. However, in both cases I get the same error:

File "xport.py", line 22, in <module>
    new_saver=tf.train.import_meta_graph("/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/data/serv_test/model.238.meta")
  File "/home/ubuntu/tf1.13/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1435, in import_meta_graph
    meta_graph_or_file, clear_devices, import_scope, **kwargs)[0]
  File "/home/ubuntu/tf1.13/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1457, in _import_meta_graph_with_return_elements
    **kwargs))
  File "/home/ubuntu/tf1.13/lib/python3.6/site-packages/tensorflow/python/framework/meta_graph.py", line 852, in import_scoped_meta_graph_with_return_elements
    ops.prepend_name_scope(value, scope_to_prepend_to_names))
  File "/home/ubuntu/tf1.13/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3478, in as_graph_element
    return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
  File "/home/ubuntu/tf1.13/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3538, in _as_graph_element_locked
    "graph." % repr(name))
KeyError: "The name 'data' refers to an Operation not in the graph."

Could you please help me debug this error as I am unable to see its cause?

albertz commented on May 24, 2024

This is a new error now.
Again, this belongs more to the Returnn GitHub repo, or to StackOverflow.

See this TF issue report, or this StackOverflow question.

How exactly did you compile the graph and save it? Did you use tools/compile_tf_graph.py?
For some reason, the TF op with the name "data" is not in that graph. It's hard to tell without more context why that is. (In your xport.py, it would also help if you used better_exchook, i.e. import better_exchook; better_exchook.install().)

You might be able to simply create that op. Maybe putting tf.placeholder(tf.float32, name="data", shape=(None, 64)) or sth like that somewhere would work. But I'm not sure...
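I.e. sth like this at the top of your xport.py (rough sketch; the dtype and shape are just guesses, I don't know what the graph actually expects):

import better_exchook
better_exchook.install()  # nicer tracebacks, including local variables

import tensorflow as tf

# Guess: create an op named "data" before importing the meta graph.
tf.placeholder(tf.float32, shape=(None, 64), name="data")

new_saver = tf.train.import_meta_graph("model.238.meta")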

smdshakeelhassan commented on May 24, 2024

It is the "bidirectional LSTM global attention" model checkpoint generated by rnn.py using the unchanged returnn.config file in returnn-experiments/2018-asr-attention/librispeech/full-setup-attention. I have 4 files: viz:

checkpoint 
model.238.data-00000-of-00001 
model.238.index 
model.238.meta

I added the following placeholders

tf.placeholder(tf.float32, name="data", shape=(None,64))
tf.placeholder(tf.float32, name="source", shape=(None,64))

in the code to check for errors. But new KeyErrors popped up. Here is the error with better_exchook.install():

EXCEPTION
Traceback (most recent call last):
  File "xport.py", line 27, in <module>
    line: new_saver=tf.train.import_meta_graph("/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/data/serv_test/model.238.meta")
    locals:
      new_saver = <not found>
      tf = <local> <module 'tensorflow' from '/home/ubuntu/tf1.13/lib/python3.6/site-packages/tensorflow/__init__.py'>
      tf.train = <local> <module 'tensorflow._api.v1.train' from '/home/ubuntu/tf1.13/lib/python3.6/site-packages/tensorflow/_api/v1/train/__init__.py'>
      tf.train.import_meta_graph = <local> <function import_meta_graph at 0x7f1dca33bf28>
  File "/home/ubuntu/tf1.13/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1435, in import_meta_graph
    line: return _import_meta_graph_with_return_elements(
              meta_graph_or_file, clear_devices, import_scope, **kwargs)[0]
    locals:
      _import_meta_graph_with_return_elements = <global> <function _import_meta_graph_with_return_elements at 0x7f1dca33ea60>
      meta_graph_or_file = <local> '/home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/data/serv_test/model.238.meta', len = 122
      clear_devices = <local> False
      import_scope = <local> None
      kwargs = <local> {}
  File "/home/ubuntu/tf1.13/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1457, in _import_meta_graph_with_return_elements
    line: imported_vars, imported_return_elements = (
              meta_graph.import_scoped_meta_graph_with_return_elements(
                  meta_graph_def,
                  clear_devices=clear_devices,
                  import_scope=import_scope,
                  return_elements=return_elements,
                  **kwargs))
    locals:
      imported_vars = <not found>
      imported_return_elements = <not found>
      meta_graph = <global> <module 'tensorflow.python.framework.meta_graph' from '/home/ubuntu/tf1.13/lib/python3.6/site-packages/tensorflow/python/framework/meta_graph.py'>
      meta_graph.import_scoped_meta_graph_with_return_elements = <global> <function import_scoped_meta_graph_with_return_elements at 0x7f1dca38e510>
      meta_graph_def = <local> meta_info_def {
                                 stripped_op_list {
                                   op {
                                     name: "Add"
                                     input_arg {
                                       name: "x"
                                       type_attr: "T"
                                     }
                                     input_arg {
                                       name: "y"
                                       type_attr: "T"
                                     }
                                     output_arg {
                                       name: "z"
                                       type_attr: "T"
                                     }
                                     attr {
                                       name: "T"
                               ...
      clear_devices = <local> False
      import_scope = <local> None
      return_elements = <local> None
      kwargs = <local> {}
  File "/home/ubuntu/tf1.13/lib/python3.6/site-packages/tensorflow/python/framework/meta_graph.py", line 852, in import_scoped_meta_graph_with_return_elements
    line: col_op = graph.as_graph_element(
              ops.prepend_name_scope(value, scope_to_prepend_to_names))
    locals:
      col_op = <local> <tf.Operation 'source' type=Placeholder>
      graph = <local> <tensorflow.python.framework.ops.Graph object at 0x7f1d0ddc0828>
      graph.as_graph_element = <local> <bound method Graph.as_graph_element of <tensorflow.python.framework.ops.Graph object at 0x7f1d0ddc0828>>
      ops = <global> <module 'tensorflow.python.framework.ops' from '/home/ubuntu/tf1.13/lib/python3.6/site-packages/tensorflow/python/framework/ops.py'>
      ops.prepend_name_scope = <global> <function prepend_name_scope at 0x7f1d17b22620>
      value = <local> 'lstm0_fw', len = 8
      scope_to_prepend_to_names = <local> ''
  File "/home/ubuntu/tf1.13/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3478, in as_graph_element
    line: return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
    locals:
      self = <local> <tensorflow.python.framework.ops.Graph object at 0x7f1d0ddc0828>
      self._as_graph_element_locked = <local> <bound method Graph._as_graph_element_locked of <tensorflow.python.framework.ops.Graph object at 0x7f1d0ddc0828>>
      obj = <local> 'lstm0_fw', len = 8
      allow_tensor = <local> True
      allow_operation = <local> True
  File "/home/ubuntu/tf1.13/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3538, in _as_graph_element_locked
    line: raise KeyError("The name %s refers to an Operation not in the "
                         "graph." % repr(name))
    locals:
      KeyError = <builtin> <class 'KeyError'>
      repr = <builtin> <built-in function repr>
      name = <local> 'lstm0_fw', len = 8
KeyError: "The name 'lstm0_fw' refers to an Operation not in the graph."

Looks like there are no operations at all in the graph. Do I need to compile a graph from the saved TensorFlow checkpoint with tools/compile_tf_graph.py before exporting? Am I missing some important step?

albertz commented on May 24, 2024

Which checkpoint exactly? And what (meta) graph? Can you give the link?

The error is strange... Can you look into the graph file and see what is in there?

smdshakeelhassan commented on May 24, 2024

This is my meta file.
Meta File
I also looked into the graph using TensorBoard; the operations all seem to be there. This is a .png file of the graph.
Graph
Is there any other way to get the SavedModel format from a Returnn checkpoint?

albertz commented on May 24, 2024

So, just to clarify again, the meta graph file was created using tools/compile_tf_graph.py? How exactly did you call it? And what computation graph do you intend to export? For training or for inference / beam search?

And with what config file exactly? Did you modify the config somehow?

smdshakeelhassan commented on May 24, 2024

The meta graph was generated using the unchanged config returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn.config.
I am not sure how Returnn generates this graph, i.e. whether it uses tools/compile_tf_graph.py or not. I used the default script ./22_train.sh, and I am using the files generated inside data/exp-returnn after training, viz.

checkpoint 
model.238.data-00000-of-00001 
model.238.index 
model.238.meta

My goal is to convert this model to the SavedModel format and generate the .pb file and variables directory. But the line new_saver = tf.train.import_meta_graph("./serv_test/model.238.meta") itself throws the above error. Is there any other way to do this?

albertz commented on May 24, 2024

So you have not used tools/compile_tf_graph.py but called the train script (22_train.sh) instead? (You must have done sth that created this meta graph file...)

But what exactly are you trying to do? What do you want to use this meta graph for? For training or for inference / beam search?

smdshakeelhassan commented on May 24, 2024

Yes, the meta graph was created by the 22_train.sh script. I am trying to create the SavedModel format, and I wish to use it for inference.
Later, with another Returnn model trained with basiclstm cells instead of nativelstm2 cells, I want to convert the SavedModel to a TFLite FlatBuffer file.
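For reference, the later TFLite conversion step I have in mind is roughly this (TF 1.13 API; the SavedModel directory is whatever I manage to export first):

import tensorflow as tf

# Convert an exported SavedModel directory into a TFLite FlatBuffer.
# This only works without custom ops such as NativeLstm2, hence the plan
# to use basiclstm cells for that model.
converter = tf.lite.TFLiteConverter.from_saved_model("./serv_test/saved_model")
tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_model)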

albertz commented on May 24, 2024

Ah, but that's wrong. When you train your network, the meta graph that gets saved is the one for training (not even really for training; it is mainly there for debugging / visualization in TensorBoard, but it shows the computation graph used for training).

But you say now that you want a meta graph for inference. The computation is different depending on whether you train or do inference (clear, right? training and inference are different things), and thus the meta graph is also different (clear, right? the meta graph is just a representation of that computation).

As I already said, you can use tools/compile_tf_graph.py to create your meta graph for various purposes, e.g. for inference. Please see the documentation.
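Once you have such an inference meta graph and the checkpoint, the SavedModel export itself would be sth like this (rough sketch; the input/output tensor names below are only examples, you have to look up the real names in your graph, e.g. via TensorBoard):

import tensorflow as tf

# If the inference graph still uses NativeLstm2, load the op first
# (tf.load_op_library(...) or make_op(NativeLstm2), as described above).

saver = tf.train.import_meta_graph("inference_graph.meta")  # graph from tools/compile_tf_graph.py

with tf.Session() as sess:
    # Checkpoint prefix, i.e. without the .meta / .index / .data suffix.
    saver.restore(sess, "data/exp-returnn/model.238")
    graph = tf.get_default_graph()
    # Example tensor names only; check the actual names in your graph.
    inputs = {"data": graph.get_tensor_by_name("extern_data/placeholders/data/data:0")}
    outputs = {"output": graph.get_tensor_by_name("output/output_batch_major:0")}
    tf.saved_model.simple_save(sess, "./serv_test/saved_model", inputs=inputs, outputs=outputs)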

albertz commented on May 24, 2024

I think there is no bug in Returnn (or even anything related to Returnn), so I'm closing this now.
