Comments (6)

leewyang commented on May 12, 2024

@terryKing1992 I did a similar experiment a couple weeks ago with code as follows:

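      # Assumed imports for the aliases used below (not shown in the original snippet):
      #   from datetime import datetime
      #   import tensorflow as tf
      #   from tensorflow.python.saved_model import builder, utils
      #   from tensorflow.python.saved_model import signature_constants as sig
      #   from tensorflow.python.saved_model import signature_def_utils as sig_util
      #   from tensorflow.python.saved_model import tag_constants as tag
      # saver, model_path, x, y and args are defined elsewhere in the script.
      with tf.Session() as sess:
          print("{0} session ready".format(datetime.now().isoformat()))
          # Restore the trained weights from a specific checkpoint file.
          print("restoring model")
          saver.restore(sess, model_path)
          print("restored model")

          # Describe the input and output tensors for the serving signature.
          model_builder = builder.SavedModelBuilder(args.export_path)
          tensor_info_x = utils.build_tensor_info(x)
          tensor_info_y = utils.build_tensor_info(y)
          prediction_signature = sig_util.build_signature_def(
              inputs={'images': tensor_info_x},
              outputs={'scores': tensor_info_y},
              method_name=sig.PREDICT_METHOD_NAME)
          model_builder.add_meta_graph_and_variables(
              sess, [tag.SERVING],
              signature_def_map={
                  sig.DEFAULT_SERVING_SIGNATURE_DEF_KEY: prediction_signature
              })
          # Write the SavedModel to args.export_path.
          print("exporting model")
          model_builder.save()
          print("exported model")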

In my case I was restoring from an existing checkpoint and then exporting a SavedModel. Also, I had a time.sleep(60) before the cluster.shutdown() (I think the shutdown was forcing the executors to stop before the save operation had completed). I haven't had the chance to revisit this in a while, but hopefully this will unblock you...

terryKing1992 commented on May 12, 2024

@leewyang Thanks for your response. I tried adding time.sleep(60) before cluster.shutdown(), but it still can't export the model. Do you know why the program does not execute these three lines?

  builder.add_meta_graph_and_variables(
      sess, [tag_constants.SERVING],
      signature_def_map={
          'predict_images':
              prediction_signature,
          signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
              classification_signature,
      })
  print('begin exporting!======77777')
  builder.save()

leewyang commented on May 12, 2024

Unfortunately, I don't have much experience with the SavedModelBuilder. As I mentioned, I was only experimenting with it recently. That said, I had tried something similar to your code, but ended up with a version that loads a previously written checkpoint. Note: I had to point to a specific checkpoint file, not the top-level model directory.

terryKing1992 commented on May 12, 2024

I switched tfspark.zip to the Mar 8 commit. The job sometimes works, but sometimes it throws errors like this:

INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.NotFoundError'>, /data/mnist/1/variables/variables_temp_7de0e7aec37b4ad3bbb04d7ddba6921d/part-00000-of-00001.data-00000-of-00001.tempstate14562722955503715538
          [[Node: save_1/SaveV2 = SaveV2[dtypes=[DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:ps/replica:0/task:0/cpu:0"](save_1/ShardedFilename, save_1/SaveV2/tensor_names, save_1/SaveV2/shape_and_slices, Variable, hid_b, hid_b/Adagrad, hid_w, hid_w/Adagrad, sm_b, sm_b/Adagrad, sm_w, sm_w/Adagrad)]]
 
Caused by 
  op u'save_1/SaveV2', defined at:
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/data/spark/python/lib/pyspark.zip/pyspark/daemon.py", line 180, in <module>
  File "/data/spark/python/lib/pyspark.zip/pyspark/daemon.py", line 157, in manager
  File "/data/spark/python/lib/pyspark.zip/pyspark/daemon.py", line 61, in worker
  File "/data/spark/python/lib/pyspark.zip/pyspark/worker.py", line 174, in main
    process()
  File "/data/spark/python/lib/pyspark.zip/pyspark/worker.py", line 169, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/data/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2407, in pipeline_func
  File "/data/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2407, in pipeline_func
  File "/data/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2407, in pipeline_func
  File "/data/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 346, in func
  File "/data/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 794, in func
  File "/root/tensorflow/TensorFlowOnSpark/tfspark.zip/com/yahoo/ml/tf/TFSparkNode.py", line 218, in _mapfn
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/usr/lib64/python2.7/multiprocessing/forking.py", line 126, in __init__
    code = process_obj._bootstrap()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "mnist_dist.py", line 210, in map_fun
    clear_devices=True,legacy_init_op=legacy_init_op)
  File "/usr/lib/python2.7/site-packages/tensorflow/python/saved_model/builder_impl.py", line 432, in add_meta_graph_and_variables
    allow_empty=True)
  File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1040, in __init__
    self.build()
  File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1070, in build
    restore_sequentially=self._restore_sequentially)
  File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 669, in build
    save_tensor = self._AddShardedSaveOps(filename_tensor, per_device)
  File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 356, in _AddShardedSaveOps
    return self._AddShardedSaveOpsForV2(filename_tensor, per_device)
  File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 330, in _AddShardedSaveOpsForV2
    sharded_saves.append(self._AddSaveOps(sharded_filename, saveables))
  File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 271, in _AddSaveOps
    save = self.save_op(filename_tensor, saveables)
  File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 214, in save_op
    tensors)
  File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 779, in save_v2
    tensors=tensors, name=name)
  File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
    self._traceback = _extract_stack()

Really strange issue.

leewyang commented on May 12, 2024

FWIW, I didn't have much success writing out the SavedModel after training (I was getting all sorts of weird TensorFlow graph/saver errors), which is why I ended up with a version of the code that restores from a saved checkpoint...

terryKing1992 commented on May 12, 2024

Thanks all the same
