Comments (6)
@terryKing1992 I did a similar experiment a couple weeks ago with code as follows:
with tf.Session() as sess:
    print("{0} session ready".format(datetime.now().isoformat()))
    print("restoring model")
    saver.restore(sess, model_path)
    print("restored model")
    model_builder = builder.SavedModelBuilder(args.export_path)
    tensor_info_x = utils.build_tensor_info(x)
    tensor_info_y = utils.build_tensor_info(y)
    prediction_signature = sig_util.build_signature_def(
        inputs={'images': tensor_info_x},
        outputs={'scores': tensor_info_y},
        method_name=sig.PREDICT_METHOD_NAME)
    model_builder.add_meta_graph_and_variables(
        sess, [tag.SERVING],
        signature_def_map={
            sig.DEFAULT_SERVING_SIGNATURE_DEF_KEY: prediction_signature
        })
    print("exporting model")
    model_builder.save()
    print("exported model")
In my case I was restoring from an existing checkpoint and then exporting a SavedModel. Also, I had a time.sleep(60) before the cluster.shutdown() (I think it was forcing the executors to shut down before the save operation was completed). I haven't had a chance to revisit this in a while, but hopefully this will unblock you...
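A minimal sketch of the delay described above, assuming only that the driver holds an object with a shutdown() method (called cluster in this thread). The helper name shutdown_after_grace, the FakeCluster stand-in, and the injectable sleep parameter are hypothetical and exist only so the ordering can be checked without a real cluster. A fixed sleep is a workaround rather than a guarantee that the export has finished.

```python
import time


def shutdown_after_grace(cluster, grace_secs=60, sleep=time.sleep):
    """Wait grace_secs seconds, then call cluster.shutdown().

    `cluster` is any object with a shutdown() method; `sleep` is
    injectable so the delay can be skipped when testing.
    """
    # Give the executors time to finish writing the export before
    # they are asked to stop.
    sleep(grace_secs)
    cluster.shutdown()


class FakeCluster:
    """Hypothetical stand-in used only to exercise the helper."""

    def __init__(self):
        self.events = []

    def shutdown(self):
        self.events.append("shutdown")


fake = FakeCluster()
# Record the requested delay instead of actually sleeping.
shutdown_after_grace(
    fake, grace_secs=60, sleep=lambda secs: fake.events.append(("sleep", secs))
)
print(fake.events)
```

In a real driver the call would be shutdown_after_grace(cluster) with the default time.sleep, which is equivalent to the time.sleep(60) followed by cluster.shutdown() mentioned above.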
from tensorflowonspark.
@leewyang thanks for your response. I tried adding time.sleep(60) before cluster.shutdown(), but it still can't export the model. Do you know why the program does not execute these three lines?
builder.add_meta_graph_and_variables(
    sess, [tag_constants.SERVING],
    signature_def_map={
        'predict_images':
            prediction_signature,
        signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
            classification_signature,
    })
print('begin exporting!======77777')
builder.save()
from tensorflowonspark.
Unfortunately, I don't have that much experience with the SavedModelBuilder. As I mentioned, I was experimenting recently. That said, I had tried something similar to your code, but ended up with something that loads a previously written checkpoint. Note: I had to point to a specific checkpoint file, not the upper-level model directory.
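To illustrate the note about pointing at a specific checkpoint rather than the model directory, here is a rough stdlib-only sketch. It assumes V2-format checkpoints, where each checkpoint prefix has a matching .index file, and it picks the newest one by modification time, which is only a heuristic. The helper name resolve_checkpoint_prefix and the file names in the demo are hypothetical. TensorFlow's own tf.train.latest_checkpoint(model_dir) serves the same purpose by reading the directory's checkpoint state file and is probably the better choice when TensorFlow is available.

```python
import os
import tempfile


def resolve_checkpoint_prefix(path):
    """Return a checkpoint prefix suitable for passing to saver.restore().

    If `path` is a directory, pick the most recently modified `*.index`
    file in it and strip the suffix; otherwise return `path` unchanged.
    """
    if not os.path.isdir(path):
        return path
    index_files = [
        os.path.join(path, name)
        for name in os.listdir(path)
        if name.endswith(".index")
    ]
    if not index_files:
        raise FileNotFoundError("no checkpoint index files in " + path)
    newest = max(index_files, key=os.path.getmtime)
    return newest[: -len(".index")]


# Demo with empty placeholder files standing in for real checkpoints.
with tempfile.TemporaryDirectory() as model_dir:
    for step, mtime in ((100, 1000), (200, 2000)):
        index_path = os.path.join(model_dir, "model.ckpt-%d.index" % step)
        open(index_path, "w").close()
        os.utime(index_path, (mtime, mtime))  # make step 200 the newest
    prefix = resolve_checkpoint_prefix(model_dir)
    demo_name = os.path.basename(prefix)
    # A path that is already a prefix is passed through untouched.
    passthrough = resolve_checkpoint_prefix(prefix)

print(demo_name)
```

The resulting prefix would then be passed as model_path to saver.restore(sess, model_path), as in the earlier snippet.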
from tensorflowonspark.
I switched tfspark.zip to the Mar 8 commit. The job sometimes works, but sometimes it throws errors like this:
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.NotFoundError'>, /data/mnist/1/variables/variables_temp_7de0e7aec37b4ad3bbb04d7ddba6921d/part-00000-of-00001.data-00000-of-00001.tempstate14562722955503715538
[[Node: save_1/SaveV2 = SaveV2[dtypes=[DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:ps/replica:0/task:0/cpu:0"](save_1/ShardedFilename, save_1/SaveV2/tensor_names, save_1/SaveV2/shape_and_slices, Variable, hid_b, hid_b/Adagrad, hid_w, hid_w/Adagrad, sm_b, sm_b/Adagrad, sm_w, sm_w/Adagrad)]]
Caused by
op u'save_1/SaveV2', defined at:
File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/data/spark/python/lib/pyspark.zip/pyspark/daemon.py", line 180, in <module>
File "/data/spark/python/lib/pyspark.zip/pyspark/daemon.py", line 157, in manager
File "/data/spark/python/lib/pyspark.zip/pyspark/daemon.py", line 61, in worker
File "/data/spark/python/lib/pyspark.zip/pyspark/worker.py", line 174, in main
process()
File "/data/spark/python/lib/pyspark.zip/pyspark/worker.py", line 169, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/data/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2407, in pipeline_func
File "/data/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2407, in pipeline_func
File "/data/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2407, in pipeline_func
File "/data/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 346, in func
File "/data/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 794, in func
File "/root/tensorflow/TensorFlowOnSpark/tfspark.zip/com/yahoo/ml/tf/TFSparkNode.py", line 218, in _mapfn
File "/usr/lib64/python2.7/multiprocessing/process.py", line 130, in start
self._popen = Popen(self)
File "/usr/lib64/python2.7/multiprocessing/forking.py", line 126, in __init__
code = process_obj._bootstrap()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "mnist_dist.py", line 210, in map_fun
clear_devices=True,legacy_init_op=legacy_init_op)
File "/usr/lib/python2.7/site-packages/tensorflow/python/saved_model/builder_impl.py", line 432, in add_meta_graph_and_variables
allow_empty=True)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1040, in __init__
self.build()
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1070, in build
restore_sequentially=self._restore_sequentially)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 669, in build
save_tensor = self._AddShardedSaveOps(filename_tensor, per_device)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 356, in _AddShardedSaveOps
return self._AddShardedSaveOpsForV2(filename_tensor, per_device)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 330, in _AddShardedSaveOpsForV2
sharded_saves.append(self._AddSaveOps(sharded_filename, saveables))
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 271, in _AddSaveOps
save = self.save_op(filename_tensor, saveables)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 214, in save_op
tensors)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 779, in save_v2
tensors=tensors, name=name)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
self._traceback = _extract_stack()
Really strange issue.
from tensorflowonspark.
FWIW, I didn't have much success writing out the SavedModel after training (I was getting all sorts of weird TensorFlow graph/saver errors), which is why I ended up with a version of the code that restores from a saved checkpoint...
from tensorflowonspark.
Thanks all the same
from tensorflowonspark.