googlecloudplatform / cloudml-edge-automation

Automated building and packaging of Tensorflow models in the cloud, and running them on devices

Home Page: https://cloud.google.com/solutions/automating-iot-machine-learning

License: Apache License 2.0

Python 76.26% Shell 12.52% Go 1.39% JavaScript 9.83%

cloudml-edge-automation's Introduction

Automating IoT Machine Learning: Bridging Cloud and Device Benefits with Cloud ML Engine

The code in this repository accompanies the tutorial, which includes the full walkthrough.

The workflow:

  • Trains ML model versions using Cloud Machine Learning Engine.
  • Uses photorealistic CAD-rendered data to train the ML model.
  • Automates the packaging and delivery of the new or modified model to a remote IoT device where the inference (model prediction) runs locally.

cloudml-edge-automation's People

Contributors

ptone

cloudml-edge-automation's Issues

Security Policy violation Binary Artifacts

This issue was automatically created by Allstar.

Security Policy Violation
Project is out of compliance with Binary Artifacts policy: binaries present in source code

Rule Description
Binary Artifacts are an increased security risk in your repository. Binary artifacts cannot be reviewed, allowing the introduction of possibly obsolete or maliciously subverted executables. For more information see the Security Scorecards Documentation for Binary Artifacts.

Remediation Steps
To remediate, remove the generated executable artifacts from the repository.

Artifacts Found

  • trainer/google_cloud_storage-1.4.0-py2.py3-none-any.whl
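Before removing the flagged file, it can help to check whether other binary artifacts are checked in. The following is a minimal sketch of such a scan. The function name and the extension list are illustrative assumptions and are not the set of file types that Security Scorecards actually checks.

```python
import os

# Assumption: an illustrative extension list, not the list Scorecards uses.
BINARY_EXTENSIONS = {".whl", ".egg", ".jar", ".so", ".dll", ".exe", ".pyc"}


def find_binary_artifacts(root):
    """Return sorted, root-relative paths whose extension looks binary."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into version-control metadata.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            extension = os.path.splitext(name)[1].lower()
            if extension in BINARY_EXTENSIONS:
                relative = os.path.relpath(os.path.join(dirpath, name), root)
                found.append(relative.replace(os.sep, "/"))
    return sorted(found)
```

Calling find_binary_artifacts(".") from the repository root would list candidate files to remove or to replace with a pinned dependency.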

Additional Information
This policy is drawn from Security Scorecards, which is a tool that scores a project's adherence to security best practices. You may wish to run a Scorecards scan directly on this repository for more details.


Allstar has been installed on all Google managed GitHub orgs. Policies are gradually being rolled out and enforced by the GOSST and OSPO teams. Learn more at http://go/allstar

This issue will auto resolve when the policy is in compliance.

Issue created by Allstar. See https://github.com/ossf/allstar/ for more information. For questions specific to the repository, please contact the owner or maintainer.

Training the Model fails (exceptions attached)

I followed all the steps in the tutorial (https://cloud.google.com/solutions/automating-iot-machine-learning), and I updated gcloud on my Mac before running them.

Model training fails with the exceptions below, taken from the job log.
(It seems I cannot simply download the whole log and share it as a zip file.)

Do you need any additional information to help?

2018-07-20 10:46:38.633 CEST worker-replica-1 gapic-google-cloud-logging-v2 0.91.3 has requirement google-gax<0.16dev,>=0.15.7, but you'll have google-gax 0.12.5 which is incompatible.

2018-07-20 10:46:38.634 CEST worker-replica-1 google-cloud-logging 1.0.0 has requirement google-cloud-core<0.25dev,>=0.24.0, but you'll have google-cloud-core 0.28.1 which is incompatible.

2018-07-20 10:46:39.018 CEST worker-replica-1 The script chardetect is installed in '/root/.local/bin' which is not on PATH.

The replica master 0 exited with a non-zero status of 1. Termination reason: Error.

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 570, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 329, in main
    run(model, argv)
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 465, in run
    dispatch(args, model, cluster, task)
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 505, in dispatch
    Trainer(args, model, cluster, task).run_training()
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 206, in run_training
    self.args.batch_size)
  File "/root/.local/lib/python2.7/site-packages/trainer/model.py", line 307, in build_train_graph
    return self.build_graph(data_paths, batch_size, GraphMod.TRAIN)
  File "/root/.local/lib/python2.7/site-packages/trainer/model.py", line 231, in build_graph
    num_epochs=None if is_training else 2)
  File "/root/.local/lib/python2.7/site-packages/trainer/util.py", line 47, in read_examples
    filename_queue = tf.train.string_input_producer(files, num_epochs, shuffle)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py", line 217, in string_input_producer
    raise ValueError(not_null_err)
ValueError: string_input_producer requires a non-null input tensor

The replicas worker 0, worker 1, worker 2 and worker 3 each exited with a non-zero status of 1 (Termination reason: Error) and logged the same traceback, except that the dispatch frame is at task.py line 509 instead of line 505.

To find out more about why your job exited please check the logs: https://console.cloud.google.com/logs/viewer?project=317506745947&resource=ml_job%2Fjob_id%2Fequipmentparts_1_1532076159&advancedFilter=resource.type%3D%22ml_job%22%0Aresource.labels.job_id%3D%22equipmentparts_1_1532076159%22
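The ValueError in the traceback above is raised by tf.train.string_input_producer when the input it receives is empty. One plausible cause, not confirmed in this issue, is that the training data path resolved to no files. A minimal sketch of a guard that could run just before that call is shown below. The function name require_input_files and its source argument are hypothetical and not part of the trainer package.

```python
def require_input_files(files, source):
    """Return files as a list, or raise a descriptive error if it is empty.

    `source` is whatever path or pattern the list was built from; it is only
    used to make the error message easier to act on.
    """
    files = list(files or [])
    if not files:
        raise ValueError(
            "No input files matched %r; check that the path exists and is "
            "readable by the training job." % (source,)
        )
    return files
```

With a guard like this, the job would still fail on an empty input, but the log would name the path that matched nothing instead of reporting only a non-null tensor requirement.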

Update Versions

The requirements.txt file currently contains:

apache-beam[gcp]==2.2.0
pillow
tensorflow==1.3.0

Proposed update:

apache-beam[gcp]==2.4.0
pillow
tensorflow==1.3.0
protobuf==3.5.1
google-cloud-pubsub==0.26.0
google-api-core==1.1.2
google-cloud-bigquery==0.25.0
google-cloud-core==0.25.0

pip_dependencies.txt
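To make the change between the two requirement lists easier to review, the pins can be compared programmatically. The sketch below handles only the simple name==version lines used in this issue. The helper names parse_pins and diff_pins are illustrative assumptions, and the parser does not cover the full pip requirements syntax.

```python
def parse_pins(lines):
    """Map each requirement name (extras kept) to its pinned version or None."""
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, separator, version = line.partition("==")
        pins[name.strip()] = version.strip() if separator else None
    return pins


def diff_pins(old_lines, new_lines):
    """Return (added names, names whose pin changed), both sorted."""
    old, new = parse_pins(old_lines), parse_pins(new_lines)
    added = sorted(set(new) - set(old))
    changed = sorted(n for n in old if n in new and old[n] != new[n])
    return added, changed


ORIGINAL = ["apache-beam[gcp]==2.2.0", "pillow", "tensorflow==1.3.0"]
PROPOSED = [
    "apache-beam[gcp]==2.4.0", "pillow", "tensorflow==1.3.0",
    "protobuf==3.5.1", "google-cloud-pubsub==0.26.0",
    "google-api-core==1.1.2", "google-cloud-bigquery==0.25.0",
    "google-cloud-core==0.25.0",
]
added, changed = diff_pins(ORIGINAL, PROPOSED)
```

For the two lists in this issue, changed contains only apache-beam[gcp], and added contains the five newly pinned packages.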

Re-declaring variables

Setting up the environment and storage buckets

  1. First, set the environment variables
export FULL_PROJECT=$(gcloud config list project --format "value(core.project)")
export PROJECT="$(echo $FULL_PROJECT | cut -f2 -d ':')"
export REGION='us-central1' #OPTIONALLY CHANGE THIS 

Running the training job

cd trainer
sudo pip install -r requirements.txt

export MODEL_NAME=equipmentparts
export MODEL_VERSION="${MODEL_NAME}_1_$(date +%s)"

bash run-training.sh

run-training.sh script:

declare -r PROJECT=$(gcloud config list project --format "value(core.project)")

declare -r BUCKET="gs://iot-ml-edge-public"
declare -r GCS_PATH="${BUCKET}"
declare -r DICT_FILE=${GCS_PATH}/parts_images_dictionary.txt

# declare -r MODEL_NAME=lyon_model
declare -r VERSION_NAME=v1

The tutorial's explicit directions already set the PROJECT variable: FULL_PROJECT is read from gcloud, and PROJECT is then derived from it by dropping the prefix up to the colon ':'.

The run-training.sh script, however, declares PROJECT again directly from gcloud and omits the step that removes the colon prefix.

Solution: comment out the PROJECT declaration in run-training.sh.
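The difference between the two PROJECT values only shows up when the project ID contains a colon, as in a domain-scoped ID. The sketch below mirrors in Python what cut -f2 -d ':' does to a single line. The function name cut_second_field and the example project IDs are hypothetical.

```python
def cut_second_field(value, delimiter=":"):
    """Mimic `cut -f2 -d ':'` for one line of input."""
    if delimiter not in value:
        # cut prints a line unchanged when it contains no delimiter.
        return value
    return value.split(delimiter)[1]


# Setup steps: PROJECT is derived from FULL_PROJECT.
full_project = "example.com:my-project"  # hypothetical domain-scoped ID
project_from_setup = cut_second_field(full_project)

# run-training.sh: PROJECT is taken from gcloud without the cut step.
project_from_script = full_project
```

Here project_from_setup is "my-project" while project_from_script keeps the "example.com:" prefix, which is the mismatch this issue describes. For an ID without a colon, both values are identical.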
