AWS Step Functions Data Science SDK

The AWS Step Functions Data Science SDK is an open-source library that allows data scientists to easily create workflows that process and publish machine learning models using Amazon SageMaker and AWS Step Functions. You can create machine learning workflows in Python that orchestrate AWS infrastructure at scale, without having to provision and integrate the AWS services separately.

  • Workflow - A sequence of steps designed to perform some work
  • Step - A unit of work within a workflow
  • ML Pipeline - A type of workflow used in data science to create and train machine learning models

The AWS Step Functions Data Science SDK enables you to do the following.

  • Easily construct and run machine learning workflows that use AWS infrastructure directly in Python
  • Instantiate common training pipelines
  • Create standard machine learning workflows in a Jupyter notebook from templates

Table of Contents

  • Getting Started With Sample Jupyter Notebooks
  • Installing the AWS Step Functions Data Science SDK
  • Overview of SDK
  • Building a Workflow
  • Running a Workflow
  • Contributing
  • AWS Permissions
  • Licensing
  • Verifying the Signature

Getting Started With Sample Jupyter Notebooks

The quickest way to see how the AWS Step Functions Data Science SDK works is to review the related example notebooks. These notebooks provide code and descriptions for creating and running workflows in AWS Step Functions using the AWS Step Functions Data Science SDK.

Example Notebooks in SageMaker

In Amazon SageMaker, example Jupyter notebooks are available in the example notebooks portion of a notebook instance. To run the example notebooks, do the following.

  1. Create a notebook instance, or access an existing one.
  2. Select the SageMaker Examples tab.
  3. Choose a notebook in the Step Functions Data Science SDK section and select Use.

For more information, see Example Notebooks in the Amazon SageMaker documentation.

Run Example Notebooks Locally

To run the AWS Step Functions Data Science SDK example notebooks locally, download the sample notebooks and open them in a working Jupyter instance.

  1. Install Jupyter: https://jupyter.readthedocs.io/en/latest/install.html
  2. Download the following files from https://github.com/awslabs/amazon-sagemaker-examples/tree/master/step-functions-data-science-sdk:
  • hello_world_workflow.ipynb
  • machine_learning_workflow_abalone.ipynb
  • training_pipeline_pytorch_mnist.ipynb
  3. Open the files in Jupyter.

Installing the AWS Step Functions Data Science SDK

The AWS Step Functions Data Science SDK is published to PyPI and can be installed with pip as follows.

pip install stepfunctions

You can install from source by cloning this repository and running a pip install command in the root directory of the repository:

git clone https://github.com/aws/aws-step-functions-data-science-sdk-python.git
cd aws-step-functions-data-science-sdk-python
pip install .
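
To confirm the installation, you can import the package and print its version. This is a quick sanity check; it assumes the package exposes a __version__ attribute, which released versions of the SDK do.

import stepfunctions

# Print the installed SDK version (assumes stepfunctions exposes __version__)
print(stepfunctions.__version__)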

Supported Operating Systems

The AWS Step Functions Data Science SDK supports Unix/Linux and Mac.

Supported Python Versions

The AWS Step Functions Data Science SDK is tested on:

  • Python 3.6

Overview of SDK

The AWS Step Functions Data Science SDK provides a Python API that enables you to create data science and machine learning workflows using AWS Step Functions and SageMaker directly in your Python code and Jupyter notebooks.

Using this SDK you can:

  1. Create steps that accomplish tasks.
  2. Chain those steps together into workflows.
  3. Include retry, succeed, or fail steps.
  4. Review a graphical representation and definition for your workflow.
  5. Create a workflow in AWS Step Functions.
  6. Start and review executions in AWS Step Functions.

For a detailed API reference of the AWS Step Functions Data Science SDK, see the documentation on Read the Docs.

AWS Step Functions

AWS Step Functions lets you coordinate multiple AWS services into serverless workflows so you can build and update apps quickly. Using Step Functions, you can design and run workflows that combine services such as Amazon SageMaker, AWS Lambda, and Amazon Elastic Container Service (Amazon ECS) into feature-rich applications. Workflows are made up of a series of steps, with the output of one step acting as input to the next.

The AWS Step Functions Data Science SDK provides access to AWS Step Functions so that you can easily create and run machine learning and data science workflows directly in Python, and inside your Jupyter notebooks. Workflows are created locally in Python; when a workflow is ready for execution, it is first uploaded to the AWS Step Functions service and then run in the cloud.

When you use the SDK to create, update, or execute workflows, you are talking to the Step Functions service in the cloud. Your workflows live in AWS Step Functions and can be reused.

You can execute a workflow as many times as you want, and you can optionally change the input each time. Each time you execute a workflow, it creates a new execution instance in the cloud. You can inspect these executions with SDK commands, or with the Step Functions management console. You can run more than one execution at a time.
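
For example, once a workflow has been created, you can start and inspect executions from Python along the following lines. This is a sketch based on the SDK's Workflow and Execution APIs; the input field shown is illustrative.

# Assumes `workflow` is an existing stepfunctions.workflow.Workflow object
execution = workflow.execute(inputs={
    "IsHelloWorldExample": True
})

# Block until the execution finishes and print its output
print(execution.get_output(wait=True))

# Inspect recent executions of this workflow (renders a table in a Jupyter notebook)
workflow.list_executions(html=True)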

Using this SDK you can create steps, chain them together to create a workflow, create that workflow in AWS Step Functions, and execute the workflow in the AWS cloud.

Create a workflow in AWS Step Functions

Once you have created your workflow in AWS Step Functions, you can execute that workflow in Step Functions, in the AWS cloud.

Start a workflow in AWS Step Functions

Step Functions creates workflows out of steps called States, and expresses each workflow in the Amazon States Language. When you create a workflow with the AWS Step Functions Data Science SDK, the SDK creates a State Machine in AWS Step Functions that represents your workflow and its steps.

For more information about Step Functions concepts and use, see the Step Functions documentation.

Building a Workflow

Steps

You create steps using the SDK and chain them together into sequential workflows. Then, you can create those workflows in AWS Step Functions and execute them in Step Functions directly from your Python code.
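
The snippets in this section assume that the step and workflow classes have been imported from the SDK. A minimal import sketch (module paths based on the public stepfunctions package; adjust for your installed version):

from stepfunctions.steps import (
    Pass, Wait, LambdaStep, Retry, Catch, Fail, Chain
)
from stepfunctions.workflow import Workflow

For example, the following is how you define a pass step.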

start_pass_state = Pass(
    state_id="MyPassState"
)

The following is how you define a wait step.

wait_state = Wait(
    state_id="Wait for 3 seconds",
    seconds=3
)

The following example shows how to define a Lambda step, and then defines a Retry and a Catch.

lambda_state = LambdaStep(
    state_id="Convert HelloWorld to Base64",
    parameters={
        "FunctionName": "MyLambda",  # replace with the name of your function
        "Payload": {
            "input": "HelloWorld"
        }
    }
)

lambda_state.add_retry(Retry(
    error_equals=["States.TaskFailed"],
    interval_seconds=15,
    max_attempts=2,
    backoff_rate=4.0
))

lambda_state.add_catch(Catch(
    error_equals=["States.TaskFailed"],
    next_step=Fail("LambdaTaskFailed")
))

Workflows

After you define these steps, chain them together into a logical sequence.

workflow_definition = Chain([start_pass_state, wait_state, lambda_state])

Once the steps are chained together, you can define the workflow object.

workflow = Workflow(
    name="MyWorkflow_v1234",
    definition=workflow_definition,
    role=stepfunctions_execution_role
)
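
Here, stepfunctions_execution_role is the ARN of an IAM role that the Step Functions service can assume to run your workflow. For example (a placeholder value; substitute your own account ID and role name):

# Hypothetical role ARN; replace the account ID and role name with your own
stepfunctions_execution_role = "arn:aws:iam::123456789012:role/StepFunctionsWorkflowExecutionRole"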

Visualizing a Workflow

The following generates a graphical representation of your workflow. Please note that visualization currently only works in Jupyter notebooks. Visualization is not available in JupyterLab.

workflow.render_graph(portrait=False)

Review a Workflow Definition

The following renders the JSON of the Amazon States Language definition of the workflow you created.

print(workflow.definition.to_json(pretty=True))

Running a Workflow

Create Workflow on AWS Step Functions

The following creates the workflow in AWS Step Functions.

workflow.create()

Execute the Workflow

The following starts an execution of your workflow in AWS Step Functions.

execution = workflow.execute(inputs={
  "IsHelloWorldExample": True
})
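
While the execution runs, you can monitor it from the same notebook. The following is a sketch using the SDK's Execution API; render_progress has the same Jupyter-only limitation as render_graph.

# Visualize the progress of the running execution (Jupyter notebooks only)
execution.render_progress()

# Inspect the execution's event history
execution.list_events(html=True)

# Wait for completion and retrieve the final output
print(execution.get_output(wait=True))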

Export an AWS CloudFormation Template

The following generates an AWS CloudFormation Template to deploy your workflow.

workflow.get_cloudformation_template()

The generated template contains only the StateMachine resource. To reuse the CloudFormation template in a different region, make sure to update the region-specific AWS resources (such as the Lambda ARN and training image) in the StateMachine definition.
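
For example, you might write the generated template to a file so it can be version controlled or deployed with your usual CloudFormation tooling. A minimal sketch; the file name is arbitrary:

# Assumes `workflow` is the Workflow object created above
template = workflow.get_cloudformation_template()

with open("workflow-cfn-template.yaml", "w") as f:
    f.write(template)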

Contributing

We welcome community contributions and pull requests. See CONTRIBUTING.md for information on how to set up a development environment, run tests and submit code.

AWS Permissions

As a managed service, AWS Step Functions performs operations on your behalf on AWS hardware that is managed by AWS Step Functions. AWS Step Functions can perform only operations that the user permits. You can read more about which permissions are necessary in the AWS Documentation.

The AWS Step Functions Data Science SDK should not require any additional permissions aside from what is required for using AWS Step Functions. However, if you are using an IAM role with a path in it, you should grant permission for iam:GetRole.

Licensing

AWS Step Functions Data Science SDK is licensed under the Apache 2.0 License. It is copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved. The license is available at: http://aws.amazon.com/apache2.0/

Verifying the Signature

This section describes the recommended process of verifying the validity of the AWS Data Science Workflows Python SDK's compiled distributions on PyPI.

Whenever you download an application from the internet, we recommend that you authenticate the identity of the software publisher and check that the application is not altered or corrupted since it was published. This protects you from installing a version of the application that contains a virus or other malicious code.

If, after running the steps in this topic, you determine that the distribution for the AWS Data Science Workflows Python SDK is altered or corrupted, do NOT install the package. Instead, contact AWS Support (https://aws.amazon.com/contact-us/).

AWS Data Science Workflows Python SDK distributions on PyPI are signed using GnuPG, an open source implementation of the Pretty Good Privacy (OpenPGP) standard for secure digital signatures. GnuPG (also known as GPG) provides authentication and integrity checking through a digital signature. For more information about PGP and GnuPG (GPG), see http://www.gnupg.org.

The first step is to establish trust with the software publisher. Download the public key of the software publisher, check that the owner of the public key is who they claim to be, and then add the public key to your keyring. Your keyring is a collection of known public keys. After you establish the authenticity of the public key, you can use it to verify the signature of the application.

Topics

  1. Installing the GPG Tools
  2. Authenticating and Importing the Public Key
  3. Verify the Signature of the Package

Installing the GPG Tools

If your operating system is Linux or Unix, the GPG tools are likely already installed. To test whether the tools are installed on your system, type gpg at a command prompt. If the GPG tools are installed, you see a GPG command prompt. If the GPG tools are not installed, you see an error stating that the command cannot be found. You can install the GnuPG package from a repository.

To install GPG tools on Debian-based Linux

From a terminal, run the following command: apt-get install gnupg

To install GPG tools on Red Hat–based Linux

From a terminal, run the following command: yum install gnupg

Authenticating and Importing the Public Key

The next step in the process is to authenticate the AWS Data Science Workflows Python SDK public key and add it as a trusted key in your GPG keyring.

To authenticate and import the AWS Data Science Workflows Python SDK public key

1. Copy the key from the following text and paste it into a file called data_science_workflows.key. Make sure to include everything that follows:

-----BEGIN PGP PUBLIC KEY BLOCK-----

mQINBF27JXsBEAC18lOq7/SmynwuTJZdzoSaYzfPjt+3RN5oFLd9VY559sLb1aqV
ph+RPu35YOR0GbR76NQZV6p2OicunvjmvvOKXzud8nsV3gjcSCdxn22YwVDdFdx9
N0dMOzo126kFIkubWNsBZDxzGsgIsku82+OKJbdSZyGEs7eOQCqieVpubnAk/pc5
J4sqYDFhL2ijCIwAW6YUx4WEMq1ysVVcoNIo5J3+f1NzJZBvI9xwf+R2AnX06EZb
FFIcX6kx5B8Sz6s4AI0EVFt9YOjtD+y6aBs3e63wx9etahq5No26NffNEve+pw3o
FTU7sq6HxX/cE+ssJALAwV/3/1OiluZ/icePgYvsl8UWkkULsnHEImW2vZOe9UCw
9CYb7lgqMCd9o14kQy0+SeTS3EdFH+ONRub4RMkdT7NV5wfzgD4WpSYban1YLJYx
XLYRIopMzWuRLSUKMHzqsN48UlNwUVzvpPlcVIAotzQQbgFaeWlW1Fvv3awqaF7Q
lnt0EBX5n71LJNDmpTRPtICnxcVsNXT1Uctk1mtzYwuMrxk0pDJZs06qPLwehwmO
4A4bQCZ/1aVnXaauzshP7kzgPWG6kqOcSbn3VA/yhfDX/NBeY3Xg1ECDlFxmCrrV
D7xqpZgVaztHbRIOr6ANKLMf72ZmqxiYayrFlLLOkJYtNCaC8igO5Baf2wARAQAB
tFBTdGVwZnVuY3Rpb25zLVB5dGhvbi1TREstU2lnbmluZyA8c3RlcGZ1bmN0aW9u
cy1kZXZlbG9wZXItZXhwZXJpZW5jZUBhbWF6b24uY29tPokCVAQTAQgAPhYhBMwW
BXe3v509bl1RxWDrEDrjFKgJBQJduyV7AhsDBQkUsSsABQsJCAcCBhUKCQgLAgQW
AgMBAh4BAheAAAoJEGDrEDrjFKgJq5IP/25LVDaA3itCICBP2/eu8KkUJ437oZDr
+3z59z7p4mvispmEzi4OOb1lMGBH+MdhkgblrcSaj4XcIslTkfKD4gP/cMSl14hb
X/OIxEXFXvTq4PmWUCgl5NtsyAbgB3pAxGUfNAXR2dV3MJFAHSOVUK5Es4/kAj4a
5lra+1MwZZMDqhMTYuvTclIqPA/PXafkgL5g15JA5lFDyFQ2zuV1BgQlKh7o24Jw
a1kDB0aSePkrh4gJHXAEoGDjX2mcGhEjlBvCH4ay7VGoG6l+rjcHnqSiVX0tg9dZ
Ilc7RTR+1LX7jx8wdsYSUGekADy6wGTjk9HBTafh8Bl8sR2eNoH1qZuIn/YIHxkR
JPH/74hG71pjS4FWPBbbPrdkC/G47mXMfLUrGpigcgkhePuA1BBW30U0ZZWWDHsf
ISxp8hcQkR5gFhU+37tsC06pwihhDWgx4kTfeTmNqkl03fTH5lwNsig0HSpUINWR
+EWN0jXb8DtjMzZbiDhLxQX9U3HBEdw2g2/Ktsqv+MM1P1choEGNtzots3V9fqMY
Txy7MkYLtRDYu+sX5DNob309vPzbI4b3KBv6hCRJdnICjBvgL6C8WHaLm6+FU+68
rFRKw6WImWHyygdnv8Bzdq4h+MaTE6AhteYutd+ZTWpazfE1h0ngrEerQju2VLZP
LAACxHBQNjT+uQINBF27JXsBEAC/PDJmWIkJBdnOmPU/W0SosOZRMvzs/KR89qeI
ebT8O0rNFeHR6Iql5ak6kGeDLwnzcOOwqamO+vwGmRScwPT6NF9+HDkXCzITOE22
71zKVjGVf+tX5kHJzT8ZqQBxvnk5Cx/d7sr3kwLBhhygHLS/kn2K9fhYwbtsQTLE
o9XvTBOip+DohHHJjZHcboeYnZ2g2b8Gnwe4cz75ogFNcuHZXusr8Y6enJX8wTBy
/AvXPVUIyrHbrXcHaNS3UYKzbhkH6W1cfkV6Bb49FKYkxH0N1ZeooyS6zXyf0X4n
TAbyCfoFYQ68KC17/pGMOXtR/UlqDeJe0sFeyyTHKjdSTDpA+WKKJJZ5BSCYQ5Hq
ewy6mvaIcKURExIZyNqRHRhb4p/0BA7eXzMCryx1AZPcQnaMVQYJTi5e+HSnOxnK
AB7jm2HHPHCRgO4qvavr5dIlEoKBM6qya1KVqoarw5hv8J8+R9ECn4kWZ8QjBlgO
y65q/b3mwqK0rVA1w73BPWea/xLCLrqqVRGa/fB7dhTnPfn+BpaQ3qruLinIJatM
8c2/p1LZ1nuWgrssSkSMn3TlffF0Lq9jtcbi7K11A082RiB2L0lu+j8r07RgVQvZ
4UliS1Lklsp7Ixh+zoR712hKPQpNVLstEHTxQhXZTWAk/Ih7b9ukrL/1HJAnhZBe
uBhDDQARAQABiQI8BBgBCAAmFiEEzBYFd7e/nT1uXVHFYOsQOuMUqAkFAl27JXsC
GwwFCRSxKwAACgkQYOsQOuMUqAnJvA//SDQZxf0zbge8o9kGfrm7bnExz8a6sxEn
urooUaSk3isbGFAUg+Q7rQ+ViG9gDG74F5liwwcKoBct/Z9tCi/7p3QI0BE0bM1j
IHdm5dXaZAcMlUy6f0p3DO3qE2IjnNjEjvpm7Xzt6tKJu/scZQNdQxG/CDn5+ezm
nIatgDV6ugDDv/2o0BXMyAZT008T/QLR2U5dEsbt9H3Bzl4Ska6gjak2ToJL0T61
1dZjfv/1UbeYRPFCO6CsLj9uEq+RoHAsvAS4rl9HyM3b2sVzr8CMsP6LVdqlA2Qz
/nIBd+GuLofi3/PGvvS63ubfqSRGd5VvJXoiRl2WoE8lmyIB5UJfFfd8Zdn6j+hQ
c14VOp89mEfg57BiQXfZnzjFVNkl7T5I2g3X5O8StosncChqiJTSH5C731KUVqxO
xYknFostioIVKmyis/Nwmwr6fIItYyYCwh5YCqAg0r4SLbhFEVXdannUbFPF6upO
EbKlZP3Iyu/kYANMnq+9+GImrPrT/FCpM9RW1GFAnuVBt9Qjs+eRq4DQJl/EaIjZ
cgqz+e5TZNxDK9r2sHC4zGWy88/2GuhD8xh4FH5hBIDJPmHUtKh9XElq187VA4Jg
U0mbryduKMQIyuc6OLzfJUbVTMvKWaPASbGtvAAOwCFtAi33dZ8bOfjQLgOb9uDh
/vQojRxttMc=
=ovUh
-----END PGP PUBLIC KEY BLOCK-----

2. At a command prompt in the directory where you saved data_science_workflows.key, use the following command to import the AWS Data Science Workflows Python SDK public key into your keyring:

gpg --import data_science_workflows.key

The command returns results that are similar to the following:

gpg: key 60EB103AE314A809: public key "Stepfunctions-Python-SDK-Signing <stepfunctions-developer-experience@amazon.com>" imported
gpg: Total number processed: 1
gpg:               imported: 1

Make a note of the key value; you need it in the next step. In the preceding example, the key value is 60EB103AE314A809.

3. Verify the fingerprint by running the following command, replacing key-value with the value from the preceding step:

gpg --fingerprint <key-value>

This command returns results similar to the following:

pub   rsa4096 2019-10-31 [SC] [expires: 2030-10-31]
      CC16 0577 B7BF 9D3D 6E5D  51C5 60EB 103A E314 A809
uid           [ unknown] Stepfunctions-Python-SDK-Signing <stepfunctions-developer-experience@amazon.com>
sub   rsa4096 2019-10-31 [E] [expires: 2030-10-31]

Additionally, the fingerprint string should be identical to CC16 0577 B7BF 9D3D 6E5D 51C5 60EB 103A E314 A809, as shown in the preceding example. Compare the key fingerprint that is returned to the one published on this page. They should match. If they don't match, don't install the AWS Data Science Workflows Python SDK package, and contact AWS Support.

Verify the Signature of the Package

After you install the GPG tools, authenticate and import the AWS Data Science Workflows Python SDK public key, and verify that the public key is trusted, you are ready to verify the signature of the package.

To verify the package signature, do the following.

  1. Download the detached signature for the package from PyPI

Go to the downloads section for the Data Science Workflows Python SDK at https://pypi.org/project/stepfunctions/#files on PyPI, right-click the SDK distribution link, and choose "Copy Link Location/Address".

Append the string ".asc" to the end of the link you copied, and paste this new link into your browser.

Your browser will prompt you to download a file, which is the detached signature associated with the respective distribution. Save the file on your local machine.

2. Verify the signature by running the following command at a command prompt in the directory where you saved the signature file and the AWS Data Science Workflows Python SDK installation file. Both files must be present.

gpg --verify <path-to-detached-signature-file>

The output should look something like the following:

gpg: Signature made Thu 31 Oct 12:14:53 2019 PDT
gpg:                using RSA key CC160577B7BF9D3D6E5D51C560EB103AE314A809
gpg: Good signature from "Stepfunctions-Python-SDK-Signing <stepfunctions-developer-experience@amazon.com>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: CC16 0577 B7BF 9D3D 6E5D  51C5 60EB 103A E314 A809

If the output contains the phrase Good signature from "AWS Data Science Workflows Python SDK <stepfunctions-developer-experience@amazon.com>", it means that the signature has successfully been verified, and you can proceed to run the AWS Data Science Workflows Python SDK package.

If the output includes the phrase BAD signature, check whether you performed the procedure correctly. If you continue to get this response, don't run the installation file that you downloaded previously, and contact AWS Support.

The following are details about the warnings you might see:

WARNING: This key is not certified with a trusted signature! There is no indication that the signature belongs to the owner.

This refers to your personal level of trust in your belief that you possess an authentic public key for the AWS Data Science Workflows Python SDK. In an ideal world, you would visit an AWS office and receive the key in person. However, more often you download it from a website. In this case, the website is an AWS website.

gpg: no ultimately trusted keys found.

This means that the specific key is not "ultimately trusted" by you (or by other people whom you trust).

For more information, see http://www.gnupg.org.

aws-step-functions-data-science-sdk-python's People

Contributors

alex, amazon-states-language-automation, andydouglas-exs, brightsparc, ca-nguyen, ertugrullkara, ertukara, francislfg, lialln, mriccia, paridelpooya, shayanelhami, shivlaks, shunjd, stepfunctions-bot, tuliocasagrande, vaib-amz, wmlba, wong-a, yoodan93, yuan-bwn


aws-step-functions-data-science-sdk-python's Issues

Deeply Nested Placeholder keys are not replaced

For instance, in the following

{
  "InputDataConfig": [
    {
      "ChannelName": "training",
      "DataSource": {
        "S3DataSource": {
          "S3DataDistributionType": "FullyReplicated",
          "S3DataType": "S3Prefix",
          "S3Uri": "$.Payload.cloned_config"
        }
      }
    }
  ]
}

where S3Uri is either an ExecutionInput or a StepInput placeholder, the key should be replaced with S3Uri.$.

Estimator Output Path cannot accept Execution Input placeholder

The arguments of sagemaker.estimator.Estimator() and sagemaker.tuner.HyperparameterTuner() don't accept ExecutionInput or StepInput placeholders. For example, the output_path of sagemaker.estimator.Estimator() has to be fixed in the workflow definition.

The current workaround is to avoid stepfunctions.steps.sagemaker.TuningStep and instead use the generic stepfunctions.steps.Task to invoke arn:aws:states:::sagemaker:createHyperParameterTuningJob.sync with parameters from ExecutionInput and StepInput.

I am not able to view the Stepfunction rendition from an Amazon SageMaker Notebook

For context, I am following the notebook https://github.com/juliensimon/amazon-sagemaker-examples/blob/master/step-functions-data-science-sdk/hello_world_workflow/hello_world_workflow.ipynb by Julien Simon of AWS.

This is the corresponding video link, https://youtu.be/0kMdOi69tjQ?t=502, where the author calls the render_graph method in the Jupyter notebook and is able to see the graph.

basic_workflow_execution.render_progress() does not render the graph in the Amazon SageMaker notebook, but I can see the rendered graph in the AWS Step Functions console.

TrainingStep fails when Estimator object has 'debugger_hook_config' as False

Hello,

When debugger_hook_config=False is specified in an estimator object, TrainingStep will fail with the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-9-ce223140d492> in <module>
     20     state_id="samplestate",
     21     estimator=estimator,
---> 22     job_name="samplejob"
     23 )

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/stepfunctions/steps/sagemaker.py in __init__(self, state_id, estimator, job_name, data, hyperparameters, mini_batch_size, experiment_config, wait_for_completion, tags, **kwargs)
     69 
     70         if estimator.debugger_hook_config != None:
---> 71             parameters['DebugHookConfig'] = estimator.debugger_hook_config._to_request_dict()
     72 
     73         if estimator.rules != None:

AttributeError: 'bool' object has no attribute '_to_request_dict'

As per this doc: https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_debugger.html#default-behavior-and-opting-out , debugger_hook_config=False is required to disable the hook initialization.

Code sample:

estimator = TensorFlow(
    entry_point="training.py",
    role="arn:aws:iam::747911269416:role/service-role/AmazonSageMaker-ExecutionRole-20190313T101302",
    train_instance_count=1,
    train_instance_type="ml.m5d.large",
    output_path="s3://jkkwon-miami-us-west-2",
    code_location="s3://jkkwon-miami-us-west-2",
    train_volume_size=1024,
    metric_definitions=[
        {"Name": "train:loss", "Regex": "Train Loss: (.*?);"},
        {"Name": "test:loss", "Regex": "Test Average loss: (.*?),"},
        {"Name": "test:accuracy", "Regex": "Test Accuracy: (.*?)%;"},
    ],
    enable_sagemaker_metrics=True,
    debugger_hook_config=False
)


training_step = TrainingStep(
    state_id="samplestate",
    estimator=estimator,
    job_name="samplejob"
)

Specifying `destination` in ProcessingOutput silently fails - and does not write to the destination S3 bucket

This works:

sklearn_processor.run(code='preprocess-scikit.py',
                      outputs=[ProcessingOutput(output_name='train_data',
                                                source='/opt/ml/processing/output/train')])

This silently fails and does not write to the output s3 location:

s3_output_train_data = 's3://{}/{}/train'.format(bucket, output_prefix)

sklearn_processor.run(code='preprocess-scikit.py',
                      outputs=[ProcessingOutput(output_name='train_data',
                                                source='/opt/ml/processing/output/train',
                                                destination=s3_output_train_data)])

(Spent lots of time debugging this ^^)

Any suggestions?

ResultPath is ignored if set to None

This doesn't work: if I set result_path=None, "ResultPath" won't appear in the JSON, so I cannot pass the input through as output for a task.

Doesn't the test case below fail every time?

def test_paths_none_converted_to_null():
    task_state = Task('Task', resource='arn:aws:lambda:us-east-1:1234567890:function:StartLambda',
                      result_path=None,
                      input_path=None,
                      output_path=None)
    assert '"ResultPath": null' in task_state.to_json()
    assert '"InputPath": null' in task_state.to_json()
    assert '"OutputPath": null' in task_state.to_json()

Problem with CheckpointConfig s3uri in Tensorflow estimator

Values from the estimator are not passed to the step function definition (JSON). I had to set it manually in the JSON by editing the step function in the console to make the training job work.

"CheckpointConfig": { "S3Uri": "s3://bucket_name/location/for/checkpoint/" },
S3Uri is taken from checkpoint_s3_uri, right? But in my case this is not working; the CheckpointConfig dictionary is not generated.

Thank you for taking a look at it.

feature: adding support for DataCaptureConfig in endpoint config

The EndpointConfig constructor could be extended to support passing in sagemaker.model_monitor.DataCaptureConfig which includes configuration eg:

    DataCaptureConfig(
        enable_capture=True,
        sampling_percentage=100,
        destination_s3_uri='s3://sagemaker/datacapture')

Missing IAM Permissions in the StepFunctionsWorkflowExecutionPolicy

This is a great library for us

I tried running the hello world example and ran into issues with the IAM permissions for state machines.

workflow.create() requires the states:CreateStateMachine permission and workflow.execute() requires states:StartExecution, both of which are currently missing from the IAM policy.

Kindly add the following state machine permissions to the IAM policy:

states:CreateStateMachine
states:StartExecution

Thanks a lot.

`stepfunctions.steps.states.State.output()` doesn't seem to respect result_path

Hello,

There seems to be an issue when stepfunctions.steps.states.State.output() is used together with result_path. For example:

lambda_state_first = LambdaStep(
    state_id="MyFirstLambdaStep",
    parameters={
        "FunctionName": "MakeApiCall",
        "Payload": {
          "input": "20192312"
          }
    }, 
    result_path="$.NewOutputPath"
)

lambda_state_first.output()["Payload"].to_jsonpath()

I would expect the output to be "$['NewOutputPath']['Payload']", but it's actually "$['Payload']". Is that expected behavior, and if so, what is the correct way to use result_path together with output()?

Thanks!

Can't seem to specify `inference.py` and `requirements.txt` with TrainingPipeline - only `model_fn`

Is there a way to specify inference.py and requirements.txt with TrainingPipeline for the deploy step?

I only see a model_fn() mentioned, but I need to perform request-level transformations. And, actually, how is model_fn() enough? What is the format of the data coming in?

I haven't actually tested the end-to-end PyTorch/MNIST example, but I don't understand how model_fn() would work, now that I look at the example closer. Can you help me understand?

I pulled down the deployed model.tar.gz inference code, but I don't see the equivalent of an inference.py in the bundle.

And are there any TensorFlow examples? I only see PyTorch at the moment.

Thanks!

Choice states cannot be chained

I'm currently adopting this package in place of our own code for generating a state machine definition. One of our main patterns is to have a choice state that checks for a certain variable and kicks off the previous step again if the condition is met, or moves to the next step if not.
Example (failing) code:

import stepfunctions

first_job = stepfunctions.steps.Pass('First job')
second_job = stepfunctions.steps.Pass('Second job')

check = stepfunctions.steps.Choice('Check first job')
check.add_choice(
    rule=stepfunctions.steps.ChoiceRule.BooleanEquals(
        variable='$run_me_again',
        value=True
    ),
    next_step=first_job
)
check.default_choice(second_job) # This could be set automatically

chain = stepfunctions.steps.Chain([first_job, check, second_job])

This code currently fails with: State type `Choice` does not support method `next`.

I understand that the choice state was designed to branch the workflow and that it terminates the chain. Please correct me if I'm wrong, but in my opinion the presence of the next step can be treated as the default_choice for the choice state.

Here is a quick workaround that solves our issue:

class ChainableChoice(stepfunctions.steps.Choice):
    def next(self, next_step):
        self.default_choice(next_step)
        return next_step

I'd be more than happy to submit a proper PR; please let me know what you think about it.

Allow Workflow.create() to update workflow

Hey guys,

One use case I believe is very common when using notebooks is to modify some cells and re-run stuff, but the process to update an existing workflow with the SDK requires additional steps.

To explain this, consider these two screenshots:

Here, I executed these cells for the first time and effectively created the workflow called MyWorkflow_Simple:
(screenshot)

However, during development I added a new step second_step (consider this may occur several times either by adding new steps or modifying existing ones). The problem now is the workflow already exists and I couldn't override/update it:
(screenshot)

Of course later I managed to update with attach, but this creates additional cells and complexity in terms of reproducibility:
(screenshot)

Given this usage, which I believe is very common when dealing with notebooks, my proposal would be to add a new parameter update (default to False) to Workflow.create() to update/override in a single step.

I can submit a PR if you agree with this approach.
Thank you for reviewing this!

Can't define OutputDataConfig for TrainingStep

It doesn't look like it's possible to specify the output path for the model artifacts in the TrainingStep class. You can only use the default output directory or define it in the estimator object before you pass it to the TrainingStep. In either case it doesn't look like it's possible to set the output path from the execution input or the output of a prior task.

Model path bug In training pipeline template

I believe there is a bug in the TrainingPipeline class.

You can recreate the bug by running all cells in this notebook without making any changes.

When I run this notebook, my Step Function fails at the "Create Model" step with the following exception:

Error: SageMaker.AmazonSageMakerException

Cause: Could not find model data at s3://sagemaker-us-west-2-{account-number}/training-pipeline-2020-06-30-00-08-30/models/estimator-training-pipeline-2020-06-30-00-08-39/output/model.tar.gz. (Service: AmazonSageMaker; Status Code: 400; Error Code: ValidationException; Request ID: 817d6712-c1d1-4765-b1da-bd61c3c7157d)

I looked into the code here and I think the bug occurs at line 153 in this code snippet:

        # Configure the path to model artifact
        inputs[StepId.CreateModel.value]['PrimaryContainer']['ModelDataUrl'] = '{s3_uri}/{job}/output/model.tar.gz'.format(
            s3_uri=inputs[StepId.Train.value]['OutputDataConfig']['S3OutputPath'],
            job=inputs[StepId.Train.value]['TrainingJobName']
        )

It looks like the SDK passes the model data uri as the string: '{s3_uri}/{TrainingJobName}/output/model.tar.gz', but in my testing it looks like SageMaker treats the TrainingJobName as a prefix, not the entire job name. SageMaker is appending a unique hash to this prefix, which means the S3 path is no longer correct.

For example, when I last ran the notebook my training job succeeded and stored the model artifacts here:

"S3ModelArtifacts": s3://sagemaker-us-west-2-{account-number}/training-pipeline-2020-06-30-00-08-30/models/estimator-training-pipelin -2136f0ca-232b-4c66-8487-835d6ad1d153 /output/model.tar.gz

but the execution input that Create Model state takes in was:

"ModelDataUrl": "s3://sagemaker-us-west-2-{account-number}/training-pipeline-2020-06-30-00-08-30/models/estimator-training-pipeline-2020-06-30-00-08-39/output/model.tar.gz"

Open source the visualization component?

Hi, thanks for releasing this as open source!

I noticed that the code for visualizing workflow graphs refers to JS and CSS files that aren't open source.

Would it be possible to release those components as open source too? Or at least relax their license to something other than "all rights reserved"? Thanks!

S3Operations is not supported by Step Functions error: Using TrainingPipeline with a TensorFlow model

Error: The field "S3Operations" is not supported by Step Functions

{   "error": "States.Runtime",   "cause": "An error occurred while executing the state 'Create Model' (entered at the event id #8). The Parameters '{\"ExecutionRoleArn\":\"arn:aws:iam::835319576252:role/service-role/AmazonSageMaker-ExecutionRole-20191006T135881\",\"PrimaryContainer\":{\"Image\":\"520713654638.dkr.ecr.us-east-1.amazonaws.com/sagemaker-tensorflow:2.0.0-gpu-py3\",\"Environment\":{\"SAGEMAKER_PROGRAM\":\"mnist_keras_tf2.py\",\"SAGEMAKER_SUBMIT_DIRECTORY\":\"s3://sagemaker-us-east-1-835319576252/training-pipeline-2019-12-27-19-39-29/estimator-source/source/sourcedir.tar.gz\",\"SAGEMAKER_ENABLE_CLOUDWATCH_METRICS\":\"false\",\"SAGEMAKER_CONTAINER_LOG_LEVEL\":\"20\",\"SAGEMAKER_REGION\":\"us-east-1\"},\"ModelDataUrl\":\"s3://sagemaker-us-east-1-835319576252/sagemaker/tensorflow-mnist/training-runs/training-pipeline-2019-12-27-19-39-29/models/estimator-training-pipeline-2019-12-27-19-39-35/output/model.tar.gz\"},\"ModelName\":\"training-pipeline-2019-12-27-19-39-35\",\"S3Operations\":{\"S3Upload\":[{\"Path\":\"mnist_keras_tf2.py\",\"Bucket\":\"sagemaker-us-east-1-835319576252\",\"Key\":\"training-pipeline-2019-12-27-19-39-29/estimator-source/source/sourcedir.tar.gz\",\"Tar\":true}]}}' could not be used to start the Task: [The field \"S3Operations\" is not supported by Step Functions]" }

The code is roughly like this...

from sagemaker.tensorflow import TensorFlow

mnist_estimator = TensorFlow(entry_point='mnist_keras_tf2.py',
                             source_dir='./src',
                             role=sagemaker_execution_role,
                             framework_version='2.0.0',
                             train_instance_count=1,
                             train_instance_type='ml.p3.2xlarge',
                             py_version='py3',
                             distributions={'parameter_server': {'enabled': True}})

model_output_s3_bucket = '{}/sagemaker/tensorflow-mnist/training-runs'.format(bucket)

pipeline = TrainingPipeline(
    estimator=mnist_estimator,
    role=workflow_execution_role,
    inputs=training_data_uri,
    s3_bucket=model_output_s3_bucket
)

pipeline.create()

pipeline.execute()

This happens during the Create Model step...

(screenshot)

Any ideas?

Hyperparameter are not getting picked up during `execute` in `stepfunctions.template.pipeline.train.TrainingPipeline`

The hyperparameters set through stepfunctions.template.pipeline.train.TrainingPipeline.execute(job_name=None, hyperparameters=None) are not picked up during execution for DeepAR. Instead, hyperparameters need to be set using estimator.set_hyperparameters(**hyperparameters) before the estimator is passed as an argument when instantiating stepfunctions.template.pipeline.train.TrainingPipeline(estimator, role, inputs, s3_bucket).

Problem with environment variables in modelstep

Hi,
Please help.
I am having some challenges with the stepfunctions + sagemaker APIs.
It seems that I should be able to pass environment variables to the endpoint using env (dict[str, str]) in the sagemaker.Model constructor used by steps.ModelStep.
However, I am not able to get those env variables to show up in the endpoint.
I am able to get the env variables working with boto3, but I don't want to use that, as I prefer the SageMaker API with Step Functions.

So how do I pass env variables to Step Functions correctly so that they are visible?

Running:
stepfunctions=1.0.0.9
sagemaker=1.55.4

(screenshot)
However, in the container once I start it (printing all keys and values of os.environ):
(screenshot)

Cannot output null value in ResultPath

We cannot output the following JSON definition even if result_path is None.

{
  "Type": "Parallel",
  "ResultPath": null
}

If result_path is specified explicitly, the user should expect to see it in the final state machine definition.

from stepfunctions import steps

p = steps.Parallel('test', result_path=None)
assert 'ResultPath' in p.to_dict()

feature: Adding support for debug hooks

The sagemaker.estimator class will assign a default debug output to the same directory as model output for any regions that support debugger.

see: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/estimator.py#L1674

The TrainingStep, which includes inference preprocessors, needs to be updated to include support for setting the debug hook config S3 output path, e.g.:

{
    'DebugHookConfig': {
           'S3OutputPath': 's3://sagemaker-us-east-1/inference-pipeline/models/debug'
     }
}

TrainingStep should expose EnableManagedSpotTraining property to enable usage of EC2 spot instances

It seems like the SDK doesn't support using managed spot instances.

When the estimator is set to use them, this doesn't render into the state definition.

xgb_estimator = sagemaker.estimator.Estimator(
..
    train_use_spot_instances=True,
..

training_step = steps.TrainingStep(
..
    estimator=xgb_estimator,
..

The Step Functions state doesn't have EnableManagedSpotTraining present as I would expect, even though it is supported according to: https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html#API_CreateTrainingJob_RequestSyntax

Support for passing full arn to LambdaStep on init

Hi

My understanding is that the result returned from a Lambda varies depending on whether the full ARN is passed or the shortened one. For example, if the resource is arn:aws:states:::lambda:invoke and the function name is passed in the payload, then the result of the Lambda will also be in a Payload field, which then requires unpacking.

If the full ARN is specified, the result is returned directly. Therefore, it would be good if, when the full ARN is passed to the LambdaStep as a kwarg, it were preserved rather than clobbered.

Thanks
Andy

different name for training job inside estimator than step input

  • SageMaker container: conda_pytorch_p36
  • estimator mode: 'script mode'

TrainingJobName is a required parameter that I have to provide through the Step Functions Data Science SDK.
pytorch_estimator = PyTorch(entry_point='HRC_0818_final.py',
                            train_instance_type='ml.m4.xlarge',
                            role=role,
                            train_instance_count=1,
                            framework_version='1.4.0',
                            base_job_name = 'kanto-base-job',
                            )
import stepfunctions
training_step = steps.TrainingStep(
    'Model Training', 
    estimator=pytorch_estimator,
    data={
         'training': s3_input(s3_data=execution_input['TrainTargetLocation'])
    } ,
    job_name=execution_input['TrainingJobName'],
    wait_for_completion=True
)

model_step = steps.ModelStep(
    'Save model',
    model=training_step.get_expected_model(),
    model_name=execution_input['ModelName'] ,
    instance_type='ml.m4.xlarge',

)

execution = workflow.execute(
    inputs={
 
        'ModelName': 'kanto-mode-{}'.format(uuid.uuid4().hex),
        'TrainTargetLocation' : 's3://hrms-train/traindata/train.jsonl'
    }
)

However, the training job name inside the estimator is still the default one (base_job_name followed by the current strftime):
"module_dir": "s3://sagemaker-{aws-region}-{aws-id}/{training-job-name}/source/sourcedir.tar.gz",

Consequently, you end up linking a wrong directory to the Model:
SAGEMAKER_SUBMIT_DIRECTORY | s3://sagemaker-{aws-region}-{aws-id}/{base-job-name}-2020-08-20-17-47-50-751/source/sourcedir.tar.gz

I guess the reason is that I end up with two different folders for model.tar.gz and sourcedir.tar.gz, which leads to the awkward behavior that a consolidated model.tar.gz cannot be created when you deploy to a server. I can only copy sourcedir.tar.gz to an MMS server, since that path uses the default job name, and model.pth is consequently missing.

So this forced me to add a Lambda function that copies model.tar.gz (model.pth) from the TrainTargetLocation folder to the default (strftime-named) training job folder to make it work correctly.

Does the Step Function Data Science SDK support SageMaker Processing Jobs similar to Batch Transform Jobs?

cc @shunjd

I see stepfunctions.steps.sagemaker.TransformStep for Batch Transform Jobs, but I don't see the equivalent for SageMaker Processing Jobs. Am I missing something?

If this support doesn't exist, can you suggest a suitable workaround? Perhaps, we need to invoke a shim Lambda function for now?

Any idea when support for SageMaker Processing Jobs will be added to the Step Function Data Science SDK?

Tensorflow estimator script mode not working if we pass HyperParameters as part of TrainingStep

Hello,

I was trying to use the TensorFlow estimator in script mode and pass the hyperparameters as part of stepfunctions.steps.TrainingStep as follows:
(screenshot)

But the training job failed with the following error:
(screenshot)

Upon investigation I found out that the other hyperparameters that the estimator attaches on its own were not being passed (e.g. 'sagemaker_program', 'sagemaker_submit_directory', etc.).

For example, the following image is the set of hyper-parameters for the job when we give it as part of TrainingStep:
(screenshot)

But, if we give the hyper-parameters as part of the estimator:
(screenshot)

The reason why I'd want to give hyper-parameters as part of the TrainingStep is that I'd be able to use stepfunctions.inputs.ExecutionInput in order to provide inputs dynamically that might be useful in the script.

Train S3 location from ExecutionInput

This is an excellent tool, great work! I am looking forward to future releases.

I was trying the below example:
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/step-functions-data-science-sdk/automate_model_retraining_workflow/automate_model_retraining_workflow.ipynb

I have a requirement to rerun the same workflow each time with a different s3 location.

training_step = steps.TrainingStep(
    'Model Training',
    estimator=xgb,
    data={
        'train': s3_input(train_data, content_type='csv'),
        'validation': s3_input(validation_data, content_type='csv')
    },
    job_name=execution_input['TrainingJobName'],
    wait_for_completion=True
)

In this example, the train and validation s3 locations are fixed. I want to change this on each workflow execution. Is there a way to input this from ExecutionInput or any other means?

AWS CDK construct library

Hello,

This is more of a suggestion for the data science SDK.

Since the release of AWS CDK, my team has moved to modeling our infrastructure to CDK. We utilized step functions to model a SageMaker endpoint deployment pipeline starting from training jobs and leading to deploying/updating endpoints.

Going through the documentation of the SDK and the functionality it provides it seems like a missed opportunity not to have this modeled as CDK constructs that AWS CDK users can leverage.

I wonder if you can provide an AWS CDK constructs library to model the components featured in the SDK

Specifying `arguments=[]` in SKLearnProcessor.run() causes `Invalid length for parameter AppSpecification.ContainerArguments` error

This code:

sklearn_processor.run(code='preprocess-scikit.py',
                                       arguments=[])                     

Causes the following error:

---------------------------------------------------------------------------
ParamValidationError                      Traceback (most recent call last)
<ipython-input-88-829761199960> in <module>()
     12                       arguments=[],
     13                       logs=True,
---> 14                       wait=False)

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/processing.py in run(self, code, inputs, outputs, arguments, wait, logs, job_name, experiment_config)
    402             inputs=normalized_inputs,
    403             outputs=normalized_outputs,
--> 404             experiment_config=experiment_config,
    405         )
    406         self.jobs.append(self.latest_job)

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/processing.py in start_new(cls, processor, inputs, outputs, experiment_config)
    622 
    623         # Call sagemaker_session.process using the arguments dictionary.
--> 624         processor.sagemaker_session.process(**process_request_args)
    625 
    626         return cls(

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/session.py in process(self, inputs, output_config, job_name, resources, stopping_condition, app_specification, environment, network_config, role_arn, tags, experiment_config)
    644         LOGGER.info("Creating processing-job with name %s", job_name)
    645         LOGGER.debug("process request: %s", json.dumps(process_request, indent=4))
--> 646         self.sagemaker_client.create_processing_job(**process_request)
    647 
    648     def create_monitoring_schedule(

~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    314                     "%s() only accepts keyword arguments." % py_operation_name)
    315             # The "self" in this scope is referring to the BaseClient.
--> 316             return self._make_api_call(operation_name, kwargs)
    317 
    318         _api_call.__name__ = str(py_operation_name)

~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    597         }
    598         request_dict = self._convert_to_request_dict(
--> 599             api_params, operation_model, context=request_context)
    600 
    601         service_id = self._service_model.service_id.hyphenize()

~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py in _convert_to_request_dict(self, api_params, operation_model, context)
    645             api_params, operation_model, context)
    646         request_dict = self._serializer.serialize_to_request(
--> 647             api_params, operation_model)
    648         if not self._client_config.inject_host_prefix:
    649             request_dict.pop('host_prefix', None)

~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/validate.py in serialize_to_request(self, parameters, operation_model)
    295                                                     operation_model.input_shape)
    296             if report.has_errors():
--> 297                 raise ParamValidationError(report=report.generate_report())
    298         return self._serializer.serialize_to_request(parameters,
    299                                                      operation_model)

ParamValidationError: Parameter validation failed:
Invalid length for parameter AppSpecification.ContainerArguments, value: 0, valid range: 1-inf

Simplify parameter handling with map

Right now managing the parameters of the iterator to a map is challenging, because there's no way to use the input schema to ensure you're passing things to the right places.

It'd be great if the API provided some facility for doing this to reduce the difficulty.

TrainingPipeline fails when specifying `s3_bucket`: `s3://s3://` issue

The following code

from sagemaker.xgboost import XGBoost

model_output_path = 's3://{}/models/amazon-reviews/script-mode/training-runs'.format(bucket)

xgb_estimator = XGBoost(entry_point='train.py',
                                            ...
                                            output_path=model_output_path)


from stepfunctions.template.pipeline import TrainingPipeline

pipeline = TrainingPipeline(
    estimator=xgb_estimator,
    role=workflow_execution_role,
    inputs={'train': s3_input_train_data, 
            'validation': s3_input_validation_data},
    s3_bucket=model_output_path)

Causes the following error:

{
  "Training": {
    "AlgorithmSpecification": {
      "TrainingImage": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:0.90-2-cpu-py3",
      "TrainingInputMode": "File"
    },
    "OutputDataConfig": {
      "S3OutputPath": "**s3://s3://sagemaker-us-east-1-835319576252/models/amazon-reviews/script-mode/training-runs/training-pipeline-2020-03-06-01-27-38/models**"
    },

...

  },
  "Create Model": {
    "ModelName": "training-pipeline-2020-03-06-01-30-16",
    "PrimaryContainer": {
      "Image": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:0.90-2-cpu-py3",
      "Environment": {
        "SAGEMAKER_PROGRAM": "xgboost_reviews.py",
        "SAGEMAKER_SUBMIT_DIRECTORY": "s3://sagemaker-us-east-1-XXX/training-pipeline-2020-03-06-01-27-38/estimator-source/source/sourcedir.tar.gz",
        "SAGEMAKER_ENABLE_CLOUDWATCH_METRICS": "false",
        "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
        "SAGEMAKER_REGION": "us-east-1"
      },
      "ModelDataUrl": "**s3://s3://sagemaker-us-east-1-XXX/models/amazon-reviews/script-mode/training-runs/training-pipeline-2020-03-06-01-27-38/models/estimator-training-pipeline-2020-03-06-01-30-16/output/model.tar.gz**"
    },

And if I remove the s3://, I see the following error:

ValueError: Expecting 's3' scheme, got:  in sagemaker-us-east-1-
XXX/models/amazon-reviews/script-mode/training-runs.

No metrics when I use tensorflow estimator

hi

I'm trying the Step Functions Data Science SDK, but I ran into an issue.

I created the TensorFlow estimator in my Jupyter notebook like this:

tf_estimator = TensorFlow(
    entry_point='my_train_1.py',
    model_dir=model_dir,
    output_path=s3_output_location,
    train_instance_type=train_instance_type,
    train_instance_count=1,
    hyperparameters=hyperparameters,
    role=sagemaker.get_execution_role(),
    # base_job_name='tf-scriptmode-mnist',
    framework_version='2.0.0',
    py_version='py3',
    metric_definitions=metric_definitions,
    script_mode=True
)

I did define metric_definitions, but when I chained all the steps into a workflow and executed it, there were no custom metrics in the SageMaker console.

But when I just create the estimator and call fit without the Step Functions Data Science SDK, I do get the metrics.

What did I miss? Any suggestions?

thanks a lot

Can't Pass dict Objects as Execution Inputs

When I try to pass a dict object as an execution input, I receive a ValueError.

Notes to reproduce:

Run this example notebook but add a dict input to the workflow with the following cells:

from stepfunctions.inputs import ExecutionInput
execution_input = ExecutionInput(schema={
    'InputDict': dict})

InputDict = {"Key":"Value"}
type(InputDict)

basic_workflow_execution = basic_workflow.execute(
    inputs={"InputDict": InputDict})

When I run the execute method I receive the following error:


ValueError Traceback (most recent call last)
in
1 basic_workflow_execution = basic_workflow.execute(
----> 2 inputs={"InputDict": InputDict})

~/anaconda3/envs/python3/lib/python3.6/site-packages/stepfunctions/workflow/stepfunctions.py in execute(self, name, inputs)
285 validation_result = self.workflow_input.validate(inputs)
286 if validation_result.valid is False:
--> 287 raise ValueError("Expected run input with the schema: {}".format(self.workflow_input.get_schema_as_json()))
288
289 if self.state_machine_arn is None:

ValueError: Expected run input with the schema: {"InputDict": "dict"}


I tried passing the same dictionary to the created Step Function directly through the console, and it works:

(screenshot)

Execution list_events html output fails with boolean properties for inputDetails.

When an event includes boolean properties the execution.list_events(html=True) method fails with error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-65-138c22b04713> in <module>
----> 1 execution.list_events(html=True) # Bug

~/anaconda3/envs/python3/lib/python3.6/site-packages/stepfunctions/workflow/stepfunctions.py in list_events(self, max_items, reverse_order, html)
    487 
    488         if html:
--> 489             return HTML(events_list.to_html())
    490         else:
    491             return events_list

~/anaconda3/envs/python3/lib/python3.6/site-packages/stepfunctions/workflow/stepfunctions.py in to_html(self)
     45 
     46     def to_html(self):
---> 47         return EventsTableWidget(self).show()
     48 
     49 

~/anaconda3/envs/python3/lib/python3.6/site-packages/stepfunctions/workflow/widgets/events_table.py in __init__(self, events)
    141             timestamp=format_time(event.get("timestamp")),
    142             event_detail=self._format_event_detail(event)
--> 143         ) for event in events]
    144 
    145         self.template = Template(TABLE_TEMPLATE.format(table_rows='\n'.join(table_rows)))

~/anaconda3/envs/python3/lib/python3.6/site-packages/stepfunctions/workflow/widgets/events_table.py in <listcomp>(.0)
    141             timestamp=format_time(event.get("timestamp")),
    142             event_detail=self._format_event_detail(event)
--> 143         ) for event in events]
    144 
    145         self.template = Template(TABLE_TEMPLATE.format(table_rows='\n'.join(table_rows)))

~/anaconda3/envs/python3/lib/python3.6/site-packages/stepfunctions/workflow/widgets/events_table.py in _format_event_detail(self, event)
    307     def _format_event_detail(self, event):
    308         event_details = self._get_step_detail(event)
--> 309         self._unpack_to_proper_dict(event_details)
    310         return json.dumps(event_details, indent=4)
    311 

~/anaconda3/envs/python3/lib/python3.6/site-packages/stepfunctions/workflow/widgets/events_table.py in _unpack_to_proper_dict(self, dictionary)
    313         for k, v in dictionary.items():
    314             if isinstance(v, dict):
--> 315                 self._unpack_to_proper_dict(v)
    316             else:
    317                 dictionary[k] = self._load_json(v)

~/anaconda3/envs/python3/lib/python3.6/site-packages/stepfunctions/workflow/widgets/events_table.py in _unpack_to_proper_dict(self, dictionary)
    315                 self._unpack_to_proper_dict(v)
    316             else:
--> 317                 dictionary[k] = self._load_json(v)
    318 
    319     def _load_json(self, value):

~/anaconda3/envs/python3/lib/python3.6/site-packages/stepfunctions/workflow/widgets/events_table.py in _load_json(self, value)
    319     def _load_json(self, value):
    320         try:
--> 321             return json.loads(value)
    322         except ValueError as e:
    323             return value

~/anaconda3/envs/python3/lib/python3.6/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    346         if not isinstance(s, (bytes, bytearray)):
    347             raise TypeError('the JSON object must be str, bytes or bytearray, '
--> 348                             'not {!r}'.format(s.__class__.__name__))
    349         s = s.decode(detect_encoding(s), 'surrogatepass')
    350 

TypeError: the JSON object must be str, bytes or bytearray, not 'bool'

This is due to the {'truncated': False} value in the following event:

{'timestamp': datetime.datetime(2020, 10, 6, 2, 41, 57, 614000, tzinfo=tzlocal()),
 'type': 'ExecutionStarted',
 'id': 1,
 'previousEventId': 0,
 'executionStartedEventDetails': {'input': '{\n    "ExperimentName": "mlops-nyctaxi",\n    "TrialName": "mlops-nyctaxi-7f99979c077d11ebbed6ad48a7dcc771",\n    "GitBranch": "master",\n    "GitCommitHash": "xxx",\n    "DataVersionId": "yyy",\n    "BaselineJobName": "mlops-nyctaxi-7f99979c077d11ebbed6ad48a7dcc771",\n    "BaselineOutputUri": "s3://sagemaker-us-east-1-691313291965/nyctaxi/monitoring/baseline/mlops-nyctaxi-pbl-7f99979c077d11ebbed6ad48a7dcc771",\n    "TrainingJobName": "mlops-nyctaxi-7f99979c077d11ebbed6ad48a7dcc771"\n}',
  'inputDetails': {'truncated': False},
  'roleArn': 'arn:aws:iam::691313291965:role/mlops-nyctaxi-sfn-execution-role'}}

transform_config() got an unexpected keyword argument 'input_filter'

Dear collaborators, I am new to the AWS Step Functions Data Science SDK for Amazon SageMaker. I'm working with the "machine_learning_workflow_abalone" example from the SageMaker examples, and when I execute transform_step the following error appears:

TypeError: transform_config() got an unexpected keyword argument 'input_filter'

the code of cell is:

transform_step = steps.TransformStep(
    'Transform Input Dataset',
    transformer=xgb.transformer(
        instance_count=1,
        instance_type='ml.m5.large'
    ),
    job_name=execution_input['JobName'],
    model_name=execution_input['ModelName'],
    data=test_s3_file,
    content_type='text/libsvm'
)

I need help to resolve this problem, please.
Best regards

timestamp mismatch when using code_location

Hi,

When code_location is used in the estimator of TrainingStep(), the uploaded S3 path and the sagemaker_submit_directory timestamp do not match (by about 400 ms).
This will cause the execution to fail.

In a SageMaker training job, the timestamp matches even if code_location is used.

S3 uploaded path
s3://my-bucket/model/sagemaker-xgboost-2020-06-10-06-29-37-910/source/sourcedir.tar.gz

sagemaker_submit_directory
"s3://my-bucket/model/sagemaker-xgboost-2020-06-10-06-29-38-323/source/sourcedir.tar.gz"

# Open Source distributed script mode
from sagemaker.session import s3_input, Session
from sagemaker.xgboost.estimator import XGBoost

boto_session = boto3.Session(region_name=region)
session = Session(boto_session=boto_session)

output_path = 's3://{}/{}'.format(bucket_name, 'model')

xgb_script_mode_estimator = XGBoost(
    entry_point='xgboost.py',
    source_dir='source',
    framework_version='0.90-2', # Note: framework_version is mandatory
    hyperparameters=hyperparams,
    role=role,
    train_instance_count=1, 
    train_instance_type='ml.m5.2xlarge',
    code_location=output_path, # ← Cause a mismatch
    output_path=output_path
)
