Comments (4)
Thanks for the feedback @humanzz
Are you using the data science SDK today?
The aws-stepfunctions-tasks module supports constructs to create Task states for SageMaker CreateTransformJob and CreateTrainingJob. Besides other SageMaker APIs (aws/aws-cdk#6572), which features of this SDK would you like to see in CDK as constructs?
from aws-step-functions-data-science-sdk-python.
The CDK provides L2 constructs for several more of the SageMaker APIs, including:
- CreateTrainingJob
- CreateTransformJob
- CreateEndpoint
- CreateEndpointConfig
- CreateModel
- UpdateEndpoint
L2 constructs are intended to model and simplify the assembly of a state machine definition so that it can be deployed through CloudFormation via cdk deploy in the CDK CLI. You can also author your state machine in the language of your choice (TypeScript, JavaScript, C#, Python, Java, Go). The CDK also lets you create resources outside of your state machine through the aws-sagemaker module, which provisions them via CloudFormation as well.
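To make "state machine definition" concrete, here is a minimal sketch of the kind of Amazon States Language document that either tool ultimately produces. It uses only the Python standard library and is not the CDK or Data Science SDK API; the helper names (task_state, build_definition), the state names, and the Parameters are assumptions for illustration. The Parameters are deliberately incomplete placeholders, so a real job would need the full SageMaker request fields.

```python
import json


def task_state(resource, parameters, next_state=None):
    """Build one Amazon States Language Task state as a plain dict."""
    state = {"Type": "Task", "Resource": resource, "Parameters": parameters}
    if next_state is None:
        state["End"] = True  # last state in the chain
    else:
        state["Next"] = next_state
    return state


def build_definition():
    """Chain train -> create model -> batch transform into one definition."""
    return {
        "Comment": "Illustrative SageMaker pipeline (placeholder parameters)",
        "StartAt": "Train",
        "States": {
            # ".sync" asks Step Functions to wait for the job to finish.
            "Train": task_state(
                "arn:aws:states:::sagemaker:createTrainingJob.sync",
                {"TrainingJobName.$": "$.JobName"},  # incomplete on purpose
                next_state="CreateModel",
            ),
            "CreateModel": task_state(
                "arn:aws:states:::sagemaker:createModel",
                {"ModelName.$": "$.JobName"},  # incomplete on purpose
                next_state="Transform",
            ),
            "Transform": task_state(
                "arn:aws:states:::sagemaker:createTransformJob.sync",
                {"TransformJobName.$": "$.JobName", "ModelName.$": "$.JobName"},
            ),
        },
    }


definition_json = json.dumps(build_definition(), indent=2)
print(definition_json)
```

With the CDK, a definition like this is synthesized into a CloudFormation template; with the Data Science SDK, it is sent to the Step Functions APIs directly.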
The Data Science SDK offers capabilities in Python by talking to the Step Functions APIs directly instead of deploying through CloudFormation. It includes several utilities geared toward simplifying usage in Jupyter notebooks, such as visualizing the workflow and polling its execution status, which are unlocked by being able to call the Step Functions APIs.
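As a rough illustration of what "poll its execution status" involves, the sketch below loops on the Step Functions DescribeExecution call until the execution reaches a terminal status. It is not the Data Science SDK's own implementation. The function name wait_for_execution and the FakeClient class are hypothetical, and FakeClient exists only so the example runs without AWS credentials.

```python
import time

# Terminal statuses reported by the Step Functions DescribeExecution API.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"}


def wait_for_execution(client, execution_arn, poll_seconds=5.0, max_polls=120):
    """Poll describe_execution until the execution reaches a terminal status.

    `client` is anything exposing describe_execution(executionArn=...),
    for example a boto3 Step Functions client.
    """
    for attempt in range(max_polls):
        status = client.describe_execution(executionArn=execution_arn)["status"]
        if status in TERMINAL_STATUSES:
            return status
        if attempt < max_polls - 1:
            time.sleep(poll_seconds)  # wait before asking again
    raise TimeoutError(f"{execution_arn} still running after {max_polls} polls")


class FakeClient:
    """Hypothetical stand-in for a real client so the sketch runs offline."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def describe_execution(self, executionArn):
        return {"executionArn": executionArn, "status": next(self._statuses)}


if __name__ == "__main__":
    fake = FakeClient(["RUNNING", "RUNNING", "SUCCEEDED"])
    print(wait_for_execution(fake, "arn:aws:states:example", poll_seconds=0))
```

In a notebook, you would pass boto3.client("stepfunctions") and a real execution ARN in place of the fake client.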
Question: Is there something actionable in this repository that we could be providing to improve the data science sdk experience? @humanzz
If there are features that are missing in the CDK towards the creation of a state machine definition, they should probably be created as issues and tracked in the aws-cdk repository.
If there are other areas to explore, I suggest converting this issue into a discussion.
Thanks for the response @shivlaks.
I believe when I opened this issue, SageMaker/StepFunctions support in CDK was very limited. My team saw value in modeling our ML pipelines in CDK, and this SDK seemed to do something very similar.
As you say, this library probably serves a different use case from the one I had in mind, though the two have similarities.
Since this is in Python and works with notebooks, it can fit easily into earlier experimentation phases, iterating to reach a final pipeline.
In our case, we chose CDK to represent those final pipelines, since they'll be used many times. But to be honest, the cycle between experimenting and reaching a final pipeline is a bit more cumbersome than if using the SDK here (it requires infrastructure changes with CDK and Python changes for SageMaker).
So to answer your question: nothing actionable required but appreciate the added context.
thank you for sharing your journey so far @humanzz 🙏
the cycle between experimenting and reaching a final pipeline is a bit more cumbersome than if using the SDK here
👀 It does still sound like we can smooth out the developer experience and go further in either CDK or the data science SDK. If you have any thoughts or ideas for this repo or the CDK, I encourage you to open a feature request, which will help start the conversation around how we can bridge gaps. We want developers to have options and flexibility, with solutions that simplify their use cases.
Aside:
I believe the aws-stepfunctions and aws-stepfunctions-tasks modules were also in their early stages, as they were in experimental stability when this issue was created (I was on the CDK team at that time). These modules are stable now, although the SageMaker APIs offered in stepfunctions-tasks don't quite leverage SageMaker L2s as input types, as they are largely modeled within the sfn tasks module.
The Step Functions team will be looking to improve the developer experience in the data science SDK, and we will also monitor issues for the CDK modules to weigh in or open PRs to contribute where we can.
I'm resolving this issue for now as there is nothing actionable required at this time. Feel free to re-open if you have any unresolved questions.
Related Issues (20)
- Is it Possible to use schema from ExecutionInput into container_arguments of ProcessingStep? HOT 6
- Feature Request: Add Workflow.create_or_update() to update workflow if it exists or create it if it doesn't HOT 1
- feature: adding support for AsyncInferenceConfig in endpoint config HOT 1
- Unable set ModelClientConfig in TransformerStep HOT 3
- object of type TrainingInput is not JSON serializable HOT 1
- Make sagemaker an optional dependency HOT 5
- [aws-stepfunctions] Support for ResultSelector HOT 1
- Allow drop-in states language HOT 1
- Support "Cycles" with Choice states HOT 1
- pip install stepfunctions fails in SageMaker Studio Notebook
- render_graph() has hidden dependencies
- Workflow Tags are not updated HOT 1
- Retry policies with error codes HOT 1
- stepfunctions package side effect of changing application's root logger logging level to ERROR
- Instructions in CONTRIBUTING no longer work with tox>=4
- Project Maintenance HOT 5
- Execution inputs as container arguments for processing jobs HOT 4
- Stepfunctions-Textract-StartDocumentTextDetection does not accept SNSTopicArn in NotificationChannel HOT 1
- Please add Distributed Map
- adding tags to a Sagemaker estimator in the training step does not seem to be supported