Comments (10)
Thank you for your interest! Yes, we plan to add support for PyTorch 1.3. However, we do not currently have a timeline to share.
from sagemaker-pytorch-training-toolkit.
It would be really nice to have support for PyTorch 1.3.
Thank you for your interest! I've reached out to the corresponding team for their input.
PyTorch 1.3.1 containers have been released.
The Dockerfiles will be merged into this repo in a few days.
Hi @akartsky and others. Does AWS update their pre-built Docker images frequently? I am still using this line in my Dockerfile:
FROM 520713654638.dkr.ecr.us-east-1.amazonaws.com/sagemaker-pytorch:1.1.0-gpu-py3
Replacing 1.1.0 with any later version does not work; those images don't seem to exist.
Versions 1.2.0+ follow a different format (a different registry account and repository name):
763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.3.1-gpu-py3
https://docs.aws.amazon.com/sagemaker/latest/dg/pre-built-containers-frameworks-deep-learning.html
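To make the difference between the two formats concrete, here is a minimal Python sketch that builds the image URI for a given PyTorch version. The helper name `pytorch_image_uri` and its parameters are hypothetical; the account IDs and repository names are copied from the two URIs quoted in this thread. It only formats a string and does not check that the tag actually exists, so confirm the result against the documentation linked above.

```python
def pytorch_image_uri(version, device="gpu", py="py3", region="us-east-1"):
    """Build an ECR image URI using the two formats quoted in this thread.

    Assumption: the account IDs and repository names below are taken from
    the example URIs above; other regions or tags may differ.
    """
    # Compare only the major and minor components, e.g. "1.3.1" -> (1, 3).
    major_minor = tuple(int(part) for part in version.split(".")[:2])
    if major_minor >= (1, 2):
        # Newer format, used by versions 1.2.0 and later.
        account, repo = "763104351884", "pytorch-training"
    else:
        # Older format, as in the 1.1.0 image quoted earlier.
        account, repo = "520713654638", "sagemaker-pytorch"
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repo}:{version}-{device}-{py}"


# Print a Dockerfile base-image line for the 1.3.1 GPU image.
print("FROM " + pytorch_image_uri("1.3.1"))
```

The printed line can be pasted into a Dockerfile in place of the old `FROM` line. Pulling from the registry still requires authenticating Docker against that ECR account first.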
@laurenyu Where can I find a list of these magic ECR repositories? The link you mentioned does not list them.
@siavashdarkvision It looks like the page has been updated, but it does link to: https://github.com/aws/deep-learning-containers/blob/master/available_images.md
@laurenyu If I extend a recent version of the pytorch-training image, like 1.7.1, using this, will I be able to do hyperparameter tuning?
@siavashdarkvision It should work!
(If you run into problems, please open a new issue, since hyperparameter tuning is a separate topic and this issue has been closed for over a year.)