Comments (14)
@nyue My guess is you're running the update script without having first deployed the stack. Take a look at More Regions (click to expand) in the README and click on the ca-central-1 launch stack button.
I tried clicking on ca-central-1 and it started creating the stack. After monitoring it (IN_PROGRESS) for about 30 minutes, I got ROLLBACK_COMPLETE.
It looks like it has an issue creating the stack.
@nyue Can you check the stack events and find the event that failed? You might need to filter by FAILED to get the correct stack.
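For anyone who prefers scripting this over clicking through the console, here is a minimal sketch of the same "filter by FAILED" step. It assumes boto3 and AWS credentials are available for the fetch helper; the function names `first_failure` and `fetch_events` are hypothetical. Since the API lists events newest first, it sorts by timestamp so you get the original failure rather than the rollback noise that follows it.

```python
def first_failure(events):
    """Return the earliest event whose ResourceStatus ends in "_FAILED", or None.

    `events` is a list shaped like the "StackEvents" entries returned by
    boto3's CloudFormation describe_stack_events call.
    """
    failed = [e for e in events if e.get("ResourceStatus", "").endswith("_FAILED")]
    if not failed:
        return None
    # The API returns newest first; the earliest failure is usually the root cause.
    return min(failed, key=lambda e: e["Timestamp"])


def fetch_events(stack_name, region="ca-central-1"):
    """Collect every event for a stack (hypothetical helper).

    `stack_name` may be a name or a full stack ARN; for a nested stack that
    has already been rolled back and deleted, use the ARN from the error message.
    """
    import boto3  # imported lazily so first_failure works without AWS access

    client = boto3.client("cloudformation", region_name=region)
    paginator = client.get_paginator("describe_stack_events")
    events = []
    for page in paginator.paginate(StackName=stack_name):
        events.extend(page["StackEvents"])
    return events
```

Usage would be something like `first_failure(fetch_events("<nested stack ARN>"))`, then reading the returned event's `LogicalResourceId` and `ResourceStatusReason`.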
I found the failure; it mentions ECR:
Embedded stack arn:aws:cloudformation:ca-central-1:083230063072:stack/pcluster-manager-ParallelClusterApi-HAUEC9DAO8LE/099a6170-a96b-11ec-9e63-024e96f96c22 was not successfully created: The following resource(s) failed to create: [EcrImage].
@nyue - Can you please share the error with the embedded stack? It should have information specific to the ECR failure.
I am not sure how to get to an embedded stack. Is that the same as a nested stack?
I am using this ID to match 099a6170-a96b-11ec-9e63-024e96f96c22
You've got the right stack!
Yes, embedded stacks are synonymous with nested stacks.
Once CloudFormation encounters a failure, it deletes all the resources it has created up to that point, so seeing all the resources as DELETE_COMPLETE is expected.
Can you please click on the Events tab of the stack and share information about the first failures you see?
FWIW I just deployed to ca-central-1 by clicking on the link @sean-smith sent and PCluster manager successfully installed.
Here are the events
@nyue - Can you please scroll down the list of events and grab a screenshot of the point at which you see the first error? The delete events happened after the error.
Here is the part where the creation failure message is visible. It mentions the SSM Agent.
Thanks for the detail! I think Image Builder uses the default VPC to build an image. What does that VPC look like in your environment? If there are private subnets do all their route tables have a NAT GW entry? Any NACLs in place? Any VPC or gateway endpoints with policies in place?
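As a rough aid for the route-table question only (it does not cover NACLs or endpoint policies), here is a minimal sketch that flags route tables with no usable default route. It assumes boto3 and credentials for the fetch helper; `default_route_target`, `tables_without_internet` and `fetch_route_tables` are hypothetical names. It treats a `0.0.0.0/0` route to a NAT gateway or an `igw-` internet gateway as "has a way out", and a blackholed route as none.

```python
def default_route_target(route_table):
    """Return "nat", "igw", or None for a route table's 0.0.0.0/0 route."""
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("State") == "blackhole":
            return None  # the route exists but its target has been deleted
        if route.get("NatGatewayId"):
            return "nat"
        if route.get("GatewayId", "").startswith("igw-"):
            return "igw"
    return None


def tables_without_internet(route_tables):
    """IDs of route tables whose subnets have no default route to the internet."""
    return [rt["RouteTableId"] for rt in route_tables if default_route_target(rt) is None]


def fetch_route_tables(vpc_id, region="ca-central-1"):
    """Collect every route table in one VPC (hypothetical helper)."""
    import boto3  # imported lazily so the pure functions work offline

    ec2 = boto3.client("ec2", region_name=region)
    paginator = ec2.get_paginator("describe_route_tables")
    tables = []
    for page in paginator.paginate(Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]):
        tables.extend(page["RouteTables"])
    return tables
```

Note that subnets without an explicit association use the VPC's main route table, so an empty result here is a good sign but not proof that every subnet can reach the internet.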
I am not an AWS power user; I never touch the networking side, so I am struggling to understand the details and find answers to these questions (even though I vaguely understand them and they look important).
I have been setting up MPI clusters via Terraform and was hoping to do so via ParallelCluster v3, since the configuration has moved to YAML.
I am looking to pcluster-manager to make it simple for users like me to launch an MPI cluster, run some MPI-aware applications (be they rendering, genomics, or visualization), and shut it down once the work is done.
I am happy to learn about the VPC/networking side but am unable to provide useful answers at this juncture.
Cheers
Thanks @nyue
This is exactly the type of use case we're building PCM to solve. We want to make it easy to deploy and manage HPC clusters without having to be an expert in AWS networking (which I'm not).
Specifically for this issue, I think the root of the problem comes from #55 which will be merged in the next few weeks.
Basically the default VPC (which you can find here) needs to exist and have a route to the internet. Once #55 is merged you'll be able to specify another VPC/Subnet outside of the default one to launch in.
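To check the first half of that requirement (whether a default VPC exists at all in the region), here is a minimal sketch. It assumes boto3 and credentials for the fetch helper; `default_vpc_id` and `fetch_default_vpcs` are hypothetical names. It uses the EC2 `is-default` filter and then reads the `IsDefault` flag on each returned VPC.

```python
def default_vpc_id(vpcs):
    """Return the VpcId flagged IsDefault, or None if the region has no default VPC."""
    for vpc in vpcs:
        if vpc.get("IsDefault"):
            return vpc["VpcId"]
    return None


def fetch_default_vpcs(region="ca-central-1"):
    """Ask EC2 for the default VPC in one region (hypothetical helper)."""
    import boto3  # imported lazily so default_vpc_id works offline

    ec2 = boto3.client("ec2", region_name=region)
    # The "is-default" filter narrows the response to the default VPC, if any.
    resp = ec2.describe_vpcs(Filters=[{"Name": "is-default", "Values": ["true"]}])
    return resp["Vpcs"]
```

If `default_vpc_id(fetch_default_vpcs())` returns None, the default VPC was deleted; EC2's CreateDefaultVpc API (`create_default_vpc` in boto3) can recreate it. Whether it has a route to the internet is a separate check on its route tables.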
Closing since #55 was merged