@suiguoxin @siaimes Any comments?
from pai.
@hzy46 may know this part better.
You can try this:
https://github.com/siaimes/k8s-share
This is the solution I use now; it is simple and stable.
Yeah, I tried this, and here is what I'm seeing now:
```
TASK [kubernetes/master : Create hardcoded kubeadm token for joining nodes with 24h expiration (if defined)] ***************************************************************************************************************************
Monday 16 May 2022 11:17:44 +0800 (0:00:00.040) 0:02:57.894 ************
TASK [kubernetes/master : Create kubeadm token for joining nodes with 24h expiration (default)] ****************************************************************************************************************************************
Monday 16 May 2022 11:17:44 +0800 (0:00:00.047) 0:02:57.941 ************
FAILED - RETRYING: Create kubeadm token for joining nodes with 24h expiration (default) (5 retries left).
FAILED - RETRYING: Create kubeadm token for joining nodes with 24h expiration (default) (4 retries left).
FAILED - RETRYING: Create kubeadm token for joining nodes with 24h expiration (default) (3 retries left).
FAILED - RETRYING: Create kubeadm token for joining nodes with 24h expiration (default) (2 retries left).
FAILED - RETRYING: Create kubeadm token for joining nodes with 24h expiration (default) (1 retries left).
fatal: [pai-master -> 192.168.0.20]: FAILED! => {"attempts": 5, "changed": true, "cmd": ["/usr/local/bin/kubeadm", "--kubeconfig", "/etc/kubernetes/admin.conf", "token", "create"], "delta": "0:01:15.022307", "end": "2022-05-16 11:25:40.956708", "msg": "non-zero return code", "rc": 1, "start": "2022-05-16 11:24:25.934401", "stderr": "timed out waiting for the condition", "stderr_lines": ["timed out waiting for the condition"], "stdout": "", "stdout_lines": []}
NO MORE HOSTS LEFT *********************************************************************************************************************************************************************************************************************
PLAY RECAP *****************************************************************************************************************************************************************************************************************************
localhost : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
pai-master : ok=509 changed=17 unreachable=0 failed=1 skipped=509 rescued=0 ignored=0
pai-worker : ok=355 changed=12 unreachable=0 failed=0 skipped=293 rescued=0 ignored=0
```
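In long runs like this, the `PLAY RECAP` at the end is the quickest way to see which host actually failed. A minimal Python sketch, assuming the recap lines keep the `host : key=value ...` shape shown above; the helper names `parse_recap` and `failed_hosts` are hypothetical and not part of Ansible or OpenPAI:

```python
import re
from typing import Dict, List

# Matches one host line of an Ansible PLAY RECAP, e.g.
# "pai-master : ok=509 changed=17 unreachable=0 failed=1 ..."
RECAP_LINE = re.compile(r"^\s*(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)$")


def parse_recap(text: str) -> Dict[str, Dict[str, int]]:
    """Return {host: {counter_name: value}} for every recap line in text."""
    hosts: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines():
        m = RECAP_LINE.match(line)
        if not m:
            continue  # TASK headers, timestamps, fatal lines, etc.
        counters = {
            key: int(value)
            for key, value in re.findall(r"(\w+)=(\d+)", m.group("counters"))
        }
        hosts[m.group("host")] = counters
    return hosts


def failed_hosts(text: str) -> List[str]:
    """Hosts whose recap shows a non-zero 'failed' or 'unreachable' count."""
    return [
        host
        for host, counters in parse_recap(text).items()
        if counters.get("failed", 0) > 0 or counters.get("unreachable", 0) > 0
    ]
```

Fed the three recap lines above, `failed_hosts` returns only `pai-master`, which matches the `fatal: [pai-master -> 192.168.0.20]` line.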
I have not deleted the node, and I have tried the command on the worker node, but the token creation error is still there.
The error in your log comes from the master node, so run this command on the master node.
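To see the error without Ansible's retry loop around it, one option is to re-run, on the master node, the exact command from the failing task's `cmd` field. A minimal sketch, assuming Python 3.7+ on the node; the 5 attempts mirror `"attempts": 5` in the log, while the 5-second delay and the helper name `run_with_retries` are assumptions, not kubespray's actual settings:

```python
import subprocess
import sys
import time
from typing import List, Sequence

# Copied verbatim from the "cmd" field of the failing task in the log above.
KUBEADM_TOKEN_CREATE: List[str] = [
    "/usr/local/bin/kubeadm",
    "--kubeconfig",
    "/etc/kubernetes/admin.conf",
    "token",
    "create",
]


def run_with_retries(
    cmd: Sequence[str], attempts: int = 5, delay: float = 5.0
) -> subprocess.CompletedProcess:
    """Run cmd up to `attempts` times, printing stderr after each failure.

    Returns the first successful result, or the last failed one.
    The delay between attempts is an assumed value, not kubespray's.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    result = None
    for attempt in range(1, attempts + 1):
        result = subprocess.run(list(cmd), capture_output=True, text=True)
        if result.returncode == 0:
            return result
        print(
            f"attempt {attempt}/{attempts} failed "
            f"(rc={result.returncode}): {result.stderr.strip()}"
        )
        if attempt < attempts:
            time.sleep(delay)
    return result


if __name__ == "__main__":
    res = run_with_retries(KUBEADM_TOKEN_CREATE)
    print(res.stdout.strip())  # the new token, if creation succeeded
    sys.exit(res.returncode)
```

If `kubeadm` still prints `timed out waiting for the condition` when run this way, that suggests the problem lies in reaching or writing to the API server configured in `/etc/kubernetes/admin.conf`, rather than in the playbook itself.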
Actually, I have run this command on all nodes (dev, master, and worker).
kubernetes/kubeadm#1447 (comment)
kubernetes-sigs/kubespray#5227
These threads may be useful for you.
@chjm This may be useful for you.