Comments (5)
If the db file has not been deleted, you can recover the data. Here is a guide: https://openpai.readthedocs.io/en/latest/manual/cluster-admin/troubleshooting.html#how-to-solve-the-problem
@hzy46 Can you take a look?
from pai.
User data doesn't seem to be stored here; job data is.
After I reset and reinstalled the cluster, the job data still existed, but the user data was gone, including usernames, passwords, e-mail addresses, SSH public keys, etc.
I see that user information and group information are stored in Secrets, so the problem now seems to be how to back up and restore Kubernetes Secrets.
You are right: if you delete the etcd data files, the user/group info will be lost. We need to dump the Secrets first and then apply them to the new cluster.
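A minimal sketch of that dump-then-apply idea, not an official OpenPAI procedure. The namespace names `pai-user-v2` and `pai-group` are assumptions that are not confirmed in this thread, so check `kubectl get namespaces` on your own cluster first. The function names `backup_secrets` and `restore_secrets` and the backup directory are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: dump Kubernetes Secrets before a reset and apply them afterwards.
set -euo pipefail

# ASSUMPTION: namespaces that hold the user/group Secrets. Verify on your cluster.
NAMESPACES="${NAMESPACES:-pai-user-v2 pai-group}"
# Hypothetical backup location; keep it off the nodes that will be reset.
BACKUP_DIR="${BACKUP_DIR:-./secret-backup}"
KUBECTL="${KUBECTL:-kubectl}"

# Run BEFORE the reset: write one YAML file per namespace.
backup_secrets() {
  mkdir -p "$BACKUP_DIR"
  for ns in $NAMESPACES; do
    "$KUBECTL" get secrets -n "$ns" -o yaml > "$BACKUP_DIR/$ns.yaml"
  done
}

# Run AFTER the new cluster is up: recreate namespaces if missing, then apply.
restore_secrets() {
  for ns in $NAMESPACES; do
    "$KUBECTL" get namespace "$ns" >/dev/null 2>&1 || "$KUBECTL" create namespace "$ns"
    "$KUBECTL" apply -f "$BACKUP_DIR/$ns.yaml"
  done
}
```

To use it, source the file, call `backup_secrets` while the old cluster is still running, and call `restore_secrets` once the new cluster is reachable. The dump also contains any service-account token Secrets from those namespaces, which you may want to remove from the files before applying them.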
Running the following command will reset the cluster, but all etcd data will be lost, so please be careful.
ansible-playbook -i inventory/pai/hosts.yml -e "ansible_python_interpreter=/usr/bin/python3" reset.yml --become --become-user=root -e "@inventory/pai/openpai.yml"