
pvmigrate's Introduction

pvmigrate

pvmigrate migrates PVCs between two StorageClasses by creating new PVs, copying the data over, and then updating the PVCs to refer to the new PVs.

Preflight Validation

pvmigrate runs preflight validation to catch potential failures before the migration begins.

Currently supported validations are:

  • Checking for existence of storage classes
  • Checking that existing PVC access modes are supported by the destination storage provider

Examples

To migrate PVs from the "default" StorageClass to "mynewsc":

pvmigrate --source-sc "default" --dest-sc "mynewsc"

To run preflight migration validation without actually running the migration operation:

pvmigrate --source-sc "source" --dest-sc "destination" --preflight-validation-only

Flags

| Flag | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| --source-sc | String | ✓ | | storage provider name to migrate from |
| --dest-sc | String | ✓ | | storage provider name to migrate to |
| --namespace | String | | | only migrate PVCs within this namespace |
| --rsync-image | String | | eeacms/rsync:2.3 | the image to use to copy PVCs; must have 'rsync' on the path |
| --rsync-flags | String | | | a comma-separated list of additional flags to pass to rsync when copying PVCs |
| --set-defaults | Bool | | false | change the default StorageClass from source to dest |
| --verbose-copy | Bool | | false | show output from the rsync command used to copy data between PVCs |
| --skip-source-validation | Bool | | false | migrate from PVCs using a particular StorageClass name even if that StorageClass does not exist |
| --preflight-validation-only | Bool | | false | skip the migration and run preflight validation only |
| --skip-preflight-validation | Bool | | false | skip preflight migration validation on the destination storage provider |
| --pod-ready-timeout | Integer | | 60 | time in seconds to wait for validation pod(s) to become Ready |
| --delete-pv-timeout | Integer | | 300 | time in seconds to wait for the backing PV to be removed when the temporary PVC is deleted |

Annotations

kurl.sh/pvcmigrate-destinationaccessmode - modifies the access mode of the PVC during migration. Valid options: ReadWriteOnce, ReadWriteMany, ReadOnlyMany.
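For example, a source PVC carrying this annotation might look like the following (the PVC name here is purely illustrative):

```yaml
# Illustrative only: a source PVC annotated so the replacement PVC is
# created with ReadWriteMany instead of the original access mode.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
  annotations:
    kurl.sh/pvcmigrate-destinationaccessmode: ReadWriteMany
```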

Process

In order, it:

  1. Validates that both the source and dest StorageClasses exist
  2. Finds PVs using the source StorageClass
  3. Finds PVCs corresponding to the above PVs
  4. Creates new PVCs for each existing PVC
    • Uses the dest StorageClass for the new PVCs
    • Uses the access mode from the kurl.sh/pvcmigrate-destinationaccessmode annotation, if set on the source PVC
  5. For each PVC:
    • Finds all pods mounting the existing PVC
    • Finds all StatefulSets and Deployments controlling those pods and adds an annotation with the original scale before setting that scale to 0
    • Waits for all pods mounting the existing PVC to be removed
  6. For each PVC:
    • Creates a pod mounting both the original and replacement PVC which then rsyncs data between the two
    • Waits for that invocation of rsync to complete
  7. For each PVC:
    • Marks all the PVs associated with the original and replacement PVCs as 'retain', so that they will not be deleted when the PVCs are removed, and adds an annotation to the replacement PV with the original's reclaim policy
    • Deletes the original PVC so that the name is available, and removes the association between the PV and the removed PVC
    • Deletes the replacement PVC so that the PV is available, and removes the association between the PV and the removed PVC
    • Creates a new PVC with the original name, but associated with the replacement PV
    • Sets the reclaim policy of the replacement PV to be what the original PV was set to
    • Deletes the original PV
  8. Resets the scales of the affected StatefulSets and Deployments
  9. If --set-defaults is set, changes the default StorageClass to dest
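The PV swap in step 7 can be approximated with manual kubectl commands. This is an illustrative sketch only - the PV and PVC names are placeholders, and pvmigrate performs these operations through the Kubernetes API rather than by shelling out:

```
# Illustrative only: the step-7 swap expressed as equivalent kubectl commands.
# "pv-original", "pv-replacement", and "my-pvc" are placeholder names.
kubectl patch pv pv-original -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
kubectl patch pv pv-replacement -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
kubectl delete pvc my-pvc               # frees up the original name
kubectl delete pvc my-pvc-pvcmigrate    # releases the replacement PV
kubectl patch pv pv-replacement --type json -p '[{"op":"remove","path":"/spec/claimRef"}]'
# ...then recreate "my-pvc" bound to pv-replacement, restore the reclaim
# policy on pv-replacement, and delete pv-original.
```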

Known Limitations

  1. If the migration process is interrupted in the middle, it will not always resume properly when rerun. This should be rare; please open an issue if it happens to you
  2. All pods are stopped at once, instead of stopping only the pods whose PVCs are being migrated
  3. PVCs are not migrated in parallel
  4. Constructs other than StatefulSets and Deployments are not handled (for instance, DaemonSets and Jobs), and will cause pvmigrate to exit with an error
  5. Pods not controlled by anything are not handled, and will cause pvmigrate to exit with an error
  6. PVs without associated PVCs are not handled, and will cause pvmigrate to exit with an error
  7. PVCs that are only available on one node (or some subset of nodes) may not have their migration pod run on the proper node, which would result in the pod never starting and pvmigrate hanging forever
  8. If the default StorageClass is sc3, migrating from sc1 to sc2 with --set-defaults will not change the default and will return an error.

pvmigrate's People

Contributors

camilamacedo86, clementnuss, czomo, dependabot[bot], diamonwiggins, emosbaugh, laverya, ricardomaraschini, rrpolanco, sgalsaleh


pvmigrate's Issues

Add option to transfer specific PersistentVolumes

Future Kubernetes architectures will provision volumes through CSI drivers. aws-ebs-csi-driver, for example, adds a new storage class and creates a proxy for old PVs created by kubernetes.io/aws-ebs, but there is no script people could use to migrate. It seems that pvmigrate could be the solution for that use case (maybe some collaboration is possible).
This is a feature request to add a granular selector for PVs. Ideally a --dry-run option would also be implemented.
See kubernetes-sigs/aws-ebs-csi-driver#1287
laverya Wdyt?

Build arm64 assets

Thanks for this plugin!

Would it be possible to build arm64 assets, please? For Apple silicon, the amd64 build works, but it cannot be installed with krew.

โฏ kubectl krew install pvmigrate
Updated the local copy of plugin index.
Installing plugin: pvmigrate
W1126 18:27:46.696634   59168 install.go:164] failed to install plugin "pvmigrate": plugin "pvmigrate" does not offer installation for this platform
failed to install some plugins: [pvmigrate]: plugin "pvmigrate" does not offer installation for this platform

Add option to specify a Pod's PriorityClass

I had to manually hardcode a priorityClassName in the code, as our cluster otherwise preempts the Pod during the rsync process.
It would be great if you could add an option to set it, or perhaps even better: allow specifying a YAML template with all the bells and whistles a Pod needs (labels, annotations, etc.) to be deployed successfully.

Add option for size of the preflight PVC

I'm currently migrating some manual iSCSI PVs to Democratic-CSI-managed iSCSI PVs. One of the problems I've hit is that 1Mi is too small for xfs filesystems to format correctly with the block size setup I have.

Building a custom version of pvmigrate with the PVC size bumped to 100Mi worked. It would be nice to have an argument for the preflight PVC size to avoid this issue.

Sending SIGINT while the tool checks for migration Pod readiness causes an infinite loop

Currently, if the user sends a SIGINT (Ctrl+C) to the process while it is waiting for the migration Pod to become Ready, the process enters an infinite loop:

1. It will try to get the Pod from the API Server
2. The client will return an error because the context is expired (due to SIGINT).
3. It will continue to the next iteration of the loop.

The body of the copyOnePVC() method should check whether the context has expired and immediately return an error to the caller, so that the tool exits right away.

Handle long named PVs better

The kube-prometheus-stack Helm chart produces some unwieldy naming conventions that fly very close to the sun regarding name length. Running pvmigrate throws an error because the new PVC name exceeds the maximum length.

It would be nice for pvmigrate to handle these situations, perhaps by deriving a shorter name for the migration PVC.

Running pvmigrate build:
version=v0.8.0-dirty
sha=33bbb070fd5685c16b544c84f7ec8e7398c8164d
time=2023-02-25T11:48:51Z

...

Found 1 matching PVCs to migrate across 1 namespaces:
namespace: pvc:                                                                                         pv:                                      size:
prometheus prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-0 pvc-24dd3b63-f891-413c-9a33-3acaefe5c0e0 20Gi
prometheus kube-prometheus-stack-grafana                                                                pvc-aa1153a7-dc83-445e-95fa-4eaf87eeffd4 250Mi
prometheus prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-1 pvc-fcf77baf-7c4f-4be9-916f-0206dd7e0392 500Gi

Creating new PVCs to migrate data to using the nfs-client-standard StorageClass
migration failed: failed to create new PVC prometheus-kube-prometheus-stacus-stack-prometheus-0-pvcmigrate in prometheus: PersistentVolumeClaim "prometheus-kube-prometheus-stacus-stack-prometheus-0-pvcmigrate" is invalid: metadata.labels: Invalid value: "prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-0": must be no more than 63 characters

"Object is being deleted" Error during migration

Description

When trying to migrate from Rook to Longhorn, I get the following error, which looks like a race condition between deleting the PVC and re-creating it:

Swapping PVC kotsadm-postgres-kotsadm-postgres-0 in default to the new StorageClass
Marking original PV pvc-307e0664-39e9-4153-baf2-6696953401c8 as to-be-retained
Marking migrated-to PV pvc-da1fac2b-13df-46f0-91d3-99b813c379dc as to-be-retained
Deleting original PVC kotsadm-postgres-kotsadm-postgres-0 in default to free up the name
Deleting migrated-to PVC kotsadm-postgres-kotsadm-postgres-0 in default to release the PV
Removing claimref from original PV pvc-307e0664-39e9-4153-baf2-6696953401c8
Removing claimref from migrated-to PV pvc-da1fac2b-13df-46f0-91d3-99b813c379dc
Creating new PVC kotsadm-postgres-kotsadm-postgres-0 with migrated-to PV pvc-da1fac2b-13df-46f0-91d3-99b813c379dc
failed to swap PVs for PVC kotsadm-postgres-kotsadm-postgres-0 in default: failed to create migrated-to PVC kotsadm-postgres-kotsadm-postgres-0 in default: object is being deleted: persistentvolumeclaims "kotsadm-postgres-kotsadm-postgres-0" already exists

How to Repro

Environment:
Ubuntu 20.04

Used this rook installer as the base: https://kurl.sh/a60de60

Then tried to migrate to this installer (note that 1.21.8 hasn't been merged yet, but I imagine any 1.21 would produce similar results)

apiVersion: cluster.kurl.sh/v1beta1
kind: Installer
metadata: 
  name: latest
spec: 
  kubernetes: 
    version: 1.21.8
  weave: 
    version: 2.6.5
  registry: 
    version: 2.7.1
  containerd: 
    version: 1.4.9
  longhorn:
    version: 1.2.2
  velero:
    version: 1.7.1
  kotsadm:
    version: 1.59.0
    disableS3: true

Suggested Fix:

This seems to be in the neighborhood of the pending delete

What RBAC roles does pvmigrate need?

What is the minimum set of RBAC permissions for pvmigrate to complete?

My initial expectation is storageclass list/edit (to determine if the storageclass exists and change defaults), PVC get/list/create/edit/delete (for the migration itself), pod get/list/create/delete (to move data), deployment/statefulset/replicaset get/list/edit (for scaling things down), but having this available as a ClusterRole yaml would be convenient.
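The permissions listed above could be expressed roughly as the following ClusterRole. This is an unverified sketch derived only from that list - the verbs, API groups, and resources are assumptions, not a tested minimal set:

```yaml
# Unverified sketch; not an official pvmigrate manifest.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pvmigrate
rules:
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "update", "patch"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims", "persistentvolumes"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "create", "delete"]
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets", "replicasets"]
    verbs: ["get", "list", "update", "patch"]
```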

Should these permissions be tested for ahead of time?

Allow setting annotations on migrate pod

When using this on my cluster, it crashed at one point because the migration pod got additional containers injected by linkerd, which was configured to auto-inject in the given namespace.

It would be great to allow setting annotations on the migration pod to disable this behavior. In my case, I temporarily disabled linkerd injection on the namespace to get this to work.
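The namespace-level workaround mentioned above might look like this (assuming linkerd's standard injection annotation; the namespace name is a placeholder):

```yaml
# Illustrative workaround: annotate the namespace so the linkerd proxy is
# not injected into pods created there, such as the migration pod.
apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace
  annotations:
    linkerd.io/inject: disabled
```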

Add option to exclude a path and/or ignore rsync errors

Our storage relies on NetApp volumes, which all contain a .snapshot directory that pvmigrate cannot rsync because it is not writable, so it fails the whole volume migration.
We have tried to pass an exclusion argument to rsync, but that is not possible on a standalone pod.

I suggest adding both options:

  • --exclude_path string
  • --ignore_rsync_errors boolean
$ kubectl pvmigrate --source-sc basic --dest-sc netapp-hdd --namespace monitoring --verbose-copy --rsync-image docker.io/eeacms/rsync:latest 
Running pvmigrate build:
version=0.5.0
sha=cdab40689296764e5d39b9d9e60658f4cf6bd275
time=2022-07-14T20:12:11Z
[...]
Copying data from basic PVCs to netapp-hdd PVCs
Copying data from storage-loki-0 (pvc-13328fb0-ae67-4954-bc87-580031dd387e) to storage-loki-0-pvcmigrate in monitoring
waiting for pod migrate-storage-loki-0 to start in monitoring
got status Failed for pod migrate-storage-loki-0, this is likely an error
got status Failed for pod migrate-storage-loki-0, this is likely an error
[...]
NAME                                   READY   STATUS    RESTARTS       AGE    LABELS
migrate-storage-loki-0                 0/1     Error     0              3m7s   kurl.sh/pvcmigrate=storage-loki-0
$ kubectl logs pod/migrate-storage-loki-0
sending incremental file list
rsync: failed to set times on "/dest/.snapshot": Read-only file system (30)
.snapshot/
$ kubectl patch pod/migrate-storage-loki-0 -n monitoring --type "json" -p '[{"op":"add","path":"/spec/containers/0/args/-","value":"--exclude=.snapshot"}]'
The Pod "migrate-storage-loki-0" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds`, `spec.tolerations` (only additions to existing tolerations) or `spec.terminationGracePeriodSeconds` (allow it to be set to 1 if it was previously negative)
$ kubectl exec -t pod/migrate-storage-loki-0 -- rsync -a -v -P --delete --exclude=.snapshot /source/ /dest
error: cannot exec into a container in a completed pod; current phase is Failed

failed to create new PVC must be no more than 63 characters

Hello, I am getting this error:

failed to create new PVC prometheus-prometheus-operator-kube-p-prometheus-db-prometheus-prometheus-operator-kube-p-prometheus-0-pvcmigrate in monitoring: PersistentVolumeClaim "prometheus-prometheus-operator-kube-p-prometheus-db-prometheus-prometheus-operator-kube-p-prometheus-0-pvcmigrate" is invalid: metadata.labels: Invalid value: "prometheus-prometheus-operator-kube-p-prometheus-db-prometheus-prometheus-operator-kube-p-prometheus-0": must be no more than 63 characters

Running pvmigrate build:
version=0.4.1

Specify container of migrate pod to exec in

When using this on my cluster, it crashed at one point because the migration pod received additional containers from linkerd. Besides allowing annotations to be set on the migration pod (#18), it would probably be good to be able to specify the container in which to exec rsync.

The error was the "you have to specify a container" one, which happens when doing a kubectl exec $pod on a pod with multiple containers without specifying the target container. Sadly, I don't have any logs saved.

I wrote two issues because they are two different problems, but I think this one is the root cause to fix in my setup, while the other (#18) might be important for other setups.
