cstorpoolauto's People

Contributors

harshita-sharma011, slalwani97

cstorpoolauto's Issues

logic: cstorpoolauto/sync-cstorclusterconfig seems to be in constant reconciliation

Problem Statement: The GenericController named cstorpoolauto/sync-cstorclusterconfig is under constant reconciliation due to the way CStorClusterConfig specs are currently overridden with default values. Each override triggers a new reconciliation of the CStorClusterConfig, and this goes on forever. A proper mechanism is needed to stop this hot loop.
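
One possible way to break the loop is to apply defaults only when they actually change the observed spec. Below is a minimal sketch under that assumption; the helper setDefaults and the single defaulted field are hypothetical, not the project's actual defaulting code.

// Sketch: skip the update when defaulting produces no change, so the
// resulting watch event does not re-trigger reconciliation forever.
package main

import (
	"fmt"
	"reflect"
)

// setDefaults fills missing CStorClusterConfig spec fields (hypothetical helper).
func setDefaults(spec map[string]interface{}) map[string]interface{} {
	out := map[string]interface{}{}
	for k, v := range spec {
		out[k] = v
	}
	if _, found := out["minPoolCount"]; !found {
		out["minPoolCount"] = int64(3) // assumed default value
	}
	return out
}

func main() {
	observed := map[string]interface{}{"minPoolCount": int64(3)}
	desired := setDefaults(observed)
	if reflect.DeepEqual(observed, desired) {
		fmt.Println("spec already defaulted; skip the update to avoid the hot loop")
		return
	}
	fmt.Println("apply the defaulted spec once")
}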

logic: use logger instance as a property of Go structures

Using a logger instance as a property makes it easier to unit test the business methods of various structures. Without it, we have to depend on mock logic like the following during unit tests:

func TestPlannerIsReadyByNodeDiskCount(t *testing.T) {
	mockloginfo := &types.CStorClusterPlan{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "test",
			Namespace: "test",
		},
	}
	var tests = map[string]struct {
		planner *Planner
		isReady bool
	}{
		"desired disk count == observed disk count": {
			planner: &Planner{
				storageSetToDesiredDiskCount: map[string]resource.Quantity{
					"101": resource.MustParse("1"),
				},
				storageSetToBlockDevices: map[string][]string{
					"101": []string{"bd1"},
				},
			},
			isReady: true,
		},
		"desired disk count > observed disk count": {
			planner: &Planner{
				// TODO (@amitkumardas):
				// 	Use log as a field in Planner
				CStorClusterPlan: mockloginfo,
				storageSetToDesiredDiskCount: map[string]resource.Quantity{
					"101": resource.MustParse("2"),
				},
				storageSetToBlockDevices: map[string][]string{
					"101": []string{"bd1"},
				},
			},
			isReady: false,
		},
		"desired disk count < observed disk count": {
			planner: &Planner{
				storageSetToDesiredDiskCount: map[string]resource.Quantity{
					"101": resource.MustParse("2"),
				},
				storageSetToBlockDevices: map[string][]string{
					"101": []string{"bd1", "bd2", "bd3"},
				},
			},
			isReady: true,
		},
	}
	for name, mock := range tests {
		name := name
		mock := mock
		t.Run(name, func(t *testing.T) {
			got := mock.planner.isReadyByNodeDiskCount()
			if got != mock.isReady {
				t.Fatalf("Want %t got %t", mock.isReady, got)
			}
		})
	}
}
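
A minimal sketch of the proposal, assuming a standard-library logger is acceptable and with the Planner fields simplified (plain int64 instead of resource.Quantity) to keep the example self-contained. Tests can then inject a silent logger instead of constructing a mock CStorClusterPlan just to provide log context.

package main

import (
	"log"
	"os"
)

// Planner carries its own logger so unit tests can silence or capture logs
// without building mock API objects purely for logging context.
type Planner struct {
	log                          *log.Logger
	storageSetToDesiredDiskCount map[string]int64
	storageSetToBlockDevices     map[string][]string
}

func (p *Planner) isReadyByNodeDiskCount() bool {
	for uid, desired := range p.storageSetToDesiredDiskCount {
		observed := int64(len(p.storageSetToBlockDevices[uid]))
		if observed < desired {
			p.log.Printf("storageset %s not ready: want %d disk(s), got %d", uid, desired, observed)
			return false
		}
	}
	return true
}

func main() {
	// In production pass a real logger; in unit tests pass
	// log.New(io.Discard, "", 0) to keep the test output quiet.
	p := &Planner{
		log:                          log.New(os.Stderr, "planner ", log.LstdFlags),
		storageSetToDesiredDiskCount: map[string]int64{"101": 2},
		storageSetToBlockDevices:     map[string][]string{"101": {"bd1"}},
	}
	_ = p.isReadyByNodeDiskCount()
}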

bug: max pool count is not set

The max pool count field is always observed to be 0, as shown below:

  spec:
    diskConfig:
      externalProvisioner:
        csiAttacherName: pd.csi.storage.gke.io
        storageClassName: csi-gce-pd
      minCapacity: 107374182400
      minCount: 2
    maxPoolCount: 0
    minPoolCount: 3
    poolConfig:
      raidType: mirror
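
A minimal sketch of where the defaulting could happen, assuming maxPoolCount should be derived from minPoolCount when left unset. The +2 offset and the fallback of 3 are assumptions for illustration, not the project's confirmed rules.

package main

import "fmt"

// defaultPoolCounts fills zero-valued pool counts. The min+2 relationship
// is an assumption made for this sketch.
func defaultPoolCounts(minPoolCount, maxPoolCount int64) (int64, int64) {
	if minPoolCount == 0 {
		minPoolCount = 3 // assumed default, matches the observed spec above
	}
	if maxPoolCount == 0 {
		maxPoolCount = minPoolCount + 2
	}
	return minPoolCount, maxPoolCount
}

func main() {
	minCount, maxCount := defaultPoolCounts(3, 0)
	fmt.Printf("minPoolCount=%d maxPoolCount=%d\n", minCount, maxCount) // maxPoolCount is no longer 0
}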

Add validation for supported raid types

While provisioning a CStor Pool using local block devices, passing a wrong raid type causes a panic.

log

mayadata.io/cstorpoolauto/unstruct.GetString
	/mayadata.io/cstorpoolauto/unstruct/util.go:97
mayadata.io/cstorpoolauto/unstruct.GetStringOrError
	/mayadata.io/cstorpoolauto/unstruct/util.go:69
mayadata.io/cstorpoolauto/util/cstorclusterconfig.(*Helper).GetRAIDType
	/mayadata.io/cstorpoolauto/util/cstorclusterconfig/helper.go:128
mayadata.io/cstorpoolauto/util/cstorclusterconfig.(*Helper).GetRAIDTypeOrCached
	/mayadata.io/cstorpoolauto/util/cstorclusterconfig/helper.go:147
mayadata.io/cstorpoolauto/controller/localdevice.(*Reconciler).setRAIDType
	/mayadata.io/cstorpoolauto/controller/localdevice/reconciler.go:274
mayadata.io/cstorpoolauto/controller/localdevice.(*Reconciler).Reconcile
	/mayadata.io/cstorpoolauto/controller/localdevice/reconciler.go:387
mayadata.io/cstorpoolauto/controller/localdevice.(*syncer).reconcile
	/mayadata.io/cstorpoolauto/controller/localdevice/reconciler.go:133
mayadata.io/cstorpoolauto/controller/localdevice.(*syncer).sync
	/mayadata.io/cstorpoolauto/controller/localdevice/reconciler.go:194
mayadata.io/cstorpoolauto/controller/localdevice.Sync
	/mayadata.io/cstorpoolauto/controller/localdevice/reconciler.go:231
openebs.io/metac/controller/generic.(*InlineHookInvoker).Invoke
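
A minimal sketch of an up-front validation that would return an error instead of panicking deep inside the unstruct helpers. The set of supported raid types shown here (stripe, mirror, raidz, raidz2) is an assumption based on common cStor usage; the project's canonical list may differ.

package main

import "fmt"

// supportedRAIDTypes is an assumed list for this sketch.
var supportedRAIDTypes = map[string]bool{
	"stripe": true,
	"mirror": true,
	"raidz":  true,
	"raidz2": true,
}

// validateRAIDType returns a descriptive error for unsupported values so the
// reconciler can fail the sync gracefully.
func validateRAIDType(raidType string) error {
	if raidType == "" {
		return fmt.Errorf("missing raid type: poolConfig.raidType must be set")
	}
	if !supportedRAIDTypes[raidType] {
		return fmt.Errorf("unsupported raid type %q: must be one of stripe, mirror, raidz, raidz2", raidType)
	}
	return nil
}

func main() {
	fmt.Println(validateRAIDType("mirrorz")) // unsupported raid type "mirrorz": ...
	fmt.Println(validateRAIDType("mirror"))  // <nil>
}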

logic: associating blockdevices with storage

This is the current logic followed by the default openebs operators to create CStor Pools:

	err = blockdevice.
		BuilderForAPIObject(bdObj).
		BlockDevice.
		ValidateBlockDevice(
			blockdevice.CheckIfBDIsActive(),
			blockdevice.CheckIfBDIsNonFsType(),
			blockdevice.CheckIfBDBelongsToNode(poolValidator.nodeName))

However, cstorpoolauto does not check for the presence of a filesystem on the device. This issue should capture whether a filesystem should be considered at all when selecting a device.
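
If the filesystem should indeed be considered, a minimal sketch of the check could look like the following, reading the BlockDevice as unstructured content. The helper name is hypothetical and is not part of the openebs blockdevice builder API quoted above.

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// hasNoFilesystem reports whether a BlockDevice (as unstructured content)
// has an empty spec.filesystem, i.e. no filesystem is laid out on the device.
func hasNoFilesystem(bd *unstructured.Unstructured) (bool, error) {
	fsType, found, err := unstructured.NestedString(bd.Object, "spec", "filesystem", "fsType")
	if err != nil {
		return false, err
	}
	return !found || fsType == "", nil
}

func main() {
	bd := &unstructured.Unstructured{Object: map[string]interface{}{
		"spec": map[string]interface{}{
			"filesystem": map[string]interface{}{},
		},
	}}
	ok, _ := hasNoFilesystem(bd)
	fmt.Println(ok) // true: the device has no filesystem
}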

logic: avoid prefixing the given name to derive a new name if the original name is auto generated

There are cases where the name of a resource is derived from the names of other resources. In such cases, it might be good to avoid prefixing values such as the namespace, since doing so grows the name towards its maximum length limit.

In addition, it would be good to verify whether the entire UID needs to be used when a name must be formed from a UID.

Check how to make use of the namespace & name of a resource to generate the name of another resource; one option is sketched below.
e.g. (a-b, c-d) and (a-b-c, d) may generate the same name if ns-name is used as the format to generate the resource name.
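
A minimal sketch of one such option: hash the full namespace/name pair and append a short digest rather than concatenating the raw parts. The separator, digest length and prefix are arbitrary choices for illustration.

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// deriveName builds a child resource name from a parent namespace/name pair.
// Hashing the pair avoids both name-length growth and the (a-b, c-d) vs
// (a-b-c, d) ambiguity that plain "ns-name" concatenation produces.
func deriveName(prefix, namespace, name string) string {
	sum := sha256.Sum256([]byte(namespace + "/" + name))
	return fmt.Sprintf("%s-%s", prefix, hex.EncodeToString(sum[:])[:10])
}

func main() {
	fmt.Println(deriveName("storageset", "a-b", "c-d"))
	fmt.Println(deriveName("storageset", "a-b-c", "d")) // different digest, no clash
}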

bug: filter blockdevices that do not have any filesystem on them

Following is an example of a BlockDevice that is fit to participate in building a CStor Pool, since it does not have any filesystem.

  spec:
    capacity:
      logicalSectorSize: 512
      physicalSectorSize: 0
      storage: 107374182400
    details:
      compliance: ""
      deviceType: ""
      firmwareRevision: ""
      model: PersistentDisk
      serial: pvc-c1919d99-3295-11ea-a90b-42010a800016
      vendor: Google
    devlinks:
    - kind: by-id
      links:
      - /dev/disk/by-id/scsi-0Google_PersistentDisk_pvc-c1919d99-3295-11ea-a90b-42010a800016
      - /dev/disk/by-id/google-pvc-c1919d99-3295-11ea-a90b-42010a800016
    - kind: by-path
      links:
      - /dev/disk/by-path/virtio-pci-0000:00:03.0-scsi-0:0:30:0
    filesystem: {}
    nodeAttributes:
      nodeName: gke-amitd-dao-d-default-pool-fcc50975-fvsb
    partitioned: "No"
    path: /dev/sde
  status:
    claimState: Unclaimed
    state: Active
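
A minimal sketch of the filter this issue asks for, run over unstructured BlockDevice objects: keep only devices that are Active, Unclaimed and carry no filesystem. The field paths follow the example above; the helper names are hypothetical.

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// isEligibleForPool mirrors the example BlockDevice above: Active, Unclaimed,
// and with an empty spec.filesystem.
func isEligibleForPool(bd *unstructured.Unstructured) bool {
	state, _, _ := unstructured.NestedString(bd.Object, "status", "state")
	claim, _, _ := unstructured.NestedString(bd.Object, "status", "claimState")
	fsType, _, _ := unstructured.NestedString(bd.Object, "spec", "filesystem", "fsType")
	return state == "Active" && claim == "Unclaimed" && fsType == ""
}

// filterEligible drops every device that must not participate in a CStor Pool.
func filterEligible(devices []*unstructured.Unstructured) []*unstructured.Unstructured {
	var out []*unstructured.Unstructured
	for _, bd := range devices {
		if isEligibleForPool(bd) {
			out = append(out, bd)
		}
	}
	return out
}

func main() {
	fmt.Println(len(filterEligible(nil))) // 0: no devices, nothing eligible
}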

Migration of volume replicas from one pool to another pool

Description:
I have a CStorPoolCluster (CSPC) resource created on top of 3 nodes (that in turn creates CSPI resources), and I deployed CSI volumes on top of the above CSPC.

Now my setup looks like:

kubectl get nodes
NAME                                        STATUS   ROLES    AGE    VERSION
gke-sai-test-cluster-pool-1-8d7defe8-37rr   Ready    <none>   102m   v1.14.8-gke.33
gke-sai-test-cluster-pool-1-8d7defe8-8471   Ready    <none>   102m   v1.14.8-gke.33
gke-sai-test-cluster-pool-1-8d7defe8-8nt0   Ready    <none>   102m   v1.14.8-gke.33
gke-sai-test-cluster-pool-1-8d7defe8-chws   Ready    <none>   102m   v1.14.8-gke.33
 kubectl get cspi -n openebs
NAME                     HOSTNAME                                    ALLOCATED   FREE    CAPACITY   STATUS   AGE
cstor-sparse-cspc-kdrs   gke-sai-test-cluster-pool-1-8d7defe8-8471   154K        9.94G   9.94G      ONLINE   13m
cstor-sparse-cspc-nb99   gke-sai-test-cluster-pool-1-8d7defe8-8nt0   158K        9.94G   9.94G      ONLINE   13m
cstor-sparse-cspc-twjx   gke-sai-test-cluster-pool-1-8d7defe8-chws   312K        9.94G   9.94G      ONLINE   13m
kubectl get cvr -n openebs
NAME                                                              USED   ALLOCATED   STATUS    AGE
pvc-3f83cac1-5f80-11ea-85dd-42010a800121-cstor-sparse-cspc-kdrs   6K     6K          Healthy   105s
pvc-3f83cac1-5f80-11ea-85dd-42010a800121-cstor-sparse-cspc-nb99   6K     6K          Healthy   105s
pvc-3f83cac1-5f80-11ea-85dd-42010a800121-cstor-sparse-cspc-twjx   6K     6K          Healthy   105s

Now I performed a horizontal scale-up of the CSPC, which created a CSPI on a new node:

 kubectl get cspi -n openebs
NAME                     HOSTNAME                                    ALLOCATED   FREE    CAPACITY   STATUS   AGE
cstor-sparse-cspc-kdrs   gke-sai-test-cluster-pool-1-8d7defe8-8471   161K        9.94G   9.94G      ONLINE   15m
cstor-sparse-cspc-kmt7   gke-sai-test-cluster-pool-1-8d7defe8-37rr   50K         9.94G   9.94G      ONLINE   42s
cstor-sparse-cspc-nb99   gke-sai-test-cluster-pool-1-8d7defe8-8nt0   161K        9.94G   9.94G      ONLINE   15m
cstor-sparse-cspc-twjx   gke-sai-test-cluster-pool-1-8d7defe8-chws   161K        9.94G   9.94G      ONLINE   15m

Scenario:
I want to remove the node gke-sai-test-cluster-pool-1-8d7defe8-chws from my cluster. I horizontally scaled down the pool (i.e. removed that node's pool spec from the CSPC), but before scaling down the pool on that node I want to move the volume replicas on that pool to a different pool (the newly created one, i.e. cstor-sparse-cspc-kmt7). How can I achieve that without many manual steps?

I want the volume replicas to end up on the pools below:

kubectl get cvr -n openebs
NAME                                                              USED   ALLOCATED   STATUS    AGE
pvc-3f83cac1-5f80-11ea-85dd-42010a800121-cstor-sparse-cspc-kdrs   6K     6K          Healthy   105s
pvc-3f83cac1-5f80-11ea-85dd-42010a800121-cstor-sparse-cspc-nb99   6K     6K          Healthy   105s
pvc-3f83cac1-5f80-11ea-85dd-42010a800121-cstor-sparse-cspc-kmt7   6K     6K          Healthy   105s

In the above, the volume replica was migrated from pvc-3f83cac1-5f80-11ea-85dd-42010a800121-cstor-sparse-cspc-twjx to pvc-3f83cac1-5f80-11ea-85dd-42010a800121-cstor-sparse-cspc-kmt7.

Add validation for empty or nil selector

While provisioning a CStor Pool using the local device config, if I provide an empty or nil selector it tries to create a pool using all the block devices present in the system. This causes no problem for the 1st pool; the problem comes when we want to create a 2nd pool, because some of the block devices are no longer eligible for provisioning a CStor Pool, so the validation webhook starts returning errors.
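
A minimal sketch of the validation this issue asks for, assuming the local device config exposes a label-selector style field; the function name and error text are illustrative.

package main

import (
	"errors"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// validateDeviceSelector rejects nil or empty selectors so that a pool is
// never built from every block device in the cluster by accident.
func validateDeviceSelector(sel *metav1.LabelSelector) error {
	if sel == nil || (len(sel.MatchLabels) == 0 && len(sel.MatchExpressions) == 0) {
		return errors.New("block device selector must not be empty: refusing to select all block devices")
	}
	return nil
}

func main() {
	fmt.Println(validateDeviceSelector(nil))
	fmt.Println(validateDeviceSelector(&metav1.LabelSelector{
		MatchLabels: map[string]string{"openebs.io/block-device-tag": "es"},
	}))
}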

usecase: Filter Block devices on the basis of labels for a deployment

Simple scenario: I want to deploy Elasticsearch and need to provision some volumes. I have some block devices (say 5) and need to create a storage pool claim that should use, let's say, 2 of them. Today, to do so, I need to manually look up the block devices to be included and add them to the block device list. A simpler approach: if NDM or the NDM operator (just saying) could, through some means, put a label on each block device that matches the labels provided in the storage pool claim for ES, then whenever they match we would not need to mention the block device list in the storage pool claim at all.
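
A minimal sketch of the matching described above, assuming the block devices carry labels and the pool claim carries a matching label set; all names and labels here are hypothetical.

package main

import "fmt"

// matchesSelector reports whether a BlockDevice's labels satisfy every
// key/value pair requested by the storage pool claim.
func matchesSelector(deviceLabels, claimSelector map[string]string) bool {
	for k, v := range claimSelector {
		if deviceLabels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	claim := map[string]string{"app": "elasticsearch"} // labels on the pool claim
	devices := map[string]map[string]string{
		"bd-1": {"app": "elasticsearch"},
		"bd-2": {"app": "elasticsearch"},
		"bd-3": {},
	}
	for name, labels := range devices {
		if matchesSelector(labels, claim) {
			// bd-1 and bd-2 get picked without listing them manually in the claim
			fmt.Println("select", name)
		}
	}
}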

bug: scale down of pools does not work

Title: scaling down pools does not work in CStorClusterConfig
Recreation Steps: Create a CStorClusterConfig with defaults auto-set by the controller. Edit the CStorClusterConfig and set MinPoolCount to 2.
Expect: CStor pools should scale down from 3 to 2.
Actual: CStor pools remain at 3
Other Details: There is a visible skew between MinPoolCount in the last applied state and in the spec.

  basic > kubectl -n openebs get cstorclusterconfig -oyaml
apiVersion: v1
items:
- apiVersion: dao.mayadata.io/v1alpha1
  kind: CStorClusterConfig
  metadata:
    annotations:
      52234ef9-4702-11ea-a70d-42010a800115/gctl-last-applied: '{"apiVersion":"dao.mayadata.io/v1alpha1","kind":"CStorClusterConfig","metadata":{"name":"my-cstor-cluster","namespace":"openebs"},"spec":{"diskConfig":{"minCapacity":107374182400,"minCount":2},"maxPoolCount":0,"minPoolCount":3,"poolConfig":{"raidType":"mirror"}}}'
      52234ef9-4702-11ea-a70d-42010a800115/updated-due-to-watch: dao.mayadata.io-v1alpha1-CStorClusterConfig-openebs-my-cstor-cluster
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"dao.mayadata.io/v1alpha1","kind":"CStorClusterConfig","metadata":{"annotations":{},"name":"my-cstor-cluster","namespace":"openebs"},"spec":{"diskConfig":{"externalProvisioner":{"csiAttacherName":"pd.csi.storage.gke.io","storageClassName":"csi-gce-pd"}},"poolConfig":{"raidType":"mirror"}}}
    creationTimestamp: "2020-02-04T03:56:28Z"
    generation: 3
    name: my-cstor-cluster
    namespace: openebs
    resourceVersion: "23778614"
    selfLink: /apis/dao.mayadata.io/v1alpha1/namespaces/openebs/cstorclusterconfigs/my-cstor-cluster
    uid: 52234ef9-4702-11ea-a70d-42010a800115
  spec:
    diskConfig:
      externalProvisioner:
        csiAttacherName: pd.csi.storage.gke.io
        storageClassName: csi-gce-pd
      minCapacity: 107374182400
      minCount: 2
    maxPoolCount: 0
    minPoolCount: 2
    poolConfig:
      raidType: mirror
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
  basic > kubectl -n openebs get cstorclusterplan -oyaml
apiVersion: v1
items:
- apiVersion: dao.mayadata.io/v1alpha1
  kind: CStorClusterPlan
  metadata:
    annotations:
      52234ef9-4702-11ea-a70d-42010a800115/gctl-last-applied: '{"apiVersion":"dao.mayadata.io/v1alpha1","kind":"CStorClusterPlan","metadata":{"annotations":{"dao.mayadata.io/cstorclusterconfig-uid":"52234ef9-4702-11ea-a70d-42010a800115"},"name":"my-cstor-cluster","namespace":"openebs"},"spec":{"nodes":[{"name":"gke-amitd-dao-d-default-pool-fcc50975-wt3x","uid":"bb3318c7-1b11-11ea-a90b-42010a800016"},{"name":"gke-amitd-dao-d-default-pool-fcc50975-blz8","uid":"bb3c94a2-1b11-11ea-a90b-42010a800016"},{"name":"gke-amitd-dao-d-default-pool-fcc50975-fvsb","uid":"ba217ce0-1b11-11ea-a90b-42010a800016"}]}}'
      dao.mayadata.io/cstorclusterconfig-uid: 52234ef9-4702-11ea-a70d-42010a800115
      metac.openebs.io/created-due-to-watch: 52234ef9-4702-11ea-a70d-42010a800115
    creationTimestamp: "2020-02-04T03:56:28Z"
    generation: 1
    name: my-cstor-cluster
    namespace: openebs
    resourceVersion: "23774136"
    selfLink: /apis/dao.mayadata.io/v1alpha1/namespaces/openebs/cstorclusterplans/my-cstor-cluster
    uid: 5226123b-4702-11ea-a70d-42010a800115
  spec:
    nodes:
    - name: gke-amitd-dao-d-default-pool-fcc50975-wt3x
      uid: bb3318c7-1b11-11ea-a90b-42010a800016
    - name: gke-amitd-dao-d-default-pool-fcc50975-blz8
      uid: bb3c94a2-1b11-11ea-a90b-42010a800016
    - name: gke-amitd-dao-d-default-pool-fcc50975-fvsb
      uid: ba217ce0-1b11-11ea-a90b-42010a800016
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

bug: arrange raid groups based on raidtype & device count

The CStorPoolCluster below is invalid:

{
  "apiVersion": "openebs.io/v1alpha1",
  "kind": "CStorPoolCluster",
  "metadata": {
    "annotations": {
      "dao.mayadata.io/cstorclusterconfig-uid": "a037ab42-328f-11ea-a90b-42010a800016",
      "dao.mayadata.io/cstorclusterplan-uid": "a04e9fca-328f-11ea-a90b-42010a800016"
    },
    "name": "my-cstor-cluster",
    "namespace": "openebs"
  },
  "spec": {
    "pools": [
      {
        "nodeSelector": {
          "kubernetes.io/hostname": "gke-amitd-dao-d-default-pool-fcc50975-fvsb"
        },
        "poolConfig": {
          "compression": "off",
          "defaultRaidGroupType": "mirror",
          "overProvisioning": false
        },
        "raidGroups": [
          {
            "blockDevices": [
              {
                "blockDeviceName": "blockdevice-67614a80a8f70690ee3a0072b2f5b5c2"
              },
              {
                "blockDeviceName": "blockdevice-f81d2158d43e8aac671082989ad2d018"
              },
              {
                "blockDeviceName": "blockdevice-fd2e433b5e752b05f705fc3f8de5a8ae"
              },
              {
                "blockDeviceName": "blockdevice-ce33173d2f797b76c7428ec959f55759"
              }
            ],
            "isReadCache": false,
            "isSpare": false,
            "isWriteCache": false,
            "type": "mirror"
          }
        ]
      },
      {
        "nodeSelector": {
          "kubernetes.io/hostname": "gke-amitd-dao-d-default-pool-fcc50975-blz8"
        },
        "poolConfig": {
          "compression": "off",
          "defaultRaidGroupType": "mirror",
          "overProvisioning": false
        },
        "raidGroups": [
          {
            "blockDevices": [
              {
                "blockDeviceName": "blockdevice-c85cf467fb25844f9b91b3e3d613cbeb"
              },
              {
                "blockDeviceName": "blockdevice-bb15406735c513085ab2043ea7d7462c"
              },
              {
                "blockDeviceName": "blockdevice-a1423da7b823c4d38c16454970dc5884"
              },
              {
                "blockDeviceName": "blockdevice-a0109ab6e5f6fc7806d85b7feacd618c"
              }
            ],
            "isReadCache": false,
            "isSpare": false,
            "isWriteCache": false,
            "type": "mirror"
          }
        ]
      },
      {
        "nodeSelector": {
          "kubernetes.io/hostname": "gke-amitd-dao-d-default-pool-fcc50975-wt3x"
        },
        "poolConfig": {
          "compression": "off",
          "defaultRaidGroupType": "mirror",
          "overProvisioning": false
        },
        "raidGroups": [
          {
            "blockDevices": [
              {
                "blockDeviceName": "blockdevice-3ccc4adb622aadcfd83635ca0755c0a4"
              },
              {
                "blockDeviceName": "blockdevice-fa259c15050f113d5e8cb8addbcfe63b"
              },
              {
                "blockDeviceName": "blockdevice-093e5708e2ab5aabbc915b17f41cb7f9"
              },
              {
                "blockDeviceName": "blockdevice-70c78a9343fda5e435d1d74972dded7b"
              }
            ],
            "isReadCache": false,
            "isSpare": false,
            "isWriteCache": false,
            "type": "mirror"
          }
        ]
      }
    ]
  }
}

Error - E0109 07:21:30.485121 1 controller.go:352] WatchGCtl cstorpoolauto/sync-cstorpoolcluster: Failed to sync "dao.mayadata.io/v1alpha1:CStorClusterPlan:openebs:my-cstor-cluster": admission webhook "admission-webhook.openebs.io" denied the request: invalid cspc specification: invalid pool spec: number of block devices honouring raid type should be specified
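
A minimal sketch of the regrouping the webhook expects: split each node's block devices into raid groups whose size matches the raid type (e.g. 2 devices per mirror group) instead of putting all four devices into a single mirror group. The per-type group sizes used here are assumptions for illustration.

package main

import "fmt"

// devicesPerGroup maps a raid type to the number of devices one raid group
// must hold. The values are assumptions for this sketch.
var devicesPerGroup = map[string]int{
	"mirror": 2,
	"raidz":  3,
	"raidz2": 6,
}

// arrangeRaidGroups splits a node's devices into raid groups of the size the
// raid type requires; leftover devices are reported as an error instead of
// being packed into an invalid group.
func arrangeRaidGroups(raidType string, devices []string) ([][]string, error) {
	size, ok := devicesPerGroup[raidType]
	if !ok {
		return [][]string{devices}, nil // e.g. stripe: one group with all devices
	}
	if len(devices)%size != 0 {
		return nil, fmt.Errorf("%d device(s) cannot form %q groups of %d", len(devices), raidType, size)
	}
	var groups [][]string
	for i := 0; i < len(devices); i += size {
		groups = append(groups, devices[i:i+size])
	}
	return groups, nil
}

func main() {
	groups, _ := arrangeRaidGroups("mirror", []string{"bd-1", "bd-2", "bd-3", "bd-4"})
	fmt.Println(groups) // [[bd-1 bd-2] [bd-3 bd-4]]: two mirror groups instead of one group of four
}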

Using Deterministic Names to design cluster operators

Overview of solution:

Set name as <some prefix const>-<parent name>-<yyyymmdd>-<24hh>-<desired count index>

where:
yyyymmdd is the current day
24hh is the current hour in 24-hour format
min is the current minutes (not needed in the above formula & hence not shown)

the value of 24hh is the current hour, if the current minutes are > 30
the value of 24hh is (current hour - 1), if the current minutes are < 30
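
A minimal sketch of the scheme above, with the boundary case of minutes exactly equal to 30 treated as the later bucket (an assumption; the rules above leave it open):

package main

import (
	"fmt"
	"time"
)

// deterministicName implements <prefix>-<parent>-<yyyymmdd>-<24hh>-<index>,
// stepping the hour bucket back by one when the current minutes are below 30.
func deterministicName(prefix, parent string, index int, now time.Time) string {
	if now.Minute() < 30 {
		now = now.Add(-1 * time.Hour) // also rolls the date back across midnight
	}
	return fmt.Sprintf("%s-%s-%s-%02d-%d", prefix, parent, now.Format("20060102"), now.Hour(), index)
}

func main() {
	// Every pod instance computing the name within the same bucket gets the
	// same string, so the apiserver rejects all but one create by name clash.
	t := time.Date(2020, 2, 4, 10, 40, 0, 0, time.UTC)
	fmt.Println(deterministicName("cspi", "my-cstor-cluster", 0, t)) // cspi-my-cstor-cluster-20200204-10-0
}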

Will the above scheme be sufficient to avoid over-provisioning of resources?

The solution is effective if all the pod instances calculate the same name, clash because of that same name, and hence only one wins. It also seems that the above solution can avoid the need for leader election.

When caches are stale & the apiserver serves this stale information via its informers and listers, there is a high possibility that a single pod instance creates duplicate resources if the names of these resources are not deterministic (i.e. when the controller logic creates resources using the generateName property). With the above solution, however, controllers can handle a stale cache for up to 30 minutes and still end up with the proper number of desired resources. One may argue that the controller's logic should implement good reconciliation that removes such faults eventually; in other words, the controller should be able to detect over-provisioned resources & remove them once detected. However, there are cases where creating a resource is itself very costly. This logic intends to prevent over-provisioning defects at creation time itself.

There should be other barriers like liveness, readiness & startup probes. However, these cannot handle the above logic alone.
