
embercsi / ember-csi-operator

Operator to create/configure/manage Ember CSI Driver atop Kubernetes/OpenShift

License: Apache License 2.0

Go 65.44% Dockerfile 0.32% Makefile 1.16% Shell 3.94% Python 29.14%

ember-csi-operator's People

Contributors

akrog, cschwede, danielerez, irosenzw, kirankt


ember-csi-operator's Issues

Image build fails using Docker 1.13.1

Building the operator on RHEL 7.6 with Docker 1.13.1 fails because the Dockerfile's COPY command uses the '--from' flag, which requires multi-stage build support introduced in Docker 17.05.

--
Step 13/13 : COPY --from=0 /go/src/github.com/embercsi/ember-csi-operator/build/ember-csi-operator /usr/local/bin/ember-csi-operator
Unknown flag: from
make: *** [build] Error 1

[root@kt-c7kb7 ember-csi-operator]# docker version
Client:
Version: 1.13.1
API version: 1.26
Package version: docker-1.13.1-88.git07f3374.el7.x86_64
Go version: go1.10.2
Git commit: 07f3374/1.13.1
Built: Thu Dec 6 07:01:49 2018
OS/Arch: linux/amd64

Server:
Version: 1.13.1
API version: 1.26 (minimum version 1.12)
Package version: docker-1.13.1-88.git07f3374.el7.x86_64
Go version: go1.10.2
Git commit: 07f3374/1.13.1
Built: Thu Dec 6 07:01:49 2018
OS/Arch: linux/amd64
Experimental: false
[root@kt-c7kb7 ember-csi-operator]# uname -a
Linux kt-c7kb7.cloud.lab.eng.bos.redhat.com 3.10.0-957.1.3.el7.x86_64 #1 SMP Thu Nov 15 17:36:42 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@kt-c7kb7 ember-csi-operator]# cat /etc/os-release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.6 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.6"
PRETTY_NAME="Employee SKU"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.6:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.6
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.6"

Operator form incorrectly handling Float values

The current operator form parses float values in the driver configuration as text, so they reach the Ember-CSI code as str. This breaks drivers that use floats in their configuration and reasonably assume they receive a float rather than a str.

As an example, issue #166 in the Ember-CSI repository fails because of the vmware_task_poll_interval configuration option.

The reason we were converting to text is that OLM has no float type; it only has the number type, and using that for floats truncates the value, dropping the decimal portion.

A possible solution is to add a new transform function to the operator's existing ones that converts text to a float, add that transform to the generator, and then use it for the float configuration options.

That way Ember-CSI will be able to receive a float instance instead of a string.
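
A minimal sketch of such a transform, assuming the transforms are plain string functions (the name and signature here are illustrative, not the actual generator API):

    import (
        "fmt"
        "strconv"
    )

    // toFloat is a hypothetical transform: it parses the text value coming
    // from the OLM form and returns it as a float64, so options such as
    // vmware_task_poll_interval reach Ember-CSI with the right type.
    func toFloat(value string) (float64, error) {
        f, err := strconv.ParseFloat(value, 64)
        if err != nil {
            return 0, fmt.Errorf("config value %q is not a valid float: %v", value, err)
        }
        return f, nil
    }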

Images are based on backend names

The current code selects the container image using the backend name as the key:

    Image:   Conf.getDriverImage(ecsi.Spec.Backend),

It should be based on the driver to be used, as some drivers may need a custom container with specific packages.
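
A sketch of the driver-based lookup, assuming X_CSI_BACKEND_CONFIG is the JSON string that shows up in the operator's error messages (the surrounding code is illustrative):

    // Parse the driver name out of the backend config and use it as the
    // image key instead of the backend name.
    var backendConf struct {
        Driver string `json:"driver"` // e.g. "RBD"
    }
    if err := json.Unmarshal([]byte(ecsi.Spec.Config.EnvVars.X_CSI_BACKEND_CONFIG), &backendConf); err != nil {
        return err
    }
    image := Conf.getDriverImage(backendConf.Driver)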

Allow pinning the controller

For the LVM backend we need to be able to pin the controller pod to a specific node.
The operator should support this.
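
A possible shape for this, assuming a hypothetical nodeSelector field on the EmberCSI CR that gets copied into the controller StatefulSet's pod template:

    // Pin the controller pod when the CR asks for it (Spec.NodeSelector
    // is a hypothetical field).
    if len(ecsi.Spec.NodeSelector) > 0 {
        // e.g. {"kubernetes.io/hostname": "node01"}
        ss.Spec.Template.Spec.NodeSelector = ecsi.Spec.NodeSelector
    }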

Missing mountpoints

There are a couple of mountpoints that are missing on the nodes:

  • /var/lib/iscsi
  • /run/udev

Among other things, we need these to share the iSCSI nodes with the host, which ensures we still see them after the container restarts.
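
A sketch of the extra hostPath volumes for the node pods (matching VolumeMounts with the same paths would go into the Ember-CSI container):

    import corev1 "k8s.io/api/core/v1"

    extraVolumes := []corev1.Volume{
        {
            Name: "iscsi-dir",
            VolumeSource: corev1.VolumeSource{
                HostPath: &corev1.HostPathVolumeSource{Path: "/var/lib/iscsi"},
            },
        },
        {
            Name: "run-udev",
            VolumeSource: corev1.VolumeSource{
                HostPath: &corev1.HostPathVolumeSource{Path: "/run/udev"},
            },
        },
    }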

Use CRD as the default persistence

Instead of forcing every deployment to define CRD as the default persistence, we should set this default in the operator.

Since the operator is storing data in the "ember-csi" namespace, we know that it exists, so we can tell Ember-CSI to use it instead of the default one: X_CSI_PERSISTENCE_CONFIG: '{"storage":"crd","namespace":"ember-csi"}'
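
A sketch of the defaulting logic, assuming the env vars end up as corev1.EnvVar entries on the driver container:

    // Default to CRD persistence in the operator's namespace when the CR
    // doesn't set X_CSI_PERSISTENCE_CONFIG explicitly.
    const defaultPersistence = `{"storage":"crd","namespace":"ember-csi"}`

    if ecsi.Spec.Config.EnvVars.X_CSI_PERSISTENCE_CONFIG == "" {
        envVars = append(envVars, corev1.EnvVar{
            Name:  "X_CSI_PERSISTENCE_CONFIG",
            Value: defaultPersistence,
        })
    }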

Stop using kirankt's images

There are multiple places in the repository where we use the kirankt registry. We should change these to Docker Hub's embercsi:

  • In the Makefile: REPO?=quay.io/kirankt/ember-csi-operator
  • In the install manifest: quay.io/kirankt/ember-csi-operator:v0.0.3
  • In the README file: quay.io/kirankt/ember-csi-operator:0.0.3

The same image is available as embercsi/ember-csi-operator:v0.0.3

Operator fails if X_CSI_PERSISTENCE_CONFIG is not set

The operator should use a default if X_CSI_PERSISTENCE_CONFIG is not set. However, the default includes some invalid JSON and therefore it fails with the following error:

E0203 08:41:07.939574 1 reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: Failed to list *v1alpha1.EmberCSI: v1alpha1.EmberCSIList.Items: []v1alpha1.EmberCSI: v1alpha1.EmberCSI.Spec: v1alpha1.EmberCSISpec.Config: v1alpha1.EmberCSIConfig.EnvVars: v1alpha1.EnvVars.X_CSI_BACKEND_CONFIG: ReadString: expects " or n, but found {, error found in #10 byte of ...|_CONFIG":{"driver":"|..., bigger context ...|ec":{"config":{"envVars":{"X_CSI_BACKEND_CONFIG":{"driver":"RBD","name":"rbd","rbd_ceph_conf":"/etc/|...

Operator must handle RBAC, ServiceAccounts

Currently we piggy-back on the 'ember-csi-operator' service account and its wide-open RBAC to get a working deployment. Ideally 'ember-csi-operator' should only have the permissions needed to run the Operator itself, and the Operator should create the necessary RBAC and service accounts dynamically for each deployment.
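
A sketch of what the dynamic creation could look like from the reconcile loop (names are illustrative):

    import (
        "context"

        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "sigs.k8s.io/controller-runtime/pkg/client"
    )

    // ensureServiceAccount creates a per-deployment ServiceAccount instead
    // of reusing the operator's own wide-open one. A Role/RoleBinding
    // limited to what the driver actually needs would be created the same way.
    func ensureServiceAccount(c client.Client, name, namespace string) error {
        sa := &corev1.ServiceAccount{
            ObjectMeta: metav1.ObjectMeta{
                Name:      name,
                Namespace: namespace,
            },
        }
        return c.Create(context.TODO(), sa)
    }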

serviceaccount "ember-csi-operator" not found

Hi @kirankt ,

Following the document at https://github.com/kirankt/ember-csi-operator, the external-ceph-node and external-ceph-controller pods can't be created.

[cloud-user@cnv-executor-qwang-1016-master1 ember-csi-operator]$ oc get all
NAME                                      READY     STATUS    RESTARTS   AGE
pod/ember-csi-operator-59dbb585db-ckwd4   1/1       Running   0          47m

NAME                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
service/ember-csi-operator   ClusterIP   172.30.246.142   <none>        60000/TCP   47m

NAME                                DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/external-ceph-node   0         0         0         0            0           <none>          4m

NAME                                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/ember-csi-operator   1         1         1            1           47m

NAME                                            DESIRED   CURRENT   READY     AGE
replicaset.apps/ember-csi-operator-59dbb585db   1         1         1         47m

NAME                                        DESIRED   CURRENT   AGE
statefulset.apps/external-ceph-controller   1         0         4m


[cloud-user@cnv-executor-qwang-1016-master1 ember-csi-operator]$ oc describe daemonset.apps/external-ceph-node | grep -A10 Events
Events:
  Type     Reason        Age                From                  Message
  ----     ------        ----               ----                  -------
  Warning  FailedCreate  1m (x20 over 12m)  daemonset-controller  Error creating: pods "external-ceph-node-" is forbidden: error looking up service account ember-csi/ember-csi-operator: serviceaccount "ember-csi-operator" not found

[cloud-user@cnv-executor-qwang-1016-master1 ember-csi-operator]$ oc describe statefulset.apps/external-ceph-controller | grep -A10 Events
Events:
  Type     Reason        Age                 From                    Message
  ----     ------        ----                ----                    -------
  Warning  FailedCreate  54s (x37 over 11m)  statefulset-controller  create Pod external-ceph-controller-0 in StatefulSet external-ceph-controller failed error: pods "external-ceph-controller-0" is forbidden: error looking up service account ember-csi/ember-csi-operator: serviceaccount "ember-csi-operator" not found

[cloud-user@cnv-executor-qwang-1016-master1 ember-csi-operator]$ oc get sa
NAME                SECRETS   AGE
builder             2         50m
csi-controller-sa   2         50m
csi-node-sa         2         50m
default             2         50m
deployer            2         50m

Ember-CSI pod should share IPC with the host

When using multipath, if we don't share the IPC of the Ember-CSI container running on the nodes, we will have problems detaching the volumes, and it will take 5 minutes for the operation to "complete".
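
The fix should be a matter of setting this on the node DaemonSet's pod spec (sketch; the SCC or PodSecurityPolicy used by the node service account must also allow it):

    // Share the host IPC namespace so multipath doesn't hang for ~5
    // minutes when detaching volumes.
    ds.Spec.Template.Spec.HostIPC = true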

RegisterPlugin error -- failed to get plugin info using RPC GetInfo at socket /var/lib/kubelet/plugins/ember-csi.io/csi.sock, err: rpc error: code = Unimplemented desc = Method not found!

As reported on Ember-CSI's repository (embercsi/ember-csi#153), on Kubernetes nodes since v1.15 we have started seeing these continual errors:

Jan 19 13:19:21 kube23.foo.com kubelet: I0119 13:19:21.844405 2107 operation_generator.go:193] parsed scheme: ""
Jan 19 13:19:21 kube23.foo.com kubelet: I0119 13:19:21.844431 2107 operation_generator.go:193] scheme "" not registered, fallback to default scheme
Jan 19 13:19:21 kube23.foo.com kubelet: I0119 13:19:21.844449 2107 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/ember-csi.io/csi.sock 0 }] }
Jan 19 13:19:21 kube23.foo.com kubelet: I0119 13:19:21.848196 2107 clientconn.go:577] ClientConn switching balancer to "pick_first"
Jan 19 13:19:21 kube23.foo.com kubelet: E0119 13:19:21.850493 2107 goroutinemap.go:150] Operation for "/var/lib/kubelet/plugins/ember-csi.io/csi.sock" failed. No retries permitted until 2020-01-19 13:21:23.850457201 -0500 EST m=+1109229.205664694 (durationBeforeRetry 2m2s). Error: "RegisterPlugin error -- failed to get plugin info using RPC GetInfo at socket /var/lib/kubelet/plugins/ember-csi.io/csi.sock, err: rpc error: code = Unimplemented desc = Method not found!"

This is on a system that currently doesn't have any pvc mounts. Rebooting the system will clear it up for a time but then it will start to happen again.

Kubernetes 1.15 introduced a change in the plugin manager that keeps retrying until the registration of every plugin (socket files under /var/lib/kubelet/{plugins,plugins_registry}) succeeds. That wasn't the case in previous releases, where we would only see the error once.

We need to change the location of the CSI socket used for communication between the CSI plugin and its sidecars.

The operator is creating the volume for the nodes as in Ember-CSI's examples https://github.com/embercsi/ember-csi-operator/blame/8c6a833acb97d433b6c1841318ace989b2fe0250/pkg/controller/embercsi/node.go#L156:

fmt.Sprintf("%s/%s/%s", "--kubelet-registration-path=/var/lib/kubelet/plugins", GetPluginDomainName(ecsi.Name), "csi.sock"),

And like the doc's manifest at https://kubernetes-csi.github.io/docs/deploying.html#driver-volume-mounts:

      # This volume is where the socket for kubelet->driver communication is done
      - name: socket-dir
        hostPath:
          path: /var/lib/kubelet/plugins/<driver-name>
          type: DirectoryOrCreate

Though the picture (https://kubernetes-csi.github.io/docs/images/kubelet.png) refers to a different directory: /var/lib/kubelet/<driver-name>

I believe we should be changing the nodes manifest to avoid these errors.

We can use an empty dir, or we can create the sockets under /var/lib/kubelet/.
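
A sketch of the emptyDir variant (only node-driver-registrar still needs a hostPath, for the registration socket kubelet watches under /var/lib/kubelet/plugins_registry/):

    // Keep the plugin<->sidecar socket off the kubelet plugins directory
    // by using a pod-private emptyDir.
    socketDir := corev1.Volume{
        Name: "socket-dir",
        VolumeSource: corev1.VolumeSource{
            EmptyDir: &corev1.EmptyDirVolumeSource{},
        },
    }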

Operator's clusterrole must be able to manipulate 'csinode' resource in storage.k8s.io API group

Kubelet complains that the operator's serviceaccount cannot query/update the 'csinode' resource in the 'storage.k8s.io' API group. Kubelet error:

Aug 28 22:11:16 node1.example.com dockerd-current[1274]: E0829 02:11:16.820017 1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.CSINode: csinodes.storage.k8s.io is forbidden: User "system:serviceaccount:ember-csi:ember-csi-operator" cannot list resource "csinodes" in API group "storage.k8s.io" at the cluster scope

The fix should be easy: add the csinode resource to the 'ember-csi-operator' clusterrole.
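
A sketch of the extra rule, expressed as the Go type the operator would use if it creates the clusterrole itself (the verb list is a guess at what the components need):

    import rbacv1 "k8s.io/api/rbac/v1"

    // Extra rule for the ember-csi-operator clusterrole.
    csinodeRule := rbacv1.PolicyRule{
        APIGroups: []string{"storage.k8s.io"},
        Resources: []string{"csinodes"},
        Verbs:     []string{"get", "list", "watch", "update"},
    }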

Don't mount LVM directories for other drivers

We don't need to mount the LVM directories if we are not using the LVM driver:

		},{
			MountPath: "/etc/lvm",
			Name: "lvm-dir",
			MountPropagation: &bidirectional,
		},{
			MountPath: "/var/lock/lvm",
			Name: "lvm-lock",
			MountPropagation: &bidirectional,
		},{
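
A sketch of the guarded version (isLVMDriver is an illustrative helper that would inspect the configured driver):

    if isLVMDriver(ecsi) {
        volumeMounts = append(volumeMounts,
            corev1.VolumeMount{
                MountPath:        "/etc/lvm",
                Name:             "lvm-dir",
                MountPropagation: &bidirectional,
            },
            corev1.VolumeMount{
                MountPath:        "/var/lock/lvm",
                Name:             "lvm-lock",
                MountPropagation: &bidirectional,
            },
        )
    }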

Operator not reconciling StorageClass and VolumeSnapshotClass

If the StorageClass and VolumeSnapshotClass objects are removed after they are initially created, they are not being re-created by the Operator. However, if either the StatefulSet or DaemonSet is removed, these get recreated immediately in addition to any missing StorageClass and VolumeSnapshotClass objects.

The correct Operator behaviour should be to immediately recreate StorageClass and VolumeSnapshotClass after they are removed.
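
A sketch of the missing check in the reconcile loop (r.client, scName and newStorageClass are illustrative; the VolumeSnapshotClass would get the same treatment):

    sc := &storagev1.StorageClass{}
    err := r.client.Get(context.TODO(), types.NamespacedName{Name: scName}, sc)
    if apierrors.IsNotFound(err) {
        // The StorageClass was deleted out from under us: recreate it.
        if err := r.client.Create(context.TODO(), newStorageClass(ecsi)); err != nil {
            return reconcile.Result{}, err
        }
    }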

Missing directory /var/lib/ember-csi/vols

Commit cd81ebc added support for the shared lock directory; however, this requires an additional subdirectory, or pods will fail to start:

Warning FailedMount 51s (x8 over 2m) kubelet, node01 MountVolume.MountDevice failed for volume "pvc-67f73de2208311e9" : rpc error: code = Unknown desc = Exception calling application: [Errno 2] No such file or directory: u'/var/lib/ember-csi/vols/466e0060-a518-4f57-9f2a-cbe95c9629de'

After creating /var/lib/ember-csi/vols on node01 it works.
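
Rather than documenting a manual mkdir, the operator could let kubelet create the directory by typing the hostPath volume (sketch):

    dirOrCreate := corev1.HostPathDirectoryOrCreate
    volsDir := corev1.Volume{
        Name: "ember-csi-vols",
        VolumeSource: corev1.VolumeSource{
            HostPath: &corev1.HostPathVolumeSource{
                Path: "/var/lib/ember-csi/vols",
                Type: &dirOrCreate, // kubelet creates it if missing
            },
        },
    }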

Remove deprecated parameters

There are some deprecated parameters in use, for example with the external-provisioner sidecar:

Warning: option provisioner="my-ceph.ember-csi.io" is deprecated and has no effect

The same applies to connection-timeout when using external-attacher sidecar versions >= 2.0.

Volume mount fails on OCP 4.2 due to missing directory

This is due to not using /var/lib/kubelet/plugins/ on OCP >= 4.2.

MountVolume.MountDevice failed for volume "pvc-8ba199ce-41f1-11ea-9720-52fdfc072182" : rpc error: code = InvalidArgument desc = Parent staging directory for /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ba199ce-41f1-11ea-9720-52fdfc072182/globalmount/stage doesn't exist: [Errno 2] No such file or directory: '/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-8ba199ce-41f1-11ea-9720-52fdfc072182/globalmount'

Deployment fails when host is missing /etc/localtime

According to Ember-CSI issue #167, the Operator's deployment will fail if the host is missing the file /etc/localtime.

We assumed that all hosts would have that file, but it looks like OKD 4.5 beta (Fedora CoreOS) is actually missing it in some cases:

$ ls /etc/localtime
ls: cannot access '/etc/localtime': No such file or directory

We'll see the following misleading error:

Error: container create failed: time="2020-06-24T06:40:26Z" level=warning msg="exit status 1" time="2020-06-24T06:40:26Z" level=error msg="container_linux.go:349: starting container process caused \"process_linux.go:449: container init caused \\\"rootfs_linux.go:58: mounting \\\\\\\"/etc/localtime\\\\\\\" to rootfs \\\\\\\"/var/lib/containers/storage/overlay/c87616da1d0f51f436eacf9e97bc4622c0285aad28edbcc08a1ec7283d7f930c/merged\\\\\\\" at \\\\\\\"/var/lib/containers/storage/overlay/c87616da1d0f51f436eacf9e97bc4622c0285aad28edbcc08a1ec7283d7f930c/merged/usr/share/zoneinfo/UTC\\\\\\\" caused \\\\\\\"not a directory\\\\\\\"\\\"\"" container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\\"/etc/localtime\\\" to rootfs \\\"/var/lib/containers/storage/overlay/c87616da1d0f51f436eacf9e97bc4622c0285aad28edbcc08a1ec7283d7f930c/merged\\\" at \\\"/var/lib/containers/storage/overlay/c87616da1d0f51f436eacf9e97bc4622c0285aad28edbcc08a1ec7283d7f930c/merged/usr/share/zoneinfo/UTC\\\" caused \\\"not a directory\\\"\""

Deleting the /etc/localtime volume from the StatefulSet and DaemonSet solves the problem, but that's not a convenient way of doing it.

The operator should have a way to disable mounting this host file on systems that don't have it, even if the logs will then lack the right timestamps.
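
A sketch of such a toggle, assuming a hypothetical boolean field on the CR (defaulting to true to preserve the current behaviour):

    // Only mount /etc/localtime when the host actually has it.
    if ecsi.Spec.MountLocaltime { // hypothetical field
        volumes = append(volumes, corev1.Volume{
            Name: "localtime",
            VolumeSource: corev1.VolumeSource{
                HostPath: &corev1.HostPathVolumeSource{Path: "/etc/localtime"},
            },
        })
    }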

Automatically build images

Images should be automatically built on pushes to master and when tagging versions.

This will be easier once we resolve issue #9, since we can set up Docker Hub to do this automatically.

Change service names

For rolling upgrades/updates we'll need to have multiple instances of Ember-CSI running against the same backend, so we cannot have a fixed name for the StatefulSet and the DaemonSet of each backend, as that would create a conflict in Kubernetes.

The name could use the hash of the image that's going to be used, some metadata from the image, or the vendor_version reported by Ember-CSI once it has a finer-grained vendor_version (embercsi/ember-csi#108).
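
For the image-hash option, a minimal sketch of the name generation:

    import (
        "crypto/sha256"
        "fmt"
    )

    // objectName derives a per-version suffix from the driver image so
    // several generations of the StatefulSet/DaemonSet can coexist during
    // a rolling upgrade.
    func objectName(backend, image string) string {
        sum := sha256.Sum256([]byte(image))
        return fmt.Sprintf("%s-controller-%x", backend, sum[:4])
    }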

Update socket-dir

Originally the socket-dir for the pods had to be at /var/lib/kubelet/plugins/<driver-name>/csi.sock, as described in the documentation, but this is no longer the case. Now sidecars accept an argument with the address to use to connect to the CSI plugin, so we can use an emptyDir instead of a fixed path.

This will be required for rolling upgrades, as we'll have multiple instances of the plugin running simultaneously while we update/upgrade.

We should update the operator to use emptyDir when the sidecar containers allow it, to support rolling upgrades.
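
A sketch of the wiring, reusing the /csi-data path that already shows up in the provisioner logs (--csi-address is the flag the CSI sidecars take; CSI_ENDPOINT is the usual plugin-side setting):

    // Sidecar container args: reach the plugin through the shared
    // emptyDir instead of a fixed kubelet path.
    args := []string{"--csi-address=/csi-data/csi.sock"}

    // Matching endpoint for the Ember-CSI container itself.
    endpoint := corev1.EnvVar{
        Name:  "CSI_ENDPOINT",
        Value: "unix:///csi-data/csi.sock",
    }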

Graceful shutdown

We need to ensure operator actions have finished successfully before shutting down the operator, e.g. when receiving a SIGTERM. A minimal sketch follows the todo list below.

Todo:

  1. Send a notification to a channel once all operations are done
  2. Add a goroutine & channels to block exiting until notification sent
  3. Test timeouts with OLM and adopt them as needed

Node pod is not created

The node pod is not created, likely due to a wrong HostIPC setting.

For example, when creating a pod using a pvc it will fail with the following error:

kubectl -n sample-project describe pod/busybox-sleep
[...]
AttachVolume.Attach failed for volume "pvc-a2a48c97208111e9" : node "node01" has no NodeID annotation

Looking at the pods running in ember-csi, the "external-ceph-node-*" is missing:

kubectl -n ember-csi get pods
NAME READY STATUS RESTARTS AGE
ember-csi-operator-68844f4988-qfjd6 1/1 Running 0 4m
external-ceph-controller-0 3/3 Running 0 3m

And looking at the DaemonSet it fails due to HostIPC not being allowed:

kubectl -n ember-csi describe daemonset

Warning FailedCreate 59s (x16 over 3m) daemonset-controller Error creating: pods "external-ceph-node-" is forbidden: unable to validate against any security context constraint: [...] spec.containers[0].securityContext.hostIPC: Invalid value: true: Host IPC is not allowed to be used

error: unable to recognize "deploy/examples/lvmdriver.yaml": no matches for kind "EmberCSI" in version "ember-csi.io/v1alpha1"

I am following the README in the devel branch and I am unable to proceed with the deployment when attempting:

oc create -f deploy/examples/lvmdriver.yaml
error: unable to recognize "deploy/examples/lvmdriver.yaml": no matches for kind "EmberCSI" in version "ember-csi.io/v1alpha1"
oc version 
Client Version: 4.4.0-0.nightly-2020-02-17-022408
Server Version: 4.5.0-0.nightly-2020-05-17-163339
Kubernetes Version: v1.18.2+ee20b51

deploy/uninstall.yml needs update

Hi @kirankt ,

Ember CSI operator related resources can't be removed by uninstall.yml. Did I use it correctly?

[cloud-user@cnv-executor-qwang-1016-master1 ember-csi-operator]$ oc create -f deploy/uninstall.yml -n ember-csi
Error from server (Invalid): error when creating "deploy/uninstall.yml": Deployment.apps "ember-csi-operator" is invalid: [spec.selector: Required value, spec.template.metadata.labels: Invalid value: map[string]string(nil): `selector` does not match template `labels`, spec.template.spec.containers: Required value]
Error from server (Invalid): error when creating "deploy/uninstall.yml": Service "ember-csi-operator" is invalid: spec.ports: Required value
Error from server (Invalid): error when creating "deploy/uninstall.yml": CustomResourceDefinition.apiextensions.k8s.io "embercsis.ember-csi.io" is invalid: [metadata.name: Invalid value: "embercsis.ember-csi.io": must be spec.names.plural+"."+spec.group, spec.group: Required value, spec.versions: Invalid value: []apiextensions.CustomResourceDefinitionVersion(nil): must have exactly one version marked as storage version, spec.names.plural: Required value, spec.names.singular: Required value, spec.names.kind: Required value, spec.names.listKind: Required value, status.storedVersions: Invalid value: []string(nil): must have at least one stored version]
Error from server (AlreadyExists): error when creating "deploy/uninstall.yml": roles.rbac.authorization.k8s.io "ember-csi-operator" already exists
Error from server (Invalid): error when creating "deploy/uninstall.yml": RoleBinding.rbac.authorization.k8s.io "ember-csi-operator-rb" is invalid: [roleRef.kind: Unsupported value: "": supported values: "Role", "ClusterRole", roleRef.name: Required value]
Error from server (AlreadyExists): error when creating "deploy/uninstall.yml": clusterroles.rbac.authorization.k8s.io "ember-csi-controller-cr" already exists
Error from server (AlreadyExists): error when creating "deploy/uninstall.yml": clusterrolebindings.rbac.authorization.k8s.io "ember-csi-controller-rb" already exists
Error from server (AlreadyExists): error when creating "deploy/uninstall.yml": clusterroles.rbac.authorization.k8s.io "ember-csi-node-cr" already exists
Error from server (AlreadyExists): error when creating "deploy/uninstall.yml": clusterrolebindings.rbac.authorization.k8s.io "ember-csi-node-rb" already exists
Error from server (AlreadyExists): error when creating "deploy/uninstall.yml": serviceaccounts "csi-controller-sa" already exists
Error from server (AlreadyExists): error when creating "deploy/uninstall.yml": serviceaccounts "csi-node-sa" already exists
Error from server (Invalid): error when creating "deploy/uninstall.yml": SecurityContextConstraints.security.openshift.io "ember-csi-scc" is invalid: [runAsUser.type: Invalid value: "": invalid strategy type.  Valid values are MustRunAs, MustRunAsNonRoot, MustRunAsRange, RunAsAny, seLinuxContext.type: Invalid value: "": invalid strategy type.  Valid values are MustRunAs, RunAsAny]

StorageBackend vSphere Failed as Operator in OKD 4.5

OKD 4.5 beta (Fedora CoreOS)
Ember-csi operator

When I add a new storage backend with the vSphere driver, all the pods start failing:

Error: container create failed: time="2020-06-24T06:40:26Z" level=warning msg="exit status 1" time="2020-06-24T06:40:26Z" level=error msg="container_linux.go:349: starting container process caused \"process_linux.go:449: container init caused \\\"rootfs_linux.go:58: mounting \\\\\\\"/etc/localtime\\\\\\\" to rootfs \\\\\\\"/var/lib/containers/storage/overlay/c87616da1d0f51f436eacf9e97bc4622c0285aad28edbcc08a1ec7283d7f930c/merged\\\\\\\" at \\\\\\\"/var/lib/containers/storage/overlay/c87616da1d0f51f436eacf9e97bc4622c0285aad28edbcc08a1ec7283d7f930c/merged/usr/share/zoneinfo/UTC\\\\\\\" caused \\\\\\\"not a directory\\\\\\\"\\\"\"" container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\\"/etc/localtime\\\" to rootfs \\\"/var/lib/containers/storage/overlay/c87616da1d0f51f436eacf9e97bc4622c0285aad28edbcc08a1ec7283d7f930c/merged\\\" at \\\"/var/lib/containers/storage/overlay/c87616da1d0f51f436eacf9e97bc4622c0285aad28edbcc08a1ec7283d7f930c/merged/usr/share/zoneinfo/UTC\\\" caused \\\"not a directory\\\"\""

After deleting the '/etc/localtime' volume from the StatefulSet and DaemonSet, the pods start successfully.

Add support for Topology

CSI spec v1.0 supports topology. This needs to be incorporated into the operator, ensuring it works with X_CSI_TOPOLOGIES and X_CSI_NODE_TOPOLOGY in the Ember-CSI driver.
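
The operator side could be as simple as forwarding the values from the CR (sketch; the Spec fields are hypothetical, the env var names are the Ember-CSI ones mentioned above):

    topologyEnv := []corev1.EnvVar{
        {Name: "X_CSI_TOPOLOGIES", Value: ecsi.Spec.Topologies},      // controller pods
        {Name: "X_CSI_NODE_TOPOLOGY", Value: ecsi.Spec.NodeTopology}, // node pods
    }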

No pv is created after deploy the ember-csi-operator

Deploy ember-csi-operator to OCP according to https://github.com/embercsi/ember-csi-operator
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.3.0-0.nightly-2019-12-29-173422 True False 28h Cluster version is 4.3.0-0.nightly-2019-12-29-173422

1. After deployment, ember-csi-operator is running:
pod/ember-csi-operator-7cfb6b9c67-cqlkl 1/1 Running 0 22h
pod/my-ceph-controller-0 5/5 Running 0 9h
pod/my-ceph-node-0-hf9tr 2/2 Running 0 9h
pod/my-ceph-node-0-pjhpr 2/2 Running 0 9h
oc get pod/my-ceph-controller-0 -o yaml | grep image
image: embercsi/ember-csi:master
imagePullPolicy: Always
image: quay.io/k8scsi/csi-attacher:v1.1.1
imagePullPolicy: IfNotPresent
image: quay.io/k8scsi/csi-provisioner:v1.1.0
imagePullPolicy: IfNotPresent
image: quay.io/k8scsi/csi-cluster-driver-registrar:v1.0.1
imagePullPolicy: IfNotPresent
image: quay.io/k8scsi/csi-snapshotter:v1.1.0
imagePullPolicy: IfNotPresent
imagePullSecrets:
image: quay.io/k8scsi/csi-cluster-driver-registrar:v1.0.1
imageID: quay.io/k8scsi/csi-cluster-driver-registrar@sha256:fafd75ae5442f192cfa8c2e792903aee30d5884b62e802e4464b0a895d21e3ef
image: docker.io/embercsi/ember-csi:master
imageID: docker.io/embercsi/ember-csi@sha256:95ca3849471d65bb9500ad9169fdffccc3c31e468d40de33a746e38418150069
image: quay.io/k8scsi/csi-attacher:v1.1.1
imageID: quay.io/k8scsi/csi-attacher@sha256:e4db94969e1d463807162a1115192ed70d632a61fbeb3bdc97b40fe9ce78c831
image: quay.io/k8scsi/csi-provisioner:v1.1.0
imageID: quay.io/k8scsi/csi-provisioner@sha256:9828f32a1b350bef5f813857c2a3223e8aec79a9762bd78545eaea8fa79735d1
image: quay.io/k8scsi/csi-snapshotter:v1.1.0
imageID: quay.io/k8scsi/csi-snapshotter@sha256:a49e0da1af6f2bf717e41ba1eee8b5e6a1cbd66a709dd92cc43fe475fe2589eb

  2. Create a PVC and an app to test, but no PV is created:

    oc describe pvc ember-csi-pvc
    Name:          ember-csi-pvc
    Namespace:     demoapp
    StorageClass:  my-ceph.ember-csi.io-sc
    Status:        Pending
    Volume:
    Labels:
    Annotations:   volume.beta.kubernetes.io/storage-provisioner: my-ceph.ember-csi.io
    Finalizers:    [kubernetes.io/pvc-protection]
    Capacity:
    Access Modes:
    VolumeMode:    Filesystem
    Mounted By:    my-csi-app
    Events:
      Type    Reason                Age               From                         Message
      ----    ------                ----              ----                         -------
      Normal  ExternalProvisioning  6s (x6 over 69s)  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "my-ceph.ember-csi.io" or manually created by system administrator

The csi-provisioner sidecar logs:

W0102 20:18:28.503719 1 deprecatedflags.go:53] Warning: option provisioner="my-ceph.ember-csi.io" is deprecated and has no effect
I0102 20:18:28.503800 1 feature_gate.go:226] feature gates: &{map[Topology:true]}
I0102 20:18:28.503823 1 csi-provisioner.go:95] Version: v1.1.0-0-gcecb5a96
I0102 20:18:28.503839 1 csi-provisioner.go:109] Building kube configs for running in cluster...
I0102 20:18:28.586631 1 connection.go:151] Connecting to unix:///csi-data/csi.sock
I0102 20:18:32.529469 1 connection.go:261] Probing CSI driver for readiness
I0102 20:18:32.529508 1 connection.go:180] GRPC call: /csi.v1.Identity/Probe
I0102 20:18:32.529518 1 connection.go:181] GRPC request: {}
I0102 20:18:32.534075 1 connection.go:183] GRPC response: {"ready":{"value":true}}
I0102 20:18:32.534578 1 connection.go:184] GRPC error:
I0102 20:18:32.534599 1 connection.go:180] GRPC call: /csi.v1.Identity/GetPluginInfo
I0102 20:18:32.534608 1 connection.go:181] GRPC request: {}
I0102 20:18:32.538798 1 connection.go:183] GRPC response: {"manifest":{"cinder-driver":"RBDDriver","cinder-driver-supported":"True","cinder-driver-version":"1.2.0","cinder-version":"15.1.0.dev125","cinderlib-version":"1.0.1.dev3","mode":"controller","persistence":"CRDPersistence"},"name":"ember-csi.io","vendor_version":"0.9.0-44-gf161c3c+19122019161820487031859"}
I0102 20:18:32.539646 1 connection.go:184] GRPC error:
I0102 20:18:32.539662 1 csi-provisioner.go:149] Detected CSI driver ember-csi.io
I0102 20:18:32.539677 1 connection.go:180] GRPC call: /csi.v1.Identity/GetPluginCapabilities
I0102 20:18:32.539686 1 connection.go:181] GRPC request: {}
I0102 20:18:32.544832 1 connection.go:183] GRPC response: {"capabilities":[{"Type":{"Service":{"type":1}}}]}
I0102 20:18:32.545839 1 connection.go:184] GRPC error:
I0102 20:18:32.545865 1 connection.go:180] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0102 20:18:32.545874 1 connection.go:181] GRPC request: {}
I0102 20:18:32.550529 1 connection.go:183] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":4}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":8}}}]}
I0102 20:18:32.554534 1 connection.go:184] GRPC error:
I0102 20:18:32.555593 1 controller.go:621] Using saving PVs to API server in background
I0102 20:18:32.555868 1 controller.go:769] Starting provisioner controller ember-csi.io_my-ceph-controller-0_0c2c0085-2d9d-11ea-a1d9-0a580a830013!
I0102 20:18:32.556539 1 reflector.go:123] Starting reflector *v1.PersistentVolumeClaim (15m0s) from sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:800
I0102 20:18:32.556578 1 reflector.go:161] Listing and watching *v1.PersistentVolumeClaim from sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:800
I0102 20:18:32.557169 1 volume_store.go:90] Starting save volume queue
I0102 20:18:32.557414 1 reflector.go:123] Starting reflector *v1.PersistentVolume (15m0s) from sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:803
I0102 20:18:32.557435 1 reflector.go:161] Listing and watching *v1.PersistentVolume from sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:803
I0102 20:18:32.557869 1 reflector.go:123] Starting reflector *v1.StorageClass (15m0s) from sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:806
I0102 20:18:32.557895 1 reflector.go:161] Listing and watching *v1.StorageClass from sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:806
I0102 20:18:32.656432 1 shared_informer.go:123] caches populated
I0102 20:18:32.656650 1 controller.go:979] Final error received, removing PVC 41ae847e-a383-4a56-a7dd-e2a2cb0e7655 from claims in progress
I0102 20:18:32.656668 1 controller.go:818] Started provisioner controller ember-csi.io_my-ceph-controller-0_0c2c0085-2d9d-11ea-a1d9-0a580a830013!
I0102 20:18:32.656673 1 controller.go:902] Provisioning succeeded, removing PVC 41ae847e-a383-4a56-a7dd-e2a2cb0e7655 from claims in progress
I0102 20:22:39.647669 1 streamwatcher.go:107] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=13, ErrCode=NO_ERROR, debug=""
I0102 20:22:39.650857 1 reflector.go:370] sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:803: Watch close - *v1.PersistentVolume total 0 items received
I0102 20:22:39.723217 1 streamwatcher.go:107] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=13, ErrCode=NO_ERROR, debug=""
I0102 20:22:39.723276 1 reflector.go:370] sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:806: Watch close - *v1.StorageClass total 0 items received
I0102 20:22:39.723704 1 streamwatcher.go:107] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=13, ErrCode=NO_ERROR, debug=""
I0102 20:22:39.723729 1 reflector.go:370] sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:800: Watch close - *v1.PersistentVolumeClaim total 0 items received
W0102 20:22:39.900193 1 reflector.go:289] sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:806: watch of *v1.StorageClass ended with: too old resource version: 316368 (317507)
W0102 20:22:39.900359 1 reflector.go:289] sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:800: watch of *v1.PersistentVolumeClaim ended with: too old resource version: 314203 (317500)

Build everything in the Dockerfile

We should have a multi-stage Dockerfile that compiles the code and then generates the final image with the result, instead of requiring us to build the binary manually on our system and then build the image.
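
A sketch of what that multi-stage Dockerfile could look like (Go version, base image and paths are illustrative):

    # Stage 1: build the operator binary.
    FROM golang:1.12 AS builder
    WORKDIR /go/src/github.com/embercsi/ember-csi-operator
    COPY . .
    RUN make build

    # Stage 2: copy only the binary into the runtime image.
    FROM centos:7
    COPY --from=builder /go/src/github.com/embercsi/ember-csi-operator/build/ember-csi-operator /usr/local/bin/ember-csi-operator
    ENTRYPOINT ["/usr/local/bin/ember-csi-operator"]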

Support Ember-CSI container restarts and share locks

As we currently deploy the Ember-CSI node containers, restarting them breaks things, because they lose their private volume bind mounts.

If we have multiple Ember-CSI node containers running on the same host (we have multiple backends) we may run into problems since they are not sharing the locks.

To resolve both issues we just need to map the container's /var/lib/ember-csi directory to the host and share this directory between all the Ember-CSI containers.

Consider listing operator in Artifact Hub

Hi! 👋🏻

Have you considered listing the ember-csi operator directly in Artifact Hub?

At the moment it is already listed there, because the Artifact Hub team has added the community-operators repository. However, listing it yourself directly has some benefits:

  • You add your repository once, and new versions (or even new operators) committed to your git repository will be indexed automatically and listed in Artifact Hub, with no extra PRs needed.
  • You can display the Verified Publisher label in your operators, increasing their visibility and potentially the users' trust in your content.
  • Increased visibility of your organization in URLs and search results. Users will be able to see your organization's description, a link to its home page, and search for other content published by you.
  • If something goes wrong indexing your repository, you will be notified and you can even inspect the logs to check what went wrong.

If you decide to go ahead, you just need to sign in and add your repository from the control panel. You can add it using a single user or create an organization for it, whatever suits your needs best.

You can find notes about the expected repository URL format and repository structure in the repositories guide. An example of an operator repository already listed in Artifact Hub is also available in the documentation. Operators are expected to be packaged using the format defined in the Operator Framework documentation to facilitate the process.

Please let me know if you have any questions or if you encounter any issue during the process 🙂
