
Kubernetes Cloud Controller Manager implementation for Oracle Cloud Infrastructure

License: Apache License 2.0


OCI Cloud Controller Manager (CCM)

oci-cloud-controller-manager is a Kubernetes Cloud Controller Manager implementation (or out-of-tree cloud-provider) for Oracle Cloud Infrastructure (OCI).


Introduction

External cloud providers were introduced as an Alpha feature in Kubernetes 1.6 with the addition of the Cloud Controller Manager binary. External cloud providers are Kubernetes (master) controllers that implement the cloud-provider specific control loops required for Kubernetes to function.

This functionality is implemented in-tree in the kube-controller-manager binary for existing cloud providers (e.g. AWS, GCE, etc.). However, in-tree cloud providers have entered maintenance mode and no additional providers will be accepted. Furthermore, there is an ongoing effort to move all existing cloud-provider-specific code out of the Kubernetes codebase.

Compatibility matrix

CCM Version    Min Kubernetes Version    Max Kubernetes Version
>= v0.11       v1.16                     v1.18
>= v0.12       v1.18                     v1.21
>= v0.13       v1.19                     v1.21
v1.19.12       v1.19                     v1.21
v1.22.0        v1.22                     -
v1.23.0        v1.23                     -
v1.24.2        v1.24                     -
v1.25.2        v1.25                     -
v1.26.4        v1.26                     -
v1.27.3        v1.27                     -
v1.28.1        v1.28                     -
v1.29.0        v1.29                     -

Note: Versions older than v1.27.3 are no longer supported; new features and bug fixes will be available in v1.27.3 and later.

Implementation

Currently oci-cloud-controller-manager implements:

  • NodeController - updates nodes with cloud-provider-specific labels and addresses, and deletes Kubernetes nodes when they are removed from the cloud provider.
  • ServiceController - responsible for creating load balancers when a service of type: LoadBalancer is created in Kubernetes.

Additionally, this project implements a container-storage-interface (CSI) driver, a flexvolume driver, and a flexvolume provisioner for Kubernetes clusters running on Oracle Cloud Infrastructure (OCI).

Setup and Installation

To get the CCM running in your Kubernetes cluster you will need to do the following:

  1. Prepare your Kubernetes cluster for running an external cloud provider.
  2. Create a Kubernetes secret containing the configuration for the CCM.
  3. Deploy the CCM as a DaemonSet.

Note: For the setup and installation of the flexvolume driver, flexvolume provisioner, and container-storage-interface driver, please refer to the linked resources.

Preparing Your Cluster

To deploy the Cloud Controller Manager (CCM) your cluster must be configured to use an external cloud-provider.

This involves:

  • Setting the --cloud-provider=external flag on the kubelet on all nodes in your cluster.
  • Setting the --provider-id=<instanceID> flag on the kubelet on all nodes in your cluster, where <instanceID> is the instance OCID of the node (unique for each node); see the sketch after this list.
  • Setting the --cloud-provider=external flag on the kube-controller-manager in your Kubernetes control plane.
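As a concrete illustration, the flags might be set as follows on each node. This is only a sketch: it assumes a kubeadm-style node where the kubelet reads KUBELET_EXTRA_ARGS from /etc/default/kubelet, and that the legacy (v1) instance metadata endpoint is reachable from the node; adjust the paths and metadata version for your own images.

# Sketch, assuming kubeadm-managed kubelets and the v1 instance metadata endpoint.
# Look up this node's instance OCID from the OCI instance metadata service.
INSTANCE_ID=$(curl -sf http://169.254.169.254/opc/v1/instance/id)

# Pass the external cloud-provider flags to the kubelet and restart it.
cat > /etc/default/kubelet <<EOF
KUBELET_EXTRA_ARGS=--cloud-provider=external --provider-id=${INSTANCE_ID}
EOF
systemctl restart kubelet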

Depending on how kube-proxy is run you may need the following:

  • Ensuring that kube-proxy tolerates the uninitialised cloud taint. The following should appear in the kube-proxy pod yaml:
- effect: NoSchedule
  key: node.cloudprovider.kubernetes.io/uninitialized
  value: "true"

If your cluster was created using kubeadm >= v1.7.2 this toleration will already be applied. See kubernetes/kubernetes#49017 for details.
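One way to check whether the toleration is already present (assuming kube-proxy runs as the kube-proxy DaemonSet in kube-system, as it does in kubeadm clusters) is to inspect the DaemonSet directly:

$ kubectl -n kube-system get daemonset kube-proxy \
    -o jsonpath='{.spec.template.spec.tolerations}'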

Remember to restart any components that you have reconfigured before continuing.

Authentication and Configuration

An example configuration file can be found here. Download this file and populate it with values specific to your chosen OCI identity and tenancy; a rough sketch of what the populated file looks like is shown below. Then create the Kubernetes secret from it with the command that follows.
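The following sketch uses placeholder values, with key names modelled on the configuration format shown later on this page (under "Migrate config to a single secret"); treat the example file itself as the authoritative reference for the supported keys.

# Sketch only: placeholder OCIDs; consult provider-config-example.yaml for the authoritative keys.
cat > provider-config-example.yaml <<'EOF'
auth:
  region: us-phoenix-1
  tenancy: ocid1.tenancy.oc1..<tenancy-ocid>
  compartment: ocid1.compartment.oc1..<compartment-ocid>
  user: ocid1.user.oc1..<user-ocid>
  key: |
    -----BEGIN RSA PRIVATE KEY-----
    <contents of your API signing key>
    -----END RSA PRIVATE KEY-----
  fingerprint: <api-key-fingerprint>

loadBalancer:
  subnet1: ocid1.subnet.oc1.phx.<subnet-ocid>
  subnet2: ocid1.subnet.oc1.phx.<subnet-ocid>
EOF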

For CCM -

$ kubectl create secret generic oci-cloud-controller-manager \
     -n kube-system                                           \
     --from-file=cloud-provider.yaml=provider-config-example.yaml

Note that you must ensure the secret contains the key cloud-provider.yaml rather than the name of the file on disk.
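To confirm that the secret was created with the expected key, you can decode it back out of the cluster, e.g.:

$ kubectl -n kube-system get secret oci-cloud-controller-manager \
    -o jsonpath='{.data.cloud-provider\.yaml}' | base64 --decode | head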

Deployment

Deploy the controller manager and associated RBAC rules if your cluster is configured to use RBAC (replace ? with the version you want to install):

$ export RELEASE=?
$ kubectl apply -f https://github.com/oracle/oci-cloud-controller-manager/releases/download/${RELEASE}/oci-cloud-controller-manager-rbac.yaml
$ kubectl apply -f https://github.com/oracle/oci-cloud-controller-manager/releases/download/${RELEASE}/oci-cloud-controller-manager.yaml

Check the CCM logs to ensure it's running correctly:

$ kubectl -n kube-system get po | grep oci
oci-cloud-controller-manager-ds-k2txq   1/1       Running   0          19s

$ kubectl -n kube-system logs oci-cloud-controller-manager-ds-k2txq
I0905 13:44:51.785964       7 flags.go:52] FLAG: --address="0.0.0.0"
I0905 13:44:51.786063       7 flags.go:52] FLAG: --allocate-node-cidrs="false"
I0905 13:44:51.786074       7 flags.go:52] FLAG: --alsologtostderr="false"
I0905 13:44:51.786078       7 flags.go:52] FLAG: --cloud-config="/etc/oci/cloud-config.cfg"
I0905 13:44:51.786083       7 flags.go:52] FLAG: --cloud-provider="oci"
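Once the CCM has initialized the nodes, each node's provider ID should be populated and the node.cloudprovider.kubernetes.io/uninitialized taint removed. A quick way to spot-check the provider IDs:

$ kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDER-ID:.spec.providerID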

Upgrade

The following example shows how to upgrade the CCM, flexvolume provisioner (FVP), flexvolume driver (FVD) and CSI driver from an older version (replace ? with the version you're upgrading to):

$ export RELEASE=?
$ kubectl apply -f https://github.com/oracle/oci-cloud-controller-manager/releases/download/${RELEASE}/oci-cloud-controller-manager-rbac.yaml
$ kubectl apply -f https://github.com/oracle/oci-cloud-controller-manager/releases/download/${RELEASE}/oci-cloud-controller-manager.yaml
$ kubectl apply -f https://github.com/oracle/oci-cloud-controller-manager/releases/download/${RELEASE}/oci-volume-provisioner.yaml
$ kubectl apply -f https://github.com/oracle/oci-cloud-controller-manager/releases/download/${RELEASE}/oci-volume-provisioner-rbac.yaml
$ kubectl apply -f https://github.com/oracle/oci-cloud-controller-manager/releases/download/${RELEASE}/oci-flexvolume-driver.yaml
$ kubectl apply -f https://github.com/oracle/oci-cloud-controller-manager/releases/download/${RELEASE}/oci-flexvolume-driver-rbac.yaml
$ kubectl apply -f https://github.com/oracle/oci-cloud-controller-manager/releases/download/${RELEASE}/oci-csi-controller-driver.yaml
$ kubectl apply -f https://github.com/oracle/oci-cloud-controller-manager/releases/download/${RELEASE}/oci-csi-node-driver.yaml
$ kubectl apply -f https://github.com/oracle/oci-cloud-controller-manager/releases/download/${RELEASE}/oci-csi-node-rbac.yaml
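After applying the new manifests, it is worth confirming that the components are running the expected image versions before moving on. A rough check (exact resource names may vary between releases):

$ kubectl -n kube-system get daemonsets,deployments -o wide | grep -i oci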

Examples
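The repository's examples cover more involved scenarios such as SSL-terminated load balancers (see the examples/nginx-demo-svc-ssl.yaml tutorial referenced later on this page). As a minimal, illustrative sketch, a Service of type LoadBalancer like the following is enough for the ServiceController to provision an OCI load balancer; it assumes a Deployment labelled app: nginx already exists and uses no OCI-specific annotations.

# Minimal sketch: expose an existing app=nginx Deployment through an OCI load balancer.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
  - name: http
    port: 80
    targetPort: 80
EOF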

Development

See DEVELOPMENT.md.

Support

If you think you've found a bug, please raise an issue.

Contributing

This project welcomes contributions from the community. Before submitting a pull request, please review our contribution guide.

Security

Please consult the security guide for our responsible security vulnerability disclosure process.

License

Copyright (c) 2017, 2023 Oracle and/or its affiliates. All rights reserved.

oci-cloud-controller-manager is licensed under the Apache License 2.0.

See LICENSE for more details.

Contributors

akarshes, alapidas, anchaube, ankitkumr10, arindam-bandyopadhyay, bdourallawzi, bencurrerburgess, dhananjay-ng, garthy, gouthamml, iamprvn, jayasheelankumar, jbornemann, jhorwit2, joekr, kristenjacobs, l-technicore, madalinapatrichi, mrunalpagnis, owainlewis, pranavsriram8, prydie, rjtsdl, rodrigc, shyamradhakrishnan, simonlord, templecloud, vbhargav875, yashasg98, yashwantgohokar


oci-cloud-controller-manager's Issues

412 error configuring LB

A customer hit this when creating an LB service:

I1208 20:58:06.783268       7 load_balancer.go:334] Applying `create` action on listener `TCP-30081` for lb `ocid1.loadbalancer.oc1.iad.aaaaaaaaf2dqk6s7sz4re5db36dscjnyze6pzsr4bzlyz54fa3rnv52fuwlq`
I1208 20:58:32.782339       7 load_balancer.go:334] Applying `create` action on listener `TCP-9102` for lb `ocid1.loadbalancer.oc1.iad.aaaaaaaaf2dqk6s7sz4re5db36dscjnyze6pzsr4bzlyz54fa3rnv52fuwlq`
E1208 20:58:32.819069       7 service_controller.go:749] Failed to process service. Retrying in 40s: Failed to ensure load balancer for service default/my-another-sm-meteorservicemanager: udpate listeners: update lb security list rules `ocid1.securitylist.oc1.iad.aaaaaaaachnkqxv7z63rmkabnv63isaijmdig3f5f2zdkg6jawlwn5434jva` for subnet `ocid1.subnet.oc1.iad.aaaaaaaacwm4yw7ke2efncrcbxlly74gz5obtda67h3uwhmfzbyvuhi2trpq: Status: 412; Code: NoEtagMatch; OPC Request ID: /0BFAD5466C1016BC6FED9BA160208578/9C12B940702247BBB700BC706A8B3F5E; Message: Entity security list with ID ocid1.securitylist.oc1.iad.aaaaaaaachnkqxv7z63rmkabnv63isaijmdig3f5f2zdkg6jawlwn5434jva has a computed tag of eea3653b, but is passed a tag of 9a048831

I don't have any more details right now around the service being created, but I can put you in touch with the user.

Which versions are we supporting?

Do we plan to support versions prior to 1.7.2? If so, we need to update the health checker not to default to the node healthz port, since it didn't exist before that version. You can see what GCE does here.

Add support for disabling security rules to allow handling them differently

https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws.go#L438

// The aws provider creates an inbound rule per load balancer on the node security
// group. However, this can run into the AWS security group rule limit of 50 if
// many LoadBalancers are created.
//
// This flag disables the automatic ingress creation. It requires that the user
// has setup a rule that allows inbound traffic on kubelet ports from the
// local VPC subnet (so load balancers can access it). E.g. 10.82.0.0/16 30000-32000.
DisableSecurityGroupIngress bool

This is very useful for us since we use iptables to lock down communication, so if we avoid creating redundant security rules we can create more load balancers per security list.

Feature Request: Create multiple LBs for one service

There's a use case we have where we want to create multiple LBs for a single service (mainly for HA). My guess is that this breaks the model a little bit WRT the service's External IP field expecting one IP only, but I don't know for sure.

Review our deployment code

The following appears to be how GCE are deploying the CCM (see: here). We should compare with our manifest and identify areas for improvement/standardisation.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: cloud-controller-manager
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:cloud-controller-manager
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: cloud-controller-manager
  namespace: kube-system
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    app: cloud-controller-manager
  name: cloud-controller-manager
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cloud-controller-manager
  template:
    metadata:
      labels:
        app: cloud-controller-manager
    spec:
      serviceAccountName: cloud-controller-manager
      containers:
      - name: cloud-controller-manager
        image: gcr.io/google_containers/cloud-controller-manager:v1.7.0
        command:
        - /usr/local/bin/cloud-controller-manager
        - --cloud-provider=gce
        - --allocate-node-cidrs=true
        - --configure-cloud-routes=true
        - --cluster-cidr=10.135.0.0/16
        volumeMounts:
        - name: etcssl
          mountPath: /etc/ssl
      tolerations:
      - key: node.cloudprovider.kubernetes.io/uninitialized
        value: "true"
        effect: NoSchedule
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      volumes:
      - name: etcssl
        hostPath:
          path: /etc/ssl

Make LB create/update faster by making calls concurrently

We should be able to speed up the load balancer logic quite a bit by making api calls concurrently for backendsets & listeners when possible.

During creation, we need to ensure that the backendsets are created prior to creating the listeners; this means we could create a simple pipeline for each combination and execute them.

For updates, it shouldn't matter; we should be able to update the backendset & listener at the same time.

Question: Does it matter when security lists are updated? If a rule doesn't exist then the health check would fail, so I feel we should be able to do those concurrently as well.

cc @prydie

Use Helm to install the CCM

Given the complexity of installing this, we might want to create a Helm chart to make it easier for end users to install the CCM.

Implement Config validation

We validate that we can communicate with the OCI API, however, we do not validate the config and provide feedback on any omitted fields.

Document or default Config.PrivateKeyFile

Currently users must specify the key-file key in their oci-cloud-controller-manager ConfigMap and set it to /etc/pem/<oci-api-key_filename>, where <oci-api-key_filename> is the filename of their private key stored in the oci-api-key secret.

This is poorly documented. We should either document it or provide a sensible default.

Example:

❯ kubectl -n kube-system get secrets oci-api-key -o yaml
apiVersion: v1
data:
  oci_api_key.pem: <snip>
❯ kubectl -n kube-system get configmap oci-cloud-controller-manager -o yaml
apiVersion: v1
data:
  cloud-config.cfg: |
    [Global]
    <snip>
    key-file = /etc/pem/oci_api_key.pem
    <snip>

/cc @owainlewis @jhorwit2

should not allow the load balancer subnets to change

Currently, it's possible to change the load balancer subnets via annotations. We should not allow this since the load balancer subnets cannot be updated. Right now if the user changes the annotation values it would cause security rules to be applied in a subnet where the load balancer doesn't actually exist.

Error updating service port

I created a LoadBalancer service and tried to change the port, and CCM errored out with the following:

I1031 21:46:33.332339       1 service_controller.go:286] Ensuring LB for service default/frontend
I1031 21:46:33.371770       1 load_balancer.go:252] Applying `delete` action on backend set `TCP-80` for lb `ocid1.loadbalancer.oc1.iad.aaaaaaaaxsq3zcrkxgu4rvkmc75iw27jme6zz6xijesw52qr26fnlz467wtq`
E1031 21:46:33.416808       1 service_controller.go:749] Failed to process service. Retrying in 5s: Failed to ensure load balancer for service default/frontend: update backendsets: update lb security list rules `ocid1.securitylist.oc1.iad.aaaaaaaa6tb2ljjdxvgxlpcs7wrfvi6oinv3c5cjn2oujcxyvuvu45dtl6eq` for subnet `ocid1.subnet.oc1.iad.aaaaaaaawt2lzrjd2dpoj3mvi5ljznqxaem7fzjjuffp5ac36mc2uw2td5vq: Status: 412; Code: NoEtagMatch; OPC Request ID: /27241DEF8077EF887A5AE78FF3F255C9/BA7DF04E4B9046F7BBC883B3BA02E36A; Message: Entity security list with ID ocid1.securitylist.oc1.iad.aaaaaaaa6tb2ljjdxvgxlpcs7wrfvi6oinv3c5cjn2oujcxyvuvu45dtl6eq has a computed tag of 66939e3f, but is passed a tag of 11f4c135
I1031 21:46:38.416995       1 service_controller.go:286] Ensuring LB for service default/frontend
I1031 21:46:38.438356       1 load_balancer.go:252] Applying `delete` action on backend set `TCP-80` for lb `ocid1.loadbalancer.oc1.iad.aaaaaaaaxsq3zcrkxgu4rvkmc75iw27jme6zz6xijesw52qr26fnlz467wtq`
E1031 21:46:51.708515       1 service_controller.go:749] Failed to process service. Retrying in 10s: Failed to ensure load balancer for service default/frontend: update backendsets: WorkRequest "ocid1.loadbalancerworkrequest.oc1.iad.aaaaaaaaqzsol2b24vbgcazcpggnwe2i5q2rt2rf4m4z6wmow6exjfha6xla" failed: Listener 'TCP-80' has unknown default backend set 'TCP-80'
I1031 21:47:01.708678       1 service_controller.go:286] Ensuring LB for service default/frontend
I1031 21:47:01.729085       1 load_balancer.go:252] Applying `delete` action on backend set `TCP-80` for lb `ocid1.loadbalancer.oc1.iad.aaaaaaaaxsq3zcrkxgu4rvkmc75iw27jme6zz6xijesw52qr26fnlz467wtq`
E1031 21:47:10.120603       1 service_controller.go:749] Failed to process service. Retrying in 20s: Failed to ensure load balancer for service default/frontend: update backendsets: WorkRequest "ocid1.loadbalancerworkrequest.oc1.iad.aaaaaaaawj6s3ohnsmbzy4ijc6ratb4etrriahe4myr2i5d5p342bznowqha" failed: Listener 'TCP-80' has unknown default backend set 'TCP-80'
I1031 21:47:30.120776       1 service_controller.go:286] Ensuring LB for service default/frontend
I1031 21:47:30.137829       1 load_balancer.go:252] Applying `delete` action on backend set `TCP-80` for lb `ocid1.loadbalancer.oc1.iad.aaaaaaaaxsq3zcrkxgu4rvkmc75iw27jme6zz6xijesw52qr26fnlz467wtq`
E1031 21:47:43.705711       1 service_controller.go:749] Failed to process service. Retrying in 40s: Failed to ensure load balancer for service default/frontend: update backendsets: WorkRequest "ocid1.loadbalancerworkrequest.oc1.iad.aaaaaaaao4n6tmc2zudmjkmtjnt3v4nyac4wusxjtammyvp453tyuisnenkq" failed: Listener 'TCP-80' has unknown default backend set 'TCP-80'

A corresponding error was also visible in the OCI Console (screenshot omitted).

Migrate config to a single secret

The current method of configuring the API signing key as detailed in #45 leaves a lot to be desired.

A better solution would be to use a single secret containing a yaml configuration with the key file inlined. We would still support an auth.keyFile property but document it only in the development documentation as a means of configuring the CCM when running via make run-dev.

Suggested format:

auth:
  region: us-phoenix-1
  tenancy: ocid1.tenancy.oc1..aaaaaaaatyn7scrtwtqedvgrxgr2xunzeo6uanvyhzxqblctwkrpisvke4kq
  compartment: ocid1.compartment.oc1..aaaaaaaa3um2atybwhder4qttfhgon4j3hcxgmsvnyvx4flfjyewkkwfzwnq
  user: ocid1.user.oc1..aaaaaaaai77mql2xerv7cn6wu3nhxang3y4jk56vo5bn5l5lysl34avnui3q
  key: |
    -----BEGIN RSA PRIVATE KEY-----
    MIIEowIBAAKCAQEA4KpLGy/BLbph55HMjWLxCO657DLQTk4o+WWPi1+5oeAUVgyh
    kdvPR22jn9HiAL9jKv7PR3/OdHSp/6E3d05htksI7Tct4M/eWVMGRIzoMJvpJ99e
    ZP7MtQT9yknbJDSJoibSwLmPoInnPE/WbcgrTKSAfNURK0bKw1tnLd85qt7zdLI3
    g6O/14Bsmf+ovGiQHP6oiTuC4l3D8eTLlKdSrRVqZXhdvslpZU8MtNB8pPHMB4GZ
    R6HccBi7TJY7kkNg+5flRBTdYL8bvaji3zxSlvawvet+bJmEtApkUoLnovLCviVp
    NVTJZb5iQxMJLZlDJJT/ruq+HMJ3PiiYFOjFVwIDAQABAoIBAQDNkiT9MFoj/Hpf
    SOKRsKn60W3gObKvJAeMBKkvD50tCHuzLQWeEDJ/GkxxDbwtkPItwlBqDQEdQC7Z
    UGwPR/JSuh/l5uqc3beHpleC3CgNamwSZunZoegv7uxGcAQMAeK6M6n+XQyWCflD
    D46Wj2VHUPKcxt1Z6wHXdchYifwbYwUNA3hOlRJK3ODgk/X6UjTGb3+gpY3qU4kX
    Iz5L1ekCSgVIPBFVwdZQUyUC7+iIySaK+qcmEEx/UwOZ6uxhcmRzca31cjeaRS4H
    pUjrl/aqLIW57E2MQ/vSzfQn7kEGBOrS0RjHZgq9u4Qdq6EkjHj3fenKpwWB7S1z
    4t0PpinJAoGBAPRmxAcCd88EhWh5HhN+RWjmXdDCOmZ0yXbxxVBTQtK5pPnP8I9A
    3Jd2ughHk7dFBvgKbHkVsyWgAk8zRZdD2hkQBOXvoeJF2scmvgFUBs1otf6xiFsf
    IC0I8A/wXn3IHmyrG7xmPAtHWKvTTAFg7IjIIofcX7cuzMeLXEUMvLQdAoGBAOtT
    wJCtPTNs4c3vhO4gba98c30U3tHmbLVKJXGEeZkSv3/ez5eIiYBJTzwLB2+ppy8j
    2lYsdkLvsoyKF3LUwyt0gsX+AU9DJ2dmSJZ3E67UHsY6+qog5QlYfWWD8mKWeE9L
    2r0rhG6l0WHR15LdvVc9MJ8e3YVUvNJJJJhQ2v0DAoGAAosXOyNxb7wST1YDVBya
    SE8tZsC+rtZESnKVpRJYvayk5NyfGj6IjSL1KKTmCqAzRF2HZ3MsXBXgMEbOUJaq
    LFyYUHQ/8QTdE/l5PLZNI9IVIsNiMeCPCyjuppvPv+tXNbZKIZnGwi9J4u/d+J2z
    mHDMuzE15cgc5W6z1Rwe0pkCgYBzRwvF05dvYZ8bqoGLxQb2OBi65UZhvGb0R+Yf
    va1zduOoWBWJPbFdzoup9h0mbg0f4ohKPm2QTKtCfUMPVXpmByUoqE0r7tGWrVxR
    mPNjaTXKFYpFXOfVtCt5VzGdaeh1r8rvcCnnqgLv0EOyBj2CRs9So2QQtHnq6Tms
    A6/C0QKBgAw8IsCnkNoZujCEOR/6ZHbK3eeyAs2yuJumsjYYosIGZ/bzsXTpfzAw
    bs45GZxrW67zB/0HA7bVWS9ZkCVflHI2uBCFofm+y55IAzg9/c1xYU19PA3KRxHZ
    D/yEDdXVK/lIzNt7kIMFhtoYGrwv1JQGfK5Wh2bi+AwbBDZ45/17
    -----END RSA PRIVATE KEY-----
  fingerprint: 97:84:f7:26:a3:7b:74:d0:bd:4e:08:a7:79:c9:d0:1d

loadBalancer:
  disableSecurityListManagement: false
  subnet1: ocid1.subnet.oc1.phx.aaaaaaaasa53hlkzk6nzksqfccegk2qnkxmphkblst3riclzs4rhwg7rg57q
  subnet2: ocid1.subnet.oc1.phx.aaaaaaaahuxrgvs65iwdz7ekwgg3l5gyah7ww5klkwjcso74u3e4i64hvtvq

(Note: key generated for the purposes of this example only)

Thoughts @owainlewis?

Support an alternative naming scheme for load balancers

Currently we make use of the service's UID. The problem with that is we can't use products like heptio ark to do cluster restores, since clients cannot set the UID of a resource (restricted by the api server). The problem is you cannot associate the load balancer back to the service in the new cluster without manual intervention to rename the load balancers. Alternatively, if we allow the load balancer to be named something like (<optional prefix>-)<namespace>-<service> we can associate an existing load balancer with a given service after a restore.

Proposal:

Add an optional configuration argument to use an alternate naming scheme. The default will remain the UID. If the flag is enabled then we'll first attempt to fetch the load balancer by name, and if that fails, attempt to fetch it by UID for backwards compatibility. That gives an upgrade path (albeit manual) for users to migrate existing LBs to the new naming scheme if they want.

cc @prydie

Allow specifying default source cidrs in the cloud config

GCE allows the CCM to specify the source CIDRs for load balancers with the --cloud-provider-gce-lb-src-cidrs flag.

This is useful when you want to have a more secure policy out of the box. By default, 0.0.0.0/0 is used, which is fine, but it'd be nice to default to something more secure like the VCN only.

CCM panics when there are no available backends

While testing my LB refactor I noticed an existing bug that answered a TODO about what happens if there are no backends (say all the workers are cordoned). I'm not sure the best approach here. We derive the backend port for security rules based on the backends. If there are no backends in the lbspec then how do we clean up security rules? The bug itself is here

I need to think on this some more.

stack trace

goroutine 126 [running]:
github.com/oracle/oci-cloud-controller-manager/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/Users/horwitz/gocode/src/github.com/oracle/oci-cloud-controller-manager/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x126
panic(0x1226b60, 0x1cdb5f0)
	/usr/local/Cellar/go/1.8.3/libexec/src/runtime/panic.go:489 +0x2cf
github.com/oracle/oci-cloud-controller-manager/pkg/oci.(*CloudProvider).updateBackendSets(0xc4206f4140, 0xc42049d5e0, 0xc421532c30, 0x2c, 0x13a9b58, 0x7, 0xc4201310e0, 0x0, 0x0, 0x0, ...)
	/Users/horwitz/gocode/src/github.com/oracle/oci-cloud-controller-manager/pkg/oci/load_balancer.go:248 +0xbdb
github.com/oracle/oci-cloud-controller-manager/pkg/oci.(*CloudProvider).EnsureLoadBalancer(0xc4206f4140, 0x13ae1a0, 0xa, 0xc4201310e0, 0x0, 0x0, 0x0, 0x14, 0x13baf66, 0x16)
	/Users/horwitz/gocode/src/github.com/oracle/oci-cloud-controller-manager/pkg/oci/load_balancer.go:204 +0x73c
github.com/oracle/oci-cloud-controller-manager/vendor/k8s.io/kubernetes/pkg/controller/service.(*ServiceController).ensureLoadBalancer(0xc42049ca80, 0xc4201310e0, 0xc4201310e0, 0x13a8a54, 0x6)
	/Users/horwitz/gocode/src/github.com/oracle/oci-cloud-controller-manager/vendor/k8s.io/kubernetes/pkg/controller/service/service_controller.go:354 +0xcc
github.com/oracle/oci-cloud-controller-manager/vendor/k8s.io/kubernetes/pkg/controller/service.(*ServiceController).createLoadBalancerIfNeeded(0xc42049ca80, 0xc4201bc400, 0x19, 0xc4201310e0, 0xc420654690, 0xc42034c5a0, 0xc420588ca8)
	/Users/horwitz/gocode/src/github.com/oracle/oci-cloud-controller-manager/vendor/k8s.io/kubernetes/pkg/controller/service/service_controller.go:291 +0x1ee
github.com/oracle/oci-cloud-controller-manager/vendor/k8s.io/kubernetes/pkg/controller/service.(*ServiceController).processServiceUpdate(0xc42049ca80, 0xc42157c1b0, 0xc4201310e0, 0xc4201bc400, 0x19, 0x0, 0x0, 0x0)

Race condition on updating security lists

We need to add etag support to updating security lists. Currently, multiple goroutines can trigger an update. Please see kubernetes/kubernetes#53462 for the upstream issue. The problem for us lies in the fact we update security lists as a whole instead of individual security rules in a list. This means that we have a race condition on updating the same subnet, which is fairly common since nodes typically don't span that many subnets.

Nginx SSL tutorial broken in 1.8

The service contains multiple frontend ports with the same target port which gives the following error as of 1.8:

❯ kubectl create -f examples/nginx-demo-svc-ssl.yaml
deployment "nginx-deployment" created
The Service "nginx-service" is invalid: spec.ports[1].targetPort: Duplicate value: intstr.IntOrString{Type:0, IntVal:80, StrVal:""}

Don't require compartment in config

The compartment can be inferred from the OCI instance metadata endpoint, so there's no strict requirement to supply it in the config. Similar code is used in the flexvolume provisioner.

Run integration tests in CI

We should configure CI to run the integration tests.

Perhaps we limit these to only run on commits to master due to resource constraints? That does mean that PRs would land prior to integration tests running, however.

Load balancer subnet/vnic calls cause rate limiting issues in compartments with many instances

I have a 4 node cluster spanning 3 subnets and tried to test out the load balancer; however, I'm not able to get the subnets for the backends. The most it could be querying is 5 subnets.

E0929 22:38:27.653040     429 servicecontroller.go:753] Failed to process service. Retrying in 5s: Failed to create load balancer for service kube-system/kube-proxy-lb: update backendsets: get subnets for backends: Status: 429; Code: TooManyRequests; OPC Request ID: /BDB7FC30514267810C3430ED2C1EECC7/50F507BACD75F997C166114A4376EA40; Message: User-rate limit exceeded.
E0929 22:38:35.222196     429 servicecontroller.go:753] Failed to process service. Retrying in 10s: Failed to create load balancer for service kube-system/kube-proxy-lb: update backendsets: get subnets for backends: Status: 429; Code: TooManyRequests; OPC Request ID: /61D80443877F748ED2D38497E3D9EA4D/B7319D392DE2C6949265AD4D000D43A0; Message: User-rate limit exceeded.
E0929 22:38:48.689485     429 servicecontroller.go:753] Failed to process service. Retrying in 20s: Failed to create load balancer for service kube-system/kube-proxy-lb: update backendsets: get subnets for backends: Status: 429; Code: TooManyRequests; OPC Request ID: /C5F87D4E2753F6F29740CBB8090E6A24/3C6840CED946BCB2F6F576145F8BB02D; Message: User-rate limit exceeded.
E0929 22:39:14.373435     429 servicecontroller.go:753] Failed to process service. Retrying in 40s: Failed to create load balancer for service kube-system/kube-proxy-lb: update backendsets: get subnets for backends: Status: 429; Code: TooManyRequests; OPC Request ID: /904203C75135820DEE733DEF44ED120C/5EECBB24663498699B7CF849EEF2575D; Message: User-rate limit exceeded.
E0929 22:40:05.747849     429 servicecontroller.go:753] Failed to process service. Retrying in 1m20s: Failed to create load balancer for service kube-system/kube-proxy-lb: update backendsets: get subnets for backends: Status: 429; Code: TooManyRequests; OPC Request ID: /80189FC1BC6EA79F8AD3EF2E639256E1/2F1CA3CC6D764781DBDF062682383359; Message: User-rate limit exceeded.
E0929 22:41:42.130380     429 servicecontroller.go:753] Failed to process service. Retrying in 2m40s: Failed to create load balancer for service kube-system/kube-proxy-lb: update backendsets: get subnets for backends: Status: 429; Code: TooManyRequests; OPC Request ID: /883324FBC55915B716067693BB54B33D/FDD6150CE3CEA50124E81781863D2F86; Message: User-rate limit exceeded.

`instances.ExternalID` tells the CCM to delete nodes that should not be deleted

I ran into a networking issue earlier that basically caused a bunch of connections to the API to fail. When I came back I noticed half my cluster was gone, but the servers themselves were still running (just firewalled off). Technically, since the state is still running, the CCM should not remove those nodes.

The culprit is here. We aren't checking the type of the error. We should only return cloudprovider.InstanceNotFound if the error is a not-found error.

cc @prydie

Update README to be more user centric

The README was originally written as more of a summary of the emerging out-of-tree provider landscape. With the release of 1.8 the docs have been updated and can be referenced for background information instead. The README should be updated to better cater to (dev)ops wanting to deploy the CCM.

Document supported Service annotations

The following service annotations should be documented comprehensively:

  • SSL Annotations (largely documented by SSL example already)
  • Shape annotation
  • Subnet annotations
  • Internal load balancer annotation

Cannot get to LB listener unless seclist rule exists

I created a load balancer service and was not able to hit the listener port. It turned out that my seclist attached to my LB subnets was not allowing ingress traffic into the listener port. Once I added a rule for it, things started flowing.

Perhaps the CCM should automatically poke a hole in the seclist for this. I do not know the behavior if the 2 subnets that the LB is using use different seclists. We may want to poke a hole for 0.0.0.0/0 by default, and support annotations to limit this.

Security rules are missing for healthcheck ports

I'm kinda confused how this currently works for anyone without these security rules, but we don't add security rules to allow 10256 (kube-proxy) or custom healthcheck ports set via .spec.healthCheckNodePort for services with local traffic policy.

Does the terraform you use for testing, @prydie, set up any security list rules by default to be more permissive? I know ours are set up that way, which is why I missed this.

Zones interface is not enabled

<name>   Ready     2m        v1.7.6    beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=VM.DenseIO1.8,beta.kubernetes.io/os=linux,kubernetes.io/hostname=<hostname>

The CCM didn't provide any useful errors but I did get this when I tried to manually add the labels via kubectl

\"failure-domain.beta.kubernetes.io/zone=niat:PHX-AD-1\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?'

The problem seems to be the : character in the zone value.

A node that's not found triggers rate limited infinite loop

While deploying a new cluster I mismatched my hostname with the display name, which caused the CCM to get into a rate-limited infinite loop, since the node controller's backoff retry is fairly fast compared to the rate limits on OCI's API.

We need to start rate limiting requests in the client, similar to what Azure does. We should also implement an additional delay on 429 responses to help the CCM recover.
