cloud-provider-azure's Introduction

Cloud provider for Azure

Introduction

This repository provides the Azure implementation of the Kubernetes cloud provider interface.

This is the "external" or "out-of-tree" cloud provider for Azure. The "in-tree" cloud provider has been deprecated since v1.20 and only bug fixes are allowed in its Kubernetes repository directory.

Current status

cloud-provider-azure has been GA since v1.0.0. Releases are available from the Microsoft Container Registry (MCR).

The latest releases of azure-cloud-controller-manager and azure-cloud-node-manager can be found at:

  • mcr.microsoft.com/oss/kubernetes/azure-cloud-controller-manager:v1.30.4
  • mcr.microsoft.com/oss/kubernetes/azure-cloud-node-manager:v1.30.4

Version matrix

(Minor release versions match Kubernetes minor release versions.)

Kubernetes version    cloud-provider version    cloud-provider branch
master                N/A                       master
v1.y.x                v1.y.z                    release-1.y
v1.30.x               v1.30.z                   release-1.30
v1.29.x               v1.29.z                   release-1.29
v1.28.x               v1.28.z                   release-1.28
v1.27.x               v1.27.z                   release-1.27

AKS version matrix

The table below shows the cloud-controller-manager and cloud-node-manager versions supported in Azure Kubernetes Service (AKS).

AKS version    cloud-controller-manager version    cloud-node-manager version
v1.30.x        v1.30.4                             v1.30.0
v1.29.x        v1.29.8                             v1.29.4
v1.28.x        v1.28.10                            v1.28.9
v1.27.x        v1.27.18                            v1.27.17

Build

To build the binary for azure-cloud-controller-manager:

make all

To build the Docker image for azure-cloud-controller-manager:

IMAGE_REGISTRY=<registry> make image
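
For example, assuming a hypothetical Azure Container Registry named example.azurecr.io:

IMAGE_REGISTRY=example.azurecr.io make image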

For detailed directions on image building, please read here.

Run

To run azure-cloud-controller-manager locally:

azure-cloud-controller-manager \
    --cloud-provider=azure \
    --cluster-name=kubernetes \
    --controllers=*,-cloud-node \
    --cloud-config=/etc/kubernetes/cloud-config/azure.json \
    --kubeconfig=/etc/kubernetes/kubeconfig \
    --allocate-node-cidrs=true \
    --configure-cloud-routes=true \
    --cluster-cidr=10.240.0.0/16 \
    --route-reconciliation-period=10s \
    --leader-elect=true \
    --secure-port=10267 \
    --v=2
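
The --cloud-config flag above points at the Azure cloud provider configuration file. Below is a minimal sketch of such an azure.json, assuming service principal authentication; all values are placeholders and the exact set of fields required depends on the cluster setup:

{
  "cloud": "AzurePublicCloud",
  "tenantId": "<tenant-id>",
  "subscriptionId": "<subscription-id>",
  "aadClientId": "<client-id>",
  "aadClientSecret": "<client-secret>",
  "resourceGroup": "<resource-group>",
  "location": "<location>",
  "vmType": "vmss",
  "vnetName": "<vnet-name>",
  "subnetName": "<subnet-name>",
  "securityGroupName": "<nsg-name>",
  "routeTableName": "<route-table-name>",
  "loadBalancerSku": "standard",
  "useInstanceMetadata": true
}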

To run azure-cloud-node-manager locally:

azure-cloud-node-manager \
    --node-name=$(hostname) \
    --wait-routes=true

It is recommended to run azure-cloud-controller-manager as a Deployment with multiple replicas, or directly with kubelet as static Pods on each control plane Node. See here for an example.

See Deploy Cloud Controller Manager for more details.

E2E tests

Please refer to the e2e test documentation for information on running the e2e tests.

Documentation

Refer to https://cloud-provider-azure.sigs.k8s.io/ for the Cloud Provider Azure documentation (documents are hosted in the documentation branch).

Contributing

Please see CONTRIBUTING.md for instructions on how to contribute.

Code of conduct

Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.

License

Apache License 2.0.


cloud-provider-azure's Issues

A storageAccount API call blocks kubelet from becoming Ready when there isn't any outbound IP

Running k8s v1.11.5
Following this flow, it was discovered that an API call to the ARM API will block kubelet from properly registering the node as Ready for 10 minutes.

  1. We create an AS with 10 VMs; they all get an outbound IP defined by the AS outbound IP.
  2. We install kubelet; kubelet starts registering to the ARM API using the public IP (outbound IP).
  3. Eventually, all nodes are registered in k8s.
  4. A user creates a Service of type LoadBalancer; the outbound IP of the nodes changes to the frontend IP of that LoadBalancer service.
  5. Deallocate a VM; the node disappears from the backend pool and becomes NotReady. It doesn't have an outbound IP defined anymore.
  6. The VM is started again. The kubelet service gets stuck trying to talk to the ARM API.
.222271    2413 azure_auth.go:59] azure: using managed identity extension to retrieve the access token
.222303    2413 azure.go:219] Azure cloudprovider (read ops) using rate limit config: QPS=25, bucket=200
.222370    2413 azure.go:223] Azure cloud provider (write ops) using rate limit config: QPS=10, bucket=100
.222483    2413 azure.go:280] Azure cloud provider using retry backoff: retries=6, exponent=1.500000, duration=6, jitter=1.000000
  7. Eventually (after 10 minutes), the kubelet registers again. There is a call to the ARM API made by azure_blobDiskController.go:70 which blocks for 10 minutes before letting kubelet complete its startup logic and mark the node as Ready.
azure_blobDiskController.go:70] azureDisk - getAllStorageAccounts error: storage.AccountsClient#ListByResourceGroup: Failure sending request: StatusCode=0 -- Original Error: Get https://management.azure.com/subscriptions/<subID>/resourceGroups/<resourceGroup>/providers/Microsoft.Storage/storageAccounts?api-version=2017-10-01:  dial tcp 52.232.180.115:443: i/o timeout
.610372    2413 server.go:526] Successfully initialized cloud provider: "azure" from the config file: "/etc/kubernetes/azure.json"
.610414    2413 server.go:772] cloud provider determined the current node name to be kn-es-11
.661299    2413 bootstrap.go:52] Kubeconfig /var/lib/kubelet/kubeconfig exists and is valid, skipping bootstrap
  8. After that, the kube-controller-manager adds the node back to the SLB backend pool since it has become Ready, giving back the outbound IP to the VM.
I0131 03:25:26.425690       1 node_lifecycle_controller.go:808] ReadyCondition for Node kn-es-11 transitioned from &NodeCondition{Type:Ready,Status:False,LastHeartbeatTime:2019-01-31 03:24:18 +0000 UTC,LastTransitionTime:2019-01-31 03:24:18 +0000 UTC,Reason:KubeletNotReady,Message:container runtime is down,runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: Kubenet does not have netConfig. This is most likely due to lack of PodCIDR,} to &NodeCondition{Type:Ready,Status:True,LastHeartbeatTime:2019-01-31 03:25:18 +0000 UTC,LastTransitionTime:2019-01-31 03:25:18 +0000 UTC,Reason:KubeletReady,Message:kubelet is posting ready status,}
I0131 03:25:26.425755       1 node_lifecycle_controller.go:816] Node kn-es-11 ReadyCondition updated. Updating timestamp.
I0131 03:25:33.377721       1 service_controller.go:639] Detected change in list of current cluster nodes. New node set: map[kn-es-11:{} kn-default-5:{} kn-default-9:{} kn-default-4:{} kn-default-7:{} kn-es-14:{} kn-es-16:{} kn-infra-1:{} kn-es-15:{} kn-default-10:{} kn-default-6:{} kn-default-2:{} kn-es-12:{} kn-default-12:{} kn-default-11:{} kn-infra-2:{} kn-default-1:{} kn-default-3:{} kn-infra-0:{} kn-default-0:{} kn-default-8:{} kn-es-13:{}]

Updated load balancer configuration by Service Controller

Hi,
We have a Kubernetes cluster, version 1.9.2, deployed using an acs-engine template.
We are using nginx ingress as a load balancer for our services.
A few times we have observed that all the services go down at once and come back within a few seconds. No pod restarts are observed during that time.
When we checked the Kubernetes events, we stumbled upon an event that read "Load balancer configuration updated by Service Controller".
Please help us understand whether this is what is making all the services go down, or whether there might be some other reason.

Documentation for `AZURE_ENVIRONMENT_FILEPATH`

Why is this needed
When using the cloud provider on Azure Stack, the API services are available at different endpoints than in Azure Public Cloud. These endpoints need to be provided to the azure-sdk-for-go in an additional JSON file.

Usually this file is set via the environment variable AZURE_ENVIRONMENT_FILEPATH, with a value such as /etc/kubernetes/azurestackcloud.json. The file has a specific format and a set of parameters available for configuration.

This is yet to be documented, and other than reverse engineering the AKS Engine it would be impossible to "guess" how this works.

Describe the solution you'd like in detail
Create documentation describing the AZURE_ENVIRONMENT_FILEPATH and its contents.
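
As a rough, hedged illustration only (field names follow the azure-sdk-for-go Environment type and should be verified against the SDK version in use; the endpoint values are placeholders), such a file might look like:

{
  "name": "AzureStackCloud",
  "resourceManagerEndpoint": "https://management.<region>.<fqdn>/",
  "activeDirectoryEndpoint": "https://login.microsoftonline.com/",
  "graphEndpoint": "https://graph.windows.net/",
  "storageEndpointSuffix": "<region>.<fqdn>",
  "keyVaultDNSSuffix": "vault.<region>.<fqdn>"
}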

[e2e] Switch current Perl-based presubmit e2e tests to kubetest

The current e2e tests (job name: pull-cloud-provider-azure-e2e) that we run for cloud-provider presubmit PRs are based on Perl scripts. We should switch them to kubetest.

Work items:

  • Replace the current Perl script with kubetest in make test-e2e
  • Update docs of how to run it locally in e2e-tests.md
  • Add a new page for pull-cloud-provider-azure-e2e in testgrid

How to replace aadClientSecret?

Our current aadClientSecret expired and we have created a new one. However, after updating the cloud config file and restarting the kubelets we still see an error:

Failed to provision volume with StorageClass "standard": azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://xxxx/disks/kubernetes-dynamic-pvc-....?api-version=2017-03-30: StatusCode=401 -- Original Error: adal: Refresh request failed. Status Code = '401'. Response body: {"error":"invalid_client","error_description":"AADSTS7000215: Invalid client secret is provided....}

Manually logging in with the credentials from the cloud config works fine. Is there any kind of cache that needs to be cleared?
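
For reference, a sketch of the relevant fields in the cloud config (values are placeholders); the new secret must belong to the same aadClientId:

{
  "tenantId": "<tenant-id>",
  "subscriptionId": "<subscription-id>",
  "aadClientId": "<existing-client-id>",
  "aadClientSecret": "<new-client-secret>"
}

Note that components other than kubelet (e.g. kube-controller-manager) also read this file and need a restart to pick up the new value.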

[e2e] Refine periodic job for basic conformance tests

The current periodic job azure-master-conformance is also based on Perl scripts. We should switch it to kubetest.

Note: the job is running e2e conformance tests from Kubernetes repo.

  • Switch the current Perl script method to kubetest
  • Rename current job azure-master-conformance to cloud-provider-azure-conformance

External kubernetes dependencies

This is the meta-issue for tracking all the external dependencies that are blockers for a fully standalone Azure cloud provider.

Cross-reference: the KEP for the out-of-tree Azure cloud provider.

API throttling

The instance metadata service (IMDS) could help reduce API throttling issues and speed up node initialization. This is especially helpful for large clusters.

But with CCM this is not possible anymore, because that functionality has been moved to the cloud controller manager. We should add it back into the kubelet.

This issue is being tracked under kubernetes/cloud-provider#30 and the KEP for supporting IMDS is kubernetes/enhancements#1158.

Credential provider

The Azure credential provider is still required (set via the kubelet flag --azure-container-registry-config).
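
For example, a sketch of the kubelet flags involved (the config path is an assumption; point it at whichever cloud config file holds the ACR credentials):

kubelet \
    --cloud-provider=external \
    --azure-container-registry-config=/etc/kubernetes/azure.json \
    ...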

AzureDisk and AzureFile CSI drivers

AzureDisk and AzureFile volume plugins are still in-tree, but even with --external-cloud-volume-plugin=azure configured on kube-controller-manager, AzureDisk and AzureFile are still not working. See kubernetes/kubernetes#71018 for explanations.

So when using CCM, the CSI drivers should always be used. The CSI drivers are tracked in separate repos:

1. CSI on Windows (alpha in k8s v1.18) (in progress)

2. CSI Driver Migration (in-tree driver to CSI driver) (alpha in k8s v1.18) (in progress)

3. CSI drivers support on Windows

4. CSI driver integration with aks-engine (done)

Document disableOutboundSNAT

disableOutboundSNAT was added in PR kubernetes/kubernetes#75282 and is used together with the standard load balancer:

Allow disable outbound SNAT when Azure standard load balancer is used together with outbound rules. 

It is supported in v1.11.9, v1.12.7, v1.13.5 and v1.14.0.

We should update the docs for it.
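
Until the docs are updated, a minimal sketch of the relevant cloud config fields (assuming a standard load balancer with outbound rules configured):

{
  "loadBalancerSku": "standard",
  "disableOutboundSNAT": true
}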

Hitting rate limit for "NicGet"

What happened:
We have a ~150-node cluster running Kubernetes (the hard way). We are using an Azure LB managed by the Kubernetes controller manager. It works fine when we only use the public IP LB, but once we create a private IP LB Kubernetes service, it creates extra queries for all the NICs of the nodes in the backend pool of the internal load balancer (~150 extra). This causes us to hit the rate limit on NicGet. Here is an example of an error message we get from Kubernetes when it fails to create the LB:

rate limited(read) for operation:NicGet", ensure(namespace/haproxy-tcp): backendPoolID(/subscriptions/ID/resourceGroups/NAME/providers/Microsoft.Network/loadBalancers/NAME/backendAddressPools/NAME) - failed to ensure host inpool: "azure - cloud provider 

The azure.json cloud provider config that we are using with Kubernetes is

  "cloudProviderBackoff": true,
  "cloudProviderBackoffDuration": 6,
  "cloudProviderBackoffExponent": 1.5,
  "cloudProviderBackoffJitter": 1,
  "cloudProviderBackoffRetries": 6,
  "cloudProviderRateLimit": true,
  "cloudProviderRateLimitBucket": 200,
  "cloudProviderRateLimitBucketWrite": 100,
  "cloudProviderRateLimitQPS": 25,
  "cloudProviderRateLimitQPSWrite": 10

We plan to move to 400-node support in the future, so we need to find a way around this limitation.

What you expected to happen:
Being able to use a private IP and public IP LB without hitting the rate limit

How to reproduce it:

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.13.5
  • OS (e.g. from /etc/os-release): CentOS 7.5
  • Kernel (e.g. uname -a): 3.10.0-862.14.4.el7.x86_64
  • Install tools: internal tools
  • Others:

Add validations for Azure node resources

Add validations for Azure node resources, which include:

  • Validate that route tables are set correctly
  • Validate that the node's providerID is set correctly (note: VMSS and VMAS formats are different)
  • Validate the node's publicIP
  • Validate the node's privateIP
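
As an illustration of the providerID check (a sketch; the placeholders are illustrative):

# Inspect the providerID reported for a node
kubectl get node <node-name> -o jsonpath='{.spec.providerID}'

# Expected formats:
#   VMAS: azure:///subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Compute/virtualMachines/<vm-name>
#   VMSS: azure:///subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Compute/virtualMachineScaleSets/<vmss-name>/virtualMachines/<instance-id>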

/kind testing
/milestone v1.16

Improve Attach Detach Disk Performance

On my cluster, disks can take up to 10 minutes to attach to a node after a failover, because it takes too long to detach from the old node before attaching to the new one. Is there anything that can be done to fix this performance issue?

[e2e] Add periodic jobs for Kubernetes conformance tests

The core code of the Azure cloud provider is still hosted in the kubernetes repo (we vendor the code here), hence we should set up periodic jobs for it.

Work items:

  • Add new job 'ci-kubernetes-e2e-conformance' (kube-controller-manager should be used in this case)
  • Add the job to testgrid pages

API version in Azure SDK is too restrictive

Is your feature request related to a problem?/Why is this needed
The azure-sdk-for-go used in the azure-cloud-provider has a single API version implementation for each of the compute, network and storage services. For example, the version for compute is compute/mgmt/2018-10-01.

This is very restrictive: when using the provider on Azure Stack, which does not support the latest API versions, operations such as registering VMs with the LoadBalancer fail to make the API calls.

Describe the solution you'd like in detail
It would be beneficial to include slightly older API versions as well as the latest one, for compatibility with Azure Stack. It would be even better if the versions used for the API calls were configurable via the AZURE_ENVIRONMENT_FILEPATH file where the endpoints are specified.

Describe alternatives you've considered
As a shortcut, we have compiled the provider after replacing compute API versions from 2018-10-01 to 2017-03-30, without actually including the real 2017-03-30 code. This is a dirty hack, but for the purpose of the provider, it seems to do the job.

Pre-existing nodes will be removed from the LB's backend on deallocation, causing them to lose their outbound IP at boot-up

Running k8s v1.11.5
Following this flow

  1. We create an AS with 10 VMs; they all get an outbound IP defined by the AS outbound IP.
  2. We install kubelet; kubelet starts registering to the ARM API using the public IP (outbound IP).
  3. Eventually, all nodes are registered in k8s.
  4. A user creates a Service of type LoadBalancer; the outbound IP of the nodes changes to the frontend IP of that LoadBalancer service.
  5. Deallocate a VM; the node disappears from the backend pool and becomes NotReady. It doesn't have an outbound IP defined anymore.

The steps above take some time; more than 24h, I would say.

Here's a snapshot of the k8s LB backend pool after a while. As you can see, we should have kn-es-0, kn-es-1, kn-es-2, etc. in the list, and many more. Maybe this is the expected behavior, but it causes the node to not have a valid outbound IP when it gets allocated again.

(screenshot: k8s LB backend pool, 2019-01-31 06:59:56)

Enable in-tree volume e2e tests

PR #68 disables in-tree volume (AzureDisk) e2e tests because the Azure cloud config is required to pass them.

We should figure out how to get the Azure cloud config for e2e clusters and enable those tests after that.

Switch glide to go modules for dependency management

We should switch from glide to Go modules for vendor management. Kubernetes's staging packages should be handled carefully (e.g. refer to scripts/update-dependencies.sh).

  • Replace glide with Go modules
  • Update scripts/update-dependencies.sh
  • Update docs dependency-management.md
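
A rough sketch of the switch (the module path is an assumption, and Kubernetes staging packages would still need to be pinned, e.g. via replace directives):

go mod init sigs.k8s.io/cloud-provider-azure
go mod tidy
go mod vendor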

/help

Enable e2e tests for Azure

We should set up e2e tests for this repo:

  • Setup e2e tests infra on Azure
  • Enable e2e tests for PRs
  • Switch to test-infra for presubmit e2e tests (presubmit tests are running for pull requests in cloud-provider-azure repo) @ritazh
    • Add e2e steps to setup test-infra via Makefile
    • Update presubmit tests to the new way
    • Remove current perl scripts
  • Enable full periodic e2e tests for stable releases (periodic tests run in the background for each release branch)
    • Add more test scenarios, e.g. conformance, correctness, alpha-features, autoscaling, scalability, multi-zone, slow and serial
    • Enable e2e for more releases, e.g. v1.13 and v1.12
    • Setup prow jobs for new test scenarios
  • Add Azure features e2e testings #7
    • LoadBalancer service tests with various annotations
    • ACR image pulling tests without docker secrets setting explicitly
    • Multi availability zones tests
    • Cloud provider configuration options tests (e.g. Standard LoadBalancer and VMSS)
    • Upgrading tests (e.g. cloud-provider version updates)
  • Documentation for e2e @ritazh

Document versions for load balancer annotations

Is your feature request related to a problem?/Why is this needed

/kind docs

Describe the solution you'd like in detail

Not all annotations are available in all versions; we should document the supported Kubernetes versions for each annotation.

Describe alternatives you've considered

Additional context

[sig-storage] CSI mock volume CSI volume limit information using mock driver should report attach limit when limit is bigger than 0 7m57s

/kind failing-tests

What happened:

The following tests are failing constantly:

[sig-storage] CSI mock volume CSI volume limit information using mock driver should report attach limit when limit is bigger than 0 7m57s

test/e2e/storage/csi_mock_volume.go:352
while waiting for max volume condition on pod : &Pod{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:pvc-volume-tester-nwdfn,GenerateName:pvc-volume-tester-,Namespace:csi-mock-volumes-8517,SelfLink:/api/v1/namespaces/csi-mock-volumes-8517/pods/pvc-volume-tester-nwdfn,UID:4a27f9f3-4e2f-11e9-b579-000d3a0385b1,ResourceVersion:7235,Generation:0,CreationTimestamp:2019-03-24 12:21:02 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,ManagedFields:[],},Spec:PodSpec{Volumes:[{my-volume {nil nil nil nil nil nil nil nil nil PersistentVolumeClaimVolumeSource{ClaimName:pvc-kbvbc,ReadOnly:false,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}} {default-token-z4d92 {nil nil nil nil nil &SecretVolumeSource{SecretName:default-token-z4d92,Items:[],DefaultMode:*420,Optional:nil,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}}],Containers:[{volume-tester k8s.gcr.io/pause:3.1 [] []  [] [] [] {map[] map[]} [{my-volume false /mnt/test  <nil> } {default-token-z4d92 true /var/run/secrets/kubernetes.io/serviceaccount  <nil> }] [] nil nil nil /dev/termination-log File Always nil false false false}],RestartPolicy:Never,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{attach-limit-csi-csi-mock-volumes-8517: csi-mock-volumes-8517,},ServiceAccountName:default,DeprecatedServiceAccount:default,NodeName:,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[],},ImagePullSecrets:[],Hostname:,Subdomain:,Affinity:nil,SchedulerName:default-scheduler,InitContainers:[],AutomountServiceAccountToken:nil,Tolerations:[{node.kubernetes.io/not-ready Exists  NoExecute 0xc001a2eff0} {node.kubernetes.io/unreachable Exists  NoExecute 0xc001a2f010}],HostAliases:[],PriorityClassName:,Priority:*0,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[],RuntimeClassName:nil,EnableServiceLinks:*true,},Status:PodStatus{Phase:Pending,Conditions:[],Message:,Reason:,HostIP:,PodIP:,StartTime:<nil>,ContainerStatuses:[],QOSClass:BestEffort,InitContainerStatuses:[],NominatedNodeName:,},}
Unexpected error:
    <*errors.errorString | 0xc0002bd3e0>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred
test/e2e/storage/csi_mock_volume.go:382

See logs at https://gubernator.k8s.io/build/kubernetes-jenkins/pr-logs/pull/cloud-provider-azure/125/pull-cloud-provider-azure-e2e/79#sig-storage-csi-mock-volume-csi-volume-limit-information-using-mock-driver-should-report-attach-limit-when-limit-is-bigger-than-0.

What you expected to happen:

How to reproduce it:

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

Add validations for Azure credential provider

Add validations for the Azure credential provider, so that both private and public images from ACR can be pulled without setting docker secrets explicitly.

/kind testing
/milestone v1.16

Typo

docs/cloud-provider-config.md
s/Descriiption/Description/

"excludeMasterFromStandardLB": false doesn't work

When setting "excludeMasterFromStandardLB": false for a 3-node K8s 1.13.1 cluster deployed in Azure, the load balancer created in Azure never includes the master node.

Note that I have removed the taint from the master node as well, but it still doesn't work:
kubectl taint nodes --all node-role.kubernetes.io/master-

Repro steps:

  1. Use the newest kubeadm to deploy a multi-zone cluster with 3 VMs, where k8s-01 is the master node and k8s-02 and k8s-03 are agent nodes.
  2. Use helm to deploy an nginx-ingress controller with 3 replicas
    D:>kubectl get pod -o wide
    NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
    k8ssea-nginx-ingress-controller-5b665b585d-k5r82 1/1 Running 0 5m18s 192.168.2.8 k8s-03
    k8ssea-nginx-ingress-controller-5b665b585d-l4lvh 1/1 Running 0 5m18s 192.168.1.9 k8s-02
    k8ssea-nginx-ingress-controller-5b665b585d-vxv4z 1/1 Running 0 5m18s 192.168.0.9 k8s-01
    k8ssea-nginx-ingress-default-backend-79b9979997-ljvh7 1/1 Running 0 5m18s 192.168.2.7 k8s-03
  3. Checked the load balancer resource in the Azure portal; it shows only k8s-02 and k8s-03 in the backend pool.
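
For reference, a sketch of the cloud config fields involved (values as used in this report):

{
  "loadBalancerSku": "standard",
  "excludeMasterFromStandardLB": false
}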

Create a SECURITY_CONTACTS file.

As per the email sent to kubernetes-dev[1], please create a SECURITY_CONTACTS
file.

The template for the file can be found in the kubernetes-template repository[2].
A description for the file is in the steering-committee docs[3], you might need
to search that page for "Security Contacts".

Please feel free to ping me on the PR when you make it, otherwise I will see when
you close this issue. :)

Thanks so much, let me know if you have any questions.

(This issue was generated from a tool, apologies for any weirdness.)

[1] https://groups.google.com/forum/#!topic/kubernetes-dev/codeiIoQ6QE
[2] https://github.com/kubernetes/kubernetes-template-project/blob/master/SECURITY_CONTACTS
[3] https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance-template-short.md

Setup MCR for cloud-provider-azure image

As the external cloud provider is used as a container image, we should set up MCR for publishing it publicly when doing our first release.

The image should probably be built from automated pipelines (e.g. Jenkins).

Add documentation of kube-controller-manager

Add documentation on how to use Azure in kube-controller-manager, e.g.

  • What should be configured when provisioning kube-controller-manager
  • Document that kubelet/kube-apiserver should also be configured with the same settings
  • Link to the docs created in #5 for cloud-config

Add validations for Azure standard loadbalancer

Add validations for Azure standard load balancer, which include:

  • loadBalancerSku should be standard
  • all nodes in different agent pools should be added to the SLB backends
  • Pods' outbound IPs should be the same as those configured in the SLB outbound rules

/kind testing
/milestone v1.16

E2e tests for service annotation service.beta.kubernetes.io/azure-load-balancer-mode

Sub-item of #7: E2e tests for service annotation service.beta.kubernetes.io/azure-load-balancer-mode.

Pre-requirements:

  • A Kubernetes cluster with at least two vmss agent pools
  • LoadBalancerSku is basic

Validation workflow:

  • Get a list of all nodes and their providerIDs
  • ProviderID example: azure:///subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.Compute/virtualMachineScaleSets/<vmss-name>/virtualMachines/1
  • Get all the vmss names from all node's providerIDs (e.g. <vmss-name> in above example)
  • Skip if there are no vmss nodes
  • Choose two vmss names and validate the following steps for them:
    • Create a deployment (e.g. name: validate-lb-mode, image: nginx) and a LoadBalancer service with annotation service.beta.kubernetes.io/azure-load-balancer-mode (value is vmss name)
    • Wait and get the service's public IP address (suppose it's pip1)
    • Invoke the Azure network client and list all public IPs, filtering for pip1
    • Get the ALB name from pip1's ipConfiguration
    • Get the ALB by name
    • Get the backend address pools and get the vmss name from the list of nodes (skip if there are no vmss network interfaces)
    • Check the vmss name; it should be the same as the node's vmss name
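
A minimal sketch of the deployment/service creation step above, using kubectl (the names come from the workflow and the vmss value is a placeholder):

kubectl create deployment validate-lb-mode --image=nginx
kubectl expose deployment validate-lb-mode --type=LoadBalancer --port=80
kubectl annotate service validate-lb-mode \
    service.beta.kubernetes.io/azure-load-balancer-mode=<vmss-name>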

Add more e2e test cases

Azure-related features should be tested in e2e tests, which include:

  • Different authz methods
  • Different load balancer behavior with annotations
  • Verification of various resources
    • Routes
    • Node's externalID
    • Node's publicIP
    • NSGs
  • ACR image pulling tests without docker secrets setting explicitly
  • Multi availability zones tests
  • Cloud provider configuration options tests (e.g. Standard LoadBalancer and VMSS)
  • Upgrading tests (e.g. cloud-provider version updates)

Also add tests for Azure persistent storage, e.g.

  • AzureFile
  • AzureDisk

Expanding a node pool with new nodes fails when there is a public LB associated with some members of the node pool's AS

Running k8s v1.11.5
Following this flow,

  1. We create an AS with 10 VMs; they all get an outbound IP defined by the AS outbound IP.
  2. We install kubelet; kubelet starts registering to the ARM API using the public IP (outbound IP).
  3. Eventually, all nodes are registered in k8s.
  4. A user creates a Service of type LoadBalancer; the outbound IP of the nodes changes to the frontend IP of that LoadBalancer service.
  5. We add a new VM to the AS (total count 11 VMs); the VM comes up with no outbound IP since it isn't part of the LB's backend pool.
  6. Kubelet will fail to communicate with the ARM API, preventing the node from ever becoming healthy.

I'm not sure if there could be a way to allow traffic to the ARM API and the other URIs required by kubelet to not use the public IP, and simply route internally within the Azure network, like we can do with https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-service-endpoints-overview.
