cloud-provider-azure's Introduction

Cloud provider for Azure

Introduction

This repository provides the Azure implementation of the Kubernetes cloud provider interface.

This is the "external" or "out-of-tree" cloud provider for Azure. The "in-tree" cloud provider has been deprecated since v1.20 and only bug fixes are allowed in its Kubernetes repository directory.

Current status

cloud-provider-azure has been GA since v1.0.0. Releases are available from the Microsoft Container Registry (MCR).

The latest releases of azure-cloud-controller-manager and azure-cloud-node-manager can be found at:

  • mcr.microsoft.com/oss/kubernetes/azure-cloud-controller-manager:v1.30.4
  • mcr.microsoft.com/oss/kubernetes/azure-cloud-node-manager:v1.30.4

Version matrix

(Minor release versions match Kubernetes minor release versions.)

Kubernetes version    cloud-provider version    cloud-provider branch
master                N/A                       master
v1.y.x                v1.y.z                    release-1.y
v1.30.x               v1.30.z                   release-1.30
v1.29.x               v1.29.z                   release-1.29
v1.28.x               v1.28.z                   release-1.28
v1.27.x               v1.27.z                   release-1.27

AKS version matrix

The table below shows the cloud-controller-manager and cloud-node-manager versions supported in Azure Kubernetes Service (AKS).

AKS version    cloud-controller-manager version    cloud-node-manager version
v1.30.x        v1.30.4                             v1.30.0
v1.29.x        v1.29.8                             v1.29.4
v1.28.x        v1.28.10                            v1.28.9
v1.27.x        v1.27.18                            v1.27.17

Build

To build the binary for azure-cloud-controller-manager:

make all

To build the Docker image for azure-cloud-controller-manager:

IMAGE_REGISTRY=<registry> make image
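
For example, assuming a hypothetical Azure Container Registry named example.azurecr.io:

IMAGE_REGISTRY=example.azurecr.io make image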

For detailed directions on image building, please read here.

Run

To run azure-cloud-controller-manager locally:

azure-cloud-controller-manager \
    --cloud-provider=azure \
    --cluster-name=kubernetes \
    --controllers=*,-cloud-node \
    --cloud-config=/etc/kubernetes/cloud-config/azure.json \
    --kubeconfig=/etc/kubernetes/kubeconfig \
    --allocate-node-cidrs=true \
    --configure-cloud-routes=true \
    --cluster-cidr=10.240.0.0/16 \
    --route-reconciliation-period=10s \
    --leader-elect=true \
    --secure-port=10267 \
    --v=2
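
The --cloud-config flag above points at the Azure cloud provider configuration file. Below is a minimal sketch of such an azure.json, assuming service principal authentication; all values are placeholders and the exact set of fields required depends on the cluster setup:

{
  "cloud": "AzurePublicCloud",
  "tenantId": "<tenant-id>",
  "subscriptionId": "<subscription-id>",
  "aadClientId": "<client-id>",
  "aadClientSecret": "<client-secret>",
  "resourceGroup": "<resource-group>",
  "location": "<location>",
  "vmType": "vmss",
  "vnetName": "<vnet-name>",
  "subnetName": "<subnet-name>",
  "securityGroupName": "<nsg-name>",
  "routeTableName": "<route-table-name>",
  "loadBalancerSku": "standard",
  "useInstanceMetadata": true
}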

To run azure-cloud-node-manager locally:

azure-cloud-node-manager \
    --node-name=$(hostname) \
    --wait-routes=true

It is recommended to run azure-cloud-controller-manager as a Deployment with multiple replicas, or directly with kubelet as static Pods on each control plane Node. See here for an example.

See Deploy Cloud Controller Manager for more details.

E2E tests

Please refer to the e2e test documentation for information on running the e2e tests.

Documentation

Refer to https://cloud-provider-azure.sigs.k8s.io/ for the Cloud Provider Azure documentation (documents are hosted in the documentation branch).

Contributing

Please see CONTRIBUTING.md for instructions on how to contribute.

Code of conduct

Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.

License

Apache License 2.0.


cloud-provider-azure's Issues

A storageAccount API call blocks kubelet from becoming Ready when there isn't any outbound IP

Running k8s v1.11.5
Following this flow, it was discovered that an API call to the ARM API will block kubelet from properly registering the node as Ready for 10 minutes.

  1. We create an AS with 10 VMs; they all get an outbound IP defined by the AS outbound IP.
  2. We install kubelet; kubelet starts registering to the ARM API using the public IP (outbound IP).
  3. Eventually, all nodes are registered in k8s.
  4. A user creates a Service of type LoadBalancer; the outbound IP of the nodes changes to the frontend IP of that LoadBalancer service.
  5. Deallocate a VM; the node disappears from the backend pool and becomes NotReady. It doesn't have an outbound IP defined anymore.
  6. The VM is started again. The kubelet service gets stuck trying to talk to the ARM API.
.222271    2413 azure_auth.go:59] azure: using managed identity extension to retrieve the access token
.222303    2413 azure.go:219] Azure cloudprovider (read ops) using rate limit config: QPS=25, bucket=200
.222370    2413 azure.go:223] Azure cloud provider (write ops) using rate limit config: QPS=10, bucket=100
.222483    2413 azure.go:280] Azure cloud provider using retry backoff: retries=6, exponent=1.500000, duration=6, jitter=1.000000
  7. Eventually (after 10 minutes), the kubelet registers again. There is a call to the ARM API made by azure_blobDiskController.go:70 which blocks for 10 minutes before letting kubelet complete its startup logic and mark the node as Ready.
azure_blobDiskController.go:70] azureDisk - getAllStorageAccounts error: storage.AccountsClient#ListByResourceGroup: Failure sending request: StatusCode=0 -- Original Error: Get https://management.azure.com/subscriptions/<subID>/resourceGroups/<resourceGroup>/providers/Microsoft.Storage/storageAccounts?api-version=2017-10-01:  dial tcp 52.232.180.115:443: i/o timeout
.610372    2413 server.go:526] Successfully initialized cloud provider: "azure" from the config file: "/etc/kubernetes/azure.json"
.610414    2413 server.go:772] cloud provider determined the current node name to be kn-es-11
.661299    2413 bootstrap.go:52] Kubeconfig /var/lib/kubelet/kubeconfig exists and is valid, skipping bootstrap
  8. After that, the kube-controller-manager adds the node back to the SLB backend pool since it has become Ready, giving back the outbound IP to the VM.
I0131 03:25:26.425690       1 node_lifecycle_controller.go:808] ReadyCondition for Node kn-es-11 transitioned from &NodeCondition{Type:Ready,Status:False,LastHeartbeatTime:2019-01-31 03:24:18 +0000 UTC,LastTransitionTime:2019-01-31 03:24:18 +0000 UTC,Reason:KubeletNotReady,Message:container runtime is down,runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: Kubenet does not have netConfig. This is most likely due to lack of PodCIDR,} to &NodeCondition{Type:Ready,Status:True,LastHeartbeatTime:2019-01-31 03:25:18 +0000 UTC,LastTransitionTime:2019-01-31 03:25:18 +0000 UTC,Reason:KubeletReady,Message:kubelet is posting ready status,}
I0131 03:25:26.425755       1 node_lifecycle_controller.go:816] Node kn-es-11 ReadyCondition updated. Updating timestamp.
I0131 03:25:33.377721       1 service_controller.go:639] Detected change in list of current cluster nodes. New node set: map[kn-es-11:{} kn-default-5:{} kn-default-9:{} kn-default-4:{} kn-default-7:{} kn-es-14:{} kn-es-16:{} kn-infra-1:{} kn-es-15:{} kn-default-10:{} kn-default-6:{} kn-default-2:{} kn-es-12:{} kn-default-12:{} kn-default-11:{} kn-infra-2:{} kn-default-1:{} kn-default-3:{} kn-infra-0:{} kn-default-0:{} kn-default-8:{} kn-es-13:{}]

Updated load balancer configuration by Service Controller

Hi,
We have a Kubernetes cluster, version 1.9.2, deployed using an acs-engine template.
We are using nginx ingress as a load balancer for our services.
A few times we have observed that all the services go down at once and come back within a few seconds. No pod restarts are observed during that time.
When we checked the Kubernetes events, we stumbled upon an event that read "Load balancer configuration updated by Service Controller".
Please help us understand whether this is what is making all the services go down, or whether there might be some other reason.

Documentation for `AZURE_ENVIRONMENT_FILEPATH`

Why is this needed
When using the cloud provider on Azure Stack, the API services are available at different endpoints than in Azure Public Cloud. These endpoints need to be provided to the azure-sdk-for-go in an additional JSON file.

Usually this file is set via the environment variable AZURE_ENVIRONMENT_FILEPATH, with a value such as /etc/kubernetes/azurestackcloud.json. The file has a specific format and a set of parameters available for configuration.

This is yet to be documented, and other than reverse engineering the AKS Engine it would be impossible to "guess" how this works.

Describe the solution you'd like in detail
Create documentation describing the AZURE_ENVIRONMENT_FILEPATH and its contents.
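
As a rough, hedged illustration only (field names follow the azure-sdk-for-go Environment type and should be verified against the SDK version in use; the endpoint values are placeholders), such a file might look like:

{
  "name": "AzureStackCloud",
  "resourceManagerEndpoint": "https://management.<region>.<fqdn>/",
  "activeDirectoryEndpoint": "https://login.microsoftonline.com/",
  "graphEndpoint": "https://graph.windows.net/",
  "storageEndpointSuffix": "<region>.<fqdn>",
  "keyVaultDNSSuffix": "vault.<region>.<fqdn>"
}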

[e2e] Switch current Perl-based presubmit e2e tests to kubetest

The current e2e tests (job name: pull-cloud-provider-azure-e2e) that we run for cloud-provider presubmit PRs are based on Perl scripts. We should switch them to kubetest.

Work items:

  • Replace the current Perl script with kubetest in make test-e2e
  • Update docs of how to run it locally in e2e-tests.md
  • Add a new page for pull-cloud-provider-azure-e2e in testgrid

How to replace aadClientSecret?

Our current aadClientSecret expired and we have created a new one. However, after updating the cloud config file and restarting the kubelets we still see an error:

Failed to provision volume with StorageClass "standard": azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://xxxx/disks/kubernetes-dynamic-pvc-....?api-version=2017-03-30: StatusCode=401 -- Original Error: adal: Refresh request failed. Status Code = '401'. Response body: {"error":"invalid_client","error_description":"AADSTS7000215: Invalid client secret is provided....}

Manually logging in with the credentials from the cloud config works fine. Is there any kind of cache that needs to be cleared?
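
For reference, a sketch of the relevant fields in the cloud config (values are placeholders); the new secret must belong to the same aadClientId:

{
  "tenantId": "<tenant-id>",
  "subscriptionId": "<subscription-id>",
  "aadClientId": "<existing-client-id>",
  "aadClientSecret": "<new-client-secret>"
}

Note that components other than kubelet (e.g. kube-controller-manager) also read this file and need a restart to pick up the new value.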

[e2e] Refine periodic job for basic conformance tests

The current periodic job azure-master-conformance is also based on Perl scripts. We should switch it to kubetest.

Note: the job is running e2e conformance tests from Kubernetes repo.

  • Switch the current Perl script method to kubetest
  • Rename current job azure-master-conformance to cloud-provider-azure-conformance

External kubernetes dependencies

This is the meta-issue for tracking all the external dependencies that are blockers for a fully standalone Azure cloud provider.

Cross-reference: the KEP for the out-of-tree Azure cloud provider.

API throttling

The instance metadata service (IMDS) could help reduce API throttling issues and speed up node initialization. This is especially helpful for large clusters.

But with CCM this is not possible anymore, because that functionality has been moved to the cloud controller manager. We should add it back into the kubelet.

This issue is being tracked under kubernetes/cloud-provider#30 and the KEP for supporting IMDS is kubernetes/enhancements#1158.

Credential provider

The Azure credential provider is still required (set via the kubelet flag --azure-container-registry-config).
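
For example, a sketch of the kubelet flags involved (the config path is an assumption; point it at whichever cloud config file holds the ACR credentials):

kubelet \
    --cloud-provider=external \
    --azure-container-registry-config=/etc/kubernetes/azure.json \
    ...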

AzureDisk and AzureFile CSI drivers

AzureDisk and AzureFile volume plugins are still in-tree, but even with --external-cloud-volume-plugin=azure configured on kube-controller-manager, AzureDisk and AzureFile are still not working. See kubernetes/kubernetes#71018 for explanations.

So when using CCM, the CSI drivers should always be used. The CSI drivers are tracked in separate repos:

1. CSI on Windows (alpha in k8s v1.18) (in progress)

2. CSI Driver Migration (in-tree driver to CSI driver) (alpha in k8s v1.18) (in progress)

3. CSI drivers support on Windows

4. CSI driver integration with aks-engine (done)

Document disableOutboundSNAT

disableOutboundSNAT was added in PR kubernetes/kubernetes#75282 and is used together with the standard load balancer:

Allow disable outbound SNAT when Azure standard load balancer is used together with outbound rules. 

It is supported in v1.11.9, v1.12.7, v1.13.5 and v1.14.0.

We should update the docs for it.
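
Until the docs are updated, a minimal sketch of the relevant cloud config fields (assuming a standard load balancer with outbound rules configured):

{
  "loadBalancerSku": "standard",
  "disableOutboundSNAT": true
}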

Hitting rate limit for "NicGet"

What happened:
We have a ~150-node cluster running Kubernetes (the hard way). We are using an Azure LB managed by the Kubernetes controller manager. It works fine when we only use the public IP LB, but once we create a private IP LB Kubernetes service, it creates extra queries for all the NICs of the nodes in the backend pool of the internal load balancer (~150 extra). This causes us to hit the rate limit on NicGet. Here is an example of an error message we get from Kubernetes when it fails to create the LB:

rate limited(read) for operation:NicGet", ensure(namespace/haproxy-tcp): backendPoolID(/subscriptions/ID/resourceGroups/NAME/providers/Microsoft.Network/loadBalancers/NAME/backendAddressPools/NAME) - failed to ensure host inpool: "azure - cloud provider 

The azure.json cloud provider config that we are using with Kubernetes is

  "cloudProviderBackoff": true,
  "cloudProviderBackoffDuration": 6,
  "cloudProviderBackoffExponent": 1.5,
  "cloudProviderBackoffJitter": 1,
  "cloudProviderBackoffRetries": 6,
  "cloudProviderRateLimit": true,
  "cloudProviderRateLimitBucket": 200,
  "cloudProviderRateLimitBucketWrite": 100,
  "cloudProviderRateLimitQPS": 25,
  "cloudProviderRateLimitQPSWrite": 10

We plan to move to 400-node support in the future, so we need to find a way around this limitation.

What you expected to happen:
Being able to use a private IP and public IP LB without hitting the rate limit

How to reproduce it:

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.13.5
  • OS (e.g. from /etc/os-release): CentOS 7.5
  • Kernel (e.g. uname -a): 3.10.0-862.14.4.el7.x86_64
  • Install tools: internal tools
  • Others:

Add validations for Azure node resources

Add validations for Azure node resources, which include:

  • Validate that route tables are set correctly
  • Validate that the node's providerID is set correctly (note: VMSS and VMAS formats are different)
  • Validate the node's publicIP
  • Validate the node's privateIP
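
As an illustration of the providerID check (a sketch; the placeholders are illustrative):

# Inspect the providerID reported for a node
kubectl get node <node-name> -o jsonpath='{.spec.providerID}'

# Expected formats:
#   VMAS: azure:///subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Compute/virtualMachines/<vm-name>
#   VMSS: azure:///subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Compute/virtualMachineScaleSets/<vmss-name>/virtualMachines/<instance-id>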

/kind testing
/milestone v1.16

Improve Attach Detach Disk Performance

On my cluster, disks can take up to 10 minutes to attach to a node after a failover, because it takes too long to detach from the old node before attaching to the new one. Is there anything that can be done to fix this performance issue?

[e2e] Add periodic jobs for Kubernetes conformance tests

The core code of the Azure cloud provider is still hosted in the kubernetes repo (we vendor the code here), hence we should set up periodic jobs for it.

Work items:

  • Add new job 'ci-kubernetes-e2e-conformance' (kube-controller-manager should be used in this case)
  • Add the job to testgrid pages

API version in Azure SDK is too restrictive

Is your feature request related to a problem?/Why is this needed
The azure-sdk-for-go used in the azure-cloud-provider has a single API version implementation for each of the compute, network and storage services. For example, the version for compute is compute/mgmt/2018-10-01.

This is very restrictive: when using the provider on Azure Stack, which does not support the latest API versions, operations such as registering VMs with the LoadBalancer fail to make the API calls.

Describe the solution you'd like in detail
It would be beneficial to include slightly older API versions as well as the latest one, for compatibility with Azure Stack. It would be even better if the versions used for the API calls were configurable via the AZURE_ENVIRONMENT_FILEPATH file where the endpoints are specified.

Describe alternatives you've considered
As a shortcut, we have compiled the provider after replacing compute API versions from 2018-10-01 to 2017-03-30, without actually including the real 2017-03-30 code. This is a dirty hack, but for the purpose of the provider, it seems to do the job.

Pre-existing nodes will be removed from the LB's backend on deallocation, causing them to lose their outbound IP at boot-up

Running k8s v1.11.5
Following this flow

  1. We create an AS with 10 VMs; they all get an outbound IP defined by the AS outbound IP.
  2. We install kubelet; kubelet starts registering to the ARM API using the public IP (outbound IP).
  3. Eventually, all nodes are registered in k8s.
  4. A user creates a Service of type LoadBalancer; the outbound IP of the nodes changes to the frontend IP of that LoadBalancer service.
  5. Deallocate a VM; the node disappears from the backend pool and becomes NotReady. It doesn't have an outbound IP defined anymore.

The steps above take some time; more than 24h, I would say.

Here's a snapshot of the k8s LB backend pool after a while. As you can see, we should have kn-es-0, kn-es-1, kn-es-2, etc. in the list, and many more. Maybe this is the expected behavior, but it causes the node to not have a valid outbound IP when it gets allocated again.

(screenshot: k8s LB backend pool, 2019-01-31 06:59:56)

Enable in-tree volume e2e tests

PR #68 disables in-tree volume (AzureDisk) e2e tests because the Azure cloud config is required to pass them.

We should figure out how to get the Azure cloud config for e2e clusters and enable those tests after that.

Switch glide to go modules for dependency management

We should switch from glide to Go modules for vendor management. Kubernetes's staging packages should be handled carefully (e.g. refer to scripts/update-dependencies.sh).

  • Replace glide with Go modules
  • Update scripts/update-dependencies.sh
  • Update docs dependency-management.md
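
A rough sketch of the switch (the module path is an assumption, and Kubernetes staging packages would still need to be pinned, e.g. via replace directives):

go mod init sigs.k8s.io/cloud-provider-azure
go mod tidy
go mod vendor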

/help

Enable e2e tests for Azure

We should set up e2e tests for this repo:

  • Setup e2e tests infra on Azure
  • Enable e2e tests for PRs
  • Switch to test-infra for presubmit e2e tests (presubmit tests are running for pull requests in cloud-provider-azure repo) @ritazh
    • Add e2e steps to setup test-infra via Makefile
    • Update presubmit tests to the new way
    • Remove current perl scripts
  • Enable full periodic e2e tests for stable releases (periodic tests run in the background for each release branch)
    • Add more test scenarios, e.g. conformance, correctness, alpha-features, autoscaling, scalability, multi-zone, slow and serial
    • Enable e2e for more releases, e.g. v1.13 and v1.12
    • Setup prow jobs for new test scenarios
  • Add Azure features e2e testings #7
    • LoadBalancer service tests with various annotations
    • ACR image pulling tests without docker secrets setting explicitly
    • Multi availability zones tests
    • Cloud provider configuration options tests (e.g. Standard LoadBalancer and VMSS)
    • Upgrading tests (e.g. cloud-provider version updates)
  • Documentation for e2e @ritazh

Document versions for load balancer annotations

Is your feature request related to a problem?/Why is this needed

/kind docs

Describe the solution you'd like in detail

Not all annotations are available in all versions; we should document the supported Kubernetes versions for each annotation.

Describe alternatives you've considered

Additional context

[sig-storage] CSI mock volume CSI volume limit information using mock driver should report attach limit when limit is bigger than 0 7m57s

/kind failing-tests

What happened:

The following tests are failing constantly:

[sig-storage] CSI mock volume CSI volume limit information using mock driver should report attach limit when limit is bigger than 0 7m57s

test/e2e/storage/csi_mock_volume.go:352
while waiting for max volume condition on pod : &Pod{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:pvc-volume-tester-nwdfn,GenerateName:pvc-volume-tester-,Namespace:csi-mock-volumes-8517,SelfLink:/api/v1/namespaces/csi-mock-volumes-8517/pods/pvc-volume-tester-nwdfn,UID:4a27f9f3-4e2f-11e9-b579-000d3a0385b1,ResourceVersion:7235,Generation:0,CreationTimestamp:2019-03-24 12:21:02 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,ManagedFields:[],},Spec:PodSpec{Volumes:[{my-volume {nil nil nil nil nil nil nil nil nil PersistentVolumeClaimVolumeSource{ClaimName:pvc-kbvbc,ReadOnly:false,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}} {default-token-z4d92 {nil nil nil nil nil &SecretVolumeSource{SecretName:default-token-z4d92,Items:[],DefaultMode:*420,Optional:nil,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}}],Containers:[{volume-tester k8s.gcr.io/pause:3.1 [] []  [] [] [] {map[] map[]} [{my-volume false /mnt/test  <nil> } {default-token-z4d92 true /var/run/secrets/kubernetes.io/serviceaccount  <nil> }] [] nil nil nil /dev/termination-log File Always nil false false false}],RestartPolicy:Never,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{attach-limit-csi-csi-mock-volumes-8517: csi-mock-volumes-8517,},ServiceAccountName:default,DeprecatedServiceAccount:default,NodeName:,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[],},ImagePullSecrets:[],Hostname:,Subdomain:,Affinity:nil,SchedulerName:default-scheduler,InitContainers:[],AutomountServiceAccountToken:nil,Tolerations:[{node.kubernetes.io/not-ready Exists  NoExecute 0xc001a2eff0} {node.kubernetes.io/unreachable Exists  NoExecute 0xc001a2f010}],HostAliases:[],PriorityClassName:,Priority:*0,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[],RuntimeClassName:nil,EnableServiceLinks:*true,},Status:PodStatus{Phase:Pending,Conditions:[],Message:,Reason:,HostIP:,PodIP:,StartTime:<nil>,ContainerStatuses:[],QOSClass:BestEffort,InitContainerStatuses:[],NominatedNodeName:,},}
Unexpected error:
    <*errors.errorString | 0xc0002bd3e0>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred
test/e2e/storage/csi_mock_volume.go:382

See logs at https://gubernator.k8s.io/build/kubernetes-jenkins/pr-logs/pull/cloud-provider-azure/125/pull-cloud-provider-azure-e2e/79#sig-storage-csi-mock-volume-csi-volume-limit-information-using-mock-driver-should-report-attach-limit-when-limit-is-bigger-than-0.

What you expected to happen:

How to reproduce it:

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

Add validations for Azure credential provider

Add validations for the Azure credential provider, so that both private and public images from ACR can be pulled without setting docker secrets explicitly.

/kind testing
/milestone v1.16

Typo

docs/cloud-provider-config.md
s/Descriiption/Description/

"excludeMasterFromStandardLB": false doesn't work

When setting "excludeMasterFromStandardLB": false for a 3-node K8s 1.13.1 cluster deployed in Azure, the load balancer created in Azure never includes the master node.

Note that I have removed the taint from the master node as well, but it still doesn't work:
kubectl taint nodes --all node-role.kubernetes.io/master-

Repro steps:

  1. Use the newest kubeadm to deploy a multi-zone cluster with 3 VMs, where k8s-01 is the master node and k8s-02 and k8s-03 are agent nodes.
  2. Use helm to deploy an nginx-ingress controller with 3 replicas
    D:>kubectl get pod -o wide
    NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
    k8ssea-nginx-ingress-controller-5b665b585d-k5r82 1/1 Running 0 5m18s 192.168.2.8 k8s-03
    k8ssea-nginx-ingress-controller-5b665b585d-l4lvh 1/1 Running 0 5m18s 192.168.1.9 k8s-02
    k8ssea-nginx-ingress-controller-5b665b585d-vxv4z 1/1 Running 0 5m18s 192.168.0.9 k8s-01
    k8ssea-nginx-ingress-default-backend-79b9979997-ljvh7 1/1 Running 0 5m18s 192.168.2.7 k8s-03
  3. Checked the load balancer resource in the Azure portal; it shows only k8s-02 and k8s-03 in the backend pool.
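
For reference, a sketch of the cloud config fields involved (values as used in this report):

{
  "loadBalancerSku": "standard",
  "excludeMasterFromStandardLB": false
}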

Create a SECURITY_CONTACTS file.

As per the email sent to kubernetes-dev[1], please create a SECURITY_CONTACTS
file.

The template for the file can be found in the kubernetes-template repository[2].
A description for the file is in the steering-committee docs[3], you might need
to search that page for "Security Contacts".

Please feel free to ping me on the PR when you make it, otherwise I will see when
you close this issue. :)

Thanks so much, let me know if you have any questions.

(This issue was generated from a tool, apologies for any weirdness.)

[1] https://groups.google.com/forum/#!topic/kubernetes-dev/codeiIoQ6QE
[2] https://github.com/kubernetes/kubernetes-template-project/blob/master/SECURITY_CONTACTS
[3] https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance-template-short.md

Setup MCR for cloud-provider-azure image

As the external cloud provider is used as a container image, we should set up MCR for publishing it publicly when doing our first release.

The image should probably be built from automated pipelines (e.g. Jenkins).

Add documentation of kube-controller-manager

Add documentation on how to use Azure in kube-controller-manager, e.g.

  • What should be configured when provisioning kube-controller-manager
  • Document that kubelet/kube-apiserver should also be configured with the same settings
  • Link to the docs created in #5 for cloud-config

Add validations for Azure standard loadbalancer

Add validations for Azure standard load balancer, which include:

  • loadBalancerSku should be standard
  • all nodes in different agent pools should be added to the SLB backends
  • Pods' outbound IPs should be the same as those configured in the SLB outbound rules

/kind testing
/milestone v1.16

E2e tests for service annotation service.beta.kubernetes.io/azure-load-balancer-mode

Sub-item of #7: E2e tests for service annotation service.beta.kubernetes.io/azure-load-balancer-mode.

Pre-requirements:

  • A Kubernetes cluster with at least two vmss agent pools
  • LoadBalancerSku is basic

Validation workflow:

  • Get a list of all nodes and their providerIDs
  • ProviderID example: azure:///subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.Compute/virtualMachineScaleSets/<vmss-name>/virtualMachines/1
  • Get all the vmss names from all node's providerIDs (e.g. <vmss-name> in above example)
  • Skip if there are no vmss nodes
  • Choose two vmss names and validate the following steps for them:
    • Create a deployment (e.g. name: validate-lb-mode, image: nginx) and a LoadBalancer service with annotation service.beta.kubernetes.io/azure-load-balancer-mode (value is vmss name)
    • Wait and get the service's public IP address (suppose it's pip1)
    • Invoke the Azure network client and list all public IPs, filtering for pip1
    • Get the ALB name from pip1's ipConfiguration
    • Get the ALB by name
    • Get the backend address pools and get the vmss name from the list of nodes (skip if there are no vmss network interfaces)
    • Check the vmss name; it should be the same as the node's vmss name
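
A minimal sketch of the deployment/service creation step above, using kubectl (the names come from the workflow and the vmss value is a placeholder):

kubectl create deployment validate-lb-mode --image=nginx
kubectl expose deployment validate-lb-mode --type=LoadBalancer --port=80
kubectl annotate service validate-lb-mode \
    service.beta.kubernetes.io/azure-load-balancer-mode=<vmss-name>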

Add more e2e test cases

Azure-related features should be tested in e2e tests, which include:

  • Different authz methods
  • Different load balancer behavior with annotations
  • Verification of various resources
    • Routes
    • Node's externalID
    • Node's publicIP
    • NSGs
  • ACR image pulling tests without docker secrets setting explicitly
  • Multi availability zones tests
  • Cloud provider configuration options tests (e.g. Standard LoadBalancer and VMSS)
  • Upgrading tests (e.g. cloud-provider version updates)

Also add tests for Azure persistent storage, e.g.

  • AzureFile
  • AzureDisk

Expanding a node pool with new nodes fails when there is a public LB associated with some members of the node pool's AS

Running k8s v1.11.5
Following this flow,

  1. We create an AS with 10 VMs; they all get an outbound IP defined by the AS outbound IP.
  2. We install kubelet; kubelet starts registering to the ARM API using the public IP (outbound IP).
  3. Eventually, all nodes are registered in k8s.
  4. A user creates a Service of type LoadBalancer; the outbound IP of the nodes changes to the frontend IP of that LoadBalancer service.
  5. We add a new VM to the AS (total count 11 VMs); the VM comes up with no outbound IP since it isn't part of the LB's backend pool.
  6. Kubelet will fail to communicate with the ARM API, preventing the node from ever becoming healthy.

I'm not sure if there could be a way to allow traffic to the ARM API and the other URIs required by kubelet to not use the public IP, and simply route internally within the Azure network, like we can do with https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-service-endpoints-overview.
