azure / aks
Azure Kubernetes Service
Home Page: https://azure.github.io/AKS/
When deploying an AKS cluster, an additional resource group is created.
The resource group that I created and deployed AKS to:
az resource list -g oguzp-aks
Name ResourceGroup Location Type Status
--------- --------------- ---------- ------------------------------------------ --------
oguzp-aks oguzp-aks westus2 Microsoft.ContainerService/managedClusters
The resource group that was created automatically:
az resource list -g MC_oguzp-aks_oguzp-aks_westus2
Name ResourceGroup Location Type Status
------------------------------------------------------------------- ------------------------------ ---------- -------------------------------------------- --------
agentpool1-availabilitySet-14710316 MC_oguzp-aks_oguzp-aks_westus2 westus2 Microsoft.Compute/availabilitySets
aks-agentpool1-14710316-0_OsDisk_1_fff6a42716dd4dc0a1032afc1cb67091 MC_OGUZP-AKS_OGUZP-AKS_WESTUS2 westus2 Microsoft.Compute/disks
aks-agentpool1-14710316-1_OsDisk_1_ff41571a0143470bbfc3b62653df5c2c MC_OGUZP-AKS_OGUZP-AKS_WESTUS2 westus2 Microsoft.Compute/disks
aks-agentpool1-14710316-2_OsDisk_1_eec530dff34d4bdf80bbac8e74f5e07d MC_OGUZP-AKS_OGUZP-AKS_WESTUS2 westus2 Microsoft.Compute/disks
aks-agentpool1-14710316-0 MC_oguzp-aks_oguzp-aks_westus2 westus2 Microsoft.Compute/virtualMachines
aks-agentpool1-14710316-0/cse0 MC_oguzp-aks_oguzp-aks_westus2 westus2 Microsoft.Compute/virtualMachines/extensions
aks-agentpool1-14710316-0/OmsAgentForLinux MC_OGUZP-AKS_OGUZP-AKS_WESTUS2 westus2 Microsoft.Compute/virtualMachines/extensions
aks-agentpool1-14710316-1 MC_oguzp-aks_oguzp-aks_westus2 westus2 Microsoft.Compute/virtualMachines
aks-agentpool1-14710316-1/cse1 MC_oguzp-aks_oguzp-aks_westus2 westus2 Microsoft.Compute/virtualMachines/extensions
aks-agentpool1-14710316-1/OmsAgentForLinux MC_OGUZP-AKS_OGUZP-AKS_WESTUS2 westus2 Microsoft.Compute/virtualMachines/extensions
aks-agentpool1-14710316-2 MC_oguzp-aks_oguzp-aks_westus2 westus2 Microsoft.Compute/virtualMachines
aks-agentpool1-14710316-2/cse2 MC_oguzp-aks_oguzp-aks_westus2 westus2 Microsoft.Compute/virtualMachines/extensions
aks-agentpool1-14710316-2/OmsAgentForLinux MC_OGUZP-AKS_OGUZP-AKS_WESTUS2 westus2 Microsoft.Compute/virtualMachines/extensions
aks-agentpool1-14710316-nic-0 MC_oguzp-aks_oguzp-aks_westus2 westus2 Microsoft.Network/networkInterfaces
aks-agentpool1-14710316-nic-1 MC_oguzp-aks_oguzp-aks_westus2 westus2 Microsoft.Network/networkInterfaces
aks-agentpool1-14710316-nic-2 MC_oguzp-aks_oguzp-aks_westus2 westus2 Microsoft.Network/networkInterfaces
aks-agentpool-14710316-nsg MC_oguzp-aks_oguzp-aks_westus2 westus2 Microsoft.Network/networkSecurityGroups
aks-agentpool-14710316-routetable MC_oguzp-aks_oguzp-aks_westus2 westus2 Microsoft.Network/routeTables
aks-vnet-14710316 MC_oguzp-aks_oguzp-aks_westus2 westus2 Microsoft.Network/virtualNetworks
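As the two listings show, the auto-created group's name follows a predictable pattern: MC_<resourceGroup>_<clusterName>_<location>. A minimal sketch of deriving that name (values taken from the cluster above):

```shell
# Derive the name of the auto-created "MC_" node resource group.
# Pattern observed in the listing above: MC_<resourceGroup>_<clusterName>_<location>
resource_group="oguzp-aks"
cluster_name="oguzp-aks"
location="westus2"

node_rg="MC_${resource_group}_${cluster_name}_${location}"
echo "$node_rg"
```

This is handy for scripting against the node resources (disks, NICs, NSG, route table) without hard-coding the group name.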
When creating a cluster via az aks, it shows up as a single managed container service in my resource group. If I wanted to expose the Kubernetes services via an App Gateway in a separate virtual network, or peer the Kubernetes services with other virtual networks to access applications running in them, what is the suggested method?
We currently use acs-engine, and that Kubernetes cluster had a virtual network which could be used to peer with other portions of our infrastructure. That does not seem to be the case here.
I'm trying to create a new AKS cluster. However, I'm getting the error below.
az aks create -g rgrdaks -n rdaks --generate-ssh-keys --agent-count 1
Error:
Deployment failed. Correlation ID: 61b0ff88-a742-4718-ab08-7fe0329a9f8e. getAndWaitForManagedClusterProvisioningState error:
Is there a way to specify a custom VNet for a cluster created via AKS? When this cluster is created, it is in a VNet where the address space is 10.0.0.0/8. This basically means that if I have any other VNets in the 10. space, they cannot be peered with this cluster. With the ACS service, we were able to create custom VNet ranges with acs-engine if necessary.
My AKS instance does not finish initialization when passing an already existing RG and SP.
I'm creating the AKS instance the following way. I can't create a new RG in my subscription (company policy), but I have an RG with contributor access, and an SP with contributor access on the same RG.
az aks --debug create -g <mygroup> -n aks-006 -l ukwest -c 2 -s Standard_D2_v2 --service-principal <serv-prin> --client-secret <client-secret>
az aks get-credentials --resource-group <mygroup> --name aks-006
Then I try to peek into the created cluster:
> kubectl get pods --all-namespaces=true
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system heapster-553147743-cjj87 0/2 Pending 0 10m
kube-system kube-dns-v20-1654923623-0mvpn 0/3 Pending 0 10m
kube-system kube-dns-v20-1654923623-96vb0 0/3 Pending 0 10m
kube-system kubernetes-dashboard-3427906134-ncj12 0/1 Pending 0 10m
> kubectl get nodes
No resources found.
Something is clearly fishy here but I don't know what to look at next.
Hi, when I create an AKS cluster, I'm receiving a timeout on the TLS handshake. The cluster creates okay with the following commands:
az group create --name dsK8S --location westus2
az aks create \
--resource-group dsK8S \
--name dsK8SCluster \
--generate-ssh-keys \
--dns-name-prefix dasanderk8 \
--kubernetes-version 1.8.1 \
--agent-count 2 \
--agent-vm-size Standard_A2
az aks get-credentials --resource-group dsK8S --name dsK8SCluster
The response from the create command is a JSON object:
{
"id": "/subscriptions/OBFUSCATED/resourcegroups/dsK8S/providers/Microsoft.ContainerService/managedClusters/dsK8SCluster",
"location": "westus2",
"name": "dsK8SCluster",
"properties": {
"accessProfiles": {
"clusterAdmin": {
"kubeConfig": "OBFUSCATED"
},
"clusterUser": {
"kubeConfig": "OBFUSCATED"
}
},
"agentPoolProfiles": [
{
"count": 2,
"dnsPrefix": null,
"fqdn": null,
"name": "agentpool1",
"osDiskSizeGb": null,
"osType": "Linux",
"ports": null,
"storageProfile": "ManagedDisks",
"vmSize": "Standard_A2",
"vnetSubnetId": null
}
],
"dnsPrefix": "dasanderk8",
"fqdn": "dasanderk8-d55f0987.hcp.westus2.azmk8s.io",
"kubernetesVersion": "1.8.1",
"linuxProfile": {
"adminUsername": "azureuser",
"ssh": {
"publicKeys": [
{
"keyData": "OBFUSCATED"
}
]
}
},
"provisioningState": "Succeeded",
"servicePrincipalProfile": {
"clientId": "OBFUSCATED",
"keyVaultSecretRef": null,
"secret": null
}
},
"resourceGroup": "dsK8S",
"tags": null,
"type": "Microsoft.ContainerService/ManagedClusters"
}
I've now torn down this cluster but this has happened three times today.
Any help?
David
Trying az aks browse gives the following error:
Proxy running on http://127.0.0.1:8001/
Press CTRL+C to close the tunnel...
error: error upgrading connection: error dialing backend: dial tcp 10.240.0.5:10250: getsockopt: connection refused
Hi,
I get the following error when I try to deploy an AKS cluster:
Deployment failed. Correlation ID: 08bb265b-1c7c-4c23-8f88-01137f803d8e. Timeout while polling for control plane provisioning status
I have run this commands:
az group create --name chronas-k8s --location westus2
az aks create --resource-group chronas-k8s --name myK8sCluster --agent-count 1 --generate-ssh-keys
Proxy via the Kubernetes API is not currently supported. This breaks access to kubernetes-dashboard as the preferred method of access is via kubectl proxy then localhost:8001/ui.
As a workaround, users can run az aks browse to access the Dashboard pod via kubectl port-forward.
I've had an aks cluster up for a few days, working my way through different tutorials.
Most things have been ok, until I ran the following
PS> az aks upgrade -n xTestCluster -g xTestResourceGroup -k 1.8.1
As soon as this finished, all kubectl commands now return "Unable to connect to the server: net/http: TLS handshake timeout"
running
PS> az aks show --name xTestCluster --resource-group xTestResourceGroup --output table
says the upgrade succeeded.
I did notice that I put in 1.8.1 instead of 1.8.2 but I still would have expected it to work.
I ran az aks upgrade again passing in 1.8.2 but after that succeeded I still cannot connect.
I tried running get-credentials again, but that didn't help.
PS> az aks get-credentials --resource-group=xTestResourceGroup --name=xTestCluster
The website container I deployed earlier in my tests is still running, so the cluster is still up.
Top customer ask - AKS support for Managed Disk encryption at rest + a default option for Standard or Premium disks.
https://docs.microsoft.com/en-us/azure/aks/kubernetes-helm
Based on the documentation, Helm is pre-installed in AKS clusters, but when I run
helm install stable/nginx-ingress
I get an error:
Error: could not find tiller
When I look at the pods installed in kube-system, Tiller isn't listed.
kubectl get pods --namespace kube-system
NAME READY STATUS RESTARTS AGE
heapster-b5ff6c4dd-ztxsb 2/2 Running 0 18m
kube-dns-v20-6c8f7f988b-hsq25 3/3 Running 0 18m
kube-dns-v20-6c8f7f988b-knczn 3/3 Running 0 18m
kube-proxy-9jc4t 1/1 Running 0 15m
kube-proxy-tx6pd 1/1 Running 0 15m
kube-svc-redirect-96tt7 0/1 CrashLoopBackOff 7 15m
kube-svc-redirect-9nw5l 0/1 CrashLoopBackOff 7 15m
kubernetes-dashboard-7f7d9489fc-rzlxt 0/1 CrashLoopBackOff 6 18m
tunnelfront-6mwfv 0/1 CrashLoopBackOff 7 15m
tunnelfront-glvmz 0/1 CrashLoopBackOff 7 15m
I just fired up a new AKS cluster (West US 2) from the Azure portal with one agent node. The API works fine, but kubernetes-dashboard is not working. I'm getting the following error when I open http://localhost:8001/ui/
Error: 'dial tcp 10.244.0.4:9090: getsockopt: connection refused'
Trying to reach: 'http://10.244.0.4:9090/'
I used the following Kubernetes Walkthrough as a quickstart to stand up a cluster to federate our cluster from GCP.
I followed these steps exactly, per the documentation.
az provider register -n Microsoft.ContainerService
az group create --name myResourceGroup --location westus2
az aks create --resource-group myResourceGroup --name myK8sCluster --agent-count 1 --generate-ssh-keys
az aks get-credentials --resource-group myResourceGroup --name myK8sCluster
Everything to this point matches expected state per documentation and general observation.
However, when attempting to list the nodes in the cluster using kubectl get nodes, I get the following in both Azure Cloud Shell and locally.
~$ kubectl get nodes
No resources found.
Thinking this may have been an issue with credentials, I ran the following
~$ kubectl get namespaces
NAME STATUS AGE
default Active 19m
kube-public Active 19m
kube-system Active 19m
To make things a little more confusing, when running kubectl get all from Cloud Shell, I get different results than I do locally.
From Cloud Shell
~$ kubectl get all
No resources found.
From Local Terminal
~$ kubectl get all
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc/kubernetes 10.0.0.1 <none> 443/TCP 27m
Very frustrating.
Hi, I am trying to follow your tutorial over at https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough.
Early on, I am getting an exception like this:
az aks create --resource-group moeKubernetesTst --name myK8sCluster --agent-count 1 --generate-ssh-keys
ERROR shortly thereafter:
Deployment failed. Correlation ID: cc5b9754-49bc-4170-bffd-23ff4f1f1612. unable to create resource group. error: resources.ProvidersClient#Register: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client 'c2b660ac-8354-4c31-a1c7-6dcdf8015550' with object id 'c2b660ac-8354-4c31-a1c7-6dcdf8015550' does not have authorization to perform action 'Microsoft.Compute/register/action' over scope '/subscriptions/(...)
Running kubectl would give me this:
kubectl get nodes
Error from server (BadRequest): the server rejected our request for an unknown reason
What's the matter? Thanks!
When following the tutorial it results in:
The VM size of Agent is not allowed in your subscription in location 'westus2'. Agent VM size 'Standard_D2_v2' is available in locations: centraluseuap,eastus,eastus2,eastus2euap,southcentralus,southeastasia,westeurope.
when calling
az aks create --resource-group myResourceGroup --name myK8sCluster --agent-count 1 --generate-ssh-keys
I've had a 1.7 cluster sitting there for a few days; we've been connecting to it semi-regularly, but it's not running any pods that weren't included by default, except for Tiller.
Today we went to start doing some Helm work with it, and kubectl commands return Unable to connect to the server: net/http: TLS handshake timeout.
Part of playing in Preview-land for sure, but where I would normally hop onto my ACS nodes and debug further, I'm not sure of the correct method for debugging this on AKS.
Thoughts?
Getting the same problem in West Europe as we did with UK West.
kubectl get all --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system po/heapster-58f795c4cf-cws4n 2/2 Running 0 4m
kube-system po/kube-dns-v20-6c8f7f988b-bgmzd 3/3 Running 0 4m
kube-system po/kube-dns-v20-6c8f7f988b-ws8w9 3/3 Running 0 4m
kube-system po/kube-proxy-8l982 1/1 Running 0 4m
kube-system po/kube-proxy-mbzk9 1/1 Running 0 4m
kube-system po/kube-proxy-qndmb 1/1 Running 0 4m
kube-system po/kube-svc-redirect-87gsb 0/1 CrashLoopBackOff 5 4m
kube-system po/kube-svc-redirect-tl2xv 0/1 CrashLoopBackOff 5 4m
kube-system po/kube-svc-redirect-xz9gx 0/1 CrashLoopBackOff 5 4m
kube-system po/kubernetes-dashboard-6fc8cf9586-m8kzt 0/1 CrashLoopBackOff 3 4m
kube-system po/tunnelfront-784bc7bf8f-vcgg7 1/1 Running 0 4m
Can't access the dashboard, and many of the system pods are crashing.
Hi team,
I'm interested in understanding more about the difference between using my own VMSS to manage k8s versus AKS.
AKS brings some great benefits and cost reduction by not paying for masters, but there are a few questions:
Thank you
Running the following command on Windows (not WSL):
az aks get-credentials -n myAKSCluster -g myAKSCluster
Results in the following error:
[Errno 13] Permission denied: 'C:\\Users\\nepeters\\AppData\\Local\\Temp\\tmp7e2dtbku'
Traceback (most recent call last):
File "C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\lib\site-packages\azure\cli\main.py", line 36, in main
cmd_result = APPLICATION.execute(args)
File "C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\lib\site-packages\azure\cli\core\application.py", line 212, in execute
result = expanded_arg.func(params)
File "C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\lib\site-packages\azure\cli\core\commands\__init__.py", line 377, in __call__
return self.handler(*args, **kwargs)
File "C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\lib\site-packages\azure\cli\core\commands\__init__.py", line 620, in _execute_command
reraise(*sys.exc_info())
File "C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\lib\site-packages\six.py", line 693, in reraise
raise value
File "C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\lib\site-packages\azure\cli\core\commands\__init__.py", line 602, in _execute_command
result = op(client, **kwargs) if client else op(**kwargs)
File "C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\lib\site-packages\azure\cli\command_modules\acs\custom.py", line 1288, in aks_get_credentials
merge_kubernetes_configurations(path, additional_file.name)
File "C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\lib\site-packages\azure\cli\command_modules\acs\custom.py", line 829, in merge_kubernetes_configurations
with open(addition_file) as stream:
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\nepeters\\AppData\\Local\\Temp\\tmp7e2dtbku'
This occurs both in both an elevated and non-elevated command prompt.
Not all of the agents are being provisioned properly.
The VM is running and the Custom Script extension is in provision succeeded status.
Correlation ID: 20bbfe25-8d91-45e0-9e1d-7f0bebd60d77
az group create --name test-aks --location ukwest
az aks create -n myApp -g test-aks -c 5 -s Standard_D2_v2
az aks get-credentials --name myApp -g test-aks
kubectl get nodes
NAME STATUS AGE VERSION
aks-agentpool1-24428535-0 Ready,agent 12m v1.7.7
aks-agentpool1-24428535-1 Ready,agent 12m v1.7.7
aks-agentpool1-24428535-2 Ready,agent 12m v1.7.7
aks-agentpool1-24428535-3 Ready,agent 12m v1.7.7
aks-agentpool1-24428535-4 NotReady,agent 9m v1.7.7
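A quick way to tally how many agents actually registered as Ready is to filter the STATUS column of the kubectl get nodes output. A sketch of that check against sample text (node names and ages taken from the listing above; against a live cluster you would pipe the real command instead):

```shell
# Count agents whose STATUS column starts with "Ready".
# Against a live cluster:
#   kubectl get nodes --no-headers | awk '$2 ~ /^Ready/' | wc -l
# The sample below mirrors the listing above.
sample='aks-agentpool1-24428535-0   Ready,agent     12m   v1.7.7
aks-agentpool1-24428535-4   NotReady,agent  9m    v1.7.7'

ready=$(printf '%s\n' "$sample" | awk '$2 ~ /^Ready/' | wc -l | tr -d ' ')
echo "ready agents: $ready"
```

The `^Ready` anchor matters: a plain `/Ready/` match would also count NotReady nodes.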
Some of the pods are in NodeLost status and I believe these are tied to the agent that is in NotReady.
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system heapster-3596479800-q79t7 2/2 Running 0 10m
kube-system kube-dns-v20-1654923623-0s07h 3/3 Running 0 10m
kube-system kube-dns-v20-1654923623-q73v1 3/3 Running 0 10m
kube-system kube-proxy-6fv8p 1/1 Running 0 10m
kube-system kube-proxy-fzl9f 1/1 Running 0 10m
kube-system kube-proxy-lvt7w 1/1 Running 0 10m
kube-system kube-proxy-lzbbq 1/1 Running 0 10m
kube-system kube-proxy-v9mqk 0/1 NodeLost 0 8m
kube-system kube-svc-redirect-0nb76 1/1 Running 0 10m
kube-system kube-svc-redirect-4cvm9 1/1 Running 0 10m
kube-system kube-svc-redirect-cqh4p 1/1 Running 0 10m
kube-system kube-svc-redirect-h8r24 0/1 NodeLost 0 8m
kube-system kube-svc-redirect-ss5bn 1/1 Running 0 10m
kube-system kubernetes-dashboard-3427906134-fjsfw 1/1 Running 0 10m
kube-system tunnelfront-2j5md 1/1 Running 0 10m
kube-system tunnelfront-2pscj 0/1 NodeLost 0 8m
kube-system tunnelfront-7ds8w 1/1 Running 0 10m
kube-system tunnelfront-rzwp5 1/1 Running 1 10m
kube-system tunnelfront-sc6gg 1/1 Running 0 10m
Dashboard info:
AKS fails with the below error in the East US region (twice):
...@Azure:~$ az aks create -g pnp-dd-compute-eastus -n pnp-dd-aks --kubernetes-version 1.8.1 --node-count 12 --generate-ssh-keys --node-vm-size Standad_F8s_v2 -l eastus --service-principal="my_appid" --client-secret="my_secret"
Exception in thread AzureOperationPoller(da199b3d-5c73-409e-86e3-4bf20ce1cb9c):
Traceback (most recent call last):
File "/opt/az/lib/python3.6/site-packages/msrestazure/azure_operation.py", line 377, in _start
self._poll(update_cmd)
File "/opt/az/lib/python3.6/site-packages/msrestazure/azure_operation.py", line 464, in _poll
raise OperationFailed("Operation failed or cancelled")
msrestazure.azure_operation.OperationFailed: Operation failed or cancelled
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/az/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/opt/az/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/opt/az/lib/python3.6/site-packages/msrestazure/azure_operation.py", line 388, in _start
self._exception = CloudError(self._response)
File "/opt/az/lib/python3.6/site-packages/msrestazure/azure_exceptions.py", line 148, in __init__
self._build_error_data(response)
File "/opt/az/lib/python3.6/site-packages/msrestazure/azure_exceptions.py", line 164, in _build_error_data
self.error = self.deserializer('CloudErrorRoot', response).error
File "/opt/az/lib/python3.6/site-packages/msrest/serialization.py", line 992, in __call__
value = self.deserialize_data(raw_value, attr_desc['type'])
File "/opt/az/lib/python3.6/site-packages/msrest/serialization.py", line 1143, in deserialize_data
return self(obj_type, data)
File "/opt/az/lib/python3.6/site-packages/msrest/serialization.py", line 998, in __call__
return self._instantiate_model(response, d_attrs)
File "/opt/az/lib/python3.6/site-packages/msrest/serialization.py", line 1090, in _instantiate_model
response_obj = response(**kwargs)
File "/opt/az/lib/python3.6/site-packages/msrestazure/azure_exceptions.py", line 59, in __init__
self.message = kwargs.get('message')
File "/opt/az/lib/python3.6/site-packages/msrestazure/azure_exceptions.py", line 105, in message
value = eval(value)
File "<string>", line 1, in <module>
NameError: name 'resources' is not defined
{
"id": null,
"location": null,
"name": "7531be55-df3a-194f-8ab6-3430676f0d47",
"properties": null,
"tags": null,
"type": null
}
Any PersistentVolumeClaims that don't specify a storageClass will fail on a default AKS deployment because no storage classes are marked as default.
This command would fix that:
kubectl patch storageclass default -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
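Since a malformed -p payload makes kubectl patch fail with a confusing parse error, it can help to keep the patch in a variable and validate it before applying. A small sketch (python3 is used here only as a convenient JSON validator; the storage class name default is from the command above):

```shell
# The JSON patch used above, kept in a variable so it can be validated
# before being passed to:  kubectl patch storageclass default -p "$patch"
patch='{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

# Validate that the payload is well-formed JSON (non-zero exit otherwise).
printf '%s' "$patch" | python3 -m json.tool > /dev/null && echo "patch OK"
```

Afterwards, `kubectl get storageclass` should show the class marked as (default).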
Hi,
I had a running AKS cluster in the westus2 region and it was working fine. Today I found that it is in a failed state. I tried troubleshooting with no luck, then deleted it and provisioned another one. But every time, I am getting an Operation Threshold error while launching a new AKS cluster.
Everything is good and correct at my end. Please verify and update if there is a backend problem from Azure in westus2.
This is urgent for me.
Thanks,
While attempting to get a working cluster during the capacity issues I had been creating/deleting a few clusters with similar names. I was unable to delete AKS instances within the portal (or CLI) but I could delete the Resource Group the AKS instance was in and it would clean up any trace of the cluster in the portal after a few minutes.
Today, I get an error if I try to create a new cluster (now days later) using one of the names I had used/deleted a few days ago. I'm guessing resources aren't getting cleaned up in the master control plane? I have no cluster with this name/RG according to the portal at the moment, but I get this error when running az aks create.
Operation failed with status: 'Conflict'. Details: Operation is not allowed: Cluster in a failed state
Update Nov 6, 15:50 PST
Capacity in westus2 has been increased; if you continue having difficulties with existing clusters, please try deleting your cluster(s) and re-creating them.
Update Nov 5, 12:05PM PST
Users should be able to create new AKS clusters in westus2. Please report any issues on this thread, thanks!
Update Nov 3, 2017 21:01 PDT
While base compute/network capacity have been addressed, persistent HTTP errors with ARM in westus2 are preventing Azure Load Balancers via Kubernetes from obtaining public IPs. We're working with the ARM team to resolve.
Update Nov 3, 2017 17:10 PDT
We're still in the process of rolling out additional compute and networking capacity in West US 2. We recommend deleting existing clusters and monitoring this issue for updates on when to try again.
Update October 25, 2017 19:07 PDT
We received some good news from our capacity team and plan to both expand capacity in West US 2 and deploy AKS in additional US regions by the end of the week. Thanks for your patience with our literal growing pains!
October 25, 2017 11:00 am PDT
The AKS team is currently adding AKS capacity in West US 2 to keep up with demand. Until new capacity is in place, users on new AKS clusters won't be able to run kubectl logs, kubectl exec, and kubectl proxy.
$ kubectl logs kube-svc-redirect-hv3b0 -n kube-system
Error from server: Get https://aks-agentpool1-30179320-2:10250/containerLogs/kube-system/kube-svc-redirect-hv3b0/redirector: dial tcp 10.240.0.4:10250: getsockopt: connection refused
Hi guys!
The --agent-vm-size parameter of az aks create doesn't work:
az aks create -g ${ENV_NAME} -n ${K8S_CLUSTER_NAME} \
--agent-count ${K8S_AGENT_COUNT} \
--agent-vm-size Standard_D3_v2 \
--generate-ssh-keys
It still creates agents with the default size, Standard_D2_v2.
Thanks.
Trying to deploy the incubator/elasticsearch chart on AKS and getting
[2017-11-08 22:51:50,585][WARN ][io.fabric8.kubernetes.client.internal.SSLUtils] SSL handshake failed. Falling back to insecure connection.
which then falls back to port 80 and fails.
Trying another chart (clockworksoul/helm-elasticsearch) also gives a similar error on SSL certs:
Caused by: javax.net.ssl.SSLPeerUnverifiedException: Hostname kubernetes.default.svc not verified:
certificate: sha1/+Ug0awk3FJZcJyQLGdW1GxZXX5Q=
DN: CN=apiserver
subjectAltNames: [10.0.0.1, 10.0.0.1, hcp-kubernetes, kubernetes, kubernetes.svc.default, kubernetes.svc.default.cluster.local, hcp-kubernetes.5a035234ff88ea0001117941.svc.cluster.local, jbaks02-69116066.hcp.westus2.azmk8s.io]
Deploying same chart to a cluster created with acs-engine (v1.8) works fine.
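The verification failure above is consistent with the SAN list in the error: the hostname being checked, kubernetes.default.svc, does not appear among the subjectAltNames (the cert has kubernetes.svc.default instead). A sketch of that comparison as plain string matching (SAN list abridged from the error output; this is illustrative only, not real TLS verification):

```shell
# SANs abridged from the certificate error above.
sans="10.0.0.1 hcp-kubernetes kubernetes kubernetes.svc.default kubernetes.svc.default.cluster.local"
# Hostname the fabric8 client tries to verify.
host="kubernetes.default.svc"

found=no
for san in $sans; do
  [ "$san" = "$host" ] && found=yes
done
echo "$host in SANs: $found"
```

Note the near-miss: the certificate lists kubernetes.svc.default, the reversed form of the name the client checks, which is why verification fails.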
Scale operation is failing in WestUS2. Understandably there are bugs being worked out; however, I have been able to reproduce this on a cluster created on Sunday, November 5th. The cluster looks relatively healthy, with all system pods running.
The initial request/issue came in through documentation. I am opening this issue so that the doc-raised issue can be tracked. If this is related to an on-going issue, feel free to reference and close this one.
https://docs.microsoft.com/en-us/azure/aks/tutorial-kubernetes-scale
az aks scale --agent-count 5 --name myAKSCluster --resource-group myAKSCluster
Deployment failed. Correlation ID: ab5c25f4-fa17-40d5-a987-000000000000. getAndWaitForManagedClusterProvisioningState error: <nil>
Neils-MacBook-Pro:~ neilpeterson$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default my-release-aci-connector-1364325002-70mb9 1/1 Running 0 16h
kube-system heapster-342135353-fg273 2/2 Running 0 18h
kube-system kube-dns-v20-1654923623-7wbnl 3/3 Running 0 18h
kube-system kube-dns-v20-1654923623-d4hgb 3/3 Running 0 18h
kube-system kube-proxy-ptvkk 1/1 Running 0 18h
kube-system kube-proxy-rcm0g 1/1 Running 0 18h
kube-system kube-proxy-znqmw 1/1 Running 0 18h
kube-system kube-svc-redirect-flvx9 1/1 Running 1 18h
kube-system kube-svc-redirect-twvn5 1/1 Running 1 18h
kube-system kube-svc-redirect-wdqdh 1/1 Running 9 18h
kube-system kubernetes-dashboard-1672970692-vg2w2 1/1 Running 0 9h
kube-system tiller-deploy-1936853538-8brg3 1/1 Running 0 9h
kube-system tunnelfront-7ml34 1/1 Running 0 18h
kube-system tunnelfront-9xnz9 1/1 Running 0 18h
kube-system tunnelfront-wkjp2 1/1 Running 0 18h
There is an open issue in the azure-cli repo. Seems to happen mostly on Windows.
You can use this to retrieve the YAML and update your kubectl config manually:
az aks get-credentials -g <resource_group> -n <name> -f -
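A sketch of the manual workaround, assuming the YAML has been saved to a file rather than piped to stdout (the file name ./aks.kubeconfig is hypothetical):

```shell
# Save the kubeconfig to a file instead of letting the CLI merge it:
#   az aks get-credentials -g <resource_group> -n <name> -f ./aks.kubeconfig
# Then point kubectl at both the existing config and the new file;
# kubectl merges every file on a colon-separated KUBECONFIG path.
AKS_KUBECONFIG=./aks.kubeconfig
export KUBECONFIG="$HOME/.kube/config:$AKS_KUBECONFIG"
echo "$KUBECONFIG"
```

If you want a single merged file afterwards, `kubectl config view --flatten` can write one from the combined path.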
After creating the cluster using AKS, multiple core processes, including the dashboard, are in the CrashLoopBackOff state.
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system heapster-553147743-tl502 2/2 Running 0 6h
kube-system kube-dns-v20-1654923623-1bw3h 3/3 Running 0 6h
kube-system kube-dns-v20-1654923623-c7xx9 3/3 Running 0 6h
kube-system kube-proxy-jq2lf 1/1 Running 0 6h
kube-system kube-svc-redirect-pqgx7 0/1 CrashLoopBackOff 77 6h
kube-system kubernetes-dashboard-3427906134-vxlgv 0/1 CrashLoopBackOff 83 6h
kube-system tunnelfront-rqm8f 0/1 CrashLoopBackOff 77
Attempts to install Tiller have not worked. Using Helm 2.6.1:
kubectl --namespace kube-system create sa tiller
kubectl create clusterrolebinding tiller --clusterrole cluster-admin --serviceaccount=kube-system:tiller
helm init --service-account tiller
helm init --service-account tiller --upgrade
Executing helm version:
helm version
Client: &version.Version{SemVer:"v2.6.1", GitCommit:"bbc1f71dc03afc5f00c6ac84b9308f8ecb4f39ac", GitTreeState:"clean"}
Error: cannot connect to Tiller
I have a cluster in WestUS2 and one in UKWest. On both I have a helloworld app running and exposed via a service. On both I installed the nginx ingress controller (default values) via Helm.
On both clusters I created a secret with a self-signed certificate and key, plus an Ingress referring to that secret.
When I go to the hostname specified in the ingress on my UK cluster, it automatically redirects to https and works fine. I can also view the self-signed certificate.
When I go to the hostname specified in the ingress on my WestUS cluster, it doesn't redirect to https; it just uses http and shows me the helloworld app. But when I use https, it goes to the backend service of the ingress controller, and when I view the certificate, it shows me one with CN "Kubernetes Ingress Controller Fake Certificate", verified by "Acme Co".
The ingress controller, deployment, svc, ingress, and secret are exactly the same on both clusters (except for the hostname used in the ingress). The only difference is one is in UKWest and one is in WestUS2... I have no clue what is going wrong here.
See doc for configuration.
I have an issue similar to #14 – cluster is up, pods are deployed (although had to do this multiple times due to image pull errors) but kubectl can't access pods via proxy, cp or exec:
remper@Azure:~$ az aks get-credentials --resource-group rs-frame-cluster --name frame-cluster
Merged "frame-cluster" as current context in /home/remper/.kube/config
remper@Azure:~$ kubectl get pods
NAME READY STATUS RESTARTS AGE
flink-jobmanager-870122904-t6mmf 1/1 Running 0 36m
flink-taskmanager-3831602672-8wffh 1/1 Running 0 36m
flink-taskmanager-3831602672-dfkt7 1/1 Running 0 36m
remper@Azure:~$ kubectl exec -it flink-jobmanager-870122904-t6mmf -- /bin/bash
Error from server: error dialing backend: dial tcp 10.240.0.4:10250: getsockopt: connection timed out
Everything was working perfectly yesterday.
Is it a known issue? Will it be fixed?
Hi all,
I realise that on ACS the SSH key is used to log in to the VMs themselves and perform operations on them. However, with managed AKS, what is the SSH key used for? If it is used to log into the agents, what operations are safe to perform on the agent hosts?
Thanks.
Andrew.
Today, while trying to deploy a cluster, I received an error:
Deployment failed. Correlation ID: 8b0cc0fc-d4b1-42ed-a118-0b3a3843fcaa. getAndWaitForManagedClusterProvisioningState error: <nil>
Deleted the resource group in the portal and then checked my Azure subscription later (to make sure I'm not burning any costs) and the resource group was recreated; I had to manually delete the RG again. This has happened every time I tried to create a cluster today - is the service down?
David
I followed the steps of the tutorial:
https://docs.microsoft.com/en-us/azure/aks/tutorial-kubernetes-monitor
but nothing appears in the monitoring solution.
I am in West US 2.
Has anyone completed it?
Following this tutorial, when I try to create the Kubernetes cluster, I get the error "az aks create: error: unpack requires an object bytes length 4"
Running with --debug, this is the full output:
Command arguments ['aks', 'create', '--resource-group', 'catalogoResourceGroup', '--name', 'catalogoCluster', '--agent-count', '1', '--generate-ssh-keys']
Loading all installed modules as module with name 'aks' not found.
Installed command modules ['acr', 'acs', 'appservice', 'backup', 'batch', 'batchai', 'billing', 'cdn', 'cloud', 'cognitiveservices', 'component', 'configure', 'consumption', 'container', 'cosmosdb', 'dla', 'dls', 'eventgrid', 'extension', 'feedback', 'find', 'interactive', 'iot', 'keyvault', 'lab', 'monitor', 'network', 'profile', 'rdbms', 'redis', 'resource', 'role', 'servicefabric', 'sql', 'storage', 'vm']
Current cloud config:
{'endpoints': {'active_directory': 'https://login.microsoftonline.com',
'active_directory_data_lake_resource_id': 'https://datalake.azure.net/',
'active_directory_graph_resource_id': 'https://graph.windows.net/',
'active_directory_resource_id': 'https://management.core.windows.net/',
'batch_resource_id': 'https://batch.core.windows.net/',
'gallery': 'https://gallery.azure.com/',
'management': 'https://management.core.windows.net/',
'resource_manager': 'https://management.azure.com/',
'sql_management': 'https://management.core.windows.net:8443/',
'vm_image_alias_doc': 'https://raw.githubusercontent.com/Azure/azure-rest-api-specs/master/arm-compute/quickstart-templates/aliases.json'},
'is_active': True,
'name': 'AzureCloud',
'profile': 'latest',
'suffixes': {'azure_datalake_analytics_catalog_and_job_endpoint': 'azuredatalakeanalytics.net',
'azure_datalake_store_file_system_endpoint': 'azuredatalakestore.net',
'keyvault_dns': '.vault.azure.net',
'sql_server_hostname': '.database.windows.net',
'storage_endpoint': 'core.windows.net'}}
Registered application event handler 'CommandTableParams.Loaded' at <function add_id_parameters at 0x03F936A8>
Registered application event handler 'CommandTable.Loaded' at <function add_id_parameters at 0x03F936A8>
Loaded module 'acr' in 0.013 seconds.
Loaded module 'acs' in 0.002 seconds.
Registered application event handler 'CommandParser.Parsing' at <function deprecate at 0x03FA9BB8>
Loaded module 'appservice' in 0.005 seconds.
Loaded module 'backup' in 0.002 seconds.
Loaded module 'batch' in 0.011 seconds.
Loaded module 'batchai' in 0.003 seconds.
Loaded module 'billing' in 0.002 seconds.
Loaded module 'cdn' in 0.005 seconds.
Loaded module 'cloud' in 0.001 seconds.
Loaded module 'cognitiveservices' in 0.002 seconds.
Loaded module 'component' in 0.001 seconds.
Loaded module 'configure' in 0.001 seconds.
Loaded module 'consumption' in 0.003 seconds.
Loaded module 'container' in 0.002 seconds.
Registered application event handler 'CommandParser.Parsing' at <function deprecate at 0x04075978>
Loaded module 'cosmosdb' in 0.004 seconds.
Loaded module 'dla' in 0.003 seconds.
Loaded module 'dls' in 0.002 seconds.
Loaded module 'eventgrid' in 0.002 seconds.
Loaded module 'extension' in 0.001 seconds.
Loaded module 'feedback' in 0.003 seconds.
Loaded module 'find' in 0.001 seconds.
Loaded module 'interactive' in 0.001 seconds.
Loaded module 'iot' in 0.002 seconds.
Loaded module 'keyvault' in 0.003 seconds.
Loaded module 'lab' in 0.002 seconds.
Loaded module 'monitor' in 0.003 seconds.
Loaded module 'network' in 0.010 seconds.
Loaded module 'profile' in 0.002 seconds.
Loaded module 'rdbms' in 0.003 seconds.
Loaded module 'redis' in 0.003 seconds.
Loaded module 'resource' in 0.003 seconds.
Loaded module 'role' in 0.002 seconds.
Loaded module 'servicefabric' in 0.002 seconds.
Loaded module 'sql' in 0.004 seconds.
Loaded module 'storage' in 0.012 seconds.
Loaded module 'vm' in 0.004 seconds.
Loaded all modules in 0.127 seconds. (note: there's always an overhead with the first module loaded)
Extensions directory: 'C:\Users\1545 IRON V4\.azure\cliextensions'
Application event 'CommandTable.Loaded' with event data {'command_table': {'aks create': <azure.cli.core.commands.CliCommand object at 0x03FA6990>}}
Application event 'CommandParser.Loaded' with event data {'parser': AzCliCommandParser(prog='az', usage=None, description=None, formatter_class=<class 'argparse.HelpFormatter'>, conflict_handler='error', add_help=True)}
Application event 'CommandTableParams.Loaded' with event data {'command_table': {'aks create': <azure.cli.core.commands.CliCommand object at 0x03FA6990>}}
Application event 'CommandParser.Parsing' with event data {'argv': ['aks', 'create', '--resource-group', 'catalogoResourceGroup', '--name', 'catalogoCluster', '--agent-count', '1', '--generate-ssh-keys']}
Application event 'CommandParser.Parsed' with event data {'command': 'aks create', 'args': Namespace(_command_package='aks', _jmespath_query=None, _log_verbosity_debug=False, _log_verbosity_verbose=False, _output_format='json', _parser=AzCliCommandParser(prog='az aks create', usage=None, description='Create a managed Kubernetes cluster. :type dns_name_prefix: str', formatter_class=<class 'argparse.HelpFormatter'>, conflict_handler='error', add_help=True), _validators=[<function validate_linux_host_name at 0x04FE6150>, <function validate_ssh_key [...]
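For what it's worth, the `unpack` failure above happens inside the CLI process before any request reaches Azure, so a corrupted or outdated azure-cli install is a plausible culprit. A minimal recovery sketch, assuming a pip-based install (the upgrade step is an assumption, not something this log confirms); the cloud-touching commands are only echoed here:

```shell
# Hypothetical recovery: upgrade azure-cli, then retry the exact same command.
# Neither command is executed in this sketch; they are built for reference.
upgrade_cmd="pip install --upgrade azure-cli"
retry_cmd="az aks create --resource-group catalogoResourceGroup --name catalogoCluster --agent-count 1 --generate-ssh-keys --debug"
echo "$upgrade_cmd"
echo "$retry_cmd"
```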
After attempting to create a new cluster in westeurope, I encountered this:
Operation failed with status: 'Conflict'. Details: Public preview limit of 5 for managed cluster(AKS) has been reached for subscription 97574325-redacted in location westeurope. Please try deleting one or more managed cluster resources in this location before trying to create a new cluster or try a different Azure location.
This was odd, since I had already deleted the resource groups that the 5 previous (failed) clusters belonged to; none of them would deploy to westeurope either, while East US works fine.
However, when looking at the output of az aks list,
I still see them in their failed state, even though the resource groups were deleted more than 12 hours ago:
[
  ...,
  {
    "id": "/subscriptions/97574325-redacted/resourcegroups/prod/providers/Microsoft.ContainerService/managedClusters/k8s",
    "location": "westeurope",
    "name": "k8s",
    "properties": {
      "accessProfiles": null,
      "agentPoolProfiles": [
        {
          "count": 3,
          "name": "nodepool1",
          "osType": "Linux",
          "storageProfile": "ManagedDisks",
          "vmSize": "Standard_DS2_v2"
        }
      ],
      "dnsPrefix": "k8s-prod-975743",
      "fqdn": null,
      "kubernetesVersion": "1.8.1",
      "linuxProfile": {
        "adminUsername": "azureuser",
        "ssh": {
          "publicKeys": [
            {
              "keyData": "redacted"
            }
          ]
        }
      },
      "provisioningState": "Failed",
      "servicePrincipalProfile": {
        "clientId": "838f6034-redacted"
      }
    },
    "resourceGroup": "prod",
    "type": "Microsoft.ContainerService/ManagedClusters"
  }
]
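Until the backend record is cleaned up, one workaround sketch is to address the orphaned managed cluster directly by its full resource ID from the listing above (this assumes the resource is still deletable by ID; the cloud-touching call is left commented so the snippet runs standalone):

```shell
# Build the full resource ID of the orphaned managed cluster, taken from the
# `az aks list` output above, then delete it directly.
subscription="97574325-redacted"
cluster_id="/subscriptions/${subscription}/resourcegroups/prod/providers/Microsoft.ContainerService/managedClusters/k8s"
echo "$cluster_id"
# Cloud-touching call, shown but not run here:
# az resource delete --ids "$cluster_id"
# or equivalently: az aks delete -g prod -n k8s
```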
I've deployed an AKS cluster in East US
and installed https://kubeapps.com/charts/stable/redis .
Even though the cluster has 2 StorageClasses (default and managed-premium),
you cannot create a PVC without spec.storageClassName.
Right now a PVC must have spec.storageClassName: default to provision a PV.
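For reference, a minimal claim that provisions under the behavior described above (the claim name and size are placeholders; the key point is the explicit spec.storageClassName):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-claim        # placeholder name
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: default  # required today; omitting it leaves the PVC Pending
  resources:
    requests:
      storage: 5Gi
```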
C:\Users\walterm\Development>az aks create --resource-group aks --name myK8sCluser --agent-count 1 -s Standard_D2_V2 --location westus2
Deployment failed. Correlation ID: d6b97aa4-94bd-4738-8ebf-84986c667b0f. getAndWaitForManagedClusterProvisioningState error:
Hi, I created a cluster and am seeing the following errors at the moment.
kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE
heapster-553147743-zpkqz 2/2 Running 0 58s
kube-dns-v20-1654923623-35mc5 2/3 Running 0 57s
kube-dns-v20-1654923623-n0kmv 2/3 Running 0 57s
kube-proxy-9tkqg 1/1 Running 0 57s
kube-proxy-x6x56 1/1 Running 0 57s
kube-svc-redirect-90dkw 0/1 Error 3 57s
kube-svc-redirect-lt301 0/1 CrashLoopBackOff 2 57s
kubernetes-dashboard-3427906134-cvnph 1/1 Running 0 57s
tunnelfront-c7rrn 0/1 Error 2 57s
tunnelfront-rk54n 0/1 CrashLoopBackOff 1 57s
Error when fetching logs for the kube-svc-redirect pod:
Error from server: Get https://aks-agentpool1-39835177-0:10250/containerLogs/kube-system/kube-svc-redirect-90dkw/redirector: dial tcp 10.240.0.4:10250: getsockopt: connection refused
Also, when trying to install Istio:
NAME READY STATUS RESTARTS AGE
istio-ca-367485603-20049 1/1 Running 0 1m
istio-egress-3571786535-1bj8w 1/1 Running 0 1m
istio-ingress-2270755287-j5t5n 1/1 Running 0 1m
istio-mixer-1505455116-mllwx 2/2 Running 0 1m
istio-pilot-2278433625-fb4k3 0/1 Error 1 1m
kubectl get nodes
Unable to connect to the server: net/http: TLS handshake timeout
I got this error; the same happens when trying with curl:
curl -i https://go1-a8262be2.hcp.eastus.azmk8s.io:443
curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to go1-a8262be2.hcp.eastus.azmk8s.io:443
Here are my steps:
az group create -n GO1AKS -l eastus
{
  "id": "/subscriptions/xxxxxx/resourceGroups/GO1AKS",
  "location": "eastus",
  "managedBy": null,
  "name": "GO1AKS",
  "properties": {
    "provisioningState": "Succeeded"
  },
  "tags": null
}
-----
az aks create --resource-group GO1AKS --name myK8sCluster --ssh-key-value ~/.ssh/go1.pub --dns-name-prefix go1 --agent-count 1
...
"dnsPrefix": "go1",
"fqdn": "go1-a8262be2.hcp.eastus.azmk8s.io",
"kubernetesVersion": "1.7.7"
...
I noticed the cluster was running v1.7.7, so I checked the available versions and upgraded:
az aks get-versions --name myK8sCluster -g GO1AKS
{
  "id": "/subscriptions/xxxx/resourcegroups/GO1AKS/providers/Microsoft.ContainerService/managedClusters/myK8sCluster/upgradeprofiles/default",
  "name": "default",
  "properties": {
    "agentPoolProfiles": [
      {
        "kubernetesVersion": "1.7.7",
        "name": null,
        "osType": "Linux",
        "upgrades": [
          "1.7.9",
          "1.8.2",
          "1.8.1"
        ]
      }
    ],
    "controlPlaneProfile": {
      "kubernetesVersion": "1.7.7",
      "name": null,
      "osType": "Linux",
      "upgrades": [
        "1.7.9",
        "1.8.2",
        "1.8.1"
      ]
    }
  },
  "resourceGroup": "GO1AKS",
  "type": "Microsoft.ContainerService/managedClusters/upgradeprofiles"
}
-----
az aks upgrade --name myK8sCluster -g GO1AKS --kubernetes-version 1.8.2
Kubernetes may be unavailable during cluster upgrades.
Are you sure you want to perform this operation? (y/n): y
-----
az aks get-versions --name myK8sCluster -g GO1AKS
{
  "id": "/subscriptions/xxxx/resourcegroups/GO1AKS/providers/Microsoft.ContainerService/managedClusters/myK8sCluster/upgradeprofiles/default",
  "name": "default",
  "properties": {
    "agentPoolProfiles": [
      {
        "kubernetesVersion": "1.8.2",
        "name": null,
        "osType": "Linux",
        "upgrades": null
      }
    ],
    "controlPlaneProfile": {
      "kubernetesVersion": "1.8.2",
      "name": null,
      "osType": "Linux",
      "upgrades": null
    }
  },
  "resourceGroup": "GO1AKS",
  "type": "Microsoft.ContainerService/managedClusters/upgradeprofiles"
}
After that I got this error:
~ az aks get-credentials -g GO1AKS -n myK8sCluster
Merged "myK8sCluster" as current context in /Users/sanglt/.kube/config
~ kubectl get nodes
Unable to connect to the server: net/http: TLS handshake timeout
Hi,
I'm trying to save the credentials for my Kubernetes cluster into a file with this command:
az aks get-credentials --resource-group=chronas-k8s --name=myK8sCluster --file myk8sclusterConfig
and I get this error:
[Errno 2] No such file or directory: ''
Traceback (most recent call last):
  File "/usr/local/Cellar/azure-cli/2.0.20/libexec/lib/python3.6/site-packages/azure/cli/main.py", line 36, in main
    cmd_result = APPLICATION.execute(args)
  File "/usr/local/Cellar/azure-cli/2.0.20/libexec/lib/python3.6/site-packages/azure/cli/core/application.py", line 212, in execute
    result = expanded_arg.func(params)
  File "/usr/local/Cellar/azure-cli/2.0.20/libexec/lib/python3.6/site-packages/azure/cli/core/commands/__init__.py", line 377, in __call__
    return self.handler(*args, **kwargs)
  File "/usr/local/Cellar/azure-cli/2.0.20/libexec/lib/python3.6/site-packages/azure/cli/core/commands/__init__.py", line 620, in _execute_command
    reraise(*sys.exc_info())
  File "/usr/local/Cellar/azure-cli/2.0.20/libexec/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/usr/local/Cellar/azure-cli/2.0.20/libexec/lib/python3.6/site-packages/azure/cli/core/commands/__init__.py", line 602, in _execute_command
    result = op(client, **kwargs) if client else op(**kwargs)
  File "/usr/local/Cellar/azure-cli/2.0.20/libexec/lib/python3.6/site-packages/azure/cli/command_modules/acs/custom.py", line 1275, in aks_get_credentials
    os.makedirs(directory)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/os.py", line 220, in makedirs
    mkdir(name, mode)
FileNotFoundError: [Errno 2] No such file or directory: ''
This happens both from the Cloud Shell in the portal and on my local Mac.
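The traceback shows aks_get_credentials calling os.makedirs() on the directory part of the --file argument; for a bare filename that directory is the empty string, which is exactly what blows up. A workaround sketch, assuming that behavior holds until patched: pass a path that contains a directory component. The az call itself is left commented so the snippet runs standalone.

```shell
# Workaround: give --file a path with an explicit directory component, so
# os.path.dirname() is non-empty when the CLI calls os.makedirs() on it.
config_path="$PWD/myk8sclusterConfig"
echo "$config_path"
# Cloud-touching call, shown but not run here:
# az aks get-credentials --resource-group=chronas-k8s --name=myK8sCluster --file "$config_path"
```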
mount error(2): No such file or directory is reported by Kubernetes when creating a pod with an Azure File volume.
I reproduced this issue by following the guide Use Azure File with AKS.
After the pod and the secret are created, the pod status remains "ContainerCreating" and kubectl describe says:
Normal   Scheduled              0s  default-scheduler                   Successfully assigned azure-files-pod to aks-agentpool1-35554641-0
Normal   SuccessfulMountVolume  0s  kubelet, aks-agentpool1-35554641-0  MountVolume.SetUp succeeded for volume "default-token-bngqc"
Warning  FailedMount            0s  kubelet, aks-agentpool1-35554641-0  MountVolume.SetUp failed for volume "azure": mount failed: exit status 32
Mounting command: mount
Mounting arguments: //xxx.file.core.windows.net/testFileShare /var/lib/kubelet/pods/5e308352-cb7b-11e7-80ea-0a58ac1f0e1a/volumes/kubernetes.io~azure-file/azure cifs [vers=3.0,username=xxx,password=xxx,dir_mode=0777,file_mode=0777]
Output: mount error(2): No such file or directory
Refer to the mount.cifs(8) manual page (e.g. man mount.cifs)
Here is a PowerShell script that reproduces the issue. Change $account on line 3 to something unique.
$group = "teststorage"
$location = "westeurope"
$account = 'testnsfasdfasdfaef'
$namespace = 'default'
$shareName = 'testFileShare'
# create a resource group
az group create -n $group -l $location
Write-host $account
# create a storage account
az storage account create -n $account -g $group --sku Standard_LRS -l $location
# get the credentials
$credentials = (az storage account keys list --account-name $account --resource-group $group | ConvertFrom-Json)
# create a file share
az storage share create --name $shareName --account-name $account --account-key $credentials[0].value
Write-Host $credentials[0].value
$accountName = [Convert]::ToBase64String([System.Text.Encoding]::ASCII.GetBytes($account))
$accountKey = [Convert]::ToBase64String([System.Text.Encoding]::ASCII.GetBytes($credentials[0].value))
$yml = "
apiVersion: v1
kind: Secret
metadata:
  name: azure-storage
type: Opaque
data:
  azurestorageaccountname: $accountName
  azurestorageaccountkey: $accountKey
---
apiVersion: v1
kind: Pod
metadata:
  name: azure-files-pod
spec:
  containers:
  - image: kubernetes/pause
    name: azure
    volumeMounts:
    - name: azure
      mountPath: /mnt/azure
  volumes:
  - name: azure
    azureFile:
      secretName: azure-storage
      shareName: $shareName
      readOnly: false"
Write-host $yml
# provision the secret and the pod
echo $yml | kubectl apply --namespace $namespace -f -
# check the result
kubectl describe pod azure-files-pod --namespace $namespace
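One thing worth checking in the repro above: Azure file share names must be lowercase, so $shareName = 'testFileShare' may never be created as spelled, and mounting a nonexistent share yields exactly mount error(2). That is a suspicion, not something the script output confirms. For comparison, the two base64 values the script builds in PowerShell can be produced in bash as follows (the account name and key below are placeholders, not real credentials):

```shell
# Base64-encode the storage account name and key the way the Kubernetes
# Secret's `data:` section expects them (placeholder values only).
account="testnsfasdfasdfaef"
key="placeholder-storage-key"
account_b64=$(printf '%s' "$account" | base64)
key_b64=$(printf '%s' "$key" | base64)
echo "$account_b64"
```

Note that `kubectl create secret generic` would perform this encoding automatically, which sidesteps hand-rolled base64 entirely.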
I have some questions about apiserver configuration:
Using this to serve as public notice that we're likely going to expand into the following new regions to support AKS:
I'm trying to upgrade the cluster to version 1.8.1 right after creating it, but I'm getting this error:
Deployment failed. Correlation ID: 4dacfcd4-3f8f-4371-bdcf-8a37a6d8268b. getAndWaitForManagedClusterProvisioningState error: <nil>
Maybe it is some sort of timeout when reporting results back to the CLI.
What is interesting is that when I check the versions again with az aks get-versions,
they appear updated, but kubectl get nodes
still shows the old version:
➜ azure-aks kubectl get nodes
NAME STATUS ROLES AGE VERSION
aks-agentpool1-19217595-0 Ready agent 2m v1.7.7
aks-agentpool1-19217595-1 Ready agent 2m v1.7.7
➜ azure-aks az aks get-versions -g myResourceGroup -n myAKSCluster -o table
Name ResourceGroup MasterVersion MasterUpgrades AgentPoolVersion AgentPoolUpgrades
------- --------------- --------------- ------------------- ------------------ -------------------
default myResourceGroup 1.7.7 1.8.1, 1.7.9, 1.8.2 1.7.7 1.8.1, 1.7.9, 1.8.2
➜ azure-aks az aks upgrade -g myResourceGroup -n myAKSCluster --kubernetes-version 1.8.1
Kubernetes may be unavailable during cluster upgrades.
Are you sure you want to perform this operation? (y/n): y
Deployment failed. Correlation ID: 4dacfcd4-3f8f-4371-bdcf-8a37a6d8268b. getAndWaitForManagedClusterProvisioningState error: <nil>
➜ azure-aks az aks get-versions -g myResourceGroup -n myAKSCluster -o table
Name ResourceGroup MasterVersion MasterUpgrades AgentPoolVersion AgentPoolUpgrades
------- --------------- --------------- ---------------- ------------------ -------------------
default myResourceGroup 1.8.1 1.8.2, 1.8.1 1.8.1 1.8.2, 1.8.1
➜ azure-aks kubectl get nodes
NAME STATUS ROLES AGE VERSION
aks-agentpool1-19217595-0 Ready agent 23m v1.7.7
aks-agentpool1-19217595-1 Ready agent 23m v1.7.7
I'm running a new AKS cluster in UK West.
Getting logs from a crashed pod gives me this error:
kubectl --namespace=kube-lego logs kube-lego-2933009699-p2jlm
Error from server: Get https://aks-agentpool1-17180407-0:10250/containerLogs/kube-lego/kube-lego-2933009699-p2jlm/kube-lego: dial tcp 10.240.0.4:10250: getsockopt: connection refused
Anyone else seeing similar? Haven't upgraded the cluster to 1.8 yet.
Andrew.
Creating (and, to a lesser extent, deleting) an AKS cluster is a lengthy operation. Currently, one only sees "creating" for many minutes.
It would be nice if there were some indication of what's going on (e.g. creating resource, setting up network, installing binaries, ...).
Even without understanding the entire process, it would give a sense of progress and some understanding of why the operation takes so long.
Not sure how easy this is to do with ARM deployments, but I think it would be an improved user experience.
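Until richer progress reporting exists, a coarse signal can be scripted today by polling the cluster's provisioningState. A sketch (the resource names are the tutorial defaults, and the actual az call is left commented so the snippet runs standalone):

```shell
# Build the status query; `az aks show` reports provisioningState values such
# as "Creating", "Succeeded", or "Failed" while a deployment is in flight.
status_cmd='az aks show -g myResourceGroup -n myAKSCluster --query provisioningState -o tsv'
echo "$status_cmd"
# Poll every 30 seconds until creation finishes (commented: needs a live cluster):
# while [ "$(eval "$status_cmd")" = "Creating" ]; do sleep 30; done
```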