azure / acs-engine

WE HAVE MOVED: Please join us at Azure/aks-engine!

Home Page: https://github.com/Azure/aks-engine

License: MIT License

Topics: kubernetes, dcos, mesos, docker, swarm, swarmmode, orchestration, containers, azure

acs-engine's Introduction

Pardon our Dust!

This codebase has been deprecated in favor of aks-engine, the natural evolution from acs-engine:

https://github.com/Azure/aks-engine

All future development and maintenance will occur there as an outcome of this deprecation. We're sorry for any inconvenience!

We've moved the Kubernetes code over 100% as-is (with the exception of the boilerplate renaming overhead that accompanies such a move); we're confident this housekeeping maneuver will more effectively track the close affinity between the AKS managed service and the "build and manage your own configurable Kubernetes" story that folks use this tool for.

See you at https://github.com/Azure/aks-engine!

The historical documentation remains below.

Microsoft Azure Container Service Engine - Builds Docker Enabled Clusters

Overview

The Azure Container Service Engine (acs-engine) generates ARM (Azure Resource Manager) templates for Docker-enabled clusters on Microsoft Azure with your choice of DC/OS, Kubernetes, OpenShift, Swarm Mode, or Swarm orchestrators. The input to the tool is a cluster definition. The cluster definition (or apimodel) is very similar to (and in many cases the same as) the ARM template syntax used to deploy a Microsoft Azure Container Service cluster.

The cluster definition file enables you to customize your Docker-enabled cluster in many ways, including the following (a sample definition appears after this list):

  • Choice of DC/OS, Kubernetes, OpenShift, Swarm Mode, or Swarm orchestrators
  • Multiple agent pools where each agent pool can specify:
    • Standard or premium VM Sizes, including GPU optimized VM sizes
    • Node count
    • Virtual Machine ScaleSets or Availability Sets
    • Storage Account Disks or Managed Disks
    • OS and distro
  • Custom VNET
  • Extensions
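
For illustration, a minimal Kubernetes cluster definition looks roughly like this (a sketch modeled on examples/kubernetes.json; all values are placeholders, and exact field support varies by acs-engine version):

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes"
    },
    "masterProfile": {
      "count": 1,
      "dnsPrefix": "<unique-dns-prefix>",
      "vmSize": "Standard_D2_v2"
    },
    "agentPoolProfiles": [
      {
        "name": "agentpool1",
        "count": 3,
        "vmSize": "Standard_D2_v2"
      }
    ],
    "linuxProfile": {
      "adminUsername": "azureuser",
      "ssh": {
        "publicKeys": [
          { "keyData": "<ssh-public-key>" }
        ]
      }
    },
    "servicePrincipalProfile": {
      "clientId": "<service-principal-client-id>",
      "secret": "<service-principal-secret>"
    }
  }
}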

More info, including a thorough walkthrough, is available in the project documentation.

User guides

The project documentation includes guides that show how to create your first deployment for each orchestrator, as well as guides covering more advanced features to try out after you have built your first cluster.

Usage

Generate Templates

Usage is best demonstrated with an example:

$ vim examples/kubernetes.json

# insert your preferred, unique DNS prefix
# insert your SSH public key

$ ./acs-engine generate examples/kubernetes.json

This produces a new directory inside _output/ that contains an ARM template for deploying Kubernetes into Azure. (In the case of Kubernetes, some additional required assets are also generated and placed in the output directory.)
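
The generated template can then be deployed with the Azure XPlat CLI (v0.10), as shown below (the resource group and output directory names are placeholders):

$ azure group deployment create \
    --resource-group="<RESOURCE_GROUP_NAME>" \
    --template-file="./_output/<INSTANCE>/azuredeploy.json" \
    --parameters-file="./_output/<INSTANCE>/azuredeploy.parameters.json"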

Code of conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

acs-engine's People

Contributors

0xmichalis, amanohar, andyzhangx, anhowe, brendandburns, colemickens, gsacavdm, jackfrancis, jackquincy, jchauncey, jessfraz, jiangtianli, jim-minter, junsun17, karataliu, lachie83, mboersma, ofiliz, patricklang, pidah, raveeram, ritazh, rjtsdl, seanknox, shrutir25, slack, sozercan, tariq1890, wbuchwalter, weinong


acs-engine's Issues

Azure Application Gateway support / SSL termination

Hi,

I want to deploy Kubernetes on Azure and want SSL termination at the load balancer level (typically implemented with an Ingress object in Kubernetes).

As of now, only Azure Load Balancers (L4) are provisioned by ACS. Is there a roadmap for supporting Application Gateways?

In the meantime, what is my best option for implementing SSL offloading? E.g., could I put an Application Gateway in front of the load balancers? Or should I implement SSL termination myself with nginx container(s) dedicated to this, or use something like the 'dynamic SSL' plugin of Kong?

Thanks!

Generated SSL certs issues

Hi Guys,

I would like to ask for a little help. I tried to provision a Kubernetes cluster with the acs-engine CLI tool. The tool said the provisioning ended successfully.

 kubectl get pods --namespace=kube-system
NAME                                            READY     STATUS    RESTARTS   AGE
heapster-v1.2.0-194960081-8xrxh                 0/2       Pending   0          8m
kube-addon-manager-k8s-master-39229988-0        1/1       Running   0          8m
kube-apiserver-k8s-master-39229988-0            1/1       Running   0          9m
kube-controller-manager-k8s-master-39229988-0   1/1       Running   0          9m
kube-dns-v19-dmvvc                              0/3       Pending   0          8m
kube-dns-v19-hnrqp                              0/3       Pending   0          8m
kube-proxy-7mqm5                                1/1       Running   0          8m
kube-proxy-ebtw9                                1/1       Running   0          8m
kube-proxy-lxlbu                                1/1       Running   0          8m
kube-proxy-r5bez                                1/1       Running   0          5m
kube-scheduler-k8s-master-39229988-0            1/1       Running   0          9m
kubernetes-dashboard-1872324879-kqdsy           0/1       Pending   0          8m
 sudo docker ps
CONTAINER ID        IMAGE                                                    COMMAND                  CREATED             STATUS              PORTS               NAMES
176cbfdf5235        gcr.io/google_containers/hyperkube-amd64:v1.4.5          "/hyperkube proxy --k"   11 minutes ago      Up 11 minutes                           k8s_kube-proxy.6534244d_kube-proxy-lxlbu_kube-system_a62e7361-abd5-11e6-80f0-000d3a2613fc_4ae5ff30
25f91309aa05        gcr.io/google_containers/pause-amd64:3.0                 "/pause"                 11 minutes ago      Up 11 minutes                           k8s_POD.d8dbe16c_kube-proxy-lxlbu_kube-system_a62e7361-abd5-11e6-80f0-000d3a2613fc_1cbcf764
0243775c437b        gcr.io/google_containers/kube-addon-manager-amd64:v5.1   "/opt/kube-addons.sh"    11 minutes ago      Up 11 minutes                           k8s_kube-addon-manager.ed858faf_kube-addon-manager-k8s-master-39229988-0_kube-system_c0133a504dee133427d4802c1f2c3314_8e1e67dc
72ca17c7edb0        gcr.io/google_containers/hyperkube-amd64:v1.4.5          "/hyperkube scheduler"   11 minutes ago      Up 11 minutes                           k8s_kube-scheduler.22257f8_kube-scheduler-k8s-master-39229988-0_kube-system_6203373493987263d369756729453b5f_9bf5a243
1119a6276383        gcr.io/google_containers/pause-amd64:3.0                 "/pause"                 11 minutes ago      Up 11 minutes                           k8s_POD.d8dbe16c_kube-scheduler-k8s-master-39229988-0_kube-system_6203373493987263d369756729453b5f_79ff11bf
158391dcd7cc        gcr.io/google_containers/hyperkube-amd64:v1.4.5          "/hyperkube controlle"   11 minutes ago      Up 11 minutes                           k8s_kube-controller-manager.954cbc53_kube-controller-manager-k8s-master-39229988-0_kube-system_ee5fb6e3d925965b0048e6cc77534a6e_e0ba6b18
af6c5c83eeef        gcr.io/google_containers/hyperkube-amd64:v1.4.5          "/hyperkube apiserver"   11 minutes ago      Up 11 minutes                           k8s_kube-apiserver.e54c022a_kube-apiserver-k8s-master-39229988-0_kube-system_1b3fae831a29391607f2e670f7f1e21a_21cb5974
cb14c133721d        gcr.io/google_containers/pause-amd64:3.0                 "/pause"                 11 minutes ago      Up 11 minutes                           k8s_POD.d8dbe16c_kube-controller-manager-k8s-master-39229988-0_kube-system_ee5fb6e3d925965b0048e6cc77534a6e_aecd055e
16d5a7e41944        gcr.io/google_containers/pause-amd64:3.0                 "/pause"                 11 minutes ago      Up 11 minutes                           k8s_POD.d8dbe16c_kube-apiserver-k8s-master-39229988-0_kube-system_1b3fae831a29391607f2e670f7f1e21a_122a8fd7
33b25ada02b6        gcr.io/google_containers/pause-amd64:3.0                 "/pause"                 11 minutes ago      Up 11 minutes                           k8s_POD.d8dbe16c_kube-addon-manager-k8s-master-39229988-0_kube-system_c0133a504dee133427d4802c1f2c3314_70fac706
5fce078d090b        gcr.io/google_containers/hyperkube-amd64:v1.4.5          "/hyperkube kubelet -"   12 minutes ago      Up 12 minutes                           jovial_hoover

I found tons of lines like the following in the log of the hyperkube Docker container:

5728 status_manager.go:450] Failed to update status for pod "_()": Get https://10.240.255.5:443/api/v1/namespaces/kube-system/pods/kube-apiserver-k8s-master-39229988-0: dial tcp 10.240.255.5:443: getsockopt: connection refused

If I try this request manually, I get this error:

wget https://10.240.255.5:443/api/v1/namespaces/kube-system/pods/kube-apiserver-k8s-master-39229988-0
--2016-11-16 08:36:25--  https://10.240.255.5/api/v1/namespaces/kube-system/pods/kube-apiserver-k8s-master-39229988-0
Connecting to 10.240.255.5:443... connected.
ERROR: cannot verify 10.240.255.5's certificate, issued by 'CN=ca':
  Unable to locally verify the issuer's authority.

Could you help me figure out what is wrong with my setup?
Thanks in advance.

Remove hard-coded image URLs

Since gcr.io is not reachable from mainland China, I noticed all acs-engine guides in China are suggesting that users replace gcr.io in both code and configuration files with something else and, what's worse, re-compile acs-engine to make this work.

This is a very bad experience.

If we cannot solve the URL problem itself, we can at least remove the hard-coded image URLs from the code, for example:

DefaultKubernetesHyperkubeSpec = "gcr.io/google_containers/hyperkube-amd64:v1.4.5"
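
One way out would be to let the image be overridden in the cluster definition instead of recompiled in; a sketch, assuming a kubernetesConfig override field such as customHyperkubeImage (which later acs-engine versions did add):

{
  "orchestratorProfile": {
    "orchestratorType": "Kubernetes",
    "kubernetesConfig": {
      "customHyperkubeImage": "<your-mirror>/hyperkube-amd64:v1.4.5"
    }
  }
}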

Can't list an ACS cluster created via the command line

I use the XPlat CLI 0.10.0. I followed the README document and successfully created an ACS cluster with Docker Swarm using the command below:

$ azure group deployment create \
    --name="" \
    --resource-group="<RESOURCE_GROUP_NAME>" \
    --template-file="./_output//azuredeploy.json" \
    --parameters-file="./_output//azuredeploy.parameters.json"

The problem is that when I run azure acs list -s {subscription id} -g {resource group name}, I get an empty response.
But when I use Portal -> New -> Containers -> Azure Container Service to create a new Docker Swarm cluster with the UI, I do get a response from azure acs list -s {subscription id} -g {resource group name}.
Are there any differences between an ACS cluster created from the portal and one created from the command line?

Kubernetes vm scale up problem

When I try to scale up the VM instances on Kubernetes via Container Service, it does not allow me to create a new node and returns the error below. Besides, it messes with my cluster and no longer allows me to reach the apiserver.

Failed to save container service 'containerservice-[name]'. Error: Provisioning of resource(s) for container service 'containerservice-[name]' in resource group '[name]' failed with errors: Resource type: Microsoft.Compute/virtualMachines, name: k8s-agent-1F791599-2, id: /subscriptions/[subscriptionid]/resourceGroups/[name]/providers/Microsoft.Compute/virtualMachines/k8s-agent-1F791599-2, StatusCode: BadRequest, StatusMessage: \n {
  "error": {
    "code": "InvalidParameter",
    "target": "vmSize",
    "message": "The value of parameter vmSize is invalid."
  }
}
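
For context, the VM size for each pool comes from the vmSize field in the cluster definition's agentPoolProfiles, so an error like the one above usually means the configured size is unavailable in the target region or subscription. A minimal sketch (the size shown is a placeholder):

"agentPoolProfiles": [
  {
    "name": "agentpool1",
    "count": 3,
    "vmSize": "Standard_D2_v2"
  }
]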

acstgen: kubernetes: set primaryAvailabilitySet

Ref: kubernetes/kubernetes#34526

Need to output this field in the azure.json file that is written out via CustomScriptExtension.

@anhowe: I'll need your advice on how to do this. Should we just assume that the first availability set is the one we want load-balanced? Should we require the user to specify the primaryAvailabilitySet to point at a specific agentpool as a field in the OrchestratorProfile?

[kubernetes] unable to create cluster with custom vnet

What happened:

When creating a k8s cluster using an existing vnet, the cluster is unable to create routes in the Azure route table and is therefore unable to schedule any pods.

How to reproduce it:

  1. Create a custom vnet
  2. Configure the template and deploy

When the cluster is up, the nodes report as ready:

gfadmin@k8s-master-35738843-0:~$ kubectl get nodes
NAME                        STATUS                     AGE
k8s-agentpool1-35738843-0   Ready                      16h
k8s-agentpool1-35738843-1   Ready                      16h
k8s-agentpool1-35738843-2   Ready                      16h
k8s-master-35738843-0       Ready,SchedulingDisabled   16h

With a NetworkUnavailable message of RouteController failed to create a route:

gfadmin@k8s-master-35738843-0:~$ kubectl describe node k8s-master-35738843-0
Name:                   k8s-master-35738843-0
Labels:                 beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/instance-type=Standard_D2_v2
                        beta.kubernetes.io/os=linux
                        failure-domain.beta.kubernetes.io/region=westus
                        failure-domain.beta.kubernetes.io/zone=0
                        kubernetes.io/hostname=k8s-master-35738843-0
Taints:                 <none>
CreationTimestamp:      Wed, 23 Nov 2016 18:40:52 +0000
Phase:
Conditions:
  Type                  Status  LastHeartbeatTime                       LastTransitionTime                        Reason                          Message
  ----                  ------  -----------------                       ------------------                        ------                          -------
  OutOfDisk             False   Thu, 24 Nov 2016 11:02:41 +0000         Wed, 23 Nov 2016 18:40:52 +0000   KubeletHasSufficientDisk        kubelet has sufficient disk space available
  MemoryPressure        False   Thu, 24 Nov 2016 11:02:41 +0000         Wed, 23 Nov 2016 18:40:52 +0000   KubeletHasSufficientMemory      kubelet has sufficient memory available
  DiskPressure          False   Thu, 24 Nov 2016 11:02:41 +0000         Wed, 23 Nov 2016 18:40:52 +0000   KubeletHasNoDiskPressure        kubelet has no disk pressure
  Ready                 True    Thu, 24 Nov 2016 11:02:41 +0000         Wed, 23 Nov 2016 18:40:52 +0000   KubeletReady                    kubelet is posting ready status
  NetworkUnavailable    True    Thu, 24 Nov 2016 11:02:47 +0000         Thu, 24 Nov 2016 11:02:47 +0000   NoRouteCreated                  RouteController failed to create a route

Looking at the kube-controller logs (/var/log/containers):

routecontroller.go:132] Could not create route 5cb8901d-b1ac-11e6-89eb-000d3a32ff9f 10.244.2.0/24 for node k8s-master-35738843-0 after 38.691596ms: network.SubnetsClient#Get: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code=\"ResourceNotFound\" Message=\"The Resource 'Microsoft.Network/virtualNetworks/subscriptions' under resource group 'ACSRG2' was not found.\"\n","stream":"stderr","time":"2016-11-23T18:51:29.914307462Z"}

Notice the malformed resource in the error message: Microsoft.Network/virtualNetworks/subscriptions.

Workaround

We've traced this to /etc/kubernetes/azure.json: the cloudprovider expects unqualified names for both the vnet and subnet, but the fully-qualified names are present instead:

{
    ...
    "subnetName": "/subscriptions/76aabf62-fa6e-41ac-a2f3-5532b22811b5/resourceGroups/ACSRG2/providers/Microsoft.Network/virtualNetworks/k8s-vnet-test/subnets/k8s-subnet-test",
    "securityGroupName": "...",
    "vnetName": "/subscriptions/76aabf62-fa6e-41ac-a2f3-5532b22811b5/resourceGroups/ACSRG2/providers/Microsoft.Network/virtualNetworks/k8s-vnet-test",
    ...
}

After changing the subnet and vnet to unqualified names (as shown below) and restarting kubelet, we see the routes being created and things are back to normal.
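
With the values from this repro, the corrected entries look like this:

{
    ...
    "subnetName": "k8s-subnet-test",
    "securityGroupName": "...",
    "vnetName": "k8s-vnet-test",
    ...
}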

Much of the credit in debugging this goes to @jamesbak.

Documentation missing details for the per-region generated kubeconfig folder

I managed to deploy a Kubernetes cluster successfully on the first try. It was actually great. I just noticed that acs-engine created a kubeconfig folder with a server configuration for each supported region. I did not see these files referenced anywhere in your docs. Can you please point me to the docs that provide more details on what these files are used for?

acstgen: kubernetes: template: masteroutputs.t:6: function "RequiresFakeAgentOutput" not defined

After rebasing on Azure/master I had two problems doing a Kubernetes generation/deployment:

  1. I had to add "isStateful" to the agentpool definitions to avoid this error:

    error while loading /tmp/tmp.aEZwUfrU0K: error validating acs cluster from file /tmp/tmp.aEZwUfrU0K: stateless (VMSS) deployments are not supported with Kubernetes, Kubernetes requires the ability to attach/detach disks.  To fix specify 'isStateful=true'
    
  2. After fixing that, I get:

    + ./acstgen /tmp/tmp.t5g0tSw0i4
    cert creation took 9.584222059s
    error generating template /tmp/tmp.t5g0tSw0i4: template: masteroutputs.t:6: function "RequiresFakeAgentOutput" not defined
    

Missing parameter in parameters.json error with azure-cli v0.10.5

README says:

Generated templates can be deployed using the Azure XPlat CLI (v0.10 only)

I have v0.10.5; however, the output I got from the examples/kubernetes.json template does not have the "nameSuffix" parameter in the _output/Kubernetes-*/azuredeploy.parameters.json file, and therefore it fails with this error:

azure group deployment create --template-file azuredeploy.json --parameters-file azuredeploy.parameters.json --resource-group ahmetb-k8s
info:    Executing command group deployment create
error:    Template and Deployment "parameters" objects lengths do not match
	Deployment Parameter file does not have { nameSuffix } defined.
error:   Error information has been recorded to /Users/alp/.azure/azure.err
error:   group deployment create command failed

Create service principal instructions do not work

I'm trying to create a service principal using the instructions on

https://github.com/timfpark/acs-engine/blob/master/docs/serviceprincipal.md

and am running into a number of difficulties with both the Azure CLI and the xplat Azure CLI.

With the Azure CLI, the following happens when I attempt to create a principal:

$ az ad sp create-for-rbac --role="Contributor" --scopes="/subscriptions/1d3bc944-c31f-41a9-a1ac-cafea961eba5"
Error loading command module 'storage'
Resource '2bc558c2-4e9f-41fa-bcff-3c57a5a2fce4' does not exist or one of its queried reference-property objects are not present.

While on the xplat CLI, something similar happens with both the one-step method:

$ azure ad sp create -n app -p password
info:    Executing command ad sp create
+ Creating application rhom                                                    
+ Creating service principal for application 83276f3b-8652-468c-95f9-3bf91982273b
error:   {"odata.error":{"code":"Request_ResourceNotFound","message":{"lang":"en","value":"Resource 'ServicePrincipal_e0d3c5bd-8ab3-400e-829b-910f562b6a23' does not exist or one of its queried reference-property objects are not present."}}}
error:   Error information has been recorded to /Users/tim/.azure/azure.err
error:   ad sp create command failed

and the two-step method:

$ azure ad sp create -a cea1c793-70b1-4681-aa2a-66688eadc271
info:    Executing command ad sp create
+ Creating service principal for application cea1c793-70b1-4681-aa2a-66688eadc271
error:   Error "Unexpected token  in JSON at position 0" occurred in deserializing the responseBody - "{"odata.error":{"code":"Request_ResourceNotFound","message":{"lang":"en","value":"Resource 'ServicePrincipal_b3bbe085-5ae8-41e1-939f-eeeb68450ac0' does not exist or one of its queried reference-property objects are not present."},"requestId":"373a6fcd-3257-44c6-b433-45f098f4c329","date":"2016-11-15T15:21:58"}}" for the default response.
error:   Error information has been recorded to /Users/tim/.azure/azure.err
error:   ad sp create command failed

dcos example seems broken

I am pretty new here.
Following the instructions, the template failed for some reason:

{
  "error": {
    "code": "PropertyChangeNotAllowed",
    "target": "customData",
    "message": "Changing property 'customData' is not allowed."
  }
}

Attached are the template files (ocdc.zip) and a screenshot of the Azure portal errors.

CoreOS with DC/OS

The DC/OS template currently generates Ubuntu virtual machines, which is not the standard OS for DC/OS. This request is to modify the virtual machine OS to support CoreOS as well as Ubuntu.

As far as I can see, this will require modification of the cloud-config.yaml file that is parsed into the customData fields found in the template.

Documentation missing details on scale up/down scenarios

In the documentation, there was a reference to using the API JSON file to scale the deployment up and down, but I did not see further details. How can I add or remove nodes in an existing agent pool or the master pool, or add and remove agent pools?

ACS engine on Linux installation issue

I tried to install the ACS Engine on a Linux machine that already had Docker installed, following the document below:
https://github.com/Azure/acs-engine/blob/master/docs/acsengine.md


Linux
For Linux, ensure Docker is installed, and follow the developer instructions at https://github.com/Azure/acs-engine#development-docker to build and use the ACS Engine.


After clicking the above link, the instructions for installing the ACS Engine on Linux are not very clear.


Development (Docker)
The easiest way to get started developing on acs-engine is to use Docker. If you already have Docker or "Docker for {Windows,Mac}" then you can get started without needing to install anything extra.
• Windows (PowerShell): .\scripts\devenv.ps1
• Linux (bash): ./scripts/devenv.sh
This setup mounts the acs-engine source directory as a volume into the Docker container. This means that you can edit your source code normally in your favorite editor on your machine, while still being able to compile and test inside of the Docker container (the same environment used in our Continuous Integration system).
Here's a quick demo video showing the dev/build/test cycle with this setup.


Do we need to export any paths or parameters on Linux? Where is devenv.sh located? It would be great if there were a step-by-step installation guide for Linux, as there is for Windows and OS X. The current installation guide for Linux is not very clear.
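
For what it's worth, devenv.sh sits in the scripts/ directory at the root of the repository, so a minimal sequence on Linux (assuming git and Docker are already installed) looks like this:

$ git clone https://github.com/Azure/acs-engine.git
$ cd acs-engine
$ ./scripts/devenv.sh

This drops you into a container with the source mounted, where you can build and test.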

acstgen: kubernetes: consider switching to `kubeadm`

Filing this for ongoing and future discussion/consideration.

This could dramatically reduce the amount of code we maintain to deploy Kubernetes.

Advantages:

  • removing 90+% of the yaml we have in the repo for Kubernetes (cloud-config, certs, etc)
  • not having to worry about drift between upstream addons and our addon yamls

Disadvantages:

  • no HA option in kubeadm (if it even matters)
  • no easy way to get a kubeconfig out (yet)
  • unclear which addons are deployed

Allow custom script execution when provisioning nodes

For my use case I would like to create a single ARM template which deploys an ACS cluster (DC/OS orchestrator) and some additional services directly provisioned on top of Marathon.

I've tried to add a Custom Script Extension to the Mesos master. This works, although there is no way to resolve the VM's resource name due to the generated 8-hex-digit string used to make the resource names unique.
Issue #53 would solve this particular problem, but I think it would be cleaner if it were somehow possible to provide a reference to a script.

[kubernetes] k8s cluster starts working only after nodes are restarted

I created an sp through az ad sp create-for-rbac --role contributor --scopes /subscriptions/xxx-yyy-zzz, then I deployed a k8s cluster through the portal UI. After the boxes were up, I ssh'ed into the master node and got:

Unable to connect to the server: dial tcp 40.68.165.173:443: i/o timeout

but az login was working fine with my sp account! Confused, I tried restarting the k8s api server using docker restart foo, and suddenly the k8s api server was responding. However, all nodes were not ready!

NAME                    STATUS                     AGE
k8s-agent-a21727d1-0    NotReady                   27s
k8s-agent-a21727d1-1    NotReady                   30s
k8s-agent-a21727d1-2    NotReady                   31s
k8s-master-a21727d1-0   Ready,SchedulingDisabled   27s

I rebooted agent-1 from the web portal UI; a minute later:

NAME                    STATUS                     AGE
k8s-agent-a21727d1-0    NotReady                   6m
k8s-agent-a21727d1-1    Ready                      6m
k8s-agent-a21727d1-2    NotReady                   7m
k8s-master-a21727d1-0   Ready,SchedulingDisabled   6m

I haven't rebooted the rest of the nodes yet, in case anyone wants to take a look. If I were to guess, it seems the k8s cluster was up before AAD had fully replicated the sp account? And surprisingly, k8s does not auto-retry, but somehow gets stuck!

Allow parametrizing the ephemeral disk layout for DC/OS nodes

Hi,
it would be helpful if acs-engine allowed parametrizing the ephemeral disk layout for DC/OS. Currently we can manually change the values inside the file dcoscustomdata184.t:

layout:
  - 50
  - 50

and then rebuild the project and generate the deploy templates. But we will end up with the same layout on all nodes. One of the disks (ephemeral0.1) is used as the mesos directory (/var/lib/mesos) and the second (ephemeral0.2) is used as the docker directory (/var/lib/docker). Sometimes there are more optimal layouts.
For example, in our scenario (Cassandra on DC/OS) we would like to have:

  • masters layout: -50/-50
  • public agent layout: -30/-70
  • private agent layout: -90/-10 (cassandra uses /var/lib/mesos for storing data)

Currently the workaround is to manually edit the generated deploy templates.
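
For illustration, the private-agent case above would correspond to a cloud-config stanza along these lines (a sketch only; the surrounding disk_setup keys and table_type are assumptions about the template's structure):

disk_setup:
  ephemeral0:
    table_type: mbr
    layout:
      - 90
      - 10
    overwrite: true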

DC/OS cluster deployment fails in Azure China

Hi guys,
I used acs-engine to generate an ACS template and deploy it to China, but at the last step it showed the following:

info: Resource 'dcos-agentpublic-nsg-31045692' of type 'Microsoft.Network/networkSecurityGroups' provisioning status is Succeeded
info: Resource 'dcos-agentprivate-nsg-31045692' of type 'Microsoft.Network/networkSecurityGroups' provisioning status is Succeeded
error: connect EAGAIN 42.159.198.85:443 - Local (0.0.0.0:17141)
error: Error information has been recorded to /home/steven/.azure/azure.err
error: group deployment create command failed

/home/steven/.azure/azure.err file details:

2016-11-19T16:13:31.024Z:
{ [Error: connect EAGAIN 42.159.198.85:443 - Local (0.0.0.0:17141)]
stack: [Getter/Setter],
code: 'EAGAIN',
errno: 'EAGAIN',
syscall: 'connect',
address: '42.159.198.85',
port: 443,
__frame:
{ name: '__1',
line: 79,
file: '/usr/local/lib/node_modules/azure-cli/lib/commands/arm/group/group.deployment.js',
prev:
{ name: '__1',
line: 52,
file: '/usr/local/lib/node_modules/azure-cli/lib/commands/arm/group/group.deployment.js',
prev: undefined,
calls: 2,
active: false,
offset: 26,
col: 24 },
calls: 96,
active: false,
offset: 2,
col: 44 },
rawStack: [Getter] }
Error: connect EAGAIN 42.159.198.85:443 - Local (0.0.0.0:17141)
<<< async stack >>>
at __1 (/usr/local/lib/node_modules/azure-cli/lib/commands/arm/group/group.deployment.js:81:45)
at __1 (/usr/local/lib/node_modules/azure-cli/lib/commands/arm/group/group.deployment.js:78:25)
<<< raw stack >>>
at Object.exports._errnoException (util.js:907:11)
at exports._exceptionWithHostPort (util.js:930:20)
at connect (net.js:865:16)

Could you please share any ideas to help with this? Thanks a lot!

Still being bitten by the docker socket bug

RE: moby/moby#23793

Just hit this on one of my deployments:

● docker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/docker.service.d
           └─clear_mount_propagation_flags.conf, overlay.conf
   Active: failed (Result: exit-code) since Thu 2016-11-03 04:56:35 UTC; 8min ago
     Docs: https://docs.docker.com
 Main PID: 4061 (code=exited, status=1/FAILURE)

Nov 03 04:56:35 k8s-agent-13086297-1 systemd[1]: Starting Docker Application Container Engine...
Nov 03 04:56:35 k8s-agent-13086297-1 docker[4061]: time="2016-11-03T04:56:35.613750839Z" level=fatal msg="no sockets found via socket activation: make sure the service was started by systemd"
Nov 03 04:56:35 k8s-agent-13086297-1 systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Nov 03 04:56:35 k8s-agent-13086297-1 systemd[1]: Failed to start Docker Application Container Engine.
Nov 03 04:56:35 k8s-agent-13086297-1 systemd[1]: docker.service: Unit entered failed state.
Nov 03 04:56:35 k8s-agent-13086297-1 systemd[1]: docker.service: Failed with result 'exit-code'.

Obviously kubelet didn't come up either.

k8s: windows: instance name differs from computer name, breaking things

Let's consider a Windows node as deployed by Anthony's Windows branch: https://github.com/Azure/acs-engine/tree/anhowe-wink8s.

This node is identified in Azure by k8s-windowspool2-250945250.

This node has a hostname of 25094acs0901 (aka, the MachineName in Windows is 25094acs0901)

The kubelet is overriding the hostname so that the apiserver identifies it as k8s-windowspool2-250945250.

However, the Kubernetes control plane expects to be able to resolve the registered hostname back to the actual machine (this is used for logs and remote pod connectivity). Unfortunately, since the hostname in Windows is 25094acs0901, there is never an internal DNS record created for k8s-windowspool2-250945250, hence the errors we observed during KubeCon that look like this:

Get https://k8s-agentpool2-299742071:10250/containerLogs/default/wordpressapp-ext-h9t8c/wordpressapp-ext?timestamps=true: dial tcp: lookup k8s-agentpool2-299742071 on 168.63.129.16:53: no such host

The first fix that comes to mind is just not overriding the hostname, but then the Routes won't be created properly because the cloudprovider expects to be able to look up the VM instance via ARM using the hostname.

Why is Azure so incredibly limiting when it comes to Windows hostnames? The 15-character limit stems from NetBIOS, which, AFAIK, hasn't been used in Windows for a very long time.

Think about vendoring

We now have external dependencies in this project.

We should strongly consider vendoring them into the repo.

This re-raises the problem of this repo being used as both a package and a standalone CLI.

Generally you want to vendor for binaries, but not vendor for packages. We may want to revisit splitting the CLI part out of this into a separate tool.

Output directory should be named by dns-prefix rather than random numbers

This has two benefits:

  1. We would warn if the user is generating a template with a dnsPrefix that already exists. That means they're probably going to have a conflict at deploy time, and/or they forgot to update the DNS prefix in the model for their next cluster.

  2. It's a lot more intuitive. I still don't like the fact that we expose this randomly generated number to users in the first place.

cc: @anhowe what do you think?

Service Principal Check before Template Generation

It would be beneficial to add a check routine to acs-engine that validates the Service Principal prior to generating the templates. This would reduce issues, as templates currently build out successfully even if the SP is incorrect.

Kubernetes: Open questions regarding HA, cluster scaling, cluster upgrades and distro/security upgrades

I took a quick look at acs-engine and the generated RG templates. First thing: thumbs up for supporting Kubernetes :) Now I have some open questions. I hope this is the right place to ask them.

  1. How is HA with Kubernetes handled in ACS/acs-engine? All the docs speak about a single Kubernetes master and from what I understand from kubernetesmastercustomdata.yml, only a single non-HA master is set up. In the cluster definition doc however, there is the count field for the master profile, which suggests that HA is supported.

  2. How can we later scale the number of nodes in the cluster? Should we change the number of worker nodes in the generated ARM template? Or should we regenerate the template with a new cluster definition and update the RG with the new template? Does the ACS portal provide this? If we do apply custom ARM templates, can we be sure that Azure does NOT kill/recreate anything important (e.g. route table or LB rules)?

  3. How can we later upgrade to newer versions of Kubernetes? Will there be any support from ACS/acs-engine, or will this be a completely manual process? Will there be a difference between major/minor/patch-level upgrades of Kubernetes?

  4. How can we ensure the distro has its latest security patches and bug fixes installed?

nameSuffix parameter missing from azuredeploy.parameters.json file

The nameSuffix field is missing from the azuredeploy.parameters.json file following template generation. It is present in the azuredeploy.json file, and this causes the following error when following the documentation:

info:    Executing command group deployment create
error:    Template and Deployment "parameters" objects lengths do not match 
	Deployment Parameter file does not have { nameSuffix } defined.
error:   Error information has been recorded to <path>/.azure/azure.err
error:   group deployment create command failed

The error is solved by manually adding the nameSuffix field to the azuredeploy.parameters.json file, as sketched below.
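
For reference, the manual fix amounts to adding an entry like this under the parameters object of azuredeploy.parameters.json (the value shown is a placeholder; use the suffix from your generated azuredeploy.json):

"nameSuffix": {
  "value": "<generated-suffix>"
}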

[kubernetes] systemctl stop kubelet.service

Unable to stop kubelet.service via systemctl on the k8s master node.

> systemctl stop kubelet.service
> systemctl status kubelet.service
Nov 22 01:29:19 k8s-master-1479775666-0 systemd[1]: Stopping Kubelet...
Nov 22 01:29:19 k8s-master-1479775666-0 docker[26894]: Error response from daemon: No such container: kubelet

This should be a fairly easy fix if you name the Docker container when starting it (see the sketch below):
https://github.com/Azure/acs-engine/blob/master/parts/kubernetesmastercustomdata.yml#L323
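
A sketch of the change (all arguments other than --name are elided; the point is only that naming the container lets the unit's stop command, docker stop kubelet per the error above, find it):

ExecStart=/usr/bin/docker run \
  --name=kubelet \
  ... \
  gcr.io/google_containers/hyperkube-amd64:v1.4.5 \
  /hyperkube kubelet ...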

acstgen: consider: replace base64/gzip trick with proper inlining in ARM template

I'm not a huge fan of the b64+gzip trick we're using to shove the addons and custom script into the cloud-config file.

Unfortunately, the alternative is raising the string expression limit in ARM again (not sure if they're willing). It also requires figuring out how to get the indentation right in the YAML file, since the contents would need to be inserted with each line properly indented.

It might not be worth the effort if the only benefit is increasing the readability of the template outputs ever so slightly, but it's worth consideration.
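
At minimum, the trick is easy to reverse when inspecting generated output; a sketch (paste in whatever base64 blob appears in the generated cloud-config):

$ echo '<base64-blob>' | base64 -d | gunzip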

Question: Kubernetes and a custom VNET

Hi Guys,

Great to see this open sourced (and in Go)! I'm looking at using this package to deploy production Kubernetes clusters; however, the inability to deploy to a custom VNET in another resource group is a bit of a problem for us. I was wondering why this isn't possible: is it a constraint inherited from the resources deployed by the generated template, or a feature that needs adding in code?

If it's a feature request in acs-engine, then I'm happy to muck in and add this, if you can point me at the problem.

Cheers,

Morgan

Can't change default DCOS package download URL

Hi,

I used acs-engine to generate the DC/OS ARM JSON file; the default package download addresses are the following:

https://dcosio.azureedge.net/dcos/testing/bootstrap/${BOOTSTRAP_ID}.bootstrap.tar.xz
https://az837203.vo.msecnd.net/dcos-deps/docker-engine_1.11.2-0~xenial_amd64.deb
https://az837203.vo.msecnd.net/dcos-deps/ipset_6.29-1_amd64.deb
https://az837203.vo.msecnd.net/dcos-deps/libltdl7_2.4.6-0.1_amd64.deb
https://az837203.vo.msecnd.net/dcos-deps/unzip_6.0-20ubuntu1_amd64.deb

While in China, I have to use China-local addresses to download these packages, but even after I updated the URLs in the parts/dcosprovision.sh file, the generated DC/OS JSON file still keeps the original dcosio and az837203 addresses.

What is the proper way to automatically generate custom URLs in the DC/OS JSON file? Currently I update the JSON file manually.
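
Until the URLs are parametrized, one workaround is to rewrite the generated template in place; a sketch (the mirror hostname and output directory are placeholders):

$ sed -i \
    -e 's|https://dcosio.azureedge.net|https://<your-mirror>|g' \
    -e 's|https://az837203.vo.msecnd.net|https://<your-mirror>|g' \
    _output/<INSTANCE>/azuredeploy.json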

Support for AzureGerman/Custom Clouds

Hi,
Trying to run ACS on AzureGermanCloud and found some missing parts:

  • /parts/kubernetesmastercustomscript.sh#L53 lacks "cloud": "${CLOUD_NAME}", essential to get kubelet working
  • generated certificates do not have custom domains like cloudapp.microsoftazure.de
  • the generated Ubuntu version is no longer available; "osImageVersion": "16.04.201610200" runs fine

All of this could be done by hand, but fixing this would be nice.
Thanks!
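
For the first point, the fix would presumably emit the cloud name into the generated /etc/kubernetes/azure.json, along these lines (a sketch; all other fields elided):

{
    "cloud": "AzureGermanCloud",
    ...
}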

Fix flow in templates

Fix the flow in the templates so there is not so much jumping between the logic in the templates and the Go code.
