cncf / demo Goto Github PK

View Code? Open in Web Editor NEW

77.0 35.0 39.0 1.67 MB

Demo of CNCF technologies

Home Page: https://cncf.io

License: Apache License 2.0

Shell 0.64% Python 1.12% Lua 0.02% JavaScript 94.30% HTML 0.50% CSS 0.29% HCL 3.12%

cloud-native kubernetes cncf

demo's People

Contributors

Stargazers

Watchers

demo's Issues

Custom VPC domains trip up Controller

See: kubernetes/kubernetes#23371

Stumbled across this by simply using a none default DHCP Option Set in my VPC.

Prometheus resource usage "observer effect"

Prometheus has been performing admirably with the demo for a while now - the amount of points written to it is relatively small despite the varied workload. So this was as expected.

Turns out according to prometheus/prometheus#455 the amount of resources it uses is not just bound by how much is written into it but also how much it is queried. For instance, if you open Grafana in a dozen tabs you can see memory starting to climb (it's a heavy dashboard).

The demo also recently added a sidecar that logs info from Prometheus to a cncfdemo backend, this increased the amount of resources used -- obvious in retrospect.

Finally, until now prometheus was just deployed as a regular pod in a 'monitoring' namespace, so it would end up on a random node. Including memory constrained nodes (the demo overloads some nodes by design). This causes a sort of observer effect and occasionally skews the results in a pronounced and strange way.

The obvious conclusion is to pin Prometheus and other crucial infra pods to some reserved nodes with plenty of headroom.

A simple build system

After a period of lots of little tweaks and modifications to nail down a bug the AMI count passed 100. That's getting a bit out of hand.

The workflow is to cd into the ansible directory of this repository, do packer build packer.json, wait 15 minutes for the process to run through, manually alter which AMI is referenced in the bootstrap scripts, deploy a new cluster, and manually poke around testing things out -- sometimes to only discover some minor thing is broken too late for comfort.

We're past the point of needing proper tests. And its beneficial at this point to automate some things to speed the process up.

So to get this going a simple build system is necessary. The general idea is to setup github hooks to kick off packer builds instead of doing it from a laptop. The 15 minutes will hopefully be cut down a bit, but even better is having a record of it happening.

An implementation detail

It seems wasteful to always keep a server up to listen for build hooks so the hook should instead trigger a lambda that will either forward it to the build server or notice its not up and turn it back on (via Autoscaling group of size 1 with scheduling to scale down to 0 when commits usually don't happen).

Getting the logs back out again

Remote Journal Logging - "Systemd journal can be configured to forward events to a remote server. Entries are forwarded including full metadata, and are stored in normal journal files, identically to locally generated logs. This can be used as an alternative or in addition to existing log forwarding solutions."

Where the logs are forwarded to and how they are persisted is being explored.

Benefits

Speed up debugging
Significantly reduce the need to SSH into individual members of a cluster
Easy reference and sharing of logs
Possibly write smoke tests against the logs

Influxdb-grafana addon

Background

Influxdb is a timeseries data store.
Grafana is a webapp that visualizes time series data and allows the creation of custom dashboards. It has support for several data sources, including influxdb (and more recently, Prometheus).

There are several slightly different versions of this pairing as a Kubernetes AddOn:

Neither is quite right as these examples combine what should be two separate deployments or replication controllers into one.

As a result the following race condition occurs:

Grafana container starts and is configured with a data source pointing to the influxdb service.
It fails to connect as the Influx pod is either not started or not ready yet.
As a result the dashboards are all blank with a warning marker.
The influxdb container starts.

At this point you can visit the Grafana UI and in the data source settings simply hit 'test connection'.

This forces a refresh and now data shows up in grafana and things work as expected.

Fluentd Use Cases

Outside Kubernetes

Systemd logs archiving to S3

For lack of somewhere better to send to at the moment we can start by simply archiving these logs, particularly kube-apiserver with increased verbosity could end up being useful.

The default is hourly which is not actionable for our short running demo. Going to try a 1 minute resolution.

Forward Output Plugin

Since we're going to have a small dedicated server (#145) to save demo runs and dashboards, we could include fluentd on it and use the forwarding plugin.

Within Kubernetes

Best way of collecting all the Kubernetes logs (from pods) via something like fluentd is under active discussion (kubernetes/kubernetes/issues/24677), still researching the way forward on this.

Integrate GRPC

Need to update demo app to include GRPC usage.

intermittent kubedns responses for kubernetes service endpoints (1.3.5)

kubernetes/kubernetes#28497

Tl;dr on this particular saga -- multiple people with various types of environments/clusters experience kubedns connectivity issues.

What this means is that if you lookup a service internal to the cluster it sometimes doesn't resolve. This is core functionality and a show stopper. Reproducible but currently we don't understand why (I've suggested MountVolumes which is supposed to be a background loop is blocking somehow, even captured it on video).

ConfigMap backed volumes mounted as root

Ref: kubernetes/kubernetes#2630, kubernetes/kubernetes#11319.

So lets say you have some configuration files and you throw them into a ConfigMap.
The pod you spin up mounts a volume and these files happily appear in, for example: /etc/config.

Great. Except that volume was mounted as root and your app demands different permissions to read those files.

The workaround suggested so far is to have a wrapper script do chown'ing -- clearly hackish.

Get distcc compiles working

Specifically:

pump mode
reliably
instrumented

1.3.5 missing CNI binaries

So up until now only having weave binaries in /opt/cni/bin/ worked and was documented as right way.

With 1.3.5 it silently stops working and after much head scratching and furious googling the only mention of this problem is here: kubernetes/kubernetes#30681

Turns out one has to manually pull in 0.3.0 of https://github.com/containernetworking/cni/releases into /opt/cni/bin. Repeating that info here for visibility to spare the next fellow who gets stuck on that.

More undocumented missing dependencies

kubernetes/kubernetes#26093 is just a treasure trove I wish I'd seen before!

I've been finding and adding packages in one at a time, for example:

[Error configuring cbr0: exec: "brctl": executable file not found in $PATH]

That's because brctl is part of bridge-utils, something you have to pull in. And so on. This list was sorely needed.

Ansible Provisioner via Packer can't connect to instance

When run as:
packer build -debug

It's possible to successfully connect with the generated temporary key over plain old ssh as well as do an ansible ping (ansible all -i 52.40.138.235, -m ping --user=centos --key-file=ec2_amazon-ebs.pem).

However when packer gets to the point where it starts the ansible provisioner it fails to connect.

==> amazon-ebs: Waiting for SSH to become available...
==> amazon-ebs: Connected to SSH!
==> amazon-ebs: Provisioning with Ansible...
==> amazon-ebs: SSH proxy: serving on 127.0.0.1:60344
==> amazon-ebs: Executing Ansible: ansible-playbook /Users/Gene/Projects/k8s/cnfn/demo/Kubernetes/Bootstrap/Ansible/playbooks/setup/main.yml -i /var/folders/ws/2xpp7b5n3vj5h69xf79hqfnw0000gn/T/packer-provisioner-ansible009016913 --private-key /var/folders/ws/2xpp7b5n3vj5h69xf79hqfnw0000gn/T/ansible-key242768954
amazon-ebs:
amazon-ebs: PLAY [all] *********************************************************************
amazon-ebs:
amazon-ebs: TASK [setup] *******************************************************************
amazon-ebs: SSH proxy: accepted connection
==> amazon-ebs: authentication attempt from 127.0.0.1:60345 to 127.0.0.1:60344 as centos using none
==> amazon-ebs: unauthorized key
==> amazon-ebs: authentication attempt from 127.0.0.1:60345 to 127.0.0.1:60344 as centos using publickey
==> amazon-ebs: authentication attempt from 127.0.0.1:60345 to 127.0.0.1:60344 as centos using publickey
==> amazon-ebs: starting sftp subsystem
amazon-ebs: fatal: [default]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to the remote host. Make sure this host can be reached over ssh", "unreachable": true}
amazon-ebs: to retry, use: --limit @/Users/Gene/Projects/k8s/cnfn/demo/Kubernetes/Bootstrap/Ansible/playbooks/setup/main.retry
amazon-ebs:
amazon-ebs: PLAY RECAP *********************************************************************
amazon-ebs: default : ok=0 changed=0 unreachable=1 failed=0
amazon-ebs:
==> amazon-ebs: shutting down the SSH proxy
==> amazon-ebs: Terminating the source AWS instance...

Benchmarking time to deploy

Empirically, it'd be great to have a measurement of deployment time (or goal for) deployment of the demo project. In other words, identify how many minutes it typically takes someone to deploy k8s, prometheus and Countly using the cncfdemo.

Benchmark design proposal

Note: This is very tentative and is pending discussion.

Egress

HTTP Load is generated with WRK Pods, scriptable via Lua, and auto scaled by increments of 1000 rps (Each pod makes one thousand concurrent requests), up to 100 pods max.

WRK pods are pinned with the affinity mechanism to set of nodes A.

Given that this works out to under 100MB/s of traffic in aggregate -- this is not a high bar -- carefully picking the right instance type we can dial it in so this requires the predetermined amount of nodes of our choosing.

Its a good idea to have the number of nodes equal (or be a multiple of) the number of availability zones in the region the benchmark will run in. For 'us-west-2' that would be three.

Instance type selection is with the intention of picking the smallest/cheapest type that is still beefy enough to generate enough load with just those three nodes.

Ingress

Countly API pods are similarly pinned to set of nodes B. Again, as few as three nodes (with one pod per) are required. This provides redundancy but also mirrors the Egress described above and thus controls for variance in traffic between pods in different availability zones.

The autoscaling custom metric

The WRK pods report summaries with latency statistics and and failed requests.
Its possible to latch unto this error rate to provide the custom metric.

It seems at the moment this requires a bit of trickery, for example:

Pod1: 1000 requests made, 12 time outs, 31 errors
Pod2: 1000 requests made, 55 time outs, 14 errors
Pod3: 1000 requests made, 32 time outs, 55 errors

The autoscaler is actually provided a target.
Assuming we want to tolerate no more than a 10% bad requests (errors + timeouts) we'd provide a target of 100.

Based on the above the autoscaler will keep launching additional pods, the load will increase, so will the error rates and time outs, until an equilibrium is reached.

The backend (Mongo cluster)

Mongo pods are pinned to set of nodes C. These are CPU, Memory, and Disk I/O intensive. The number of these pods and nodes is fixed.

The background job (Boinc)

Boinc pods are pinned to nodes A,B but not C. They are scaled to soak up available CPU on these nodes which will otherwise be under utilized.

GCP support

Failed job garbage collection

kubectl get pods -a

This list becomes a bit messy sometimes.

Leaving a note here to consider tweaking clean ups based on these settings:
http://kubernetes.io/docs/admin/garbage-collection/

Low priority.

Intel Cluster Usage -- reset in 48 hours

If you have access to one of the 20 nodes allocated to CNCF please be advised that I'm going to wipe those instances soon and save your work.

Also fess up to installing Anaconda when I wasn't looking. :)

Container execution checks utility for use with InitContainers/readiness/liveness probes (proposal)

Handling initialization is typically shown with simple commands such as 'wget' or 'cat' and is rather straightforward.

However, for non trivial conditionals this can get hairy.

A contrived example

Consider an InitContainer that succeeds when a service responds with 3 DNS endpoints.
At first glance it is a simple nslookup servicename -ge 3 one liner. That is until you happen to use an image that doesn't bundle nslookup so you'd getent hosts servicename -ge 3 instead.

Writing bash one liners is suboptimal

What utilities can one safely rely on for the one-liner munging?
No sane style guide
Maintainability

In reality past the simple one liner people should (and do) reach for the scripting language of their choice. However, now you went from a tiny busybox InitContainer to a 300MB container that bundles python to avoid writing a little bash.

The executing checks do belong in the project yaml/json file instead of being baked into some one-off image on the side. Most of these checks for most projects probably fall into two dozen or some common patterns.

So I purpose a utility in the spirit of bc.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  annotations:
    pod.alpha.kubernetes.io/init-containers: '[
        {
            "name": "install",
            "image": "busybox",
            "command": ["k", "service", "name", "at least", "3"]
        }
]'

To be written in Go, with a small core, and extensible (so users can add custom checks via a volume).

Consider portable local cluster

@Zilman @leecalcote We were joking about you having a cluster that could fit in your backpack, but this one would probably speed up your debug cycles versus the latency of working with the cloud.

https://hackernoon.com/diy-kubernetes-cluster-with-x86-stick-pcs-b0b6b879f8a7
https://news.ycombinator.com/item?id=12985525

I will fund this as one of your deployment targets if you're interested.

Kubernetes AWS problems with multiple security groups due to tags

kubernetes/kubernetes#23339, kubernetes/kubernetes#26787

The Kubernetes Controller manages AWS resources by filtering on aws resource tags like KubernetesCluster:ClusterName. Unfortunately it does this inconsistently for different things.

8527    2292 log_handler.go:33] AWS request: elasticloadbalancing DescribeLoadBalancers
3961    2292 aws_loadbalancer.go:191] Deleting removed load balancer listeners
4035    2292 log_handler.go:33] AWS request: elasticloadbalancing DeleteLoadBalancerListeners
1501    2292 aws_loadbalancer.go:203] Creating added load balancer listeners
1592    2292 log_handler.go:33] AWS request: elasticloadbalancing CreateLoadBalancerListeners
3129    2292 log_handler.go:33] AWS request: elasticloadbalancing DescribeLoadBalancerAttributes
3214    2292 log_handler.go:33] AWS request: elasticloadbalancing ModifyLoadBalancerAttributes
4591    2292 log_handler.go:33] AWS request: elasticloadbalancing DescribeLoadBalancers
9882    2292 log_handler.go:33] AWS request: ec2 DescribeSecurityGroups
1322    2292 log_handler.go:33] AWS request: ec2 DescribeSecurityGroups
8421    2292 aws.go:2731] Error opening ingress rules for the load balancer to the instances: Multiple tagged security groups found for instance i-04bd9c4c8aa; ensure only the k8s security group is tagged
8469    2292 servicecontroller.go:754] Failed to process service. Retrying in 5m0s: Failed to create load balancer for service default/pushgateway: Mutiple tagged security groups found for instance i-04bd9c4c8aa36270e; ensure only the k8s security group is tagged
8480    2292 servicecontroller.go:724] Finished syncing service "default/pushgateway" (419.263237ms)
lines 201-224

https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws.go#L2783

// Returns the first security group for an instance, or nil
// We only create instances with one security group, so we don't expect multiple security groups.
// However, if there are multiple security groups, we will choose the one tagged with our cluster filter.
// Otherwise we will return an error.

The security groups in my case are:

k8s-minions-cncfdemo, k8s-masters-cncfdemo

They are both tagged with the cluster filter. Not expecting multiple security groups seems like a wrong (not to mentioned undocumented!) assumption.

Bit of a head scratcher.

Setting up a Bastion/Jump Server

Mosh

"Remote terminal application that allows roaming, supports intermittent connectivity"

The build server (#98) should greatly speed up things even for very stable connections because packer running within AWS will have by definition a tiny fraction of the latency than running from a laptop.

A Jump server with Mosh should hopefully be of similar benefit for flakey wifi.

Tmux

The build process is being altered to reduce the need to SSH into cluster instances, but sometimes that is inescapable. When you bring a lot of clusters up and down that becomes very tedious so its a lot easier to manage sessions with a multiplexer like tmux.

A nice potential bonus is collaboration on a session and some minor security benefits (only whitelist some traffic between the cluster(s) and the Bastion instance instead of to the entire world).

Odd little Jinja template / Yaml bug

Template file: demo/Kubernetes/API/example.yaml.j2#L81

You can combine multiple yaml files into one with the '---' document separator. Standard yaml fare, common pattern with kuberentes deployment yaml files.

I've jinja templated this one and for some reason if I don't end with a final (and it seems to my eyes unnecessary) terminating '---' in a block, the generated yaml becomes a bit weird.

It converts all the documents during the loop, except the last document (the "Job" one in this instance), that somehow gets clobbered so there's only one of it, from the last iteration only.

The '---' is a hackish thing and now the associated code that uses the generated yaml needs to make sure to throw away empty docs.

Kubelet failed to detect running docker process when name is not `docker`

v1.2 bug: #26259

This bit of indirection was surprising to find so I'm keeping a note here for reference. Should anybody run into this the workaround in the issue linked above works.

Seems to be resolved in 1.3.

Iptables/MASQUERADE support is insufficient

kubernetes/kubernetes#17084, kubernetes/kubernetes#11204, kubernetes/kubernetes#20893, kubernetes/kubernetes#15932

It appears like this is the explanation for two related but separate show stopping problems.

One problem is that kubedns eventually forwards to the outside resolver, in AWS this would be something like 172.20.0.2 -- and AWS apparently doesn't respect traffic coming from a different subnet. Since the request is originating from a pod within an overlay network with an IP of '10.x.x.x' it hangs.

The second problem is that internal resolving via kubedns works when you lookup directly against the kubedns pod. nslookup kubernetes.default <ip-of-kubednspod> works.

However routing to the kubedns service is broken. It seems for some environments and overlay settings the iptables kube-proxy writes are not quite right.

CPU spike when pulling big containers can kill nodes & the whole cluster

A Very Large Container can cause a huge CPU spike.

This is hard to pin point exactly, could be just docker pull working very hard, a kubelet bug, or something else.

Cloudwatch doesn't quite capture how bad this is, nodes freeze up to the point where you can't ssh into them. Everything becomes totally unresponsive, 'etc. Eventually (after 7 minutes in this case) it finally revs down and recovers. Except the Weave pods. Now the cluster is shot.

kubectl delete -f https://git.io/weave-kube, kubectl apply -f https://git.io/weave-kube does not help.

kubectl logs weave-net-sbbsm --namespace=kube-system weave-npc

..
time="2016-11-17T04:16:44Z" level=fatal msg="add pod: ipset [add weave-k?Z;25^M}|1s7P3|H9i;*;MhG 10.40.0.2] failed: ipset v6.29: Element cannot be added to the set: it's already added\n: exit status 1"

To be fair, the nodes are t2.micro and have handled everything so far. Perhaps this is their natural limit, retrying with larger instances.

Doc: Two links to graphics in README.md broken

Update: actually three graphics links

Under Three Groups:
- Broken link 1 should be https://github.com/cncf/demo/blob/master/docs/arch.png
- Broken link 2 should be https://github.com/cncf/demo/blob/master/docs/k8s-simpler.png
Under Software-defined networking for minions:
- Broken link 3 should be https://github.com/cncf/demo/blob/master/docs/sdn.png

Intermittent responses for Kubernetes service endpoints (postmortem)

Follow up to #63.

Beginning of Problems

At some point in time a known-good deployment stopped succeeding on newly created clusters. This was caused by several disparate issues across several versions/configurations/components.

Init containers would not progress because service availability checks would fail
A service would appear to exist (kubectl get svc) and point at pods with correct endpoints (kubectl describe service)
Attaching to pods directly for inspection would show them operating as expected
Sometimes parts would succeed, but not uniformly and with no clear pattern

The first step to check if a service is working correctly is actually a simple DNS check (nslookup service). By chance, this would often appear to be functioning as expected indicating the problem must be elsewhere (not necessarily with kubernetes).

However, not to bury the lead: running nslookup on a loop would later expose that it was timing out sporadically. That is the sort of thing that makes a bug sinister as it misdirects debugging efforts away from the problem.

Known KubeDNS Issues Encountered

Secrets volume & SELinux permissions

SELinux context was missing 'svirt_sandbox_file_t' on the secretes volume and therefore from the perspective of the KubeDNS pod /var/run/secrets/kubernetes.io/serviceaccount/ was mangled and it couldn't in turn use that to connect to the master.
Secrets volume got stale

The kube-controller is responsible for injecting the secrets volume into pods and keeping it up to date. There were/are known bugs where it would fail to do that. As a result KubeDNS would mysteriously stop working because its tokens to connect to the master had grown stale. (This sort of thing: kubernetes/kubernetes#24928)
Typo

official skydns-rc.yaml had a typo at some point with --domain= missing the trailing dot.
Scalability

It is now recommended to scale KubeDNS pods proportionally to number of nodes in a cluster.

These problems would crop up and get resolved yet errors would stubbornly persist.

kubectl logs $(kubectl --namespace=kube-system get pods | tail -n1 | cut -d' ' -f1) --namespace=kube-system --container kubedns

I0829 20:19:21.696107       1 server.go:94] Using https://10.16.0.1:443 for kubernetes master, kubernetes API: <nil>
I0829 20:19:21.699491       1 server.go:99] v1.4.0-alpha.2.1652+c69e3d32a29cfa-dirty
I0829 20:19:21.699518       1 server.go:101] FLAG: --alsologtostderr="false"
I0829 20:19:21.699536       1 server.go:101] FLAG: --dns-port="10053"
I0829 20:19:21.699548       1 server.go:101] FLAG: --domain="cluster.local."
I0829 20:19:21.699554       1 server.go:101] FLAG: --federations=""
I0829 20:19:21.699560       1 server.go:101] FLAG: --healthz-port="8081"
I0829 20:19:21.699565       1 server.go:101] FLAG: --kube-master-url=""
I0829 20:19:21.699571       1 server.go:101] FLAG: --kubecfg-file=""
I0829 20:19:21.699577       1 server.go:101] FLAG: --log-backtrace-at=":0"
I0829 20:19:21.699584       1 server.go:101] FLAG: --log-dir=""
I0829 20:19:21.699600       1 server.go:101] FLAG: --log-flush-frequency="5s"
I0829 20:19:21.699607       1 server.go:101] FLAG: --logtostderr="true"
I0829 20:19:21.699613       1 server.go:101] FLAG: --stderrthreshold="2"
I0829 20:19:21.699618       1 server.go:101] FLAG: --v="0"
I0829 20:19:21.699622       1 server.go:101] FLAG: --version="false"
I0829 20:19:21.699629       1 server.go:101] FLAG: --vmodule=""
I0829 20:19:21.699681       1 server.go:138] Starting SkyDNS server. Listening on port:10053
I0829 20:19:21.699729       1 server.go:145] skydns: metrics enabled on : /metrics:
I0829 20:19:21.699751       1 dns.go:167] Waiting for service: default/kubernetes
I0829 20:19:21.700458       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0829 20:19:21.700474       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0829 20:19:26.691900       1 logs.go:41] skydns: failure to forward request "read udp 10.32.0.2:49468->172.20.0.2:53: i/o timeout"

Known Kubernetes Networking Issues Encountered

Initial Checks

Kubernetes imposes the following fundamental requirements on any networking implementation:

all containers can communicate with all other containers without NAT

all nodes can communicate with all containers (and vice-versa) without NAT

the IP that a container sees itself as is the same IP that others see it as

_{- Networking in Kubernetes}

In other words, to make sure networking is not seriously broken/misconfigured check:

Pods are being created / destroyed
Pods are able to ping each other

At first blush these were looking fine, but pod creation was sluggish (30-60 seconds), and that is a red flag.

Missing Dependencies

As described in #62, at some version CNI folder started missing binaries.

More undocumented dependencies (#64) were found from staring at logs and noting weirdness.
The real important ones are (conntrack-tools, socat, bridge-utils), these things are now being pinned down upstream.

The errors were time consuming to understand because often their phrasing would leave something to be desired. Unfortunately there's at least one known false-positive warning (kubernetes/kubernetes#23385).

Cluster CIDR overlaps

--cluster-cidr="": CIDR Range for Pods in cluster.
--service-cluster-ip-range="": CIDR Range for Services in cluster.

In my case services got a /16 starting on 10.0.0.0, the cluster-cidr got a 16 on 10.244.0.0.
The service cidr is routable because kube-proxy is constantly writing iptable rules on every minion.

For Weave in particular --ipalloc-range needs to be passed to exactly match what's given to the Kubernetes cluster-cidr.

Whatever your network overlay, it must not clobber the service range!

Iptables masquerade conflicts

Flannel

If using Flannel be sure to follow the newly documented instructions:
DOCKER_OPTS="--iptables=false --ip-masq=false"

Kube-proxy makes extensive use of masquerading rules, similar to an overlay clobbering the service range, another component (like the docker daemon itself) mucking about with masq rules will cause unexpected behavior.

Weave

Weave was originally erronously started with --docker-endpoint=unix:///var/run/weave/weave.sock which similarly caused unexpected behavior. This flag is extraneous and has to be omitted when used with CNI.

Final Configuration

Image

Centos7 source_ami: ami-bec022de

Dependencies

SELinux disabled.

Yum installed:

docker
etcd
conntrack-tools
socat
bridge-utils

kubernetes_version: 1.4.0-alpha.3
(b44b716965db2d54c8c7dfcdbcb1d54792ab8559)

weave_version: 1.6.1

1 Master (172.20.0.78)

Gist of journalctl output shows it boots fine, docker, etcd, kube-apiserver, scheduler, and controller all start. Minion registers successfully.

$ kubectl  get componentstatuses

NAME                 STATUS    MESSAGE              ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health": "true"}

$ kubectl get nodes 

NAME                                        STATUS    AGE
ip-172-20-0-18.us-west-2.compute.internal   Ready     1m

1 minion (172.20.0.18)

$ kubectl run -i --tty --image concourse/busyboxplus:curl dns-test42-$RANDOM --restart=Never /bin/sh

Pod created (not sluggishly). Multiple pods can ping each other.

Weave

Weave and weaveproxy are up and running just fine.

$ weave status

Version: 1.6.0 (version 1.6.1 available - please upgrade!)

        Service: router
       Protocol: weave 1..2
           Name: ce:1a:4b:b0:07:6d(ip-172-20-0-18)
     Encryption: disabled
  PeerDiscovery: enabled
        Targets: 0
    Connections: 0
          Peers: 1
 TrustedSubnets: none

        Service: ipam
         Status: ready
          Range: 10.244.0.0/16
  DefaultSubnet: 10.244.0.0/16

        Service: proxy
        Address: unix:///var/run/weave/weave.sock

$ weave status ipam

ce:1a:4b:b0:07:6d(ip-172-20-0-18)        65536 IPs (100.0% of total)

Conclusion

Kubernetes is rapidly evolving with many open issues -- there are now efforts upstream to pin down and document the dependencies along with making errors and warnings more user-friendly in the logs.

As future versions become less opaque knowing which open issue is relevant to your setup will become easier. Along with whether an obvious dependency is missing and what a good setup looks like.

The nominal sanity check command that currently exists (kubectl get componentstatuses) does not go far enough. It might show everything is healthy. Pods might be successfully created. Services might work.

And yet these can all be misleading as a cluster may still not be entirely healthy.

A useful test I found in the official repo simply tests connectivity (and authentication) to the master. Sluggishness is not tested and sluggishness it turns out is a red flag.

In fact, there's an entire folder of these, but they are not well documented as far as I can tell.

I believe a smoke test that can deployed against any running cluster and run through a suite of checks and benchmarks (to take into account unexpectedly poor performance) would significantly improve the debugging experience.

Host the grafana snapshots on a dedicated server to the side

skip-tags doesn't work for handlers

https://groups.google.com/forum/#!msg/ansible-project/qXWpCJ1449E/Bh5XeCPB0MAJ

Unfortunate.

Specifically, one use case is when baking an image you might want to setup a service but stop just shy of starting it for the first time. Ansible service restart handlers really should support tags which in this instance are much more elegant than conditionals.

Docker builds intermittently failing due to apt cache issues

This is a pretty well known problem.

Docker caches aggressively so a common first line in your Dockerfile like apt update will not run each time and eventually the mirror list will become stale and builds will fail.

Possible workaround #1

docker build --no-cache

Possible workaround #2

RUN apt-get clean && apt update

While the above helps with stale mirrors it does not help with unresponsive/slow/broken ones.

FROM debian:stable

RUN apt-get clean && apt update && apt install -y
RUN apt install -y kernel-package

This is taking a very long time today but eventually completes despite appearing to be frozen. Building large containers (this one ends up being 1GB in size, ouch) from a laptop is not a good workflow.

Boto vs Terraform

Boto3 is the Amazon Web Services (AWS) SDK for Python.

Terraform is a higher level abstraction that provides additional features like multi-cloud support, complex changesets, and so on.

Some people have embraced Terraform wholeheartedly already, however there are cons.

While multi-cloud support is an excellent feature for the people who require it - it is not currently in scope for this demo. This might change, at which point the cost-benefit of using Terraform will have to be reevaluated.

The multi-cloud support is a leaky abstraction, going up the ladder of abstraction in this case means learning yet another configuration syntax and introducing yet another dependency on yet another tool.

The union set of users who can use Terraform and not be familiar with the underlying AWS API's that Boto exposes is approximately zero. Furthermore, its worthwhile to consider that virtually all users of AWS are allowed to use the official SDK (and probably already have it configured) but not all are allowed (or capable, at least in a timely manner) of using Terraform.

There's something to be said for avoiding the situation of: "to try out our demo, first install and understand a separate project".

Finally, Terraform predates Boto3, which is significantly improved and simplified, and sprinkles in some of the higher levelness.

As a result, for our limited use case, one can get most of the pros of terraform sans the cons.

Execution Plans in a few lines of Python

Lets define a sequence of named tuples.

The tuple consists of an object, method name, and arguments, respectively.

bootstrap = [(EC2, 'create_key_pair', {'KeyName': keyname}),
             (IAM, 'create_instance_profile', {'InstanceProfileName': instanceprofile}),
             (IPE, 'wait', {'InstanceProfileName': instanceprofile}),
             (IAM, 'create_role', {'RoleName': rolename, 'AssumeRolePolicyDocument': json.dumps(TrustedPolicy)}),
             (IAM, 'add_role_to_instance_profile', {'RoleName': rolename, 'InstanceProfileName': instanceprofile}),
             (IAM, 'attach_role_policy', {'RoleName': rolename, 'PolicyArn': policyarn})]

There's absolutely no mystery here.

import boto3
EC2 = boto3.resource('ec2')
IAM = boto3.resource('iam')

You can see the method names are lifted directly from the Boto3 API Documentation along with the arguments which are typically json dictionaries.

One can simply copy-paste from the docs the json blobs and reference back and forth to understand exactly what is going on.

If we print out just the method column we get something that scans beautifully:

create_key_pair
create_instance_profile
wait
create_role
..
create_launch_configuration
create_auto_scaling_group

Which just reads as plain English and such a laundry list can be found in any number of tutorials and blog posts that enumerate the steps for creating an auto scaling group.

There are no other complex steps, the only thing modified after these things are provisioned is the scaling factor of the group.

There are no complex dependencies or teardowns as we create a dedicated vpc and blow it away completely per cluster each time -- we never edit an existing deployment as you would in a production environment and thus the raison d'etre of terraform and other such tools - complex changesets - is not relevant.

In short, this seems like a good sweet spot to ramp users unto a kubernetes cluster deployment process without unnecessary indirection steps.

It's entirely a Terraform recipe will be added future and primarily used, but the vanilla way should definitely come first and be supported.

Python2 vs Python3 for client side scripts

As part of the demo there will be several scripts that can run on the client side.

Anything from a tiny utility that exposes a k8s service endpoint via route53 as a nice human friendly subdomain to a full fledged deployment automation script that rolls out and sets up the various resources that run on the cluster.

One way this question could be made irrelevant is by simply running all of that out of a so called SideCar container. However there might be something to be said for letting the user run and play with it natively.

Opinions are welcome whether its acceptable to have python3 as a requirement.

boinc unexpected behavior -- warnings, daily quota

27-Jul-2016 18:22:43 [---] Resuming computation
27-Jul-2016 18:22:46 [http://www.worldcommunitygrid.org/] Master file download succeeded
27-Jul-2016 18:22:51 [http://www.worldcommunitygrid.org/] Sending scheduler request: Project initialization.
27-Jul-2016 18:22:51 [http://www.worldcommunitygrid.org/] Requesting new tasks for CPU
27-Jul-2016 18:22:55 [World Community Grid] Scheduler request completed: got 0 new tasks
27-Jul-2016 18:22:55 [World Community Grid] No tasks sent
27-Jul-2016 18:22:55 [World Community Grid] No tasks are available for OpenZika
27-Jul-2016 18:22:55 [World Community Grid] No tasks are available for the applications you have selected.
27-Jul-2016 18:22:55 [World Community Grid] This computer has finished a daily quota of 5 tasks

That's not quite expected, ran into it because I kept iterating from my machine. Apparently if you start and stop a lot the client gets a temporary ban.

Also:

dir_open: Could not open directory 'slots' from '/var/lib/boinc-client'.

The only evidence of this issue on google suggests its a permissions thing. Also had this with the projects directory and as a result I'm doing:

mkdir -p /var/lib/boinc-client/projects/www.worldcommunitygrid.org && chown -R boinc:boinc /var/lib/boinc-client

Missing Features for Complex Deployments

kubernetes/kubernetes#1899 - Allow users to wait for conditions from kubectl and using the API
kubernetes/kubernetes#1542 - Support master election (think, building blocks for quorum based clusters like Mongo)

https://github.com/cncf/demo/blob/master/Kubernetes/Docs/Advanced.md

Integrate LinkerD

Need to update demo app to include LinkerD usage.

OpenTracing

As first steps it would be nice to instrument cncfdemo-cli, as it is a short & simple python script.
Appdash could be containerized and deployed unto a cluster as part of the demo to receive traces.

One open question currently is that Appdash will become available a few minutes _after_ the script is started. Hopefully its possible to patch the remote controller endpoint on the fly somehow or do some trickery.

Centos 7 kdump service failed

systemctl --failed
  UNIT                         LOAD   ACTIVE SUB    DESCRIPTION
● docker-storage-setup.service loaded failed failed Docker Storage Setup
● kdump.service                loaded failed failed Crash recovery kernel arming
● network.service              loaded failed failed LSB: Bring up/down networking

systemctl status kdump.service
● kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2016-10-24 18:58:21 UTC; 49s ago
  Process: 812 ExecStart=/usr/bin/kdumpctl start (code=exited, status=1/FAILURE)
 Main PID: 812 (code=exited, status=1/FAILURE)

Oct 24 18:58:21 ip-172-31-18-67.us-west-2.compute.internal systemd[1]: Starting Crash recovery kernel arming...
Oct 24 18:58:21 ip-172-31-18-67.us-west-2.compute.internal kdumpctl[812]: No memory reserved for crash kernel.
Oct 24 18:58:21 ip-172-31-18-67.us-west-2.compute.internal kdumpctl[812]: Starting kdump: [FAILED]
Oct 24 18:58:21 ip-172-31-18-67.us-west-2.compute.internal systemd[1]: kdump.service: main process exited, code=exited, status=1/FAILURE
Oct 24 18:58:21 ip-172-31-18-67.us-west-2.compute.internal systemd[1]: Failed to start Crash recovery kernel arming.
Oct 24 18:58:21 ip-172-31-18-67.us-west-2.compute.internal systemd[1]: Unit kdump.service entered failed state.
Oct 24 18:58:21 ip-172-31-18-67.us-west-2.compute.internal systemd[1]: kdump.service failed.

Just started happening, there's a somewhat newer centos7 AMI so going to that.

Repeated creation/deletion of resources breaks cluster

A successful run from bootstrapping a cluster to provisioning its needed resources all in one go has been happening for some time now.

With that being said, the development experience is one of tinkering.

kubectl create -f
kubectl delete -f
kubectl create -f

And so on. Sometimes mysterious bugs pop up, connectivity issues, resources not addressable, time is spent chasing it.. a full cluster shutdown and trying again from scratch often ends up fixing everything.

The problems are usually around Kubernetes services getting 'confused' or 'sticky'. Additionally, sometimes you see a pod get stuck in 'Terminating' or an inability to attach to running pods because the docker process on the Node crashed.

So in short, small seemingly harmless actions can currently bring down a whole cluster.

Kick off distcc on linux stable commits

Poll https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/ and pull on change

Proper Job fields via the REST API

kubernetes/kubernetes#23599

So for historical reference, in your JSON:

            "spec": {
                    "autoSelector": true,
                    ..
             }

And also you at the top you must:

"apiVersion": "extensions/v1beta1"

CLA bot needs to show in progress status

@caniszczyk I watched the PR that @bgrant0607 just submitted, and I'm concerned that the CLAbot is not quite configured correctly.

The issue is that when creating the pull request, the status API should show in progress from the CLAbot, to let the submitter know that it's running. Then, it should change to passed or failed.

Right now, it seems to show as green and then come back as pass or fail a minute later.

Boolean values in YAML lead to unexpected behaviour

kubectl label nodes <nodename> echo=true

rc.yaml:

spec:
   nodeSelector:
      echo: true
   containers:

$ kubectl create -f rc.yaml --validate
unable to decode "rc.yaml": [pos 434]: json: expect char '"' but got char 't'

The same thing happens if true is set to yes. When set to anything else like foo it succeeds with no complaints. Kind of surprising if you try yes first.

Resource Quality of Service in Kubernetes

Background reading:

https://github.com/kubernetes/kubernetes/blob/master/docs/devel/scheduler_algorithm.md
https://github.com/kubernetes/kubernetes/blob/master/docs/design/resource-qos.md

This issue is to discuss what were dealing with in the context of the demo.

Integrate CoreDNS

Need to update demo app to include CoreDNS usage.

Demo output (version 0.1) - Show initial text results, images saved in folder

Save the grafana png's as regular files instead of base64 json, dump in folder.

Text results, finally something we know to want to pull:

Time from 'cluster ready' until distcc finished the compile
CPU utilization percentages per node throughout this time
Number of requests Countly handled meanwhile

Single command load generation (Countly, Boinc)

Currently countly goes from single command to available to ready to use after a few minutes.
We want to pepper over the manual steps left in setting it up and triggering WRK load against it.

Boinc is just a regular background job, nothing needed there.

Grafana output discussion

Row By Row

At the top there are two gauge views with spark charts in their background.
Left gauge is memory and right is CPU and totals underneath. This is meant as a quick cluster wide resource overview on the node level.
Pod CPU shows the avg and current cpu utilization percentage per pod.
System Pod CPU, this is all the "system" pods like kubedns, weave, node-exporter, it should under normal conditions be flat and low so it is somewhat greyed out.
Pods Memory (MB)
System Pods Memory (MB)
Pods Network I/O - shows ingress and egress

Azure support

boinc client needs mice device

https://setiathome.berkeley.edu/forum_thread.php?id=79537

/dev/input/mice is being used by boinc for entropy or something -- assuming X by boinc maintainers.. is a problem.

Grafana roundtrip json export/import bug

Unfortunately I can reproduce this issue reliably: grafana/grafana#2816

Create dashboard in Grafana 3
Export json file

Import json file manually (via Grafana web UI) works.
Import via API does not despite making the exact same API calls.

In other words, there's some additional json munging required, can't submit dashboard file as-is.
This is an inconvenience because now a hackish post processing step is necessary for demo outputs.

Note: the really odd thing is that backwards compatibility with Grafana 2 is good so dashboards exported from that are imported correctly both ways.

cncf / demo Goto Github PK

demo's People

Contributors

Stargazers

Watchers

Forkers

demo's Issues

An implementation detail

Getting the logs back out again

Benefits

Background

Outside Kubernetes

Systemd logs archiving to S3

Forward Output Plugin

Within Kubernetes

Egress

Ingress

The autoscaling custom metric

The backend (Mongo cluster)

The background job (Boinc)

A contrived example

Writing bash one liners is suboptimal

Mosh

Tmux

Follow up to #63.

Beginning of Problems

Known KubeDNS Issues Encountered

Known Kubernetes Networking Issues Encountered

Initial Checks

Missing Dependencies

Cluster CIDR overlaps

Iptables masquerade conflicts

Flannel

Weave

Final Configuration

Image

Dependencies

1 Master (172.20.0.78)

1 minion (172.20.0.18)

Weave

Conclusion

Possible workaround #1

Possible workaround #2

Execution Plans in a few lines of Python

Row By Row

Recommend Projects

Recommend Topics

Recommend Org