habitat-sh / habitat-operator
A Kubernetes operator for Habitat services
License: Apache License 2.0
We should have watchers for all resources that affect Habitats.
We have a Pod watcher, but we still need to add a Deployment one.
Depends on #63.
Currently, a user needs to modify the source code to change the log level.
We should expose a command-line flag for this instead.
The habitat client supports the --group flag, which allows users to start the supervisors in a specific group.
In order to support this, we need to:
- add a group key to the CRD
- pass the --group flag as an argument to the containers

The unit tests should include testing individual functions in the main code base.
Some of the flags returned by --help seem to come from the glog library, and have no effect on our own logging (e.g. -v value).
❯ ./operator --help
Usage of ./operator:
-alsologtostderr
log to standard error as well as files
-kubeconfig string
Path to a kubeconfig. Only required if out-of-cluster.
-log_backtrace_at value
when logging hits line file:N, emit a stack trace
-log_dir string
If non-empty, write log files in this directory
-logtostderr
log to standard error instead of files
-stderrthreshold value
logs at or above this threshold go to stderr
-v value
log level for V logs
-vmodule value
comma-separated list of pattern=N settings for file-filtered logging
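A minimal sketch of where these flags come from, assuming the dependency is the standard github.com/golang/glog package: merely importing it registers its flags on the default FlagSet, which is why they leak into our --help output.

package main

import (
	"flag"
	"fmt"

	// glog registers -v, -vmodule, -logtostderr, etc. on flag.CommandLine
	// in its init(), so any program importing it (directly or via
	// client-go) inherits these flags.
	_ "github.com/golang/glog"
)

func main() {
	flag.Parse()
	// Print every registered flag; the glog ones show up even though we
	// never declared them ourselves.
	flag.VisitAll(func(f *flag.Flag) {
		fmt.Printf("-%s\t%s\n", f.Name, f.Usage)
	})
}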
Whenever possible, we should use the cache returned by the cache.NewInformer function to retrieve objects, instead of making API calls.
Depends on #63.
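A minimal sketch of the pattern, assuming a recent client-go (import paths moved around in the 2017-era releases); the Pod key is a placeholder.

import (
	"time"

	"k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// podFromCache reads Pods from the informer's local store instead of
// hitting the API server on every lookup.
func podFromCache(client *kubernetes.Clientset, stopCh chan struct{}) {
	lw := cache.NewListWatchFromClient(
		client.CoreV1().RESTClient(), "pods", v1.NamespaceDefault, fields.Everything())
	store, controller := cache.NewInformer(lw, &v1.Pod{}, time.Minute, cache.ResourceEventHandlerFuncs{})
	go controller.Run(stopCh)
	cache.WaitForCacheSync(stopCh, controller.HasSynced)

	// Served from the local cache, no API round-trip.
	if obj, exists, err := store.GetByKey("default/example-pod"); err == nil && exists {
		_ = obj.(*v1.Pod)
	}
}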
The Pod handlers should not perform any of the leader IP work if the Topology of the ServiceGroup is standalone.
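A sketch of the guard, with hypothetical type and field names (the operator's actual API may differ):

// handlePodUpdate skips the leader-IP bookkeeping for standalone
// topologies, where no leader election takes place.
func (c *HabitatController) handlePodUpdate(sg *ServiceGroup, pod *v1.Pod) error {
	if sg.Spec.Topology == "standalone" {
		return nil
	}
	return c.updateLeaderIP(sg, pod) // hypothetical helper
}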
End-to-end tests currently run for 10+ minutes.
Ideas:
One way to implement the Open Service Broker API is to explore the Kubernetes service catalog, which is currently in the Kubernetes incubator.
For the service catalog to work, we would need the following prerequisites:
Tried using pflag, but it did not work, because flag registering/parsing seems to get overridden. Maybe it's because other parts of the code or dependencies parse things through init methods, but it needs some further investigation.
My attempt:
package e2e

import (
	"flag"
	"os"
)

type testFlag struct {
	image      string
	kubeconfig string
	externalIP string
}

var tf testFlag

// ....

func parseTestFlags() error {
	flags := flag.NewFlagSet(os.Args[0], flag.ContinueOnError)
	flags.StringVar(&tf.image, "image", "", "habitat operator image, 'kinvolk/habitat-operator'")
	flags.StringVar(&tf.kubeconfig, "kubeconfig", "", "path to kube config file")
	flags.StringVar(&tf.externalIP, "ip", "", "external ip, eg. minikube ip")
	// Parse os.Args[2:], as the previous flags are test-related flags.
	return flags.Parse(os.Args[2:])
}
The Lister is now watching for all Pods, whereas we want to only react to our own.
This might be accomplished with namespaces or labels.
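A sketch of the label approach, using the pre-context client-go method signatures of that era; the label key/value are assumptions, not labels the operator currently sets.

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// ownPodsListWatch restricts the List/Watch to Pods carrying a label
// the operator would set on the Pods it creates.
func ownPodsListWatch(client *kubernetes.Clientset, namespace string) *cache.ListWatch {
	selector := "habitat=true" // assumed label
	return &cache.ListWatch{
		ListFunc: func(options metav1.ListOptions) (runtime.Object, error) {
			options.LabelSelector = selector
			return client.CoreV1().Pods(namespace).List(options)
		},
		WatchFunc: func(options metav1.ListOptions) (watch.Interface, error) {
			options.LabelSelector = selector
			return client.CoreV1().Pods(namespace).Watch(options)
		},
	}
}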
When we mount the user.toml file under /hab/svc/foo, we hide the existing contents of that directory, which include files that the service needs.
We should look into using subPath as explained here.
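A sketch of the fix (volume name and paths are illustrative; v1 is k8s.io/api/core/v1): mounting only the user.toml file via subPath leaves the rest of /hab/svc/foo visible to the service.

userTomlMount := v1.VolumeMount{
	Name:      "user-toml",               // assumed volume name
	MountPath: "/hab/svc/foo/user.toml",  // mount just the file, not the directory
	SubPath:   "user.toml",
}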
The supervisor should be started with the --peer-watch-file flag if the CRD has a topology: leader key.
The flag should be passed as an argument to the container.
The current name, operator, is obviously not optimal.
Create role-based access control rules for the Habitat operator.
Kubernetes RBAC has been promoted to v1 in Kubernetes 1.8, and major Kubernetes distributions turn it on by default, which means that the Kubernetes apiserver will deny all access to its APIs by default; RBAC is there to grant access to those APIs.
The Habitat operator makes heavy use of the Kubernetes APIs, so we need to document the required RBAC roles in order for users to run the Habitat operator in a secure manner.
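A guess at a minimal ClusterRole, expressed with the k8s.io/api/rbac/v1 types; the exact resources and verbs are assumptions and need to be audited against what the operator actually calls.

import (
	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// operatorRole grants access to the resources the operator manages.
var operatorRole = rbacv1.ClusterRole{
	ObjectMeta: metav1.ObjectMeta{Name: "habitat-operator"},
	Rules: []rbacv1.PolicyRule{
		{
			APIGroups: []string{""},
			Resources: []string{"pods", "configmaps"},
			Verbs:     []string{"get", "list", "watch", "create", "update", "delete"},
		},
		{
			APIGroups: []string{"apps"},
			Resources: []string{"deployments"},
			Verbs:     []string{"get", "list", "watch", "create", "update", "delete"},
		},
		{
			APIGroups: []string{"apiextensions.k8s.io"},
			Resources: []string{"customresourcedefinitions"},
			Verbs:     []string{"get", "list", "watch", "create"},
		},
	},
}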
The operator should auto-generate a ring key and/or accept one provided by the user. This way all the containers are secured at the gossip layer.
More info here.
Once the operator has received a new CR, it needs to create a Deployment using the CR's parameters.
To make sure individual events don't interfere with each other, the upstream Kubernetes suggestion is for controllers to implement a workqueue.
More info can be found here.
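A sketch of the upstream pattern, using client-go's workqueue package: event handlers only enqueue object keys, and a worker loop drains the queue. syncHabitat is a hypothetical reconcile function.

import (
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

var queue = workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())

// enqueue is called from the informer's event handlers.
func enqueue(obj interface{}) {
	if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
		queue.Add(key)
	}
}

// runWorker processes keys one at a time, so events for the same object
// never interfere with each other.
func runWorker() {
	for {
		key, quit := queue.Get()
		if quit {
			return
		}
		if err := syncHabitat(key.(string)); err != nil { // hypothetical reconcile function
			queue.AddRateLimited(key) // retry with backoff
		} else {
			queue.Forget(key)
		}
		queue.Done(key)
	}
}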
We should allow the user to create the Custom Object in a namespace of their choosing.
If none is specified, it will be created in the default namespace.
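A one-line sketch of the fallback (habitat stands for the received custom object; metav1 is k8s.io/apimachinery/pkg/apis/meta/v1):

// Fall back to the default namespace when the object doesn't set one.
ns := habitat.Namespace
if ns == "" {
	ns = metav1.NamespaceDefault
}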
Each one of the directories under examples/ should have a README file explaining what the example does, how to run it, etc.
The operator needs to start the supervisor with the --topology leader flag if the spec has a topology: leader key.
This depends on habitat-sh/habitat#2735, because we want to start all supervisors with the same set of flags.
If minikube is not present/running, building the project results in an error being displayed:
❯ make
E0830 11:51:01.433090 24624 ip.go:48] Error getting IP: Host is not running
go build -i github.com/kinvolk/habitat-operator/cmd/operator
Switching to v4.0.0 would make things a bit more stable than using master. This release includes Kubernetes 1.7.
We can add a parameter to control how many workers are started. These workers will pop jobs from the workqueue in parallel, improving performance.
See this for an example implementation.
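A sketch of the fan-out, assuming the worker loop from the workqueue sketch above; wait is k8s.io/apimachinery/pkg/util/wait, and workers would come from the new parameter.

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// startWorkers launches the requested number of goroutines, each
// draining the shared workqueue until stopCh is closed.
func startWorkers(workers int, stopCh <-chan struct{}) {
	for i := 0; i < workers; i++ {
		go wait.Until(runWorker, time.Second, stopCh)
	}
}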
In the case where we already have the pod IP data in the ConfigMap, but the pod with that IP has died, we need to select a new pod and update the ConfigMap with its IP.
For more context see https://github.com/kinvolk/habitat-operator/issues/21 and https://github.com/kinvolk/habitat-operator/issues/11
Currently, if the Habitat object is updated (e.g. the image name or number of replicas was changed), the Deployment is not updated. Our reconciler only handles creating the Deployment if it doesn't exist yet. In order to do upgrades of a Habitat application on Kubernetes, we need to handle image name updates.
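A sketch of the missing update path, using the pre-context client-go method signatures of that era; Habitat and newDeploymentForHabitat are hypothetical stand-ins for the operator's CRD type and its Deployment-rendering helper.

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// reconcileDeployment creates the Deployment if missing, and otherwise
// updates it so image/replica changes on the Habitat object propagate.
func reconcileDeployment(client *kubernetes.Clientset, h *Habitat) error {
	desired := newDeploymentForHabitat(h) // hypothetical helper
	current, err := client.AppsV1beta1().Deployments(h.Namespace).Get(desired.Name, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		_, err = client.AppsV1beta1().Deployments(h.Namespace).Create(desired)
		return err
	}
	if err != nil {
		return err
	}
	desired.ResourceVersion = current.ResourceVersion
	_, err = client.AppsV1beta1().Deployments(h.Namespace).Update(desired)
	return err
}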
After deleting the SG with kubectl, the operator still receives events on the Pod handler.
In those handlers, we expect the ConfigMaps to be there, but they aren't (as they are deleted in onDelete), so we get errors like:
level=error component=controller msg="configmaps \"example-encrypted-service-group\" not found"
This is about running a Service Group as a collection of manually created pods, and confirming that the supervisors in the SG are able to talk to one another and, for example, elect a leader.
Two types of pods will have to be created, since we're still relying on the current --peer mechanism, and therefore we run supervisors with different flags.
There should be a documented release process, i.e. steps we need to take when doing a new release of the Habitat operator. Here are a few steps that come to mind right now:
- build with make linux
- tag the release vx.x.x
- push the vx.x.x and latest images to hub.docker.com
- update examples/habitat-operator-deployment.yml
- update CHANGELOG.md with release notes

Once the deployment is created we want to write the IP of the first running pod into a ConfigMap.
Relates to habitat-sh/habitat#2735 and the --peer-watch-file flag that has to be implemented on the hab client.
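A sketch of the write, using the pre-context client-go signatures; the ConfigMap name and data key are assumptions.

import (
	"k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// writePeerIP publishes the first running Pod's IP so the supervisors
// can pick it up via --peer-watch-file.
func writePeerIP(client *kubernetes.Clientset, namespace string, pod *v1.Pod) error {
	cm := &v1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{Name: "peer-watch-file"}, // assumed name
		Data:       map[string]string{"peer-ip": pod.Status.PodIP},
	}
	_, err := client.CoreV1().ConfigMaps(namespace).Create(cm)
	return err
}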
To catch errors such as the one in https://github.com/kinvolk/habitat-operator/pull/104, we should switch the e2e tests to YAML manifest files instead of creating our operator SG resources through the API; that way we would have noticed the problem of the tag not being changed properly.
Things to write e2e tests for after habitat-sh/habitat#2805 is complete:
- the --bind flag
- user.toml
(This checklist is WIP, feel free to contribute to it.)
Currently the topology options in the operator are standalone and leader/follower; we need to add an option for no topology. The default option in Habitat is none, and the operator should default to that as well. Note that the difference between standalone and none is that none will never update itself.
See further discussion for this here.
Create e2e tests.
The first step to getting the full functionality of the Habitat leader-follower topology is to create the ConfigMap, which we will later update with the pod IP. This will be consumed by Habitat when using the --peer-watch-file flag. See habitat-sh/habitat#2735.
When deleting a Habitat, the following errors are displayed:
ts=2017-11-08T12:02:17.98051371+01:00 level=info component=controller msg="deleted deployment" name=example-leader-follower-habitat
ts=2017-11-08T12:02:17.983844937+01:00 level=error component=controller msg="deployments.apps \"example-leader-follower-habitat\" not found"
ts=2017-11-08T12:02:17.98428139+01:00 level=error component=controller msg="Habitat could not be synced, requeueing" msg="deployments.apps \"example-leader-follower-habitat\" not found"
The actual Habitat is removed.
This seems like it could have been introduced in #113.
Helm is the Kubernetes Package Manager. By creating a Helm chart we give the user a reproducible way to easily deploy the Habitat operator.
Create a demo for the operator. The demo will showcase the following features:
This issue is about creating a CRD spec for a Habitat service.
Fields:
Using OwnerReferences allows us to define relationships between Resources, so that deleting an owner can automatically delete the owned resources.
Currently, we use OwnerReferences to associate a ConfigMap with a Deployment.
It could be useful to make all Resources we create dependent on the CustomResource, but there might currently be problems with that.
More info here.
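A sketch of the current association, using the k8s.io/api types (the apps/v1beta1 group matches the Kubernetes 1.7/1.8 era of this project): a ConfigMap owned by a Deployment is garbage-collected when the Deployment is deleted.

import (
	"k8s.io/api/apps/v1beta1"
	"k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ownConfigMap marks the Deployment as the ConfigMap's owner, so the
// garbage collector deletes the ConfigMap along with the Deployment.
func ownConfigMap(cm *v1.ConfigMap, d *v1beta1.Deployment) {
	cm.OwnerReferences = []metav1.OwnerReference{
		{
			APIVersion: "apps/v1beta1",
			Kind:       "Deployment",
			Name:       d.Name,
			UID:        d.UID,
		},
	}
}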
Can we do an initial implementation, only supporting the --bind startup flag, without updates?
This depends on habitat-sh/habitat#2735.
This issue is to explore ways in which we can make running our E2E tests part of the CI process.
Ideas:
- the -j flag in make
- $TRAVIS_PULL_REQUEST (in a bash script)

The current name ServiceGroup conflicts with the concept of a group in Habitat, as well as with the concept of a Service in Kubernetes; we need to come up with a better name for the CRD.
List of ideas:
Update client-go to 1.8. See this for changes.
The e2e target in the Makefile calls minikube ip.
It would be good not to expect a minikube installation to be present, and get the cluster IP some other way, e.g. by parsing ~/.kube/config.
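A sketch of the kubeconfig approach, using client-go's clientcmd instead of shelling out to minikube ip:

import (
	"fmt"
	"log"
	"net/url"

	"k8s.io/client-go/tools/clientcmd"
)

// clusterIP derives the API server's host from the default kubeconfig
// loading rules (~/.kube/config, KUBECONFIG, etc.).
func clusterIP() {
	cfg, err := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
		clientcmd.NewDefaultClientConfigLoadingRules(),
		&clientcmd.ConfigOverrides{},
	).ClientConfig()
	if err != nil {
		log.Fatal(err)
	}
	u, err := url.Parse(cfg.Host) // e.g. https://192.168.99.100:8443
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(u.Hostname())
}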
Currently we are using Deployments to deploy our Habitat service, but we do not know what we are deploying or what type of service it is; it could be anything from a DB to a simple Rails application. We should not just assume our Habitat service is stateless.
A couple of advantages of StatefulSets:
These would be very useful, especially if our service is, for example, a DB.