habitat-sh / habitat-operator

A Kubernetes operator for Habitat services

License: Apache License 2.0

Go 90.03% Makefile 2.97% Shell 6.27% Dockerfile 0.09% Mustache 0.64%
kubernetes habitat operator kubernetes-cluster

habitat-operator's Issues

Add Deployment watcher

We should have watchers for all resources that affect Habitats.

We have a pod watcher, but we still need to add a Deployment one.

Depends on #63.

Add support for --group flag

The habitat client supports the --group flag, which allows users to start the supervisors in a specific group.

In order to support this, we need to:

  • Add a group key to the CRD
  • Pass the --group flag as an argument to the containers
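
The two steps above could look roughly like the following. This is only a sketch: the HabitatSpec type and its Group field are hypothetical stand-ins for whatever the CRD ends up exposing, and supervisorArgs is an illustrative helper, not existing operator code.

```go
package main

import "fmt"

// HabitatSpec is a hypothetical, trimmed-down stand-in for the custom
// resource's spec. The Group field name is an assumption.
type HabitatSpec struct {
	Group string
}

// supervisorArgs builds the argument list handed to the container.
// The --group flag is only added when a group was set on the resource.
func supervisorArgs(spec HabitatSpec) []string {
	var args []string
	if spec.Group != "" {
		args = append(args, "--group", spec.Group)
	}
	return args
}

func main() {
	fmt.Println(supervisorArgs(HabitatSpec{Group: "staging"})) // prints [--group staging]
	fmt.Println(supervisorArgs(HabitatSpec{}))                 // prints []
}
```

Leaving the flag out when no group is set lets the supervisor fall back to its own default rather than having the operator pick one.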

Create unit tests

The unit tests should include testing individual functions in the main code base.

Configuration flags from `glog` leak into binary

Some of the flags returned by --help seem to come from the glog library, and have no effect on our own logging (e.g. -v value).

❯ ./operator --help                                                                            
Usage of ./operator:                                                                           
  -alsologtostderr                                                                             
        log to standard error as well as files                                                 
  -kubeconfig string                                                                           
        Path to a kubeconfig. Only required if out-of-cluster.                                 
  -log_backtrace_at value                                                                      
        when logging hits line file:N, emit a stack trace                                      
  -log_dir string                                                                              
        If non-empty, write log files in this directory                                        
  -logtostderr                                                                                 
        log to standard error instead of files                                                 
  -stderrthreshold value                                                                       
        logs at or above this threshold go to stderr                                           
  -v value                                                                                     
        log level for V logs                                                                   
  -vmodule value                                                                               
        comma-separated list of pattern=N settings for file-filtered logging
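
One possible way to keep library flags out of --help is to parse the operator's flags with a dedicated flag.FlagSet instead of the global flag.CommandLine, on which libraries such as glog register theirs. The sketch below assumes that approach; newOperatorFlags is a hypothetical helper, and the guarded flag.Bool call merely simulates a dependency registering a global flag.

```go
package main

import (
	"flag"
	"fmt"
)

// newOperatorFlags returns a FlagSet that holds only the operator's own
// flags. Flags that dependencies register on flag.CommandLine are not
// part of it, so they do not show up in its usage output.
func newOperatorFlags(kubeconfig *string) *flag.FlagSet {
	fs := flag.NewFlagSet("operator", flag.ContinueOnError)
	fs.StringVar(kubeconfig, "kubeconfig", "", "Path to a kubeconfig. Only required if out-of-cluster.")
	return fs
}

func main() {
	// Simulate a library registering a flag on the global set in init().
	if flag.Lookup("alsologtostderr") == nil {
		flag.Bool("alsologtostderr", false, "log to standard error as well as files")
	}

	var kubeconfig string
	fs := newOperatorFlags(&kubeconfig)
	if err := fs.Parse([]string{"-kubeconfig", "/tmp/config"}); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(kubeconfig)                           // prints /tmp/config
	fmt.Println(fs.Lookup("alsologtostderr") == nil) // prints true
}
```

The trade-off is that the library flags can no longer be set from the command line at all, which is acceptable only if they really have no effect on the operator's logging.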

Use cache instead of making API calls

Whenever possible, we should use the cache returned by the cache.NewInformer function, instead of making API calls, to retrieve objects from the API.

Depends on #63.

Decrease e2e tests running time

End-to-end tests currently run for 10+ minutes.

Ideas:

  • Disable Travis' PR test
  • Find a way to only run certain tests some of the time

Switch to pflag in test

I tried using pflag, but it did not work because flag registration/parsing seems to get overridden. Maybe other parts of the code or its dependencies parse flags in init functions, but this needs further investigation.

My attempt:

type testFlag struct {
	image      string
	kubeconfig string
	externalIP string
}
....
flags := flag.NewFlagSet(os.Args[0], flag.ContinueOnError)

flags.StringVar(&tf.image, "image", "", "habitat operator image, 'kinvolk/habitat-operator'")
flags.StringVar(&tf.kubeconfig, "kubeconfig", "", "path to kube config file")
flags.StringVar(&tf.externalIP, "ip", "", "external ip, eg. minikube ip")

flags.Parse(os.Args[2:]) // As the previous flags are test related flags.

Add support for --peer-watch-file

The supervisor should be started with the --peer-watch-file flag if the CRD has a key topology: leader.

The flag should be passed as an argument to the container.

RBAC rules for Operator

Create role based access control rules for the Habitat operator.

Kubernetes RBAC was promoted to v1 in Kubernetes 1.8, and major Kubernetes distributions turn it on by default. This means that the Kubernetes apiserver denies all access to its APIs by default; RBAC rules are what grant access to those APIs.

The Habitat operator makes heavy use of the Kubernetes APIs, therefore we need to document the required RBAC roles, in order for users to run the Habitat operator in a secure manner.

Handle Ring Key

The operator should auto-generate a ring key and/or accept one provided by the user. This way all the containers are secured at the gossip layer.

More info here.

Create deployment `onAdd`

Once the operator has received a new CR, it needs to create a Deployment using the CR's parameters.

Introduce workqueue

To make sure individual events don't interfere with each other, upstream Kubernetes suggests that controllers implement a workqueue.

More info can be found here.

Add support for custom namespace

We should allow the user to create the Custom Object in a namespace of their choosing.

If none is specified, it will be created in the default namespace.

Fix hard dependency to minikube in Makefile

If minikube is not present/running, building the project results in an error being displayed:

❯ make
E0830 11:51:01.433090   24624 ip.go:48] Error getting IP:  Host is not running
go build -i github.com/kinvolk/habitat-operator/cmd/operator

Increase number of workers

We can add a parameter to control how many workers are started. These workers will pop jobs from the workqueue in parallel, improving performance.

See this for an example implementation.

Update deployment when Habitat object is updated

Currently, if the Habitat object is updated (e.g. the image name or the number of replicas changes), the deployment is not updated. Our reconciler only handles creating the deployment if it doesn't exist yet. In order to upgrade Habitat applications on Kubernetes, we need to handle image name updates.
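
The missing branch can be illustrated with a small create-or-update sketch. Everything here is hypothetical: Deployment is a two-field stand-in for the real Kubernetes object, and the store map replaces the API server so the example stays self-contained.

```go
package main

import "fmt"

// Deployment is a hypothetical, minimal stand-in for the fields the
// reconciler would compare.
type Deployment struct {
	Image    string
	Replicas int
}

// store is an in-memory replacement for the cluster state.
type store map[string]Deployment

// reconcile creates the deployment if it is missing, updates it if it
// differs from the desired state, and otherwise leaves it alone.
func reconcile(s store, name string, desired Deployment) string {
	current, ok := s[name]
	switch {
	case !ok:
		s[name] = desired
		return "created"
	case current != desired:
		s[name] = desired
		return "updated"
	default:
		return "unchanged"
	}
}

func main() {
	s := store{}
	fmt.Println(reconcile(s, "example", Deployment{Image: "app:1", Replicas: 1})) // prints created
	fmt.Println(reconcile(s, "example", Deployment{Image: "app:2", Replicas: 1})) // prints updated
	fmt.Println(reconcile(s, "example", Deployment{Image: "app:2", Replicas: 1})) // prints unchanged
}
```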

ConfigMap not found error

After deleting the SG with kubectl, the operator still receives events on the Pod handler.

In those handlers, we expect the ConfigMaps to be there, but they aren't (as they are deleted in the onDelete), so we get errors like

level=error component=controller msg="configmaps \"example-encrypted-service-group\" not found"

Running a hab Service Group inside of k8s

This is about running a Service Group as a collection of manually created pods, and confirming that the supervisors in the SG are able to talk to one another, and, for example, elect a leader.

Two types of pods will have to be created, since we're still relying on the current --peer mechanism, and therefore we run supervisors with different flags.

Checklist

  • Pods can fetch from the internet
  • Leader election succeeds

Switch to a different logging library?

log-kit has the disadvantage that the logger instance has to be (or should be) passed around for logging to be possible.

Other libraries, like glog don't have this UX issue.

Should we switch? Some people also like log15.

Release process

There should be a documented release process, i.e. the steps we need to take when doing a new release of the Habitat operator. Here are a few steps that come to mind right now:

  • Build the image with make linux
  • Tag the image for the release, following the versioning strategy vx.x.x
  • Push the image to hub.docker.com under the version tag as well as latest
  • Bump the tag version in the Habitat operator deployment manifest, examples/habitat-operator-deployment.yml
  • Update CHANGELOG.md with release notes

Topology should default to `none`

Currently, the topology options in the operator are standalone and leader/follower; we need to add an option for no topology. The default option in Habitat is none, and the operator should default to that as well, which means:

the difference between standalone and none is that none will never update itself

See further discussion for this here.

Errors when deleting Habitat resource

When deleting a Habitat, the following errors are displayed:

ts=2017-11-08T12:02:17.98051371+01:00 level=info component=controller msg="deleted deployment" name=example-leader-follower-habitat
ts=2017-11-08T12:02:17.983844937+01:00 level=error component=controller msg="deployments.apps \"example-leader-follower-habitat\" not found"
ts=2017-11-08T12:02:17.98428139+01:00 level=error component=controller msg="Habitat could not be synced, requeueing" msg="deployments.apps \"example-leader-follower-habitat\" not found"

The actual Habitat is removed.

This seems like it could have been introduced in #113.

Demo: Bind plus initial configuration

Create a demo for the operator. Demo will showcase the following features:

  • One Service group bound to a database.
  • We override the port on which the database listens and display that port information in the first service.
  • Similar to the bind demo, but also displaying how different fields in the manifest file (Habitat features) can be used together (configuration and bind feature in this case).

Use ownerReferences with CRD

Using OwnerReferences allows us to define relationships between Resources, so that deleting an owner can automatically delete owned resources.

Currently, we use OwnerReferences to associate a ConfigMap with a Deployment.

It could be useful to make all Resources we create dependent on the CustomResource, but there might currently be problems with that.

Run E2E tests automatically

This issue is to explore ways in which we can make running our E2E tests part of the CI process.

  • Parallelizing Travis jobs using the build matrix
  • Using the -j flag in make
  • Using a cron job to periodically run the E2E tests on master
  • Using a custom bash script that only runs the E2E tests on master
  • Only running the E2E tests in the job that tests the PR merge commit (e.g. checking $TRAVIS_PULL_REQUEST in a bash script)

Rename CRD ServiceGroup

The current name, ServiceGroup, conflicts with the concept of a group in Habitat as well as the concept of a Service in Kubernetes. We need to come up with a better name for the CRD.

List of ideas:

Remove dependency on minikube

The e2e target in the Makefile calls minikube ip.

It would be good not to expect a minikube installation to be present, and get the cluster IP some other way, e.g. by parsing ~/.kube/config.
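
As a partial sketch of that idea: once the cluster's server URL has been read from the kubeconfig (the YAML parsing itself is left out here, since it needs a third-party library), the host can be extracted with the standard library. clusterHost is a hypothetical helper, and the URL in main is a made-up example value.

```go
package main

import (
	"fmt"
	"net/url"
)

// clusterHost extracts the host part from a kubeconfig "server" value,
// e.g. the IP that `minikube ip` would otherwise be asked for.
func clusterHost(server string) (string, error) {
	u, err := url.Parse(server)
	if err != nil {
		return "", err
	}
	if u.Hostname() == "" {
		return "", fmt.Errorf("no host in server URL %q", server)
	}
	return u.Hostname(), nil
}

func main() {
	host, err := clusterHost("https://192.168.99.100:8443")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(host) // prints 192.168.99.100
}
```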

Use StatefulSets instead of Deployment

Currently we use Deployments to deploy our Habitat services, but we do not know what we are deploying or what type of service it is: it could be anything from a DB to a simple Rails application. We should not assume that a Habitat service is stateless.

Some advantages of StatefulSets:

  • Graceful deployment and scaling
  • Stable network identity
  • Graceful deletion and termination
  • Stable, persistent storage

These would be very useful, especially if our service is, for example, a DB.