habitat-sh / habitat-operator
A Kubernetes operator for Habitat services
License: Apache License 2.0
We should have watchers for all resources that affect Habitats.
We have a Pod watcher, but we still need to add a Deployment one.
Depends on #63.
Currently, a user needs to modify the source code to change the log level.
We should expose a command-line flag for this instead.
The habitat client supports the --group flag, which allows users to start the supervisors in a specific group.
In order to support this, we need to:
- add a group key to the CRD
- pass the --group flag as an argument to the containers

The unit tests should include testing individual functions in the main code base.
Some of the flags returned by --help seem to come from the glog library, and have no effect on our own logging (e.g. -v value).
❯ ./operator --help
Usage of ./operator:
-alsologtostderr
log to standard error as well as files
-kubeconfig string
Path to a kubeconfig. Only required if out-of-cluster.
-log_backtrace_at value
when logging hits line file:N, emit a stack trace
-log_dir string
If non-empty, write log files in this directory
-logtostderr
log to standard error instead of files
-stderrthreshold value
logs at or above this threshold go to stderr
-v value
log level for V logs
-vmodule value
comma-separated list of pattern=N settings for file-filtered logging
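A minimal sketch of where these flags come from, assuming the dependency is the standard github.com/golang/glog package: merely importing it registers its flags on the default FlagSet, which is why they leak into our --help output.

package main

import (
	"flag"
	"fmt"

	// glog registers -v, -vmodule, -logtostderr, etc. on flag.CommandLine
	// in its init(), so any program importing it (directly or via
	// client-go) inherits these flags.
	_ "github.com/golang/glog"
)

func main() {
	flag.Parse()
	// Print every registered flag; the glog ones show up even though we
	// never declared them ourselves.
	flag.VisitAll(func(f *flag.Flag) {
		fmt.Printf("-%s\t%s\n", f.Name, f.Usage)
	})
}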
Whenever possible, we should use the cache returned by the cache.NewInformer function to retrieve objects, instead of making API calls.
Depends on #63.
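A minimal sketch of the pattern, assuming a recent client-go (import paths moved around in the 2017-era releases); the Pod key is a placeholder.

import (
	"time"

	"k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// podFromCache reads Pods from the informer's local store instead of
// hitting the API server on every lookup.
func podFromCache(client *kubernetes.Clientset, stopCh chan struct{}) {
	lw := cache.NewListWatchFromClient(
		client.CoreV1().RESTClient(), "pods", v1.NamespaceDefault, fields.Everything())
	store, controller := cache.NewInformer(lw, &v1.Pod{}, time.Minute, cache.ResourceEventHandlerFuncs{})
	go controller.Run(stopCh)
	cache.WaitForCacheSync(stopCh, controller.HasSynced)

	// Served from the local cache, no API round-trip.
	if obj, exists, err := store.GetByKey("default/example-pod"); err == nil && exists {
		_ = obj.(*v1.Pod)
	}
}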
The Pod handlers should not perform any of the leader IP work if the Topology of the ServiceGroup is standalone.
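A sketch of the guard, with hypothetical type and field names (the operator's actual API may differ):

// handlePodUpdate skips the leader-IP bookkeeping for standalone
// topologies, where no leader election takes place.
func (c *HabitatController) handlePodUpdate(sg *ServiceGroup, pod *v1.Pod) error {
	if sg.Spec.Topology == "standalone" {
		return nil
	}
	return c.updateLeaderIP(sg, pod) // hypothetical helper
}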
End-to-end tests currently run for 10+ minutes.
Ideas:
One way to implement the Open Service Broker API is to explore the Kubernetes service catalog, which is currently in the Kubernetes incubator.
For the service catalog to work, we would need the following prerequisites:
Tried using pflag, but it did not work, because flag registering/parsing seems to get overridden. Maybe it's because other parts of the code or dependencies parse things through init methods, but it needs some further investigation.
My attempt:
package e2e

import (
	"flag"
	"os"
)

type testFlag struct {
	image      string
	kubeconfig string
	externalIP string
}

var tf testFlag

// ....

func parseTestFlags() error {
	flags := flag.NewFlagSet(os.Args[0], flag.ContinueOnError)
	flags.StringVar(&tf.image, "image", "", "habitat operator image, 'kinvolk/habitat-operator'")
	flags.StringVar(&tf.kubeconfig, "kubeconfig", "", "path to kube config file")
	flags.StringVar(&tf.externalIP, "ip", "", "external ip, eg. minikube ip")
	// Parse os.Args[2:], as the previous flags are test-related flags.
	return flags.Parse(os.Args[2:])
}
The Lister is now watching for all Pods, whereas we want to only react to our own.
This might be accomplished with namespaces or labels.
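A sketch of the label approach, using the pre-context client-go method signatures of that era; the label key/value are assumptions, not labels the operator currently sets.

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// ownPodsListWatch restricts the List/Watch to Pods carrying a label
// the operator would set on the Pods it creates.
func ownPodsListWatch(client *kubernetes.Clientset, namespace string) *cache.ListWatch {
	selector := "habitat=true" // assumed label
	return &cache.ListWatch{
		ListFunc: func(options metav1.ListOptions) (runtime.Object, error) {
			options.LabelSelector = selector
			return client.CoreV1().Pods(namespace).List(options)
		},
		WatchFunc: func(options metav1.ListOptions) (watch.Interface, error) {
			options.LabelSelector = selector
			return client.CoreV1().Pods(namespace).Watch(options)
		},
	}
}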
When we mount the user.toml file under /hab/svc/foo, we hide the existing contents of that directory, which include files that the service needs.
We should look into using subPath as explained here.
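A sketch of the fix (volume name and paths are illustrative; v1 is k8s.io/api/core/v1): mounting only the user.toml file via subPath leaves the rest of /hab/svc/foo visible to the service.

userTomlMount := v1.VolumeMount{
	Name:      "user-toml",               // assumed volume name
	MountPath: "/hab/svc/foo/user.toml",  // mount just the file, not the directory
	SubPath:   "user.toml",
}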
The supervisor should be started with the --peer-watch-file flag if the CRD has a topology: leader key.
The flag should be passed as an argument to the container.
The current name, operator, is obviously not optimal.
Create role-based access control rules for the Habitat operator.
Kubernetes RBAC has been promoted to v1 in Kubernetes 1.8, and major Kubernetes distributions turn it on by default, which means that the Kubernetes apiserver will deny all access to its APIs by default; RBAC is there to grant access to those APIs.
The Habitat operator makes heavy use of the Kubernetes APIs, so we need to document the required RBAC roles in order for users to run the Habitat operator in a secure manner.
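A guess at a minimal ClusterRole, expressed with the k8s.io/api/rbac/v1 types; the exact resources and verbs are assumptions and need to be audited against what the operator actually calls.

import (
	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// operatorRole grants access to the resources the operator manages.
var operatorRole = rbacv1.ClusterRole{
	ObjectMeta: metav1.ObjectMeta{Name: "habitat-operator"},
	Rules: []rbacv1.PolicyRule{
		{
			APIGroups: []string{""},
			Resources: []string{"pods", "configmaps"},
			Verbs:     []string{"get", "list", "watch", "create", "update", "delete"},
		},
		{
			APIGroups: []string{"apps"},
			Resources: []string{"deployments"},
			Verbs:     []string{"get", "list", "watch", "create", "update", "delete"},
		},
		{
			APIGroups: []string{"apiextensions.k8s.io"},
			Resources: []string{"customresourcedefinitions"},
			Verbs:     []string{"get", "list", "watch", "create"},
		},
	},
}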
The operator should auto-generate a ring key and/or accept one provided by the user. This way all the containers are secured at the gossip layer.
More info here.
Once the operator has received a new CR, it needs to create a Deployment using the CR's parameters.
To make sure individual events don't interfere with each other, the upstream Kubernetes suggestion is for controllers to implement a workqueue.
More info can be found here.
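A sketch of the upstream pattern, using client-go's workqueue package: event handlers only enqueue object keys, and a worker loop drains the queue. syncHabitat is a hypothetical reconcile function.

import (
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

var queue = workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())

// enqueue is called from the informer's event handlers.
func enqueue(obj interface{}) {
	if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
		queue.Add(key)
	}
}

// runWorker processes keys one at a time, so events for the same object
// never interfere with each other.
func runWorker() {
	for {
		key, quit := queue.Get()
		if quit {
			return
		}
		if err := syncHabitat(key.(string)); err != nil { // hypothetical reconcile function
			queue.AddRateLimited(key) // retry with backoff
		} else {
			queue.Forget(key)
		}
		queue.Done(key)
	}
}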
We should allow the user to create the Custom Object in a namespace of their choosing.
If none is specified, it will be created in the default namespace.
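A one-line sketch of the fallback (habitat stands for the received custom object; metav1 is k8s.io/apimachinery/pkg/apis/meta/v1):

// Fall back to the default namespace when the object doesn't set one.
ns := habitat.Namespace
if ns == "" {
	ns = metav1.NamespaceDefault
}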
Each one of the directories under examples/ should have a README file explaining what the example does, how to run it, etc.
The operator needs to start the supervisor with the --topology leader flag if the spec has a topology: leader key.
This depends on habitat-sh/habitat#2735, because we want to start all supervisors with the same set of flags.
If minikube is not present/running, building the project results in an error being displayed:
❯ make
E0830 11:51:01.433090 24624 ip.go:48] Error getting IP: Host is not running
go build -i github.com/kinvolk/habitat-operator/cmd/operator
Switching to v4.0.0 would make things a bit more stable than using master. This release includes Kubernetes 1.7.
We can add a parameter to control how many workers are started. These workers will pop jobs from the workqueue in parallel, improving performance.
See this for an example implementation.
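A sketch of the fan-out, assuming the worker loop from the workqueue sketch above; wait is k8s.io/apimachinery/pkg/util/wait, and workers would come from the new parameter.

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// startWorkers launches the requested number of goroutines, each
// draining the shared workqueue until stopCh is closed.
func startWorkers(workers int, stopCh <-chan struct{}) {
	for i := 0; i < workers; i++ {
		go wait.Until(runWorker, time.Second, stopCh)
	}
}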
In the case where we already have the pod IP data in the ConfigMap, but the pod with that IP has died, we need to select a new pod and update the ConfigMap with its IP.
For more context see https://github.com/kinvolk/habitat-operator/issues/21 and https://github.com/kinvolk/habitat-operator/issues/11
Currently, if the Habitat object is updated (e.g. the image name or number of replicas was changed), the Deployment is not updated. Our reconciler only handles creating the Deployment if it doesn't exist yet. In order to do upgrades of a Habitat application on Kubernetes, we need to handle image name updates.
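A sketch of the missing update path, using the pre-context client-go method signatures of that era; Habitat and newDeploymentForHabitat are hypothetical stand-ins for the operator's CRD type and its Deployment-rendering helper.

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// reconcileDeployment creates the Deployment if missing, and otherwise
// updates it so image/replica changes on the Habitat object propagate.
func reconcileDeployment(client *kubernetes.Clientset, h *Habitat) error {
	desired := newDeploymentForHabitat(h) // hypothetical helper
	current, err := client.AppsV1beta1().Deployments(h.Namespace).Get(desired.Name, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		_, err = client.AppsV1beta1().Deployments(h.Namespace).Create(desired)
		return err
	}
	if err != nil {
		return err
	}
	desired.ResourceVersion = current.ResourceVersion
	_, err = client.AppsV1beta1().Deployments(h.Namespace).Update(desired)
	return err
}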
After deleting the SG with kubectl, the operator still receives events on the Pod handler.
In those handlers, we expect the ConfigMaps to be there, but they aren't (as they are deleted in onDelete), so we get errors like:
level=error component=controller msg="configmaps \"example-encrypted-service-group\" not found"
This is about running a Service Group as a collection of manually created pods, and confirming that the supervisors in the SG are able to talk to one another and, for example, elect a leader.
Two types of pods will have to be created, since we're still relying on the current --peer mechanism, and therefore we run supervisors with different flags.
There should be a documented release process, i.e. steps we need to take when doing a new release of the Habitat operator. Here are a few steps that come to mind right now:
- build with make linux
- tag the release vx.x.x
- push the vx.x.x and latest images to hub.docker.com
- update examples/habitat-operator-deployment.yml
- update CHANGELOG.md with release notes

Once the deployment is created we want to write the IP of the first running pod into a ConfigMap.
Relates to habitat-sh/habitat#2735 and the --peer-watch-file flag that has to be implemented on the hab client.
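A sketch of the write, using the pre-context client-go signatures; the ConfigMap name and data key are assumptions.

import (
	"k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// writePeerIP publishes the first running Pod's IP so the supervisors
// can pick it up via --peer-watch-file.
func writePeerIP(client *kubernetes.Clientset, namespace string, pod *v1.Pod) error {
	cm := &v1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{Name: "peer-watch-file"}, // assumed name
		Data:       map[string]string{"peer-ip": pod.Status.PodIP},
	}
	_, err := client.CoreV1().ConfigMaps(namespace).Create(cm)
	return err
}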
To catch errors such as the one in https://github.com/kinvolk/habitat-operator/pull/104, we should switch the e2e tests to YAML manifest files instead of creating our operator SG resources through the API; that way we would have noticed the problem of the tag not being changed properly.
Things to write e2e tests for after habitat-sh/habitat#2805 is complete:
- the --bind flag
- user.toml
(This checklist is WIP, feel free to contribute to it.)
Currently the topology options in the operator are standalone and leader/follower; we need to add an option for no topology. The default option in Habitat is none, and the operator should default to that as well. Note that the difference between standalone and none is that none will never update itself.
See further discussion for this here.
Create e2e tests.
The first step to getting the full functionality of the Habitat leader-follower topology is to create the ConfigMap, which we will later update with the pod IP. This will be consumed by Habitat when using the --peer-watch-file flag. See habitat-sh/habitat#2735.
When deleting a Habitat, the following errors are displayed:
ts=2017-11-08T12:02:17.98051371+01:00 level=info component=controller msg="deleted deployment" name=example-leader-follower-habitat
ts=2017-11-08T12:02:17.983844937+01:00 level=error component=controller msg="deployments.apps \"example-leader-follower-habitat\" not found"
ts=2017-11-08T12:02:17.98428139+01:00 level=error component=controller msg="Habitat could not be synced, requeueing" msg="deployments.apps \"example-leader-follower-habitat\" not found"
The actual Habitat is removed.
This seems like it could have been introduced in #113.
Helm is the Kubernetes Package Manager. By creating a Helm chart we give the user a reproducible way to easily deploy the Habitat operator.
Create a demo for the operator. The demo will showcase the following features:
This issue is about creating a CRD spec for a Habitat service.
Fields:
Using OwnerReferences allows us to define relationships between Resources, so that deleting an owner can automatically delete the owned resources.
Currently, we use OwnerReferences to associate a ConfigMap with a Deployment.
It could be useful to make all Resources we create dependent on the CustomResource, but there might currently be problems with that.
More info here.
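A sketch of the current association, using the k8s.io/api types (the apps/v1beta1 group matches the Kubernetes 1.7/1.8 era of this project): a ConfigMap owned by a Deployment is garbage-collected when the Deployment is deleted.

import (
	"k8s.io/api/apps/v1beta1"
	"k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ownConfigMap marks the Deployment as the ConfigMap's owner, so the
// garbage collector deletes the ConfigMap along with the Deployment.
func ownConfigMap(cm *v1.ConfigMap, d *v1beta1.Deployment) {
	cm.OwnerReferences = []metav1.OwnerReference{
		{
			APIVersion: "apps/v1beta1",
			Kind:       "Deployment",
			Name:       d.Name,
			UID:        d.UID,
		},
	}
}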
Can we do an initial implementation, only supporting the --bind startup flag, without updates?
This depends on habitat-sh/habitat#2735.
This issue is to explore ways in which we can make running our E2E tests part of the CI process.
Ideas:
- the -j flag in make
- $TRAVIS_PULL_REQUEST (in a bash script)

The current name ServiceGroup conflicts with the concept of a group in Habitat, as well as with the concept of a Service in Kubernetes; we need to come up with a better name for the CRD.
List of ideas:
Update client-go to 1.8. See this for changes.
The e2e target in the Makefile calls minikube ip.
It would be good not to expect a minikube installation to be present, and get the cluster IP some other way, e.g. by parsing ~/.kube/config.
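A sketch of the kubeconfig approach, using client-go's clientcmd instead of shelling out to minikube ip:

import (
	"fmt"
	"log"
	"net/url"

	"k8s.io/client-go/tools/clientcmd"
)

// clusterIP derives the API server's host from the default kubeconfig
// loading rules (~/.kube/config, KUBECONFIG, etc.).
func clusterIP() {
	cfg, err := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
		clientcmd.NewDefaultClientConfigLoadingRules(),
		&clientcmd.ConfigOverrides{},
	).ClientConfig()
	if err != nil {
		log.Fatal(err)
	}
	u, err := url.Parse(cfg.Host) // e.g. https://192.168.99.100:8443
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(u.Hostname())
}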
Currently we are using Deployments to deploy our Habitat service, but we do not know what we are deploying or what type of service it is; it could be anything from a DB to a simple Rails application. We should not just assume our Habitat service is stateless.
A couple of advantages of StatefulSets:
These would be very useful, especially if our service is, for example, a DB.