kudobuilder / kudo

Kubernetes Universal Declarative Operator (KUDO)

Home Page: https://kudo.dev

License: Apache License 2.0

Dockerfile 0.21% Makefile 0.76% Go 96.34% Shell 2.69%
cluster cncf crd hacktoberfest kafka kubernetes kubernetes-community kubernetes-controller kubernetes-operator kudo maestro mysql operator sdk zookeeper

kudo's People

Contributors

alenkacz, aneumann82, anthonydahanne, djannot, fabianbaier, gerred, gkleiman, guenter, harryge00, hypnoglow, jbarrick-mesosphere, joerg84, k8s-ci-robot, kensipe, mattj-io, meichstedt, mpereira, nikhita, philips, porridge, rishabh96b, runyontr, shaneutt, sivaramsk, spiffxp, voelzmo, wking, yankcrime, zen-dog, zmalik


kudo's Issues

CreateOrUpdate function fix

In the plan controller, the line:

result, err := controllerutil.CreateOrUpdate(context.TODO(), r.Client, obj, func(runtime.Object) error { return nil })

needs to be fixed. The last argument is a mutate callback that is supposed to apply the desired modifications to the object pulled from the server; a no-op callback means an existing object is never actually updated. It needs to be replaced with something like this pattern from instance_controller.go:

did, err := controllerutil.CreateOrUpdate(context.TODO(), mgr.GetClient(), current, func(o runtime.Object) error {
   t := true
   o.(*maestrov1alpha1.PlanExecution).Spec.Suspend = &t
   return nil
})
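
A minimal sketch of what the corrected call in the plan controller could look like. setDesiredState is a hypothetical helper (not code from the repo) that copies the spec rendered for this step onto the object returned by the server:

desired := obj.DeepCopyObject()
result, err := controllerutil.CreateOrUpdate(context.TODO(), r.Client, obj, func(live runtime.Object) error {
	// CreateOrUpdate Gets the live object first, so the callback has to re-apply
	// the state rendered for this step or an existing object is never changed.
	return setDesiredState(live, desired) // hypothetical helper, not in the repo
})
log.Printf("CreateOrUpdate resulted in: %v (err: %v)", result, err)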

Custom Overrides for PlanExecutions

For some plans (e.g. backup/restore), it may make sense to run against an instance with different values than those specified by the Instance or FrameworkVersion. An example spec might look like:

apiVersion: maestro.k8s.io/v1alpha1
kind: PlanExecution
metadata:
  name: small-backup
  namespace: default
  ownerReferences:
  - apiVersion: maestro.k8s.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: Instance
    name: small
    uid: a1fc8f64-fa54-11e8-8673-08002795d782
spec:
  instance:
    kind: Instance
    name: small
    namespace: default
  planName: backup
  arguments:
    BACKUP_LOCATION: s3://backup-bucket/data.sql

Some parameters should not be overridable, and we'd want to capture that in the parameter definition in the FrameworkVersion, as sketched below.
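
A rough sketch of what that could look like in the Go API types (field names here are assumptions for illustration, not the actual FrameworkVersion API):

type Parameter struct {
	Name        string `json:"name"`
	Description string `json:"description,omitempty"`
	Default     string `json:"default,omitempty"`
	// Overridable would control whether a PlanExecution may replace the value
	// from the Instance or FrameworkVersion via spec.arguments.
	Overridable bool `json:"overridable,omitempty"`
}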

Call plan explicitly

Some plans are triggered by specific state changes (e.g. a version update triggers the upgrade plan).

For other plans (e.g. createTopic), a mechanism needs to be in place to call a plan explicitly. One option is sketched below.
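
One possible shape for this, sketched with an unstructured object and a controller-runtime client; the spec layout mirrors the PlanExecution example in the previous issue, and none of this is existing code:

pe := &unstructured.Unstructured{}
pe.SetAPIVersion("maestro.k8s.io/v1alpha1")
pe.SetKind("PlanExecution")
pe.SetNamespace("default")
pe.SetName("small-createtopic") // name chosen for illustration only
pe.Object["spec"] = map[string]interface{}{
	"planName": "createTopic",
	"instance": map[string]interface{}{
		"kind":      "Instance",
		"name":      "small",
		"namespace": "default",
	},
}
// c is a controller-runtime client.Client (e.g. mgr.GetClient());
// unstructured comes from k8s.io/apimachinery/pkg/apis/meta/v1/unstructured.
if err := c.Create(context.TODO(), pe); err != nil {
	log.Printf("could not trigger plan createTopic: %v", err)
}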

Upgrade Kubernetes version to support 1.13

Currently this requires Kubebuilder to support Kubernetes 1.12 (see kubernetes-sigs/cluster-api#522).

When trying to update to Kubernetes 1.12, we get the following error:

Solving failure: No versions of sigs.k8s.io/controller-runtime met constraints:
    v0.1.7: Could not introduce sigs.k8s.io/controller-runtime@v0.1.7, as it has a dependency on k8s.io/client-go with constraint kubernetes-1.11.2, which has no overlap with existing constraint ^9.0.0 from (root)
    v0.1.6: Could not introduce sigs.k8s.io/controller-runtime@v0.1.6, as it is not allowed by constraint ^0.1.7 from project github.com/kubernetes-sigs/kubebuilder-maestro.
    v0.1.5: Could not introduce sigs.k8s.io/controller-runtime@v0.1.5, as it is not allowed by constraint ^0.1.7 from project github.com/kubernetes-sigs/kubebuilder-maestro.
    v0.1.4: Could not introduce sigs.k8s.io/controller-runtime@v0.1.4, as it is not allowed by constraint ^0.1.7 from project github.com/kubernetes-sigs/kubebuilder-maestro.
    v0.1.3: Could not introduce sigs.k8s.io/controller-runtime@v0.1.3, as it is not allowed by constraint ^0.1.7 from project github.com/kubernetes-sigs/kubebuilder-maestro.
    v0.1.2: Could not introduce sigs.k8s.io/controller-runtime@v0.1.2, as it is not allowed by constraint ^0.1.7 from project github.com/kubernetes-sigs/kubebuilder-maestro.
    v0.1.1: Could not introduce sigs.k8s.io/controller-runtime@v0.1.1, as it is not allowed by constraint ^0.1.7 from project github.com/kubernetes-sigs/kubebuilder-maestro.
    master: Could not introduce sigs.k8s.io/controller-runtime@master, as it is not allowed by constraint ^0.1.7 from project github.com/kubernetes-sigs/kubebuilder-maestro.
    admissionwebhook: Could not introduce sigs.k8s.io/controller-runtime@admissionwebhook, as it is not allowed by constraint ^0.1.7 from project github.com/kubernetes-sigs/kubebuilder-maestro.
    admissionwebhook-1.11: Could not introduce sigs.k8s.io/controller-runtime@admissionwebhook-1.11, as it is not allowed by constraint ^0.1.7 from project github.com/kubernetes-sigs/kubebuilder-maestro.
    bulk_deposit: Could not introduce sigs.k8s.io/controller-runtime@bulk_deposit, as it is not allowed by constraint ^0.1.7 from project github.com/kubernetes-sigs/kubebuilder-maestro.
    release-0.1: Could not introduce sigs.k8s.io/controller-runtime@release-0.1, as it is not allowed by constraint ^0.1.7 from project github.com/kubernetes-sigs/kubebuilder-maestro.
    review: Could not introduce sigs.k8s.io/controller-runtime@review, as it is not allowed by constraint ^0.1.7 from project github.com/kubernetes-sigs/kubebuilder-maestro.

Once complete, re-running dep ensure should update the client-go library.

Create maestroctl CLI command

Create the initial framework of a Maestro CLI command as a skeleton for future sub-commands that can be used to manipulate Maestro-specific CRDs or an API aggregation layer if one exists.
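
A minimal skeleton for such a command, assuming spf13/cobra as the CLI library (that choice is an assumption, not stated in the issue):

package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

func main() {
	rootCmd := &cobra.Command{
		Use:   "maestroctl",
		Short: "Command line client for Maestro frameworks, versions, and instances",
	}

	// Placeholder subcommand; real subcommands would manipulate the Maestro CRDs
	// or talk to an API aggregation layer if one exists.
	rootCmd.AddCommand(&cobra.Command{
		Use:   "version",
		Short: "Print the maestroctl version",
		Run: func(cmd *cobra.Command, args []string) {
			fmt.Println("maestroctl dev")
		},
	})

	if err := rootCmd.Execute(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}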

Create Dockerfile and build instructions

Create a Dockerfile and Docker images for the released version. Building this image can be a manual process for the purposes of the KubeCon release; a separate issue will be created for release-process automation.

Support Dependent Instances

Allow FrameworkInstances to reference other FrameworkInstances to satisfy a dependency. For example, Kafka needs an instance of Zookeeper if zookeeper.url is not provided. A rough sketch of such a reference follows.
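
A sketch of what a dependency reference on the Instance spec could look like (all names here are assumptions for illustration, not the actual API):

type Dependency struct {
	// Name and Namespace of another Instance that must exist (and be healthy)
	// before this instance's deploy plan runs, e.g. a Zookeeper instance whose
	// address could then populate zookeeper.url for Kafka.
	Name      string `json:"name"`
	Namespace string `json:"namespace,omitempty"`
}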

Annotate configurable values for FrameworkVersion

To improve kubectl describe output, facilitate other tooling, and help users discover the configuration for a particular service, let's add descriptions to the FrameworkVersion's default parameter configuration values.

Changing parameters in Instance won't trigger actions

Expected Behavior

When changing parameters in an Instance, e.g. BROKERS_COUNT: "4" for Kafka or FLINK_TASKMANAGER_REPLICAS: "3" for Flink, I would expect Maestro to scale up or down accordingly.

Observed Behavior

Nothing really happens, though I see some (possibly unrelated) error messages as well as information about plans that could put us on the right path.

maestro-demo $ cat flink-instance.yaml 
apiVersion: maestro.k8s.io/v1alpha1
kind: Instance
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
    framework: flink
  name: flink # this is the instance label which will lead the pod name
spec:
  frameworkVersion:
    name: flink-1.7
    namespace: default
    type: FrameworkVersion
maestro-demo $ kubectl apply -f flink-instance.yaml 
instance.maestro.k8s.io/flink created
maestro-demo $ kubectl get pods
NAME                                READY   STATUS    RESTARTS   AGE
flink-jobmanager-69cd6768b9-bjr2v   1/1     Running   0          58s
flink-taskmanager-d57b9c8bc-d49z2   1/1     Running   0          56s
flink-taskmanager-d57b9c8bc-ng8rv   1/1     Running   0          56s
maestro-demo $ cat flink-instance.yaml 
apiVersion: maestro.k8s.io/v1alpha1
kind: Instance
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
    framework: flink
  name: flink # this is the instance label which will lead the pod name
spec:
  frameworkVersion:
    name: flink-1.7
    namespace: default
    type: FrameworkVersion
  parameters:
    FLINK_TASKMANAGER_REPLICAS: "3"
maestro-demo $ kubectl apply -f flink-instance.yaml 
instance.maestro.k8s.io/flink configured

I would expect now to have another taskmanager running, but instead I get:

2019/01/28 21:02:28 Error getting FrameworkVersion flink-1.7 for instance flink: FrameworkVersion.maestro.k8s.io "flink-1.7" not found

although the framework exists:

maestro-demo $ kubectl get frameworkversions
NAME        CREATED AT
flink-1.7   8m

I tested this with Kafka as well and see the same behavior (error message: 2019/01/28 21:28:14 Error getting FrameworkVersion zookeeper-1.0 for instance zk: FrameworkVersion.maestro.k8s.io "zookeeper-1.0" not found).

Full log

2019/01/28 21:02:28 Recieved create event for &{{Instance maestro.k8s.io/v1alpha1} {flink  default /apis/maestro.k8s.io/v1alpha1/namespaces/default/instances/flink 56e2f7a8-2382-11e9-822f-42010a800154 2949 1 2019-01-28 20:57:12 -0800 PST <nil> <nil> map[framework:flink controller-tools.k8s.io:1.0] map[kubectl.kubernetes.io/last-applied-configuration:{"apiVersion":"maestro.k8s.io/v1alpha1","kind":"Instance","metadata":{"annotations":{},"labels":{"controller-tools.k8s.io":"1.0","framework":"flink"},"name":"flink","namespace":"default"},"spec":{"frameworkVersion":{"name":"flink-1.7","namespace":"default","type":"FrameworkVersion"},"parameters":{"FLINK_TASKMANAGER_REPLICAS":"3"}}}
] [] nil [] } {{ default flink-1.7    } [] map[FLINK_TASKMANAGER_REPLICAS:3]} {{ default flink-deploy-997290000 f4bf9ccc-2382-11e9-822f-42010a800154   } COMPLETE}}
2019/01/28 21:02:28 Error getting FrameworkVersion flink-1.7 for instance flink: FrameworkVersion.maestro.k8s.io "flink-1.7" not found
2019/01/28 21:02:28 Adding flink-deploy-997290000 to reconcile
2019/01/28 21:02:28 Adding flink-deploy-997290000 to reconcile
{"level":"info","ts":1548738148.9631062,"logger":"kubebuilder.controller","caller":"controller/controller.go:134","msg":"Starting Controller","Controller":"instance-controller"}
{"level":"info","ts":1548738148.963074,"logger":"kubebuilder.controller","caller":"controller/controller.go:134","msg":"Starting Controller","Controller":"planexecution-controller"}
{"level":"info","ts":1548738148.9630651,"logger":"kubebuilder.controller","caller":"controller/controller.go:134","msg":"Starting Controller","Controller":"framework-controller"}
{"level":"info","ts":1548738148.9630919,"logger":"kubebuilder.controller","caller":"controller/controller.go:134","msg":"Starting Controller","Controller":"frameworkversion-controller"}
{"level":"info","ts":1548738149.0641012,"logger":"kubebuilder.controller","caller":"controller/controller.go:153","msg":"Starting workers","Controller":"planexecution-controller","WorkerCount":1}
2019/01/28 21:02:29 PlanExecution flink-deploy-243510000 has already run to completion, not processing.
2019/01/28 21:02:29 PlanExecution flink-deploy-997290000 has already run to completion, not processing.
{"level":"info","ts":1548738149.0677109,"logger":"kubebuilder.controller","caller":"controller/controller.go:153","msg":"Starting workers","Controller":"instance-controller","WorkerCount":1}
{"level":"info","ts":1548738149.067785,"logger":"kubebuilder.controller","caller":"controller/controller.go:153","msg":"Starting workers","Controller":"frameworkversion-controller","WorkerCount":1}
{"level":"info","ts":1548738149.0677068,"logger":"kubebuilder.controller","caller":"controller/controller.go:153","msg":"Starting workers","Controller":"framework-controller","WorkerCount":1}
2019/01/28 21:02:29 FrameworkController: Recieved Reconcile request for flink
2019/01/28 21:02:29 FrameworkVersionController: Recieved Reconcile request for flink-1.7
2019/01/28 21:02:40 InstanceController: UpdateInstance: Going to call plan deploy
2019/01/28 21:02:40 Current Plan for Instance is already done, wont change the Suspend flag
2019/01/28 21:02:40 InstanceController: Recieved Reconcile request for flink
2019/01/28 21:02:40 Old and new spec matched...
2019/01/28 21:02:40 InstanceController: UpdateInstance: Going to call plan 
2019/01/28 21:02:40 InstanceController: Recieved Reconcile request for flink
2019/01/28 21:02:40 Phase 0 Step 0 has 3 objects
2019/01/28 21:02:40 CreateOrUpdate resulted in: unchanged
2019/01/28 21:02:40 Unkonwn type is marked healthy by default
2019/01/28 21:02:40 CreateOrUpdate resulted in: unchanged
2019/01/28 21:02:40 Unkonwn type is marked healthy by default
2019/01/28 21:02:40 CreateOrUpdate resulted in: unchanged
2019/01/28 21:02:40 Unkonwn type is marked healthy by default
2019/01/28 21:02:40 Phase flink has strategy serial
2019/01/28 21:02:40 Phase flink marked as serial
2019/01/28 21:02:40 Step jobmanager is healthy, so I can continue on
2019/01/28 21:02:40 Step jobmanager looked at
2019/01/28 21:02:40 Phase flink is healthy
2019/01/28 21:02:40 Phase flink marked as healthy
2019/01/28 21:02:40 Phase flink is healthy
2019/01/28 21:02:40 PlanExecution flink-deploy-28047000 has already run to completion, not processing.

Kustomize Label Additions prevent services from finding Deployments

Kustomize provides a feature that automatically adds labels to the service object and into its selector spec. This typically provides consistency across created objects, but it can cause issues when pods are created from a different STEP than the service:

Consider the following:

 plans:
    deploy:
      strategy: serial
      phases:
        - name: all
          strategy: parallel
          steps:
            - name: services
              tasks:
              - services
            - name: deployment
              tasks:
              - deployment

The service object

    service.yaml: |
      apiVersion: v1
      kind: Service
      metadata:
        name: appinfo
        labels:
          app: appinfo
      spec:
        ports:
        - port: 8080
          protocol: TCP
        selector:
          app: appinfo
        type: LoadBalancer

This service ends up with the following step label in its selector:

$ kubectl get svc bird-appinfo -o jsonpath="{.spec.selector['step']}"
services

whereas the deployment gets created with the following label:

$ kubectl get deployments bird-appinfo -o jsonpath="{ .metadata.labels['step']}"
deployment

The step-specific label was added to differentiate Jobs that need to be re-created even when they have the same name.

Naming Conventions

We currently have Frameworks, FrameworkVersions, and Instances. Should Instances be renamed to FrameworkInstances to better indicate that they are part of this set of CRDs?

Kubernetes Objects should be Autogenerated from ServiceSpec

We currently hardcode a set of Kubernetes objects and default values. These should instead be pulled from the following files in the Universe:

  1. svc.yml - This should define most of the Kubernetes objects. The Plan section in this file should dictate the operations as the Instance moves through its lifecycle
  2. config.json - Shows the available parameters for a FrameworkVersion
  3. marathon.json.mustache - Tracks the conversions of parameters to environment variables
  4. resource.json - Tracks binaries required for creation of Docker Image used for FrameworkVersion
  5. package.json - Tracks metadata about the FrameworkVersion
  6. mustache files in src/main/dist/ for the framework. These should map to ConfigMap objects

Are there any other files that are used as input into a FrameworkVersion?

Error Message when framework/frameworkversion does not exist needs to be more descriptive

Right now, when we install an instance of framework1 as part of another framework0 without having both the framework and frameworkversion properly installed, the error message does little to point us in the right direction:

Error getting PlaneExecution /: PlanExecution.maestro.k8s.io "" not found

This can be easily reproduced by trying to install the flink-demo that has zookeeper as a dependency without having the framework/frameworkversion of zookeeper installed.

It would be more intuitive for the user to see an error message saying that the framework or frameworkversion for that instance wasn't found.
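
A sketch of what the lookup could return instead; the field names on the Instance spec are assumptions based on the instance YAML shown earlier, not verified against the actual types:

import (
	"context"
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"

	maestrov1alpha1 "github.com/maestrosdk/maestro/pkg/apis/maestro/v1alpha1"
)

// getFrameworkVersion resolves the FrameworkVersion referenced by an Instance and,
// when it is missing, returns an error that names exactly what is not installed.
func getFrameworkVersion(c client.Client, instance *maestrov1alpha1.Instance) (*maestrov1alpha1.FrameworkVersion, error) {
	fv := &maestrov1alpha1.FrameworkVersion{}
	key := types.NamespacedName{
		Name:      instance.Spec.FrameworkVersion.Name,      // assumed field names
		Namespace: instance.Spec.FrameworkVersion.Namespace, // assumed field names
	}
	if err := c.Get(context.TODO(), key, fv); err != nil {
		if apierrors.IsNotFound(err) {
			return nil, fmt.Errorf("instance %s/%s references FrameworkVersion %s/%s, which is not installed; install the framework and frameworkversion first",
				instance.Namespace, instance.Name, key.Namespace, key.Name)
		}
		return nil, err
	}
	return fv, nil
}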

The longer error message:

2019/01/22 12:04:20 Obj is NOT healthy: &{{ } {demo-zk  default /apis/maestro.k8s.io/v1alpha1/namespaces/default/instances/demo-zk e73c3016-1e80-11e9-b99e-08002788f190 2444 1 2019-01-22 12:04:19 -0800 PST <nil> <nil> map[heritage:maestro phase:dependencies instance:demo plan:deploy step:zookeeper version: framework:zookeeper app:flink-financial-demo controller-tools.k8s.io:1.0 planexecution:demo-deploy-109789000] map[] [{maestro.k8s.io/v1alpha1 Instance demo e6baa583-1e80-11e9-b99e-08002788f190 0xc000daa7ac 0xc000daa7ad}] nil [] } {{ default zookeeper-1.0    } [] map[ZOOKEEPER_CPUS:0.3]} {{      } }}
2019/01/22 12:04:20 Phase dependencies has strategy serial
2019/01/22 12:04:20 Phase dependencies marked as serial
2019/01/22 12:04:20 Step zookeeper isn't complete, skipping rest of steps in phase until it is
2019/01/22 12:04:20 Phase dependencies is not healthy b/c step zookeeper is not healthy
2019/01/22 12:04:20 Phase dependencies not healthy, and plan marked as serial, so breaking.
2019/01/22 12:04:20 Phase dependencies is not healthy b/c step zookeeper is not healthy

We actually emit that error message in another place already. When we try to install just zookeeper via kubectl apply -f zookeeper-instance.yaml without having the framework/frameworkversion installed as a prerequisite, we end up with a more descriptive message:

2019/01/22 12:07:01 Error getting FrameworkVersion zookeeper-1.0 for instance zk: FrameworkVersion.maestro.k8s.io "zookeeper-1.0" not found

Long error message:

2019/01/22 12:06:50 Could not find planExecution demo-deploy-109789000: PlanExecution.maestro.k8s.io "demo-deploy-109789000" not found
2019/01/22 12:06:50 Error getting instance object: Instance.maestro.k8s.io "demo" not found
2019/01/22 12:07:01 Recieved create event for &{{ } {zk  default /apis/maestro.k8s.io/v1alpha1/namespaces/default/instances/zk 47b56197-1e81-11e9-b99e-08002788f190 2629 1 2019-01-22 12:07:01 -0800 PST <nil> <nil> map[framework:zookeeper controller-tools.k8s.io:1.0] map[kubectl.kubernetes.io/last-applied-configuration:{"apiVersion":"maestro.k8s.io/v1alpha1","kind":"Instance","metadata":{"annotations":{},"labels":{"controller-tools.k8s.io":"1.0","framework":"zookeeper"},"name":"zk","namespace":"default"},"spec":{"frameworkVersion":{"name":"zookeeper-1.0","namespace":"default","type":"FrameworkVersions"},"name":"zk","parameters":{"ZOOKEEPER_CPUS":"0.3"}}}
] [] nil [] } {{ default zookeeper-1.0    } [] map[ZOOKEEPER_CPUS:0.3]} {{      } }}
2019/01/22 12:07:01 Error getting FrameworkVersion zookeeper-1.0 for instance zk: FrameworkVersion.maestro.k8s.io "zookeeper-1.0" not found


Implement binary release process

The binary release process should publish binaries to GitHub releases. This will make it easy for users to get the operator and maestroctl binaries.

Check Deployment Health

Currently stubbed out:

func IsHealthy(c client.Client, obj runtime.Object) error {

	switch obj.(type) {
	case *appsv1.Deployment:
		d := obj.(*appsv1.Deployment)
		log.Printf("Deployment %v is marked healthy\n", d.Name)
		return nil

Will look at: checking that the number of ready pods in the Status matches the number of Replicas in the spec.
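
A minimal sketch of that check (it deliberately ignores paused deployments and generation skew):

import (
	"fmt"
	"log"

	appsv1 "k8s.io/api/apps/v1"
)

// deploymentIsHealthy reports an error until the ready replicas in Status match
// Spec.Replicas.
func deploymentIsHealthy(d *appsv1.Deployment) error {
	want := int32(1) // Kubernetes defaults spec.replicas to 1 when unset
	if d.Spec.Replicas != nil {
		want = *d.Spec.Replicas
	}
	if d.Status.ReadyReplicas < want {
		return fmt.Errorf("deployment %s/%s not healthy: %d of %d replicas ready",
			d.Namespace, d.Name, d.Status.ReadyReplicas, want)
	}
	log.Printf("Deployment %v is marked healthy (%d/%d replicas ready)", d.Name, d.Status.ReadyReplicas, want)
	return nil
}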

make run fails

$ make run
go generate ./pkg/... ./cmd/...
go fmt ./pkg/... ./cmd/...
go vet ./pkg/... ./cmd/...
go run ./cmd/manager/main.go
# github.com/maestrosdk/maestro/vendor/k8s.io/client-go/transport
vendor/k8s.io/client-go/transport/round_trippers.go:437:9: undefined: strings.Builder
# github.com/maestrosdk/maestro/vendor/sigs.k8s.io/kustomize/pkg/target
vendor/sigs.k8s.io/kustomize/pkg/target/kusttarget.go:89:5: dec.DisallowUnknownFields undefined (type *json.Decoder has no field or method DisallowUnknownFields)
make: *** [run] Error 2

Non-Preemptable Plans

Some plans should not be interrupted. For example, a database backup should run to completion before another plan starts.

Add a plan parameter to mark a plan as non-preemptable; by default, plans will be preemptable. A sketch follows.
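
A sketch of the proposed flag (type and field names are assumptions, not the actual maestro API):

type Plan struct {
	// Strategy and Phases as they exist today, omitted here for brevity.

	// NotPreemptable, when true, prevents the controller from suspending this
	// plan (e.g. a running backup) in favor of a newly requested one.
	// Unset/false keeps the default behavior: the plan is preemptable.
	NotPreemptable bool `json:"notPreemptable,omitempty"`
}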

make install doesn't work

$ make install
go run vendor/sigs.k8s.io/controller-tools/cmd/controller-gen/main.go all
Breaking recursion for type github.com/maestrosdk/maestro/pkg/apis/maestro/v1alpha1.FrameworkVersion
CRD manifests generated under '/Users/djannot/Documents/go/src/github.com/maestrosdk/maestro/config/crds'
RBAC manifests generated under '/Users/djannot/Documents/go/src/github.com/maestrosdk/maestro/config/rbac' 
kubectl apply -f config/crds
customresourcedefinition "frameworks.maestro.k8s.io" configured
error validating "config/crds/maestro_v1alpha1_frameworkversion.yaml": error validating data: [ValidationError(CustomResourceDefinition.spec.validation.openAPIV3Schema.properties.spec.properties.dependencies.items): invalid type for io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1beta1.JSONSchemaPropsOrArray: got "map", expected "", ValidationError(CustomResourceDefinition.spec.validation.openAPIV3Schema.properties.spec.properties.parameters.items): invalid type for io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1beta1.JSONSchemaPropsOrArray: got "map", expected "", ValidationError(CustomResourceDefinition.spec.validation.openAPIV3Schema.properties.spec.properties.upgradableFrom.items): invalid type for io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1beta1.JSONSchemaPropsOrArray: got "map", expected ""]; if you choose to ignore these errors, turn validation off with --validate=false
error validating "config/crds/maestro_v1alpha1_instance.yaml": error validating data: ValidationError(CustomResourceDefinition.spec.validation.openAPIV3Schema.properties.spec.properties.dependencies.items): invalid type for io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1beta1.JSONSchemaPropsOrArray: got "map", expected ""; if you choose to ignore these errors, turn validation off with --validate=false
error validating "config/crds/maestro_v1alpha1_planexecution.yaml": error validating data: ValidationError(CustomResourceDefinition.spec.validation.openAPIV3Schema.properties.status.properties.phases.items): invalid type for io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1beta1.JSONSchemaPropsOrArray: got "map", expected ""; if you choose to ignore these errors, turn validation off with --validate=false
make: *** [install] Error 1

Zookeeper Example on Minikube missing Memory Requirements

Running the zookeeper example on minikube started with the default memory options results in:

kubectl get pod
NAME      READY   STATUS    RESTARTS   AGE
zk-zk-0   1/1     Running   0          7m7s
zk-zk-1   0/1     Pending   0          7m7s
zk-zk-2   0/1     Pending   0          7m7s
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  105s (x5 over 106s)  default-scheduler  pod has unbound immediate PersistentVolumeClaims
  Warning  FailedScheduling  97s (x20 over 105s)  default-scheduler  0/1 nodes are available: 1 Insufficient memory.

Suggestion: document the memory requirements for the minikube environment, for example minikube start --memory 4096.

Kafka Framework fails when running on a GKE cluster

The current version of Kafka we use in our framework won't work with GKE:

small-kafka-0   0/1       Pending   0         0s
small-kafka-0   0/1       Pending   0         0s
small-kafka-0   0/1       Pending   0         3s
small-kafka-0   0/1       ContainerCreating   0         3s
small-kafka-0   0/1       Running   0         15s
small-kafka-0   0/1       Error     0         17s
small-kafka-0   0/1       Running   1         18s
small-kafka-0   0/1       Error     1         20s
small-kafka-0   0/1       CrashLoopBackOff   1         33s
small-kafka-0   0/1       Running   2         34s
small-kafka-0   0/1       Error     2         36s

The exact error message is:

[2019-01-23 01:16:07,655] INFO Loading logs. (kafka.log.LogManager)
[2019-01-23 01:16:07,663] ERROR There was an error in one of the threads during logs loading: kafka.common.KafkaException: Found directory /var/lib/kafka/lost+found, 'lost+found' is not in the form of topic-partition
If a directory does not contain Kafka topic data it should not exist in Kafka's log directory (kafka.log.LogManager)
[2019-01-23 01:16:07,664] FATAL [Kafka Server 0], Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.common.KafkaException: Found directory /var/lib/kafka/lost+found, 'lost+found' is not in the form of topic-partition
If a directory does not contain Kafka topic data it should not exist in Kafka's log directory
	at kafka.log.Log$.exception$1(Log.scala:1131)
	at kafka.log.Log$.parseTopicPartitionName(Log.scala:1139)
	at kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$10$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:153)
	at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:57)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
[2019-01-23 01:16:07,667] INFO [Kafka Server 0], shutting down (kafka.server.KafkaServer)
[2019-01-23 01:16:07,670] INFO Terminate ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)
[2019-01-23 01:16:07,677] INFO EventThread shut down for session: 0x268784681130001 (org.apache.zookeeper.ClientCnxn)
[2019-01-23 01:16:07,677] INFO Session: 0x268784681130001 closed (org.apache.zookeeper.ZooKeeper)
[2019-01-23 01:16:07,681] INFO [Kafka Server 0], shut down completed (kafka.server.KafkaServer)
[2019-01-23 01:16:07,682] FATAL Fatal error during KafkaServerStartable startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
kafka.common.KafkaException: Found directory /var/lib/kafka/lost+found, 'lost+found' is not in the form of topic-partition
If a directory does not contain Kafka topic data it should not exist in Kafka's log directory
	at kafka.log.Log$.exception$1(Log.scala:1131)
	at kafka.log.Log$.parseTopicPartitionName(Log.scala:1139)
	at kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$10$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:153)
	at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:57)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
[2019-01-23 01:16:07,685] INFO [Kafka Server 0], shutting down (kafka.server.KafkaServer)

This is a known problem; see e.g. vmware-archive/kubeless#460

The only workarounds I see are:

  1. A clean-up step, as also stated in the referenced kubeless issue above
  2. Using confluentinc/cp-kafka, which is used by https://github.com/helm/charts/tree/master/incubator/kafka

See also Slack conversation: https://kubernetes.slack.com/archives/C09NXKJKA/p1548207090459900

Kudo is not cleaning up PersistentVolumeClaims after an instance was deleted

Expected Behavior

When uninstalling an instance (e.g. Kafka), I would expect all created artifacts to be cleaned up/garbage collected, so that the next time I install an instance with the same name I start from scratch again.

Observed Behavior

It looks like PVCs are not being cleaned up after a delete.

maestro-demo $ kubectl get pvc
NAME                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
datadir-small-kafka-0   Bound    pvc-a028c4a1-2384-11e9-822f-42010a800154   1Gi        RWO            standard       24m
datadir-small-kafka-1   Bound    pvc-38f26e09-2386-11e9-822f-42010a800154   1Gi        RWO            standard       12m
datadir-small-kafka-2   Bound    pvc-4b97632c-2386-11e9-822f-42010a800154   1Gi        RWO            standard       12m
datadir-zk-zk-0         Bound    pvc-9fcc977f-2385-11e9-822f-42010a800154   2Gi        RWO            standard       17m
datadir-zk-zk-1         Bound    pvc-9fd0feed-2385-11e9-822f-42010a800154   2Gi        RWO            standard       17m
datadir-zk-zk-2         Bound    pvc-9fd7d9f9-2385-11e9-822f-42010a800154   2Gi        RWO            standard       17m

This leads to behavior where, when you install a frameworkversion (e.g. flink-financial-demo) and then delete it, the data still persists, in this case in Kafka. When re-installing the demo, Kafka shows the data from the previously deleted instance in its logs, and the actor in the demo immediately displays detected fraud entries that are relics of the past.
