
zalando-incubator / es-operator


Kubernetes Operator for Elasticsearch

Dockerfile 0.10% Makefile 1.49% Go 95.99% Shell 2.43%
elasticsearch elasticsearch-operator kubernetes kubernetes-operator operator

es-operator's People

Contributors

aywa, dependabot-preview[bot], dependabot[bot], linki, mikkeloscar, njuettner, otrosien, perploug


es-operator's Issues

Improved scaling by disabling ES auto-rebalancing

Our current node-group based index allocation is mainly due to the fact that the traffic pattern for certain indices is similar. This served fairly well in the past, but it has certain limitations.

  • ES's own rebalancing logic doesn't always choose the best nodes to relocate shards from/to, because it only considers the number of shards, not the actual load on the system
  • Indices cannot be scaled up in isolation

As a result we can end up with sub-optimal resource utilisation in our cluster: while some nodes may be under-utilised, other nodes could offload some shards onto them to balance their load before having to scale up.

The proposed solution may look like this: based on the assumption that all nodes should be utilised equally, es-operator balances the shard-to-node allocation manually, using a cost function to optimise the allocation.
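A minimal first step would be to switch off ES's own rebalancing before es-operator takes over allocation decisions. The Go sketch below calls the cluster settings API to do that; the endpoint handling and error reporting are illustrative assumptions, not the operator's actual code.

package scaling

import (
	"bytes"
	"fmt"
	"net/http"
)

// disableRebalancing turns off Elasticsearch's own shard rebalancing so
// that es-operator can control shard-to-node allocation itself.
func disableRebalancing(endpoint string) error {
	body := []byte(`{"persistent":{"cluster.routing.rebalance.enable":"none"}}`)
	req, err := http.NewRequest(http.MethodPut, endpoint+"/_cluster/settings", bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return nil
}

The cost-function-based placement itself is the harder part and is intentionally not sketched here.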

Irritating log message

Expected Behavior

Only log that es-operator is scaling if there is an actual change in replicas.

Actual Behavior

es-operator keeps logging "Updating desired scaling for EDS .... New desired replicas: 20. Decreasing node replicas to 20.", although the current replica count is already 20 (= minReplicas).
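A minimal sketch of the guard the expected behaviour implies, i.e. stay silent when the desired replica count equals the current one (names are hypothetical, not the operator's actual code):

package scaling

import "github.com/sirupsen/logrus"

// applyDesiredScaling only logs and updates when the desired replica
// count actually differs from the current one.
func applyDesiredScaling(log *logrus.Entry, current, desired int32) {
	if desired == current {
		return // already at the desired count, nothing to log
	}
	log.Infof("Updating desired scaling for EDS. New desired replicas: %d.", desired)
	// ... apply the actual update here ...
}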

Steps to Reproduce the Problem

One EDS that shows this behaviour:

spec:
  replicas: 20
  scaling:
    diskUsagePercentScaledownWatermark: 0
    enabled: true
    maxIndexReplicas: 4
    maxReplicas: 40
    maxShardsPerNode: 30
    minIndexReplicas: 4
    minReplicas: 20
    minShardsPerNode: 12
    scaleDownCPUBoundary: 25
    scaleDownCooldownSeconds: 600
    scaleDownThresholdDurationSeconds: 600
    scaleUpCPUBoundary: 40
    scaleUpCooldownSeconds: 120
    scaleUpThresholdDurationSeconds: 60

Specifications

Image registry.opensource.zalan.do/poirot/es-operator:v0.1.0-17-gd237530

Shard replica added before StatefulSet is ready

Expected Behavior

Shard replicas should not be added before the nodes are ready.
As you can see in the actual behaviour, because the EDS is updated (?), I guess the existing operator loop gets cancelled and restarted, but it does not check whether the StatefulSet is ready.

time="2020-09-24T09:00:52Z" level=error msg="Failed to operate resource: failed to rescale StatefulSet: StatefulSet es/data-id-id-v2 is not stable: 2/4 replicas ready"

Actual Behavior

time="2020-09-24T09:00:52Z" level=info msg="Scaling hint: UP" eds=data-id-id-v2 namespace=es
time="2020-09-24T09:00:52Z" level=info msg="Updating last scaling event in EDS 'es/data-id-id-v2'"
time="2020-09-24T09:00:52Z" level=info msg="Waiting for operation to stop" eds=data-id-id-v2 namespace=es
time="2020-09-24T09:00:52Z" level=info msg="Terminating operator loop." eds=data-id-id-v2 namespace=es
time="2020-09-24T09:00:52Z" level=info msg="Updating desired scaling for EDS 'es/data-id-id-v2'. New desired replicas: 4. Keeping shard-to-node ratio (1.00), and increasing index replicas."
time="2020-09-24T09:00:52Z" level=info msg="Waiting for operation to stop" eds=data-id-id-v2 namespace=es
time="2020-09-24T09:00:52Z" level=info msg="Terminating operator loop." eds=data-id-id-v2 namespace=es
time="2020-09-24T09:00:52Z" level=info msg="Event(v1.ObjectReference{Kind:\"ElasticsearchDataSet\", Namespace:\"es\", Name:\"data-id-id-v2\", UID:\"c63525d9-2b50-4b7d-9a34-102fed5c8327\", APIVersion:\"zalando.org/v1\", ResourceVersion:\"12281498\", FieldPath:\"\"}): type: 'Normal' reason: 'UpdatedStatefulSet' Updated StatefulSet 'es/data-id-id-v2'"
time="2020-09-24T09:00:52Z" level=info msg="Waiting for operation to stop" eds=data-id-id-v2 namespace=es
time="2020-09-24T09:00:52Z" level=info msg="Event(v1.ObjectReference{Kind:\"ElasticsearchDataSet\", Namespace:\"es\", Name:\"data-id-id-v2\", UID:\"c63525d9-2b50-4b7d-9a34-102fed5c8327\", APIVersion:\"zalando.org/v1\", ResourceVersion:\"12281502\", FieldPath:\"\"}): type: 'Normal' reason: 'ChangingReplicas' Changing replicas 2 -> 4 for StatefulSet 'es/data-id-id-v2'"
time="2020-09-24T09:00:52Z" level=info msg="StatefulSet es/data-id-id-v2 has 2/4 ready replicas"
time="2020-09-24T09:00:52Z" level=error msg="Failed to operate resource: failed to rescale StatefulSet: StatefulSet es/data-id-id-v2 is not stable: 2/4 replicas ready"
time="2020-09-24T09:00:52Z" level=info msg="Terminating operator loop." eds=data-id-id-v2 namespace=es
time="2020-09-24T09:00:52Z" level=info msg="Waiting for operation to stop" eds=data-id-id-v2 namespace=es
time="2020-09-24T09:00:52Z" level=info msg="Setting number_of_replicas for index 'id_id_v2' to 1." endpoint="http://data-id-id-v2.es.svc.cluster.local.:9200"
time="2020-09-24T09:00:52Z" level=info msg="Terminating operator loop." eds=data-id-id-v2 namespace=es
time="2020-09-24T09:00:52Z" level=info msg="Waiting for operation to stop" eds=data-id-id-v2 namespace=es
time="2020-09-24T09:00:52Z" level=info msg="Terminating operator loop." eds=data-id-id-v2 namespace=es
time="2020-09-24T09:01:22Z" level=info msg="Not scaling up, currently in cool-down period." eds=data-id-id-v2 namespace=es

Steps to Reproduce the Problem

  1. Have an EDS with auto-scaling enabled (and set, for example, 1 max shard per node)
  2. Have a Kubernetes cluster without a node ready to run a new ES pod
  3. Create some load on the shards
  4. The operator will auto-scale, but it skips the wait-for-"stable" logic and the index replica is added directly (see the sketch below)
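A minimal sketch of the kind of readiness check that should gate the index-replica update, based on the apps/v1 StatefulSet status (illustrative only, not the operator's actual implementation):

package operator

import appsv1 "k8s.io/api/apps/v1"

// stsStable reports whether all desired StatefulSet replicas are ready.
// Increasing number_of_replicas on indices should be skipped while this
// returns false.
func stsStable(sts *appsv1.StatefulSet) bool {
	desired := int32(1)
	if sts.Spec.Replicas != nil {
		desired = *sts.Spec.Replicas
	}
	return sts.Status.ReadyReplicas >= desired
}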

Specifications

  • Version: operator: 0.1.1, es: 7.9.0

ES Operator doesn't scale down to its boundary

Expected Behavior

When I specify maxShardsPerNode: X, I would assume that the ES Operator still allows scaling down to exactly X shards per node, not stop before reaching that target.

Actual Behavior

The boundary is treated as exclusive, which means we don't save as much cost as we could.

Steps to Reproduce the Problem

  1. Create an EDS with maxReplicas: 4, minReplicas: 3, maxShardsPerNode: 40
  2. Allocate 5 indices, each with 12 primaries and 1 index replica (= 5 × 2 × 12 = 120 shards, i.e. 30 shards per node on 4 nodes)
  3. Assuming the issue didn't exist, the EDS would eventually scale down to 3 nodes (= 120 / 3 = 40 shards per node)

getting-started: example roles.yaml for users with only namespace permissions

Expected Behavior

Users that have only namespace-wide permissions to define Roles (and are forbidden from defining ClusterRoles) should have an example manifest to play with.

Actual Behavior

kubectl apply -f docs/cluster-roles.yaml fails when the user does not have cluster-wide privileges.

Steps to Reproduce the Problem

  1. configure kubectl against a cluster where the user has only namespace-wide privileges for Role definitions
  2. kubectl apply -f docs/cluster-roles.yaml leads to:
serviceaccount/operator created
Error from server (Forbidden): error when retrieving current configuration of:
Resource: "rbac.authorization.k8s.io/v1, Resource=clusterroles", GroupVersionKind: "rbac.authorization.k8s.io/v1, Kind=ClusterRole"
Name: "es-operator", Namespace: ""
Object: &{map["apiVersion":"rbac.authorization.k8s.io/v1" "kind":"ClusterRole" "metadata":map["annotations":map["kubectl.kubernetes.io/last-applied-configuration":""] "name":"es-operator"] "rules":[map["apiGroups":["<xxxx>"] "resources":["elasticsearchdatasets" "elasticsearchdatasets/status" "elasticsearchmetricsets" "elasticsearchmetricsets/status"] "verbs":["get" "list" "watch" "update" "patch"]] map["apiGroups":[""] "resources":["pods" "services"] "verbs":["get" "watch" "list" "create" "update" "patch" "delete"]] map["apiGroups":["apps"] "resources":["statefulsets"] "verbs":["get" "create" "update" "patch" "delete" "watch" "list"]] map["apiGroups":["policy"] "resources":["poddisruptionbudgets"] "verbs":["get" "create" "update" "patch" "delete" "watch" "list"]] map["apiGroups":[""] "resources":["events"] "verbs":["create" "patch" "update"]] map["apiGroups":[""] "resources":["nodes"] "verbs":["get" "list" "watch"]] map["apiGroups":["metrics.k8s.io"] "resources":["pods"] "verbs":["get" "list" "watch"]]]]}
from server for: "cluster-roles.yaml": clusterroles.rbac.authorization.k8s.io "es-operator" is forbidden: User "<xxxx>" cannot get resource "clusterroles" in API group "rbac.authorization.k8s.io" at the cluster scope
Error from server (Forbidden): error when retrieving current configuration of:
Resource: "rbac.authorization.k8s.io/v1, Resource=clusterrolebindings", GroupVersionKind: "rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding"
Name: "es-operator", Namespace: ""
Object: &{map["apiVersion":"rbac.authorization.k8s.io/v1" "kind":"ClusterRoleBinding" "metadata":map["annotations":map["kubectl.kubernetes.io/last-applied-configuration":""] "name":"es-operator"] "roleRef":map["apiGroup":"rbac.authorization.k8s.io" "kind":"ClusterRole" "name":"es-operator"] "subjects":[map["kind":"ServiceAccount" "name":"operator" "namespace":"kube-system"]]]}
from server for: "cluster-roles.yaml": clusterrolebindings.rbac.authorization.k8s.io "es-operator" is forbidden: User "<xxxx>" cannot get resource "clusterrolebindings" in API group "rbac.authorization.k8s.io" at the cluster scope

Specifications

  • Version: master @ d237530
  • Platform: k8s on aws
  • Subsystem:

Support scaling EDS down to 0 instances

Expected Behavior

When updating an EDS without auto-scaling, one should be able to set the desired replicas to 0 in order to drain all data from an EDS.

Actual Behavior

The EDS change is rejected by the manifest validation.

Steps to Reproduce the Problem

  1. Create an EDS with scaling.enabled=false and replicas=1
  2. Update the EDS to set replicas=0

Handling for system indices

ES-Operator doesn't separate system indices from ordinary user indices.
Elasticsearch has already added a deprecation warning about the new default behaviour for system indices, e.g.

Deprecation: this request accesses system indices: [.kibana_2], but in a future major version, direct access to system indices will be prevented by default

So it makes sense to exclude system indices from es-operator management.
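A minimal sketch of such an exclusion, assuming system indices are identified by the leading dot in their name (as in .kibana_2 above):

package operator

import "strings"

// filterSystemIndices drops system indices (names starting with ".")
// from the list of indices that es-operator manages.
func filterSystemIndices(indices []string) []string {
	managed := make([]string, 0, len(indices))
	for _, name := range indices {
		if strings.HasPrefix(name, ".") {
			continue // leave system indices alone
		}
		managed = append(managed, name)
	}
	return managed
}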

Store observedGeneration on EDS

Store observedGeneration in the EDS status field such that it is easy to validate the status of the EDS, e.g. in terms of the current number of replicas.

Make e2e test more efficient by deleting resources from successful tests

Currently we leave all resources in place after a test has run and only let CDP clean everything up by deleting the namespace. If we instead cleaned up the resources of a test once it succeeded, we could save resources, and other tests could reuse them instead of waiting for a new node in some situations.
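A minimal sketch of per-test cleanup that only runs when the test succeeded; the deleteEDS helper is hypothetical:

package e2e

import "testing"

// deleteEDS stands in for whatever helper actually deletes the
// ElasticsearchDataSet created by a test.
func deleteEDS(t *testing.T, name string) { /* ... */ }

// withCleanup removes the test's resources after success, but keeps
// them around for debugging when the test failed.
func withCleanup(t *testing.T, edsName string) {
	t.Cleanup(func() {
		if t.Failed() {
			return // keep resources of failed tests for inspection
		}
		deleteEDS(t, edsName)
	})
}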

Snap scaling among non-fractioned shard-to-node ratios

Expected Behavior

Given an EDS size of 5 nodes (for whatever reason it scaled to this number...), and an index with 1 replica and 4 primaries (i.e. the current shard-to-node ratio is 8/5 = 1.6), I would expect the next scale-up operation to snap to a non-fractional shard-to-node ratio of 8/8 = 1.0.

Actual Behavior

The ES operator reduces the shard-to-node ratio by one, leading to 8/10, making a shard-to-node ratio of 0.8. This is an issue for several reasons:

a) Imbalance of load because some nodes get a different load than others
b) In this case, some nodes don't get any shards allocated at all, although one could mitigate this by setting maxReplicas.
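A minimal sketch of the proposed snapping: when scaling up, pick the next node count that divides the total shard count evenly (illustrative only, not the operator's current logic):

package scaling

// snapUpNodes returns the smallest node count above current for which
// the shard-to-node ratio is a whole number, e.g. 8 shards on 5 nodes
// snap to 8 nodes (ratio 1.0) rather than 10 nodes (ratio 0.8).
func snapUpNodes(totalShards, currentNodes int) int {
	for nodes := currentNodes + 1; nodes <= totalShards; nodes++ {
		if totalShards%nodes == 0 {
			return nodes
		}
	}
	return totalShards // cannot go below one shard per node
}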

"randomly" fails to operate on resources

From time to time the operator outputs these logs:

time="2019-04-27T09:01:59Z" level=error msg="Failed to operate resource: failed to ensure resources: resource name may not be empty"
time="2019-04-27T09:02:02Z" level=error msg="Failed to operate resource: failed to ensure resources: resource name may not be empty"
time="2019-04-27T09:02:05Z" level=error msg="Failed to operate resource: failed to ensure resources: PodDisruptionBudget poirot/es-data-prio2a is not owned by the ElasticsearchDataSet poirot/es-data-prio2a"
time="2019-04-27T09:02:09Z" level=error msg="Failed to operate resource: failed to ensure resources: resource name may not be empty"
time="2019-04-27T09:02:12Z" level=error msg="Failed to operate resource: failed to ensure resources: resource name may not be empty"

This essentially prevents it from operating on the resource, meaning no scale-up/scale-down can happen. Restarting the operator fixes this for some time.

I fear that we are missing a deepcopy somewhere.

Prioritize autoscaling over rolling update

Expected Behavior

When an EDS needs to both scale and be rolled because of an update to the EDS or a cluster update, the operator should prefer to scale the EDS rather than roll the pods, as extra capacity might be more important than e.g. moving pods to new cluster nodes.

Actual Behavior

Currently the operator always checks if any pods need to be drained before it considers scaling the EDS. This means that if you have an EDS with, say, 20 pods and a cluster update is ongoing, you could be waiting for all 20 pods to be upgraded before a potential scale-up is applied. We saw this in production where an EDS was stuck at 35 pods, but the autoscaler recommended scaling to 48.

Proposed solution

Scale Up

I propose that we always favor scale-up over draining pods for a rolling upgrade. That is: if eds.Spec.Replicas > sts.Spec.Replicas, then rescale the STS before doing anything else.
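A minimal sketch of that precedence; the types and the surrounding reconcile flow are simplified assumptions:

package operator

import appsv1 "k8s.io/api/apps/v1"

// operate favours scaling up over draining pods for a rolling upgrade.
func operate(edsReplicas int32, sts *appsv1.StatefulSet) {
	if sts.Spec.Replicas != nil && edsReplicas > *sts.Spec.Replicas {
		rescaleStatefulSet(sts, edsReplicas) // add capacity first
		return
	}
	drainAndRollPods(sts) // only then continue the rolling upgrade
}

// rescaleStatefulSet and drainAndRollPods are hypothetical helpers.
func rescaleStatefulSet(sts *appsv1.StatefulSet, replicas int32) {}
func drainAndRollPods(sts *appsv1.StatefulSet)                   {}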

Scale Down

Scale-down should generally also be favored over a rolling upgrade, because it's pointless to upgrade pods that would be scaled down anyway. However, it might make sense to favor moving pods that sit on draining nodes before scaling down, to ensure that a pod is moved before its node is forcefully terminated.

updating a PVC's size is not propagated to the StatefulSet

Expected Behavior

If I change the size of an existing PVC inside an ElasticsearchDataSet, it should get propagated to the underlying StatefulSet's pod volumes.

Actual Behavior

The StatefulSet does not change its properties and therefore nothing changes.

Steps to Reproduce the Problem

  1. deploy an ElasticsearchDataSet with a PersistentVolumeClaim
  2. afterwards, change the PVC disk size and update the EDS

Specifications

  • Version: registry.opensource.zalan.do/poirot/es-operator:v0.1.1-48-g8ffaa9b
  • Platform: K8s 1.15
  • Subsystem: ??

Avoid node draining during rolling upgrade when using PVC

Several people have asked if the operator can somehow avoid node draining in cases where you have a PVC and don't actually need the draining in order to safely move the data around.

If possible we should add a feature where we can prepare a node for update by making sure it doesn't get traffic (not sure how to do this), then simply delete the pod and let Kubernetes reschedule it and attach the PVC to the new pod. Once ready, it can receive traffic again.

Draining would still be needed in case of scaledown to ensure there is no data loss (depending on index configuration).

Stale value in 'current-scaling-operation' causing ES Operator to fail

Expected Behavior

When an index does not exist (anymore) the es-operator should continue to work.

Actual Behavior

When an index that is referenced in the 'current-scaling-operation' doesn't exist anymore, the scaling fails because ES returns a 404 when trying to update number_of_replicas.

es-operator-85d68b858d-q87bh es-operator time="2019-04-28T16:08:43Z" level=info msg="Setting number_of_replicas for index 'index-a' to 1." endpoint="http://es-data-othera.poirot-test.svc.cluster.local.:9200"
apiVersion: zalando.org/v1
kind: ElasticsearchDataSet
metadata:
  annotations:
    es-operator.zalando.org/current-scaling-operation: '{"ScalingDirection":0,"NodeReplicas":3,"IndexReplicas":[{"index":"index-a","pri":5,"rep":1}],"Description":"Keeping
      shard-to-node ratio (35.67), and decreasing index replicas."}'
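A minimal sketch of tolerating a deleted index when applying a stored scaling operation: a 404 from the index settings endpoint is treated as "nothing to do" instead of failing the whole operation (illustrative only):

package operator

import (
	"fmt"
	"net/http"
	"strings"
)

// setIndexReplicas updates number_of_replicas for one index and skips
// indices that no longer exist (HTTP 404).
func setIndexReplicas(endpoint, index string, replicas int) error {
	body := fmt.Sprintf(`{"index":{"number_of_replicas":%d}}`, replicas)
	req, err := http.NewRequest(http.MethodPut, endpoint+"/"+index+"/_settings", strings.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusNotFound {
		return nil // index was deleted in the meantime, skip it
	}
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return nil
}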

Steps to Reproduce the Problem

  1. scale up data nodes (including increase of index replicas)
  2. delete one index allocated on the nodes while the StatefulSet has not yet stabilized
  3. check the current-scaling-operation annotation, and watch ES-Operator fail trying to execute the scaling operation.

Apply minIndexReplicas on config change

It looks like the es-operator respects the minIndexReplicas value only on auto-scaling actions.
So there is no convenient way of setting the desired index replica count.
This would be very useful, e.g. for pre-scaling before certain events.
The minIndexReplicas parameter, and maybe other parameters like minShardsPerNode, could also be applied on config change.

Unable to enable basic authentication

Unable to enable basic authentication for Elasticsearch with username and password.

Is there any other way to enable authentication for Elasticsearch with username and password?

When I try to enable xpack.security, I am facing errors. I have just added xpack.security.enabled: true in es-config:

apiVersion: v1
kind: ConfigMap
metadata:
  name: es-config
  namespace: es-operator-demo
data:
  elasticsearch.yml: |
    cluster.name: es-cluster
    network.host: "0.0.0.0"
    bootstrap.memory_lock: false
    discovery.seed_hosts: [es-master]
    cluster.initial_master_nodes: [es-master-0]
    xpack.security.enabled: true

I have tried different versions of the Elasticsearch Docker images and also changing ES_JAVA_OPTS.

I used all the configuration given in https://github.com/zalando-incubator/es-operator/tree/master/docs.

Priority node selector should be compatible with the pod template

Expected Behavior

Creating a cluster definition where the pod template doesn't include either a nodeSelector or an affinity that's compatible with the priority node selector should either result in an error, or an automatically modified pod template so the cluster behaves correctly during the updates.

Actual Behavior

If the users don't define a nodeSelector or an affinity in the pod spec, but define a priority node selector in the operator configuration, it's highly likely that a rolling cluster update will not be handled correctly. For example, if the cluster management software is currently draining node A, and the cluster pods live on nodes A, B and C that are all scheduled to be drained, it's possible that the operator will just keep deleting the pod on node B, which gets rescheduled to the same node again and again, and will never actually proceed to node A.

Steps to Reproduce the Problem

  1. Setup an operator with a priority node selector.
  2. Setup a cluster whose pod template doesn't include the priority node selector at all.
  3. Observe broken behaviour during cluster updates.

Scaling operations should be cancellable

Expected Behavior

The ES Operator could decide to scale down, but while it is scaling down the thresholds for scale-up may be exceeded, i.e. we reach a point where we should stop scaling down, or even scale back up.

Actual Behavior

At the moment a scaling operation must finish before a new scaling operation can be started.

Steps to Reproduce the Problem

  1. create an EDS where scale up and scale down thresholds are close by
  2. put some more data on the nodes to slow down the scaling operation
  3. scale-down and watch out for "Scaling hint: UP" during the scale down operation

Improve README

Goals

  • Make the readme simpler - move background info to docs
  • Teaser and marketing text
  • Stop using the word simple

Infinite loop while updating EDS

First of all, I appreciate you sharing this operator. Currently, I'm gaining some hands-on experience with it and I'm encountering some strange behaviour (to the best of my knowledge). I'm trying to apply a new configuration to my EDS, but it's having trouble updating the individual pods. Please correct me if I'm doing anything stupid/unsupported.

Expected Behavior

  1. Apply yaml update on EDS
  2. The ES operator updates all pods within the EDS by draining, deleting, and deploying new pods, one pod at a time.

Actual Behavior

  1. Apply yaml update on EDS
  2. The operator starts draining the pod, and successfully deletes it.
  3. A new pod is scheduled. However, the ES operator immediately reports that the pod should be updated.
  4. The ES operator starts draining again, and this continues as an infinite loop.

The logs below show all relevant logging from a single "loop". Notice how the operator reports that it deleted pod demo/es-data1-0, and then immediately that pod demo/es-data1-0 should be updated.

Steps to Reproduce the Problem

  1. Set: enabled: true, minReplicas: 1, minIndexReplicas: 0
  2. Apply yaml update on EDS

Specifications

  • Version: es-operator:latest; elasticsearch-oss:7.5.1
  • Platform: Azure Kubernetes
  • Subsystem: any

Logs:

time="2021-05-20T13:12:25Z" level=info msg="Ensuring cluster is in green state" endpoint="http://es-data1.demo.svc.cluster.local.:9200"
time="2021-05-20T13:12:25Z" level=info msg="Event(v1.ObjectReference{Kind:\"ElasticsearchDataSet\", Namespace:\"demo\", Name:\"es-data1\", UID:\"22b3dd79-41b6-4165-bc7f-ad78557d7959\", APIVersion:\"zalando.org/v1\", ResourceVersion:\"6271955\", FieldPath:\"\"}): type: 'Normal' reason: 'DrainingPod' Draining Pod 'demo/es-data1-0'"
time="2021-05-20T13:12:25Z" level=info msg="Disabling auto-rebalance" endpoint="http://es-data1.demo.svc.cluster.local.:9200"
time="2021-05-20T13:12:26Z" level=info msg="Excluding pod demo/es-data1-0 from shard allocation" endpoint="http://es-data1.demo.svc.cluster.local.:9200"
time="2021-05-20T13:12:26Z" level=info msg="Waiting for draining to finish" endpoint="http://es-data1.demo.svc.cluster.local.:9200"
time="2021-05-20T13:12:26Z" level=info msg="Found 0 remaining shards on demo/es-data1-0 (10.244.3.147)" endpoint="http://es-data1.demo.svc.cluster.local.:9200"
time="2021-05-20T13:12:26Z" level=info msg="Event(v1.ObjectReference{Kind:\"ElasticsearchDataSet\", Namespace:\"demo\", Name:\"es-data1\", UID:\"22b3dd79-41b6-4165-bc7f-ad78557d7959\", APIVersion:\"zalando.org/v1\", ResourceVersion:\"6271955\", FieldPath:\"\"}): type: 'Normal' reason: 'DrainedPod' Successfully drained Pod 'demo/es-data1-0'"
time="2021-05-20T13:12:26Z" level=info msg="Event(v1.ObjectReference{Kind:\"ElasticsearchDataSet\", Namespace:\"demo\", Name:\"es-data1\", UID:\"22b3dd79-41b6-4165-bc7f-ad78557d7959\", APIVersion:\"zalando.org/v1\", ResourceVersion:\"6271955\", FieldPath:\"\"}): type: 'Normal' reason: 'DeletingPod' Deleting Pod 'demo/es-data1-0'"
time="2021-05-20T13:12:42Z" level=info msg="Event(v1.ObjectReference{Kind:\"ElasticsearchDataSet\", Namespace:\"demo\", Name:\"es-data1\", UID:\"22b3dd79-41b6-4165-bc7f-ad78557d7959\", APIVersion:\"zalando.org/v1\", ResourceVersion:\"6271955\", FieldPath:\"\"}): type: 'Normal' reason: 'DeletedPod' Successfully deleted Pod 'demo/es-data1-0'"
time="2021-05-20T13:12:42Z" level=info msg="Setting exclude list to ''" endpoint="http://es-data1.demo.svc.cluster.local.:9200"
time="2021-05-20T13:12:42Z" level=info msg="Enabling auto-rebalance" endpoint="http://es-data1.demo.svc.cluster.local.:9200"
time="2021-05-20T13:12:43Z" level=info msg="Pod demo/es-data1-0 should be updated. Priority: 5 (NodeSelector,PodOldRevision,STSReplicaDiff)"
time="2021-05-20T13:12:43Z" level=info msg="Pod demo/es-data1-1 should be updated. Priority: 5 (NodeSelector,PodOldRevision,STSReplicaDiff)"
time="2021-05-20T13:12:43Z" level=info msg="Found 2 Pods on StatefulSet demo/es-data1 to update"
time="2021-05-20T13:12:43Z" level=info msg="StatefulSet demo/es-data1 has 1/2 ready replicas"

Implement disk-based scale-up logic

We could potentially detect and resolve disk issues before ES blocks writing to the index. At the moment the only disk-based check we do is to prevent scaling down in case of high disk usage.
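A minimal sketch of what such a check could look like, assuming a hypothetical scale-up disk watermark next to the existing scale-down one:

package scaling

// diskScaleUpHint returns true when the highest disk usage across the
// data nodes exceeds the (hypothetical) scale-up watermark, so capacity
// can be added before ES starts blocking writes to the index.
func diskScaleUpHint(diskUsagePercents []float64, scaleUpDiskWatermark float64) bool {
	for _, usage := range diskUsagePercents {
		if usage >= scaleUpDiskWatermark {
			return true
		}
	}
	return false
}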

wrong registry used in deployment

Expected Behavior

Deployment to just work

Actual Behavior

The image can't be pulled from the registry because, I assume, it doesn't exist there.

Warning  Failed     40m (x4 over 41m)      kubelet, xxxxxxxxxxx-central-1.compute.internal  Failed to pull image "pierone.stups.zalan.do/poirot/es-operator:latest": rpc error: code = Unknown desc = Error: image poirot/es-operator:latest not found
  Warning  Failed     40m (x4 over 41m)      kubelet, xxxxxxxxxxx-central-1.compute.internal  Error: ErrImagePull
  Warning  Failed     6m27s (x152 over 41m)  kubelet, ip-xxxxxxxxxxx-central-1.compute.internal  Error: ImagePullBackOff
  Normal   BackOff    83s (x173 over 41m)    kubelet, ip-xxxxxxxxxxx-central-1.compute.internal  Back-off pulling image "pierone.stups.zalan.do/poirot/es-operator:latest"

Steps to Reproduce the Problem

  1. kubectl apply -f https://raw.githubusercontent.com/zalando-incubator/es-operator/master/docs/deployment.yaml
  2. kubectl describe po es-operator-677d44db9f-98rx5

Specifications

  • Version: latest

[Feature] auto scaling configurations

In a cluster with a huge variance in usage, it is good to be able to set different configurations for auto-scaling depending on the size of the cluster.

It would be good to set different minShardsPerNode, maxShardsPerNode and scaleUpCPUBoundary values depending on the size of the cluster.
I am not sure what the correct syntax would be, but for example we could add a rules or overrides section with a selector like replicaLte (replicas less than or equal). The operator could check the overrides section and fall back to the defaults if there is none.

Before

  scaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 99
    minIndexReplicas: 1
    maxIndexReplicas: 40
    minShardsPerNode: 3
    maxShardsPerNode: 3
    scaleUpCPUBoundary: 75
    scaleUpThresholdDurationSeconds: 240
    scaleUpCooldownSeconds: 1000
    scaleDownCPUBoundary: 40
    scaleDownThresholdDurationSeconds: 1200
    scaleDownCooldownSeconds: 1200
    diskUsagePercentScaledownWatermark: 80

After

  scaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 99
    minIndexReplicas: 1
    maxIndexReplicas: 40
    minShardsPerNode: 3
    maxShardsPerNode: 3
    scaleUpCPUBoundary: 75
    scaleUpThresholdDurationSeconds: 240
    scaleUpCooldownSeconds: 1000
    scaleDownCPUBoundary: 40
    scaleDownThresholdDurationSeconds: 1200
    scaleDownCooldownSeconds: 1200
    diskUsagePercentScaledownWatermark: 80
    rules:
      - replicaLte: 2
        scaleUpCPUBoundary: 30
      - replicaLte: 4
        scaleUpCPUBoundary: 40
      - replicaLte: 10
        scaleUpCPUBoundary: 60

This is mainly for large cost optimisations. During the night a cluster can be very small, but in the early morning it needs to be able to scale aggressively; once the cluster gets big, it can scale more slowly.

I am willing to implement this feature if it makes sense for this project.

Interest in contributing to OpenSearch

I have been watching the work you and the team are doing on the Elasticsearch operator. I am not sure if you've seen the news that Elasticsearch has moved to the proprietary SSPL license and is no longer open source. AWS and others such as logz.io (my employer) have gotten behind a new Apache 2.0 fork, which is focused on different goals than building the Elastic business model. We are interested in seeing whether we can integrate the nice job you've all done on the operator with OpenSearch. Do you plan on moving to OpenSearch over at Zalando?

Thanks!

Feature request: Awareness of rolling restart requirement for version upgrade

An Elasticsearch version upgrade is a situation where the number of spare instances needs to exceed the number of index replicas in order to allow both primaries and replicas to be allocated on one of the new nodes. This is different from a normal rolling restart where one extra instance is enough.

To accommodate this, we either need to make the es-operator aware of a version upgrade and treat it specially, or allow users to define the number of spare instances in the EDS (e.g. spec.maxSurge) to control the es-operator's behaviour during the rolling restart. Or we don't change anything, and users will need to control the version upgrade by temporarily increasing minReplicas.

Custom service

Question

I have an overall question about es-operator. I am migrating from the official ES Helm chart (only for the data nodes; the other nodes are still deployed with Helm). However, I am stuck on the last part: configuring my ingress.

With the official Helm chart, I can set up an ingress and services like:

ingress:
  enabled: True
  annotations:
      kubernetes.io/ingress.class: alb
      alb.ingress.kubernetes.io/target-type: instance
      alb.ingress.kubernetes.io/subnets: subnet-XX,subnet-YY
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 9200}]'
      alb.ingress.kubernetes.io/healthcheck-path: '/_cluster/health'
  hosts: [""]
  path: "/*"
  tls: []

service:
  type: LoadBalancer

This will create two services (one load balancer and one headless).

However, I cannot find a way to set up this load balancer correctly. From what I saw in the code, the operator needs to own the service and will create a NodePort service.

Is there any way to solve that?

What I will do in the meantime is deploy some coordinating nodes with the load balancer, so I can access the Elasticsearch cluster (or do that on the master nodes for testing).

Report "current" size of EDS

Expected Behavior

When calling kubectl get eds the desired and current size of the EDS should be reported.

Actual Behavior

Only desired size is reported.

[Bug] IP stays in exclude list when draining fails

Expected Behavior

There are situations when ES will refuse to drain a given node (usually allocation constraints like max. number of shards per index and node). This will cause ES Operator to wait indefinitely for the draining to finish. At some point the scale-down event gets superseded by a scale-up event.

This should lead to the previously "to-be-drained" node being used again.

Actual Behavior

What happens instead is that the IP stays in cluster.routing.allocation.exclude._ip, and the scale-up event only causes the StatefulSet to be updated, spawning new nodes. This leaves the node in a commissioned but unused state.
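A minimal sketch of the missing cleanup: when a drain is superseded, the pod's IP should be removed from the exclude list again. The string handling below is simplified; the real setting is the comma-separated cluster.routing.allocation.exclude._ip value:

package operator

import "strings"

// removeFromExcludeList drops ip from the comma-separated exclude._ip
// value so the node can receive shards again after a cancelled drain.
func removeFromExcludeList(excludeList, ip string) string {
	var kept []string
	for _, entry := range strings.Split(excludeList, ",") {
		if entry != "" && entry != ip {
			kept = append(kept, entry)
		}
	}
	return strings.Join(kept, ",")
}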

Steps to Reproduce the Problem

  1. Create a cluster with two nodes (minReplicas=1, maxReplicas=2, minIndexReplicas=0), add one index with two shards, no replicas and "routing.allocation.total_shards_per_node: 1"
  2. Wait for es-operator to start draining the second node, which will fail as ES rejects allocating more than one shard of the same index onto the same node
  3. Trigger a scale-out event by putting some CPU load onto ES.
  4. Check :9200/_cluster/settings to see that the IP is still in there.

Specifications

  • Version: latest
  • Platform: any
  • Subsystem: any

The ES setup fails at 200 RPS in GKE

Expected Behavior

Should be able to handle a lot more load easily

Actual Behavior

All pods (the EDS data pods and es-master) are failing health checks. The entire cluster is just crashing.

Steps to Reproduce the Problem

1. kubectl -n elasticsearch-zalando describe elasticsearchdataset.zalando.org/es-data-zalando gives the following (partial) response:

Name:        es-storage
        Command:
          sysctl
          -w
          vm.max_map_count=262144
        Image:  busybox:1.27.2
        Name:   init-sysctl
        Resources:
          Limits:
            Cpu:     50m
            Memory:  50Mi
          Requests:
            Cpu:     50m
            Memory:  50Mi
        Security Context:
          Privileged:        true
      Service Account Name:  operator
      Volumes:
        Config Map:
          Items:
            Key:   elasticsearch.yml
            Path:  elasticsearch.yml
          Name:    es-config
        Name:      elasticsearch-config
  Volume Claim Templates:
    Metadata:
      Annotations:
        Volume . Beta . Kubernetes . Io / Storage - Class:  fast
      Creation Timestamp:                                   <nil>
      Name:                                                 es-storage
    Spec:
      Access Modes:
        ReadWriteOnce
      Data Source:  <nil>
      Resources:
        Requests:
          Storage:         100Gi
      Storage Class Name:  fast
    Status:
Status:
  Last Scale Up Started:  2019-09-18T17:30:22Z
  Observed Generation:    4
  Replicas:               5
Events:
  Type    Reason       Age                   From         Message
  ----    ------       ----                  ----         -------
  Normal  DrainingPod  4m10s (x278 over 5h)  es-operator  Draining Pod 'elasticsearch-zalando/es-data-zalando-0'


  1. Also, I am using a custom image built on top of the elasticsearch-zalando image. The custom image installs the GCS plugin for snapshots.

Specifications

  • Version:
  • Platform:GCP
  • Subsystem:

Structured logging for autoscaler

Expected Behavior

Have structured context (EDS) for logged events in autoscaler.go.

Actual Behavior

No or unstructured context provided.

Scale index replicas independently

It is usually sufficient to scale up the replicas of one index in a group: the one with the highest traffic. The benefit is a more efficient scaling operation and fewer wasted resources from adding replicas to indices that may not need them.

Implementation would require monitoring per-index or per-node CPU stats to identify the hot-spot in the cluster group. The indices allocated on this node are potential candidates for scaling out.

Tutorial to get started

README or docs should contain a complete tutorial to get started, i.e. including how to deploy Elasticsearch master nodes etc. The tutorial could use kind or Minikube.

Race condition for exclude._ip setting

Expected Behavior

Scaling down should work, even if two stacks get scaled down simultaneously.

Actual Behavior

Only the last IP gets persisted in the exclude._ip setting, and as such only one node gets drained correctly. The reason is that excludePodIP is not atomic, but needs to call ES to retrieve the current exclude list, and then update it.
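A minimal sketch of serialising the read-modify-write within a single operator process: a shared mutex around the exclude-list update so two EDS scale-downs cannot overwrite each other's entries. The get/put callbacks stand in for the actual ES settings calls, and this only helps when both EDS are handled by the same operator instance:

package operator

import "sync"

var excludeMu sync.Mutex

// addToExcludeList performs the read-modify-write of exclude._ip under
// a lock so concurrent scale-downs do not lose each other's IPs.
func addToExcludeList(ip string, get func() ([]string, error), put func([]string) error) error {
	excludeMu.Lock()
	defer excludeMu.Unlock()

	current, err := get() // fetch the current exclude list from ES
	if err != nil {
		return err
	}
	for _, existing := range current {
		if existing == ip {
			return nil // already excluded
		}
	}
	return put(append(current, ip)) // write back the merged list
}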

Steps to Reproduce the Problem

  1. create two EDS joining the same cluster
  2. let them scale down simultaneously
  3. inspect the exclude._ip cluster setting, it might contain only one IP instead of two

Update spec.Replicas when scaling.MinReplicas is greater

In the case where spec.Replicas < scaling.MinReplicas, spec.Replicas is only automatically updated when there is a scaling operation (UP/DOWN).
This can cause confusion when looking at the EDS, because status.Replicas would be greater than spec.Replicas while everything is stable.

It would be helpful if the operator automatically adjusted spec.Replicas to at least scaling.MinReplicas when scaling is enabled and scaling.MinReplicas > spec.Replicas.
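A minimal sketch of the proposed adjustment; the surrounding reconcile flow and field plumbing are assumed:

package operator

// ensureMinReplicas bumps the EDS spec replicas up to scaling.minReplicas
// when scaling is enabled, so spec and status don't silently diverge.
func ensureMinReplicas(specReplicas *int32, scalingEnabled bool, minReplicas int32) *int32 {
	if !scalingEnabled {
		return specReplicas
	}
	if specReplicas == nil || *specReplicas < minReplicas {
		adjusted := minReplicas
		return &adjusted
	}
	return specReplicas
}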

Unable to create "Persistent Volume Claims"

I am having trouble creating Persistent Volume Claims. I request your support.

It is probably related to "volumeClaimTemplates" in "org_elasticsearchdatasets.yaml". Can you check?

Exception Message:
create Claim -es-data-simple-0 for Pod es-data-simple-0 in StatefulSet es-data-simple failed error: PersistentVolumeClaim "-es-data-simple-0" is invalid: metadata.name: Invalid value: "-es-data-simple-0": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character

[Screenshot attached in the original issue: Screen Shot 2021-04-30 at 01 08 08]

VCT example fails due to not setting max_map_count

Expected Behavior

Running https://github.com/zalando-incubator/es-operator/blob/master/docs/elasticsearchdataset-vct.yaml should result in a working cluster.

Actual Behavior

Running https://github.com/zalando-incubator/es-operator/blob/master/docs/elasticsearchdataset-vct.yaml fails with a CrashLoopBackOff, caused by a failing bootstrap check.

Steps to Reproduce the Problem

  1. Run https://github.com/zalando-incubator/es-operator/blob/master/docs/elasticsearchdataset-vct.yaml after creating a master

Specifications

  • Version:
  • Platform:
  • Subsystem:

The initContainer that sets vm.max_map_count is missing in the VCT example.
