Comments (8)

mikkeloscar commented on August 22, 2024

You probably want to look at the logs of your pods to understand what is wrong or describe the pods to see the events. I would suspect they simply don't have enough resources defined.

from es-operator.

saurabh24292 commented on August 22, 2024

saurabh24292 commented on August 22, 2024

The YAML file for es-data-group:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    operator.zalando.org/parent-generation: "4"
  creationTimestamp: "2019-09-18T11:57:35Z"
  generation: 11
  labels:
    application: elasticsearch
    group: group1
    role: data
  name: es-data-zalando
  namespace: elasticsearch-zalando
  ownerReferences:
  - apiVersion: zalando.org/v1
    kind: ElasticsearchDataSet
    name: es-data-zalando
    uid: 80c23df5-da0b-11e9-8644-4201ac160004
  resourceVersion: "3332279"
  selfLink: /apis/apps/v1/namespaces/elasticsearch-zalando/statefulsets/es-data-zalando
  uid: 81047b88-da0b-11e9-8644-4201ac160004
spec:
  podManagementPolicy: Parallel
  replicas: 5
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      es-operator-dataset: es-data-zalando
  serviceName: es-data-zalando
  template:
    metadata:
      creationTimestamp: null
      labels:
        application: elasticsearch
        es-operator-dataset: es-data-zalando
        group: group1
        role: data
    spec:
      containers:
      - env:
        - name: node.name
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: node.attr.group
          value: group1
        - name: node.master
          value: "false"
        - name: node.data
          value: "true"
        image: gcr.io/my-es-test/elasticsearch:6.6.0
        imagePullPolicy: IfNotPresent
        name: elasticsearch
        ports:
        - containerPort: 9300
          name: transport
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /_cat/master
            port: 9200
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 10
        resources:
          limits:
            cpu: "1"
            memory: 1500Mi
          requests:
            cpu: "1"
            memory: 1500Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/share/elasticsearch/data
          name: es-storage
        - mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
          name: elasticsearch-config
          subPath: elasticsearch.yml
      dnsPolicy: ClusterFirst
      initContainers:
      - command:
        - sh
        - -c
        - chown -R 1000:1000 /usr/share/elasticsearch/data
        image: busybox
        imagePullPolicy: Always
        name: fix-the-volume-permission
        resources: {}
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/share/elasticsearch/data
          name: es-storage
      - command:
        - sysctl
        - -w
        - vm.max_map_count=262144
        image: busybox:1.27.2
        imagePullPolicy: IfNotPresent
        name: init-sysctl
        resources:
          limits:
            cpu: 50m
            memory: 50Mi
          requests:
            cpu: 50m
            memory: 50Mi
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: operator
      serviceAccountName: operator
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: elasticsearch.yml
            path: elasticsearch.yml
          name: es-config
        name: elasticsearch-config
  updateStrategy:
    type: OnDelete
  volumeClaimTemplates:
  - metadata:
      annotations:
        volume.beta.kubernetes.io/storage-class: fast
      creationTimestamp: null
      name: es-storage
    spec:
      accessModes:
      - ReadWriteOnce
      dataSource: null
      resources:
        requests:
          storage: 100Gi
      storageClassName: fast
      volumeMode: Filesystem
    status:
      phase: Pending
status:
  collisionCount: 0
  currentReplicas: 5
  currentRevision: es-data-zalando-64849b74dc
  observedGeneration: 11
  readyReplicas: 5
  replicas: 5
  updateRevision: es-data-zalando-64849b74dc
  updatedReplicas: 5

mikkeloscar commented on August 22, 2024

I suggest you look at the logs of the pods and the events (e.g. describe pod) to get an idea of what is wrong. If you don't see anything obvious, you are welcome to post them here, and I may be able to work out what the problem is.

saurabh24292 commented on August 22, 2024

Also, these are the pod logs for the crashing data nodes:

[2019-09-25T13:56:47,074][ERROR][i.n.u.c.D.rejectedExecution] [es-data-zalando-0] Failed to submit a listener notification task. Event loop shut down?
java.util.concurrent.RejectedExecutionException: event executor terminated
    at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:867) ~[netty-common-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:328) ~[netty-common-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:321) ~[netty-common-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:778) ~[netty-common-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:768) [netty-common-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:432) [netty-common-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.util.concurrent.DefaultPromise.setFailure(DefaultPromise.java:112) [netty-common-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.DefaultChannelPromise.setFailure(DefaultChannelPromise.java:89) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.safeExecute(AbstractChannelHandlerContext.java:1017) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:825) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:794) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1066) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:309) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at org.elasticsearch.transport.netty4.Netty4TcpChannel.sendMessage(Netty4TcpChannel.java:139) [transport-netty4-client-6.6.0.jar:6.6.0]
    at org.elasticsearch.transport.TcpTransport.internalSendMessage(TcpTransport.java:760) [elasticsearch-6.6.0.jar:6.6.0]
    at org.elasticsearch.transport.TcpTransport.sendResponse(TcpTransport.java:847) [elasticsearch-6.6.0.jar:6.6.0]
    at org.elasticsearch.transport.TcpTransport.sendResponse(TcpTransport.java:817) [elasticsearch-6.6.0.jar:6.6.0]
    at org.elasticsearch.transport.TcpTransportChannel.sendResponse(TcpTransportChannel.java:64) [elasticsearch-6.6.0.jar:6.6.0]
    at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:54) [elasticsearch-6.6.0.jar:6.6.0]
    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$FileChunkTransportRequestHandler.messageReceived(PeerRecoveryTargetService.java:598) [elasticsearch-6.6.0.jar:6.6.0]
    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$FileChunkTransportRequestHandler.messageReceived(PeerRecoveryTargetService.java:567) [elasticsearch-6.6.0.jar:6.6.0]
    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) [elasticsearch-6.6.0.jar:6.6.0]
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) [elasticsearch-6.6.0.jar:6.6.0]
    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1288) [elasticsearch-6.6.0.jar:6.6.0]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:759) [elasticsearch-6.6.0.jar:6.6.0]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.6.0.jar:6.6.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]
[2019-09-25T13:56:47,085][WARN ][o.e.t.TcpTransport       ] [es-data-zalando-0] send message failed [channel: Netty4TcpChannel{localAddress=0.0.0.0/0.0.0.0:9300, remoteAddress=/10.20.22.100:39742}]
org.elasticsearch.transport.TransportException: Cannot send message, event loop is shutting down.
    at org.elasticsearch.transport.netty4.Netty4TcpChannel.sendMessage(Netty4TcpChannel.java:142) [transport-netty4-client-6.6.0.jar:6.6.0]
    at org.elasticsearch.transport.TcpTransport.internalSendMessage(TcpTransport.java:760) [elasticsearch-6.6.0.jar:6.6.0]
    at org.elasticsearch.transport.TcpTransport.sendResponse(TcpTransport.java:847) [elasticsearch-6.6.0.jar:6.6.0]
    at org.elasticsearch.transport.TcpTransport.sendResponse(TcpTransport.java:817) [elasticsearch-6.6.0.jar:6.6.0]
    at org.elasticsearch.transport.TcpTransportChannel.sendResponse(TcpTransportChannel.java:64) [elasticsearch-6.6.0.jar:6.6.0]
    at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:54) [elasticsearch-6.6.0.jar:6.6.0]
    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$FileChunkTransportRequestHandler.messageReceived(PeerRecoveryTargetService.java:598) [elasticsearch-6.6.0.jar:6.6.0]
    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$FileChunkTransportRequestHandler.messageReceived(PeerRecoveryTargetService.java:567) [elasticsearch-6.6.0.jar:6.6.0]
    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) [elasticsearch-6.6.0.jar:6.6.0]
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) [elasticsearch-6.6.0.jar:6.6.0]
    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1288) [elasticsearch-6.6.0.jar:6.6.0]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:759) [elasticsearch-6.6.0.jar:6.6.0]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.6.0.jar:6.6.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]
[2019-09-25T13:56:47,088][INFO ][o.e.n.Node               ] [es-data-zalando-0] closed

saurabh24292 commented on August 22, 2024

I tried a bunch of things, but the data node pods keep crashing at 100 qps. Below are the pod events:
Warning Unhealthy 32m (x4 over 32m) kubelet, gke-es-cluster-1-es-pool-68748ff7-db2z Readiness probe failed: Get http://10.48.5.6:9200/_cat/master: dial tcp 10.48.5.6:9200: connect: connection refused
Normal Killing 31m kubelet, gke-es-cluster-1-es-pool-68748ff7-db2z Killing container with id docker://elasticsearch:Need to kill Pod
Normal Started 31m (x3 over 5h14m) kubelet, gke-es-cluster-1-es-pool-68748ff7-db2z Started container
Warning BackOff 31m (x2 over 31m) kubelet, gke-es-cluster-1-es-pool-68748ff7-db2z Back-off restarting failed container
Normal Created 31m (x3 over 5h14m) kubelet, gke-es-cluster-1-es-pool-68748ff7-db2z Created container
Normal Started 31m (x3 over 5h14m) kubelet, gke-es-cluster-1-es-pool-68748ff7-db2z Started container
Normal Pulled 31m (x3 over 5h14m) kubelet, gke-es-cluster-1-es-pool-68748ff7-db2z Container image "docker.elastic.co/elasticsearch/elasticsearch-oss:6.6.0" already present on machine
Warning Unhealthy 30m (x4 over 31m) kubelet, gke-es-cluster-1-es-pool-68748ff7-db2z Readiness probe failed: Get http://10.48.5.7:9200/_cat/master: dial tcp 10.48.5.7:9200: connect: connection refused
Normal Pulled 13m (x5 over 5h14m) kubelet, gke-es-cluster-1-es-pool-68748ff7-db2z Container image "busybox:1.30" already present on machine
Normal SandboxChanged 13m (x4 over 32m) kubelet, gke-es-cluster-1-es-pool-68748ff7-db2z Pod sandbox changed, it will be killed and re-created.
Normal Created 13m (x5 over 5h14m) kubelet, gke-es-cluster-1-es-pool-68748ff7-db2z Created container
Warning Unhealthy 12m (x4 over 13m) kubelet, gke-es-cluster-1-es-pool-68748ff7-db2z Readiness probe failed: Get http://10.48.5.9:9200/_cat/master: dial tcp 10.48.5.9:9200: connect: connection refused

Hoping to get some help here.

otrosien commented on August 22, 2024

It is most likely either a memory or a config issue on the ES nodes. Can you also provide the elasticsearch.yml ConfigMap, the memory settings/JAVA_OPTS, and the Java version you're using? If it's Java 8, I would strongly advise upgrading because of the improved Docker container integration in later versions.
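
As a rough illustration of the memory settings in question, the fragment below pins the JVM heap through the ES_JAVA_OPTS environment variable, which the Elasticsearch startup script reads. It is only a sketch: the 750m heap is an assumption (about half of the 1500Mi container limit in the StatefulSet posted above), not a value taken from this thread, and it may not be what resolves the crashes.

```yaml
# Hypothetical fragment of the "elasticsearch" container from the StatefulSet above.
# Assumption: heap is set to roughly half of the 1500Mi container memory limit,
# leaving the remainder for off-heap memory and the filesystem cache.
containers:
- name: elasticsearch
  image: gcr.io/my-es-test/elasticsearch:6.6.0
  env:
  - name: ES_JAVA_OPTS
    value: "-Xms750m -Xmx750m"   # min and max heap kept equal
  resources:
    limits:
      cpu: "1"
      memory: 1500Mi
    requests:
      cpu: "1"
      memory: 1500Mi
```

Without an explicit heap setting, the JVM options shipped with the image decide the heap size, so it is worth checking that value against the container limit before changing anything else.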

otrosien commented on August 22, 2024

I'm closing this ticket because it's surely not an ES-Operator issue but an issue with how you set up and scale the Elasticsearch container itself.
