ashmere / node-problem-detector

This project forked from kubernetes/node-problem-detector

This is a place for various problem detectors running on the Kubernetes nodes.

License: Apache License 2.0

node-problem-detector

node-problem-detector aims to make various node problems visible to the upstream layers in the cluster management stack. It is a DaemonSet that detects node problems and reports them to the apiserver. It currently runs as a Kubernetes Addon that is enabled by default in GCE clusters.

Background

Many node problems can affect the pods running on a node, such as:

  • Hardware issues: Bad CPU, memory, or disk;
  • Kernel issues: Kernel deadlock, corrupted file system;
  • Container runtime issues: Unresponsive runtime daemon;
  • ...

Currently, these problems are invisible to the upstream layers in the cluster management stack, so Kubernetes continues scheduling pods onto the bad nodes.

To solve this problem, we introduced a new daemon, node-problem-detector, which collects node problems from various daemons and makes them visible to the upstream layers. Once the upstream layers have visibility into those problems, we can discuss a remedy system.

Problem API

node-problem-detector uses Event and NodeCondition to report problems to the apiserver.

  • NodeCondition: A permanent problem that makes the node unavailable for pods should be reported as a NodeCondition.
  • Event: A temporary problem that has limited impact on pods but is informative should be reported as an Event.
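The reporting rule above is a simple two-way split. The following is a minimal sketch of that decision, assuming a hypothetical Problem type with a Severity field; these names are illustrative only and are not node-problem-detector's actual types. The returned strings merely name the Kubernetes object kinds.

```go
package main

import "fmt"

// Severity is a hypothetical classification of a detected node problem.
type Severity int

const (
	// Temporary: limited impact on pods, but informative.
	Temporary Severity = iota
	// Permanent: makes the node unavailable for pods.
	Permanent
)

// Problem is a hypothetical record produced by a problem daemon.
type Problem struct {
	Reason   string
	Severity Severity
}

// ReportAs names the Kubernetes object a problem should be reported as,
// following the Event/NodeCondition rule described above.
func ReportAs(p Problem) string {
	if p.Severity == Permanent {
		return "NodeCondition"
	}
	return "Event"
}

func main() {
	problems := []Problem{
		{Reason: "KernelDeadlock", Severity: Permanent},
		// Hypothetical reason, used only for illustration.
		{Reason: "HypotheticalTransientError", Severity: Temporary},
	}
	for _, p := range problems {
		fmt.Printf("%s -> %s\n", p.Reason, ReportAs(p))
	}
}
```

Running it prints "KernelDeadlock -> NodeCondition" followed by "HypotheticalTransientError -> Event".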

Problem Daemon

A problem daemon is a sub-daemon of node-problem-detector. It monitors a specific kind of node problem and reports it to node-problem-detector.

A problem daemon could be:

  • A tiny daemon designed for a dedicated Kubernetes use case.
  • An existing node health monitoring daemon integrated with node-problem-detector.

Currently, each problem daemon runs as a goroutine in the node-problem-detector binary. In the future, we'll separate node-problem-detector and the problem daemons into different containers and compose them with a pod specification.
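The goroutine-per-daemon design can be sketched as follows. This is a minimal illustration that assumes a hypothetical ProblemDaemon interface and a stand-in staticDaemon type. Neither is the project's actual API. Each daemon runs in its own goroutine and sends reports over a shared channel, which the detector drains.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Report is a hypothetical message a problem daemon sends to the detector.
type Report struct {
	Daemon string
	Reason string
}

// ProblemDaemon is a hypothetical interface, not node-problem-detector's API.
type ProblemDaemon interface {
	Name() string
	// Run sends reports on out and returns when it has nothing more to report.
	Run(out chan<- Report)
}

// staticDaemon emits a fixed list of reasons; it stands in for a real monitor.
type staticDaemon struct {
	name    string
	reasons []string
}

func (d staticDaemon) Name() string { return d.name }

func (d staticDaemon) Run(out chan<- Report) {
	for _, r := range d.reasons {
		out <- Report{Daemon: d.name, Reason: r}
	}
}

// RunAll starts each daemon in its own goroutine and collects every report.
func RunAll(daemons []ProblemDaemon) []Report {
	out := make(chan Report)
	var wg sync.WaitGroup
	for _, d := range daemons {
		wg.Add(1)
		go func(d ProblemDaemon) {
			defer wg.Done()
			d.Run(out)
		}(d)
	}
	// Close the channel once every daemon has returned.
	go func() {
		wg.Wait()
		close(out)
	}()
	var reports []Report
	for r := range out {
		reports = append(reports, r)
	}
	// Arrival order depends on scheduling, so sort for deterministic output.
	sort.Slice(reports, func(i, j int) bool {
		if reports[i].Daemon != reports[j].Daemon {
			return reports[i].Daemon < reports[j].Daemon
		}
		return reports[i].Reason < reports[j].Reason
	})
	return reports
}

func main() {
	daemons := []ProblemDaemon{
		staticDaemon{name: "kernel-monitor", reasons: []string{"KernelDeadlock"}},
		// Hypothetical second daemon, for illustration only.
		staticDaemon{name: "hypothetical-runtime-monitor", reasons: []string{"RuntimeUnresponsive"}},
	}
	for _, r := range RunAll(daemons) {
		fmt.Printf("%s: %s\n", r.Daemon, r.Reason)
	}
}
```

Because every daemon shares one channel, adding a daemon only means adding another value to the slice. Moving daemons into separate containers, as planned, would replace the in-process channel with some inter-process transport.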

List of supported problem daemons:

| Problem Daemon | NodeCondition | Description |
|---|---|---|
| KernelMonitor | KernelDeadlock | A problem daemon that monitors the kernel log and reports problems according to predefined rules. |
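KernelMonitor matches kernel log lines against predefined rules. The following is a minimal sketch of that idea, assuming a hypothetical Rule type in which each rule pairs a regular expression with a temporary or permanent type. The rule names and patterns are invented for illustration. KernelMonitor's real rules live in config/ and may differ. The first sample line is modeled on the Linux hung-task warning.

```go
package main

import (
	"fmt"
	"regexp"
)

// RuleType mirrors the Event/NodeCondition split from the Problem API section.
type RuleType string

const (
	Temporary RuleType = "temporary" // would be reported as an Event
	Permanent RuleType = "permanent" // would set a NodeCondition such as KernelDeadlock
)

// Rule is a hypothetical predefined rule matched against each kernel log line.
type Rule struct {
	Type    RuleType
	Reason  string
	Pattern *regexp.Regexp
}

// exampleRules holds hypothetical rules; reasons and patterns are illustrative.
var exampleRules = []Rule{
	{Type: Permanent, Reason: "HypotheticalTaskHung",
		Pattern: regexp.MustCompile(`task \S+:\d+ blocked for more than \d+ seconds\.`)},
	{Type: Temporary, Reason: "HypotheticalOOMKill",
		Pattern: regexp.MustCompile(`Killed process \d+ \(.+\)`)},
}

// Match returns the first rule whose pattern matches line, if any.
func Match(rules []Rule, line string) (Rule, bool) {
	for _, r := range rules {
		if r.Pattern.MatchString(line) {
			return r, true
		}
	}
	return Rule{}, false
}

func main() {
	lines := []string{
		"INFO: task docker:1234 blocked for more than 120 seconds.",
		"Killed process 4321 (stress) total-vm:1024kB",
		"eth0: link up",
	}
	for _, l := range lines {
		if r, ok := Match(exampleRules, l); ok {
			fmt.Printf("%s (%s): %s\n", r.Reason, r.Type, l)
		} else {
			fmt.Printf("no match: %s\n", l)
		}
	}
}
```

Using first-match semantics keeps the sketch simple. A real monitor also has to tail the log file and track which conditions are currently set.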

Usage

Build Image

Run make in the top-level directory. It will:

  • Build the binary.
  • Build the Docker image. The binary and config/ are copied into the image.
  • Push the Docker image to a registry. By default, the image is pushed to gcr.io/google_containers. You can modify the Makefile to push the image to another registry.

Start DaemonSet

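  • Create a file node-problem-detector.yaml with the following YAML.
```yaml
# apps/v1 replaces extensions/v1beta1, which newer clusters no longer serve for DaemonSets.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-problem-detector
spec:
  # apps/v1 requires a selector matching the pod template labels (label name is illustrative).
  selector:
    matchLabels:
      app: node-problem-detector
  template:
    metadata:
      labels:
        app: node-problem-detector
    spec:
      containers:
      - name: node-problem-detector
        image: gcr.io/google_containers/node-problem-detector:v0.2
        imagePullPolicy: Always
        securityContext:
          privileged: true
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        volumeMounts:
        - name: log
          mountPath: /log
          readOnly: true
      volumes:
      - name: log
        # Set the `log` volume to your system log directory.
        hostPath:
          path: /var/log/
```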
  • Edit node-problem-detector.yaml to fit your environment: set the log volume to your system log directory (used by KernelMonitor).
  • Create the DaemonSet with kubectl create -f node-problem-detector.yaml.
  • If needed, you can use a ConfigMap to override the files in config/.

Contributors

adohe, dchen1107, euank, random-liu