
k8s-coredump-detector


K8s-coredump-detector is an open source tool for managing core dumps in Kubernetes. It provides a stable, controllable core dump feature for when jobs inside Kubernetes containers crash. The tool generates, stores, and distributes the core files produced by applications inside pods, which eases the investigation of crashed applications in a multi-tenant Kubernetes environment. Tenants can download these core files the same way they use any other native Kubernetes functionality.


Motivation and Goals

As we know, bugs are inevitable in software development. Some bugs can be solved by investigating application logs, but some serious bugs, such as a deep null pointer exception, are very hard to debug without core files. However, Kubernetes has no mechanism to manage core files when a job inside a pod crashes.

This feature collects, stores, and distributes the core files generated by applications inside pods. It supports any filesystem storage as backend storage, and it integrates with Kubernetes' own authorization mechanisms, such as RBAC, to control tenant permissions. Tenants can download the core files they want the same way they use any other native Kubernetes resource.

Design

 

Components

Backend storage

The backend storage is an independent filesystem storage used to store core files. It can be Ceph filesystem storage, NFS storage, or any other filesystem storage. For test purposes, you can also use local hostPath storage (https://kubernetes.io/docs/concepts/storage/volumes/#hostpath) to store core files.

Coredump DaemonSet

To control how core files are generated, a DaemonSet is needed that launches an admin-pod on each node. These pods make sure everything goes as expected when a job crashes in other worker pods. Each admin-pod performs the following steps:

a. Copy an executable file called handler to the host. It is invoked whenever a process crashes on that host, and it determines whether the crashed job came from a pod. If so, it collects the related information and stores the core file in the backend storage.

b. Modify and maintain the core_pattern setting on each node, which is usually exposed as the file /proc/sys/kernel/core_pattern. This setting controls what happens when a process crashes. In our case, the admin-pod modifies it so that the handler is invoked.

For detailed information, please see https://github.com/fenggw-fnst/coredump-node-detector
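As a rough illustration of step b, the sketch below builds a core_pattern value of the piped form. When core_pattern starts with "|", the Linux kernel runs the named program and feeds the core dump to its standard input. The handler path and the chosen specifiers here are assumptions for illustration only; the real handler's location and arguments are defined in the coredump-node-detector project.

```shell
# Build a piped core_pattern value for a given handler path.
# NOTE: the path passed in and the specifier list are hypothetical examples,
# not the values used by the real admin-pod.
build_core_pattern() {
  handler_path="$1"
  # %P = PID of the crashed process in the initial PID namespace
  # %e = executable name, %t = time of the dump (seconds since the epoch)
  printf '|%s %%P %%e %%t' "$handler_path"
}

pattern="$(build_core_pattern /usr/local/bin/coredump-handler)"
echo "$pattern"

# Applying the value needs root privileges on the node, for example:
#   echo "$pattern" > /proc/sys/kernel/core_pattern
```

This is also why the admin-pod needs privileged access to the host: the setting is node-wide and lives outside any container.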

Coredump Apiserver

This component is the bridge between the backend storage and users. In essence it is an aggregation API layer, which is the official mechanism for implementing users' own business logic. It contains a service, a self-defined API server running in a pod, and etcd storage. It registers APIs and objects with the Kubernetes cluster, and users download core files through those APIs and objects.

Group                  Version    Kind               Subresource
coredump.fujitsu.com   v1alpha1   CoredumpEndpoint   dump

In a typical case, a user wants to download all core files generated by the container test-container in the pod test-namespace/test-pod. The user would run:

kubectl get --raw="/apis/coredump.fujitsu.com/v1alpha1/namespaces/test-namespace/coredumpendpoints/test-pod/dump?container=test-container" > coredump.tar.gz

mkdir -p ./coredump-files
tar -zxvf coredump.tar.gz -C ./coredump-files

If everything works as expected, the user will find all core files in coredump-files.
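The download steps above can be wrapped in a small helper. This is only a sketch: the function names dump_path and fetch_coredumps are made up for illustration, and fetch_coredumps assumes kubectl is installed and already configured for the cluster.

```shell
# Print the raw API path of the dump subresource for one container.
dump_path() {
  namespace="$1"; pod="$2"; container="$3"
  printf '/apis/coredump.fujitsu.com/v1alpha1/namespaces/%s/coredumpendpoints/%s/dump?container=%s' \
    "$namespace" "$pod" "$container"
}

# Download the archive and unpack it into an output directory.
# Arguments: namespace, pod, container, output directory.
# Requires kubectl and cluster access, so it is only defined here, not called.
fetch_coredumps() {
  out_dir="$4"
  mkdir -p "$out_dir"
  kubectl get --raw="$(dump_path "$1" "$2" "$3")" > "$out_dir/coredump.tar.gz" &&
    tar -zxf "$out_dir/coredump.tar.gz" -C "$out_dir"
}

# Show the path that would be requested for the example in this section.
dump_path test-namespace test-pod test-container
echo
```

With cluster access, fetch_coredumps test-namespace test-pod test-container ./coredump-files would perform the same two steps as the commands above.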

 

Flow

Core file generation

The core file generation part generates core files when a job inside a container crashes. On each node that supports the core dump feature, an admin-pod is deployed by the DaemonSet to handle this. Each admin-pod injects an executable file called handler into the node and modifies core_pattern so that the kernel calls the handler to store core files in the backend storage.

Downloading core files

An aggregation API layer registers a self-defined API group called coredump.fujitsu.com. This API is the bridge between the backend storage and users, who download core files through it. Admins can control users' access to core files with native mechanisms such as RBAC or ABAC.
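As one possible way to grant access with RBAC, the sketch below prints a namespaced Role that allows creating CoredumpEndpoint objects and reading their dump subresource. The role name coredump-reader and the exact set of verbs are assumptions, not something this project ships; adjust them to your cluster's policy.

```shell
# Print a Role manifest for a given namespace.
# NOTE: "coredump-reader" and the verb list are hypothetical examples.
coredump_reader_role() {
  namespace="$1"
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: coredump-reader
  namespace: ${namespace}
rules:
- apiGroups: ["coredump.fujitsu.com"]
  resources: ["coredumpendpoints"]
  verbs: ["create", "get"]
- apiGroups: ["coredump.fujitsu.com"]
  resources: ["coredumpendpoints/dump"]
  verbs: ["get"]
EOF
}

# Print the manifest for the default namespace.
coredump_reader_role default

# With cluster access it could be applied like this:
#   coredump_reader_role default | kubectl apply -f -
```

A RoleBinding is still needed to attach the Role to a tenant's user or service account.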

 

Warning

The core_pattern setting on each node is modified so that our components handle core dump events.

The Kubernetes cluster must be started with the allow-privileged option enabled.

   

Deployment

From script

Please see deploy.md

From source code

TBD

Test

After deploying all the components successfully, you can verify that the feature works by running the test script.

   

Download core files

This section gives examples of how to download core files after the coredump detector has been deployed. The coredumpendpoint_template.yaml file under the test folder is used.

From an existing pod

Suppose a user wants to download core files from a container called test-container in the existing pod default/test-pod. The user would run:

sed -e "s/__NAMESPACE__/default/g" -e "s/__NAME__/test-pod/g" test/coredumpendpoint_template.yaml | kubectl create -f -
kubectl get --raw="/apis/coredump.fujitsu.com/v1alpha1/namespaces/default/coredumpendpoints/test-pod/dump?container=test-container" > coredump.tar.gz

From a deleted pod

When users want to download core files from a pod that has been deleted, the pod's UID must be provided. Suppose a user wants to download core files from a container called test-container in the deleted pod default/test-pod, and the UID of that pod is 1234-5678. The user would run:

sed -e "s/__NAMESPACE__/default/g" -e "s/__NAME__/test-pod/g" -e "s/__UID__/1234-5678/g" test/coredumpendpoint_template.yaml | kubectl create -f -
kubectl get --raw="/apis/coredump.fujitsu.com/v1alpha1/namespaces/default/coredumpendpoints/test-pod/dump?container=test-container" > coredump.tar.gz
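Both cases differ only in whether the UID placeholder is filled in, so the template step can be captured in one helper. The sketch below is illustrative: render_endpoint is a made-up name, and the one-line stand-in template exists only so the example runs on its own; the real template is test/coredumpendpoint_template.yaml and its contents are not reproduced here.

```shell
# Fill in the placeholders of a CoredumpEndpoint template.
# Arguments: template file, namespace, pod name, optional pod UID.
# Without a UID, __UID__ is left untouched, as in the existing-pod commands above.
render_endpoint() {
  template="$1"; namespace="$2"; name="$3"; uid="${4:-}"
  if [ -n "$uid" ]; then
    sed -e "s/__NAMESPACE__/$namespace/g" -e "s/__NAME__/$name/g" \
        -e "s/__UID__/$uid/g" "$template"
  else
    sed -e "s/__NAMESPACE__/$namespace/g" -e "s/__NAME__/$name/g" "$template"
  fi
}

# Stand-in template (hypothetical content, NOT the project's real template).
demo_template="$(mktemp)"
printf 'namespace=__NAMESPACE__ name=__NAME__ uid=__UID__\n' > "$demo_template"

render_endpoint "$demo_template" default test-pod            # existing pod
render_endpoint "$demo_template" default test-pod 1234-5678  # deleted pod

rm -f "$demo_template"

# With the real template and cluster access:
#   render_endpoint test/coredumpendpoint_template.yaml default test-pod 1234-5678 | kubectl create -f -
```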
