kubernetes-csi / livenessprobe


A sidecar container that can be included in a CSI plugin pod to enable integration with Kubernetes Liveness Probe.

License: Apache License 2.0


livenessprobe's People

Contributors

andrewsykim, andyzhangx, chrishenzie, darkowlzz, dependabot[bot], dobsonj, humblec, jiawei0227, jsafrane, k8s-ci-robot, lpabon, madhu-1, mauriciopoppe, morrislaw, mowangdk, msau42, mucahitkurt, namrata-ibm, nettoclaudio, pohly, raunakshah, saad-ali, sbezverk, scuzhanglei, sneha-at, spiffxp, stefansedich, sunnylovestiramisu, verult, xing-yang


livenessprobe's Issues

support exec style liveness probes in addition to http

I have scenarios where either or both of the node and controller services run in the hostNetwork namespace. Combined with the possibility of multiple deployments, this means I need to consume host ports (and reserve them), which is less than ideal.

What would be great is the ability to use exec-style probes and simply have a small binary connect to the UDS and exit as appropriate, as sketched below.
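For illustration, a minimal sketch of such an exec-style probe binary, assuming the CSI spec's Go bindings and an illustrative socket path (both are assumptions, not existing livenessprobe code): it dials the UDS, calls Identity.Probe, and exits non-zero if the driver is unhealthy, so no host port is needed.

package main

import (
	"context"
	"fmt"
	"os"
	"time"

	csi "github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Dial the driver's Unix domain socket (path is illustrative).
	conn, err := grpc.DialContext(ctx, "unix:///csi/csi.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		fmt.Fprintln(os.Stderr, "connect:", err)
		os.Exit(1)
	}
	defer conn.Close()

	// Ask the driver's Identity service whether it is healthy.
	resp, err := csi.NewIdentityClient(conn).Probe(ctx, &csi.ProbeRequest{})
	if err != nil {
		fmt.Fprintln(os.Stderr, "probe:", err)
		os.Exit(1)
	}
	if r := resp.GetReady(); r != nil && !r.GetValue() {
		fmt.Fprintln(os.Stderr, "driver reports not ready")
		os.Exit(1)
	}
	// Exit 0: kubelet treats the exec probe as successful.
}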

High risk vulnerability with v2.9.0

Hi Team,

We have one high-risk vulnerability with v2.9.0:

golang.org/x/net version v0.4.0 has 1 vulnerability


Can you please help us by fixing this?
With that, livenessprobe would be vulnerability-free.

Thank you.

I'd like to understand your release strategy.

The last release was in May, and many CVE-mitigation PRs have been merged since then. How often do you release a new version of livenessprobe, and when can we expect a new release that contains these fixes?
Thank you!

Avoid CVEs with cron! (One simple trick, etc)

Hello, as mentioned in #135, it seems like this container is (for practical purposes) rarely free of CVEs with a high or critical score. Whether the container is actually vulnerable is another story. When referenced artifacts get flagged, it adds significant workload for developers who are unfamiliar with the provenance of projects like this and have to dismiss the flags. This is exacerbated when there is a new release and the process of dismissing flags starts anew.

It would be far easier if dependent builds could automatically upgrade to the latest versions of components like this rather than getting involved in deep analysis or improperly dismissing vulnerabilities with a broad (and incorrect) assessment.

I believe the prow configuration for this component uses the golang tag with no bugfix version (i.e. 1.18). A build referencing that tag will pull the latest version of 1.18 on the date of the build.

It follows that this project could automatically avoid CVEs in older versions of Go by simply releasing the artifact once a month with an incremented bugfix release. The golang tag will move to the latest version and all will be well. Orgs like ours that use compliance scanning as a blunt-force instrument will be satisfied, etc.

Security should be maintained since release branches are used: merges from master would go through human review, and contributions to master would not affect releases unless they are merged.

@pohly, does this seem sane to you? If it becomes useful, it's a pattern that could be propagated to similar projects as well.

Facing vulnerability issues with golang, net

We are facing vulnerability issues with Go and golang.org/x/net since these are outdated:

Package: golang.org/x/net
Location: livenessprobe/6233d3ca658768ca9c39e8ad55f01f84adff930c/sha256__32ca3d8516d3b0cd0311d54a49ea0616f4964c49e38b23f1d88f215e93fe1b7d.tar.gz/livenessprobe/golang.org/x/net
Affected versions: < 0.7.0
Fixed in: 0.7.0
Published: 2023-03-10T03:31:04Z

current env:
k8s.gcr.io/sig-storage/livenessprobe: 2.9.0
EKS 1.22

Are there upcoming releases that are going to have fixes for these vulnerabilities?

Memory leak in v2.1.0

This is a slice of the average memory usage plot for the liveness probe container to illustrate the issue (Y-axis in MiB):
[average memory usage plot]

We had been running the driver with a short 2-second liveness check interval to increase our chances of discovering any issues with it (a new cluster and CSI are uncharted territory for us), racking up ~1.5 GiB of memory usage per liveness probe container in around two weeks.

Otherwise the liveness probe chugs along happily with no anomalies in logs or other metrics for the probe itself or any of the related workloads.

CVE-2021-39293

Hello,

We are trying to use this image but our vulnerability scanner detected CVE-2021-39293 as a "High":

A vulnerability was found in archive/zip of the Go standard library. Applications written in Go can panic or potentially exhaust system memory when parsing malformed ZIP files. An attacker capable of submitting a crafted ZIP file to a Go application using archive/zip to process that file could cause a denial of service via memory exhaustion or panic. This particular flaw is an incomplete fix for a previous flaw.

Would it be possible to build this image using an updated version of Go? Per the vulnerability report, this is fixed in: 1.17.1, 1.16.8.

golang/go#47801

This also affects the "node-driver-registrar" and "csi-driver-smb" images we are trying to use (as it surely does any other images built using the same version of Go). I can open issues against those images as well if needed:

https://github.com/kubernetes-csi/node-driver-registrar
https://github.com/kubernetes-csi/csi-driver-smb

Thank you.

Create a SECURITY_CONTACTS file.

As per the email sent to kubernetes-dev[1], please create a SECURITY_CONTACTS
file.

The template for the file can be found in the kubernetes-template repository[2].
A description for the file is in the steering-committee docs[3], you might need
to search that page for "Security Contacts".

Please feel free to ping me on the PR when you make it, otherwise I will see when
you close this issue. :)

Thanks so much, let me know if you have any questions.

(This issue was generated from a tool, apologies for any weirdness.)

[1] https://groups.google.com/forum/#!topic/kubernetes-dev/codeiIoQ6QE
[2] https://github.com/kubernetes/kubernetes-template-project/blob/master/SECURITY_CONTACTS
[3] https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance-template-short.md

Broken link to the `contributor cheat sheet` needs to be fixed

Bug Report

I have observed the same kind of issue in various kubernetes-csi projects.
It happens because, after localization, many modifications were made across the various directories.
I have observed the same issue on this page as well.

It has one broken link to the contributor cheat sheet which needs to be fixed.
I will look through the other CSI repos as well and try to fix it as soon as I can.

/kind bug
/assign

panic: runtime error: invalid memory address or nil pointer dereference

Hi,

I have a CSI driver implemented according to the CSI standard. My node server includes the liveness probe (v2.8.0) as a sidecar. After running normally for some hours, I got the following (twice):

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x857ceb]

goroutine 130244 [running]:
google.golang.org/grpc.(*ClientConn).Close(0x0)
	/home/esepadm/jenkins-exclusive-executor/workspace/SEPReleaseLivenessProbe/3pp/vendor/google.golang.org/grpc/clientconn.go:1017 +0x4b
main.acquireConnection.func1()
	/home/esepadm/jenkins-exclusive-executor/workspace/SEPReleaseLivenessProbe/3pp/cmd/livenessprobe/main.go:103 +0x16a
created by main.acquireConnection
	/home/esepadm/jenkins-exclusive-executor/workspace/SEPReleaseLivenessProbe/3pp/cmd/livenessprobe/main.go:97 +0x1a5

This causes the node server DaemonSet pod to restart.
Can you help?

Best regards,
Antonio Vitiello

livenessprobe consumes too much memory

I ran the CSI driver with the liveness probe for several days, and the liveness probe's memory usage grew to 765 MB, which is abnormal compared to the other CSI sidecars.

The following is a screenshot from htop:

[htop screenshot]

Another side effect is that my CSI driver started crashing a lot more after the liveness probe was enabled:

NAMESPACE     NAME                                                    READY   STATUS    RESTARTS   AGE
kube-system   dns-controller-64db5996cd-wvrmv                         1/1     Running   0          6d7h
kube-system   ebs-csi-controller-0                                    6/6     Running   36         5d18h
kube-system   ebs-csi-node-djvrx                                      3/3     Running   20         5d18h
kube-system   ebs-csi-node-g2c2k                                      3/3     Running   9          5d18h

Here is my liveness probe config:

          livenessProbe:
            httpGet:
              path: /healthz
              port: healthz
            initialDelaySeconds: 10
            timeoutSeconds: 3
            periodSeconds: 10
            failureThreshold: 5

livenessprobe:v2.1.0 image has VA issues

The following are the VA issues in the livenessprobe:v2.1.0 image:

DLA-2424-1
  Policy Status: Active
  Summary: tzdata, the time zone and daylight-saving time data, has been updated to the latest version.
    - Revised predictions for Morocco's changes starting in 2023.
    - Macquarie Island has stayed in sync with Tasmania since 2011.
    - Casey, Antarctica is at +08 in winter and +11 in summer since 2018.
    - Palestine ends DST earlier than predicted, on 2020-10-24.
    - Fiji starts DST later than usual, on 2020-12-20.
  Official Notice: https://lists.debian.org/debian-lts-announce/2020/10/msg00037.html
  Affected Package: tzdata (Policy Status: Active)
  How to Resolve: Upgrade tzdata to >= 2020d-0+deb9u1 (DLA-2424-1)

Observing many logs in the format "Connecting to %s"

We are using livenessprobe v2.2.0 and see many logs in stderr that look like this:

1 connection.go:153] Connecting to unix:///csi/csi.sock

I tracked the line of code that generates these log messages to the golang connection library:

klog.V(5).Infof("Connecting to %s", address)

Source: https://github.com/kubernetes-csi/csi-lib-utils/blob/v0.9.1/connection/connection.go#L153. The V(5) was only added about 3 months ago (29 Dec 2020), in kubernetes-csi/csi-lib-utils@75fbafd.

We configure the livenessprobe sidecar like this:

      - args:
        - --csi-address=/csi/csi.sock
        - --health-port=9808
        - --v=3
        image: k8s.gcr.io/sig-storage/livenessprobe:v2.2.0
        imagePullPolicy: IfNotPresent
        name: liveness-probe
        resources:
          limits:
            cpu: 50m
            memory: 32Mi
          requests:
            cpu: 10m
            memory: 8Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /csi
          name: socket-dir

My understanding was that the V(5) in the connection logging would mean that if we used --v=5 we would see the log, but since we use --v=3, we shouldn't see it.

Are we doing something wrong? Or do we simply need to wait until v2.3 to pick up the new V(5) added to the connection library logging?


I noticed in the v2.2 CHANGELOG, there was a PR to reduce the default log level of the livenessprobe-sidecar to '4' (#88).


Currently we get about 20,000 logs like this per hour, which is wasting space in ElasticSearch. As a workaround we can filter out these logs with a logstash filter, for example:

    filter {
      if [kubernetes][container][name] == "liveness-probe" {
        if "Connecting to unix:///csi/csi.sock" in [message] {
          drop{}
        }
      }
    }

use nonroot user in Dockerfile

There is a security warning produced by Twistlock that this livenessprobe image should use a nonroot user. However, if I change the Dockerfile as follows (andyzhangx@42fc328), the liveness probe eventually fails, and I'm not sure what the right fix is to make this image use a nonroot user:

FROM gcr.io/distroless/static:nonroot
LABEL maintainers="Kubernetes Authors"
LABEL description="CSI Driver liveness probe"
ARG binary=./bin/livenessprobe

COPY ${binary} /livenessprobe
USER nonroot:nonroot
ENTRYPOINT ["/livenessprobe"]

failed events:

Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  3m23s                default-scheduler  Successfully assigned kube-system/csi-smb-node-z54xp to aks-agentpool-90924120-vmss000006
  Normal   Pulling    3m23s                kubelet            Pulling image "andyzhangx/livenessprobe:v2.12.0"
  Normal   Created    3m22s                kubelet            Created container node-driver-registrar
  Normal   Created    3m22s                kubelet            Created container liveness-probe
  Normal   Started    3m22s                kubelet            Started container liveness-probe
  Normal   Pulled     3m22s                kubelet            Container image "registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.9.1" already present on machine
  Normal   Pulled     3m22s                kubelet            Successfully pulled image "andyzhangx/livenessprobe:v2.12.0" in 858.088069ms (858.101669ms including waiting)
  Normal   Started    3m22s                kubelet            Started container node-driver-registrar
  Warning  Unhealthy  23s (x5 over 2m23s)  kubelet            Liveness probe failed: Get "http://10.224.0.255:29643/healthz": dial tcp 10.224.0.255:29643: connect: connection refused
  Normal   Killing    23s                  kubelet            Container smb failed liveness probe, will be restarted
  Normal   Pulled     22s (x2 over 3m22s)  kubelet            Container image "gcr.io/k8s-staging-sig-storage/smbplugin:canary" already present on machine
  Normal   Created    22s (x2 over 3m22s)  kubelet            Created container smb
  Normal   Started    22s (x2 over 3m22s)  kubelet            Started container smb
root@andydev:~/go/src/github.com/kubernetes-csi/livenessprobe# k logs csi-smb-node-z54xp  -n kube-system liveness-probe
W0206 13:22:14.691040       1 connection.go:234] Still connecting to unix:///csi/csi.sock
W0206 13:22:24.690443       1 connection.go:234] Still connecting to unix:///csi/csi.sock
W0206 13:22:34.691048       1 connection.go:234] Still connecting to unix:///csi/csi.sock
W0206 13:22:44.691010       1 connection.go:234] Still connecting to unix:///csi/csi.sock

Rebuild with golang v1.18.6 or higher

Hi,

As mentioned in this issue, we saw some CVEs in the latest release due to the usage of Go 1.18.

Is there any chance that livenessprobe also gets updated to a later Go version?

Thanks!

v2.2.0-eks-1-18-5 has 1 High + 15 other vulnerabilities

Good afternoon,

I pulled and pushed v2.2.0-eks-1-18-5 into an ECR repository in my personal account and noticed it has 1 High + 15 other vulnerabilities. I see this also happens for v2.2.0-eks-1-20-1.

Some of these vulnerabilities are:

  • ALAS2-2021-1655 (High)
  • ALAS2-2021-1653 (Medium)
  • ALAS2-2021-1656 (Medium)

Would it be possible to release a new image anytime soon that addresses these vulnerabilities? Would you like me to take a look at this myself and submit a PR?

Thanks!

No Windows 2004 image available

Hi

According to curl -L https://mcr.microsoft.com/v2/oss/kubernetes-csi/livenessprobe/tags/list, there is no Windows image available other than for the 1809 kernel.

It would make sense to support the versions listed in the official windows-os-version-support documentation.

Probe requests still reporting ready status even when socket file doesn't exist anymore

Description
Once it has established a connection to the CSI driver's identity server, the livenessprobe server will not attempt to reconnect until a restart occurs. We rely on this persistent connection to dispatch the Probe calls to the identity server.

A side effect of this approach shows up when the socket file is removed (deliberately or not) shortly after the first connection is established. Under these conditions, the probe requests still reach the CSI driver's identity server and may return a healthy status as long as this connection is open.

Other components will not succeed in opening new connections to this CSI driver, leading to a stuck scenario. For example, the kubelet won't be able to contact the CSI driver's node server for NodePublishVolume calls, leaving pods pending forever (until human intervention).

What is expected
When the Unix domain socket file no longer exists, requests to /healthz should return a not ready/unhealthy status.
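A minimal sketch of the expected behavior, assuming the handler can stat the socket path it was configured with (the handler wiring and path are illustrative, not the project's actual code):

package main

import (
	"net/http"
	"os"
)

const csiSocketPath = "/csi/csi.sock" // illustrative path

func healthz(w http.ResponseWriter, r *http.Request) {
	// If the Unix domain socket file is gone, report unhealthy instead of
	// trusting the long-lived gRPC connection that was established earlier.
	if _, err := os.Stat(csiSocketPath); err != nil {
		http.Error(w, "CSI socket missing: "+err.Error(), http.StatusServiceUnavailable)
		return
	}
	// ...otherwise dispatch the CSI Probe call as usual...
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/healthz", healthz)
	http.ListenAndServe(":9808", nil)
}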

livenessprobe fails for controller only CSI driver

livenessprobe can be integrated into a CSI driver which only implements the Controller and Identity services, but no Node service.

The following makes an RPC call that is implemented by the node server. We should make the liveness probe universally usable for CSI drivers by depending only on Identity service RPC calls (see the sketch at the end of this issue).

https://github.com/kubernetes-csi/livenessprobe/blob/master/cmd/main.go#L56:
csiDriverNodeID, err := csiConn.NodeGetId(ctx)

Here is the log when it is integrated into the controller:

I1030 04:17:27.965740 1 main.go:109] Serving requests to /healthz on: 0.0.0.0:9809
I1030 04:17:40.913574 1 main.go:82] Request: /healthz from: 172.17.0.1:40302
I1030 04:17:40.913607 1 main.go:72] Attempting to open a gRPC connection with: /var/lib/csi/sockets/pluginproxy/csi.sock
I1030 04:17:40.913624 1 connection.go:70] Connecting to /var/lib/csi/sockets/pluginproxy/csi.sock
I1030 04:17:40.936761 1 connection.go:97] Still trying, connection is CONNECTING
I1030 04:17:40.937903 1 connection.go:94] Connected
I1030 04:17:40.937922 1 main.go:47] Calling CSI driver to discover driver name.
I1030 04:17:40.937933 1 connection.go:150] GRPC call: /csi.v0.Identity/GetPluginInfo
I1030 04:17:40.937943 1 connection.go:151] GRPC request:
I1030 04:17:40.950593 1 connection.go:153] GRPC response: name:"com.mapr.csi-kdf" vendor_version:"0.3.0"
I1030 04:17:40.950671 1 connection.go:154] GRPC error:
I1030 04:17:40.950680 1 main.go:52] CSI driver name: "com.mapr.csi-kdf"
I1030 04:17:40.950690 1 main.go:55] Calling CSI driver to discover node ID.
I1030 04:17:40.950699 1 connection.go:150] GRPC call: /csi.v0.Node/NodeGetId
I1030 04:17:40.950706 1 connection.go:151] GRPC request:
I1030 04:17:40.951186 1 connection.go:153] GRPC response:
I1030 04:17:40.951226 1 connection.go:154] GRPC error: rpc error: code = Unimplemented desc = unknown service csi.v0.Node
I1030 04:17:40.951289 1 main.go:95] Health check failed with: rpc error: code = Unimplemented desc = unknown service csi.v0.Node.

@sbezverk As discussed in #wg-csi, let me know if any additional information is required.
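A sketch of a health check that depends only on the Identity service, using the CSI spec's Go bindings directly (how this would be wired into livenessprobe's own connection handling is an assumption):

package healthcheck

import (
	"context"
	"fmt"

	csi "github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc"
)

// identityOnlyHealthCheck probes a CSI driver using only Identity RPCs,
// which every driver (controller-only or not) must implement.
func identityOnlyHealthCheck(ctx context.Context, conn *grpc.ClientConn) error {
	identity := csi.NewIdentityClient(conn)

	// Discover the driver name via Identity.GetPluginInfo.
	info, err := identity.GetPluginInfo(ctx, &csi.GetPluginInfoRequest{})
	if err != nil {
		return fmt.Errorf("GetPluginInfo failed: %w", err)
	}

	// Check health via Identity.Probe instead of any Node RPC.
	resp, err := identity.Probe(ctx, &csi.ProbeRequest{})
	if err != nil {
		return fmt.Errorf("probing driver %s failed: %w", info.GetName(), err)
	}
	if r := resp.GetReady(); r != nil && !r.GetValue() {
		return fmt.Errorf("driver %s reports not ready", info.GetName())
	}
	return nil
}

Depending only on Identity.Probe keeps the same check valid for both node and controller deployments of a driver.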

memory leak on release-0.4

livenessprobe creates a new gRPC connection for every request and never closes it:

func getCSIConnection() (connection.CSIConnection, error) {
// Connect to CSI.
glog.Infof("Attempting to open a gRPC connection with: %s", *csiAddress)
csiConn, err := connection.NewConnection(*csiAddress, *connectionTimeout)
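A small self-contained sketch of the fix pattern this implies (illustrative; the real code goes through the csi-lib-utils connection package, so the names here are assumptions): whatever opens a connection per request must also close it.

package main

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func probeOnce(addr string) error {
	conn, err := grpc.Dial(addr, grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		return err
	}
	// Without this Close, every /healthz request leaks a gRPC client
	// connection (plus its goroutines and buffers), matching the steady
	// memory growth reported here and in the v2.1.0 issue above.
	defer conn.Close()

	// ...issue the CSI Probe call over conn here...
	return nil
}

func main() {
	_ = probeOnce("unix:///csi/csi.sock")
}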

Filtering out the health check begin/succeeded logs

Sending probe request to CSI driver "efs.csi.aws.com"
Health check succeeded

Currently these are noisy and I wish to filter them out. Would there be any opposition to adding a V(5) to these log lines so that one can use something like --v=3 to filter them out?
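A small sketch of the requested change (illustrative, not the actual diff): gating the per-probe lines behind klog.V(5) means they are emitted at -v=5 but suppressed at -v=3.

package main

import (
	"flag"

	"k8s.io/klog/v2"
)

func main() {
	klog.InitFlags(nil) // registers -v and the other klog flags
	flag.Parse()

	driverName := "efs.csi.aws.com"

	// Visible only when running with -v=5 or higher.
	klog.V(5).Infof("Sending probe request to CSI driver %q", driverName)
	klog.V(5).Info("Health check succeeded")
	klog.Flush()
}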

support JSON log format for liveness-probe

  • Provide a flag to enable JSON log formatting.

We do this today for CSI driver implementations, but the same can't be configured for liveness-probe. It could be done by adding a log-format-json=true flag to liveness-probe and setting the log format based on it.
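One hedged sketch of how such a flag could work, assuming the sidecar keeps logging through klog: when --log-format-json is set, route klog output through a JSON-emitting logr backend (zap via zapr here). This illustrates the idea only, not the project's actual mechanism.

package main

import (
	"flag"

	"github.com/go-logr/zapr"
	"go.uber.org/zap"
	"k8s.io/klog/v2"
)

func main() {
	logFormatJSON := flag.Bool("log-format-json", false, "emit logs in JSON format")
	klog.InitFlags(nil)
	flag.Parse()

	if *logFormatJSON {
		// zap's production config writes structured JSON to stderr.
		zapLog, err := zap.NewProduction()
		if err != nil {
			klog.Fatalf("failed to build JSON logger: %v", err)
		}
		klog.SetLogger(zapr.NewLogger(zapLog))
	}

	klog.InfoS("livenessprobe starting", "logFormatJSON", *logFormatJSON)
	klog.Flush()
}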

need version compatibility for windows 2022

In the repo's Dockerfile.Windows, the base/addon image is the Windows "1809" image.

Images created this way do not seem to work on Windows Server 2022 due to version compatibility issues.

CIS issues in mcr.microsoft.com/oss/kubernetes-csi/livenessprobe:v2.10.0

Our Qualys scans are showing the following docker CIS benchmark issues for mcr.microsoft.com/oss/kubernetes-csi/livenessprobe:v2.10.0

  • SERIOUS: Status of the ADD instructions in Dockerfile. Remediation: use COPY rather than ADD instructions in Dockerfiles.
  • MEDIUM: Status of the HEALTHCHECK setting for the Docker images. Remediation: follow Docker documentation and rebuild the Docker images with a HEALTHCHECK instruction.

livenessprobe 0.4.2 image

The latest image for 0.4.1 doesn't include the following fix: #18

Can we cut a 0.4.2 CSI livenessprobe image which has the above fix?

New csi-lib-utils/connection.Connect logic can cause permanent CSI plugin outage

The livenessprobe code expects to try forever to connect with the CSI plugin via csi.sock on startup.

csiConn, err := acquireConnection(context.Background(), metricsManager)
if err != nil {
	// connlib should retry forever so a returned error should mean
	// the grpc client is misconfigured rather than an error on the network
	klog.Fatalf("failed to establish connection to CSI driver: %v", err)
}

However, this commit recently picked up a change in csi-lib-utils that returns an error after only 30 seconds.

According to the associated PR, the goal was to avoid a deadlock in which node-driver-registrar failed permanently to connect to a CSI plugin because it was referencing an old file descriptor.

In this analysis, I described a situation in which this new behavior caused a permanent outage of the Longhorn CSI plugin. Details are there, but essentially:

  • The CSI plugin fails to start for an ephemeral reason and enters a CrashLoopBackOff.
  • livenessprobe fails to connect and enters a CrashLoopBackOff.
  • Eventually, the CSI plugin can start successfully. Since livenessprobe is not running at that time, kubelet kills the plugin, increasing its backoff.
  • Every time livenessprobe starts, the CSI plugin is waiting in backoff, so livenessprobe crashes, increasing the backoff.

IMO, livenessprobe's previous behavior was correct. It should not crash unless it is misconfigured, so that it is always available to answer kubelet's liveness probes.

Assuming the csi-lib-utils change was necessary, my thinking is that we should recognize the timeout error in livenessprobe and ignore it during initialization. However, I'm not sure I understand the exact cause of kubernetes-csi/csi-lib-utils#131. Maybe a similar issue could leave the liveness probe stuck permanently in initialization?
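For illustration, a rough sketch of that idea (hypothetical, untested): keep retrying the initial connection and treat a connection timeout as transient during startup instead of fatal. acquireConnection and metricsManager refer to livenessprobe's own code quoted above; detecting the timeout via context.DeadlineExceeded is an assumption about what csi-lib-utils returns.

var csiConn *grpc.ClientConn
for {
	var err error
	csiConn, err = acquireConnection(context.Background(), metricsManager)
	if err == nil {
		break // connected; continue with normal startup
	}
	if errors.Is(err, context.DeadlineExceeded) {
		// The library now gives up after ~30s; during initialization, retry
		// instead of crashing so kubelet's probes can still be answered later.
		klog.Warningf("timed out connecting to CSI driver, retrying: %v", err)
		continue
	}
	// Anything else still looks like misconfiguration.
	klog.Fatalf("failed to establish connection to CSI driver: %v", err)
}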

cc @ConnorJC3 from the csi-lib-utils PR for any thoughts.
