Comments (26)

rpiceage commented on July 19, 2024

@edwarnicke It's up and running.
Thanks for the great support; we can close this issue. :)

from cmd-nsmgr.

denis-tingaikin commented on July 19, 2024

Hello @rpiceage

NSMgr and the other endpoints use the "google.golang.org/grpc/health/grpc_health_v1" server, so please consider using the google.golang.org/grpc/health/grpc_health_v1 client to solve the issue.

rpiceage commented on July 19, 2024

Hi @denis-tingaikin
Thanks for the quick answer and the info. We will look into this.

denis-tingaikin commented on July 19, 2024

@rpiceage Feel free to reopen this if the problem comes up again :)

rpiceage commented on July 19, 2024

Hi,
I'm trying to use grpc-health-probe as a client to the grpc_health_v1 server in nsmgr.
https://github.com/grpc-ecosystem/grpc-health-probe

I put the binary in the nsmgr Docker image and tried to use it for the K8s liveness and readiness probes, but I cannot reach the nsmgr gRPC server, either on containerPort 5001 or via the Unix socket /var/lib/networkservicemesh/nsm.io.sock.
I also tried to configure TLS for the connection, with no success.

My probes look something like this:

      readinessProbe:
        exec:
          command: ["/bin/grpc_health_probe", "-addr=:5001", "-tls", "-tls-no-verify"]
        initialDelaySeconds: 15
      livenessProbe:
        exec:
          command: ["/bin/grpc_health_probe", "-addr=:5001", "-tls", "-tls-no-verify"]
        initialDelaySeconds: 20

Do you have any idea what the problem might be? Which URL should be used to communicate with the gRPC server?

edwarnicke commented on July 19, 2024

One note on this issue: right now, out of the box, every endpoint exposes a gRPC liveness probe:

https://github.com/networkservicemesh/sdk/blob/master/pkg/networkservice/chains/endpoint/server.go#L120-L124

There's pretty good documentation for adding grpc health probing:

https://kubernetes.io/blog/2018/10/01/health-checking-grpc-servers-on-kubernetes/

https://codeburst.io/kubernetes-grpc-services-and-probes-by-example-1cb611da45ab

I don't think it requires any code changes... but we may need to add the gRPC health probe to our Docker containers and update our YAML files to use it.

rpiceage commented on July 19, 2024

Hi,
Thank you for the answer.
My question was mainly about the configuration of the health probe client. I added grpc-health-probe to the container and tried to reach the gRPC server on the ListenOn addresses (unix:///var/lib/networkservicemesh/nsm.io.sock and tcp://:5001) with no success, and it is really hard to debug why I could not connect.
If you add the probe to the container and provide a working example, that would be a really big help.
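
For reference, the two ListenOn values name different transports: tcp://:5001 is TCP port 5001 on all interfaces, while unix:///var/lib/networkservicemesh/nsm.io.sock is a Unix domain socket path. The Go sketch below shows how such a URL maps onto the network/address pair that Go's net.Dial and net.Listen expect. The helper dialTarget is hypothetical (it is not part of NSM or grpc-health-probe) and only handles the tcp and unix schemes.

```go
package main

import (
	"fmt"
	"net/url"
)

// dialTarget is a hypothetical helper: it maps a ListenOn-style URL to the
// (network, address) pair used by Go's net.Dial / net.Listen.
func dialTarget(listenOn string) (string, string, error) {
	u, err := url.Parse(listenOn)
	if err != nil {
		return "", "", err
	}
	switch u.Scheme {
	case "tcp":
		// "tcp://:5001" parses with Host ":5001" (empty hostname = all interfaces).
		return "tcp", u.Host, nil
	case "unix":
		// "unix:///path" has an empty host; the socket path ends up in u.Path.
		return "unix", u.Path, nil
	default:
		return "", "", fmt.Errorf("unsupported scheme %q in %q", u.Scheme, listenOn)
	}
}

func main() {
	for _, s := range []string{
		"tcp://:5001",
		"unix:///var/lib/networkservicemesh/nsm.io.sock",
	} {
		network, address, err := dialTarget(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> network=%s address=%s\n", s, network, address)
	}
}
```

This only resolves the address. TLS is a separate layer on top of either transport, so a correct address can still fail to connect.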

edwarnicke commented on July 19, 2024

@rpiceage Hmm... that feels suspiciously like a TLS-related issue... What kinds of errors are you getting when you try to connect?
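
Here is one way TLS can produce an opaque connection failure like this. Suppose the server requires mutual TLS, meaning the client must present a certificate. A client that only enables TLS and skips server verification is still rejected during the handshake. That is roughly what "-tls" plus "-tls-no-verify" amounts to when no client certificate is configured. The standard-library Go sketch below assumes such a server. It generates throwaway self-signed certificates, which stand in for SPIFFE SVIDs, and compares a client without a certificate against one that presents one. Whether nsmgr's server is configured this way is an assumption here; the sketch does not check it.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"io"
	"math/big"
	"time"
)

// selfSignedCert creates a throwaway ECDSA certificate for this demo only.
func selfSignedCert() (tls.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return tls.Certificate{}, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "demo"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return tls.Certificate{}, err
	}
	return tls.Certificate{Certificate: [][]byte{der}, PrivateKey: key}, nil
}

// tryMTLS starts a local TLS server that demands a client certificate and
// returns nil only if the client gets the server's reply through.
func tryMTLS(presentClientCert bool) error {
	serverCert, err := selfSignedCert()
	if err != nil {
		return err
	}
	ln, err := tls.Listen("tcp", "127.0.0.1:0", &tls.Config{
		Certificates: []tls.Certificate{serverCert},
		ClientAuth:   tls.RequireAnyClientCert, // require a cert, skip chain verification
	})
	if err != nil {
		return err
	}
	defer ln.Close()

	go func() {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		defer conn.Close()
		if err := conn.(*tls.Conn).Handshake(); err != nil {
			return // rejected: the client sees a TLS alert or EOF
		}
		conn.Write([]byte("ok"))
	}()

	// Client: no server verification, and a client cert only if requested.
	clientConf := &tls.Config{InsecureSkipVerify: true}
	if presentClientCert {
		cert, err := selfSignedCert()
		if err != nil {
			return err
		}
		clientConf.Certificates = []tls.Certificate{cert}
	}
	conn, err := tls.Dial("tcp", ln.Addr().String(), clientConf)
	if err != nil {
		return err
	}
	defer conn.Close()
	// Under TLS 1.3 the client handshake can finish before the server rejects
	// it, so the failure typically surfaces on the first read.
	conn.SetReadDeadline(time.Now().Add(2 * time.Second))
	buf := make([]byte, 2)
	_, err = io.ReadFull(conn, buf)
	return err
}

func main() {
	fmt.Println("without client cert:", tryMTLS(false))
	fmt.Println("with client cert:", tryMTLS(true))
}
```

tryMTLS(false) should return an error and tryMTLS(true) should return nil. RequireAnyClientCert skips chain verification to keep the demo short. A real SPIFFE setup would also verify the peer against a trust bundle.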

edwarnicke commented on July 19, 2024

@rpiceage I've also asked the question on the Spire slack: https://spiffe.slack.com/archives/C7XDP01HB/p1617189517055500

rpiceage commented on July 19, 2024

Thank you @edwarnicke
The error was a pretty generic message in the "kubectl describe pod" output:

Liveness probe failed: timeout: failed to connect service "unix:///var/lib/networkservicemesh/nsm.io.sock" within 1s

The message was the same every time I changed the URL.
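
A quick way to tell a transport problem from a handshake problem is to open a raw connection with the same 1s budget. The Go sketch below does only that: it dials and closes, with no TLS and no gRPC health call. socketAccepts and listenOnTempSocket are hypothetical helpers for illustration.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
	"time"
)

// probeTimeout mirrors the 1s budget mentioned in the error message.
const probeTimeout = time.Second

// socketAccepts only checks the transport: it opens and closes a raw
// connection. It does not perform a TLS handshake or a gRPC health call.
func socketAccepts(network, address string, timeout time.Duration) error {
	conn, err := net.DialTimeout(network, address, timeout)
	if err != nil {
		return err
	}
	return conn.Close()
}

// listenOnTempSocket opens a throwaway Unix socket so the check can be tried locally.
func listenOnTempSocket() (string, func(), error) {
	dir, err := os.MkdirTemp("", "probe")
	if err != nil {
		return "", nil, err
	}
	path := filepath.Join(dir, "demo.sock")
	ln, err := net.Listen("unix", path)
	if err != nil {
		os.RemoveAll(dir)
		return "", nil, err
	}
	cleanup := func() {
		ln.Close()
		os.RemoveAll(dir)
	}
	return path, cleanup, nil
}

func main() {
	fmt.Println("missing socket:", socketAccepts("unix", "/nonexistent/demo.sock", probeTimeout))

	path, cleanup, err := listenOnTempSocket()
	if err != nil {
		panic(err)
	}
	defer cleanup()
	fmt.Println("listening socket:", socketAccepts("unix", path, probeTimeout))
}
```

Inside the nsmgr container, the same check can be pointed at ("unix", "/var/lib/networkservicemesh/nsm.io.sock") or ("tcp", "127.0.0.1:5001"). If the raw dial succeeds while grpc-health-probe still times out, the socket is reachable and the failure is more likely in the TLS layer.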

edwarnicke commented on July 19, 2024

Got it... that makes the TLS theory seem even more likely... let's see what the Spire folks say :)

edwarnicke commented on July 19, 2024

@rpiceage I'm working on a PR for grpc-health-probe that would be able to utilize spire:

https://github.com/edwarnicke/grpc-health-probe/tree/spiffe

I haven't had a chance to test it yet (hope to get to that today). I wanted you to have the opportunity to kick the tires if you so desire in case you get to it before I do :)

rpiceage commented on July 19, 2024

OK, so grpc-health-probe had no support for SPIRE...
I will try to build your branch and test it.
Thanks a lot @edwarnicke

rpiceage commented on July 19, 2024

Tried it, but with no success; I got the same uninformative error messages as before.
I tried both "-addr=/var/lib/networkservicemesh/nsm.io.sock" and "-addr=:5001", and neither worked.

edwarnicke commented on July 19, 2024

@rpiceage Thank you for trying... I'll go poke at it some more :)

edwarnicke commented on July 19, 2024

@rpiceage I've pushed grpc-ecosystem/grpc-health-probe#63 and am having an interesting conversation about how to productively add such functionality.

rpiceage commented on July 19, 2024

@edwarnicke Thanks for your efforts.
Yes, I can understand their concern about SPIFFE integration. On the other hand, it might add considerable value to the health probe if it supported frameworks like SPIFFE without additional steps.

edwarnicke commented on July 19, 2024

@rpiceage They've actually come back quite a bit more positively :) Also, I've tested with Spire in K8s with nsmgr and the probe as submitted in that PR does work nicely there :)

rpiceage commented on July 19, 2024

@edwarnicke thanks for the good news, I will try some more testing then.

edwarnicke commented on July 19, 2024

@rpiceage This is what worked for me:

          readinessProbe:
            exec:
              command: [ "/bin/grpc-health-probe", "-spiffe", "-addr=:5001" ]
            initialDelaySeconds: 5
          livenessProbe:
            exec:
              command: [ "/bin/grpc-health-probe", "-spiffe", "-addr=:5001" ]
            initialDelaySeconds: 10

edwarnicke commented on July 19, 2024

@rpiceage This should now be working as of: networkservicemesh/deployments-k8s#1133

rpiceage commented on July 19, 2024

@edwarnicke Works like a charm, thanks :)
Any chance that this feature will be available in the vpp-forwarder? I saw that cmd-registry-memory has been updated with it, and it works fine there too.

edwarnicke commented on July 19, 2024

@rpiceage VPP Forwarder doesn't expose any external ports; it only listens on a Unix file socket. Currently that socket is created at a random path (tempdir style). It could be made more deterministic, and then we could use the same approach with grpc-health-probe. Would that meet your need?
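
The difference between the current and the proposed behaviour can be sketched in Go. A tempdir-style path changes on every start, so no static probe command can name it. A path derived from a fixed directory can be written into the pod spec once. The helper names, the socket file name listen.sock, and the directory used in main are all hypothetical; the real path would be whatever the forwarder is configured with.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// socketName is a hypothetical file name chosen for this sketch.
const socketName = "listen.sock"

// randomSocketPath mimics the "tempdir style": every start gets a fresh
// directory with a random suffix, so a static probe command cannot name it.
func randomSocketPath() (string, func(), error) {
	dir, err := os.MkdirTemp("", "forwarder-")
	if err != nil {
		return "", nil, err
	}
	cleanup := func() { os.RemoveAll(dir) }
	return filepath.Join(dir, socketName), cleanup, nil
}

// deterministicSocketPath derives the path from a fixed, configurable
// directory, so the same value can be written into the pod spec.
func deterministicSocketPath(dir string) string {
	return filepath.Join(dir, socketName)
}

// probeTarget turns an absolute socket path into a "unix:///..." target,
// the same form that appears in the probe error message earlier in this thread.
func probeTarget(socketPath string) string {
	return "unix://" + socketPath
}

func main() {
	a, cleanA, err := randomSocketPath()
	if err != nil {
		panic(err)
	}
	defer cleanA()
	b, cleanB, err := randomSocketPath()
	if err != nil {
		panic(err)
	}
	defer cleanB()
	fmt.Println("random paths differ between starts:", a != b)

	// Hypothetical directory; the real location depends on the forwarder's config.
	p := deterministicSocketPath("/var/lib/networkservicemesh")
	fmt.Println("-addr=" + probeTarget(p))
}
```

With a deterministic path, the probe's exec command stays valid across pod restarts.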

rpiceage commented on July 19, 2024

@edwarnicke Yes, certainly. Many thanks.

edwarnicke commented on July 19, 2024

@rpiceage networkservicemesh/cmd-forwarder-vpp#170 and networkservicemesh/deployments-k8s#1178 should, once merged, give you readiness/liveness probes for cmd-forwarder-vpp.

edwarnicke commented on July 19, 2024

@rpiceage Both are now merged. Let me know if that meets your readiness/liveness needs for cmd-forwarder-vpp :)
