Comments (22)

cheftako commented on July 21, 2024

/assign

cheftako commented on July 21, 2024

Hi @Avatat. Given KAS <-segment 1-> Konnectivity Server <-segment 2-> Konnectivity Agent, is segment 1 configured with HTTP Connect, gRPC over HTTPS, or gRPC over UDS? You can find the answer in the /etc/srv/kubernetes/egress_selector_configuration.yaml file.

Jefftree commented on July 21, 2024

Hello, looking at the logs it seems that the failure is all tied to one request (id=3). Currently, when a connection is broken and re-established while a data transmission is in progress, the entire request data needs to be retransmitted (the apiserver should handle most of these retries).

Do requests made specifically after the agents reconnect exhibit this behavior as well?

cheftako commented on July 21, 2024

Not sure if it's the same issue. While trying to reproduce it, I improved our test infra (#124). I set it to make two requests with a 10-second delay and restarted the agent between the first and second request. The client clearly gets stuck. The server log is:

I0712 19:38:29.625672 1005356 main.go:349] Starting health server for healthchecks.
I0712 19:38:32.254969 1005356 server.go:426] Connect request from agent 67c5bcd6-2371-4e65-9c78-5405acaf2d93
I0712 19:38:32.255020 1005356 backend_manager.go:99] register Backend &{0xc000122000} for agentID 67c5bcd6-2371-4e65-9c78-5405acaf2d93
I0712 19:38:43.613487 1005356 server.go:204] proxy request from client, userAgent [test-client grpc-go/1.27.0]
I0712 19:38:43.613751 1005356 server.go:237] start serving frontend stream
I0712 19:38:43.615792 1005356 server.go:248] >>> Received DIAL_REQ
I0712 19:38:43.615867 1005356 backend_manager.go:168] pick agentID=67c5bcd6-2371-4e65-9c78-5405acaf2d93 as backend
I0712 19:38:43.616228 1005356 server.go:268] >>> DIAL_REQ sent to backend
I0712 19:38:43.621304 1005356 server.go:477] <<< Received DIAL_RSP(rand=5577006791947779410), agentID 67c5bcd6-2371-4e65-9c78-5405acaf2d93, connID 1)
I0712 19:38:43.621683 1005356 server.go:140] register frontend &{grpc 0xc00037f410 <nil> 0xc0000a61e0 1 67c5bcd6-2371-4e65-9c78-5405acaf2d93} for agentID 67c5bcd6-2371-4e65-9c78-5405acaf2d93, connID 1
I0712 19:38:43.626176 1005356 server.go:286] >>> Received 102 bytes of DATA(id=1)
I0712 19:38:43.626384 1005356 server.go:301] >>> DATA sent to backend
I0712 19:38:43.632061 1005356 server.go:506] <<< Received 261 bytes of DATA from agentID 67c5bcd6-2371-4e65-9c78-5405acaf2d93, connID 1
W0712 19:38:47.109384 1005356 server.go:459] stream read error: rpc error: code = Canceled desc = context canceled
I0712 19:38:47.109619 1005356 backend_manager.go:119] remove Backend &{0xc000122000} for agentID 67c5bcd6-2371-4e65-9c78-5405acaf2d93
I0712 19:38:47.109866 1005356 server.go:539] <<< Close backend &{0xc000122000} of agent 67c5bcd6-2371-4e65-9c78-5405acaf2d93
I0712 19:38:48.126811 1005356 server.go:426] Connect request from agent 137cf0a5-f0e4-4c39-b33d-30ae53134645
I0712 19:38:48.126843 1005356 backend_manager.go:99] register Backend &{0xc0003ae300} for agentID 137cf0a5-f0e4-4c39-b33d-30ae53134645
I0712 19:38:48.635625 1005356 server.go:286] >>> Received 102 bytes of DATA(id=1)
W0712 19:38:48.635909 1005356 server.go:299] >>> DATA to Backend failed: rpc error: code = Unavailable desc = transport is closing
I0712 19:38:48.635997 1005356 server.go:301] >>> DATA sent to backend

Jefftree commented on July 21, 2024

$ kubectl logs -n kube-system metrics-server-7f6d95d688-4jm9g -f
Error from server: Get https://.../containerLogs/kube-system/metrics-server-7f6d95d688-4jm9g/metrics-server?follow=true: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /etc/kubernetes/konnectivity-server/konnectivity-server.socket: connect: no such file or directory"

If the UDS socket is missing, that seems to be an issue with the konnectivity server, before it even reaches the step of dialing the agent. Are you observing any restarts, crash loops, or errors for the konnectivity server before you intervene manually?

I also noticed you have the -f flag for logs. Was this command run after the agent reconnected or before?

$ kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

I was able to reproduce this, and it should be related to the bug described earlier. I will do some more testing to get a fix.

Avatat commented on July 21, 2024

I have issues with HTTP and HTTPS protocols: kubernetes/kubernetes#92690

Now I run konnectivity-server and kube-apiserver together in one pod, and they use gRPC over UDS.

Avatat commented on July 21, 2024

Yes. After the agents restart (not just reconnect), I lose communication from KAS to cluster services behind the Konnectivity tunnel.
After restarting KAS + konnectivity-server (they are in one pod), everything goes back to normal.

Avatat commented on July 21, 2024

Looks very similar :)
If you prepare a fix, I would like to test it against my case, but I will need a Docker image.

Jefftree commented on July 21, 2024

Per #108, we removed retrying when server -> agent connections fail. I think what happens in both your scenarios is that the same tunnel (and dialer) used when the first connection was established is kept throughout the agent disconnect and reconnect phase. The dialer/tunnel still maps to the old agent, since we don't do any reconciliation without the retry step. Using a new dialer/tunnel and retrying the request should be the fix, but existing connections have no way of knowing that their tunnels are broken.

@cheftako @caesarxuchao we should propagate an error to the client if too many backend.Send() errors pop up, and either force the connection to be dropped or change the backend (https://github.com/kubernetes-sigs/apiserver-network-proxy/blob/master/pkg/server/server.go#L297). This is especially an issue for long-lived requests and connections with a large body of data, since we don't have a notion of cancelling a request.
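
To illustrate the idea, here is a rough sketch of what such error propagation could look like. The types, the threshold, and the function are hypothetical stand-ins, not the actual pkg/server code:

// Rough sketch only; Backend/Frontend and Packet are stand-ins for the
// proto-generated gRPC stream types used by the real server.
package server

import "fmt"

// Packet is a placeholder for a proxy protocol packet (DATA, CLOSE_RSP, ...).
type Packet struct {
	Type    string
	ConnID  int64
	Payload []byte
	Error   string
}

// Backend is the stream to the agent; Frontend is the stream to the API server.
type Backend interface{ Send(p *Packet) error }
type Frontend interface{ Send(p *Packet) error }

const maxSendFailures = 3 // hypothetical threshold

// forwardData forwards a DATA packet to the agent backend. After repeated
// failures it tells the frontend to close the connection instead of silently
// dropping the data, so the client can re-dial over a fresh tunnel.
func forwardData(fe Frontend, be Backend, pkt *Packet, failures *int) error {
	if err := be.Send(pkt); err != nil {
		*failures++
		if *failures >= maxSendFailures {
			return fe.Send(&Packet{
				Type:   "CLOSE_RSP",
				ConnID: pkt.ConnID,
				Error:  fmt.Sprintf("lost connection to agent: %v", err),
			})
		}
		return err
	}
	*failures = 0
	return nil
}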

Avatat commented on July 21, 2024

@Jefftree, thank you for the explanation!

I probably don't understand how KAS works and makes its requests, but the connection isn't working even when I run kubectl logs pod_name after a konnectivity-agent restart. As I understand it, KAS should make a new request/connection through the Konnectivity tunnel, but it still can't.
Maybe KAS uses a single connection for all requests?

Jefftree commented on July 21, 2024

Hmm, that's weird. KAS creates a new tunnel for each log request, so it's possible you're running into a different problem. Just to confirm, is this gRPC over UDS?

Avatat commented on July 21, 2024

I use the egress configuration below:

apiVersion: v1
data:
  egress-selector-configuration.yaml: |
    apiVersion: apiserver.k8s.io/v1beta1
    kind: EgressSelectorConfiguration
    egressSelections:
    - name: cluster
      connection:
        proxyProtocol: GRPC
        transport:
          uds:
            udsName: /etc/kubernetes/konnectivity-server/konnectivity-server.socket
kind: ConfigMap
metadata:
  name: egress-config

The KAS and konnectivity-server containers run in one pod and share the /etc/kubernetes/konnectivity-server directory (as an emptyDir volume).
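
For reference, the KAS side of that hop is plain gRPC dialed over the shared Unix domain socket. A minimal sketch using standard grpc-go follows; dialProxy and its per-request use are illustrative assumptions, not the actual kube-apiserver egress code:

// Sketch of opening a gRPC connection to the konnectivity server over the
// shared UDS path from the egress configuration above.
package egress

import (
	"context"
	"net"

	"google.golang.org/grpc"
)

const udsName = "/etc/kubernetes/konnectivity-server/konnectivity-server.socket"

// dialProxy opens a fresh gRPC connection to the konnectivity server for a
// single request, dialing the Unix socket instead of a TCP address.
func dialProxy(ctx context.Context) (*grpc.ClientConn, error) {
	return grpc.DialContext(ctx, udsName,
		grpc.WithInsecure(), // the UDS hop carries no TLS in this setup
		grpc.WithContextDialer(func(ctx context.Context, addr string) (net.Conn, error) {
			var d net.Dialer
			return d.DialContext(ctx, "unix", addr)
		}),
	)
}

Each request is expected to get its own tunnel over such a connection, which is why a tunnel left over from before an agent restart should not block new requests.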

Avatat commented on July 21, 2024

I observed one more thing.
After a konnectivity-agent restart I'm getting:

$ kubectl logs -n kube-system metrics-server-7f6d95d688-4jm9g -f
Error from server: Get https://.../containerLogs/kube-system/metrics-server-7f6d95d688-4jm9g/metrics-server?follow=true: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /etc/kubernetes/konnectivity-server/konnectivity-server.socket: connect: no such file or directory"

and:

$ kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

After a konnectivity-server container restart, kubectl logs ... now times out, and kubectl top ... stays the same.

After a KAS (hyperkube) container restart, everything goes back to normal and both commands work.

zanetworker commented on July 21, 2024

We saw similar issues and were not sure what causes them. Could this also be related to the serverCount parameter?

Avatat commented on July 21, 2024

@cheftako, @Jefftree, thank you for your effort!
I built a konnectivity-server image with @Jefftree's change ^ and I think it solves the issue. Can't wait for the release with the fix :)

Avatat commented on July 21, 2024

But still, it isn't perfect. After some time and a few agent restarts, I'm getting:

kubectl --kubeconfig config_dar0908prod logs -n kube-system kube-proxy-tx9sp
Error from server: Get https://...:10250/containerLogs/kube-system/kube-proxy-tx9sp/kube-proxy: dial timeout

konnectivity-server log:

I0715 09:00:25.347103       1 server.go:300] >>> Received 52 bytes of DATA(id=1)
I0715 09:00:25.347148       1 server.go:316] >>> DATA sent to Backend[3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f]
I0715 09:00:25.370913       1 server.go:522] <<< Received 1070 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 1
I0715 09:00:25.492191       1 server.go:300] >>> Received 176 bytes of DATA(id=2)
I0715 09:00:25.492239       1 server.go:316] >>> DATA sent to Backend[3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f]
I0715 09:00:25.494067       1 server.go:522] <<< Received 106 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 2
I0715 09:00:25.496732       1 server.go:522] <<< Received 201 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 2
I0715 09:00:25.496825       1 server.go:522] <<< Received 1255 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 2
I0715 09:00:25.496852       1 server.go:522] <<< Received 360 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 2
I0715 09:00:25.496871       1 server.go:522] <<< Received 222 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 2
I0715 09:00:25.496883       1 server.go:522] <<< Received 146 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 2
I0715 09:00:25.496900       1 server.go:522] <<< Received 236 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 2
I0715 09:00:25.496913       1 server.go:522] <<< Received 130 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 2
I0715 09:00:25.496924       1 server.go:522] <<< Received 110 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 2
I0715 09:00:25.496936       1 server.go:522] <<< Received 260 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 2
I0715 09:00:25.496953       1 server.go:522] <<< Received 257 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 2
I0715 09:00:25.496965       1 server.go:522] <<< Received 238 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 2
I0715 09:00:25.496976       1 server.go:522] <<< Received 248 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 2
I0715 09:00:25.496990       1 server.go:522] <<< Received 594 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 2
I0715 09:00:25.497004       1 server.go:522] <<< Received 475 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 2
I0715 09:00:25.497019       1 server.go:522] <<< Received 576 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 2
I0715 09:00:25.497032       1 server.go:522] <<< Received 1074 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 2
I0715 09:00:25.497054       1 server.go:522] <<< Received 1012 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 2
I0715 09:00:25.497067       1 server.go:522] <<< Received 502 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 2
I0715 09:00:25.497083       1 server.go:522] <<< Received 506 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 2
I0715 09:00:25.497096       1 server.go:522] <<< Received 1451 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 2
I0715 09:00:25.497110       1 server.go:522] <<< Received 476 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 2
I0715 09:00:25.497123       1 server.go:522] <<< Received 1895 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 2
I0715 09:00:25.497142       1 server.go:522] <<< Received 476 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 2
I0715 09:00:25.497154       1 server.go:522] <<< Received 27 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 2
I0715 09:00:26.351907       1 server.go:300] >>> Received 53 bytes of DATA(id=1)
I0715 09:00:26.351974       1 server.go:316] >>> DATA sent to Backend[3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f]
I0715 09:00:26.366149       1 server.go:300] >>> Received 53 bytes of DATA(id=1)
I0715 09:00:26.366193       1 server.go:316] >>> DATA sent to Backend[3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f]
I0715 09:00:26.387205       1 server.go:522] <<< Received 248 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 1
I0715 09:00:26.388196       1 server.go:300] >>> Received 42 bytes of DATA(id=1)
I0715 09:00:26.388232       1 server.go:316] >>> DATA sent to Backend[3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f]
I0715 09:00:26.395401       1 server.go:522] <<< Received 225 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 1
I0715 09:00:26.396012       1 server.go:300] >>> Received 42 bytes of DATA(id=1)
I0715 09:00:26.396041       1 server.go:316] >>> DATA sent to Backend[3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f]
I0715 09:00:27.824330       1 server.go:300] >>> Received 52 bytes of DATA(id=1)
I0715 09:00:27.824384       1 server.go:316] >>> DATA sent to Backend[3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f]
I0715 09:00:27.873359       1 server.go:522] <<< Received 1070 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 1
I0715 09:00:30.269968       1 server.go:300] >>> Received 52 bytes of DATA(id=1)
I0715 09:00:30.270022       1 server.go:316] >>> DATA sent to Backend[3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f]
I0715 09:00:30.310916       1 server.go:522] <<< Received 1070 bytes of DATA from agentID 3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f, connID 1
I0715 09:00:30.927164       1 server.go:217] proxy request from client, userAgent [grpc-go/1.26.0]
I0715 09:00:30.927297       1 server.go:250] start serving frontend stream
I0715 09:00:30.927314       1 server.go:261] >>> Received DIAL_REQ
I0715 09:00:30.927321       1 backend_manager.go:174] pick agentID=3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f as backend
I0715 09:00:30.927381       1 server.go:282] >>> DIAL_REQ sent to backend
I0715 09:00:32.756866       1 server.go:300] >>> Received 52 bytes of DATA(id=1)
I0715 09:00:32.756915       1 server.go:316] >>> DATA sent to Backend[3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f]
I0715 09:00:48.186314       1 server.go:300] >>> Received 46 bytes of DATA(id=5)
I0715 09:00:48.186349       1 server.go:316] >>> DATA sent to Backend[3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f]
...
I0715 09:01:03.242817       1 server.go:300] >>> Received 46 bytes of DATA(id=5)
I0715 09:01:03.242824       1 server.go:316] >>> DATA sent to Backend[3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f]
I0715 09:01:03.445538       1 server.go:217] proxy request from client, userAgent [grpc-go/1.26.0]
I0715 09:01:03.445695       1 server.go:250] start serving frontend stream
I0715 09:01:03.445712       1 server.go:261] >>> Received DIAL_REQ
I0715 09:01:03.445718       1 backend_manager.go:174] pick agentID=c743ecec-e248-4dea-b61e-1b14110588bf as backend
I0715 09:01:03.445779       1 server.go:282] >>> DIAL_REQ sent to backend
I0715 09:01:08.204758       1 server.go:300] >>> Received 42 bytes of DATA(id=5)
I0715 09:01:08.204798       1 server.go:316] >>> DATA sent to Backend[3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f]
...
I0715 09:01:13.215334       1 server.go:300] >>> Received 46 bytes of DATA(id=5)
I0715 09:01:13.215341       1 server.go:316] >>> DATA sent to Backend[3ac8054c-a9c5-4116-b43b-e7e1d5fa2e7f]

EDIT:
After everything gets stuck, the logs look like the following.
konnectivity-server:

I0715 09:26:08.725682       1 server.go:217] proxy request from client, userAgent [grpc-go/1.26.0]
I0715 09:26:08.725785       1 server.go:250] start serving frontend stream
I0715 09:26:08.725795       1 server.go:261] >>> Received DIAL_REQ
I0715 09:26:08.725810       1 backend_manager.go:174] pick agentID=91a024e7-5969-4bff-b05a-129f6ff60958 as backend
I0715 09:26:08.725873       1 server.go:282] >>> DIAL_REQ sent to backend
I0715 09:26:10.626449       1 server.go:300] >>> Received 52 bytes of DATA(id=6)
I0715 09:26:10.626582       1 server.go:316] >>> DATA sent to Backend[91a024e7-5969-4bff-b05a-129f6ff60958]
I0715 09:26:11.615066       1 server.go:300] >>> Received 46 bytes of DATA(id=1)
I0715 09:26:11.615104       1 server.go:316] >>> DATA sent to Backend[91a024e7-5969-4bff-b05a-129f6ff60958]
I0715 09:26:11.615112       1 server.go:300] >>> Received 46 bytes of DATA(id=1)
I0715 09:26:11.615123       1 server.go:316] >>> DATA sent to Backend[91a024e7-5969-4bff-b05a-129f6ff60958]
I0715 09:26:11.615127       1 server.go:300] >>> Received 46 bytes of DATA(id=1)
I0715 09:26:11.615134       1 server.go:316] >>> DATA sent to Backend[91a024e7-5969-4bff-b05a-129f6ff60958]
I0715 09:26:11.615137       1 server.go:300] >>> Received 46 bytes of DATA(id=1)

konnectivity-agent:

I0715 09:26:08.016463       1 client.go:262] received 1070 bytes from remote for connID[6]
I0715 09:26:08.091962       1 client.go:151] [tracing] recv packet, type: DATA
I0715 09:26:08.091983       1 client.go:217] received DATA(id=6)
I0715 09:26:08.092048       1 client.go:291] [connID: 6] write last 52 data to remote
I0715 09:26:08.120206       1 client.go:262] received 375 bytes from remote for connID[6]
I0715 09:26:08.726202       1 client.go:151] [tracing] recv packet, type: DIAL_REQ
I0715 09:26:08.726226       1 client.go:160] received DIAL_REQ

It's weird, because on the konnectivity-server side we can see the DIAL_REQ and then a series of 46-byte DATA packets, but on the konnectivity-agent side we can see only the DIAL_REQ and everything afterwards is stuck, no DATA.

It happens on every kubectl logs ....

EDIT:
One more log, from KAS:

I0715 11:53:12.746988       1 clientconn.go:106] parsed scheme: ""
I0715 11:53:12.747007       1 clientconn.go:106] scheme "" not registered, fallback to default scheme
I0715 11:53:12.747104       1 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/etc/kubernetes/konnectivity-server/konnectivity-server.socket  <nil> 0 <nil>}] <nil> <nil>}
I0715 11:53:12.747118       1 clientconn.go:933] ClientConn switching balancer to "pick_first"
I0715 11:53:12.747155       1 clientconn.go:882] blockingPicker: the picked transport is not ready, loop back to repick
I0715 11:53:12.747661       1 client.go:175] DIAL_REQ sent to proxy server
...
I0715 11:53:42.747863       1 trace.go:116] Trace[1524423019]: "Proxy via HTTP Connect over uds" address:...:10250 (started: 2020-07-15 11:53:12.746947264 +0000 UTC m=+9318.545114137) (total time: 30.000871813s):
Trace[1524423019]: [30.000871813s] [30.000871813s] END
E0715 11:53:42.747914       1 status.go:71] apiserver received an error that is not an metav1.Status: &url.Error{Op:"Get", URL:"https://...:10250/containerLogs/kube-system/metrics-server-7f6d95d688-gvjfq/metrics-server", Err:(*errors.errorString)(0xc008436950)}
I0715 11:53:42.748181       1 trace.go:116] Trace[140969147]: "Get" url:/api/v1/namespaces/kube-system/pods/metrics-server-7f6d95d688-gvjfq/log,user-agent:kubectl/v1.17.3 (linux/amd64) kubernetes/06ad960,client:192.168.159.0 (started: 2020-07-15 11:53:12.740761888 +0000 UTC m=+9318.538928860) (total time: 30.007389174s):
Trace[140969147]: [30.007388316s] [30.001265982s] Transformed response object
E0715 11:53:44.763213       1 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.96.13.35:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.13.35:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

There is definitely something wrong with the DIAL_REQ between konnectivity-server and konnectivity-agent.

EDIT:
I noticed that kubectl logs works for pods that are scheduled on the same node as the konnectivity-agent pod. I will dig deeper.

zanetworker commented on July 21, 2024

@Avatat I had the same issue when running the konnectivity-agent as a Deployment; turning it into a DaemonSet worked much better, though I still haven't gotten to the bottom of why it sometimes got stuck as a Deployment.

Avatat commented on July 21, 2024

@Avatat I had the same issue when running the konnectivity-agent as a Deployment; turning it into a DaemonSet worked much better, though I still haven't gotten to the bottom of why it sometimes got stuck as a Deployment.

Thank you!
I thought the same and tried both a Deployment and a DaemonSet - I didn't see any difference :(

Avatat commented on July 21, 2024

OK, I found out why kubectl logs didn't want to work. The reason is simple and obvious - the firewall on my nodes allows traffic to the kubelet (10250/TCP) only from the KAS network. When the Konnectivity tunnel was established to the agent on node A, it couldn't reach the kubelet on node B, because the firewall was dropping the traffic.

cheftako commented on July 21, 2024

The firewall on my nodes allows traffic to the kubelet (10250/TCP) only from the KAS network. When the Konnectivity tunnel was established to the agent on node A, it couldn't reach the kubelet on node B, because the firewall was dropping the traffic.

Yup. We have a nice-to-have feature filed to try to get the traffic to at least be sent to the correct failure zone (and, better yet, the right node). However, there is a benefit to not requiring the traffic to land on the correct node. The most obvious is that, as long as traffic doesn't have to land on the right node, you can kubectl logs the agent when it's having problems.

Jefftree commented on July 21, 2024

Fixed by #125

/close

k8s-ci-robot commented on July 21, 2024

@Jefftree: Closing this issue.

In response to this:

Fixed by #125

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
