apiserver-network-proxy's People

Contributors

andrewsykim, anfernee, avrittrohwer, carreter, charleszheng44, cheftako, cnvergence, daixiang0, dberkov, dependabot[bot], ipochi, irozzo-1a, jdnurme, jefftree, jkh52, jveski, k8s-ci-robot, liangyuanpeng, liggitt, liufen90, maxrenaud, mihivagyok, nikhita, rastislavs, rata, relyt0925, sh4d1, silenceper, tallclair, timoreimann

apiserver-network-proxy's Issues

use of closed network connection

I'm constantly seeing this message in the proxy server logs when using http-connect, even when running the proxy server/agent locally:

Received error on connection read unix konnectivity-server.socket->@: use of closed network connection.

It doesn't have any effect on the proxy or data transfer as far as the client is concerned, but it seems to point at some cleanup we may not be doing.

/cc @cheftako
/cc @caesarxuchao

To repro, start 4 processes in the terminal:

python -m SimpleHTTPServer 8001

./bin/proxy-server --uds-name=konnectivity-server.socket --mode=http-connect --cluster-cert=certs/agent/issued/proxy-master.crt --cluster-key=certs/agent/private/proxy-master.key --server-port=0

./bin/proxy-agent --ca-cert=certs/agent/issued/ca.crt --agent-cert=certs/agent/issued/proxy-agent.crt --agent-key=certs/agent/private/proxy-agent.key

./bin/proxy-test-client --proxy-uds=konnectivity-server.socket --proxy-host= --proxy-port=0 --mode=http-connect --request-port=8001 --request-host=localhost

The error appears on almost every request.

Agent to server communication through an egress proxy

Currently, the agent-to-server communication is based on gRPC streams. There are cases where an agent's egress has to go through an egress proxy, and in many cases the egress proxy does not support the gRPC protocol directly.

Is there a recommended way to solve this problem? For example, has anyone tried tunneling the gRPC HTTP/2 traffic through an HTTP CONNECT-based proxy? Is that a supported model?
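For reference, a minimal sketch (not code from this repo) of tunneling a gRPC connection through an HTTP CONNECT proxy with a custom dialer; the proxy and server addresses are placeholders and TLS is omitted:

package main

import (
    "bufio"
    "context"
    "fmt"
    "net"
    "net/http"
    "net/url"

    "google.golang.org/grpc"
)

// connectDialer returns a dialer that first connects to the egress proxy and
// issues an HTTP CONNECT for the target, then hands the raw tunneled
// connection back to gRPC, which runs its HTTP/2 traffic over it.
func connectDialer(proxyAddr string) func(ctx context.Context, target string) (net.Conn, error) {
    return func(ctx context.Context, target string) (net.Conn, error) {
        conn, err := (&net.Dialer{}).DialContext(ctx, "tcp", proxyAddr)
        if err != nil {
            return nil, err
        }
        // Ask the egress proxy to open a raw TCP tunnel to the gRPC server.
        req := &http.Request{
            Method: http.MethodConnect,
            URL:    &url.URL{Opaque: target},
            Host:   target,
        }
        if err := req.Write(conn); err != nil {
            conn.Close()
            return nil, err
        }
        resp, err := http.ReadResponse(bufio.NewReader(conn), req)
        if err != nil {
            conn.Close()
            return nil, err
        }
        resp.Body.Close()
        if resp.StatusCode != http.StatusOK {
            conn.Close()
            return nil, fmt.Errorf("proxy CONNECT failed: %s", resp.Status)
        }
        return conn, nil
    }
}

func main() {
    // Placeholder addresses; in practice these come from configuration.
    cc, err := grpc.Dial("proxy-server.example:8091",
        grpc.WithInsecure(), // TLS omitted for brevity
        grpc.WithContextDialer(connectDialer("egress-proxy.example:3128")))
    if err != nil {
        panic(err)
    }
    defer cc.Close()
}

Whether a given egress proxy keeps the long-lived HTTP/2 stream open is a separate question; the sketch only shows the dialer wiring.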

Support SSH Tunnels

Part of our goal here is to allow SSH tunnels to be removed from the KAS. Supporting SSH tunnels would give existing SSH tunnel users a smoother migration path.

Use channel instead of mutex for performance improvement

We use a mutex to protect the backend stream in proxy server.

Same in the proxy agent.

We need to investigate whether there is a more efficient implementation using channels. Note that both cases are N producers and 1 consumer. Stopping the pipeline is tricky, because (1) a channel should be closed by the producer, not the consumer, and (2) on the other hand, it's difficult to let one of the N producers be the one to close the channel.
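A self-contained sketch (not repo code) of the N-producers/1-consumer shape; one common workaround for the "who closes the channel" problem is a separate stop channel plus a WaitGroup, so no individual producer has to close the data channel:

package main

import (
    "fmt"
    "sync"
)

func main() {
    data := make(chan string)
    stop := make(chan struct{})
    var wg sync.WaitGroup

    for i := 0; i < 3; i++ { // N producers
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            for {
                select {
                case <-stop: // consumer asked everyone to stop
                    return
                case data <- fmt.Sprintf("packet from producer %d", id):
                }
            }
        }(i)
    }

    // The data channel is closed by a dedicated goroutine once all producers
    // have exited, never by an individual producer.
    go func() {
        wg.Wait()
        close(data)
    }()

    // Single consumer: read a few packets, then signal shutdown and drain.
    for i := 0; i < 5; i++ {
        fmt.Println(<-data)
    }
    close(stop)
    for range data {
        // drain until all producers have finished
    }
}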

Clean up go.sum

We have many versions of some modules in our go.sum. This file should be trimmed down; since this is a very early repository, we should ideally have only one version of each module.

`kubectl exec` failure with httpConnect mode

The failure message emitted by the kube-apiserver is: "error dialing backend: EOF".

With instrumentation, it shows it failed at this line: https://github.com/kubernetes/kubernetes/blob/b5b675491b69b5d48bf112a896bc739e500c7275/staging/src/k8s.io/apimachinery/pkg/util/proxy/dial.go#L85

The TLS handshake received the "EOF" error.

At this line, the tunnel to the kubelet has already been established; the kube-apiserver is trying to do the TLS handshake with the kubelet over the tunnel, and that handshake fails.

This doesn't happen if the proxy runs in gRPC mode.

I don't know if it's related to #80.

Handle agent disconnects for PendingDial

When an agent disconnects, #125 closes all client-side connections that use the corresponding agent. However, PendingDial requests may still be in flight and not yet added to the list of clients. We should either fail them or retry with a different agent instead of letting the client hit its dial timeout.

Original context from @cheftako:

Most of the time I would expect pending dial to be empty. However, if there is something in there, there is a chance its request went out via this backend. If so, we will never get the response, and that also needs to be dealt with.

The issue is that we do not record in the pending data structure which backend was used, so we cannot tell whether anything on the pending list would be affected by a given backend breaking. We also need to work out how to deal with it. One option would be to just fail, which is probably the easiest. However, as the connection has not yet been established, we should be able to switch to a different backend.
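A sketch of one possible fix, using hypothetical names (pendingDial, failPendingDialsFor) rather than the repo's actual structures: record the agent ID in the pending entry so the server can fail those dials as soon as that backend disconnects:

package server

import (
    "fmt"
    "sync"
)

type pendingDial struct {
    agentID string     // backend the DIAL_REQ was routed to
    errCh   chan error // buffered; the waiting frontend reads from this
}

type pendingDials struct {
    mu      sync.Mutex
    pending map[int64]pendingDial // keyed by dial/connection ID
}

// failPendingDialsFor is called when a backend disconnects: every pending
// dial routed to that backend gets an immediate error instead of letting the
// client hit its dial timeout. Retrying on a different backend would hook in
// here as well.
func (p *pendingDials) failPendingDialsFor(agentID string) {
    p.mu.Lock()
    defer p.mu.Unlock()
    for id, pd := range p.pending {
        if pd.agentID == agentID {
            pd.errCh <- fmt.Errorf("agent %s disconnected before DIAL_RSP", agentID)
            delete(p.pending, id)
        }
    }
}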

Test failing with "not enough arguments in call to tunnel.serve"

Fedora Rawhide, with the latest stable Go:

Testing    in: /builddir/build/BUILD/apiserver-network-proxy-0.0.10/_build/src
         PATH: /builddir/build/BUILD/apiserver-network-proxy-0.0.10/_build/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/sbin
       GOPATH: /builddir/build/BUILD/apiserver-network-proxy-0.0.10/_build:/usr/share/gocode
  GO111MODULE: off
      command: go test -buildmode pie -compiler gc -ldflags "-X sigs.k8s.io/apiserver-network-proxy/version=0.0.10 -extldflags '-Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld '"
      testing: sigs.k8s.io/apiserver-network-proxy
sigs.k8s.io/apiserver-network-proxy/konnectivity-client/pkg/client
# sigs.k8s.io/apiserver-network-proxy/konnectivity-client/pkg/client [sigs.k8s.io/apiserver-network-proxy/konnectivity-client/pkg/client.test]
./client_test.go:43:17: not enough arguments in call to tunnel.serve
	have ()
	want (*grpc.ClientConn)
./client_test.go:73:17: not enough arguments in call to tunnel.serve
	have ()
	want (*grpc.ClientConn)
./client_test.go:130:17: not enough arguments in call to tunnel.serve
	have ()
	want (*grpc.ClientConn)
FAIL	sigs.k8s.io/apiserver-network-proxy/konnectivity-client/pkg/client [build failed]

Vague errors when running in SA auth mode while the cluster CA cert is set

What Happened
I was testing out service account authentication with authentication-audience and for some reason it was not working (the connection was being closed on the client side). I was using the following flags:

          - --cluster-ca-cert=ca.crt
          - --cluster-cert=konnectivity-server.crt
          - --cluster-key=konnectivity-server.key
          - --agent-namespace=namespace
          - --agent-service-account=konnectivity-agent
          - --kubeconfig=kubeconfig
          - --authentication-audience=system:konnectivity-server
          - --mode=http-connect

As I was switching between SA auth mode and certificates, the problem was that the --cluster-ca-cert flag was set at the same time as the SA auth flags (authentication-audience, agent-namespace, ...), which probably led to the konnectivity server running in certificate auth mode.

What I expect
If I set agent-service-account, then cluster cert auth validation should not be used; or at the very least, validation should fail with a clear error telling me to set the right flags.
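A minimal sketch of such a validation, with hypothetical option names (not the server's actual options struct): reject mutually exclusive auth flags at startup instead of silently falling back to certificate auth mode:

package main

import "fmt"

type options struct {
    clusterCACert       string
    agentServiceAccount string
    authAudience        string
}

func (o *options) validate() error {
    saAuth := o.agentServiceAccount != "" || o.authAudience != ""
    if saAuth && o.clusterCACert != "" {
        return fmt.Errorf("--cluster-ca-cert cannot be used together with service account authentication flags (--agent-service-account=%q, --authentication-audience=%q); choose either certificate auth or SA auth",
            o.agentServiceAccount, o.authAudience)
    }
    return nil
}

func main() {
    o := &options{clusterCACert: "ca.crt", agentServiceAccount: "konnectivity-agent"}
    if err := o.validate(); err != nil {
        fmt.Println(err) // clear error instead of silently running in cert auth mode
    }
}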

Network Proxy hangs when large amount of data is transferred

This was observed with kubectl cp on a large binary file. I attempted to run kubectl cp on a 57M file.

To reproduce, start a Kubernetes cluster with network proxy enabled:

KUBE_ENABLE_EGRESS_VIA_KONNECTIVITY_SERVICE=true ./cluster/kube-up.sh

SSH onto the master and try copying the kubectl binary to a random pod:

Eg:

kubectl apply -f https://k8s.io/examples/application/shell-demo.yaml
kubectl cp $(which kubectl) shell-demo:kubectl

The konnectivity-server.log file shows data being transferred, but the transfer stops partway through and hangs.

EDIT: This can be reproduced without Kubernetes:

  1. Build network proxy with certs & protos generated. (make build && make gen && make certs)
  2. Copy the kubectl binary (or a similar sized file) to the network proxy dir. (cp $(which kubectl) .)
  3. Run 4 terminals:

HTTP Connect mode fails:

./bin/proxy-server --uds-name=konnectivity-server.socket --mode=http-connect --cluster-cert=certs/agent/issued/proxy-master.crt --cluster-key=certs/agent/private/proxy-master.key --server-port=
./bin/proxy-agent --ca-cert=certs/agent/issued/ca.crt --agent-cert=certs/agent/issued/proxy-agent.crt --agent-key=certs/agent/private/proxy-agent.key
python -m SimpleHTTPServer 8001
./bin/proxy-test-client --proxy-uds=konnectivity-server.socket --proxy-host= --proxy-port=0 --mode=http-connect --request-port=8001 --request-host=localhost --request-path=/kubectl

gRPC mode also fails:

./bin/proxy-server --uds-name=konnectivity-server.socket --mode=grpc --cluster-cert=certs/agent/issued/proxy-master.crt --cluster-key=certs/agent/private/proxy-master.key --server-port=
./bin/proxy-agent --ca-cert=certs/agent/issued/ca.crt --agent-cert=certs/agent/issued/proxy-agent.crt --agent-key=certs/agent/private/proxy-agent.key
python -m SimpleHTTPServer 8001
./bin/proxy-test-client --proxy-uds=konnectivity-server.socket --mode=grpc --proxy-host= --proxy-port=0 --request-port=8001 --request-host=localhost --request-path=/kubectl

[GRPC Mode] Client -> Proxy connections should be closed when finished

When running in gRPC mode, client -> proxy UDS connections are not closed after the (client -> proxy server -> agent -> destination) connection is closed. After a CLOSE_RSP, it seems that we only remove the frontend, but never call close on the underlying gRPC stream between the client and the proxy. This causes resource leaks.

Should be an easy fix though.

After just a couple of operations on the master, the number of opened streams can get quite high. This does not happen in http-connect mode.

jying@kubernetes-master ~ $ netstat | grep konnectivity-server.socket | wc -l
204
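One way to plug the leak, sketched with hypothetical names (the grpcTunnel fields and closeTunnel are assumptions, not the actual konnectivity-client code): once the CLOSE_RSP for the last connection has been handled, tear down the gRPC stream and the underlying connection instead of only dropping the frontend.

package client

import (
    "sync"

    "google.golang.org/grpc"
)

type grpcTunnel struct {
    stream interface{ CloseSend() error } // the bidi Proxy stream
    conn   *grpc.ClientConn               // UDS connection to the proxy server
    once   sync.Once
}

// closeTunnel is invoked after the CLOSE_RSP for the final conn is processed.
func (t *grpcTunnel) closeTunnel() {
    t.once.Do(func() {
        // Half-close our side of the stream, then drop the gRPC connection so
        // the unix socket to the proxy server is actually released.
        _ = t.stream.CloseSend()
        _ = t.conn.Close()
    })
}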

/cc @caesarxuchao

Reuse GRPC tunnel when dialing from the client

Currently, we create a new grpcTunnel for every client connection to the proxy server (from the same client). This has performance implications, since we can have many concurrent connections all using different tunnels, with each tunnel creating only one stream to the proxy server. We should investigate reusing a single tunnel and multiplexing all streams through it instead of creating a new tunnel for each connection.
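A sketch of the direction this could take, using a hypothetical sharedTunnel helper (none of these names exist in the client today): keep one tunnel per proxy address and hand the same dialer to every caller, so all streams multiplex over a single tunnel.

package client

import (
    "context"
    "net"
    "sync"
)

type dialer interface {
    DialContext(ctx context.Context, protocol, address string) (net.Conn, error)
}

var (
    mu      sync.Mutex
    tunnels = map[string]dialer{} // keyed by proxy address
)

// sharedTunnel returns the existing tunnel for proxyAddr, creating it once.
// newTunnel stands in for whatever actually builds the gRPC tunnel.
func sharedTunnel(proxyAddr string, newTunnel func(string) (dialer, error)) (dialer, error) {
    mu.Lock()
    defer mu.Unlock()
    if t, ok := tunnels[proxyAddr]; ok {
        return t, nil
    }
    t, err := newTunnel(proxyAddr)
    if err != nil {
        return nil, err
    }
    tunnels[proxyAddr] = t
    return t, nil
}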

/cc @caesarxuchao

Implement a local/non-agent option for the proxy server.

Currently the proxy-server attempts to forward all connection requests from the client to the proxy-agent. It would be useful to give the proxy server a setting where it puts the traffic on a local network connection directly. This would allow us to firewall off the KAS so it could ONLY connect to the proxy-server(s). The relevant proxy-server could then place traffic locally for things like connecting to the etcd server.

Clean up UDS file before listening

When the proxy server restarts, if the UDS file already exists, trying to listen on the socket fails with "bind: address already in use". The server should delete the file if it exists and then listen on the socket.
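A minimal sketch of the intended behavior (the socket path is illustrative):

package main

import (
    "log"
    "net"
    "os"
)

// listenUDS removes a stale socket file (if any) before listening, so a
// restarted server doesn't fail with "bind: address already in use".
func listenUDS(path string) (net.Listener, error) {
    // Ignore "not exist"; any other removal problem will surface via Listen.
    if err := os.Remove(path); err != nil && !os.IsNotExist(err) {
        log.Printf("failed to remove stale socket %s: %v", path, err)
    }
    return net.Listen("unix", path)
}

func main() {
    l, err := listenUDS("konnectivity-server.socket")
    if err != nil {
        log.Fatal(err)
    }
    defer l.Close()
}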

cc @Jefftree

Panic when no backends are available

2020/03/26 16:16:41 http: panic serving @: runtime error: invalid memory address or nil pointer dereference
goroutine 1266 [running]:
net/http.(*conn).serve.func1(0xc0003420a0)
	/usr/local/go/src/net/http/server.go:1767 +0x139
panic(0x133c6a0, 0x206ac00)
	/usr/local/go/src/runtime/panic.go:679 +0x1b2
sigs.k8s.io/apiserver-network-proxy/pkg/server.(*Tunnel).ServeHTTP(0xc00000e800, 0x16be0c0, 0xc0004a81c0, 0xc0003f8100)
	/go/src/sigs.k8s.io/apiserver-network-proxy/pkg/server/tunnel.go:81 +0x663
net/http.serverHandler.ServeHTTP(0xc0001c01c0, 0x16be0c0, 0xc0004a81c0, 0xc0003f8100)
	/usr/local/go/src/net/http/server.go:2802 +0xa4
net/http.(*conn).serve(0xc0003420a0, 0x16c1f40, 0xc0000a8480)
	/usr/local/go/src/net/http/server.go:1890 +0x875
created by net/http.(*Server).Serve
	/usr/local/go/src/net/http/server.go:2927 +0x38e
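The panic suggests ServeHTTP dereferences a backend that was never found. A sketch of the guard, with a hypothetical Tunnel/backend shape rather than the repo's actual types: answer the CONNECT request with 503 instead of panicking when no agent is connected.

package server

import "net/http"

type backend interface {
    Send(data []byte) error
}

type Tunnel struct {
    getBackend func() backend // returns nil when no agent is connected
}

func (t *Tunnel) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    b := t.getBackend()
    if b == nil {
        http.Error(w, "no proxy agent is connected", http.StatusServiceUnavailable)
        return
    }
    // ... proceed with the CONNECT handling using b ...
}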

Document release procedure

Our release process is a bit different from normal since we have a multi-module Go repository. We should document the release procedure for a new version (tags) as well as the release image pipeline to k8s.gcr.io.

/cc @caesarxuchao

Validation should not ignore errors that are not os.IsNotExist

We have this pattern in many places in the repo:

if _, err := os.Stat(o.serverCert); os.IsNotExist(err) {
    return fmt.Errorf("error checking server cert %s, got %v", o.serverCert, err)
}

We shouldn't ignore other kinds of errors.

Walter mentioned that the code may be like this because os.Stat returns an error if the file is a symlink, which I checked is not the case on my Ubuntu machine. In any case, even if there is a kind of error we want to ignore, we should whitelist it.
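A sketch of the stricter pattern the issue asks for, keeping "not exist" as an explicit case and surfacing every other os.Stat error instead of ignoring it:

if _, err := os.Stat(o.serverCert); err != nil {
    if os.IsNotExist(err) {
        return fmt.Errorf("server cert %s does not exist: %v", o.serverCert, err)
    }
    return fmt.Errorf("error checking server cert %s: %v", o.serverCert, err)
}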

Issue with tag konnectivity-client/v0.0.5

There seems to be an issue with the tag konnectivity-client/v0.0.5. The _GIT_TAG given to the prow job building the image is v20200218-konnectivity-client/v0.0.5-3-gef0d890, which fails the Docker build since it's not a valid tag.

If we want to include the component name in the tag, I guess it should be done in a Docker-compatible way, like konnectivity-client-v0.0.5.

/cc @Jefftree

`make gen` doesn't work

  1. agent.pb.go, generated by protoc -I . proto/agent/agent.proto --go_out=plugins=grpc:${GOPATH}/src, is placed at $GOPATH/src/sigs.k8s.io/apiserver-network-proxy/proto/agent. But cat hack/go-license-header.txt proto/agent/agent.pb.go > proto/agent/agent.licensed.go tries to read it from the current directory, which doesn't work outside the $GOPATH.

  2. make gen requires golang/mock, which is not mentioned in the README.

"DATA send to client stream error" in httpConnect mode

I noticed there are many such warnings in the log. The warning is printed here:

klog.Warningf("<<< DATA send to client stream error: %v", err)

The connection is closed here, after receiving EOF from the client connection:

It seems this usually was a premature close: there was still data the backend wanted to send to the client, hence the many "DATA send to client stream error" warnings.

I expected this to cause major problems, but it didn't. Why?

/cc @Jefftree @anfernee @cheftako

DATA to Backend failed, after agent restart

Hello,
First of all, I would like to thank the apiserver-network-proxy team for your amazing work :)

I use Konnectivity version 0.0.10 and Kubernetes 1.18.5. Most of the time everything works flawlessly, but when I restart the agent's pods, konnectivity-server doesn't switch traffic to the new connections:

I0710 13:11:42.881321       1 server.go:293] >>> DATA sent to backend
I0710 13:11:42.952402       1 server.go:498] <<< Received 372 bytes of DATA from agentID d3c1f7ef-4730-4300-964e-2e382af9e484, connID 1
I0710 13:11:42.961352       1 server.go:278] >>> Received 42 bytes of DATA(id=1)
I0710 13:11:42.961521       1 server.go:293] >>> DATA sent to backend
I0710 13:11:47.641175       1 server.go:418] Connect request from agent 3adda35e-94ab-46ae-bd6c-96a8197ff651
I0710 13:11:47.641222       1 backend_manager.go:99] register Backend &{0xc00050e180} for agentID 3adda35e-94ab-46ae-bd6c-96a8197ff651
W0710 13:11:47.909837       1 server.go:451] stream read error: rpc error: code = Canceled desc = context canceled
I0710 13:11:47.909895       1 backend_manager.go:119] remove Backend &{0xc000222480} for agentID 1bf0e057-b131-45b9-bbcb-ab752e9dfef4
I0710 13:11:47.909940       1 server.go:531] <<< Close backend &{0xc000222480} of agent 1bf0e057-b131-45b9-bbcb-ab752e9dfef4
W0710 13:11:51.898482       1 server.go:451] stream read error: rpc error: code = Canceled desc = context canceled
I0710 13:11:51.898535       1 backend_manager.go:119] remove Backend &{0xc0002223c0} for agentID d3c1f7ef-4730-4300-964e-2e382af9e484
I0710 13:11:51.898592       1 server.go:531] <<< Close backend &{0xc0002223c0} of agent d3c1f7ef-4730-4300-964e-2e382af9e484
I0710 13:11:55.825758       1 server.go:278] >>> Received 53 bytes of DATA(id=3)
W0710 13:11:55.825884       1 server.go:291] >>> DATA to Backend failed: rpc error: code = Unavailable desc = transport is closing
I0710 13:11:55.825914       1 server.go:293] >>> DATA sent to backend
I0710 13:11:55.849556       1 server.go:278] >>> Received 53 bytes of DATA(id=3)
W0710 13:11:55.849659       1 server.go:291] >>> DATA to Backend failed: rpc error: code = Unavailable desc = transport is closing
I0710 13:11:55.849691       1 server.go:293] >>> DATA sent to backend
I0710 13:11:56.097642       1 server.go:418] Connect request from agent 8fe505d3-11a8-4f63-9882-8a963ab16273
I0710 13:11:56.097671       1 backend_manager.go:99] register Backend &{0xc00050e240} for agentID 8fe505d3-11a8-4f63-9882-8a963ab16273
I0710 13:11:57.588222       1 server.go:418] Connect request from agent bf37dd3c-8736-484a-b9e6-6806b4b9338a
I0710 13:11:57.588257       1 backend_manager.go:99] register Backend &{0xc00072c300} for agentID bf37dd3c-8736-484a-b9e6-6806b4b9338a
I0710 13:12:00.827480       1 server.go:278] >>> Received 42 bytes of DATA(id=3)
W0710 13:12:00.827593       1 server.go:291] >>> DATA to Backend failed: rpc error: code = Unavailable desc = transport is closing
I0710 13:12:00.827625       1 server.go:293] >>> DATA sent to backend
I0710 13:12:00.850981       1 server.go:278] >>> Received 42 bytes of DATA(id=3)
W0710 13:12:00.851056       1 server.go:291] >>> DATA to Backend failed: rpc error: code = Unavailable desc = transport is closing
I0710 13:12:00.851075       1 server.go:293] >>> DATA sent to backend
I0710 13:12:00.894264       1 server.go:278] >>> Received 53 bytes of DATA(id=3)
W0710 13:12:00.894345       1 server.go:291] >>> DATA to Backend failed: rpc error: code = Unavailable desc = transport is closing
I0710 13:12:00.894376       1 server.go:293] >>> DATA sent to backend
I0710 13:12:05.953939       1 server.go:278] >>> Received 42 bytes of DATA(id=3)
W0710 13:12:05.954051       1 server.go:291] >>> DATA to Backend failed: rpc error: code = Unavailable desc = transport is closing
I0710 13:12:05.954077       1 server.go:293] >>> DATA sent to backend

After a konnectivity-server restart, everything goes back to normal.

Consider setting proxy server healthz to "not ok" if there is no backend connection for a long time

The /healthz response controls whether kubelet restarts a pod. If no proxy agent has connected to the proxy server for a long time, it may well be a server-side problem, and restarting the server might fix it. So it may be worth setting healthz to "not ok" if there has been no backend connection for a long time.

This is all hypothetical so far. If we see evidence in production, we will come back and add this health condition.

Ref #102 (comment).
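A minimal sketch of what such a check could look like; backendTracker, the grace period, and the port are illustrative assumptions, not part of this repo:

package main

import (
    "fmt"
    "net/http"
    "sync"
    "time"
)

type backendTracker struct {
    mu           sync.Mutex
    backends     int
    lastNonEmpty time.Time // last time at least one backend was connected
}

// healthz reports "not ok" if no backend has been connected for longer than
// the grace period, so kubelet can restart the server pod.
func (b *backendTracker) healthz(grace time.Duration) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        b.mu.Lock()
        defer b.mu.Unlock()
        if b.backends == 0 && time.Since(b.lastNonEmpty) > grace {
            http.Error(w, "no backend connection", http.StatusServiceUnavailable)
            return
        }
        fmt.Fprintln(w, "ok")
    }
}

func main() {
    t := &backendTracker{lastNonEmpty: time.Now()}
    http.HandleFunc("/healthz", t.healthz(5*time.Minute))
    _ = http.ListenAndServe(":8092", nil)
}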
