linkerd / linkerd-tcp

A TCP/TLS load balancer for Linkerd 1.x.

Home Page: https://linkerd.io

License: Apache License 2.0

Rust 98.29% Shell 1.06% sed 0.64%
tcp linkerd tls tokio rust service-mesh load-balancer

linkerd-tcp's Introduction

linkerd-tcp

A TCP load balancer for the linkerd service mesh.

Status: beta


Features

  • Lightweight, native TCP and TLS load balancer built on tokio.
    • Weighted-least-loaded P2C load balancing.
    • Minimal resource utilization: typically <.5 cores with ~2MB RSS.
  • Tightly integrated with the linkerd service mesh.
    • Supports endpoint weighting (i.e. for "red line" testing).
  • Modern Transport Layer Security via rustls:
    • TLS1.2 and TLS1.3 (draft 18) only.
    • ECDSA or RSA server authentication by clients.
    • RSA server authentication by servers.
    • Forward secrecy using ECDHE, with curve25519, nistp256, or nistp384 curves.
    • AES128-GCM and AES256-GCM bulk encryption, with safe nonces.
    • Chacha20Poly1305 bulk encryption.
    • ALPN support.
    • SNI support.

Quickstart

  1. Install Rust and Cargo.
  2. Run namerd. ./namerd.sh fetches, configures, and runs namerd using a local-fs-backed discovery (in ./tmp.discovery).
  3. From this repository, run: cargo run -- example.yml

We ❤️ pull requests! See CONTRIBUTING.md for info on contributing changes.

Usage

linkerd-tcp 0.1.0
A native TCP proxy for the linkerd service mesh

USAGE:
    linkerd-tcp <PATH>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

ARGS:
    <PATH>    Config file path

Example configuration

# Administrative control endpoints are exposed on a dedicated HTTP server. Endpoints
# include:
# - /metrics -- produces a snapshot of metrics formatted for prometheus.
# - /shutdown -- POSTing to this endpoint initiates graceful shutdown.
# - /abort -- POSTing to this endpoint terminates the process immediately.
admin:
  port: 9989

  # By default, the admin server listens only on localhost. We can force it to bind
  # on all interfaces by overriding the IP.
  ip: 0.0.0.0

  # Metrics are snapshotted at a fixed interval of 10s.
  metricsIntervalSecs: 10

# A process exposes one or more 'routers'. Routers connect server traffic to
# load balancers.
routers:

  # Each router has a 'label' for reporting purposes.
  - label: default

    # Each router is configured to resolve names.
    # Currently, only namerd's HTTP interface is supported:
    interpreter:
      kind: io.l5d.namerd.http
      baseUrl: http://localhost:4180
      namespace: default
      periodSecs: 20

    servers:

      # Each router has one or more 'servers' listening for incoming connections.
      # By default, routers listen on localhost. You need to specify a port.
      - port: 7474
        dstName: /svc/default
        # You can limit the amount of time that a server will wait to obtain a
        # connection from the router.
        connectTimeoutMs: 500

      # Each server listens on 'localhost' by default, to avoid exposing an open
      # relay. Servers may be configured to listen on a specific local address or
      # on all local addresses (0.0.0.0).
      - port: 7575
        ip: 0.0.0.0
        # Note that each server may route to a different destination through a
        # single router:
        dstName: /svc/google
        # Servers may be configured to perform a TLS handshake.
        tls:
          defaultIdentity:
            privateKey: private.pem
            certs:
              - cert.pem
              - ../eg-ca/ca/intermediate/certs/ca-chain.cert.pem

    # Clients may also be configured to perform a TLS handshake.
    client:
      kind: io.l5d.static
      # We can also apply linkerd-style per-client configuration:
      configs:
        - prefix: /svc/google
          connectTimeoutMs: 400
          # Require that the downstream connection be TLS'd, with a
          # `subjectAltName` including the DNS name _www.google.com_
          # using either our local CA or the host's default openssl
          # certificate.
          tls:
            dnsName: "www.google.com"
            trustCerts:
              - ../eg-ca/ca/intermediate/certs/ca-chain.cert.pem
              - /usr/local/etc/openssl/cert.pem

Logging

Logging may be enabled by setting RUST_LOG=linkerd_tcp=info in the environment. When debugging, set RUST_LOG=trace for full output (for example, RUST_LOG=trace cargo run -- example.yml).

Docker

To build the linkerd/linkerd-tcp docker image, run:

./dockerize latest

Replace latest with the version that you want to build.

Try running the image with:

docker run -v `pwd`/example.yml:/example.yml linkerd/linkerd-tcp:latest /example.yml

Code of Conduct

This project is for everyone. We ask that our users and contributors take a few minutes to review our code of conduct.

License

Copyright 2017-2018 Linkerd-TCP authors. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use these files except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

linkerd-tcp's People

Contributors

aochagavia, blitline-dev, briansmith, clemensw, hawkw, klingerf, moderation, olix0r, pcalcado, tamird, wmorgan


linkerd-tcp's Issues

`/shutdown` on the admin port no longer results in the process shutting down

$ curl -X POST -vvvv  http://127.0.0.1:9989/shutdown
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 9989 (#0)
> POST /shutdown HTTP/1.1
> Host: 127.0.0.1:9989
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Length: 0
< Date: Tue, 11 Jul 2017 22:50:51 GMT
<
* Curl_http_done: called premature == 0
* Connection #0 to host 127.0.0.1 left intact

A few minutes later, the process is still running.

Add an option to proxy namerd configuration to use NODE_NAME as url config

I set up linkerd-tcp to run on my cluster alongside Linkerd in a CNI configuration as a DaemonSet. On my Kubernetes version (1.5.1), using hostNetwork: true doesn't give access to Kubernetes's DNS options (for example, namerd.default.svc.cluster.local). A solution would be to expose the web port from Namerd as a NodePort service; however, this would require the downward API env variable "NODE_NAME" in order to reach the Namerd NodePort service. The problem is that the downward API isn't available for ConfigMap resources.

A workaround was to hardcode a LoadBalancer service URL as the namerd url config for Linkerd-TCP.

I think a configuration option that allows the url to be built from the "NODE_NAME" env variable would solve this more elegantly for running Linkerd-TCP as a DaemonSet with CNI. Maybe a "fromNodePort" option or something similar.

Of course, this might not be needed with Kubernetes 1.6, which added a "ClusterFirstWithHostNet" dnsPolicy option that keeps DNS settings even with hostNetwork: true. I haven't upgraded yet, so I haven't played with that option.

Add more client stats

Debugging the mystery of the failing iperf tests would have been easier had /metrics shown me that there were other clients connected and where they were connected from.

linkerd-tcp crashes ungracefully from 'too many open files'

Problem

Hitting 'too many open files' leaves linkerd-tcp in a bad state rather than making it exit gracefully.

Symptoms:

In a shell with ulimit -n 1024, if I open 500 connections (which implies 500 outgoing connections), linkerd-tcp prints an error to stdout and stops processing incoming connections or requests from existing connections. Closing some incoming connections allows connects to succeed again, but those new sockets never have their requests processed by linkerd-tcp.

$ RUST_LOG=error RUST_BACKTRACE=yes ./linkerd-tcp-1490585634 example.yml
Listening on http://127.0.0.1:9989.


thread 'main' panicked at 'could not run proxies: Error { repr: Os { code: 24, message: "Too many open files" } }', /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/result.rs:868
stack backtrace:
   1:     0x5586a7e4e7ac - std::sys::imp::backtrace::tracing::imp::write::hf33ae72d0baa11ed
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:42
   2:     0x5586a7e51abe - std::panicking::default_hook::{{closure}}::h59672b733cc6a455
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panicking.rs:351
   3:     0x5586a7e516c4 - std::panicking::default_hook::h1670459d2f3f8843
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panicking.rs:367
   4:     0x5586a7e51f5b - std::panicking::rust_panic_with_hook::hcf0ddb069e7beee7
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panicking.rs:555
   5:     0x5586a7e51df4 - std::panicking::begin_panic::hd6eb68e27bdf6140
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panicking.rs:517
   6:     0x5586a7e51d19 - std::panicking::begin_panic_fmt::hfea5965948b877f8
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panicking.rs:501
   7:     0x5586a7e51ca7 - rust_begin_unwind
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panicking.rs:477
   8:     0x5586a7e7e34d - core::panicking::panic_fmt::hc0f6d7b2c300cdd9
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/panicking.rs:69
   9:     0x5586a7bb6642 - core::result::unwrap_failed::h52f3f53af574d319
  10:     0x5586a7bbbf41 - linkerd_tcp::main::h2f95da4c40bc36fe
  11:     0x5586a7e58f7a - __rust_maybe_catch_panic
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libpanic_unwind/lib.rs:98
  12:     0x5586a7e526c6 - std::rt::lang_start::hd7c880a37a646e81
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panicking.rs:436
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panic.rs:361
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/rt.rs:57
  13:     0x7fa60f0823f0 - __libc_start_main
  14:     0x5586a7bb4f68 - <unknown>
  15:                0x0 - <unknown>

Improve socket reuse to avoid using too many file descriptors

If I run a slow_cooker with 500 clients, we can run out of file descriptors quickly.

slow_cooker -host "perf-cluster" -qps 20 -concurrency 500 -interval 10s http://proxy-test-4d:7474

results in a panic.

$ RUST_LOG=error RUST_BACKTRACE=yes ./linkerd-tcp-1490585634 example.yml
Listening on http://127.0.0.1:9989.
thread 'main' panicked at 'could not run proxies: Error { repr: Os { code: 24, message: "Too many open files" } }', /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/result.rs:868
stack backtrace:
   1:     0x557763c6f7ac - std::sys::imp::backtrace::tracing::imp::write::hf33ae72d0baa11ed
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:42
   2:     0x557763c72abe - std::panicking::default_hook::{{closure}}::h59672b733cc6a455
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panicking.rs:351
   3:     0x557763c726c4 - std::panicking::default_hook::h1670459d2f3f8843
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panicking.rs:367
   4:     0x557763c72f5b - std::panicking::rust_panic_with_hook::hcf0ddb069e7beee7
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panicking.rs:555
   5:     0x557763c72df4 - std::panicking::begin_panic::hd6eb68e27bdf6140
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panicking.rs:517
   6:     0x557763c72d19 - std::panicking::begin_panic_fmt::hfea5965948b877f8
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panicking.rs:501
   7:     0x557763c72ca7 - rust_begin_unwind
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panicking.rs:477
   8:     0x557763c9f34d - core::panicking::panic_fmt::hc0f6d7b2c300cdd9
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/panicking.rs:69
   9:     0x5577639d7642 - core::result::unwrap_failed::h52f3f53af574d319
  10:     0x5577639dcf41 - linkerd_tcp::main::h2f95da4c40bc36fe
  11:     0x557763c79f7a - __rust_maybe_catch_panic
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libpanic_unwind/lib.rs:98
  12:     0x557763c736c6 - std::rt::lang_start::hd7c880a37a646e81
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panicking.rs:436
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panic.rs:361
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/rt.rs:57
  13:     0x7fe85559a3f0 - __libc_start_main
  14:     0x5577639d5f68 - <unknown>
  15:                0x0 - <unknown>

Can't test linkerd-tcp TLS server with curl or tls-scan

Attempting to test linkerd-tcp with curl has not been working for me.

Here's my configuration (I chose 443 to work with tls-scan more easily). I generated the certs with eg-ca.

$ cat with_tls.yml
admin:
  addr: 0.0.0.0:9989
  metricsIntervalSecs: 10

proxies:
  - label: default
    servers:
      - kind: io.l5d.tcp
        addr: 0.0.0.0:7474
      - kind: io.l5d.tls
        addr: 0.0.0.0:443
        identities:
          localhost:
            privateKey: ../eg-ca/test.fruitless.org.tls/private.pem
            certs:
              - ../eg-ca/test.fruitless.org.tls/cert.pem
              - ../eg-ca/test.fruitless.org.tls/ca-chain.cert.pem
    namerd:
      url: http://127.0.0.1:4180
      path: /svc/default
      intervalSecs: 5
$ curl --no-alpn -vvvv -k https://localhost:7575
* Rebuilt URL to: https://localhost:7575/
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 7575 (#0)
* found 173 certificates in /etc/ssl/certs/ca-certificates.crt
* found 692 certificates in /etc/ssl/certs
^C

Eventually it times out if I don't exit it.

Here's the trace log output and some commentary after digging through the source.

TRACE:linkerd_tcp::lb::socket                   : SecureServerHandshake(Some(SecureSocket(V4(127.0.0.1:48320)))).poll()
TRACE:linkerd_tcp::lb::socket                   : read_tcp_to_session: read_tls: 127.0.0.1:48320
TRACE:linkerd_tcp::lb::socket                   : read_tcp_to_session: read_tls: 127.0.0.1:48320 256B
TRACE:linkerd_tcp::lb::socket                   : read_tcp_to_session: process_new_packets: 127.0.0.1:48320
DEBUG:rustls::server_hs                         : we got a clienthello ClientHelloPayload { client_version: TLSv1_2, random: Random([89, 30, 5, 57, 172, 204, 181, 20, 196, 176, 98, 54, 211, 162, 223, 205, 205, 88, 49, 232, 128, 150, 200, 195, 109, 66, 92, 211, 185, 119, 112, 28]), session_id: SessionID, cipher_suites: [TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_ECDSA_WITH_CAMELLIA_256_GCM_SHA384, TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256, Unknown(49325), TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA, TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384, TLS_ECDHE_ECDSA_WITH_CAMELLIA_256_CBC_SHA384, TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_ECDSA_WITH_CAMELLIA_128_GCM_SHA256, Unknown(49324), TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA, TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256, TLS_ECDHE_ECDSA_WITH_CAMELLIA_128_CBC_SHA256, TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_RSA_WITH_CAMELLIA_256_GCM_SHA384, TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384, TLS_ECDHE_RSA_WITH_CAMELLIA_256_CBC_SHA384, TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_CAMELLIA_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256, TLS_ECDHE_RSA_WITH_CAMELLIA_128_CBC_SHA256, TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA, TLS_RSA_WITH_AES_256_GCM_SHA384, TLS_RSA_WITH_CAMELLIA_256_GCM_SHA384, TLS_RSA_WITH_AES_256_CCM, TLS_RSA_WITH_AES_256_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA256, TLS_RSA_WITH_CAMELLIA_256_CBC_SHA, TLS_RSA_WITH_CAMELLIA_256_CBC_SHA256, TLS_RSA_WITH_AES_128_GCM_SHA256, TLS_RSA_WITH_CAMELLIA_128_GCM_SHA256, TLS_RSA_WITH_AES_128_CCM, TLS_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_128_CBC_SHA256, TLS_RSA_WITH_CAMELLIA_128_CBC_SHA, TLS_RSA_WITH_CAMELLIA_128_CBC_SHA256, TLS_RSA_WITH_3DES_EDE_CBC_SHA, TLS_DHE_RSA_WITH_AES_256_GCM_SHA384, TLS_DHE_RSA_WITH_CAMELLIA_256_GCM_SHA384, TLS_DHE_RSA_WITH_CHACHA20_POLY1305_SHA256, TLS_DHE_RSA_WITH_AES_256_CCM, TLS_DHE_RSA_WITH_AES_256_CBC_SHA, TLS_DHE_RSA_WITH_AES_256_CBC_SHA256, TLS_DHE_RSA_WITH_CAMELLIA_256_CBC_SHA, TLS_DHE_RSA_WITH_CAMELLIA_256_CBC_SHA256, TLS_DHE_RSA_WITH_AES_128_GCM_SHA256, TLS_DHE_RSA_WITH_CAMELLIA_128_GCM_SHA256, TLS_DHE_RSA_WITH_AES_128_CCM, TLS_DHE_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_RSA_WITH_AES_128_CBC_SHA256, TLS_DHE_RSA_WITH_CAMELLIA_128_CBC_SHA, TLS_DHE_RSA_WITH_CAMELLIA_128_CBC_SHA256, TLS_DHE_RSA_WITH_3DES_EDE_CBC_SHA], compression_methods: [Null], extensions: [ExtendedMasterSecretRequest, Unknown(UnknownExtension { typ: Unknown(22), payload: Payload([]) }), Unknown(UnknownExtension { typ: StatusRequest, payload: Payload([1, 0, 0, 0, 0]) }), ServerName([ServerName { typ: HostName, payload: HostName("localhost") }]), Unknown(UnknownExtension { typ: RenegotiationInfo, payload: Payload([0]) }), SessionTicketRequest, NamedGroups([secp256r1, secp384r1, secp521r1, Unknown(21), Unknown(19)]), ECPointFormats([Uncompressed]), SignatureAlgorithms([RSA_PKCS1_SHA256, ECDSA_NISTP256_SHA256, RSA_PKCS1_SHA384, ECDSA_NISTP384_SHA384, RSA_PKCS1_SHA512, ECDSA_NISTP521_SHA512, Unknown(769), Unknown(771), RSA_PKCS1_SHA1, ECDSA_SHA1_Legacy])] }
DEBUG:rustls::server_hs                         : sni Some("localhost")
DEBUG:rustls::server_hs                         : sig schemes [RSA_PKCS1_SHA256, ECDSA_NISTP256_SHA256, RSA_PKCS1_SHA384, ECDSA_NISTP384_SHA384, RSA_PKCS1_SHA512, ECDSA_NISTP521_SHA512, Unknown(769), Unknown(771), RSA_PKCS1_SHA1, ECDSA_SHA1_Legacy]
DEBUG:linkerd_tcp::app::sni                     : finding cert resolver for Some("localhost")
DEBUG:linkerd_tcp::app::sni                     : found match for localhost
INFO :rustls::server_hs                         : decided upon suite SupportedCipherSuite { suite: TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, kx: ECDHE, bulk: AES_256_GCM, hash: SHA384, sign: RSA, enc_key_len: 32, fixed_iv_len: 4, explicit_nonce_len: 8 }
DEBUG:rustls::server_hs                         : namedgroups [secp256r1, secp384r1, secp521r1, Unknown(21), Unknown(19)]
DEBUG:rustls::server_hs                         : ecpoints [Uncompressed]
DEBUG:rustls::server_hs                         : sending server hello Message { typ: Handshake, version: TLSv1_2, payload: Handshake(HandshakeMessagePayload { typ: ServerHello, payload: ServerHello(ServerHelloPayload { server_version: TLSv1_2, random: Random([253, 13, 112, 52, 236, 107, 46, 111, 70, 34, 40, 101, 237, 30, 87, 226, 214, 156, 60, 204, 108, 45, 178, 254, 81, 144, 252, 41, 214, 42, 17, 124]), session_id: SessionID, cipher_suite: TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, compression_method: Null, extensions: [ServerNameAck, RenegotiationInfo(PayloadU8([])), ExtendedMasterSecretAck] }) }) }
TRACE:linkerd_tcp::lb::socket                   : server handshake: write_session_to_tcp: 127.0.0.1:48320
TRACE:linkerd_tcp::lb::socket                   : write_session_to_tcp: write_tls: 127.0.0.1:48320
TRACE:linkerd_tcp::lb::socket                   : write_session_to_tcp: write_tls: 127.0.0.1:48320: 62B
TRACE:linkerd_tcp::lb::socket                   : server handshake: write_session_to_tcp: 127.0.0.1:48320: wrote 62
TRACE:linkerd_tcp::lb::socket                   : read_tcp_to_session: read_tls: 127.0.0.1:48320
DEBUG:tokio_core::reactor                       : consuming notification queue
DEBUG:tokio_core::reactor                       : scheduling direction for: 5
TRACE:linkerd_tcp::lb::socket                   : read_tcp_to_session: read_tls: 127.0.0.1:48320: Resource temporarily unavailable (os error 11)
TRACE:linkerd_tcp::lb::socket                   : server handshake: write_session_to_tcp: 127.0.0.1:48320
TRACE:linkerd_tcp::lb::socket                   : write_session_to_tcp: write_tls: 127.0.0.1:48320
TRACE:linkerd_tcp::lb::socket                   : write_session_to_tcp: write_tls: 127.0.0.1:48320: 4070B
TRACE:linkerd_tcp::lb::socket                   : server handshake: write_session_to_tcp: 127.0.0.1:48320: wrote 4070
TRACE:linkerd_tcp::lb::socket                   : read_tcp_to_session: read_tls: 127.0.0.1:48320
DEBUG:tokio_core::reactor                       : consuming notification queue
DEBUG:tokio_core::reactor                       : scheduling direction for: 5
TRACE:linkerd_tcp::lb::socket                   : read_tcp_to_session: read_tls: 127.0.0.1:48320: would block
TRACE:linkerd_tcp::lb::socket                   : server handshake: write_session_to_tcp: 127.0.0.1:48320
TRACE:linkerd_tcp::lb::socket                   : write_session_to_tcp: write_tls: 127.0.0.1:48320
TRACE:linkerd_tcp::lb::socket                   : write_session_to_tcp: write_tls: 127.0.0.1:48320: 370B
TRACE:linkerd_tcp::lb::socket                   : server handshake: write_session_to_tcp: 127.0.0.1:48320: wrote 370
TRACE:linkerd_tcp::lb::socket                   : read_tcp_to_session: read_tls: 127.0.0.1:48320
DEBUG:tokio_core::reactor                       : consuming notification queue
DEBUG:tokio_core::reactor                       : scheduling direction for: 5
TRACE:linkerd_tcp::lb::socket                   : read_tcp_to_session: read_tls: 127.0.0.1:48320: would block
TRACE:linkerd_tcp::lb::socket                   : server handshake: write_session_to_tcp: 127.0.0.1:48320
TRACE:linkerd_tcp::lb::socket                   : write_session_to_tcp: write_tls: 127.0.0.1:48320
TRACE:linkerd_tcp::lb::socket                   : write_session_to_tcp: write_tls: 127.0.0.1:48320: 9B
TRACE:linkerd_tcp::lb::socket                   : server handshake: write_session_to_tcp: 127.0.0.1:48320: wrote 9
TRACE:linkerd_tcp::lb::socket                   : read_tcp_to_session: read_tls: 127.0.0.1:48320
DEBUG:tokio_core::reactor                       : consuming notification queue
DEBUG:tokio_core::reactor                       : scheduling direction for: 5
TRACE:linkerd_tcp::lb::socket                   : read_tcp_to_session: read_tls: 127.0.0.1:48320: would block
TRACE:linkerd_tcp::lb::socket                   : server handshake: write_session_to_tcp: 127.0.0.1:48320
TRACE:linkerd_tcp::lb::socket                   : server handshake: 127.0.0.1:48320: not complete
DEBUG:linkerd_tcp::lb::shared                   : poll_complete
TRACE:linkerd_tcp::app                          : runner 1 not ready
TRACE:linkerd_tcp::app                          : runner not finished
TRACE:linkerd_tcp::app                          : runner 0 not ready
TRACE:linkerd_tcp::app                          : runner not finished
DEBUG:tokio_core::reactor                       : loop poll - Duration { secs: 0, nanos: 1066 }
DEBUG:tokio_core::reactor                       : loop time - Instant { tv_sec: 7953554, tv_nsec: 368442048 }

No more references to 127.1:48320 in the logs.

On another test run, I notice there's still 51 bytes left in the Recv-Q of curl.

stevej@proxy-test-4d:~/src/linkerd-tcp$ netstat -an |grep 7575
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:7575            0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:48680         127.0.0.1:7575          ESTABLISHED
tcp       51      0 127.0.0.1:7575          127.0.0.1:48680         ESTABLISHED
TRACE:linkerd_tcp::lb::socket                   : server handshake: write_session_to_tcp: 127.0.0.1:48704: wrote 9
TRACE:linkerd_tcp::lb::socket                   : read_tcp_to_session: read_tls: 127.0.0.1:48704
DEBUG:tokio_core::reactor                       : consuming notification queue
DEBUG:tokio_core::reactor                       : scheduling direction for: 5
TRACE:linkerd_tcp::lb::socket                   : read_tcp_to_session: read_tls: WouldBlock 127.0.0.1:48704: would block
TRACE:linkerd_tcp::lb::socket                   : server handshake: write_session_to_tcp: 127.0.0.1:48704
TRACE:linkerd_tcp::lb::socket                   : server handshake: 127.0.0.1:48704: not complete
DEBUG:linkerd_tcp::lb::shared                   : poll_complete
TRACE:linkerd_tcp::app                          : runner 1 not ready
TRACE:linkerd_tcp::app                          : runner not finished
TRACE:linkerd_tcp::app                          : runner 0 not ready
TRACE:linkerd_tcp::app                          : runner not finished
DEBUG:tokio_core::reactor                       : loop poll - Duration { secs: 0, nanos: 2895 }
DEBUG:tokio_core::reactor                       : loop time - Instant { tv_sec: 7956349, tv_nsec: 711642613 }
DEBUG:tokio_core::reactor                       : loop process - 0 events, Duration { secs: 0, nanos: 14111 }
DEBUG:tokio_core::reactor                       : loop poll - Duration { secs: 0, nanos: 1705382 }
DEBUG:tokio_core::reactor                       : loop time - Instant { tv_sec: 7956349, tv_nsec: 713367966 }
TRACE:tokio_core::reactor                       : event Ready {Readable | Writable} Token(12)
DEBUG:tokio_core::reactor                       : notifying a task handle
DEBUG:tokio_core::reactor                       : loop process - 1 events, Duration { secs: 0, nanos: 53141 }
DEBUG:tokio_core::reactor                       : loop poll - Duration { secs: 0, nanos: 5075 }
DEBUG:tokio_core::reactor                       : loop time - Instant { tv_sec: 7956349, tv_nsec: 713433957 }
TRACE:tokio_core::reactor                       : event Ready {Readable} Token(1)
DEBUG:tokio_core::reactor                       : loop process - 1 events, Duration { secs: 0, nanos: 10965 }
TRACE:linkerd_tcp::app                          : polling 1 running
TRACE:linkerd_tcp::app                          : polling runner 0
TRACE:linkerd_tcp::app                          : polling 2 running
TRACE:linkerd_tcp::app                          : polling runner 0
DEBUG:tokio_core::reactor                       : consuming notification queue
DEBUG:tokio_core::reactor                       : scheduling direction for: 0
DEBUG:linkerd_tcp::lb::shared                   : poll_complete
TRACE:linkerd_tcp::app                          : runner 0 not ready
TRACE:linkerd_tcp::app                          : polling runner 1
TRACE:linkerd_tcp::lb::socket                   : SecureServerHandshake(Some(SecureSocket(V4(127.0.0.1:48704)))).poll()
TRACE:linkerd_tcp::lb::socket                   : read_tcp_to_session: read_tls: 127.0.0.1:48704
TRACE:linkerd_tcp::lb::socket                   : read_tcp_to_session: read_tls: 127.0.0.1:48704 107B
TRACE:linkerd_tcp::lb::socket                   : read_tcp_to_session: process_new_packets: 127.0.0.1:48704
TRACE:linkerd_tcp::lb::socket                   : server handshake: write_session_to_tcp: 127.0.0.1:48704
TRACE:linkerd_tcp::lb::socket                   : server handshake: 127.0.0.1:48704: not complete
DEBUG:linkerd_tcp::lb::shared                   : poll_complete
TRACE:linkerd_tcp::app                          : runner 1 not ready
TRACE:linkerd_tcp::app                          : runner not finished
TRACE:linkerd_tcp::app                          : runner 0 not ready
TRACE:linkerd_tcp::app                          : runner not finished

It looks to me like we're entering this code block and are never being rescheduled because the other side isn't happy with our handshake and isn't finishing the read:

        // If the remote hasn't read everything yet, resume later.
        if ss.session.is_handshaking() {
            trace!("server handshake: {}: not complete", ss.addr);
            self.0 = Some(ss);
            return Ok(Async::NotReady);
        }
stevej@proxy-test-4d:~$ sudo tcpdump -A -s 1500 -i lo tcp port 7575
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 1500 bytes
21:22:44.446293 IP localhost.48736 > localhost.7575: Flags [S], seq 748745347, win 43690, options [mss 65495,sackOK,TS val 1989056284 ecr 0,nop,wscale 7], length 0
E..<e.@.@............`..,............0.........
v...........
21:22:44.446315 IP localhost.7575 > localhost.48736: Flags [S.], seq 1713794766, ack 748745348, win 43690, options [mss 65495,sackOK,TS val 1989056284 ecr 1989056284,nop,wscale 7], length 0
E..<..@.@.<............`f&n.,........0.........
v...v.......
21:22:44.446329 IP localhost.48736 > localhost.7575: Flags [.], ack 1, win 342, options [nop,nop,TS val 1989056284 ecr 1989056284], length 0
E..4e.@.@..!.........`..,...f&n....V.(.....
v...v...
21:22:44.511685 IP localhost.48736 > localhost.7575: Flags [P.], seq 1:257, ack 1, win 342, options [nop,nop,TS val 1989056300 ecr 1989056284], length 256
E..4e.@.@.. .........`..,...f&n....V.(.....
v..,v..............Y..%Q.|..    B.JG=+.XDV$!...~...._...r.,.......
.$.s.+.....     .#.r...0.......(.w./.....'.v.....{...5.=.......z.../.<.A...
...}.....9.k.......|...3.g.E.......\.........................   localhost......#...
...
..........................................
21:22:44.511710 IP localhost.7575 > localhost.48736: Flags [.], ack 257, win 350, options [nop,nop,TS val 1989056300 ecr 1989056300], length 0
E..4..@[email protected]............`f&n.,......^.(.....
v..,v..,

tls-scan is also not happy.

$ ./tls-scan --tls1 --host=localhost --port=443 --pretty
host: localhost; ip: ; error: Network; errormsg:                      Error encountered while reading

<|---------Scan Summary---------|>
 [27154] ciphers             :  (0)
 [27154] dns-lookup          : 1
 [27154] network-error       : 1
 [27154] dns-errcount        : 0
 [27154] remote-close-error  : 0
 [27154] unknown-error       : 0
 [27154] timeout-error       : 0
 [27154] connect-error       : 1
 [27154] tls-handshake       : 0
 [27154] gross-tls-handshake : 0
 [27154] elapsed-time        : 0.10453 secs
<|------------------------------|>

Linkerd-TCP + Websockets + K8S 1.9.x - Not working

Note: Tried posting this on discourse and got an error regarding link limits in posts by new users, but there are no links in this post...

We're having an extraordinary amount of trouble trying to set up linkerd-tcp on k8s with a simple websocket application. The closest we've been able to get to a working request is having linkerd-tcp spit out errors like "error parsing response: missing field addrs at line 1 column 14" and our test application throwing ECONNREFUSED. Without linkerd, our sample app works as expected.

We have some questions regarding namerd config:

  • Namerd "namespace": does this need to be the same as the k8s namespace running linkerd-tcp, or the k8s namespace running the backend service pods, or does namerd have it's own concept of namespaces?
  • Namerd "label": should this match the name of any k8s services or namespaces, or is this purely an internal value used to configure namerd?

More generally, does anyone see any issues with our config?

Linkerd-tcp/namerd configs:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    app: l5d
  name: l5d
  namespace: linkerd
spec:
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: l5d
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      volumes:
      - name: l5d-config
        configMap:
          name: l5d-config
      - name: l5d-tcp-config
        configMap:
          name: l5d-tcp-config
          items:
          - key: config.yaml
            path: config.yaml
      - name: l5d-tcp-namerd
        configMap:
          name: l5d-namerd-config
          items:
          - key: namerd.yaml
            path: namerd.yaml
      - name: tls-cert
        secret:
          secretName: certificates
      containers:
      - name: l5d
        image: buoyantio/linkerd:1.3.6
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        args:
        - /io.buoyant/linkerd/config/config.yaml
        - "-log.level=DEBUG" # for debugging
        ports:
        - name: outgoing
          containerPort: 4140
          hostPort: 4140
        - name: incoming
          containerPort: 4141
        - name: admin
          containerPort: 9990
        volumeMounts:
        - name: l5d-config
          mountPath: /io.buoyant/linkerd/config
          readOnly: true
        - name: tls-cert
          mountPath: /io.buoyant/linkerd/certs
          readOnly: true
      - name: linkerd-tcp
        image: linkerd/linkerd-tcp:0.1.1
        command: [ "/usr/local/bin/linkerd-tcp"]
        args:
        - /io.buoyant/linkerd/config/config.yaml
        volumeMounts:
        - name: l5d-tcp-config
          mountPath: /io.buoyant/linkerd/config/config.yaml
          subPath: config.yaml
        ports:
        - name: tcp-admin
          containerPort: 9989
          hostPort: 9989
        - name: tcp-server
          containerPort: 7474
        env:
        - name: RUST_LOG # for debugging
          value: "trace"
        - name: RUST_BACKTRACE # for debugging
          value: "1"
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
      - name: kubectl
        image: buoyantio/kubectl:v1.8.5
        args:
        - "proxy"
        - "-p"
        - "8001"
---
kind: CustomResourceDefinition
apiVersion: apiextensions.k8s.io/v1beta1
metadata:
  name: dtabs.l5d.io
spec:
  scope: Namespaced
  group: l5d.io
  version: v1alpha1
  names:
    kind: DTab
    plural: dtabs
    singular: dtab
---
apiVersion: l5d.io/v1alpha1
dentries:
- dst: /#/io.l5d.k8s/default/http
  prefix: /svc
kind: DTab
metadata:
  namespace: linkerd
  name: l5d
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: l5d-namerd-config
  namespace: linkerd
data:
 namerd.yaml: |-
    admin:
      port: 9991
      ip: 0.0.0.0
    storage:
      kind: io.l5d.k8s
      host: localhost
      port: 8001
      namespace: linkerd
    interfaces:
    - kind: io.l5d.httpController
      ip: 0.0.0.0
      port: 4180
    telemetry:
    - kind: io.l5d.prometheus
      prefix: tcp_
    namers:
    - kind: io.l5d.k8s
      host: localhost
      port: 8001
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: namerd
  name: namerd
  namespace: linkerd
spec:
  replicas: 2
  selector:
    matchLabels:
      app: namerd
  strategy:
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: namerd
    spec:
      volumes:
      - name: l5d-namerd-config
        configMap:
          name: l5d-namerd-config
          items:
          - key: namerd.yaml
            path: namerd.yaml
      containers:
      - name: namerd
        image: buoyantio/namerd:1.3.5
        args:
        - /io.buoyant/namerd/1.3.5/config/namerd.yaml
        volumeMounts:
        - name: l5d-namerd-config
          mountPath: /io.buoyant/namerd/1.3.5/config/namerd.yaml
          subPath: namerd.yaml
        ports:
        - name: http
          containerPort: 4180
        - name: namerd-admin
          containerPort: 9991
      - name: kubectl
        image: buoyantio/kubectl:v1.8.5
        args:
        - "proxy"
        - "-p"
        - "8001"
      restartPolicy: Always
      securityContext: {}
---
apiVersion: v1
kind: Service
metadata:
  name: l5d
  namespace: linkerd
  labels:
    k8s-app: l5d
    app: l5d
spec:
  selector:
    app: l5d
  type: LoadBalancer
  ports:
  - name: outgoing
    port: 4140
  - name: incoming
    port: 4141
  - name: admin
    port: 9990
  - name: tcp-admin
    port: 9989
  - name: tcp-server
    port: 7474
---
apiVersion: v1
kind: Service
metadata:
  name: namerd
  namespace: linkerd
spec:
  selector:
    app: namerd
  type: LoadBalancer
  ports:
  - name: http
    port: 4180
  - name: admin
    port: 9991
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: l5d-config
  namespace: linkerd
data:
  config.yaml: |-
    admin:
      ip: 0.0.0.0
      port: 9990
    namers:
    - kind: io.l5d.k8s
      host: localhost
      port: 8001
    telemetry:
    - kind: io.l5d.prometheus
    - kind: io.l5d.recentRequests
      sampleRate: 0.25
    usage:
      orgId: linkerd-examples-daemonset
    routers:
    - protocol: http
      label: outgoing
      dtab: |
        /srv        => /#/io.l5d.k8s/default/http;
        /host       => /srv;
        /svc        => /host;
      interpreter:
        kind: default
        transformers:
        - kind: io.l5d.k8s.daemonset
          namespace: linkerd
          port: incoming
          service: l5d
          hostNetwork: true
      servers:
      - port: 4140
        ip: 0.0.0.0
      service:
        responseClassifier:
          kind: io.l5d.http.retryableRead5XX
      client:
        tls:
          commonName: linkerd
          trustCerts:
          - /io.buoyant/linkerd/certs/cacertificate.pem
    - protocol: http
      label: incoming
      dtab: |
        /srv        => /#/io.l5d.k8s/default/http;
        /host       => /srv;
        /svc        => /host;
      interpreter:
        kind: default
        transformers:
        - kind: io.l5d.k8s.localnode
          hostNetwork: true
      servers:
      - port: 4141
        ip: 0.0.0.0
        tls:
          certPath: /io.buoyant/linkerd/certs/certificate.pem
          keyPath: /io.buoyant/linkerd/certs/key.pk8
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: l5d-tcp-config
  namespace: linkerd
data:
  config.yaml: |-
    admin:
      ip: 0.0.0.0
      port: 9989
      metricsIntervalSecs: 10
    routers:
    - label: default
      interpreter:
        kind: io.l5d.namerd.http
        namespace: linkerd
        # baseUrl: http://localhost:4180
        baseUrl: http://namerd:4180
        periodSecs: 20
      servers:
      - ip: 0.0.0.0
        port: 7474
        dstName: /svc/server

Websocket client source:

#!/usr/bin/env node
var WebSocketClient = require('websocket').client;
var client = new WebSocketClient();
client.on('connectFailed', function(error) {
    console.log('Connect Error: ' + error.toString());
});
client.on('connect', function(connection) {
    console.log('WebSocket Client Connected');
    connection.on('error', function(error) {
        console.log("Connection Error: " + error.toString());
    });
    connection.on('close', function() {
        console.log('echo-protocol Connection Closed');
    });
    connection.on('message', function(message) {
        if (message.type === 'utf8') {
            console.log("Received: '" + message.utf8Data + "'");
        }
    });
    
    function sendNumber() {
        if (connection.connected) {
            var number = Math.round(Math.random() * 0xFFFFFF);
            connection.sendUTF(number.toString());
            setTimeout(sendNumber, 1000);
        }
    }
    sendNumber();
});
console.log('start');
// client.connect('ws://172.17.0.2:8081/', 'echo-protocol');
client.connect('ws://' + process.env.linkerd_proxy + '/', 'echo-protocol');
console.log('end');

Websocket server source:

#!/usr/bin/env node
var WebSocketServer = require('websocket').server;
var http = require('http');
var server = http.createServer(function(request, response) {
    console.log((new Date()) + ' Received request for ' + request.url);
    response.writeHead(404);
    response.end();
});
server.listen(80, function() {
    console.log((new Date()) + ' Server is listening on port 80');
});
wsServer = new WebSocketServer({
    httpServer: server,
    // You should not use autoAcceptConnections for production
    // applications, as it defeats all standard cross-origin protection
    // facilities built into the protocol and the browser.  You should
    // *always* verify the connection's origin and decide whether or not
    // to accept it.
    autoAcceptConnections: false
});
function originIsAllowed(origin) {
  // put logic here to detect whether the specified origin is allowed.
  return true;
}
wsServer.on('request', function(request) {
    if (!originIsAllowed(request.origin)) {
      // Make sure we only accept requests from an allowed origin
      request.reject();
      console.log((new Date()) + ' Connection from origin ' + request.origin + ' rejected.');
      return;
    }
    
    var connection = request.accept('echo-protocol', request.origin);
    console.log((new Date()) + ' Connection accepted.');
    connection.on('message', function(message) {
        if (message.type === 'utf8') {
            console.log('Received Message: ' + message.utf8Data);
            connection.sendUTF(message.utf8Data);
        }
        else if (message.type === 'binary') {
            console.log('Received Binary Message of ' + message.binaryData.length + ' bytes');
            connection.sendBytes(message.binaryData);
        }
    });
    connection.on('close', function(reasonCode, description) {
        console.log((new Date()) + ' Peer ' + connection.remoteAddress + ' disconnected.');
    });
});

Support file backed resolver

It would be nice to have linkerd-tcp run fully standalone and load all of its configuration settings, routing, etc. from files.
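
A file-backed resolver could be sketched roughly as below: periodically re-read a newline-delimited list of endpoint addresses from disk, similar in spirit to namerd's io.l5d.fs namer. This is an illustration only; the helper and file format are hypothetical, not part of linkerd-tcp.

use std::{fs, io, net::SocketAddr};

// Hypothetical helper: parse a file containing one `ip:port` per line into a
// list of endpoint addresses. A real resolver would re-read (or watch) the
// file and push updates into the balancer.
fn read_endpoints(path: &str) -> io::Result<Vec<SocketAddr>> {
    Ok(fs::read_to_string(path)?
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'))
        .filter_map(|l| l.parse().ok())
        .collect())
}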

Linkerd-tcp panics during redis load-test

Error message:

thread 'main' panicked at 'destination and source slices have different lengths', /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/slice.rs:533

Seeing this during a very high load-test of redis (batch inserting 5000 keys per second). The linkerd-tcp container does restart after throwing this message and I am seeing some keys being dropped in redis. I will try to recreate this with RUST_BACKTRACE=1.
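
For reference, that panic message is the one produced by the standard library's slice copy routines, which require the two slices to have equal lengths; a minimal illustration (not linkerd-tcp code):

fn main() {
    let src = [0u8; 4];
    let mut dst = [0u8; 3];
    // Panics with "destination and source slices have different lengths".
    dst.copy_from_slice(&src);
}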

Is linkerd-tcp under active development?

Hi,

I'm very interested to use linkerd-tcp to proxy connections to Kafka and MySQL. However, I noticed that:

  • although the latest release happened in June, the docker image was last pushed 8 months ago (meaning that it's most likely outdated)

  • the last commit on the repository was made 4 months ago

This makes me wonder: is this project still under active development? Is it planned to be supported in the future?

Thanks!

Refactor linkerd-tcp to accommodate routing.

Prototype

The initial implementation is basically a prototype. It proves the concept, but it has
severe deficiencies that cause performance (and probably correctness) problems.
Specifically, it implements its own polling... poorly.

At startup, the configuration is parsed. For each proxy, the namerd and serving
configurations are split and connected by an async channel so that namerd updates are
processed outside of the serving thread. All of the namerd watchers are collected to be
run together with the admin server. Once all of the proxy configurations are processed,
the application is run.

The admin thread is started, initiating all namerd polling and starting the admin server.

Simultaneously, all of the proxies are run in the main thread. For each of these, a
connector is created to determine how all downstream connections are established for
the proxy. A balancer is created with the connector and a stream of namerd updates. An
acceptor is created for each listening interface, which manifests as a stream of
connections. The balancer is made shareable across servers by creating an
async channel and each server's connections are streamed into a sink clone. The balancer
is driven to process all of these connections.

The balancer implements a Sink that manages all I/O and connection management. Each
time Balancer::start_send or Balancer::poll_complete is called, the following work is
done:

  • all connection streams are checked for I/O and data is transferred;
  • closed connections are reaped;
  • service discovery is checked for updates;
  • new connections are established;
  • stats are recorded;

Lessons/Problems

Inflexible

This model doesn't really reflect that of linkerd. We have no mechanism to route
connections. All connections are simply forwarded. We cannot, for instance, route based on
client credentials or SNI destination.

Inefficient

Currently, each balancer is effectively a scheduler, and a pretty poor one at that. I/O
processing should be far more granular and we shouldn't update load balancer endpoints in
the I/O path (unless absolutely necessary).

Timeouts

We need several types of timeouts that are not currently implemented:

  • Connection timeout: time from incoming connection to outbound established.
  • Stream lifetime: maximum time a stream may stay open.
  • Idle timeout: maximum time a connection may stay open without transmitting data.

Proposal

linkerd-tcp should become a stream router. In the same way that linkerd routes requests,
linkerd-tcp should route connections. The following is a rough, evolving sketch of how
linkerd-tcp should be refactored to accommodate this:

The linkerd-tcp configuration should support one or more routers. Each router is
configured with one or more servers. A server, which may or may not terminate TLS,
produces a stream of incoming connections comprising an envelope--a source identity (an
address, but maybe more) and a destination name--and a bidirectional data stream. The
server may choose the destination by static configuration or as some function of the
connection (e.g. client credentials, SNI, etc). Each connection envelope may be annotated
with a standard set of metadata including, for example, an optional connect deadline,
stream deadline, etc.

The streams of all incoming connections for a router are merged into a single stream of
enveloped connections. This stream is forwarded to a binder. A binder is responsible
for maintaining a cache of balancers by destination name. When a balancer does not exist
in the cache, a new namerd lookup is initiated and its result stream (and value) is cached
so that future connections may resolve quickly. The binder obtains a balancer for each
destination name that maintains a list of endpoints and their load (in terms of
connections, throughput, etc).

If the inbound connection has not expired (i.e. due to a timeout), it is dispatched to the
balancer for processing. The balancer maintains a reactor handle and initiates I/O and
balancer state management on the reactor.

 ------       ------
| srv0 | ... | srvN |
 ------   |   ------
          |
          | (Envelope, IoStream)
          V
 -------------------      -------------
| binder            |----| interpreter |
 -------------------      -------------
  |
  V
 ----------
| balancer |
 ----------
  |
  V
 ----------
| endpoint |
 ----------
  |
  V
 --------
| duplex |
 --------
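
A rough sketch of the types this implies (names are illustrative only and do not correspond to the eventual implementation):

use std::collections::HashMap;
use std::net::SocketAddr;
use std::time::{Duration, Instant};

// A server yields enveloped connections: who the connection came from, which
// destination name it should be routed to, and any deadlines to enforce.
struct Envelope {
    src: SocketAddr,
    dst_name: String,                  // chosen statically or from SNI, client creds, ...
    connect_deadline: Option<Instant>, // incoming connection -> outbound established
    stream_deadline: Option<Instant>,  // maximum stream lifetime
    idle_timeout: Option<Duration>,    // maximum time without data transfer
}

struct Balancer { /* endpoints and their load: connections, throughput, ... */ }

// The binder caches one balancer per destination name, initiating a namerd
// lookup (and caching its result stream) the first time a name is seen.
struct Binder {
    balancers: HashMap<String, Balancer>,
}

impl Binder {
    fn bind(&mut self, env: &Envelope) -> &mut Balancer {
        self.balancers
            .entry(env.dst_name.clone())
            .or_insert_with(|| Balancer { /* start namerd resolution here */ })
    }
}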

Unable to build from source

I've been trying to build linkerd-tcp from source (since the docker image doesn't include the latest version, see #82), but I had to make some changes in order to make it work.

  • Changing the rustls commit ID (as instructed in #79)
  • Changing the rust image from jimmycuadra/rust:1.16.0 to rust:1.22.1 in the dockerize script; otherwise I get Rust compilation errors.

Would you like a PR to fix that?

smarter load balancer polling

Currently, we check all connections each time the load balancer is polled. This is inefficient, especially as the number of active connections increases. As the docs explain, we can be much more efficient by checking only the connections with relevant updates.
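
One direction, sketched very roughly (illustrative only, not the current code): track which connections the reactor has actually woken since the last poll, and drive only those instead of scanning the whole set.

use std::collections::{HashSet, VecDeque};

struct Conn;

struct Connections {
    conns: Vec<Conn>,
    ready: VecDeque<usize>, // indices flagged ready since the last poll
    queued: HashSet<usize>, // guards against duplicate entries in `ready`
}

impl Connections {
    // Called from the readiness notification for connection `idx`.
    fn mark_ready(&mut self, idx: usize) {
        if self.queued.insert(idx) {
            self.ready.push_back(idx);
        }
    }

    // Called from the balancer's poll: touch only connections with pending
    // events rather than iterating over the entire `conns` vector.
    fn drive_ready(&mut self) {
        while let Some(idx) = self.ready.pop_front() {
            self.queued.remove(&idx);
            if let Some(_conn) = self.conns.get_mut(idx) {
                // perform I/O for this connection only
            }
        }
    }
}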

unable to proxy iperf3 traffic

iperf3 is a standard tool used to measure bandwidth performance on hardware switches and routers as well as software TCP proxies.

I'm able to do iperf3 tests between two hosts on the GCE network but not through linkerd-tcp.

Here's my test setup:

iperf3 installed on on netty-test-8, proxy-test-4d, proxy-test-4e.

proxy-test-4e has an iperf server listening on 7474 using the command iperf3 -s -p 7474 -d
proxy-test-4d has a namerd and a linkerd-tcp.
netty-test-8 runs an iperf client: iperf3 -c proxy-test-4d -p 7474 -i 1 -t 30 -b 1M

I'm attempting a 1MB/s test for 30 seconds. This same command works from netty-test-8 to proxy-test-4e without linkerd-tcp in the mix.

Here is the linkerd-tcp trace log output. My reading of it is that the linkerd-tcp load balancer is flapping between 0 connections established and 1 connection established for the iperf server. I'll dig into tcpdump and see what's different about how iperf makes a connection vs how linkerd-tcp makes a connection.

I also have full logfiles available (RUST_LOG=trace) and have attached one:
linkerd_tokio_log.txt

Periodic waves of EOF every 60 seconds.

Testing linkerd-tcp with the stock example.yml and slow_cooker at 20k qps results in large waves of EOFs.

Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
2017-06-28T18:53:27Z  19829/0/181 20000  99% 10s   0 [  2   4   5   18 ]   18      0
...
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: read tcp 10.240.0.4:47703->10.240.0.21:7474: read: connection reset by peer
2017-06-28T18:54:27Z  19819/0/181 20000  99% 10s   0 [  2   4   6   14 ]   14      0
...
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
2017-06-28T18:55:27Z  19817/0/182 20000  99% 10s   0 [  1   4   5  206 ]  206      0
...
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
2017-06-28T18:56:27Z  19828/0/172 20000  99% 10s   0 [  2   4   6    9 ]    9      0
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
Get http://proxy-test-4d:7474: EOF
2017-06-28T18:57:27Z  19839/0/161 20000  99% 10s   0 [  3   5   7    9 ]    9      0

Interestingly, they seem to come in groups of 170-200 at a time about every 60 seconds.

Waiting For Alpha

Is there any timeline for an alpha release? Or is this beta version ready for production use?

linkerd-tcp process hangs in mutex lock after long load test.

I set up a long-running load test against linkerd-tcp using slow_cooker: 3 nginx backends serving static content and 3 slow_cooker load testers with the following command lines:

slow_cooker -host "perf-cluster" -qps 100 -concurrency 100 -interval 1m -reportLatenciesCSV linkerd-tcp.csv http://proxy-test-4d:7474
slow_cooker -host "perf-cluster" -qps 1 -concurrency 100 -interval 1s -hashValue 9745850253931700627 -hashSampleRate 0.1 http://proxy-test-4d:7474
slow_cooker -host perf-cluster -qps 1 -concurrency 100 -interval 1m -hashValue 9745850253931700627 -hashSampleRate 0.1 -compress http://proxy-test-4d:7474

So this is about (10k + 200) qps, with the latter two slow_cookers checking for invalid response bodies.

After about 20 hours, linkerd-tcp stopped accepting new connections and stopped processing requests from existing connections. slow_cooker was reporting the following errors:

Get http://proxy-test-4d:7474: read tcp 10.240.0.4:58016->10.240.0.21:7474: read: connection reset by peer
Get http://proxy-test-4d:7474: read tcp 10.240.0.4:57899->10.240.0.21:7474: read: connection reset by peer
Get http://proxy-test-4d:7474: read tcp 10.240.0.4:57902->10.240.0.21:7474: read: connection reset by peer
Get http://proxy-test-4d:7474: read tcp 10.240.0.4:57906->10.240.0.21:7474: read: connection reset by peer
Get http://proxy-test-4d:7474: read tcp 10.240.0.4:57853->10.240.0.21:7474: read: connection reset by peer
Get http://proxy-test-4d:7474: read tcp 10.240.0.4:57912->10.240.0.21:7474: read: connection reset by peer
2017-03-30T16:51:02Z      0/0/55 6000   0% 1m0s   0 [  0   0   0    0 ]    0      0
Get http://proxy-test-4d:7474: dial tcp 10.240.0.21:7474: getsockopt: connection timed out
Get http://proxy-test-4d:7474: read tcp 10.240.0.4:57907->10.240.0.21:7474: read: connection reset by peer
Get http://proxy-test-4d:7474: read tcp 10.240.0.4:57908->10.240.0.21:7474: read: connection reset by peer

Interestingly, the admin port was still processing requests, and I set up a watch to look at the output every few seconds. Here's an example report:

$ curl http://localhost:9989/metrics
metrics_count 12
success_count{proxy="default"} 79470054
connects{proxy="default", srv="0.0.0.0:7474"} 79470070
bytes_total{direction="tx", proxy="default"} 2382121311667
namerd_success_count{service="namerd"} 61943
conns_active{proxy="default"} 5
endpoints_retired{proxy="default"} 0
endpoints_ready{proxy="default"} 0
conns_established{proxy="default"} 0
conns_pending{proxy="default"} 0
endpoints_unready{proxy="default"} 3
poll_time_us{stat="count", proxy="default"} 6
poll_time_us{stat="mean", proxy="default"} 17.833333333333332
poll_time_us{stat="min", proxy="default"} 6
poll_time_us{stat="max", proxy="default"} 44
poll_time_us{stat="stddev", proxy="default"} 14.217555658019732
poll_time_us{stat="p50", proxy="default"} 6
poll_time_us{stat="p90", proxy="default"} 6
poll_time_us{stat="p95", proxy="default"} 6
poll_time_us{stat="p99", proxy="default"} 6
poll_time_us{stat="p999", proxy="default"} 6
poll_time_us{stat="p9999", proxy="default"} 6
namerd_request_latency_ms{stat="count", service="namerd"} 2
namerd_request_latency_ms{stat="mean", service="namerd"} 2
namerd_request_latency_ms{stat="min", service="namerd"} 2
namerd_request_latency_ms{stat="max", service="namerd"} 2
namerd_request_latency_ms{stat="stddev", service="namerd"} 0
namerd_request_latency_ms{stat="p50", service="namerd"} 2
namerd_request_latency_ms{stat="p90", service="namerd"} 2
namerd_request_latency_ms{stat="p95", service="namerd"} 2
namerd_request_latency_ms{stat="p99", service="namerd"} 2
namerd_request_latency_ms{stat="p999", service="namerd"} 2
namerd_request_latency_ms{stat="p9999", service="namerd"} 2

The process had a few more threads than I remembered from the previous day.

stevej   13505 79.0  0.8  71776 33100 pts/2    Sl+  Mar27 4006:56 ./linkerd-tcp-1490585634 example.yml
stevej   13505 56.2  0.8  71776 33100 pts/2    Sl+  Mar27 2854:06 ./linkerd-tcp-1490585634 example.yml
stevej   13505  0.0  0.8  71776 33100 pts/2    Sl+  Mar27   0:01 ./linkerd-tcp-1490585634 example.yml
stevej   13505  0.0  0.8  71776 33100 pts/2    Sl+  Mar27   0:01 ./linkerd-tcp-1490585634 example.yml
stevej   13505  0.0  0.8  71776 33100 pts/2    Sl+  Mar27   0:01 ./linkerd-tcp-1490585634 example.yml
stevej   13505  0.0  0.8  71776 33100 pts/2    Sl+  Mar27   0:01 ./linkerd-tcp-1490585634 example.yml
stevej   13505  0.0  0.8  71776 33100 pts/2    Sl+  Mar27   0:07 ./linkerd-tcp-1490585634 example.yml
stevej   13505  0.0  0.8  71776 33100 pts/2    Sl+  Mar27   0:03 ./linkerd-tcp-1490585634 example.yml

Three of them were blocked on the same mutex. Here's some output from rust-gdb. Unfortunately, the process segfaulted before I could capture backtraces from all the threads, but I do have some output I can share:

Attaching to process 13505
[New LWP 13506]
[New LWP 13507]
[New LWP 13508]
[New LWP 13509]
[New LWP 13510]
[New LWP 13511]
[New LWP 13512]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f0ee2f716a3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
84      ../sysdeps/unix/syscall-template.S: No such file or directory.

(gdb) thread 3
[Switching to thread 3 (Thread 0x7f0ee1fff700 (LWP 13507))]
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135     ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) bt
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f0ee3450ebd in __GI___pthread_mutex_lock (mutex=0x7f0ee200e150)
    at ../nptl/pthread_mutex_lock.c:80
#2  0x000056016c9a0389 in std::panicking::try::do_call::hb458ba233f0b55f7 ()
#3  0x000056016c9c2f7b in panic_unwind::__rust_maybe_catch_panic ()
    at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libpanic_unwind/lib.rs:98
#4  0x000056016c9a4893 in _$LT$F$u20$as$u20$alloc..boxed..FnBox$LT$A$GT$$GT$::call_box::hc7b55c280fa43c1f ()
#5  0x000056016c9bad95 in alloc::boxed::{{impl}}::call_once<(),()> ()
    at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/liballoc/boxed.rs:624
#6  std::sys_common::thread::start_thread ()
    at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/sys_common/thread.rs:21
#7  std::sys::imp::thread::{{impl}}::new::thread_start ()
    at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/sys/unix/thread.rs:84
#8  0x00007f0ee344e6ca in start_thread (arg=0x7f0ee1fff700) at pthread_create.c:333
#9  0x00007f0ee2f710af in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
(gdb) thread 5
[Switching to thread 5 (Thread 0x7f0ee1bfd700 (LWP 13509))]
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135     ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) bt
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f0ee3450ebd in __GI___pthread_mutex_lock (mutex=0x7f0ee200e150)
    at ../nptl/pthread_mutex_lock.c:80
#2  0x000056016c9a0389 in std::panicking::try::do_call::hb458ba233f0b55f7 ()
#3  0x000056016c9c2f7b in panic_unwind::__rust_maybe_catch_panic ()
    at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libpanic_unwind/lib.rs:98
#4  0x000056016c9a4893 in _$LT$F$u20$as$u20$alloc..boxed..FnBox$LT$A$GT$$GT$::call_box::hc7b55c280fa43c1f ()
#5  0x000056016c9bad95 in alloc::boxed::{{impl}}::call_once<(),()> ()
    at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/liballoc/boxed.rs:624
#6  std::sys_common::thread::start_thread ()
    at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/sys_common/thread.rs:21
#7  std::sys::imp::thread::{{impl}}::new::thread_start ()
    at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/sys/unix/thread.rs:84
#8  0x00007f0ee344e6ca in start_thread (arg=0x7f0ee1bfd700) at pthread_create.c:333
#9  0x00007f0ee2f710af in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
(gdb) thread 6
[Switching to thread 6 (Thread 0x7f0ee19fc700 (LWP 13510))]
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135     in ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
(gdb) bt
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f0ee3450ebd in __GI___pthread_mutex_lock (mutex=0x7f0ee200e150)
    at ../nptl/pthread_mutex_lock.c:80
#2  0x000056016c9a0389 in std::panicking::try::do_call::hb458ba233f0b55f7 ()
#3  0x000056016c9c2f7b in panic_unwind::__rust_maybe_catch_panic ()
    at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libpanic_unwind/lib.rs:98
#4  0x000056016c9a4893 in _$LT$F$u20$as$u20$alloc..boxed..FnBox$LT$A$GT$$GT$::call_box::hc7b55c280fa43c1f ()
#5  0x000056016c9bad95 in alloc::boxed::{{impl}}::call_once<(),()> ()
    at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/liballoc/boxed.rs:624
#6  std::sys_common::thread::start_thread ()
    at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/sys_common/thread.rs:21
#7  std::sys::imp::thread::{{impl}}::new::thread_start ()
    at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/sys/unix/thread.rs:84
#8  0x00007f0ee344e6ca in start_thread (arg=0x7f0ee19fc700) at pthread_create.c:333
#9  0x00007f0ee2f710af in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105

Some netstat output. I noticed that the Recv-Q for the listener has bytes in it, which is unexpected:

sudo netstat -anp |head -n 2 && sudo netstat -anp |grep linkerd
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp      129      0 0.0.0.0:7474            0.0.0.0:*               LISTEN      13505/./linkerd-tcp
tcp        0      0 127.0.0.1:9989          0.0.0.0:*               LISTEN      13505/./linkerd-tcp
tcp       93      0 10.240.0.21:7474        10.240.0.4:48796        CLOSE_WAIT  13505/./linkerd-tcp
tcp        0      0 10.240.0.21:7474        10.240.0.14:60982       ESTABLISHED 13505/./linkerd-tcp
tcp       93      0 10.240.0.21:7474        10.240.0.4:48788        CLOSE_WAIT  13505/./linkerd-tcp
tcp       93      0 10.240.0.21:7474        10.240.0.4:48790        CLOSE_WAIT  13505/./linkerd-tcp
tcp        0      0 10.240.0.21:7474        10.240.0.14:34318       ESTABLISHED 13505/./linkerd-tcp
tcp       93      0 10.240.0.21:7474        10.240.0.4:48791        CLOSE_WAIT  13505/./linkerd-tcp
tcp       93      0 10.240.0.21:7474        10.240.0.4:48793        CLOSE_WAIT  13505/./linkerd-tcp
tcp       93      0 10.240.0.21:7474        10.240.0.4:48800        CLOSE_WAIT  13505/./linkerd-tcp
tcp       93      0 10.240.0.21:7474        10.240.0.4:48795        CLOSE_WAIT  13505/./linkerd-tcp
tcp       93      0 10.240.0.21:7474        10.240.0.4:48785        CLOSE_WAIT  13505/./linkerd-tcp
tcp        0      0 127.0.0.1:48284         127.0.0.1:4180          ESTABLISHED 13505/./linkerd-tcp
tcp       93      0 10.240.0.21:7474        10.240.0.4:48798        CLOSE_WAIT  13505/./linkerd-tcp
tcp       93      0 10.240.0.21:7474        10.240.0.4:48789        CLOSE_WAIT  13505/./linkerd-tcp
tcp       93      0 10.240.0.21:7474        10.240.0.4:48799        CLOSE_WAIT  13505/./linkerd-tcp

Under ltrace there was movement on the process, and something suspicious stood out:
read(4, 0x7fffd9e9dfb0, 128) = -1 EAGAIN (Resource temporarily unavailable)
lsof showed the following entry for FD 4:
linkerd-t 13505 stevej 4r FIFO 0,10 0t0 89104520 pipe

Unfortunately, the process segfaulted after more ltracing, and most of my output was lost to a small tmux scrollback buffer.

Here's the segfault I did capture. Note that the activity here is the admin port writing out prometheus stats to my curl request in a watch loop:

memcpy(0xe25fdc40, "\240\241\242\243\244\245\246\247\250\251\252\253", 12) = 0xe25fdc40
memcpy(0x7fffd9e9d192, "\377", 1)                                  = 0x7fffd9e9d192
memcpy(0xe25fdc4c, "\347", 1)                                      = 0xe25fdc4c
memcpy(0x7fffd9e9d193, "default", 7)                               = 0x7fffd9e9d193
memcpy(0xe25fdc4d, "\220\221\222\223\224", 5 <unfinished ...>
memcpy(0x7fffd9e9d19a, "\377", 1 <unfinished ...>
<... memcpy resumed> )                                             = 0xe25fdc4d
<... memcpy resumed> )                                             = 0x7fffd9e9d19a
memcpy(0xe25fdc52, "\347", 1 <unfinished ...>
memcpy(0x7fffd9e9d180, "conns_active", 12 <unfinished ...>
<... memcpy resumed> )                                             = 0xe25fdc52
<... memcpy resumed> )                                             = 0x7fffd9e9d180
memcpy(0xe25fdc53, "\230\231\232\233\234\235\236", 7 <unfinished ...>
memcpy(0x7fffd9e9d18c, "\377", 1 <unfinished ...>
<... memcpy resumed> )                                             = 0xe25fdc53
<... memcpy resumed> )                                             = 0x7fffd9e9d18c
memcpy(0xe25fdc5a, "\347", 1 <unfinished ...>
memcpy(0x7fffd9e9d18d, "proxy", 5 <unfinished ...>
<... memcpy resumed> )                                             = 0xe25fdc5a
<... memcpy resumed> )                                             = 0x7fffd9e9d18d
memcpy(0xe25fdc5b, "\b\t\n\v\f", 5 <unfinished ...>
memcpy(0x7fffd9e9d192, "\377", 1 <unfinished ...>
<... memcpy resumed> )                                             = 0xe25fdc5b
<... memcpy resumed> )                                             = 0x7fffd9e9d192
memcpy(0xe25fdc40, "\r\016\017", 3 <unfinished ...>
memcpy(0x7fffd9e9d193, "default", 7 <unfinished ...>
<... memcpy resumed> )                                             = 0xe25fdc40
<... memcpy resumed> )                                             = 0x7fffd9e9d193
memcpy(0xe25fdc43, "\020\021\022\023\024\025\026\027", 8)          = 0xe25fdc43
unexpected breakpoint at 0x56016c897de8
log2(8, 31, 0xe2121000, 1 <no return ...>
--- SIGSEGV (Segmentation fault) ---
<... log2 resumed> )                                               = 2
sigaction(SIGSEGV, { 0, <>, 0, 0 } <unfinished ...>
floor(0, 0xfffffc01, 0, 0)                                         = 2
pthread_mutex_lock(0xe2912288, 0xe2912280, 52, 0)                  = 0
<... sigaction resumed> , nil)                                     = 0
pthread_mutex_unlock(0xe2912288, 0xe20058e8, 0xe2914080, 0x6da8)   = 0
pthread_mutex_lock(0xe2912288, 0xe214bd00, 0xa0000, 0xe2912288)    = 0
--- SIGSEGV (Segmentation fault) ---
enable_breakpoint pid=13506, addr=0x56016c9ceca2, symbol=(null): No such process
pthread_mutex_unlock(Segmentation fault (core dumped)

I will put together a tool to gather full output from rust-gdb, lsof, and some strace and ltrace samples for the next time this pops up.

addr config fields should accept hostnames

Right now, the addr fields in config files parse their values as inet socket addresses. We'll eventually need them to accept hostnames as well. For example, addr: namerd:4180 currently throws the following error:

thread 'main' panicked at 'configuration error: Error { repr: Custom(Custom { kind: InvalidData, error: Message("invalid IP address syntax", Some(Pos { marker: Marker { index: 82, line: 6, col: 8 }, path: "proxies[0].namerd" })) }) }', /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/result.rs:868
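
For reference, std's ToSocketAddrs already handles both forms; here is a minimal sketch of what accepting hostnames might look like (the parse_addr helper is hypothetical and not linkerd-tcp's actual config code):

use std::net::{SocketAddr, ToSocketAddrs};

// Hypothetical helper: accept either "127.0.0.1:4180" or "namerd:4180".
// Note that to_socket_addrs performs a blocking DNS lookup, so a real
// implementation would want to resolve off the reactor thread.
fn parse_addr(addr: &str) -> std::io::Result<SocketAddr> {
    addr.to_socket_addrs()?
        .next()
        .ok_or_else(|| {
            std::io::Error::new(
                std::io::ErrorKind::InvalidData,
                format!("no addresses found for {}", addr),
            )
        })
}

fn main() -> std::io::Result<()> {
    println!("{}", parse_addr("127.0.0.1:4180")?);
    // parse_addr("namerd:4180") works too, once `namerd` resolves in DNS.
    Ok(())
}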

metrics reporting is dominating our CPU usage.

While profiling linkerd-tcp using linux perf, I uncovered that a large chunk of our CPU usage is stats reporting.

CPU usage without stats reporting is 36% at 10k qps.
CPU usage with stats reporting is 150% at 10k qps.

Attached is a zipfile with the SVG flame graph. Use a Mac App like Gapplin to view it.
flame.zip

When we turned off stats, the exact patch we wrote against tacho master was this:

diff --git a/src/recorder.rs b/src/recorder.rs
index 123850f..7d361a6 100644
--- a/src/recorder.rs
+++ b/src/recorder.rs
@@ -36,7 +36,7 @@ impl Recorder {
             *curr += n;
             return;
         }
-        self.sample.counters.insert(k.clone(), n);
+        //self.sample.counters.insert(k.clone(), n);
     }

     pub fn set(&mut self, k: &GaugeKey, n: u64) {
@@ -44,7 +44,7 @@ impl Recorder {
             *curr = n;
             return;
         }
-        self.sample.gauges.insert(k.clone(), n);
+        //self.sample.gauges.insert(k.clone(), n);
     }

     pub fn add(&mut self, k: &StatKey, n: u64) {
@@ -55,17 +55,17 @@ impl Recorder {

         let mut vals = VecDeque::new();
         vals.push_back(n);
-        self.sample.stats.insert(k.clone(), vals);
+        //self.sample.stats.insert(k.clone(), vals);
     }
 }
 impl Drop for Recorder {
     fn drop(&mut self) {
         // Steal the sample from the recorder so we can give it to the channel without
         // copying.
-        let sample = mem::replace(&mut self.sample, Sample::default());
+        /*let sample = mem::replace(&mut self.sample, Sample::default());
         if mpsc::UnboundedSender::send(&self.tx, sample).is_err() {
             info!("dropping metrics");
-        }
+        }*/
     }
 }

iperf3 test causes a panic in linkerd-tcp

Here's my test setup:

iperf3 is installed on netty-test-8, proxy-test-4d, and proxy-test-4e.

proxy-test-4e has an iperf server listening on 7474 using the command iperf3 -s -p 7474 -d
proxy-test-4d has a namerd and a linkerd-tcp.
netty-test-8 runs an iperf client: iperf3 -c proxy-test-4d -p 7474 -i 1 -t 120 -b 10M

I did not find an iperf speed that did not cause a panic or a drop to 0 in traffic after the first second.

The iperf client output:

stevej@netty-test-8:~$ iperf3 -c proxy-test-4d -p 7474 -i 1 -t 120 -b 10M
Connecting to host proxy-test-4d, port 7474
[  4] local 10.240.0.4 port 37294 connected to 10.240.0.21 port 7474
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   497 KBytes  4.07 Mbits/sec    1    188 KBytes
[  4]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    0    188 KBytes
[  4]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    0    188 KBytes
[  4]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec    0    188 KBytes
[  4]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec    0    188 KBytes
[  4]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec    0    188 KBytes
^C[  4]   6.00-6.55   sec  0.00 Bytes  0.00 bits/sec    0    188 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-6.55   sec   497 KBytes   622 Kbits/sec    1             sender
[  4]   0.00-6.55   sec  0.00 Bytes  0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated

The panic in linkerd-tcp:

stevej@proxy-test-4d:~/src/linkerd-tcp$ RUST_BACKTRACE=full target/release/linkerd-tcp example.yml
Listening on http://0.0.0.0:9989.
thread 'main' panicked at 'destination and source slices have different lengths', /checkout/src/libcore/slice/mod.rs:591
stack backtrace:
   0:     0x564fcbf38e63 - std::sys::imp::backtrace::tracing::imp::unwind_backtrace::h0c49f46a3545f908
                               at /checkout/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1:     0x564fcbf34cf4 - std::sys_common::backtrace::_print::hcef39a9816714c4c
                               at /checkout/src/libstd/sys_common/backtrace.rs:71
   2:     0x564fcbf3b7a7 - std::panicking::default_hook::{{closure}}::h7c3c94835e02f846
                               at /checkout/src/libstd/sys_common/backtrace.rs:60
                               at /checkout/src/libstd/panicking.rs:355
   3:     0x564fcbf3b32b - std::panicking::default_hook::h0bf7bc3112fb107d
                               at /checkout/src/libstd/panicking.rs:371
   4:     0x564fcbf3bc7b - std::panicking::rust_panic_with_hook::ha27630c950090fec
                               at /checkout/src/libstd/panicking.rs:549
   5:     0x564fcbf3bb54 - std::panicking::begin_panic::heb97fa3158b71158
                               at /checkout/src/libstd/panicking.rs:511
   6:     0x564fcbf3ba89 - std::panicking::begin_panic_fmt::h8144403278d84748
                               at /checkout/src/libstd/panicking.rs:495
   7:     0x564fcbf3ba17 - rust_begin_unwind
                               at /checkout/src/libstd/panicking.rs:471
   8:     0x564fcbf66efd - core::panicking::panic_fmt::h3b0cca53e68f9654
                               at /checkout/src/libcore/panicking.rs:69
   9:     0x564fcbf66e34 - core::panicking::panic::h4b991f5abe7d76d5
                               at /checkout/src/libcore/panicking.rs:49
  10:     0x564fcbdf5f7c - <linkerd_tcp::lb::proxy_stream::ProxyStream as futures::future::Future>::poll::h16aa78
271f3f436f
                               at /checkout/src/libcore/macros.rs:21
                               at /checkout/src/libcollections/slice.rs:1313
                               at /home/stevej/src/linkerd-tcp/src/lb/proxy_stream.rs:134
  11:     0x564fcbdf0a86 - <linkerd_tcp::lb::duplex::Duplex as futures::future::Future>::poll::h1c7157b5df76e8fa
                               at /home/stevej/src/linkerd-tcp/src/lb/duplex.rs:61
  12:     0x564fcbdf1ffb - linkerd_tcp::lb::endpoint::Endpoint::poll_connections::hd32a91fb1036e124
                               at /home/stevej/src/linkerd-tcp/src/lb/endpoint.rs:184
                               at /home/stevej/src/linkerd-tcp/src/lb/endpoint.rs:169
  13:     0x564fcbdb0b6d - <futures::sink::map_err::SinkMapErr<S, F> as futures::sink::Sink>::poll_complete::h8f7
93e2eb4aa3e54
                               at /home/stevej/src/linkerd-tcp/src/lb/balancer.rs:103
                               at /home/stevej/src/linkerd-tcp/src/lb/balancer.rs:392
                               at /home/stevej/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-0.1.13/src
/sink/map_err.rs:29
  14:     0x564fcbdd5f31 - <linkerd_tcp::driver::Driver<S, K> as futures::future::Future>::poll::h2b4329b0261a742
3
                               at /home/stevej/src/linkerd-tcp/src/driver.rs:62
  15:     0x564fcbf0ace6 - tokio_core::reactor::Core::poll::hc95af26313ea35eb
                               at /home/stevej/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-0.1.13/src
/future/mod.rs:106
                               at /home/stevej/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-0.1.13/src
/task_impl/mod.rs:337
                               at /home/stevej/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-0.1.13/src
/task_impl/mod.rs:484
                               at /home/stevej/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-0.1.13/src
/task_impl/mod.rs:61
                               at /checkout/src/libstd/thread/local.rs:253
                               at /home/stevej/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-0.1.13/src
/task_impl/mod.rs:54
                               at /home/stevej/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-0.1.13/src
/task_impl/mod.rs:484
                               at /home/stevej/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-0.1.13/src
/task_impl/mod.rs:337
                               at /home/stevej/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-core-0.1.6/s
rc/reactor/mod.rs:366
                               at /home/stevej/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped-tls-0.1.0/s
rc/lib.rs:135
                               at /home/stevej/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-core-0.1.6/s
rc/reactor/mod.rs:366
                               at /home/stevej/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-core-0.1.6/s
rc/reactor/mod.rs:324
                               at /home/stevej/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-core-0.1.6/s
rc/reactor/mod.rs:312
  16:     0x564fcbc46040 - tokio_core::reactor::Core::run::h96e50dceabf7d65e
                               at /home/stevej/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-core-0.1.6/s
rc/reactor/mod.rs:249
  17:     0x564fcbc4dc77 - linkerd_tcp::main::h58ef6ee19fe9e824
                               at /home/stevej/src/linkerd-tcp/src/app/mod.rs:107
                               at /home/stevej/src/linkerd-tcp/src/main.rs:52
  18:     0x564fcbf3b975 - std::panicking::try::do_call::h689a21caeeef92aa
                               at /checkout/src/libcore/ops.rs:2606
                               at /checkout/src/libstd/panicking.rs:454
  19:     0x564fcbf42c6a - __rust_maybe_catch_panic
                               at /checkout/src/libpanic_unwind/lib.rs:98
 20:     0x564fcbf3c41a - std::rt::lang_start::hf63d494cb7dd034c
                               at /checkout/src/libstd/panicking.rs:433
                               at /checkout/src/libstd/panic.rs:361
                               at /checkout/src/libstd/rt.rs:57
  21:     0x7f0f6f2e53f0 - __libc_start_main
  22:     0x564fcbc45c59 - _start
  23:                0x0 - <unknown>
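
The message comes from libcore's copy_from_slice, which asserts that both slices have the same length. A tiny standalone reproduction of the assertion and the usual guard follows; this is illustrative only, not the actual proxy_stream.rs code:

fn main() {
    let src = [1u8, 2, 3, 4, 5];
    let mut dst = [0u8; 3];

    // dst.copy_from_slice(&src); // panics with "destination and source
    //                            // slices have different lengths"

    // The usual guard: copy only as many bytes as both buffers can hold.
    let n = src.len().min(dst.len());
    dst[..n].copy_from_slice(&src[..n]);
    println!("copied {} bytes: {:?}", n, dst);
}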

Idea: DNS-integrated dynamic configuration

Just an idea to consider, but if linkerd-tcp supported it, it would be amazing and I would have switched to it immediately.

Imagine a host that has:

  • linkerd-tcp
  • some kind of lightweight DNS-server (like dnsmasq)
  • an application

Let's say the application needs to talk to MS SQL Server and we want to do that through linkerd-tcp. Instead of pre-configuring linkerd-tcp with an endpoint for that specific DB (useful if we want to use one instance for many apps, or if an app talks to a lot of services/endpoints), the app can attempt to connect to something like __1433__.foobar-database.linkerd, where .linkerd is just a TLD we decided to use in this case. The host is configured to use the locally running dnsmasq to resolve DNS names, and dnsmasq is configured to forward requests for .linkerd to a DNS interface of linkerd-tcp running on some port on localhost. When linkerd-tcp gets a DNS request for __1433__.foobar-database.linkerd, it can set up a listener on port 1433 on some unique loopback IPv4 address (there are millions of them, since loopback covers the whole 127.0.0.0/8 subnet), say 127.213.233.132, and return that IPv4 as an A record in the DNS reply. The application then connects to 127.213.233.132:1433 thinking it's an MS SQL Server instance, and linkerd-tcp is already listening there, prepared to proxy the traffic.
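
To make the naming scheme concrete, here is a small std-only Rust sketch of how such a name could be parsed and mapped to a unique loopback address (the helper names and the allocation strategy are hypothetical, not an existing linkerd-tcp feature):

use std::net::Ipv4Addr;

// Hypothetical parser for the scheme described above:
// "__1433__.foobar-database.linkerd" -> (service "foobar-database", port 1433).
fn parse_linkerd_name(name: &str) -> Option<(String, u16)> {
    let mut parts = name.splitn(2, '.');
    let port_label = parts.next()?;            // "__1433__"
    let rest = parts.next()?;                  // "foobar-database.linkerd"
    let service = rest.strip_suffix(".linkerd")?;
    let port = port_label.strip_prefix("__")?.strip_suffix("__")?.parse().ok()?;
    Some((service.to_string(), port))
}

// Hypothetical allocator: hand out distinct addresses from 127.0.0.0/8,
// starting at 127.0.0.2 so 127.0.0.1 stays untouched.
fn nth_loopback(n: u32) -> Ipv4Addr {
    Ipv4Addr::from(u32::from(Ipv4Addr::new(127, 0, 0, 2)) + n)
}

fn main() {
    let (service, port) = parse_linkerd_name("__1433__.foobar-database.linkerd").unwrap();
    // A DNS responder would return this address as the A record and then
    // bind a proxy listener on (addr, port) for `service`.
    println!("{} -> {}:{}", service, nth_loopback(0), port);
}

In practice the allocator would also need to track which addresses are already bound and release them when listeners are torn down.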

This is just an abstract idea, and the "DNS schema" could probably be designed more carefully to allow "dtab overrides" and/or per-client routing.

My main goal here is to learn whether this is something you're planning to do with linkerd-tcp, whether you have an alternative solution, or whether it's out of scope for your plans.

Vault-generated private keys throw FailedToConstructPrivateKey errors

Hello-

I've deployed linkerd as well as a couple of sample applications with linkerd-tcp containers. We've successfully applied cfssl and openssl certificates and keys, but are having trouble with keys generated by Vault. Our pk8-encoded key throws the error "WrongNumberOfKeysInPrivateKeyFile" and our regular .key file throws "FailedToConstructPrivateKey" errors. There do not appear to be any errors with the certificates on the client side. We have similar Vault-generated certs that work with our main linkerd service mesh.

client configuration:

    routers:
...
      client:
        kind: io.l5d.static
        configs:
        - prefix: /svc/server
          connectTimeoutMs: 400
          tls:
            dnsName: "server.default.svc.cluster.local"
            trustCerts:
            - /io.buoyant/linkerd/certs/tls.chain

server configuration:

    routers:
...
      servers:
      - ip: 0.0.0.0
        port: 7474
        dstName: /$/inet/127.1/80
        tls:
          defaultIdentity:
            privateKey: /io.buoyant/linkerd/certs/tls.key
            certs:
            - /io.buoyant/linkerd/certs/tls.crt

Is there any reason that this may be happening to Vault keys and not other private keys?
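
One thing worth checking is which PEM block type the Vault-issued key file actually contains, since a PKCS#8 key begins with "BEGIN PRIVATE KEY" while PKCS#1 RSA and SEC1 EC keys begin with "BEGIN RSA PRIVATE KEY" and "BEGIN EC PRIVATE KEY", and TLS libraries parse these differently. A quick diagnostic sketch (a hypothetical standalone tool, not part of linkerd-tcp):

use std::fs;

fn main() -> std::io::Result<()> {
    // Usage: pemcheck <key-file>; prints each PEM block header found.
    let path = std::env::args().nth(1).expect("usage: pemcheck <key-file>");
    let pem = fs::read_to_string(&path)?;
    for line in pem.lines().filter(|l| l.starts_with("-----BEGIN")) {
        println!("{}: {}", path, line);
    }
    Ok(())
}

Comparing the headers of a working cfssl/openssl key against the Vault-generated one may show whether the trouble is an encoding mismatch rather than a bad key.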

Properly close connections or implement connection reuse

The target server runs out of sockets due to TIME_WAIT, and throughput drops:

2016-11-22T22:13:20Z 379143/0/0 400000  94% 10s   0 [  3  11  20   64 ]   64 +
2016-11-22T22:13:30Z 378953/0/0 400000  94% 10s   0 [  3  11  19   74 ]   74
2016-11-22T22:13:40Z 386098/0/0 400000  96% 10s   0 [  3   9  16   51 ]   51
2016-11-22T22:13:50Z 369743/0/0 400000  92% 10s   0 [  3  11  19   74 ]   74
2016-11-22T22:14:00Z 382528/0/0 400000  95% 10s   0 [  2  10  25   73 ]   73
2016-11-22T22:14:10Z 397131/0/0 400000  99% 10s   0 [  1   7  12   41 ]   41
2016-11-22T22:14:20Z 393779/0/0 400000  98% 10s   0 [  2   7  14   53 ]   53
2016-11-22T22:14:30Z 392641/0/0 400000  98% 10s   0 [  2   8  16   85 ]   85
2016-11-22T22:14:40Z 390480/0/0 400000  97% 10s   0 [  2   8  18 1008 ] 1008
2016-11-22T22:14:50Z 397733/0/0 400000  99% 10s   0 [  2   6  11   35 ]   35
2016-11-22T22:15:00Z 388203/0/0 400000  97% 10s   0 [  2   9  19   54 ]   54
2016-11-22T22:15:10Z 378096/0/0 400000  94% 10s   0 [  3  13  22  127 ]  127
2016-11-22T22:15:20Z 388446/0/0 400000  97% 10s   0 [  2   9  14   73 ]   73
2016-11-22T22:15:30Z 387618/0/0 400000  96% 10s   0 [  3  10  16  102 ]  102
2016-11-22T22:15:40Z 388657/0/0 400000  97% 10s   0 [  3   9  15   39 ]   39
2016-11-22T22:15:50Z 213906/0/0 400000  53% 10s   0 [  4 120 129  227 ]  227
2016-11-22T22:16:00Z 208904/0/0 400000  52% 10s   0 [  4 120 128  208 ]  208
2016-11-22T22:16:10Z 204244/0/0 400000  51% 10s   0 [  4 123 132  203 ]  203
2016-11-22T22:16:20Z 209137/0/0 400000  52% 10s   0 [  4 121 131  191 ]  191
2016-11-22T22:16:30Z 201940/0/0 400000  50% 10s   0 [  5 123 135  401 ]  401
2016-11-22T22:16:40Z 202599/0/0 400000  50% 10s   0 [  4 122 133  403 ]  403
2016-11-22T22:16:50Z 201139/0/0 400000  50% 10s   0 [  5 121 132  192 ]  192
2016-11-22T22:17:00Z 203549/0/0 400000  50% 10s   0 [  5 121 131  385 ]  385
2016-11-22T22:17:10Z 255853/0/0 400000  63% 10s   0 [  4  22 129  427 ]  427
2016-11-22T22:17:20Z 190785/0/0 400000  47% 10s   0 [  5 125 139  538 ]  538

TCP SNI for Kubernetes?

Currently k8s ingress only supports HTTPS SNI.

I need k8s support for TLS SNI such that I can dynamically create TCP services with virtual server names and have a dynamically created TCP SNI reverse proxy dispatch connections to the correct k8s service.

I see that the linkerd-tcp beta is available and supports SNI. I see that linkerd-tcp integrates with the k8s API via namerd. I see some info on configuring namerd for k8s.

Since I'm hosting k8s on AWS, I'm assuming that I would be using a LoadBalancer service (which creates an ELB instance) as the internet entry point for TCP connections. This would load balance connections across instances of linkerd-tcp (that have been plumbed into k8s via namerd).

What I don’t see is the full set of k8s resources that are required to get this to work.

Has anyone done this? What is the best way to get this configured?

linkerd-tcp connection reuse issue

I'm running a load test without connection reuse (this is part of my testing of the tacho OrderMap changes), and linkerd-tcp starts rejecting traffic after a thousand requests.

stevej@netty-test-8:~$ ./slow_cooker_linux_amd64 -noreuse -host 'default' -totalRequests 1000000 -qps 100 -concurrency 100 http://proxy-test-4d:7474/
# sending 10000 req/s with concurrency=100 to http://proxy-test-4d:7474/ ...
#                      good/b/f t   good%   min [p50 p95 p99  p999]  max change
2017-04-20T21:47:03Z  14105/0/0 100000  14% 10s   5 [ 44  47  48   54 ]   54 +
2017-04-20T21:47:13Z      0/0/0 100000   0% 10s   0 [  0   0   0    0 ]    0 -
2017-04-20T21:47:23Z   1039/0/0 100000   1% 10s  40 [ 54 15319 15327 15335 ] 15329 +++
2017-04-20T21:47:33Z    780/0/0 100000   0% 10s  40 [1060 3081 3083 3083 ] 3083
2017-04-20T21:47:43Z    659/0/0 100000   0% 10s  44 [1048 3073 3075 4073 ] 4073

CPU was at 1% and traffic was trickling through. Here's some example strace output:

recvfrom(78, 0x7fea37c5f000, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(70, "HTTP/1.1 200 OK\r\nServer: nginx/1"..., 8192, 0, NULL, NULL) = 859
sendto(58, "HTTP/1.1 200 OK\r\nServer: nginx/1"..., 859, MSG_NOSIGNAL, NULL, 0) = 859
recvfrom(70, 0x7fea37c5f000, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(65, "HTTP/1.1 200 OK\r\nServer: nginx/1"..., 8192, 0, NULL, NULL) = 859
sendto(51, "HTTP/1.1 200 OK\r\nServer: nginx/1"..., 859, MSG_NOSIGNAL, NULL, 0) = 859
recvfrom(65, 0x7fea37c5f000, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(60, "HTTP/1.1 200 OK\r\nServer: nginx/1"..., 8192, 0, NULL, NULL) = 859
sendto(46, "HTTP/1.1 200 OK\r\nServer: nginx/1"..., 859, MSG_NOSIGNAL, NULL, 0) = 859
recvfrom(60, 0x7fea37c5f000, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(53, "HTTP/1.1 200 OK\r\nServer: nginx/1"..., 8192, 0, NULL, NULL) = 859
sendto(39, "HTTP/1.1 200 OK\r\nServer: nginx/1"..., 859, MSG_NOSIGNAL, NULL, 0) = 859
recvfrom(53, 0x7fea37c5f000, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(48, "HTTP/1.1 200 OK\r\nServer: nginx/1"..., 8192, 0, NULL, NULL) = 859
sendto(33, "HTTP/1.1 200 OK\r\nServer: nginx/1"..., 859, MSG_NOSIGNAL, NULL, 0) = 859
recvfrom(48, 0x7fea37c5f000, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(41, "HTTP/1.1 200 OK\r\nServer: nginx/1"..., 8192, 0, NULL, NULL) = 859
sendto(27, "HTTP/1.1 200 OK\r\nServer: nginx/1"..., 859, MSG_NOSIGNAL, NULL, 0) = 859
recvfrom(41, 0x7fea37c5f000, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(36, "HTTP/1.1 200 OK\r\nServer: nginx/1"..., 8192, 0, NULL, NULL) = 859
sendto(20, "HTTP/1.1 200 OK\r\nServer: nginx/1"..., 859, MSG_NOSIGNAL, NULL, 0) = 859
recvfrom(36, 0x7fea37c5f000, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(30, "HTTP/1.1 200 OK\r\nServer: nginx/1"..., 8192, 0, NULL, NULL) = 859
sendto(17, "HTTP/1.1 200 OK\r\nServer: nginx/1"..., 859, MSG_NOSIGNAL, NULL, 0) = 859
recvfrom(30, 0x7fea37c5f000, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(23, "HTTP/1.1 200 OK\r\nServer: nginx/1"..., 8192, 0, NULL, NULL) = 859
sendto(14, "HTTP/1.1 200 OK\r\nServer: nginx/1"..., 859, MSG_NOSIGNAL, NULL, 0) = 859
recvfrom(23, 0x7fea37c5f000, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)

And here's an lsof report:

stevej@proxy-test-4d:~$ sudo lsof -p 14178
COMMAND     PID   USER   FD      TYPE    DEVICE SIZE/OFF      NODE NAME
linkerd-t 14178 stevej  cwd       DIR       8,1     4096   1812484 /home/stevej/src/linkerd-tcp
linkerd-t 14178 stevej  rtd       DIR       8,1     4096         2 /
linkerd-t 14178 stevej  txt       REG       8,1  9851408   1816829 /home/stevej/src/linkerd-tcp/target/release/
linkerd-tcp
linkerd-t 14178 stevej  mem       REG       8,1  1088952      1907 /lib/x86_64-linux-gnu/libm-2.24.so
linkerd-t 14178 stevej  mem       REG       8,1  1856752      1903 /lib/x86_64-linux-gnu/libc-2.24.so
linkerd-t 14178 stevej  mem       REG       8,1    92552      1931 /lib/x86_64-linux-gnu/libgcc_s.so.1
linkerd-t 14178 stevej  mem       REG       8,1   142400      1918 /lib/x86_64-linux-gnu/libpthread-2.24.so
linkerd-t 14178 stevej  mem       REG       8,1    31712      1920 /lib/x86_64-linux-gnu/librt-2.24.so
linkerd-t 14178 stevej  mem       REG       8,1    14608      1906 /lib/x86_64-linux-gnu/libdl-2.24.so
linkerd-t 14178 stevej  mem       REG       8,1   158512      1899 /lib/x86_64-linux-gnu/ld-2.24.so
linkerd-t 14178 stevej    0u      CHR     136,3      0t0         6 /dev/pts/3
linkerd-t 14178 stevej    1u      CHR     136,3      0t0         6 /dev/pts/3
linkerd-t 14178 stevej    2u      CHR     136,3      0t0         6 /dev/pts/3
linkerd-t 14178 stevej    3u  a_inode      0,11        0      8469 [eventpoll]
linkerd-t 14178 stevej    4r     FIFO      0,10      0t0 908911732 pipe
linkerd-t 14178 stevej    5w     FIFO      0,10      0t0 908911732 pipe
linkerd-t 14178 stevej    6u     IPv4 908911733      0t0       TCP *:7474 (LISTEN)
linkerd-t 14178 stevej    7u  a_inode      0,11        0      8469 [eventpoll]
linkerd-t 14178 stevej    8r     FIFO      0,10      0t0 908910932 pipe
linkerd-t 14178 stevej    9w     FIFO      0,10      0t0 908910932 pipe
linkerd-t 14178 stevej   10u     IPv4 908911215      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33709 (ESTABLISHED)
linkerd-t 14178 stevej   11u     IPv4 908911205      0t0       TCP proxy-test-4d:54924->perf-target-4.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej   12u     IPv4 908910933      0t0       TCP *:9989 (LISTEN)
linkerd-t 14178 stevej   13u     IPv4 908910934      0t0       TCP localhost:50620->localhost:4180 (ESTABLISHED
)
linkerd-t 14178 stevej   14u     IPv4 908910257      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33616 (ESTABLISHED)
linkerd-t 14178 stevej   15u     IPv4 908910258      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33604 (ESTABLISHED)
linkerd-t 14178 stevej   16u     IPv4 908910259      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33609 (ESTABLISHED)
linkerd-t 14178 stevej   17u     IPv4 908910260      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33622 (ESTABLISHED)
linkerd-t 14178 stevej   18u     IPv4 908910261      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33624 (ESTABLISHED)
linkerd-t 14178 stevej   19u     IPv4 908910262      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33620 (ESTABLISHED)
[cutting hundreds of other sockets that are open both upstream and downstream]
linkerd-t 14178 stevej  327u     IPv4 908911328      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33773 (ESTABLISHED)
linkerd-t 14178 stevej  328u     IPv4 908911331      0t0       TCP proxy-test-4d:42162->perf-target-5.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  329u     IPv4 908911332      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-hosted.internal:33770 (ESTABLISHED)
linkerd-t 14178 stevej  330u     IPv4 908911333      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33772 (ESTABLISHED)
linkerd-t 14178 stevej  331u     IPv4 908911334      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33775 (ESTABLISHED)
linkerd-t 14178 stevej  332u     IPv4 908911335      0t0       TCP proxy-test-4d:38944->perf-target-6.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  333u     IPv4 908911336      0t0       TCP proxy-test-4d:55042->perf-target-4.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  334u     IPv4 908911337      0t0       TCP proxy-test-4d:42168->perf-target-5.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  335u     IPv4 908911338      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33769 (ESTABLISHED)
linkerd-t 14178 stevej  336u     IPv4 908911339      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33771 (ESTABLISHED)
linkerd-t 14178 stevej  337u     IPv4 908911340      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33765 (ESTABLISHED)
linkerd-t 14178 stevej  338u     IPv4 908911343      0t0       TCP proxy-test-4d:38954->perf-target-6.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  339u     IPv4 908911344      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33777 (ESTABLISHED)
linkerd-t 14178 stevej  340u     IPv4 908911345      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33778 (ESTABLISHED)
linkerd-t 14178 stevej  341u     IPv4 908911346      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33776 (ESTABLISHED)
linkerd-t 14178 stevej  342u     IPv4 908911347      0t0       TCP proxy-test-4d:38956->perf-target-6.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  343u     IPv4 908911348      0t0       TCP proxy-test-4d:55054->perf-target-4.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  344u     IPv4 908911349      0t0       TCP proxy-test-4d:42180->perf-target-5.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  345u     IPv4 908911350      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33780 (ESTABLISHED)
linkerd-t 14178 stevej  346u     IPv4 908911351      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33782 (ESTABLISHED)
linkerd-t 14178 stevej  347u     IPv4 908911352      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33781 (ESTABLISHED)
linkerd-t 14178 stevej  348u     IPv4 908911353      0t0       TCP proxy-test-4d:42182->perf-target-5.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  349u     IPv4 908911354      0t0       TCP proxy-test-4d:55060->perf-target-4.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  350u     IPv4 908911355      0t0       TCP proxy-test-4d:38966->perf-target-6.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  351u     IPv4 908911356      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33779 (ESTABLISHED)
linkerd-t 14178 stevej  352u     IPv4 908911357      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33785 (ESTABLISHED)
linkerd-t 14178 stevej  353u     IPv4 908911358      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33783 (ESTABLISHED)
linkerd-t 14178 stevej  354u     IPv4 908911359      0t0       TCP proxy-test-4d:38968->perf-target-6.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  355u     IPv4 908911360      0t0       TCP proxy-test-4d:55066->perf-target-4.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  356u     IPv4 908911361      0t0       TCP proxy-test-4d:42192->perf-target-5.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  357u     IPv4 908911362      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33784 (ESTABLISHED)
linkerd-t 14178 stevej  358u     IPv4 908911363      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33790 (ESTABLISHED)
linkerd-t 14178 stevej  359u     IPv4 908911364      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33787 (ESTABLISHED)
linkerd-t 14178 stevej  360u     IPv4 908911365      0t0       TCP proxy-test-4d:55070->perf-target-4.c.buoyant-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  361u     IPv4 908911366      0t0       TCP proxy-test-4d:38976->perf-target-6.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  362u     IPv4 908911367      0t0       TCP proxy-test-4d:42198->perf-target-5.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  363u     IPv4 908911368      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33788 (ESTABLISHED)
linkerd-t 14178 stevej  364u     IPv4 908911369      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33789 (ESTABLISHED)
linkerd-t 14178 stevej  365u     IPv4 908911370      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33774 (ESTABLISHED)
linkerd-t 14178 stevej  366u     IPv4 908911371      0t0       TCP proxy-test-4d:55076->perf-target-4.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  367u     IPv4 908911372      0t0       TCP proxy-test-4d:38982->perf-target-6.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  368u     IPv4 908911373      0t0       TCP proxy-test-4d:42204->perf-target-5.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  369u     IPv4 908911374      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33793 (ESTABLISHED)
linkerd-t 14178 stevej  370u     IPv4 908911375      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33792 (ESTABLISHED)
linkerd-t 14178 stevej  371u     IPv4 908911376      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33786 (ESTABLISHED)
linkerd-t 14178 stevej  372u     IPv4 908911377      0t0       TCP proxy-test-4d:55082->perf-target-4.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  373u     IPv4 908911378      0t0       TCP proxy-test-4d:38988->perf-target-6.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  374u     IPv4 908911379      0t0       TCP proxy-test-4d:42210->perf-target-5.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  375u     IPv4 908911380      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33794 (ESTABLISHED)
linkerd-t 14178 stevej  376u     IPv4 908911381      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33797 (ESTABLISHED)
linkerd-t 14178 stevej  377u     IPv4 908911382      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33798 (ESTABLISHED)
linkerd-t 14178 stevej  378u     IPv4 908911383      0t0       TCP proxy-test-4d:42212->perf-target-5.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  379u     IPv4 908911384      0t0       TCP proxy-test-4d:55090->perf-target-4.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  380u     IPv4 908911385      0t0       TCP proxy-test-4d:38996->perf-target-6.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  381u     IPv4 908911386      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33801 (ESTABLISHED)
linkerd-t 14178 stevej  382u     IPv4 908911387      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33799 (ESTABLISHED)
linkerd-t 14178 stevej  383u     IPv4 908911388      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33795 (ESTABLISHED)
linkerd-t 14178 stevej  384u     IPv4 908911389      0t0       TCP proxy-test-4d:38998->perf-target-6.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  385u     IPv4 908911390      0t0       TCP proxy-test-4d:55096->perf-target-4.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  386u     IPv4 908911391      0t0       TCP proxy-test-4d:42222->perf-target-5.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  387u     IPv4 908911392      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33791 (ESTABLISHED)
linkerd-t 14178 stevej  388u     IPv4 908911393      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33796 (ESTABLISHED)
linkerd-t 14178 stevej  389u     IPv4 908911394      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-h
osted.internal:33800 (ESTABLISHED)
linkerd-t 14178 stevej  390u     IPv4 908911395      0t0       TCP proxy-test-4d:55100->perf-target-4.c.buoyant
-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  391u     IPv4 908911396      0t0       TCP proxy-test-4d:39006->perf-target-6.c.buoyant-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  392u     IPv4 908911397      0t0       TCP proxy-test-4d:42228->perf-target-5.c.buoyant-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  393u     IPv4 908911398      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-hosted.internal:33802 (ESTABLISHED)
linkerd-t 14178 stevej  394u     IPv4 908911399      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-hosted.internal:33807 (ESTABLISHED)
linkerd-t 14178 stevej  395u     IPv4 908911400      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-hosted.internal:33803 (ESTABLISHED)
linkerd-t 14178 stevej  396u     IPv4 908911401      0t0       TCP proxy-test-4d:55106->perf-target-4.c.buoyant-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  397u     IPv4 908911402      0t0       TCP proxy-test-4d:39012->perf-target-6.c.buoyant-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  398u     IPv4 908911403      0t0       TCP proxy-test-4d:42234->perf-target-5.c.buoyant-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  399u     IPv4 908911404      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-hosted.internal:33805 (ESTABLISHED)
linkerd-t 14178 stevej  400u     IPv4 908911405      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-hosted.internal:33806 (ESTABLISHED)
linkerd-t 14178 stevej  401u     IPv4 908911406      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-hosted.internal:33804 (ESTABLISHED)
linkerd-t 14178 stevej  402u     IPv4 908911407      0t0       TCP proxy-test-4d:55112->perf-target-4.c.buoyant-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  403u     IPv4 908911408      0t0       TCP proxy-test-4d:39018->perf-target-6.c.buoyant-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  404u     IPv4 908911409      0t0       TCP proxy-test-4d:42240->perf-target-5.c.buoyant-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  405u     IPv4 908911410      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-hosted.internal:33808 (ESTABLISHED)
linkerd-t 14178 stevej  406u     IPv4 909244034      0t0       TCP proxy-test-4d:44928->perf-target-5.c.buoyant-hosted.internal:http-alt (SYN_SENT)
linkerd-t 14178 stevej  407u     sock       0,8      0t0 909253650 protocol: TCP
linkerd-t 14178 stevej  408u     IPv4 908911413      0t0       TCP proxy-test-4d:39022->perf-target-6.c.buoyant-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  409u     IPv4 908911414      0t0       TCP proxy-test-4d:55120->perf-target-4.c.buoyant-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  410u     IPv4 908911415      0t0       TCP proxy-test-4d:42246->perf-target-5.c.buoyant-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  411u     IPv4 908911417      0t0       TCP proxy-test-4d:42248->perf-target-5.c.buoyant-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  412u     IPv4 908911418      0t0       TCP proxy-test-4d:55126->perf-target-4.c.buoyant-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  413u     IPv4 908911419      0t0       TCP proxy-test-4d:39032->perf-target-6.c.buoyant-hosted.internal:http-alt (ESTABLISHED)
linkerd-t 14178 stevej  414u     sock       0,8      0t0 909253652 protocol: TCP
linkerd-t 14178 stevej  415u     sock       0,8      0t0 909253654 protocol: TCP
linkerd-t 14178 stevej  416u     sock       0,8      0t0 909253637 protocol: TCP
linkerd-t 14178 stevej  417u  unknown                              /proc/14178/fd/417 (readlink: No such file or directory)
linkerd-t 14178 stevej  418u     sock       0,8      0t0 909253639 protocol: TCP
linkerd-t 14178 stevej  419u     sock       0,8      0t0 909252607 protocol: TCP
linkerd-t 14178 stevej  420u     sock       0,8      0t0 909253647 protocol: TCP
linkerd-t 14178 stevej  421u     sock       0,8      0t0 909253645 protocol: TCP
linkerd-t 14178 stevej  422u     sock       0,8      0t0 909253649 protocol: TCP
linkerd-t 14178 stevej  423u     IPv4 909235806      0t0       TCP proxy-test-4d:33498->perf-target-6.c.buoyant-hosted.internal:http-alt (SYN_SENT)
linkerd-t 14178 stevej  424u     sock       0,8      0t0 909253641 protocol: TCP
linkerd-t 14178 stevej  425u     sock       0,8      0t0 909253643 protocol: TCP
linkerd-t 14178 stevej  427u     sock       0,8      0t0 909253653 protocol: TCP
linkerd-t 14178 stevej  428u     IPv4 909253633      0t0       TCP proxy-test-4d:7474->netty-test-8.c.buoyant-hosted.internal:39840 (ESTABLISHED)
linkerd-t 14178 stevej  429u     sock       0,8      0t0 909253635 protocol: TCP

Don't use rustls

It's still under development and not fully vetted.

(I contribute from time to time, so I don't want to come down too harshly on the rustls project.) That said, just throwing it into production isn't a great idea at this point.

Improve namerd and bad path error handling

This morning I spent about 30 minutes debugging why requests were hanging. It turns out I forgot to add a path so namerd had nothing to resolve. I'm going to add my notes below to remind us how difficult it can be to debug issues with linkerd-tcp today.

Problem:
When no path is specified, requests to linkerd-tcp hang indefinitely.

While setting up a new linkerd-tcp, I noticed that requests were failing. (Later Oliver pointed out that my test requests failed to specify a 'Host' header so namerd resolutions were failing.)

Watching the process with strace, I noticed the following behavior:

bind(6, {sa_family=AF_INET, sin_port=htons(7474), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
listen(6, 1024)                         = 0
ioctl(6, FIONBIO, [1])
epoll_ctl(3, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=2, u64=2}}) = 0
clock_gettime(CLOCK_MONOTONIC, {165866, 674480478}) = 0
epoll_wait(3, [], 1024, 0)              = 0
clock_gettime(CLOCK_MONOTONIC, {165866, 674517778}) = 0
clock_gettime(CLOCK_MONOTONIC, {165866, 674543177}) = 0
clock_gettime(CLOCK_MONOTONIC, {165866, 674570803}) = 0
clock_gettime(CLOCK_MONOTONIC, {165866, 674590894}) = 0
clock_gettime(CLOCK_MONOTONIC, {165866, 674611178}) = 0
clock_gettime(CLOCK_MONOTONIC, {165866, 674630385}) = 0
epoll_wait(3, [], 1024, 0)              = 0
clock_gettime(CLOCK_MONOTONIC, {165866, 674659413}) = 0
clock_gettime(CLOCK_MONOTONIC, {165866, 674678174}) = 0
epoll_wait(3, Listening on http://0.0.0.0:9989.
[{EPOLLIN, {u32=2, u64=2}}], 1024, -1) = 1
clock_gettime(CLOCK_MONOTONIC, {165947, 817271282}) = 0
write(5, "\1", 1)                       = 1
clock_gettime(CLOCK_MONOTONIC, {165947, 817361718}) = 0
epoll_wait(3, [{EPOLLIN, {u32=4294967295, u64=18446744073709551615}}], 1024, 0) = 1
read(4, "\1", 128)                      = 1
read(4, 0x7fff4e9c86c0, 128)            = -1 EAGAIN (Resource temporarily unavailable)
clock_gettime(CLOCK_MONOTONIC, {165947, 817751478}) = 0
accept4(6, {sa_family=AF_INET, sin_port=htons(47002), sin_addr=inet_addr("127.0.0.1")}, [16], SOCK_CLOEXEC) = 12
ioctl(12, FIONBIO, [1])                 = 0
epoll_ctl(3, EPOLL_CTL_ADD, 12, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=4, u64=4}}) = 0
write(9, "\1", 1)                       = 1
accept4(6, 0x7fff4e9c7c00, 0x7fff4e9c7cd4, SOCK_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
clock_gettime(CLOCK_MONOTONIC, {165947, 818158355}) = 0
epoll_wait(3, [{EPOLLOUT, {u32=4, u64=4}}], 1024, 0) = 1
clock_gettime(CLOCK_MONOTONIC, {165947, 818226082}) = 0
clock_gettime(CLOCK_MONOTONIC, {165947, 818257060}) = 0
clock_gettime(CLOCK_MONOTONIC, {165947, 818292018}) = 0
write(9, "\1", 1)                       = 1
clock_gettime(CLOCK_MONOTONIC, {165947, 818508486}) = 0
clock_gettime(CLOCK_MONOTONIC, {165947, 818538988}) = 0
write(9, "\1", 1)                       = 1
clock_gettime(CLOCK_MONOTONIC, {165947, 818601446}) = 0
clock_gettime(CLOCK_MONOTONIC, {165947, 818636679}) = 0
write(9, "\1", 1)                       = 1
clock_gettime(CLOCK_MONOTONIC, {165947, 818711174}) = 0
clock_gettime(CLOCK_MONOTONIC, {165947, 818736119}) = 0
write(9, "\1", 1)                       = 1
clock_gettime(CLOCK_MONOTONIC, {165947, 818813994}) = 0
epoll_wait(3, [], 1024, 0)              = 0
clock_gettime(CLOCK_MONOTONIC, {165947, 818871606}) = 0
clock_gettime(CLOCK_MONOTONIC, {165947, 818899081}) = 0
epoll_wait(3, [{EPOLLIN|EPOLLOUT, {u32=4, u64=4}}], 1024, -1) = 1
clock_gettime(CLOCK_MONOTONIC, {165950, 323825513}) = 0
clock_gettime(CLOCK_MONOTONIC, {165950, 324029051}) = 0
epoll_wait(3, [{EPOLLIN|EPOLLOUT, {u32=4, u64=4}}], 1024, -1) = 1
clock_gettime(CLOCK_MONOTONIC, {165950, 443832441}) = 0
clock_gettime(CLOCK_MONOTONIC, {165950, 443949129}) = 0

The :7474 listener is bound properly, but using telnet to issue an HTTP request resulted in no response.

$ telnet localhost 7474
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.0

lsof output while that HTTP request is in flight:

COMMAND    PID   USER   FD      TYPE DEVICE SIZE/OFF   NODE NAME
linkerd-t 4580 stevej  cwd       DIR    8,1     4096 270339 /home/stevej/src/linkerd-tcp
linkerd-t 4580 stevej  rtd       DIR    8,1     4096      2 /
linkerd-t 4580 stevej  txt       REG    8,1  9258304 271179 /home/stevej/src/linkerd-tcp/target/release/linkerd-tcp
linkerd-t 4580 stevej  mem       REG    8,1  1088952   1907 /lib/x86_64-linux-gnu/libm-2.24.so
linkerd-t 4580 stevej  mem       REG    8,1  1856752   1903 /lib/x86_64-linux-gnu/libc-2.24.so
linkerd-t 4580 stevej  mem       REG    8,1    92552   1931 /lib/x86_64-linux-gnu/libgcc_s.so.1
linkerd-t 4580 stevej  mem       REG    8,1   142400   1918 /lib/x86_64-linux-gnu/libpthread-2.24.so
linkerd-t 4580 stevej  mem       REG    8,1    31712   1920 /lib/x86_64-linux-gnu/librt-2.24.so
linkerd-t 4580 stevej  mem       REG    8,1    14608   1906 /lib/x86_64-linux-gnu/libdl-2.24.so
linkerd-t 4580 stevej  mem       REG    8,1   158512   1899 /lib/x86_64-linux-gnu/ld-2.24.so
linkerd-t 4580 stevej    0u      CHR  136,1      0t0      4 /dev/pts/1
linkerd-t 4580 stevej    1u      CHR  136,1      0t0      4 /dev/pts/1
linkerd-t 4580 stevej    2u      CHR  136,1      0t0      4 /dev/pts/1
linkerd-t 4580 stevej    3u  a_inode   0,11        0   9463 [eventpoll]
linkerd-t 4580 stevej    4r     FIFO   0,10      0t0  89197 pipe
linkerd-t 4580 stevej    5w     FIFO   0,10      0t0  89197 pipe
linkerd-t 4580 stevej    6u     IPv4  89198      0t0    TCP *:7474 (LISTEN)
linkerd-t 4580 stevej    7u  a_inode   0,11        0   9463 [eventpoll]
linkerd-t 4580 stevej    8r     FIFO   0,10      0t0  87890 pipe
linkerd-t 4580 stevej    9w     FIFO   0,10      0t0  87890 pipe
linkerd-t 4580 stevej   10u     IPv4  87891      0t0    TCP *:9989 (LISTEN)
linkerd-t 4580 stevej   11u     IPv4  87892      0t0    TCP localhost:54064->localhost:4180 (ESTABLISHED)
linkerd-t 4580 stevej   12u     IPv4  88470      0t0    TCP localhost:7474->localhost:47002 (ESTABLISHED)

There is no open socket to any of the perf-cluster hosts.

Running linkerd-tcp with trace logging reveals the culprit: we're failing to parse namerd responses but not failing the request.

TRACE:hyper::http::conn                         : Conn::flush = Ok(Ready(()))
TRACE:tokio_core::reactor                       : event Ready {Readable} Token(1)
DEBUG:tokio_core::reactor                       : loop process - 2 events, Duration { secs: 0, nanos: 85366 }
TRACE:linkerd_tcp::app                          : polling 4 running
TRACE:linkerd_tcp::app                          : polling runner 0
TRACE:hyper::client::response                   : Response::new
DEBUG:hyper::client::response                   : version=Http11, status=Ok
DEBUG:hyper::client::response                   : headers={"Content-Type": "application/json", "Content-Length": "15"}
TRACE:linkerd_tcp::namerd                       : parsing namerd response
INFO :linkerd_tcp::namerd                       : error parsing response: missing field `addrs` at line 1 column 14

Oliver explained that this behavior is caused by issuing a request with no path for the router to process. Since we don't specify a default path for now, we do a lookup to namerd with an empty path and that fails.

I think we should address this in two ways:

  1. Change namerd parsing errors to be at the error! level so a user will more quickly see the error.
  2. Fast-fail requests that lack a path (see the sketch below) until we decide to specify a default path in the configuration file or add dtab support.
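
A minimal sketch of the second suggestion, using hypothetical names rather than linkerd-tcp's actual types: check the destination path before ever asking namerd to resolve it.

// Hypothetical guard: refuse to resolve an empty destination path instead of
// sending an empty name to namerd and leaving the request hanging.
fn check_dst_path(dst_path: &str) -> Result<&str, String> {
    if dst_path.trim_matches('/').is_empty() {
        // Logging this at error! level (suggestion 1) would surface it quickly.
        return Err("no destination path configured; failing fast".to_string());
    }
    Ok(dst_path)
}

fn main() {
    assert!(check_dst_path("/svc/default").is_ok());
    assert!(check_dst_path("/").is_err());
    assert!(check_dst_path("").is_err());
}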
