linkerd / linkerd Goto Github PK

Old repo for Linkerd 1.x. See the linkerd2 repo for Linkerd 2.x.

License: Apache License 2.0

Scala 87.08% CSS 0.32% JavaScript 11.06% Shell 0.51% Thrift 0.12% Java 0.45% Handlebars 0.46%

cloud-native service-mesh linkerd service-discovery

linkerd's Introduction

This repo is for the 1.x version of Linkerd. Feature development is now happening in the linkerd2 repo. This repo is currently only used for periodic maintenance releases of Linkerd 1.x.

Linkerd 1.x (pronounced "linker-DEE") acts as a transparent HTTP/gRPC/thrift/etc proxy, and can usually be dropped into existing applications with a minimum of configuration, regardless of what language they're written in. It works with many common protocols and service discovery backends, including scheduled environments like Nomad, Mesos and Kubernetes.

Linkerd is built on top of Netty and Finagle, a production-tested RPC framework used by high-traffic companies like Twitter, Pinterest, Tumblr, PagerDuty, and others.

Linkerd is hosted by the Cloud Native Computing Foundation (CNCF).

Want to try it?

We distribute binaries which you can download from the Linkerd releases page. We also publish Docker images for each release, which you can find on Docker Hub.

For instructions on how to configure and run Linkerd, see the 1.x user documentation on linkerd.io.

Working in this repo

BUILD.md includes general information on how to work in this repo. Additionally, there are documents on how to build several of the application subprojects:

linkerd -- produces linkerd router artifacts
namerd -- produces namerd service discovery artifacts
grpc -- produces the protoc-gen-io.buoyant.grpc code generator

We ❤️ pull requests! See CONTRIBUTING.md for info on contributing changes.

Related Repos

linkerd2: The main repo for Linkerd 2.x and where current development is happening.
linkerd-examples: A variety of configuration examples and explanations
linkerd-tcp: A lightweight TCP/TLS load balancer that uses Namerd
linkerd-viz: Zero-configuration service dashboard for Linkerd
linkerd-zipkin: Zipkin tracing plugins
namerctl: A commandline utility for controlling Namerd

Code of Conduct

This project is for everyone. We ask that our users and contributors take a few minutes to review our code of conduct.

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use these files except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

linkerd's People

Contributors

Stargazers

Watchers

Forkers

siggy ssalevan liamstewart michealzh kakamessi99 yanlinaung emaxerrno lzgscode apakulov zackangelo tharanga-abeyseela thinker0 mmm8955405 pratikmallya gnomix linearregression rhoml zogwei benley mnjstwins topiaruss sesas smazumder05 cigolabs generalhenry he-pin lsjostro kangzhenkang forthy obeattie jorgebastida zhoudaqing abovex x1957 krux monzo tomzhang njohns-grovo sreecha clhodapp ashald medallia mkhq artpar ahuachen logcos cadelaren devopsbox justinvenus oleksandrberezianskyi penland365 fantayeneh shshe askagirl manishmaheshwari andersschuller endzyme izogain christian-posta hafeez3000 maxfalstein containerz m1lan satybald sevein everesio number0 libby saurabh8380 dmexe jhayotte dschobel jdubs leogomes wuerping maniacs-ops vincentchen gbraad mgedigian-fitbit stvndall alapini andrewrothstein leozc akreiling joerg84 ragil markeijsermans afeinberg izagz chakra-coder iuliandumitru alexlafreniere viglesiasce kapoora evie404 sohailalam2 ewilde leochencipher bdevetak jpkrohling

linkerd's Issues

thriftProtocol:compact not compatible with thriftMethodInDst:true

When a thrift router is configured with thriftMethodInDst: true, the thrift identifier for that router always uses a binary protocol factory to read the method name from the thrift request, regardless of whether or not the thriftProtocol server option has been set. This causes a deserialization error when thriftProtocol is set to "compact". Here's the problematic code:

https://github.com/BuoyantIO/linkerd/blob/master/router/thrift/src/main/scala/io/buoyant/router/thrift/Identifier.scala#L18

And here's the error that is thrown:

org.apache.thrift.protocol.TProtocolException: Bad version in readMessageBegin
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:208)
    at io.buoyant.router.thrift.Identifier.suffix(Identifier.scala:20)
    at io.buoyant.router.thrift.Identifier.apply(Identifier.scala:28)
    at io.buoyant.router.thrift.Identifier.apply(Identifier.scala:10)
    at io.buoyant.router.RoutingFactory$RoutingService.apply(RoutingFactory.scala:83)
    at com.twitter.finagle.Service$$anon$2.apply(Service.scala:16)
    at io.buoyant.router.Thrift$Router$IngestingFilter$.apply(Thrift.scala:43)
    at io.buoyant.router.Thrift$Router$IngestingFilter$.apply(Thrift.scala:41)
    ...

AccessLogger generating too much garbage

in a recent stress test, I noticed that AccessLogger doubled memory usage and lowered throughput by 15%. This was under a load of 30k qps in an 8-core VM on GKE.

I'll dig in with some benchmarks and profiling and find a way to minimize this cost.

Memory usage was measured by docker stats in a kubelet.

Avoid possible Mux bug in Finagle 6.33

Per this mailing list thread, Finagle 6.33 may have introduced a bug in the Mux protocol. We can either build against head Finagle, or revert to a pre-6.33 version.

make debug-logging tracer configurable

Finagle's com.twitter.finagle.tracing.debugTrace=true flag is very useful for debugging linkerd behavior locally. Support this in the tracers configuration.

Pluggable HTTP request identifiers

Currently, a router's Identifier[Req] may be configurable, but it is not possible to replace the request identifier entirely. Supporting a pluggable Identifier type, at least for HTTP, would allow users to write arbitrary request identifiers to inspect authentication, application-specific cookies, or even payloads.

These plugins must be protocol-specific and, at least initially, probably only make sense for HTTP.

Upgrade to finagle 6.34

Finagle 6.34 introduces breaking changes that will require changes throughout linkerd.

Blocked on the finagle 6.34 release.

support grouped app ids in the marathon namer

We currently do not have a good way to map a hostname to a marathon app id containing slashes.

Given an app ID like testing/hello, we should provide a way for an http request to have a host header like hello.testing to route to this marathon app.

I think that the marathon namer should be changed to admit slashed names, rather than requiring that app ids be a single path element.

expose per-endpoint stats in admin interface

Finagle can (optionally) record per-endpoint statistics. We should add a router-client-level configuration parameter to enable recording of these metrics, and integrate them into the admin UI.

These metrics should be disabled by default because they can be quite costly performance-wise (esp for large clusters), but it's a great feature to have when individual backends matter. In larger clusters, tracing is probably a better way to attack this, anyway.

Introduce announcer initialization module api

In the same way that we load protocol and namer support at runtime, we should introduce a pluggable Announcer api. The Linkerd main should load a list of io.bouyant.linkerd.AnnouncerInitializer modules. Something to the effect of:

trait AnnouncerInitializer {
  def kind: String
  def announce(server: ListeningServer): Closable
}

Then, the configuration format should be extended to admit a list of announcers on servers. Supporting a list of announcers is critical to "multiple registration" schemes, which become very important during migrations (i.e. between zk clusters, or from one sd backend to another, etc).

Furthermore, in this announcing regime, we are not supporting announcing of a service's port, but of the router's server (which will forward to the server's port). This primarily benefits linker-to-linker configurations like:

For example

routers:
- protocol: http
  label: upstream
  baseDtab: /http => /$/inet/127.1/8881
  servers:
  - port: 8080
    ip: 0.0.0.0
    - announcers:
      - kind: io.l5d.serverset
        path: /services/userservice
        zkAddrs:
        - host: zkcluster.buoyant.io
      - kind: io.l5d.serverset
        path: /services/userservice
        zkAddrs:
        - host: otherzkcluster.buoyant.io

- protocol: http
  label: downstream
  baseDtab: ...
  servers:
  - port: 4140

Router TLS configuration

Once #21 is merged, we'll need a way to configure clientside TLS on the router. I can think of at least 2 or 3 different configurations we'll want to support, so this should be modular (and perhaps pluggable, so that users may implement their own crazy policies).

Static validation

routers:
- protocol: http
  tls:
    kind: static
    name: buoyant.io

Per-service validation

routers:
- protocol: http
  baseDtab: /http/1.1/GET => /io.l5d.k8s/default/http
  tls:
    kind: boundPath
    certPath: /path/to/ca.pem
    matches:
    - prefix: /io.l5d.k8s
      name: ${3}.buoyant.io

The matchers should:

extract the Name.Bound.id (we may want to explicitly put this on the stack rather than relying on BindingFactory.Dest).
Path.read(Name.Bound.id)
If a prefix matches:
- strip the prefix
- compute offsets in name against the path post-prefix-stripping. If the offset is out of range, the match should not apply
if any of these steps fail or a match is not found, TLS should not be applied
multiple matches should be able to specify overlapping prefixes

Parsing the name values may be slightly tricky, but we can probably write a parser-generator to do this.

No validation

routers:
- protocol: http
  baseDtab: /http/1.1/GET => /io.l5d.k8s/default/http
  tls:
    kind: noValidationYoloSwag

(And, yes, i think that's what that option should be called.)

Per-host validation

We can probably punt this down the road, but, we can use the IP or Hostname of the individual destination address as the peer's name.

long metric names cut off in metrics browser UI

Some of the metric names are very long, and there's no way of actually seeing the entire metric name in the metrics browser without being able to resize (or at a minimum, horizontally scroll) the left-hand pane.

Add tests for the admin javascript

To prevent issues like #119, it would be good if we had some level of automated testing for our admin UI. I think unit tests would be sufficient here (rather than something like Webdriver).

Fix bootstrap responsiveness in admin page

The admin page navigation does not render on narrow screens:

Allow control over thrift protocol upgrade

Wire the AttemptTTwitterUpgrade param through our configs.

See: twitter/finagle@563264d

modify admin page to look like linkerd.io

https://linkerd.io and the linkerd admin page (commonly found at http://localhost:9090) should appear similar in design.

https://linkerd.io

admin:

detect loops in dtab playground

Dtab loops can be hard to debug. I think we can flag probable loops with this one weird trick:

If a given dentry applies more than once in a resolution, it's probably a loop

Need to think about weird corner cases for this but it would be helpful.

Improve dtab ergonomics for dropping path segments

/http/1.1 => /method ;
/method => /$/io.buoyant.http.anyMethodPfx/host ;
/host => /$/io.buoyant.http.anyHostPfx/local ;
/local => /$/inet/127.1/8080 ;

This is verbose, hard to understand, and hard to maintain. It would be great if we had some kind of syntactic sugar to make this easier.

/http/1.1/*/* => /$/inet/127.1/8080

configurable load balancing strategies

configurable client and server socket pools

It's trivial to create a traffic loop:

while curl -H 'Host: woop' -H 'Dtab-local: /host/woop=>/$/inet/127.1/4140' localhost:4140 ; do :; done

💀 system runs out of file descriptors! 🔫

We need to consider other means to prevent/avoid loops, but this highlights the need to be able to constrain:

the number of sockets that a server may accept concurrently.
the number of sockets that a single client (not router) may open.

Finagle has stack parameters to control these. We need to ensure that there are sane default values, and expose a means to configure these parameters.

remove vestigal `/admin` UI

Functionality is now in /, and things have changed sufficiently that this endpoint doesn't even do anything, just shows a empty UI.

Duplicate trace annotations when L5D-Sample header set

I've running linkerd with an http router configured to route requests received on port 4140 to a "web" service running on port 7000. When I make a request with the L5D-Sample: 1 header set, I see duplicate trace annotations as follows:

$ curl -H 'Host: web' -H 'L5d-sample: 1' http://127.1:4140/ping
...
bc2f33bcda4be209.bc2f33bcda4be209<:bc2f33bcda4be209 ServiceName(0.0.0.0/4140)
bc2f33bcda4be209.bc2f33bcda4be209<:bc2f33bcda4be209 BinaryAnnotation(srv/finagle.version,6.31.0)
bc2f33bcda4be209.bc2f33bcda4be209<:bc2f33bcda4be209 ServerRecv()
bc2f33bcda4be209.bc2f33bcda4be209<:bc2f33bcda4be209 LocalAddr(/127.0.0.1:4140)
bc2f33bcda4be209.bc2f33bcda4be209<:bc2f33bcda4be209 ServerAddr(/127.0.0.1:4140)
bc2f33bcda4be209.bc2f33bcda4be209<:bc2f33bcda4be209 ClientAddr(/127.0.0.1:49600)
bc2f33bcda4be209.bc2f33bcda4be209<:bc2f33bcda4be209 BinaryAnnotation(namer.dtab.base,/status=>/$/io.l5d.http.status;/www=>/host/web;/host=>/$/io.buoyant.http.anyHostPfx/www;/host=>/io.l5d.fs;/method=>/$/io.buoyant.http.anyMethodPfx/host;/http/1.1=>/method)
bc2f33bcda4be209.bc2f33bcda4be209<:bc2f33bcda4be209 BinaryAnnotation(namer.dtab.local,)
bc2f33bcda4be209.bc2f33bcda4be209<:bc2f33bcda4be209 BinaryAnnotation(namer.path,/http/1.1/GET/web)
bc2f33bcda4be209.bc2f33bcda4be209<:bc2f33bcda4be209 Message(namer.success)
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 ServiceName(io.l5d.fs/web)
bc2f33bcda4be209.bc2f33bcda4be209<:bc2f33bcda4be209 BinaryAnnotation(dst.id,/io.l5d.fs/web)
bc2f33bcda4be209.bc2f33bcda4be209<:bc2f33bcda4be209 BinaryAnnotation(dst.path,/)
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 ServiceName(io.l5d.fs/web)
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 BinaryAnnotation(clnt/finagle.version,6.31.0)
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 BinaryAnnotation(clnt/finagle.version,6.31.0)
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 ClientSend()
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 ClientSend()
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 BinaryAnnotation(http.method,GET)
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 BinaryAnnotation(http.method,GET)
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 BinaryAnnotation(http.uri,/ping)
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 BinaryAnnotation(http.uri,/ping)
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 BinaryAnnotation(http.host,web)
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 BinaryAnnotation(http.host,web)
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 BinaryAnnotation(http.version,HTTP/1.1)
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 BinaryAnnotation(http.version,HTTP/1.1)
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 WireSend
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 WireSend
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 ServerAddr(/127.0.0.1:7000)
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 ServerAddr(/127.0.0.1:7000)
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 ClientAddr(/127.0.0.1:49588)
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 ClientAddr(/127.0.0.1:49588)
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 WireRecv
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 WireRecv
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 BinaryAnnotation(http.status,200)
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 BinaryAnnotation(http.status,200)
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 BinaryAnnotation(http.version,HTTP/1.1)
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 BinaryAnnotation(http.version,HTTP/1.1)
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 BinaryAnnotation(http.content-length,4)
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 BinaryAnnotation(http.content-length,4)
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 BinaryAnnotation(http.content-type,text/plain)
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 BinaryAnnotation(http.content-type,text/plain)
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 ClientRecv()
bc2f33bcda4be209.fcca4401059c9b52<:bc2f33bcda4be209 ClientRecv()
bc2f33bcda4be209.bc2f33bcda4be209<:bc2f33bcda4be209 ServerSend()

use events for Marathon namer

The current Marathon namer polls the /v2/apps Marathon API endpoint:
https://github.com/BuoyantIO/linkerd/blob/master/marathon/src/main/scala/io/buoyant/marathon/v2/Api.scala

Using the Events API is a better solution:
https://mesosphere.github.io/marathon/docs/rest-api.html#get-v2-events

linkerd-0.0.8

Introduce a changelog to the linkerd repo.
remove -SNAPSHOT, tag, push tag, bump to 0.0.9-SNAPSHOT
build artifacts from tag (minimal:assembly, bundle:assembly), manually upload to release.

L5D-Sample header is checked against static value

The L5D-Sample header should allow you to specify a sample rate, such that a corresponding percentage of requests with that header are sampled. For instance, if I set L5D-Sample: 0.2 on 100 requests, I'd expect roughly 20 requests to be sampled.

Instead, it looks like the value of that header is being checked against a float value that remains constant for the lifetime of the linkerd process. For instance, if the threshold that's determined when linkerd starts is 0.35, 0% of requests with the L5D-Sample: 0.3 header will be sampled, and 100% of requests with the L5D-Sample: 0.4 header will be sampled. When the process restarts it picks up a new threshold at random.

graph multiple stats on admin homepage

The admin homepage currently graphs the sum of all requests. It may be more useful to graph each request individually in the same chart.

establish linkerd "process path" ids

For the purposes of debugging (and later, for control), we need a way to uniquely identify individual linkerd instances.

Process paths should be specified in the linkerd's runtime environment or generated by the linkerd instance at startup time. Process paths should provide debugging environment specific information (host names, namespaces, whatever is available). They may contain UUIDs if uniqueness cannot be guaranteed otherwise.

Warn on mismatched thriftFramed settings

two issues:

thriftFramed defaults to true at both router-level and server-level.
setting thriftFramed to false at the router level does not affect server-level. I'd expect the server to honor whatever is set on the router, unless thriftFramed is explicitly set on the server to override.

Problems with stopping

linkerd doesn't seem to daemonize properly. Maybe that isn't the right term, but when I execute ./linkerd start, I get a message that it has started. However, when I attempt to stop linkerd, it tells me it isn't running.

ADMIN_PORT=$(grep -A1 '^admin:' config/linkerd.yaml | grep 'port:' | awk '{print ":" $2}') // :9990 by default
curl -s $ADMIN_PORT/admin/ping 
echo $?  // returns 6

Prepending localhost in the ADMIN_PORT assignment seems to fix the issue.

ADMIN_PORT=$(grep -A1 '^admin:' config/linkerd.yaml | grep 'port:' | awk '{print "localhost:" $2}')

Add "client" section to router configs

Once #73 ships, we'll have a few configuration options at the top level of our router config that are also available within the servers section of the same router config. The options at the top level apply to clients that are dynamically created by the router, whereas the options in the servers section apply to servers that are created when the router starts. For clarity, we should move the client-only top level options into a client config section.

Previously:

routers:
- protocol: thrift
  baseDtab: |
    /thrift => /io.l5d.fs/aservice;
  thriftFramed: false
  tls:
    kind: io.l5d.NoValidationTlsClient
  servers:
  - port: 4114
    ip: 0.0.0.0
    thriftFramed: false
    tls:
      certPath: /foo/cacert.pem
      keyPath: /foo/private/cakey.pem

Proposed future:

routers:
- protocol: thrift
  baseDtab: |
    /thrift => /io.l5d.fs/aservice;
  client:
    thriftFramed: false
    tls:
      kind: io.l5d.NoValidationTlsClient
  servers:
  - port: 4114
    ip: 0.0.0.0
    thriftFramed: false
    tls:
      certPath: /foo/cacert.pem
      keyPath: /foo/private/cakey.pem

Kubernetes namespace watches sometimes hang indefinitely

We've observed a bug in the kubernetes namer where name resolution (and requests) hang indefinitely when initializing a watch on a second namespace.

For example, in a test cluster, we launch an application into the prod namespace and do continuous deployment to a pre-prod namespace that may be accessed with an override.

The configuration contains:

namers:
- kind: io.l5d.experimental.k8s
  prefix: /ns
  tlsWithoutValidation: true
  authTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token

and a base dtab like:

  /iface => /io.l5d.k8s/prod;
  /method => /$/io.buoyant.http.anyMethodPfx/iface ;
  /http/1.1 => /method ;
  ...

Requests are routed to the pre-prod namespace with an override like:

/iface => /io.l5d.k8s/pre-prod | /$/fail;

Initially, we observe requests through prod operating as expected, resolving through the k8s namer:

D 0126 23:45:28.991 THREAD29: k8s initializing service0
...
D 0126 23:45:29.327 THREAD30: k8s ns prod initial state: service0
...
D 0126 23:45:42.955 THREAD30: k8s lookup: /pre-prod/http/service0 /pre-prod/http/service0
D 0126 23:45:42.956 THREAD30: k8s initializing pre-prod

And we see no further k8s logging after this. Furthermore, because the namer never satisfies this lookup, the request never succeeds and remains pending until the client disconnects

Something appears to be wrong in the kubernetes namer that prevents this second watch from succeeding.

make delegator more explicitly handle connection failures from namers

If you try to use the marathon namer while not running marathon, the namer throws a connection exception that isn't caught anywhere.

I think it should instead fail the lookup with NameTree.Neg or NameTree.Fail

Identical request ID on every new request

It looks like we're assigning the same request ID to every linkerd request that doesn't already have an existing context set. To track this down, I modified the HttpTraceInitializer as follows:

diff --git a/linkerd/protocol/http/src/main/scala/com/twitter/finagle/buoyant/linkerd/HttpTraceInitializer.scala b/linkerd/protocol/http/src/main/scala/com/twitter/finagle/buoyant/linkerd/HttpTraceInitializer.scala
index bd63829..49d3056 100644
--- a/linkerd/protocol/http/src/main/scala/com/twitter/finagle/buoyant/linkerd/HttpTraceInitializer.scala
+++ b/linkerd/protocol/http/src/main/scala/com/twitter/finagle/buoyant/linkerd/HttpTraceInitializer.scala
@@ -4,6 +4,7 @@ import com.twitter.finagle.{Status => _, _}
 import com.twitter.finagle.http._
 import com.twitter.finagle.tracing._
 import com.twitter.finagle.buoyant.SampledTracer
+import java.util.logging.Logger

 /**
  * Typically, finagle clients initialize trace ids to capture a
@@ -15,6 +16,8 @@ import com.twitter.finagle.buoyant.SampledTracer
 object HttpTraceInitializer {
   val role = TraceInitializerFilter.role

+  private[this] val log = Logger.getLogger(getClass.getName)
+
   object clear extends Stack.Module0[ServiceFactory[Request, Response]] {
     val role = HttpTraceInitializer.role
     val description = "Clears all tracing info"
@@ -45,6 +48,7 @@ object HttpTraceInitializer {

         withTracer(tracer, ctx) {
           setId(tracer, Trace.id) {
+            log.fine(s"request id: ${Trace.id}")
             service(req)
           }
         }

And sure enough, when issuing three independent http requests, I get:

D 0209 19:28:46.819 THREAD17: request id: 59f8caf4faf96b61.59f8caf4faf96b61<:59f8caf4faf96b61
D 0209 19:28:47.644 THREAD18: request id: 59f8caf4faf96b61.59f8caf4faf96b61<:59f8caf4faf96b61
D 0209 19:28:48.322 THREAD19: request id: 59f8caf4faf96b61.59f8caf4faf96b61<:59f8caf4faf96b61

support admin configuration

The admin server should be configurable via linkerd's configuration file. I.e something like:

admin:
  port: 9991
  ip: any
  superCoolExperimentalAdminFeature: true

routers:
- ...

The admin server should be initialized with a Linker's config state (including)

This probably necessitates moving away from TwitterServer's provided admin service and instantiating our own (reusing as much as possible from TwitterServer).

An admin field be added to the Linker trait, probably accompanied by a withAdmin helper. The admin configuration should be parsed after all routers have been parsed so that it may be configured against the routers.

human-readable interface names on admin

The admin homepage displays verbose interface names such as:
rt/outgoing/dst/id/io.l5d.k8s/prod/incoming/foo/requests
Consider a way to render a more friendly version, for example:
k8s/prod/foo

replace namer logging with tracing

We have a bunch of debugging logging in our namers. It would be far more useful for this to be replaced by trace annotations (i.e. Trace.record(...)) so that it can be more tightly correlated with requests.

There are some gotchas in here, as we want watch-updates to be separate from tracing in the request path.

This would help in debugging things like #83

upgrade to finagle-6.33

for bug fixes, etc

lowercase host headers in http identifier

From RFC 2616:

        - Comparisons of host names MUST be case-insensitive;

In order to live this life, ensure that the host header value is lowercased before being inserted into a Dst.Path.

Stop shipping wrapper script

The linkerd wrapper script is cumbersome, since it doesn't understand values in the configuration--for instance, the admin port. If we can't do daemonization well from the wrapper script, we should remove it until we can do proper process management.

hard-to-understand error with non-existing namers

If you put a namer in the dtab that linkerd doesn't know about, you get the following behavior:

The log says something unintelligible:

E 0201 20:23:21.968 THREAD19: Exception propagated to the root monitor!
Failure(io.l5d.http.anyMethodPfx, flags=0x100000000)
    with Service -> 0.0.0.0/4140
    at com.twitter.finagle.NoStacktrace(Unknown Source)

The name of the unknown namer is returned as a text string with a 503 error (at least, with HTTP--haven't tried other protocols).

Ideally, the log message would be more informative. For the HTTP error, the 503 part is good, but ideally, the content would be either nothing or a more descriptive error message.

JavaScript error in admin ui

It appears there's a JS error in routers.js.

Replacing servers.find( with _.find(servers fixes the issue.

Much of the admin UI appears broken without this fix.

Add CONTRIBUTING.md

See: https://help.github.com/articles/setting-guidelines-for-repository-contributors/

add e2e test to verify #130

In discussion on #130, it was suggested we add some e2e tests to verify the fix as a follow-up.

error

RFC: metric name formatting

By default, Finagle clients format metrics as follows:

clnt/<label>/<metric>.<sfx>
srv/<label>/<metric>.<sfx>

Router stats don't exactly fit into the model--for instance the bindcache stats are currently unscoped (and therefore lossy/confusing when multiple routers exist). I'd like to propose a new naming scheme that more closely reflects reality and carves out space for furhther expansion:

rt/<router-label>/bindcache/...
rt/<router-label>/dst/path/<unbound-dst-path>/...
rt/<router-label>/dst/id/<bound-dst-id>/...
rt/<router-label>/srv/<server-ip>/<server-port/...
rt/<router-label>/...

This allows us to potentially expose multiple scopes of client stats (e.g. logical sr vs cluster sr -- logical is what you monitor, cluster is how you compare code versions).

Later, various stats scopes should be controllable should be controllable via configuration parameter (because for any large site, there may be many many unbound-dst-path values).

In order to complete this, the following things must happen:

we need to craft stats scopes during router initialization
the admin page must be updated to map client and server stat names.

Note: It is technically feasible to preserve the existing stat name structure, however to do so incurs complexity on both initialization and consumers for marginal value. For example, we could try to construct names like:

clnt/<router-label>/<bound-dst-id>/...
srv/<router-label>/<ip>/<port>/...
rt/<router-label>/bindcache
rt/<router-label/...

Regardless of implementation details, I think it's preferable to establish a single naming hierarchy for router stats. It also happens to be the case that it's much simpler to implement hierarchical stats scoping in router initialization.

Namer and Tracer stats should be scoped by namer/<label> and tracer/<label>, respectively.

This change will render some parts of TwitterServer's Admin page obsolete, but we intend to replace TwitterServer in favor of a config-driven admin page anyway. I don't believe that this change breaks any other APIs that we care about.

routers:
- kind: http
  baseDtab: /http/1.1/GET => /io.l5d.k8s/default/http
  tls:
    kind: boundPathMatcher
    certPath: /path/to/ca.pem
    matches:
    - prefix: /io.l5d.k8s
      name: ${3}.buoyant.io