linkerd / linkerd2 Goto Github PK

Ultralight, security-first service mesh for Kubernetes. Main repo for Linkerd 2.x.

License: Apache License 2.0

Shell 1.89% Batchfile 0.01% Go 68.42% CSS 0.16% JavaScript 10.48% HTML 0.03% Dockerfile 0.28% Smarty 0.49% PowerShell 0.02% Mustache 0.01% Rust 17.75% Makefile 0.46%

service-mesh rust golang kubernetes linkerd cloud-native

linkerd2's Introduction

This repo is for the 1.x version of Linkerd. Feature development is now happening in the linkerd2 repo. This repo is currently only used for periodic maintenance releases of Linkerd 1.x.

Linkerd 1.x (pronounced "linker-DEE") acts as a transparent HTTP/gRPC/thrift/etc proxy, and can usually be dropped into existing applications with a minimum of configuration, regardless of what language they're written in. It works with many common protocols and service discovery backends, including scheduled environments like Nomad, Mesos and Kubernetes.

Linkerd is built on top of Netty and Finagle, a production-tested RPC framework used by high-traffic companies like Twitter, Pinterest, Tumblr, PagerDuty, and others.

Linkerd is hosted by the Cloud Native Computing Foundation (CNCF).

Want to try it?

We distribute binaries which you can download from the Linkerd releases page. We also publish Docker images for each release, which you can find on Docker Hub.

For instructions on how to configure and run Linkerd, see the 1.x user documentation on linkerd.io.

Working in this repo

BUILD.md includes general information on how to work in this repo. Additionally, there are documents on how to build several of the application subprojects:

linkerd -- produces linkerd router artifacts
namerd -- produces namerd service discovery artifacts
grpc -- produces the protoc-gen-io.buoyant.grpc code generator

We ❤️ pull requests! See CONTRIBUTING.md for info on contributing changes.

Related Repos

linkerd2: The main repo for Linkerd 2.x and where current development is happening.
linkerd-examples: A variety of configuration examples and explanations
linkerd-tcp: A lightweight TCP/TLS load balancer that uses Namerd
linkerd-viz: Zero-configuration service dashboard for Linkerd
linkerd-zipkin: Zipkin tracing plugins
namerctl: A commandline utility for controlling Namerd

Code of Conduct

This project is for everyone. We ask that our users and contributors take a few minutes to review our code of conduct.

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use these files except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

linkerd2's People

Contributors

Stargazers

Watchers

Forkers

dadjeibaah hawkw pcalcado rawvnode dylangraham oulinbao chamrc tgllres fakod kyessenov clustellar mgicode xuzhaokui sullivanchan fjibj mylinyuzhi zllak alexxnica kryndex xiaods huangtall yan96in andyrao alena1108 myagley clemensw linecode skymysky fnet123 zhuguihua wxpjimmy carllerche wangshuaixin ltxz2008 wormon tiansiyuan kc17 vinceano jhaynie amitkumarj441 jpzhang jungho ahume raytung hypnoglow 40a javierarrieta sandramadhu simonuser aledbf xdnice raisingstar markjacksonfishing realforce1024 cimomo xushiwei fengzixu imskyer xabxx wq-du xiongeee paulmaddox liyanyanli guangxuli jondlm liugj sfroment raghu999 enginelabs zqheng lparth chenrui2014 frtmelody jinpeng309 crystaldust roki1988 enterstudio enterstudios ninokop happyshrm haptear rim99 chenqiangzhishen ganguoxi etsangsplk liyong1028826685 joshlemer mylovetop cupello walterstorey let-us-read-source-code bianhezhen twkun wuweiplus arbio5zt carllhw forrestcai achanda dingoeatingfuzz criptalia

linkerd2's Issues

conduit dashboard ,routes page blocked script .

I run

$ conduit dashboard
Running `kubectl proxy --port=8001`
Starting to serve on 127.0.0.1:8001
Opening http://127.0.0.1:8001/api/v1/namespaces/conduit/services/web:http/proxy/ in the default browser

and then I visithttp://127.0.0.1:8001/api/v1/namespaces/conduit/services/web:http/proxy/routes,and get no content in web main area.

And my devtool get these message:

Deployment for Prometheus needs RBAC settings as well

Got the following log from Container Prometheus of Deployment prometheus

time="2017-12-21T09:39:03Z" level=error msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:209: Failed to list *v1.Pod: User "system:serviceaccount:conduit:default" cannot list pods in the namespace "conduit". (get pods)" component="kube_client_runtime" source="kubernetes.go:76"

Adding f.e. SA conduit-controller to it, fixes the issue

tap container fails if pods do not belong a deployment

conduit v0.1.1

~> kubectl -n conduit logs controller-2041965127-1kflh tap
serving scrapable metrics on :9998
time="2017-12-21T05:56:49Z" level=info msg="starting gRPC server on :8088"
E1221 05:56:49.374317       1 runtime.go:66] Observed a panic: "index out of range" (runtime error: index out of range)
/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/pkg/util/runtime/runtime.go:72
/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/pkg/util/runtime/runtime.go:65
/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:509
/usr/local/go/src/runtime/panic.go:491
/usr/local/go/src/runtime/panic.go:28
/go/src/github.com/runconduit/conduit/controller/k8s/replicasets.go:77
/go/src/github.com/runconduit/conduit/controller/tap/server.go:336
/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache/thread_safe_store.go:232
/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache/thread_safe_store.go:129
/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache/store.go:217
/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache/reflector.go:332
/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache/reflector.go:256
/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache/reflector.go:198
/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/pkg/util/wait/wait.go:96
/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/pkg/util/wait/wait.go:97
/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/pkg/util/wait/wait.go:52
/usr/local/go/src/runtime/asm_amd64.s:2337
panic: runtime error: index out of range [recovered]
	panic: runtime error: index out of range

goroutine 13 [running]:
github.com/runconduit/conduit/vendor/k8s.io/client-go/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/pkg/util/runtime/runtime.go:58 +0x111
panic(0x14002c0, 0x1ef4020)
	/usr/local/go/src/runtime/panic.go:491 +0x283
github.com/runconduit/conduit/controller/k8s.(*ReplicaSetStore).GetDeploymentForPod(0xc4203cd280, 0xc420618c10, 0x0, 0xc42050d870, 0x8ba2eb, 0x141c6a0)
	/go/src/github.com/runconduit/conduit/controller/k8s/replicasets.go:77 +0x3aa
github.com/runconduit/conduit/controller/tap.NewServer.func1(0x156d480, 0xc420618c10, 0xc42050d918, 0x1, 0xc42090e178, 0x0, 0x0)
	/go/src/github.com/runconduit/conduit/controller/tap/server.go:336 +0x53
github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache.(*threadSafeMap).updateIndices(0xc42024b5c0, 0x0, 0x0, 0x156d480, 0xc420618c10, 0xc4208e0750, 0x2e, 0x0, 0x0)
	/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache/thread_safe_store.go:232 +0x265
github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache.(*threadSafeMap).Replace(0xc42024b5c0, 0xc420182180, 0xc42007e0e0, 0x7)
	/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache/thread_safe_store.go:129 +0x171
github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache.(*cache).Replace(0xc4203cd300, 0xc42037e000, 0x8d, 0x8d, 0xc42007e0e0, 0x7, 0x1, 0x7fed99d9f010)
	/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache/store.go:217 +0x1d8
github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache.(*Reflector).syncWith(0xc4202e7680, 0xc420488000, 0x8d, 0x8d, 0xc42007e0e0, 0x7, 0x0, 0x0)
	/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache/reflector.go:332 +0x1c3
github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch(0xc4202e7680, 0xc42009c2a0, 0x0, 0x0)
	/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache/reflector.go:256 +0x722
github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache.(*Reflector).RunUntil.func1()
	/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache/reflector.go:198 +0x33
github.com/runconduit/conduit/vendor/k8s.io/client-go/pkg/util/wait.JitterUntil.func1(0xc4203cd360)
	/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/pkg/util/wait/wait.go:96 +0x5e
github.com/runconduit/conduit/vendor/k8s.io/client-go/pkg/util/wait.JitterUntil(0xc4203cd360, 0x3b9aca00, 0x0, 0x1, 0xc42009c2a0)
	/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/pkg/util/wait/wait.go:97 +0xa1
github.com/runconduit/conduit/vendor/k8s.io/client-go/pkg/util/wait.Until(0xc4203cd360, 0x3b9aca00, 0xc42009c2a0)
	/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/pkg/util/wait/wait.go:52 +0x4d
created by github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache.(*Reflector).RunUntil
	/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache/reflector.go:197 +0x18d

Proxy tests should have all settings configuration go through the normal settings parser

In proxy/tests/support/proxy.rs we have:

    // TODO: We currently can't use `config::ENV_METRICS_FLUSH_INTERVAL_SECS`
    // because we need to be able to set the flush interval to a fraction of a
    // second. We should change config::ENV_METRICS_FLUSH_INTERVAL_SECS so that
    // it can support this.

This issue is about addressing that TODO. I suggest we implement #27 and then set the flush interval to whatever number of milliseconds is needed in the test harness.

Secure data plane communication

The Conduit proxy should secure pod to pod communication with TLS.

Stop vendoring proxy dependencies

Currently, the proxy has many of its dependencies vendored in this git repository. These libraries are now being developed in their own git repositories. We should update the proxy to track these repositories in a controlled fashion.

Tower: https://github.com/tower-rs/tower
Tower H2: https://github.com/tower-rs/tower-h2
Tower gRPC: https://github.com/tower-rs/tower-grpc
Codegen: https://github.com/carllerche/codegen

Additionally, the proxy should probably track Tokio & futures development:

Tokio: http://github.com/tokio-rs/tokio
Futures: https://github.com/alexcrichton/futures-rs/tree/tokio-reform

We want to make sure that breaking changes in these repositories do not break our build, so we should pin to specific git refs.

TCP Telemetry in the Controller

With the addition of #131, it seems likely that a user will want to be able to see transport level telemetry about non-HTTP (raw TCP) connections. The current transport events don't have a way of identifying what they are.

change css variables for color to be more descriptive

currently some of the variables in css representing a color are named:
--latency-p99; --latency-p95; --latency-p50

it would be great to change the naming to be descriptive of the variables representing colors, as is done with other variable names.

Auto-update docker dependency image SHAs

Followup from #115.

The Rust proxy and Go Docker images rely on base dependency images with
hard-coded SHA's:

gcr.io/runconduit/go-deps depends on

Gopkg.lock
Dockerfile-go-deps

gcr.io/runconduit/proxy-deps depends on

Cargo.lock
proxy/Dockerfile-deps

If any of these files change, we should auto-update all relevant Dockerfile's:

GO_DEPS_SHA=$(sh -c ". bin/_tag.sh && go_deps_sha")
PROXY_DEPS_SHA=$(sh -c ". bin/_tag.sh && proxy_deps_sha")

find . -type f -name 'Dockerfile*' -exec sed -i '' -e 's/gcr\.io\/runconduit\/go-deps:[^ ]*/gcr\.io\/runconduit\/go-deps:'$GO_DEPS_SHA'/g' {} \;
find . -type f -name 'Dockerfile*' -exec sed -i '' -e 's/gcr\.io\/runconduit\/proxy-deps:[^ ]*/gcr\.io\/runconduit\/proxy-deps:'$PROXY_DEPS_SHA'/g' {} \;

Require tap user to be explicitly authorized for the tap

Tap should be restricted to authorized users. I suggest we focus on RBAC-based authorization:

Initial Version:

Create a ClusterRole that for using the tap feature.
Have the tap controller authenticate and authorize the user for that ClusterRole. if the user doing the tapping doesn't have the role then the tap request should be denied.
Document how to grant people access to the tap feature through this role.

Future Version:
We need to be able to scope the authorization for tap in a flexible way. For example, we need to enable rules like "Allow Bob to tap all pods except those in the certificate-authority namespace" or "Allow Bob to only tap pods in the playground-bob" namespace.

I think for production environments we can assume that RBAC is enabled and working. However, note that current versions of minikube do not enable RBAC by default: kubernetes/minikube#1722. We probably should accomodate running Conduit in minikube with RBAC is disabled until the version of minikube with RBAC enabled by default is released. Alternatively we should tell people that enabling RBAC in minikube is required for Conduit.

Support Openshift Origin

origin is k8s enterprise version, we should support it.

cli tool requires kubectl to be run before it in order for GKE auth to work

kubectl apparently does something that triggers a OAuth token refresh when it is expired, which the conduit CLI tool doesn't do. We need to figure out what kubectl is doing and do the same in the conduit CLI tool.

Log all invocations of the tap feature to the audit log

Since tap is a security-sensitive feature, we should log uses of tap to the audit log. Given the "secure by default" design goal, this auditing should be enabled in the default configuration. We should use Kubernetes' standard configuration mechanisms (see https://kubernetes.io/docs/tasks/debug-application-cluster/audit/) to control (e.g. allow disabling) of auditing for the tap feature.

implement richer version check messaging

motivated by: #79 (comment)

It would be useful for our version checker to support free-form messaging, when, for example, there is a high-priority security update.

Upgrade to latest ant.design

Ant 3 was released in November.
Upgrading will fix the Uncaught TypeError: obj.onKeyDown is not a function error from the Menu component.

failed to open dashboard in minikube in some situations where --extra-config is used

darrenfu:~$ kubectl version --short
Client Version: v1.9.0
Server Version: v1.8.0
darrenfu:~$ conduit version
Client version: v0.1.1
Server version: v0.1.1

darrenfu:~$ conduit dashboard
Running `kubectl proxy -p 8001`
Starting to serve on 127.0.0.1:8001

Opening [http://127.0.0.1:8001/api/v1/namespaces/conduit/services/web:http/proxy/] in the default browser

but i get nothing but an error when visit the url : http://127.0.0.1:8001/api/v1/namespaces/conduit/services/web:http/proxy/routes

# error on the page
Error: 'EOF'
Trying to reach: 'http://172.17.0.11:8084/routes'

check the health of Conduit using the command-line

Narrative

So that I can quickly identify problems with my service mesh infrastructure
As an operator
I want to check the health of Conduit using the command-line

Rationale

As it happens with any distributed system, there are plenty of things that can go wrong in a Conduit deployment. To make it worse, sometimes it can be hard to distinguish between problems in the users' services or with the underlying infrastructure.

The $ conduit status command-line utility will help users save time diagnosing problems with the service mesh itself.

The command must check as much as possible, being accurate and exhaustive is more important than finishing quickly. It must provide useful and actionable advice whenever possible, and ask users to open an issue with the project if the current condition feels like a bug (e.g. some component throws errors instead of returning proper error states).

Acceptance criteria:

Required items to check:

Presence of, access & health of kubectl
Version compatibility for K8S both server and client
Access & health of K8s API
Access & health of Conduit API
Health of control plane server-side components

Given a  Conduit deployment
When an operator runs the status command
And everything is healthy and accessible
Then a message is displayed in the terminal saying that everything is ok
And the process exits with status code 0

Given a Conduit deployment 
When an operator runs the status command
And one or more components aren't healthy or accessible
Then a message is displayed in the terminal with descriptions for each error found
And the process exits with status code != 0

Given a Conduit deployment 
When an operator runs the status command
And one or more components are in an unexpected error condition
Then a message is displayed in the terminal asking users to open an issue
And the message contains useful yet anonymised information they should copy and paste on the issue to help with the investigation
And the process exits with status code != 0

reassess DOCKER_FORCE_BUILD

motivated by: #137 (comment)

DOCKER_FORCE_BUILD may not be doing what we think it's doing:
https://github.com/runconduit/conduit/blob/master/bin/_docker.sh#L90

consider fixing the build scripts to always rebuild if the latest tag is set, or even remove this flag entirely and rely on docker build caching.

Document source modules structure in Conduit project

As per #103 :

Moving these files to a shared location makes sense to me. In the projects that I've worked with that have a top-level pkg dir, the organization of that dir always seems (to me) to be a bit inscrutable. It's usually not clear which packages belong in the dir, and the naming of the packages doesn't give a lot of context.

To help make sure we don't run into this, I think we should be really strict about naming, and we should move all shared packages into pkg. I've commented below with a few suggestions. I think we should also get rid of all of these util pacakges, and move them into better-named pkg directories: controller/api/util, web/util, controller/util.

And

overall i like the direction this is going. echoing some of kevin's comments, i'd like to understand/establish guiding principals on where things should go. a few questions along these lines:

should anything without a main go under /pkg?
when to put something under controller/k8s vs. /pkg/k8s? is it strictly a matter of whether the code is shared? is it easier to reason about if all the k8s code is in one place?
building on the previous question: does it make sense to make a distinction between library code that is shared across mains vs. single-use? when browsing the repo, i'd prefer to see all library code in one place.

Proxy Transparency

goal

A goal of the proxy is to be transparent, meaning that the application need not change behavior to benefit from the proxy. Transparency is primarily achieved by using iptables(8) to redirect traffic into the proxy.

all tcp traffic enters the proxy on a single port.
support for h2, http, and raw tcp transport

accept-side

HTTP1

An incoming HTTP/1 request has the format of method request-target HTTP/1.{0,1}\r\n. Unfortunately, that means that with a really large request target, it can require multiple reads and buffering before finally knowing if this is an HTTP/1 request.

It’s likely that non-HTTP1 data will be quick to identify, since as soon as it doesn’t match the shape of 3 words, we can assume its not HTTP.

A question that arises is whether slightly invalid HTTP requests should be classified as raw TCP instead. If the target route only speaks one of those options, the answer is simple, but if it speaks both, the answer is unresolved.

HTTP2

We have this working as the required default at the moment. With transparency, we could know incoming is HTTP2 by peeking the socket for the PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n magic bytes header (or from ALPN in the TLS detection).

TCP

If an accepted connection does not parse as either HTTP1 or HTTP2, then we may assume it’s plain TCP. There is an unresolved question around whether to accept the plain TCP or to reject as an invalid HTTP request, see the HTTP1 section.

TLS

We can peek the socket looking for the start of a TLS record to determine if TLS should be used. This peek should happen before the HTTP/TCP detection happens.

On a brand new connection, we can assume that the first TLS record is a handshake, and thus can peek for these bytes:

0x16 0x03 0xNN 0xNN 0xNN 0x01 - We’re looking for a HANDSHAKE byte (0x16), TLS version (3.N), some length we don’t yet care about, and then the CLIENT_HELLO byte (0x01).
We could decide to be more lenient or more strict with the version check and the length bytes.

Another question here is whether we want to assume that a slightly broken TLS record should just be considered plain TCP, or should we send an error back about the bad TLS format.

connect-side

Knowing what protocols to route to a destination likely requires some prior knowledge, retrieved by the Destination service from the controller.

Eventually, if no hint can be provided by the controller (a user hasn’t configured the service with such a hint), we can try some defaults: use HTTP1 with an Upgrade header to allow upgrading to HTTP2.

Would this default mean that without a prior knowledge hint from the controller, we assume the destination does not want plain TCP or TLS?

Prior knowledge from controller

The destination service would be updated to report back hints about what protocols a destination supports. A proposed new destination service:

service Destination {
  // Given a destination, return all addresses in that destination as a long-
  // running stream of updates.
  rpc Get(common.Destination) returns (stream Update) {}
}

message Update {
  oneof update {
    Add add = 1;
    Remove remove = 2;
  }
}

message Remove {
  repeated common.TcpAddress addrs = 1;
  repeated Protocol protocols = 2;
}

message Add {
  repeated WeightedAddr addrs = 1;
  repeated Protocol protocols = 2;
}

message WeightedAddr {
  common.TcpAddress addr = 1;
  uint32 weight = 3; 
}

enum Protocol {
  HTTP_1 = 0; // should we differentiate between 1.0 and 1.1?
  HTTP_2 = 1;
  TCP = 3;
  TLS = 4; // should we include TLS versions?
}

These hints can be stored in Kubernetes metadata annotations in the Endpoints API.

Work to be done

Here is a list of tasks to be done to acheive this goal:

Accept

Peek new connections for TLS records
Peek new connections for HTTP2 preface
Peek new connections for HTTP1 start-line
- Include an HTTP1 server (hyper?)
Forward connections that don't match any protocol as plain TCP

Connect

Update controller Destination service to include Protocol hints
- Document how to configure protocols
Use protocol hints in proxy
- Add an HTTP/1 client (hyper?)
- Create a TCP forwarding client
- Add TLS to proxy Connect
Default to HTTP/1 with Upgrade to H2 if no prior knowledge
- Get HTTP/1 with upgrades client (hyper?)

cargo test came across failed.

$ cargo test --all
   Compiling abstract-ns v0.4.2
   Compiling env_logger v0.4.3
error[E0277]: the trait bound `std::sync::Arc<str>: std::convert::From<&str>` is not satisfied
  --> /Users/xiaods/.cargo/registry/src/github.com-1ecc6299db9ec823/abstract-ns-0.4.2/src/name.rs:67:23
   |
67 |         Ok(Name(value.into()))
   |                       ^^^^ the trait `std::convert::From<&str>` is not implemented for `std::sync::Arc<str>`
   |
   = help: the following implementations were found:
             <std::sync::Arc<T> as std::convert::From<T>>
   = note: required because of the requirements on the impl of `std::convert::Into<std::sync::Arc<str>>` for `&str`

error: aborting due to previous error

error: Could not compile `abstract-ns`.
warning: build failed, waiting for other jobs to finish...
error: build failed

Configurations with Kubernetes RBAC authorization enforced aren't working

For 0.1.0 we punted on RBAC support because minikube doesn't yet enable the RBAC authorization mode and we didn't want to ask people to enable RBAC to try out Conduit. However, now we have people trying to use Conduit with RBAC authorization enabled and it's not working. At a minimum we need to make sure that Conduit works regardless of whether RBAC authorization is enforced or not. In particular, some of the controller services need to be running under service accounts that have roles that enable them to make the Kubernetes API server queries for the specific data sets they access.

Ideally we'd do this 100% automatically and it would work regardless of whether RBAC enforcement is enabled. If we can't then we can create a switch for conduit install to disable RBAC. (Because of the "Secure by Default" design goal, the default should be to enable RBAC.)

Create BUILD.md

We have build/development instructions across a couple readme's. Let's consolidate into a BUILD.md file.

CLI check command should display useful information when it encounters an error

This is an extraction from a feature originally in #92.

Every time check return ERROR we have a potential bug. As users face this situation and report problems, it is handy to give them something that's easy to copy and paste and has some basic information about their system to reduce the back-and-forth on issues and support forums.

Given a Conduit deployment 
When an operator runs the check command
And one or more components are in an unexpected error condition
Then a message is displayed in the terminal asking users to open an issue
And the message contains useful yet anonymised information they should copy and paste on the issue to help with the investigation
And the process exits with status code != 0

A mock UI would be something like:

 $ conduit check --force-print-debug-info
 kubectl: is in $PATH................................[ok]
 kubectl: has compatible version.....................[ok]
 kubectl: can talk to Kubernetes cluster.............[ok]
 kubernetes-api: can initialize the client...........[ok]
 kubernetes-api: can query the Kubernetes API........[ok]
 conduit-api: can access the Conduit API.............[ok] 
 conduit-api: Conduit control plane is healthy.......[ok]
 
 Status check results are [ok]
 -------------------System information-------------------
 conduit_version=0.0.1
 k8s_version_server=1.8.7
 k8s_version_client=1.8.7
 k8s_provider=minikube
 uname=Darwin arbeitsziege.local 17.3.0 Darwin Kernel Version 17.3.0: Thu Nov  9 18:09:22 PST 2017; root:xnu-4570.31.3~1/RELEASE_X86_64 x86_64
 user_is_root=false
 can_access_internet=false

Start running `go vet ./...` in CI

We can use go vet to detect go code issues that are not caught by the compiler. If we add this to CI, then it should prevent regressions going forward. We'll also need to fix these issues as part of that branch:

$ go vet ./...
pkg/k8s/kubectl.go:88: arg kctl.ProxyPort in printf call is a function value, not a function call
pkg/k8s/kubectl_test.go:80: arg kctl.ProxyPort in printf call is a function value, not a function call
pkg/k8s/kubectl_test.go:108: arg kctl.ProxyPort in printf call is a function value, not a function call
exit status 1
pkg/shell/shell_test.go:103: arg output for printf verb %s of wrong type: *bufio.Reader
exit status 1
proxy-init/integration_test/iptables/http_test.go:198: arg resp for printf verb %s of wrong type: *net/http.Response
exit status 1
proxy-init/integration_test/iptables/test_service/test_service.go:50: arg amITheProxy for printf verb %s of wrong type: bool
exit status 1
web/main.go:67: the cancel function returned by context.WithTimeout should be called, not discarded, to avoid a context leak
web/main.go:41: arg kubernetesApiHost for printf verb %s of wrong type: *string
web/main.go:45: arg kubernetesApiHost for printf verb %s of wrong type: *string
exit status 1

Tap server panics when indexing ReplicaSets

When a Kubernetes cluster contains Pods that belong to ReplicaSets that themselves do not belong to Deployments, the Tap server panics with an "index out of range" error.

E1212 05:46:43.691633       1 runtime.go:66] Observed a panic: "index out of range" (runtime error: index out of range)
/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/pkg/util/runtime/runtime.go:72
/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/pkg/util/runtime/runtime.go:65
/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:509
/usr/local/go/src/runtime/panic.go:491
/usr/local/go/src/runtime/panic.go:28
/go/src/github.com/runconduit/conduit/controller/k8s/replicasets.go:77
/go/src/github.com/runconduit/conduit/controller/tap/server.go:336
/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache/thread_safe_store.go:232
/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache/thread_safe_store.go:129
/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache/store.go:217
/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache/reflector.go:332
/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache/reflector.go:256
/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/tools/cache/reflector.go:198
/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/pkg/util/wait/wait.go:96
/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/pkg/util/wait/wait.go:97
/go/src/github.com/runconduit/conduit/vendor/k8s.io/client-go/pkg/util/wait/wait.go:52
/usr/local/go/src/runtime/asm_amd64.s:2337
panic: runtime error: index out of range [recovered]
  panic: runtime error: index out of range

Add HTTP/1 routing to the proxy

When a new request comes in, we want to try to determine if the connection is using HTTP/1 instead of HTTP/2, and if so, still support the request.

Initially, to support the idea of "proxy transparency" (where we try to behave the same as if the proxy weren't there), when the proxy receives an HTTP/1 request, it will try to open HTTP/1 to the destination as well. Future enhancements can consider if it's possible to upgrade to HTTP/2, but is not required to achieve "transparency".

HTTP/1 requests will be routed in a similar fashion to HTTP/2, by asking the controller for a destination based on the Host header.

Support HTTP/1.1

Add support for HTTP/1.1 including automatic protocol upgrade from HTTP/1.1 to HTTP/2.

Support custom `dnsPolicy` and `dnsConfig`

For background on Kubernetes dnsPolicy and dnsConfig, see https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pods-dns-policy

For any pod managed by Conduit, the DNS configuration is completely bypassed. Assuming we want to continue to do this, we should:

Document this in the documentation.
In conduit inject, warn when a pod contains dnsConfig or a non-default dnsPolicy. (Note that the default dnsPolicy is not "Default"; the default is "ClusterFirst."

We might also consider whether we want to actually honor the pod's DNS policy and/or the DNS config. This would probably require us to implement DNS in the proxy.

CLI doesn't complain when given unknown deployments, pods, or paths

For tap and stat, and maybe other commands, specifying a non-existing deployment or pod doesn't result in any kind of error message.

E.g. stat:

william:~/devel/conduit$ conduit stat deploy default/hello
NAME            REQUEST_RATE   SUCCESS_RATE   P50_LATENCY   P99_LATENCY
default/hello         2.5rps         96.00%         352ms         875ms
william:~/devel/conduit$ conduit stat deploy default/hello2
NAME   REQUEST_RATE   SUCCESS_RATE   P50_LATENCY   P99_LATENCY

and tap:

william:~/devel/conduit$ conduit tap deploy default/hello
[172.17.0.3:53280 -> 172.17.0.11:80]
HTTP Request
Stream ID: (0, 263)
HTTP POST world.default.svc.cluster.local/helloworld.World/Greeting
...
^C
william:~/devel/conduit$ conduit tap deploy hello
^C

This isn't too bad on its surface, but a) it means that e.g. forgetting to put the namespace for tap deploy means that the command just hangs forever (this is an easy mistake, and we had a user confused about this in #conduit today), and b) it doesn't match how kubectl works, e.g.

william:~/devel/conduit$ kubectl get -n conduit deploy/prometheus
NAME         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
prometheus   1         1         1            1           7d
william:~/devel/conduit$ kubectl get -n conduit deploy/prometheus2
Error from server (NotFound): deployments.extensions "prometheus2" not found

I think these commands should return an error if the specifying pod, deployment, or path (maybe?) is not found.

Secure communication between the data plane and the control plane

When Conduit proxies communicate with the controller, this communication should be secured with TLS.

Configuration settings should take units in the value, not put units in the setting name

Consider CONDUIT_PROXY_EVENT_BUFFER_CAPACITY. What units is this setting measured in? Currently the units are implicitly bytes, e.g. CONDUIT_PROXY_EVENT_BUFFER_CAPACITY=10000 means 10,000 bytes. Instead of having implicit units, we should use explicit units so we can write, e.g. CONDUIT_PROXY_EVENT_BUFFER_CAPACITY=10kb (for base-2 kilobytes) or CONDUIT_PROXY_EVENT_BUFFER_CAPACITY=10000b (for bytes), etc.

Similarly we have CONDUIT_PROXY_METRICS_FLUSH_INTERVAL_SECS. This is better because the name of the setting gives the units so one doesn't have to guess. However, we've already run into a case during testing where we want to use a sub-second unit. This setting should instead by named CONDUIT_PROXY_METRICS_FLUSH_INTERVAL and accept values with units specified, e.g. 10s for 10 seconds or 200ms for 200 milliseconds.

Similar concerns apply to other settings, not just in the proxy, but also in the other components.

proxy-deps docker image tag hash should depend on Dockerfile

#105 modified proxy/Dockerfile-deps to depend on rust:1.23.0 rather than rust:1.21.0, but since the tag on gcr.io/runconduit/proxy-deps only depends on Cargo.lock, the image never updated.

We should fix our build such that modifying proxy/Dockerfile-deps or Cargo.lock causes the image tag to change.

Fail CI if docker builds fail

If an image fails to build correctly, CI still passes. We should at least fail CI if this is the case, and probably do more testing of the images too.

Extract and test protobuf-over-http code

As part of #103, our client/server implementations for the public API were refactored, but they're still pretty convoluted. Specifically, the protobuf-over-http setup is brittle and not thoroughly tested. We should move that code to a separate package and add additional tests.

No tap output for emojivoto demo running in minikube

After deploying the latest conduit master and the emojivoto app to minikube, I'm unable to see any tap output when running conduit tap deploy emojivoto/voting-svc (from the Getting Started guide).

In the tap server logs, I see:

time="2018-01-11T02:01:59Z" level=info msg="Tapping 1 pods for target emojivoto/voting-svc"
time="2018-01-11T02:01:59Z" level=info msg="Establishing tap on 172.17.0.10:4190"

But the tap command does not produce any output.

In the voting service's proxy logs, I see:

1515635322.487561 INFO conduit_proxy using controller at HostAndPort { host: Domain("proxy-api.conduit.svc.cluster.local"), port: 8086 }
1515635322.487669 INFO conduit_proxy routing on V4(127.0.0.1:4140)
1515635322.487675 INFO conduit_proxy proxying on V4(0.0.0.0:4143) to None
1515635322.536696 INFO conduit_proxy::transport::connect "controller-client", DNS resolved proxy-api.conduit.svc.cluster.local to 10.107.64.120

Add TCP support to the proxy

When a new connection is made to the proxy, the proxy will try to determine if it is using HTTP (1 or 2), and if not, consider it a raw TCP connection to just pass through.

For the initial feature, the SO_ORIGINAL_DST will be used to determine where to open a socket to.

Web sidebar does not extend to the bottom of the page on long pages

If the page content is long enough (200vh), then the sidebar does not extend to the bottom of the page. It looks like this:

Proxy's logs have timestamps in an unreadable format different than what's typically used by Kubernetes

A line of output of kubectl logs for a conduit-proxy container looks like this:

1513399875.907490 TRCE tokio_io::framed_read ("serve", V4(127.0.0.1:4140)), attempting to decode a frame

The output of kubectl logs po/kube-dns-86f6f55dd5-xzgtn dnsmasq -n kube-system looks like this:

I1216 00:41:20.452866       1 nanny.go:108] dnsmasq[24]: using nameserver 127.0.0.1#10053 for domain ip6.arpa

kubedns's log is in the same format.

Notice in particular that conduit-proxy's logs's timestamps are unreadable, whereas Kuberenetes's logs's timestamps are readable. The proxy should format its timestamps like Kuberenetes does, or at least in some other more readable format.

cli tool doesn't respect $KUBECONFIG file location

For example:

$ conduit stat pod
stat .kube/config: no such file or directory

conduit CLI --api-addr flag requires scheme in addition to host:port

The conduit CLI supports a global --api-addr flag for overriding the connection to the Kubernetes API. It's documented in the help message as follows:

      --api-addr string      Override kubeconfig and communicate directly with the control plane at host:port (mostly for testing)

It appears that that flag now requires the input to be a fully qualified url, including the scheme. Omitting the scheme results in:

$ conduit --api-addr 192.168.99.100:8085 stat pods
Error: error creating api client while making stats request: error generating base URL for Kubernetes API from [192.168.99.100:8085/api/v1/]

It looks like this behavior changed in #68. We should either fix the help message, or change the behavior back to requiring only host:port. My preference is to go back to not requiring the scheme.

As part of this change, can we also fix the capitalization on the k8s api initializer? NewK8sAPi => NewK8sAPI | NewK8sApi

Proxy should accept non-fully qualified domain names

All of the following authorities should resolve to foo.ns.svc.cluster.local:

foo
foo.ns
foo.ns.svc
foo.ns.svc.cluster.local

logo hosting

conduit_proxy doesn't resolve IPv6 name

Hi,

I try Conduit in my IPv6-only cluster,

I correctly Got the dashboard, but when deploying my first deployment (with conduit sidecar enabled), It seem that ConduitProxy can't resolve IPv6 Name :

 conduit_proxy::control "controller-client", poll rpc services
 conduit_proxy::control::discovery "controller-client", poll_rpc
 conduit_proxy::dns "controller-client", resolve proxy-api.conduit.svc.k8s.domain.tld
 conduit_proxy::control::telemetry "controller-client", poll_rpc
 conduit_proxy::control::telemetry "controller-client", controller unavailable
 conduit_proxy::telemetry::control "controller-client", poll
 conduit_proxy::telemetry::control "controller-client", recv.poll(Receiver { capacity: AtomicUsize(10000) })
 conduit_proxy::control "controller-client", poll rpc services
 conduit_proxy::control::discovery "controller-client", poll_rpc
WARN conduit_proxy::control "controller-client", controller error: Connect(Connect(Connect(Error { repr: Custom(Custom { kind: NotFound, error: StringError("DNS resolution failed") }) })))
 conduit_proxy::control::telemetry "controller-client", poll_rpc
 conduit_proxy::control::telemetry "controller-client", controller unavailable
 conduit_proxy::telemetry::control "controller-client", poll
 conduit_proxy::telemetry::control "controller-client", recv.poll(Receiver { capacity: AtomicUsize(10000) })

But in a Alpine pod :

bash-4.3# dig AAAA proxy-api.conduit.svc.k8s.domain.ltd +short
ff1234:d400:ff:88::e:88:2b38

It seem the bug happen in conduit/proxy/src/dns.rs but I'm not such good on Rust to find the problem.

Thanks !!

Support TCP

Add support for the proxy to act as a TCP proxy when the higher level protocol is unknown.

Dynamic Routing

Add the ability to dynamically configure routing.

document minimum required rust version in BUILD.md

Error information:

error[E0599]: no method named `eq_ignore_ascii_case` found for type `&str` in the current scope
  --> proxy/src/fully_qualified_authority.rs:56:19
   |
56 |             (part.eq_ignore_ascii_case("svc"), false)
   |                   ^^^^^^^^^^^^^^^^^^^^
   |
   = help: items from traits can only be used if the trait is in scope
   = note: the following trait is implemented but not in scope, perhaps add a `use` for it:
           candidate #1: `use std::ascii::AsciiExt;`

error: aborting due to previous error

error: Could not compile `conduit-proxy`.

I run this build in ubuntu 16.04 with rustc 1.22.1. Should I upgrade to rustc 1.23 to build conduit?

`conduit inject` configures the proxy to log way too much detail

Currently there's a lot of output for kubectl logs "-lconduit.io/plane=data" -c conduit-proxy because conduit inject configures the proxy with CONDUIT_PROXY_LOG=trace,h2=debug,mio=info,tokio_core=info. I found it to be overwhelming when debugging a configuration error.

I think a better default would be CONDUIT_PROXY_LOG=warn,conduit_proxy=info but I'm not sure of all the thinking that's gone into the current choice.

/cc @adleong

Tap panic after a while

Unfortunately, I don't yet have a consistent repro, but I've hit this panic twice with conduit tap while sending lots of gRPC requests to a conduit-injected deployment:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x1d0bb22]

goroutine 1 [running]:
github.com/runconduit/conduit/cli/cmd.print(0x285b500, 0xc42000a240)
	/go/src/github.com/runconduit/conduit/cli/cmd/tap.go:135 +0x9d2
github.com/runconduit/conduit/cli/cmd.glob..func6(0x28a4b40, 0xc4202ca980, 0x2, 0x2, 0x0, 0x0)
	/go/src/github.com/runconduit/conduit/cli/cmd/tap.go:77 +0x2d5
github.com/runconduit/conduit/vendor/github.com/spf13/cobra.(*Command).execute(0x28a4b40, 0xc4202ca8e0, 0x2, 0x2, 0x28a4b40, 0xc4202ca8e0)
	/go/src/github.com/runconduit/conduit/vendor/github.com/spf13/cobra/command.go:698 +0x47a
github.com/runconduit/conduit/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0x28a4700, 0x20725e0, 0xc42056ff70, 0x10043a4)
	/go/src/github.com/runconduit/conduit/vendor/github.com/spf13/cobra/command.go:783 +0x30e
github.com/runconduit/conduit/vendor/github.com/spf13/cobra.(*Command).Execute(0x28a4700, 0x1d57ddc, 0xc4202ca860)
	/go/src/github.com/runconduit/conduit/vendor/github.com/spf13/cobra/command.go:736 +0x2b
main.main()
	/go/src/github.com/runconduit/conduit/cli/main.go:14 +0x31

I am using a custom-built Conduit version, which consists of master commit 81fb0fe onto which I have merged 22475d4 (from alex/rbac branch) and c31e892 (from alex/skip-outbound-ports branch).

I'm using the conduit CLI bin from the built image, and my conduit install and conduit injects are referencing my custom-built images.

Support Hashicorp Nomad

I am interested in using Conduit with Nomad instead of Kubernetes; I would be interested in taking a stab at a port if the changes involve modifying the Golang part of the code (total Rust noob here).

Are there plans to make Conduit not so tightly coupled with Kubernetes? If so, which part of the code base would you suggest I start with. Any pointers appreciated :)

Thanks!