
aliyuncontainerservice / terway


CNI plugin for Alibaba Cloud VPC/ENI

Home Page: https://www.aliyun.com/product/kubernetes

License: Apache License 2.0

Dockerfile 0.38% Shell 2.85% Go 93.21% Smarty 0.30% PowerShell 1.94% HCL 0.76% Makefile 0.57%
cni vpc eni

terway's Introduction

Terway CNI Network Plugin




Try It

Install Kubernetes

  • Prepare Aliyun ECS instances. The ECS OS versions we tested are CentOS 7.4/7.6.
  • Install Kubernetes via kubeadm: create-cluster-kubeadm

After setting up the Kubernetes cluster:

  • Change the default policy of the iptables FORWARD chain to ACCEPT on every node of the cluster: iptables -P FORWARD ACCEPT.
  • Check the rp_filter parameters in sysctl and set them to "0" on every node of the cluster.

Make sure the cluster is up and healthy by running kubectl get cs.

Install Terway network plugin


The Terway plugin has two installation modes:
  • VPC Mode

    VPC Mode uses the `Aliyun VPC` route table to connect pods and can assign a dedicated ENI to a pod. Install method:
    Replace `Network` and `access_key/access_secret` in [terway.yml](./terway.yml) with your cluster pod subnet and Aliyun OpenAPI credentials. Then run `kubectl apply -f terway.yml` to install Terway into the Kubernetes cluster.
  • ENI Secondary IP Mode

    ENI Secondary IP Mode uses `Aliyun ENI's secondary IP` addresses to connect pods. This mode is not limited by the VPC route table quota. Install method:
    Replace `access_key/access_secret` and `security_group/vswitches` in [terway-multiip.yml](./terway-multiip.yml) with your Aliyun OpenAPI credentials and resource IDs. Then run `kubectl apply -f terway-multiip.yml` to install Terway into the Kubernetes cluster.

Terway requires the access_key to have the following RAM permissions:

{
  "Version": "1",
  "Statement": [
    {
      "Action": [
        "ecs:CreateNetworkInterface",
        "ecs:DescribeNetworkInterfaces",
        "ecs:AttachNetworkInterface",
        "ecs:DetachNetworkInterface",
        "ecs:DeleteNetworkInterface",
        "ecs:DescribeInstanceAttribute",
        "ecs:DescribeInstanceTypes",
        "ecs:AssignPrivateIpAddresses",
        "ecs:UnassignPrivateIpAddresses",
        "ecs:DescribeInstances",
        "ecs:ModifyNetworkInterfaceAttribute"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "vpc:DescribeVSwitches"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    }
  ]
}

Use kubectl get ds terway -n kube-system to watch the plugin launch. The installation is complete when the number of available pods in the terway DaemonSet equals the number of nodes.
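The same completion check can be automated from the DaemonSet's JSON status. This is a minimal sketch that reads the output of kubectl get ds terway -n kube-system -o json and compares the standard status fields numberAvailable and desiredNumberScheduled; installComplete is a hypothetical name. desiredNumberScheduled equals the node count only if the DaemonSet is scheduled on every node, which is assumed here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// dsStatus holds the subset of DaemonSet .status fields used here.
type dsStatus struct {
	DesiredNumberScheduled int32 `json:"desiredNumberScheduled"`
	NumberAvailable        int32 `json:"numberAvailable"`
}

type daemonSet struct {
	Status dsStatus `json:"status"`
}

// installComplete reports whether every node that should run the DaemonSet
// has an available pod.
func installComplete(raw []byte) (bool, error) {
	var ds daemonSet
	if err := json.Unmarshal(raw, &ds); err != nil {
		return false, err
	}
	s := ds.Status
	return s.DesiredNumberScheduled > 0 && s.NumberAvailable == s.DesiredNumberScheduled, nil
}

func main() {
	// Usage: kubectl get ds terway -n kube-system -o json | go run main.go
	raw, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	done, err := installComplete(raw)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println("terway available on all scheduled nodes:", done)
}
```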

Terway network plugin usage

VPC network container

In VPC installation mode, when a pod has no special configuration, Terway assigns the pod an address from the node's podCIDR. For example:

[root@iZj6c86lmr8k9rk78ju0ncZ ~]# kubectl run -it --rm --image busybox busybox
If you don't see a command prompt, try pressing enter.
/ # ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
    link/ether 46:02:02:6b:65:1e brd ff:ff:ff:ff:ff:ff
/ # ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
    link/ether 46:02:02:6b:65:1e brd ff:ff:ff:ff:ff:ff
    inet 172.30.0.4/24 brd 172.30.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::4402:2ff:fe6b:651e/64 scope link
       valid_lft forever preferred_lft forever

Using an ENI to get performance equivalent to the underlying network

In VPC installation mode, set the resource limit aliyun/eni: 1 on one container of the pod. The following example creates an Nginx pod and assigns it an ENI:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - name: nginx
      image: nginx
      resources:
        limits:
          aliyun/eni: 1
[root@iZj6c86lmr8k9rk78ju0ncZ ~]# kubectl exec -it nginx sh
# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
    link/ether 00:16:3e:02:38:05 brd ff:ff:ff:ff:ff:ff
    inet 172.31.80.193/20 brd 172.31.95.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::216:3eff:fe02:3805/64 scope link
       valid_lft forever preferred_lft forever
4: veth1@if8: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    link/ether 1e:60:c7:cb:1e:0e brd ff:ff:ff:ff:ff:ff
    inet6 fe80::1c60:c7ff:fecb:1e0e/64 scope link
       valid_lft forever preferred_lft forever
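Kubernetes treats aliyun/eni as an extended resource, and a pod's total request for it is the sum across its regular containers. The sketch below computes that total from a pod spec in API-server JSON form (where quantities are strings); the types and eniRequest are hypothetical, init containers are ignored, and only plain integer quantities are handled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// container and podSpec mirror only the pod spec fields used here.
type container struct {
	Name      string `json:"name"`
	Resources struct {
		Limits map[string]string `json:"limits"`
	} `json:"resources"`
}

type podSpec struct {
	Containers []container `json:"containers"`
}

const eniResource = "aliyun/eni"

// eniRequest returns the total aliyun/eni limit across all regular containers.
func eniRequest(raw []byte) (int64, error) {
	var spec podSpec
	if err := json.Unmarshal(raw, &spec); err != nil {
		return 0, err
	}
	var total int64
	for _, c := range spec.Containers {
		v, ok := c.Resources.Limits[eniResource]
		if !ok {
			continue
		}
		n, err := strconv.ParseInt(v, 10, 64)
		if err != nil {
			return 0, fmt.Errorf("container %s: %w", c.Name, err)
		}
		total += n
	}
	return total, nil
}

func main() {
	spec := `{"containers":[{"name":"nginx","resources":{"limits":{"aliyun/eni":"1"}}}]}`
	n, err := eniRequest([]byte(spec))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("ENIs requested:", n)
}
```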

ENI Secondary IP Pod

In ENI secondary IP installation mode, Terway creates and allocates an ENI secondary IP for each pod. The pod IP is in the same IP range as the node:

[root@iZj6c86lmr8k9rk78ju0ncZ ~]# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP              NODE                                 NOMINATED NODE
nginx-64f497f8fd-ckpdm   1/1     Running   0          4d    192.168.0.191   cn-hangzhou.i-j6c86lmr8k9rk78ju0nc   <none>
[root@iZj6c86lmr8k9rk78ju0ncZ ~]# kubectl get node -o wide cn-hangzhou.i-j6c86lmr8k9rk78ju0nc
NAME                                 STATUS   ROLES    AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION              CONTAINER-RUNTIME
cn-hangzhou.i-j6c86lmr8k9rk78ju0nc   Ready    <none>   12d   v1.11.5   192.168.0.154   <none>        CentOS Linux 7 (Core)   3.10.0-693.2.2.el7.x86_64   docker://17.6.2
[root@iZj6c86lmr8k9rk78ju0ncZ ~]# kubectl exec -it nginx-64f497f8fd-ckpdm bash
root@nginx-64f497f8fd-ckpdm:/# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
3: eth0@if106: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 4a:60:eb:97:f4:07 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.0.191/32 brd 192.168.0.191 scope global eth0
       valid_lft forever preferred_lft forever

Using network policy to limit access between containers

The Terway plugin supports the standard Kubernetes NetworkPolicy for controlling access between containers. For example:

  1. Create and expose a deployment for testing:

    [root@iZbp126bomo449eksjknkeZ ~]# kubectl run nginx --image=nginx --replicas=2
    deployment "nginx" created
    [root@iZbp126bomo449eksjknkeZ ~]# kubectl expose deployment nginx --port=80
    service "nginx" exposed
  2. Run busybox to test the connection to the deployment:

    [root@iZbp126bomo449eksjknkeZ ~]# kubectl run busybox --rm -ti --image=busybox /bin/sh
    If you don't see a command prompt, try pressing enter.
    / # wget --spider --timeout=1 nginx
    Connecting to nginx (172.21.0.225:80)
    / #
  3. Configure a network policy that allows only pods labeled access: "true" to access pods labeled run: nginx:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: access-nginx
    spec:
      podSelector:
        matchLabels:
          run: nginx
      ingress:
      - from:
        - podSelector:
            matchLabels:
              access: "true"
  4. Pods without the specified label are denied access to the service, while pods with the label can access it normally:

    [root@iZbp126bomo449eksjknkeZ ~]# kubectl run busybox --rm -ti --image=busybox /bin/sh
    If you don't see a command prompt, try pressing enter.
    / # wget --spider --timeout=1 nginx
    Connecting to nginx (172.21.0.225:80)
    wget: download timed out
    / #
    
    [root@iZbp126bomo449eksjknkeZ ~]# kubectl run busybox --rm -ti --labels="access=true" --image=busybox /bin/sh
    If you don't see a command prompt, try pressing enter.
    / # wget --spider --timeout=1 nginx
    Connecting to nginx (172.21.0.225:80)
    / #
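The matchLabels semantics behind this policy can be modeled in a few lines. This is a simplified, hypothetical sketch: it covers a single policy with one podSelector peer in the same namespace, ignores matchExpressions, namespaceSelector and ports, and treats pods not selected by this policy as unrestricted (true only if no other policy selects them). The source labels assume the run: busybox label that kubectl run applied in the kubectl version used above.

```go
package main

import "fmt"

// matchLabels reports whether labels satisfy a matchLabels selector: every
// key/value pair in the selector must be present. An empty selector matches all.
func matchLabels(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// ingressAllowed models the access-nginx policy: for destination pods selected
// by podSelector, ingress is allowed only from sources matching fromSelector.
func ingressAllowed(podSelector, fromSelector, dstLabels, srcLabels map[string]string) bool {
	if !matchLabels(podSelector, dstLabels) {
		return true // not selected by this policy (other policies ignored)
	}
	return matchLabels(fromSelector, srcLabels)
}

func main() {
	policySelector := map[string]string{"run": "nginx"}
	fromSelector := map[string]string{"access": "true"}
	nginx := map[string]string{"run": "nginx"}
	fmt.Println(ingressAllowed(policySelector, fromSelector, nginx, map[string]string{"run": "busybox"}))                   // false
	fmt.Println(ingressAllowed(policySelector, fromSelector, nginx, map[string]string{"run": "busybox", "access": "true"})) // true
}
```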

Limit container in/out bandwidth

The Terway network plugin can limit a container's ingress and egress traffic through annotations on the pod. For example:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  annotations:
    kubernetes.io/ingress-bandwidth: 10M
    kubernetes.io/egress-bandwidth: 10M
spec:
  nodeSelector:
    kubernetes.io/hostname: cn-shanghai.i-uf63p6s96kf4jfh8wpwn
  containers:
    - name: nginx
      image: nginx:1.7.9
      ports:
        - containerPort: 80
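The annotation values are rate strings such as 10M. Assuming they are read like upstream Kubernetes bandwidth annotations, i.e. as quantities in bits per second with decimal (k, M, G, T) and binary (Ki, Mi, Gi, Ti) suffixes, 10M means 10,000,000 bit/s. The parser below is a hypothetical, integer-only sketch of that interpretation, not Terway's code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// suffixes maps quantity suffixes to multipliers. Binary suffixes are listed
// first so "Mi" is not mistaken for a value ending in "i".
var suffixes = []struct {
	s string
	m uint64
}{
	{"Ki", 1 << 10}, {"Mi", 1 << 20}, {"Gi", 1 << 30}, {"Ti", 1 << 40},
	{"k", 1e3}, {"M", 1e6}, {"G", 1e9}, {"T", 1e12},
}

// parseBandwidth converts an annotation value such as "10M" to bits per
// second. Only non-negative integers are accepted; overflow is not checked.
func parseBandwidth(v string) (uint64, error) {
	v = strings.TrimSpace(v)
	mult := uint64(1)
	for _, s := range suffixes {
		if strings.HasSuffix(v, s.s) {
			v, mult = strings.TrimSuffix(v, s.s), s.m
			break
		}
	}
	n, err := strconv.ParseUint(v, 10, 64)
	if err != nil {
		return 0, err
	}
	return n * mult, nil
}

func main() {
	annotations := map[string]string{
		"kubernetes.io/ingress-bandwidth": "10M",
		"kubernetes.io/egress-bandwidth":  "10M",
	}
	for k, v := range annotations {
		bps, err := parseBandwidth(v)
		if err != nil {
			fmt.Println(k, "error:", err)
			continue
		}
		fmt.Printf("%s: %d bit/s\n", k, bps)
	}
}
```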

Build Terway

Prerequisites:

  • Docker >= 17.05 with multi-stage build support

docker build -t acs/terway:latest .

Test

Unit test:

git clone https://github.com/AliyunContainerService/terway.git
docker run -i --rm \
  -v $(pwd)/terway:/go/src/github.com/AliyunContainerService/terway \
  -w /go/src/github.com/AliyunContainerService/terway \
  sunyuan3/gometalinter:v1 bash -c "go test -race ./..."

Functional test:

export KUBECONFIG=$HOME/.kube/config  # path to your kubeconfig file
cd terway/tests
go test -tags e2e -timeout 30m0s -v ./ \
  -args -trunk=true/false -policy=true/false

Example:

go test -tags e2e -timeout 30m0s -v ./ \
  -args -trunk=false -policy=false

Contribute

You are welcome to open new issues and pull requests.

Built With

Felix: Terway's NetworkPolicy is implemented by integrating Project Calico's Felix component. Felix watches NetworkPolicy configuration and configures ACL rules on the container veth.

Cilium: In IPvlan mode, Terway integrates Cilium components to support NetworkPolicy and optimize Service performance. Cilium watches NetworkPolicy and Service configuration and injects eBPF programs into the pod's IPvlan slave device.

Community

DingTalk

Join the DingTalk group with group ID "35924643".

Security

Please report vulnerabilities by email to [email protected]. Also see our SECURITY.md file for details.

terway's People

Contributors

ansen, bswang, caoyingjunz, casperliu, chenshuoshi-inagora, denverdino, dependabot[bot], domechn, gaorong, husterwan, jzwlqx, jzwlqx01, komey, l1b0k, letty5411, lx1036, lyt99, mars1024, mengskysama, njucjc, shuoyans, stormgbs, sunyuan3, thxcode, xh4n3, xuanyunhui, yahaa, ysicing, zhabinecho, zhiyuan0x


terway's Issues

golint scan failure

#docker run -ti --rm -v /disk1/sunyuan/tmp/terway/:/go/src/github.com/AliyunContainerService/terway -w /go/src/github.com/AliyunContainerService/terway pouchcontainer/pouchlinter:v0.1.2 bash -c "gometalinter --disable-all --skip vendor -E golint -d ./..."
DEBUG: [Jul  1 02:31:18.955] setenv PATH="/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
DEBUG: [Jul  1 02:31:18.966] setenv GOROOT="/usr/local/go"
DEBUG: [Jul  1 02:31:18.966] Current environment:
DEBUG: [Jul  1 02:31:18.966] PATH="/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
DEBUG: [Jul  1 02:31:18.966] GOPATH="/go"
DEBUG: [Jul  1 02:31:18.966] GOBIN=""
DEBUG: [Jul  1 02:31:18.966] GOROOT="/usr/local/go"
DEBUG: [Jul  1 02:31:18.967] linting path .
DEBUG: [Jul  1 02:31:18.967] linting path ./daemon
DEBUG: [Jul  1 02:31:18.967] linting path ./deviceplugin
DEBUG: [Jul  1 02:31:18.967] linting path ./examples/maxpods
DEBUG: [Jul  1 02:31:18.967] linting path ./pkg/aliyun
DEBUG: [Jul  1 02:31:18.967] linting path ./pkg/link
DEBUG: [Jul  1 02:31:18.967] linting path ./pkg/metric
DEBUG: [Jul  1 02:31:18.967] linting path ./pkg/pool
DEBUG: [Jul  1 02:31:18.967] linting path ./pkg/storage
DEBUG: [Jul  1 02:31:18.967] linting path ./pkg/tc
DEBUG: [Jul  1 02:31:18.967] linting path ./plugin/driver
DEBUG: [Jul  1 02:31:18.967] linting path ./plugin/terway
DEBUG: [Jul  1 02:31:18.967] linting path ./rpc
DEBUG: [Jul  1 02:31:18.967] linting path ./types
DEBUG: [Jul  1 02:31:18.967] linting path ./version
DEBUG: [Jul  1 02:31:18.973] [golint.1]: executing /go/bin/golint -min_confidence 0.800000 . ./daemon ./deviceplugin ./examples/maxpods ./pkg/aliyun ./pkg/link ./pkg/metric ./pkg/pool ./pkg/storage ./pkg/tc ./plugin/driver ./plugin/terway ./rpc ./types ./version
DEBUG: [Jul  1 02:31:19.335] [golint.1]: golint hits 6: ^(?P<path>.*?\.go):(?P<line>\d+):(?P<col>\d+):\s*(?P<message>.*)$
DEBUG: [Jul  1 02:31:19.335] nolint: parsing daemon/eni-multi-ip.go for directives
DEBUG: [Jul  1 02:31:19.335] [golint.1]: golint linter took 362.016088ms
DEBUG: [Jul  1 02:31:19.337] nolint: parsing daemon/eni-multi-ip.go took 1.957198ms
DEBUG: [Jul  1 02:31:19.337] nolint: parsing daemon/server.go for directives
daemon/eni-multi-ip.go:15:2:warning: const maxIpBacklog should be maxIPBacklog (golint)
daemon/eni-multi-ip.go:141:9:warning: if block ends with a return statement, so drop this else and outdent its block (golint)
DEBUG: [Jul  1 02:31:19.338] nolint: parsing daemon/server.go took 689.868µs
DEBUG: [Jul  1 02:31:19.338] nolint: parsing plugin/terway/cni.go for directives
daemon/server.go:14:2:warning: a blank import should be only in a main or test package, or have a comment justifying it (golint)
DEBUG: [Jul  1 02:31:19.340] nolint: parsing plugin/terway/cni.go took 2.141706ms
plugin/terway/cni.go:75:2:warning: don't use ALL_CAPS in Go names; use CamelCase (golint)
plugin/terway/cni.go:76:2:warning: don't use ALL_CAPS in Go names; use CamelCase (golint)
plugin/terway/cni.go:77:2:warning: don't use ALL_CAPS in Go names; use CamelCase (golint)
DEBUG: [Jul  1 02:31:19.340] total elapsed time 374.274954ms
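The warnings above point at standard Go naming and control-flow conventions. The following hypothetical snippet shows each kind of fix in isolation; apart from maxIPBacklog (the corrected form of the flagged maxIpBacklog), the identifiers and values are illustrative and not taken from the Terway source.

```go
package main

import (
	"errors"
	"fmt"

	// A blank import outside main/test packages needs a justifying comment:
	// net/http/pprof registers /debug/pprof handlers on http.DefaultServeMux.
	_ "net/http/pprof"
)

// maxIPBacklog keeps the initialism "IP" upper case, as golint expects.
const maxIPBacklog = 10

// CamelCase instead of ALL_CAPS (hypothetical stand-ins for the names
// flagged in plugin/terway/cni.go).
const (
	defaultSocketPath     = "/var/run/example.sock"
	defaultTimeoutSeconds = 5
)

// backlogFor shows the "drop this else and outdent its block" fix: each if
// branch returns, so the following code needs no else.
func backlogFor(requested int) (int, error) {
	if requested < 0 {
		return 0, errors.New("requested backlog must be non-negative")
	}
	if requested > maxIPBacklog {
		return maxIPBacklog, nil
	}
	return requested, nil
}

func main() {
	fmt.Println(defaultSocketPath, defaultTimeoutSeconds)
	fmt.Println(backlogFor(5))
}
```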

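A few questions about terway

  1. Having looked through it, terway seems to be largely modeled on calico.
  2. If you install with the default terway.yml manifest, there will inevitably be problems: terway's init container only downloads terway, and the other binaries are not put in place, so you have to handle that manually.
  3. There are two modes, and basically everyone will pick VPC mode without much deliberation. But if you choose VPC mode, routes have to be added to the VPC, and terway has no logic for adding routes; that logic lives in alicloud-cloud-manager...
  4. If you use multi-IP mode, each Pod gets one IP, which is clearly not enough for scenarios that already use subnet partitioning, and it also makes policies hard to apply.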

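Does terway support Calico's "application layer policy"?

Hi,
For terway (ENI or VPC mode), what is the current state of support for application layer policy? Application layer policy can restrict API call methods and paths, and I would like to know how far that is supported now.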

goimports scan failure

#docker run -ti --rm -v /disk1/sunyuan/tmp/terway/:/go/src/github.com/AliyunContainerService/terway -w /go/src/github.com/AliyunContainerService/terway pouchcontainer/pouchlinter:v0.1.2 bash -c "gometalinter --disable-all --skip vendor -E goimports -d ./..."
DEBUG: [Jul  1 02:27:09.573] setenv PATH="/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
DEBUG: [Jul  1 02:27:09.585] setenv GOROOT="/usr/local/go"
DEBUG: [Jul  1 02:27:09.585] Current environment:
DEBUG: [Jul  1 02:27:09.585] PATH="/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
DEBUG: [Jul  1 02:27:09.585] GOPATH="/go"
DEBUG: [Jul  1 02:27:09.585] GOBIN=""
DEBUG: [Jul  1 02:27:09.585] GOROOT="/usr/local/go"
DEBUG: [Jul  1 02:27:09.586] linting path .
DEBUG: [Jul  1 02:27:09.586] linting path ./daemon
DEBUG: [Jul  1 02:27:09.586] linting path ./deviceplugin
DEBUG: [Jul  1 02:27:09.586] linting path ./examples/maxpods
DEBUG: [Jul  1 02:27:09.586] linting path ./pkg/aliyun
DEBUG: [Jul  1 02:27:09.586] linting path ./pkg/link
DEBUG: [Jul  1 02:27:09.586] linting path ./pkg/metric
DEBUG: [Jul  1 02:27:09.586] linting path ./pkg/pool
DEBUG: [Jul  1 02:27:09.586] linting path ./pkg/storage
DEBUG: [Jul  1 02:27:09.586] linting path ./pkg/tc
DEBUG: [Jul  1 02:27:09.586] linting path ./plugin/driver
DEBUG: [Jul  1 02:27:09.586] linting path ./plugin/terway
DEBUG: [Jul  1 02:27:09.586] linting path ./rpc
DEBUG: [Jul  1 02:27:09.587] linting path ./types
DEBUG: [Jul  1 02:27:09.587] linting path ./version
DEBUG: [Jul  1 02:27:09.592] [goimports.1]: executing /go/bin/goimports -l main.go daemon/context.go daemon/daemon.go daemon/eni-multi-ip.go daemon/eni.go daemon/k8s.go daemon/null.go daemon/resource_manager.go daemon/server.go daemon/veth.go deviceplugin/eni.go examples/maxpods/maxpods.go pkg/aliyun/aliyun_client_mgr.go pkg/aliyun/ecs.go pkg/aliyun/eni.go pkg/aliyun/errors.go pkg/aliyun/metadata.go pkg/aliyun/utils.go pkg/link/interface.go pkg/link/interface_unsupport.go pkg/link/veth.go pkg/link/veth_test.go pkg/metric/aliyun.go pkg/metric/rpc.go pkg/metric/util.go pkg/pool/pool.go pkg/pool/pool_test.go pkg/pool/queue.go pkg/pool/queue_test.go pkg/storage/store.go pkg/tc/tc.go plugin/driver/drivers.go plugin/driver/ipvlan.go plugin/driver/raw_nic.go plugin/driver/utils.go plugin/terway/cni.go rpc/rpc.pb.go types/config.go types/types.go version/spec.go
DEBUG: [Jul  1 02:27:09.819] [goimports.1]: goimports hits 24: ^(?P<path>.*?\.go)$
DEBUG: [Jul  1 02:27:09.820] nolint: parsing main.go for directives
DEBUG: [Jul  1 02:27:09.821] nolint: parsing main.go took 833.549µs
DEBUG: [Jul  1 02:27:09.821] nolint: parsing daemon/daemon.go for directives
main.go:1::warning: file is not goimported (goimports)
DEBUG: [Jul  1 02:27:09.827] [goimports.1]: goimports linter took 235.02653ms
DEBUG: [Jul  1 02:27:09.827] nolint: parsing daemon/daemon.go took 6.068556ms
DEBUG: [Jul  1 02:27:09.827] nolint: parsing daemon/eni-multi-ip.go for directives
daemon/daemon.go:1::warning: file is not goimported (goimports)
DEBUG: [Jul  1 02:27:09.829] nolint: parsing daemon/eni-multi-ip.go took 1.706347ms
DEBUG: [Jul  1 02:27:09.829] nolint: parsing daemon/k8s.go for directives
daemon/eni-multi-ip.go:1::warning: file is not goimported (goimports)
DEBUG: [Jul  1 02:27:09.831] nolint: parsing daemon/k8s.go took 1.88451ms
DEBUG: [Jul  1 02:27:09.831] nolint: parsing daemon/null.go for directives
daemon/k8s.go:1::warning: file is not goimported (goimports)
DEBUG: [Jul  1 02:27:09.831] nolint: parsing daemon/null.go took 144.851µs
DEBUG: [Jul  1 02:27:09.831] nolint: parsing daemon/server.go for directives
daemon/null.go:1::warning: file is not goimported (goimports)
DEBUG: [Jul  1 02:27:09.832] nolint: parsing daemon/server.go took 667.67µs
DEBUG: [Jul  1 02:27:09.832] nolint: parsing daemon/veth.go for directives
daemon/server.go:1::warning: file is not goimported (goimports)
DEBUG: [Jul  1 02:27:09.833] nolint: parsing daemon/veth.go took 664.367µs
DEBUG: [Jul  1 02:27:09.833] nolint: parsing deviceplugin/eni.go for directives
daemon/veth.go:1::warning: file is not goimported (goimports)
DEBUG: [Jul  1 02:27:09.834] nolint: parsing deviceplugin/eni.go took 1.140974ms
DEBUG: [Jul  1 02:27:09.834] nolint: parsing examples/maxpods/maxpods.go for directives
deviceplugin/eni.go:1::warning: file is not goimported (goimports)
DEBUG: [Jul  1 02:27:09.834] nolint: parsing examples/maxpods/maxpods.go took 301.319µs
DEBUG: [Jul  1 02:27:09.834] nolint: parsing pkg/aliyun/ecs.go for directives
examples/maxpods/maxpods.go:1::warning: file is not goimported (goimports)
DEBUG: [Jul  1 02:27:09.837] nolint: parsing pkg/aliyun/ecs.go took 2.732659ms
DEBUG: [Jul  1 02:27:09.837] nolint: parsing pkg/aliyun/eni.go for directives
pkg/aliyun/ecs.go:1::warning: file is not goimported (goimports)
DEBUG: [Jul  1 02:27:09.838] nolint: parsing pkg/aliyun/eni.go took 987.45µs
DEBUG: [Jul  1 02:27:09.838] nolint: parsing pkg/aliyun/metadata.go for directives
pkg/aliyun/eni.go:1::warning: file is not goimported (goimports)
DEBUG: [Jul  1 02:27:09.839] nolint: parsing pkg/aliyun/metadata.go took 527.043µs
DEBUG: [Jul  1 02:27:09.839] nolint: parsing pkg/metric/util.go for directives
pkg/aliyun/metadata.go:1::warning: file is not goimported (goimports)
DEBUG: [Jul  1 02:27:09.839] nolint: parsing pkg/metric/util.go took 128.156µs
DEBUG: [Jul  1 02:27:09.839] nolint: parsing pkg/pool/pool.go for directives
pkg/metric/util.go:1::warning: file is not goimported (goimports)
DEBUG: [Jul  1 02:27:09.840] nolint: parsing pkg/pool/pool.go took 1.470534ms
DEBUG: [Jul  1 02:27:09.840] nolint: parsing pkg/pool/pool_test.go for directives
pkg/pool/pool.go:1::warning: file is not goimported (goimports)
DEBUG: [Jul  1 02:27:09.841] nolint: parsing pkg/pool/pool_test.go took 1.16479ms
DEBUG: [Jul  1 02:27:09.842] nolint: parsing pkg/pool/queue_test.go for directives
pkg/pool/pool_test.go:1::warning: file is not goimported (goimports)
DEBUG: [Jul  1 02:27:09.842] nolint: parsing pkg/pool/queue_test.go took 446.131µs
DEBUG: [Jul  1 02:27:09.842] nolint: parsing pkg/storage/store.go for directives
pkg/pool/queue_test.go:1::warning: file is not goimported (goimports)
DEBUG: [Jul  1 02:27:09.843] nolint: parsing pkg/storage/store.go took 810.879µs
DEBUG: [Jul  1 02:27:09.843] nolint: parsing pkg/tc/tc.go for directives
pkg/storage/store.go:1::warning: file is not goimported (goimports)
DEBUG: [Jul  1 02:27:09.843] nolint: parsing pkg/tc/tc.go took 408.302µs
DEBUG: [Jul  1 02:27:09.843] nolint: parsing plugin/driver/drivers.go for directives
pkg/tc/tc.go:1::warning: file is not goimported (goimports)
DEBUG: [Jul  1 02:27:09.845] nolint: parsing plugin/driver/drivers.go took 2.088716ms
DEBUG: [Jul  1 02:27:09.845] nolint: parsing plugin/driver/ipvlan.go for directives
plugin/driver/drivers.go:1::warning: file is not goimported (goimports)
DEBUG: [Jul  1 02:27:09.847] nolint: parsing plugin/driver/ipvlan.go took 1.095883ms
DEBUG: [Jul  1 02:27:09.847] nolint: parsing plugin/driver/raw_nic.go for directives
plugin/driver/ipvlan.go:1::warning: file is not goimported (goimports)
DEBUG: [Jul  1 02:27:09.847] nolint: parsing plugin/driver/raw_nic.go took 876.034µs
DEBUG: [Jul  1 02:27:09.848] nolint: parsing plugin/driver/utils.go for directives
plugin/driver/raw_nic.go:1::warning: file is not goimported (goimports)
DEBUG: [Jul  1 02:27:09.848] nolint: parsing plugin/driver/utils.go took 363.802µs
DEBUG: [Jul  1 02:27:09.848] nolint: parsing plugin/terway/cni.go for directives
plugin/driver/utils.go:1::warning: file is not goimported (goimports)
DEBUG: [Jul  1 02:27:09.850] nolint: parsing plugin/terway/cni.go took 2.156188ms
DEBUG: [Jul  1 02:27:09.850] nolint: parsing rpc/rpc.pb.go for directives
plugin/terway/cni.go:1::warning: file is not goimported (goimports)
DEBUG: [Jul  1 02:27:09.856] nolint: parsing rpc/rpc.pb.go took 5.609133ms
rpc/rpc.pb.go:1::warning: file is not goimported (goimports)
DEBUG: [Jul  1 02:27:09.856] total elapsed time 270.661271ms
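Each warning above means a file's formatting differs from what goimports would produce. goimports also adds, removes and groups imports, which the standard library cannot do, so the sketch below only reproduces the gofmt part of the check (similar to gofmt -l) using go/format.Source; needsFormat is a hypothetical name.

```go
package main

import (
	"bytes"
	"fmt"
	"go/format"
	"os"
)

// needsFormat reports whether src differs from its gofmt-formatted form.
// It returns an error if src does not parse as a Go file.
func needsFormat(src []byte) (bool, error) {
	formatted, err := format.Source(src)
	if err != nil {
		return false, err
	}
	return !bytes.Equal(formatted, src), nil
}

func main() {
	// Usage: go run main.go daemon/daemon.go pkg/pool/pool.go ...
	for _, path := range os.Args[1:] {
		src, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		bad, err := needsFormat(src)
		if err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", path, err)
			os.Exit(1)
		}
		if bad {
			fmt.Println(path) // list unformatted files, like gofmt -l
		}
	}
}
```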

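Requesting a specific IP

Using ENI multi-IP mode

In some business scenarios, containers need to use a fixed IP address. Can terway support this now?

Or could you suggest a simple approach to extending terway's IPAM policy?

Thanks!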

network: unexpected end of JSON input

The event may report a misleading error like this:

default     9m19s       Warning   FailedCreatePodSandBox   pod/nginx-78c689dfdc-hjjjv       (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "be04b956aba0fd8edce89805d8364afc0d2ed5507f2f07e6fd5312457c8c56b4" network for pod "nginx-78c689dfdc-hjjjv": networkPlugin cni failed to set up pod "nginx-78c689dfdc-hjjjv_default" network: unexpected end of JSON input

This is because the terway CNI plugin's response is not in JSON format.
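The message itself comes from Go's encoding/json: json.Unmarshal returns exactly "unexpected end of JSON input" when its input is empty or truncated, which suggests the runtime received no (or incomplete) output from the plugin rather than a description of the real failure. Under the CNI specification a plugin is expected to print either a result JSON or an error JSON (with code and msg fields). The sketch below reproduces the error; cniResult and parseResult are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cniResult is a hypothetical stand-in for the result a CNI plugin prints on stdout.
type cniResult struct {
	CNIVersion string `json:"cniVersion"`
}

// parseResult decodes plugin stdout the way a caller using encoding/json would.
func parseResult(stdout []byte) (*cniResult, error) {
	var r cniResult
	if err := json.Unmarshal(stdout, &r); err != nil {
		return nil, err
	}
	return &r, nil
}

func main() {
	_, err := parseResult(nil) // the plugin wrote nothing to stdout
	fmt.Println(err)           // unexpected end of JSON input
}
```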

Some suggestions for the documentation

Recently I was searching for a CNI plugin for a k8s cluster on Aliyun cloud. After struggling through tons of network plugins, I found that terway is a good plugin I can rely on. After asking @BSWANG some questions on DingTalk, I realized that there are some gaps in the terway documentation. I'd like to point them out in this issue :)

  1. Troubleshooting section
  2. Explanations should be more detailed, both in the README and in the Aliyun documentation (how ENI works, terway's limitations, terway's advantages, etc.)
  3. A migration path (maybe?) from other CNI plugins (such as flannel, which is the default network plugin when setting up a new k8s cluster on Aliyun)
    And finally, thanks to every contributor to this project. Thank you all.

tune RPS automatically to avoid imbalanced softIRQ on newly created network interfaces

We have deployed terway in our overseas Kubernetes cluster and are migrating our legacy jobs into this cluster, but we have hit some issues in the meantime. When we moved a deployment into this cluster that handles an extremely high number of simultaneously active connections and has a high QPS, we found that network latency sometimes became high. After digging into this problem, we found that softIRQ load was high and almost entirely concentrated on a single CPU core.

[Figure: per-CPU softIRQ distribution]

Since we always tune our host network parameters before a host is added to the Kubernetes cluster, this issue seemed really weird. Then we found that the newly created network interface's RPS parameter had not been touched and still had the default value 00000000. We thought this might be the cause, so we ran a test with the network interface's receive-queue RPS set to 00000000 and to ffffffff respectively, and got the softIRQ distribution metrics below.

[Figure: per-CPU softIRQ distribution with rps_cpus set to 00000000 vs. ffffffff]
(Note: these metrics were collected in a test cluster, so the softIRQ peak is not as high as in the production environment.)

As we can see, the RPS parameter in /sys/class/net/eth*/queues/rx-*/rps_cpus can greatly affect both the distribution of softIRQ across CPU cores and network performance, so we should carefully tune this parameter for each network interface.
The network interfaces are created dynamically by terway, and other applications can hardly detect their creation or deletion in time, so maybe terway should have this ability built in.

Do we have any ideas about adding this feature in terway?

Decouple ENI CRUD

Currently, ENI CRUD is done in the ENI daemon through the Aliyun OpenAPI, which I think leads to the following problems:

  1. No flow control is possible on Aliyun OpenAPI calls; the request frequency increases with the size of the cluster.
  2. No ENI-related information is stored in Kubernetes; if we want to do something with the ENI ID or other attributes, we have to fetch it from the Aliyun OpenAPI.
  3. eni-daemon should not be upgraded frequently, but if the daemon uses the Aliyun OpenAPI SDK, we have to upgrade it to follow SDK upgrades.

So I think we could design a centralized ENI controller that handles ENI CRUD, and let eni-daemon connect to it through a CRD (watch) or a ClusterIP service (polling).

misspell scan failure

#docker run -ti --rm -v /disk1/sunyuan/tmp/terway/:/go/src/github.com/AliyunContainerService/terway -w /go/src/github.com/AliyunContainerService/terway pouchcontainer/pouchlinter:v0.1.2 bash -c "find ./* -name \"*\" | grep -v vendor | xargs misspell -error"
policy/0001-terway.patch:25:20: "processer" is a misspelling of "processor"
./policy/0001-terway.patch:25:20: "processer" is a misspelling of "processor"

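Question about accessing the Service network in ENI mode

As described in docs/design.md#eni, traffic to the Service network goes through the veth interface. After a packet reaches the host network namespace, iptables/ipvs DNATs the Service IP to the destination pod IP, and sending works fine. When the destination pod receives the packet, the reply is routed through the VPC and arrives directly at the source pod's physical NIC. At that point the reply's source IP is not the Service IP, so shouldn't the packet be dropped? How is this handled? Does masqueradeAll in kube-proxy need to be set to true?

ps: I don't have an Alibaba Cloud environment, so I can't verify this; the analysis above is purely theoretical.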

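terway plugin: the secondary ENI's primary IP address is not visible, but it can be pinged

After purchasing a managed Container Service cluster and logging into an ECS node, we wanted our application to use the secondary ENI's IP address, but found that after the secondary ENI was attached to the ECS instance, it had not been assigned an IP address.
Initial testing shows that the IP address can be pinged from both the ECS instance and Pods. Where is the secondary ENI's primary IP address configured? What is the underlying mechanism? How can an application bind to this address?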

Use Go modules

Changing some of the code is currently inconvenient; we hope Go modules will be adopted.

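EIP lacks an annotation for pay-by-traffic billing

EIPs created by terway through k8s.aliyun.com/pod-with-eip are billed by bandwidth by default. An annotation such as k8s.aliyun.com/eip-charge-type: "PayByTraffic" could be added to switch them to pay-by-traffic billing.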
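A minimal sketch of how the proposed annotation could be resolved, assuming it is only consulted when k8s.aliyun.com/pod-with-eip is "true" and defaults to PayByBandwidth (the EIP default described above). k8s.aliyun.com/eip-charge-type is the annotation proposed in this issue, not an existing one, and eipChargeType is a hypothetical helper.

```go
package main

import "fmt"

const (
	podWithEIPAnnotation = "k8s.aliyun.com/pod-with-eip"
	// eipChargeTypeAnnotation is the annotation proposed in this issue (hypothetical).
	eipChargeTypeAnnotation = "k8s.aliyun.com/eip-charge-type"
)

// eipChargeType returns the charge type to request for a pod's EIP,
// defaulting to PayByBandwidth when the annotation is absent or empty.
func eipChargeType(annotations map[string]string) (string, error) {
	v, ok := annotations[eipChargeTypeAnnotation]
	if !ok || v == "" {
		return "PayByBandwidth", nil
	}
	switch v {
	case "PayByBandwidth", "PayByTraffic":
		return v, nil
	default:
		return "", fmt.Errorf("unsupported %s value %q", eipChargeTypeAnnotation, v)
	}
}

func main() {
	annotations := map[string]string{
		podWithEIPAnnotation:    "true",
		eipChargeTypeAnnotation: "PayByTraffic",
	}
	if annotations[podWithEIPAnnotation] != "true" {
		return // pod did not ask for an EIP
	}
	ct, err := eipChargeType(annotations)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("EIP charge type:", ct)
}
```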

goconst scan failure

#docker run -ti --rm -v /disk1/sunyuan/tmp/terway/:/go/src/github.com/AliyunContainerService/terway -w /go/src/github.com/AliyunContainerService/terway pouchcontainer/pouchlinter:v0.1.2 bash -c "gometalinter --disable-all --skip vendor -E goconst -d ./..."
DEBUG: [Jul  1 02:32:24.411] setenv PATH="/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
DEBUG: [Jul  1 02:32:24.422] setenv GOROOT="/usr/local/go"
DEBUG: [Jul  1 02:32:24.422] Current environment:
DEBUG: [Jul  1 02:32:24.422] PATH="/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
DEBUG: [Jul  1 02:32:24.422] GOPATH="/go"
DEBUG: [Jul  1 02:32:24.422] GOBIN=""
DEBUG: [Jul  1 02:32:24.422] GOROOT="/usr/local/go"
DEBUG: [Jul  1 02:32:24.423] linting path .
DEBUG: [Jul  1 02:32:24.423] linting path ./daemon
DEBUG: [Jul  1 02:32:24.423] linting path ./deviceplugin
DEBUG: [Jul  1 02:32:24.423] linting path ./examples/maxpods
DEBUG: [Jul  1 02:32:24.423] linting path ./pkg/aliyun
DEBUG: [Jul  1 02:32:24.423] linting path ./pkg/link
DEBUG: [Jul  1 02:32:24.423] linting path ./pkg/metric
DEBUG: [Jul  1 02:32:24.423] linting path ./pkg/pool
DEBUG: [Jul  1 02:32:24.423] linting path ./pkg/storage
DEBUG: [Jul  1 02:32:24.423] linting path ./pkg/tc
DEBUG: [Jul  1 02:32:24.423] linting path ./plugin/driver
DEBUG: [Jul  1 02:32:24.423] linting path ./plugin/terway
DEBUG: [Jul  1 02:32:24.423] linting path ./rpc
DEBUG: [Jul  1 02:32:24.423] linting path ./types
DEBUG: [Jul  1 02:32:24.423] linting path ./version
DEBUG: [Jul  1 02:32:24.429] [goconst.1]: executing /go/bin/goconst -min-occurrences 3 -min-length 3 . ./daemon ./deviceplugin ./examples/maxpods ./pkg/aliyun ./pkg/link ./pkg/metric ./pkg/pool ./pkg/storage ./pkg/tc ./plugin/driver ./plugin/terway ./rpc ./types ./version
DEBUG: [Jul  1 02:32:24.451] [goconst.1]: goconst hits 3: ^(?P<path>.*?\.go):(?P<line>\d+):(?P<col>\d+):\s*(?P<message>.*)$
DEBUG: [Jul  1 02:32:24.451] [goconst.1]: goconst linter took 22.28464ms
DEBUG: [Jul  1 02:32:24.451] nolint: parsing daemon/k8s.go for directives
DEBUG: [Jul  1 02:32:24.454] nolint: parsing daemon/k8s.go took 3.41257ms
DEBUG: [Jul  1 02:32:24.454] nolint: parsing daemon/daemon.go for directives
daemon/k8s.go:203:83:warning: 2 other occurrence(s) of "false" found in: daemon/daemon.go:678:20 daemon/daemon.go:679:17 (goconst)
DEBUG: [Jul  1 02:32:24.458] nolint: parsing daemon/daemon.go took 3.151222ms
daemon/daemon.go:678:20:warning: 2 other occurrence(s) of "false" found in: daemon/k8s.go:203:83 daemon/daemon.go:679:17 (goconst)
daemon/daemon.go:679:17:warning: 2 other occurrence(s) of "false" found in: daemon/k8s.go:203:83 daemon/daemon.go:678:20 (goconst)
DEBUG: [Jul  1 02:32:24.458] total elapsed time 35.470389ms

Errors you may encounter when upgrading the library

(The purpose of this report is to alert AliyunContainerService/terway to the possible problems when AliyunContainerService/terway try to upgrade the following dependencies)

An error will occur when upgrading the library prometheus/client_golang:

github.com/prometheus/client_golang

- Latest version: v1.7.1 (latest commit fe7bd95, 5 days ago), master branch
- Where did you use it:
https://github.com/AliyunContainerService/terway/search?l=Go&q=github.com%2Fprometheus%2Fclient_golang
- Detail:

https://github.com/prometheus/client_golang/go.mod

module github.com/prometheus/client_golang
go 1.11
require (
	github.com/beorn7/perks v1.0.1
	github.com/cespare/xxhash/v2 v2.1.1	
        …
)

https://github.com/prometheus/client_golang/prometheus/registry.go

package prometheus
import (
	"github.com/cespare/xxhash/v2"
	"github.com/golang/protobuf/proto"
	…
)

This problem was introduced in prometheus/client_golang v1.2.0 (latest commit 9a2ab94 on 16 Oct 2019). You currently use v1.1.0. If you upgrade prometheus/client_golang to v1.2.0 or later, you will get the error: no package exists at "github.com/cespare/xxhash/v2".

I investigated the release information of these libraries (prometheus/client_golang >= v1.2.0) and found that the root cause of this issue is:

  1. These dependencies all adopted Go modules in their recent versions.

  2. They all comply with the specification of "Releasing Modules for v2 or higher" available in the Modules documentation. Quoting the specification:

A package that has migrated to Go Modules must include the major version in the import path to reference any v2+ modules. For example, Repo github.com/my/module migrated to Modules on version v3.x.y. Then this repo should declare its module path with MAJOR version suffix "/v3" (e.g., module github.com/my/module/v3), and its downstream project should use "github.com/my/module/v3/mypkg" to import this repo’s package.

  3. This "github.com/my/module/v3/mypkg" is not a physical path. Earlier versions of Go (those without minimal module awareness), as well as third-party tooling (like dep, glide, govendor, etc.), do not have minimal module awareness and therefore do not handle such import paths correctly. See golang/dep#1962, golang/dep#2139.

Note: creating a new branch is not required. If instead you have been previously releasing on master and would prefer to tag v3.0.0 on master, that is a viable option. (However, be aware that introducing an incompatible API change in master can cause issues for non-modules users who issue a go get -u given the go tool is not aware of semver prior to Go 1.11 or when module mode is not enabled in Go 1.11+).
Pre-existing dependency management solutions such as dep currently can have problems consuming a v2+ module created in this way. See for example dep#1962.
https://github.com/golang/go/wiki/Modules#releasing-modules-v2-or-higher
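To make the quoted rule concrete, here is a small Go sketch that checks whether a module path carries the `/vN` suffix its release tag requires. The helpers `majorSuffix`, `tagMajor` and `checkModulePath` are hypothetical and not part of the Go toolchain (real tooling uses golang.org/x/mod/module). The sketch ignores gopkg.in-style `.vN` paths and `+incompatible` versions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorSuffix returns N if modulePath ends in a "/vN" element with N >= 2,
// and 0 otherwise (v0/v1 modules carry no major-version suffix).
func majorSuffix(modulePath string) int {
	i := strings.LastIndex(modulePath, "/v")
	if i < 0 {
		return 0
	}
	s := modulePath[i+2:]
	if s == "" || s[0] < '1' || s[0] > '9' {
		return 0 // "/v" alone, a leading zero, or a sign is not a major suffix
	}
	n, err := strconv.Atoi(s)
	if err != nil || n < 2 {
		return 0 // non-numeric (e.g. "/v2ray/core") or v1
	}
	return n
}

// tagMajor parses the major version from a semver tag such as "v2.1.1".
func tagMajor(tag string) (int, error) {
	if !strings.HasPrefix(tag, "v") {
		return 0, fmt.Errorf("tag %q must start with 'v'", tag)
	}
	return strconv.Atoi(strings.SplitN(tag[1:], ".", 2)[0])
}

// checkModulePath reports whether modulePath follows the
// "Releasing Modules for v2 or higher" rule for the given release tag.
func checkModulePath(modulePath, tag string) error {
	major, err := tagMajor(tag)
	if err != nil {
		return err
	}
	got := majorSuffix(modulePath)
	switch {
	case major >= 2 && got != major:
		return fmt.Errorf("module %q tagged %s must end in /v%d", modulePath, tag, major)
	case major < 2 && got != 0:
		return fmt.Errorf("module %q tagged %s must not have a /vN suffix", modulePath, tag)
	}
	return nil
}

func main() {
	fmt.Println(checkModulePath("github.com/cespare/xxhash/v2", "v2.1.1"))     // <nil>
	fmt.Println(checkModulePath("github.com/cespare/xxhash", "v2.1.1") != nil) // true
}
```

This is why a GOPATH-based tool looking for a directory literally named `xxhash/v2` fails: the `/v2` exists only in the module path, not on disk.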

Solution

1. Migrate to Go Modules.

Go Modules is the general trend of the ecosystem; if you want a better package-upgrade experience, migrating to Go Modules is a good choice.

Migrating to modules will introduce virtual import paths (as discussed above).

The "github.com/my/module/v3/mypkg" path is not a physical path. Go versions older than 1.9.7 and 1.10.3, as well as all third-party dependency management tools (like dep, glide, govendor, etc.), do not have minimal module awareness and therefore do not handle such import paths correctly.

As a result, downstream projects may fail to build if they are module-unaware (Go versions older than 1.9.7 and 1.10.3, or projects that use third-party dependency management tools such as dep, glide, govendor…).

2. Maintain v2+ libraries that use Go Modules in the vendor directory.

If AliyunContainerService/terway wants to keep using its dependency management tool (like dep, glide, govendor, etc.) and still upgrade the dependencies, it can choose this strategy.
Manually download the dependencies into the vendor directory and handle compatibility there (materialize the virtual path, or delete the virtual part of the path), so that no dependency is fetched through a virtual import path. This may add some maintenance overhead compared to using modules.

Because import paths mean different things in projects that adopt modules and in those that do not, materializing the virtual path is the better way to solve the issue while keeping compatibility with downstream module users. A textbook example is provided by github.com/moby/moby:
https://github.com/moby/moby/blob/master/VENDORING.md
https://github.com/moby/moby/blob/master/vendor.conf
In its vendor directory, github.com/moby/moby adds a /vN subdirectory for the corresponding dependencies.
This helps more downstream module users work well with your package.

3. Ask upstream to handle compatibility.

prometheus/client_golang has 1039 module-unaware users on GitHub, such as AndreaGreco/mqtt_sensor_exporter, seekplum/plum_exporter, arl/monitoring…
https://github.com/search?q=prometheus%2Fclient_golang+filename%3Avendor.conf+filename%3Avendor.json+filename%3Aglide.toml+filename%3AGodep.toml+filename%3AGodep.json

Summary

You can decide how to handle such dependency-management issues by balancing your own development schedule and mode against the effects on downstream projects.

For this issue, Solution 1 can maximize your benefits with minimal impact on your downstream projects and the ecosystem.

Do you plan to upgrade these libraries in the near future?
Hope this issue report can help you ^_^
Thank you very much for your attention.

Best regards,
Kate

If a pod is created using VPC networking instead of ENI, is there a limit on the number of route entries?

The description of this in the documentation is rather confusing:

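Combining with VPC networking
Pure ENI networking is limited by the number of ENIs on each ECS instance, which often makes it impractical. One approach is to support VPC networking in the ENI plugin (the current container VPC networking, based on vrouter forwarding): users can choose whether a Pod uses ENI networking or VPC networking. With ENI networking, the Pod gets a dedicated ENI; otherwise the Pod is only allocated an IP address usable within the VPC. Priority P2

I don't quite understand "support VPC networking in the ENI plugin; the Pod is only allocated an IP address usable within the VPC". Does it mean the pod CIDR belongs to the VPC CIDR and the IP assigned to the pod exists in the vrouter route table? But I couldn't find any calls to route-table-related APIs in the code.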

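ARP resolution issue

When the ipvlan slave device in the host namespace (referred to here as ipvl0) is given the same IP address as the master, with a /32 mask, scope host and the NOARP flag, every packet arriving on the master device is received by ipvl0, while outgoing packets follow the host's existing default route and leave directly through the master device without passing through ipvl0. As a result, many entries in the ARP table show as incomplete. Has anyone run into this?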

How are ingress and egress traffic control implemented?

Hi, I have seen the ingress and egress traffic control parameters in the driver setup interface, but found no details on how they are implemented. Can you share how traffic control is designed and implemented in Terway?

unknown field "aliyun/eni" in io.k8s.api.core.v1.ResourceRequirements

I hit this error when changing my deployment to use Terway ENI mode:

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
# deployments.extensions "iperf3-terway" was not valid:
# * : Invalid value: "The edited file failed validation": ValidationError(Deployment.spec.template.spec.containers[0].resources): unknown field "aliyun/eni" in io.k8s.api.core.v1.ResourceRequirements

calico network policy

Hi! Could you please confirm whether it's possible to use Calico network policy with Terway? Or is it possible to integrate Calico networking with Terway?
For example,

apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: allow-tcp-6379
spec:
  selector: role == 'database'
  types:
  - Ingress
  - Egress
  ingress:
  - action: Allow
    metadata:
      annotations:
        from: frontend
        to: database
    protocol: TCP
    source:
      selector: role == 'frontend'
    destination:
      ports:
      - 6379
  egress:
  - action: Allow

markdownlint check

docker run -ti -v `pwd`:/go/src/github.com/AliyunContainerService/terway -w /go/src/github.com/AliyunContainerService/terway sunyuan3/gometalinter:v1 bash -c "find  ./ -name  \"*.md\" | grep -v vendor | grep -v commandline |  grep -v .github |  grep -v swagger |  grep -v api |  xargs mdl -r ~MD010,~MD013,~MD024,~MD029,~MD033,~MD036"
./docs/design.md:1: MD022 Headers should be surrounded by blank lines
./docs/design.md:8: MD022 Headers should be surrounded by blank lines
./docs/design.md:11: MD022 Headers should be surrounded by blank lines
./docs/design.md:14: MD022 Headers should be surrounded by blank lines
./docs/design.md:16: MD022 Headers should be surrounded by blank lines
./docs/design.md:17: MD022 Headers should be surrounded by blank lines
./docs/design.md:18: MD022 Headers should be surrounded by blank lines
./docs/design.md:19: MD022 Headers should be surrounded by blank lines
./README.md:2: MD009 Trailing spaces
./README.md:65: MD009 Trailing spaces
./README.md:133: MD009 Trailing spaces
./README.md:141: MD009 Trailing spaces
./README.md:151: MD009 Trailing spaces
./README.md:177: MD009 Trailing spaces
./README.md:183: MD009 Trailing spaces
./README.md:184: MD009 Trailing spaces
./README.md:206: MD009 Trailing spaces
./README.md:225: MD009 Trailing spaces
./README.md:7: MD012 Multiple consecutive blank lines
./README.md:35: MD012 Multiple consecutive blank lines
./README.md:185: MD012 Multiple consecutive blank lines
./README.md:1: MD022 Headers should be surrounded by blank lines
./README.md:10: MD022 Headers should be surrounded by blank lines
./README.md:67: MD022 Headers should be surrounded by blank lines
./README.md:208: MD022 Headers should be surrounded by blank lines
./README.md:10: MD026 Trailing punctuation in header
./README.md:38: MD026 Trailing punctuation in header
./README.md:40: MD026 Trailing punctuation in header
./README.md:67: MD026 Trailing punctuation in header
./README.md:128: MD026 Trailing punctuation in header
./README.md:134: MD031 Fenced code blocks should be surrounded by blank lines
./README.md:139: MD031 Fenced code blocks should be surrounded by blank lines
./README.md:142: MD031 Fenced code blocks should be surrounded by blank lines
./README.md:152: MD031 Fenced code blocks should be surrounded by blank lines
./README.md:183: MD031 Fenced code blocks should be surrounded by blank lines
./README.md:15: MD032 Lists should be surrounded by blank lines
./README.md:12: MD034 Bare URL used
./README-zh_CN.md:2: MD009 Trailing spaces
./README-zh_CN.md:61: MD009 Trailing spaces
./README-zh_CN.md:132: MD009 Trailing spaces
./README-zh_CN.md:140: MD009 Trailing spaces
./README-zh_CN.md:150: MD009 Trailing spaces
./README-zh_CN.md:176: MD009 Trailing spaces
./README-zh_CN.md:182: MD009 Trailing spaces
./README-zh_CN.md:183: MD009 Trailing spaces
./README-zh_CN.md:205: MD009 Trailing spaces
./README-zh_CN.md:31: MD012 Multiple consecutive blank lines
./README-zh_CN.md:184: MD012 Multiple consecutive blank lines
./README-zh_CN.md:1: MD022 Headers should be surrounded by blank lines
./README-zh_CN.md:8: MD022 Headers should be surrounded by blank lines
./README-zh_CN.md:36: MD022 Headers should be surrounded by blank lines
./README-zh_CN.md:133: MD031 Fenced code blocks should be surrounded by blank lines
./README-zh_CN.md:138: MD031 Fenced code blocks should be surrounded by blank lines
./README-zh_CN.md:141: MD031 Fenced code blocks should be surrounded by blank lines
./README-zh_CN.md:151: MD031 Fenced code blocks should be surrounded by blank lines
./README-zh_CN.md:182: MD031 Fenced code blocks should be surrounded by blank lines
./README-zh_CN.md:12: MD032 Lists should be surrounded by blank lines
./README-zh_CN.md:9: MD034 Bare URL used
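Most of the hits above are MD009 (trailing spaces) and MD012 (multiple consecutive blank lines), which can be fixed mechanically. A minimal Go sketch of such a fixer; `fixMarkdown` is a hypothetical helper, not part of mdl or Terway. It does not skip fenced code blocks and it also removes two-space hard line breaks, so the resulting diff should be reviewed before committing.

```go
package main

import (
	"fmt"
	"strings"
)

// fixMarkdown strips trailing spaces and tabs from every line (MD009) and
// collapses each run of blank lines into a single blank line (MD012).
func fixMarkdown(src string) string {
	lines := strings.Split(src, "\n")
	out := make([]string, 0, len(lines))
	prevBlank := false
	for _, line := range lines {
		line = strings.TrimRight(line, " \t")
		if line == "" {
			if prevBlank {
				continue // drop the second and later blank lines of a run
			}
			prevBlank = true
		} else {
			prevBlank = false
		}
		out = append(out, line)
	}
	return strings.Join(out, "\n")
}

func main() {
	in := "# Title  \n\n\n\nSome text. \n"
	fmt.Printf("%q\n", fixMarkdown(in)) // "# Title\n\nSome text.\n"
}
```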

Using IP reservation mode may prevent GC from working as expected

func (p *simpleObjectPool) peekOverfullIdle() *poolItem {
	p.lock.Lock()
	defer p.lock.Unlock()

	if !p.tooManyIdleLocked() {
		return nil
	}

	// Peek always returns the first item in the idle queue.
	item := p.idle.Peek()
	if item == nil {
		return nil
	}

	// If the first item's reservation has not expired yet, we return here,
	// so no other unused idle IP can be released either.
	if item.reservation.After(time.Now()) {
		return nil
	}
	return p.idle.Pop()
}

terway fails at startup with "open /var/lib/cni/terway/pod.db: no such file or directory"

I built my own Kubernetes cluster on Alibaba Cloud ECS. When deploying Terway, the terway pod reported the following error:
time="2020-06-29T08:14:51Z" level=info msg="Starting terway of version: 7da5160"
time="2020-06-29T08:14:51Z" level=info msg="got config: &{Version:1 AccessId:****** AccessSecret:****** ServiceCIDR:10.96.0.0/12 VSwitches:map[] MaxPoolSize:5 MinPoolSize:0 Prefix: SecurityGroup: HotPlug: EniCapRatio:0 EniCapShift:0} from: /etc/eni/eni.json"
time="2020-06-29T08:14:51Z" level=info msg="alicloud: clientmgr, use accesskeyid and accesskeysecret mode to authenticate user. without token"
time="2020-06-29T08:14:51Z" level=fatal msg="error init k8s service: failed init db storage with path /var/lib/cni/terway/pod.db and bucket pods: open /var/lib/cni/terway/pod.db: no such file or directory"
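Judging from the error, the /var/lib/cni/terway directory does not exist. After I created it manually, terway started normally, but normally I shouldn't have to create this directory myself.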

shellcheck scan failure

#docker run -ti --rm -v /disk1/sunyuan/tmp/terway/:/go/src/github.com/AliyunContainerService/terway -w /go/src/github.com/AliyunContainerService/terway pouchcontainer/pouchlinter:v0.1.2 bash -c "find ./ -name \"*.sh\" | grep -v vendor | xargs shellcheck"

In ./policy/policyinit.sh line 3:
if [ "$DATASTORE_TYPE" == "kubernetes" ]; then
                       ^-- SC2039: In POSIX sh, == is not supported.


In ./policy/policyinit.sh line 14:
export CALICO_IPV4POOL_CIDR=${Network}
                            ^-- SC2154: Network is referenced but not assigned.


In ./policy/policyinit.sh line 24:
if [ ! -z $NODENAME ]; then
          ^-- SC2086: Double quote to prevent globbing and word splitting.


In ./tests/test.sh line 7:
source install_env.sh $@
                      ^-- SC2068: Double quote array expansions, otherwise they're like $* and break on spaces.


In ./tests/install_env.sh line 23:
	if [ -z ${terway_image} ]; then
                ^-- SC2086: Double quote to prevent globbing and word splitting.


In ./tests/install_env.sh line 30:
	export temp_dir=`mktemp -d`
               ^-- SC2155: Declare and assign separately to avoid masking return values.
                        ^-- SC2006: Use $(..) instead of legacy `..`.


In ./tests/install_env.sh line 32:
	export aliyun_cluster=`aliyun cs GET /clusters/${cluster_id}`
               ^-- SC2155: Declare and assign separately to avoid masking return values.
                              ^-- SC2006: Use $(..) instead of legacy `..`.


In ./tests/install_env.sh line 33:
	export security_group=`echo ${aliyun_cluster} | jq .security_group_id | tr -d '"'`
               ^-- SC2155: Declare and assign separately to avoid masking return values.
                              ^-- SC2006: Use $(..) instead of legacy `..`.
                                    ^-- SC2086: Double quote to prevent globbing and word splitting.


In ./tests/install_env.sh line 34:
	export vswitch=`echo ${aliyun_cluster} | jq .vswitch_id | tr -d '"'`
               ^-- SC2155: Declare and assign separately to avoid masking return values.
                       ^-- SC2006: Use $(..) instead of legacy `..`.
                             ^-- SC2086: Double quote to prevent globbing and word splitting.


In ./tests/install_env.sh line 35:
	aliyun cs GET /k8s/${cluster_id}/user_config | jq -r .config > ${temp_dir}/kubeconfig.yaml
                                                                       ^-- SC2086: Double quote to prevent globbing and word splitting.


In ./tests/install_env.sh line 37:
	export service_cidr=`aliyun cs GET /clusters/${cluster_id} | jq .parameters.ServiceCIDR | tr -d '"'`
               ^-- SC2155: Declare and assign separately to avoid masking return values.
                            ^-- SC2006: Use $(..) instead of legacy `..`.


In ./tests/install_env.sh line 38:
	export pod_cidr=`aliyun cs GET /clusters/${cluster_id} | jq .parameters.ContainerCIDR | tr -d '"'`
               ^-- SC2155: Declare and assign separately to avoid masking return values.
                        ^-- SC2006: Use $(..) instead of legacy `..`.


In ./tests/install_env.sh line 64:
	cp templates/terway/${terway_template} ${temp_dir}/
                                               ^-- SC2086: Double quote to prevent globbing and word splitting.


In ./tests/install_env.sh line 72:
	    -i ${temp_dir}/${terway_template}
               ^-- SC2086: Double quote to prevent globbing and word splitting.


In ./tests/install_env.sh line 74:
	kubectl apply -f ${temp_dir}/${terway_template}
                         ^-- SC2086: Double quote to prevent globbing and word splitting.


In ./tests/stress/stress.sh line 18:
	for deploy in ${stress_deploys[@]}; do
                      ^-- SC2068: Double quote array expansions, otherwise they're like $* and break on spaces.


In ./tests/stress/stress.sh line 19:
		kubectl delete deploy ${deploy}
                                      ^-- SC2086: Double quote to prevent globbing and word splitting.


In ./tests/stress/stress.sh line 26:
eni_max_scale=10
^-- SC2034: eni_max_scale appears unused. Verify it or export it.


In ./tests/stress/stress.sh line 32:
		scale_num=$(($RANDOM%max_scale))
                             ^-- SC2004: $/${} is unnecessary on arithmetic variables.


In ./tests/stress/stress.sh line 33:
		kubectl -n ${stress_ns} scale --replicas ${scale_num} deploy ${deploy}
                                                                             ^-- SC2086: Double quote to prevent globbing and word splitting.


In ./tests/stress/stress.sh line 34:
		jitter_time=$(($RANDOM%($scale_jitter*2*60)-$scale_jitter*60))
                               ^-- SC2004: $/${} is unnecessary on arithmetic variables.
                                        ^-- SC2004: $/${} is unnecessary on arithmetic variables.
                                                            ^-- SC2004: $/${} is unnecessary on arithmetic variables.


In ./tests/stress/stress.sh line 44:
		sleep $((${delete_period}*60))
                         ^-- SC2004: $/${} is unnecessary on arithmetic variables.


In ./tests/stress/stress.sh line 45:
		podlist=($(kubectl -n ${stress_ns} get pod -l run=${deploy} -o name))
                                                                  ^-- SC2086: Double quote to prevent globbing and word splitting.


In ./tests/stress/stress.sh line 47:
		if [ ${pod_len} -gt ${delete_count} ]; then
                     ^-- SC2086: Double quote to prevent globbing and word splitting.


In ./tests/stress/stress.sh line 50:
		for (( i=0; i<$pod_len; i++ )) do
                              ^-- SC2004: $/${} is unnecessary on arithmetic variables.


In ./tests/stress/stress.sh line 51:
			kubectl -n ${stress_ns} delete ${podlist[$i]}
                                                       ^-- SC2086: Double quote to prevent globbing and word splitting.

deadcode scan failure

#docker run -ti --rm -v /disk1/sunyuan/tmp/terway/:/go/src/github.com/AliyunContainerService/terway -w /go/src/github.com/AliyunContainerService/terway pouchcontainer/pouchlinter:v0.1.2 bash -c "gometalinter --disable-all --skip vendor -E deadcode -d ./..."
DEBUG: [Jul  1 02:39:04.983] setenv PATH="/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
DEBUG: [Jul  1 02:39:04.992] setenv GOROOT="/usr/local/go"
DEBUG: [Jul  1 02:39:04.992] Current environment:
DEBUG: [Jul  1 02:39:04.992] PATH="/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
DEBUG: [Jul  1 02:39:04.992] GOPATH="/go"
DEBUG: [Jul  1 02:39:04.992] GOBIN=""
DEBUG: [Jul  1 02:39:04.992] GOROOT="/usr/local/go"
DEBUG: [Jul  1 02:39:04.993] linting path .
DEBUG: [Jul  1 02:39:04.993] linting path ./daemon
DEBUG: [Jul  1 02:39:04.993] linting path ./deviceplugin
DEBUG: [Jul  1 02:39:04.993] linting path ./examples/maxpods
DEBUG: [Jul  1 02:39:04.994] linting path ./pkg/aliyun
DEBUG: [Jul  1 02:39:04.994] linting path ./pkg/link
DEBUG: [Jul  1 02:39:04.994] linting path ./pkg/metric
DEBUG: [Jul  1 02:39:04.994] linting path ./pkg/pool
DEBUG: [Jul  1 02:39:04.994] linting path ./pkg/storage
DEBUG: [Jul  1 02:39:04.994] linting path ./pkg/tc
DEBUG: [Jul  1 02:39:04.994] linting path ./plugin/driver
DEBUG: [Jul  1 02:39:04.994] linting path ./plugin/terway
DEBUG: [Jul  1 02:39:04.994] linting path ./rpc
DEBUG: [Jul  1 02:39:04.994] linting path ./types
DEBUG: [Jul  1 02:39:04.994] linting path ./version
DEBUG: [Jul  1 02:39:05.002] [deadcode.1]: executing /go/bin/deadcode . ./daemon ./deviceplugin ./examples/maxpods ./pkg/aliyun ./pkg/link ./pkg/metric ./pkg/pool ./pkg/storage ./pkg/tc ./plugin/driver ./plugin/terway ./rpc ./types ./version
DEBUG: [Jul  1 02:39:05.030] [deadcode.1]: warning: /go/bin/deadcode returned exit status 2: deadcode: daemon/eni-multi-ip.go:260:1: newENIIPFactory is unused
deadcode: daemon/null.go:8:1: nullResourceManager is unused
deadcode: deviceplugin/eni.go:273:1: main is unused
deadcode: examples/maxpods/maxpods.go:13:1: debug is unused
deadcode: pkg/aliyun/metadata.go:13:1: instanceTypePath is unused
deadcode: plugin/driver/drivers.go:498:1: getNSHw is unused

DEBUG: [Jul  1 02:39:05.030] [deadcode.1]: deadcode hits 6: ^deadcode: (?P<path>.*?\.go):(?P<line>\d+):(?P<col>\d+):\s*(?P<message>.*)$
DEBUG: [Jul  1 02:39:05.030] nolint: parsing daemon/eni-multi-ip.go for directives
DEBUG: [Jul  1 02:39:05.030] [deadcode.1]: deadcode linter took 28.746016ms
DEBUG: [Jul  1 02:39:05.032] nolint: parsing daemon/eni-multi-ip.go took 1.813695ms
DEBUG: [Jul  1 02:39:05.032] nolint: parsing daemon/null.go for directives
daemon/eni-multi-ip.go:260:1:warning: newENIIPFactory is unused (deadcode)
DEBUG: [Jul  1 02:39:05.032] nolint: parsing daemon/null.go took 147.878µs
DEBUG: [Jul  1 02:39:05.032] nolint: parsing deviceplugin/eni.go for directives
daemon/null.go:8:1:warning: nullResourceManager is unused (deadcode)
DEBUG: [Jul  1 02:39:05.033] nolint: parsing deviceplugin/eni.go took 1.13617ms
DEBUG: [Jul  1 02:39:05.033] nolint: parsing examples/maxpods/maxpods.go for directives
deviceplugin/eni.go:273:1:warning: main is unused (deadcode)
DEBUG: [Jul  1 02:39:05.034] nolint: parsing examples/maxpods/maxpods.go took 252.944µs
DEBUG: [Jul  1 02:39:05.034] nolint: parsing pkg/aliyun/metadata.go for directives
examples/maxpods/maxpods.go:13:1:warning: debug is unused (deadcode)
DEBUG: [Jul  1 02:39:05.034] nolint: parsing pkg/aliyun/metadata.go took 482.032µs
DEBUG: [Jul  1 02:39:05.034] nolint: parsing plugin/driver/drivers.go for directives
pkg/aliyun/metadata.go:13:1:warning: instanceTypePath is unused (deadcode)
DEBUG: [Jul  1 02:39:05.036] nolint: parsing plugin/driver/drivers.go took 2.098173ms
plugin/driver/drivers.go:498:1:warning: getNSHw is unused (deadcode)
DEBUG: [Jul  1 02:39:05.036] total elapsed time 43.889571ms

About Terway network performance

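Hello, our company recently planned to rebuild our production environment and roll it out at scale on Alibaba Cloud Container Service with Terway, so we evaluated network performance. We have a few questions about the results.
All tests were cross-host and cross-availability-zone.
1) Latency (ping RTT) between pods both in Terway default mode (no ENI bound) gave the best result. In theory, bridge or host-gateway forwarding adds an extra hop and should lose some performance compared with going directly through the host network stack, so I don't understand why.
2) Performance from Terway default mode to the host (a worker node in another availability zone) was clearly lower than in the other scenarios.

Could someone please explain the reasons for these two results? Thanks.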

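ENI allocation failure causes IP allocation and GC to stop working

Steps to reproduce

  • Create a cluster with a small number of available IPs
  • Create more pods at once than there are IPs
  • Scale the pod replicas down to 1
  • Scale the pod replicas back up to a reasonable number and observe IP allocation
  • At this point pods in the cluster cannot get IPs, and GC is never triggered
  • Restarting terway does not restore normal operation either

Some logs

I added some logging to troubleshoot the problem: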

time="2020-05-29T14:59:52Z" level=debug msg="waiting popResult: 1"
time="2020-05-29T14:59:52Z" level=warning msg="Assign private ip address failed: Aliyun API Error: RequestId: AD60191D-4184-45FC-9709-2BA52FA85538 Status Code: 403 Code: InvalidVSwitchId.IpNotEnough Message: The specified VSwitch \"vsw-xxx\" has not enough IpAddress., retrying"
time="2020-05-29T14:59:52Z" level=debug msg="allocated ips for eni: eni = &{ID:eni-xxx Name:eth4 Address:{IP:10.110.12.15 Mask:ffffff80} MAC:00:16:3e:00:2b:9b Gateway:10.110.12.125 DeviceNumber:82 MaxIPs:20 VSwitch:vsw-xxx}, ips = [], err = error assign address for eniID: eni-xxx, Aliyun API Error: RequestId: AD60191D-4184-45FC-9709-2BA52FA85538 Status Code: 403 Code: InvalidVSwitchId.IpNotEnough Message: The specified VSwitch \"vsw-xxx\" has not enough IpAddress.: Assign private ip address failed: Aliyun API Error: RequestId: AD60191D-4184-45FC-9709-2BA52FA85538 Status Code: 403 Code: InvalidVSwitchId.IpNotEnough Message: The specified VSwitch \"vsw-xxx\" has not enough IpAddress."
time="2020-05-29T14:59:52Z" level=error msg="error allocate ips for eni: error assign address for eniID: eni-xxx, Aliyun API Error: RequestId: AD60191D-4184-45FC-9709-2BA52FA85538 Status Code: 403 Code: InvalidVSwitchId.IpNotEnough Message: The specified VSwitch \"vsw-xxx\" has not enough IpAddress.: Assign private ip address failed: Aliyun API Error: RequestId: AD60191D-4184-45FC-9709-2BA52FA85538 Status Code: 403 Code: InvalidVSwitchId.IpNotEnough Message: The specified VSwitch \"vsw-xxx\" has not enough IpAddress."
time="2020-05-29T14:59:52Z" level=info msg="eni's associated vswitch vsw-xxx has no available IP, set eni ipAllocInhibitExpireAt = 2020-05-29 15:09:52"
time="2020-05-29T14:59:52Z" level=debug msg="waiting popResult: done"

Because ENI creation failed and maxIPBacklog was exceeded, every Alloc call after the 11th is stuck in an abnormal state:

time="2020-05-29T14:59:56Z" level=debug msg="simpleObjectPool wait tokenCh or ctx.Done"
time="2020-05-29T14:59:56Z" level=debug msg="simpleObjectPool p.factory.Create(1) begin"
time="2020-05-29T14:59:56Z" level=debug msg=submit
time="2020-05-29T14:59:56Z" level=info msg="adjusted vswitch slice: [], original eni slice: [0xc000046080 0xc000a02480 0xc0010de780 0xc000046380 0xc000a02080 0xc0009f9580]"
...
time="2020-05-29T14:59:56Z" level=debug msg="Create submit begin waiting:1 count:1"
time="2020-05-29T14:59:56Z" level=debug msg="Create submit done initENIIPCount:0  f.eniMaxIP:20  waiting:1"
time="2020-05-29T14:59:56Z" level=debug msg="waiting popResult: 1"
time="2020-05-29T15:00:48Z" level=debug msg="do resource gc on node"
time="2020-05-29T15:00:48Z" level=debug msg="GC: try lock ..."

Cause

After ENI creation fails, allocateWorker is never run:

go eni.allocateWorker(f.ipResultChan)

so popResult starves waiting on the channel:

func (f *eniIPFactory) popResult() (ip *types.ENIIP, err error) {
	result := <-f.ipResultChan
	if result.ENIIP == nil || result.err != nil {
