weave-tc's Issues
sysctl: error: 'net.core.default_qdisc' is an unknown key
Getting the following error trying to run this on my cluster (kops, aws, kube 1.9, weave, jessie):
+ DNSMASQ_PORT=53
+ NET_OVERLAY_IF=weave
+ sysctl -w net.core.default_qdisc=fq_codel
sysctl: error: 'net.core.default_qdisc' is an unknown key
On the host node, this looks okay:
$ sudo sysctl -a | grep qdisc
net.core.default_qdisc = pfifo_fast
Any ideas @Quentin-M ?
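One way to narrow this down (a sketch; assumes the image's busybox-style sysctl): check whether the key is exposed under the container's /proc/sys at all, since sysctl reports "unknown key" for any key that has no backing file there.
# Inside the weave-tc container:
cat /proc/sys/net/core/default_qdisc
# If this file is missing in the container but present on the host, the pod is
# probably not running with hostNetwork: true; the host's net.core keys are
# generally not visible from a separate network namespace.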
No AAAA packet delay observed in tcpdump
Hi, thank you so much for the blog post! We've been seeing the intermittent DNS issue for quite some time now, and your workaround seems to be the most promising so far.
I've installed it on one of our Kubernetes clusters, but I'm still seeing 2.5s DNS queries at about the same rate as before installation. As you suggested in #3, I tried playing around with the delay. When I set it to 50ms, I still couldn't see any delay for the AAAA packets; the A and AAAA packets are within a millisecond of each other.
My questions are:
- How do I check whether the tc rules are working as they should? Where can I observe the delay? (One concrete check is sketched after the list below.)
- I guess I don't really understand the packet flow of this workaround in general. Does the marked AAAA packet enter the netem queue at POSTROUTING on the client host, or does the mark persist across hosts so that it enters the netem queue on the server?
Things I have tried:
- Running
tcpdump -i weave 'udp port 53' | grep some-random-query
on client/server hosts and a coredns pod: no delay observed.
- Adding
-j LOG
to the iptables rule: packets seem to be marked just fine. Below is an example output of the
iptables -L POSTROUTING -v -n -t mangle --line-numbers | grep 00001c0001
command:
1 43M 3407M MARK udp -- * * 0.0.0.0/0 0.0.0.0/0 udp dpt:53 STRING match "|00001c0001|" ALGO name bm FROM 40 TO 65535 u32 "0x1c&0xf8=0x0" MARK or 0x100
- Running
watch tc -d -s qdisc show dev weave
on both client and server hosts: the netem queue packet counters didn't increase while I was making queries. Sample output:
Every 2.0s: tc -s qdisc show dev weave Fri Nov 16 22:24:46 2018
qdisc prio 1: root refcnt 2 bands 2 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Sent 15579969405 bytes 19324241 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 12: parent 1:2 limit 10240p flows 1024 quantum 8206 target 5.0ms interval 100.0ms ecn
Sent 15579857578 bytes 19323180 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 65970 drop_overlimit 0 new_flow_count 11744479 ecn_mark 0
new_flows_len 1 old_flows_len 33
qdisc netem 11: parent 1:1 limit 1000 delay 50.0ms 1.0ms
Sent 111827 bytes 1061 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
- I also have two Node.js processes (containerized, one on Alpine and another on Debian) that do 64 DNS lookups every second and send Datadog metrics of the lookup time.
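One way to observe the delay directly (a sketch; dig, the interface name, and the ClusterIP are assumptions to adjust for your cluster): fire a single AAAA query from the node where the client runs and diff the netem packet counter before and after. For what it's worth, fwmarks live in kernel packet metadata and never leave the host, so the delay can only be applied where the mark is set, i.e. on the client host's egress through weave.
# Run on the client's host node. 100.64.0.10 is a placeholder kube-dns ClusterIP.
before=$(tc -s qdisc show dev weave | awk '/qdisc netem/{getline; print $4}')
dig AAAA example.com @100.64.0.10 > /dev/null
after=$(tc -s qdisc show dev weave | awk '/qdisc netem/{getline; print $4}')
echo "netem packets: $((after - before))"  # 0 means the query never entered the netem band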
Haven't been able to get this to work on k8s 1.10, weave 2.5 + coredns
I'm not quite sure what I'm missing, but adding this to my cluster hasn't helped the DNS issue yet (a route check worth trying is sketched at the end of this post).
(kops, aws, 1.10 k8s on jessie, weave-net 2.5, coredns)
The DNS ports used by coredns are the defaults:
Pod Template:
Containers:
coredns:
Image: k8s.gcr.io/coredns:1.1.3
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
NET_OVERLAY_IF is of course also weave.
From within a standard Ubuntu container, most of the DNS tests generate timeouts within the first few hundred attempts. For example:
# seq 1000 | parallel -j50 --joblog log curl -s https://google.com/ ">" /dev/null; sort -k4 -n log
.....
.....
227 : 1547187611.527 0.614 0 0 7 0 curl -s https://google.com/ > /dev/null 227
556 : 1547187614.381 0.615 0 0 7 0 curl -s https://google.com/ > /dev/null 556
25 : 1547187609.772 0.658 0 0 7 0 curl -s https://google.com/ > /dev/null 25
803 : 1547187616.697 5.977 0 0 7 0 curl -s https://google.com/ > /dev/null 803
609 : 1547187614.885 6.029 0 0 7 0 curl -s https://google.com/ > /dev/null 609
603 : 1547187614.846 6.051 0 0 7 0 curl -s https://google.com/ > /dev/null 603
The weave-tc container is otherwise up:
kubectl logs weave-net-2ddv2 -n kube-system -f weave-tc
+ DNSMASQ_PORT=53
+ NET_OVERLAY_IF=weave
+ sysctl -w net.core.default_qdisc=fq_codel
net.core.default_qdisc = fq_codel
+ route
+ grep ^default
+ grep -o [^ ]*$
+ tc qdisc del dev eth0 root
+ true
+ route
+ grep -o [^ ]*$
+ grep ^default
+ tc qdisc add dev eth0 root handle 0: mq
+ ip link
+ grep weave
+ tc qdisc del dev weave root
+ true
+ tc qdisc add dev weave root handle 1: prio bands 2 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
+ tc qdisc add dev weave parent 1:2 handle 12: fq_codel
+ tc qdisc add dev weave parent 1:1 handle 11: netem delay 4ms 1ms distribution pareto
+ tc filter add dev weave protocol all parent 1: prio 1 handle 0x100/0x100 fw flowid 1:1
+ iptables -A POSTROUTING -t mangle -p udp --dport 53 -m string -m u32 --u32 28 & 0xF8 = 0 --hex-string |00001C0001| --algo bm --from 40 -j MARK --set-mark 0x100/0x100
+ sleep 3600
+ :
+ sleep 3600
It is applied like so within the weave-net daemonset:
spec:
containers:
- image: qmachu/weave-tc:bd94b89
imagePullPolicy: IfNotPresent
name: weave-tc
resources: {}
securityContext:
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /run/xtables.lock
name: xtables-lock
- mountPath: /lib/tc
name: lib-tc
.....
.....
- hostPath:
path: /usr/lib/tc
type: ""
name: lib-tc
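One routing check that may be worth adding here (an assumption on my part, not something confirmed in this thread): verify that queries to the DNS ClusterIP actually egress through the interface carrying the qdisc, since the netem band can only delay packets leaving via weave.
# On a node; 100.64.0.10 is a placeholder for this cluster's kube-dns service IP.
ip route get 100.64.0.10
# If the reported device is not "weave", the qdisc installed on weave never
# sees these queries.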
Has anyone tried this on Azure....?
Well I have... and although everything "looks" like it's working, it is not: DNS resolution is still very poor.
So this is a long post, apologies. I don't really have a suitable daemonset to "tack" this onto, so I am using a dedicated daemonset for it:
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: weave-tc
namespace: kube-system
spec:
selector:
matchLabels:
name: weave-tc
template:
metadata:
labels:
name: weave-tc
spec:
hostNetwork: true
containers:
- name: weave-tc
image: rlees85/summit-weave-tc-temp:latest
env:
- name: DNSMASQ_PORT
value: "53"
- name: NET_OVERLAY_IF
value: "azure0"
- name: TARGET_DELAY_MS
value: "10"
- name: TARGET_SKEW_MS
value: "1"
securityContext:
privileged: true
volumeMounts:
- name: xtables-lock
mountPath: /run/xtables.lock
- name: lib-tc
mountPath: /lib/tc
readOnly: true
volumes:
- name: xtables-lock
hostPath:
path: /run/xtables.lock
- name: lib-tc
hostPath:
path: /usr/lib/tc
The image rlees85/summit-weave-tc-temp:latest is pretty much the same as yours, except the 4ms and 1ms parameters are exposed as environment variables for easier tweaking without rebuilding the image.
This is what the logs look like:
+ DNSMASQ_PORT=53
+ NET_OVERLAY_IF=azure0
+ sysctl -w 'net.core.default_qdisc=fq_codel'
net.core.default_qdisc = fq_codel
+ route
+ grep ^default
+ grep -o '[^ ]*$'
+ tc qdisc del dev azure0 root
+ grep -o '[^ ]*$'
+ grep ^default
+ route
+ tc qdisc add dev azure0 root handle 0: mq
RTNETLINK answers: Not supported
+ true
+ iptables -F POSTROUTING -t mangle
+ ip link
+ grep azure0
+ tc qdisc del dev azure0 root
+ true
+ tc qdisc add dev azure0 root handle 1: prio bands 2 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
+ tc qdisc add dev azure0 parent 1:2 handle 12: fq_codel
+ tc qdisc add dev azure0 parent 1:1 handle 11: netem delay 10ms 1ms distribution pareto
+ tc filter add dev azure0 protocol all parent 1: prio 1 handle 0x100/0x100 fw flowid 1:1
+ iptables -C POSTROUTING -t mangle -p udp --dport 53 -m string -m u32 --u32 '28 & 0xF8 = 0' --hex-string '|00001C0001|' --algo bm --from 40 -j MARK --set-mark 0x100/0x100
iptables: No chain/target/match by that name.
+ iptables -A POSTROUTING -t mangle -p udp --dport 53 -m string -m u32 --u32 '28 & 0xF8 = 0' --hex-string '|00001C0001|' --algo bm --from 40 -j MARK --set-mark 0x100/0x100
+ sleep 3600
I am getting packets through the filter, which proves the marking, I guess. But the count is quite low... This has been running for an hour or so on a quiet cluster.
/ # tc -d -s qdisc show dev azure0
qdisc prio 1: root refcnt 2 bands 2 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Sent 69961726 bytes 117085 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 12: parent 1:2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
Sent 69934283 bytes 116854 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 44885 drop_overlimit 0 new_flow_count 18727 ecn_mark 0
new_flows_len 1 old_flows_len 13
qdisc netem 11: parent 1:1 limit 1000 delay 10.0ms 1.0ms
Sent 26866 bytes 229 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
If I enable logging I can see packets are marked:
[979508.507677] IN= OUT=azure0 PHYSIN=azv977e954f886 PHYSOUT=eth0 SRC=10.102.6.136 DST=10.102.10.171 LEN=122 TOS=0x00 PREC=0x00 TTL=64 ID=9933 DF PROTO=UDP SPT=32785 DPT=53 LEN=102 MARK=0x100
But.... performance is still garbage.
url='ifconfig.co'
if [ -f /tmp/log.txt ]; then rm /tmp/log.txt; fi
for i in `seq 1 20`; do curl -w '%{time_namelookup}\n' -o /dev/null -s $url >> /tmp/log.txt; echo "${i}"; done
sort /tmp/log.txt | uniq | tail -10
0.016456
0.017804
0.023226
0.028758
0.029971
0.037986
0.039217
0.041410
2.659334
5.059728
I've tried messing with 4/1ms right up to 100/?ms and nothing seems to improve it.
We don't have access to other options like running DNS as a daemonset. These are Azure AKS based clusters, and unfortunately moving out of Azure or going self-managed on Azure is not an option. I just wondered if there were any more debug paths to get this workaround to work?
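One more way to localize it (a sketch based only on the rules visible in the log above): compare the iptables mark counter with netem's packet counter over the same interval; if the mark count grows but netem's Sent count doesn't, the marked packets are egressing via a device other than azure0.
# Marked AAAA queries seen by the mangle rule:
iptables -t mangle -L POSTROUTING -v -n | grep -i 00001c0001
# Packets actually delayed by netem on azure0:
tc -s qdisc show dev azure0 | grep -A1 'qdisc netem'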
Does /lib/tc need to be mounted as a hostPath?
In the README:
Depending on the package, the appropriate mount may not be /lib/tc, the tc binary as well as the pareto.dist file are required.
$ docker run -ti --rm --entrypoint /bin/ash qmachu/weave-tc:bd94b89
/ # ls -al /usr/lib/tc/
total 112
drwxr-xr-x 2 root root 125 Jul 22 07:56 .
drwxr-xr-x 4 root root 4096 Jul 22 07:56 ..
-rw-r--r-- 2 root root 23077 Oct 31 2017 experimental.dist
lrwxrwxrwx 1 root root 7 Aug 20 2018 m_ipt.so -> m_xt.so
-rwxr-xr-x 2 root root 10024 Oct 31 2017 m_xt.so
-rw-r--r-- 2 root root 23573 Oct 31 2017 normal.dist
-rw-r--r-- 2 root root 23729 Oct 31 2017 pareto.dist
-rw-r--r-- 2 root root 23447 Oct 31 2017 paretonormal.dist
/ # ls -al /lib/tc/
ls: /lib/tc/: No such file or directory
Could you explain in more detail how to set up /lib/tc/? Thank you.
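For context (my understanding, not an authoritative answer): the netem "distribution pareto" option makes the tc binary load a delay-distribution table from its library directory, typically /usr/lib/tc or /lib/tc depending on the distro, so the host's copy needs to be mounted wherever the container's tc looks for it (the README mounts it at /lib/tc). A quick check on the host node:
# Where does the host keep tc's distribution tables?
ls -l /usr/lib/tc/pareto.dist /lib/tc/pareto.dist 2>/dev/null
# Use whichever path exists as the hostPath for the lib-tc volume.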
Attempted Daemonset weave-tc, but DNS requests flatlined
So I think I've been experiencing these DNS timeouts for a couple of months now, and I decided to try your patch as it seemed the least invasive first step (and maybe last step) at fixing the issue. I didn't want to modify the weave-net daemonset since it is core to Kubernetes, so I created a separate "weave-tc" daemonset that does the same thing. Here is the manifest:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
labels:
name: weave-tc
role.kubernetes.io/networking: "1"
name: weave-tc
namespace: kube-system
spec:
template:
metadata:
labels:
name: weave-tc
role.kubernetes.io/networking: "1"
spec:
containers:
- name: weave-tc
image: 'qmachu/weave-tc:bd94b89'
securityContext:
privileged: true
volumeMounts:
- name: xtables-lock
mountPath: /run/xtables.lock
- name: lib-tc
mountPath: /lib/tc
hostNetwork: true
tolerations:
- effect: NoSchedule
operator: Exists
- key: CriticalAddonsOnly
operator: Exists
volumes:
- hostPath:
path: /usr/lib/tc
type: Directory
name: lib-tc
- hostPath:
path: /run/xtables.lock
type: FileOrCreate
name: xtables-lock
I actually attempted this first in a test Kubernetes cluster and it seemed to work fine. I tried it in our main cluster and network traffic immediately started failing. Services started responding extremely slowly (or not at all). Here is a picture of our CoreDNS metrics:
I didn't spend much time digging into what went wrong; I just quickly did a rolling update of the cluster to undo the changes (a lighter rollback is sketched after the logs below). I thought I'd post it here in case you had any idea what could have gone wrong. I'd still prefer to use your solution over a node-local-dns configuration, but I'm a bit hesitant after the last attempt.
In terms of our cluster, we're running Kubernetes 1.11.9, deployed via Kops; the network is weave-net and DNS is CoreDNS. Please let me know if you need any other details.
Also, I pulled the logs from our logging cluster (the pods were already deleted); here they are:
+ NET_OVERLAY_IF=weave
+ DNSMASQ_PORT=53
+ sysctl -w net.core.default_qdisc=fq_codel
+ grep -o [^ ]*$
+ grep ^default
net.core.default_qdisc = fq_codel
+ route
+ tc qdisc del dev eth0 root
+ true
+ grep ^default
+ route
+ tc qdisc add dev eth0 root handle 0: mq
+ grep -o [^ ]*$
+ grep weave
+ ip link
+ tc qdisc del dev weave root
+ tc qdisc add dev weave root handle 1: prio bands 2 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
+ true
+ tc qdisc add dev weave parent 1:2 handle 12: fq_codel
+ tc qdisc add dev weave parent 1:1 handle 11: netem delay 4ms 1ms distribution pareto
+ tc filter add dev weave protocol all parent 1: prio 1 handle 0x100/0x100 fw flowid 1:1
+ iptables -A POSTROUTING -t mangle -p udp --dport 53 -m string -m u32 --u32 28 & 0xF8 = 0 --hex-string |00001C0001| --algo bm --from 40 -j MARK --set-mark 0x100/0x100
+ sleep 3600
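For anyone needing to back this out without rolling nodes (a sketch reconstructed from the commands in the log above, not an official rollback procedure): deleting the qdiscs restores the kernel default, and the mark rule can be removed with iptables -D using the same match.
# Per node, revert what the script installed:
tc qdisc del dev weave root 2>/dev/null   # removes the prio/netem/fq_codel tree
tc qdisc del dev eth0 root 2>/dev/null    # falls back to the default root qdisc
iptables -D POSTROUTING -t mangle -p udp --dport 53 \
  -m string -m u32 --u32 '28 & 0xF8 = 0' \
  --hex-string '|00001C0001|' --algo bm --from 40 \
  -j MARK --set-mark 0x100/0x100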
Effectiveness of this workaround
Is this workaround supposed to eliminate the problem with DNS lookup delays completely?
I've deployed it to our cluster (kops, AWS, weave) but still see 2.5 second delays.
I'm testing from within an Ubuntu container:
VERSION="18.04.1 LTS (Bionic Beaver)"
with the following in /etc/resolv.conf:
options use-vc
options single-request-reopen
The backend is vanilla kube-dns (not CoreDNS).
I'm testing with
:> /tmp/log.txt
url='ifconfig.co'
for i in `seq 1 10000`; do curl -4 -w '%{time_namelookup}\n' -o /dev/null -s $url >> /tmp/log.txt; done
sort /tmp/log.txt | uniq | tail -10
and get
0.060573
0.060597
0.060609
0.124728
0.124729
0.252937
10.522188
2.510896
2.510978
2.511511
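(One note on reading the output above: sort without -n compares lexicographically, which is why 10.522188 sorts before 2.510896; a numeric sort puts the true worst cases at the tail.)
sort -n /tmp/log.txt | tail -10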
Amazon EC2
Hello Quentin,
Do you perhaps have any experience deploying this solution to AWS? We used kops to deploy everything and have been running into this issue. Now we launched this weave-tc as a DaemonSet in kube-system, and everything seems fine inside of the weave-tc pods. In other pods, however, the problem persists. We tried mounting /usr/lib/tc and /sbin/tc from the host machine to the same locations inside of the container.
Any ideas?
Best Regards,
Jordy
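A quick way to confirm the workaround is active on a given node (a sketch; it only checks for the artifacts the startup script installs):
# Expect one MARK rule matching the AAAA qtype bytes and a netem qdisc on weave:
iptables -t mangle -L POSTROUTING -n | grep -i 00001c0001
tc qdisc show dev weave | grep netem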