weave-tc's Issues
sysctl: error: 'net.core.default_qdisc' is an unknown key
Getting the following error trying to run this on my cluster (kops, aws, kube 1.9, weave, jessie):
+ DNSMASQ_PORT=53
+ NET_OVERLAY_IF=weave
+ sysctl -w net.core.default_qdisc=fq_codel
sysctl: error: 'net.core.default_qdisc' is an unknown key
On the host node, this looks okay:
$ sudo sysctl -a | grep qdisc
net.core.default_qdisc = pfifo_fast
Any ideas @Quentin-M ?
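One way to narrow this down (a sketch; assumes the image's busybox-style sysctl): check whether the key is exposed under the container's /proc/sys at all, since sysctl reports "unknown key" for any key that has no backing file there.
# Inside the weave-tc container:
cat /proc/sys/net/core/default_qdisc
# If this file is missing in the container but present on the host, the pod is
# probably not running with hostNetwork: true; the host's net.core keys are
# generally not visible from a separate network namespace.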
No AAAA packet delay observed in tcpdump
Hi, thank you so much for the blog post! We've been seeing the intermittent DNS issue for quite some time now, and your workaround seems to be the most promising so far.
I've installed it on one of our Kubernetes clusters, but I'm still seeing 2.5s DNS queries at about the same rate as before installation. As you suggested in #3, I tried playing around with the delay. When I set it to 50ms, I still couldn't see any delay for the AAAA packets; the A and AAAA packets are within a millisecond of each other.
My questions are:
- How do I check whether the tc rules are working as they should? Where can I observe the delay? (One concrete check is sketched after the list below.)
- I guess I don't really understand the packet flow of this workaround in general. Does the marked AAAA packet enter the netem queue at POSTROUTING on the client host, or does the mark persist across hosts so that it enters the netem queue on the server?
Things I have tried:
- Running
tcpdump -i weave 'udp port 53' | grep some-random-query
on client/server hosts and a coredns pod: no delay observed.
- Adding
-j LOG
to the iptables rule: packets seem to be marked just fine. Below is an example output of the
iptables -L POSTROUTING -v -n -t mangle --line-numbers | grep 00001c0001
command:
1 43M 3407M MARK udp -- * * 0.0.0.0/0 0.0.0.0/0 udp dpt:53 STRING match "|00001c0001|" ALGO name bm FROM 40 TO 65535 u32 "0x1c&0xf8=0x0" MARK or 0x100
- Running
watch tc -d -s qdisc show dev weave
on both client and server hosts: the netem queue packet counters didn't increase while I was making queries. Sample output:
Every 2.0s: tc -s qdisc show dev weave Fri Nov 16 22:24:46 2018
qdisc prio 1: root refcnt 2 bands 2 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Sent 15579969405 bytes 19324241 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 12: parent 1:2 limit 10240p flows 1024 quantum 8206 target 5.0ms interval 100.0ms ecn
Sent 15579857578 bytes 19323180 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 65970 drop_overlimit 0 new_flow_count 11744479 ecn_mark 0
new_flows_len 1 old_flows_len 33
qdisc netem 11: parent 1:1 limit 1000 delay 50.0ms 1.0ms
Sent 111827 bytes 1061 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
- I also have two Node.js processes (containerized, one on Alpine and another on Debian) that do 64 DNS lookups every second and send Datadog metrics of the lookup time.
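One way to observe the delay directly (a sketch; dig, the interface name, and the ClusterIP are assumptions to adjust for your cluster): fire a single AAAA query from the node where the client runs and diff the netem packet counter before and after. For what it's worth, fwmarks live in kernel packet metadata and never leave the host, so the delay can only be applied where the mark is set, i.e. on the client host's egress through weave.
# Run on the client's host node. 100.64.0.10 is a placeholder kube-dns ClusterIP.
before=$(tc -s qdisc show dev weave | awk '/qdisc netem/{getline; print $4}')
dig AAAA example.com @100.64.0.10 > /dev/null
after=$(tc -s qdisc show dev weave | awk '/qdisc netem/{getline; print $4}')
echo "netem packets: $((after - before))"  # 0 means the query never entered the netem band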
Haven't been able to get this to work on k8s 1.10, weave 2.5 + coredns
I'm not quite sure what I'm missing, but adding this to my cluster hasn't helped the DNS issue yet (a route check worth trying is sketched at the end of this post).
(kops, aws, 1.10 k8s on jessie, weave-net 2.5, coredns)
The DNS ports used by coredns are the defaults:
Pod Template:
Containers:
coredns:
Image: k8s.gcr.io/coredns:1.1.3
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
NET_OVERLAY_IF is of course also weave.
From within a standard Ubuntu container, most of the DNS tests generate timeouts within the first few hundred attempts. For example:
# seq 1000 | parallel -j50 --joblog log curl -s https://google.com/ ">" /dev/null; sort -k4 -n log
.....
.....
227 : 1547187611.527 0.614 0 0 7 0 curl -s https://google.com/ > /dev/null 227
556 : 1547187614.381 0.615 0 0 7 0 curl -s https://google.com/ > /dev/null 556
25 : 1547187609.772 0.658 0 0 7 0 curl -s https://google.com/ > /dev/null 25
803 : 1547187616.697 5.977 0 0 7 0 curl -s https://google.com/ > /dev/null 803
609 : 1547187614.885 6.029 0 0 7 0 curl -s https://google.com/ > /dev/null 609
603 : 1547187614.846 6.051 0 0 7 0 curl -s https://google.com/ > /dev/null 603
The weave-tc container is otherwise up:
kubectl logs weave-net-2ddv2 -n kube-system -f weave-tc
+ DNSMASQ_PORT=53
+ NET_OVERLAY_IF=weave
+ sysctl -w net.core.default_qdisc=fq_codel
net.core.default_qdisc = fq_codel
+ route
+ grep ^default
+ grep -o [^ ]*$
+ tc qdisc del dev eth0 root
+ true
+ route
+ grep -o [^ ]*$
+ grep ^default
+ tc qdisc add dev eth0 root handle 0: mq
+ ip link
+ grep weave
+ tc qdisc del dev weave root
+ true
+ tc qdisc add dev weave root handle 1: prio bands 2 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
+ tc qdisc add dev weave parent 1:2 handle 12: fq_codel
+ tc qdisc add dev weave parent 1:1 handle 11: netem delay 4ms 1ms distribution pareto
+ tc filter add dev weave protocol all parent 1: prio 1 handle 0x100/0x100 fw flowid 1:1
+ iptables -A POSTROUTING -t mangle -p udp --dport 53 -m string -m u32 --u32 28 & 0xF8 = 0 --hex-string |00001C0001| --algo bm --from 40 -j MARK --set-mark 0x100/0x100
+ sleep 3600
+ :
+ sleep 3600
It is applied like so within the weave-net daemonset:
spec:
containers:
- image: qmachu/weave-tc:bd94b89
imagePullPolicy: IfNotPresent
name: weave-tc
resources: {}
securityContext:
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /run/xtables.lock
name: xtables-lock
- mountPath: /lib/tc
name: lib-tc
.....
.....
- hostPath:
path: /usr/lib/tc
type: ""
name: lib-tc
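One routing check that may be worth adding here (an assumption on my part, not something confirmed in this thread): verify that queries to the DNS ClusterIP actually egress through the interface carrying the qdisc, since the netem band can only delay packets leaving via weave.
# On a node; 100.64.0.10 is a placeholder for this cluster's kube-dns service IP.
ip route get 100.64.0.10
# If the reported device is not "weave", the qdisc installed on weave never
# sees these queries.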
Has anyone tried this on Azure....?
Well I have... and although everything "looks" like it's working, it is not: DNS resolution is still very poor.
So this is a long post, apologies. I don't really have a suitable daemonset to "tack" this onto, so I am using a dedicated daemonset for it:
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: weave-tc
namespace: kube-system
spec:
selector:
matchLabels:
name: weave-tc
template:
metadata:
labels:
name: weave-tc
spec:
hostNetwork: true
containers:
- name: weave-tc
image: rlees85/summit-weave-tc-temp:latest
env:
- name: DNSMASQ_PORT
value: "53"
- name: NET_OVERLAY_IF
value: "azure0"
- name: TARGET_DELAY_MS
value: "10"
- name: TARGET_SKEW_MS
value: "1"
securityContext:
privileged: true
volumeMounts:
- name: xtables-lock
mountPath: /run/xtables.lock
- name: lib-tc
mountPath: /lib/tc
readOnly: true
volumes:
- name: xtables-lock
hostPath:
path: /run/xtables.lock
- name: lib-tc
hostPath:
path: /usr/lib/tc
The image rlees85/summit-weave-tc-temp:latest is pretty much the same as yours, except the 4ms and 1ms parameters are exposed as environment variables for easier tweaking without rebuilding the image.
This is what the logs look like:
+ DNSMASQ_PORT=53
+ NET_OVERLAY_IF=azure0
+ sysctl -w 'net.core.default_qdisc=fq_codel'
net.core.default_qdisc = fq_codel
+ route
+ grep ^default
+ grep -o '[^ ]*$'
+ tc qdisc del dev azure0 root
+ grep -o '[^ ]*$'
+ grep ^default
+ route
+ tc qdisc add dev azure0 root handle 0: mq
RTNETLINK answers: Not supported
+ true
+ iptables -F POSTROUTING -t mangle
+ ip link
+ grep azure0
+ tc qdisc del dev azure0 root
+ true
+ tc qdisc add dev azure0 root handle 1: prio bands 2 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
+ tc qdisc add dev azure0 parent 1:2 handle 12: fq_codel
+ tc qdisc add dev azure0 parent 1:1 handle 11: netem delay 10ms 1ms distribution pareto
+ tc filter add dev azure0 protocol all parent 1: prio 1 handle 0x100/0x100 fw flowid 1:1
+ iptables -C POSTROUTING -t mangle -p udp --dport 53 -m string -m u32 --u32 '28 & 0xF8 = 0' --hex-string '|00001C0001|' --algo bm --from 40 -j MARK --set-mark 0x100/0x100
iptables: No chain/target/match by that name.
+ iptables -A POSTROUTING -t mangle -p udp --dport 53 -m string -m u32 --u32 '28 & 0xF8 = 0' --hex-string '|00001C0001|' --algo bm --from 40 -j MARK --set-mark 0x100/0x100
+ sleep 3600
I am getting packets through the filter, which proves the marking, I guess. But the count is quite low... This has been running for an hour or so on a quiet cluster.
/ # tc -d -s qdisc show dev azure0
qdisc prio 1: root refcnt 2 bands 2 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Sent 69961726 bytes 117085 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 12: parent 1:2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
Sent 69934283 bytes 116854 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 44885 drop_overlimit 0 new_flow_count 18727 ecn_mark 0
new_flows_len 1 old_flows_len 13
qdisc netem 11: parent 1:1 limit 1000 delay 10.0ms 1.0ms
Sent 26866 bytes 229 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
If I enable logging I can see packets are marked:
[979508.507677] IN= OUT=azure0 PHYSIN=azv977e954f886 PHYSOUT=eth0 SRC=10.102.6.136 DST=10.102.10.171 LEN=122 TOS=0x00 PREC=0x00 TTL=64 ID=9933 DF PROTO=UDP SPT=32785 DPT=53 LEN=102 MARK=0x100
But.... performance is still garbage.
url='ifconfig.co'
if [ -f /tmp/log.txt ]; then rm /tmp/log.txt; fi
for i in `seq 1 20`; do curl -w '%{time_namelookup}\n' -o /dev/null -s $url >> /tmp/log.txt; echo "${i}"; done
sort /tmp/log.txt | uniq | tail -10
0.016456
0.017804
0.023226
0.028758
0.029971
0.037986
0.039217
0.041410
2.659334
5.059728
I've tried messing with 4/1ms right up to 100/?ms and nothing seems to improve it.
We don't have access to other options like running DNS as a daemonset. These are Azure AKS based clusters, and unfortunately moving out of Azure or going self-managed on Azure is not an option. I just wondered if there were any more debug paths to get this workaround to work?
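One more way to localize it (a sketch based only on the rules visible in the log above): compare the iptables mark counter with netem's packet counter over the same interval; if the mark count grows but netem's Sent count doesn't, the marked packets are egressing via a device other than azure0.
# Marked AAAA queries seen by the mangle rule:
iptables -t mangle -L POSTROUTING -v -n | grep -i 00001c0001
# Packets actually delayed by netem on azure0:
tc -s qdisc show dev azure0 | grep -A1 'qdisc netem'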
Does /lib/tc need to be mounted as a hostPath?
In the README:
Depending on the package, the appropriate mount may not be /lib/tc, the tc binary as well as the pareto.dist file are required.
$ docker run -ti --rm --entrypoint /bin/ash qmachu/weave-tc:bd94b89
/ # ls -al /usr/lib/tc/
total 112
drwxr-xr-x 2 root root 125 Jul 22 07:56 .
drwxr-xr-x 4 root root 4096 Jul 22 07:56 ..
-rw-r--r-- 2 root root 23077 Oct 31 2017 experimental.dist
lrwxrwxrwx 1 root root 7 Aug 20 2018 m_ipt.so -> m_xt.so
-rwxr-xr-x 2 root root 10024 Oct 31 2017 m_xt.so
-rw-r--r-- 2 root root 23573 Oct 31 2017 normal.dist
-rw-r--r-- 2 root root 23729 Oct 31 2017 pareto.dist
-rw-r--r-- 2 root root 23447 Oct 31 2017 paretonormal.dist
/ # ls -al /lib/tc/
ls: /lib/tc/: No such file or directory
Could you explain in more detail how to set up /lib/tc/? Thank you.
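For context (my understanding, not an authoritative answer): the netem "distribution pareto" option makes the tc binary load a delay-distribution table from its library directory, typically /usr/lib/tc or /lib/tc depending on the distro, so the host's copy needs to be mounted wherever the container's tc looks for it (the README mounts it at /lib/tc). A quick check on the host node:
# Where does the host keep tc's distribution tables?
ls -l /usr/lib/tc/pareto.dist /lib/tc/pareto.dist 2>/dev/null
# Use whichever path exists as the hostPath for the lib-tc volume.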
Attempted Daemonset weave-tc, but DNS requests flatlined
So I think I've been experiencing these DNS timeouts for a couple of months now, and I decided to try your patch as it seemed the least invasive first step (and maybe last step) at fixing the issue. I didn't want to modify the weave-net daemonset since it is core to Kubernetes, so I created a separate "weave-tc" daemonset that does the same thing. Here is the manifest:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
labels:
name: weave-tc
role.kubernetes.io/networking: "1"
name: weave-tc
namespace: kube-system
spec:
template:
metadata:
labels:
name: weave-tc
role.kubernetes.io/networking: "1"
spec:
containers:
- name: weave-tc
image: 'qmachu/weave-tc:bd94b89'
securityContext:
privileged: true
volumeMounts:
- name: xtables-lock
mountPath: /run/xtables.lock
- name: lib-tc
mountPath: /lib/tc
hostNetwork: true
tolerations:
- effect: NoSchedule
operator: Exists
- key: CriticalAddonsOnly
operator: Exists
volumes:
- hostPath:
path: /usr/lib/tc
type: Directory
name: lib-tc
- hostPath:
path: /run/xtables.lock
type: FileOrCreate
name: xtables-lock
I actually attempted this first in a test Kubernetes cluster and it seemed to work fine. I tried it in our main cluster and network traffic immediately started failing. Services started responding extremely slowly (or not at all). Here is a picture of our CoreDNS metrics:
I didn't spend much time digging into what went wrong; I just quickly did a rolling update of the cluster to undo the changes (a lighter rollback is sketched after the logs below). I thought I'd post it here in case you had any idea what could have gone wrong. I'd still prefer to use your solution over a node-local-dns configuration, but I'm a bit hesitant after the last attempt.
In terms of our cluster, we're running Kubernetes 1.11.9, deployed via Kops; the network is weave-net and DNS is CoreDNS. Please let me know if you need any other details.
Also, I pulled the logs from our logging cluster (the pods were already deleted); here they are:
+ NET_OVERLAY_IF=weave
+ DNSMASQ_PORT=53
+ sysctl -w net.core.default_qdisc=fq_codel
+ grep -o [^ ]*$
+ grep ^default
net.core.default_qdisc = fq_codel
+ route
+ tc qdisc del dev eth0 root
+ true
+ grep ^default
+ route
+ tc qdisc add dev eth0 root handle 0: mq
+ grep -o [^ ]*$
+ grep weave
+ ip link
+ tc qdisc del dev weave root
+ tc qdisc add dev weave root handle 1: prio bands 2 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
+ true
+ tc qdisc add dev weave parent 1:2 handle 12: fq_codel
+ tc qdisc add dev weave parent 1:1 handle 11: netem delay 4ms 1ms distribution pareto
+ tc filter add dev weave protocol all parent 1: prio 1 handle 0x100/0x100 fw flowid 1:1
+ iptables -A POSTROUTING -t mangle -p udp --dport 53 -m string -m u32 --u32 28 & 0xF8 = 0 --hex-string |00001C0001| --algo bm --from 40 -j MARK --set-mark 0x100/0x100
+ sleep 3600
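For anyone needing to back this out without rolling nodes (a sketch reconstructed from the commands in the log above, not an official rollback procedure): deleting the qdiscs restores the kernel default, and the mark rule can be removed with iptables -D using the same match.
# Per node, revert what the script installed:
tc qdisc del dev weave root 2>/dev/null   # removes the prio/netem/fq_codel tree
tc qdisc del dev eth0 root 2>/dev/null    # falls back to the default root qdisc
iptables -D POSTROUTING -t mangle -p udp --dport 53 \
  -m string -m u32 --u32 '28 & 0xF8 = 0' \
  --hex-string '|00001C0001|' --algo bm --from 40 \
  -j MARK --set-mark 0x100/0x100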
Effectiveness of this workaround
Is this workaround supposed to eliminate the problem with DNS lookup delays completely?
I've deployed it to our cluster (kops, AWS, weave) but still see 2.5 second delays.
I'm testing from within an Ubuntu container:
VERSION="18.04.1 LTS (Bionic Beaver)"
with the following in /etc/resolv.conf:
options use-vc
options single-request-reopen
The backend is vanilla kube-dns (not CoreDNS).
I'm testing with
:> /tmp/log.txt
url='ifconfig.co'
for i in `seq 1 10000`; do curl -4 -w '%{time_namelookup}\n' -o /dev/null -s $url >> /tmp/log.txt; done
sort /tmp/log.txt | uniq | tail -10
and get
0.060573
0.060597
0.060609
0.124728
0.124729
0.252937
10.522188
2.510896
2.510978
2.511511
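(One note on reading the output above: sort without -n compares lexicographically, which is why 10.522188 sorts before 2.510896; a numeric sort puts the true worst cases at the tail.)
sort -n /tmp/log.txt | tail -10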
Amazon EC2
Hello Quentin,
Do you perhaps have any experience deploying this solution to AWS? We used kops to deploy everything and have been running into this issue. Now we launched this weave-tc as a DaemonSet in kube-system, and everything seems fine inside of the weave-tc pods. In other pods, however, the problem persists. We tried mounting /usr/lib/tc and /sbin/tc from the host machine to the same locations inside of the container.
Any ideas?
Best Regards,
Jordy
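A quick way to confirm the workaround is active on a given node (a sketch; it only checks for the artifacts the startup script installs):
# Expect one MARK rule matching the AAAA qtype bytes and a netem qdisc on weave:
iptables -t mangle -L POSTROUTING -n | grep -i 00001c0001
tc qdisc show dev weave | grep netem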