I have a Kubernetes cluster monitoring test that continually deploys a busybox pod in a cluster and verifies DNS resolution within the pod by running nslookup via kubectl exec. It started failing as soon as I pulled the latest busybox image. With a busybox pod using the previous image version, 1.28, nslookup works. All signs point to a change in this latest version causing the failure.
from busybox.
Same issue here. Reverting to 1.28 fixed the issue for me.
Hi @tianon - I can understand that you don't want to have a regression on :latest, but there is a surprising amount of fallout from this simple issue because so many people and documentation out there use busybox:latest as the "Hello, World" example. Temporarily changing the tag would help mitigate that pain and these unintended consequences.
Cheers,
Piers.
/label bug
We are having the same issue.
1.27/1.28 work; 1.29/1.29.1 do not:
kubectl run --attach busybox --rm --image=busybox:1.27 --restart=Never -- sh -c "sleep 4 && nslookup kubernetes.default"
If you don't see a command prompt, try pressing enter.
Server: 192.168.0.10
Address 1: 192.168.0.10 kube-dns.kube-system.svc.cluster.local
Name: kubernetes.default
Address 1: 192.168.0.1 kubernetes.default.svc.cluster.local
kubectl run --attach busybox --rm --image=busybox:1.28 --restart=Never -- sh -c "sleep 4 && nslookup kubernetes.default"
If you don't see a command prompt, try pressing enter.
Server: 192.168.0.10
Address 1: 192.168.0.10 kube-dns.kube-system.svc.cluster.local
Name: kubernetes.default
Address 1: 192.168.0.1 kubernetes.default.svc.cluster.local
kubectl run --attach busybox --rm --image=busybox:1.29 --restart=Never -- sh -c "sleep 4 && nslookup kubernetes.default"
If you don't see a command prompt, try pressing enter.
Server: 192.168.0.10
Address: 192.168.0.10:53
** server can't find kubernetes.default: NXDOMAIN
*** Can't find kubernetes.default: No answer
kubectl run --attach busybox --rm --image=busybox:1.29.1 --restart=Never -- sh -c "sleep 4 && nslookup kubernetes.default"
If you don't see a command prompt, try pressing enter.
Server: 192.168.0.10
Address: 192.168.0.10:53
** server can't find kubernetes.default: NXDOMAIN
*** Can't find kubernetes.default: No answer
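The four runs above can be wrapped in a small helper so the same check is repeated for any list of tags. This is a sketch, not part of the original report: the function name `check_tag` and the pod-name scheme are hypothetical, and it assumes kubectl is configured for the target cluster and that `kubectl run --attach` exits non-zero when the container fails.

```shell
# Hypothetical helper: run the lookup from the comment above in a throwaway
# pod using busybox:TAG and print "TAG: ok" or "TAG: FAIL".
# Assumes `kubectl run --attach` returns non-zero if the container fails.
check_tag() {
  tag=$1
  name=$2
  # Give each tag its own simple pod name, e.g. 1.29.1 -> dns-check-1-29-1.
  pod="dns-check-$(echo "$tag" | tr '.' '-')"
  # --rm deletes the pod afterwards; --restart=Never creates a bare Pod.
  if kubectl run "$pod" --attach --rm --restart=Never --image="busybox:$tag" \
       -- sh -c "sleep 4 && nslookup $name" >/dev/null 2>&1; then
    echo "$tag: ok"
  else
    echo "$tag: FAIL"
  fi
}
```

Usage: `for t in 1.27 1.28 1.29 1.29.1; do check_tag "$t" kubernetes.default; done`. Only the exit status is checked, so the mixed "Name: ... / Can't find ...: No answer" output seen in later comments still counts as a pass when nslookup exits 0.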
According to the 1.30 release notes, the patch for https://bugs.busybox.net/show_bug.cgi?id=11161 is included; however, I still had to pin my image to 1.28 in order to execute a simple lookup:
$ kubectl run -i --tty --image busybox:1.28 dns-test --restart=Never --rm nslookup web-0.nginx
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: web-0.nginx
Address 1: 172.17.0.2 web-0.nginx.default.svc.cluster.local
pod "dns-test" deleted
Whereas :latest (aka :1.30.1) gave me this:
$ kubectl run -i --tty --image busybox:1.30.1 dns-test --restart=Never --rm nslookup web-0.nginx
If you don't see a command prompt, try pressing enter.
*** Can't find web-0.nginx: No answer
pod "dns-test" deleted
This is just using minikube and an nginx statefulset from https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#creating-a-statefulset
I'm not sure if I'm missing something here, but is this issue really solved?
Thanks @tianon. I should have read the entire thread first. This is crazy!
Just so others who scroll to the bottom don't bug you, the TL;DR is to pin to busybox:1.28 instead of using latest. Or just don't use busybox for anything. Use alpine instead if possible?
For further reading, here are all of the issues busybox has had with nslookup. Seems like #13006 might be the current one, but it looks like they just close them without actually fixing the problem:
This is not cool. Version 1.32.1 is still buggy and does not work properly, so busybox is the wrong tool for this in Kubernetes. Use the dnsutils image from Google, gcr.io/kubernetes-e2e-test-images/dnsutils:1.3, to get a working nslookup with the new syntax.
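A one-off lookup with that dnsutils image (which ships bind's nslookup rather than BusyBox's) might look like the sketch below. The function and pod names are hypothetical, and the image reference is the one quoted in this thread; it may have moved or been retired since, so check the current Kubernetes DNS debugging docs before relying on it. `--command` makes `nslookup` the container command instead of passing it as arguments to the image's default entrypoint.

```shell
# Hypothetical helper: resolve NAME from inside the cluster using the
# dnsutils image mentioned above, deleting the pod when it exits.
dnsutils_lookup() {
  kubectl run dnsutils-check --rm -i --restart=Never \
    --image=gcr.io/kubernetes-e2e-test-images/dnsutils:1.3 \
    --command -- nslookup "$1"
}
```

Usage: `dnsutils_lookup kubernetes.default`.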
For some people it’s still difficult to admit a mistake. Being aggressive and brave with new changes is one thing; breaking things that worked before is another, especially now that so many people use “:latest” by default. Introducing a backward-compatibility break and calling it intentional is far from wise.
Please read more about semantic versioning as well.
Tried 1.33; this issue is still there.
Encountered the same issue in busybox:1.35.
Is anyone pushing to get this resolved?
Any chance on getting it fixed soon?
It needs to be addressed upstream -- we simply package what they provide.
Same happens on 1.33.1. For comparison, the image gcr.io/kubernetes-e2e-test-images/dnsutils:1.3, used in the Kubernetes documentation DNS troubleshooting example, still works as expected, as mentioned by @blodone.
Any chance on getting it fixed soon?
Just use 1.27, the package in that version has always worked
@djsly Try using "sleep 4 && nslookup -type=a kubernetes.default"
I've added my findings here: https://bugs.busybox.net/show_bug.cgi?id=11161#c4
See this issue:
kubernetes/kubernetes#66924
Given that this issue is an upstream issue (not something we've introduced), that it is appropriately filed at https://bugs.busybox.net/show_bug.cgi?id=11161, and apparently will be fixed in the next release (https://git.busybox.net/busybox/commit/?id=9408978a438ac6c3becb2216d663216d27b59eab), I'm going to close.
It would appear that Kubernetes has adjusted to use busybox:1.28 explicitly in the meantime (kubernetes/website#9901), which is the simplest workaround for folks affected by this upstream change.
Stumbled in the same problem yesterday, while doing the CKA exam :-(
Can confirm that this still happens; I spent about an hour trying to debug it before stumbling across this issue.
Confirming it's still happening as well. I'm still struggling to find which version supports 'search'.
busybox:1.28.4 doesn't work for me: kubernetes/kubernetes#66924 (comment)
Why is this issue closed? I still see this problem in the uclibc version:
$ kubectl run -it --rm evanstest-b5 --image=busybox:1.32.0-uclibc -- sh
If you don't see a command prompt, try pressing enter.
/ # nslookup opentelemetry-collector.observability.svc.cluster.local
Server: 10.0.0.10
Address: 10.0.0.10:53
*** Can't find opentelemetry-collector.observability.svc.cluster.local: No answer
*** Can't find opentelemetry-collector.observability.svc.cluster.local: No answer
/ #
The musl and glibc versions partially work:
$ kubectl run -it --rm evanstest-b3 --image=busybox:1.32.0-musl -- sh
If you don't see a command prompt, try pressing enter.
/ # nslookup opentelemetry-collector.observability.svc.cluster.local
Server: 10.0.0.10
Address: 10.0.0.10:53
*** Can't find opentelemetry-collector.observability.svc.cluster.local: No answer
Name: opentelemetry-collector.observability.svc.cluster.local
Address: 10.0.228.84
/ #
$ kubectl run -it --rm evanstest-b4 --image=busybox:1.32.0-glibc -- sh
If you don't see a command prompt, try pressing enter.
/ # nslookup opentelemetry-collector.observability.svc.cluster.local
Server: 10.0.0.10
Address: 10.0.0.10:53
Name: opentelemetry-collector.observability.svc.cluster.local
Address: 10.0.228.84
*** Can't find opentelemetry-collector.observability.svc.cluster.local: No answer
/ #
Seems to be a Kubernetes configuration issue. I'm not able to reproduce it with standalone Docker:
$ docker run --rm -dit --name busybox busybox:latest
$ docker exec -it busybox sh
# ping google.com
PING google.com (172.217.11.174): 56 data bytes
64 bytes from 172.217.11.174: seq=0 ttl=53 time=14.993 ms
64 bytes from 172.217.11.174: seq=1 ttl=53 time=14.598 ms
64 bytes from 172.217.11.174: seq=2 ttl=53 time=14.039 ms
^C
# nslookup github.com
Server: 8.8.8.8
Address 1: 8.8.8.8 google-public-dns-a.google.com
Name: github.com
Address 1: 192.30.255.112 lb-192-30-255-112-sea.github.com
Address 2: 192.30.255.113 lb-192-30-255-113-sea.github.com
# nslookup google.com
Server: 8.8.8.8
Address 1: 8.8.8.8 google-public-dns-a.google.com
Name: google.com
Address 1: 2607:f8b0:4007:804::200e lax28s15-in-x0e.1e100.net
Address 2: 216.58.219.14 lax17s03-in-f14.1e100.net
Kubernetes with hostNetwork: true
$ kubectl exec busybox-7cc555b5d6-2mmcr ping google.com
PING google.com (172.217.11.174): 56 data bytes
64 bytes from 172.217.11.174: seq=0 ttl=54 time=13.444 ms
64 bytes from 172.217.11.174: seq=1 ttl=54 time=14.249 ms
64 bytes from 172.217.11.174: seq=2 ttl=54 time=20.149 ms
^C
$ kubectl exec busybox-7cc555b5d6-2mmcr nslookup google.com 8.8.8.8
Server: 8.8.8.8
Address: 8.8.8.8:53
Non-authoritative answer:
Name: google.com
Address: 172.217.11.174
*** Can't find google.com: No answer
$ kubectl exec busybox-7cc555b5d6-2mmcr nslookup kubernetes.default 8.8.8.8
Server: 8.8.8.8
Address: 8.8.8.8:53
** server can't find kubernetes.default: NXDOMAIN
*** Can't find kubernetes.default: No answer
$ kubectl exec busybox-7cc555b5d6-2mmcr nslookup kubernetes.default
Server: 127.0.0.53
Address: 127.0.0.53:53
** server can't find kubernetes.default: NXDOMAIN
*** Can't find kubernetes.default: No answer
This seems to be the most relevant issue I found: kubernetes/kubernetes#33798
This reminds me of the fun we had back in #9, but that doesn't seem related. 😞
Unfortunately, that only narrows it down to somewhere in the sea of 438 files changed, 9453 insertions(+), 4480 deletions(-) (from 1_28_4 to 1_29_1 in the Git tags of the two releases).
Something in here seems most likely:
$ git log --oneline 1_28_4...1_29_1 -- networking/nslookup.c
2f7738e47 nslookup: placate "warning: unused variable i"
c72499584 nslookup: simplify make_ptr
71e4b3f48 nslookup: get rid of query::rlen field
58e43a4c4 nslookup: move array of queries to "globals"
4b6091f92 nslookup: accept lowercase -type=soa, document query types
6cdc3195a nslookup: change -stats to -debug (it's a bug in bind that it accepts -s)
d4461ef9f nslookup: rework option parsing
a980109c6 nslookup: smaller qtypes[] array
2cf75b3c8 nslookup: process replies immediately, do not store them
4e73c0f65 nslookup: fix output corruption for "nslookup 1.2.3.4"
cf950cd3e nslookup: more closely resemble output format of bind-utils-9.11.3
71e016d80 nslookup: shrink send_queries()
db93b21ec nslookup: use xmalloc_sockaddr2dotted() instead of homegrown function
55bc8e882 nslookup: usee bbox network functions instead of opne-coded mess
0dd3be8c0 nslookup: add openwrt / lede version
https://bugs.busybox.net/show_bug.cgi?id=11161
How does this relate to #27? Are they the same issue?
From what I can tell, the new resolver in BusyBox's nslookup doesn't support DNS search domains at all, which seems like a pretty hefty regression.
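To make "search domain support" concrete: a search-aware resolver expands a short name like `kubernetes.default` using the `search` and `ndots` lines of `/etc/resolv.conf`, which is why the fully qualified `*.svc.cluster.local` form works in the later comments while the short form fails. The helper below is a hypothetical, simplified model of the glibc-style ordering described in resolv.conf(5) (names with at least `ndots` dots are tried as-is first, others only after the search list); it is for illustration only and is not BusyBox's code.

```shell
# Hypothetical helper: print, in order, the names a glibc-style resolver
# would try for NAME given the "search" and "ndots" lines of a resolv.conf.
# Simplified model for illustration, not BusyBox's code.
search_candidates() {
  name=$1
  conf=${2:-/etc/resolv.conf}
  # A trailing dot marks the name as fully qualified: no search list.
  case "$name" in
    *.) echo "$name"; return ;;
  esac
  # resolv.conf(5): the last "search" line wins; ndots defaults to 1.
  search=$(awk '$1 == "search" { $1 = ""; print }' "$conf" | tail -n 1)
  ndots=$(awk '$1 == "options" { for (i = 2; i <= NF; i++) if ($i ~ /^ndots:/) { sub(/^ndots:/, "", $i); print $i } }' "$conf" | tail -n 1)
  ndots=${ndots:-1}
  # Count the dots in the name.
  dots=$(printf '%s' "$name" | tr -cd '.' | wc -c | tr -d ' ')
  # Enough dots: try the name as-is first, then the search list.
  if [ "$dots" -ge "$ndots" ]; then echo "$name"; fi
  for domain in $search; do
    echo "$name.$domain"
  done
  # Too few dots: the name as-is is only tried after the search list.
  if [ "$dots" -lt "$ndots" ]; then echo "$name"; fi
}
```

With a typical pod `resolv.conf` (`search default.svc.cluster.local svc.cluster.local cluster.local`, `options ndots:5`), `search_candidates kubernetes.default` lists `kubernetes.default.default.svc.cluster.local`, `kubernetes.default.svc.cluster.local`, `kubernetes.default.cluster.local`, and finally `kubernetes.default`.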
Thanks tianon. How will this be addressed?
As a suggestion, would it be possible to revert the :latest tag to point to 1.28.x until this is resolved upstream?
Given that the upstream change was intentional and is a reflection of upstream, I'm not comfortable changing latest back to 1.28 (especially given that 1.29 is considered "stable" by upstream) -- I'd recommend instead pinning usage to busybox:1.28 (or more specifically, busybox:1.28-variant) for now until the updated functionality which resolves this issue is implemented upstream.
(Pinning to a particular release or release series of dependencies is generally good advice anyhow, and it looks like Busybox upstream might intend to get more aggressive about changes in the future, so it seems more prudent than ever.)
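Pinning in practice just means never leaving the tag implicit. A minimal sketch, assuming a throwaway test pod: the function name `pinned_pod_manifest` and the pod name `dns-test` are hypothetical, and the tag argument defaults to 1.28 as recommended above (pass a more specific tag if you want a particular variant).

```shell
# Hypothetical helper: print a minimal test Pod manifest with the busybox
# image tag pinned explicitly instead of relying on :latest.
pinned_pod_manifest() {
  tag=${1:-1.28}
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: dns-test
spec:
  restartPolicy: Never
  containers:
  - name: busybox
    image: busybox:$tag
    command: ["sleep", "3600"]
EOF
}
```

Usage: `pinned_pod_manifest 1.28 | kubectl apply -f -`, then `kubectl exec dns-test -- nslookup kubernetes.default`.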
Hi, after months of using busybox in Kubernetes with no problem, today I've just got something that seems to be the same NXDOMAIN bug as reported in this thread:
/ # nslookup kubernetes.default
Server: 10.0.0.10
Address: 10.0.0.10:53
** server can't find kubernetes.default: NXDOMAIN
*** Can't find kubernetes.default: No answer
/ # echo $?
1
But this works:
/ # nslookup kubernetes.default.svc.cluster.local
Server: 10.0.0.10
Address: 10.0.0.10:53
Non-authoritative answer:
Name: kubernetes.default.svc.cluster.local
Address: 10.0.0.1
*** Can't find kubernetes.default.svc.cluster.local: No answer
/ # echo $?
0
/ # cat /etc/resolv.conf
nameserver 10.0.0.10
search flowr-besix-stay.svc.cluster.local svc.cluster.local cluster.local c.taktik-dev.internal google.internal
options ndots:5
In my chart I have always simply been using 'busybox'. I'm not sure which tag I'm currently on; all I could find is the hash of the image:
Image: busybox
Image ID: docker-pullable://busybox@sha256:bf510723d2cd2d4e3f5ce7e93bf1e52c8fd76831995ac3bd3f90ecc866643aff
Meanwhile, the workaround is just to use nslookup cassandra.cassandra.svc.cluster.local instead of nslookup cassandra.cassandra.
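When an image is only known by its digest, the BusyBox version banner can be read from the binary itself, as another comment in this thread does with `busybox | head -1`. A small sketch; the function name is hypothetical and it assumes a running pod whose image contains the `busybox` binary.

```shell
# Hypothetical helper: print the BusyBox version banner (first line of the
# multi-call binary's help output) from inside a running pod.
busybox_version() {
  kubectl exec "$1" -- busybox 2>/dev/null | head -n 1
}
```

Usage: `busybox_version <pod-name>`. Outside the cluster, the same check against the digest above would be `docker run --rm busybox@sha256:bf510723d2cd2d4e3f5ce7e93bf1e52c8fd76831995ac3bd3f90ecc866643aff busybox | head -n 1`.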
@Simon3 When this issue was raised, the busybox bug was a failure to distinguish between A and AAAA responses, and it was inherently intermittent because it depended on the order in which the responses to the concurrent requests arrived.
Explicitly requesting a record type would result in either reliable success or reliable failure:
https://github.com/docker-library/busybox/issues/48#issuecomment-408239537
If specifying the type doesn't fix this, then it's a different problem than the original.
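One way to tell the two cases apart is to repeat the explicit-type lookup many times: per the comment above, the original A/AAAA bug should then give all successes or all failures, while anything in between points to an intermittent problem. A minimal sketch; the function name `count_successes` is hypothetical, and it only relies on shell features that BusyBox ash also supports.

```shell
# Hypothetical helper: run a command N times and print "successes/N",
# counting a run as a success when the command exits 0.
count_successes() {
  n=$1
  shift
  ok=0
  i=0
  while [ "$i" -lt "$n" ]; do
    if "$@" >/dev/null 2>&1; then
      ok=$((ok + 1))
    fi
    i=$((i + 1))
  done
  echo "$ok/$n"
}
```

Usage inside the pod: `count_successes 20 nslookup -type=a kubernetes.default`.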
Specifying -type=a doesn't fix the problem. I created a new issue.
This still happens in 1.32 and is specific to the uclibc variant of busybox. glibc and musl both work. I guess latest points to uclibc?
kubernetes/kubernetes#66924 (comment)
Successful lookups are very infrequent. I ran nslookup in a while loop approx. 200 times with timeout=2 seconds, after setting ndots:5, ndots:7 and ndots:10 in /etc/resolv.conf. Below are the results.
- ndots:5: the query succeeded 39/200 times
- ndots:7: the query succeeded 22/200 times
- ndots:10: the query succeeded 16/200 times
Below is the shell script I used to collect these results.
cat > nslookup_status.sh <<'EOF'
#!/bin/sh
# Loop forever: run nslookup with a 2-second timeout and append
# "<epoch seconds> : <exit status> : pass|fail" to /tmp/nslookup_status.
while true; do
  nslookup -timeout=2 kubernetes > /dev/null 2>&1
  result=$?
  # Exit status 0 means the lookup succeeded; anything else is a failure.
  if [ "$result" -eq 0 ]; then
    status=pass
  else
    status=fail
  fi
  echo "$(date +%s) : $result : $status" >> /tmp/nslookup_status
done
EOF
chmod +x nslookup_status.sh
./nslookup_status.sh &
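The per-ndots counts above can be read off that log by counting lines ending in "pass" and "fail". A small sketch; the function name is hypothetical and it assumes the log format written by the script (lines ending in `: pass` or `: fail`).

```shell
# Hypothetical helper: summarise a log written by nslookup_status.sh
# (default /tmp/nslookup_status) as pass/fail counts and an integer pass rate.
summarise_log() {
  log=${1:-/tmp/nslookup_status}
  pass=$(grep -c ': pass$' "$log")
  fail=$(grep -c ': fail$' "$log")
  total=$((pass + fail))
  if [ "$total" -gt 0 ]; then
    rate=$((pass * 100 / total))
  else
    rate=0
  fi
  echo "pass=$pass fail=$fail total=$total rate=${rate}%"
}
```

Stop the background loop (e.g. `kill %1`) after roughly 200 attempts, then run `summarise_log`. Reset the log between ndots settings so that each run is counted separately.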
busybox-pod.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: "busybox1"
spec:
  containers:
  - image: busybox
    name: busybox
    command: ["sleep", "6000"]
  dnsConfig:
    options:
    - name: ndots
      value: "7"
busybox Image hash : busybox:latest@sha256:34c3559bbdedefd67195e766e38cfbb0fcabff4241dbee3f390fd6e3310f5ebc
Just for the records, I opened a new issue at the bugtracker of busybox: https://bugs.busybox.net/show_bug.cgi?id=14671
1.34.1 is not stable: it fails most of the time and only sometimes works:
# ./dnstest.sh
dnstest
/ #
/ #
/ # nslookup es
Server: 169.254.25.10
Address: 169.254.25.10:53
** server can't find es.default.svc.cluster.local: NXDOMAIN
*** Can't find es.svc.cluster.local: No answer
*** Can't find es.cluster.local: No answer
*** Can't find es.default.svc.cluster.local: No answer
*** Can't find es.svc.cluster.local: No answer
*** Can't find es.cluster.local: No answer
/ # ping es
PING es (10.233.36.216): 56 data bytes
64 bytes from 10.233.36.216: seq=0 ttl=64 time=0.069 ms
64 bytes from 10.233.36.216: seq=1 ttl=64 time=0.108 ms
^C
--- es ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.069/0.088/0.108 ms
/ #
1.34.1 worked once in my tests:
/ # busybox | head -1
BusyBox v1.34.1 (2021-12-29 21:12:15 UTC) multi-call binary.
/ # nslookup es
Server: 169.254.25.10
Address: 169.254.25.10:53
Name: es.default.svc.cluster.local
Address: 10.233.36.216
*** Can't find es.svc.cluster.local: No answer
*** Can't find es.cluster.local: No answer
*** Can't find es.default.svc.cluster.local: No answer
*** Can't find es.svc.cluster.local: No answer
*** Can't find es.cluster.local: No answer
/ # ping es
PING es (10.233.36.216): 56 data bytes
64 bytes from 10.233.36.216: seq=0 ttl=64 time=0.117 ms
64 bytes from 10.233.36.216: seq=1 ttl=64 time=0.117 ms
^C
--- es ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.117/0.117/0.117 ms
/ #
1.36.0 works fine:
/ # busybox | head -1
BusyBox v1.36.0 (2023-05-11 16:48:06 UTC) multi-call binary.
/ # nslookup es
Server: 169.254.25.10
Address: 169.254.25.10:53
** server can't find es.svc.cluster.local: NXDOMAIN
Name: es.default.svc.cluster.local
Address: 10.233.36.216
** server can't find es.svc.cluster.local: NXDOMAIN
** server can't find es.cluster.local: NXDOMAIN
** server can't find es.cluster.local: NXDOMAIN
/ # ping es
PING es (10.233.36.216): 56 data bytes
64 bytes from 10.233.36.216: seq=0 ttl=64 time=0.078 ms
64 bytes from 10.233.36.216: seq=1 ttl=64 time=0.128 ms
^C
--- es ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.078/0.103/0.128 ms
/ #
docker images | grep busy
busybox 1.36.0 af2c3e96bcf1 5 days ago 4.86MB
busybox 1.34.1 beae173ccac6 16 months ago 1.24MB
busybox latest beae173ccac6 16 months ago 1.24MB
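In the listing above, `latest` and `1.34.1` share image ID beae173ccac6 (16 months old), so the local `latest` tag had not been refreshed since 1.34.1 and testing "latest" locally was really testing 1.34.1. A small sketch for catching that; the function name is hypothetical.

```shell
# Hypothetical helper: succeed if two local image references resolve to
# the same image ID; return 2 if either cannot be inspected.
same_image() {
  a=$(docker image inspect --format '{{.Id}}' "$1") || return 2
  b=$(docker image inspect --format '{{.Id}}' "$2") || return 2
  [ "$a" = "$b" ]
}
```

Usage: `same_image busybox:latest busybox:1.34.1 && docker pull busybox:latest`.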