Comments (41)

krishshenoy avatar krishshenoy commented on May 31, 2024 18

I have a Kubernetes cluster monitoring test that continually deploys a busybox pod in a cluster and verifies DNS resolution within the pod by executing kubectl exec nslookup. It started failing right when I downloaded the latest busybox image. When I deploy a busybox pod with the previous image version, 1.28, nslookup works. All signs point to a change in the latest version that is causing the failure.

from busybox.

tokiwinter avatar tokiwinter commented on May 31, 2024 14

Same issue here. Reverting to 1.28 fixed the issue for me.

from busybox.

piersharding avatar piersharding commented on May 31, 2024 12

Hi @tianon - I can understand that you don't want to have a regression on :latest, but there is a surprising amount of fallout from this simple issue because so many people and documentation out there use busybox:latest as the "Hello, World" example. Temporarily changing the tag would help mitigate that pain and these unintended consequences.

Cheers,
Piers.

from busybox.

djsly avatar djsly commented on May 31, 2024 11

/label bug
We are having the same issue.
1.27/1.28 are working, 1.29/1.29.1 are not.

kubectl run --attach busybox --rm --image=busybox:1.27 --restart=Never -- sh -c "sleep 4 && nslookup kubernetes.default"
If you don't see a command prompt, try pressing enter.

Server:    192.168.0.10
Address 1: 192.168.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 192.168.0.1 kubernetes.default.svc.cluster.local

kubectl run --attach busybox --rm --image=busybox:1.28 --restart=Never -- sh -c "sleep 4 && nslookup kubernetes.default"
If you don't see a command prompt, try pressing enter.

Server:    192.168.0.10
Address 1: 192.168.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 192.168.0.1 kubernetes.default.svc.cluster.local

kubectl run --attach busybox --rm --image=busybox:1.29 --restart=Never -- sh -c "sleep 4 && nslookup kubernetes.default"
If you don't see a command prompt, try pressing enter.

Server:         192.168.0.10
Address:        192.168.0.10:53

** server can't find kubernetes.default: NXDOMAIN

*** Can't find kubernetes.default: No answer

kubectl run --attach busybox --rm --image=busybox:1.29.1 --restart=Never -- sh -c "sleep 4 && nslookup kubernetes.default"
If you don't see a command prompt, try pressing enter.


Server:         192.168.0.10
Address:        192.168.0.10:53

** server can't find kubernetes.default: NXDOMAIN

*** Can't find kubernetes.default: No answer

from busybox.

voelzmo avatar voelzmo commented on May 31, 2024 7

According to the 1.30 release notes, the patch for https://bugs.busybox.net/show_bug.cgi?id=11161 is in there – however, I still had to pin my image to 1.28 in order to execute a simple lookup:

$ kubectl run -i --tty --image busybox:1.28 dns-test --restart=Never --rm nslookup web-0.nginx
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      web-0.nginx
Address 1: 172.17.0.2 web-0.nginx.default.svc.cluster.local
pod "dns-test" deleted

Whereas :latest, aka :1.30.1, gave me this:

$ kubectl run -i --tty --image busybox:1.30.1 dns-test --restart=Never --rm nslookup web-0.nginx
If you don't see a command prompt, try pressing enter.

*** Can't find web-0.nginx: No answer

pod "dns-test" deleted

This is just using minikube and an nginx statefulset from https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#creating-a-statefulset

I'm not sure if I'm missing something here, but is this issue really solved?

from busybox.

evanstucker-hates-2fa avatar evanstucker-hates-2fa commented on May 31, 2024 5

Thanks @tianon. I should have read the entire thread first. This is crazy!

Just so others who scroll to the bottom don't bug you, the TL;DR is to pin to busybox:1.28 instead of using latest. Or just don't use busybox for anything. Use alpine instead if possible?

For further reading, here are all of the issues busybox has had with nslookup. Seems like #13006 might be the current one, but it looks like they just close them without actually fixing the problem:

https://bugs.busybox.net/buglist.cgi?bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&bug_status=RESOLVED&bug_status=VERIFIED&bug_status=CLOSED&f0=OP&f1=OP&f2=product&f3=component&f4=alias&f5=short_desc&f7=content&f8=CP&f9=CP&j1=OR&o2=substring&o3=substring&o4=substring&o5=substring&o6=substring&o7=matches&order=changeddate%20DESC%2Cbug_status%2Cpriority%2Cassigned_to%2Cbug_id&query_format=advanced&v2=nslookup&v3=nslookup&v4=nslookup&v5=nslookup&v6=nslookup&v7=%22nslookup%22

from busybox.

blodone avatar blodone commented on May 31, 2024 5

This is not cool. Version 1.32.1 is still buggy and does not work properly, so busybox is the wrong choice for Kubernetes. Use the dnsutils image from Google, gcr.io/kubernetes-e2e-test-images/dnsutils:1.3, to get a working nslookup with the new syntax.

from busybox.

cparjaszewski avatar cparjaszewski commented on May 31, 2024 3

For some people it’s still difficult to admit a mistake. Being aggressive and brave with new changes is one thing; breaking stuff that worked before is another, especially these days when a lot of people are using “:latest” by default. Introducing a backward-compatibility break and calling it intentional is far from wise.

Please read more about semantic versioning as well.

from busybox.

xingchijin avatar xingchijin commented on May 31, 2024 3

tried 1.33, this issue is still there

from busybox.

astraw99 avatar astraw99 commented on May 31, 2024 3

Encountered the same issue in busybox:1.35.
Is anyone pushing to get this issue resolved?

from busybox.

yosifkit avatar yosifkit commented on May 31, 2024 2

> Any chance on getting it fixed soon?

#48 (comment):

> It needs to be addressed upstream -- we simply package what they provide.

from busybox.

ChristopherHanson avatar ChristopherHanson commented on May 31, 2024 2

> Same happens on 1.33.1. For comparison, the image gcr.io/kubernetes-e2e-test-images/dnsutils:1.3, used in the Kubernetes documentation's DNS troubleshooting example, still works as expected, as mentioned by @blodone.
>
> Any chance on getting it fixed soon?

Just use 1.27, the package in that version has always worked

from busybox.

hickeng avatar hickeng commented on May 31, 2024 1

@djsly Try using "sleep 4 && nslookup -type=a kubernetes.default"

I've added my findings here: https://bugs.busybox.net/show_bug.cgi?id=11161#c4

from busybox.

cparjaszewski avatar cparjaszewski commented on May 31, 2024 1

See this issue:
kubernetes/kubernetes#66924

from busybox.

tianon avatar tianon commented on May 31, 2024 1

Given that this issue is an upstream issue (not something we've introduced), that it is appropriately filed at https://bugs.busybox.net/show_bug.cgi?id=11161, and apparently will be fixed in the next release (https://git.busybox.net/busybox/commit/?id=9408978a438ac6c3becb2216d663216d27b59eab), I'm going to close.

It would appear that Kubernetes has adjusted to use busybox:1.28 explicitly in the meantime (kubernetes/website#9901), which is the simplest workaround for folks affected by this upstream change.

from busybox.

monotek avatar monotek commented on May 31, 2024 1

Stumbled into the same problem yesterday, while doing the CKA exam :-(

from busybox.

chriswiggins avatar chriswiggins commented on May 31, 2024 1

Can confirm that this still happens - spent about an hour trying to debug before stumbling across this issue

from busybox.

EpicApex avatar EpicApex commented on May 31, 2024 1

Confirming this is still happening as well. Still struggling to work out which version should have the 'search'.

busybox:1.28.4 - doesn't work for me - kubernetes/kubernetes#66924 (comment)

from busybox.

evanstucker-hates-2fa avatar evanstucker-hates-2fa commented on May 31, 2024 1

Why is this issue closed? I still see this problem in the uclibc version:

$ kubectl run -it --rm evanstest-b5 --image=busybox:1.32.0-uclibc -- sh
If you don't see a command prompt, try pressing enter.
/ # nslookup opentelemetry-collector.observability.svc.cluster.local
Server:		10.0.0.10
Address:	10.0.0.10:53

*** Can't find opentelemetry-collector.observability.svc.cluster.local: No answer

*** Can't find opentelemetry-collector.observability.svc.cluster.local: No answer

/ # 

The musl and glibc versions partially work:

$ kubectl run -it --rm evanstest-b3 --image=busybox:1.32.0-musl -- sh
If you don't see a command prompt, try pressing enter.
/ # nslookup opentelemetry-collector.observability.svc.cluster.local
Server:		10.0.0.10
Address:	10.0.0.10:53

*** Can't find opentelemetry-collector.observability.svc.cluster.local: No answer

Name:	opentelemetry-collector.observability.svc.cluster.local
Address: 10.0.228.84

/ # 

$ kubectl run -it --rm evanstest-b4 --image=busybox:1.32.0-glibc -- sh
If you don't see a command prompt, try pressing enter.
/ # nslookup opentelemetry-collector.observability.svc.cluster.local
Server:		10.0.0.10
Address:	10.0.0.10:53

Name:	opentelemetry-collector.observability.svc.cluster.local
Address: 10.0.228.84

*** Can't find opentelemetry-collector.observability.svc.cluster.local: No answer

/ #
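
The transcripts above mix a valid answer with a "No answer" line, so a check that only greps for the error text would misreport the musl and glibc results. A minimal sketch of a more forgiving check follows, assuming the output layout shown in this thread; `has_address` is a hypothetical helper, not something busybox ships.

```shell
#!/bin/sh
# Hypothetical helper: read nslookup output on stdin and succeed only if a
# "Name:" line is followed by at least one "Address" line.
# The leading "Server:/Address:" pair is ignored because it precedes "Name:".
has_address() {
    awk '
        /^Name:/ { seen_name = 1; next }
        seen_name && /^Address/ { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

It could be used as `nslookup some.name | has_address && echo resolved`, which treats the partially working musl and glibc outputs as a success and the uclibc output as a failure.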

from busybox.

wglambert avatar wglambert commented on May 31, 2024

Seems to be a kubernetes configuration issue

Not able to reproduce the issue through Docker standalone

$ docker run --rm -dit --name busybox busybox:latest
$ docker exec -it busybox sh

# ping google.com
PING google.com (172.217.11.174): 56 data bytes
64 bytes from 172.217.11.174: seq=0 ttl=53 time=14.993 ms
64 bytes from 172.217.11.174: seq=1 ttl=53 time=14.598 ms
64 bytes from 172.217.11.174: seq=2 ttl=53 time=14.039 ms
^C
# nslookup github.com
Server:    8.8.8.8
Address 1: 8.8.8.8 google-public-dns-a.google.com

Name:      github.com
Address 1: 192.30.255.112 lb-192-30-255-112-sea.github.com
Address 2: 192.30.255.113 lb-192-30-255-113-sea.github.com
# nslookup google.com
Server:    8.8.8.8
Address 1: 8.8.8.8 google-public-dns-a.google.com

Name:      google.com
Address 1: 2607:f8b0:4007:804::200e lax28s15-in-x0e.1e100.net
Address 2: 216.58.219.14 lax17s03-in-f14.1e100.net

Kubernetes with hostNetwork: true

$ kubectl exec busybox-7cc555b5d6-2mmcr ping google.com
PING google.com (172.217.11.174): 56 data bytes
64 bytes from 172.217.11.174: seq=0 ttl=54 time=13.444 ms
64 bytes from 172.217.11.174: seq=1 ttl=54 time=14.249 ms
64 bytes from 172.217.11.174: seq=2 ttl=54 time=20.149 ms
^C

$ kubectl exec busybox-7cc555b5d6-2mmcr nslookup google.com 8.8.8.8
Server:         8.8.8.8
Address:        8.8.8.8:53

Non-authoritative answer:
Name:   google.com
Address: 172.217.11.174

*** Can't find google.com: No answer

$ kubectl exec busybox-7cc555b5d6-2mmcr nslookup kubernetes.default 8.8.8.8
Server:         8.8.8.8
Address:        8.8.8.8:53

** server can't find kubernetes.default: NXDOMAIN

*** Can't find kubernetes.default: No answer

$ kubectl exec busybox-7cc555b5d6-2mmcr nslookup kubernetes.default
Server:         127.0.0.53
Address:        127.0.0.53:53

** server can't find kubernetes.default: NXDOMAIN

*** Can't find kubernetes.default: No answer

This seems to be the most relevant issue I found kubernetes/kubernetes#33798

from busybox.

tianon avatar tianon commented on May 31, 2024

This reminds me of the fun we had back in #9, but that doesn't seem related. 😞

from busybox.

tianon avatar tianon commented on May 31, 2024

Unfortunately, that only narrows it down to somewhere in the sea of 438 files changed, 9453 insertions(+), 4480 deletions(-) (from 1_28_4 to 1_29_1 in the Git tags of the two releases).

from busybox.

tianon avatar tianon commented on May 31, 2024

Something in here seems most likely:

$ git log --oneline 1_28_4...1_29_1 -- networking/nslookup.c
2f7738e47 nslookup: placate "warning: unused variable i"
c72499584 nslookup: simplify make_ptr
71e4b3f48 nslookup: get rid of query::rlen field
58e43a4c4 nslookup: move array of queries to "globals"
4b6091f92 nslookup: accept lowercase -type=soa, document query types
6cdc3195a nslookup: change -stats to -debug (it's a bug in bind that it accepts -s)
d4461ef9f nslookup: rework option parsing
a980109c6 nslookup: smaller qtypes[] array
2cf75b3c8 nslookup: process replies immediately, do not store them
4e73c0f65 nslookup: fix output corruption for "nslookup 1.2.3.4"
cf950cd3e nslookup: more closely resemble output format of bind-utils-9.11.3
71e016d80 nslookup: shrink send_queries()
db93b21ec nslookup: use xmalloc_sockaddr2dotted() instead of homegrown function
55bc8e882 nslookup: usee bbox network functions instead of opne-coded mess
0dd3be8c0 nslookup: add openwrt / lede version

from busybox.

djsly avatar djsly commented on May 31, 2024

https://bugs.busybox.net/show_bug.cgi?id=11161

from busybox.

tianon avatar tianon commented on May 31, 2024

How does this relate to #27? Are they the same issue?

from busybox.

tianon avatar tianon commented on May 31, 2024

From what I can tell, the new resolver in BusyBox's nslookup doesn't support DNS search domains at all, which seems like a pretty hefty regression.

from busybox.

krishshenoy avatar krishshenoy commented on May 31, 2024

Thanks tianon. How will this be addressed?

from busybox.

piersharding avatar piersharding commented on May 31, 2024

As a suggestion, would it be possible to point the :latest tag back to 1.28.x until this is resolved upstream?

from busybox.

tianon avatar tianon commented on May 31, 2024

Given that the upstream change was intentional and is a reflection of upstream, I'm not comfortable changing latest back to 1.28 (especially given that 1.29 is considered "stable" by upstream) -- I'd recommend instead pinning usage to busybox:1.28 (or more specifically, busybox:1.28-variant) for now until the updated functionality which resolves this issue is implemented upstream. (Pinning to a particular release or release series of dependencies is generally good advice anyhow, and it looks like Busybox upstream might intend to get more aggressive about changes in the future, so it seems more prudent than ever.)
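
For anyone who wants to enforce the pinning advice in their own tooling, here is a rough sketch of a check for floating image references. `is_pinned` is a hypothetical helper written for this thread, and it only looks at the reference string; it does not talk to a registry.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an image reference is pinned to a specific
# tag (other than "latest") or to a digest.
is_pinned() {
    ref=$1
    case $ref in
        *@sha256:*) return 0 ;;   # pinned by digest
    esac
    last=${ref##*/}               # drop registry/repository path (may contain host:port)
    case $last in
        *:latest) return 1 ;;     # floating tag
        *:*)      return 0 ;;     # explicit tag such as busybox:1.28
        *)        return 1 ;;     # no tag at all means :latest
    esac
}
```

For example, `is_pinned busybox:1.28` succeeds, while `is_pinned busybox` and `is_pinned busybox:latest` fail.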

from busybox.

Simon3 avatar Simon3 commented on May 31, 2024

Hi, after months of using busybox in Kubernetes with no problem, today I've just got something that seems to be the same NXDOMAIN bug as reported in this thread:

/ # nslookup kubernetes.default
Server:		10.0.0.10
Address:	10.0.0.10:53

** server can't find kubernetes.default: NXDOMAIN

*** Can't find kubernetes.default: No answer

/ # echo $?
1

But this works:

/ # nslookup kubernetes.default.svc.cluster.local
Server:		10.0.0.10
Address:	10.0.0.10:53

Non-authoritative answer:
Name:	kubernetes.default.svc.cluster.local
Address: 10.0.0.1

*** Can't find kubernetes.default.svc.cluster.local: No answer

/ # echo $?
0
/ # cat /etc/resolv.conf 
nameserver 10.0.0.10
search flowr-besix-stay.svc.cluster.local svc.cluster.local cluster.local c.taktik-dev.internal google.internal
options ndots:5

In my chart I have always simply been using 'busybox', I'm not sure on which tag I am currently, all I could find is the hash of the image:

    Image:         busybox
    Image ID:      docker-pullable://busybox@sha256:bf510723d2cd2d4e3f5ce7e93bf1e52c8fd76831995ac3bd3f90ecc866643aff

Meanwhile, the workaround is just to use nslookup cassandra.cassandra.svc.cluster.local instead of nslookup cassandra.cassandra.
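
If typing the fully qualified name by hand is not practical, the expansion can be approximated in the shell. This is only a sketch under stated assumptions: `search_candidates` is a hypothetical helper, it ignores the ndots option entirely, and it simply appends each domain from the search line of resolv.conf before falling back to the bare name.

```shell
#!/bin/sh
# Hypothetical helper (not part of busybox): print the fully qualified
# candidates a resolver with search-domain support would try for a name.
# Simplification: ndots is ignored; names ending in "." are treated as absolute.
search_candidates() {
    name=$1
    conf=${2:-/etc/resolv.conf}
    case $name in
        *.) printf '%s\n' "$name"; return 0 ;;
    esac
    # Use the last "search" line and emit one candidate per domain,
    # followed by the name as given.
    awk -v n="$name" '
        $1 == "search" { count = 0; for (i = 2; i <= NF; i++) d[++count] = $i }
        END { for (i = 1; i <= count; i++) print n "." d[i]; print n }
    ' "$conf"
}
```

Each printed candidate could then be passed to nslookup in turn, stopping at the first one that resolves.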

from busybox.

hickeng avatar hickeng commented on May 31, 2024

@Simon3 The busybox issue when this was raised was related to a failure to distinguish between A and AAAA responses and was inherently intermittent as it depended on the ordering of the response to concurrent requests.

Explicitly requesting a record type would result in either reliable success or reliable failure:

https://github.com/docker-library/busybox/issues/48#issuecomment-408239537

If specifying the type doesn't fix this, then it's a different problem than the original.

from busybox.

Simon3 avatar Simon3 commented on May 31, 2024

Specifying -type=a doesn't fix the problem. I created a new issue.

from busybox.

dploeger avatar dploeger commented on May 31, 2024

This still happens in 1.32 and is specific to the uclibc variant of busybox. glibc and musl both work. I guess latest points to uclibc?

from busybox.

tianon avatar tianon commented on May 31, 2024

> Why is this issue closed? I still see this problem in the uclibc version:

#48 (comment)

It needs to be addressed upstream -- we simply package what they provide.

from busybox.

Piotr1215 avatar Piotr1215 commented on May 31, 2024

Same happens on 1.33.1. For comparison, the image gcr.io/kubernetes-e2e-test-images/dnsutils:1.3, used in the Kubernetes documentation's DNS troubleshooting example, still works as expected, as mentioned by @blodone.

Any chance on getting it fixed soon?

from busybox.

Piotr1215 avatar Piotr1215 commented on May 31, 2024

> Just use 1.27, the package in that version has always worked

Thanks

from busybox.

koolwithk avatar koolwithk commented on May 31, 2024

kubernetes/kubernetes#66924 (comment)

It's a very infrequent hit. I ran nslookup in a while loop approx. 200 times with timeout=2 seconds, updating /etc/resolv.conf for ndots:5, ndots:7 and ndots:10. Below are the results.

  • ndots:5 = 39 times nslookup query worked/200
  • ndots:7 = 22 times nslookup query worked/200
  • ndots:10 = 16 times nslookup query worked/200

Below is the shell script I used to calculate this result.

echo '#!/bin/sh
# Run nslookup in a loop and log each exit status with a timestamp.
while true; do
	nslookup -timeout=2 kubernetes > /dev/null 2>&1
	result=$?
	if [ "$result" = "0" ]; then
		echo "$(date +%s) : $result : pass" >> /tmp/nslookup_status
	else
		# Any non-zero exit status counts as a failure.
		echo "$(date +%s) : $result : fail" >> /tmp/nslookup_status
	fi
done' > nslookup_status.sh

chmod +x nslookup_status.sh
./nslookup_status.sh &
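
To turn the log into pass counts like the ones listed above, something along these lines should work. It is a sketch that assumes the "timestamp : exit code : pass|fail" line format written by the script; `summarize_status` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: summarize a log whose lines look like
# "<epoch> : <exit code> : pass" or "<epoch> : <exit code> : fail".
summarize_status() {
    awk -F' : ' '
        $3 == "pass" { pass++ }
        END {
            if (NR == 0) { print "no samples"; exit }
            printf "%d/%d passed (%.1f%%)\n", pass, NR, 100 * pass / NR
        }
    ' "${1:-/tmp/nslookup_status}"
}
```

Run it as `summarize_status` for the default log path or pass another file as the first argument. A log with 39 pass lines out of 200 would print `39/200 passed (19.5%)`.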

busybox-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: "busybox1"
spec:
  containers:
  - image: busybox
    name: busybox
    command: [ "sleep","6000"]
  dnsConfig:
    options:
      - name: ndots
        value: "7"

busybox Image hash : busybox:latest@sha256:34c3559bbdedefd67195e766e38cfbb0fcabff4241dbee3f390fd6e3310f5ebc

from busybox.

guettli avatar guettli commented on May 31, 2024

Just for the record, I opened a new issue at the bugtracker of busybox: https://bugs.busybox.net/show_bug.cgi?id=14671

from busybox.

lhzw avatar lhzw commented on May 31, 2024

1.34.1 is not stable: it fails most of the time and only sometimes works:

# ./dnstest.sh
dnstest
/ #
/ #
/ # nslookup es
Server:         169.254.25.10
Address:        169.254.25.10:53

** server can't find es.default.svc.cluster.local: NXDOMAIN

*** Can't find es.svc.cluster.local: No answer
*** Can't find es.cluster.local: No answer
*** Can't find es.default.svc.cluster.local: No answer
*** Can't find es.svc.cluster.local: No answer
*** Can't find es.cluster.local: No answer

/ # ping es
PING es (10.233.36.216): 56 data bytes
64 bytes from 10.233.36.216: seq=0 ttl=64 time=0.069 ms
64 bytes from 10.233.36.216: seq=1 ttl=64 time=0.108 ms
^C
--- es ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.069/0.088/0.108 ms
/ #

1.34.1 worked once in my tests:

/ # busybox | head -1
BusyBox v1.34.1 (2021-12-29 21:12:15 UTC) multi-call binary.
/ # nslookup es
Server:         169.254.25.10
Address:        169.254.25.10:53

Name:   es.default.svc.cluster.local
Address: 10.233.36.216

*** Can't find es.svc.cluster.local: No answer
*** Can't find es.cluster.local: No answer
*** Can't find es.default.svc.cluster.local: No answer
*** Can't find es.svc.cluster.local: No answer
*** Can't find es.cluster.local: No answer

/ # ping es
PING es (10.233.36.216): 56 data bytes
64 bytes from 10.233.36.216: seq=0 ttl=64 time=0.117 ms
64 bytes from 10.233.36.216: seq=1 ttl=64 time=0.117 ms
^C
--- es ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.117/0.117/0.117 ms
/ #

1.36.0 works fine:

/ # busybox | head -1
BusyBox v1.36.0 (2023-05-11 16:48:06 UTC) multi-call binary.
/ # nslookup es
Server:         169.254.25.10
Address:        169.254.25.10:53

** server can't find es.svc.cluster.local: NXDOMAIN

Name:   es.default.svc.cluster.local
Address: 10.233.36.216

** server can't find es.svc.cluster.local: NXDOMAIN


** server can't find es.cluster.local: NXDOMAIN

** server can't find es.cluster.local: NXDOMAIN

/ # ping es
PING es (10.233.36.216): 56 data bytes
64 bytes from 10.233.36.216: seq=0 ttl=64 time=0.078 ms
64 bytes from 10.233.36.216: seq=1 ttl=64 time=0.128 ms
^C
--- es ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.078/0.103/0.128 ms
/ #
docker images | grep busy
busybox                                                1.36.0     af2c3e96bcf1   5 days ago      4.86MB
busybox                                                1.34.1     beae173ccac6   16 months ago   1.24MB
busybox                                                latest     beae173ccac6   16 months ago   1.24MB

from busybox.
