Comments (24)
I'm seeing the same issue running in Kubernetes. Might be related to this bug in Alpine.
Edit: scratch that. I rebuilt using node:12-alpine3.10 and still had the problem.
from foundryvtt-docker.
I ported to node:12-slim to successfully work around the problem. I'm running into a lot of DNS issues on Alpine-based images. Not sure if it's my k8s cluster's configuration, or what.
from foundryvtt-docker.
I've resolved the DNS issue I've been having while running this and other Alpine based images in Kubernetes clusters on my network.
Short answer: I turned off DNSSEC for my domain name managed by Cloudflare and everything started working.
Read on for details.
Some information about my setup:
- I use Cloudflare DNS to set up DNS TXT entries for letsencrypt so that my internal-only servers can have browser-trusted certificates.
- I don't use Cloudflare DNS for normal (A, AAAA, etc...) DNS records for my internal domain. I have an internal, Unbound DNS service for that.
- Crucially, I had DNSSEC enabled for my internal domain in the Cloudflare DNS settings. I must have enabled it when I had different plans for that domain.
Some general information about what causes the problem for me (and possibly for you):
- When Kubernetes starts a container, it adds search domains and options ndots:5 to /etc/resolv.conf inside the container.
  - It copies the search domains from the host (mylocaldomain.tld in my case) and adds a bunch of Kubernetes-specific ones like cluster.local and svc.cluster.local.
  - This resolv.conf configuration has to do with looking up local services inside the cluster.
  - Aside: you can also override ndots to be "1" in each pod spec to solve the problem in another way.
- Now, when a DNS lookup for, say, foundryvtt.com is performed inside of a container, all of those search domains are checked first. For example, foundryvtt.com.svc.cluster.local, then foundryvtt.com.cluster.local, and foundryvtt.com.mylocaldomain.tld. Finally, if none of those other domains "resolve", then foundryvtt.com is checked.
  - The ...cluster.local domains are rejected by CoreDNS inside of the cluster, I guess. No beef with those.
  - foundryvtt.com.mylocaldomain.tld escapes the cluster and gets to my internal Unbound DNS server.
  - Unbound doesn't recognize it, so passes it, transparently, to another DNS server (8.8.8.8, Google's public DNS in my case).
    - Maybe I should configure Unbound to reject anything with that base domain that it doesn't recognize?
  - That DNS server recognizes the mylocaldomain.tld part and asks Cloudflare how to resolve it because Cloudflare is the authority on that particular domain.
  - Cloudflare would normally respond with NXDOMAIN, which, I guess (not a DNS expert here) means "doesn't exist". Instead, because I had DNSSEC enabled, it responds with NOERROR, but doesn't respond with an actual IP address. This is something like "I can neither confirm nor deny the existence of that or related domains". Read here about how Cloudflare justifies that response.
  - That "no comment" response winds its way back to the original requestor. Any non-musl-based DNS client library would then shrug and continue looking through the search domains until it got to the implied '.' and tried 'foundryvtt.com' with a happy ending. musl will stop looking after receiving a NOERROR. Read here about how musl justifies that response.
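The lookup order described in the list above can be sketched in a few lines of Python. This is a simplified model, not the actual resolver code of musl or any other libc: the function names, the canned answer table, and the stop_on_nodata switch are all made up for illustration, and the only behavior modeled is the search-list expansion and the "stop on NOERROR without an address" difference described in this comment.

```python
def candidate_names(name, search_domains, ndots=5):
    """Return the fully qualified names a stub resolver would try, in order.

    Simplified model: a name with fewer dots than ndots goes through the
    search list first and is tried as-is last.
    """
    if name.endswith("."):
        return [name]  # already absolute, the search list is not used
    expanded = [f"{name}.{domain}." for domain in search_domains]
    absolute = name + "."
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


def resolve(name, search_domains, answers, stop_on_nodata, ndots=5):
    """Walk the candidates against a canned table of (rcode, address) answers.

    stop_on_nodata=True models the behavior attributed to musl above:
    a NOERROR reply that carries no address ends the search.
    Names missing from the table are treated as NXDOMAIN.
    """
    for candidate in candidate_names(name, search_domains, ndots):
        rcode, address = answers.get(candidate, ("NXDOMAIN", None))
        if rcode == "NOERROR" and address is not None:
            return address
        if rcode == "NOERROR" and stop_on_nodata:
            return None  # search abandoned before reaching the real name
    return None


search = ["svc.cluster.local", "cluster.local", "mylocaldomain.tld"]
answers = {
    # Cloudflare with DNSSEC enabled: NOERROR but no address
    "foundryvtt.com.mylocaldomain.tld.": ("NOERROR", None),
    "foundryvtt.com.": ("NOERROR", "44.234.61.225"),
}
print(candidate_names("foundryvtt.com", search))
print(resolve("foundryvtt.com", search, answers, stop_on_nodata=True))
print(resolve("foundryvtt.com", search, answers, stop_on_nodata=False))
```

With stop_on_nodata=True the model gives up at foundryvtt.com.mylocaldomain.tld and never reaches foundryvtt.com, which matches the failure described here. With an ndots of 1, foundryvtt.com would be tried as-is first.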
Here are some links that helped me figure this out:
I could verify that this was a problem and that my fix worked using alpine/git and dig.
Before fix:
[jdmarble@jdmarble-desktop ~]$ kubectl run alpine-git --image=alpine/git --restart=Never -it --rm clone https://github.com/octocat/Spoon-Knife.git
fatal: unable to access 'https://github.com/octocat/Spoon-Knife.git/': Could not resolve host: github.com
...
(note that github.com did not resolve inside an Alpine based container inside of the cluster)
[jdmarble@jdmarble-desktop ~]$ dig github.com.mylocaldomain.tld
...
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26637
...
;; AUTHORITY SECTION:
mylocaldomain.tld. 1720 IN SOA cleo.ns.cloudflare.com. dns.cloudflare.com. ...
...
(note the NOERROR response)
After fix:
[jdmarble@jdmarble-desktop ~]$ kubectl run alpine-git --image=alpine/git --restart=Never -it --rm clone https://github.com/octocat/Spoon-Knife.git
Cloning into 'Spoon-Knife'...
remote: Enumerating objects: 16, done.
remote: Total 16 (delta 0), reused 0 (delta 0), pack-reused 16
Receiving objects: 100% (16/16), done.
Resolving deltas: 100% (3/3), done.
(note that github.com resolved inside an Alpine based container inside of the cluster)
[jdmarble@jdmarble-desktop ~]$ dig github.com.myinternaldomain.tld
...
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 56469
...
;; AUTHORITY SECTION:
myinternaldomain.tld. 1044 IN SOA cleo.ns.cloudflare.com. dns.cloudflare.com. ...
...
(note the NXDOMAIN response)
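If you want to run the same dig check against several domains, the distinction between the two outputs above can be automated. The following is only a sketch: the function name is invented, and it assumes the usual dig text layout with a "status:" field in the header and an ";; ANSWER SECTION:" line when records are returned.

```python
import re


def classify_dig_output(text):
    """Classify dig output as 'ANSWER', 'NODATA', or the non-NOERROR status.

    'NODATA' here means status NOERROR with no answer section, which is
    the response that was seen before the fix.
    """
    match = re.search(r"status:\s*([A-Z]+)", text)
    if match is None:
        raise ValueError("no status field found in dig output")
    status = match.group(1)
    if status != "NOERROR":
        return status
    return "ANSWER" if ";; ANSWER SECTION:" in text else "NODATA"


before_fix = """;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26637
;; AUTHORITY SECTION:
mylocaldomain.tld. 1720 IN SOA cleo.ns.cloudflare.com. dns.cloudflare.com."""
print(classify_dig_output(before_fix))
```

A result of NODATA for a name under your search domain is the case that would trip up the Alpine-based containers described above.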
In my case, it was an easy decision to disable DNSSEC because the domain is only used internally and I'm not using Cloudflare for normal records. If you want to keep DNSSEC on, you may have to get creative or switch away from Cloudflare.
from foundryvtt-docker.
I also have this networking issue in my k3s cluster. @jdmarble's repo worked :D
from foundryvtt-docker.
Sure thing. I'll test it this evening (or possibly tomorrow if I run out of time) and I'll post back here
from foundryvtt-docker.
In case this is helpful.
I noticed that felddy/foundryvtt:improvement-debian worked fine; however, the following errors appear in felddy/foundryvtt:latest:
Entrypoint | 2021-03-16 16:15:16 | [debug] Timezone set to: UTC
Entrypoint | 2021-03-16 16:15:16 | [info] Starting felddy/foundryvtt container v0.7.9
Entrypoint | 2021-03-16 16:15:16 | [debug] CONTAINER_VERBOSE set. Debug logging enabled.
Entrypoint | 2021-03-16 16:15:16 | [info] No Foundry Virtual Tabletop installation detected.
Entrypoint | 2021-03-16 16:15:16 | [info] Using FOUNDRY_USERNAME and FOUNDRY_PASSWORD to authenticate.
Authenticate | 2021-03-16 16:15:16 | [debug] Saving cookies to: cookiejar.json
Authenticate | 2021-03-16 16:15:16 | [info] Requesting CSRF tokens from https://foundryvtt.com
Authenticate | 2021-03-16 16:15:16 | [debug] Fetching: https://foundryvtt.com
Authenticate | 2021-03-16 16:15:16 | [error] Unable to authenticate: request to https://foundryvtt.com/ failed, reason: getaddrinfo ENOTFOUND foundryvtt.com
Results Locally
Unable to find image 'node:14-alpine' locally
14-alpine: Pulling from library/node
e95f33c60a64: Pull complete
0f691a8bb887: Pull complete
daf9b71c0a0d: Pull complete
d92a928c7b7d: Pull complete
Digest: sha256:a75f7cc536062f9266f602d49047bc249826581406f8bc5a6605c76f9ed18e98
Status: Downloaded newer image for node:14-alpine
Server: 8.8.8.8
Address: 8.8.8.8:53
Non-authoritative answer:
Name: foundryvtt.com
Address: 44.234.61.225
Non-authoritative answer:
inside k3s: (yaml included) (this also worked setting the dns server to 8.8.8.8)
apiVersion: batch/v1
kind: Job
metadata:
  name: hello
spec:
  template:
    # This is the pod template
    spec:
      containers:
      - name: dns-test
        image: node:14-alpine
        command: ['nslookup', 'foundryvtt.com']
      restartPolicy: OnFailure
---
Server: 10.43.0.10
Address: 10.43.0.10:53
Non-authoritative answer:
Non-authoritative answer:
Name: foundryvtt.com
Address: 44.234.61.225
from foundryvtt-docker.
I have not been able to fix this yet, but I suspect this may be an issue with CoreDNS.
Lookups for foundryvtt.com appear to be failing because passthrough does not seem to be working.
From the CoreDNS logs:
[INFO] 10.1.182.28:51321 - 64102 "A IN foundryvtt.com.svc.cluster.local. udp 50 false 512" NXDOMAIN qr,aa,rd 143 0.000390493s
[INFO] 10.1.182.28:51321 - 41623 "A IN foundryvtt.com.cluster.local. udp 46 false 512" NXDOMAIN qr,aa,rd 139 0.000535954s
[INFO] 10.1.182.28:51321 - 17998 "A IN foundryvtt.com.local. udp 38 false 512" SERVFAIL qr,rd,ra 113 0.03611267s
no lookups for foundryvtt.com though.
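To check a longer log for the same pattern, a small parser along these lines might help. It is a sketch that assumes the log line layout shown above; the regular expression and the function name are illustrative and not part of CoreDNS.

```python
import re
from collections import Counter

# Matches the quoted query ("A IN name. udp ...") and the rcode that follows it.
LOG_LINE = re.compile(r'"(?P<qtype>\S+) IN (?P<qname>\S+) [^"]*" (?P<rcode>[A-Z]+)')


def summarize(lines):
    """Count (query name, response code) pairs found in CoreDNS log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            counts[(match["qname"], match["rcode"])] += 1
    return counts


log = [
    '[INFO] 10.1.182.28:51321 - 64102 "A IN foundryvtt.com.svc.cluster.local. udp 50 false 512" NXDOMAIN qr,aa,rd 143 0.000390493s',
    '[INFO] 10.1.182.28:51321 - 41623 "A IN foundryvtt.com.cluster.local. udp 46 false 512" NXDOMAIN qr,aa,rd 139 0.000535954s',
    '[INFO] 10.1.182.28:51321 - 17998 "A IN foundryvtt.com.local. udp 38 false 512" SERVFAIL qr,rd,ra 113 0.03611267s',
]
for (qname, rcode), count in summarize(log).items():
    print(qname, rcode, count)
```

If no entry with the query name foundryvtt.com. shows up in the summary, the client stopped before trying the bare name.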
from foundryvtt-docker.
I'll test this again on my 3 k8s clusters with the Alpine image (my default), and update here and in the other thread too. I still have 8.8.8.8 configured in my CoreDNS, so I'll try both and edit this post.
My 3 clusters currently run this K8s version:
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:45:37Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.0", GitCommit:"c2b5237ccd9c0f1d600d3072634ca66cefdf272f", GitTreeState:"clean", BuildDate:"2021-08-04T20:01:24Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
Running CoreDNS k8s.gcr.io/coredns/coredns:v1.8.4
➜ k describe replicaset coredns-78fcd69978 -n kube-system
Name: coredns-78fcd69978
Namespace: kube-system
Selector: k8s-app=kube-dns,pod-template-hash=78fcd69978
Labels: k8s-app=kube-dns
pod-template-hash=78fcd69978
Annotations: deployment.kubernetes.io/desired-replicas: 2
deployment.kubernetes.io/max-replicas: 3
deployment.kubernetes.io/revision: 1
Controlled By: Deployment/coredns
Replicas: 2 current / 2 desired
Pods Status: 2 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: k8s-app=kube-dns
pod-template-hash=78fcd69978
Service Account: coredns
Containers:
coredns:
Image: k8s.gcr.io/coredns/coredns:v1.8.4
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
Priority Class Name: system-cluster-critical
Events: <none>
Confirmed with the same
Authenticate | 2022-01-24 19:52:07 | [error] Unable to authenticate: request to https://foundryvtt.com/auth/login/ failed, reason: getaddrinfo EAI_AGAIN foundryvtt.com
I have found something interesting that may solve the issue.
Though the call to dns.lookup() will be asynchronous from JavaScript's perspective, it is implemented as a synchronous call to getaddrinfo(3) that runs on libuv's threadpool. This can have surprising negative performance implications for some applications, see the UV_THREADPOOL_SIZE documentation for more information.
and from:
https://nodejs.org/api/cli.html#cli_uv_threadpool_size_size
more here: https://medium.com/@amirilovic/how-to-fix-node-dns-issues-5d4ec2e12e95
This solved my issue running 200 deployments.
from foundryvtt-docker.
Thanks for the research on this. I'm not entirely against switching the base image from Alpine to Debian. I'd like to give upstream a bit of time to resolve this before jumping ship.
@jdmarble what was the impact to the image size using Debian-slim?
from foundryvtt-docker.
I expected the Debian (even slim) based image to be larger than the Alpine one. I was surprised, although I'm not sure I can trust the results because I don't understand them. I'm getting different numbers depending on the source.
$ podman image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
registry.gitlab.com/jdmarble/foundryvtt-docker develop 6ad53b690aeb 3 days ago 106 MB
docker.io/felddy/foundryvtt latest e3706094d2a7 2 weeks ago 111 MB
The Gitlab repo reports for my "slim" spin: 32.56 MiB (edit: 34.14MB)
Your image size badge reports: 34 MB
Docker Hub reports the compressed image size for felddy/foundryvtt as 33.92 MB.
I tried pushing my image to Docker Hub to get an apples-to-apples, but it's taking a while to show up.
Maybe podman is reporting uncompressed size?
Regardless, I wouldn't suggest something as drastic as a base image change only to fix this type of problem, but a slightly smaller image size might be interesting (if it's true). :)
from foundryvtt-docker.
I think I'm being affected by this issue too, but in the weirdest way I could imagine. I've spent the last 4 hours debugging and searching lol. I'm spinning this up in Kubernetes.
Started when I got errors talking about rejected certs during the download process. I managed to get a shell into a container, and voila!
(all four commands ran in quick succession)
The 404s are from my own public-facing Traefik instance, and then it eventually curls correctly, at random. The next request was back to the 404s.
I'm going to try building the image myself from different bases like @jdmarble did, but this is just an impact report, I guess.
Edit: Bless you, jdmarble, you forked and pushed your port. May the coding gods smile upon you.
from foundryvtt-docker.
Update: Looks like that was unsuccessful. I was able to build the image successfully, but I still have the same problem. Sorry for the noise. Considering this may be unrelated, I can move my information to another ticket if you prefer.
from foundryvtt-docker.
I had hoped upstream would have fixed this issue in busybox, but that doesn't seem to be happening. Also, this is starting to affect more people.
I have started a branch using the node:14-slim base image:
https://github.com/felddy/foundryvtt-docker/tree/improvement/debian
I'm a little concerned about the size increase (but it is not a show stopper):
❱ docker images | grep foundry
felddy/foundryvtt 0.7.9-slim ce29f9a2bc03 44 minutes ago 195MB
felddy/foundryvtt 0.8.0 f676a803cfcb 3 weeks ago 126MB
felddy/foundryvtt release e3706094d2a7 2 months ago 103MB
felddy/foundryvtt release-0.7.9 38a78b0459a4 2 months ago 103MB
The bigger issue that I need to resolve is that only half of the architectures supported by Alpine are offered by Debian:
os/arch | node:14-alpine | node:14-slim |
---|---|---|
linux/amd64 | ✅ | ✅ |
linux/arm/v6 | ✅ | |
linux/arm/v7 | ✅ | ✅ |
linux/arm64/v8 | ✅ | ✅ |
linux/ppc64le | ✅ | |
linux/s390x | ✅ | |
I don't have any idea how many users this would impact. I'd guess that loss of arm/v6 would be the biggest impact. I know a good number of people run Foundry on Raspberry Pis and this would remove support for the RPi 1 B and RPi 1 B+.
In any case, if you'd like to test the image from this branch it is available to be pulled as felddy/foundryvtt:improvement-debian. I would appreciate any feedback from the folks on this issue since I don't have a K8s cluster readily available.
If you have any comments about the limited architectures, that would also be helpful.
from foundryvtt-docker.
Could I also get folks to try running this and post the results? I'm unable to reproduce the behavior here, and want to verify that it hasn't been fixed upstream:
❱ docker run -it --rm --dns 8.8.8.8 node:14-alpine nslookup foundryvtt.com
Server: 8.8.8.8
Address: 8.8.8.8:53
Non-authoritative answer:
Non-authoritative answer:
Name: foundryvtt.com
Address: 44.234.61.225
from foundryvtt-docker.
@annonch Those are promising results.
When you get a chance could you check if the nightly build is exhibiting the same behavior as the last release: felddy/foundryvtt:nightly
If node:14-alpine is working, I'd expect that felddy/foundryvtt:nightly should work as well.
🤞
from foundryvtt-docker.
Unfortunately I can't provide such good results. I'm running these in Kubernetes.
I just curled the Foundry website to test resolution. Here I used wc to condense the output: the 4-word result is the bad DNS resolution, and the 698-word result is the proper web page.
I tried this against improvement-debian, but the behavior is still there:
And against nightly it was all 4s. I didn't get a single good hit to the Foundry website.
Now, if I'm the only one here, I'm willing to concede that it's just my setup, this may be unrelated, and I'm just making noise 😆
I can work around by setting my DNS policy to None and manually assigning DNS servers.
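For anyone who wants to try the same workaround, here is a rough sketch of a pod manifest with the DNS policy set to None and the DNS servers assigned by hand. It builds the manifest in Python and prints it as JSON, which kubectl also accepts. The helper name, the pod name, the image tag, and the 8.8.8.8 nameserver are placeholder assumptions, not the exact configuration used here; dnsPolicy and dnsConfig are the standard pod spec fields. The optional ndots option corresponds to the ndots override mentioned earlier in this thread.

```python
import json


def pod_with_custom_dns(name, image, nameservers, ndots=None):
    """Build a Pod manifest that bypasses the cluster-provided resolv.conf.

    With dnsPolicy "None", only the settings in dnsConfig end up in the
    container's /etc/resolv.conf, so no search domains are added unless
    you list them yourself.
    """
    spec = {
        "containers": [{"name": name, "image": image}],
        "dnsPolicy": "None",
        "dnsConfig": {"nameservers": list(nameservers)},
    }
    if ndots is not None:
        # Option values are strings in the pod spec.
        spec["dnsConfig"]["options"] = [{"name": "ndots", "value": str(ndots)}]
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": spec,
    }


manifest = pod_with_custom_dns(
    "foundryvtt", "felddy/foundryvtt:release", ["8.8.8.8"], ndots=1
)
print(json.dumps(manifest, indent=2))
```

Note that with this policy the pod can no longer resolve in-cluster service names unless the cluster DNS address and search domains are added back to dnsConfig.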
from foundryvtt-docker.
@adam8797 how are you running your K8s? I have never had an issue with DNS resolution using the Alpine container.
I ran 1000 requests in a row using @felddy's example command and they all came out clean.
I know that with K8s, policies or security groups (if you are on AWS) can sometimes result in inconsistent DNS resolution. I'm running Foundry today on an RPi 4 with k3s, locally with composer, and on a server with KIND and k8s for development and testing.
If you have any other tests that I could run, please let me know.
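One more test that could be run from inside an affected container is a repeated lookup that counts failures, since several reports here describe intermittent rather than constant errors. The sketch below uses Python's socket.getaddrinfo, which goes through the container's libc resolver. The probe function and its parameters are made up for this example, and it assumes Python is available in the image being tested.

```python
import socket


def probe(hostname, attempts, resolver=socket.getaddrinfo):
    """Resolve hostname repeatedly and return (successes, failure messages).

    The resolver argument exists so the loop can be exercised without a
    network; by default the system resolver is used.
    """
    failures = []
    for _ in range(attempts):
        try:
            resolver(hostname, 443)
        except socket.gaierror as error:
            failures.append(str(error))
    return attempts - len(failures), failures


if __name__ == "__main__":
    ok, failures = probe("foundryvtt.com", 1000)
    print(f"{ok} lookups succeeded, {len(failures)} failed")
    for message in sorted(set(failures)):
        print("  ", message)
```

A mix of successes and failures would point at an intermittent resolver or network problem, while 100% failures would be more consistent with the search-domain behavior discussed earlier.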
from foundryvtt-docker.
I am also having this issue on a k8s cluster setup via kubeadm. This is the only container exhibiting the behavior and does so on both nightly and release-0.7.9. Not sure if it matters, but my k8s cluster is using CoreDNS and not kube-dns.
from foundryvtt-docker.
Hi @aetaric, how is the network on your clusters configured? I have seen problems with k8s and CoreDNS name resolution caused by security groups and firewall rules between the nodes.
I have never had issues in any of my environments, and my K8s development cluster, which runs on AWS with EKS, also uses CoreDNS and doesn't have the problem.
from foundryvtt-docker.
Well, I am using flannel as the backing network fabric. So no network policy antics should be going on. I am running in vxlan mode for communication between nodes so that might have something to do with it?
As for physical and logical networking, all k8s nodes are same VLAN, same ToR switch, same subnet.
As I mentioned before, other containers are able to resolve DNS without issue, and improvement-debian does seem to work, if not perfectly, well enough for the container to pull the app distribution and license info.
from foundryvtt-docker.
So I might have some insight into what the container is doing weird here. I was reviewing my DNS query logs and it seems the container is appending the search domain from DHCP options to the foundry address.
Got query for foundryvtt.com.k8s.domain.tld|A from 192.168.9.254:8517, relayed to 192.168.8.100:53
Got query for foundryvtt.com.k8s.domain.tld|AAAA from 192.168.9.254:62140, relayed to 192.168.8.100:53
from foundryvtt-docker.
I upgraded my K8s cluster to 1.22 and got this error for the first time.
Just to have it recorded here: the fix for me was to make sure CoreDNS forwards lookups to an external resolver by adding 8.8.8.8 to the ConfigMap:
Data
====
Corefile:
----
.:53 {
    errors
    health {
       lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       fallthrough in-addr.arpa ip6.arpa
       ttl 30
    }
    prometheus :9153
    forward . 8.8.8.8 /etc/resolv.conf {
       max_concurrent 1000
    }
    cache 30
    loop
    reload
    loadbalance
}
from foundryvtt-docker.
This issue has been automatically marked as stale because it has been inactive for 28 days. To reactivate the issue, simply post a comment with the requested information to help us diagnose this issue. If this issue remains inactive for another 7 days, it will be automatically closed.
from foundryvtt-docker.
This issue has been automatically closed due to inactivity. If you are still experiencing problems, please open a new issue.
from foundryvtt-docker.