Giter Site home page Giter Site logo

Comments (8)

dseomn avatar dseomn commented on May 31, 2024

It seems that using a fallback_upstream on localhost is irrelevant. After changing to --fallback_upstream=8.8.8.8:53, I still had the same issue. However, changing systemd-resolved to point (indirectly) at unbound instead of dnss (thereby breaking the loop if dnss was trying to use getaddrinfo()) seems to have fixed the problem. So I think dnss is completely ignoring the fallback configuration, maybe?

from dnss.

dseomn avatar dseomn commented on May 31, 2024

Since systemd-resolved had errors for . and google, I just tried running dnss with --fallback_domains=. google. dns.google., and the error went away. Is fallback_domains based on exact matches, or does it include subdomains? If it includes subdomains, then I just sent everything to the fallback_upstream which is pointless. But if it doesn't include subdomains, that might indicate that the problem is in interaction between the system resolver and dnss.

from dnss.

albertito avatar albertito commented on May 31, 2024

Thank you for the detailed bug report!

It's not immediately clear to me what's going on, since dnss has no special casing for systemd-resolved, fallback_upstream is used verbatim to resolve the domains in fallback_domains (but no subdomains, just exact matches).

I will take a closer look at this this week, and try to reproduce it.

In the meantime, if you can, you can use -monitoring_listen_addr to start a monitoring http server, and navigate to it. There you should be able to see the traces of the requests, including fallback requests, and those might give us a better clue of what's going on. Running dnss with -v=3 will log more detailed information (to the logs and the monitoring traces).

Thanks again!

from dnss.

dseomn avatar dseomn commented on May 31, 2024

Here's the table from http://localhost:9981/debug/requests?fam=dnsserver&b=0&exp=1 taken right after a SERVFAIL response. Sorry for the formatting, I couldn't think of an easy way to get that to render right on GitHub.

When Elapsed (s) .
2021/03/03 18:58:36.453939 4.001256 Handler
18:58:36.453947 . 8 ... from:[::1]:34608 id:28789
18:58:36.453960 . 13 ... Q: (google. DNSKEY IN)
18:58:36.453992 . 32 ... DoH POST https://dns.google/dns-query
18:58:40.454911 4.000920 ... POST failed: Post https://dns.google/dns-query: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2021/03/03 18:58:36.451806 0.000734 Handler
18:58:36.451813 . 7 ... from:[::1]:27757 id:11695
18:58:36.451830 . 16 ... Q: (dns.google. DS IN)
18:58:36.452393 . 564 ... used fallback upstream ([::1]:14653)
18:58:36.452407 . 13 ... ;; opcode: QUERY, status: NOERROR, id: 11695 ;; flags: qr rd ra ad cd;
18:58:36.452424 . 17 ... dns.google. 3600 IN DS 56044 8 2 1B0A7E90AA6B1AC65AA5B573EFC44ABF6CB2559444251B997103D2E40C351B08
18:58:36.452451 . 27 ... dns.google. 3600 IN RRSIG DS 8 2 3600 20210322234431 20210228234431 57286 google. Ll+Fc9J8fUZ245tfppW846OXRZCsVrET+5x4cIQHXu4UtTkDwPKPcIrEn4gdtrYqvYmUezxLFMbyOjkJpCf5VoeCmcN/Y/p5eW722vRufc4dsceT9EC5M9uqofJ5hfoIzOjAPC27dtqP04qfPu4SMTfLkzjlynQDtTlnPNQ7gxk=
2021/03/03 18:58:36.449200 0.001100 Handler
18:58:36.449210 . 10 ... from:[::1]:58097 id:5630
18:58:36.449239 . 29 ... Q: (dns.google. DNSKEY IN)
18:58:36.450031 . 793 ... used fallback upstream ([::1]:14653)
18:58:36.450051 . 19 ... ;; opcode: QUERY, status: NOERROR, id: 5630 ;; flags: qr rd ra cd;
18:58:36.450080 . 29 ... dns.google. 86400 IN DNSKEY 256 3 8 AwEAAZxHngRtF/7L/WllES2Blv7IAE5I7EUNzuYh77xkOoZNtVP9oyIl/VscbW5mWMIL78LTxVooIVyJH5jTIhsYwo8xO2UjdgkXxBak+W6Z3o0iAiFfUHhAZa+UuXeAiAoOfCDLy2bkuL5sl8ZKGmO8+xDxkOTPZXG7jo8mpmjYaq/3
18:58:36.450106 . 26 ... dns.google. 86400 IN DNSKEY 257 3 8 AwEAAbwcDvL6jj3SGrhFmpiqahBVGlEIrt5OoVmgCINPJozYpXozSsb0yVkK86vOGcaYZXFoaUisWp0s3GJ5pYD74CyHc1s0CIbGdzNOXthhH2rAhR2YgsnKJUmv8Odo1bFbTy9vzrWd3qQ/Mgp9fOFTE7Dkys2c9bpJmvw0WScdW4xIxTXHilwIhqmrxeNjyRlgb3C8r2NWxjj4sqcATN9BSgk4fLvvB7UneWqWgM8kSd29bdL907nsg7eSuzETwTAHkU2mGHdC5i3kvRDgTqPC2zwk+tF5Rau6x78XkZJm5pcyD0AYsmusJqjGa60FFe759CZQ0Y8K7cZPvom/cozR3lM=
18:58:36.450149 . 43 ... dns.google. 86400 IN RRSIG DNSKEY 8 2 86400 20210402170952 20210303170952 56044 dns.google. hCC6mAhmPe4lkfExd5uLEJvUrEPrxQa2ruBjbu+2RwsWV6Ez/E9Vq+EDnD5ULS88bU6BI5RiwJ1lrpO/X5P4bDB9motXJ7cxQQDMMsq9pPBqgkrR4zzonwFLyHhIBw6uDbKx8DbJu99ILIgaTJaAEDb8JQRTSshOosxFpSovtByyqfUsUleCN9e6ts08D8JCBUgQEeLHR9v8Kjbm1Z1MaHlLczV+Ct8DQ+Z6TjUwFRzrH6rewDNPfDzVgGcdGoPvgqvYEp1u7BE7I+12z6iW9jrZL305dBsrpy+UnxV8U/7WMgqELoJXv+LqYjZj13rbo9AnRK6ndfG21EJWo8uFlQ==
2021/03/03 18:58:35.907520 0.539393 Handler
18:58:35.907536 . 16 ... from:[::1]:30984 id:12199
18:58:35.907565 . 29 ... Q: (dns.google. AAAA IN)
18:58:36.446543 .538978 ... used fallback upstream ([::1]:14653)
18:58:36.446579 . 37 ... ;; opcode: QUERY, status: NOERROR, id: 12199 ;; flags: qr rd ra ad cd;
18:58:36.446608 . 29 ... dns.google. 900 IN AAAA 2001:4860:4860::8844
18:58:36.446640 . 32 ... dns.google. 900 IN AAAA 2001:4860:4860::8888
18:58:36.446691 . 51 ... dns.google. 900 IN RRSIG AAAA 8 2 900 20210402170952 20210303170952 1773 dns.google. MGGNXtS05I8OGA7Isg6PH+N7bvIfsLO/jCRA1qzhV8ep8l9t8sh0hZo/Y9H41S3Df1jtMVAqnmsorMiqZ9kkMfjG9BEoFNbrhy1E93QqRSi3EYW+sJ6Qi9sCi7u0jqH/jsTzYXG74Io0ezR2gTifDEZpDfJyValleT03TcMvPLQ=
2021/03/03 18:58:35.904871 0.542443 Handler
18:58:35.904883 . 11 ... from:[::1]:19787 id:17363
18:58:35.904935 . 53 ... Q: (dns.google. A IN)
18:58:36.447125 .542190 ... used fallback upstream ([::1]:14653)
18:58:36.447141 . 16 ... ;; opcode: QUERY, status: NOERROR, id: 17363 ;; flags: qr rd ra ad cd;
18:58:36.447156 . 15 ... dns.google. 900 IN A 8.8.8.8
18:58:36.447169 . 14 ... dns.google. 900 IN A 8.8.4.4
18:58:36.447198 . 28 ... dns.google. 900 IN RRSIG A 8 2 900 20210402170952 20210303170952 1773 dns.google. MbxtIILgzvZ0jOdKA4Oan9nRucHQ2KEj2hANZzMyATzxPNp7ZxEpf6xUUpxBT1e404jPcBfKfdcJPT7fT3AKzQzxzDmWPuPU8J6E3mvhNQQg0CGQknaTWXtMTwrm9qMUDTB8GsmkX/MfcwFdkCIX/dyZyMSYxBQJuHJD2RfOfPw=
2021/03/03 18:58:35.898867 4.001653 Handler
18:58:35.898912 . 44 ... from:[::1]:36245 id:14133
18:58:35.898948 . 36 ... Q: (google.com. A IN)
18:58:35.899187 . 239 ... DoH POST https://dns.google/dns-query
18:58:39.900213 4.001027 ... POST failed: Post https://dns.google/dns-query: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

I also tried adding DNSSEC=no to /etc/systemd/resolved.conf and that fixed the issue. So I think what's happening is that dnss is using the system resolver for dns.google.. systemd-resolved then queries dnss for dns.google. A and AAAA which succeeds, but it also queries dnss for DNSSEC-related records on google. and . which fails because those aren't in fallback_domains and dnss can't POST to https://dns.google/dns-query because it's still waiting for a response from systemd-resolved.

So if I'm understanding this correctly, I think that means there are 4 options:

  1. I could set --fallback_domains='. google. dns.google.' so that systemd-resolved can successfully do DNSSEC verification before dnss is able to POST to https://dns.google/dns-query.
  2. I could configure systemd-resolved with DNSSEC=no so that it only looks up A and AAAA on dns.google. without trying to query for google. or ..
  3. dnss could bypass the system resolver and talk to fallback_upstream directly.
  4. dnss could add a feature to specify the address to connect to for https_upstream separately from the hostname used for TLS SNI and the HTTP Host header. Maybe something like -https_upstream=https://dns.google/dns-query -https_upstream_address=8.8.8.8,8.8.4.4. Then the fallback flags wouldn't be necessary.

Do you think either (3) or (4) are desirable? Any idea how easy they would be? I don't really know any go, but I could potentially give it a shot if you want.

from dnss.

albertito avatar albertito commented on May 31, 2024

I think you're right. I spent a bit looking at the code and can confirm that dnss is using the system's default resolver to find upstream. Normally that request is either directed to dnss, and the fallback mechanism gets engaged, or it's a normal DNS lookup that returns a usable value; but in this case due to the systemd-resolved and dnssec interactions the loop isn't working properly. I don't know why it occasionally works, but I suspect that's due to some systemd-resolved behaviour, and it doesn't change the nature of the problem anyway.

As for the options, (3) is probably cleaner, reasonably safe to change, and should work in a wide range of circumstances. It's also the least surprising, I think. It's not straightforward to do cleanly due to how the code is structured, but it's doable.

For (4) I think it's also doable but much more brittle and prone to accidental issues on a sensitive area, so I rather avoid it. Note that on domains with certificates with IP fields, dnss works just fine. That's how for example https://1.1.1.1/dns-query can be used with dnss, because there is a valid certificate for it.

Until I have a change ready to try (which will take me a couple of days) as a workaround (1) should work just fine, and you won't lose dnssec resolution for most domains.

I'll let you know once there's a patch to try :)

Thanks a lot again for such detailed bug report and all the help in figuring out what's going on!

from dnss.

albertito avatar albertito commented on May 31, 2024

If you want, you can give commit 7d3ca40 a try, which is on the next branch.

It's a work in progress so it will be amended, but I did some ad-hoc tests and it seems to work fine, and in a way that should solve your issue since systemd-resolved should be totally bypassed now.

I will polish and extend it in the next few days, and will let you know once there's a more tested version; but just in case you want to give it a try early.

Thanks!

from dnss.

dseomn avatar dseomn commented on May 31, 2024

I don't know why it occasionally works, but I suspect that's due to some systemd-resolved behaviour, and it doesn't change the nature of the problem anyway.

Oh sorry, I think I figured that out already but didn't say anything because it didn't seem too relevant. While debugging initially, I kept changing settings and restarting dnss but not restarting systemd-resolved. I think systemd-resolved has its own cache, so if I got things working, it could cache dns.google. and then when I changed dnss's settings back to something that normally didn't work, it would sometimes work because of systemd-resolved's cache. Once I started running the below command between every change, everything was deterministic. (And yes, I'm aware that 4 separate DNS servers on one machine is a lot :) )

systemctl daemon-reload && service systemd-resolved restart && service dnsmasq restart && service unbound restart && service dnss restart

If you want, you can give commit 7d3ca40 a try, which is on the next branch.

That worked, thanks!

Thanks a lot again for such detailed bug report and all the help in figuring out what's going on!

Thanks for fixing it!

from dnss.

albertito avatar albertito commented on May 31, 2024

The fix for this issue is now in the master branch, thanks again!

from dnss.

Related Issues (11)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.