systemd-resolved and libvirt dnsmasq instance get into a busy loop when a query is issued for a URI or SRV record

Bug #1694156 reported by Steve Langasek on 2017-05-28
This bug affects 1 person
Affects            Status   Importance   Assigned to   Milestone
dnsmasq (Ubuntu)            Undecided    Unassigned
libvirt (Ubuntu)            Undecided    Unassigned
systemd (Ubuntu)            Undecided    Unassigned

Bug Description

I have a zesty system that uses systemd-resolved, as per default, which also has dnsmasq configured for use on interface virbr0 for my libvirt bridge.

This system is also part of a Kerberos realm. Recent versions of Kerberos do a lookup of a URI RR, à la:

$ nslookup -q=URI _kerberos.dodds.net
Server: 127.0.0.53
Address: 127.0.0.53#53

Non-authoritative answer:
*** Can't find _kerberos.dodds.net: No answer

Authoritative answers can be found from:

$

There is no URI DNS record published for this domain, so the empty answer is correct. However, systemd-resolved and dnsmasq then get into a busy loop, passing the same query back and forth between each other. (Confirmed with Wireshark.)

If I query SRV records under the same domain (which is also part of what Kerberos does), these positive results are correctly returned to the client, but systemd-resolved and dnsmasq again get into a busy loop.

If I query a URI record for a domain /other than/ what I have configured as my DNS search domain, there is no busy loop.

If I query other kinds of records (whether they return results or not), such as A, CNAME, and MX records, there is no busy loop.
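
To make the comparison between record types repeatable, the queries can be built by hand and sent straight to the stub resolver. The following is only a sketch: the helper names `build_query` and `send_query` are made up for illustration, the query ID is arbitrary, and the type codes are the IANA-registered values (URI is 256, SRV is 33).

```python
import socket
import struct

# RR type codes from the IANA DNS parameters registry.
QTYPES = {"A": 1, "CNAME": 5, "MX": 15, "SRV": 33, "URI": 256}


def build_query(name, qtype, query_id=0x1234):
    """Return a wire-format DNS query with one question (hypothetical helper)."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE followed by QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", QTYPES[qtype], 1)


def send_query(packet, server="127.0.0.53", port=53, timeout=2.0):
    """Send the query over UDP and return the raw reply (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, port))
        reply, _ = sock.recvfrom(4096)
        return reply
```

Calling `send_query(build_query("_kerberos.dodds.net", "URI"))` and then the same with `"A"` while a packet capture is running on lo and virbr0 should show whether the back-and-forth traffic appears only for the URI and SRV types.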

/etc/resolv.conf looks like:

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.
domain dodds.net
search dodds.net
nameserver 127.0.0.53
nameserver 192.168.122.1

systemd-resolve says:

$ systemd-resolve --status
Global
         DNS Servers: 192.168.122.1
          DNS Domain: dodds.net
          DNSSEC NTA: 10.in-addr.arpa
                      16.172.in-addr.arpa
                      168.192.in-addr.arpa
                      17.172.in-addr.arpa
                      18.172.in-addr.arpa
                      19.172.in-addr.arpa
                      20.172.in-addr.arpa
                      21.172.in-addr.arpa
                      22.172.in-addr.arpa
                      23.172.in-addr.arpa
                      24.172.in-addr.arpa
                      25.172.in-addr.arpa
                      26.172.in-addr.arpa
                      27.172.in-addr.arpa
                      28.172.in-addr.arpa
                      29.172.in-addr.arpa
                      30.172.in-addr.arpa
                      31.172.in-addr.arpa
                      corp
                      d.f.ip6.arpa
                      home
                      internal
                      intranet
                      lan
                      local
                      private
                      test

[...]
Link 2 (wlan2)
      Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
         DNS Servers: 192.168.0.1
          DNS Domain: dodds.net

$

I'm not sure where this bug lies. Both DNS servers are by design configured not to cache results, in order to avoid cache poisoning and information leaks, so neither DNS server can detect that they've already asked for the record and don't need to recurse. I think it's probably a bug for dnsmasq to be configured as a server for resolving 'dodds.net' - I have nowhere specified that this is appropriate and this potentially conflicts with legitimate records in this domain. But it also must be a bug that SRV/URI records result in recursion but A/CNAME/MX records do not.
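
The reasoning about two non-caching servers can be illustrated with a toy model. This is not how systemd-resolved or dnsmasq is implemented, and it does not explain why only SRV/URI queries loop; the `Forwarder` class and its `remember_in_flight` switch are hypothetical, and the hop limit exists only so the sketch terminates.

```python
class Forwarder:
    """Toy forwarder that never caches and passes every query upstream."""

    def __init__(self, name, remember_in_flight=False):
        self.name = name
        self.upstream = None
        self.remember_in_flight = remember_in_flight
        self.in_flight = set()

    def resolve(self, question, hops=0, max_hops=20):
        if hops >= max_hops:
            # Stand-in for "loops until something else stops it".
            return ("gave up", hops)
        if self.remember_in_flight:
            if question in self.in_flight:
                # The same question came back while still being resolved.
                return ("loop detected by " + self.name, hops)
            self.in_flight.add(question)
        try:
            return self.upstream.resolve(question, hops + 1, max_hops)
        finally:
            self.in_flight.discard(question)


def make_pair(remember_in_flight):
    """Wire two forwarders so that each uses the other as its upstream."""
    resolved = Forwarder("systemd-resolved", remember_in_flight)
    dnsmasq = Forwarder("dnsmasq", remember_in_flight)
    resolved.upstream = dnsmasq
    dnsmasq.upstream = resolved
    return resolved, dnsmasq
```

With `make_pair(False)` a query bounces between the two until the hop limit is hit; with `make_pair(True)` the first forwarder recognises its own outstanding question after two hops. The point is only that, without some per-query state, neither side has anything to compare an incoming query against.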

Steve Langasek (vorlon) on 2017-05-28
summary: systemd-resolved and libvirt dnsmasq instance get into a busy loop when
- a query is issued for a URI record
+ a query is issued for a URI or SRV record

Steve Langasek (vorlon) wrote:

Trying to understand why the dnsmasq is registered in /etc/resolv.conf at all, I find that it's listed in /etc/resolvconf/resolv.conf.d/tail. So 192.168.122.1 being listed as a global DNS server is a result of local configuration, which means this problem is at least partly self-inflicted.

If I remove this from /etc/resolvconf/resolv.conf.d/tail and restart systemd-resolved, I no longer see 192.168.122.1 listed at all in systemd-resolve --status. So there is no longer any DNS loop; OTOH, I also no longer get DNS resolution of the names of my VMs. While this works around the original symptom (which is still a bug somewhere, due to the correct handling of A/CNAME/MX but wrong handling of SRV/URI), there also needs to be a proper way to register libvirt's dnsmasq as an auxiliary DNS server for the VMs.
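
A quick way to spot this kind of local configuration is to check whether resolv.conf lists other nameservers alongside the systemd-resolved stub. The sketch below assumes the stub address 127.0.0.53 shown above; the function names are made up for illustration, and the comment handling is deliberately simplistic.

```python
STUB = "127.0.0.53"  # systemd-resolved stub address, as in the resolv.conf above


def nameservers(resolv_conf_text):
    """Return the addresses from all 'nameserver' lines, in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop anything after '#'; good enough for a sketch.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def servers_beside_stub(resolv_conf_text):
    """List nameservers configured in addition to the stub, if the stub is present."""
    servers = nameservers(resolv_conf_text)
    if STUB not in servers:
        return []
    return [server for server in servers if server != STUB]
```

Feeding it the contents of /etc/resolv.conf (for example via `open("/etc/resolv.conf").read()`) returns the extra entries; any address in that list that itself forwards back to the stub is a candidate for a loop.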

Hi Steve,
thanks for bringing that up and debugging that much already.
Regarding this issue, you more or less continued in bug 1694161.
You outlined there that the reason this hit you more than others is that you added an /etc/resolvconf/resolv.conf.d/tail to make libvirt's DNS resolution known on the host.
I like the suggestion you made there of a proper solution, with libvirt registering "correctly" with systemd-resolved.

I wondered whether I should close this bug as a dup of the other one.
But since that will take time to be worked on, upstreamed, and so on, I wondered if you have tried adding
"--dns-loop-detect" as a workaround for your setup.
It might break the DNS forwarding loops but retain the name resolution you wanted.

Do you think that is worth a try?
