resolved does not send .local requests to upstream DNS by default

Bug #2007728 reported by Frank Trampe
This bug affects 1 person
Affects: systemd (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

On a network with multiple DNS servers provided by DHCP, only the first two of which cover local names, resolved returns universally known names but fails to return the special names even when the "Current DNS Server" shown by `resolvectl status` returns the special names.

Suppose that 172.16.9.5 and 172.16.10.5 are the two internal DNS servers with the local names (Windows servers with Active Directory enabled, in this case). The DHCP server (a Cisco 4451 in this case) provides DNS servers 172.16.9.5, 172.16.10.5, 192.168.0.1, and 8.8.8.8. `resolvectl status` shows all of these as "DNS Servers" and 172.16.9.5 as the "Current DNS Server".

`host localdomain.local` returns SERVFAIL, and `host localdomain.local 127.0.0.53` returns SERVFAIL, but `host localdomain.local 172.16.9.5` returns the correct result. This all happens regardless of the "Current DNS Server".

Sometimes the "Current DNS Server" switches to 8.8.8.8 for reasons that are not clear even when the other servers are working properly, which seems to violate the principle of RFC 2132 section 3.8 that servers are listed in order of preference.

So, in short, it seems that the correct behavior is that (1) resolved returns results consistent with its "Current DNS Server" and (2) resolved picks as its "Current DNS Server" the first reachable server in the list. The current behavior is that (1) resolved returns results sometimes inconsistent with its "Current DNS Server" and (2) resolved sometimes picks as its "Current DNS Server" some server other than the first reachable server in the list. The first issue is consistently reproducible, and the second is readily reproducible in a short period of time.

The problem appears on Ubuntu 22.04 and seems not to be present on Ubuntu 18.04.

Revision history for this message
Nick Rosbrook (enr0n) wrote :

I believe this is because you are defining ".local" domains in your DNS server. According to [1], "lookups for domains with the ".local" suffix are not routed to DNS servers, unless the domain is specified explicitly as routing or search domain for the DNS server and interface. This means that on networks where the ".local" domain is defined in a site-specific DNS server, explicit search or routing domains need to be configured to make lookups work within this DNS domain. Note that these days, it's generally recommended to avoid defining ".local" in a DNS server, as RFC6762 reserves this domain for exclusive MulticastDNS use."

In other words, I think you can either (1) choose a different domain suffix, or (2) override the default behavior by configuring the Domains= property in resolved.conf[2].

[1] https://www.freedesktop.org/software/systemd/man/systemd-resolved.service.html#Protocols%20and%20Routing
[2] https://www.freedesktop.org/software/systemd/man/resolved.conf.html#Domains=
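
The rule quoted from the man page can be pictured with a small model. This is only a sketch of the documented behavior as quoted in this comment, not systemd-resolved's actual implementation; the function name `is_routed_to_dns` and the label-by-label suffix matching are assumptions made for illustration.

```python
from __future__ import annotations


def is_routed_to_dns(name: str, configured_domains: list[str]) -> bool:
    """Toy model: would a lookup for `name` be sent to a unicast DNS server?

    `configured_domains` stands for the search/routing domains configured
    for the DNS server and interface (a leading "~" marks a routing-only
    domain). This is a simplification, not resolved's real algorithm.
    """
    labels = name.rstrip(".").lower().split(".")
    if labels[-1] != "local":
        # Names outside .local are treated as ordinary DNS names here.
        return True
    for domain in configured_domains:
        suffix = domain.lstrip("~").rstrip(".").lower()
        if not suffix:
            # The root domain ("~.") is deliberately not treated as a match
            # for .local in this sketch; the thread does not cover that case.
            continue
        suffix_labels = suffix.split(".")
        # Compare whole labels so "notlocal" never matches "local".
        if labels[-len(suffix_labels):] == suffix_labels:
            return True
    return False


if __name__ == "__main__":
    print(is_routed_to_dns("localdomain.local", []))
    print(is_routed_to_dns("localdomain.local", ["~local"]))
```

Under this model, a `.local` name is only sent to the DNS servers once `local` (or a more specific `.local` subdomain) appears among the configured domains, which is the second option described above.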

Changed in systemd (Ubuntu):
status: New → Incomplete
Revision history for this message
Frank Trampe (frank-trampe) wrote :

The first two servers do indeed provide the .local domains. The possible violation of RFC 6762 does not explain the inconsistency of the results or the regression from Ubuntu 18.04 and Ubuntu 20.04. There is no case in which the correct behavior for a single configuration is to query the "Current DNS Server" for the .local name sometimes and mDNS other times. This also does not explain why the "Current DNS Server" selection sometimes fails to observe the order provided in the DHCP response. If resolved ignores the server ordering and the low-priority servers lack the internal names, even switching the suffix of the internal names is insufficient to provide the desired results. We have reverted the clients in question to Ubuntu 20.04 for now, and they work correctly.

Revision history for this message
Nick Rosbrook (enr0n) wrote :

I guess you are talking about two separate issues here. I was addressing the "fails to resolve .local domains" issue. Please open a separate bug report, including debug-level logs from systemd-resolved, for the "inconsistent DNS server selection" issue. Generally, once a DNS server fails in some way, resolved will switch to the next server in the list, and stick with that while it is working. So there may just be some errors while using the first two servers. Hence, debug-level logs from systemd-resolved would be helpful to diagnose that problem.

I will have to look closer at changes from 20.04 to 22.04, but at the moment I think the behavior WRT .local domains is working as documented.
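
The "switch to the next server and stick with it" behavior described in this comment can be sketched as follows. This is a toy model under stated assumptions, not resolved's code: the class name `StickyServerList`, the `report_failure` method, and the wrap-around to the start of the list are all hypothetical.

```python
from __future__ import annotations


class StickyServerList:
    """Toy model of sticky failover across an ordered DNS server list."""

    def __init__(self, servers: list[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self.servers = list(servers)
        self.index = 0  # start with the first (most preferred) server

    @property
    def current(self) -> str:
        return self.servers[self.index]

    def report_failure(self) -> str:
        """Advance to the next server after a failure and return it.

        Wrapping around to the first server is an assumption of this sketch.
        """
        self.index = (self.index + 1) % len(self.servers)
        return self.current

    def report_success(self) -> str:
        """A success changes nothing: the current server is kept."""
        return self.current


if __name__ == "__main__":
    # Server order as handed out by DHCP in the bug description.
    pool = StickyServerList(
        ["172.16.9.5", "172.16.10.5", "192.168.0.1", "8.8.8.8"]
    )
    for _ in range(3):
        pool.report_failure()
    print(pool.current)
```

In this model there is no step that returns to the first server once it recovers, so a few transient errors on the internal servers would be enough to leave a public server selected. Whether that is what happened here is exactly what the requested debug logs would show.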

Revision history for this message
Frank Trampe (frank-trampe) wrote :

Would you describe the "as documented" behavior? It still seems wacky to me that resolved returns the DNS result the majority of the time but not all of the time. If the design intent is to use only mDNS for .local domains, it ought to ignore DNS entirely for those domains. Inconsistent behavior means that a configuration can test as correct, fail in the field, fail to replicate the failure, and frustrate isolation of the problem. I think that the earlier behavior makes a lot more sense and would prefer to return to it.

Are you able to replicate the issue?

Given how closely the two possibly separate problems are related and their similar effects, I am inclined to wait on filing a second bug report on the server selection until it is clear that these are in fact separate issues. The fact that no other hosts on the network exhibit the problem (a highly symptomatic one since it breaks most services) suggests that this is not an issue of both internal servers failing at the same time.

Revision history for this message
Nick Rosbrook (enr0n) wrote :

I am referring to the section of [1] that states that "lookups for domains with the ".local" suffix are not routed to DNS servers, unless the domain is specified explicitly as routing or search domain for the DNS server and interface." But now I am confused by what you are saying. Initially you said:

> `host localdomain.local` returns SERVFAIL, and `host localdomain.local 127.0.0.53` returns SERVFAIL, but `host localdomain.local 172.16.9.5` returns the correct result. This all happens regardless of the "Current DNS Server".

But now you are saying systemd-resolved *sometimes* resolves .local domains correctly?

[1] https://www.freedesktop.org/software/systemd/man/systemd-resolved.service.html

Revision history for this message
Frank Trampe (frank-trampe) wrote :

Now that you mention it, I'm not sure. Something was definitely inconsistent, but the inconsistency may have been across different internal names rather than across requests on the same name, and it did not occur to me at the time that the .local names were in a different category. I will check tomorrow and report back.

Revision history for this message
Frank Trampe (frank-trampe) wrote :

Alright. The failure on a specific .local domain is consistent. I have not tested adding a non-".local" domain to the preferred name server, but your explanation that resolved now fully excludes .local from DNS queries makes sense. I still think that this is undesirable behavior since it breaks common legacy configurations without a clear indication of what the issue is and without an easy fix even for those who know what is broken.

Revision history for this message
Frank Trampe (frank-trampe) wrote :

I split the "Current DNS Server" issue into bug #2008964.

Revision history for this message
Nick Rosbrook (enr0n) wrote :

Have you tried explicitly setting .local as a search domain? That is, add it to Domains= in /etc/systemd/resolved.conf (or an override). I understand your frustration, but it is documented correctly on the systemd-resolved man page, and there is a documented workaround.

Also, I just checked, and this logic has been in systemd-resolved since v229, i.e. around 2015. Is it possible that you're missing some local configuration that made this work in your setup on previous Ubuntu releases?
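
The workaround suggested above amounts to a two-line drop-in file. As a minimal sketch, the helper below renders such a file; the function name `render_resolved_dropin` and the file name `local-dns.conf` are hypothetical, and the thread does not settle whether a global `Domains=` entry is sufficient when the DNS servers are supplied per link by DHCP or whether the domain has to be set on the link itself (for example with `resolvectl domain`).

```python
from __future__ import annotations


def render_resolved_dropin(domains: list[str]) -> str:
    """Render the text of a resolved.conf drop-in that sets Domains=.

    A leading "~" makes an entry a routing-only domain; without it the
    entry is also used as a search domain.
    """
    if not domains:
        raise ValueError("at least one domain is required")
    for domain in domains:
        if not domain or any(ch.isspace() for ch in domain):
            raise ValueError(f"invalid domain entry: {domain!r}")
    # Domains= takes a space-separated list inside the [Resolve] section.
    return "[Resolve]\nDomains=" + " ".join(domains) + "\n"


if __name__ == "__main__":
    # Intended destination (hypothetical file name):
    #   /etc/systemd/resolved.conf.d/local-dns.conf
    print(render_resolved_dropin(["~local"]), end="")
```

The rendered text would be placed under /etc/systemd/resolved.conf.d/ and systemd-resolved restarted afterwards. Passing `["local"]` instead of `["~local"]` matches the literal "search domain" wording of the suggestion.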

summary: - resolved results differ from those from its current upstream server.
+ resolved does not send .local requests to upstream DNS by default
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for systemd (Ubuntu) because there has been no activity for 60 days.]

Changed in systemd (Ubuntu):
status: Incomplete → Expired