missing EDNS0 record confuses systemd-resolved

Bug #1785383 reported by Steve Dodd on 2018-08-04
This bug affects 4 people
Affects            Status     Importance  Assigned to  Milestone
dnsmasq (Ubuntu)   Confirmed  Undecided   Unassigned
systemd (Ubuntu)   Confirmed  Undecided   Unassigned

Bug Description

dnsmasq (2.79 and earlier) omits the EDNS0 OPT record when returning an empty answer for a domain it is authoritative for. systemd-resolved seems to get confused by this in certain circumstances: when using the stub resolver and requesting an address for a name that has no AAAA records, resolution can sometimes hang for five seconds.

This is fixed by upstream commit http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=1682d15a744880b0398af75eadf68fe66128af78

Not sure if it is worth cherry picking? I imagine the most likely trigger will be dnsmasq on routers which are not likely to be running Ubuntu, but maybe just in case.

I also think there are some logic issues in systemd-resolved, upstream bug filed:

https://github.com/systemd/systemd/issues/9785

Simple-ish test case:

---
IFACE=dummy0
SUBNET=10.0.0

ip link add $IFACE type dummy
ifconfig $IFACE ${SUBNET}.1/24
dnsmasq -h -R -d -C /dev/null -2 $IFACE -z -i $IFACE -I lo --host-record=test.test,${SUBNET}.1 &

dig -t a test.test @${SUBNET}.1 | grep EDNS
# should return "; EDNS ..."
dig -t aaaa test.test @${SUBNET}.1 | grep EDNS
# again, should return "; EDNS ..." but doesn't
---
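
The grep check above can be wrapped in a small helper so that each query reports a clear result. This is only a sketch: the names `has_edns` and `check_type` are made up here, and it assumes dig prints its usual "; EDNS: ..." line in the OPT pseudosection whenever the reply carries an OPT record.

```shell
#!/bin/sh
# has_edns: read dig output on stdin; succeed if it contains the
# "; EDNS:" line that dig prints for a reply carrying an OPT record.
# (Hypothetical helper name, not part of dig or dnsmasq.)
has_edns() {
    grep -q '^; EDNS:'
}

# check_type: report whether a reply of the given type carried EDNS0.
# Arguments: record type, name, server address.
check_type() {
    if dig -t "$1" "$2" "@$3" | has_edns; then
        echo "$1: EDNS present"
    else
        echo "$1: EDNS missing"
    fi
}
```

Running `check_type a test.test 10.0.0.1` and then `check_type aaaa test.test 10.0.0.1` against an affected dnsmasq should, going by the test case above, report the OPT record as missing only for the AAAA query.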

To reproduce the systemd-resolved side of the problem:

---
# as above, but
# now configure systemd-resolved to look at only 10.0.0.1, then

systemd-resolve --reset-server-features
# should exhibit five second delay then connect, assuming sshd is running :)
ssh test.test
---
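
To make the delay visible without relying on sshd, the lookup can also be timed directly. A rough sketch, assuming `getent ahosts` goes through the systemd-resolved stub resolver on the test machine; the helper name `timed_lookup` is made up, and the 4-second threshold is an arbitrary choice just under the five seconds reported above (timing is in whole seconds only).

```shell
#!/bin/sh
# Threshold in seconds at or above which a run is reported as slow.
# (Arbitrary value chosen for illustration.)
THRESHOLD=${THRESHOLD:-4}

# timed_lookup: run the given command, discard its output, and report
# how many whole seconds it took. (Hypothetical helper name.)
timed_lookup() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    elapsed=$((end - start))
    if [ "$elapsed" -ge "$THRESHOLD" ]; then
        echo "slow: ${elapsed}s"
    else
        echo "ok: ${elapsed}s"
    fi
}
```

For example, `systemd-resolve --reset-server-features` followed by `timed_lookup getent ahosts test.test` would be expected to report a slow lookup when the hang occurs.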

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: dnsmasq-base 2.79-1
ProcVersionSignature: Ubuntu 4.15.0-23.25-generic 4.15.18
Uname: Linux 4.15.0-23-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.2
Architecture: amd64
Date: Sat Aug 4 11:33:56 2018
InstallationDate: Installed on 2018-05-31 (64 days ago)
InstallationMedia: Xubuntu 18.04 LTS "Bionic Beaver" - Release amd64 (20180426)
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
SourcePackage: dnsmasq
UpgradeStatus: No upgrade log present (probably fresh install)

Steve Dodd (anarchetic) wrote :

Amend to test case:

dnsmasq -h -R -d -C /dev/null -2 $IFACE -z -i $IFACE -I lo -S /test/ --host-record=test.test,${SUBNET}.1

Cannot reproduce bug in systemd 239, but would be good to know which commit fixed the problem for cherry picking purposes.

Steve Dodd (anarchetic) wrote :

On further investigation this seems to be specific to the Ubuntu version of systemd 237. I cannot reproduce it with the upstream release.

Steve Dodd (anarchetic) wrote :

Reverting the patch "resolved-Mitigate-DVE-2018-0001-by-retrying-NXDOMAIN-with.patch" solves this problem for me. My best guess is that the following patch segment changes some key logic:

@@ -388,12 +388,12 @@ static int dns_transaction_pick_server(DnsTransaction *t) {
         if (!server)
                 return -ESRCH;

-        /* If we changed the server invalidate the feature level clamping, as the new server might have completely
-         * different properties. */
-        if (server != t->server)
+        /* If we changed the server invalidate the current & clamp feature levels, as the new server might have
+         * completely different properties. */
+        if (server != t->server) {
                 t->clamp_feature_level = _DNS_SERVER_FEATURE_LEVEL_INVALID;
-
-        t->current_feature_level = dns_server_possible_feature_level(server);
+                t->current_feature_level = dns_server_possible_feature_level(server);
+        }

Note that it makes the assignment to current_feature_level conditional on the server having changed, whereas before it was unconditional. I don't know whether this was intentional.

Chris E (cbz) wrote :

In my opinion the log message from systemd also needs to be dropped. A number of systems use NXDOMAIN as a means of domain blocking/ad blocking, so this isn't an exceptional event that needs logging each time.

Arduous (samuel-progin) wrote :

Returning NXDOMAIN is the behavior of Adblock on Turris-os (a derivative of OpenWRT) with Knot resolver as back-end. I am of the same opinion as @cbz. For the moment I will limit the logging rate.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in dnsmasq (Ubuntu):
status: New → Confirmed
Changed in systemd (Ubuntu):
status: New → Confirmed