dnsmasq prematurely returns REFUSED, breaking resolver

Bug #1726017 reported by Martin Wilck on 2017-10-22
This bug affects 1 person
Affects             Importance  Assigned to
dnsmasq (Ubuntu)    Undecided   Unassigned
  Xenial            Undecided   Unassigned
  Zesty             Undecided   Unassigned

Bug Description

[Impact]

 * DNS name resolution fails in certain network configurations where
   different DNS servers are responsible for different domains and one or
   more servers reply REFUSED to queries for domains other than their own.
   Without the patch, dnsmasq returns a negative reply to the client as soon
   as a single such negative answer is received from a forwarder, even if
   other forwarders return valid responses.

   This breaks the resolver and practically all internet connectivity,
   including web browsing, email, and receiving updates.

 * This should be backported to stable to fix internet connectivity
   for users.

 * The patch fixes the problem by querying all servers and returning a
   negative reply to the requestor only if *all* forwarders return negative
   responses.
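A minimal sketch of the intended selection rule, written as a hypothetical shell function rather than dnsmasq's actual C code. It assumes each forwarder's reply has been reduced to its RCODE name, listed in arrival order, and that REFUSED and SERVFAIL are the only "negative" codes (matching the two upstream commits named later in this report); NXDOMAIN is treated as a usable answer here, which is also an assumption.

```shell
# Hypothetical model of the fixed forwarding decision (not dnsmasq's code).
# Arguments: the RCODE of each forwarder's reply, in the order they arrive.
pick_reply() {
  local rcode last=""
  for rcode in "$@"; do
    case "$rcode" in
      REFUSED|SERVFAIL)
        # Unsuccessful: remember it, but keep waiting for the other servers.
        last="$rcode"
        ;;
      *)
        # First usable answer (NOERROR, NXDOMAIN, ...) goes to the client.
        echo "$rcode"
        return 0
        ;;
    esac
  done
  # Every forwarder failed: only now return a negative reply (here simply
  # the last one seen, an arbitrary choice for this sketch).
  echo "$last"
}

# pick_reply REFUSED NOERROR   -> NOERROR  (the case from this bug)
# pick_reply REFUSED SERVFAIL  -> SERVFAIL (all forwarders failed)
```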

[Test Case]

 * It should be possible to test this in a virtual network. One DNS server
   should be responsible for queries to the outside world, and the other one
   could be a DHCP/DNS instance (perhaps also dnsmasq) that handles internal
   IP addresses and names. It's important that at least one of these servers
   returns REFUSED to queries that don't belong to its realm (assuming the
   domain name is "my.net", the server for "my.net" would reply REFUSED to
   "ubuntu.com" and every other domain). I am not sure whether this is
   normally the case; all I can say is that my Linux-based ASUS router does it.

   Connect an Ubuntu VM to this network.

   To aggravate the problem, have the DHCP server put the internal DNS
   server first in the nameservers field. In that case, the problem also
   occurs if the client uses "strict-order" in dnsmasq.conf.

[Regression Potential]

 * I don't see any. Would there be networks where admins rely upon getting
   NXDOMAIN back if just one server fails for a DNS query? I don't know.

 * [racb] As the behaviour in the area of REFUSED and SERVFAIL is being changed, it's probably worth checking during SRU verification that dnsmasq correctly passes back successful, REFUSED, SERVFAIL, zero-answer and 1+ answer responses in the simple, single upstream DNS server case. If there is a regression introduced by these patches, it is likely to be in the area of handling SERVFAIL, REFUSED and successful replies.
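One way to script this single-upstream check is to query the upstream server directly and through dnsmasq, then compare the status and answer count that dig reports. This is a sketch under assumptions: dnsmasq listens on 127.0.0.1, `UPSTREAM` is a hypothetical variable holding the forwarder's address, and you have set up names upstream that produce each response type.

```shell
# Hypothetical helper: print "STATUS ANSWERCOUNT" (e.g. "NOERROR 1") for a
# dig query; all arguments are passed straight to dig.
dig_summary() {
  dig "$@" | awk '
    /->>HEADER<<-/ {
      for (i = 1; i <= NF; i++) if ($i == "status:") { s = $(i + 1); sub(/,$/, "", s) }
    }
    /^;; flags:/ {
      for (i = 1; i <= NF; i++) if ($i == "ANSWER:") { a = $(i + 1); sub(/,$/, "", a) }
    }
    END { print s, a }'
}

# Compare the reply for NAME from the upstream (UPSTREAM, assumed to be set)
# with the reply dnsmasq on 127.0.0.1 passes back; return 1 on mismatch.
check_passthrough() {
  local name="$1" up="${UPSTREAM:?set UPSTREAM to the forwarder address}" direct via
  direct=$(dig_summary "$name" "@$up")
  via=$(dig_summary "$name" @127.0.0.1)
  if [ "$direct" = "$via" ]; then
    echo "OK   $name: $via"
  else
    echo "DIFF $name: upstream=[$direct] dnsmasq=[$via]"
    return 1
  fi
}

# Example, with one placeholder name per case (success, REFUSED, SERVFAIL,
# zero answers, 1+ answers):
# for n in ok.example refused.example servfail.example empty.example many.example; do
#   check_passthrough "$n"
# done
```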

[Other Info]

Original bug description follows.

Seen with dnsmasq 2.75-1ubuntu0.16.04.3, after Trusty->Xenial update.

In my local network, I have two DNS servers: 192.168.1.1 is the local DHCP/DNS server, configured to reply to queries inside the local network, and 192.168.1.4 is the forwarder in my DSL router, responsible for answering queries about the outside world. The DHCP server returns them in the order 192.168.1.4, 192.168.1.1. The internal server replies REFUSED to queries about external domains.

This configuration has worked well with Ubuntu 14.04 and other Linux Distros (using Fedora and OpenSUSE internally here), as well as various other OSes.

It does not work with Ubuntu 16.04. NetworkManager's dnsmasq instance passes the REFUSED reply from 192.168.1.1 on to applications and ignores the successful reply from 192.168.1.4. This causes all DNS queries for external names to fail.

I believe this is fixed in dnsmasq 2.76 and related to

http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2016q1/010263.html

http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commitdiff;h=68f6312d4bae30b78daafcd6f51dc441b8685b1e
http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=object;h=4ace25c5d6

According to these sources, the bug was introduced with
http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=object;h=51967f9807665dae403f1497b827165c5fa1084b

In my local setup at least, I can work around the problem by using the "strict-order" option to dnsmasq.

echo strict-order >/etc/NetworkManager/dnsmasq.d/order.conf

But that's not a general solution. If dnsmasq has several forwarders, and some return SERVFAIL or REFUSED and others return SUCCESS, the successful answer should be returned to clients, independent of the strict-order setting.

Hi Martin,
thanks for your great pre-analysis.
I agree with the general issue, and the changes are small enough to review, although they touch a lot of context.

We also need to be clear that two separate fixes are involved:
2.76: 4ace25c5d6: Treat REFUSED (not SERVFAIL) as an unsuccessful upstream response
2.77: 68f6312d4b: Stop treating SERVFAIL as a successful response from upstream servers.

So:
- Xenial is lacking both
- Zesty is lacking the second
- Artful is good

Since I rarely patch dnsmasq, I wanted to ask for a check before going to an SRU with this.
I provided a ppa at [1] with test builds for Xenial and Zesty.

If you could try those out that would be very kind.

[1]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3003

Changed in dnsmasq (Ubuntu):
status: New → Fix Released
Martin Wilck (mwilck) wrote :

I tried the package in my environment after removing the workaround I had used for the stock package, and it worked.

Thanks!

Thanks for the check, Martin.

I pushed merge proposals with my change so that one can double check for issues/mistakes.
Once they get approved this will enter the SRU [1] process.

If you could add an SRU Template [2] (at the head of the bug description) while my MPs are in review, that would be a great help (I can add the regression risk if you are unsure, but the "how to reproduce" part is often best written by the reporter).

[1]: https://wiki.ubuntu.com/StableReleaseUpdates
[2]: https://wiki.ubuntu.com/StableReleaseUpdates#SRU_Bug_Template

Changed in dnsmasq (Ubuntu Xenial):
status: New → In Progress
Changed in dnsmasq (Ubuntu Zesty):
status: New → In Progress
Martin Wilck (mwilck) on 2017-10-27
description: updated
Robie Basak (racb) on 2017-10-27
description: updated

Ok, thanks for the SRU template update Martin.
Thanks for the MP review Robie.

Yes Robie, on proposed verification we need to do a few extra things to be sure. Thanks for the note in the SRU Template.

I have uploaded it; it is now ready for review by the SRU Team.

Hello Martin, or anyone else affected,

Accepted dnsmasq into zesty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/dnsmasq/2.76-5ubuntu0.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-zesty to verification-done-zesty. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-zesty. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in dnsmasq (Ubuntu Zesty):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-zesty

@Martin - you were so active (thanks a lot) could you also verify that in Zesty-proposed?

@SRU-Team where is the Xenial SRU - any reason to block it that I missed?

Martin Wilck (mwilck) wrote :

@Christian: I can't do Zesty short term, I don't use it.


OK, testing Zesty on my own then; verified with three KVM guests:
dns1 192.168.122.79
dns2 192.168.122.225
zesty 192.168.122.220

# basic servers
$ sudo apt-get install bind9 bind9utils bind9-doc

/etc/bind/named.conf.local:
zone "paelzertest1.lan" {
        type master;
        file "/etc/bind/for.paelzertest1.lan";
 };
zone "1.168.192.in-addr.arpa" {
        type master;
        file "/etc/bind/rev.paelzertest1.lan";
 };

On dns2, use the same configuration with 2 instead of 1.

The forward and reverse zones likewise use 1 on dns1 and 2 on dns2:
/etc/bind/for.paelzertest1.lan:
$TTL 86400
@ IN SOA pri.paelzertest1.lan. root.paelzertest1.lan. (
        2011071001 ;Serial
        3600 ;Refresh
        1800 ;Retry
        604800 ;Expire
        86400 ;Minimum TTL
)
@ IN NS pri.paelzertest1.lan.
@ IN A 192.168.1.200
@ IN A 192.168.1.201
pri IN A 192.168.1.200
test IN A 192.168.1.201

/etc/bind/rev.paelzertest1.lan:
$TTL 86400
@ IN SOA pri.paelzertest1.lan. root.paelzertest1.lan. (
        2011071002 ;Serial
        3600 ;Refresh
        1800 ;Retry
        604800 ;Expire
        86400 ;Minimum TTL
)
@ IN NS pri.paelzertest1.lan.
@ IN PTR paelzertest1.lan.
pri IN A 192.168.1.200
test IN A 192.168.1.201
200 IN PTR pri.paelzertest1.lan.
201 IN PTR test.paelzertest1.lan.
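Since the dns2 files differ from dns1's only in the 1/2 suffixes, they can be derived with sed. This hypothetical helper assumes dns2 uses the 192.168.2.0/24 addresses seen in the dig output and the 2.168.192.in-addr.arpa reverse zone; check the result before installing it.

```shell
# Hypothetical helper: turn a dns1 (paelzertest1) file into its dns2 twin.
# Assumes dns2 serves 192.168.2.0/24 and the 2.168.192.in-addr.arpa zone.
to_zone2() {
  sed -e 's/paelzertest1/paelzertest2/g' \
      -e 's/192\.168\.1\./192.168.2./g' \
      -e 's/"1\.168\.192\.in-addr\.arpa"/"2.168.192.in-addr.arpa"/g'
}

# Example on dns2, with dns1's files copied into the current directory:
# to_zone2 < named.conf.local     | sudo tee /etc/bind/named.conf.local
# to_zone2 < for.paelzertest1.lan | sudo tee /etc/bind/for.paelzertest2.lan
# to_zone2 < rev.paelzertest1.lan | sudo tee /etc/bind/rev.paelzertest2.lan
```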

Disable recursion by adding the following inside the options { ... }; block of /etc/bind/named.conf.options:
allow-transfer {"none";};
allow-recursion {"none";};
recursion no;

$ sudo systemctl restart bind9

Now dns1 answers only for test.paelzertest1.lan, and dns2 refuses queries for it (and vice versa).
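The expected split can be checked for all four server/name combinations at once. A sketch, assuming the dns1/dns2 guest addresses listed above; `dns_status` and `check_split` are hypothetical helper names.

```shell
# Hypothetical helper: print the status dig reports (NOERROR, REFUSED, ...)
# for NAME queried directly at SERVER.
dns_status() {
  dig "$1" "@$2" </dev/null | sed -n 's/.*status: \([A-Z]*\),.*/\1/p'
}

# Each authoritative-only server should answer for its own zone and refuse
# the other one. Addresses are the dns1/dns2 guests listed above.
check_split() {
  local dns1=192.168.122.79 dns2=192.168.122.225 rc=0 name server want got
  while read -r name server want; do
    got=$(dns_status "$name" "$server")
    if [ "$got" = "$want" ]; then
      echo "ok   $name @$server -> $got"
    else
      echo "FAIL $name @$server -> ${got:-no reply} (want $want)"
      rc=1
    fi
  done <<EOF
test.paelzertest1.lan $dns1 NOERROR
test.paelzertest1.lan $dns2 REFUSED
test.paelzertest2.lan $dns1 REFUSED
test.paelzertest2.lan $dns2 NOERROR
EOF
  return "$rc"
}
```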

Example:
$ dig test.paelzertest1.lan @192.168.122.225

; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest1.lan @192.168.122.225
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 62119
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;test.paelzertest1.lan. IN A

;; Query time: 0 msec
;; SERVER: 192.168.122.225#53(192.168.122.225)
;; WHEN: Tue Nov 07 07:14:52 UTC 2017
;; MSG SIZE rcvd: 50

ubuntu@zesty-dnsmasq-test:~$ dig test.paelzertest2.lan @192.168.122.225

; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest2.lan @192.168.122.225
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37335
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;test.paelzertest2.lan. IN A

;; ANSWER SECTION:
test.paelzertest2.lan. 86400 IN A 192.168.2.201

;; AUTHORITY SECTION:
paelzertest2.lan. 86400 IN NS pri.paelzertest2.lan.

;; ADDITIONAL SECTION:
pri.paelzertest2.lan. 86400 IN A 192.168.2.200

;; Query time: 0 msec
;; SERVER: 192.168.122.225#53(192.168.122.225)
...


tags: added: verification-done verification-done-zesty
removed: verification-needed verification-needed-zesty
Andy Whitcroft (apw) wrote :

Hello Martin, or anyone else affected,

Accepted dnsmasq into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/dnsmasq/2.75-1ubuntu0.16.04.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in dnsmasq (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-xenial
removed: verification-done

dns1: 192.168.122.225
dns2: 192.168.122.226
xenial: 192.168.122.222

Set up as described in comment #8.

As before, results depend on the server order: queries for the name served by the second server mostly fail, while those for the first mostly work. One can use a loop like:

i=0; until dig test.paelzertest1.lan @127.0.0.1 | grep REFUSED; do echo $((i++)); done

For the name served by .225 this mostly reaches ~40 iterations, but for .226 more like ~0-6.
Either way, without the fix they all fail at some point.

After installing the fix from proposed I was able to do thousands of requests without any issue.
The SERVFAIL case works as well.
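The until-loop above stops at the first REFUSED. To compare runs before and after the fix, one can instead count failures over a fixed number of queries. `count_refused` is a hypothetical helper, and it assumes dnsmasq listens on 127.0.0.1 as in that loop.

```shell
# Hypothetical helper: send N queries (default 1000) for NAME to the local
# dnsmasq and report how many came back REFUSED.
count_refused() {
  local name="$1" n="${2:-1000}" refused=0 i=0
  while [ "$i" -lt "$n" ]; do
    if dig "$name" @127.0.0.1 | grep -q 'status: REFUSED'; then
      refused=$((refused + 1))
    fi
    i=$((i + 1))
  done
  echo "$refused/$n REFUSED for $name"
}

# Example: count_refused test.paelzertest1.lan 1000
```

Going by the results reported above, the unpatched package should show a nonzero count here, and the package from -proposed a count of 0.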

tags: added: verification-done verification-done-xenial
removed: verification-needed verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package dnsmasq - 2.76-5ubuntu0.2

---------------
dnsmasq (2.76-5ubuntu0.2) zesty; urgency=medium

  * Fix replying prematurely if one of many servers replies REFUSED
    (LP: #1726017) by adding an upstream patch.
    - 2.77: 68f6312d4b: Stop treating SERVFAIL as a successful response from
      upstream servers.

 -- Christian Ehrhardt <email address hidden> Mon, 23 Oct 2017 08:48:44 +0200

Changed in dnsmasq (Ubuntu Zesty):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for dnsmasq has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package dnsmasq - 2.75-1ubuntu0.16.04.4

---------------
dnsmasq (2.75-1ubuntu0.16.04.4) xenial; urgency=medium

  * Fix replying prematurely if one of many servers replies REFUSED
    (LP: #1726017) by adding two upstream patches.
    - 2.76: 4ace25c5d6: Treat REFUSED (not SERVFAIL) as an unsuccessful
      upstream response
    - 2.77: 68f6312d4b: Stop treating SERVFAIL as a successful response from
      upstream servers.

 -- Christian Ehrhardt <email address hidden> Mon, 23 Oct 2017 08:32:22 +0200

Changed in dnsmasq (Ubuntu Xenial):
status: Fix Committed → Fix Released