Duplicate/retried DNS queries fail with REFUSED (Fixed in upstream)

Bug #1981794 reported by Reuben Lifshay
This bug affects 1 person
Affects              Status        Importance  Assigned to  Milestone
dnsmasq (Ubuntu)     Fix Released  Undecided   Lena Voytek
  Jammy              Fix Released  Undecided   Lena Voytek
  Kinetic            Fix Released  Undecided   Lena Voytek

Bug Description

[Impact]

When a DNS query does not complete and the client retries it, dnsmasq refuses the subsequent copies of the query. The client immediately receives a REFUSED reply, and dnsmasq makes no further attempt to resolve the name.

Applying this fix stops dnsmasq from needlessly breaking connections, especially when the internet connection is flaky.

This bug is fixed by cherry-picking an upstream commit (https://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=2561f9fe0eb9c0be), which lets retried DNS requests through instead of refusing them.
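To make the failure mode concrete, here is a toy model in Python. It is not dnsmasq's code (the real logic lives in src/forward.c), and every name in it is hypothetical. Following the changelog's description, it assumes the forwarder tracks unanswered queries by client address, port and query ID, and that a loop guard answers REFUSED once every upstream server has been tried. In the buggy variant, a client retry is counted as though the current server had failed, so with a single upstream server the very first retry trips the guard.

```python
from __future__ import annotations

from dataclasses import dataclass, field

REFUSED = "REFUSED"


@dataclass
class ToyForwarder:
    """Hypothetical model of a strict-order forwarder; not dnsmasq code."""

    servers: list[str]
    buggy: bool  # True models the 2.86 behaviour, False the 2.87 fix
    # Index of the upstream server currently handling each unanswered query,
    # keyed by (client address, client port, query ID).
    pending: dict[tuple[str, int, int], int] = field(default_factory=dict)

    def handle_query(self, addr: str, port: int, qid: int) -> str:
        key = (addr, port, qid)
        if key not in self.pending:
            # New query: strict order starts with the first server.
            self.pending[key] = 0
            return f"forward to {self.servers[0]}"
        if self.buggy:
            # Bug (as modelled): a client retry advances to the next server
            # as though the current one had answered REFUSED.
            self.pending[key] += 1
            if self.pending[key] >= len(self.servers):
                # Loop guard fires: "all servers tried", so answer REFUSED.
                # The original forward stays pending, so its real answer can
                # still arrive afterwards (as in the capture in this report).
                return REFUSED
        # Re-send the query to the server currently responsible for it.
        return f"forward to {self.servers[self.pending[key]]}"

    def handle_reply(self, addr: str, port: int, qid: int) -> None:
        # An upstream answer completes the query.
        self.pending.pop((addr, port, qid), None)


for buggy in (True, False):
    fwd = ToyForwarder(["127.0.0.1#5353"], buggy=buggy)
    first = fwd.handle_query("10.42.0.16", 54248, 22442)
    retry = fwd.handle_query("10.42.0.16", 54248, 22442)
    print("2.86-like" if buggy else "fixed", "|", first, "|", retry)
```

With one upstream server, the buggy variant answers the retry with REFUSED, while the fixed variant simply forwards it again.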

[Test Plan]

The fix can be tested with two LXD containers: one running the fixed dnsmasq and one acting as a DNS server.

Start by setting up the DNS server container:

# lxc launch images:ubuntu/jammy dns-resolver
# lxc exec dns-resolver bash

# apt update && apt dist-upgrade -y
# systemctl disable systemd-resolved
# systemctl stop systemd-resolved
# unlink /etc/resolv.conf
# echo nameserver 8.8.8.8 | tee /etc/resolv.conf
# apt install net-tools dnsmasq -y
# systemctl enable dnsmasq

Find the container's IP address on the LXD network. Here ifconfig is used, which shows 10.62.42.157:

# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 10.62.42.157 netmask 255.255.255.0 broadcast 10.62.42.255
        ...

Now set up the test container:

# lxc launch images:ubuntu/jammy test-dnsmasq
# lxc exec test-dnsmasq bash

# apt update && apt dist-upgrade -y
# systemctl disable systemd-resolved
# systemctl stop systemd-resolved
# unlink /etc/resolv.conf

Point resolv.conf at the DNS server container's IP, followed by an address with no DNS server behind it (192.0.2.1 is reserved for documentation):

# echo "nameserver 10.62.42.157
nameserver 192.0.2.1" | tee /etc/resolv.conf

# apt install dnsmasq -y
# systemctl enable dnsmasq

On the DNS server container, set the nameserver to 127.0.0.1 so that queries from the test container fail:

# echo nameserver 127.0.0.1 | tee /etc/resolv.conf; systemctl restart dnsmasq

Now ping a known domain from the test container and, while it is running, set the DNS server container's nameserver back to 8.8.8.8:

# ping ubuntu.com

(Switch to the dns-resolver container.)

# echo nameserver 8.8.8.8 | tee /etc/resolv.conf; systemctl restart dnsmasq

Without the fix, ping never picks up the domain and keeps failing with:

ping: ubuntu.com: Temporary failure in name resolution

With the fix, ping picks up the now-successful responses:

PING ubuntu.com (185.125.190.29) 56(84) bytes of data.
64 bytes from website-content-cache-3.ps5.canonical.com (185.125.190.29): icmp_seq=1 ttl=48 time=165 ms
64 bytes from website-content-cache-3.ps5.canonical.com (185.125.190.29): icmp_seq=2 ttl=48 time=162 ms
64 bytes from website-content-cache-3.ps5.canonical.com (185.125.190.29): icmp_seq=3 ttl=48 time=166 ms
64 bytes from website-content-cache-3.ps5.canonical.com (185.125.190.29): icmp_seq=4 ttl=48 time=164 ms
64 bytes from website-content-cache-3.ps5.canonical.com (185.125.190.29): icmp_seq=5 ttl=48 time=163 ms
64 bytes from website-content-cache-3.ps5.canonical.com (185.125.190.29): icmp_seq=6 ttl=48 time=163 ms
^C
--- ubuntu.com ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5005ms

Note: ping is used here instead of a dedicated DNS tool such as dig because it keeps retrying resolution, sending duplicate query packets, even after receiving a REFUSED reply. Tools like dig give up immediately on REFUSED, so they never send the duplicate packets needed to reproduce the issue.

[Where problems could occur]

This change was added upstream in version 2.87, so it has not been tested in many situations alongside the rest of the 2.86 code. Allowing the retries through could lead to a flood of requests to remote DNS servers if the replies cannot make it back through dnsmasq to the user.

[Other Info]

This bug was fixed in Kinetic in version 2.86-1.1ubuntu2.

[Original Description]

Duplicate or retried DNS queries will return REFUSED for one of the queries causing intermittent failures in clients.

This probably breaks lots of things, but for me it is making 22.04's internet connection sharing unstable. It's particularly bad for my Xbox, which seems to like sending duplicate queries.

Here's an example capture:
22:37:25.308212 IP 10.42.0.16.54248 > 10.42.0.1.53: 22442+ A? title.auth.xboxlive.com. (41)
22:37:25.332711 IP 10.42.0.16.54248 > 10.42.0.1.53: 22442+ A? title.auth.xboxlive.com. (41)
22:37:25.332740 IP 10.42.0.1.53 > 10.42.0.16.54248: 22442 Refused 0/0/0 (41)
22:37:25.353003 IP 10.42.0.1.53 > 10.42.0.16.54248: 22442 2/0/0 CNAME title.auth.xboxlive.com.akadns.net., A 40.64.90.82 (105)
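A capture like this can also be checked mechanically. The sketch below assumes tcpdump's default one-line, non-verbose output for UDP DNS traffic, as shown here (for example from `tcpdump -ln udp port 53`); the function name is hypothetical. It counts query copies per client address, port and ID, treats any line carrying answer/authority/additional counts such as `0/0/0` as a reply, and flags REFUSED replies to IDs the client had sent more than once.

```python
import re
from collections import Counter

# One line of tcpdump's default (non-verbose) UDP/DNS output, e.g.
# "22:37:25.308212 IP 10.42.0.16.54248 > 10.42.0.1.53: 22442+ A? ..."
LINE = re.compile(
    r"^(?P<time>\S+) IP (?P<src>\S+)\.(?P<sport>\d+) > "
    r"(?P<dst>\S+)\.(?P<dport>\d+): (?P<qid>\d+)\S* (?P<rest>.*)$"
)
# Replies carry answer/authority/additional counts such as "0/0/0".
COUNTS = re.compile(r"\b\d+/\d+/\d+\b")


def refused_after_duplicate(lines):
    """Return (time, client, query ID, copies seen) for every REFUSED reply
    to a query ID that the client had sent more than once."""
    copies = Counter()  # (client addr, client port, qid) -> queries seen
    hits = []
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # not a DNS line in the expected format
        qid = int(m["qid"])
        if COUNTS.search(m["rest"]):
            # Reply: the client is the destination.
            key = (m["dst"], int(m["dport"]), qid)
            if "Refused" in m["rest"] and copies[key] > 1:
                hits.append((m["time"], f"{key[0]}:{key[1]}", qid, copies[key]))
        else:
            # Query: the client is the source.
            copies[(m["src"], int(m["sport"]), qid)] += 1
    return hits


capture = """\
22:37:25.308212 IP 10.42.0.16.54248 > 10.42.0.1.53: 22442+ A? title.auth.xboxlive.com. (41)
22:37:25.332711 IP 10.42.0.16.54248 > 10.42.0.1.53: 22442+ A? title.auth.xboxlive.com. (41)
22:37:25.332740 IP 10.42.0.1.53 > 10.42.0.16.54248: 22442 Refused 0/0/0 (41)
22:37:25.353003 IP 10.42.0.1.53 > 10.42.0.16.54248: 22442 2/0/0 CNAME title.auth.xboxlive.com.akadns.net., A 40.64.90.82 (105)
"""
for hit in refused_after_duplicate(capture.splitlines()):
    print(hit)  # ('22:37:25.332740', '10.42.0.16:54248', 22442, 2)
```

Classifying lines by content rather than by port also handles captures where both ends use port 53, as with scapy's default UDP ports.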

This was fixed upstream in September 2021, in the not-yet-released 2.87 version. It's apparently a regression introduced in version 2.86 (also released in September 2021). Ubuntu 22.04 and later all ship the broken 2.86 version.

Upstream fix:
https://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=2561f9fe0eb9c0be1df48da1e2bd3d3feaa138c2

Upstream bug thread:
https://www.mail-archive.com/search?l=dnsmasq-discuss%40lists.thekelleys.org.uk&q=subject:%22%5C%5BDnsmasq%5C-discuss%5C%5D+REFUSED+after+dropped+packets%22&o=oldest&f=1


Lena Voytek (lvoytek) wrote :

Hello,

Sorry we missed this bug for so long. I can confirm the issue is present in 22.04 and 22.10 and can be fixed with the commit you provided. I have attached a patch file that can be applied to 22.04. I will also mark this bug as confirmed.

Thanks!

Changed in dnsmasq (Ubuntu Jammy):
status: New → Confirmed
Changed in dnsmasq (Ubuntu Kinetic):
status: New → Confirmed
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "fix-dns-retry-confusion-jammy.patch" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Changed in dnsmasq (Ubuntu Kinetic):
assignee: nobody → Lena Voytek (lvoytek)
Changed in dnsmasq (Ubuntu Jammy):
assignee: nobody → Lena Voytek (lvoytek)
tags: added: server-todo
Reuben Lifshay (computator) wrote :

Awesome, thank you! Looking forward to it being released.

Lena Voytek (lvoytek) wrote :

Hello Reuben,

I've started working on getting this fix into Ubuntu. In the meantime, if you would like this fix in 22.04 you can get it from a ppa I made here: https://launchpad.net/~lvoytek/+archive/ubuntu/dnsmasq-fix-denied-dns-retries

To install it, run the following commands:

sudo add-apt-repository ppa:lvoytek/dnsmasq-fix-denied-dns-retries
sudo apt update
sudo apt upgrade

thanks!

Changed in dnsmasq (Ubuntu Kinetic):
status: Confirmed → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package dnsmasq - 2.86-1.1ubuntu2

---------------
dnsmasq (2.86-1.1ubuntu2) kinetic; urgency=medium

  * src/forward.c: Do not refuse retries from client DNS queries. Behaviour to
    stop infinite loops when all servers return REFUSED was wrongly activated
    on client retries, resulting in incorrect REFUSED replies to client
    retries. The code added here is a cherry pick released in upstream version
    2.87, originating at
    https://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=2561f9fe0eb9c0be
    (LP: #1981794)

 -- Lena Voytek <email address hidden> Fri, 30 Sep 2022 08:42:39 -0700

Changed in dnsmasq (Ubuntu Kinetic):
status: In Progress → Fix Released
Lena Voytek (lvoytek)
description: updated
Changed in dnsmasq (Ubuntu Jammy):
status: Confirmed → In Progress
Lena Voytek (lvoytek)
description: updated
Lena Voytek (lvoytek)
description: updated
Lena Voytek (lvoytek)
description: updated
Andreas Hasenack (ahasenack) wrote : Please test proposed package

Hello Reuben, or anyone else affected,

Accepted dnsmasq into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/dnsmasq/2.86-1.1ubuntu0.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in dnsmasq (Ubuntu Jammy):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-jammy
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (dnsmasq/2.86-1.1ubuntu0.2)

All autopkgtests for the newly accepted dnsmasq (2.86-1.1ubuntu0.2) for jammy have finished running.
The following regressions have been reported in tests triggered by the package:

netplan.io/0.105-0ubuntu2~22.04.1 (amd64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/jammy/update_excuses.html#dnsmasq

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Reuben Lifshay (computator) wrote :

I have tested version 2.86-1.1ubuntu0.2 and that seems to have fixed it.

The bug only appears when dnsmasq is forwarding queries in --strict-order mode and a duplicate request packet arrives before dnsmasq has replied to the original one. This is a procedure I came up with to reproduce and test that specific circumstance:

I am using a network namespace to simplify networking, avoid port conflicts, and maintain tcpdump's protocol decoding. This test process requires running and interacting with multiple processes at once, and so it's easiest to just open multiple terminal windows. I create the net namespace with the first terminal, and then join the created namespace with additional terminals.

To create the NS in the first terminal:

$ sudo unshare --net
# echo $$
> 12123
# ip addr add 127.0.0.1/8 dev lo
# ip link set up dev lo
#

The `echo $$` gives the shell PID (12123 in this example), which is also the namespace PID. This can optionally be verified with lsns:
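As an alternative to lsns, membership can be checked through /proc: on Linux, `/proc/<pid>/ns/net` is a symlink whose target has the form `net:[<inode>]`, and that inode is the same number lsns prints in its NS column. A minimal Python sketch follows; the helper names are hypothetical, and reading another process's link generally requires root, which these terminals already have.

```python
import os


def net_ns(pid="self"):
    """Return a process's network-namespace link target, "net:[<inode>]"."""
    return os.readlink(f"/proc/{pid}/ns/net")


def same_net_ns(pid_a, pid_b="self"):
    """True if both processes are in the same network namespace."""
    return net_ns(pid_a) == net_ns(pid_b)
```

For example, from a terminal joined with nsenter, `same_net_ns(12123)` should return True, while from an ordinary shell outside the namespace it should return False.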

# lsns -t net
> NS TYPE NPROCS PID USER NETNSID NSFS COMMAND
> 1234567890 net 2 12123 root unassigned -bash

To join the NS from subsequent terminals:

$ sudo nsenter -n -t 12123
#

Test process:

The first process to run is a fake upstream server for dnsmasq to forward queries to. We will run it on an alternate port (5353) and have it answer requests for a test domain.

# dnsmasq -d -p 5353 --no-resolv -A /example.com/192.0.2.1

The second process is the main dnsmasq process we actually want to test. It runs with --strict-order and forwards all requests to our fake upstream. We also use systemd-run to cap dnsmasq's CPU usage, slowing it down enough that duplicate requests queue up more easily.

# systemd-run --scope -p CPUQuota=1% -- dnsmasq -d --strict-order --no-resolv -S 127.0.0.1#5353

The third process is just tcpdump to monitor requests to and responses from dnsmasq. I have piped the output through grep to make it easier to analyze the results for refused packets, but you can also leave that off and search the output manually.

# tcpdump -ln udp port 53 | grep --line-buffered -iC 5 refused

The final process uses `scapy` to generate DNS packets with a duplicated request ID. Scapy can be installed from the default repositories as `python3-scapy`. Before we can start sending DNS packets, scapy has to be switched to raw-socket mode so that it works over the loopback interface.

# scapy
>>> conf.L3socket = L3RawSocket

Now we can send some dns packets with scapy. All the packets should have the same request id in them, so we set it to a static number (we picked 47 in this example). We have also set it to send 50 packets, which is usually enough to get a few packets queued before dnsmasq can process the previous ones.

>>> send(IP(dst='127.0.0.1')/UDP()/DNS(id=47,qd=DNSQR(qname='example.com')),count=50)

You can also choose to send packets continuously until cancelled with Ctrl-C.

>>> send(IP(dst='127.0.0.1')/UDP()/DNS(id=47,qd=DNSQR(qname='example.com')),loop=True)
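If scapy is not available, a similar flood can be sketched with only the Python standard library. This is an alternative, not the procedure tested above. It assumes dnsmasq is listening on 127.0.0.1:53 inside the namespace and reuses ID 47 and example.com from the scapy example; the function names are hypothetical. All copies leave from one ordinary UDP socket, so they share the client address, port and ID, and dnsmasq should treat them as retries of one query. No raw sockets are needed on loopback. The sketch tallies each reply's RCODE, so REFUSED answers show up directly.

```python
from __future__ import annotations

import socket
import struct
from collections import Counter

# DNS RCODE values from RFC 1035.
RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
               3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(qid: int, name: str, qtype: int = 1) -> bytes:
    """Encode a standard recursive query (type A by default, class IN)."""
    # Header: ID, flags with RD set, QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def parse_reply(data: bytes) -> tuple[int, str]:
    """Return (query ID, RCODE name) from the start of a reply header."""
    qid, flags = struct.unpack("!HH", data[:4])
    rcode = flags & 0x000F
    return qid, RCODE_NAMES.get(rcode, f"RCODE{rcode}")


def flood(server=("127.0.0.1", 53), qid=47, name="example.com",
          count=50, timeout=2.0) -> Counter:
    """Send `count` identical queries from one socket; tally reply RCODEs."""
    query = build_query(qid, name)
    tally = Counter()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # A single socket keeps one source port, so every copy shares the
        # client address, port and ID, i.e. looks like a retry.
        for _ in range(count):
            sock.sendto(query, server)
        try:
            while True:  # collect replies until none arrive for `timeout` s
                data, _ = sock.recvfrom(4096)
                tally[parse_reply(data)[1]] += 1
        except socket.timeout:
            pass
    return tally
```

Running `print(flood())` from one of the namespace terminals prints a Counter of RCODE names; a non-zero REFUSED count would correspond to the refused packets tcpdump reports.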

We can now check tcpdump for refused packets, which should show up anytime after there are two or more requests in a row with...


tags: added: verification-done-jammy
removed: verification-needed-jammy
Lena Voytek (lvoytek)
tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package dnsmasq - 2.86-1.1ubuntu0.2

---------------
dnsmasq (2.86-1.1ubuntu0.2) jammy; urgency=medium

  * src/forward.c: Do not refuse retries from client DNS queries. Behaviour to
    stop infinite loops when all servers return REFUSED was wrongly activated
    on client retries, resulting in incorrect REFUSED replies to client
    retries. The code added here is a cherry pick released in upstream version
    2.87, originating at
    https://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=2561f9fe0eb9c0be
    (LP: #1981794)

 -- Lena Voytek <email address hidden> Fri, 14 Oct 2022 14:39:41 -0700

Changed in dnsmasq (Ubuntu Jammy):
status: Fix Committed → Fix Released
Chris Halse Rogers (raof) wrote : Update Released

The verification of the Stable Release Update for dnsmasq has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
