DNS resolver mixes IPv6 and IPv4 caches

Bug #1716976 reported by gpothier
This bug affects 2 people
Affects: systemd (Ubuntu)
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned

Bug Description

In our network we have a DNS server that resolves some names to local addresses, while the same names resolve to our public IP when public DNS servers are used. For instance (using fictitious names and IPs), xyz.mydomain.com resolves to the public IP 65.254.242.180 when using an external DNS server, but to 192.168.0.14 when using our internal DNS server (which DHCP tells all our computers to use).

This used to work fine until a fairly recent update in Ubuntu 17.10. Now xyz.mydomain.com almost always resolves to the public IP instead of the internal one. Interestingly, restarting the systemd-resolved service fixes the problem for a while (from a few seconds to a few minutes): right after the restart, dig reports the expected internal IP, but after a while it goes back to reporting the public IP. Forcing dig to query our DNS server directly, instead of the local resolver, returns the correct IP.

ProblemType: Bug
DistroRelease: Ubuntu 17.10
Package: systemd 234-2ubuntu9
ProcVersionSignature: Ubuntu 4.12.0-13.14-generic 4.12.10
Uname: Linux 4.12.0-13-generic x86_64
ApportVersion: 2.20.7-0ubuntu1
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Wed Sep 13 13:34:50 2017
InstallationDate: Installed on 2015-01-23 (963 days ago)
InstallationMedia: Ubuntu-GNOME 14.04.1 LTS "Trusty Tahr" - Release amd64 (20140722.2)
MachineType: LENOVO 20266
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.12.0-13-generic.efi.signed root=UUID=eecad38d-4fff-462c-92bc-357fa12e5515 ro quiet splash vt.handoff=7
SourcePackage: systemd
UpgradeStatus: Upgraded to artful on 2017-06-15 (90 days ago)
dmi.bios.date: 03/30/2015
dmi.bios.vendor: LENOVO
dmi.bios.version: 76CN43WW
dmi.board.asset.tag: No Asset Tag
dmi.board.name: Yoga2
dmi.board.vendor: LENOVO
dmi.board.version: 31900058STD
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Lenovo Yoga 2 Pro
dmi.modalias: dmi:bvnLENOVO:bvr76CN43WW:bd03/30/2015:svnLENOVO:pn20266:pvrLenovoYoga2Pro:rvnLENOVO:rnYoga2:rvr31900058STD:cvnLENOVO:ct10:cvrLenovoYoga2Pro:
dmi.product.family: IDEAPAD
dmi.product.name: 20266
dmi.product.version: Lenovo Yoga 2 Pro
dmi.sys.vendor: LENOVO

gpothier (gpothier) wrote :

Possibly relevant: systemd-resolve --status eth2 always reports the correct internal DNS server, even while names are incorrectly resolved to their public IPs (I tried resolving with both dig and systemd-resolve).

gpothier@tadzim3:~$ systemd-resolve --status eth2
Link 3 (eth2)
      Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
         DNS Servers: 192.168.0.2
          DNS Domain: ozone.caligrafix.cl

Also, sudo systemd-resolve --flush-caches temporarily solves the problem, in the same way restarting the service does.

Dimitri John Ledkov (xnox) wrote :

The caches should be flushed each time the machine's networking changes. Does your system, e.g., bounce between a "public DNS" Wi-Fi network and an "internal DNS" Ethernet network?

The full output of $ systemd-resolve --status would be helpful, especially one taken "when everything works correctly" and one taken "when things are broken", to see whether there are any differences in the resolved state.

If that information is private, you may change the settings on this bug report to Private, such that it is only shared with Ubuntu developers and is not public.

gpothier (gpothier) wrote :

Output of systemd-resolve --status when the problem occurs

gpothier (gpothier) wrote :

Output of systemd-resolve --status when the problem does not occur

gpothier (gpothier) wrote :

I attached the output of systemd-resolve --status for both cases. There is no difference: in both cases it says the DNS server is 192.168.0.2 (our internal DNS server), although after a while it seems to be using another, external DNS server.

Indeed, the cache seems to be flushed when networking changes (e.g. when turning Ethernet off and back on through GNOME), so resolving works correctly for a while afterwards. But after a few dozen seconds it starts failing again (i.e. returning our public IP, as if it were using an external DNS server).

gpothier (gpothier) wrote :

It looks like this has been fixed, it is not occurring anymore.

gpothier (gpothier) wrote :

Sorry, sorry, it does still happen.

gpothier (gpothier) wrote :

This is still happening with 17.10 final. I have been digging a bit and found something that makes me think this is a caching / IPv6 issue. Attached is a screenshot of a Wireshark capture of the DNS packets on all interfaces of the affected machine (the machine's IP address is 192.168.0.154).

When querying a hostname that should be resolved to a local network address (in this case odoo.caligrafix.cl), the resolver makes two requests to our local DNS server 192.168.0.2 (and not to any external DNS server, as I first thought):
1. The request for odoo.caligrafix.cl
2. A request for o3.caligrafix.cl.

The second request is sent before the response to the first one arrives. It can be explained by the fact that, outside our network, odoo.caligrafix.cl is a CNAME for o3.caligrafix.cl, and for some reason the resolver uses this cached information instead of waiting for the result of the first request.

The response to the first request, which correctly contains the expected local network address, seems to be discarded; the result of the subsequent requests, which resolve to our public address through a chain of CNAMEs, is used instead.

Interestingly, right after the resolver's cache is flushed, the resolver also makes two requests to our local DNS server, but both for the name odoo.caligrafix.cl, and it gets the correct answer. Then it makes a request for the AAAA (IPv6) record and gets the chain of CNAME records that leads to our public IP. So it seems that the IPv6 and IPv4 cache entries somehow get mixed up afterwards.
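A toy Python model of the sequence in the capture may make the suspected mechanism easier to follow. It is not systemd-resolved's code. The cache is keyed by (name, record type); a CNAME learned while answering the AAAA query is followed for later A queries; and, as an assumption that would fit the "works for a while, then fails" pattern, the internal A record expires from the cache before the CNAME does. The addresses reuse the fictitious IPs from the bug description, and the TTLs are made up.

```python
from typing import Dict, Optional, Tuple


class ToyCache:
    """Toy DNS cache keyed by (name, record type), with a simulated clock."""

    def __init__(self) -> None:
        self.now = 0  # simulated time in seconds
        self.entries: Dict[Tuple[str, str], Tuple[str, int]] = {}

    def put(self, name: str, rrtype: str, value: str, ttl: int) -> None:
        # Store the value together with its absolute expiry time.
        self.entries[(name, rrtype)] = (value, self.now + ttl)

    def _get(self, name: str, rrtype: str) -> Optional[str]:
        entry = self.entries.get((name, rrtype))
        if entry is None or entry[1] <= self.now:
            return None  # missing or expired
        return entry[0]

    def lookup(self, name: str, rrtype: str, max_hops: int = 8) -> Optional[str]:
        # Try the requested type first; if it is missing or expired, follow a
        # cached CNAME. In this model a CNAME aliases the whole name, so one
        # learned while answering an AAAA query is also used for A queries.
        for _ in range(max_hops):
            value = self._get(name, rrtype)
            if value is not None:
                return value
            target = self._get(name, "CNAME")
            if target is None:
                return None
            name = target
        return None  # CNAME chain too long (or looping)


cache = ToyCache()
# A query, answered by the internal server (hypothetical 30 s TTL).
cache.put("odoo.caligrafix.cl", "A", "192.168.0.14", ttl=30)
# AAAA query: the answer carries the public CNAME (hypothetical 1 h TTL).
cache.put("odoo.caligrafix.cl", "CNAME", "o3.caligrafix.cl", ttl=3600)
# Public A record of the CNAME target, as fetched when the alias is followed.
cache.put("o3.caligrafix.cl", "A", "65.254.242.180", ttl=3600)

print(cache.lookup("odoo.caligrafix.cl", "A"))  # 192.168.0.14 (internal)
cache.now = 60  # the internal A record has expired; the CNAME has not
print(cache.lookup("odoo.caligrafix.cl", "A"))  # 65.254.242.180 (public)
```

Flushing the cache (or restarting the service) corresponds to emptying `entries`, which in this model restores the internal answer until the AAAA query caches the CNAME again.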

Although I guess I could (and will try to) mitigate the issue by configuring the AAAA record differently on our DNS server, I think the resolver's current behavior is incorrect, as it uses cached data from an AAAA (IPv6) query to answer an A (IPv4) query.
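A hypothetical Python model of why adding AAAA records on the internal server can work around the problem: if the internal server answers the AAAA query itself, the resolver never receives the public CNAME to cache. The model assumes the internal server falls back to the public data when it has no record of the requested type, which matches the CNAME chain seen for the AAAA query in the capture; the record layout and the address fd00::14 are made up for illustration.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str, str]  # (owner name, record type, value)
Zone = Dict[Tuple[str, str], List[Record]]  # (name, type) -> answer records

# Hypothetical public view: odoo.caligrafix.cl is a CNAME for o3.caligrafix.cl.
PUBLIC_VIEW: Zone = {
    ("odoo.caligrafix.cl", "A"): [
        ("odoo.caligrafix.cl", "CNAME", "o3.caligrafix.cl"),
        ("o3.caligrafix.cl", "A", "65.254.242.180"),
    ],
    ("odoo.caligrafix.cl", "AAAA"): [
        ("odoo.caligrafix.cl", "CNAME", "o3.caligrafix.cl"),
    ],
}


def answer(internal: Zone, name: str, rrtype: str) -> List[Record]:
    """Internal data wins; otherwise (assumption) fall back to the public view."""
    return internal.get((name, rrtype), PUBLIC_VIEW.get((name, rrtype), []))


def aaaa_yields_cname(internal: Zone, name: str) -> bool:
    """True if the AAAA answer would give the resolver a CNAME to cache."""
    return any(rtype == "CNAME" for _, rtype, _ in answer(internal, name, "AAAA"))


name = "odoo.caligrafix.cl"
without_aaaa: Zone = {(name, "A"): [(name, "A", "192.168.0.14")]}
with_aaaa: Zone = {**without_aaaa, (name, "AAAA"): [(name, "AAAA", "fd00::14")]}

print(aaaa_yields_cname(without_aaaa, name))  # True: public CNAME gets cached
print(aaaa_yields_cname(with_aaaa, name))     # False: AAAA answered locally
```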

gpothier (gpothier) wrote :
summary: - DNS resolver silently switches to an unknown DNS server
+ DNS resolver mixes IPv6 and IPv4 caches
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed
Abam (abams) wrote :

I have the same kind of errors: DNS resolution sometimes fails due to cache issues.

Is your bug similar to mine (https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1818527)?

gpothier (gpothier) wrote :

I'm currently on 18.10 and this bug seems to have been fixed. Abam, the bug you point to does not currently happen on my system, so maybe both have been fixed in 18.10.

Dan Streetman (ddstreet) wrote :

Please reopen if this is still an issue.

Changed in systemd (Ubuntu):
status: Confirmed → Won't Fix
gpothier (gpothier) wrote :

Hi, this is still happening on 20.04.3. The reason I thought it had been fixed is that I had implemented, and then forgotten about, the workaround I mentioned at the end of comment #9: I added AAAA records to our internal DNS server. If I remove them, the problem comes back.

gpothier (gpothier) wrote :

Launchpad does not let me reopen this bug, but it should be reopened.
