DNS resolver mixes IPv6 and IPv4 caches

Bug #1716976 reported by gpothier on 2017-09-13
This bug affects 1 person
Affects: systemd (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

In our network we have a DNS server that resolves some names to local addresses, while the same names resolve to our public IP when public DNS servers are used. For instance (using fictitious names and IPs), xyz.mydomain.com resolves to the public IP 65.254.242.180 when using an external DNS server, but resolves to 192.168.0.14 when using our internal DNS server (which all our computers are told to use via DHCP).

This used to work fine until a somewhat recent update in Ubuntu 17.10. Now, xyz.mydomain.com almost always resolves to the public IP instead of the internal IP. Interestingly, restarting the systemd-resolved service fixes the problem for a while (from a few seconds to a few minutes). Right after restarting the service, the dig command reports the expected internal IP, but after a while it goes back to reporting the public IP. Forcing the dig command to query our DNS server instead of the local resolver returns the correct IP.
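The comparison described above (local resolver versus the internal DNS server queried directly) could be scripted roughly as follows. This is only a sketch: it assumes dig is installed, that systemd-resolved's stub listener is at its default address 127.0.0.53, and the helper names parse_short, dig_short and answers_differ are made up for illustration.

```python
import subprocess

# Default address of the systemd-resolved stub listener (assumed unchanged).
STUB_RESOLVER = "127.0.0.53"


def parse_short(output):
    """Return the set of non-empty lines printed by `dig +short`."""
    return {line.strip() for line in output.splitlines() if line.strip()}


def dig_short(name, server, rtype="A"):
    """Run `dig +short @server name rtype` and return the answers as a set."""
    result = subprocess.run(
        ["dig", "+short", f"@{server}", name, rtype],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_short(result.stdout)


def answers_differ(stub_answers, direct_answers):
    """True when the local resolver and the DNS server disagree."""
    return stub_answers != direct_answers
```

For example, answers_differ(dig_short("xyz.mydomain.com", STUB_RESOLVER), dig_short("xyz.mydomain.com", "<internal DNS server>")) would be expected to return True while the problem is occurring and False right after a restart of systemd-resolved.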

ProblemType: Bug
DistroRelease: Ubuntu 17.10
Package: systemd 234-2ubuntu9
ProcVersionSignature: Ubuntu 4.12.0-13.14-generic 4.12.10
Uname: Linux 4.12.0-13-generic x86_64
ApportVersion: 2.20.7-0ubuntu1
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Wed Sep 13 13:34:50 2017
InstallationDate: Installed on 2015-01-23 (963 days ago)
InstallationMedia: Ubuntu-GNOME 14.04.1 LTS "Trusty Tahr" - Release amd64 (20140722.2)
MachineType: LENOVO 20266
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.12.0-13-generic.efi.signed root=UUID=eecad38d-4fff-462c-92bc-357fa12e5515 ro quiet splash vt.handoff=7
SourcePackage: systemd
UpgradeStatus: Upgraded to artful on 2017-06-15 (90 days ago)
dmi.bios.date: 03/30/2015
dmi.bios.vendor: LENOVO
dmi.bios.version: 76CN43WW
dmi.board.asset.tag: No Asset Tag
dmi.board.name: Yoga2
dmi.board.vendor: LENOVO
dmi.board.version: 31900058STD
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Lenovo Yoga 2 Pro
dmi.modalias: dmi:bvnLENOVO:bvr76CN43WW:bd03/30/2015:svnLENOVO:pn20266:pvrLenovoYoga2Pro:rvnLENOVO:rnYoga2:rvr31900058STD:cvnLENOVO:ct10:cvrLenovoYoga2Pro:
dmi.product.family: IDEAPAD
dmi.product.name: 20266
dmi.product.version: Lenovo Yoga 2 Pro
dmi.sys.vendor: LENOVO

gpothier (gpothier) wrote :

Maybe interesting: systemd-resolve --status eth2 always reports the correct, internal DNS server, even though names are incorrectly resolved to their public IPs (I tried resolving with both dig and systemd-resolve).

gpothier@tadzim3:~$ systemd-resolve --status eth2
Link 3 (eth2)
      Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
         DNS Servers: 192.168.0.2
          DNS Domain: ozone.caligrafix.cl

Also, sudo systemd-resolve --flush-caches temporarily solves the problem, in the same way restarting the service does.

Dimitri John Ledkov (xnox) wrote :

The caches should be flushed each time the machine changes networking =/ does your system e.g. bounce between a "public dns wifi network" and an "internal dns ethernet network"?

The full output of systemd-resolve --status would be helpful to see, especially both "when everything works correctly" and "when things are broken", to see if there are any differences in the resolved state.

If that information is private, you may change the settings on this bug report to Private, such that it is only shared with Ubuntu developers and is not public.

gpothier (gpothier) wrote :

Output of systemd-resolve --status when the problem occurs

gpothier (gpothier) wrote :

Output of systemd-resolve --status when the problem does not occur

gpothier (gpothier) wrote :

I attached the output of systemd-resolve --status in both cases. There is no difference. In both cases it says the DNS server is 192.168.0.2 (our local resolver), although it seems it is using another, external DNS server after a while.

Indeed the cache seems to be flushed when changing networking (e.g. turning ethernet off and back on through GNOME). Thus resolving works correctly for a while after changing networking. But after a few dozen seconds, it starts failing again (i.e. returning our public IP, as if it were using an external DNS server).

gpothier (gpothier) wrote :

It looks like this has been fixed, it is not occurring anymore.

gpothier (gpothier) wrote :

Sorry, sorry, it does still happen.

gpothier (gpothier) wrote :

This is still happening with 17.10 final. I have been digging a bit and found something that makes me think that this is a caching / IPv6 issue. Attached is the screenshot of a Wireshark capture of the DNS packets on all interfaces on the affected machine (the IP address of the machine is 192.168.0.154).

When querying a hostname that should be resolved to a local network address (in this case odoo.caligrafix.cl), the resolver makes two requests to our local DNS server 192.168.0.2 (and not to any external DNS server, as I first thought):
1. The request for odoo.caligrafix.cl
2. A request for o3.caligrafix.cl.

The second request is made before receiving the response to the first request. This second request can be explained by the fact that outside of our network, the name odoo.caligrafix.cl resolves to a CNAME o3.caligrafix.cl, and for some reason the resolver uses this cached information instead of waiting for the result of the first request.

The response to the first request, which correctly indicates the expected local network address, seems to be discarded, and the result of the subsequent requests, which resolve to our public address through a chain of CNAMEs, is used instead.

The funny thing is that after flushing the resolver's cache, the resolver also makes two requests to our local DNS server, but both with the name odoo.caligrafix.cl, and gets the correct answer. But then it makes a request for the AAAA (IPv6) record, and gets the chain of CNAME records that leads to our public IP. So it seems that somehow the IPv6 and IPv4 caches get mixed up afterwards.

Although I guess I could (and will attempt to) mitigate the issue by configuring the AAAA record differently on our DNS server, I think the current behavior of the resolver is incorrect, as it uses cached info for an IPv6 record when querying an IPv4 record.
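To make the suspected mechanism concrete, here is a toy model of it in Python. It is not systemd-resolved's actual code, only a sketch of the hypothesis: a cache that remembers a CNAME per name (regardless of which record type the query was for) versus one that remembers it per (name, type). The class ToyCache, the function replay, and the addresses 192.168.0.14 and 65.254.242.180 are illustrative placeholders.

```python
class ToyCache:
    """Toy DNS cache; key_by_type selects how cached CNAMEs are keyed."""

    def __init__(self, key_by_type):
        self.key_by_type = key_by_type
        self.cnames = {}      # CNAME key -> canonical name
        self.addresses = {}   # (name, rtype) -> address

    def _cname_key(self, name, rtype):
        # Keyed by name only: a CNAME learned from an AAAA answer is
        # also applied to later A lookups for the same name.
        return (name, rtype) if self.key_by_type else name

    def store_cname(self, name, rtype, target):
        self.cnames[self._cname_key(name, rtype)] = target

    def store_address(self, name, rtype, address):
        self.addresses[(name, rtype)] = address

    def lookup(self, name, rtype):
        seen = set()
        # Follow cached CNAMEs first, guarding against loops.
        while self._cname_key(name, rtype) in self.cnames and name not in seen:
            seen.add(name)
            name = self.cnames[self._cname_key(name, rtype)]
        return self.addresses.get((name, rtype))


def replay(cache):
    """Replay the sequence seen in the capture and look up the A record."""
    # 1. The A query is answered directly with the local address.
    cache.store_address("odoo.caligrafix.cl", "A", "192.168.0.14")
    # 2. The AAAA query is answered with a CNAME to o3.caligrafix.cl.
    cache.store_cname("odoo.caligrafix.cl", "AAAA", "o3.caligrafix.cl")
    # 3. o3.caligrafix.cl later resolves to the public address.
    cache.store_address("o3.caligrafix.cl", "A", "65.254.242.180")
    return cache.lookup("odoo.caligrafix.cl", "A")
```

With these assumptions, replay(ToyCache(key_by_type=False)) returns the public address, matching the symptom, while replay(ToyCache(key_by_type=True)) returns the local one.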

gpothier (gpothier) wrote :
summary: - DNS resolver silently switches to an unknown DNS server
+ DNS resolver mixes IPv6 and IPv4 caches