[SRU] systemd-resolved negative caching for extended period of time

Bug #1668771 reported by jowfdoijdfdwfwdf on 2017-02-28
This bug affects 5 people
Affects             Status                  Importance  Assigned to       Milestone
systemd             New                     Unknown
systemd (Ubuntu)    Status tracked in Eoan
  Bionic                                    High        Jorge Niedbalski
  Disco                                     High        Jorge Niedbalski
  Eoan                                      High        Jorge Niedbalski

Bug Description

[Impact]

* If a DNS lookup returns SERVFAIL, systemd-resolved appears to cache the result for a very long time (indefinitely?). I have to restart systemd-resolved to purge the negative cache entry.

* After the DNS server issue causing the SERVFAIL has been resolved, Chromium/Firefox still return a DNS error, even though the host can resolve the name correctly.

[Test Case]

* If a lookup returns SERVFAIL, systemd-resolved caches the result for 30s (see 201d995).
However, there are several use cases in which this is not acceptable (see the comments on #5552),
and the only workarounds are to disable the cache entirely or to flush it, neither of which is optimal.

* Configure /etc/systemd/resolved.conf as follows:

Cache=yes (default)

* Restart systemd-resolved (systemctl restart systemd-resolved.service)

* Run a host/getent command against an entry that returns SERVFAIL, then check the journalctl output to confirm that the reply is served from cache.

root@systemd-disco:/home/ubuntu# host www.no-record.cl
Host www.no-record.cl not found: 2(SERVFAIL)
root@systemd-disco:/home/ubuntu# journalctl -u systemd-resolved -n
-- Logs begin at Fri 2019-07-12 18:09:42 UTC, end at Tue 2019-07-23 15:10:17 UTC. --
Jul 23 15:10:10 systemd-disco systemd-resolved[1282]: Transaction 6222 for <ntp.ubuntu.com IN AAAA> on scope dns on ens3/* now complete with <success>
Jul 23 15:10:10 systemd-disco systemd-resolved[1282]: Sending response packet with id 61042 on interface 1/AF_INET.
Jul 23 15:10:10 systemd-disco systemd-resolved[1282]: Freeing transaction 6222.
Jul 23 15:10:17 systemd-disco systemd-resolved[1282]: Got DNS stub UDP query packet for id 53580
Jul 23 15:10:17 systemd-disco systemd-resolved[1282]: Looking up RR for www.no-record.cl IN A.
Jul 23 15:10:17 systemd-disco systemd-resolved[1282]: RCODE SERVFAIL cache hit for www.no-record.cl IN A
Jul 23 15:10:17 systemd-disco systemd-resolved[1282]: Transaction 58570 for < www.no-record.cl IN A> on scope dns on ens3/* now complete with <rcode-fai
Jul 23 15:10:17 systemd-disco systemd-resolved[1282]: Freeing transaction 58570.
Jul 23 15:10:17 systemd-disco systemd-resolved[1282]: Sending response packet with id 53580 on interface 1/AF_INET.
Jul 23 15:10:17 systemd-disco systemd-resolved[1282]: Processing query...
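The "served from cache" check in the last step can be automated by scanning journal output for the messages resolved prints when it hits, adds, or skips a SERVFAIL cache entry. Below is a minimal Python sketch, assuming the message wording matches the debug log excerpts quoted in this report (other systemd versions may phrase them differently); `classify_servfail_lines` is a hypothetical helper, not part of systemd.

```python
import re
from typing import Iterable, List, Tuple

# Message patterns copied from the debug logs quoted in this bug report.
# Assumption: other systemd versions may word these messages differently.
PATTERNS = {
    "cache_hit": re.compile(r"RCODE SERVFAIL cache hit for (\S+) IN (\S+)"),
    "cached": re.compile(r"Added SERVFAIL cache entry for (\S+) IN (\S+)"),
    "not_cached": re.compile(r"Not caching negative entry for: (\S+) IN (\S+),"),
}


def classify_servfail_lines(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (event, name, rr_type) for each journal line about SERVFAIL caching."""
    events = []
    for line in lines:
        for event, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                events.append((event, match.group(1), match.group(2)))
                break  # a single journal line describes at most one caching event
    return events
```

For example, the lines of `journalctl -u systemd-resolved -n 100` (read from a pipe via `sys.stdin`) can be passed straight to the function; with `Cache=yes` a repeated lookup should produce a `cache_hit` event, while with `Cache=no-negative` only `not_cached` events should appear.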

[Regression Potential]

* The existing values (yes/no) are unchanged and the default remains yes,
so the default behaviour is the same as before. Setting the option to no-negative
skips caching of any negative answer.

* No regression potential has been identified, as this only introduces
a new possible value for the Cache= configuration directive.
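The tri-state semantics described above can be modelled as a small enum with a parser that falls back to the default. This is a hypothetical Python sketch, not systemd's C implementation: `CacheMode` and `parse_cache_mode` are illustrative names, and only the three spellings used in this report are accepted (the real parser may also accept other boolean spellings).

```python
from enum import Enum
from typing import Optional


class CacheMode(Enum):
    """Hypothetical model of the values accepted by Cache= in resolved.conf."""
    NO = "no"                    # cache nothing
    YES = "yes"                  # cache positive and negative answers (default)
    NO_NEGATIVE = "no-negative"  # cache positive answers only


def parse_cache_mode(value: Optional[str]) -> CacheMode:
    """Map a Cache= value to a CacheMode; an unset or empty value keeps the default (yes)."""
    if value is None or value.strip() == "":
        return CacheMode.YES
    try:
        return CacheMode(value.strip())
    except ValueError:
        raise ValueError(f"invalid Cache= value: {value!r}") from None
```

Keeping "yes" as the fallback is what makes the change backwards compatible: configurations that never set Cache= behave exactly as before.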

[Fix]

With the Cache option set to 'no-negative', negative DNS answers
are never cached.

root@systemd-disco:/home/ubuntu# host www.metaklass.org
Host www.metaklass.org not found: 2(SERVFAIL)

* Look at the systemd-resolved journal entries:
root@systemd-disco:/home/ubuntu# journalctl -u systemd-resolved -n
-- Logs begin at Fri 2019-07-12 18:09:42 UTC, end at Fri 2019-07-12 18:48:31 UTC. --
Jul 12 18:48:31 systemd-disco systemd-resolved[2635]: Cache miss for www.metaklass.org IN A
Jul 12 18:48:31 systemd-disco systemd-resolved[2635]: Transaction 22382 for <www.metaklass.org IN A> scope dns on ens3/.
Jul 12 18:48:31 systemd-disco systemd-resolved[2635]: Using feature level UDP for transaction 22382.
Jul 12 18:48:31 systemd-disco systemd-resolved[2635]: Sending query packet with id 22382.
Jul 12 18:48:31 systemd-disco systemd-resolved[2635]: Processing incoming packet on transaction 22382 (rcode=SERVFAIL).
Jul 12 18:48:31 systemd-disco systemd-resolved[2635]: Server returned error: SERVFAIL
Jul 12 18:48:31 systemd-disco systemd-resolved[2635]: Not caching negative entry for: www.metaklass.org IN A, cache mode set to no-negative
Jul 12 18:48:31 systemd-disco systemd-resolved[2635]: Transaction 22382 for <www.metaklass.org IN A> on scope dns on ens3/ now complete with from network (unsigned).
Jul 12 18:48:31 systemd-disco systemd-resolved[2635]: Sending response packet with id 31060 on interface 1/AF_INET.

The following patch https://github.com/systemd/systemd/pull/13047 implements the required changes.
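The difference between the Cache=yes and Cache=no-negative logs comes down to one decision at store time. Below is a minimal Python model of that decision, assuming a fixed 30-second lifetime for SERVFAIL entries (as in the "Added SERVFAIL cache entry ... 30s" log line). `NegativeAwareCache`, its methods and the string modes are hypothetical and do not mirror systemd's internals; as a simplification, only SERVFAIL and NXDOMAIN count as negative here, and other answers simply use the TTL passed in.

```python
import time
from typing import Callable, Dict, Optional, Tuple

SERVFAIL_TTL = 30.0  # seconds; matches the "Added SERVFAIL cache entry ... 30s" log line
NEGATIVE_RCODES = frozenset({"SERVFAIL", "NXDOMAIN"})  # simplification: ignores NODATA


class NegativeAwareCache:
    """Toy resolver cache illustrating Cache=yes vs Cache=no-negative."""

    def __init__(self, mode: str = "yes", clock: Callable[[], float] = time.monotonic):
        if mode not in ("yes", "no", "no-negative"):
            raise ValueError(f"invalid mode: {mode!r}")
        self.mode = mode
        self.clock = clock
        # (name, rr_type) -> (rcode, expiry timestamp)
        self._entries: Dict[Tuple[str, str], Tuple[str, float]] = {}

    def store(self, name: str, rr_type: str, rcode: str, ttl: float) -> bool:
        """Store an answer; return False when the current mode refuses to cache it."""
        negative = rcode in NEGATIVE_RCODES
        # 'no' caches nothing; 'no-negative' drops negative answers only.
        if self.mode == "no" or (negative and self.mode == "no-negative"):
            return False
        if rcode == "SERVFAIL":
            ttl = SERVFAIL_TTL  # SERVFAIL carries no TTL of its own, so use a fixed one
        self._entries[(name, rr_type)] = (rcode, self.clock() + ttl)
        return True

    def lookup(self, name: str, rr_type: str) -> Optional[str]:
        """Return the cached rcode, or None on a miss or once the entry has expired."""
        entry = self._entries.get((name, rr_type))
        if entry is None:
            return None
        rcode, expires = entry
        if self.clock() >= expires:
            del self._entries[(name, rr_type)]
            return None
        return rcode
```

Passing a fake clock makes the behaviour easy to see: in "yes" mode a SERVFAIL is returned from cache until 30 seconds have elapsed, whereas in "no-negative" mode the SERVFAIL is never stored, so every lookup goes back to the network, while positive answers are still cached.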

[Other Info]

Note that systemd in Eoan is being upgraded to upstream 242, so I am not adding this to Eoan now, as I don't want to disturb the merge. If needed after the merge, I'll add it to Eoan.


Dimitri John Ledkov (xnox) wrote :

I believe this should be filed upstream instead.

Changed in systemd:
status: Unknown → New
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed
James Hebden (ec0) wrote :

This is especially upsetting in an OpenStack and server environment where external DNS is being used to reach/resolve API endpoints and other systems such as database servers. A brief DNS outage becomes a potentially unbounded outage when the SERVFAIL responses are cached indefinitely, requiring manual intervention on each host in addition to fixing the cause of the SERVFAIL.

tags: added: canonical-bootstack
Drew Freiberger (afreiberger) wrote :

This affects Bionic OpenStack cloud environments when os-*-hostname is configured for Keystone and either the Keystone entry is temporarily deleted from upstream DNS, or upstream DNS fails and returns no record for the lookup of keystone.endpoint.domain.com.

We then have to flush all caches across the cloud once the DNS issue is resolved, rather than having them heal automatically after 60 seconds, as they would if we were running nscd with negative-ttl set to 60 seconds.

Ultimately, a configurable negative TTL would be ideal; the ability to not cache negative hits at all would also be useful. The only workarounds right now are to not use caches, or to flush caches operationally as needed.

Changed in systemd (Ubuntu):
assignee: nobody → Jorge Niedbalski (niedbalski)
Jorge Niedbalski (niedbalski) wrote :

I've proposed changing the resolved.conf Cache option to a tri-state option with the values "no", "no-negative" and "yes". [0]

If a lookup returns SERVFAIL, systemd-resolved caches the result for 30s (see 201d995).
However, there are several use cases in which this is not acceptable (see the comments on #5552),
and the only workarounds are to disable the cache entirely or to flush it, neither of which is optimal.

This change adds the 'no-negative' option. When it is set, negative answers are not
put in the cache, while positive answers are still cached using the same heuristics.

[0] https://github.com/systemd/systemd/pull/13047

Changed in systemd (Ubuntu):
importance: Undecided → High
status: Confirmed → In Progress
Jorge Niedbalski (niedbalski) wrote :

The proposal to extend the 'cache' option with 'no-negative' has been merged upstream. I will proceed with the backports to Ubuntu on the affected LTS releases.

[0] https://github.com/systemd/systemd/pull/13047

Changed in systemd (Ubuntu Disco):
assignee: nobody → Jorge Niedbalski (niedbalski)
Changed in systemd (Ubuntu Bionic):
assignee: nobody → Jorge Niedbalski (niedbalski)
Changed in systemd (Ubuntu Xenial):
assignee: nobody → Jorge Niedbalski (niedbalski)
status: New → In Progress
Changed in systemd (Ubuntu Bionic):
status: New → In Progress
Changed in systemd (Ubuntu Disco):
status: New → In Progress
Changed in systemd (Ubuntu Disco):
importance: Undecided → High
Changed in systemd (Ubuntu Bionic):
importance: Undecided → High
Changed in systemd (Ubuntu Xenial):
importance: Undecided → High
Jorge Niedbalski (niedbalski) wrote :

The attachment "lp1668771-eoan.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
tags: added: sts sts-sru-needed
description: updated
Jorge Niedbalski (niedbalski) wrote :
summary: - systemd-resolved negative caching for extended period of time
+ [SRU] systemd-resolved negative caching for extended period of time
Dan Streetman (ddstreet) on 2019-07-23
tags: added: sts-sponsor sts-sponsor-ddstreet
Jorge Niedbalski (niedbalski) wrote :
Dan Streetman (ddstreet) on 2019-07-23
description: updated

Hello jowfdoijdfdwfwdf, or anyone else affected,

Accepted systemd into disco-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/240-6ubuntu5.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-disco to verification-done-disco. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-disco. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in systemd (Ubuntu Disco):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-disco
Łukasz Zemczak (sil2100) wrote :

I have accepted this into disco-proposed conditionally, without the fix having landed in eoan yet because of the ongoing systemd 242 merge, mostly because the proposed changes are already merged and accepted upstream. But to make sure eoan is not left without this fix, please be sure to push the changes to eoan-proposed as soon as the systemd merge is done, uploaded and migrated.

If possible, I'd prefer not releasing these out of -proposed without the eoan counterparts at least present in eoan-proposed. Thank you!

Łukasz Zemczak (sil2100) wrote :

Hello jowfdoijdfdwfwdf, or anyone else affected,

Accepted systemd into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/237-3ubuntu10.25 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in systemd (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed-bionic
no longer affects: systemd (Ubuntu Xenial)
Jorge Niedbalski (niedbalski) wrote :

Without the patch for LP: #1668771, resolved caches the SERVFAIL answer.

root@bionic:/home/multipass# host www.montemar.cl
Host www.montemar.cl not found: 2(SERVFAIL)
root@bionic:/home/multipass# journalctl -u systemd-resolved -f
-- Logs begin at Fri 2019-07-26 13:10:01 -04. --
Jul 26 13:13:12 bionic systemd-resolved[3167]: Transaction 25942 for <www.montemar.cl IN A> scope dns on ens3/*.
Jul 26 13:13:12 bionic systemd-resolved[3167]: Using feature level UDP for transaction 25942.
Jul 26 13:13:12 bionic systemd-resolved[3167]: Sending query packet with id 25942.
Jul 26 13:13:12 bionic systemd-resolved[3167]: Processing incoming packet on transaction 25942. (rcode=SERVFAIL)
Jul 26 13:13:12 bionic systemd-resolved[3167]: Server returned error: SERVFAIL
Jul 26 13:13:12 bionic systemd-resolved[3167]: Verified we get a response at feature level UDP from DNS server 10.91.4.1.
Jul 26 13:13:12 bionic systemd-resolved[3167]: Added SERVFAIL cache entry for www.montemar.cl IN A 30s
Jul 26 13:13:12 bionic systemd-resolved[3167]: Transaction 25942 for <www.montemar.cl IN A> on scope dns on ens3/* now complete with <rcode-failure> from network (unsigned).
Jul 26 13:13:12 bionic systemd-resolved[3167]: Sending response packet with id 26821 on interface 1/AF_INET.
Jul 26 13:13:12 bionic systemd-resolved[3167]: Freeing transaction 25942.

With the patch for LP: #1668771 and Cache set to 'no-negative', resolved doesn't cache the SERVFAIL answer.

root@bionic:/home/multipass# host www.montemar.cl
Host www.montemar.cl not found: 2(SERVFAIL)
root@bionic:/home/multipass# journalctl -u systemd-resolved -f
-- Logs begin at Fri 2019-07-26 13:10:01 -04. --
Jul 26 13:13:40 bionic systemd-resolved[3199]: Transaction 48671 for <www.montemar.cl IN A> scope dns on ens3/*.
Jul 26 13:13:40 bionic systemd-resolved[3199]: Using feature level UDP for transaction 48671.
Jul 26 13:13:40 bionic systemd-resolved[3199]: Sending query packet with id 48671.
Jul 26 13:13:40 bionic systemd-resolved[3199]: Processing incoming packet on transaction 48671. (rcode=SERVFAIL)
Jul 26 13:13:40 bionic systemd-resolved[3199]: Server returned error: SERVFAIL
Jul 26 13:13:40 bionic systemd-resolved[3199]: Verified we get a response at feature level UDP from DNS server 10.91.4.1.
Jul 26 13:13:40 bionic systemd-resolved[3199]: Not caching negative entry for: www.montemar.cl IN A, cache mode set to no-negative
Jul 26 13:13:40 bionic systemd-resolved[3199]: Transaction 48671 for <www.montemar.cl IN A> on scope dns on ens3/* now complete with <rcode-failure> from network (unsigned).
Jul 26 13:13:40 bionic systemd-resolved[3199]: Sending response packet with id 25454 on interface 1/AF_INET.
Jul 26 13:13:40 bionic systemd-resolved[3199]: Freeing transaction 48671.

Marking this as verified and working.

Jorge Niedbalski (niedbalski) wrote :

Without the patch for LP: #1668771, resolved caches the SERVFAIL answer on disco.

root@disco:/home/multipass# host www.montemar.cl
Host www.montemar.cl not found: 2(SERVFAIL)
root@disco:/home/multipass# journalctl -u systemd-resolved -f
-- Logs begin at Fri 2019-07-26 13:15:35 -04. --
Jul 26 13:17:56 disco systemd-resolved[3872]: Transaction 43091 for <www.montemar.cl IN A> scope dns on ens3/*.
Jul 26 13:17:56 disco systemd-resolved[3872]: Using feature level UDP for transaction 43091.
Jul 26 13:17:56 disco systemd-resolved[3872]: Sending query packet with id 43091.
Jul 26 13:17:56 disco systemd-resolved[3872]: Processing incoming packet on transaction 43091 (rcode=SERVFAIL).
Jul 26 13:17:56 disco systemd-resolved[3872]: Server returned error: SERVFAIL
Jul 26 13:17:56 disco systemd-resolved[3872]: Verified we get a response at feature level UDP from DNS server 10.91.4.1.
Jul 26 13:17:56 disco systemd-resolved[3872]: Added SERVFAIL cache entry for www.montemar.cl IN A 30s
Jul 26 13:17:56 disco systemd-resolved[3872]: Transaction 43091 for <www.montemar.cl IN A> on scope dns on ens3/* now complete with <rcode-failure> from network (unsigned).
Jul 26 13:17:56 disco systemd-resolved[3872]: Sending response packet with id 40433 on interface 1/AF_INET.
Jul 26 13:17:56 disco systemd-resolved[3872]: Freeing transaction 43091.

With the patch for LP: #1668771 and Cache set to 'no-negative', resolved doesn't cache the SERVFAIL answer on disco.

-- Logs begin at Fri 2019-07-26 13:15:35 -04. --
Jul 26 13:18:21 disco systemd-resolved[3893]: Transaction 22380 for <www.montemar.cl IN A> scope dns on ens3/*.
Jul 26 13:18:21 disco systemd-resolved[3893]: Using feature level UDP for transaction 22380.
Jul 26 13:18:21 disco systemd-resolved[3893]: Sending query packet with id 22380.
Jul 26 13:18:21 disco systemd-resolved[3893]: Processing incoming packet on transaction 22380 (rcode=SERVFAIL).
Jul 26 13:18:21 disco systemd-resolved[3893]: Server returned error: SERVFAIL
Jul 26 13:18:21 disco systemd-resolved[3893]: Verified we get a response at feature level UDP from DNS server 10.91.4.1.
Jul 26 13:18:21 disco systemd-resolved[3893]: Not caching negative entry for: www.montemar.cl IN A, cache mode set to no-negative
Jul 26 13:18:21 disco systemd-resolved[3893]: Transaction 22380 for <www.montemar.cl IN A> on scope dns on ens3/* now complete with <rcode-failure> from network (unsigned).
Jul 26 13:18:21 disco systemd-resolved[3893]: Sending response packet with id 30418 on interface 1/AF_INET.
Jul 26 13:18:21 disco systemd-resolved[3893]: Freeing transaction 22380.

tags: added: verification-done verification-done-bionic verification-done-disco
removed: verification-needed verification-needed-bionic verification-needed-disco
Dan Streetman (ddstreet) wrote :

autopkgtest analysis for this upload in bug 1835581

Dan Streetman (ddstreet) wrote :

This is included in the systemd package in eoan-proposed: https://launchpad.net/ubuntu/+source/systemd/243~rc1-0ubuntu1

Changed in systemd (Ubuntu Eoan):
status: In Progress → Fix Committed
Dan Streetman (ddstreet) on 2019-08-06
tags: removed: sts-sponsor-ddstreet
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 237-3ubuntu10.25

---------------
systemd (237-3ubuntu10.25) bionic; urgency=medium

  [ Dan Streetman ]
  * d/p/lp1835581-src-network-networkd-dhcp4.c-set-prefsrc-for-classle.patch:
    - set src address for dhcp 'classless' routes (LP: #1835581)
  * d/p/lp1833671-networkd-keep-bond-slave-up-if-already-attached.patch:
    - keep bond slave up if already attached (LP: #1833671)

  [ Jorge Niedbalski ]
  * d/p/lp1668771-resolved-switch-cache-option-to-a-tri-state-option-s.patch:
    Allows cache=no-negative option to be set, ignoring negative
    answers to be cached (LP: #1668771).

 -- Dan Streetman <email address hidden> Mon, 22 Jul 2019 12:45:02 -0400

Changed in systemd (Ubuntu Bionic):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for systemd has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 240-6ubuntu5.3

---------------
systemd (240-6ubuntu5.3) disco; urgency=medium

  [ Dan Streetman ]
  * d/p/lp1835581-src-network-networkd-dhcp4.c-set-prefsrc-for-classle.patch:
    - Set src address for dhcp 'classless' routes (LP: #1835581)

  [ Jorge Niedbalski ]
  * d/p/lp1668771-resolved-switch-cache-option-to-a-tri-state-option-s.patch:
    Allows cache=no-negative option to be set, ignoring negative
    answers to be cached (LP: #1668771).

 -- Dan Streetman <email address hidden> Mon, 22 Jul 2019 12:45:02 -0400

Changed in systemd (Ubuntu Disco):
status: Fix Committed → Fix Released
Dan Streetman (ddstreet) on 2019-08-21
Changed in systemd (Ubuntu Eoan):
status: Fix Committed → In Progress
Dan Streetman (ddstreet) wrote :

The systemd in eoan-proposed was version 243-rc1, which contained this, but that has been reverted and the current version is back to 240, which doesn't contain this. Discussion in #ubuntu-devel indicates eoan should eventually have at least version 241, so I'm going to wait for that, and then upload this fix if eoan doesn't already contain it.

Changed in systemd (Ubuntu Eoan):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 240-6ubuntu13

---------------
systemd (240-6ubuntu13) eoan; urgency=medium

  * Drop s390x seccomp fix causing regression on s390x
    Files:
    - debian/patches/src-shared-seccomp-util.c-Add-mmap-definitions-for-s390.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=da95e1d022e94a4d3ce0b69bd6eb398c95d09f24

 -- Balint Reczey <email address hidden> Mon, 26 Aug 2019 17:02:54 +0200

Changed in systemd (Ubuntu Eoan):
status: Fix Committed → Fix Released
Eric Desrochers (slashd) wrote :

There is discussion to push systemd 241 to Eoan: https://launchpad.net/bugs/1841790

tags: added: sts-sru-done
removed: sts-sru-needed