systemd-resolved constantly restarts on Bionic upgraded from Xenial

Bug #1805183 reported by Neil Wilson on 2018-11-26
This bug affects 7 people
Affects            Status       Importance  Assigned to     Milestone
systemd (Ubuntu)   Triaged      Low         Unassigned
Bionic             In Progress  Medium      Louis Bouchard
Cosmic             Won't Fix    Undecided   Unassigned
Disco              Confirmed    Low         Unassigned

Bug Description

[Impact]
Log noise due to needless restarts of resolved on lease expiry, and possibly loss of cached state.
Applications that require name resolution may fail while the service is being unnecessarily restarted.

[Test case]
(1) Append a call to make_resolv_conf to the end of the hook file (debian/extra/dhclient-enter-resolved-hook), so that the function gets executed
(2) Execute the file with bash -x and different settings and ensure there are no restarts if the settings are the same, and that there are if settings change; for example:

sudo new_domain_name_servers=8.8.4.4 interface="wlp61s0" reason=REBIND bash -x debian/extra/dhclient-enter-resolved-hook
sudo new_domain_name_servers=8.8.4.4 interface="wlp61s0" reason=REBIND bash -x debian/extra/dhclient-enter-resolved-hook
=> no restart
sudo new_domain_name_servers=8.8.8.8 interface="wlp61s0" reason=REBIND bash -x debian/extra/dhclient-enter-resolved-hook
=> should restart
sudo new_domain_name_servers=8.8.8.8 interface="wlp61s0" reason=REBIND bash -x debian/extra/dhclient-enter-resolved-hook
=> no restart
sudo new_domain_name_servers=8.8.4.4 interface="wlp61s0" reason=REBIND bash -x debian/extra/dhclient-enter-resolved-hook
=> should restart

[Regression potential]
The change only restarts resolved when the settings change. If there's a bug in the logic, resolved might not be restarted when it should be. Also, since there will be fewer restarts of resolved, it will run longer, so any memory leaks will become more apparent.

[Original bug report]
If a cloud server is upgraded from Xenial to Bionic, the dhclient system remains in place and any DHCP lease refresh causes a needless restart of the systemd-resolved daemon:

Nov 26 16:59:41 srv-qvjhx dhclient[825]: DHCPREQUEST of 10.226.209.106 on ens3 to 10.226.209.105 port 67 (xid=0x2bd41d7d)
Nov 26 16:59:41 srv-qvjhx dhclient[825]: DHCPACK of 10.226.209.106 from 10.226.209.105
Nov 26 16:59:41 srv-qvjhx systemd[1]: Stopping Network Name Resolution...
Nov 26 16:59:41 srv-qvjhx systemd[1]: Stopped Network Name Resolution.
Nov 26 16:59:41 srv-qvjhx systemd[1]: Starting Network Name Resolution...
Nov 26 16:59:41 srv-qvjhx systemd-resolved[1609]: Positive Trust Anchors:
Nov 26 16:59:41 srv-qvjhx systemd-resolved[1609]: . IN DS 19036 8 2 49aac11d7b6f6446702e54a1607371607a1a41855200fd2ce1cdde32f24e8fb5
Nov 26 16:59:41 srv-qvjhx systemd-resolved[1609]: . IN DS 20326 8 2 e06d44b80b8f1d39a95c0b0d7c65d08458e880409bbc683457104237c7f8ec8d
Nov 26 16:59:41 srv-qvjhx systemd-resolved[1609]: Negative trust anchors: 10.in-addr.arpa 16.172.in-addr.arpa 17.172.in-addr.arpa 18.172.in-addr.arpa 1
Nov 26 16:59:41 srv-qvjhx systemd-resolved[1609]: Using system hostname 'srv-qvjhx'.
Nov 26 16:59:41 srv-qvjhx systemd[1]: Started Network Name Resolution.
Nov 26 16:59:41 srv-qvjhx systemd[1]: Starting resolvconf-pull-resolved.service...
Nov 26 16:59:41 srv-qvjhx dhclient[825]: bound to 10.226.209.106 -- renewal in 1466 seconds.
Nov 26 16:59:41 srv-qvjhx systemd[1]: Started resolvconf-pull-resolved.service.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: ubuntu-release-upgrader-core 1:16.04.25
ProcVersionSignature: Ubuntu 4.4.0-139.165-generic 4.4.160
Uname: Linux 4.4.0-139-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.18
Architecture: amd64
CrashDB: ubuntu
Date: Mon Nov 26 16:17:52 2018
PackageArchitecture: all
SourcePackage: ubuntu-release-upgrader
UpgradeStatus: No upgrade log present (probably fresh install)


Neil Wilson (neil-aldur) wrote :
summary: - systems-resolved constantly restarts on Bionic upgraded from Xenial
+ systemd-resolved constantly restarts on Bionic upgraded from Xenial
tags: added: id-5c0011e969ed904c67dda9ee
Julian Andres Klode (juliank) wrote :

First of all, dhclient remaining in place is not a bug. Systems are not switching to netplan and networkd on upgrade.

The restart is caused by /etc/dhcp/dhclient-enter-hooks.d/resolved, which tells systemd-resolved about new servers and search domains, and is hence expected to happen (resolved does not support a reload action, hence the restart). I'm not sure what the alternative would look like; maybe a bunch of busctl calls feeding that information to the running daemon via dbus?
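To make the busctl idea concrete, here is a minimal sketch. It assumes the installed systemd exposes the SetLinkDNS method on the org.freedesktop.resolve1.Manager interface, and it only prints the command instead of running it. The helper name set_link_dns_cmd and the interface index 2 are hypothetical; this is an illustration, not what the hook does.

```shell
#!/bin/sh
# Sketch only: build (and print, not execute) a busctl call that would hand
# one IPv4 DNS server for a link to the running resolved daemon.
# set_link_dns_cmd is a hypothetical helper name.
set_link_dns_cmd() {
    ifindex=$1   # numeric interface index, e.g. from "ip link"
    addr=$2      # dotted-quad IPv4 address

    # busctl expects the address as four separate byte arguments.
    bytes=$(printf '%s' "$addr" | tr '.' ' ')

    # Signature ia(iay): link index, then an array of (family, address bytes).
    # "1" = one array element, "2" = AF_INET, "4" = number of address bytes.
    echo "busctl call org.freedesktop.resolve1 /org/freedesktop/resolve1 org.freedesktop.resolve1.Manager SetLinkDNS 'ia(iay)' $ifindex 1 2 4 $bytes"
}

set_link_dns_cmd 2 8.8.4.4
```

Search domains would need a separate call, and whether the daemon accepts such calls for a given link depends on the systemd version and on how the link is managed, so treat this purely as a starting point.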

Julian Andres Klode (juliank) wrote :

The dbus API is only available in 239, which means we can't use it in bionic.

We could maybe make the hook restart resolved only if the new files it wrote differ from the old ones.
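A rough sketch of that "restart only on change" idea in POSIX sh might look like the following. The function names (state_sum, update_and_maybe_restart) and the RESTART_CMD override are hypothetical and exist only for illustration; the real hook's variable names and file layout may differ.

```shell
#!/bin/sh
# Sketch only: checksum the state files before and after they are rewritten,
# and restart resolved only if the checksums differ.

# Print one line per state file: its checksum, or a marker if it is missing.
state_sum() {
    for state_file in "$@"; do
        if [ -f "$state_file" ]; then
            md5sum "$state_file"
        else
            echo "missing $state_file"
        fi
    done
}

# $1: command that (re)writes the state files; remaining args: the state files.
update_and_maybe_restart() {
    writer=$1
    shift
    old=$(state_sum "$@")
    "$writer"
    new=$(state_sum "$@")
    if [ "$old" != "$new" ]; then
        # RESTART_CMD is a hypothetical override, handy for dry runs.
        ${RESTART_CMD:-systemctl restart systemd-resolved.service}
    fi
}
```

Recording "missing" explicitly means that a file appearing or disappearing also counts as a change, which matches the intent of restarting whenever the effective settings differ.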

Julian Andres Klode (juliank) wrote :

Ah no, 229, not 239 it seems.

no longer affects: ubuntu-release-upgrader (Ubuntu)
Changed in systemd (Ubuntu):
status: New → Triaged
importance: Undecided → Low
Neil Wilson (neil-aldur) wrote :

I think just a delta change process would be fine. It's restarting when there is no change in lease details, and just clogging up the logs.

btw I am not suggesting leaving dhclient there is a bug - hence the title of the bug.

Julian Andres Klode (juliank) wrote :

Here's a patch while the git branch is pushing.

Changed in systemd (Ubuntu):
status: Triaged → In Progress
Julian Andres Klode (juliank) wrote :

If we want to backport this to stable releases, we need a test case. What I came up with, as I was not able to simulate dhcp lease refreshes, is:

(1) Append make_resolv_conf to the end of the file, so it gets executed
(2) Execute the file with bash -x and different settings and ensure there are no restarts if the settings are the same, and that there are if settings change.

description: updated
description: updated
tags: added: patch
dat (dat-1982) wrote :

@juliank
I *believe* this is impacting our AWS Ubuntu EC2 machines and causing wild DNS errors that affect our users. I tried to apply the patch in this thread, but it doesn't work; it complains like this:

    /etc/dhcp/dhclient-enter-hooks.d/resolved: Syntax error: "(" unexpected

about this line

    if ! cmp --quiet $oldstate <(md5sum $statedir/isc-dhcp-v4-$interface.conf $statedir/isc-dhcp-v6-$interface.conf 2>&1); then

Also for me this line

+ md5sum $statedir/isc-dhcp-v4-$interface.conf $statedir/isc-dhcp-v6-$interface.conf &> $oldstate

outputs to stdout

   $ sudo dhclient
   RTNETLINK answers: File exists
   d41d8cd98f00b204e9800998ecf8427e /run/systemd/resolved.conf.d/isc-dhcp-v4-ens5.conf
   md5sum: /run/systemd/resolved.conf.d/isc-dhcp-v6-ens5.conf: No such file or directory

and the resulting temp file is empty.

Dunno why your patch misbehaves like this but I really have no time to investigate further.

I've attached a patch that seems to be working on our servers in case others are experiencing the same issue and need a quick fix.
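For readers hitting the same error: the 'Syntax error: "(" unexpected' message is what a POSIX shell such as dash prints for bash-style process substitution, <(...). The sketch below is not the attached patch. It shows one way to do the same comparison without process substitution, by writing the new checksums to a second temporary file. The helper name sums_differ is hypothetical.

```shell
#!/bin/sh
# Sketch only: POSIX-compatible replacement for
#   cmp --quiet "$oldstate" <(md5sum FILES 2>&1)
# Returns 0 if the checksums differ from those recorded in $1, 1 if they
# are identical, 2 if no temporary file could be created.
sums_differ() {
    oldstate=$1
    shift
    newstate=$(mktemp) || return 2

    # Capture stdout and stderr, so that a missing file is recorded as its
    # error message, consistently in both the old and the new state.
    md5sum "$@" > "$newstate" 2>&1

    if cmp -s "$oldstate" "$newstate"; then
        rm -f "$newstate"
        return 1
    fi
    rm -f "$newstate"
    return 0
}
```

A caller could then write something like: if sums_differ "$oldstate" FILES; then restart resolved; fi. Note that cmp -s is the portable spelling of GNU's --quiet.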

----
Adding few server details in case you need 'em:

$ uname -a
Linux ip-172-31-14-255 4.15.0-42-generic #45-Ubuntu SMP Thu Nov 15 19:32:57 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.1 LTS
Release: 18.04
Codename: bionic
$ dhclient --version
isc-dhclient-4.3.5
$ systemd --version
systemd 237
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid
$ bash --version
GNU bash, version 4.4.19(1)-release (x86_64-pc-linux-gnu)

dat (dat-1982) wrote :

been running for 1 hour, seems to be working as expected

finally got one minute to create a patch against systemd repo instead of a live server

Changed in systemd (Ubuntu Bionic):
status: New → Confirmed
Changed in systemd (Ubuntu Cosmic):
status: New → Confirmed
Changed in systemd (Ubuntu Disco):
status: In Progress → Confirmed

xnox, it looks like you're missing a "$" in md5sum command from:

https://launchpadlibrarian.net/426017250/systemd_240-6ubuntu7_240-6ubuntu8.diff.gz

for the -proposed systemd version:

/sbin/dhclient-script: 57: /etc/dhcp/dhclient-enter-hooks.d/resolved: Syntax error: "(" unexpected

Adding it solved my boot problem (systemd gets stuck in networking services).

cheers o/

Lol

On Thu, 30 May 2019, 20:20 Rafael David Tinoco, <
<email address hidden>> wrote:

> xnox, it looks like you're missing a "$" in md5sum command from:
>
>
> https://launchpadlibrarian.net/426017250/systemd_240-6ubuntu7_240-6ubuntu8.diff.gz
>
> for the -proposed systemd version:
>
> /sbin/dhclient-script: 57: /etc/dhcp/dhclient-enter-hooks.d/resolved:
> Syntax error: "(" unexpected
>
> Adding it solved my boot problem (systemd stucks in networking
> services).
>
> cheers o/

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 240-6ubuntu9

---------------
systemd (240-6ubuntu9) eoan; urgency=medium

  * Fix typpo in storage test.
    File: debian/tests/storage
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=f28aa5fe4ab175b99b6ea702559c59ca473b4ca8

  * Fix bashism
    File: debian/extra/dhclient-enter-resolved-hook
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=0725c1169ddde4f41cacba7af3e546704e2206be

systemd (240-6ubuntu8) eoan; urgency=medium

  * Only restart resolved on changes in dhclient enter hook.
    This prevents spurious restarts of resolved on rebounds when
    the addresses did not change. (LP: #1805183)
    Author: Julian Andres Klode
    File: debian/extra/dhclient-enter-resolved-hook
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=258893bae8cbb12670e4807636fe8f7e9fb5407a

  * Wait for cryptsetup unit to start, before stopping.
    Patch from cascardo. Plus small refactor for readability. (LP: #1814373)
    File: debian/tests/storage
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=b65aa350be7e61c65927fbc0921a750fcfaa51cd

  * Wait for systemctl is-system-running state.
    File: debian/tests/boot-smoke
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=776998f1f55c445b6e385cab69a4219c42d00838

systemd (240-6ubuntu7) eoan; urgency=medium

  * Revert "Add check to switch VTs only between K_XLATE or K_UNICODE"
    This reverts commit 60407728a1a453104e3975ecfdf25a254dd7cc44.
    Files:
    - debian/patches/Add-check-to-switch-VTs-only-between-K_XLATE-or-K_UNICODE.patch
    - debian/patches/Move-verify_vc_kbmode-to-terminal-util.c-as-vt_verify_kbm.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=18029ab5ff436bfb3b401f24cd1e3a4cf2a1579c

  * Cherrypick missing systemd-stable patches to unbreak wireguard peer endpoints.
    Signed-off-by: Dimitri John Ledkov <email address hidden> (LP: #1825378)
    Author: Dan Streetman
    Files:
    - debian/patches/network-wireguard-fixes-sending-wireguard-peer-setti.patch
    - debian/patches/network-wireguard-use-sd_netlink_message_append_sock.patch
    - debian/patches/sd-netlink-introduce-sd_netlink_message_append_socka.patch
    - debian/patches/test-network-add-more-checks-in-NetworkdNetDevTests..patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=4046f515e40c4dc80d18d2303466737f1f451f11

  * Remove expected failure from passing test.
    Signed-off-by: Dimitri John Ledkov <email address hidden> (LP: #1829450)
    Author: Dan Streetman
    File: debian/tests/systemd-fsckd
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=c43b12037d08555dc1d26593307726d7c7992df0

  * Fix false negative checking for running jobs after boot.
    Signed-off-by: Dimitri John Ledkov <email address hidden> (LP: #1825997)
    Author: Dan Streetman
    File: debian/tests/boot-smoke
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=aeb01631efbaf3fe851dee15d496e0b66b5c347f

  * Cherrypick ask-password: prevent buffer ...


Changed in systemd (Ubuntu):
status: Confirmed → Fix Released
Louis Bouchard (louis) on 2019-07-19
Changed in systemd (Ubuntu Bionic):
importance: Undecided → Medium
assignee: nobody → Louis Bouchard (louis)
status: Confirmed → In Progress
Louis Bouchard (louis) wrote :

For info, I'm preparing an SRU upload for Bionic, hopefully available by end of day.

description: updated
tf8 (tifeit) wrote :

This still looks like an issue in Ubuntu 18.04 LTS, at least on systems that were upgraded from the 14.04 release. I will disable systemd-resolved and revert to good old resolv.conf, but what is the actual recommended workaround while the fix is not released?

dat (dat-1982) wrote :

@tf8 the patch in https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1805183/comments/8 against a live server has been working fine for us on AWS for almost a year now.

Steve Langasek (vorlon) wrote :

+ md5sum $statedir/isc-dhcp-v4-$interface.conf $statedir/isc-dhcp-v6-$interface.conf &> $oldstate

"&>" is a bashism. /sbin/dhclient-script uses /bin/sh as its interpreter. This does not work as intended.

$ dash
$ echo foo &> output
foo
$ cat output
[1] + Done echo foo
$ ^D
$
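What happens in the dash session above: a POSIX shell parses "echo foo &> output" as "echo foo &" (run in the background) followed by the bare redirection "> output", which only creates or truncates the file. That is why "foo" lands on the terminal and the file stays empty. A minimal sketch of the portable spelling, using a temporary file chosen only for this demonstration:

```shell
#!/bin/sh
# Sketch only: POSIX equivalent of bash's "cmd &> file".
# First point stdout at the file, then duplicate stderr onto stdout.
out=$(mktemp)
{ echo "to stdout"; echo "to stderr" >&2; } > "$out" 2>&1

# Both lines are now in the file, and nothing was printed by the group itself.
cat "$out"
```

The order of the two redirections matters: "2>&1 > file" would leave stderr attached to the terminal.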

An upload of systemd to bionic-proposed has been rejected from the upload queue for the following reason: "wrong use of bashisms in shell script, does not work as intended".

Changed in systemd (Ubuntu Cosmic):
status: Confirmed → Won't Fix
Steve Langasek (vorlon) wrote :

This was also pointed out specifically in comment #8.

I'm reopening the trunk task on this bug as well since the bug is still there in eoan.

Changed in systemd (Ubuntu):
status: Fix Released → Triaged
dat (dat-1982) wrote :

Steve, in comment #9 I left a patch against the repo that, IIRC, was working fine on my machines. Maybe it can be helpful. Thanks.

Louis Bouchard (louis) wrote :

Ok, I'll have a look at it; I only backported the Eoan fix to Bionic
