Activity log for bug #2013543

Date Who What changed Old value New value Message
2023-03-31 12:26:07 Erik Wasser bug added bug
2023-03-31 12:26:29 Erik Wasser tags systemd container systemd
2023-04-04 08:43:34 Erik Wasser affects sagemath (Ubuntu) systemd (Ubuntu)
2023-04-05 18:22:27 Nick Rosbrook nominated for series Ubuntu Jammy
2023-04-05 18:22:27 Nick Rosbrook bug task added systemd (Ubuntu Jammy)
2023-04-05 18:22:36 Nick Rosbrook systemd (Ubuntu Jammy): status New Incomplete
2023-04-05 18:22:38 Nick Rosbrook systemd (Ubuntu): status New Incomplete
2023-04-19 07:30:29 Erik Wasser attachment added Output of the container with `journalctl -t systemd -b 0` and LogLevel=debug https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/2013543/+attachment/5665158/+files/logs.txt.gz
2023-04-20 06:27:28 Erik Wasser attachment added Output of the container with `journalctl -t systemd -b 0` and LogLevel=debug https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/2013543/+attachment/5665411/+files/logs.txt.gz
2023-04-28 12:24:01 Tim Ritberg systemd (Ubuntu Jammy): status Incomplete Confirmed
2023-04-28 13:10:43 Nick Rosbrook systemd (Ubuntu Jammy): status Confirmed Incomplete
2023-05-09 14:41:39 Simon McVittie attachment added Journal from a virtual machine https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/2013543/+attachment/5671917/+files/systemctl-daemon-reexec.txt
2023-06-29 20:26:23 Kubìc Grünfeld bug watch added https://github.com/systemd/systemd/issues/28184
2023-06-30 13:21:38 Nick Rosbrook bug task added systemd
2023-06-30 13:21:47 Nick Rosbrook systemd (Ubuntu): status Incomplete New
2023-06-30 13:21:49 Nick Rosbrook systemd (Ubuntu Jammy): status Incomplete New
2023-06-30 13:21:55 Nick Rosbrook systemd (Ubuntu): status New Confirmed
2023-06-30 13:21:57 Nick Rosbrook systemd (Ubuntu Jammy): status New Confirmed
2023-06-30 13:22:01 Nick Rosbrook systemd (Ubuntu): importance Undecided Medium
2023-06-30 13:22:03 Nick Rosbrook systemd (Ubuntu Jammy): importance Undecided Medium
2023-07-12 14:37:47 Nick Rosbrook systemd (Ubuntu): status Confirmed Triaged
2023-07-12 14:37:49 Nick Rosbrook systemd (Ubuntu Jammy): status Confirmed Triaged
2023-07-12 14:38:04 Nick Rosbrook tags container systemd container systemd systemd-sru-next
2023-07-21 13:26:04 Bug Watch Updater systemd: status Unknown Fix Released
2023-07-25 15:05:31 Nick Rosbrook nominated for series Ubuntu Lunar
2023-07-25 15:05:31 Nick Rosbrook bug task added systemd (Ubuntu Lunar)
2023-08-03 17:52:36 Nick Rosbrook systemd (Ubuntu Lunar): status New Triaged
2023-08-03 17:52:38 Nick Rosbrook systemd (Ubuntu Lunar): importance Undecided Medium
2023-08-03 21:46:09 Sergio Durigan Junior bug added subscriber Sergio Durigan Junior
2023-08-14 13:47:00 Launchpad Janitor merge proposal linked https://code.launchpad.net/~enr0n/ubuntu/+source/systemd/+git/systemd/+merge/449095
2023-08-15 22:07:23 Launchpad Janitor merge proposal linked https://code.launchpad.net/~enr0n/ubuntu/+source/systemd/+git/systemd/+merge/449220
2023-08-15 22:08:52 Launchpad Janitor merge proposal linked https://code.launchpad.net/~enr0n/ubuntu/+source/systemd/+git/systemd/+merge/449221
2023-08-18 14:04:34 Nick Rosbrook systemd (Ubuntu): status Triaged Fix Committed
2023-08-18 18:32:45 Nick Rosbrook systemd (Ubuntu Lunar): status Triaged In Progress
2023-08-18 21:42:53 Ubuntu Archive Robot bug added subscriber Nick Rosbrook
2023-08-21 21:25:35 Nick Rosbrook systemd (Ubuntu Jammy): status Triaged In Progress
2023-08-24 17:21:01 Łukasz Zemczak systemd (Ubuntu Lunar): status In Progress Incomplete
2023-08-24 17:21:04 Łukasz Zemczak systemd (Ubuntu Jammy): status In Progress Incomplete
2023-08-24 17:32:46 Nick Rosbrook description
# Our problem #

During a regular update of our container environment, `systemd` (and the related packages libpam-systemd, libsystemd0, libudev1, systemd-sysv and udev) were updated from `249.11-0ubuntu3.6` to `249.11-0ubuntu3.7`. We're talking only about Ubuntu 22.04; our Ubuntu 20.04 systems work fine with `systemctl daemon-reexec`.

In my opinion, the update itself was not the problem, because we tried downgrading and tested these versions: (current) `249.11-0ubuntu3.7`, `249.11-0ubuntu3.6`, `249.11-0ubuntu3.4` and `249.11-0ubuntu3.3`. The symptoms were the same.

# Symptoms #

The `/var/lib/dpkg/info/systemd.postinst` script executes `systemctl daemon-reexec`, and that ended in disaster. It seems that `systemd` forgets all the children it started and tries to start nearly every configured service again. Naturally, the old services are still running, the ports can't be opened twice, and `systemd` won't give up.

Here are some(!) of the log files:

Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Starting Create Volatile Files and Directories...
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: systemd-udevd.service: Found left-over process 130 (systemd-udevd) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: systemd-udevd.service: Found left-over process 31475 (systemd-udevd) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: systemd-udevd.service: Found left-over process 31476 (systemd-udevd) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
And...

Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Reached target System Initialization.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily apt download activities.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily apt upgrade and clean activities.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily dpkg database backup timer.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Periodic ext4 Online Metadata Check for All Filesystems.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Condition check resulted in Discard unused blocks once a week being skipped.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily rotation of log files.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily man-db regeneration.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Message of the Day.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Clean PHP session files every 30 mins.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Update the plocate database daily.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily Cleanup of Temporary Directories.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Reached target Basic System.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: System is tainted: cgroupsv1
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Reached target Timer Units.

And...

Mar 31 12:51:39 FQDN_REDACTED systemd[1]: atd.service: Found left-over process 206 (atd) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Starting Deferred execution scheduler...
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: cron.service: Found left-over process 164 (cron) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Regular background program processing daemon.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: dbus.service: Found left-over process 177 (dbus-daemon) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started D-Bus System Message Bus.

And...

Mar 31 12:51:39 FQDN_REDACTED systemd[1]: rsyslog.service: Found left-over process 204 (rsyslogd) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: atd.service: Failed with result 'exit-code'.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: atd.service: Unit process 206 (atd) remains running after unit stopped.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: apache2.service: Found left-over process 382 (apache2) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: apache2.service: Found left-over process 392 (apache2) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: apache2.service: Found left-over process 397 (apache2) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: apache2.service: Found left-over process 3052 (apache2) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Starting The Apache HTTP Server...
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Stopped Deferred execution scheduler.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: atd.service: Found left-over process 206 (atd) in control group while starting unit. Ignoring.

And...

Mar 31 12:51:40 FQDN_REDACTED sshd[31772]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.

And...

Mar 31 12:52:06 FQDN_REDACTED systemd[1]: Started The Salt Minion.
Mar 31 12:52:06 FQDN_REDACTED salt-minion[32339]: The Salt Minion is shutdown.
Mar 31 12:52:11 FQDN_REDACTED systemd[1]: salt-minion.service: Main process exited, code=exited, status=1/FAILURE
Mar 31 12:52:11 FQDN_REDACTED systemd[1]: salt-minion.service: Failed with result 'exit-code'.
Mar 31 12:52:11 FQDN_REDACTED systemd[1]: salt-minion.service: Unit process 2808 (/opt/saltstack/) remains running after unit stopped.
Mar 31 12:52:11 FQDN_REDACTED systemd[1]: salt-minion.service: Unit process 2848 (/opt/saltstack/) remains running after unit stopped.

Other internal `systemd` processes were started again:

root 1 0.0 0.1 101204 12444 ? Ss 10:19 0:03 /lib/systemd/systemd -z --system --deserialize 16
root 75 0.0 0.1 31440 13484 ? Ss 10:19 0:00 /lib/systemd/systemd-journald
systemd+ 159 0.0 0.0 16124 8004 ? Ss 10:19 0:00 /lib/systemd/systemd-networkd
message+ 177 0.0 0.0 8252 4440 ? Ss 10:19 0:00 @dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root 205 0.0 0.0 14908 6464 ? Ss 10:19 0:00 /lib/systemd/systemd-logind
systemd+ 223 0.0 0.1 25268 12592 ? Ss 10:19 0:00 /lib/systemd/systemd-resolved
root 31424 0.0 0.1 31424 13636 ? Ss 12:51 0:00 /lib/systemd/systemd-journald
systemd+ 31636 0.0 0.0 16124 6588 ? Ss 12:51 0:00 /lib/systemd/systemd-networkd
message+ 31639 0.0 0.0 8124 3804 ? Ss 12:51 0:00 @dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root 31682 0.0 0.0 14908 6480 ? Ss 12:51 0:00 /lib/systemd/systemd-logind
systemd+ 31686 0.0 0.1 25268 12580 ? Ss 12:51 0:00 /lib/systemd/systemd-resolved
root 32087 0.0 0.0 21436 5252 ? Ss 12:51 0:00 /lib/systemd/systemd-udevd

You can either kill all the old processes and restart them, after which everything is fine, or you can reboot the container.

Apart from `systemctl daemon-reexec`, this `systemd` version runs fine; `systemctl daemon-reload` works like a charm.

# Normal case #

In the normal case, `systemctl daemon-reexec` prints only a few lines:

Mar 31 14:21:58 FQDN_REDACTED systemd[1]: Reexecuting.
Mar 31 14:21:58 FQDN_REDACTED systemd[1]: systemd 249.11-0ubuntu3.7 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Mar 31 14:21:58 FQDN_REDACTED systemd[1]: Detected architecture x86-64.

# Testcase #

Running `systemctl daemon-reexec` followed by `ssh localhost` shows the problem. `systemd` removes the directory `/run/sshd` during the re-exec, and the running `sshd` refuses further connections because the directory is missing.

$ systemctl daemon-reexec
$ ssh root@localhost
kex_exchange_identification: read: Connection reset by peer
Connection reset by 127.0.0.1 port 22
$

Killing the old SSH instance and restarting it works.

# Some details about the hardware #

Our metal runs OpenVZ/Virtuozzo with this kernel (without any problems):

> Linux FQDN_REDACTED 3.10.0-1127.18.2.vz7.163.46 #1 SMP Fri Nov 20 21:47:55 MSK 2020 x86_64 x86_64 x86_64 GNU/Linux

The container with the `systemctl daemon-reexec` problem reports the following kernel:

Linux FQDN_REDACTED 5.4.0 #1 SMP Thu Apr 22 16:18:59 MSK 2021 x86_64 x86_64 x86_64 GNU/Linux

# Upshot #

* Can somebody help me with this issue?
* Why is `systemd` losing its internal state about the running processes/services?
* Why is `systemd` restarting everything?

[Impact]

Depending on the contents of /proc/cmdline, when systemd is re-executed with systemctl daemon-reexec, the --deserialize flag may be ignored because it was added after the other arguments. For example, if /proc/cmdline contains ---, then the re-exec command line might look like:

$ cat /proc/1/cmdline | tr '\0' '\n'
/lib/systemd/systemd
---
splash
--system
--deserialize
54

As a result, systemd does not process the --deserialize 54 argument and starts with a fresh state. This can cause all kinds of problems; one easily visible symptom is many journal lines like:

"$service.service: Found left-over process $pid ($service) in control group while starting unit. Ignoring."

[Test Plan]

1. (Only needed if your test system is not already affected) Edit the kernel command line so that it ends with '---', which triggers the bug. This can be done by appending '---' to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, running update-grub, and then rebooting.

2. After enabling -proposed, install systemd:

$ apt install systemd -y

3. Check that the systemd.postinst script skipped the daemon-reexec call and instead indicated that a reboot is required:

$ grep -Fsx systemd /run/reboot-required.pkgs
systemd

4. Reboot.

5. Try to re-exec systemd, and check that the journal is not flooded with "left-over process" messages:

$ systemctl daemon-reexec
$ journalctl --grep "Found left-over process" -b 0

6. Also confirm that the ordering of /proc/1/cmdline is correct, i.e. that --deserialize $fd comes before the arguments from /proc/cmdline:

$ cat /proc/1/cmdline | tr '\0' '\n'

[Where problems could occur]

There are two changes for this bug. The first is the patch against systemd itself, which changes the ordering of arguments on the systemd command line.
This change simply ensures that systemd's own arguments always come first on its re-exec command line, and that anything from /proc/cmdline is appended after them. Any regressions caused by this would also be seen in systemctl daemon-reexec invocations.

The second change is in systemd.postinst, which skips the systemctl daemon-reexec call when upgrading from versions of systemd that could hit this bug. Regressions caused by this would be seen during package upgrades.

[Original Description]

# Our problem #

During a regular update of our container environment, `systemd` (and the related packages libpam-systemd, libsystemd0, libudev1, systemd-sysv and udev) were updated from `249.11-0ubuntu3.6` to `249.11-0ubuntu3.7`. We're talking only about Ubuntu 22.04; our Ubuntu 20.04 systems work fine with `systemctl daemon-reexec`.

In my opinion, the update itself was not the problem, because we tried downgrading and tested these versions: (current) `249.11-0ubuntu3.7`, `249.11-0ubuntu3.6`, `249.11-0ubuntu3.4` and `249.11-0ubuntu3.3`. The symptoms were the same.

# Symptoms #

The `/var/lib/dpkg/info/systemd.postinst` script executes `systemctl daemon-reexec`, and that ended in disaster. It seems that `systemd` forgets all the children it started and tries to start nearly every configured service again. Naturally, the old services are still running, the ports can't be opened twice, and `systemd` won't give up.

Here are some(!) of the log files:

Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Starting Create Volatile Files and Directories...
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: systemd-udevd.service: Found left-over process 130 (systemd-udevd) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: systemd-udevd.service: Found left-over process 31475 (systemd-udevd) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: systemd-udevd.service: Found left-over process 31476 (systemd-udevd) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.

And...

Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Reached target System Initialization.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily apt download activities.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily apt upgrade and clean activities.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily dpkg database backup timer.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Periodic ext4 Online Metadata Check for All Filesystems.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Condition check resulted in Discard unused blocks once a week being skipped.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily rotation of log files.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily man-db regeneration.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Message of the Day.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Clean PHP session files every 30 mins.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Update the plocate database daily.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily Cleanup of Temporary Directories.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Reached target Basic System.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: System is tainted: cgroupsv1
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Reached target Timer Units.

And...

Mar 31 12:51:39 FQDN_REDACTED systemd[1]: atd.service: Found left-over process 206 (atd) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Starting Deferred execution scheduler...
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: cron.service: Found left-over process 164 (cron) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Regular background program processing daemon.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: dbus.service: Found left-over process 177 (dbus-daemon) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started D-Bus System Message Bus.

And...

Mar 31 12:51:39 FQDN_REDACTED systemd[1]: rsyslog.service: Found left-over process 204 (rsyslogd) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: atd.service: Failed with result 'exit-code'.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: atd.service: Unit process 206 (atd) remains running after unit stopped.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: apache2.service: Found left-over process 382 (apache2) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: apache2.service: Found left-over process 392 (apache2) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: apache2.service: Found left-over process 397 (apache2) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: apache2.service: Found left-over process 3052 (apache2) in control group while starting unit. Ignoring.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Starting The Apache HTTP Server...
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Stopped Deferred execution scheduler.
Mar 31 12:51:39 FQDN_REDACTED systemd[1]: atd.service: Found left-over process 206 (atd) in control group while starting unit. Ignoring.

And...

Mar 31 12:51:40 FQDN_REDACTED sshd[31772]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.

And...

Mar 31 12:52:06 FQDN_REDACTED systemd[1]: Started The Salt Minion.
Mar 31 12:52:06 FQDN_REDACTED salt-minion[32339]: The Salt Minion is shutdown.
Mar 31 12:52:11 FQDN_REDACTED systemd[1]: salt-minion.service: Main process exited, code=exited, status=1/FAILURE
Mar 31 12:52:11 FQDN_REDACTED systemd[1]: salt-minion.service: Failed with result 'exit-code'.
Mar 31 12:52:11 FQDN_REDACTED systemd[1]: salt-minion.service: Unit process 2808 (/opt/saltstack/) remains running after unit stopped.
Mar 31 12:52:11 FQDN_REDACTED systemd[1]: salt-minion.service: Unit process 2848 (/opt/saltstack/) remains running after unit stopped.

Other internal `systemd` processes were started again:

root 1 0.0 0.1 101204 12444 ? Ss 10:19 0:03 /lib/systemd/systemd -z --system --deserialize 16
root 75 0.0 0.1 31440 13484 ? Ss 10:19 0:00 /lib/systemd/systemd-journald
systemd+ 159 0.0 0.0 16124 8004 ? Ss 10:19 0:00 /lib/systemd/systemd-networkd
message+ 177 0.0 0.0 8252 4440 ? Ss 10:19 0:00 @dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root 205 0.0 0.0 14908 6464 ? Ss 10:19 0:00 /lib/systemd/systemd-logind
systemd+ 223 0.0 0.1 25268 12592 ? Ss 10:19 0:00 /lib/systemd/systemd-resolved
root 31424 0.0 0.1 31424 13636 ? Ss 12:51 0:00 /lib/systemd/systemd-journald
systemd+ 31636 0.0 0.0 16124 6588 ? Ss 12:51 0:00 /lib/systemd/systemd-networkd
message+ 31639 0.0 0.0 8124 3804 ? Ss 12:51 0:00 @dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root 31682 0.0 0.0 14908 6480 ? Ss 12:51 0:00 /lib/systemd/systemd-logind
systemd+ 31686 0.0 0.1 25268 12580 ? Ss 12:51 0:00 /lib/systemd/systemd-resolved
root 32087 0.0 0.0 21436 5252 ? Ss 12:51 0:00 /lib/systemd/systemd-udevd

You can either kill all the old processes and restart them, after which everything is fine, or you can reboot the container.

Apart from `systemctl daemon-reexec`, this `systemd` version runs fine; `systemctl daemon-reload` works like a charm.

# Normal case #

In the normal case, `systemctl daemon-reexec` prints only a few lines:

Mar 31 14:21:58 FQDN_REDACTED systemd[1]: Reexecuting.
Mar 31 14:21:58 FQDN_REDACTED systemd[1]: systemd 249.11-0ubuntu3.7 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Mar 31 14:21:58 FQDN_REDACTED systemd[1]: Detected architecture x86-64.

# Testcase #

Running `systemctl daemon-reexec` followed by `ssh localhost` shows the problem. `systemd` removes the directory `/run/sshd` during the re-exec, and the running `sshd` refuses further connections because the directory is missing.

$ systemctl daemon-reexec
$ ssh root@localhost
kex_exchange_identification: read: Connection reset by peer
Connection reset by 127.0.0.1 port 22
$

Killing the old SSH instance and restarting it works.

# Some details about the hardware #

Our metal runs OpenVZ/Virtuozzo with this kernel (without any problems):

> Linux FQDN_REDACTED 3.10.0-1127.18.2.vz7.163.46 #1 SMP Fri Nov 20 21:47:55 MSK 2020 x86_64 x86_64 x86_64 GNU/Linux

The container with the `systemctl daemon-reexec` problem reports the following kernel:

Linux FQDN_REDACTED 5.4.0 #1 SMP Thu Apr 22 16:18:59 MSK 2021 x86_64 x86_64 x86_64 GNU/Linux

# Upshot #

* Can somebody help me with this issue?
* Why is `systemd` losing its internal state about the running processes/services?
* Why is `systemd` restarting everything?
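The argument-ordering rule from [Impact] and [Where problems could occur], together with the checks in steps 5 and 6 of the [Test Plan], can be sketched in Python. This is a hypothetical illustration only: systemd is written in C, and the helper names below (`build_reexec_argv`, `read_cmdline`, `deserialize_before_separator`, `count_leftovers`) are invented for this sketch, not taken from systemd. It assumes, as in the /proc/1/cmdline example in [Impact], that the failure mode is a `---` token inherited from the kernel command line landing before `--deserialize`.

```python
import re
from collections import Counter

# Hypothetical helpers for illustration only; systemd's real re-exec logic is
# C code, and none of these names are part of any systemd API.


def build_reexec_argv(binary, own_args, kernel_args):
    """Build a re-exec argv with systemd's own arguments first.

    own_args would be e.g. ["--system", "--deserialize", "54"]; kernel_args
    are whatever was inherited from /proc/cmdline (e.g. ["---", "splash"]).
    Appending kernel_args last is the ordering the fix describes.
    """
    return [binary, *own_args, *kernel_args]


def read_cmdline(path="/proc/1/cmdline"):
    """Return the NUL-separated arguments of a process as a list.

    Python equivalent of `cat /proc/1/cmdline | tr '\\0' '\\n'`.
    """
    with open(path, "rb") as f:
        raw = f.read()
    return [arg.decode() for arg in raw.split(b"\0") if arg]


def deserialize_before_separator(argv, separator="---"):
    """Return True if --deserialize is present and comes before `separator`.

    If `separator` does not occur at all, the presence of --deserialize is
    enough. Returns False when --deserialize is missing entirely.
    """
    if "--deserialize" not in argv:
        return False
    if separator not in argv:
        return True
    return argv.index("--deserialize") < argv.index(separator)


# Matches journal lines such as
# "apache2.service: Found left-over process 382 (apache2) in control group ..."
LEFTOVER_RE = re.compile(r"(\S+\.service): Found left-over process \d+ ")


def count_leftovers(journal_lines):
    """Count 'Found left-over process' messages per unit (Test Plan step 5)."""
    counts = Counter()
    for line in journal_lines:
        match = LEFTOVER_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # The broken ordering from the [Impact] example vs. the fixed ordering.
    broken = ["/lib/systemd/systemd", "---", "splash", "--system", "--deserialize", "54"]
    fixed = build_reexec_argv(
        "/lib/systemd/systemd", ["--system", "--deserialize", "54"], ["---", "splash"]
    )
    print(deserialize_before_separator(broken))  # False
    print(deserialize_before_separator(fixed))   # True
```

On a system whose kernel command line ends in `---`, `deserialize_before_separator(read_cmdline())` would be expected to return False after a re-exec with an unfixed systemd and True with a fixed one; `count_leftovers` can be fed the lines of `journalctl -b 0` output to see which units were affected.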
2023-08-24 17:34:58 Nick Rosbrook systemd (Ubuntu Jammy): status Incomplete In Progress
2023-08-24 17:35:00 Nick Rosbrook systemd (Ubuntu Lunar): status Incomplete In Progress
2023-08-24 18:00:16 Łukasz Zemczak systemd (Ubuntu Lunar): status In Progress Fix Committed
2023-08-24 18:00:17 Łukasz Zemczak bug added subscriber Ubuntu Stable Release Updates Team
2023-08-24 18:00:24 Łukasz Zemczak bug added subscriber SRU Verification
2023-08-24 18:00:27 Łukasz Zemczak tags container systemd systemd-sru-next container systemd systemd-sru-next verification-needed verification-needed-lunar
2023-08-24 18:10:19 Łukasz Zemczak systemd (Ubuntu Jammy): status In Progress Fix Committed
2023-08-24 18:10:24 Łukasz Zemczak tags container systemd systemd-sru-next verification-needed verification-needed-lunar container systemd systemd-sru-next verification-needed verification-needed-jammy verification-needed-lunar
2023-08-28 15:41:28 Nick Rosbrook tags container systemd systemd-sru-next verification-needed verification-needed-jammy verification-needed-lunar container systemd systemd-sru-next verification-done-lunar verification-needed verification-needed-jammy
2023-08-28 15:58:40 Nick Rosbrook tags container systemd systemd-sru-next verification-done-lunar verification-needed verification-needed-jammy container systemd systemd-sru-next verification-done verification-done-jammy verification-done-lunar
2023-08-30 15:09:14 Launchpad Janitor systemd (Ubuntu): status Fix Committed Fix Released
2023-09-12 23:25:28 Launchpad Janitor systemd (Ubuntu Lunar): status Fix Committed Fix Released
2023-09-12 23:25:46 Brian Murray removed subscriber Ubuntu Stable Release Updates Team
2023-09-12 23:33:37 Launchpad Janitor systemd (Ubuntu Jammy): status Fix Committed Fix Released
2023-09-14 00:54:04 Steve Langasek systemd (Ubuntu Jammy): status Fix Released Triaged
2023-09-15 02:12:39 Launchpad Janitor systemd (Ubuntu Jammy): status Triaged Fix Released