systemd: Failed to send signal

Bug #1783499 reported by Shuang Liu
This bug affects 2 people
Affects           Status    Importance  Assigned to  Milestone
dbus (Ubuntu)     Invalid   Undecided   Unassigned
systemd (Ubuntu)  Invalid   Undecided   Unassigned

Bug Description

systemd: Failed to send signal.

[ 3.137257] systemd[1]: Failed to send job remove signal for 109: Connection reset by peer
[ 3.138119] systemd[1]: run-rpc_pipefs.mount: Failed to send unit change signal for run-rpc_pipefs.mount: Transport endpoint is not connected
[ 3.138185] systemd[1]: dev-mapper-ubuntu\x2d\x2dvg\x2droot.device: Failed to send unit change signal for dev-mapper-ubuntu\x2d\x2dvg\x2droot.device: Transport endpoint is not connected
[ 3.138512] systemd[1]: run-rpc_pipefs.mount: Failed to send unit change signal for run-rpc_pipefs.mount: Transport endpoint is not connected
[ 3.142719] systemd[1]: Failed to send job remove signal for 134: Transport endpoint is not connected
[ 3.142958] systemd[1]: auth-rpcgss-module.service: Failed to send unit change signal for auth-rpcgss-module.service: Transport endpoint is not connected
[ 3.165359] systemd[1]: Failed to send job remove signal for 133: Transport endpoint is not connected
[ 3.165505] systemd[1]: proc-fs-nfsd.mount: Failed to send unit change signal for proc-fs-nfsd.mount: Transport endpoint is not connected
[ 3.165541] systemd[1]: dev-mapper-ubuntu\x2d\x2dvg\x2droot.device: Failed to send unit change signal for dev-mapper-ubuntu\x2d\x2dvg\x2droot.device: Transport endpoint is not connected
[ 3.166854] systemd[1]: Failed to send job remove signal for 66: Transport endpoint is not connected
[ 3.167072] systemd[1]: proc-fs-nfsd.mount: Failed to send unit change signal for proc-fs-nfsd.mount: Transport endpoint is not connected
[ 3.167130] systemd[1]: systemd-modules-load.service: Failed to send unit change signal for systemd-modules-load.service: Transport endpoint is not connected
[ 2.929018] systemd[1]: Failed to send job remove signal for 53: Transport endpoint is not connected
[ 2.929220] systemd[1]: systemd-random-seed.service: Failed to send unit change signal for systemd-random-seed.service: Transport endpoint is not connected
[ 3.024320] systemd[1]: sys-devices-platform-serial8250-tty-ttyS12.device: Failed to send unit change signal for sys-devices-platform-serial8250-tty-ttyS12.device: Transport endpoint is not connected
[ 3.024421] systemd[1]: dev-ttyS12.device: Failed to send unit change signal for dev-ttyS12.device: Transport endpoint is not connected
[ 3.547019] systemd[1]: proc-sys-fs-binfmt_misc.automount: Failed to send unit change signal for proc-sys-fs-binfmt_misc.automount: Connection reset by peer
[ 3.547144] systemd[1]: Failed to send job change signal for 207: Transport endpoint is not connected

How to reproduce (a minimal sketch of these steps follows below):
1. Enable debug-level journal logging: set LogLevel=debug in /etc/systemd/system.conf
2. Reboot the system
3. Run journalctl | grep "Failed to send"
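
A rough sketch of the reproduction steps above (run as root; assumes the stock commented-out LogLevel line in system.conf):

sed -i 's/^#\?LogLevel=.*/LogLevel=debug/' /etc/systemd/system.conf   # enable debug logging
reboot
journalctl | grep "Failed to send"   # inspect the journal after the reboot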

sliu@vmlxhi-094:~$ lsb_release -rd
Description: Ubuntu 16.04.4 LTS
Release: 16.04

sliu@vmlxhi-094:~$ systemctl --version
systemd 229
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN

sliu@vmlxhi-094:~$ dbus-daemon --version
D-Bus Message Bus Daemon 1.10.6
Copyright (C) 2002, 2003 Red Hat, Inc., CodeFactory AB, and others
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Tags: dbus systemd
Revision history for this message
Attila (acraciun) wrote :

Here at Mozilla, we have 200 servers running on an HP Moonshot system; all have the same hardware configuration and Ubuntu 16.04.2. The OS is not up to date; we use it as it was released. We use a program to test Firefox source code, and after each test we reboot the servers using /sbin/reboot. After a while (between 24-48h, during which ~6 reboots/h are made), the servers randomly get stuck at reboot - see the ILO capture - and to bring them back we have to power cycle each of them.

On one of the beta servers, we made the updates/changes below, enabled debug logging, and set a cron job to reboot the server every 5-10 minutes; however, the reboot freeze is still present (the GRUB options were applied roughly as sketched after this list):
- upgraded the OS to Ubuntu 16.04.5 with the latest packages;
- used GRUB_CMDLINE_LINUX_DEFAULT="reboot=bios"
- used GRUB_CMDLINE_LINUX_DEFAULT="acpi=off"
- used GRUB_CMDLINE_LINUX_DEFAULT="reboot=force"
- upgraded the kernel to v4.15 (the main one from Ubuntu's repo);
- upgraded the kernel to v4.20 from https://kernel.ubuntu.com/~kernel-ppa/mainline/
- now we are testing the reboot with 4.20.3 from the above repo and working to update systemd.
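
A sketch of how a GRUB command-line option like the ones above is typically applied on Ubuntu (not necessarily the reporter's exact procedure):

sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT=.*/GRUB_CMDLINE_LINUX_DEFAULT="reboot=force"/' /etc/default/grub
update-grub   # regenerate /boot/grub/grub.cfg with the new kernel command line
reboot        # the option takes effect on the next boot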

Attached you can find the debug logs for:
- kernel 4.4.0-66-generic #87-Ubuntu - shutdown-debuglogkernel-4.4.txt
- kernel 4.15 - shutdown-log-kernel4-15.txt
- kernel 4.20 - shutdown-log-kernel420.txt
- ILO capture with the freeze - ILO-reboot-freeze.PNG

Please check all these logs/captures and let us know a solution. Thanks.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in dbus (Ubuntu):
status: New → Confirmed
Changed in systemd (Ubuntu):
status: New → Confirmed
Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Have you tried other reboot quirks like "reboot=pci"? It may help.

Revision history for this message
Attila (acraciun) wrote :

I have not tried reboot=pci; I will try it now. Thanks.

Revision history for this message
Attila (acraciun) wrote :

reboot=pci does not help; the server got stuck after 19 hours (rebooting once every 5 minutes). Attached is the debug log. On ILO we get the same as in my first post - see the capture in the archive.

We'll update systemd to 237 and test it.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Please let the hardware vendor know. This is highly likely a platform bug.
AFAICT, systemd has done its part, so the bug is either in the kernel (less likely) or in the BIOS.

Revision history for this message
Attila (acraciun) wrote :

We will notify them if the systemd upgrade does not fix the issue. So far, after the systemd upgrade, the server has rebooted fine in a 24h test (rebooting once every 5 minutes).

Revision history for this message
Attila (acraciun) wrote :

After 36h of rebooting, the system is stuck. Attached is the debug log.

Revision history for this message
Attila (acraciun) wrote :

We sent all the logs and capture data to HPE, and they asked us to update the BIOS to the latest version. We are already using the latest firmware, which is 4 months newer than the recommended one. Also, we have 200 servers with Windows 10 doing the same test - reboot after each test - and there are no hangs. HPE stated that they have no other reports of hangs on reboots or shutdowns for this hardware.

Any other suggestion from your side?

Meanwhile, I'll do a dist-upgrade from 16.04 to 18.04 and run the reboot test.

Revision history for this message
Attila (acraciun) wrote :

After 5 hours of rebooting, the server is stuck. See the attached debug log and ILO capture.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

> Any other suggestion from your side?
The system hang in this bug is really hardware-related, so HPE needs to take a deeper look.

For someone like us who cannot dig into firmware/hardware and needs a solution at the software level, I would in general start by disabling runtime power management for all devices (for example, as sketched below).
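
A minimal sketch of disabling runtime power management for all PCI devices via sysfs (one common way to do this; not a command given in this thread, and not persistent across reboots):

for d in /sys/bus/pci/devices/*/power/control; do
    echo on > "$d"   # "on" keeps the device fully powered and disables runtime suspend
done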

Revision history for this message
Attila (acraciun) wrote :

I have disabled the acpid units (acpid.path, acpid.service, acpid.socket) - roughly as sketched below - and set up the reboot once every 5 minutes. The server got stuck after 13h.
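
A sketch of how those units would typically be disabled (the exact commands are not shown in the report):

systemctl stop acpid.path acpid.service acpid.socket      # stop them now
systemctl disable acpid.path acpid.service acpid.socket   # keep them from starting at boot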

Something else to try?

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Try kernel parameter "acpi=off".

Revision history for this message
Attila (acraciun) wrote :

That was already tested, with no luck.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Well, I don't think trial and error will get any fruitful result; this issue really needs the hardware vendor to investigate.

Revision history for this message
Attila (acraciun) wrote :

We installed CentOS 7 and have started the reboot test now. If this fails again, it is a hardware issue; if not, it may be an Ubuntu issue.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Or a kernel regression; CentOS 7 uses an older kernel.

Revision history for this message
Attila (acraciun) wrote :

This got stuck within 5 hours at most. We'll try to install a non-systemd Ubuntu, like 14.04, to test it.

Revision history for this message
Attila (acraciun) wrote :

Tested with multiple OSes, all stuck: Ubuntu 14.04, Fedora 29, Arch Linux current (2019.02.01).

We received a detailed note from the next level of support at HPE and decided to test with Ubuntu 16.04 (our production release) and add "ps -ef >> /shutdown-log.txt" to the debug.sh script (a sketch of such a hook follows). Maybe we can see something that is not closed/terminated and prevents the server from rebooting.
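
A minimal sketch of a shutdown logging hook, assuming debug.sh is the usual systemd system-shutdown hook (e.g. /usr/lib/systemd/system-shutdown/debug.sh) rather than the reporter's exact script:

#!/bin/sh
mount -o remount,rw /            # make the root filesystem writable so we can log
dmesg > /shutdown-log.txt        # capture the kernel log at shutdown time
ps -ef >> /shutdown-log.txt      # record any processes still running, as suggested above
mount -o remount,ro /            # restore read-only root before the final reboot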

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

I think it is stuck in firmware/hardware rather than in a userspace process.

Revision history for this message
Attila (acraciun) wrote :

The test server got stuck after ~9h; here is the debug log.

Revision history for this message
Attila (acraciun) wrote :

I have activated RuntimeWatchdogSec=20s and ShutdownWatchdogSec=1min in /etc/systemd/system.conf (roughly as sketched below) and set up the reboot; the server got stuck after 105 minutes.
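
A sketch of the watchdog settings described above, assuming they were added to the [Manager] section of /etc/systemd/system.conf (run as root; systemd picks them up after a daemon re-exec or on the next boot):

cat <<'EOF' >> /etc/systemd/system.conf
[Manager]
RuntimeWatchdogSec=20s
ShutdownWatchdogSec=1min
EOF
systemctl daemon-reexec   # re-execute systemd so the watchdog settings take effect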

Revision history for this message
Attila (acraciun) wrote :

I have found that if I set the reboot to once every 15 minutes, the server works for 5-6 days. If the reboots are done once every 10 minutes, the server gets stuck within 24-48h at most. Now I'm testing reboots once every 20 minutes; it should take 10 days or more until it gets stuck (a sketch of such a cron entry is below).

Also, I have tried all "reboot=" GRUB options; nothing helps. The only way to keep the server online is to make the reboots less frequent. Odd!
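
A hypothetical sketch of the reboot cron entry described above (the exact crontab line is not shown in the report); placed in root's crontab, this reboots the machine every 20 minutes:

*/20 * * * * /sbin/reboot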

Revision history for this message
Dan Streetman (ddstreet) wrote :

Please reopen if this is still an issue.

Changed in systemd (Ubuntu):
status: Confirmed → Invalid
Changed in dbus (Ubuntu):
status: Confirmed → Invalid