Shutdown failure: Assertion 'sd_id128_randomize(&id) >= 0' failed at ../src/core/dbus.c:657, function bus_on_connection(). Aborting.
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
systemd (Ubuntu) |
Fix Released
|
Undecided
|
Unassigned |
Bug Description
This is a follow-up on this issue here: https:/
Stéphane Graber (stgraber) wrote : | #1 |
Changed in lxd (Ubuntu): | |
status: | New → Incomplete |
Sven R. Kunze (srkunze) wrote : | #2 |
The requestes versions:
>>> lxc --version
0.22
>>> apt search liblxc1
Sorting... Done
Full Text Search... Done
liblxc1/trusty,now 1.1.5-0ubuntu3~
Linux Containers userspace tools (library)
>>> apt search lxcfs
Sorting... Done
Full Text Search... Done
lxcfs/trusty,now 0.12-0ubuntu5~
FUSE based filesystem for LXC
>>> lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 15.10
Release: 15.10
Codename: wily
>>> systemctl --version
systemd 225
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD -IDN
Sven R. Kunze (srkunze) wrote : | #3 |
Confirmations for:
1) lxd-images import ubuntu wily --alias ubuntu; lxc launch ubuntu test; sleep 10s; lxc stop test
Confirmed: It does not shutdown.
2) Executing "shutdown" manually inside the container (through lxc exec) does shutdown the container
Confirmed: It does not shutdown.
3) Manually doing "kill -SIGPWR <PID>" where <PID> is the init pid as reported by "lxc info test", also doesn't shutdown the container for you.
Confirmed: It does not shutdown.
These are the process trees before and after 1), 2) and 3):
BEFORE
22172 ? Ss 0:00 /usr/bin/lxd forkstart test /var/lib/
22191 ? Ss 0:00 \_ /sbin/init
22413 ? Ss 0:00 \_ /lib/systemd/
22857 ? Ss 0:00 \_ dhclient -1 -v -pf /run/dhclient.
22991 ? Ss 0:00 \_ /usr/sbin/
22992 ? Ss 0:00 \_ /usr/bin/
23040 ? Ss 0:00 \_ /usr/sbin/cron -f
23092 ? S 0:00 \_ /bin/sh /etc/init.
23119 ? S 0:00 | \_ sleep 60
23093 ? Ss 0:00 \_ /lib/systemd/
23152 ? Ssl 0:00 \_ /usr/lib/
23207 ? S 0:00 \_ dnsmasq -u lxc-dnsmasq --strict-order --bind-interfaces --pid-file=
23215 ? Ss 0:00 \_ /sbin/cgproxy
23240 ? Ssl 0:00 \_ /usr/sbin/rsyslogd -n
23288 ? Ss 0:00 \_ /usr/sbin/atd -f
23329 ? Ssl 0:00 \_ /usr/lib/
23633 ? Ss 0:00 \_ /usr/sbin/sshd -D
23751 pts/4 Ss+ 0:00 \_ /sbin/agetty --noclear --keep-baud console 115200 38400 9600 vt220
AFTER
22172 ? Ss 0:00 /usr/bin/lxd forkstart test /var/lib/
22191 ? Ss 0:00 \_ /sbin/init
22413 ? Ss 0:00 \_ /lib/systemd/
22992 ? Ss 0:00 \_ /usr/bin/
23092 ? Z 0:00 \_ [ondemand] <defunct>
24998 ? Zs 0:00 \_ [systemctl] <defunct>
Stéphane Graber (stgraber) wrote : | #4 |
Thanks, that does indeed seem to point towards a systemd issue, I've added a systemd task, Martin should be able to pick things up from there.
Can you also tell us what version of the kernel you're using? We've seen some weird behavior with systemd on some specific kernel releases in the past.
Also, does "journalctl" in the container after attempting a shutdown print any useful information or error messages?
Thanks!
Changed in lxd (Ubuntu): | |
status: | Incomplete → Triaged |
Stéphane Graber (stgraber) wrote : | #5 |
Just to be clear for any bug triager looking at this. We're keeping a lxd task open just for tracking in case there's something wrong going on with one of our components (cgmanager or lxcfs being the most likely candidate).
Once this is confirmed to be a pure systemd issue, we'll close the lxd task.
Sven R. Kunze (srkunze) wrote : | #6 |
>>> uname -a
Linux mexico 3.13.0-68-generic #111-Ubuntu SMP Fri Nov 6 18:17:06 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>>> lxc exec test -- uname -a
Linux test 3.13.0-68-generic #111-Ubuntu SMP Fri Nov 6 18:17:06 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
journalctl only shows messages related to container startup:
>>> lxc exec test -- journalctl
-- Logs begin at Mon 2015-11-30 21:03:37 UTC, end at Mon 2015-11-30 21:03:53 UTC. --
Nov 30 21:03:37 test systemd-
Nov 30 21:03:37 test systemd-
Nov 30 21:03:37 test systemd-
Nov 30 21:03:37 test systemd-
Nov 30 21:03:37 test systemd-
Nov 30 21:03:37 test mount[133]: mount: permission denied
Nov 30 21:03:37 test loadkeys[145]: Couldn't get a file descriptor referring to the console
Nov 30 21:03:37 test apparmor[157]: lxc
Nov 30 21:03:37 test apparmor[157]: * Not starting AppArmor in container
Nov 30 21:03:37 test apparmor[157]: ...done.
Nov 30 21:03:37 test systemd[1]: console-
Nov 30 21:03:37 test systemd[1]: console-
Nov 30 21:03:37 test systemd[1]: console-
Nov 30 21:03:37 test systemd[1]: Failed to start Set console keymap.
Nov 30 21:03:37 test systemd[1]: console-
Nov 30 21:03:37 test systemd-
Nov 30 21:03:37 test systemd[1]: console-
Nov 30 21:03:37 test systemd[1]: Started LSB: AppArmor initialization.
Nov 30 21:03:37 test systemd[1]: plymouth-
Nov 30 21:03:37 test systemd[1]: plymouth-
Nov 30 21:03:37 test systemd[1]: plymouth-
Nov 30 21:03:37 test systemd[1]: plymouth-
Nov 30 21:03:37 test systemd[1]: Started Tell Plymouth To Write Out Runtime Data.
Nov 30 21:03:37 test systemd[1]: Started Load/Save Random Seed.
Nov 30 21:03:37 test systemd[1]: dns-clean.service: Failed to kill control group: Invalid argument
Nov 30 21:03:37 test systemd[1]: dns-clean.service: Failed to kill control group...
Sven R. Kunze (srkunze) wrote : | #7 |
> We've seen some weird behavior with systemd on some specific kernel releases in the past.
Not sure if that info helps: I have been installing several different kernels in the mean time; all of them share the same behavior.
Sven R. Kunze (srkunze) wrote : | #8 |
Also note: confirmed for lxd==0.23
Stéphane Graber (stgraber) wrote : | #9 |
Hmm, that journal log looks like it could be a lxcfs problem though I don't remember seeing problems on shutdown in the past...
We did have a rather bad bug at startup a little while back though.
Martin Pitt (pitti) wrote : | #10 |
I'm on current xenial, lxd version is 0.23, which you confirmed had this issue:
$ lxc --version
0.23
I created a wily and xenial container:
$ lxc launch images:
$ lxc launch images:
"lxc stop" works for both w1 and x1. When I follow (-f) the journal in a container with
$ lxc exec x1 -- journalctl -f
before "stop", it starts with
Dec 01 05:47:43 w1 systemd[1]: Received SIGPWR.
Dec 01 05:47:43 w1 systemd[1]: Failed to reset devices.list on /lxc/w1/
Dec 01 05:47:43 w1 systemd[1]: Starting Shut down containers on SIGPWR...
and then a lot of shutdown messages. I suppose your journal output in comment #6 was from *after* you did "lxc stop"? If not, please re-send it.
Let's check for you that everything is in place. Can you please copy&paste the output of
$ lxc exec test -- ls -l /lib/systemd/
$ lxc exec test -- systemd-detect-virt
and then *after* running "lxc stop", the output of
$ lxc exec test -- systemctl status -l sigpwr-
Thanks!
Changed in systemd (Ubuntu): | |
status: | New → Incomplete |
Sven R. Kunze (srkunze) wrote : | #11 |
Thanks for looking at this, Martin.
Minor correction: I work with Ubuntu 14.04. Containers are Wily.
> I suppose your journal output in comment #6 was from *after* you did "lxc stop"?
Correct, it was *after* "lxc stop" (resp. all the other variants)
>>> lxc exec test -- ls -l /lib/systemd/
-rw-r--r-- 1 root root 153 Oct 15 11:34 /lib/systemd/
lrwxrwxrwx 1 root root 36 Oct 15 12:02 /lib/systemd/
>>> lxc exec test -- systemd-detect-virt
lxc
>>> lxc stop test
^C>>>lxc exec test -- systemctl status -l sigpwr-
^C
The last call basically hangs (also without the -l option).
Martin Pitt (pitti) wrote : | #12 |
Sven, can you please run it this way around:
lxc exec test -- systemd-analyze set-log-level debug
lxc exec test -- journalctl -f 2>&1 | tee /tmp/journal.txt
and then (in another terminal) "lxc stop test". This should trigger a lot of log output. Please attach /tmp/journal.txt after it settled down.
Sven R. Kunze (srkunze) wrote : | #13 |
Martin Pitt (pitti) wrote : | #14 |
Ah, thanks! This finally revealed something interesting. So pretty much everything is shut down already, but the last step triggers an assertion:
Dec 01 17:23:49 test systemd[1600]: systemd-
Dec 01 17:23:49 test systemd[1]: [1;31mAssertion 'sd_id128_
Dec 01 17:23:50 test systemd[1]: [1;31mCaught <ABRT>, dumped core as pid 1612.[0m
Dec 01 17:23:50 test systemd[1]: [1;31mFreezing execution.[0m
Changed in systemd (Ubuntu): | |
status: | Incomplete → New |
summary: |
- [systemd] SIGPWR doesn't always lead to shutdown + Shutdown failure: Assertion 'sd_id128_randomize(&id) >= 0' failed at + ../src/core/dbus.c:657, function bus_on_connection(). Aborting. |
Sven R. Kunze (srkunze) wrote : | #15 |
Nice. Is there something else I can do here?
Martin Pitt (pitti) wrote : | #16 |
I first need to check some background what this is trying to do and why it could fail, as I can't reproduce that. I'll followup here with further questions. For now, if you don't mind, could you try upgrading the container to https:/
Sven R. Kunze (srkunze) wrote : | #17 |
Tested. However, it does not fix the issue.
Martin Pitt (pitti) wrote : | #18 |
This was discussed some time ago in https:/
So it's clear why the assertion happens. It's less clear why it's not reproducible for me, and how to fix this properly, but I'll discuss that with upstream.
Stéphane Graber (stgraber) wrote : | #19 |
As a reminder:
- http://
- http://
http://
The code change at that point was to ignore all mounts under /dev, /proc and /sys, maybe things changed since and re-introduced the issue somehow, or we missed a code path somewhere.
no longer affects: | lxd (Ubuntu) |
Stéphane Graber (stgraber) wrote : | #20 |
Also note that I'm removing the lxd task as this does appear to be a systemd issue, ~ubuntu-lxc will remain subscribed to the issue though.
Sven R. Kunze (srkunze) wrote : | #21 |
@Martin I'd be glad to assist if you require more information to make it reproducible.
Sven R. Kunze (srkunze) wrote : | #22 |
@Martin Is there some update on this?
Launchpad Janitor (janitor) wrote : | #23 |
Status changed to 'Confirmed' because the bug affects multiple users.
Changed in systemd (Ubuntu): | |
status: | New → Confirmed |
Hans Deragon (deragon) wrote : | #24 |
I too cannot stop a container, but it is running Ubuntu Xenial Xerus 16.04 LTS. I cannot tell if the root cause is the same as reported by this bug report though; I am not familiar with systemd.
I created a few days ago a new container on an updated Ubuntu Trusty Thar 14.04 LTS host using the following command:
lxc launch images:
Now, the following command hangs/freezes:
lxc stop server1
I could assist in debugging, if required.
Sven R. Kunze (srkunze) wrote : | #25 |
@Hans
To confirm that it's the same issue, you might replay the instructions Martin gave at #12: https:/
It should reveal the triggered assertion.
Hans Deragon (deragon) wrote : | #26 |
Sven R. Kunze (srkunze) wrote : | #27 |
Indeed it does. Glad to see the bug persists even on 16.04 and its not just a mal-configuration on my side.
Hans Deragon (deragon) wrote : | #28 |
Shouldn't this bug be assigned to some LXC/LXD task? After all, a normal installation of the distributions on a computer works well. 16.04 shutsdown properly on a laptop, for example; systemd simply does not hang. Thus this bug is somewhat related to LXC/LXD. Maybe the bare minimum images are at fault?
Martin Pitt (pitti) wrote : | #29 |
This is likely fixed in current systemd versions already, but the recent commit https:/
Current systemd package in https:/
Changed in systemd (Ubuntu): | |
status: | Confirmed → Fix Committed |
Changed in systemd (Ubuntu): | |
status: | Fix Committed → Fix Released |
Can you give some more details on the version of:
- lxd package on the host
- liblxc1 package on the host
- lxcfs package on the host
- Ubuntu version in the container
- systemd version in the container
And confirm that:
- This behavior is reproducible for you with the following steps:
lxd-images import ubuntu (wily or xenial) --alias ubuntu
lxc launch ubuntu test
sleep 10s
lxc stop test
- Executing "shutdown" manually inside the container (through lxc exec) does shutdown the container
- Manually doing "kill -SIGPWR <PID>" where <PID> is the init pid as reported by "lxc info test", also doesn't shutdown the container for you.
With those we should have a better idea of where the bug (seems systemd is most likely) and we'll triage accordingly.
Thanks!