Upgrade from r41 to r55 on BBB failed to boot and also to failover (drops into rescue systemd mode)

Bug #1457491 reported by Michael Vogt on 2015-05-21
This bug affects 2 people
Affects: Snappy
Importance: High
Assigned to: Unassigned

Bug Description

While testing autopkgtest against the beagleboneblack I ran into the following problem:

"""
****** Running ./tests/90_test_upgrade
current version: 41
available version: 55
upgrading...
Installing ubuntu-core (55)
Name Date Version Developer
ubuntu-core 2015-05-08 55 ubuntu!
Reboot to use the new ubuntu-core.
Rebooting testbed...
....
adt-virt-ssh: DBG: execute-timeout: /tmp/adt-virt-ssh.48fs1ux1/runcmd rm -rf /tmp/sudo_askpass.3GXU /tmp/sudo_askpass.DysG /tmp/sudo_askpass.bDZN
<VirtSubproc>: failure: (down) ['rm', '-rf', '/tmp/sudo_askpass.3GXU', '/tmp/sudo_askpass.DysG', '/tmp/sudo_askpass.bDZN'] failed (exit status 255)
while cleaning up because of another error:
<VirtSubproc>: failure: Timed out on waiting for ssh connection
"""

It turns out the upgrade broke: systemd drops me into a recovery shell and says it cannot execute /bin/plymouth (which indeed is not available on the system).

So there are two bugs here:
1. the system did not upgrade correctly (either /bin/plymouth is missing or there is an incorrect systemd unit that refers to it)
2. the system should not go into a recovery shell but instead just auto-reboot back into the good system

Michael Vogt (mvo) wrote :

Some relevant information:
"""
May 21 13:42:06 localhost.localdomain systemd-fsck[424]: fsck.fat 3.0.27 (2014-11-12)
May 21 13:42:06 localhost.localdomain systemd-fsck[424]: /snappy-system.txt
May 21 13:42:06 localhost.localdomain systemd-fsck[424]: Duplicate directory entry.
May 21 13:42:06 localhost.localdomain systemd-fsck[424]: First Size 1581 bytes, date 13:41:12 May 21 2015
May 21 13:42:06 localhost.localdomain systemd-fsck[424]: Second Size 0 bytes, date 17:31:44 Jul 24 1913
May 21 13:42:06 localhost.localdomain systemd-fsck[424]: Auto-renaming second.
May 21 13:42:06 localhost.localdomain systemd-fsck[424]: Renamed to FSCK0000.000
May 21 13:42:06 localhost.localdomain systemd-fsck[424]: /snappy-stamp.txt
May 21 13:42:06 localhost.localdomain systemd-fsck[424]: File size is 0 bytes, cluster chain length is > 0 bytes.
May 21 13:42:06 localhost.localdomain systemd-fsck[424]: Truncating file to 0 bytes.
...
May 21 13:42:06 localhost.localdomain systemd[1]: Mounting /boot/uboot...
May 21 13:42:07 localhost.localdomain mount[440]: mount: wrong fs type, bad option, bad superblock on /dev/mmcblk0p1,
May 21 13:42:07 localhost.localdomain kernel: FAT-fs (mmcblk0p1): IO charset iso8859-1 not found
"""

Michael Vogt (mvo) wrote :

It looks like the "system-did-not-upgrade-correctly" part is very likely bug #1458903. It seems the kernel upgrade went wrong for some reason, but because of bug #1458903 it never tried to install a new kernel.

Michael Vogt (mvo) wrote :

Subscribing pitti to get input on what we can do with systemd so that it automatically reboots when it ends up in rescue.service if snappy_mode is set to "try".
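
To make the condition concrete, here is a rough sketch of the check such a reboot hook would need. It assumes the boot variables live in a plain key=value text file (like the /snappy-system.txt on the boot partition that shows up in the fsck log above); the exact file format, the helper names, and the sample values other than "try" are assumptions for illustration, not what snappy actually ships.

```python
def parse_env(text: str) -> dict:
    """Parse key=value lines (assumed file format) into a dict."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def should_auto_reboot(env_text: str) -> bool:
    """Reboot automatically only while the new system is still on trial."""
    return parse_env(env_text).get("snappy_mode") == "try"


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    print(should_auto_reboot("snappy_mode=try\n"))
```

The point of gating on "try" is that a system which has already been confirmed as good would still get the normal rescue shell instead of rebooting.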

Changed in snappy:
status: New → Triaged
importance: Undecided → High
summary: Upgrade from r41 to r55 on BBB failed to boot and also to failover
+ (drops into rescue systemd mode)
Michael Vogt (mvo) on 2015-08-26
tags: added: snappy-robustness
Martin Pitt (pitti) wrote :

> it can not execute /bin/plymouth

This is most likely just from

  ExecStartPre=-/bin/plymouth quit

which is guarded with the "-", so it's just cosmetic.

Not many logs here, but I suppose that this was actually emergency.target, triggered by local-fs.target's "OnFailure=emergency.target". So what you could do is to modify /lib/systemd/system/local-fs.target (inline in the snappy image or copy it to /etc/systemd/system/ [1]) and change to OnFailure=reboot.target. I tested this on a standard wily cloud image and it works well. Quite obviously we don't want to do this on a normal server/cloud/desktop install as this would lead to an eternal boot loop without a chance to fix things.

[1] Unfortunately using a drop-in (local-fs.target.d/snappy.conf) does not work here, as OnFailure= there only appends to the existing OnFailure= instead of overriding it.
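
A minimal sketch of the full-copy override described above, assuming the stock unit contains a single OnFailure= line. The function and the abridged sample unit are illustrative only; the two paths are the ones named in the comment. After writing the copy on a real system, systemd would need to reload its unit files (for example with systemctl daemon-reload) or be rebooted.

```python
import re
from pathlib import Path

SRC = "/lib/systemd/system/local-fs.target"
DST = "/etc/systemd/system/local-fs.target"


def override_on_failure(unit_text: str, new_target: str = "reboot.target") -> str:
    """Return unit_text with its OnFailure= line pointing at new_target."""
    # Match only the exact OnFailure= key, not keys that merely share the prefix.
    pattern = re.compile(r"^OnFailure=.*$", flags=re.MULTILINE)
    if not pattern.search(unit_text):
        raise ValueError("no OnFailure= line found in unit text")
    return pattern.sub(lambda _m: f"OnFailure={new_target}", unit_text)


def install_override(src: str = SRC, dst: str = DST) -> None:
    """Copy the unit to /etc with OnFailure= rewritten (needs root)."""
    Path(dst).write_text(override_on_failure(Path(src).read_text()))


if __name__ == "__main__":
    # Abridged, illustrative unit text; not a verbatim copy of the shipped file.
    sample = "[Unit]\nDescription=Local File Systems\nOnFailure=emergency.target\n"
    print(override_on_failure(sample))
```

A full copy in /etc/systemd/system/ is used rather than a drop-in for the reason given in [1].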

Michael Vogt (mvo) wrote :

We won't work on this 15.04 bug anymore.

Changed in snappy:
status: Triaged → Won't Fix