bootloop on beagle bone

Bug #1449904 reported by Michael Vogt
This bug affects 1 person
Affects   Status         Importance   Assigned to       Milestone
Snappy    Fix Released   Critical     Sergio Schvezov
15.04     Fix Released   Critical     Sergio Schvezov

Bug Description

While trying to reproduce a reported filesystem corruption error on the /boot partition by pulling the power, I managed to get the snappy system into a boot loop. The error I get is:
"""
reading snappy-system.txt
1585 bytes read in 9 ms (171.9 KiB/s)
reading a/vmlinuz
6508992 bytes read in 377 ms (16.5 MiB/s)
reading a/initrd.img
13492457 bytes read in 774 ms (16.6 MiB/s)
reading a/dtbs/am335x-boneblack.dtb
30025 bytes read in 14 ms (2 MiB/s)
Kernel image @ 0x82000000 [ 0x000000 - 0x6351c0 ]
## Flattened Device Tree blob at 88000000
   Booting using the fdt blob at 0x88000000
   Loading Ramdisk to 8f321000, end 8ffff0e9 ... OK
   Loading Device Tree to 8f316000, end 8f320548 ... OK

Starting kernel ...

[ 0.000000] Booting Linux on physical CPU 0x0
...
Loading, please wait...
starting version 219
[ 10.278969] random: systemd-udevd urandom read with 39 bits of entropy available
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... ext4
done.
ext4
[ 11.393868] initrd: mounting /dev/disk/by-label/system-a
[ 11.406337] EXT4-fs (mmcblk0p2): couldn't mount as ext3 due to feature incompatibilities
[ 11.417123] EXT4-fs (mmcblk0p2): couldn't mount as ext2 due to feature incompatibilities
[ 11.440379] EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
[ 11.456766] initrd: mounting /run
[ 11.500665] initrd: checking filesystem for writable partition
[ 11.520088] EXT4-fs (mmcblk0p4): couldn't mount as ext3 due to feature incompatibilities
[ 11.531008] EXT4-fs (mmcblk0p4): couldn't mount as ext2 due to feature incompatibilities
[ 11.556151] EXT4-fs (mmcblk0p4): recovery complete
[ 11.561294] EXT4-fs (mmcblk0p4): mounted filesystem with ordered data mode. Opts: errors=remount-ro
[ 11.621327] initrd: mounting writable partition
[ 11.682302] EXT4-fs (mmcblk0p4): couldn't mount as ext3 due to feature incompatibilities
[ 11.695248] EXT4-fs (mmcblk0p4): couldn't mount as ext2 due to feature incompatibilities
[ 11.720508] EXT4-fs (mmcblk0p4): mounted filesystem with ordered data mode. Opts: discard
mkdir: can't create directory '/root/writable': Read-only file system
mount: mounting /tmpmnt_writable on /root/writable failed: No such file or directory
...
mount: mounting /root/writable/system-data/etc/systemd/system on /root/etc/systemd/system failed: No such file or directory
Begin: Running /scripts/local-bottom ... done.
done.
Begin: Running /scripts/init-bottom ... done.
run-init: /lib/systemd/systemd: Exec format error
[ 12.545627] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
[ 12.545627]
[ 12.555211] CPU: 0 PID: 1 Comm: run-init Not tainted 3.19.0-15-generic #15-Ubuntu
[ 12.563038] Hardware name: Generic AM33XX (Flattened Device Tree)
[ 12.569460] [<c0312074>] (unwind_backtrace) from [<c030bd3c>] (show_stack+0x20/0x24)
[ 12.577575] [<c030bd3c>] (show_stack) from [<c0b05514>] (dump_stack+0x94/0xa4)
[ 12.585139] [<c0b05514>] (dump_stack) from [<c0b03500>] (panic+0xac/0x22c)
[ 12.592343] [<c0b03500>] (panic) from [<c0359470>] (complete_and_exit+0x0/0x2c)
[ 12.599996] [<c0359470>] (complete_and_exit) from [<c03594bc>] (do_group_exit+0x0/0xe8)

<REBOOT>

U-Boot SPL 2014.10-dirty (Dec 18 2014 - 22:07:26)
"""

and so on.

Michael Vogt (mvo)
Changed in snappy-ubuntu:
importance: Undecided → Critical
Michael Vogt (mvo) wrote :

It seems like the following code in the initrd is problematic:
"""
 /sbin/e2fsck -va "$path" >> "$logfile" 2>&1 || true
"""
This is a problem because the logfile should be displayed if the fsck fails, and here it never is: the "|| true" discards the exit status. That makes diagnosis harder, as the logfile is not shown when the kernel panics.

Michael Vogt (mvo) wrote :

The snappy-system.txt file looks like this:
"""
# boot logic
# either "a" or "b"; target partition we want to boot
snappy_ab=a
# stamp file indicating a new version is being tried; removed by s-i after boot
snappy_stamp=snappy-stamp.txt
# either "regular" (normal boot) or "try" when trying a new version
snappy_mode=regular
"""
However, system-a looks incomplete, as if it was the partition the last upgrade was being applied to.

When I switch this to:
  "snappy_ab=b"
it boots ok again.

Which is very confusing. But I was able to reproduce it: when I set it back to r31 with a half-done partition in system-a and run a snappy update, I get:
"""
hardware spec requires dual root partitions
"""
and the update stops. *However*, it does set my snappy-system.txt to:
"""
 snappy_ab=a
 snappy_mode=try
"""
This should not happen, but it does because the HandleAssets() call is made *after* the "bootloader.ToggleRootFS()" call (in "func (p *Partition) toggleBootloaderRootfs() (err error)"), so the bootloader configuration has already been switched by the time the update fails.

Michael Vogt (mvo) wrote :

The error "hardware spec requires dual root partitions" shows up constantly for me. Given this check:
"""
if u.partition.dualRootPartitions() && hardware.PartitionLayout != bootloaderSystemAB {
	return fmt.Errorf("hardware spec requires dual root partitions")
}
"""
the reason seems to be that hardware.PartitionLayout is empty. However, the file is removed, so it's hard to inspect. This will need another debug build.
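
One way to make this kind of failure easier to diagnose without a debug build would be to include the parsed value in the error. The sketch below is only an illustration of that idea: "checkLayout" is a hypothetical helper, and the value of "bootloaderSystemAB" is an assumption, since the report does not show how the constant is defined.

```go
package main

import "fmt"

// Assumed value for illustration; the report does not show the definition.
const bootloaderSystemAB = "system-AB"

// checkLayout mirrors the condition quoted above, but reports the layout
// that was actually parsed, so an empty value is visible in the error.
func checkLayout(dualRoot bool, layout string) error {
	if dualRoot && layout != bootloaderSystemAB {
		return fmt.Errorf("hardware spec requires dual root partitions (partition layout is %q, expected %q)",
			layout, bootloaderSystemAB)
	}
	return nil
}

func main() {
	// An empty layout, as suspected in this bug, fails the check.
	fmt.Println(checkLayout(true, ""))
	// A matching layout passes and returns a nil error.
	fmt.Println(checkLayout(true, bootloaderSystemAB))
}
```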

Michael Vogt (mvo) wrote :

I found the root cause of hardware.PartitionLayout being empty: there was a typo on the livecd-rootfs side. This indicates that we really need integration tests for this, i.e. tests that set the image back far enough to get a hardware.yaml and kernel update (for me, going back to r31 was good enough).

Michael Vogt (mvo)
Changed in snappy-ubuntu:
status: New → In Progress
status: In Progress → Fix Committed
Michael Vogt (mvo) wrote :

Sergio fixed this now, thanks a lot! The root cause of this problem is that on a good boot it would reset "try" to "regular" but would not set the partition to the one that actually booted. So what happened is that it boots with "try" and "b", this fails, the bootloader notices and boots into "a", and "boot-ok" sets "try" to "regular" but does not switch "b" back to "a". So the next boot would go into "b" and never switch back, because "try" had been reset to "regular".
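
The difference between the old and the corrected behaviour can be sketched as follows. This is a simplified model of the description above, not Sergio's actual change; "bootConfig", "markBootOKBuggy" and "markBootOK" are hypothetical names.

```go
package main

import "fmt"

// bootConfig is a hypothetical in-memory stand-in for snappy-system.txt.
type bootConfig struct {
	AB   string // partition the bootloader will select: "a" or "b"
	Mode string // "regular" or "try"
}

// markBootOKBuggy models the old behaviour: it only resets the mode and
// leaves the (possibly broken) partition selected.
func markBootOKBuggy(c *bootConfig) {
	c.Mode = "regular"
}

// markBootOK models the corrected behaviour: it resets the mode and also
// records the partition that actually booted successfully.
func markBootOK(c *bootConfig, booted string) {
	c.Mode = "regular"
	c.AB = booted
}

func main() {
	// Trying "b" failed and the bootloader fell back to "a".
	buggy := bootConfig{AB: "b", Mode: "try"}
	markBootOKBuggy(&buggy)

	fixed := bootConfig{AB: "b", Mode: "try"}
	markBootOK(&fixed, "a")

	// buggy still selects the broken "b"; fixed selects the working "a".
	fmt.Printf("buggy: %+v\nfixed: %+v\n", buggy, fixed)
}
```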

Michael Terry (mterry)
affects: snappy-ubuntu → snappy
Changed in snappy:
status: Fix Committed → Fix Released
assignee: nobody → Sergio Schvezov (sergiusens)