Fails to produce a loop-mountable FS on powerpc/armhf

Bug #1598136 reported by Dan Watkins
This bug affects 1 person
Affects             Status        Importance  Assigned to  Milestone
cloud-images        Fix Released  High        Dan Watkins
e2fsprogs (Ubuntu)  Invalid       Undecided   Unassigned

Bug Description

In yakkety cloud image builds for each of powerpc and armhf (but not amd64, i386, ppc64el, arm64 or s390x) we see the following failure:

[2016-07-01 09:51:08] lb_binary_chroot
P: Begin copying chroot...
[2016-07-01 09:51:08] lb_binary_rootfs
P: Begin building root filesystem image...
0+0 records in
0+0 records out
0 bytes copied, 3.7116e-05 s, 0.0 kB/s
mke2fs 1.43.1 (08-Jun-2016)
ext2fs_check_if_mount: Can't check if filesystem is mounted due to missing mtab file while determining whether binary/boot/filesystem.ext4 is mounted.
Suggestion: Use Linux kernel >= 3.18 for improved stability of the metadata and journal checksum features.
Discarding device blocks: 4096/343040 done
Creating filesystem with 343040 4k blocks and 171600 inodes
Filesystem UUID: 43ff7fc9-12bb-4151-8549-77c8e1e35e2c
Superblock backups stored on blocks:
 32768, 98304, 163840, 229376, 294912

Allocating group tables: 0/11 done
Writing inode tables: 0/11 done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: 0/11 done

mount: wrong fs type, bad option, bad superblock on /dev/loop0,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

This is caused by this code in live-build's lb_binary_rootfs script:

dd if=/dev/zero of=binary/${INITFS}/filesystem.${LB_CHROOT_FILESYSTEM} bs=1024k count=0 seek=${REAL_DIM}
mkfs.${LB_CHROOT_FILESYSTEM} -F -b ${LB_EXT_BLOCKSIZE:-1024} -i 8192 -m 0 -L ${LB_HDD_LABEL} ${LB_EXT_RESIZEBLOCKS:+-E resize=${LB_EXT_RESIZEBLOCKS}} binary/${INITFS}/filesystem.${LB_CHROOT_FILESYSTEM}

mkdir -p filesystem.tmp
${LB_ROOT_COMMAND} mount -o loop binary/${INITFS}/filesystem.${LB_CHROOT_FILESYSTEM} filesystem.tmp

Dan Watkins (oddbloke) wrote :

e2fsprogs was sync'd to 1.43.1-1 from Debian (where previously Ubuntu had 1.42.13-1ubuntu1) in the time between our last successful build and the first failure.

Changed in cloud-images:
milestone: none → y-2016-07-14
assignee: nobody → Dan Watkins (daniel-thewatkins)
importance: Undecided → Critical
description: updated

Dan Watkins (oddbloke) wrote :

(This is currently blocking new yakkety cloud images from being created)

Theodore Ts'o (tytso) wrote :

Looks like you are using a pre-3.18 kernel with e2fsprogs 1.43.1. I'm guessing this is because whatever cloud service you are using is using a very old kernel? (In contrast, Yakkety Yak is using a 4.x kernel.)

What kernel version is this cloud service using, anyway? I want to make a mental note to stay far, far, far away. :-)

We'd have to see the exact kernel version to be sure (and there should be some hints in the log messages), but at a guess, try building the file system with "mke2fs -O ^metadata_csum" and see if that fixes things for you.
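As a rough sketch of that suggestion: the helper below only prints the mkfs command line it would run (a dry run), so it is safe to try anywhere. The function name `build_mkfs_cmd` and its `yes`/`no` switch are hypothetical and not part of live-build; the only real pieces are the `-F` flag already used by lb_binary_rootfs and the `-O ^metadata_csum` option quoted above.

```shell
# Hypothetical helper: print (do not run) a mkfs command line,
# optionally disabling the metadata_csum feature for ext4.
build_mkfs_cmd() {
    bmc_fstype=$1        # e.g. ext4
    bmc_image=$2         # path to the image file
    bmc_disable_csum=$3  # "yes" to add -O ^metadata_csum
    bmc_opts="-F"
    if [ "$bmc_disable_csum" = "yes" ] && [ "$bmc_fstype" = "ext4" ]; then
        bmc_opts="$bmc_opts -O ^metadata_csum"
    fi
    printf 'mkfs.%s %s %s\n' "$bmc_fstype" "$bmc_opts" "$bmc_image"
}

# Dry run for the image path seen in the build log.
build_mkfs_cmd ext4 binary/boot/filesystem.ext4 yes
```

Printing the command rather than executing it keeps the sketch independent of root privileges and of the e2fsprogs version installed.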

Dan Watkins (oddbloke) wrote :

This is running in the Launchpad build farm, so I think this is a yakkety chroot on a precise host.

I'm testing that option out now, thanks for the suggestion!

Interestingly, though, applying the patch that Ubuntu had against the previous version to the latest version fixed this issue on powerpc (but not armhf), so I do wonder if there's something more deeply broken going on. I'll investigate that further if this option doesn't do the trick.

Dan Watkins (oddbloke) wrote :

OK, so it looks like Theodore was right: disabling metadata_csum does fix the errors. Thanks!

So the next question is whether or not that's something we're happy doing. I'm going to mull this over (and also check with Launchpad people how soon our builds will be moving off precise), but any input would be appreciated.
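One way to confirm whether a given image actually carries the feature is to look at the "Filesystem features:" line that `dumpe2fs -h` prints. The sketch below is an illustration only: the helper names `fs_features` and `has_feature` are made up for this example, and the sample feature line is hypothetical rather than taken from the failing build.

```shell
# Read `dumpe2fs -h` style output on stdin and print only the feature list.
fs_features() {
    sed -n 's/^Filesystem features:[[:space:]]*//p'
}

# Return success if feature $2 appears in the space-separated list $1.
has_feature() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Against a real image (needs e2fsprogs installed) one could run:
#   features=$(dumpe2fs -h binary/boot/filesystem.ext4 2>/dev/null | fs_features)
# Here a hypothetical sample line stands in for that output.
sample='Filesystem features:      has_journal ext_attr extent metadata_csum'
features=$(printf '%s\n' "$sample" | fs_features)

if has_feature "$features" metadata_csum; then
    echo "metadata_csum is enabled"
else
    echo "metadata_csum is disabled"
fi
```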

Dan Watkins (oddbloke)
Changed in cloud-images:
status: New → In Progress

Colin Watson (cjwatson) wrote :

The cases in question are edge cases as far as the Launchpad build farm is concerned. Most of our builders are on at least trusty (3.13), and will be upgraded to xenial (4.4) soon. However, we haven't yet moved all armhf and powerpc builds off some relatively old bare-metal hosting, so some of those are still on precise (3.2).

armhf will be fixed soon (we have a newer builder cloud ready to be swapped in as soon as we've finished upgrading them from wily to xenial). For powerpc, it will take rather longer to get that up and running in our new cloud builder architecture; but only one of our nine builders is still running a precise kernel, and it should be possible to upgrade that to the backported trusty kernel independently.
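To make the version comparison concrete, here is a minimal sketch of checking a kernel release against a minimum MAJOR.MINOR. The function name `kernel_at_least` is hypothetical, the release strings in the loop are illustrative stand-ins for the precise, trusty and xenial kernels mentioned above, and the 3.18 threshold is simply the one mke2fs suggests in the build log; on a real builder the first argument would come from `uname -r`.

```shell
# Return success if kernel release $1 (e.g. "3.13.0-24-generic")
# is at least the MAJOR.MINOR version given in $2 (e.g. "3.18").
kernel_at_least() {
    kal_release=$1
    kal_minimum=$2
    kal_major=${kal_release%%.*}        # text before the first dot
    kal_rest=${kal_release#*.}          # text after the first dot
    kal_minor=${kal_rest%%[!0-9]*}      # leading digits of the remainder
    kal_min_major=${kal_minimum%%.*}
    kal_min_minor=${kal_minimum#*.}
    if [ "$kal_major" -gt "$kal_min_major" ]; then
        return 0
    fi
    [ "$kal_major" -eq "$kal_min_major" ] && [ "$kal_minor" -ge "$kal_min_minor" ]
}

# Illustrative release strings; replace with "$(uname -r)" on a real host.
for release in 3.2.0 3.13.0 4.4.0; do
    if kernel_at_least "$release" 3.18; then
        echo "$release: meets the 3.18 suggestion from mke2fs"
    else
        echo "$release: older than the 3.18 suggestion from mke2fs"
    fi
done
```

Only the major and minor components are compared, which is enough for thresholds of this kind and avoids depending on `sort -V`.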

Dan Watkins (oddbloke) wrote :

Thanks for the clarification, Colin. I'm uploading versions of live-build and livecd-rootfs to our build PPA which will disable metadata_csum; this will unblock us until the buildds are in a state which will support it.

Dan Watkins (oddbloke) wrote :

This has now been worked around in our build PPA; pushing this out to our next iteration to revisit and check if we still need the workaround.

Changed in cloud-images:
milestone: y-2016-07-14 → y-2016-07-28
importance: Critical → High
Changed in e2fsprogs (Ubuntu):
status: New → Invalid
Dan Watkins (oddbloke)
Changed in cloud-images:
milestone: y-2016-07-28 → y-2016-08-11
Dan Watkins (oddbloke)
Changed in cloud-images:
milestone: y-2016-08-11 → y-2016-08-25
Dan Watkins (oddbloke) wrote :

sagari has been pulled out of rotation, and will hopefully stay that way except in cases of dire need. Closing this out.

2016-08-16 10:03:43 Odd_Bloke infinity: We're seeing yakkety build failures because of sagari's old kernel. Do I need to re-upload our hack-around for that to our build PPA, or might you be able to upgrade that kernel soonish?
2016-08-16 10:04:15 infinity Odd_Bloke: Is trusty's new enough?
2016-08-16 10:04:38 Odd_Bloke infinity: I _think_ so.
2016-08-16 10:04:50 infinity Odd_Bloke: I'll file a ticket.
2016-08-16 10:05:31 Odd_Bloke Yeah, https://ext4.wiki.kernel.org/index.php/Ext4_Metadata_Checksums suggests it landed in 3.6.
2016-08-16 10:07:14 infinity Oh, but we didn't do LTS backports on all arches in precise. That started in trusty. Grr.
2016-08-16 10:08:53 infinity Odd_Bloke: Unfortunately, the solution is going to be to either upgrade sagari entirely or pull it out of rotation. Less fun.
2016-08-16 10:09:16 Odd_Bloke Oh, that sucks. :(
2016-08-16 10:12:36 infinity Odd_Bloke: I'll pull it out of rotation for now, but no guarantees it'll stay that way if the queues get deep.
2016-08-16 10:14:34 Odd_Bloke infinity: OK, thanks. We can cope with it being in rotation if there's a queue, because we're less likely to get scheduled on it. :p
2016-08-16 10:16:10 infinity Odd_Bloke: Heh. I'll throw some hours later at parallelising my current builders a bit more to make it a non-issue.
2016-08-16 10:16:40 Odd_Bloke infinity: Thanks. <3

Changed in cloud-images:
status: In Progress → Fix Released