Activity log for bug #1824407

Date Who What changed Old value New value Message
2019-04-11 18:34:15 Dimitri John Ledkov bug added bug
2019-04-11 19:00:06 Ubuntu Kernel Bot linux (Ubuntu): status New Incomplete
2019-04-11 19:00:07 Ubuntu Kernel Bot tags disco
2019-04-16 22:38:54 Michael Hudson-Doyle bug added subscriber Michael Hudson-Doyle
2019-07-15 04:17:29 Launchpad Janitor linux (Ubuntu): status Incomplete Expired
2019-10-03 15:11:29 Dimitri John Ledkov linux (Ubuntu): status Expired New
2019-10-03 15:30:06 Ubuntu Kernel Bot linux (Ubuntu): status New Incomplete
2019-10-22 14:11:43 Dimitri John Ledkov description
Old value:
Apr 11 18:32:52 ubuntu-server kernel: SQUASHFS error: squashfs_read_data failed to read block 0x6ff3660032757063
Apr 11 18:32:52 ubuntu-server kernel: SQUASHFS error: Unable to read metadata cache entry [6ff3660032757063]
Apr 11 18:32:55 ubuntu-server kernel: SQUASHFS error: squashfs_read_data failed to read block 0x6261746d79732e
Apr 11 18:32:55 ubuntu-server kernel: SQUASHFS error: Unable to read metadata cache entry [6261746d79732e]
Apr 11 18:33:05 ubuntu-server kernel: SQUASHFS error: squashfs_read_data failed to read block 0x6ff366df00333a37
Apr 11 18:33:05 ubuntu-server kernel: SQUASHFS error: Unable to read metadata cache entry [6ff366df00333a37]

Happens when booting e.g. subiquity disco image.

v5.0.0-8-generic kernel

New value:
1) Download focal subiquity daily image
2) boot, and press ESC and edit boot command line (F6 in bios, e in UEFI)
3) Before --- insert the following options
bebroken debug init=/bin/bash
4) Continue boot (Enter in BIOS, ctrl+x in UEFI)
5) you will be dropped into pivoted root filesystem, before systemd is execed as pid one
6) /run/initramfs/ will contain a debug log, showing how everything was mounted. Ie. cdrom mounted, squashfs losetup from there, then multilower overlay setup from them, moved to /root, and then pivot-root to /root done to finally end up as /. Underlying layers are moved into /cow for your convenience.
7) At this point modifying zero-byte length files, that exist in the lowest layer, but not the middle one, in certain ways, will results in them to be corrupted, after / is remounted.
8) Exhibit A:
$ cat /etc/machine-id
(no output)
$ systemd-machine-id-setup
$ cat /etc/machine-id
(some machine id)
$ mount -o remount /
$ cat /etc/machine-id
I/O error with overlay errors in dmesg

Similarly one can reproduce this with /etc/.pwd.lock & executing systemd-sysusers. systemd-machine-id-setup is probably the easiest to trace. It does a simply open, truncate, lseek, write.

On boot, actuall remount is done by the starting a unit which calls /lib/systemd/systemd-remount-fs

Lots of things break once machine-id and .pwd.lock are corrupted. I.e. unable to dhcp, connect to dbus, add/remove/change users or groups, etc.

We were unable to recreate the issue outside of booting things with casper. Ie. statically on a regular host machine without pivot-root. But hopefully booting to a quite state with nothing running is sufficient to reproduce this.

Instead of booting with `bebroken init=/bin/bash` you can boot with `bebroken systemd.mask=systemd-remount-fs.service` this will complete the boot, with /etc/machine-id & .pwd.lock modified, meaning that remount of / will cause IO errors on those files.

Currently, we are shipping two hacks in casper to "rm" the offending files, and create them again on the upper rw layer. They then survive remount without i/o errors. However, we'd rather not ship those hacks, and have kernel overlay fixed to work correctly with multi-lower-dir and not corrupt files upon remounting /.
2019-10-22 14:11:47 Dimitri John Ledkov linux (Ubuntu): status Incomplete New
2019-10-22 14:36:12 Dimitri John Ledkov bug task added linux-hwe (Ubuntu)
2019-10-22 14:36:53 Dimitri John Ledkov nominated for series Ubuntu Bionic
2019-10-22 14:36:53 Dimitri John Ledkov bug task added linux (Ubuntu Bionic)
2019-10-22 14:36:53 Dimitri John Ledkov bug task added linux-hwe (Ubuntu Bionic)
2019-10-22 14:36:59 Dimitri John Ledkov linux-hwe (Ubuntu Bionic): milestone ubuntu-18.04.4
2019-10-22 14:37:04 Dimitri John Ledkov linux-hwe (Ubuntu Bionic): importance Undecided Critical
2019-10-22 14:37:12 Dimitri John Ledkov bug task deleted linux (Ubuntu Bionic)
2019-10-22 14:37:22 Dimitri John Ledkov linux (Ubuntu): milestone ubuntu-20.01
2019-10-22 14:37:28 Dimitri John Ledkov linux (Ubuntu): importance Undecided Critical
2019-10-22 14:37:33 Dimitri John Ledkov linux-hwe (Ubuntu): status New Invalid
2019-10-22 14:37:38 Dimitri John Ledkov linux (Ubuntu): status New Confirmed
2019-10-22 14:37:40 Dimitri John Ledkov linux-hwe (Ubuntu Bionic): status New Confirmed
2019-10-23 16:34:28 Steve Langasek summary why does booting any livefs squashfs has kernel complaining about unable to read metadata something rather why does booting any livefs squashfs cause the kernel to complain about being unable to read metadata
2019-10-23 16:35:18 Brian Murray summary why does booting any livefs squashfs cause the kernel to complain about being unable to read metadata why does booting any livefs squashfs cause the kernel to complain about being unable to read metadata‽
2019-10-23 16:35:46 Brian Murray tags disco disco rls-ff-incoming
2019-11-01 15:50:14 Dimitri John Ledkov description 1) Download focal subiquity daily image 2) boot, and press ESC and edit boot command line (F6 in bios, e in UEFI) 3) Before --- insert the following options bebroken debug init=/bin/bash 4) Continue boot (Enter in BIOS, ctrl+x in UEFI) 5) you will be dropped into pivoted root filesystem, before systemd is execed as pid one 6) /run/initramfs/ will contain a debug log, showing how everything was mounted. Ie. cdrom mounted, squashfs losetup from there, then multilower overlay setup from them, moved to /root, and then pivot-root to /root done to finally end up as /. Underlying layers are moved into /cow for your convenience. 7) At this point modifying zero-byte length files, that exist in the lowest layer, but not the middle one, in certain ways, will results in them to be corrupted, after / is remounted. 8) Exhibit A: $ cat /etc/machine-id (no output) $ systemd-machine-id-setup $ cat /etc/machine-id (some machine id) $ mount -o remount / $ cat /etc/machine-id I/O error with overlay errors in dmesg Similarly one can reproduce this with /etc/.pwd.lock & executing systemd-sysusers. systemd-machine-id-setup is probably the easiest to trace. It does a simply open, truncate, lseek, write. On boot, actuall remount is done by the starting a unit which calls /lib/systemd/systemd-remount-fs Lots of things break once machine-id and .pwd.lock are corrupted. I.e. unable to dhcp, connect to dbus, add/remove/change users or groups, etc. We were unable to recreate the issue outside of booting things with casper. Ie. statically on a regular host machine without pivot-root. But hopefully booting to a quite state with nothing running is sufficient to reproduce this. Instead of booting with `bebroken init=/bin/bash` you can boot with `bebroken systemd.mask=systemd-remount-fs.service` this will complete the boot, with /etc/machine-id & .pwd.lock modified, meaning that remount of / will cause IO errors on those files. 
Currently, we are shipping two hacks in casper to "rm" the offending files, and create them again on the upper rw layer. They then survive remount without i/o errors. However, we'd rather not ship those hacks, and have kernel overlay fixed to work correctly with multi-lower-dir and not corrupt files upon remounting /. 1) Download focal subiquity pending image, or eoan release image 2) boot, and press ESC and edit boot command line (F6 in bios, e in UEFI) 3) Before --- insert the following options break=top debug init=/bin/bash 4) Continue boot (Enter in BIOS, ctrl+x in UEFI) 5) in the initramfs execute: rm /scripts/casper-bottom/25adduser exit 6) you will be dropped into pivoted root filesystem, before systemd is execed as pid one 7) /run/initramfs/ will contain a debug log, showing how everything was mounted. Ie. cdrom mounted, squashfs losetup from there, then multilower overlay setup from them, moved to /root, and then pivot-root to /root done to finally end up as /. Underlying layers are moved into /cow for your convenience. 8) At this point modifying zero-byte length files, that exist in the lowest layer, but not the middle one, in certain ways, will results in them to be corrupted, after / is remounted. 9) Corruption examples (On both focal & eoan) cat /etc/.pwd.lock systemd-sysusers cat /etc/.pwd.lock /usr/lib/systemd/systemd-remount-fs cat /etc/.pwd.lock overlayfs: invalid origin (etc/.pwd.lock, ftype=8000, origin ftype=4000) cat: /etc/.pwd.lock: Input/output error (Only on eoan) cat /etc/machine-id systemd-machine-id-setup cat /etc/machine-id (some machine uuid) mount -o remount / cat /etc/machine-id I/O error with overlay errors in dmesg Lots of things break once machine-id and .pwd.lock are corrupted. I.e. unable to dhcp, connect to dbus, add/remove/change users or groups, etc. We were unable to recreate the issue outside of booting things with casper. Ie. statically on a regular host machine without pivot-root. 
But hopefully booting to a quite state with nothing running is sufficient to reproduce this. Instead of booting with `bebroken init=/bin/bash` you can boot with `bebroken systemd.mask=systemd-remount-fs.service` this will complete the boot, with /etc/machine-id & .pwd.lock modified, meaning that remount of / will cause IO errors on those files. Currently, we are shipping two hacks in casper's 25adduser script to "rm" the offending files, and create them again on the upper rw layer. They then survive remount without i/o errors. However, we'd rather not ship those hacks, and have kernel overlay fixed to work correctly with multi-lower-dir and not corrupt files upon remounting /.
2019-11-01 15:55:12 Dimitri John Ledkov description 1) Download focal subiquity pending image, or eoan release image 2) boot, and press ESC and edit boot command line (F6 in bios, e in UEFI) 3) Before --- insert the following options break=top debug init=/bin/bash 4) Continue boot (Enter in BIOS, ctrl+x in UEFI) 5) in the initramfs execute: rm /scripts/casper-bottom/25adduser exit 6) you will be dropped into pivoted root filesystem, before systemd is execed as pid one 7) /run/initramfs/ will contain a debug log, showing how everything was mounted. Ie. cdrom mounted, squashfs losetup from there, then multilower overlay setup from them, moved to /root, and then pivot-root to /root done to finally end up as /. Underlying layers are moved into /cow for your convenience. 8) At this point modifying zero-byte length files, that exist in the lowest layer, but not the middle one, in certain ways, will results in them to be corrupted, after / is remounted. 9) Corruption examples (On both focal & eoan) cat /etc/.pwd.lock systemd-sysusers cat /etc/.pwd.lock /usr/lib/systemd/systemd-remount-fs cat /etc/.pwd.lock overlayfs: invalid origin (etc/.pwd.lock, ftype=8000, origin ftype=4000) cat: /etc/.pwd.lock: Input/output error (Only on eoan) cat /etc/machine-id systemd-machine-id-setup cat /etc/machine-id (some machine uuid) mount -o remount / cat /etc/machine-id I/O error with overlay errors in dmesg Lots of things break once machine-id and .pwd.lock are corrupted. I.e. unable to dhcp, connect to dbus, add/remove/change users or groups, etc. We were unable to recreate the issue outside of booting things with casper. Ie. statically on a regular host machine without pivot-root. But hopefully booting to a quite state with nothing running is sufficient to reproduce this. 
Instead of booting with `bebroken init=/bin/bash` you can boot with `bebroken systemd.mask=systemd-remount-fs.service` this will complete the boot, with /etc/machine-id & .pwd.lock modified, meaning that remount of / will cause IO errors on those files. Currently, we are shipping two hacks in casper's 25adduser script to "rm" the offending files, and create them again on the upper rw layer. They then survive remount without i/o errors. However, we'd rather not ship those hacks, and have kernel overlay fixed to work correctly with multi-lower-dir and not corrupt files upon remounting /. 1) Download focal subiquity pending image, or eoan release image 2) boot, and press ESC and edit boot command line (F6 in bios, e in UEFI) 3) Before --- insert the following options    break=top debug init=/bin/bash 4) Continue boot (Enter in BIOS, ctrl+x in UEFI) 5) in the initramfs execute:     rm /scripts/casper-bottom/25adduser     exit 6) you will be dropped into pivoted root filesystem, before systemd is execed as pid one 7) /run/initramfs/ will contain a debug log, showing how everything was mounted. Ie. cdrom mounted, squashfs losetup from there, then multilower overlay setup from them, moved to /root, and then pivot-root to /root done to finally end up as /. Underlying layers are moved into /cow for your convenience. 8) At this point modifying zero-byte length files, that exist in the lowest layer, but not the middle one, in certain ways, will results in them to be corrupted, after / is remounted. 
9) Corruption examples (On both focal & eoan) cat /etc/.pwd.lock systemd-sysusers cat /etc/.pwd.lock mount -o remount / cat /etc/.pwd.lock overlayfs: invalid origin (etc/.pwd.lock, ftype=8000, origin ftype=4000) cat: /etc/.pwd.lock: Input/output error (Only on eoan) cat /etc/machine-id systemd-machine-id-setup cat /etc/machine-id mount -o remount / cat /etc/machine-id overlayfs: invalid origin (etc/machine-id, ftype=8000, origin ftype=4000) cat: /etc/machine-id: Input/output error Lots of things break once machine-id and .pwd.lock are corrupted. I.e. unable to dhcp, connect to dbus, add/remove/change users or groups, etc. We were unable to recreate the issue outside of booting things with casper. Ie. statically on a regular host machine without pivot-root. But hopefully booting to a quite state with nothing running is sufficient to reproduce this. Instead of booting with `bebroken init=/bin/bash` you can boot with `bebroken systemd.mask=systemd-remount-fs.service` this will complete the boot, with /etc/machine-id & .pwd.lock modified, meaning that remount of / will cause IO errors on those files. Currently, we are shipping two hacks in casper's 25adduser script to "rm" the offending files, and create them again on the upper rw layer. They then survive remount without i/o errors. However, we'd rather not ship those hacks, and have kernel overlay fixed to work correctly with multi-lower-dir and not corrupt files upon remounting /.
2019-11-01 15:56:09 Dimitri John Ledkov summary why does booting any livefs squashfs cause the kernel to complain about being unable to read metadata‽ remount of multilower moved pivoted-root overlayfs root, results in I/O errors on some modified files
2019-11-01 17:16:46 Terry Rudd bug added subscriber Terry Rudd
2019-11-01 17:21:32 Colin Ian King linux (Ubuntu): assignee Colin Ian King (colin-king)
2019-11-01 17:21:36 Colin Ian King linux-hwe (Ubuntu Bionic): assignee Colin Ian King (colin-king)
2019-11-02 02:49:27 Dimitri John Ledkov description 1) Download focal subiquity pending image, or eoan release image 2) boot, and press ESC and edit boot command line (F6 in bios, e in UEFI) 3) Before --- insert the following options    break=top debug init=/bin/bash 4) Continue boot (Enter in BIOS, ctrl+x in UEFI) 5) in the initramfs execute:     rm /scripts/casper-bottom/25adduser     exit 6) you will be dropped into pivoted root filesystem, before systemd is execed as pid one 7) /run/initramfs/ will contain a debug log, showing how everything was mounted. Ie. cdrom mounted, squashfs losetup from there, then multilower overlay setup from them, moved to /root, and then pivot-root to /root done to finally end up as /. Underlying layers are moved into /cow for your convenience. 8) At this point modifying zero-byte length files, that exist in the lowest layer, but not the middle one, in certain ways, will results in them to be corrupted, after / is remounted. 9) Corruption examples (On both focal & eoan) cat /etc/.pwd.lock systemd-sysusers cat /etc/.pwd.lock mount -o remount / cat /etc/.pwd.lock overlayfs: invalid origin (etc/.pwd.lock, ftype=8000, origin ftype=4000) cat: /etc/.pwd.lock: Input/output error (Only on eoan) cat /etc/machine-id systemd-machine-id-setup cat /etc/machine-id mount -o remount / cat /etc/machine-id overlayfs: invalid origin (etc/machine-id, ftype=8000, origin ftype=4000) cat: /etc/machine-id: Input/output error Lots of things break once machine-id and .pwd.lock are corrupted. I.e. unable to dhcp, connect to dbus, add/remove/change users or groups, etc. We were unable to recreate the issue outside of booting things with casper. Ie. statically on a regular host machine without pivot-root. But hopefully booting to a quite state with nothing running is sufficient to reproduce this. 
Instead of booting with `bebroken init=/bin/bash` you can boot with `bebroken systemd.mask=systemd-remount-fs.service` this will complete the boot, with /etc/machine-id & .pwd.lock modified, meaning that remount of / will cause IO errors on those files. Currently, we are shipping two hacks in casper's 25adduser script to "rm" the offending files, and create them again on the upper rw layer. They then survive remount without i/o errors. However, we'd rather not ship those hacks, and have kernel overlay fixed to work correctly with multi-lower-dir and not corrupt files upon remounting /. 1) Download focal subiquity pending image, or eoan release image 2) boot, and press ESC and edit boot command line (F6 in bios, e in UEFI) 3) After --- insert the following options    break=top debug init=/bin/bash 4) Continue boot (Enter in BIOS, ctrl+x in UEFI) 5) in the initramfs execute:     rm /scripts/casper-bottom/25adduser     exit 6) you will be dropped into pivoted root filesystem, before systemd is execed as pid one 7) /run/initramfs/ will contain a debug log, showing how everything was mounted. Ie. cdrom mounted, squashfs losetup from there, then multilower overlay setup from them, moved to /root, and then pivot-root to /root done to finally end up as /. Underlying layers are moved into /cow for your convenience. 8) At this point modifying zero-byte length files, that exist in the lowest layer, but not the middle one, in certain ways, will results in them to be corrupted, after / is remounted. 
9) Corruption examples (On both focal & eoan) cat /etc/.pwd.lock systemd-sysusers cat /etc/.pwd.lock mount -o remount / cat /etc/.pwd.lock overlayfs: invalid origin (etc/.pwd.lock, ftype=8000, origin ftype=4000) cat: /etc/.pwd.lock: Input/output error (Only on eoan) cat /etc/machine-id systemd-machine-id-setup cat /etc/machine-id mount -o remount / cat /etc/machine-id overlayfs: invalid origin (etc/machine-id, ftype=8000, origin ftype=4000) cat: /etc/machine-id: Input/output error Lots of things break once machine-id and .pwd.lock are corrupted. I.e. unable to dhcp, connect to dbus, add/remove/change users or groups, etc. We were unable to recreate the issue outside of booting things with casper. Ie. statically on a regular host machine without pivot-root. But hopefully booting to a quite state with nothing running is sufficient to reproduce this. Instead of booting with `bebroken init=/bin/bash` you can boot with `bebroken systemd.mask=systemd-remount-fs.service` this will complete the boot, with /etc/machine-id & .pwd.lock modified, meaning that remount of / will cause IO errors on those files. Currently, we are shipping two hacks in casper's 25adduser script to "rm" the offending files, and create them again on the upper rw layer. They then survive remount without i/o errors. However, we'd rather not ship those hacks, and have kernel overlay fixed to work correctly with multi-lower-dir and not corrupt files upon remounting /.
2019-11-04 17:39:07 Colin Ian King attachment added repro.sh https://bugs.launchpad.net/ubuntu/bionic/+source/linux-hwe/+bug/1824407/+attachment/5302762/+files/repro.sh
2019-11-22 12:09:02 Colin Ian King description 1) Download focal subiquity pending image, or eoan release image 2) boot, and press ESC and edit boot command line (F6 in bios, e in UEFI) 3) After --- insert the following options    break=top debug init=/bin/bash 4) Continue boot (Enter in BIOS, ctrl+x in UEFI) 5) in the initramfs execute:     rm /scripts/casper-bottom/25adduser     exit 6) you will be dropped into pivoted root filesystem, before systemd is execed as pid one 7) /run/initramfs/ will contain a debug log, showing how everything was mounted. Ie. cdrom mounted, squashfs losetup from there, then multilower overlay setup from them, moved to /root, and then pivot-root to /root done to finally end up as /. Underlying layers are moved into /cow for your convenience. 8) At this point modifying zero-byte length files, that exist in the lowest layer, but not the middle one, in certain ways, will results in them to be corrupted, after / is remounted. 9) Corruption examples (On both focal & eoan) cat /etc/.pwd.lock systemd-sysusers cat /etc/.pwd.lock mount -o remount / cat /etc/.pwd.lock overlayfs: invalid origin (etc/.pwd.lock, ftype=8000, origin ftype=4000) cat: /etc/.pwd.lock: Input/output error (Only on eoan) cat /etc/machine-id systemd-machine-id-setup cat /etc/machine-id mount -o remount / cat /etc/machine-id overlayfs: invalid origin (etc/machine-id, ftype=8000, origin ftype=4000) cat: /etc/machine-id: Input/output error Lots of things break once machine-id and .pwd.lock are corrupted. I.e. unable to dhcp, connect to dbus, add/remove/change users or groups, etc. We were unable to recreate the issue outside of booting things with casper. Ie. statically on a regular host machine without pivot-root. But hopefully booting to a quite state with nothing running is sufficient to reproduce this. 
Instead of booting with `bebroken init=/bin/bash` you can boot with `bebroken systemd.mask=systemd-remount-fs.service` this will complete the boot, with /etc/machine-id & .pwd.lock modified, meaning that remount of / will cause IO errors on those files. Currently, we are shipping two hacks in casper's 25adduser script to "rm" the offending files, and create them again on the upper rw layer. They then survive remount without i/o errors. However, we'd rather not ship those hacks, and have kernel overlay fixed to work correctly with multi-lower-dir and not corrupt files upon remounting /. == SRU Justification Disco, Eoan, Focal == Multiple squashfs filesystems with overlayfs cause file corruption issues when modifying zero sized files == Fix == The current fix is pending in https://github.com/amir73il/linux/commit/b2d4f0ea5af42e16e154254de99da064f3ac551a == Test case == With an Ubuntu ISO on the cdrom drive, use: #!/bin/bash -x mkdir -p /cdrom mount -t iso9660 -o ro,noatime /dev/sr0 /cdrom sleep 1 mkdir -p /cow mount -t tmpfs -o 'rw,noatime,mode=755' tmpfs /cow sleep 1 mkdir -p /cow/upper mkdir -p /cow/work modprobe -q -b overlay sleep 1 modprobe -q -b loop sleep 1 dev=$(losetup -f) mkdir -p /filesystem.squashfs losetup $dev /cdrom/casper/filesystem.squashfs mount -t squashfs -o ro,noatime $dev /filesystem.squashfs sleep 1 dev=$(losetup -f) mkdir -p /installer.squashfs losetup $dev /cdrom/casper/installer.squashfs mount -t squashfs -o ro,noatime $dev /installer.squashfs sleep 1 mkdir -p /root-tmp mount -t overlay -o 'upperdir=/cow/upper,lowerdir=/installer.squashfs:/filesystem.squashfs,workdir=/cow/work' /cow /root-tmp FILE=/root-tmp/etc/.pwd.lock echo foo > $FILE cat $FILE sync # # dropping caches or remounting causes the bug # echo 3 > /proc/sys/vm/drop_caches cat $FILE Without the fix the cat of the file will produce an error. With the the cat will work correctly. 
== Regression Potential == There is an unhandled corner case: - two filesystems, A and B, both have null uuid - upper layer is on A - lower layer 1 is also on A - lower layer 2 is on B However, since this is an issue without the fix and will be addressed later with subsequent fixes once they are OK with upstream I think the risk is minimal considering nobody is complaining about these corner cases with the current broken overlayfs squashfs layering. ----------------------- 1) Download focal subiquity pending image, or eoan release image 2) boot, and press ESC and edit boot command line (F6 in bios, e in UEFI) 3) After --- insert the following options    break=top debug init=/bin/bash 4) Continue boot (Enter in BIOS, ctrl+x in UEFI) 5) in the initramfs execute:     rm /scripts/casper-bottom/25adduser     exit 6) you will be dropped into pivoted root filesystem, before systemd is execed as pid one 7) /run/initramfs/ will contain a debug log, showing how everything was mounted. Ie. cdrom mounted, squashfs losetup from there, then multilower overlay setup from them, moved to /root, and then pivot-root to /root done to finally end up as /. Underlying layers are moved into /cow for your convenience. 8) At this point modifying zero-byte length files, that exist in the lowest layer, but not the middle one, in certain ways, will results in them to be corrupted, after / is remounted. 9) Corruption examples (On both focal & eoan) cat /etc/.pwd.lock systemd-sysusers cat /etc/.pwd.lock mount -o remount / cat /etc/.pwd.lock overlayfs: invalid origin (etc/.pwd.lock, ftype=8000, origin ftype=4000) cat: /etc/.pwd.lock: Input/output error (Only on eoan) cat /etc/machine-id systemd-machine-id-setup cat /etc/machine-id mount -o remount / cat /etc/machine-id overlayfs: invalid origin (etc/machine-id, ftype=8000, origin ftype=4000) cat: /etc/machine-id: Input/output error Lots of things break once machine-id and .pwd.lock are corrupted. I.e. 
unable to dhcp, connect to dbus, add/remove/change users or groups, etc. We were unable to recreate the issue outside of booting things with casper. Ie. statically on a regular host machine without pivot-root. But hopefully booting to a quite state with nothing running is sufficient to reproduce this. Instead of booting with `bebroken init=/bin/bash` you can boot with `bebroken systemd.mask=systemd-remount-fs.service` this will complete the boot, with /etc/machine-id & .pwd.lock modified, meaning that remount of / will cause IO errors on those files. Currently, we are shipping two hacks in casper's 25adduser script to "rm" the offending files, and create them again on the upper rw layer. They then survive remount without i/o errors. However, we'd rather not ship those hacks, and have kernel overlay fixed to work correctly with multi-lower-dir and not corrupt files upon remounting /.
2019-11-22 12:09:15 Colin Ian King description == SRU Justification Disco, Eoan, Focal == Multiple squashfs filesystems with overlayfs cause file corruption issues when modifying zero sized files == Fix == The current fix is pending in https://github.com/amir73il/linux/commit/b2d4f0ea5af42e16e154254de99da064f3ac551a == Test case == With an Ubuntu ISO on the cdrom drive, use: #!/bin/bash -x mkdir -p /cdrom mount -t iso9660 -o ro,noatime /dev/sr0 /cdrom sleep 1 mkdir -p /cow mount -t tmpfs -o 'rw,noatime,mode=755' tmpfs /cow sleep 1 mkdir -p /cow/upper mkdir -p /cow/work modprobe -q -b overlay sleep 1 modprobe -q -b loop sleep 1 dev=$(losetup -f) mkdir -p /filesystem.squashfs losetup $dev /cdrom/casper/filesystem.squashfs mount -t squashfs -o ro,noatime $dev /filesystem.squashfs sleep 1 dev=$(losetup -f) mkdir -p /installer.squashfs losetup $dev /cdrom/casper/installer.squashfs mount -t squashfs -o ro,noatime $dev /installer.squashfs sleep 1 mkdir -p /root-tmp mount -t overlay -o 'upperdir=/cow/upper,lowerdir=/installer.squashfs:/filesystem.squashfs,workdir=/cow/work' /cow /root-tmp FILE=/root-tmp/etc/.pwd.lock echo foo > $FILE cat $FILE sync # # dropping caches or remounting causes the bug # echo 3 > /proc/sys/vm/drop_caches cat $FILE Without the fix the cat of the file will produce an error. With the the cat will work correctly. == Regression Potential == There is an unhandled corner case: - two filesystems, A and B, both have null uuid - upper layer is on A - lower layer 1 is also on A - lower layer 2 is on B However, since this is an issue without the fix and will be addressed later with subsequent fixes once they are OK with upstream I think the risk is minimal considering nobody is complaining about these corner cases with the current broken overlayfs squashfs layering. 
----------------------- 1) Download focal subiquity pending image, or eoan release image 2) boot, and press ESC and edit boot command line (F6 in bios, e in UEFI) 3) After --- insert the following options    break=top debug init=/bin/bash 4) Continue boot (Enter in BIOS, ctrl+x in UEFI) 5) in the initramfs execute:     rm /scripts/casper-bottom/25adduser     exit 6) you will be dropped into pivoted root filesystem, before systemd is execed as pid one 7) /run/initramfs/ will contain a debug log, showing how everything was mounted. Ie. cdrom mounted, squashfs losetup from there, then multilower overlay setup from them, moved to /root, and then pivot-root to /root done to finally end up as /. Underlying layers are moved into /cow for your convenience. 8) At this point modifying zero-byte length files, that exist in the lowest layer, but not the middle one, in certain ways, will results in them to be corrupted, after / is remounted. 9) Corruption examples (On both focal & eoan) cat /etc/.pwd.lock systemd-sysusers cat /etc/.pwd.lock mount -o remount / cat /etc/.pwd.lock overlayfs: invalid origin (etc/.pwd.lock, ftype=8000, origin ftype=4000) cat: /etc/.pwd.lock: Input/output error (Only on eoan) cat /etc/machine-id systemd-machine-id-setup cat /etc/machine-id mount -o remount / cat /etc/machine-id overlayfs: invalid origin (etc/machine-id, ftype=8000, origin ftype=4000) cat: /etc/machine-id: Input/output error Lots of things break once machine-id and .pwd.lock are corrupted. I.e. unable to dhcp, connect to dbus, add/remove/change users or groups, etc. We were unable to recreate the issue outside of booting things with casper. Ie. statically on a regular host machine without pivot-root. But hopefully booting to a quite state with nothing running is sufficient to reproduce this. 
Instead of booting with `bebroken init=/bin/bash` you can boot with `bebroken systemd.mask=systemd-remount-fs.service` this will complete the boot, with /etc/machine-id & .pwd.lock modified, meaning that remount of / will cause IO errors on those files. Currently, we are shipping two hacks in casper's 25adduser script to "rm" the offending files, and create them again on the upper rw layer. They then survive remount without i/o errors. However, we'd rather not ship those hacks, and have kernel overlay fixed to work correctly with multi-lower-dir and not corrupt files upon remounting /. == SRU Justification Disco, Eoan, Focal == Multiple squashfs filesystems with overlayfs cause file corruption issues when modifying zero sized files == Fix == The current fix is pending in https://github.com/amir73il/linux/commit/b2d4f0ea5af42e16e154254de99da064f3ac551a == Test case == With an Ubuntu ISO on the cdrom drive, use: #!/bin/bash -x mkdir -p /cdrom mount -t iso9660 -o ro,noatime /dev/sr0 /cdrom sleep 1 mkdir -p /cow mount -t tmpfs -o 'rw,noatime,mode=755' tmpfs /cow sleep 1 mkdir -p /cow/upper mkdir -p /cow/work modprobe -q -b overlay sleep 1 modprobe -q -b loop sleep 1 dev=$(losetup -f) mkdir -p /filesystem.squashfs losetup $dev /cdrom/casper/filesystem.squashfs mount -t squashfs -o ro,noatime $dev /filesystem.squashfs sleep 1 dev=$(losetup -f) mkdir -p /installer.squashfs losetup $dev /cdrom/casper/installer.squashfs mount -t squashfs -o ro,noatime $dev /installer.squashfs sleep 1 mkdir -p /root-tmp mount -t overlay -o 'upperdir=/cow/upper,lowerdir=/installer.squashfs:/filesystem.squashfs,workdir=/cow/work' /cow /root-tmp FILE=/root-tmp/etc/.pwd.lock echo foo > $FILE cat $FILE sync # # dropping caches or remounting causes the bug # echo 3 > /proc/sys/vm/drop_caches cat $FILE Without the fix the cat of the file will produce an error. With the the cat will work correctly. 
== Regression Potential ==

There is an unhandled corner case:
    - two filesystems, A and B, both have null uuid
    - upper layer is on A
    - lower layer 1 is also on A
    - lower layer 2 is on B

However, since this case is already an issue without the fix, and will be addressed later by subsequent fixes once they are OK with upstream, I think the risk is minimal, considering nobody is complaining about these corner cases with the current broken overlayfs squashfs layering.

-----------------------

1) Download focal subiquity pending image, or eoan release image
2) Boot, press ESC, and edit the boot command line (F6 in BIOS, e in UEFI)
3) After --- insert the following options
   break=top debug init=/bin/bash
4) Continue boot (Enter in BIOS, ctrl+x in UEFI)
5) In the initramfs execute:
    rm /scripts/casper-bottom/25adduser
    exit
6) You will be dropped into the pivoted root filesystem, before systemd is exec'ed as pid one
7) /run/initramfs/ will contain a debug log, showing how everything was mounted. I.e. cdrom mounted, squashfs losetup from there, then multi-lower overlay set up from them, moved to /root, and then pivot-root to /root done to finally end up as /. Underlying layers are moved into /cow for your convenience.
8) At this point, modifying zero-byte files that exist in the lowest layer but not the middle one, in certain ways, will result in them being corrupted after / is remounted.
9) Corruption examples

(On both focal & eoan)
cat /etc/.pwd.lock
systemd-sysusers
cat /etc/.pwd.lock
mount -o remount /
cat /etc/.pwd.lock
overlayfs: invalid origin (etc/.pwd.lock, ftype=8000, origin ftype=4000)
cat: /etc/.pwd.lock: Input/output error

(Only on eoan)
cat /etc/machine-id
systemd-machine-id-setup
cat /etc/machine-id
mount -o remount /
cat /etc/machine-id
overlayfs: invalid origin (etc/machine-id, ftype=8000, origin ftype=4000)
cat: /etc/machine-id: Input/output error

Lots of things break once machine-id and .pwd.lock are corrupted, e.g. unable to dhcp, connect to dbus, add/remove/change users or groups, etc.

We were unable to recreate the issue outside of booting things with casper, i.e. statically on a regular host machine without pivot-root. But hopefully booting to a quiet state with nothing running is sufficient to reproduce this.

Instead of booting with `bebroken init=/bin/bash` you can boot with `bebroken systemd.mask=systemd-remount-fs.service`. This will complete the boot with /etc/machine-id & .pwd.lock modified, meaning that a remount of / will cause I/O errors on those files.

Currently, we are shipping two hacks in casper's 25adduser script to "rm" the offending files and create them again on the upper rw layer. They then survive remount without I/O errors. However, we'd rather not ship those hacks, and have kernel overlay fixed to work correctly with multi-lower-dir and not corrupt files upon remounting /.
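For reading the "invalid origin" lines in the corruption examples: the ftype values appear to be the file-type bits of the inode mode printed in hex (this is an assumption based on the values seen, where 8000 matches S_IFREG and 4000 matches S_IFDIR). Under that assumption, the messages say the upper file is a regular file while the origin it resolved to in a lower layer is a directory. A small sketch to decode such a line follows; both function names are made up for illustration.

```shell
#!/bin/sh
# decode_ftype HEX
# Map hex S_IFMT bits to a file type name (assumes the kernel prints
# the mode bits in hex, as the values 8000 and 4000 suggest).
decode_ftype() {
    case $1 in
        1000) echo "FIFO" ;;
        2000) echo "character device" ;;
        4000) echo "directory" ;;
        6000) echo "block device" ;;
        8000) echo "regular file" ;;
        a000|A000) echo "symbolic link" ;;
        c000|C000) echo "socket" ;;
        *) echo "unknown type" ;;
    esac
}

# explain_invalid_origin LINE
# Hypothetical helper: pull the path and both ftype values out of an
# "overlayfs: invalid origin (...)" message and describe them.
explain_invalid_origin() {
    printf '%s\n' "$1" | sed -n \
        's/.*invalid origin (\(.*\), ftype=\([0-9a-fA-F]*\), origin ftype=\([0-9a-fA-F]*\)).*/\1|\2|\3/p' |
    while IFS='|' read -r path upper origin; do
        printf '%s: upper is a %s, origin is a %s\n' \
            "$path" "$(decode_ftype "$upper")" "$(decode_ftype "$origin")"
    done
}
```

For example, `explain_invalid_origin "overlayfs: invalid origin (etc/.pwd.lock, ftype=8000, origin ftype=4000)"` prints `etc/.pwd.lock: upper is a regular file, origin is a directory`. The same function can be fed lines from dmesg one at a time; lines that do not match the pattern produce no output.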
2019-11-25 23:20:31 Colin Ian King nominated for series Ubuntu Focal
2019-11-25 23:20:31 Colin Ian King bug task added linux (Ubuntu Focal)
2019-11-25 23:20:31 Colin Ian King bug task added linux-hwe (Ubuntu Focal)
2019-11-25 23:20:31 Colin Ian King nominated for series Ubuntu Eoan
2019-11-25 23:20:31 Colin Ian King bug task added linux (Ubuntu Eoan)
2019-11-25 23:20:31 Colin Ian King bug task added linux-hwe (Ubuntu Eoan)
2019-11-25 23:20:31 Colin Ian King nominated for series Ubuntu Disco
2019-11-25 23:20:31 Colin Ian King bug task added linux (Ubuntu Disco)
2019-11-25 23:20:31 Colin Ian King bug task added linux-hwe (Ubuntu Disco)
2019-11-25 23:20:56 Colin Ian King bug task deleted linux-hwe (Ubuntu Focal)
2019-11-25 23:21:03 Colin Ian King bug task deleted linux-hwe (Ubuntu Eoan)
2019-11-25 23:21:08 Colin Ian King bug task deleted linux-hwe (Ubuntu Disco)
2019-11-25 23:21:15 Colin Ian King linux (Ubuntu Focal): status Confirmed In Progress
2019-11-25 23:21:20 Colin Ian King linux-hwe (Ubuntu Bionic): status Confirmed In Progress
2019-11-28 15:48:02 Stefan Bader linux (Ubuntu Eoan): importance Undecided Critical
2019-11-28 15:48:06 Stefan Bader linux (Ubuntu Disco): importance Undecided Critical
2019-11-28 15:57:10 Stefan Bader linux (Ubuntu Eoan): status New Fix Committed
2019-11-28 15:57:15 Stefan Bader linux (Ubuntu Disco): status New Fix Committed
2019-12-03 15:42:24 Ubuntu Kernel Bot tags disco rls-ff-incoming disco rls-ff-incoming verification-needed-disco
2019-12-04 11:04:17 Colin Ian King tags disco rls-ff-incoming verification-needed-disco disco rls-ff-incoming verification-done-disco
2019-12-05 11:27:13 Ubuntu Kernel Bot tags disco rls-ff-incoming verification-done-disco disco rls-ff-incoming verification-done-disco verification-needed-eoan
2019-12-08 15:50:48 Colin Ian King tags disco rls-ff-incoming verification-done-disco verification-needed-eoan disco rls-ff-incoming verification-done-disco verification-done-eoan
2019-12-11 16:59:27 Dimitri John Ledkov linux (Ubuntu Focal): status In Progress Fix Committed
2020-01-06 12:53:38 Launchpad Janitor linux (Ubuntu Eoan): status Fix Committed Fix Released
2020-01-06 12:53:38 Launchpad Janitor cve linked 2019-14895
2020-01-06 12:53:38 Launchpad Janitor cve linked 2019-14896
2020-01-06 12:53:38 Launchpad Janitor cve linked 2019-14897
2020-01-06 12:53:38 Launchpad Janitor cve linked 2019-14901
2020-01-06 12:53:38 Launchpad Janitor cve linked 2019-18660
2020-01-06 12:53:38 Launchpad Janitor cve linked 2019-19055
2020-01-06 12:53:38 Launchpad Janitor cve linked 2019-19072
2020-01-06 13:12:44 Launchpad Janitor linux (Ubuntu Disco): status Fix Committed Fix Released
2020-01-06 13:12:44 Launchpad Janitor cve linked 2019-2214
2020-01-06 22:31:20 Launchpad Janitor linux (Ubuntu Focal): status Fix Committed Fix Released
2020-01-06 22:31:20 Launchpad Janitor cve linked 2019-19078
2020-01-06 22:31:20 Launchpad Janitor cve linked 2019-19332
2020-01-16 20:55:18 Launchpad Janitor linux-hwe (Ubuntu Bionic): status In Progress Fix Released