Comment 10 for bug 2045586

Revision history for this message
dann frazier (dannf) wrote : Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable in noble

I ran into this on jammy/amd64: https://autopkgtest.ubuntu.com/results/autopkgtest-jammy/jammy/amd64/l/livecd-rootfs/20240121_173406_e4f9a@/log.gz

I downloaded all of the amd64 failures and searched for this failure pattern. These were the kernels that were running at the time:

"Linux 5.15.0-91-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023"
"Linux 6.2.0-21-generic #21-Ubuntu SMP PREEMPT_DYNAMIC Fri Apr 14 12:34:02 UTC 2023"
"Linux 6.3.0-7-generic #7-Ubuntu SMP PREEMPT_DYNAMIC Thu Jun 8 16:02:30 UTC 2023"
"Linux 6.5.0-9-generic #9-Ubuntu SMP PREEMPT_DYNAMIC Sat Oct 7 01:35:40 UTC 2023"
"Linux 6.6.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 30 10:27:29 UTC 2023"

Here's the count of failures per image type:
     12 017-disk-image-uefi.binary
      3 018-disk-image.binary
      3 020-kvm-image.binary
      1 023-vagrant.binary
      1 024-vagrant.binary

I can confirm that /dev/loop0p1 is created by devtmpfs. This surprised me because I'd never actually need to know what devtmpfs was, and I saw devices being created even though I had SIGSTOP'd systemd-udevd. But watching udevadm monitor and forktrace output convinced me.

I had a theory that something was opening the first created partition before all partitions were created. loop_reread_partitions() can fail without returning an error to userspace:
 https://elixir.bootlin.com/linux/v5.15.147/source/drivers/block/loop.c#L676

that could happen if bdev_disk_changed() aborts because it finds another partition on the device is open:
 https://elixir.bootlin.com/linux/v5.15.147/source/block/partitions/core.c#L662

But then we should see this in dmesg:
  pr_warn("%s: partition scan of loop%d (%s) failed (rc=%d)\n"

I added dmesg calls to check that:
https://autopkgtest.ubuntu.com/results/autopkgtest-jammy-dannf-test/jammy/amd64/l/livecd-rootfs/20240122_161631_62ecd@/log.gz

.. but no such message appeared, so that's not it. But what *is* interesting there is that it shows *2* partition scan lines:

1248s [ 990.855361] loop0: detected capacity change from 0 to 4612096
1248s [ 990.855628] loop0: p1 p14 p15
1248s [ 990.874241] loop0: p1 p14 p15

Previously we just saw 1:

1189s [ 932.268459] loop0: detected capacity change from 0 to 4612096
1189s [ 932.268715] loop0: p1 p14 p15

That only gets printed when bdev_disk_changed() is called. So do we have 2 racing callers?

One thing that seems off is that loop_configure() unsuppresses uevents for the full device before the partition scan, but loop_change_fd() waits until the partition scan is complete. Shouldn't they be following the same pattern? I wonder if that could cause the following race:

[livecd-rootfs] losetup creates /dev/loop0
[livecd-rootfs] kernel sends uevent for /dev/loop0
[livecd-rootfs] /dev/loop0p* appear in devtmpfs
      [udev] receives uevent for loop0
      [udev] partprobe /dev/loop0
[livecd-rootfs] losetup exit(0)
      [partprobe] /dev/loop0p* cleared
[livecd-rootfs] check for /dev/loop0p1 FAILS
      [partprobe] /dev/loop0p* recreated

I tried checking for this using ftrace in a local jammy VM. I haven't been able to reproduce this in a local VM, but I wanted to see what happens in a normal losetup.. er... setup.

>>>>> First I used losetup to create the device:

root@dannf-livecd-rootfs-debug:/sys/kernel/debug/tracing# loopdev="$(losetup --show -f -P -v /home/ubuntu/disk.img)"
root@dannf-livecd-rootfs-debug:/sys/kernel/debug/tracing# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 1/1 #P:1
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / _-=> migrate-disable
# |||| / delay
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
         losetup-1996 [000] ..... 657.573994: bdev_disk_changed <-loop_reread_partitions

>>>>> Only the expected bdev_disk_change() call
>>>>> Then I remove the device:

root@dannf-livecd-rootfs-debug:/sys/kernel/debug/tracing# losetup -v -d $loopdev
root@dannf-livecd-rootfs-debug:/sys/kernel/debug/tracing# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 3/3 #P:1
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / _-=> migrate-disable
# |||| / delay
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
         losetup-1996 [000] ..... 657.573994: bdev_disk_changed <-loop_reread_partitions
   systemd-udevd-2524 [000] ..... 680.555336: bdev_disk_changed <-blkdev_get_whole
         losetup-2523 [000] ..... 680.568473: bdev_disk_changed <-__loop_clr_fd

>>>>> udev did rescan the partitions... but only during removal. And its strange that it gets in there before losetup calls it itself.

Perhaps I should add this tracing into a debug livecd-rootfs and see if I can reproduce in our autopkgtest infra.