I ran into this on jammy/amd64: https://autopkgtest.ubuntu.com/results/autopkgtest-jammy/jammy/amd64/l/livecd-rootfs/20240121_173406_e4f9a@/log.gz I downloaded all of the amd64 failures and searched for this failure pattern. These were the kernels that were running at the time: "Linux 5.15.0-91-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023" "Linux 6.2.0-21-generic #21-Ubuntu SMP PREEMPT_DYNAMIC Fri Apr 14 12:34:02 UTC 2023" "Linux 6.3.0-7-generic #7-Ubuntu SMP PREEMPT_DYNAMIC Thu Jun 8 16:02:30 UTC 2023" "Linux 6.5.0-9-generic #9-Ubuntu SMP PREEMPT_DYNAMIC Sat Oct 7 01:35:40 UTC 2023" "Linux 6.6.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 30 10:27:29 UTC 2023" Here's the count of failures per image type: 12 017-disk-image-uefi.binary 3 018-disk-image.binary 3 020-kvm-image.binary 1 023-vagrant.binary 1 024-vagrant.binary I can confirm that /dev/loop0p1 is created by devtmpfs. This surprised me because I'd never actually need to know what devtmpfs was, and I saw devices being created even though I had SIGSTOP'd systemd-udevd. But watching udevadm monitor and forktrace output convinced me. I had a theory that something was opening the first created partition before all partitions were created. loop_reread_partitions() can fail without returning an error to userspace: https://elixir.bootlin.com/linux/v5.15.147/source/drivers/block/loop.c#L676 that could happen if bdev_disk_changed() aborts because it finds another partition on the device is open: https://elixir.bootlin.com/linux/v5.15.147/source/block/partitions/core.c#L662 But then we should see this in dmesg: pr_warn("%s: partition scan of loop%d (%s) failed (rc=%d)\n" I added dmesg calls to check that: https://autopkgtest.ubuntu.com/results/autopkgtest-jammy-dannf-test/jammy/amd64/l/livecd-rootfs/20240122_161631_62ecd@/log.gz .. but no such message appeared, so that's not it. But what *is* interesting there is that it shows *2* partition scan lines: 1248s [ 990.855361] loop0: detected capacity change from 0 to 4612096 1248s [ 990.855628] loop0: p1 p14 p15 1248s [ 990.874241] loop0: p1 p14 p15 Previously we just saw 1: 1189s [ 932.268459] loop0: detected capacity change from 0 to 4612096 1189s [ 932.268715] loop0: p1 p14 p15 That only gets printed when bdev_disk_changed() is called. So do we have 2 racing callers? One thing that seems off is that loop_configure() unsuppresses uevents for the full device before the partition scan, but loop_change_fd() waits until the partition scan is complete. Shouldn't they be following the same pattern? I wonder if that could cause the following race: [livecd-rootfs] losetup creates /dev/loop0 [livecd-rootfs] kernel sends uevent for /dev/loop0 [livecd-rootfs] /dev/loop0p* appear in devtmpfs [udev] receives uevent for loop0 [udev] partprobe /dev/loop0 [livecd-rootfs] losetup exit(0) [partprobe] /dev/loop0p* cleared [livecd-rootfs] check for /dev/loop0p1 FAILS [partprobe] /dev/loop0p* recreated I tried checking for this using ftrace in a local jammy VM. I haven't been able to reproduce this in a local VM, but I wanted to see what happens in a normal losetup.. er... setup. >>>>> First I used losetup to create the device: root@dannf-livecd-rootfs-debug:/sys/kernel/debug/tracing# loopdev="$(losetup --show -f -P -v /home/ubuntu/disk.img)" root@dannf-livecd-rootfs-debug:/sys/kernel/debug/tracing# cat trace # tracer: function # # entries-in-buffer/entries-written: 1/1 #P:1 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / _-=> migrate-disable # |||| / delay # TASK-PID CPU# ||||| TIMESTAMP FUNCTION # | | | ||||| | | losetup-1996 [000] ..... 657.573994: bdev_disk_changed <-loop_reread_partitions >>>>> Only the expected bdev_disk_change() call >>>>> Then I remove the device: root@dannf-livecd-rootfs-debug:/sys/kernel/debug/tracing# losetup -v -d $loopdev root@dannf-livecd-rootfs-debug:/sys/kernel/debug/tracing# cat trace # tracer: function # # entries-in-buffer/entries-written: 3/3 #P:1 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / _-=> migrate-disable # |||| / delay # TASK-PID CPU# ||||| TIMESTAMP FUNCTION # | | | ||||| | | losetup-1996 [000] ..... 657.573994: bdev_disk_changed <-loop_reread_partitions systemd-udevd-2524 [000] ..... 680.555336: bdev_disk_changed <-blkdev_get_whole losetup-2523 [000] ..... 680.568473: bdev_disk_changed <-__loop_clr_fd >>>>> udev did rescan the partitions... but only during removal. And its strange that it gets in there before losetup calls it itself. Perhaps I should add this tracing into a debug livecd-rootfs and see if I can reproduce in our autopkgtest infra.