Comment 0 for bug 1856871

Eric Desrochers (slashd) wrote:

This is reproducible in Bionic and later.

Here's an example running 'focal':

$ lsb_release -cs
focal

$ uname -r
5.3.0-24-generic

How to trigger it:

$ sosreport -o block

Or, more precisely, the command inside the block plugin that causes the problem:
$ parted -s /dev/$(losetup -f) unit s print

https://github.com/sosreport/sos/blob/master/sos/plugins/block.py#L52
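To check the first unused loop device in one go, here is a minimal sketch. It assumes a POSIX shell (with `local`) run as root. The function name check_first_free_loop and the SYSFS_BLOCK override are hypothetical and not part of sosreport. The parted invocation is the same one quoted above. The counters are printed before parted runs because the probe itself adds I/O to them.

```shell
# Hypothetical helper (not part of sosreport): probe the first unused loop
# device with the same parted command the sos block plugin runs.
# SYSFS_BLOCK is an assumed override for the sysfs root (default /sys/block).
check_first_free_loop() {
    local sysfs dev name rc
    sysfs="${SYSFS_BLOCK:-/sys/block}"
    # losetup -f prints the first unused loop device, e.g. /dev/loop2
    dev=$(losetup -f) || return 1
    name=${dev##*/}
    echo "first unused loop device: $dev"
    # Read the I/O counters first: the parted probe below adds to them.
    # Unquoted on purpose so the sysfs column padding collapses to spaces.
    echo "stat:" $(cat "$sysfs/$name/stat")
    # Same command as in the block plugin; stderr is left visible
    rc=0
    parted -s "$dev" unit s print >/dev/null || rc=$?
    echo "parted exit status: $rc"
    return "$rc"
}
```

Run it from a root shell right after a reboot, then again after installing a snap, to compare the two cases described below.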

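On this system, losetup -f returns /dev/loop2, and that device produces the errors. However, if I run the same command against the following unused loop device, in this case /dev/loop3 (which is also unused), there are no errors.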

While I agree that sosreport shouldn't query unused loop devices, there is definitely something going on with the next unused loop device.

What is the difference between loop2 and loop3, or the other unused ones?

Three things I have noticed so far:
* The loop device needs to be the next unused loop device (losetup -f).
* A reboot is needed if some loop modification (snap install, loop mount, ...) has been made at runtime.
* The next unused loop device (loop2 here) has non-zero stats, as opposed to the other unused loop devices:

/sys/block/loop2/stat
::::::::::::::
2 0 10 0 1 0 0 0 0 0 0

while loop3 does not:

/sys/block/loop3/stat
::::::::::::::
0 0 0 0 0 0 0 0 0 0 0

Explanation of each column:
https://www.kernel.org/doc/html/latest/block/stat.html
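A small awk-based helper can put a name on each column. This is a sketch. print_blk_stat is a hypothetical name, and the column order follows the kernel block/stat documentation linked above. Older kernels expose only the first 11 columns, and newer ones append discard and flush counters.

```shell
# Hypothetical helper: label each field of a /sys/block/<dev>/stat line.
# Column names/order follow the kernel's block/stat documentation.
print_blk_stat() {
    printf '%s\n' "$1" | awk '{
        n = "read_ios read_merges read_sectors read_ticks"
        n = n " write_ios write_merges write_sectors write_ticks"
        n = n " in_flight io_ticks time_in_queue"
        n = n " discard_ios discard_merges discard_sectors discard_ticks"
        n = n " flush_ios flush_ticks"
        split(n, name, " ")
        # Unknown extra columns (future kernels) fall back to fieldN
        for (i = 1; i <= NF; i++)
            printf "%s=%s\n", ((i in name) ? name[i] : ("field" i)), $i
    }'
}
```

For example, print_blk_stat "$(cat /sys/block/loop2/stat)". Read this way, the loop2 sample above records 2 read I/Os covering 10 sectors plus 1 write I/O. The zeroed sample shows no I/O at all.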

This tells me that something during the boot process most likely acquired the next unused loop device (on purpose or not) and possibly did not release it properly.
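It may help to check whether more than one unused loop device was touched during boot by scanning all of them at once. This is a minimal sketch. It assumes that a loop device is bound exactly when /sys/block/loopN/loop/backing_file exists, which is my understanding of how the kernel exposes it. find_touched_unused_loops is a hypothetical name, and the sysfs root is a parameter only so the function can be tested.

```shell
# Hypothetical helper: list loop devices that are not bound to a backing
# file but still have non-zero I/O counters.
# Assumption: loop/backing_file exists in sysfs only while a device is bound.
find_touched_unused_loops() {
    local sysfs="${1:-/sys/block}"
    local d name line f
    for d in "$sysfs"/loop*; do
        [ -r "$d/stat" ] || continue
        # Bound devices (snaps, loop mounts) are in use: skip them
        [ -e "$d/loop/backing_file" ] && continue
        name=${d##*/}
        line=$(cat "$d/stat")
        # Report the device as soon as any counter is non-zero
        for f in $line; do
            if [ "$f" -ne 0 ]; then
                # $line unquoted on purpose to collapse the column padding
                echo "$name:" $line
                break
            fi
        done
    done
    return 0
}
```

If the theory above holds, running find_touched_unused_loops right after a reboot should report only the device that losetup -f returns.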

If loop2 is generating errors and I install a snap, the snap's squashfs takes loop2, making loop3 the next unused loop device.

If I query loop3 with 'parted' right after, no errors.

If I reboot and query loop3 again, then I do get an error.

To trigger the errors, the query must happen after a reboot, and it only affects the first available unused loop device (losetup -f).

This was tested on focal, whose systemd is very close to the latest upstream code.
It has also been tested with the latest v5.5 kernel. For now, I don't think it's a kernel problem; I suspect a userspace component misbehaving while handling loop devices at boot.