Comment 9 for bug 1894780

William Grant (wgrant) wrote: Re: [Bug 1894780] Re: Oops and hang when starting LVM snapshots on 5.4.0-47

On 10/9/20 7:44 am, Jay Vosburgh wrote:
> wgrant, you said:
>
> That :a-0000152 is meant to be /sys/kernel/slab/:a-0000152. Even a
> working kernel shows some trouble there:
>
> $ uname -a
> Linux <REDACTED> 5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
> $ ls -l /sys/kernel/slab | grep a-0000152
> lrwxrwxrwx 1 root root 0 Sep 8 03:20 dm_bufio_buffer -> :a-0000152
>
> Are you saying that the symlink is "some trouble" here? Because that
> part isn't an error, that's the effect of slab merge (that the kernel
> normally treats all slabs of the same size as one big slab with multiple
> references, more or less).

The symlink itself is indeed not a bug. But there's one reference, and
the thing it's referencing doesn't exist. I don't think that symlink
should be dangling.
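
Something like this should be enough to confirm that the alias target
really is missing (same names as quoted above):

  $ readlink /sys/kernel/slab/dm_bufio_buffer
  $ ls -d /sys/kernel/slab/:a-0000152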

> Slab merge can be disabled via "slab_nomerge" on the command line.

Thanks for the slab_nomerge hint. That gets 5.4.0-47 to boot, but
dm_bufio_buffer interestingly doesn't show up in /proc/slabinfo or
/sys/kernel/slab at all, unlike in earlier kernels. There's no 152-byte
slab:

  $ sudo cat /sys/kernel/slab/*/slab_size | grep ^152$
  $
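
The same absence shows in /proc/slabinfo, if you'd rather check there (a
rough grep, assuming the cache would still be named dm_bufio_buffer):

  $ sudo grep dm_bufio /proc/slabinfo
  $ # no output here either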

I've also just reproduced this on a second host by rebooting it into the
same updated kernel -- identical hardware except for a couple of things
like SSDs, and fairly similar software configuration.

... some digging later ...

The trigger on boot is the parallel pvscans launched by
lvm2-pvscan@.service in the presence of several PVs. If I mask that
service, the system boots fine on the updated kernel (without
slab_nomerge). And then this crashes it:

  $ for i in 259:1 259:2 259:3 8:32 8:48 8:64 8:80; do sudo /sbin/lvm pvscan --cache --activate ay $i & done
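
Those major:minor numbers are specific to my PVs; on another machine
something like this should show the right values to feed to the loop:

  $ lsblk -o NAME,MAJ:MIN,TYPE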

I think the key is to have no active VGs with snapshots, then
simultaneously activate two VGs with snapshots.

Armed with that hypothesis, I set up a boring local bionic qemu-kvm
instance, installed linux-generic-hwe-18.04, and reproduced the problem
with a couple of loop devices:

  $ sudo dd if=/dev/zero of=pv1.img bs=1M count=1 seek=1024
  $ sudo dd if=/dev/zero of=pv2.img bs=1M count=1 seek=1024
  $ sudo losetup -f pv1.img
  $ sudo losetup -f pv2.img
  $ sudo vgcreate vg1 /dev/loop0
  $ sudo vgcreate vg2 /dev/loop1
  $ sudo lvcreate --type snapshot -L4M -V10G -n test vg1
  $ sudo lvcreate --type snapshot -L4M -V10G -n test vg2
  $ sudo systemctl mask lvm2-pvscan@.service
  $ sudo reboot

  $ sudo losetup -f pv1.img
  $ sudo losetup -f pv2.img
  $ for i in 7:0 7:1; do sudo /sbin/lvm pvscan --cache --activate ay $i & done
  $ # Be glad if you can still type by this point.

The oops is not 100% reproducible in this configuration, but it seems
fairly reliable with four vCPUs. When it doesn't fire on the first try, a
few cycles of rebooting and rerunning those last three commands has
always worked for me.

The console sometimes remains responsive after the oops, allowing me to
capture good and bad `dmsetup table -v` output. Not sure how helpful
that is, but I've attached an example (from a slightly different
configuration, where each VG has a linear LV with a snapshot,
rather than a snapshot-backed thin LV).
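
For the record, I'm just capturing the tables before and after triggering
the activation, roughly like this (file names are arbitrary):

  $ sudo dmsetup table -v > dmsetup-good.txt
  $ # ... run the parallel activation, survive the oops ...
  $ sudo dmsetup table -v > dmsetup-bad.txt
  $ diff -u dmsetup-good.txt dmsetup-bad.txt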

I've also been able to reproduce the fault on a pure focal system, but
there it doesn't always happen on boot, because lvm2-pvscan@.service (or
a manual pvscan afterwards) sometimes fails to activate the VGs.
Something is creating /run/lvm/vgs_online/$VG too early, so pvscan thinks
it's already done and I end up having to activate the VGs manually later.
That seems unrelated, and only affects a subset of my VMs, but when it
does happen it actually makes the crash easier to reproduce, since the
system boots without the unit having to be masked. You can then crash it
with just:

  $ for VG in vg1 vg2; do sudo vgchange -ay $VG & done
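
If you'd rather exercise the pvscan path on such a system, clearing the
stale online markers first seems to be enough (that's my assumption about
how pvscan tracks completed VGs, so treat it as a sketch):

  $ sudo ls /run/lvm/vgs_online/
  $ sudo rm /run/lvm/vgs_online/vg1 /run/lvm/vgs_online/vg2

and then rerun the pvscan loop from earlier.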

While debugging locally I also found that groovy with 5.8.0-18 is
affected: when I stopped a VM with PVs on real block devices, the host
(my desktop, on which I nearly lost this email, oops) dutifully ran
pvscan over them, got very sad, and needed a reboot with slab_nomerge to
recover:

  [ DO NOT BLINDLY RUN THIS, it may well crash the host. ]
  $ lxc launch --vm ubuntu:focal bug-1894780-focal-2
  $ lxc storage volume create default lvm-1 --type=block size=10GB
  $ lxc storage volume create default lvm-2 --type=block size=10GB
  $ lxc stop bug-1894780-focal-2
  $ lxc storage volume attach default lvm-1 bug-1894780-focal-2 lvm-1
  $ lxc storage volume attach default lvm-2 bug-1894780-focal-2 lvm-2
  $ lxc start bug-1894780-focal-2
  $ lxc exec bug-1894780-focal-2 bash
  # vgcreate vg1 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_lxd_lvm-1
  # vgcreate vg2 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_lxd_lvm-2
  # lvcreate --type snapshot -L4M -V10G -n test vg1
  # lvcreate --type snapshot -L4M -V10G -n test vg2
  # poweroff
  $ # Host sadness here, unless you're somehow immune.
  $ lxc start bug-1894780-focal-2
  $ lxc exec bug-1894780-focal-2 bash
  # for VG in vg1 vg2; do sudo vgchange -ay $VG & done
  # # Guest sadness here.
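
To run that recipe without putting the host at risk, a reject rule in the
host's LVM config should keep its pvscan away from the guest's PVs, which
is what bit me above. The pattern depends on where your LXD pool puts
block volumes; this is only a sketch I haven't gone back and tested:

  $ sudoedit /etc/lvm/lvm.conf
  $ # add inside the devices { } section:
  $ #   global_filter = [ "r|<lxd volume device path>|" ]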

So that's reproduced on metal and in a VM, on 5.4.0-47 and 5.8.0-18, on
two different hosts (one an EPYC 7501 server, the other a Ryzen 7 1700X
desktop; both Zen 1, but I doubt that's relevant). Hopefully one of the
recipes works for you too.