Activity log for bug #1894780

Date Who What changed Old value New value Message
2020-09-08 03:42:41 William Grant bug added bug
2020-09-08 03:42:58 William Grant description One of my bionic servers with HWE 5.4.0 hangs on boot (apparently while starting LVM snapshots) after upgrading from Linux 5.4.0-42 to 5.4.0-47, with the following trace: [ 29.126292] kobject_add_internal failed for :a-0000152 with -EEXIST, don't try to register things with the same name in the same directory. [ 29.138854] BUG: kernel NULL pointer dereference, address: 0000000000000020 [ 29.145977] #PF: supervisor read access in kernel mode [ 29.145979] #PF: error_code(0x0000) - not-present page [ 29.145981] PGD 0 P4D 0 [ 29.158800] Oops: 0000 [#1] SMP NOPTI [ 29.162468] CPU: 6 PID: 2532 Comm: lvm Not tainted 5.4.0-46-generic #50~18.04.1-Ubuntu [ 29.170378] Hardware name: Supermicro AS -2023US-TR4/H11DSU-iN, BIOS 1.3 07/15/2019 [ 29.178038] RIP: 0010:free_percpu+0x120/0x1f0 [ 29.183786] Code: 43 64 48 01 d0 49 39 c4 0f 83 71 ff ff ff 65 8b 05 a5 4e bc 58 48 8b 15 0e 4e 20 01 48 98 48 8b 3c c2 4c 01 e7 e8 f0 97 02 00 <48> 8b 58 20 48 8b 53 38 e9 48 ff ff ff f3 c3 48 8b 43 38 48 89 45 [ 29.202530] RSP: 0018:ffffa2f69c3d38e8 EFLAGS: 00010046 [ 29.209204] RAX: 0000000000000000 RBX: ffff92202ff397c0 RCX: ffffffffa880a000 [ 29.216336] RDX: cf35c0f24f2cc3c0 RSI: 43817c451b92afcb RDI: 0000000000000000 [ 29.223469] RBP: ffffa2f69c3d3918 R08: 0000000000000000 R09: ffffffffa74a5300 [ 29.230609] R10: ffffa2f69c3d3820 R11: 0000000000000000 R12: cf35c0f24f14c3c0 [ 29.237745] R13: cf362fb2a054c3c0 R14: 0000000000000287 R15: 0000000000000008 [ 29.244878] FS: 00007f93a04b0900(0000) GS:ffff913faed80000(0000) knlGS:0000000000000000 [ 29.252961] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 29.258707] CR2: 0000000000000020 CR3: 0000003fa9d90000 CR4: 00000000003406e0 [ 29.265883] Call Trace: [ 29.268346] __kmem_cache_release+0x1a/0x30 [ 29.273913] __kmem_cache_create+0x4f9/0x550 [ 29.278192] ? __kmalloc_node+0x1eb/0x320 [ 29.282205] ? 
kvmalloc_node+0x31/0x80 [ 29.285962] create_cache+0x120/0x1f0 [ 29.291003] kmem_cache_create_usercopy+0x17d/0x270 [ 29.295882] kmem_cache_create+0x16/0x20 [ 29.300152] dm_bufio_client_create+0x1af/0x3f0 [dm_bufio] [ 29.305644] ? snapshot_map+0x5e0/0x5e0 [dm_snapshot] [ 29.310693] persistent_read_metadata+0x1ed/0x500 [dm_snapshot] [ 29.316627] ? _cond_resched+0x19/0x40 [ 29.320384] snapshot_ctr+0x79e/0x910 [dm_snapshot] [ 29.325276] dm_table_add_target+0x18d/0x370 [ 29.329552] table_load+0x12a/0x370 [ 29.333045] ctl_ioctl+0x1e2/0x590 [ 29.336450] ? retrieve_status+0x1c0/0x1c0 [ 29.340551] dm_ctl_ioctl+0xe/0x20 [ 29.343958] do_vfs_ioctl+0xa9/0x640 [ 29.347547] ? ksys_semctl.constprop.19+0xf7/0x190 [ 29.352337] ksys_ioctl+0x75/0x80 [ 29.355663] __x64_sys_ioctl+0x1a/0x20 [ 29.359421] do_syscall_64+0x57/0x190 [ 29.363094] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 29.368144] RIP: 0033:0x7f939f0286d7 [ 29.371732] Code: b3 66 90 48 8b 05 b1 47 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 47 2d 00 f7 d8 64 89 01 48 [ 29.390478] RSP: 002b:00007ffe918df168 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [ 29.398045] RAX: ffffffffffffffda RBX: 0000561c107f672c RCX: 00007f939f0286d7 [ 29.405175] RDX: 0000561c1107c610 RSI: 00000000c138fd09 RDI: 0000000000000009 [ 29.412309] RBP: 00007ffe918df220 R08: 00007f939f59d120 R09: 00007ffe918defd0 [ 29.419442] R10: 0000561c1107c6c0 R11: 0000000000000202 R12: 00007f939f59c4e6 [ 29.426623] R13: 00007f939f59c4e6 R14: 00007f939f59c4e6 R15: 00007f939f59c4e6 [ 29.433778] Modules linked in: dm_snapshot dm_bufio dm_zero nls_iso8859_1 ipmi_ssif input_leds amd64_edac_mod edac_mce_amd joydev kvm_amd kvm ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel ib_iser rdma_cm iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy 
async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core bcache crc64 hid_generic crct10dif_pclmul mlx5_core crc32_pclmul ast ghash_clmulni_intel drm_vram_helper pci_hyperv_intf ttm aesni_intel mpt3sas nvme crypto_simd drm_kms_helper syscopyarea igb cryptd raid_class sysfillrect ahci tls sysimgblt glue_helper dca usbhid fb_sys_fops libahci nvme_core mlxfw i2c_algo_bit scsi_transport_sas drm hid i2c_piix4 [ 29.507853] CR2: 0000000000000020 [ 29.511174] ---[ end trace 43bd923f80cbdf52 ]--- That :a-0000152 is meant to be /sys/kernel/debug/:a-0000152. Even a working kernel shows some trouble there: $ uname -a Linux <REDACTED> 5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux $ ls -l /sys/kernel/slab | grep a-0000152 lrwxrwxrwx 1 root root 0 Sep 8 03:20 dm_bufio_buffer -> :a-0000152 So on 5.4.0-42 the named node doesn't get created, but at least it doesn't crash. The same thing is visible on my 5.8.0-18 desktop, but I can't reproduce the crash on other machines with snapshot thin volumes despite it happening every time (even with maxcpus=1) on the affected system. It should be noted that LVM was not in use on this system until just before it was rebooted into the new kernel, but downgrading to -42 does work so it seems like a coincidence. Before I realised it was a recent regression I dug through mm/slub.c's history and found dde3c6b7 ("mm/slub: fix a memory leak in sysfs_slab_add()") kind of suspicious -- it ostensibly fixes a leak from 80da026a ("mm/slub: fix slab double-free in case of duplicate sysfs filename"), exactly the codepath that seems to crash here. There's clearly some existing bug causing the slab sysfs node to not be added, and I guess dde3c6b7 turns that into a crash on some systems. This is a test system, so I can do whatever debugging is required to narrow down the trigger. 
One of my bionic servers with HWE 5.4.0 hangs on boot (apparently while starting LVM snapshots) after upgrading from Linux 5.4.0-42 to 5.4.0-47, with the following trace:

  [ 29.126292] kobject_add_internal failed for :a-0000152 with -EEXIST, don't try to register things with the same name in the same directory.
  [ 29.138854] BUG: kernel NULL pointer dereference, address: 0000000000000020
  [ 29.145977] #PF: supervisor read access in kernel mode
  [ 29.145979] #PF: error_code(0x0000) - not-present page
  [ 29.145981] PGD 0 P4D 0
  [ 29.158800] Oops: 0000 [#1] SMP NOPTI
  [ 29.162468] CPU: 6 PID: 2532 Comm: lvm Not tainted 5.4.0-46-generic #50~18.04.1-Ubuntu
  [ 29.170378] Hardware name: Supermicro AS -2023US-TR4/H11DSU-iN, BIOS 1.3 07/15/2019
  [ 29.178038] RIP: 0010:free_percpu+0x120/0x1f0
  [ 29.183786] Code: 43 64 48 01 d0 49 39 c4 0f 83 71 ff ff ff 65 8b 05 a5 4e bc 58 48 8b 15 0e 4e 20 01 48 98 48 8b 3c c2 4c 01 e7 e8 f0 97 02 00 <48> 8b 58 20 48 8b 53 38 e9 48 ff ff ff f3 c3 48 8b 43 38 48 89 45
  [ 29.202530] RSP: 0018:ffffa2f69c3d38e8 EFLAGS: 00010046
  [ 29.209204] RAX: 0000000000000000 RBX: ffff92202ff397c0 RCX: ffffffffa880a000
  [ 29.216336] RDX: cf35c0f24f2cc3c0 RSI: 43817c451b92afcb RDI: 0000000000000000
  [ 29.223469] RBP: ffffa2f69c3d3918 R08: 0000000000000000 R09: ffffffffa74a5300
  [ 29.230609] R10: ffffa2f69c3d3820 R11: 0000000000000000 R12: cf35c0f24f14c3c0
  [ 29.237745] R13: cf362fb2a054c3c0 R14: 0000000000000287 R15: 0000000000000008
  [ 29.244878] FS: 00007f93a04b0900(0000) GS:ffff913faed80000(0000) knlGS:0000000000000000
  [ 29.252961] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 29.258707] CR2: 0000000000000020 CR3: 0000003fa9d90000 CR4: 00000000003406e0
  [ 29.265883] Call Trace:
  [ 29.268346] __kmem_cache_release+0x1a/0x30
  [ 29.273913] __kmem_cache_create+0x4f9/0x550
  [ 29.278192] ? __kmalloc_node+0x1eb/0x320
  [ 29.282205] ? kvmalloc_node+0x31/0x80
  [ 29.285962] create_cache+0x120/0x1f0
  [ 29.291003] kmem_cache_create_usercopy+0x17d/0x270
  [ 29.295882] kmem_cache_create+0x16/0x20
  [ 29.300152] dm_bufio_client_create+0x1af/0x3f0 [dm_bufio]
  [ 29.305644] ? snapshot_map+0x5e0/0x5e0 [dm_snapshot]
  [ 29.310693] persistent_read_metadata+0x1ed/0x500 [dm_snapshot]
  [ 29.316627] ? _cond_resched+0x19/0x40
  [ 29.320384] snapshot_ctr+0x79e/0x910 [dm_snapshot]
  [ 29.325276] dm_table_add_target+0x18d/0x370
  [ 29.329552] table_load+0x12a/0x370
  [ 29.333045] ctl_ioctl+0x1e2/0x590
  [ 29.336450] ? retrieve_status+0x1c0/0x1c0
  [ 29.340551] dm_ctl_ioctl+0xe/0x20
  [ 29.343958] do_vfs_ioctl+0xa9/0x640
  [ 29.347547] ? ksys_semctl.constprop.19+0xf7/0x190
  [ 29.352337] ksys_ioctl+0x75/0x80
  [ 29.355663] __x64_sys_ioctl+0x1a/0x20
  [ 29.359421] do_syscall_64+0x57/0x190
  [ 29.363094] entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 29.368144] RIP: 0033:0x7f939f0286d7
  [ 29.371732] Code: b3 66 90 48 8b 05 b1 47 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 47 2d 00 f7 d8 64 89 01 48
  [ 29.390478] RSP: 002b:00007ffe918df168 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  [ 29.398045] RAX: ffffffffffffffda RBX: 0000561c107f672c RCX: 00007f939f0286d7
  [ 29.405175] RDX: 0000561c1107c610 RSI: 00000000c138fd09 RDI: 0000000000000009
  [ 29.412309] RBP: 00007ffe918df220 R08: 00007f939f59d120 R09: 00007ffe918defd0
  [ 29.419442] R10: 0000561c1107c6c0 R11: 0000000000000202 R12: 00007f939f59c4e6
  [ 29.426623] R13: 00007f939f59c4e6 R14: 00007f939f59c4e6 R15: 00007f939f59c4e6
  [ 29.433778] Modules linked in: dm_snapshot dm_bufio dm_zero nls_iso8859_1 ipmi_ssif input_leds amd64_edac_mod edac_mce_amd joydev kvm_amd kvm ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel ib_iser rdma_cm iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core bcache crc64 hid_generic crct10dif_pclmul mlx5_core crc32_pclmul ast ghash_clmulni_intel drm_vram_helper pci_hyperv_intf ttm aesni_intel mpt3sas nvme crypto_simd drm_kms_helper syscopyarea igb cryptd raid_class sysfillrect ahci tls sysimgblt glue_helper dca usbhid fb_sys_fops libahci nvme_core mlxfw i2c_algo_bit scsi_transport_sas drm hid i2c_piix4
  [ 29.507853] CR2: 0000000000000020
  [ 29.511174] ---[ end trace 43bd923f80cbdf52 ]---

That :a-0000152 is meant to be /sys/kernel/slab/:a-0000152. Even a working kernel shows some trouble there:

  $ uname -a
  Linux <REDACTED> 5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  $ ls -l /sys/kernel/slab | grep a-0000152
  lrwxrwxrwx 1 root root 0 Sep 8 03:20 dm_bufio_buffer -> :a-0000152

So on 5.4.0-42 the named node doesn't get created, but at least it doesn't crash. The same thing is visible on my 5.8.0-18 desktop, but I can't reproduce the crash on other machines with snapshot thin volumes despite it happening every time (even with maxcpus=1) on the affected system.

It should be noted that LVM was not in use on this system until just before it was rebooted into the new kernel, but downgrading to -42 does work so it seems like a coincidence.

Before I realised it was a recent regression I dug through mm/slub.c's history and found dde3c6b7 ("mm/slub: fix a memory leak in sysfs_slab_add()") kind of suspicious -- it ostensibly fixes a leak from 80da026a ("mm/slub: fix slab double-free in case of duplicate sysfs filename"), exactly the codepath that seems to crash here. There's clearly some existing bug causing the slab sysfs node to not be added, and I guess dde3c6b7 turns that into a crash on some systems.

This is a test system, so I can do whatever debugging is required to narrow down the trigger.
2020-09-08 03:43:16 William Grant attachment added lspci-vnvn.log https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1894780/+attachment/5408512/+files/lspci-vnvn.log
2020-09-08 03:43:30 William Grant attachment added lvs-a.log https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1894780/+attachment/5408513/+files/lvs-a.log
2020-09-08 03:43:49 William Grant attachment added version_signature from the last good kernel https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1894780/+attachment/5408514/+files/version.log
2020-09-08 03:44:03 William Grant summary Oops when starting LVM snapshots on 5.4.0-47 Oops and hang when starting LVM snapshots on 5.4.0-47
2020-09-08 04:00:10 Ubuntu Kernel Bot linux (Ubuntu): status New Incomplete
2020-09-08 04:00:11 Ubuntu Kernel Bot tags focal
2020-09-08 05:15:20 Andy Whitcroft linux (Ubuntu): status Incomplete Confirmed
2020-09-08 06:13:18 Junien F bug added subscriber The Canonical Sysadmins
2020-09-08 08:28:40 Kleber Sacilotto de Souza nominated for series Ubuntu Focal
2020-09-08 08:28:40 Kleber Sacilotto de Souza bug task added linux (Ubuntu Focal)
2020-09-08 13:53:53 Terry Rudd bug added subscriber Terry Rudd
2020-09-09 21:44:54 Jay Vosburgh bug added subscriber Jay Vosburgh
2020-09-10 05:06:27 William Grant attachment added dmsetup.bad https://bugs.launchpad.net/bugs/1894780/+attachment/5409274/+files/dmsetup.bad
2020-09-10 05:06:27 William Grant attachment added dmsetup.good https://bugs.launchpad.net/bugs/1894780/+attachment/5409275/+files/dmsetup.good
2020-09-10 05:06:27 William Grant attachment added vm.dmesg https://bugs.launchpad.net/bugs/1894780/+attachment/5409276/+files/vm.dmesg
2020-09-10 05:06:27 William Grant attachment added oops.desktop https://bugs.launchpad.net/bugs/1894780/+attachment/5409277/+files/oops.desktop
2020-09-16 10:50:57 Thadeu Lima de Souza Cascardo description (old value identical to the description recorded at 2020-09-08 03:42:58)
[Impact]
kmemcaches will fail to be created after they have just been removed but not completely ripped out. This will cause some drivers (like LVM snapshots) to not work properly and cause kernel traces to appear in the logs.

[Test case]
See comment #9.

[Regression potential]
The fix reverts a commit, so we go back to the state of a previously released kernel, where a leak was possible. That regression, though, is preferable to the current impact, which also leads to a different leak and prevents users from correctly using LVM snapshots.

=========================================================================

One of my bionic servers with HWE 5.4.0 hangs on boot (apparently while starting LVM snapshots) after upgrading from Linux 5.4.0-42 to 5.4.0-47, with the following trace:

  [ 29.126292] kobject_add_internal failed for :a-0000152 with -EEXIST, don't try to register things with the same name in the same directory.
  [ 29.138854] BUG: kernel NULL pointer dereference, address: 0000000000000020
  [ 29.145977] #PF: supervisor read access in kernel mode
  [ 29.145979] #PF: error_code(0x0000) - not-present page
  [ 29.145981] PGD 0 P4D 0
  [ 29.158800] Oops: 0000 [#1] SMP NOPTI
  [ 29.162468] CPU: 6 PID: 2532 Comm: lvm Not tainted 5.4.0-46-generic #50~18.04.1-Ubuntu
  [ 29.170378] Hardware name: Supermicro AS -2023US-TR4/H11DSU-iN, BIOS 1.3 07/15/2019
  [ 29.178038] RIP: 0010:free_percpu+0x120/0x1f0
  [ 29.183786] Code: 43 64 48 01 d0 49 39 c4 0f 83 71 ff ff ff 65 8b 05 a5 4e bc 58 48 8b 15 0e 4e 20 01 48 98 48 8b 3c c2 4c 01 e7 e8 f0 97 02 00 <48> 8b 58 20 48 8b 53 38 e9 48 ff ff ff f3 c3 48 8b 43 38 48 89 45
  [ 29.202530] RSP: 0018:ffffa2f69c3d38e8 EFLAGS: 00010046
  [ 29.209204] RAX: 0000000000000000 RBX: ffff92202ff397c0 RCX: ffffffffa880a000
  [ 29.216336] RDX: cf35c0f24f2cc3c0 RSI: 43817c451b92afcb RDI: 0000000000000000
  [ 29.223469] RBP: ffffa2f69c3d3918 R08: 0000000000000000 R09: ffffffffa74a5300
  [ 29.230609] R10: ffffa2f69c3d3820 R11: 0000000000000000 R12: cf35c0f24f14c3c0
  [ 29.237745] R13: cf362fb2a054c3c0 R14: 0000000000000287 R15: 0000000000000008
  [ 29.244878] FS: 00007f93a04b0900(0000) GS:ffff913faed80000(0000) knlGS:0000000000000000
  [ 29.252961] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 29.258707] CR2: 0000000000000020 CR3: 0000003fa9d90000 CR4: 00000000003406e0
  [ 29.265883] Call Trace:
  [ 29.268346] __kmem_cache_release+0x1a/0x30
  [ 29.273913] __kmem_cache_create+0x4f9/0x550
  [ 29.278192] ? __kmalloc_node+0x1eb/0x320
  [ 29.282205] ? kvmalloc_node+0x31/0x80
  [ 29.285962] create_cache+0x120/0x1f0
  [ 29.291003] kmem_cache_create_usercopy+0x17d/0x270
  [ 29.295882] kmem_cache_create+0x16/0x20
  [ 29.300152] dm_bufio_client_create+0x1af/0x3f0 [dm_bufio]
  [ 29.305644] ? snapshot_map+0x5e0/0x5e0 [dm_snapshot]
  [ 29.310693] persistent_read_metadata+0x1ed/0x500 [dm_snapshot]
  [ 29.316627] ? _cond_resched+0x19/0x40
  [ 29.320384] snapshot_ctr+0x79e/0x910 [dm_snapshot]
  [ 29.325276] dm_table_add_target+0x18d/0x370
  [ 29.329552] table_load+0x12a/0x370
  [ 29.333045] ctl_ioctl+0x1e2/0x590
  [ 29.336450] ? retrieve_status+0x1c0/0x1c0
  [ 29.340551] dm_ctl_ioctl+0xe/0x20
  [ 29.343958] do_vfs_ioctl+0xa9/0x640
  [ 29.347547] ? ksys_semctl.constprop.19+0xf7/0x190
  [ 29.352337] ksys_ioctl+0x75/0x80
  [ 29.355663] __x64_sys_ioctl+0x1a/0x20
  [ 29.359421] do_syscall_64+0x57/0x190
  [ 29.363094] entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 29.368144] RIP: 0033:0x7f939f0286d7
  [ 29.371732] Code: b3 66 90 48 8b 05 b1 47 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 47 2d 00 f7 d8 64 89 01 48
  [ 29.390478] RSP: 002b:00007ffe918df168 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  [ 29.398045] RAX: ffffffffffffffda RBX: 0000561c107f672c RCX: 00007f939f0286d7
  [ 29.405175] RDX: 0000561c1107c610 RSI: 00000000c138fd09 RDI: 0000000000000009
  [ 29.412309] RBP: 00007ffe918df220 R08: 00007f939f59d120 R09: 00007ffe918defd0
  [ 29.419442] R10: 0000561c1107c6c0 R11: 0000000000000202 R12: 00007f939f59c4e6
  [ 29.426623] R13: 00007f939f59c4e6 R14: 00007f939f59c4e6 R15: 00007f939f59c4e6
  [ 29.433778] Modules linked in: dm_snapshot dm_bufio dm_zero nls_iso8859_1 ipmi_ssif input_leds amd64_edac_mod edac_mce_amd joydev kvm_amd kvm ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel ib_iser rdma_cm iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core bcache crc64 hid_generic crct10dif_pclmul mlx5_core crc32_pclmul ast ghash_clmulni_intel drm_vram_helper pci_hyperv_intf ttm aesni_intel mpt3sas nvme crypto_simd drm_kms_helper syscopyarea igb cryptd raid_class sysfillrect ahci tls sysimgblt glue_helper dca usbhid fb_sys_fops libahci nvme_core mlxfw i2c_algo_bit scsi_transport_sas drm hid i2c_piix4
  [ 29.507853] CR2: 0000000000000020
  [ 29.511174] ---[ end trace 43bd923f80cbdf52 ]---

That :a-0000152 is meant to be /sys/kernel/slab/:a-0000152. Even a working kernel shows some trouble there:

  $ uname -a
  Linux <REDACTED> 5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  $ ls -l /sys/kernel/slab | grep a-0000152
  lrwxrwxrwx 1 root root 0 Sep 8 03:20 dm_bufio_buffer -> :a-0000152

So on 5.4.0-42 the named node doesn't get created, but at least it doesn't crash. The same thing is visible on my 5.8.0-18 desktop, but I can't reproduce the crash on other machines with snapshot thin volumes despite it happening every time (even with maxcpus=1) on the affected system.

It should be noted that LVM was not in use on this system until just before it was rebooted into the new kernel, but downgrading to -42 does work so it seems like a coincidence.

Before I realised it was a recent regression I dug through mm/slub.c's history and found dde3c6b7 ("mm/slub: fix a memory leak in sysfs_slab_add()") kind of suspicious -- it ostensibly fixes a leak from 80da026a ("mm/slub: fix slab double-free in case of duplicate sysfs filename"), exactly the codepath that seems to crash here. There's clearly some existing bug causing the slab sysfs node to not be added, and I guess dde3c6b7 turns that into a crash on some systems.

This is a test system, so I can do whatever debugging is required to narrow down the trigger.
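The `ls -l /sys/kernel/slab | grep a-0000152` check in the description shows a cache-name symlink whose target node is missing. A minimal sketch of how that check might be generalised to every entry in the directory follows. It assumes the symptom always appears as a dangling symlink; `find_dangling_aliases` and `demo` are hypothetical helpers written for illustration, not part of any kernel tooling, and `:a-0000040` is a made-up name for a healthy node.

```python
import os
import tempfile


def find_dangling_aliases(slab_dir="/sys/kernel/slab"):
    """Return {link_name: target} for symlinks in slab_dir whose target is missing."""
    dangling = {}
    for name in sorted(os.listdir(slab_dir)):
        path = os.path.join(slab_dir, name)
        # os.path.exists() follows symlinks, so it is False for a broken link.
        if os.path.islink(path) and not os.path.exists(path):
            dangling[name] = os.readlink(path)
    return dangling


def demo():
    """Build a scratch directory that mimics the reported layout and scan it."""
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical healthy node with an alias pointing at it.
        os.mkdir(os.path.join(d, ":a-0000040"))
        os.symlink(":a-0000040", os.path.join(d, "healthy_cache"))
        # The reported case: the alias exists but the :a-0000152 node does not.
        os.symlink(":a-0000152", os.path.join(d, "dm_bufio_buffer"))
        return find_dangling_aliases(d)
```

Taking the directory as a parameter lets the scan be tried on a scratch directory, as `demo()` does, before pointing it at the real `/sys/kernel/slab` (which typically needs root to read).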
2020-09-17 23:56:27 Ian May linux (Ubuntu Focal): status New Fix Committed
2020-09-21 18:12:37 Ubuntu Kernel Bot tags focal focal verification-needed-focal
2020-10-08 16:10:23 Launchpad Janitor linux (Ubuntu): status Confirmed Fix Released
2020-10-09 15:20:04 Ian May tags focal verification-needed-focal focal verification-done-focal
2020-10-13 22:38:19 Launchpad Janitor linux (Ubuntu Focal): status Fix Committed Fix Released
2020-10-13 22:38:19 Launchpad Janitor cve linked 2020-16119
2020-10-13 22:38:19 Launchpad Janitor cve linked 2020-16120