Kernel panic - kernel NULL pointer dereference; RIP is at blk_mq_put_rq_ref+0xa/0x60
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux-meta-gcp-5.11 (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
We have recently seen seemingly random kernel panics on one of our instances in Google Compute Engine, running the `linux-gcp-5.11` kernel on Ubuntu 20.04.
The crashes happened weeks apart and seem to occur randomly; we have not found a way to trigger them consistently.
We've captured the crash logs of the two crashes we've seen so far:
## Crash 1
[731972.720559] BUG: kernel NULL pointer dereference, address: 0000000000000000
[731972.727963] #PF: supervisor instruction fetch in kernel mode
[731972.733844] #PF: error_code(0x0010) - not-present page
[731972.739243] PGD 147111067 P4D 147111067 PUD 10fd90067 PMD 0
[731972.745117] Oops: 0010 [#1] SMP NOPTI
[731972.748989] CPU: 6 PID: 146349 Comm: node_exporter Not tainted 5.11.0-1018-gcp #20~20.04.2-Ubuntu
[731972.758078] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[731972.767508] RIP: 0010:0x0
[731972.770345] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[731972.777862] RSP: 0018:ffffb971c1
[731972.783298] RAX: 0000000000000000 RBX: ffffb971c105fb90 RCX: 0000000000000002
[731972.790643] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9e4a8305dc00
[731972.797984] RBP: ffffb971c105fb08 R08: 0000000000000000 R09: 000000000000003b
[731972.805328] R10: ffff9e4aa8ac2000 R11: ffff9e4aa8ac13b2 R12: ffff9e4a8305dc00
[731972.812672] R13: ffff9e4a8305d000 R14: 0000000000000000 R15: 0000000000000001
[731972.820016] FS: 00007fb07dffb70
[731972.829050] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[731972.835006] CR2: ffffffffffffffd6 CR3: 00000001115a8000 CR4: 0000000000350ee0
[731972.843919] Call Trace:
[731972.846571] blk_mq_
[731972.850710] bt_iter+0x54/0x90
[731972.854002] blk_mq_
[731972.859104] ? blk_mq_
[731972.864026] ? blk_mq_
[731972.868952] blk_mq_
[731972.873355] diskstats_
[731972.877322] seq_read_
[731972.881288] proc_reg_
[731972.885499] new_sync_
[731972.890119] vfs_read+
[731972.893647] ksys_read+0x67/0xe0
[731972.897092] __x64_sys_
[731972.900969] do_syscall_
[731972.904870] entry_SYSCALL_
[731972.910124] RIP: 0033:0x4bd01b
[731972.913381] Code: fb ff eb bd e8 e6 22 fb ff e9 61 ff ff ff cc e8 5b f1 fa ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
[731972.933139] RSP: 002b:000000c000
[731972.940915] RAX: ffffffffffffffda RBX: 000000c000045800 RCX: 00000000004bd01b
[731972.948249] RDX: 0000000000001000 RSI: 000000c00087a000 RDI: 0000000000000008
[731972.955586] RBP: 000000c000579840 R08: 0000000000000001 R09: 0000000000000002
[731972.963040] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
[731972.970378] R13: 0000000000000002 R14: 0000000000000002 R15: 0000000000000002
[731972.978074] Modules linked in: nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd virtio_net glue_helper net_failover input_leds failover psmouse serio_raw efi_pstore sch_fq_codel msr drm virtio_rng ip_tables x_tables autofs4
[731973.006538] CR2: 0000000000000000
[731973.010064] ---[ end trace a489911d719de581 ]---
[731973.135811] RIP: 0010:0x0
[731973.138658] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[731973.145731] RSP: 0018:ffffb971c1
[731973.151331] RAX: 0000000000000000 RBX: ffffb971c105fb90 RCX: 0000000000000002
[731973.158961] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9e4a8305dc00
[731973.166296] RBP: ffffb971c105fb08 R08: 0000000000000000 R09: 000000000000003b
[731973.173718] R10: ffff9e4aa8ac2000 R11: ffff9e4aa8ac13b2 R12: ffff9e4a8305dc00
[731973.181228] R13: ffff9e4a8305d000 R14: 0000000000000000 R15: 0000000000000001
[731973.189008] FS: 00007fb07dffb70
[731973.197297] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[731973.203242] CR2: ffffffffffffffd6 CR3: 00000001115a8000 CR4: 0000000000350ee0
[731973.210841] Kernel panic - not syncing: Fatal exception
[731973.222020] Kernel Offset: 0x1a200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000
[731973.361494] Rebooting in 10 seconds..
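As a reading aid for the `#PF: error_code(...)` lines in these logs, here is a minimal Python sketch of how the low bits of the x86 page-fault error code decode. `decode_pf_error_code` is a hypothetical helper name chosen for illustration, not something from the kernel or from this report, and it only handles bits 0, 1, 2 and 4.

```python
def decode_pf_error_code(code: int) -> dict:
    """Decode the low bits of an x86 page-fault error code (illustrative only)."""
    # Bit 4 (0x10): the fault happened on an instruction fetch.
    # Bit 1 (0x02): the fault happened on a write; otherwise it was a read.
    if code & 0x10:
        access = "instruction fetch"
    elif code & 0x02:
        access = "write access"
    else:
        access = "read access"
    return {
        # Bit 2 (0x04): set when the access originated in user mode.
        "privilege": "user" if code & 0x04 else "supervisor",
        "access": access,
        # Bit 0 (0x01): set for a protection violation, clear for a not-present page.
        "cause": "protection violation" if code & 0x01 else "not-present page",
    }


# Error codes taken from the two crash logs in this report.
print(decode_pf_error_code(0x0010))  # Crash 1
print(decode_pf_error_code(0x0000))  # Crash 2
```

For Crash 1 (`0x0010`) this decodes to a supervisor instruction fetch from a not-present page, which together with `RIP: 0010:0x0` would be consistent with a call through a NULL function pointer. For Crash 2 (`0x0000`) it decodes to a supervisor read from a not-present page.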
## Crash 2
[1785249.917228] BUG: kernel NULL pointer dereference, address: 00000000000000c0
[1785249.924620] #PF: supervisor read access in kernel mode
[1785249.930041] #PF: error_code(0x0000) - not-present page
[1785249.935458] PGD 109bb2067 P4D 109bb2067 PUD 106a5d067 PMD 0
[1785249.941399] Oops: 0000 [#1] SMP NOPTI
[1785249.945339] CPU: 14 PID: 137099 Comm: node_exporter Not tainted 5.11.0-1020-gcp #22~20.04.1-Ubuntu
[1785249.954576] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[1785249.964162] RIP: 0010:blk_
[1785249.970037] Code: 15 0f b6 d3 4c 89 e7 be 01 00 00 00 e8 cf fe ff ff 5b 41 5c 5d c3 0f 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 47 10 <48> 8b 80 c0 00 00 00 48 89 e5 48 3b 78 40 74 1f 4c 8d 87 e8 00 00
[1785249.989188] RSP: 0018:ffffb65081
[1785249.994695] RAX: 0000000000000000 RBX: ffffb65081bcbb90 RCX: 0000000000000002
[1785250.002125] RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffff8b74c2e42000
[1785250.009567] RBP: ffffb65081bcbb40 R08: 0000000000000000 R09: 0000000000000015
[1785250.017084] R10: 0000000000000364 R11: 0000000000000008 R12: ffff8b74c2e42000
[1785250.024946] R13: ffff8b74c2e41000 R14: 0000000000000000 R15: 0000000000000001
[1785250.032362] FS: 00007f7ce5ffb70
[1785250.040996] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1785250.047453] CR2: 00000000000000c0 CR3: 00000001081d8000 CR4: 0000000000350ee0
[1785250.054973] Call Trace:
[1785250.057703] ? bt_iter+0x54/0x90
[1785250.061215] blk_mq_
[1785250.066461] ? blk_mq_
[1785250.071967] ? blk_mq_
[1785250.076958] blk_mq_
[1785250.081096] diskstats_
[1785250.085145] seq_read_
[1785250.089271] proc_reg_
[1785250.093648] new_sync_
[1785250.097766] vfs_read+
[1785250.101363] ksys_read+0x67/0xe0
[1785250.104887] __x64_sys_
[1785250.108829] do_syscall_
[1785250.112772] entry_SYSCALL_
[1785250.118107] RIP: 0033:0x4bd01b
[1785250.121916] Code: fb ff eb bd e8 e6 22 fb ff e9 61 ff ff ff cc e8 5b f1 fa ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
[1785250.141048] RSP: 002b:000000c000
[1785250.148896] RAX: ffffffffffffffda RBX: 000000c000045800 RCX: 00000000004bd01b
[1785250.156309] RDX: 0000000000001000 RSI: 000000c00094d000 RDI: 0000000000000009
[1785250.163726] RBP: 000000c00007a840 R08: 0000000000000001 R09: 0000000000000002
[1785250.171401] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
[1785250.178834] R13: 0000000000000002 R14: 0000000000000001 R15: 0000000000000002
[1785250.186425] Modules linked in: tcp_diag inet_diag btrfs blake2b_generic xor raid6_pq ufs msdos xfs nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua crct10dif_pclmul crc32_pclmul ghash_clmulni_intel input_leds aesni_intel virtio_net crypto_simd cryptd net_failover psmouse glue_helper failover serio_raw efi_pstore sch_fq_codel msr drm virtio_rng ip_tables x_tables autofs4
[1785250.221208] CR2: 00000000000000c0
[1785250.225329] ---[ end trace 2202978dadca9c1d ]---
[1785250.385966] RIP: 0010:blk_
[1785250.390984] Code: 15 0f b6 d3 4c 89 e7 be 01 00 00 00 e8 cf fe ff ff 5b 41 5c 5d c3 0f 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 47 10 <48> 8b 80 c0 00 00 00 48 89 e5 48 3b 78 40 74 1f 4c 8d 87 e8 00 00
[1785250.410204] RSP: 0018:ffffb65081
[1785250.415707] RAX: 0000000000000000 RBX: ffffb65081bcbb90 RCX: 0000000000000002
[1785250.423124] RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffff8b74c2e42000
[1785250.430745] RBP: ffffb65081bcbb40 R08: 0000000000000000 R09: 0000000000000015
[1785250.438422] R10: 0000000000000364 R11: 0000000000000008 R12: ffff8b74c2e42000
[1785250.445835] R13: ffff8b74c2e41000 R14: 0000000000000000 R15: 0000000000000001
[1785250.453424] FS: 00007f7ce5ffb70
[1785250.461888] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1785250.467914] CR2: 00000000000000c0 CR3: 00000001081d8000 CR4: 0000000000350ee0
[1785250.475347] Kernel panic - not syncing: Fatal exception
[1785250.481954] Kernel Offset: 0x12200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000
[1785250.589751] Rebooting in 10 seconds..
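The `Code:` line in an oops marks the byte at the faulting RIP with angle brackets. The following Python sketch splits such a line at that marker; `split_code_line` is a hypothetical helper name, and the input is only an excerpt of the Crash 2 `Code:` line above.

```python
def split_code_line(code: str) -> tuple[list[int], list[int]]:
    """Split oops 'Code:' bytes into (bytes before RIP, bytes from RIP onward)."""
    before, after = [], []
    seen_marker = False
    for token in code.split():
        # The byte at the faulting RIP is wrapped in <...>.
        if token.startswith("<") and token.endswith(">"):
            seen_marker = True
            token = token[1:-1]
        (after if seen_marker else before).append(int(token, 16))
    return before, after


# Excerpt of the Crash 2 "Code:" line around the marker.
CRASH2_CODE = "55 48 8b 47 10 <48> 8b 80 c0 00 00 00 48 89 e5"

before, after = split_code_line(CRASH2_CODE)
print(" ".join(f"{b:02x}" for b in before[-4:]))  # prints "48 8b 47 10"
print(" ".join(f"{b:02x}" for b in after[:7]))    # prints "48 8b 80 c0 00 00 00"
```

Read as x86-64 machine code, `48 8b 47 10` appears to be `mov 0x10(%rdi),%rax` and the faulting `48 8b 80 c0 00 00 00` appears to be `mov 0xc0(%rax),%rax`. With `RAX: 0000000000000000` in the register dump, that would match the reported fault address `CR2: 00000000000000c0`, suggesting the pointer loaded from offset 0x10 of the structure in RDI was NULL. This is an interpretation of the log, not a confirmed root cause.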
ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-gcp 5.11.0.
ProcVersionSign
Uname: Linux 5.11.0-1021-gcp x86_64
ApportVersion: 2.20.11-
Architecture: amd64
CasperMD5CheckR
Date: Fri Oct 22 12:13:34 2021
SourcePackage: linux-meta-gcp-5.11
UpgradeStatus: No upgrade log present (probably fresh install)
Status changed to 'Confirmed' because the bug affects multiple users.