Kernel 5.4 - general protection fault SMP NOPTI
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
The setup consists of multiple compute nodes in an OpenStack deployment, all connected to SAN storage over Fibre Channel (FC).
Env specs:
Ubuntu-Server 20.04.3 LTS
Kernel: 5.4.0-89-generic
CPU: AMD EPYC 7H12
At random times the nodes lock up: system load keeps increasing and no action can be taken, so the server has to be rebooted to recover.
There is no discernible pattern, and stress testing the servers does not reproduce the issue.
Log snippet:
[1673239.174269] general protection fault: 0000 [#1] SMP NOPTI
[1673239.183446] CPU: 97 PID: 1224718 Comm: cadvisor Not tainted 5.4.0-89-generic #100-Ubuntu
[1673239.192622] Hardware name: Dell Inc. PowerEdge R7525/0590KW, BIOS 2.3.6 07/06/2021
[1673239.203336] RIP: 0010:string_
[1673239.212811] Code: 66 85 c0 74 3e 83 e8 01 4c 8d 5c 07 01 31 c0 eb 19 49 39 fa 76 03 44 88 07 48 83 c7 01 41 8d 71 01 48 83 c0 01 4c 39 df 74 0f <44> 0f b6 04 02 41 89 c1 89 c6 45 84 c0 75 d8 4c 89 d2 e8 11 ff ff
[1673239.232904] RSP: 0018:ffffa25f31
[1673239.244331] RAX: 0000000000000000 RBX: ffffa25f3199fc58 RCX: ffff0a00ffffff04
[1673239.256551] RDX: d969688991a5a25c RSI: ffff8de32b560000 RDI: ffff8de32b5400c6
[1673239.269226] RBP: ffffa25f3199fba0 R08: ffffffff9c445a00 R09: 0000000000ffff0a
[1673239.279111] R10: ffff8de32b560000 R11: ffff8de42b5400c5 R12: ffff8de32b560000
[1673239.289855] R13: d969688991a5a25c R14: ffff0a00ffffff04 R15: ffff8de32b5400c6
[1673239.299447] FS: 00007f925a7fc70
[1673239.308670] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1673239.317988] CR2: 00007f9012efcfb8 CR3: 0000007fa1abe000 CR4: 0000000000340ee0
[1673239.327377] Call Trace:
[1673239.337796] string+0x4a/0x60
[1673239.347948] vsnprintf+
[1673239.356909] seq_vprintf+
[1673239.365819] seq_printf+
[1673239.374919] __blkg_
[1673239.383362] blkg_prfill_
[1673239.391580] blkcg_print_
[1673239.399891] ? blkg_prfill_
[1673239.408266] blkg_print_
[1673239.416378] cgroup_
[1673239.424336] kernfs_
[1673239.432208] seq_read+0xdc/0x490
[1673239.440065] kernfs_
[1673239.448107] __vfs_read+
[1673239.456329] vfs_read+0xab/0x160
[1673239.465529] ksys_read+0x67/0xe0
[1673239.473872] __x64_sys_
[1673239.481269] do_syscall_
[1673239.488798] entry_SYSCALL_
[1673239.497445] RIP: 0033:0x4cc910
[1673239.504694] Code: 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 49 c7 c2 00 00 00 00 49 c7 c0 00 00 00 00 49 c7 c1 00 00 00 00 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
[1673239.519380] RSP: 002b:000000c01d
[1673239.526527] RAX: ffffffffffffffda RBX: 000000c000046f00 RCX: 00000000004cc910
[1673239.534890] RDX: 0000000000001000 RSI: 000000c00c2cd000 RDI: 000000000000000e
[1673239.542958] RBP: 000000c01dd1e7f0 R08: 0000000000000000 R09: 0000000000000000
[1673239.549915] R10: 0000000000000000 R11: 0000000000000202 R12: ffffffffffffffff
[1673239.558350] R13: 0000000000000002 R14: 0000000000000001 R15: 0000000000000002
[1673239.565178] Modules linked in: veth vhost_net nf_conntrack_
[1673239.565265] async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib lpfc drm_vram_helper i2c_algo_bit ib_uverbs nvmet_fc crct10dif_pclmul ib_core crc32_pclmul ttm ghash_clmulni_intel drm_kms_helper aesni_intel nvmet syscopyarea crypto_simd sysfillrect cryptd nvme_fc glue_helper sysimgblt nvme_fabrics ahci fb_sys_fops mlx5_core tg3 libahci nvme_core pci_hyperv_intf drm tls scsi_transport_fc mlxfw megaraid_sas i2c_piix4 wmi
[1673239.664974] ---[ end trace b20e1996a1c8240d ]---
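As a reading aid for the trace above: on x86-64 a general protection fault is commonly raised when an instruction dereferences a non-canonical address, and the marked bytes in the first Code: line (<44> 0f b6 04 02) appear to decode to a byte load from RDX+RAX. The following is a minimal Python sketch, not a diagnosis, that parses the complete register lines from the log and flags values that are not canonical. It assumes 48-bit virtual addressing (4-level paging); the helper names are hypothetical and exist only for this illustration.

```python
import re

# Register lines copied from the log snippet above (timestamps dropped).
# Truncated lines such as RIP and RSP are left out on purpose.
OOPS_REGISTERS = """\
RAX: 0000000000000000 RBX: ffffa25f3199fc58 RCX: ffff0a00ffffff04
RDX: d969688991a5a25c RSI: ffff8de32b560000 RDI: ffff8de32b5400c6
RBP: ffffa25f3199fba0 R08: ffffffff9c445a00 R09: 0000000000ffff0a
R10: ffff8de32b560000 R11: ffff8de42b5400c5 R12: ffff8de32b560000
R13: d969688991a5a25c R14: ffff0a00ffffff04 R15: ffff8de32b5400c6
"""

# Matches "RAX: 0000000000000000", "R08: ffffffff9c445a00", and so on.
REG_RE = re.compile(r"\b(R[A-Z0-9]{2}):\s+([0-9a-f]{16})\b")


def parse_registers(text):
    """Return a dict mapping register name to its integer value."""
    return {name: int(value, 16) for name, value in REG_RE.findall(text)}


def is_canonical(addr):
    """True if bits 63..47 are all equal (assumes 48-bit virtual addresses)."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


def non_canonical_registers(text):
    """Return the sorted names of registers holding non-canonical values."""
    regs = parse_registers(text)
    return sorted(name for name, value in regs.items() if not is_canonical(value))


if __name__ == "__main__":
    regs = parse_registers(OOPS_REGISTERS)
    for name in non_canonical_registers(OOPS_REGISTERS):
        print(f"{name} = {regs[name]:016x} is not a canonical address")
```

With the values from this report the sketch flags RCX, RDX, R13 and R14. RDX is the base register of the load that appears to fault, which would be consistent with string() being handed an invalid pointer, but the sketch alone does not show where that pointer came from.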
apport information