Kernel 5.4 - general protection fault SMP NOPTI

Bug #1954924 reported by Chris Valean
This bug affects 7 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

The setup consists of multiple compute nodes in an OpenStack deployment, all connected to SAN storage over Fibre Channel (FC).

Env specs:
Ubuntu Server 20.04.3 LTS
Kernel: 5.4.0-89-generic
CPU: AMD EPYC 7H12

At random times the nodes lock up. System load keeps increasing, no actions can be taken, and the server has to be rebooted to recover.
There is no discernible pattern, and stress testing the servers does not reproduce the issue.

Log snippet:
[1673239.174269] general protection fault: 0000 [#1] SMP NOPTI
[1673239.183446] CPU: 97 PID: 1224718 Comm: cadvisor Not tainted 5.4.0-89-generic #100-Ubuntu
[1673239.192622] Hardware name: Dell Inc. PowerEdge R7525/0590KW, BIOS 2.3.6 07/06/2021
[1673239.203336] RIP: 0010:string_nocheck+0x38/0x60
[1673239.212811] Code: 66 85 c0 74 3e 83 e8 01 4c 8d 5c 07 01 31 c0 eb 19 49 39 fa 76 03 44 88 07 48 83 c7 01 41 8d 71 01 48 83 c0 01 4c 39 df 74 0f <44> 0f b6 04 02 41 89 c1 89 c6 45 84 c0 75 d8 4c 89 d2 e8 11 ff ff
[1673239.232904] RSP: 0018:ffffa25f3199fba0 EFLAGS: 00010046
[1673239.244331] RAX: 0000000000000000 RBX: ffffa25f3199fc58 RCX: ffff0a00ffffff04
[1673239.256551] RDX: d969688991a5a25c RSI: ffff8de32b560000 RDI: ffff8de32b5400c6
[1673239.269226] RBP: ffffa25f3199fba0 R08: ffffffff9c445a00 R09: 0000000000ffff0a
[1673239.279111] R10: ffff8de32b560000 R11: ffff8de42b5400c5 R12: ffff8de32b560000
[1673239.289855] R13: d969688991a5a25c R14: ffff0a00ffffff04 R15: ffff8de32b5400c6
[1673239.299447] FS: 00007f925a7fc700(0000) GS:ffff8df87f440000(0000) knlGS:0000000000000000
[1673239.308670] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1673239.317988] CR2: 00007f9012efcfb8 CR3: 0000007fa1abe000 CR4: 0000000000340ee0
[1673239.327377] Call Trace:
[1673239.337796] string+0x4a/0x60
[1673239.347948] vsnprintf+0x26f/0x4e0
[1673239.356909] seq_vprintf+0x35/0x50
[1673239.365819] seq_printf+0x53/0x70
[1673239.374919] __blkg_prfill_rwstat+0x5d/0xb0
[1673239.383362] blkg_prfill_rwstat_field+0x97/0xc0
[1673239.391580] blkcg_print_blkgs+0xba/0xf0
[1673239.399891] ? blkg_prfill_rwstat+0xc0/0xc0
[1673239.408266] blkg_print_stat_bytes+0x45/0x50
[1673239.416378] cgroup_seqfile_show+0x56/0xc0
[1673239.424336] kernfs_seq_show+0x27/0x30
[1673239.432208] seq_read+0xdc/0x490
[1673239.440065] kernfs_fop_read+0x35/0x1b0
[1673239.448107] __vfs_read+0x1b/0x40
[1673239.456329] vfs_read+0xab/0x160
[1673239.465529] ksys_read+0x67/0xe0
[1673239.473872] __x64_sys_read+0x1a/0x20
[1673239.481269] do_syscall_64+0x57/0x190
[1673239.488798] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1673239.497445] RIP: 0033:0x4cc910
[1673239.504694] Code: 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 49 c7 c2 00 00 00 00 49 c7 c0 00 00 00 00 49 c7 c1 00 00 00 00 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
[1673239.519380] RSP: 002b:000000c01dd1e7a0 EFLAGS: 00000202 ORIG_RAX: 0000000000000000
[1673239.526527] RAX: ffffffffffffffda RBX: 000000c000046f00 RCX: 00000000004cc910
[1673239.534890] RDX: 0000000000001000 RSI: 000000c00c2cd000 RDI: 000000000000000e
[1673239.542958] RBP: 000000c01dd1e7f0 R08: 0000000000000000 R09: 0000000000000000
[1673239.549915] R10: 0000000000000000 R11: 0000000000000202 R12: ffffffffffffffff
[1673239.558350] R13: 0000000000000002 R14: 0000000000000001 R15: 0000000000000002
[1673239.565178] Modules linked in: veth vhost_net nf_conntrack_netlink vhost tap dm_queue_length cls_u32 sch_cbq xsk_diag udp_diag raw_diag unix_diag af_packet_diag tcp_diag inet_diag netlink_diag ebtable_filter ebtables sch_ingress geneve ip6_udp_tunnel udp_tunnel nfnetlink_cttimeout nfnetlink aufs overlay rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache bonding ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_comment xt_state xt_conntrack iptable_filter bpfilter nls_iso8859_1 scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_ssif amd64_edac_mod edac_mce_amd dell_smbios kvm_amd dcdbas kvm wmi_bmof dell_wmi_descriptor ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter mac_hid sch_fq_codel tcp_bbr openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 msr dm_multipath br_netfilter bridge stp llc sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq
[1673239.565265] async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib lpfc drm_vram_helper i2c_algo_bit ib_uverbs nvmet_fc crct10dif_pclmul ib_core crc32_pclmul ttm ghash_clmulni_intel drm_kms_helper aesni_intel nvmet syscopyarea crypto_simd sysfillrect cryptd nvme_fc glue_helper sysimgblt nvme_fabrics ahci fb_sys_fops mlx5_core tg3 libahci nvme_core pci_hyperv_intf drm tls scsi_transport_fc mlxfw megaraid_sas i2c_piix4 wmi
[1673239.664974] ---[ end trace b20e1996a1c8240d ]---
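The call trace shows the fault happening while cadvisor was reading a blkio cgroup statistics file (cgroup_seqfile_show → blkg_print_stat_bytes → __blkg_prfill_rwstat → seq_printf → string_nocheck). Generic load stress does not exercise this path, which may be why it does not reproduce the issue. Below is a minimal sketch of a loop that mimics that read pattern. It assumes cgroup v1 with the blkio controller mounted at /sys/fs/cgroup/blkio. It also assumes the relevant files are blkio.throttle.io_service_bytes and blkio.bfq.io_service_bytes; the trace does not record which file was read. The helper names are illustrative, and this is a reproduction attempt, not a confirmed reproducer.

```python
import os
import sys

# Assumption: cgroup v1 blkio controller mounted here (not confirmed by the report).
DEFAULT_ROOT = "/sys/fs/cgroup/blkio"

# Assumption: per-device byte counters whose seq_show path goes through
# blkg_print_stat_bytes(); the exact file cadvisor read is not in the trace.
STAT_FILES = ("blkio.throttle.io_service_bytes", "blkio.bfq.io_service_bytes")


def find_stat_files(root):
    """Return every matching blkio stat file under root, searched recursively."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name in STAT_FILES:
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def read_all(paths):
    """Read each file once, as a polling agent would, and return bytes read."""
    total = 0
    for path in paths:
        try:
            with open(path, "rb") as f:
                total += len(f.read())
        except OSError:
            # Cgroups may be removed between listing and reading; skip them.
            continue
    return total


def stress(root=DEFAULT_ROOT, iterations=1000):
    """Repeatedly read all matching stat files; return (file count, total bytes)."""
    paths = find_stat_files(root)
    total = 0
    for _ in range(iterations):
        total += read_all(paths)
    return len(paths), total


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_ROOT
    n_files, n_bytes = stress(root)
    print(f"read {n_files} files, {n_bytes} bytes total")
```

Usage (hypothetical script name): `python3 blkio_read_stress.py /sys/fs/cgroup/blkio`. The file list is collected once per run, so cgroups created after startup are not picked up.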

Chris Valean (cvalean) attached apport information: CRDA.txt, CurrentDmesg.txt, Lspci.txt, Lspci-vt.txt, Lsusb.txt, Lsusb-t.txt, Lsusb-v.txt, ProcCpuinfoMinimal.txt, ProcEnviron.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, WifiSyslog.txt, acpidump.txt.

tags: added: apport-collected focal uec-images
description: updated

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Chris Stacey (cstacey4444) wrote :

We're seeing this exact issue on our Dell R6525 servers running the latest Focal patch level (kernel 5.4.0-169-generic).
