5.15 Kernel Warnings with some Sapphire Rapids CPUs
On some Sapphire Rapids CPUs we are seeing kernel warnings in the syslog:
https://certification.canonical.com/hardware/202311-32288/submission/341156/
Intel(R) Xeon(R) Gold 6442Y
Oct 31 03:35:55 N8 kernel: [ 92.770372] ------------[ cut here ]------------
Oct 31 03:35:55 N8 kernel: [ 92.825738] WARNING: CPU: 48 PID: 1 at arch/x86/events/intel/uncore_discovery.c:184 uncore_insert_box_info+0x134/0x350
Oct 31 03:35:55 N8 kernel: [ 92.953850] Modules linked in:
Oct 31 03:35:55 N8 kernel: [ 92.990464] CPU: 48 PID: 1 Comm: swapper/0 Not tainted 5.15.0-88-generic #98-Ubuntu
Oct 31 03:35:55 N8 kernel: [ 93.082179] Hardware name: ASUSTeK COMPUTER INC. ESC N8-E11/Z13PN-D32 Series, BIOS 0402 09/08/2023
Oct 31 03:35:55 N8 kernel: [ 93.189501] RIP: 0010:uncore_insert_box_info+0x134/0x350
Oct 31 03:35:55 N8 kernel: [ 93.206419] Freeing initrd memory: 106936K
Oct 31 03:35:55 N8 kernel: [ 93.253138] Code: c2 01 48 83 c0 04 39 d1 0f 8e c6 01 00 00 49 8b 4c 24 38 8b 0c 01 41 89 0c 07 49 8b 74 24 40 8b 34 06 41 89 34 06 39 f9 75 cf <0f> 0b 4c 89 ff e8 b2 07 33 00 4c 89 f7 e8 aa 07 33 00 5b 41 5c 41
Oct 31 03:35:55 N8 kernel: [ 93.527071] RSP: 0000:ff5c25ed800efc98 EFLAGS: 00010246
Oct 31 03:35:55 N8 kernel: [ 93.589669] RAX: 0000000000000008 RBX: 0000000000000000 RCX: 0000000000000003
Oct 31 03:35:55 N8 kernel: [ 93.675160] RDX: 0000000000000002 RSI: 0000000000018000 RDI: 0000000000000003
Oct 31 03:35:55 N8 kernel: [ 93.760654] RBP: ff5c25ed800efcc0 R08: 0000000000000010 R09: ff32ac8a801df260
Oct 31 03:35:55 N8 kernel: [ 93.846130] R10: 0000000000000246 R11: 00000000ffffffff R12: ff32ac8a8b8412a0
Oct 31 03:35:55 N8 kernel: [ 93.931613] R13: ff5c25ed800efcf8 R14: ff32ac8a8aa32cb0 R15: ff32ac8a801df260
Oct 31 03:35:55 N8 kernel: [ 94.017099] FS: 0000000000000000(0000) GS:ff32ac99bfa00000(0000) knlGS:0000000000000000
Oct 31 03:35:55 N8 kernel: [ 94.114042] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 31 03:35:55 N8 kernel: [ 94.182871] CR2: 0000000000000000 CR3: 0000000d07e10001 CR4: 0000000000771ee0
Oct 31 03:35:55 N8 kernel: [ 94.268360] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 31 03:35:55 N8 kernel: [ 94.353828] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
Oct 31 03:35:55 N8 kernel: [ 94.439332] PKRU: 55555554
Oct 31 03:35:55 N8 kernel: [ 94.471788] Call Trace:
Oct 31 03:35:55 N8 kernel: [ 94.501100] <TASK>
Oct 31 03:35:55 N8 kernel: [ 94.526275] ? show_trace_log_lvl+0x1d6/0x2ea
Oct 31 03:35:55 N8 kernel: [ 94.578457] ? show_trace_log_lvl+0x1d6/0x2ea
Oct 31 03:35:55 N8 kernel: [ 94.630686] ? parse_discovery_table.isra.0+0x162/0x1a0
Oct 31 03:35:55 N8 kernel: [ 94.693295] ? show_regs.part.0+0x23/0x29
Oct 31 03:35:55 N8 kernel: [ 94.741331] ? show_regs.cold+0x8/0xd
Oct 31 03:35:55 N8 kernel: [ 94.785212] ? uncore_insert_box_info+0x134/0x350
Oct 31 03:35:55 N8 kernel: [ 94.841591] ? __warn+0x8c/0x100
Oct 31 03:35:55 N8 kernel: [ 94.880281] ? uncore_insert_box_info+0x134/0x350
Oct 31 03:35:55 N8 kernel: [ 94.936636] ? report_bug+0xa4/0xd0
Oct 31 03:35:55 N8 kernel: [ 94.978460] ? handle_bug+0x39/0x90
Oct 31 03:35:55 N8 kernel: [ 95.020246] ? exc_invalid_op+0x19/0x70
Oct 31 03:35:55 N8 kernel: [ 95.066232] ? asm_exc_invalid_op+0x1b/0x20
Oct 31 03:35:55 N8 kernel: [ 95.116341] ? uncore_insert_box_info+0x134/0x350
Oct 31 03:35:55 N8 kernel: [ 95.172708] ? uncore_insert_box_info+0xe3/0x350
Oct 31 03:35:55 N8 kernel: [ 95.228032] parse_discovery_table.isra.0+0x162/0x1a0
Oct 31 03:35:55 N8 cloud-init[1992]: |.+.o .o .o o +|
Oct 31 03:35:55 N8 kernel: [ 95.288570] intel_uncore_has_discovery_tables+0x19e/0x270
Oct 31 03:35:55 N8 kernel: [ 95.354298] ? type_pmu_register+0x2f/0x42
Oct 31 03:35:55 N8 kernel: [ 95.403385] intel_uncore_init+0xe3/0x226
Oct 31 03:35:55 N8 kernel: [ 95.451409] ? type_pmu_register+0x42/0x42
Oct 31 03:35:55 N8 kernel: [ 95.500506] do_one_initcall+0x46/0x1e0
Oct 31 03:35:55 N8 kernel: [ 95.546475] do_initcalls+0x12f/0x159
Oct 31 03:35:55 N8 kernel: [ 95.590372] kernel_init_freeable+0x162/0x1b5
Oct 31 03:35:55 N8 kernel: [ 95.642556] ? rest_init+0x100/0x100
Oct 31 03:35:55 N8 kernel: [ 95.685405] kernel_init+0x1b/0x150
Oct 31 03:35:55 N8 kernel: [ 95.727228] ? rest_init+0x100/0x100
Oct 31 03:35:55 N8 kernel: [ 95.770054] ret_from_fork+0x1f/0x30
Oct 31 03:35:55 N8 kernel: [ 95.812906] </TASK>
Oct 31 03:35:55 N8 kernel: [ 95.839108] ---[ end trace 2d0c57130f45fd62 ]---
https://certification.canonical.com/hardware/202305-31570/submission/312593/
Intel(R) Xeon(R) Gold 6426Y
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135184] ------------[ cut here ]------------
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135185] WARNING: CPU: 0 PID: 1 at arch/x86/events/intel/uncore_discovery.c:184 uncore_insert_box_info+0x134/0x350
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135192] Modules linked in:
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135194] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.15.0-69-generic #76-Ubuntu
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135198] Hardware name: HPE ProLiant ML110 Gen11/ProLiant ML110 Gen11, BIOS 1.30 03/01/2023
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135200] RIP: 0010:uncore_insert_box_info+0x134/0x350
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135202] Code: c2 01 48 83 c0 04 39 d1 0f 8e c6 01 00 00 49 8b 4c 24 38 8b 0c 01 41 89 0c 07 49 8b 74 24 40 8b 34 06 41 89 34 06 39 f9 75 cf <0f> 0b 4c 89 ff e8 22 a2 32 00 4c 89 f7 e8 1a a2 32 00 5b 41 5c 41
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135206] RSP: 0000:ff3b3e198006bc98 EFLAGS: 00010246
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135209] RAX: 0000000000000008 RBX: 0000000000000000 RCX: 0000000000000003
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135210] RDX: 0000000000000002 RSI: 0000000000018000 RDI: 0000000000000003
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135212] RBP: ff3b3e198006bcc0 R08: 0000000000000010 R09: ff31766844f3c5e0
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135214] R10: ff31766844fa4438 R11: 0000000000000000 R12: ff31766844f5fa20
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135216] R13: ff3b3e198006bcf8 R14: ff31766844f3ca20 R15: ff31766844f3c5e0
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135218] FS: 0000000000000000(0000) GS:ff3176e5bf800000(0000) knlGS:0000000000000000
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135220] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135222] CR2: 0000000000000000 CR3: 0000004f35e10001 CR4: 0000000000771ef0
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135224] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135225] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135227] PKRU: 55555554
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135228] Call Trace:
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135230] <TASK>
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135232] parse_discovery_table.isra.0+0x162/0x1a0
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135235] intel_uncore_has_discovery_tables+0x19e/0x270
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135238] ? type_pmu_register+0x21/0x42
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135243] intel_uncore_init+0xe3/0x226
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135246] ? type_pmu_register+0x42/0x42
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135249] do_one_initcall+0x46/0x1e0
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135253] do_initcalls+0x12f/0x159
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135256] kernel_init_freeable+0x162/0x1b5
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135259] ? rest_init+0x100/0x100
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135263] kernel_init+0x1b/0x150
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135265] ? rest_init+0x100/0x100
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135266] ret_from_fork+0x1f/0x30
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135270] </TASK>
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135271] ---[ end trace 6011f2a9999291c3 ]---
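
To compare more submissions, it may help to tally where each warning fires. The following is a minimal sketch, assuming intact syslog lines in the usual "WARNING: CPU: N PID: N at file:line function+offset" form; the name warning_sites and the regular expression are illustrative and are not part of any existing tool.

```python
import re
from collections import Counter

# Matches the standard kernel WARNING header line and captures the
# source file, line number, and function name (without the +offset).
WARN_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at "
    r"(?P<file>\S+):(?P<line>\d+) (?P<func>[^\s+]+)\+"
)


def warning_sites(lines):
    """Count kernel warnings per (file, line, function) in syslog lines."""
    counts = Counter()
    for line in lines:
        match = WARN_RE.search(line)
        if match:
            site = (match["file"], int(match["line"]), match["func"])
            counts[site] += 1
    return counts
```

Passing an open syslog file, for example warning_sites(open("/var/log/syslog", errors="replace")), returns a Counter keyed by warning site, so a fleet where every entry shares the uncore_discovery.c key would support the observation that the warnings come from a single place.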
This doesn't happen on ALL SPR platforms, but it does happen periodically, and it always seems to be centered around arch/x86/events/intel/uncore_discovery.c
This doesn't seem to cause any stability issues that we've seen, but we need to know whether these warnings are innocuous and, better, whether this can be fixed so the kernel no longer spits them out (each warning sets the kernel taint flag).
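
For anyone checking an affected machine, here is a minimal sketch of how the taint state can be inspected. It relies on the kernel's documented taint bitmask, in which bit 9 (flag 'W', value 512) means the kernel has issued a warning; the function names are illustrative only.

```python
TAINT_WARN_BIT = 9  # flag 'W': the kernel has issued a warning


def warn_taint_set(tainted_value: int) -> bool:
    """Return True if the 'W' (warning) bit is set in a taint bitmask."""
    return bool(tainted_value & (1 << TAINT_WARN_BIT))


def read_tainted(path: str = "/proc/sys/kernel/tainted") -> int:
    """Read the current taint bitmask from procfs (Linux only)."""
    with open(path) as handle:
        return int(handle.read().strip())
```

On a running system, warn_taint_set(read_tainted()) reports whether a warning such as the ones above has tainted the kernel since boot. Other taint bits may be set for unrelated reasons, so only bit 9 is examined here.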