dmesg shows 'smpboot: do_boot_cpu failed(-1) to wakeup CPU#96' with kernel version 6.2.0-060200-generic.

Bug #2042888 reported by CocoZT_Wang
This bug affects 3 people
Affects: linux-hwe (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

OS: Ubuntu 22.04.3 LTS
Release: 22.04
Kernel: 6.2.0-060200-generic
CPU: 1x EPYC 9654
DIMM: 2x Samsung DDR5 16G

1. Install Ubuntu 22.04.3 and upgrade its kernel to 6.2.0-060200-generic (also tried with kernel 6.5.7-060507).
2. Reboot the system and enable SMT-Control in the BIOS.
3. Boot the system and collect lscpu.txt and dmesg.txt.
The lscpu log shows the offline CPU as below:

CPU(s): 192
On-line CPU(s) list: 0-95,97-191
Off-line CPU(s) list: 96
Vendor ID: AuthenticAMD
Model name: AMD EPYC 9654 96-Core Processor
CPU family: 25
Model: 17
Thread(s) per core: 2
Core(s) per socket: 96
Socket(s): 1
Stepping: 1
Frequency boost: enabled
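
The same on-line/off-line lists that lscpu prints can be read from sysfs. The following is a minimal Python sketch, assuming a Linux system that exposes /sys/devices/system/cpu/offline; the helper names `parse_cpu_list` and `read_cpu_set` are made up for illustration and are not part of any tool mentioned in this report.

```python
from pathlib import Path


def parse_cpu_list(text: str) -> list[int]:
    """Expand a kernel CPU list such as '0-95,97-191' into sorted CPU numbers."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty string means "no CPUs in this set"
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_cpu_set(name: str, root: Path = Path("/sys/devices/system/cpu")) -> list[int]:
    """Read one of the sysfs CPU lists ('online', 'offline', 'present', ...)."""
    try:
        return parse_cpu_list((root / name).read_text())
    except OSError:
        return []  # sysfs not available (for example, not running on Linux)


if __name__ == "__main__":
    print("offline CPUs:", read_cpu_set("offline"))
```

On the system described above, `parse_cpu_list("0-95,97-191")` would yield every CPU number from 0 to 191 except 96.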

The dmesg log shows the CPU failure as below:

[ 0.360560] smp: Bringing up secondary CPUs ...
[ 0.360708] x86: Booting SMP configuration:
[ 0.360711] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35 #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47 #48 #49 #50 #51 #52 #53 #54 #55 #56 #57 #58 #59 #60 #61 #62 #63 #64 #65 #66 #67 #68 #69 #70 #71 #72 #73 #74 #75 #76 #77 #78 #79 #80 #81 #82 #83 #84 #85 #86 #87 #88 #89 #90 #91 #92 #93 #94 #95 #96
[ 10.631524] smpboot: do_boot_cpu failed(-1) to wakeup CPU#96
[ 10.631842] #97
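
To check a saved dmesg.txt for this failure across several machines, a small parser can help. This is only a sketch: the function name `failed_cpus` is hypothetical, and the two patterns cover just the message wordings quoted in this bug report. Other kernel versions may word the failure differently.

```python
import re

# Message forms quoted in this report; other kernels may use other wording.
FAILURE_PATTERNS = [
    re.compile(r"do_boot_cpu failed\(-?\d+\) to wakeup CPU#(\d+)"),
    re.compile(r"CPU(\d+) failed to report alive state"),
]


def failed_cpus(log_text: str) -> list[int]:
    """Return the sorted CPU numbers that a dmesg log reports as failing to start."""
    found = set()
    for line in log_text.splitlines():
        for pattern in FAILURE_PATTERNS:
            match = pattern.search(line)
            if match:
                found.add(int(match.group(1)))
    return sorted(found)
```

Passing the contents of the dmesg excerpt above to `failed_cpus` returns a list containing only 96.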

We also see the same bug on an AMD EPYC 9534 64-Core CPU with Ubuntu 22.04.3 and kernels 6.2.0 and 6.5.7.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-hwe (Ubuntu):
status: New → Confirmed
ChienSheng Chen (jackyjs) wrote :

I hit the same issue after upgrading the kernel.
When the system boots with SMT enabled, everything works with the default kernel (5.15.0-43-generic).
The error appears after upgrading the kernel to 6.2.0.
Core 192 is reported as disabled.
An attempt to enable it fails, producing the following messages in dmesg:
[ 342.156946] smpboot: Booting Node 0 Processor 192 APIC 0x1
[ 352.156658] smpboot: do_boot_cpu failed(-1) to wakeup CPU#192
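
For anyone who wants to repeat the re-enable attempt, the usual route is the per-CPU `online` file in sysfs. The comment above does not say how the attempt was made, so this is an assumption about the method. The helper names `cpu_online_path` and `try_online` are hypothetical, and the write needs root privileges.

```python
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")


def cpu_online_path(cpu: int, root: Path = SYSFS_CPU) -> Path:
    """Path of the hotplug control file for one logical CPU."""
    return root / f"cpu{cpu}" / "online"


def try_online(cpu: int, root: Path = SYSFS_CPU) -> bool:
    """Ask the kernel to bring one CPU online; return True if it then reports online."""
    path = cpu_online_path(cpu, root)
    try:
        path.write_text("1")  # the kernel rejects the write if the CPU cannot start
    except OSError as err:
        print(f"cpu{cpu}: write failed: {err}")
        return False
    return path.read_text().strip() == "1"
```

Calling `try_online(192)` as root on an affected machine would be expected to return False, with the matching smpboot message appearing in dmesg after the timeout.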

Kokoro Natsume (kokkoro) wrote:

I have suffered the same issue. I have found that I lost one CPU thread after upgrading amd64-microcode from `3.20191218.1ubuntu1` to `3.20191218.1ubuntu1.2`. Backporting the latest microcode resolves the issue.
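
Since the microcode package appears to matter here, it may help to record which microcode revision the kernel actually loaded, alongside the package version. The sketch below assumes an x86 Linux system, where /proc/cpuinfo carries a `microcode` line per logical CPU. The function name `microcode_revisions` is made up for illustration.

```python
import re
from collections import Counter


def microcode_revisions(cpuinfo_text: str) -> Counter:
    """Count how many logical CPUs report each microcode revision."""
    revisions = re.findall(r"^microcode\s*:\s*(\S+)", cpuinfo_text, flags=re.MULTILINE)
    return Counter(revisions)
```

Typical use would be `microcode_revisions(open("/proc/cpuinfo").read())`. More than one key in the result means the CPUs do not all run the same revision.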

OS: Ubuntu 20.04.6 LTS
Release: 20.04
Kernel: 5.4.0-144-generic
CPU: 2x EPYC 9654
DIMM: 24x Micron DDR5 64G

The lscpu log shows the offline CPU as below. Please note that SMT is enabled, but "Thread(s) per core" shows "1":

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 52 bits physical, 57 bits virtual
CPU(s): 384
On-line CPU(s) list: 0-191,193-383
Off-line CPU(s) list: 192
Thread(s) per core: 1
Core(s) per socket: 96
Socket(s): 2
NUMA node(s): 2
Vendor ID: AuthenticAMD
CPU family: 25
Model: 17
Model name: AMD EPYC 9654 96-Core Processor
Stepping: 1
Frequency boost: enabled
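
One way to cross-check the "Thread(s) per core: 1" reading is to ask the kernel for its own view of SMT. This sketch assumes the kernel exposes /sys/devices/system/cpu/smt/control and /sys/devices/system/cpu/smt/active. The function name `smt_state` is hypothetical, and a missing file is reported as "unavailable" rather than treated as an error.

```python
from pathlib import Path


def smt_state(root: Path = Path("/sys/devices/system/cpu/smt")) -> dict[str, str]:
    """Return the kernel's SMT control setting and whether SMT is currently active."""
    state = {}
    for name in ("control", "active"):
        try:
            state[name] = (root / name).read_text().strip()
        except OSError:
            state[name] = "unavailable"  # file missing or unreadable
    return state
```

Comparing this result with the BIOS setting shows whether the kernel itself considers SMT to be on.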

Simon Alev (scie-mon) wrote:

I have the same issue with an AMD EPYC 9334 32-core processor.

The problem appears after upgrading to Ubuntu 22.04.4, and even with a live Ubuntu session booted from a USB stick. Ubuntu 22.04.3 works fine.
A fresh install of 22.04.3 (without updates during OS installation) runs without problems. After the first apt upgrade the problem appears. Interestingly, it occurs both with the updated 6.5.0-21 kernel and with the 6.2.0.26 kernel that was installed with 22.04.3.

In addition, with kernel 6.5.0-21 the error message on boot is now 'CPU32 failed to report alive state'.

Also, when running the root shell in recovery mode the missing thread is online.

Motherboard is a Gigabyte MZ33-AR0.
