Unable to put CPU back online on AWS x1e.xlarge instance with kernel 6.2+
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
ubuntu-kernel-tests | New | Undecided | Unassigned |
linux-aws (Ubuntu) | New | Undecided | Unassigned |
Bug Description
Issue found on AWS x1e.xlarge instance with:
* M-aws 6.5.0-1011.11
* L-aws 6.2.0-1007.7
* J-aws-6.
* J-aws-6.
J-aws-5.15 looks OK, and I can't reproduce this failure on the other instances in our pool.
A CPU can be taken offline, but it cannot be brought back online afterwards.
There are 4 CPUs on this instance.
$ uname -a
Linux ip-172-31-2-102 6.5.0-1011-aws #11~22.04.1-Ubuntu SMP Mon Nov 20 18:38:58 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ grep CONFIG_HOTPLUG_CPU /boot/config-
CONFIG_
$ cat /sys/devices/
1
$ echo 0| sudo tee /sys/devices/
0
$ echo 1| sudo tee /sys/devices/
1
tee: /sys/devices/
Output from
# Offline cpu3 - OK
Nov 23 06:21:06 ip-172-31-2-102 kernel: [ 1124.449748] smpboot: CPU 3 is now offline
# Online cpu3 - Failed
Nov 23 06:21:14 ip-172-31-2-102 kernel: [ 1132.310197] installing Xen timer for CPU 3
Nov 23 06:21:14 ip-172-31-2-102 kernel: [ 1132.310424] smpboot: Booting Node 0 Processor 3 APIC 0x3
Nov 23 06:21:24 ip-172-31-2-102 kernel: [ 1142.312481] CPU3 failed to report alive state
This is affecting the ubuntu_
This can also be reproduced with the generic kernel on x1e.xlarge:
* L 6.2.0-39-generic
* M 6.5.0-14-generic