Unable to put CPU back online on AWS x1e.xlarge instance with kernel 6.2+

Bug #2044334 reported by Po-Hsu Lin
This bug affects 1 person
Affects                Status   Importance   Assigned to   Milestone
ubuntu-kernel-tests    New      Undecided    Unassigned
linux-aws (Ubuntu)     New      Undecided    Unassigned

Bug Description

Issue found on an AWS x1e.xlarge instance with:
* M-aws 6.5.0-1011.11
* L-aws 6.2.0-1007.7
* J-aws-6.5.0-1008.8~22.04.1
* J-aws-6.2.0-1005.5~22.04.1

J-aws-5.15 looks OK, and I can't reproduce this failure on the other instances in our pool.

A CPU can be taken offline, but it cannot be brought back online afterwards.

There are 4 CPUs on this instance.
$ uname -a
Linux ip-172-31-2-102 6.5.0-1011-aws #11~22.04.1-Ubuntu SMP Mon Nov 20 18:38:58 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ grep CONFIG_HOTPLUG_CPU /boot/config-6.5.0-1011-aws
CONFIG_HOTPLUG_CPU=y
$ cat /sys/devices/system/cpu/cpu3/online
1
$ echo 0| sudo tee /sys/devices/system/cpu/cpu3/online
0
$ echo 1| sudo tee /sys/devices/system/cpu/cpu3/online
1
tee: /sys/devices/system/cpu/cpu3/online: Input/output error
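
The manual steps above can be wrapped in a small helper for repeated runs. This is only a sketch: `cpu_cycle` and its optional second argument (an alternate sysfs root, useful for dry runs against a scratch directory) are hypothetical names, not part of any test suite mentioned in this report.

```shell
#!/bin/sh
# cpu_cycle CPU [SYSFS_ROOT]
# Take one CPU offline, then try to bring it back online.
# SYSFS_ROOT defaults to the real sysfs path; the override is an
# assumption added here so the function can be tried on a scratch dir.
cpu_cycle() {
    cpu="$1"
    root="${2:-/sys/devices/system/cpu}"
    ctl="$root/cpu$cpu/online"

    # The control file is missing for non-hotpluggable CPUs and is
    # only writable by root on a real system.
    if [ ! -w "$ctl" ]; then
        echo "cpu$cpu: $ctl is not writable"
        return 2
    fi

    if ! echo 0 > "$ctl"; then
        echo "cpu$cpu: offline failed"
        return 1
    fi

    # On the affected kernels this write is the one that fails
    # with "Input/output error".
    if ! echo 1 > "$ctl"; then
        echo "cpu$cpu: online failed"
        return 1
    fi

    echo "cpu$cpu: back online"
}
```

On a real instance the function has to run as root, for example by sourcing the file in a root shell and calling `cpu_cycle 3`.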

Kernel log output:
# Offline cpu3 - OK
Nov 23 06:21:06 ip-172-31-2-102 kernel: [ 1124.449748] smpboot: CPU 3 is now offline
# Online cpu3 - Failed
Nov 23 06:21:14 ip-172-31-2-102 kernel: [ 1132.310197] installing Xen timer for CPU 3
Nov 23 06:21:14 ip-172-31-2-102 kernel: [ 1132.310424] smpboot: Booting Node 0 Processor 3 APIC 0x3
Nov 23 06:21:24 ip-172-31-2-102 kernel: [ 1142.312481] CPU3 failed to report alive state
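
To check a longer log for the same symptom, a filter along these lines may help. It is a sketch that assumes the failure message keeps the exact wording shown above ("CPU<n> failed to report alive state"), which may differ on other kernel versions; `hotplug_failures` is a hypothetical helper name.

```shell
#!/bin/sh
# hotplug_failures: read kernel log lines on stdin and print the
# number of each CPU that failed to come back online, one per line.
# Assumes the message format seen in this report.
hotplug_failures() {
    sed -n 's/.*CPU\([0-9][0-9]*\) failed to report alive state.*/\1/p'
}
```

It can be fed from any source of kernel messages, for example `sudo dmesg | hotplug_failures`.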

This affects ubuntu_kernel_selftests/cpu-hotplug:cpu-on-off-test.sh and, in ubuntu_ltp/cpuhotplug, the cpuhotplug02, cpuhotplug03, cpuhotplug04 and cpuhotplug06 tests.

Po-Hsu Lin (cypressyew) wrote :

This can be reproduced with the generic kernel on x1e.xlarge as well:
* L 6.2.0-39-generic
* M 6.5.0-14-generic
