cpuhotplug failures on google n2d-standard-4.sev_snp VMs

Bug #2051378 reported by Francis Ginther
This bug affects 1 person
Affects: ubuntu-kernel-tests
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

n2d-standard-4.sev_snp instances, and possibly other sev_snp-enabled instances, unexpectedly reboot during cpuhotplug operations while under test. This is known to happen during the following tests:

ubuntu_ftrace_smoke_test.ftrace-smoke-test
ubuntu_kernel_selftests.cpu-hotplug:cpu-on-off-test.sh
ubuntu_kselftests_ftrace.ftrace:test.d--00basic--basic2.tc
ubuntu_ltp.cpuhotplug:cpuhotplug02
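
For anyone trying to reproduce this outside the test harness, here is a minimal sketch of the kind of offline/online cycle the cpu-hotplug selftest and the LTP cpuhotplug cases perform through sysfs. It is not the tests' actual code: the helper names are made up for illustration, and it assumes the standard `/sys/devices/system/cpu/cpuN/online` layout. It needs root, and on an affected VM it may well trigger the reboot described here.

```python
from pathlib import Path

# Assumption: standard Linux sysfs location for per-CPU hotplug controls.
# The root is a parameter so the helpers can be pointed at a scratch directory.
SYSFS_CPU_ROOT = Path("/sys/devices/system/cpu")


def hotpluggable_cpus(root: Path = SYSFS_CPU_ROOT) -> list[int]:
    """Return the CPU numbers that expose an `online` control file."""
    cpus = []
    for entry in root.glob("cpu[0-9]*"):
        if (entry / "online").is_file():
            cpus.append(int(entry.name[3:]))
    return sorted(cpus)


def set_cpu_online(cpu: int, online: bool, root: Path = SYSFS_CPU_ROOT) -> None:
    """Write 1 (online) or 0 (offline) to the CPU's `online` file."""
    (root / f"cpu{cpu}" / "online").write_text("1" if online else "0")


def cycle_cpus(root: Path = SYSFS_CPU_ROOT) -> list[int]:
    """Offline, then re-online, each hotpluggable CPU; return the CPUs cycled."""
    cycled = []
    for cpu in hotpluggable_cpus(root):
        set_cpu_online(cpu, False, root)
        set_cpu_online(cpu, True, root)
        cycled.append(cpu)
    return cycled
```

Calling `cycle_cpus()` as root walks every CPU that has an `online` file. The boot CPU often has no such file, which is why the sketch filters on its presence rather than assuming CPU 0 is skipped.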

In most cases the VM reboots, interrupting the test. The console logs show messages like:

...
Jan 25 19:22:13 j-lgcp-gcp-5-15-n2dstd4-sev-snp-u-ftrace-smk-test kernel: [ 192.364513] mmiotrace: Disabling non-boot CPUs...
Jan 25 19:22:13 j-lgcp-gcp-5-15-n2dstd4-sev-snp-u-ftrace-smk-test kernel: [ 192.403578] smpboot: CPU 1 is now offline
Jan 25 19:22:13 j-lgcp-gcp-5-15-n2dstd4-sev-snp-u-ftrace-smk-test kernel: [ 192.425028] mmiotrace: CPU1 is down.
Jan 25 19:22:13 j-lgcp-gcp-5-15-n2dstd4-sev-snp-u-ftrace-smk-test kernel: [ 192.447485] smpboot: CPU 2 is now offline
Jan 25 19:22:13 j-lgcp-gcp-5-15-n2dstd4-sev-snp-u-ftrace-smk-test kernel: [ 192.464986] mmiotrace: CPU2 is down.
Jan 25 19:22:13 j-lgcp-gcp-5-15-n2dstd4-sev-snp-u-ftrace-smk-test kernel: [ 192.483908] smpboot: CPU 3 is now offline

then the VM starts booting again.
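
When triaging many serial console logs, a small script can pull out which CPUs went offline before the VM restarted. The sketch below is only an illustration: the function name is hypothetical, and it assumes that a kernel timestamp lower than the previous one means the kernel has rebooted.

```python
import re

# Matches the kernel timestamp and message in a syslog-style console line,
# e.g. "... kernel: [ 192.403578] smpboot: CPU 1 is now offline".
KERNEL_LINE = re.compile(r"kernel: \[\s*(\d+\.\d+)\] (.*)$")
CPU_OFFLINE = re.compile(r"smpboot: CPU (\d+) is now offline")


def offline_events_before_reboot(lines):
    """Return (offlined_cpus, rebooted) for an iterable of console log lines.

    Assumption: a kernel timestamp that goes backwards marks a reboot.
    Scanning stops at the first such reset.
    """
    offlined = []
    last_ts = None
    for line in lines:
        match = KERNEL_LINE.search(line)
        if match is None:
            continue  # not a kernel message
        ts = float(match.group(1))
        if last_ts is not None and ts < last_ts:
            return offlined, True
        last_ts = ts
        cpu = CPU_OFFLINE.search(match.group(2))
        if cpu:
            offlined.append(int(cpu.group(1)))
    return offlined, False
```

Feeding it the lines of a saved console log (for example `offline_events_before_reboot(open(path))`, with `path` being wherever the log was saved) gives the CPUs taken offline before the first timestamp reset and whether a reset was seen at all.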

This has been seen on at least the linux-gcp 5.15 and 6.5 kernels since we started testing on these n2d-standard-4.sev_snp VMs.

Francis Ginther (fginther) wrote :

Serial console log from a ubuntu_kselftests_ftrace test run.

tags: added: sru-20231030 sru-20240108
summary: - cpuhotplug failures on google seg_snp instances
+ cpuhotplug failures on google n2d-standard-4.sev_snp VMs
Po-Hsu Lin (cypressyew)
tags: added: 5.15 6.5 jammy ubuntu-ltp
Po-Hsu Lin (cypressyew)
tags: added: ubuntu-ftrace-smoke-test ubuntu-kernel-selftests ubuntu-kselftests-ftrace
Po-Hsu Lin (cypressyew) wrote :

On n2d-standard-4.sev_snp with F-gcp-5.15.0-1051.59~20.04.1 and F-gcp-tcpx-5.15.0-1004.4, ubuntu_ltp_controllers/cpuset_hotplug also interrupts the test run.

Po-Hsu Lin (cypressyew) wrote :

With jammy/linux-gcp-6.5/6.5.0-1017.17~22.04.1 on n2d-standard-4.sev_snp, ubuntu_ltp.cpuhotplug:cpuhotplug02 has passed:
 INFO: Test start time: Sat Mar 16 03:10:32 UTC 2024
 COMMAND: /opt/ltp/bin/ltp-pan -q -e -S -a 422515 -n 422515 -f /tmp/ltp-KBfVFAlKgT/alltests -l /dev/null -C /dev/null -T /dev/null
 LOG File: /dev/null
 FAILED COMMAND File: /dev/null
 TCONF COMMAND File: /dev/null
 Running tests.......
 Name: cpuhotplug02
 Date: Sat Mar 16 03:10:33 UTC 2024
 Desc: What happens to a process when its CPU is offlined?

 CPU is 1
 cpuhotplug02 1 TPASS: turned off CPU 1, process migrated to CPU 0
 INFO: ltp-pan reported all tests PASS
 LTP Version: 20230929-406-gcbc2d0568
 INFO: Test end time: Sat Mar 16 03:10:45 UTC 2024

However, ubuntu_ltp.cpuhotplug:cpuhotplug03 caused the crash this time.

tags: added: sru-s20240205