Automatic configuration of TSX flag on cmdline
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | Fix Released | High | David Vallee Delisle |
Bug Description
Description
-----------
Fast-forward upgrade from OSP-13 (RHEL-7.9) to OSP-16.2 (RHEL-8.3)
fails[1] during live migration with:
    [...] libvirt.
    match specification: missing features: hle,rtm
The failure is due to RHEL-8.3 (the destination host) disabling an Intel
CPU feature called "TSX" (Transactional Synchronization Extensions).
Disabling TSX also disables the 'hle' and 'rtm' CPU features.
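To make the failure mode concrete, here is a minimal Python sketch of the set difference behind a "missing features: hle,rtm" message. It reads the CPU flags a host advertises in /proc/cpuinfo and reports which features the guest needs but the destination lacks. This is not libvirt's implementation: libvirt performs its own, more detailed CPU model comparison. The helper names and the sample data are hypothetical.

```python
from __future__ import annotations


def host_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the CPU flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(required: set[str], available: set[str]) -> list[str]:
    """Features the guest requires that the destination host does not provide, sorted."""
    return sorted(required - available)


if __name__ == "__main__":
    # Hypothetical /proc/cpuinfo excerpt from a RHEL-8.3 host booted with TSX off.
    dest_cpuinfo = "processor\t: 0\nflags\t\t: fpu vme sse sse2 avx avx2\n"
    # Hypothetical feature set of a guest started on a RHEL-7.9 host with TSX on.
    guest_requires = {"sse2", "avx2", "hle", "rtm"}
    missing = missing_features(guest_requires, host_cpu_flags(dest_cpuinfo))
    if missing:
        print("missing features: " + ",".join(missing))
```

On a real host, pass the contents of /proc/cpuinfo to `host_cpu_flags` and check whether `hle` and `rtm` are present before attempting the migration.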
This was discovered during OSP fast-forward upgrade testing[+], where a
guest was live-migrated from RHEL-7.9 (with TSX=on) to RHEL-8.3
(breaking change: TSX=off), and the migration failed with the error
above.
[+] https:/ migration during OSP16.2 hybrid state from RHEL7.9 to RHEL8.3 not working
Why?
----
The RHEL-8.3 kernel disables Intel TSX by default because TSX is
considered a potential security risk:
https:/ ("kernel: Disable Intel TSX by default on newer CPUs")
Still, it is not acceptable for the RHEL-8.3 kernel to break user space
in a minor RHEL release (see also: https:/).
Workaround for OSP upgrades
-------
This is unpalatable, but unfortunately there's no other option currently:
(1) use a TripleO config attribute that enables TSX on the destination
RHEL-8.3 host by setting the following in /etc/default/grub:
... and reboot the 8.3 host;
(2) live-migrate the guests from RHEL-7.9 to RHEL-8.3;
(3) turn TSX off again on the RHEL-8.3 host kernel command line, and
shut down the guests;
(4) reboot the 8.3 host again and start the guests.
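Steps (1) and (3) amount to toggling one kernel boot parameter. The exact grub line used in the workaround is not reproduced above, so the following Python sketch assumes the standard Linux `tsx=` boot parameter (which accepts `on`, `off` or `auto`) appended to the `GRUB_CMDLINE_LINUX` variable. `set_tsx` is a hypothetical helper that only rewrites the file's text; it does not touch the system.

```python
import re

VALID_TSX_MODES = ("on", "off", "auto")

# Matches GRUB_CMDLINE_LINUX="..." (but not GRUB_CMDLINE_LINUX_DEFAULT) at line start.
_CMDLINE_RE = re.compile(r'^(GRUB_CMDLINE_LINUX=)"([^"]*)"', re.MULTILINE)


def set_tsx(grub_default: str, mode: str) -> str:
    """Return /etc/default/grub text with exactly one tsx=<mode> on GRUB_CMDLINE_LINUX."""
    if mode not in VALID_TSX_MODES:
        raise ValueError(f"unsupported tsx mode: {mode!r}")

    def rewrite(match):
        # Drop any existing tsx=... argument, then append the requested one.
        args = [a for a in match.group(2).split() if not a.startswith("tsx=")]
        args.append(f"tsx={mode}")
        return f'{match.group(1)}"{" ".join(args)}"'

    new_text, count = _CMDLINE_RE.subn(rewrite, grub_default)
    if count == 0:
        raise ValueError("no GRUB_CMDLINE_LINUX line found")
    return new_text


if __name__ == "__main__":
    sample = 'GRUB_TIMEOUT=5\nGRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet"\n'
    step1 = set_tsx(sample, "on")   # before live-migrating the guests
    step3 = set_tsx(step1, "off")   # after the guests have been migrated
    print(step1, step3, sep="")
```

The sketch only produces new file content. On RHEL 8 the GRUB configuration still has to be regenerated before the reboot, or the argument can be managed with `grubby --update-kernel=ALL --args=tsx=on` instead of editing the file. In a TripleO deployment the same argument would normally be carried through KernelArgs.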
https:/
https:/
Changed in tripleo:
assignee: nobody → David Vallee Delisle (valleedelisle)
status: New → In Progress
tags: added: tripleo-heat-templates
tags: added: tripleo-kernel
Changed in tripleo:
importance: Undecided → High
Change abandoned by "David Vallee Delisle <email address hidden>" on branch: master (review.opendev.org/c/openstack/tripleo-ansible/+/783969)
Review: https:/
Reason: After discussing this, we agreed to move this to a validation with a hard stop if operators have not explicitly added the TSX kernel flag to their KernelArgs during update/upgrade.
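The agreed validation can be sketched as follows. This is a minimal Python illustration, not the actual tripleo-validations or tripleo-ansible code: `ValidationError`, `tsx_modes` and `validate_kernel_args` are hypothetical names. The sketch assumes KernelArgs is a space-separated kernel argument string and that the flag is the Linux `tsx=` boot parameter (`on`, `off` or `auto`). It stops hard when the flag is missing, invalid, or given more than once with conflicting values.

```python
from typing import List

VALID_TSX_MODES = ("on", "off", "auto")


class ValidationError(Exception):
    """Hypothetical hard-stop error raised before the update/upgrade proceeds."""


def tsx_modes(kernel_args: str) -> List[str]:
    """Return every tsx=<mode> value present in a KernelArgs string, in order."""
    return [a.split("=", 1)[1] for a in kernel_args.split() if a.startswith("tsx=")]


def validate_kernel_args(kernel_args: str) -> str:
    """Return the explicit TSX mode, or raise ValidationError (hard stop)."""
    modes = tsx_modes(kernel_args)
    if not modes:
        raise ValidationError(
            "KernelArgs has no explicit tsx= flag; add tsx=on or tsx=off before updating/upgrading"
        )
    if len(set(modes)) > 1:
        raise ValidationError(f"conflicting tsx= values in KernelArgs: {modes}")
    if modes[0] not in VALID_TSX_MODES:
        raise ValidationError(f"invalid tsx= value in KernelArgs: {modes[0]!r}")
    return modes[0]
```

Failing when the flag is absent, rather than silently defaulting, forces each operator to choose between keeping live migration compatibility (`tsx=on`) and the RHEL-8.3 default (`tsx=off`).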