A general-protection exception during guest migration to a machine without PKRU support

Bug #2032164 reported by Chengen Du
This bug affects 6 people
Affects         Status         Importance  Assigned to  Milestone
linux (Ubuntu)  Invalid        Undecided   Unassigned
Jammy           Fix Committed  High        Chengen Du

Bug Description

[Impact]
When a host that supports PKRU launches a guest that lacks PKRU support, the PKRU flag is still enabled in the guest's fpstate.
This information is then passed to userspace through the vCPU ioctl KVM_GET_XSAVE.
A problem arises when the user migrates that guest to another machine that does not support PKRU.
In this scenario, the new host attempts to restore the guest's FPU registers.
Because the new host lacks PKRU support, a general-protection exception occurs and the guest crashes.
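
As a rough illustration of what "the flag is enabled" means for the saved state, the sketch below inspects an XSAVE-format buffer and reports whether it marks PKRU as present. It assumes the standard XSAVE layout (a 512-byte legacy region followed by a header whose first 64-bit field is XSTATE_BV) and that PKRU is state component 9. The helper names are hypothetical, and the buffer would in practice come from KVM_GET_XSAVE; obtaining it is not shown here.

```python
import struct

XSAVE_LEGACY_SIZE = 512   # legacy FXSAVE region that precedes the XSAVE header
XFEATURE_PKRU_BIT = 9     # PKRU is XSAVE state component 9


def xstate_bv(xsave_area: bytes) -> int:
    """Return XSTATE_BV, the first little-endian 64-bit field of the XSAVE header."""
    if len(xsave_area) < XSAVE_LEGACY_SIZE + 8:
        raise ValueError("buffer too small to contain an XSAVE header")
    (bv,) = struct.unpack_from("<Q", xsave_area, XSAVE_LEGACY_SIZE)
    return bv


def has_pkru(xsave_area: bytes) -> bool:
    """True if the saved image marks the PKRU component as present."""
    return bool(xstate_bv(xsave_area) & (1 << XFEATURE_PKRU_BIT))
```

A saved image for which `has_pkru()` returns true is the kind of state that, per the description above, a destination without PKRU support cannot restore.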

[Fix]
The problem is resolved by the following upstream commits:
18164f66e6c5 x86/fpu: Allow caller to constrain xfeatures when copying to uabi buffer
8647c52e9504 KVM: x86: Constrain guest-supported xfeatures only at KVM_GET_XSAVE{2}
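
The idea suggested by the commit titles can be modelled as a simple bitmask intersection. The following is only a toy model under stated assumptions, not the kernel code: the function names are hypothetical, the feature-bit positions follow the usual x86 xfeature numbering, and the rule that a restore fails whenever the image carries a feature the destination lacks is a simplification of the behaviour described in [Impact].

```python
# Usual x86 xfeature bit positions (assumed for this model).
XFEATURE_MASK_FP = 1 << 0
XFEATURE_MASK_SSE = 1 << 1
XFEATURE_MASK_YMM = 1 << 2
XFEATURE_MASK_PKRU = 1 << 9


def constrain_xfeatures(fpstate_xfeatures: int, guest_supported: int) -> int:
    """Model of the fix: report only the features the guest actually supports."""
    return fpstate_xfeatures & guest_supported


def can_restore(image_xfeatures: int, dest_supported: int) -> bool:
    """Simplified model: restore fails if the image has a feature the destination lacks."""
    return (image_xfeatures & ~dest_supported) == 0


# Source host supports PKRU, the guest does not, and neither does the destination.
host = XFEATURE_MASK_FP | XFEATURE_MASK_SSE | XFEATURE_MASK_YMM | XFEATURE_MASK_PKRU
guest = XFEATURE_MASK_FP | XFEATURE_MASK_SSE | XFEATURE_MASK_YMM
dest = guest

unpatched_image = host                            # PKRU leaks into the saved state
patched_image = constrain_xfeatures(host, guest)  # PKRU masked out at save time
```

In this model `can_restore(unpatched_image, dest)` is false while `can_restore(patched_image, dest)` is true, which mirrors why the constraint is applied on the source side when the state is read out.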

[Test Plan]
Several migration scenarios must be tested to confirm the outcome:
 Patched kernel with PKRU -> kernel with PKRU
 Patched kernel with PKRU -> kernel without PKRU
 Patched kernel without PKRU -> kernel with PKRU
 Patched kernel without PKRU -> kernel without PKRU
 Kernel with PKRU -> patched kernel with PKRU
 Kernel with PKRU -> patched kernel without PKRU
 Kernel without PKRU -> patched kernel with PKRU
 Kernel without PKRU -> patched kernel without PKRU
 Patched kernel with PKRU -> patched kernel without PKRU

Every scenario should succeed except "Kernel with PKRU -> patched kernel without PKRU".
Addressing this case is difficult because the most plausible solution is to clamp the FPU features at the destination during migration.
However, upstream does not support this approach due to concerns about silently dropping features requested by userspace.
Doing so could lead to other issues and violate KVM's ABI.
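
The test matrix above can be written down as data so that the expected result of each run is explicit. This is a minimal sketch: the `Scenario` type and helper names are hypothetical, "kernel" without "patched" is taken to mean a kernel lacking the two commits, and the only expected failure encoded is the one named in the test plan.

```python
from typing import NamedTuple


class Scenario(NamedTuple):
    src_patched: bool
    src_pkru: bool
    dst_patched: bool
    dst_pkru: bool


# The nine scenarios from the test plan, in the same order.
SCENARIOS = [
    Scenario(True, True, False, True),
    Scenario(True, True, False, False),
    Scenario(True, False, False, True),
    Scenario(True, False, False, False),
    Scenario(False, True, True, True),
    Scenario(False, True, True, False),
    Scenario(False, False, True, True),
    Scenario(False, False, True, False),
    Scenario(True, True, True, False),
]


def side(patched: bool, pkru: bool) -> str:
    """Describe one end of the migration, e.g. 'patched kernel with PKRU'."""
    kernel = "patched kernel" if patched else "kernel"
    return f"{kernel} {'with' if pkru else 'without'} PKRU"


def describe(s: Scenario) -> str:
    text = f"{side(s.src_patched, s.src_pkru)} -> {side(s.dst_patched, s.dst_pkru)}"
    return text[0].upper() + text[1:]


def expected_to_succeed(s: Scenario) -> bool:
    """Only an unpatched PKRU source migrating to a patched non-PKRU destination may fail."""
    return not (not s.src_patched and s.src_pkru and s.dst_patched and not s.dst_pkru)


expected_failures = [describe(s) for s in SCENARIOS if not expected_to_succeed(s)]
```

Each entry could then be paired with the observed result of the corresponding live migration when recording verification.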

[Where problems could occur]
The introduced commits affect the guest migration process.
A regression could cause migrations to fail or prevent the guest from running successfully on the migration destination.

Chengen Du (chengendu)
Changed in linux (Ubuntu Jammy):
assignee: nobody → Chengen Du (chengendu)
Chengen Du (chengendu)
Changed in linux (Ubuntu Jammy):
status: New → In Progress
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
Adrien Cunin (adri2000) wrote :

We see the same issue on focal, kernel 5.15.0-76-generic (linux-image-generic-hwe-20.04).

Changed in linux (Ubuntu Jammy):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.15.0-85.95 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux' to 'verification-done-jammy-linux'. If the problem still exists, change the tag 'verification-needed-jammy-linux' to 'verification-failed-jammy-linux'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-v2 verification-needed-jammy-linux
Chengen Du (chengendu) wrote :

The kernels (5.15.0-85.95) have been tested without any issues.

tags: added: verification-done-jammy-linux
removed: verification-needed-jammy-linux
Alan Baghumian (alanbach) wrote :

We have now confirmed at three different locations that live migration between PKRU-capable compute nodes breaks when going from a kernel older than 5.15.0-85.95 to 5.15.0-85.95.

Stefan Bader (smb)
Changed in linux (Ubuntu Jammy):
importance: Undecided → High
Stefan Bader (smb) wrote :

The fix was reverted on request because it caused different migration issues (bug #2036675).

Changed in linux (Ubuntu Jammy):
status: Fix Committed → Triaged
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure/5.15.0-1050.57 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-azure' to 'verification-done-jammy-linux-azure'. If the problem still exists, change the tag 'verification-needed-jammy-linux-azure' to 'verification-failed-jammy-linux-azure'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-azure-v2 verification-needed-jammy-linux-azure
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-nvidia-tegra/5.15.0-1018.18 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-nvidia-tegra' to 'verification-done-jammy-linux-nvidia-tegra'. If the problem still exists, change the tag 'verification-needed-jammy-linux-nvidia-tegra' to 'verification-failed-jammy-linux-nvidia-tegra'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-nvidia-tegra-v2 verification-needed-jammy-linux-nvidia-tegra
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-nvidia-tegra-igx/5.15.0-1005.5 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-nvidia-tegra-igx' to 'verification-done-jammy-linux-nvidia-tegra-igx'. If the problem still exists, change the tag 'verification-needed-jammy-linux-nvidia-tegra-igx' to 'verification-failed-jammy-linux-nvidia-tegra-igx'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-nvidia-tegra-igx-v2 verification-needed-jammy-linux-nvidia-tegra-igx
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-nvidia-tegra-5.15/5.15.0-1018.18~20.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal-linux-nvidia-tegra-5.15' to 'verification-done-focal-linux-nvidia-tegra-5.15'. If the problem still exists, change the tag 'verification-needed-focal-linux-nvidia-tegra-5.15' to 'verification-failed-focal-linux-nvidia-tegra-5.15'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-focal-linux-nvidia-tegra-5.15-v2 verification-needed-focal-linux-nvidia-tegra-5.15
Olaf Seibert (oseibert-sys11) wrote :

Is there an alternative patch in development? This bug (or at least it seems to be this bug) still affects us.

Chengen Du (chengendu)
description: updated
Chengen Du (chengendu) wrote :

We have backported the following two patches to Jammy, and they are awaiting review:
 18164f66e6c5 x86/fpu: Allow caller to constrain xfeatures when copying to uabi buffer
 8647c52e9504 KVM: x86: Constrain guest-supported xfeatures only at KVM_GET_XSAVE{2}
The fix will also apply to the Focal HWE kernel.

Changed in linux (Ubuntu Jammy):
status: Triaged → In Progress
Stefan Bader (smb) wrote :

The fix is from v6.6, so Noble is not affected. The question is whether it should also be added to Mantic.

Changed in linux (Ubuntu):
status: Confirmed → Invalid
Chengen Du (chengendu) wrote :

Mantic already includes these two commits through #2049202 [1].
  c95b65ba744d x86/fpu: Allow caller to constrain xfeatures when copying to uabi buffer
  97cdceb46fb3 KVM: x86: Constrain guest-supported xfeatures only at KVM_GET_XSAVE{2}

[1] https://bugs.launchpad.net/bugs/2049202

Stefan Bader (smb)
Changed in linux (Ubuntu Jammy):
status: In Progress → Fix Committed