Avoid migration issues with aligned 2MB THP

Bug #1788098 reported by Christian Ehrhardt on 2018-08-21
This bug affects 1 person
Affects                            Importance   Assigned to
The Ubuntu-power-systems project   Medium       bugproxy
linux (Ubuntu)                     Medium       Joseph Salisbury
  Bionic                           Medium       Joseph Salisbury
  Cosmic                           Medium       Joseph Salisbury
qemu (Ubuntu)                      Undecided    Unassigned

Bug Description

FYI: This blocks bug 1781526. Once this one is resolved, we can go ahead with SRU considerations for 1781526.

------- Comment From <email address hidden> 2018-08-20 17:12 EDT-------

Hi, in some environments it was observed that this qemu patch to enable THP made guest migration issues more likely. The following kernel patch resolves those migration issues:

https://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git/commit/?h=kvm-ppc-next&id=c066fafc595eef5ae3c83ae3a8305956b8c3ef15
KVM: PPC: Book3S HV: Use correct pagesize in kvm_unmap_radix()

Once it is merged upstream, it would be good to include that change as well to avoid potential migration problems. Should I open a new bug for that, or is it better to track it here?

Note from Paelzer: I have not seen related migration issues myself, but the fix seems reasonable and has been confirmed by IBM.

Oh, I just realized that although this was initially reported against qemu in bug 1781526, it is a kernel patch, not a qemu patch.

That spreads the timeline a bit:
- This should be in Cosmic before release, to avoid issues caused by the fix for bug 1781526.
  - Since that window is fairly short, I'll bump the priority there.
- This has to be in Bionic before a fix for bug 1781526 (I'll hold off on the qemu change until this one is complete).

I'm marking the qemu task invalid (no action is needed there other than tracking the Bionic release of this fix, which will finally unblock the SRU of bug 1781526 to Bionic).

I'm adding a kernel task to reflect that a kernel change is needed.
Finally, I'm adding Cosmic and Bionic tasks.

Changed in qemu (Ubuntu):
status: New → Invalid
no longer affects: qemu (Ubuntu Bionic)
no longer affects: qemu (Ubuntu Cosmic)
Changed in linux (Ubuntu Cosmic):
importance: Undecided → Critical
Changed in linux (Ubuntu Bionic):
importance: Undecided → Medium

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1788098

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete

For this particular case, the log files are not needed or applicable.
After discussing it in #stable-kernel, I set it to Confirmed.

Changed in linux (Ubuntu Bionic):
status: New → Confirmed
Changed in linux (Ubuntu Cosmic):
status: Incomplete → Confirmed

FYI: this is essentially an IBM request. Reverse mirroring will happen at some point, but I wanted to make you aware right away.

no longer affects: qemu
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → Critical
tags: added: triage-g
description: updated
Manoj Iyer (manjo) on 2018-08-22
Changed in ubuntu-power-systems:
assignee: Canonical Kernel Team (canonical-kernel-team) → bugproxy (bugproxy)
Changed in linux (Ubuntu Bionic):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Cosmic):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Bionic):
importance: Medium → Critical
status: Confirmed → In Progress
Changed in linux (Ubuntu Cosmic):
status: Confirmed → In Progress
Joseph Salisbury (jsalisbury) wrote:

I built a test kernel with the following patch:
KVM: PPC: Book3S HV: Use correct pagesize in kvm_unmap_radix()

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1788098

Can you test this kernel and see if it resolves this bug?

Note about installing test kernels:
• If the test kernel is older than 4.15 (Bionic), you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15 (Bionic) or newer, you need to install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.
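The version rule in the note above can be expressed as a small Python helper, for example when scripting test-kernel installs. This is only a sketch: the function name is hypothetical, it looks at just the leading major.minor of a version string, and it returns package name prefixes (the downloaded .deb file names also contain the kernel version and architecture).

```python
import re

# Package name prefixes from the installation note; 4.15 (Bionic) is the cut-over.
PRE_4_15 = ["linux-image", "linux-image-extra"]
FROM_4_15 = ["linux-modules", "linux-modules-extra", "linux-image-unsigned"]


def packages_for_kernel(version: str) -> list[str]:
    """Return the test-kernel package prefixes to install for a version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized kernel version: {version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    # Tuple comparison: (4, 4) < (4, 15) but (4, 18) and (5, 0) are not.
    return list(PRE_4_15 if major_minor < (4, 15) else FROM_4_15)
```

For example, a "4.15.0-33-generic" test kernel maps to the linux-modules, linux-modules-extra and linux-image-unsigned packages.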

Thanks in advance!

------- Comment From <email address hidden> 2018-08-30 10:29 EDT-------
Thanks, I've asked our KVM team for some testing assistance, but I'll note some details from the original report of this problem here.

The reproduction steps are just a simple localhost migration.

They later noted that increasing the migration speed was a workaround:
(qemu) migrate_set_speed 1G

So you would want to test with the default speed to confirm that the issue is resolved:

(qemu) migrate -d tcp:localhost:4444

This uses the Cosmic qemu version "1:2.12+dfsg-3" from Bug 169712 / LP 1781526 (which enables qemu to use 2MB THP backing on powerpc), plus the test kernel build from this bug.
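For anyone scripting this reproduction, here is a minimal Python sketch that drives the same default-speed migration through QEMU's QMP interface instead of the interactive (qemu) monitor. It assumes, hypothetically, that the source guest was started with `-qmp unix:/tmp/qmp-src.sock,server,nowait` and that a destination QEMU with identical machine options is listening via `-incoming tcp:localhost:4444`; the socket path and helper names are illustrative and not part of the original report.

```python
import json
import socket
import time

# Hypothetical setup, not from the original report:
#   source:      qemu-system-ppc64 ... -qmp unix:/tmp/qmp-src.sock,server,nowait
#   destination: qemu-system-ppc64 ... -incoming tcp:localhost:4444
QMP_SOCKET = "/tmp/qmp-src.sock"
MIGRATE_URI = "tcp:localhost:4444"
FINAL_STATES = ("completed", "failed", "cancelled")


def migrate_command(uri: str) -> dict:
    """QMP form of HMP 'migrate -d <uri>'; QMP migrate returns immediately."""
    return {"execute": "migrate", "arguments": {"uri": uri}}


def qmp_call(chan, request: dict) -> dict:
    """Send one QMP request and return its reply, skipping async events."""
    chan.write(json.dumps(request) + "\n")
    chan.flush()
    while True:
        line = chan.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        reply = json.loads(line)
        if "error" in reply:
            raise RuntimeError(reply["error"].get("desc", "QMP error"))
        if "return" in reply:
            return reply
        # Anything else (e.g. {"event": ...}) is an async notification; ignore it.


def run_default_speed_migration(path: str = QMP_SOCKET, uri: str = MIGRATE_URI) -> str:
    """Start a migration without changing the speed limit; poll until it ends."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        with sock.makefile("rw") as chan:
            json.loads(chan.readline())  # consume the {"QMP": ...} greeting
            qmp_call(chan, {"execute": "qmp_capabilities"})
            # Deliberately no migrate_set_speed: the problem showed up at the default speed.
            qmp_call(chan, migrate_command(uri))
            while True:
                info = qmp_call(chan, {"execute": "query-migrate"})["return"]
                if info.get("status") in FINAL_STATES:
                    return info["status"]
                time.sleep(1)
```

run_default_speed_migration() returns the final migration status string; checking the guest on the destination afterwards is still up to the tester.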

Note that without the kernel fix discussed in this bug, a migration problem might still happen even without the qemu THP patch, if you happen to get a 2MB alignment by chance.

tags: added: architecture-ppc64le bugnameltc-170805 severity-critical targetmilestone-inin---
bugproxy (bugproxy) on 2018-08-30
tags: added: targetmilestone-inin1804
removed: targetmilestone-inin---
Changed in ubuntu-power-systems:
status: New → In Progress
Manoj Iyer (manjo) on 2018-09-24
tags: added: triage-a
removed: triage-g
Manoj Iyer (manjo) on 2018-10-01
Changed in ubuntu-power-systems:
status: In Progress → Incomplete
Andrew Cloke (andrew-cloke) wrote:

Marking as incomplete while awaiting the IBM testing assistance described in comment #6.

Nothing has happened here yet.
I have also marked the related qemu fix that is blocked by this one as Incomplete.
@manoj/jfh - maybe time for triage-r here?

tags: added: triage-r
removed: triage-a
Andrew Cloke (andrew-cloke) wrote:

After discussions with IBM, reducing the priority.

Changed in ubuntu-power-systems:
importance: Critical → Medium
Changed in linux (Ubuntu):
importance: Critical → Medium
Changed in linux (Ubuntu Bionic):
importance: Critical → Medium
Changed in linux (Ubuntu Cosmic):
importance: Critical → Medium