KVM/QEMU live migration fails with Ubuntu wily kernel 4.2.0-30.35

Bug #1552592 reported by Thomas Lamprecht on 2016-03-03
Affects                  Importance  Assigned to
linux (Ubuntu)           High        Kamal Mostafa
  Wily                   High        Kamal Mostafa
linux-lts-wily (Ubuntu)  Undecided   Kamal Mostafa
  Trusty                 Undecided   Kamal Mostafa

Bug Description

Commit 3f11933efc9ef55ecb2ac7e6d626e8d05a99a4b1 ("KVM: x86: expose MSR_TSC_AUX to userspace") from the ubuntu-wily master git tree breaks KVM/QEMU live migration of VMs with a graphical user interface.

== Reproduction ==

Install a kernel that includes 3f11933efc9ef55ecb2ac7e6d626e8d05a99a4b1 (Ubuntu-4.2.0-30.35).
Start a VM with a GUI.
Start a migration (no post-copy, the same migration as you would do in QEMU 2.4).
When the migration has finished and you switch the VNC connection over to the migration target, the VM is running but frozen, showing the last frame buffer. It is also not pingable (so your SSH session is dead if you used one).
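
The migration step above can be driven through QEMU's QMP monitor. The following is a minimal sketch, assuming a QMP socket is available on the source QEMU and that the target QEMU was started with a matching incoming URI. The URI `tcp:target.example:4444` and the helper names are placeholders and not taken from this report. The sketch only builds the JSON messages (`qmp_capabilities`, `migrate`, `query-migrate`) and interprets a `query-migrate` reply; sending them over the socket is left out.

```python
import json


def qmp_command(name, **arguments):
    """Serialize one QMP command as a single JSON line."""
    message = {"execute": name}
    if arguments:
        message["arguments"] = arguments
    return json.dumps(message)


def migration_script(target_uri):
    """Return the QMP lines for a plain (pre-copy) migration.

    target_uri is a placeholder such as "tcp:target.example:4444".
    """
    return [
        qmp_command("qmp_capabilities"),         # leave capability negotiation mode
        qmp_command("migrate", uri=target_uri),  # start the migration
        qmp_command("query-migrate"),            # poll this until it finishes
    ]


def migration_completed(reply_line):
    """Check whether a query-migrate reply reports status 'completed'."""
    reply = json.loads(reply_line)
    return reply.get("return", {}).get("status") == "completed"


if __name__ == "__main__":
    for line in migration_script("tcp:target.example:4444"):
        print(line)
```

Note that with this bug the migration itself is reported as finished, so a "completed" status is not enough to verify the guest. Pinging the guest or checking the VNC display on the target is still needed.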

This is independent of the (x86_64) hardware, the guest OS, and the QEMU CPU type (I mainly used kvm64).

== Backport Solution ==

After a reverse bisect on the working 4.4 xenial kernel (where it's working with the patch from above) I found the missing part.

While those two KVM patches listed above were backported/added, another one was not, namely:
81b1b9ca6d5ca5f3ce91c0095402def657cf5db3 (from upstream linus)
KVM: VMX: Fix host initiated access to guest MSR_TSC_AUX
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=81b1b9ca6d5ca5f3ce91c0095402def657cf5db3

As there was an API change, I made the rather trivial backport to the 4.2 kernel and tested it successfully.

The backported patch is attached to this report.
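
For readers who do not want to open the patch: as far as the upstream commit can be summarized, the MSR_TSC_AUX handlers used to refuse every access when the guest CPUID does not advertise RDTSCP, and the fix lets host-initiated accesses (for example userspace restoring MSRs on the migration target) pass through. The following Python toy model is only an illustration of that condition under this reading. It is not kernel code, and the function and parameter names are made up for this sketch.

```python
def tsc_aux_access_rejected(guest_has_rdtscp, host_initiated, with_fix):
    """Return True when the modelled MSR_TSC_AUX access is refused.

    guest_has_rdtscp: guest CPUID advertises RDTSCP
    host_initiated:   access comes from host userspace, not from the guest
    with_fix:         model the behaviour after the "Fix host initiated
                      access to guest MSR_TSC_AUX" change
    """
    if with_fix:
        # Only guest-initiated accesses are refused when RDTSCP is missing.
        return not guest_has_rdtscp and not host_initiated
    # Before the fix, the origin of the access was not taken into account.
    return not guest_has_rdtscp


if __name__ == "__main__":
    for with_fix in (False, True):
        rejected = tsc_aux_access_rejected(
            guest_has_rdtscp=False, host_initiated=True, with_fix=with_fix
        )
        print("with fix" if with_fix else "without fix", "-> rejected:", rejected)
```

In this model a guest without RDTSCP has its host-initiated access rejected only in the unfixed case, which would match the observation that migration breaks with the first patch alone.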

Some further information and earlier posts from myself about this issue can be found here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1540532
https://lists.ubuntu.com/archives/kernel-team/2016-March/072356.html

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1552592

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: patch
Thomas Lamprecht (t-lamprecht) wrote :

 > This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

As this cannot be tracked through the host logs, and I posted this bug report here at Kamal's request on the kernel ML to track the issue (the patch is already queued and ACKed by some kernel team members), I do not think adding those logs here brings any additional value. I am also not easily able to do so at the moment, as I am off work.

But if needed I can naturally fulfill the request.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Kamal Mostafa (kamalmostafa) wrote :

Thomas, thanks for your detailed analysis here, and for your backport. Yes, you can ignore the "missing log files" bot for this one -- we have what we need at this point.

Changed in linux (Ubuntu):
assignee: nobody → Kamal Mostafa (kamalmostafa)
importance: Undecided → High
status: Confirmed → In Progress
Kamal Mostafa (kamalmostafa) wrote :

The problem only affects the 4.2-stable Ubuntu kernels (won't manifest in earlier versions; already fixed upstream in later versions).
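
A rough way to check whether a given Ubuntu kernel package version falls into the window discussed in this report is sketched below. It assumes that 4.2.0-30.35 (the version named in the reproduction steps) is the first affected upload and that 4.2.0-35.40 (the version in which the fix was released) is the first fixed one. Uploads in between are not individually confirmed here, so treat the result as a hint only. The helper names are hypothetical.

```python
import re

# Assumption: boundaries taken from the versions named in this report.
FIRST_AFFECTED = (4, 2, 0, 30, 35)
FIRST_FIXED = (4, 2, 0, 35, 40)


def parse_version(version):
    """Parse '4.2.0-35.40' or '4.2.0-35.40~14.04.1' into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized kernel package version: %r" % version)
    return tuple(int(part) for part in match.groups())


def possibly_affected(version):
    """True if the version lies in [FIRST_AFFECTED, FIRST_FIXED)."""
    return FIRST_AFFECTED <= parse_version(version) < FIRST_FIXED


if __name__ == "__main__":
    for candidate in ("4.2.0-30.35", "4.2.0-35.40", "4.2.0-35.40~14.04.1"):
        print(candidate, possibly_affected(candidate))
```

The input is the package version (as shown by the package manager), not the shorter string printed by `uname -r`, which lacks the upload number.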

no longer affects: linux (Ubuntu Trusty)
no longer affects: linux-lts-wily (Ubuntu Wily)
Changed in linux (Ubuntu Wily):
assignee: nobody → Kamal Mostafa (kamalmostafa)
Changed in linux-lts-wily (Ubuntu Trusty):
assignee: nobody → Kamal Mostafa (kamalmostafa)
Changed in linux-lts-wily (Ubuntu):
assignee: nobody → Kamal Mostafa (kamalmostafa)
Changed in linux (Ubuntu):
status: In Progress → Invalid
Changed in linux (Ubuntu Wily):
status: New → In Progress
Changed in linux-lts-wily (Ubuntu Trusty):
status: New → In Progress
Changed in linux-lts-wily (Ubuntu):
status: New → In Progress
Changed in linux (Ubuntu Wily):
status: In Progress → Fix Committed
Changed in linux-lts-wily (Ubuntu Trusty):
status: In Progress → Triaged
Changed in linux (Ubuntu Wily):
importance: Undecided → High
Kamal Mostafa (kamalmostafa) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-wily' to 'verification-done-wily'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-wily
tags: added: verification-done-wily
removed: verification-needed-wily
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-lts-wily - 4.2.0-35.40~14.04.1

---------------
linux-lts-wily (4.2.0-35.40~14.04.1) trusty; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1557884

  [ Upstream Kernel Changes ]

  * Revert "workqueue: make sure delayed work run in local cpu"
    - LP: #1556269
  * Revert "ALSA: hda - Fix noise on Gigabyte Z170X mobo"
    - LP: #1556269
  * KVM: VMX: Fix host initiated access to guest MSR_TSC_AUX
    - LP: #1552592
  * locking/qspinlock: Move __ARCH_SPIN_LOCK_UNLOCKED to qspinlock_types.h
    - LP: #1545330
  * [media] usbvision fix overflow of interfaces array
    - LP: #1556269
  * [media] usbvision: fix crash on detecting device with invalid
    configuration
    - LP: #1556269
  * ASN.1: Fix non-match detection failure on data overrun
    - LP: #1556269
  * iw_cxgb3: Fix incorrectly returning error on success
    - LP: #1556269
  * EVM: Use crypto_memneq() for digest comparisons
    - LP: #1556269
  * vmstat: explicitly schedule per-cpu work on the CPU we need it to run
    on
    - LP: #1556269
  * x86/entry/compat: Add missing CLAC to entry_INT80_32
    - LP: #1556269
  * iio-light: Use a signed return type for ltr501_match_samp_freq()
    - LP: #1556269
  * iio: add IIO_TRIGGER dependency to STK8BA50
    - LP: #1556269
  * iio: add HAS_IOMEM dependency to VF610_ADC
    - LP: #1556269
  * iio: dac: mcp4725: set iio name property in sysfs
    - LP: #1556269
  * iommu/vt-d: Fix 64-bit accesses to 32-bit DMAR_GSTS_REG
    - LP: #1556269
  * iio: light: acpi-als: Report data as processed
    - LP: #1556269
  * iio:adc:ti_am335x_adc Fix buffered mode by identifying as software
    buffer.
    - LP: #1556269
  * ASoC: rt5645: fix the shift bit of IN1 boost
    - LP: #1556269
  * ARCv2: STAR 9000950267: Handle return from intr to Delay Slot #2
    - LP: #1556269
  * cgroup: make sure a parent css isn't offlined before its children
    - LP: #1556269
  * ARM: OMAP2+: Fix wait_dll_lock_timed for rodata
    - LP: #1556269
  * ARM: OMAP2+: Fix l2dis_3630 for rodata
    - LP: #1556269
  * ARM: OMAP2+: Fix save_secure_ram_context for rodata
    - LP: #1556269
  * ARM: OMAP2+: Fix l2_inv_api_params for rodata
    - LP: #1556269
  * ARM: OMAP2+: Fix ppa_zero_params and ppa_por_params for rodata
    - LP: #1556269
  * rtlwifi: rtl8821ae: Fix 5G failure when EEPROM is incorrectly encoded
    - LP: #1556269
  * PCI/AER: Flush workqueue on device remove to avoid use-after-free
    - LP: #1556269
  * ARM: dts: Fix wl12xx missing clocks that cause hangs
    - LP: #1556269
  * libata: disable forced PORTS_IMPL for >= AHCI 1.3
    - LP: #1556269
  * mac80211: Requeue work after scan complete for all VIF types.
    - LP: #1556269
  * rfkill: fix rfkill_fop_read wait_event usage
    - LP: #1556269
  * ARM: dts: at91: sama5d4: fix instance id of DBGU
    - LP: #1556269
  * ARM: dts: at91: sama5d4ek: add phy address and IRQ for macb0
    - LP: #1556269
  * ARM: dts: at91: sama5d4 xplained: fix phy0 IRQ type
    - LP: #1556269
  * crypto: shash - Fix has_key setting
    - LP: #1556269
  * Input: vmmouse - fix absolute device registration
    - LP: #1556269
  * spi: atmel: fix gpio chip-select in case of n...

Changed in linux-lts-wily (Ubuntu Trusty):
status: Triaged → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.2.0-35.40

---------------
linux (4.2.0-35.40) wily; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1557706

  [ Upstream Kernel Changes ]

  * Revert "workqueue: make sure delayed work run in local cpu"
    - LP: #1556269
  * Revert "ALSA: hda - Fix noise on Gigabyte Z170X mobo"
    - LP: #1556269
  * KVM: VMX: Fix host initiated access to guest MSR_TSC_AUX
    - LP: #1552592
  * locking/qspinlock: Move __ARCH_SPIN_LOCK_UNLOCKED to qspinlock_types.h
    - LP: #1545330
  * [media] usbvision fix overflow of interfaces array
    - LP: #1556269
  * [media] usbvision: fix crash on detecting device with invalid
    configuration
    - LP: #1556269
  * ASN.1: Fix non-match detection failure on data overrun
    - LP: #1556269
  * iw_cxgb3: Fix incorrectly returning error on success
    - LP: #1556269
  * EVM: Use crypto_memneq() for digest comparisons
    - LP: #1556269
  * vmstat: explicitly schedule per-cpu work on the CPU we need it to run
    on
    - LP: #1556269
  * x86/entry/compat: Add missing CLAC to entry_INT80_32
    - LP: #1556269
  * iio-light: Use a signed return type for ltr501_match_samp_freq()
    - LP: #1556269
  * iio: add IIO_TRIGGER dependency to STK8BA50
    - LP: #1556269
  * iio: add HAS_IOMEM dependency to VF610_ADC
    - LP: #1556269
  * iio: dac: mcp4725: set iio name property in sysfs
    - LP: #1556269
  * iommu/vt-d: Fix 64-bit accesses to 32-bit DMAR_GSTS_REG
    - LP: #1556269
  * iio: light: acpi-als: Report data as processed
    - LP: #1556269
  * iio:adc:ti_am335x_adc Fix buffered mode by identifying as software
    buffer.
    - LP: #1556269
  * ASoC: rt5645: fix the shift bit of IN1 boost
    - LP: #1556269
  * ARCv2: STAR 9000950267: Handle return from intr to Delay Slot #2
    - LP: #1556269
  * cgroup: make sure a parent css isn't offlined before its children
    - LP: #1556269
  * ARM: OMAP2+: Fix wait_dll_lock_timed for rodata
    - LP: #1556269
  * ARM: OMAP2+: Fix l2dis_3630 for rodata
    - LP: #1556269
  * ARM: OMAP2+: Fix save_secure_ram_context for rodata
    - LP: #1556269
  * ARM: OMAP2+: Fix l2_inv_api_params for rodata
    - LP: #1556269
  * ARM: OMAP2+: Fix ppa_zero_params and ppa_por_params for rodata
    - LP: #1556269
  * rtlwifi: rtl8821ae: Fix 5G failure when EEPROM is incorrectly encoded
    - LP: #1556269
  * PCI/AER: Flush workqueue on device remove to avoid use-after-free
    - LP: #1556269
  * ARM: dts: Fix wl12xx missing clocks that cause hangs
    - LP: #1556269
  * libata: disable forced PORTS_IMPL for >= AHCI 1.3
    - LP: #1556269
  * mac80211: Requeue work after scan complete for all VIF types.
    - LP: #1556269
  * rfkill: fix rfkill_fop_read wait_event usage
    - LP: #1556269
  * ARM: dts: at91: sama5d4: fix instance id of DBGU
    - LP: #1556269
  * ARM: dts: at91: sama5d4ek: add phy address and IRQ for macb0
    - LP: #1556269
  * ARM: dts: at91: sama5d4 xplained: fix phy0 IRQ type
    - LP: #1556269
  * crypto: shash - Fix has_key setting
    - LP: #1556269
  * Input: vmmouse - fix absolute device registration
    - LP: #1556269
  * spi: atmel: fix gpio chip-select in case of non-DT platform
    - LP: #1556269
  ...

Changed in linux (Ubuntu Wily):
status: Fix Committed → Fix Released