The VM hang happens because of pending interrupts not reinjected when migrating the VM several times

Bug #1791286 reported by Gavin Guo
This bug affects 2 people
Affects            Status        Importance  Assigned to  Milestone
linux (Ubuntu)     Fix Released  Medium      Gavin Guo
  Trusty           Fix Released  Medium      Unassigned

Bug Description

[Impact]

After the VM (guest OS: Windows Server 2012 R2) has been live-migrated
several times, its screen goes black or freezes when the VM is viewed
over VNC.

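Investigation of the Windows guest showed that all of its kernel
threads were idle, waiting for an interrupt request (IRQ) from the
hypervisor. As the bug title indicates, an interrupt that was still
pending in the virtual IOAPIC when the VM was migrated is not
reinjected on the destination host, so the guest waits indefinitely.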

The following environment was tested:
* Host OS: Ubuntu 14.04 (kernel 3.13.0-40.69)
* qemu-kvm_2.0.0+dfsg-2ubuntu1.22
* libvirt-bin_1.2.2-0ubuntu13.1.5
* nova-compute_1:2014.2.3-0ubuntu1.2~cloud0
* Guest OS: Windows Server 2012 R2
* virtio-win-0.1.126

[Fix]

The following patches are needed:

673f7b4257a1 KVM: ioapic: reinject pending interrupts on KVM_SET_IRQCHIP
44847dea7975 KVM: ioapic: extract body of kvm_ioapic_set_irq
0bc830b05c66 KVM: ioapic: clear IRR for edge-triggered interrupts at delivery
0b10a1c87a2b KVM: ioapic: merge ioapic_deliver into ioapic_service
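To check whether a given kernel git tree already contains these four fixes, one possible approach is sketched below. This is an illustration, not an Ubuntu-provided tool, and the function name check_ioapic_fixes is hypothetical. It assumes Ubuntu backports keep the usual "(cherry picked from commit <sha>)" line in the commit message, so it searches commit messages for each upstream SHA and falls back to an ancestry test for upstream trees. A commit that merely mentions one of the SHAs (for example in a later "Fixes:" tag) would also be reported as present.

```shell
#!/bin/bash
# Sketch (hypothetical helper): report which of the four KVM ioapic fixes
# appear in the history of HEAD of a kernel git tree.
# Assumption: backports carry "(cherry picked from commit <sha>)", so
# cherry-picked commits (which get new SHAs) are found by message search.

FIX_COMMITS=(
  673f7b4257a1  # KVM: ioapic: reinject pending interrupts on KVM_SET_IRQCHIP
  44847dea7975  # KVM: ioapic: extract body of kvm_ioapic_set_irq
  0bc830b05c66  # KVM: ioapic: clear IRR for edge-triggered interrupts at delivery
  0b10a1c87a2b  # KVM: ioapic: merge ioapic_deliver into ioapic_service
)

# check_ioapic_fixes [TREE]: print "present <sha>" or "missing <sha>" for
# each fix; return 1 if any fix is missing. TREE defaults to ".".
check_ioapic_fixes() {
  local tree="${1:-.}" sha rc=0
  for sha in "${FIX_COMMITS[@]}"; do
    # Backported tree: look for the upstream SHA in commit messages.
    if git -C "$tree" log -F --grep="$sha" --format=%H -n 1 | grep -q .; then
      echo "present $sha"
    # Upstream tree: the commit itself may be an ancestor of HEAD.
    elif git -C "$tree" merge-base --is-ancestor "$sha" HEAD 2>/dev/null; then
      echo "present $sha"
    else
      echo "missing $sha"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, running check_ioapic_fixes against a checkout of the Trusty kernel tree prints one line per commit and returns non-zero if any fix is absent.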

[Test]

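Prepare two hosts (here named "elite" and "dixie") with the environment
described above and run the following script. Each run live-migrates the
instance to the host it is not currently running on.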

#!/bin/bash
# Live-migrate ${INSTANCE} to whichever of the two hosts is not running it.
INSTANCE="test"

# "list --name" prints one domain name per line; -x avoids substring matches.
if virsh -c "qemu+ssh://ubuntu@elite/system" list --name | grep -qx "${INSTANCE}"; then
    FROM="elite"; TO="dixie"
else
    FROM="dixie"; TO="elite"
fi

echo "= Migrating ${INSTANCE} from ${FROM} to ${TO} ="
ssh "ubuntu@${FROM}" -- virsh migrate --live --domain "${INSTANCE}" \
    --desturi "qemu+ssh://ubuntu@${TO}/system"
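Because the hang only appears after several migrations (the verification later in this report ran more than 1000), the migration script above can be driven in a loop. This is a minimal sketch: it assumes the script is saved as ./migrate.sh (a hypothetical name), and the loop stops only when a migration command fails. It does not detect the guest hang itself, which is watched through the VNC console script.

```shell
#!/bin/bash
# Sketch: run the migration script repeatedly.
# Assumptions (hypothetical): the script is ./migrate.sh; MIGRATE_CMD and
# PAUSE may be overridden from the environment.
MIGRATE_CMD="${MIGRATE_CMD:-./migrate.sh}"
PAUSE="${PAUSE:-10}"   # seconds to let the guest settle between migrations

# run_migrations N: migrate N times, stopping at the first failed migration.
run_migrations() {
  local count="$1" i
  for ((i = 1; i <= count; i++)); do
    echo "== iteration $i/$count =="
    if ! $MIGRATE_CMD; then
      echo "migration $i failed" >&2
      return 1
    fi
    sleep "$PAUSE"
  done
  echo "completed $count migrations"
}
```

For example, after loading the block, run_migrations 1000 repeats the migration 1000 times while the VNC script keeps a console open on the guest.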

The following script keeps a VNC console (virt-viewer) open on the instance, reconnecting to whichever host is currently running it:

#!/bin/bash
# Keep a virt-viewer console on ${INSTANCE}, following it across migrations.
INSTANCE="test"

while true; do
    HOST="elite"
    if ! virsh -c "qemu+ssh://ubuntu@${HOST}/system" list --name | grep -qx "${INSTANCE}"; then
        HOST="dixie"
    fi

    virt-viewer -c "qemu+ssh://ubuntu@${HOST}/system" "${INSTANCE}"
    sleep 3
done

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1791286

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Gavin Guo (mimi0213kimo)
description: updated
Changed in linux (Ubuntu Trusty):
status: New → In Progress
Stefan Bader (smb)
Changed in linux (Ubuntu Trusty):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'. If the problem still exists, change the tag 'verification-needed-trusty' to 'verification-failed-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty
Gavin Guo (mimi0213kimo)
tags: added: verification-done-trusty
removed: verification-needed-trusty
Toshikazu Ichikawa (ichikawa-toshikazu) wrote :

We tested "3.13.0.161.211" and confirmed that the issue is fixed.
We installed "3.13.0.161.211" on two hosts and live-migrated a Windows guest between them more than 1000 times. The problem did not occur in the Windows guest, which means "3.13.0.161.211" fixes the bug.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-161.211

---------------
linux (3.13.0-161.211) trusty; urgency=medium

  * linux: 3.13.0-161.211 -proposed tracker (LP: #1795595)

  * CVE-2017-0794
    - scsi: sg: protect accesses to 'reserved' page array
    - scsi: sg: reset 'res_in_use' after unlinking reserved array
    - scsi: sg: recheck MMAP_IO request length with lock held

  * CVE-2017-15299
    - KEYS: don't let add_key() update an uninstantiated key

  * CVE-2015-8539
    - KEYS: Fix handling of stored error in a negatively instantiated user key

  * CVE-2018-7566
    - ALSA: seq: Fix racy pool initializations
    - ALSA: seq: More protection for concurrent write and ioctl races

  * CVE-2018-1000004 // CVE-2018-7566
    - ALSA: seq: Don't allow resizing pool in use

  * CVE-2018-1000004
    - ALSA: seq: Make ioctls race-free

  * CVE-2017-18216
    - ocfs2: subsystem.su_mutex is required while accessing the item->ci_parent

  * CVE-2016-7913
    - tuner-xc2028: Don't try to sleep twice
    - xc2028: avoid use after free
    - xc2028: unlock on error in xc2028_set_config()
    - xc2028: Fix use-after-free bug properly

  * The VM hang happens because of pending interrupts not reinjected when
    migrating the VM several times (LP: #1791286)
    - KVM: ioapic: merge ioapic_deliver into ioapic_service
    - KVM: ioapic: clear IRR for edge-triggered interrupts at delivery
    - KVM: ioapic: extract body of kvm_ioapic_set_irq
    - KVM: ioapic: reinject pending interrupts on KVM_SET_IRQCHIP

  * CVE-2018-5390
    - SAUCE: tcp: Correct the backport of the CVE-2018-5390 fix

  * CVE-2018-9518
    - NFC: llcp: Limit size of SDP URI

  * Improvements to the kernel source package preparation (LP: #1793461)
    - [Packaging] startnewrelease: add support for backport kernels

 -- Stefan Bader <email address hidden> Wed, 03 Oct 2018 16:41:42 +0200
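The changelog above shows the fixes first shipped in the Trusty kernel 3.13.0-161.211. A host's running kernel can be compared against that release with a short check like the one below. It is a sketch under stated assumptions: `uname -r` has the usual Ubuntu form such as 3.13.0-161-generic, where 161 is the ABI number of the 3.13.0-161.211 package, so the check works at ABI granularity only, and it is only meaningful for 3.13 (Trusty) kernels. The function name kernel_has_fix is hypothetical.

```shell
#!/bin/bash
# Sketch: is a Trusty kernel release at least the ABI that carries the fix?
# Assumption: release strings look like "3.13.0-161-generic".
FIXED_ABI="3.13.0-161"

# kernel_has_fix [RELEASE]: succeed if RELEASE (default: the running kernel,
# from uname -r) is the same as or newer than FIXED_ABI.
kernel_has_fix() {
  local rel="${1:-$(uname -r)}"
  local base="${rel%-*}"   # drop the flavour: 3.13.0-161-generic -> 3.13.0-161
  # sort -V orders versions oldest first; if the oldest is FIXED_ABI,
  # then base is at least as new as FIXED_ABI.
  [ "$(printf '%s\n%s\n' "$FIXED_ABI" "$base" | sort -V | head -n 1)" = "$FIXED_ABI" ]
}
```

For example, kernel_has_fix 3.13.0-40-generic fails for the kernel from the tested environment, while kernel_has_fix with no argument checks the running host.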

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
Mathew Hodson (mhodson)
Changed in linux (Ubuntu):
status: Incomplete → Fix Released
importance: Undecided → Medium
Changed in linux (Ubuntu Trusty):
importance: Undecided → Medium
Brad Figg (brad-figg)
tags: added: cscc