KVM deadlock on KVM guest migration with latest QEMU (mitaka) from Xenial (or Mitaka Ubuntu Cloud Archive)

Bug #1596941 reported by Rafael David Tinoco
This bug affects 3 people
Affects         Status   Importance  Assigned to  Milestone
linux (Ubuntu)  Expired  High        Unassigned
Trusty          Expired  High        Unassigned
Vivid           Expired  High        Unassigned
Wily            Expired  High        Unassigned
Xenial          Expired  High        Unassigned
Yakkety         Expired  High        Unassigned

Bug Description

It was brought to my attention that qemu-kvm live migration (with full storage copy) on Trusty + Mitaka Ubuntu Cloud Archive was broken. While investigating, I ran into the following situation:

crash> sys
      KERNEL: /usr/lib/debug/boot/vmlinux-3.13.0-86-generic
    DUMPFILE: ./201606241546/dump.201606241546 [PARTIAL DUMP]
        CPUS: 4
        DATE: Fri Jun 24 15:46:39 2016
      UPTIME: 00:06:00
LOAD AVERAGE: 1.00, 0.60, 0.26
       TASKS: 146
    NODENAME: vmqemulivefail1
     RELEASE: 3.13.0-86-generic
     VERSION: #131-Ubuntu SMP Thu May 12 23:33:13 UTC 2016
     MACHINE: x86_64 (2494 Mhz)
      MEMORY: 8 GB
       PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"

The full backtrace doesn't contain anything useful, since I had configured kernel.softlockup_panic.

From the scheduled-out tasks (and from kern.log) I could see that, on more than one occasion, the qemu process was possibly deadlocked while handling asynchronous page faults:

## kernel 3.13

# dump 1

PID: 1604 TASK: ffff8800374be000 CPU: 3 COMMAND: "qemu-system-x86"
 #0 [ffff8800ba115e28] __schedule at ffffffff8172e379
 #1 [ffff8800ba115e90] schedule at ffffffff8172e859
 #2 [ffff8800ba115ea0] kvm_async_pf_task_wait at ffffffff8105060f
 #3 [ffff8800ba115f38] do_async_page_fault at ffffffff81736090
 #4 [ffff8800ba115f50] async_page_fault at ffffffff81732cd8
    RIP: 00007fb4eff0a4b3 RSP: 00007fb4713facb0 RFLAGS: 00010206
    RAX: 00007fb4cb9cf000 RBX: 00007fb4f166d8f0 RCX: 0000000000000010
    RDX: 0000000000001fff RSI: 00007fb4cb9deff8 RDI: 4000000000000000
    RBP: 0000000000000000 R8: 0000000000000000 R9: 00000002601b0000
    R10: 00fffffffffffe00 R11: 0000000000001fff R12: 0000000000000008
    R13: 00007fb4713fad84 R14: 00007fb4f1665290 R15: 00007fb4713fad88
    ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b

# dump 2

PID: 1735 TASK: ffff8800b9bcb000 CPU: 2 COMMAND: "qemu-system-x86"
 #0 [ffff8802333c9e28] __schedule at ffffffff8172e379
 #1 [ffff8802333c9e90] schedule at ffffffff8172e859
 #2 [ffff8802333c9ea0] kvm_async_pf_task_wait at ffffffff8105060f
 #3 [ffff8802333c9f38] do_async_page_fault at ffffffff81736090
 #4 [ffff8802333c9f50] async_page_fault at ffffffff81732cd8
    RIP: 00007f631399d3b0 RSP: 00007f62912c7990 RFLAGS: 00010206
    RAX: 0000000000000000 RBX: 00007f6315f9e370 RCX: 00007f62ca714000
    RDX: 0000000032914020 RSI: 0000000000001000 RDI: 00007f62ca714000
    RBP: 00007f6315c66e40 R8: 00007f62912c7a40 R9: 00007f6315f9e3e0
    R10: 0000000000000000 R11: 0000000032914020 R12: 0000000032914020
    R13: 0000000000032914 R14: 00000000ffffffff R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b

# dump 3

PID: 1617 TASK: ffff880232834800 CPU: 3 COMMAND: "qemu-system-x86"
 #0 [ffff880232a6de28] __schedule at ffffffff8172e379
 #1 [ffff880232a6de90] schedule at ffffffff8172e859
 #2 [ffff880232a6dea0] kvm_async_pf_task_wait at ffffffff8105060f
 #3 [ffff880232a6df38] do_async_page_fault at ffffffff81736090
 #4 [ffff880232a6df50] async_page_fault at ffffffff81732cd8
    RIP: 00007f8c39e8b3b0 RSP: 00007f8bb80c9990 RFLAGS: 00010206
    RAX: 0000000000000000 RBX: 00007f8c3aeba370 RCX: 00007f8bdea18000
    RDX: 0000000022c18020 RSI: 0000000000001000 RDI: 00007f8bdea18000
    RBP: 00007f8c3ab82e40 R8: 00007f8bb80c9a40 R9: 00007f8c3aeba498
    R10: 0000000000000000 R11: 0000000022c18020 R12: 0000000022c18020
    R13: 0000000000022c18 R14: 00000000ffffffff R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b
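
All three dumps above show the same call chain. As a minimal sketch (not part of the original report), the check for that signature could be automated when going through many crash backtraces. The function names `frame_functions` and `blocked_in_async_pf` are hypothetical helpers introduced here for illustration, and the regular expression assumes the `bt` frame layout exactly as it appears in the dumps above.

```python
import re

# Matches crash "bt" frame lines such as:
#   #2 [ffff8800ba115ea0] kvm_async_pf_task_wait at ffffffff8105060f
FRAME_RE = re.compile(r"#(\d+)\s+\[([0-9a-f]+)\]\s+(\S+)\s+at\s+([0-9a-f]+)")


def frame_functions(bt_text):
    """Return the function names of a backtrace, innermost frame first."""
    return [m.group(3) for m in FRAME_RE.finditer(bt_text)]


def blocked_in_async_pf(bt_text):
    """True if schedule() was called directly from kvm_async_pf_task_wait."""
    funcs = frame_functions(bt_text)
    if "kvm_async_pf_task_wait" not in funcs:
        return False
    i = funcs.index("kvm_async_pf_task_wait")
    # The frame just inside the wait function must be schedule().
    return i > 0 and funcs[i - 1] == "schedule"


# Frames copied from "dump 1" above.
SAMPLE = """\
PID: 1604 TASK: ffff8800374be000 CPU: 3 COMMAND: "qemu-system-x86"
 #0 [ffff8800ba115e28] __schedule at ffffffff8172e379
 #1 [ffff8800ba115e90] schedule at ffffffff8172e859
 #2 [ffff8800ba115ea0] kvm_async_pf_task_wait at ffffffff8105060f
 #3 [ffff8800ba115f38] do_async_page_fault at ffffffff81736090
 #4 [ffff8800ba115f50] async_page_fault at ffffffff81732cd8
"""

if __name__ == "__main__":
    print(frame_functions(SAMPLE))
    print(blocked_in_async_pf(SAMPLE))
```

The check only says that a task was sleeping in the async page fault wait path; it does not by itself prove a deadlock.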

## kernel 4.4

# kern.log

[ 360.282132] INFO: task qemu-system-x86:1592 blocked for more than 120 seconds.
[ 360.282984] Not tainted 4.4.0-27-generic #46~14.04.1-Ubuntu
[ 360.283581] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 360.284439] qemu-system-x86 D ffff8800bb833e90 0 1592 1 0x00000000
[ 360.284443] ffff8800bb833e90 ffff88023151c4c0 ffff8802345eb700 ffff8800bb834000
[ 360.284444] 0000000000000010 ffffffff81efe6d0 000055ac8fa05520 00007f88fc7f7d88
[ 360.284445] ffff8800bb833ea8 ffffffff817ed5f5 ffff8800bb833ef0 ffff8800bb833f38
[ 360.284447] Call Trace:
[ 360.284472] [<ffffffff817ed5f5>] schedule+0x35/0x80
[ 360.284481] [<ffffffff81060a93>] kvm_async_pf_task_wait+0x1a3/0x1f0
[ 360.284487] [<ffffffff810bdc60>] ? prepare_to_wait_event+0xf0/0xf0
[ 360.284494] [<ffffffff811fe600>] ? do_sendfile+0x360/0x380
[ 360.284495] [<ffffffff81060c55>] do_async_page_fault+0x75/0x80
[ 360.284498] [<ffffffff817f2fe8>] async_page_fault+0x28/0x30
[ 360.284500] Sending NMI to all CPUs:
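
For triage across several hosts, the hung-task report above can be pulled out of kern.log mechanically. The following is only a sketch under stated assumptions: `hung_tasks` is a hypothetical helper, and the patterns assume the message and call-trace layout of the 4.4 log lines shown above. Trace entries marked with `?` are skipped because the kernel flags them as unreliable.

```python
import re

# "INFO: task <comm>:<pid> blocked for more than <N> seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# "[<address>] function+0xoff/0xsize", optionally prefixed with "?".
TRACE_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\??)\s*(\w+)\+0x")


def hung_tasks(log_text):
    """Return one dict per hung-task report, with its reliable trace frames."""
    reports = []
    for line in log_text.splitlines():
        m = HUNG_RE.search(line)
        if m:
            reports.append({
                "comm": m["comm"],
                "pid": int(m["pid"]),
                "secs": int(m["secs"]),
                "trace": [],
            })
            continue
        t = TRACE_RE.search(line)
        # Attach frames to the most recent report; drop "?" (unreliable) frames.
        if t and reports and t.group(1) != "?":
            reports[-1]["trace"].append(t.group(2))
    return reports


# Lines copied from the kern.log excerpt above.
SAMPLE_LOG = """\
[ 360.282132] INFO: task qemu-system-x86:1592 blocked for more than 120 seconds.
[ 360.284447] Call Trace:
[ 360.284472] [<ffffffff817ed5f5>] schedule+0x35/0x80
[ 360.284481] [<ffffffff81060a93>] kvm_async_pf_task_wait+0x1a3/0x1f0
[ 360.284487] [<ffffffff810bdc60>] ? prepare_to_wait_event+0xf0/0xf0
[ 360.284494] [<ffffffff811fe600>] ? do_sendfile+0x360/0x380
[ 360.284495] [<ffffffff81060c55>] do_async_page_fault+0x75/0x80
[ 360.284498] [<ffffffff817f2fe8>] async_page_fault+0x28/0x30
"""

if __name__ == "__main__":
    for report in hung_tasks(SAMPLE_LOG):
        print(report)
```

A report whose trace contains `kvm_async_pf_task_wait` matches the pattern seen in the 3.13 dumps.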

Rafael David Tinoco (rafaeldtinoco) wrote :

The following upstream patch:

From 25fb213d873977290caf374234df496ad158ec1e Mon Sep 17 00:00:00 2001
From: Rik van Riel <email address hidden>
Date: Mon, 21 Mar 2016 15:13:27 +0100
Subject: [PATCH 2/2] kvm, rt: change async pagefault code locking for
 PREEMPT_RT

The async pagefault wake code can run from the idle task in exception
context, so everything here needs to be made non-preemptible.

Conversion to a simple wait queue and raw spinlock does the trick.

Signed-off-by: Rik van Riel <email address hidden>
Signed-off-by: Paolo Bonzini <email address hidden>

It fixes the issue by preventing the async pagefault code from being preempted due to waitqueues.

A backport for Trusty needs both of the following patches:

From 25fb213d873977290caf374234df496ad158ec1e Mon Sep 17 00:00:00 2001
From: Rik van Riel <email address hidden>
Date: Mon, 21 Mar 2016 15:13:27 +0100
Subject: [PATCH 2/2] kvm, rt: change async pagefault code locking for
 PREEMPT_RT

From 6b9cf536987c69825f91af9478109aa7bcbebc94 Mon Sep 17 00:00:00 2001
From: "Peter Zijlstra (Intel)" <email address hidden>
Date: Fri, 19 Feb 2016 09:46:37 +0100
Subject: [PATCH 1/2] wait.[ch]: Introduce the simple waitqueue (swait)
 implementation

If adding the simple waitqueue interface to Trusty is not acceptable as an SRU, I'll have to come up with something else. I'm sure the problem goes away when these two patches are applied.

Changed in linux (Ubuntu):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Rafael David Tinoco (inaddy)
tags: added: canonical-bootstack
Changed in linux (Ubuntu Xenial):
status: New → In Progress
Changed in linux (Ubuntu Wily):
status: New → In Progress
Changed in linux (Ubuntu Vivid):
status: New → In Progress
Changed in linux (Ubuntu Trusty):
status: New → In Progress
importance: Undecided → High
Changed in linux (Ubuntu Wily):
importance: Undecided → High
Changed in linux (Ubuntu Vivid):
importance: Undecided → High
Changed in linux (Ubuntu Xenial):
importance: Undecided → High
assignee: nobody → Rafael David Tinoco (inaddy)
Changed in linux (Ubuntu Wily):
assignee: nobody → Rafael David Tinoco (inaddy)
Changed in linux (Ubuntu Vivid):
assignee: nobody → Rafael David Tinoco (inaddy)
Changed in linux (Ubuntu Trusty):
assignee: nobody → Rafael David Tinoco (inaddy)
Changed in linux (Ubuntu):
status: In Progress → Incomplete
Changed in linux (Ubuntu Trusty):
status: In Progress → Incomplete
Changed in linux (Ubuntu Vivid):
status: In Progress → Incomplete
Changed in linux (Ubuntu Wily):
status: In Progress → Incomplete
Changed in linux (Ubuntu Xenial):
status: In Progress → Incomplete
Changed in linux (Ubuntu Yakkety):
status: In Progress → Incomplete
Changed in linux (Ubuntu):
assignee: Rafael David Tinoco (inaddy) → nobody
Changed in linux (Ubuntu Vivid):
assignee: Rafael David Tinoco (inaddy) → nobody
Changed in linux (Ubuntu Trusty):
assignee: Rafael David Tinoco (inaddy) → nobody
Changed in linux (Ubuntu Wily):
assignee: Rafael David Tinoco (inaddy) → nobody
Changed in linux (Ubuntu Yakkety):
assignee: Rafael David Tinoco (inaddy) → nobody
Changed in linux (Ubuntu Xenial):
assignee: Rafael David Tinoco (inaddy) → nobody
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu Xenial) because there has been no activity for 60 days.]

Changed in linux (Ubuntu Xenial):
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu Trusty) because there has been no activity for 60 days.]

Changed in linux (Ubuntu Trusty):
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu Vivid) because there has been no activity for 60 days.]

Changed in linux (Ubuntu Vivid):
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu Wily) because there has been no activity for 60 days.]

Changed in linux (Ubuntu Wily):
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu Yakkety) because there has been no activity for 60 days.]

Changed in linux (Ubuntu Yakkety):
status: Incomplete → Expired