deadlock on balloon deflation

Bug #1598197 reported by Roman Kagan on 2016-07-01
22
This bug affects 4 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
High
Gavin Guo
Trusty
High
Gavin Guo

Bug Description

Latest Ubuntu trusty with kernel 3.13.0-91-generic run in a KVM virtual machine with virtio_balloon hangs when the previously inflated balloon is deflated.

The problem is in the recently committed backport:

commit 838478a8496ef9677256f53710144abe2ea49625
Author: Konstantin Khlebnikov <email address hidden>
AuthorDate: Mon May 16 14:43:10 2016 +0800
Commit: Kamal Mostafa <email address hidden>
CommitDate: Fri Jun 10 07:15:37 2016 -0700

    mm/balloon_compaction: redesign ballooned pages management

    BugLink: http://bugs.launchpad.net/bugs/1572562

    Sasha Levin reported KASAN splash inside isolate_migratepages_range().
    Problem is in the function __is_movable_balloon_page() which tests
    AS_BALLOON_MAP in page->mapping->flags. This function has no protection
    against anonymous pages. As result it tried to check address space flags
    inside struct anon_vma.

    Further investigation shows more problems in current implementation:

    * Special branch in __unmap_and_move() never works:
      balloon_page_movable() checks page flags and page_count. In
      __unmap_and_move() page is locked, reference counter is elevated, thus
      balloon_page_movable() always fails. As a result execution goes to the
      normal migration path. virtballoon_migratepage() returns
      MIGRATEPAGE_BALLOON_SUCCESS instead of MIGRATEPAGE_SUCCESS,
      move_to_new_page() thinks this is an error code and assigns
      newpage->mapping to NULL. Newly migrated page lose connectivity with
      balloon an all ability for further migration.

    * lru_lock erroneously required in isolate_migratepages_range() for
      isolation ballooned page. This function releases lru_lock periodically,
      this makes migration mostly impossible for some pages.

    * balloon_page_dequeue have a tight race with balloon_page_isolate:
      balloon_page_isolate could be executed in parallel with dequeue between
      picking page from list and locking page_lock. Race is rare because they
      use trylock_page() for locking.

    This patch fixes all of them.

    Instead of fake mapping with special flag this patch uses special state of
    page->_mapcount: PAGE_BALLOON_MAPCOUNT_VALUE = -256. Buddy allocator uses
    PAGE_BUDDY_MAPCOUNT_VALUE = -128 for similar purpose. Storing mark
    directly in struct page makes everything safer and easier.

    PagePrivate is used to mark pages present in page list (i.e. not
    isolated, like PageLRU for normal pages). It replaces special rules for
    reference counter and makes balloon migration similar to migration of
    normal pages. This flag is protected by page_lock together with link to
    the balloon device.

    Signed-off-by: Konstantin Khlebnikov <email address hidden>
    Reported-by: Sasha Levin <email address hidden>
    Link: http://<email address hidden>
    Cc: Rafael Aquini <email address hidden>
    Cc: Andrey Ryabinin <email address hidden>
    Cc: <email address hidden> [3.8+]
    Signed-off-by: Andrew Morton <email address hidden>
    Signed-off-by: Linus Torvalds <email address hidden>
    (backported from commit d6d86c0a7f8ddc5b38cf089222cb1d9540762dc2)
    Signed-off-by: Gavin Guo <email address hidden>

    Conflicts:
     mm/balloon_compaction.c
     mm/migrate.c

    Acked-by: Stefan Bader <email address hidden>
    Signed-off-by: Kamal Mostafa <email address hidden>

It was applied after another backport:

commit 47618e32c2a729554bf56b8ee7b541b63aadad97
Author: Minchan Kim <email address hidden>
AuthorDate: Mon Dec 28 08:35:13 2015 +0900
Commit: Luis Henriques <email address hidden>
CommitDate: Mon Feb 22 19:31:53 2016 +0000

    virtio_balloon: fix race between migration and ballooning

    BugLink: http://bugs.launchpad.net/bugs/1542497

    commit 21ea9fb69e7c4b1b1559c3e410943d3ff248ffcb upstream.

    In balloon_page_dequeue, pages_lock should cover the loop
    (ie, list_for_each_entry_safe). Otherwise, the cursor page could
    be isolated by compaction and then list_del by isolation could
    poison the page->lru.{prev,next} so the loop finally could
    access wrong address like this. This patch fixes the bug.

    general protection fault: 0000 [#1] SMP
    Dumping ftrace buffer:
       (ftrace buffer empty)
    Modules linked in:
    CPU: 2 PID: 82 Comm: vballoon Not tainted 4.4.0-rc5-mm1-access_bit+ #1906
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    task: ffff8800a7ff0000 ti: ffff8800a7fec000 task.ti: ffff8800a7fec000
    RIP: 0010:[<ffffffff8115e754>] [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
    RSP: 0018:ffff8800a7fefdc0 EFLAGS: 00010246
    RAX: ffff88013fff9a70 RBX: ffffea000056fe00 RCX: 0000000000002b7d
    RDX: ffff88013fff9a70 RSI: ffffea000056fe00 RDI: ffff88013fff9a68
    RBP: ffff8800a7fefde8 R08: ffffea000056fda0 R09: 0000000000000000
    R10: ffff8800a7fefd90 R11: 0000000000000001 R12: dead0000000000e0
    R13: ffffea000056fe20 R14: ffff880138809070 R15: ffff880138809060
    FS: 0000000000000000(0000) GS:ffff88013fc40000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007f229c10e000 CR3: 00000000b8b53000 CR4: 00000000000006a0
    Stack:
     0000000000000100 ffff880138809088 ffff880138809000 ffff880138809060
     0000000000000046 ffff8800a7fefe28 ffffffff812c86d3 ffff880138809020
     ffff880138809000 fffffffffff91900 0000000000000100 ffff880138809060
    Call Trace:
     [<ffffffff812c86d3>] leak_balloon+0x93/0x1a0
     [<ffffffff812c8bc7>] balloon+0x217/0x2a0
     [<ffffffff8143739e>] ? __schedule+0x31e/0x8b0
     [<ffffffff81078160>] ? abort_exclusive_wait+0xb0/0xb0
     [<ffffffff812c89b0>] ? update_balloon_stats+0xf0/0xf0
     [<ffffffff8105b6e9>] kthread+0xc9/0xe0
     [<ffffffff8105b620>] ? kthread_park+0x60/0x60
     [<ffffffff8143b4af>] ret_from_fork+0x3f/0x70
     [<ffffffff8105b620>] ? kthread_park+0x60/0x60
    Code: 8d 60 e0 0f 84 af 00 00 00 48 8b 43 20 a8 01 75 3b 48 89 d8 f0 0f ba 28 00 72 10 48 8b 03 f6 c4 08 75 2f 48 89 df e8 8c 83 f9 ff <49> 8b 44 24 20 4d 8d 6c 24 20 48 83 e8 20 4d 39 f5 74 7a 4c 89
    RIP [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
     RSP <ffff8800a7fefdc0>
    ---[ end trace 43cf28060d708d5f ]---
    Kernel panic - not syncing: Fatal exception
    Dumping ftrace buffer:
       (ftrace buffer empty)
    Kernel Offset: disabled

    Signed-off-by: Minchan Kim <email address hidden>
    Signed-off-by: Michael S. Tsirkin <email address hidden>
    Acked-by: Rafael Aquini <email address hidden>
    Signed-off-by: Kamal Mostafa <email address hidden>

This resulted in the following code:

 82 struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
 83 {
 84 struct page *page, *tmp;
 85 unsigned long flags;
 86 bool dequeued_page;
 87
 88 dequeued_page = false;
 89 spin_lock_irqsave(&b_dev_info->pages_lock, flags);
 90 list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
 91 /*
 92 * Block others from accessing the 'page' while we get around
 93 * establishing additional references and preparing the 'page'
 94 * to be released by the balloon driver.
 95 */
 96 if (trylock_page(page)) {
 97 #ifdef CONFIG_BALLOON_COMPACTION
 98 if (!PagePrivate(page)) {
 99 /* raced with isolation */
100 unlock_page(page);
101 continue;
102 }
103 #endif
104 spin_lock_irqsave(&b_dev_info->pages_lock, flags);
105 balloon_page_delete(page);
106 spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
107 unlock_page(page);
108 dequeued_page = true;
109 break;
110 }
111 }
112 spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);

Note the line 104 takes the spinlock already taken in line 89.

CVE References

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-lts-trusty (Ubuntu):
status: New → Confirmed
tags: added: kernel-da-key
Changed in linux-lts-trusty (Ubuntu):
importance: Undecided → High
affects: linux-lts-trusty (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu Trusty):
status: New → Triaged
Changed in linux (Ubuntu):
status: Confirmed → Triaged
Changed in linux (Ubuntu Trusty):
importance: Undecided → High
Gavin Guo (mimi0213kimo) on 2016-07-11
Changed in linux (Ubuntu Trusty):
assignee: nobody → Gavin Guo (mimi0213kimo)
Changed in linux (Ubuntu):
assignee: nobody → Gavin Guo (mimi0213kimo)
Changed in linux (Ubuntu Trusty):
status: Triaged → Fix Committed
Denis V. Lunev (r-den) wrote :

Guys, can we know exact version of the kernel with the fix? Thank you.

Gavin Guo (mimi0213kimo) wrote :

@Denis V.Lunev

Really sorry for the inconvenience. The patch has already been sent to the mailing list.
http://comments.gmane.org/gmane.linux.ubuntu.devel.kernel.general/78486

The fix will be included in the Ubuntu-3.13.0-93.140.

Seth Forshee (sforshee) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty
Seth Forshee (sforshee) wrote :

This bug still needs verification that the kernel in -proposed fixes the issue. Can someone please verify the fix?

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-93.140

---------------
linux (3.13.0-93.140) trusty; urgency=low

  [ Seth Forshee ]

  * Release Tracking Bug
    - LP: #1604134

  * Boot failure with EFI stub (LP: #1603476)
    - x86/efi: Fix boot failure with EFI stub

  * CVE-2016-5243 (LP: #1589036)
    - tipc: fix an infoleak in tipc_nl_compat_link_dump

  * qeth: delete napi struct when removing a qeth device (LP: #1601831)
    - qeth: delete napi struct when removing a qeth device

  * deadlock on balloon deflation (LP: #1598197)
    - SAUCE: mm/balloon_compaction: Fix Regression of LP#1572562

  * serial: 8250_pci: Add support for 16 port Exar boards (LP: #1447485)
    - serial: 8250_pci: Add support for 16 port Exar boards
    - serial: 8250_pci: Add support for 12 port Exar boards
    - serial: 8250_pci: Correct uartclk for xr17v35x expansion chips

  * linux: Homogenize changelog format across releases (LP: #1599562)
    - Revert "UBUNTU: [debian] BugLink: close LP: bugs only for Launchpad urls"
    - [Debian] git-ubuntu-log -- switch to bug order
    - [Debian] git-ubuntu-log -- fix empty section formatting
    - [Debian] git-ubuntu-log -- output should be utf-8
    - [Debian] git-ubuntu-log -- handle invalid or private bugs
    - [Debian] git-ubuntu-log -- wrap long bug and commit titles
    - [Debian] git-ubuntu-log -- ensure we get the last commit
    - [Debian] git-ubuntu-log -- prevent bug references being split
    - [Debian] git-ubuntu-log -- git log output is UTF-8

  * exercising ptys causes a kernel oops (LP: #1586418)
    - devpts: fix null pointer dereference on failed memory allocation

  * Miscellaneous upstream changes
    - KEYS: potential uninitialized variable

 -- Seth Forshee <email address hidden> Mon, 18 Jul 2016 15:05:56 -0500

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
tags: added: verification-done-trusty
removed: verification-needed-trusty
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers