Heap-use-after-free in io_writex / cputlb.c results in Linux kernel crashes

Bug #1920934 reported by Marco Elver
Affects: QEMU
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

qemu version: git 5ca634afcf83215a9a54ca6e66032325b5ffb5f6; 5.2.0

We've found that booting the Linux kernel in TCG mode results in a racy heap-use-after-free. ASan can detect the bug [A], but in the majority of runs it shows up as a crashing kernel instead [B].

To reproduce, the following command line was used:

$> while ./qemu-system-x86_64 -no-reboot -smp 10 -m 2G -kernel arch/x86/boot/bzImage -nographic -append "oops=panic panic_on_warn=1 panic=1 kfence.sample_interval=1 nokaslr"; do sleep 0.5s; done

The kernel crashes [B] appear to happen when an interrupt is received at a code location whose instructions are periodically patched (via the jump_label infrastructure).

[A]:
=================================================================
==3552508==ERROR: AddressSanitizer: heap-use-after-free on address 0x6190007fef50 at pc 0x55885b0b4d1b bp 0x7f83baffb800 sp 0x7f83baffb7f8
READ of size 8 at 0x6190007fef50 thread T4
[ 4.616506][ T1] pci 0000:00:02.0: reg 0x18: [mem 0xfebf0000-0xfebf0fff]
[ 4.670567][ T1] pci 0000:00:02.0: reg 0x30: [mem 0xfebe0000-0xfebeffff pref]
[ 4.691345][ T1] pci 0000:00:03.0: [8086:100e] type 00 class 0x020000
[ 4.701540][ T1] pci 0000:00:03.0: reg 0x10: [mem 0xfebc0000-0xfebdffff]
[ 4.711076][ T1] pci 0000:00:03.0: reg 0x14: [io 0xc000-0xc03f]
[ 4.746869][ T1] pci 0000:00:03.0: reg 0x30: [mem 0xfeb80000-0xfebbffff pref]
[ 4.813612][ T1] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
    #0 0x55885b0b4d1a in io_writex ../accel/tcg/cputlb.c:1408
    #1 0x55885b0d3b9f in store_helper ../accel/tcg/cputlb.c:2444
    #2 0x55885b0d3b9f in helper_le_stl_mmu ../accel/tcg/cputlb.c:2510
[ 4.820927][ T1] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
    #3 0x7f843cedf8dc (<unknown module>)

0x6190007fef50 is located 208 bytes inside of 1024-byte region [0x6190007fee80,0x6190007ff280)
freed by thread T11 here:
    #0 0x7f8483f431f8 in __interceptor_realloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:164
    #1 0x7f8483586de7 in g_realloc (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x57de7)

previously allocated by thread T11 here:
    #0 0x7f8483f431f8 in __interceptor_realloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:164
    #1 0x7f8483586de7 in g_realloc (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x57de7)

Thread T4 created by T0 here:
[ 4.827679][ T1] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[ 4.835143][ T1] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[ 4.838441][ T1] ACPI: PCI Interrupt Link [LNKS] (IRQs *9)
    #0 0x7f8483eee2a2 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:214
    #1 0x55885b7cf0de in qemu_thread_create ../util/qemu-thread-posix.c:558

Thread T11 created by T0 here:
    #0 0x7f8483eee2a2 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:214
    #1 0x55885b7cf0de in qemu_thread_create ../util/qemu-thread-posix.c:558

SUMMARY: AddressSanitizer: heap-use-after-free ../accel/tcg/cputlb.c:1408 in io_writex
Shadow bytes around the buggy address:
  0x0c32800f7d90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c32800f7da0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c32800f7db0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c32800f7dc0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c32800f7dd0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0c32800f7de0: fd fd fd fd fd fd fd fd fd fd[fd]fd fd fd fd fd
  0x0c32800f7df0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c32800f7e00: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c32800f7e10: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c32800f7e20: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c32800f7e30: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable: 00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone: fa
  Freed heap region: fd
  Stack left redzone: f1
  Stack mid redzone: f2
  Stack right redzone: f3
  Stack after return: f5
  Stack use after scope: f8
  Global redzone: f9
  Global init order: f6
  Poisoned by user: f7
  Container overflow: fc
  Array cookie: ac
  Intra object redzone: bb
  ASan internal: fe
  Left alloca redzone: ca
  Right alloca redzone: cb
  Shadow gap: cc
==3552508==ABORTING

[B]:
[ 6.029269][ C4] int3: 0000 [#1] PREEMPT SMP
[ 6.029269][ C4] CPU: 4 PID: 34 Comm: cpuhp/4 Not tainted 5.12.0-rc4 #2
[ 6.029269][ C4] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 6.029269][ C4] RIP: 0010:kmem_cache_alloc_trace+0xdd/0x2f0
[ 6.029269][ C4] Code: de e8 a7 2e 02 00 85 c0 74 0d 48 89 ef e8 bb 60 00 00 e9 e3 00 00 00 4d 85 f6 0f 84 da 00 00 00 4c 89 6c 24 08 48 8b 2c 24 cc <98> 01 00 00 45 31 ed 4c 89 6c 24 10 4d 85 ed 0f 85 99 00 00 00 49
[ 6.029269][ C4] RSP: 0018:ffffc90000483cc0 EFLAGS: 00000286
[ 6.029269][ C4] RAX: 0000000000000000 RBX: 0000000000000dc0 RCX: ffff888003b717c0
[ 6.029269][ C4] RDX: 0000000000000000 RSI: 0000000000000dc0 RDI: ffff888003842a00
[ 6.029269][ C4] RBP: 0000000000000110 R08: 0000000000000000 R09: 0000000000000000
[ 6.029269][ C4] R10: ffffffff81248e22 R11: 00000000fa83b201 R12: 0000000000000dc0
[ 6.029269][ C4] R13: 0000000000000000 R14: ffff888003842a00 R15: ffffffff8150e1c9
[ 6.029269][ C4] FS: 0000000000000000(0000) GS:ffff88803ea00000(0000) knlGS:0000000000000000
[ 6.029269][ C4] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6.029269][ C4] CR2: 0000000000000000 CR3: 0000000002011000 CR4: 00000000000006e0
[ 6.029269][ C4] Call Trace:
[ 6.029269][ C4] device_add+0x59/0x7b0
[ 6.029269][ C4] device_create+0xea/0x130
[ 6.029269][ C4] ? cpu_report_death+0x40/0x40
[ 6.029269][ C4] ? cpu_report_death+0x40/0x40
[ 6.029269][ C4] ? msr_devnode+0x20/0x20
[ 6.029269][ C4] msr_device_create+0x28/0x40
[ 6.029269][ C4] cpuhp_invoke_callback+0x140/0x2f0
[ 6.029269][ C4] ? finish_task_switch+0x8c/0x230
[ 6.029269][ C4] ? cpu_report_death+0x40/0x40
[ 6.029269][ C4] cpuhp_thread_fun+0x118/0x1a0
[ 6.029269][ C4] ? cpu_report_death+0x40/0x40
[ 6.029269][ C4] smpboot_thread_fn+0x1b9/0x270
[ 6.029269][ C4] kthread+0x14b/0x160
[ 6.029269][ C4] ? kthread_unuse_mm+0xf0/0xf0
[ 6.029269][ C4] ret_from_fork+0x1f/0x30
[ 6.029269][ C4] ---[ end trace 1336f71544bb94e4 ]---

Marco Elver (melver) wrote :

Peter Maydell (pmaydell) wrote :

Does this repro with current-head-of-git QEMU ?

Marco Elver (melver) wrote :

Yes, I have:

commit 5ca634afcf83215a9a54ca6e66032325b5ffb5f6 (HEAD -> master, origin/master, origin/HEAD)
Merge: c95bd5ff16 cffb446e8f
Author: Peter Maydell <email address hidden>
Date: Mon Mar 22 18:50:25 2021 +0000

Or another branch?

Richard Henderson (rth) wrote :

This suggests that the rcu_read in iotlb_to_section is not
playing well with one of the g_renew calls in softmmu/physmem.c.

Not sure which, since the sanitizer dump above doesn't trace
back beyond glib itself.
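
To illustrate the kind of hazard suggested above (this is not QEMU's code): a reader that keeps a pointer into an array is only safe if a concurrent resize does not free the old storage underneath it, which is what realloc()/g_renew() may do. The sketch below is a minimal, deliberately single-threaded model of the safer "copy, publish, free later" pattern that RCU relies on. All names (`Section`, `SectionTable`, `table_grow_deferred`, `run_demo`) are hypothetical, and real concurrent code would additionally need an atomic publish and a proper grace period.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one entry of a section table. */
typedef struct {
    int id;
} Section;

typedef struct {
    Section *items; /* currently published array */
    size_t len;     /* entries in use */
    size_t cap;     /* allocated capacity */
} SectionTable;

/*
 * Grow the table without invalidating pointers that readers already hold.
 * Unlike realloc()/g_renew(), the old array is NOT freed here: it is handed
 * back through *retired so the caller can free it once no reader can still
 * be using it. Returns 0 on success, -1 on failure (table left unchanged).
 */
static int table_grow_deferred(SectionTable *t, size_t new_cap,
                               Section **retired)
{
    if (new_cap < t->len) {
        return -1;
    }
    Section *fresh = malloc(new_cap * sizeof(*fresh));
    if (!fresh) {
        return -1;
    }
    memcpy(fresh, t->items, t->len * sizeof(*fresh));
    *retired = t->items; /* old storage stays alive for existing readers */
    t->items = fresh;    /* publish the new array */
    t->cap = new_cap;
    return 0;
}

/* Single-threaded walk-through; returns 0 if the held pointer stayed valid. */
static int run_demo(void)
{
    SectionTable t = { malloc(4 * sizeof(Section)), 4, 4 };
    if (!t.items) {
        return -1;
    }
    for (size_t i = 0; i < t.len; i++) {
        t.items[i].id = (int)i * 10;
    }

    /* "Reader": looks up an entry and keeps the pointer. */
    const Section *held = &t.items[3];

    /* "Writer": grows the table while the reader still holds its pointer. */
    Section *retired = NULL;
    if (table_grow_deferred(&t, 8, &retired) != 0) {
        free(t.items);
        return -1;
    }

    /* The old array has not been freed, so this read is still valid. */
    int seen = held->id;
    int consistent = (t.items[3].id == seen);

    /* Readers are done: only now is it safe to release the old array. */
    free(retired);
    free(t.items);
    return (seen == 30 && consistent) ? 0 : 1;
}
```

Calling `run_demo()` from a `main` exercises the sequence end to end. Had the writer used realloc() instead, the read through `held` after the resize would be exactly the kind of access ASan flags as heap-use-after-free.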

Richard Henderson (rth) wrote :

I have been unable to reproduce this problem with qemu
master (67c1115edd98), and linux 5.10 w/ your config.

Marco Elver (melver) wrote :

The config is from 5.12-rc4, and the earliest kernel version that should reproduce this is 5.12-rc1.

Thomas Huth (th-huth) wrote :

The QEMU project is currently moving its bug tracking to another system.
For this we need to know which bugs are still valid and which could be
closed already. Thus we are setting the bug state to "Incomplete" now.

If the bug has already been fixed in the latest upstream version of QEMU,
then please close this ticket as "Fix released".

If it is not fixed yet and you think that this bug report here is still
valid, then you have two options:

1) If you already have an account on gitlab.com, please open a new ticket
for this problem in our new tracker here:

    https://gitlab.com/qemu-project/qemu/-/issues

and then close this ticket here on Launchpad (or let it expire automatically
after 60 days). Please mention the URL of this bug ticket on Launchpad in the
new ticket on GitLab.

2) If you don't have an account on gitlab.com and don't intend to get
one, but still would like to keep this ticket opened, then please switch
the state back to "New" or "Confirmed" within the next 60 days (otherwise
it will get closed as "Expired"). We will then eventually migrate
the ticket automatically to the new system (but you won't be the reporter
of the bug in the new system and thus you won't get notified on changes
anymore).

Thank you and sorry for the inconvenience.

Changed in qemu:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for QEMU because there has been no activity for 60 days.]

Changed in qemu:
status: Incomplete → Expired