sbrk() not working under qemu-user with a PIE-compiled binary?

Bug #1749393 reported by Raphaël Hertzog on 2018-02-14
This bug affects 13 people
Affects Status Importance Assigned to Milestone
qemu (Ubuntu)
Christian Ehrhardt 

Bug Description


 * The current space reserved can be too small and we can end up
   with no space at all for BRK. It can happen to any case, but is
   much more likely with the now common PIE binaries.

 * Backport the upstream fix which reserves a bit more space while loading
   and giving it back after interpreter and stack is loaded.

[Test Plan]

 * On x86 run:
sudo apt install -y qemu-user-static
sudo docker run --rm arm64v8/debian:bullseye bash -c 'apt update && apt install -y wget'
Running hooks in /etc/ca-certificates/update.d...
Errors were encountered while processing:
E: Sub-process /usr/bin/dpkg returned an error code (1)

[Where problems could occur]

 * Regressions would be around use-cases of linux-user that is
   emulation not of a system but of binaries.
   Commonly uses for cross-tests and cross-builds so that is the
   space to watch for regressions

[Other Info]

 * n/a


In Debian unstable, we recently switched bash to be a PIE-compiled binary (for hardening). Unfortunately this resulted in bash being broken when run under qemu-user (for all target architectures, host being amd64 for me).

$ sudo chroot /srv/chroots/sid-i386/ qemu-i386-static /bin/bash
bash: xmalloc: .././shell.c:1709: cannot allocate 10 bytes (0 bytes allocated)

bash has its own malloc implementation based on sbrk():

When we disable this internal implementation and rely on glibc's malloc, then everything is fine. But it might be that glibc has a fallback when sbrk() is not working properly and it might hide the underlying problem in qemu-user.

This issue has also been reported to the bash upstream author and he suggested that the issue might be in qemu-user so I'm opening a ticket here. Here's the discussion with the bash upstream author:

You can find the problematic bash binary in that .deb file:

The version of qemu I have been using is 2.11 (Debian package qemu-user-static version 1:2.11+dfsg-1) but I have had reports that the problem is reproducible with older versions (back to 2.8 at least).

Here are the related Debian bug reports:

It's worth noting that bash used to have this problem (when compiled as a PIE binary) even when run directly but then something got fixed in the kernel and now the problem only appears when run under qemu-user:

Related branches

Revision history for this message
Gérard Vidal (gerard-f-vidal-4) wrote :

Affected by the same bug.
target architecture arm
host architecture amd64

bug message
bash: xmalloc: .././locale.c:****: cannot allocate 2 bytes (0 bytes allocated)

Revision history for this message
Peter Ogden (peterogden) wrote :

This appears to be a problem in all PIE-compiled executables that use sbrk in qemu-user due to the way that position-independent code gets mmapped into adjacent ranges meaning there is no room for expansion. I've hacked my version of QEMU to force the program binary to mmap in a different range allowing for the region to be resized which fixes this issue. I don't know the most appropriate way to determine what range to use in generate though.

Peter Maydell (pmaydell) on 2018-03-15
tags: added: arm linux-user
Revision history for this message
Peter Maydell (pmaydell) wrote :

There seem to be two parts to this. Firstly, with a big reserved-region, which is the default for 32-bit-guest-on-64-bit-host, this code in main.c:

        if (reserved_va) {
            mmap_next_start = reserved_va;

says to start trying for the next mmap address at the top of the reserved section, which is typically right at the top of the guest's address space. This means that for a PIE executable we'll try to load it at a very high address, which then means there's no space above the data section for the brk segment.

Secondly, for the no-reserved-region case (-R 0, or 64-on-64), we still fail, but this time because we've chosen to mmap the dynamic interpreter at an address just above the executable. Again, no space to expand the data segment and brk fails.

Linux kernel commit a87938b2e246b81 message says something about there being a guaranteed 128MB "gap" between data segment and stack on x86-64 which we're obviously not honourin; presumbably there's similar requirements for other archs. (As an aside, is bash really happy with only having perhaps 128MB of allocatable memory? Otherwise it really ought to use mmap rather than brk for its allocator.)

Revision history for this message
Peter Ogden (peterogden) wrote :

Could we over-allocate the data segment by QEMU_DATA_SIZE/getrlimit(RLIMIT_DATA)/128 MB depending on what's set - similar to how the stack size is managed?

My current workaround for aarch64 on x86-64 is to mmap a dynamic main executable in some far-away part of the address space but I'm not sure how to find somewhere suitable on a 32-bit host/guest.

Revision history for this message
Peter Maydell (pmaydell) wrote : Re: [PATCH] linux-user: Allocate extra space for brk in PIE executable

On 16 March 2018 at 10:34, Richard Henderson
<email address hidden> wrote:
> Limit this to 16M; there does not appear to be any special
> support for this in the kernel itself, at least for i686.
> Fixes:
> Signed-off-by: Richard Henderson <email address hidden>
> ---
> Commentary in the launchpad bug suggests 128M gap for x86_64, but that's
> somewhat irrelevant to the given i686 test case. There's certainly nothing
> in the referenced kernel patch that does any more than we had been doing
> without this patch.

I think the 128MB is enforced by mmap_base() in arch/x86/mm/mmap.c:
since x86-64 sots HAVE_ARCH_UNMAPPED_AREA_TOPDOWN, mmap_base is the
highest address in memory where mmap is permitted, and mmap_base()
enforces that it goes at least 128MB below the bottom of the stack
(accounting for rlimit stack size requirements also). Since
binfmt_elf() loads ELF segments via mmap this means that they won't
go too close to the stack. (The commit a87938b2e246 ensures the
gap is honoured by using the full binary size when it does the first
mapping so that mmap picks an address that is sufficiently before the
end of the mmap region for everything to fit.)
The kernel also uses ELF_ET_DYN_BASE to ensure that PIE programs
themselves get loaded clear of the ELF interpreter, which we
don't have any equivalent of (so you can see that different values
of -R result in either the interpreter or the executable getting
loaded at lower addresses.)

PS: do you know what the intention of the
        if (reserved_va) {
            mmap_next_start = reserved_va;
code in linux-user/main.c is? It seems a bit odd to say "ok,
we have reserved a big region. we will start trying to mmap
outside it.", especially when that region covers the full
4G that the guest can access...

-- PMM

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in qemu (Ubuntu):
status: New → Confirmed
Revision history for this message
Matthias Klose (doko) wrote :
Revision history for this message
Richard Henderson (rth) wrote :

Another proposed patch:
https://<email address hidden>/

Changed in qemu (Ubuntu):
assignee: nobody → Richard Henderson (rth)
Revision history for this message
Laurent Vivier (laurent-vivier) wrote :
Changed in qemu:
status: New → Fix Committed
Changed in qemu:
status: Fix Committed → Fix Released
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Will be merged in 20.10 with qemu >=5.0 where this came upstream.

tags: added: qemu-20.10
Changed in qemu (Ubuntu):
status: Confirmed → Triaged
Changed in qemu (Ubuntu):
assignee: Richard Henderson (rth) → Christian Ehrhardt  (paelzer)
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (17.5 KiB)

This bug was fixed in the package qemu - 1:5.0-5ubuntu3

qemu (1:5.0-5ubuntu3) groovy; urgency=medium

  * d/p/ubuntu/lp-1887763-*: fix TCG sizing that OOMed many small CI
    environments (LP: #1887763)
  * Pick further changes for groovy from debian/master since 5.0-5
    - ati-vga-check-mm_index-before-recursive-call-CVE-2020-13800.patch
      Closes: CVE-2020-13800, ati-vga allows guest OS users to trigger
      infinite recursion via a crafted mm_index value during
      ati_mm_read or ati_mm_write call.
    - revert-memory-accept-mismatching-sizes-in-memory_region_access_valid...patch
      Closes: CVE-2020-13754, possible OOB memory accesses in a bunch of qemu
      devices which uses min_access_size and max_access_size Memory API fields.
      Also closes: CVE-2020-13791
    - exec-set-map-length-to-zero-when-returning-NULL-CVE-2020-13659.patch
      CVE-2020-13659: address_space_map in exec.c can trigger
      a NULL pointer dereference related to BounceBuffer
    - megasas-use-unsigned-type-for-reply_queue_head-and-check-index...patch
      Closes: #961887, CVE-2020-13362, megasas_lookup_frame in hw/scsi/megasas.c
      has an OOB read via a crafted reply_queue_head field from a guest OS user
    - megasas-use-unsigned-type-for-positive-numeric-fields.patch
      fix other possible cases like in CVE-2020-13362 (#961887)
    - megasas-fix-possible-out-of-bounds-array-access.patch
      Some tracepoints use a guest-controlled value as an index into the
      mfi_frame_desc[] array. Thus a malicious guest could cause a very low
      impact OOB errors here
    - nbd-server-avoid-long-error-message-assertions-CVE-2020-10761.patch
      Closes: CVE-2020-10761, An assertion failure issue in the QEMU NBD Server.
      This flaw occurs when an nbd-client sends a spec-compliant request that is
      near the boundary of maximum permitted request length. A remote nbd-client
      could use this flaw to crash the qemu-nbd server resulting in a DoS.
    - es1370-check-total-frame-count-against-current-frame-CVE-2020-13361.patch
      Closes: CVE-2020-13361, es1370_transfer_audio in hw/audio/es1370.c does not
      properly validate the frame count, which allows guest OS users to trigger
      an out-of-bounds access during an es1370_write() operation
    - a few patches from the stable series:
      - fix-tulip-breakage.patch
        The tulip network driver in a qemu-system-hppa emulation is broken in
        the sense that bigger network packages aren't received any longer and
        thus even running e.g. "apt update" inside the VM fails. Fix this.
      - 9p-lock-directory-streams-with-a-CoMutex.patch
        Prevent deadlocks in 9pfs readdir code
      - net-do-not-include-a-newline-in-the-id-of-nic-device.patch
        Fix newline accidentally sneaked into id string of a nic
      - qemu-nbd-close-inherited-stderr.patch
      - virtio-balloon-fix-free-page-hinting-check-on-unreal.patch
      - virtio-balloon-fix-free-page-hinting-without-an-iothread.patch
      - virtio-balloon-unref-the-iothread-when-unrealizing.patch
    - acpi-tmr-allow-2-byte-reads.patch (Closes: #964247)
    - reapply CVE-2020-13253 fixed from upstre...

Changed in qemu (Ubuntu):
status: Triaged → Fix Released
Revision history for this message
Robie Basak (racb) wrote :

There's a request for a backport of this fix to be made to Ubuntu 20.04 in duplicate bug 1924231, so I'm adding a task for that.

Changed in qemu (Ubuntu Focal):
status: New → Confirmed
status: Confirmed → Triaged
importance: Undecided → Medium
description: updated
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

For Focal:
- SRU Template added to the bug
- MP:
- PPA: (still building)

I'd ask anyone affected by this on Focal to give it a try on the PPA and let us know if this fix would work for you.

Revision history for this message
Yasuhiro Horimoto (komainu8) wrote :

Thank you for fixing the problem.

I confirmed that is fixed with

Thank you.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers