sbrk() not working under qemu-user with a PIE-compiled binary?

Bug #1749393 reported by Raphaël Hertzog on 2018-02-14
54
This bug affects 9 people
Affects Status Importance Assigned to Milestone
QEMU
Undecided
Unassigned
qemu (Ubuntu)
Undecided
Unassigned

Bug Description

In Debian unstable, we recently switched bash to be a PIE-compiled binary (for hardening). Unfortunately this resulted in bash being broken when run under qemu-user (for all target architectures, host being amd64 for me).

$ sudo chroot /srv/chroots/sid-i386/ qemu-i386-static /bin/bash
bash: xmalloc: .././shell.c:1709: cannot allocate 10 bytes (0 bytes allocated)

bash has its own malloc implementation based on sbrk():
https://git.savannah.gnu.org/cgit/bash.git/tree/lib/malloc/malloc.c

When we disable this internal implementation and rely on glibc's malloc, then everything is fine. But it might be that glibc has a fallback when sbrk() is not working properly and it might hide the underlying problem in qemu-user.

This issue has also been reported to the bash upstream author and he suggested that the issue might be in qemu-user so I'm opening a ticket here. Here's the discussion with the bash upstream author:
https://lists.gnu.org/archive/html/bug-bash/2018-02/threads.html#00080

You can find the problematic bash binary in that .deb file:
http://snapshot.debian.org/archive/debian/20180206T154716Z/pool/main/b/bash/bash_4.4.18-1_i386.deb

The version of qemu I have been using is 2.11 (Debian package qemu-user-static version 1:2.11+dfsg-1) but I have had reports that the problem is reproducible with older versions (back to 2.8 at least).

Here are the related Debian bug reports:
https://bugs.debian.org/889869
https://bugs.debian.org/865599

It's worth noting that bash used to have this problem (when compiled as a PIE binary) even when run directly but then something got fixed in the kernel and now the problem only appears when run under qemu-user:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518483

Affected by the same bug.
target architecture arm
host architecture amd64

bug message
bash: xmalloc: .././locale.c:****: cannot allocate 2 bytes (0 bytes allocated)

Peter Ogden (peterogden) wrote :

This appears to be a problem in all PIE-compiled executables that use sbrk in qemu-user due to the way that position-independent code gets mmapped into adjacent ranges meaning there is no room for expansion. I've hacked my version of QEMU to force the program binary to mmap in a different range allowing for the region to be resized which fixes this issue. I don't know the most appropriate way to determine what range to use in generate though.

Peter Maydell (pmaydell) on 2018-03-15
tags: added: arm linux-user
Peter Maydell (pmaydell) wrote :

There seem to be two parts to this. Firstly, with a big reserved-region, which is the default for 32-bit-guest-on-64-bit-host, this code in main.c:

        if (reserved_va) {
            mmap_next_start = reserved_va;
        }

says to start trying for the next mmap address at the top of the reserved section, which is typically right at the top of the guest's address space. This means that for a PIE executable we'll try to load it at a very high address, which then means there's no space above the data section for the brk segment.

Secondly, for the no-reserved-region case (-R 0, or 64-on-64), we still fail, but this time because we've chosen to mmap the dynamic interpreter at an address just above the executable. Again, no space to expand the data segment and brk fails.

Linux kernel commit a87938b2e246b81 message says something about there being a guaranteed 128MB "gap" between data segment and stack on x86-64 which we're obviously not honourin; presumbably there's similar requirements for other archs. (As an aside, is bash really happy with only having perhaps 128MB of allocatable memory? Otherwise it really ought to use mmap rather than brk for its allocator.)

Peter Ogden (peterogden) wrote :

Could we over-allocate the data segment by QEMU_DATA_SIZE/getrlimit(RLIMIT_DATA)/128 MB depending on what's set - similar to how the stack size is managed?

My current workaround for aarch64 on x86-64 is to mmap a dynamic main executable in some far-away part of the address space but I'm not sure how to find somewhere suitable on a 32-bit host/guest.

On 16 March 2018 at 10:34, Richard Henderson
<email address hidden> wrote:
> Limit this to 16M; there does not appear to be any special
> support for this in the kernel itself, at least for i686.
>
> Fixes: https://bugs.launchpad.net/bugs/1749393
> Signed-off-by: Richard Henderson <email address hidden>
> ---
>
> Commentary in the launchpad bug suggests 128M gap for x86_64, but that's
> somewhat irrelevant to the given i686 test case. There's certainly nothing
> in the referenced kernel patch that does any more than we had been doing
> without this patch.

I think the 128MB is enforced by mmap_base() in arch/x86/mm/mmap.c:
since x86-64 sots HAVE_ARCH_UNMAPPED_AREA_TOPDOWN, mmap_base is the
highest address in memory where mmap is permitted, and mmap_base()
enforces that it goes at least 128MB below the bottom of the stack
(accounting for rlimit stack size requirements also). Since
binfmt_elf() loads ELF segments via mmap this means that they won't
go too close to the stack. (The commit a87938b2e246 ensures the
gap is honoured by using the full binary size when it does the first
mapping so that mmap picks an address that is sufficiently before the
end of the mmap region for everything to fit.)
The kernel also uses ELF_ET_DYN_BASE to ensure that PIE programs
themselves get loaded clear of the ELF interpreter, which we
don't have any equivalent of (so you can see that different values
of -R result in either the interpreter or the executable getting
loaded at lower addresses.)

PS: do you know what the intention of the
        if (reserved_va) {
            mmap_next_start = reserved_va;
        }
code in linux-user/main.c is? It seems a bit odd to say "ok,
we have reserved a big region. we will start trying to mmap
outside it.", especially when that region covers the full
4G that the guest can access...

thanks
-- PMM

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in qemu (Ubuntu):
status: New → Confirmed
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers