Error in user-mode calculation of ELF aux vector's AT_PHDR

Bug #1885332 reported by Langston
This bug affects 2 people

Affects: QEMU    Status: Expired    Importance: Undecided    Assigned to: Unassigned

Bug Description

I have an (admittedly strange) statically-linked ELF binary for Linux that runs just fine on top of the Linux kernel in QEMU full-system emulation, but crashes before main in user-mode emulation. Specifically, it crashes when initializing thread-local storage in glibc's _dl_aux_init, because it reads out a strange value from the AT_PHDR entry of the ELF aux vector.

The binary has these program headers:

  Program Headers:
    Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
    EXIDX          0x065874 0x00075874 0x00075874 0x00570 0x00570 R   0x4
    PHDR           0x0a3000 0x00900000 0x00900000 0x00160 0x00160 R   0x1000
    LOAD           0x0a3000 0x00900000 0x00900000 0x00160 0x00160 R   0x1000
    LOAD           0x000000 0x00010000 0x00010000 0x65de8 0x65de8 R E 0x10000
    LOAD           0x066b7c 0x00086b7c 0x00086b7c 0x02384 0x02384 RW  0x10000
    NOTE           0x000114 0x00010114 0x00010114 0x00044 0x00044 R   0x4
    TLS            0x066b7c 0x00086b7c 0x00086b7c 0x00010 0x00030 R   0x4
    GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x8
    GNU_RELRO      0x066b7c 0x00086b7c 0x00086b7c 0x00484 0x00484 R   0x1
    LOAD           0x07e000 0x00089000 0x00089000 0x03f44 0x03f44 R E 0x1000
    LOAD           0x098000 0x00030000 0x00030000 0x01000 0x01000 RW  0x1000

If I build the Linux kernel with the following patch to the very end of create_elf_tables in fs/binfmt_elf.c

  /* Put the elf_info on the stack in the right place. */
  elf_addr_t *my_auxv = (elf_addr_t *) mm->saved_auxv;
  int i;
  /* Debug: print the first 15 (type, value) pairs of the aux vector. */
  for (i = 0; i < 15; i++) {
    printk(KERN_INFO "0x%x = 0x%x\n", my_auxv[2*i], my_auxv[(2*i) + 1]);
  }
  if (copy_to_user(sp, mm->saved_auxv, ei_index * sizeof(elf_addr_t)))
      return -EFAULT;
  return 0;

and run it like this:

  qemu-system-arm \
    -M versatilepb \
    -nographic \
    -dtb ./dts/versatile-pb.dtb \
    -kernel zImage \
    -m 128M \
    -append "earlyprintk=vga,keep" \
    -initrd initramfs

after building the kernel and the initramfs like this (where "init" is the binary in question):

  make ARCH=arm versatile_defconfig
  make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- all -j10
  cp "$1" arch/arm/boot/init
  cd arch/arm/boot
  echo init | cpio -o --format=newc > initramfs

then I get the following output. This is the kernel's view of the aux vector for this binary:

  0x10 = 0x1d7
  0x6 = 0x1000
  0x11 = 0x64
  0x3 = 0x900000
  0x4 = 0x20
  0x5 = 0xb
  0x7 = 0x0
  0x8 = 0x0
  0x9 = 0x101b8
  0xb = 0x0
  0xc = 0x0
  0xd = 0x0
  0xe = 0x0
  0x17 = 0x0
  0x19 = 0xbec62fb5
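The kernel dump prints raw aux vector type numbers, while the QEMU/GDB listing below uses names. A small lookup, sketched here as a hypothetical helper built on the AT_* constants from <elf.h>, makes the two directly comparable; it covers only the types that appear in this report.

```c
#include <elf.h>  /* AT_* aux vector type constants */

/* Hypothetical helper: map an aux vector type number, as printed in the
 * kernel dump, to its symbolic name. Only the types seen in this report. */
const char *auxv_type_name(unsigned long type)
{
    switch (type) {
    case AT_NULL:    return "AT_NULL";
    case AT_PHDR:    return "AT_PHDR";
    case AT_PHENT:   return "AT_PHENT";
    case AT_PHNUM:   return "AT_PHNUM";
    case AT_PAGESZ:  return "AT_PAGESZ";
    case AT_BASE:    return "AT_BASE";
    case AT_FLAGS:   return "AT_FLAGS";
    case AT_ENTRY:   return "AT_ENTRY";
    case AT_UID:     return "AT_UID";
    case AT_EUID:    return "AT_EUID";
    case AT_GID:     return "AT_GID";
    case AT_EGID:    return "AT_EGID";
    case AT_HWCAP:   return "AT_HWCAP";
    case AT_CLKTCK:  return "AT_CLKTCK";
    case AT_SECURE:  return "AT_SECURE";
    case AT_RANDOM:  return "AT_RANDOM";
    case AT_HWCAP2:  return "AT_HWCAP2";
    default:         return "unknown";
    }
}
```

Decoded this way, the line "0x3 = 0x900000" in the kernel dump is AT_PHDR = 0x900000, and 0x10, 0x11, 0x17 and 0x19 are AT_HWCAP, AT_CLKTCK, AT_SECURE and AT_RANDOM.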

However, if I run "qemu-arm -g 12345 binary" and use GDB to peek at the aux vector at the beginning of __libc_start_init (for example, using this Python GDB API script: https://gist.github.com/langston-barrett/5573d64ae0c9953e2fa0fe26847a5e1e), then I see the following values:

  AT_PHDR = 0xae000
  AT_PHENT = 0x20
  AT_PHNUM = 0xb
  AT_PAGESZ = 0x1000
  AT_BASE = 0x0
  AT_FLAGS = 0x0
  AT_ENTRY = 0x10230
  AT_UID = 0x3e9
  AT_EUID = 0x3e9
  AT_GID = 0x3e9
  AT_EGID = 0x3e9
  AT_HWCAP = 0x1fb8d7
  AT_CLKTCK = 0x64
  AT_RANDOM = -0x103c0
  AT_HWCAP2 = 0x1f
  AT_NULL = 0x0

The crucial difference is in AT_PHDR (0x3): the kernel sets it to 0x900000, which is indeed the virtual address of the PHDR segment, whereas QEMU sets it to 0xae000, which is not.
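AT_PHDR matters because the C library walks the program header table at that address (AT_PHNUM entries, AT_PHENT bytes apart) during startup, for example to find the PT_TLS segment. For binaries that do reach main (unlike this one), the same walk can be reproduced with getauxval(3) instead of GDB. The helper name below is hypothetical; it is a sketch, not glibc's actual code.

```c
#include <elf.h>
#include <link.h>      /* ElfW() macro for the native ELF class */
#include <stddef.h>
#include <sys/auxv.h>  /* getauxval() */

/* Hypothetical helper: return the first program header of the given type in
 * the table the loader advertised via AT_PHDR/AT_PHENT/AT_PHNUM, or NULL.
 * If the loader put a wrong address in AT_PHDR, this reads the wrong memory. */
const ElfW(Phdr) *find_phdr_via_auxv(ElfW(Word) type)
{
    const unsigned char *table = (const unsigned char *) getauxval(AT_PHDR);
    unsigned long phent = getauxval(AT_PHENT);
    unsigned long phnum = getauxval(AT_PHNUM);

    /* getauxval() returns 0 when an entry is missing. */
    if (table == NULL || phent < sizeof(ElfW(Phdr)))
        return NULL;

    for (unsigned long i = 0; i < phnum; i++) {
        const ElfW(Phdr) *ph = (const ElfW(Phdr) *) (table + i * phent);
        if (ph->p_type == type)
            return ph;
    }
    return NULL;
}
```

Calling find_phdr_via_auxv(PT_TLS) natively and under qemu-user on a well-behaved binary is one way to compare what the two loaders advertise.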

qemu-arm --version
qemu-arm version 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.26)

Langston (langston0) wrote:

I just confirmed that this is still a problem on git tag v5.0.0, where I applied the following:

  diff --git a/linux-user/elfload.c b/linux-user/elfload.c
  index 619c054cc4..093656d059 100644
  --- a/linux-user/elfload.c
  +++ b/linux-user/elfload.c
  @@ -2016,6 +2016,7 @@ static abi_ulong create_elf_tables(abi_ulong p, int argc, int envc,
      /* There must be exactly DLINFO_ITEMS entries here, or the assert
        * on info->auxv_len will trigger.
        */
  + printf("PHDR: %x\n", (abi_ulong)(info->load_addr + exec->e_phoff));
      NEW_AUX_ENT(AT_PHDR, (abi_ulong)(info->load_addr + exec->e_phoff));
      NEW_AUX_ENT(AT_PHENT, (abi_ulong)(sizeof (struct elf_phdr)));
      NEW_AUX_ENT(AT_PHNUM, (abi_ulong)(exec->e_phnum));

and saw:

  PHDR: ae000

Langston (langston0) wrote:

Here is how Linux and QEMU calculate AT_PHDR for static binaries. Both add the file offset of the program header table (e_phoff) to a value I'll call load_addr, following the kernel's naming.

In the kernel, load_addr is

  elf_ppnt->p_vaddr - elf_ppnt->p_offset

where elf_ppnt is the program header entry of the first segment with type LOAD: https://github.com/torvalds/linux/blob/242b23319809e05170b3cc0d44d3b4bd202bb073/fs/binfmt_elf.c#L1120

In QEMU, load_addr is taken from an earlier variable, loaddr, which is computed as

  min_i(phdr[i].p_vaddr - phdr[i].p_offset)

where min_i is the minimum over the indices i of LOAD segments: https://github.com/qemu/qemu/blob/9e7f1469b9994d910fc1b185c657778bde51639c/linux-user/elfload.c#L2407. If you perform this calculation by hand for the program headers posted at the beginning of this thread, you get ae000, matching what QEMU printed.

The problem here is that QEMU takes the minimum over all LOAD segments, whereas Linux just takes the first one. Presumably, changing QEMU's behavior to match the kernel's wouldn't break anything that wouldn't also be broken on real Linux. Unfortunately, Linux's ELF loader is much pickier than the ELF standard, but that's a whole other story...
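To make the two rules concrete, the sketch below replays both over the program headers from the bug description. The function names are hypothetical and model the formulas quoted above, not the literal kernel or QEMU source. It assumes e_phoff = 0xa3000: the ELF header is not shown in the report, but that is the PHDR segment's file offset and the only value consistent with both reported results.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Kernel-style rule: load_addr = p_vaddr - p_offset of the FIRST PT_LOAD
 * in program-header order. Returns 0 if there is no PT_LOAD. */
uint32_t at_phdr_first_load(const Elf32_Phdr *ph, size_t n, uint32_t e_phoff)
{
    for (size_t i = 0; i < n; i++)
        if (ph[i].p_type == PT_LOAD)
            return (ph[i].p_vaddr - ph[i].p_offset) + e_phoff;
    return 0;
}

/* QEMU-style rule: MINIMUM of p_vaddr - p_offset over all PT_LOAD entries.
 * The subtraction is unsigned 32-bit, so a segment whose p_offset exceeds
 * its p_vaddr wraps to a huge value and never becomes the minimum. */
uint32_t at_phdr_min_load(const Elf32_Phdr *ph, size_t n, uint32_t e_phoff)
{
    uint32_t loaddr = UINT32_MAX;
    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type != PT_LOAD)
            continue;
        uint32_t a = ph[i].p_vaddr - ph[i].p_offset;
        if (a < loaddr)
            loaddr = a;
    }
    return loaddr + e_phoff;
}

/* Type, offset and vaddr of the 11 program headers from the bug report;
 * the remaining fields are zero because neither rule uses them. */
const Elf32_Phdr report_phdrs[] = {
    { .p_type = PT_ARM_EXIDX, .p_offset = 0x065874, .p_vaddr = 0x00075874 },
    { .p_type = PT_PHDR,      .p_offset = 0x0a3000, .p_vaddr = 0x00900000 },
    { .p_type = PT_LOAD,      .p_offset = 0x0a3000, .p_vaddr = 0x00900000 },
    { .p_type = PT_LOAD,      .p_offset = 0x000000, .p_vaddr = 0x00010000 },
    { .p_type = PT_LOAD,      .p_offset = 0x066b7c, .p_vaddr = 0x00086b7c },
    { .p_type = PT_NOTE,      .p_offset = 0x000114, .p_vaddr = 0x00010114 },
    { .p_type = PT_TLS,       .p_offset = 0x066b7c, .p_vaddr = 0x00086b7c },
    { .p_type = PT_GNU_STACK, .p_offset = 0x000000, .p_vaddr = 0x00000000 },
    { .p_type = PT_GNU_RELRO, .p_offset = 0x066b7c, .p_vaddr = 0x00086b7c },
    { .p_type = PT_LOAD,      .p_offset = 0x07e000, .p_vaddr = 0x00089000 },
    { .p_type = PT_LOAD,      .p_offset = 0x098000, .p_vaddr = 0x00030000 },
};
```

By hand: the last LOAD wraps (0x30000 - 0x98000 = 0xfff98000), so the minimum comes from the LOAD at 0x89000 (0x89000 - 0x7e000 = 0xb000), giving 0xb000 + 0xa3000 = 0xae000. The first LOAD gives (0x900000 - 0xa3000) + 0xa3000 = 0x900000, the kernel's value.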

Dmitry (xeioexception) wrote:

@langston0 Thanks for the detailed explanation; I got the same problem with qemu-s390x.

To reproduce (Linux kernel >= 4.8, e.g. Ubuntu 18.04):
# Register qemu binfmt_misc handlers
$ docker run --rm --privileged multiarch/qemu-user-static --reset -p yes

$ cat Dockerfile.s390x
FROM s390x/ubuntu
RUN apt-get update && \
    apt-get install -y \
    gcc make libpcre3-dev libreadline-dev git

RUN cd /home && git clone https://github.com/nginx/njs

RUN cd /home/njs && ./configure --cc-opt='-O0 -static -lm -lrt -pthread -Wl,--whole-archive -lpthread -ltinfo -Wl,--no-whole-archive' && make njs

$ docker build -t njs/390x -f Dockerfile.s390x .

# check the binary (WORKS!)
# inside docker s390 binaries are executed using qemu-s390-static from the host
$ docker run -t njs/390x /home/njs/build/njs -c 'console.log("hello")'
hello

# copy binary to host
$ docker run -v `pwd`:/m -ti njs/390x cp /home/njs/build/njs /m/njs-s390

# deregister binfmt handler
$ sudo bash -c "echo -1 > /proc/sys/fs/binfmt_misc/qemu-s390x"

# run qemu gdb
$ qemu-s390x -g 12345 ./njs-s390

# in a separate terminal
$ gdb-multiarch ./njs-s390 -ex 'target remote localhost:12345'
0x0000000001000520 in _start ()
(gdb) si
0x0000000001000524 in _start ()
(gdb) si
0x000000000100052a in _start ()
(gdb) c
Continuing.

Program received signal SIGILL, Illegal instruction.
0x00000000011a418c in _dl_aux_init ()
(gdb) bt
#0 0x00000000011a418c in _dl_aux_init ()
#1 0x00000000011663f0 in __libc_start_main ()
#2 0x0000000001000564 in _start ()

qemu-s390x --version
qemu-s390x version 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.28)

Dmitry (xeioexception) wrote:

BTW, before deregistering the handler (sudo bash -c "echo -1 > /proc/sys/fs/binfmt_misc/qemu-s390x"), njs-s390 also works on the host:

$ ./njs-s390 -c 'console.log("hello")'
hello

$ file njs-s390
njs-s390: ELF 64-bit MSB executable, IBM S/390, version 1 (GNU/Linux), statically linked, BuildID[sha1]=e37618578fb0a8c60f426826167a800e4f314ef3, for GNU/Linux 3.2.0, with debug_info, not stripped

Dmitry (xeioexception) wrote:

> runs just fine on top of the Linux kernel in QEMU full-system emulation, but crashes before main in user-mode emulation

So it seems that system vs. user mode is not the issue here; it is probably related to running user-mode QEMU with its GDB stub enabled (-g).

Langston (langston0) wrote:

@Dmitry To confirm that this is really the same issue (and not an unrelated crash in the same function), could you post:

 1. the ELF headers ("readelf -h"),
 2. the program headers ("readelf -l"), and
 3. the output (the AUX VECTOR section) from this GDB script (suitably modified for your program), when connecting to QEMU's GDB server? https://gist.github.com/langston-barrett/5573d64ae0c9953e2fa0fe26847a5e1e
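Item 2 mostly matters for redoing the AT_PHDR arithmetic, which can also be done straight from the file. The sketch below is a hypothetical helper that applies both load_addr rules discussed in this thread (first PT_LOAD, as in the kernel; minimum over PT_LOAD, as in QEMU). It assumes a little-endian ELF32 file read on a little-endian host, which fits the ARM binary on an x86-64 machine; the s390x binary is 64-bit big-endian and would need Elf64 types plus byte swapping.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: compute the AT_PHDR candidates produced by the
 * "first PT_LOAD" and "minimum over PT_LOAD" rules for an ELF32 LSB file.
 * Returns 0 on success, -1 on a read error, unsupported file or no PT_LOAD. */
int elf32_at_phdr_candidates(FILE *f, uint32_t *first, uint32_t *lowest)
{
    Elf32_Ehdr eh;
    Elf32_Phdr ph;
    int have_first = 0;
    uint32_t lo = UINT32_MAX;

    if (fseek(f, 0, SEEK_SET) != 0 || fread(&eh, sizeof eh, 1, f) != 1)
        return -1;
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0
        || eh.e_ident[EI_CLASS] != ELFCLASS32
        || eh.e_ident[EI_DATA] != ELFDATA2LSB
        || eh.e_phentsize != sizeof ph)
        return -1;

    for (unsigned i = 0; i < eh.e_phnum; i++) {
        long pos = (long) (eh.e_phoff + (uint32_t) i * eh.e_phentsize);
        if (fseek(f, pos, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1)
            return -1;
        if (ph.p_type != PT_LOAD)
            continue;
        uint32_t a = ph.p_vaddr - ph.p_offset;  /* unsigned, may wrap */
        if (!have_first) {
            *first = a + eh.e_phoff;            /* kernel-style rule */
            have_first = 1;
        }
        if (a < lo)
            lo = a;                             /* QEMU-style rule */
    }
    if (!have_first)
        return -1;
    *lowest = lo + eh.e_phoff;
    return 0;
}
```

If the two values differ, the binary is one where the kernel's and QEMU's AT_PHDR disagree.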

Dmitry (xeioexception) wrote:

@Langston Will do tomorrow. The s390x ABI requires heavy changes to the Python script.

Dmitry (xeioexception) wrote:

When I switch to armv7, the issue goes away:

$ cat Dockerfile.armv7
FROM arm32v7/ubuntu
RUN apt-get update && \
    apt-get install -y \
    gcc make libpcre3-dev libreadline-dev git

RUN cd /home && git clone https://github.com/nginx/njs

RUN cd /home/njs && ./configure --cc-opt='-O0 -static -lm -lrt -pthread -Wl,--whole-archive -lpthread -ltinfo -Wl,--no-whole-archive' && make njs

$ docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
$ docker build -t njs/armv7 -f Dockerfile.armv7 .
$ docker run -v `pwd`:/m -ti njs/armv7 cp /home/njs/build/njs /m/njs-armv7

$ readelf -l ./njs-armv7

Elf file type is EXEC (Executable file)
Entry point 0x12fb9
There are 7 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz  MemSiz   Flg Align
  EXIDX          0x1be338 0x001ce338 0x001ce338 0x009b8  0x009b8  R   0x4
  LOAD           0x000000 0x00010000 0x00010000 0x1becf4 0x1becf4 R E 0x10000
  LOAD           0x1bedfc 0x001dedfc 0x001dedfc 0x17674  0x1c2cc  RW  0x10000
  NOTE           0x000114 0x00010114 0x00010114 0x00044  0x00044  R   0x4
  TLS            0x1bedfc 0x001dedfc 0x001dedfc 0x00038  0x00060  R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000  0x00000  RW  0x10
  GNU_RELRO      0x1bedfc 0x001dedfc 0x001dedfc 0x0e204  0x0e204  R   0x1

 Section to Segment mapping:
  Segment Sections...
   00 .ARM.exidx
   01 .note.ABI-tag .note.gnu.build-id .rel.dyn .init .iplt .text __libc_freeres_fn __libc_thread_freeres_fn .fini .rodata .stapsdt.base __libc_subfreeres __libc_IO_vtables __libc_atexit __libc_thread_subfreeres .ARM.extab .ARM.exidx .eh_frame
   02 .tdata .init_array .fini_array .data.rel.ro .got .data .bss __libc_freeres_ptrs
   03 .note.ABI-tag .note.gnu.build-id
   04 .tdata .tbss
   05
   06 .tdata .init_array .fini_array .data.rel.ro

$ readelf -h ./njs-armv7
ELF Header:
  Magic: 7f 45 4c 46 01 01 01 03 00 00 00 00 00 00 00 00
  Class: ELF32
  Data: 2's complement, little endian
  Version: 1 (current)
  OS/ABI: UNIX - GNU
  ABI Version: 0
  Type: EXEC (Executable file)
  Machine: ARM
  Version: 0x1
  Entry point address: 0x12fb9
  Start of program headers: 52 (bytes into file)
  Start of section headers: 5696248 (bytes into file)
  Flags: 0x5000400, Version5 EABI, hard-float ABI
  Size of this header: 52 (bytes)
  Size of program headers: 32 (bytes)
  Number of program headers: 7
  Size of section headers: 40 (bytes)
  Number of section headers: 42
  Section header string table index: 41

$ qemu-arm -g 12345 ./njs-armv7 -c 'console.log("HH")'

$ gdb-multiarch ./njs-armv7 -ex 'source showstack.py'
ARGUMENTS
---------
argc = 3
arg 0 = ./njs-armv7
arg 1 = -c
arg 2 = console.log("HH")

...

AUX VECTOR
----------
AT_PHDR = 10034
AT_PHENT = 20
AT_PHNUM = 7
AT_PAGESZ = 1000
AT_BASE = 0
AT_FLAGS = 0
AT_ENTRY = 12fb9
AT...
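Redoing the load_addr arithmetic for njs-armv7 suggests why this binary is unaffected: it has no separate PHDR segment, and its first LOAD segment also has the smallest p_vaddr - p_offset, so the "first LOAD" rule and the "minimum over LOAD" rule agree. Per the readelf output, e_phoff is 52 = 0x34, and the result matches the AT_PHDR = 10034 printed by the GDB script.

```latex
\begin{aligned}
\text{first LOAD:}\quad & \mathtt{0x10000} - \mathtt{0x0} = \mathtt{0x10000} \\
\text{min over LOAD:}\quad & \min(\mathtt{0x10000},\ \mathtt{0x1dedfc} - \mathtt{0x1bedfc}) = \min(\mathtt{0x10000},\ \mathtt{0x20000}) = \mathtt{0x10000} \\
\text{AT\_PHDR:}\quad & \mathtt{0x10000} + \mathtt{0x34} = \mathtt{0x10034}
\end{aligned}
```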


Dmitry (xeioexception) wrote:

I built the latest QEMU, and the issue goes away:

$ bin/debug/native/s390x-linux-user/qemu-s390x --version
qemu-s390x version 5.0.50 (v5.0.0-2358-g6c87d9f311-dirty)
Copyright (c) 2003-2020 Fabrice Bellard and the QEMU Project developers

$ bin/debug/native/s390x-linux-user/qemu-s390x ../njs/njs-s390 -c 'console.log("HI")'
HI

So my issue seems to be unrelated; sorry for the noise.

Thomas Huth (th-huth) wrote:

The QEMU project is currently moving its bug tracking to another system.
For this we need to know which bugs are still valid and which could be
closed already. Thus we are setting the bug state to "Incomplete" now.

If the bug has already been fixed in the latest upstream version of QEMU,
then please close this ticket as "Fix released".

If it is not fixed yet and you think that this bug report here is still
valid, then you have two options:

1) If you already have an account on gitlab.com, please open a new ticket
for this problem in our new tracker here:

    https://gitlab.com/qemu-project/qemu/-/issues

and then close this ticket here on Launchpad (or let it expire
automatically after 60 days). Please mention the URL of this bug ticket on
Launchpad in the new ticket on GitLab.

2) If you don't have an account on gitlab.com and don't intend to get
one, but still would like to keep this ticket opened, then please switch
the state back to "New" within the next 60 days (otherwise it will get
closed as "Expired"). We will then eventually migrate the ticket
automatically to the new system (but you won't be the reporter of the bug
in the new system and thus won't get notified on changes anymore).

Thank you and sorry for the inconvenience.

Changed in qemu:
status: New → Incomplete
Langston (langston0)
Changed in qemu:
status: Incomplete → New
Thomas Huth (th-huth) wrote: Moved bug report

This is an automated cleanup. This bug report has been moved to QEMU's
new bug tracker on gitlab.com and thus gets marked as 'expired' now.
Please continue with the discussion here:

 https://gitlab.com/qemu-project/qemu/-/issues/275

Changed in qemu:
status: New → Expired