SIGSEGV in memory_region_access_valid on Sabre Lite board

Bug #1596160 reported by 小太 on 2016-06-25
Affects: QEMU
Importance: Undecided
Assigned to: Unassigned

Bug Description

I'm trying to emulate a Sabre Lite board and boot U-Boot, but I'm encountering a SIGSEGV almost immediately after starting QEMU.

QEMU version: 6f1d2d1c5ad20d464705b17318cb7ca495f8078a
U-Boot version: mx6qsabrelite_defconfig 2016.05 (with http://git.denx.de/?p=u-boot.git;a=commitdiff;h=1f516faa45611aedc8c2e3f303b3866f615d481e reverted, since it hangs the CPU)

$ gdb --args ./arm-softmmu/qemu-system-arm -machine sabrelite -kernel ~/u-boot-2016.05/u-boot
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1

...

(gdb) r
Starting program: /home/kota/qemu/build/arm-softmmu/qemu-system-arm -machine sabrelite -kernel /home/kota/u-boot-2016.05/u-boot
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffe9074700 (LWP 18025)]
[New Thread 0x7fffe58c0700 (LWP 18027)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe58c0700 (LWP 18027)]
0x00005555557aaaa8 in memory_region_access_valid (mr=mr@entry=0x7fffe594e0e0, addr=addr@entry=0, size=size@entry=4, is_write=is_write@entry=true) at /home/kota/qemu/memory.c:1143
1143 if (!mr->ops->valid.unaligned && (addr & (size - 1))) {
(gdb) print mr->ops
$1 = (const MemoryRegionOps *) 0x0
(gdb) print *mr
$2 = {parent_obj = {class = 0x555556678990, free = 0x0, properties = 0x555557002d20, ref = 1, parent = 0x555556693d10}, romd_mode = true, ram = false, subpage = false, readonly = false, rom_device = true,
  flush_coalesced_mmio = false, global_locking = true, dirty_log_mask = 0 '\000', ram_block = 0x5555570228f0, owner = 0x0, iommu_ops = 0x0, ops = 0x0, opaque = 0x0, container = 0x555556693980, size = {
    lo = 98304, hi = 0}, addr = 0, destructor = 0x5555557a70b0 <memory_region_destructor_rom_device>, align = 2097152, terminates = true, skip_dump = false, enabled = true, warning_printed = false,
  vga_logging_count = 0 '\000', alias = 0x0, alias_offset = 0, priority = 0, subregions = {tqh_first = 0x0, tqh_last = 0x7fffe594e188}, subregions_link = {tqe_next = 0x7fffe594d988, tqe_prev = 0x7fffe594e290},
  coalesced = {tqh_first = 0x0, tqh_last = 0x7fffe594e1a8}, name = 0x555557022710 "imx6.rom", ioeventfd_nb = 0, ioeventfds = 0x0, iommu_notify = {notifiers = {lh_first = 0x0}}}
(gdb) bt
#0 0x00005555557aaaa8 in memory_region_access_valid (mr=mr@entry=0x7fffe594e0e0, addr=addr@entry=0, size=size@entry=4, is_write=is_write@entry=true) at /home/kota/qemu/memory.c:1143
#1 0x00005555557aacbd in memory_region_dispatch_write (mr=0x7fffe594e0e0, addr=0, data=3925868734, size=4, attrs=...) at /home/kota/qemu/memory.c:1249
#2 0x00007fffe645a4e4 in code_gen_buffer ()
#3 0x0000555555778d4d in cpu_tb_exec (itb=<optimized out>, itb=<optimized out>, cpu=0x7fffe58c92e0) at /home/kota/qemu/cpu-exec.c:166
#4 cpu_loop_exec_tb (sc=0x7fffe58bfab0, tb_exit=<synthetic pointer>, last_tb=0x7fffe58bfaa0, tb=<optimized out>, cpu=0x7fffe58c92e0) at /home/kota/qemu/cpu-exec.c:530
#5 cpu_arm_exec (cpu=cpu@entry=0x7fffe58c1080) at /home/kota/qemu/cpu-exec.c:626
#6 0x0000555555798a20 in tcg_cpu_exec (cpu=0x7fffe58c1080) at /home/kota/qemu/cpus.c:1541
#7 tcg_exec_all () at /home/kota/qemu/cpus.c:1574
#8 qemu_tcg_cpu_thread_fn (arg=<optimized out>) at /home/kota/qemu/cpus.c:1171
#9 0x00007ffff27f1184 in start_thread (arg=0x7fffe58c0700) at pthread_create.c:312
#10 0x00007ffff251e37d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

小太 (nospam-i) wrote :

I've narrowed the crash to a stmia instruction in U-Boot's relocate_code:

Breakpoint 3, relocate_code () at arch/arm/lib/relocate.S:81
81 subs r4, r0, r1 /* r4 <- relocation offset */
(gdb) disas
Dump of assembler code for function relocate_code:
   0x17802620 <+0>: ldr r1, [pc, #76] ; 0x17802674 <relocate_done+4>
=> 0x17802624 <+4>: subs r4, r0, r1
   0x17802628 <+8>: beq 0x17802670 <relocate_done>
   0x1780262c <+12>: ldr r2, [pc, #68] ; 0x17802678 <relocate_done+8>
   0x17802630 <+16>: ldm r1!, {r10, r11}
   0x17802634 <+20>: stmia r0!, {r10, r11}
   0x17802638 <+24>: cmp r1, r2
   0x1780263c <+28>: bcc 0x17802630 <relocate_code+16>
   0x17802640 <+32>: ldr r2, [pc, #52] ; 0x1780267c <relocate_done+12>
   0x17802644 <+36>: ldr r3, [pc, #52] ; 0x17802680 <relocate_done+16>
   0x17802648 <+0>: ldm r2!, {r0, r1}
   0x1780264c <+4>: and r1, r1, #255 ; 0xff
   0x17802650 <+8>: cmp r1, #23
   0x17802654 <+12>: bne 0x17802668 <fixnext>
   0x17802658 <+16>: add r0, r0, r4
   0x1780265c <+20>: ldr r1, [r0]
   0x17802660 <+24>: add r1, r1, r4
   0x17802664 <+28>: str r1, [r0]
   0x17802668 <+0>: cmp r2, r3
   0x1780266c <+4>: bcc 0x17802648 <fixloop>
   0x17802670 <+0>: bx lr
End of assembler dump.
(gdb) si
82 beq relocate_done /* skip relocation */
(gdb)
83 ldr r2, =__image_copy_end /* r2 <- SRC &__image_copy_end */
(gdb)
86 ldmia r1!, {r10-r11} /* copy from source address [r1] */
(gdb)
87 stmia r0!, {r10-r11} /* copy to target address [r0] */
(gdb) bt
#0 relocate_code () at arch/arm/lib/relocate.S:87
#1 0x178025cc in _main () at arch/arm/lib/crt0.S:121
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) si
Remote connection closed

小太 (nospam-i) wrote :

Registers at location of crash:
(gdb) info reg
r0 0x0 0
r1 0x17800008 394264584
r2 0x178655e8 394679784
r3 0x0 0
r4 0xe8800000 -394264576
r5 0x17800338 394265400
r6 0x0 0
r7 0x0 0
r8 0x0 0
r9 0x4f53beb8 1330888376
r10 0xea0000be -369098562
r11 0xe59ff014 -442503148
r12 0x4f53bfb0 1330888624
sp 0x4f53be90 0x4f53be90
lr 0x178025cc 394274252
pc 0x17802634 0x17802634 <relocate_code+20>
cpsr 0x800001d3 -2147483181
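
The register values above can be cross-checked with a little arithmetic. The sketch below is only an illustration: the helper names are hypothetical, and the source address 0x17800000 is inferred from r1 = 0x17800008 after one two-word ldm with writeback. It shows that r4 is the 32-bit wrapped difference of a destination of 0 and that source, and that address 0 lies inside the 98304-byte imx6.rom region at address 0 seen in the `print *mr` output.

```c
#include <stdbool.h>
#include <stdint.h>

/* r4 in relocate_code: destination minus source, with 32-bit wraparound. */
static uint32_t relocation_offset(uint32_t dest, uint32_t src)
{
    return dest - src;
}

/* True if addr falls inside [base, base + size). */
static bool in_region(uint32_t addr, uint32_t base, uint32_t size)
{
    return addr - base < size;
}
```

With dest = 0 and src = 0x17800000 the offset is 0xe8800000, matching r4, so the stmia at relocate_code+20 stores to address 0, which is inside the ROM.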

小太 (nospam-i) on 2016-06-25
description: updated
Peter Maydell (pmaydell) wrote :

We shouldn't really be segfaulting in QEMU no matter what the guest does. Can you put the guest binary somewhere where we can get it, please?

小太 (nospam-i) wrote :

Attached, though I've since recompiled it (with no further changes), so addresses might no longer match the ones in my original report. It still crashes, though.

berte (b3hzat) wrote :

I see the same issue with a Yocto sabrelite build. Detailed information is below:
berte [ ~/playground/fsl-arm-yocto-bsp/hmi_test/tmp/deploy/images/imx6dlsabresd ]$ gdb --args ~/playground/qemu/debug/arm-softmmu/qemu-system-arm -smp 4 -M sabrelite -m 1024M -kernel u-boot.imx-sd
GNU gdb (Gentoo 7.10.1 vanilla) 7.10.1
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/berte/playground/qemu/debug/arm-softmmu/qemu-system-arm...done.
(gdb) r
Starting program: /home/berte/playground/qemu/debug/arm-softmmu/qemu-system-arm -smp 4 -M sabrelite -m 1024M -kernel u-boot.imx-sd
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffeb37b700 (LWP 8652)]
[New Thread 0x7fffd63ca700 (LWP 8653)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd63ca700 (LWP 8653)]
0x00005555557ac1fa in memory_region_access_valid (mr=0x7ffff7f2b0e0, addr=0, size=1, is_write=true) at /home/berte/playground/qemu/memory.c:1143
1143 if (!mr->ops->valid.unaligned && (addr & (size - 1))) {
(gdb) bt
#0 0x00005555557ac1fa in memory_region_access_valid (mr=0x7ffff7f2b0e0, addr=0, size=1, is_write=true) at /home/berte/playground/qemu/memory.c:1143
#1 0x00005555557ac663 in memory_region_dispatch_write (mr=0x7ffff7f2b0e0, addr=0, data=0, size=1, attrs=...) at /home/berte/playground/qemu/memory.c:1249
#2 0x00005555557b24f4 in io_writeb (env=0x7ffff7ea62f8, iotlbentry=0x7ffff7eb9688, val=0 '\000', addr=0, retaddr=140736862856889)
    at /home/berte/playground/qemu/softmmu_template.h:369
#3 0x00005555557b2837 in helper_ret_stb_mmu (env=0x7ffff7ea62f8, addr=0, val=0 '\000', oi=4, retaddr=140736862856889)
    at /home/berte/playground/qemu/softmmu_template.h:409
#4 0x00007fffdab7a6bb in code_gen_buffer ()
#5 0x000055555576a056 in cpu_tb_exec (cpu=0x7ffff7e9e080, itb=0x7fffd63cb240) at /home/berte/playground/qemu/cpu-exec.c:166
#6 0x000055555576ab3a in cpu_loop_exec_tb (cpu=0x7ffff7e9e080, tb=0x7fffd63cb240, last_tb=0x7fffd63c9a68, tb_exit=0x7fffd63c9a64, sc=0x7fffd63c9a80)
    at /home/berte/playground/qemu/cpu-exec.c:530
#7 0x000055555576ae26 in cpu_arm_exec (cpu=0x7ffff7e9e080) at /home/berte/playground/qemu/cpu-exec.c:626
#8 0x000055555579483a in tcg_cpu_exec (cpu=0x7ffff7e9e080) at /home/berte/playground/qemu/cpus.c:1541
#9 0x0000555555794925 in tcg_exec_all () at /home/berte/playground/qemu/cpus.c:1574
#10 0x0000555555793d05 in qemu_tcg_cpu_thread_fn (arg=0x7ffff7e9e080) at /home/berte/playground/qemu/cpus.c:1171
...


Peter Maydell (pmaydell) wrote :

The immediate cause of this crash is that the guest is trying to write to the imx6.rom region, which (as the name suggests) is read-only, so your guest is probably misconfigured if it's doing that. However, QEMU shouldn't crash regardless.

The bug here is that the various imx boards call memory_region_init_rom_device() for the ROMs, passing a NULL pointer for the 'ops' argument, which is always a bug. The right API for this is to call memory_region_init_ram() and then memory_region_set_readonly(). We should also assert in memory_region_init_rom_device() if the ops argument is NULL.
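
A minimal sketch of the pattern described above, for illustration only. It uses toy stand-in types and hypothetical function names (ToyRegion, toy_init_rom_device and so on), not QEMU's real MemoryRegion API, and the way it drops writes to a read-only region is an assumption of the model rather than a statement about QEMU's behaviour. It shows a ROM-device initialiser that asserts on NULL ops, and a plain ROM set up as RAM that is then marked read-only, so the alignment check never dereferences a NULL pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for illustration only; these are NOT QEMU's real types. */
typedef struct ToyOps {
    bool unaligned_ok;   /* plays the role of ops->valid.unaligned */
} ToyOps;

typedef struct ToyRegion {
    const char *name;
    const ToyOps *ops;   /* a NULL here is what memory.c:1143 dereferenced */
    bool readonly;
} ToyRegion;

static const ToyOps toy_ram_ops = { .unaligned_ok = true };

/* ROM-device style init: the caller must supply ops, so fail early if not. */
static void toy_init_rom_device(ToyRegion *r, const char *name,
                                const ToyOps *ops)
{
    assert(ops != NULL);
    r->name = name;
    r->ops = ops;
    r->readonly = false;   /* in this model, writes reach the device */
}

/* Plain ROM: initialise like RAM, then mark the region read-only. */
static void toy_init_readonly_ram(ToyRegion *r, const char *name)
{
    r->name = name;
    r->ops = &toy_ram_ops;
    r->readonly = true;
}

/* Same shape as the check at memory.c:1143; safe because ops is never NULL. */
static bool toy_access_valid(const ToyRegion *r, uint64_t addr, unsigned size)
{
    if (!r->ops->unaligned_ok && (addr & (size - 1))) {
        return false;
    }
    return true;
}

/* Returns true if the write would be performed, false if rejected or dropped. */
static bool toy_dispatch_write(const ToyRegion *r, uint64_t addr, unsigned size)
{
    if (!toy_access_valid(r, addr, size)) {
        return false;
    }
    return !r->readonly;   /* model assumption: read-only writes are dropped */
}
```

In this model, a 4-byte write to offset 0 of a region set up with toy_init_readonly_ram() is dropped instead of faulting, and passing NULL to toy_init_rom_device() trips the assertion at setup time rather than at the first guest write.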

Peter Maydell (pmaydell) wrote :

I have some patches which I'll post shortly which fix QEMU crashing on attempts to write to the ROM. However, fixing the crash doesn't make your test binary work. What happens is that we start executing "instructions" from the start of this binary blob, but it looks like this is actually data:

0x10010000: 402000d1 ldrdmi r0, [r0], -r1
0x10010004: 17800000 strne r0, [r0, r0]
0x10010008: 00000000 andeq r0, r0, r0
0x1001000c: 177ff42c ldrbne pc, [pc, -ip, lsr #8]!
0x10010010: 177ff420 ldrbne pc, [pc, -r0, lsr #8]!
0x10010014: 177ff400 ldrbne pc, [pc, -r0, lsl #8]!
0x10010018: 00000000 andeq r0, r0, r0
0x1001001c: 00000000 andeq r0, r0, r0
0x10010020: 177ff000 ldrbne pc, [pc, -r0]!
0x10010024: 00065000 andeq r5, r6, r0
0x10010028: 00000000 andeq r0, r0, r0
0x1001002c: 40f002d2 ldrsbtmi r0, [r0], #34
0x10010030: 04ec02cc strbteq r0, [ip], #716
0x10010034: 74070e02 strvc r0, [r7], #-3586
0x10010038: 00000c00 andeq r0, r0, r0, lsl #24
0x1001003c: 54070e02 strpl r0, [r7], #-3586
0x10010040: 00000000 andeq r0, r0, r0
0x10010044: ac040e02 stcge 14, cr0, [r4], {2}

Eventually we get to that 'stcge', which isn't a valid instruction for a Cortex-A9, and causes an UNDEF exception. We take the exception to the usual UNDEF vector, where there is no code:
0x00000004: 00000000 andeq r0, r0, r0
0x00000008: 00000000 andeq r0, r0, r0
0x0000000c: 00000000 andeq r0, r0, r0
0x00000010: 00000000 andeq r0, r0, r0

and continue to execute NOPs through the whole of this empty ROM until we get to the end of it:
0x00017ff8: 00000000 andeq r0, r0, r0
0x00017ffc: 00000000 andeq r0, r0, r0

at which point we hit the usual "trying to execute code outside RAM or ROM" error:

qemu: fatal: Trying to execute code outside RAM or ROM at 0x00018000

R00=00000000 R01=ffffffff R02=10000100 R03=00000000
R04=00000000 R05=00000000 R06=00000000 R07=ffffe3fc
R08=00000000 R09=00000000 R10=00000000 R11=00000000
R12=000002cc R13=00000000 R14=10010048 R15=00018000
PSR=400001db -Z-- A S und32

So the underlying problem here is that the thing you're passing to -kernel is neither (1) a Linux kernel nor (2) an ELF format binary, which is what -kernel is expecting.
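
As a rough illustration of the ELF half of that statement, the sketch below checks a buffer for the standard ELF magic bytes (0x7f 'E' 'L' 'F'). The function name looks_like_elf is hypothetical and this is not how QEMU's loader is implemented. Assuming the dump above is little-endian, the first word 402000d1 corresponds to the bytes d1 00 20 40, which do not match the magic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns true if the buffer starts with the 4-byte ELF magic. */
static bool looks_like_elf(const uint8_t *buf, size_t len)
{
    static const uint8_t magic[4] = { 0x7f, 'E', 'L', 'F' };

    return len >= sizeof(magic) && memcmp(buf, magic, sizeof(magic)) == 0;
}
```

A check like this, run on the first four bytes of a file before passing it to -kernel, would flag the blob above as not being an ELF image.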

thanks
-- PMM

Peter Maydell (pmaydell) wrote :

Patches fixing the SEGV are now in git master and will be in the 2.7 release.

Changed in qemu:
status: New → Fix Committed
Thomas Huth (th-huth) on 2017-01-14
Changed in qemu:
status: Fix Committed → Fix Released