qemu-system-arm segfaults emulating versatile machine after running debootstrap --second-stage inside vm

Bug #604872 reported by Ricardo Salveti on 2010-07-13
This bug affects 1 person
Affects               Status        Importance  Assigned to  Milestone
QEMU                  Fix Released  Undecided   Unassigned
qemu-linaro (Ubuntu)  Fix Released  Medium      Unassigned

Bug Description

Binary package hint: qemu-kvm

As I'm now implementing support for creating a rootstock rootfs without requiring root, I need to run debootstrap's second stage inside a VM to correctly install the packages into the rootfs.

qemu-system-arm fails right after debootstrap finishes the second stage, with a segmentation fault.

Command:
qemu-system-arm -M versatilepb -cpu cortex-a8 -kernel vmlinuz -no-reboot -nographic -drive file=qemu-armel-201007122016.img,aio=native,cache=none -m 256 -append 'console=ttyAMA0,115200n8 root=/dev/sda rw mem=256M devtmpfs.mount=0 init=/bin/installer'
Uncompressing Linux................................................................................................................................................................................................. done, booting the kernel.
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 2.6.32-21-versatile (buildd@cushaw) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #32-Ubuntu Fri Apr 16 08:14:53 UTC 2010 (Ubuntu 2.6.32-21.32-versatile 2.6.32.11+drm33.2)
...
I: Base system installed successfully.
I: Starting basic services in VM
Segmentation fault (core dumped)

[492816.197352] qemu-system-arm[16024]: segfault at ffffffffcf6ba8fc ip ffffffffcf6ba8fc sp 00007fffd0e68680 error 14
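
A note on reading the kernel log line above: on x86 Linux the trailing "error" field is the page-fault error code, printed in hexadecimal, so "error 14" means 0x14. The sketch below is an illustration and not part of the original report; the function name decode_segfault and the wording of the flags are assumptions made for this example. It decodes the line and shows a user-mode instruction fetch from an unmapped address, with the fault address equal to the instruction pointer, which is what one would expect if QEMU jumped to a bogus host address.

```python
import re

# (mask, meaning when the bit is set, meaning when it is clear or None)
# Bit layout of the x86 page-fault error code.
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]

SEGFAULT_RE = re.compile(
    r"segfault at ([0-9a-f]+) ip ([0-9a-f]+) sp ([0-9a-f]+) error ([0-9a-f]+)"
)


def decode_segfault(line):
    """Parse a kernel 'segfault at ...' line; all four fields are hex."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        raise ValueError("not a segfault line")
    addr, ip, sp, err = (int(group, 16) for group in match.groups())
    flags = []
    for mask, if_set, if_clear in PF_BITS:
        if err & mask:
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return {"addr": addr, "ip": ip, "sp": sp, "error": err, "flags": flags}


# The log line quoted in this report.
line = ("[492816.197352] qemu-system-arm[16024]: segfault at ffffffffcf6ba8fc "
        "ip ffffffffcf6ba8fc sp 00007fffd0e68680 error 14")
info = decode_segfault(line)
print(info["flags"], info["addr"] == info["ip"])
```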

Image:
 * rootfs: http://rsalveti.net/pub/ubuntu/rootstock/qemu-armel-201007122016.img (md5 1d063ac8a65c798bb004cd1c4c7970c5)
 * kernel: http://ports.ubuntu.com/ubuntu-ports/dists/lucid/main/installer-armel/current/images/versatile/netboot/vmlinuz

I'm able to reproduce the bug on Maverick (amd64) and Lucid (x86).

Maverick qemu-kvm-extras: 0.12.4+noroms-0ubuntu4
Lucid qemu-kvm-extras: 0.12.3+noroms-0ubuntu9.2

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: qemu-kvm-extras 0.12.4+noroms-0ubuntu4
ProcVersionSignature: Ubuntu 2.6.35-6.9-generic 2.6.35-rc3
Uname: Linux 2.6.35-6-generic x86_64
Architecture: amd64
Date: Mon Jul 12 18:55:35 2010
InstallationMedia: Ubuntu 10.04 LTS "Lucid Lynx" - Release amd64 (20100427.1)
KvmCmdLine: Error: command ['ps', '-C', 'kvm', '-F'] failed with exit code 1: UID PID PPID C SZ RSS PSR STIME TTY TIME CMD
MachineType: LENOVO 2764CTO
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
ProcCmdLine: BOOT_IMAGE=/vmlinuz-2.6.35-6-generic root=/dev/mapper/primary-root ro crashkernel=384M-2G:64M,2G-:128M quiet splash
ProcEnviron:
 LANG=en_US.utf8
 SHELL=/bin/bash
SourcePackage: qemu-kvm
dmi.bios.date: 04/19/2010
dmi.bios.vendor: LENOVO
dmi.bios.version: 7UET86WW (3.16 )
dmi.board.name: 2764CTO
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr7UET86WW(3.16):bd04/19/2010:svnLENOVO:pn2764CTO:pvrThinkPadT400:rvnLENOVO:rn2764CTO:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 2764CTO
dmi.product.version: ThinkPad T400
dmi.sys.vendor: LENOVO

Ricardo Salveti (rsalveti) wrote :

Maverick:

I: Base system installed successfully.
I: Starting basic services in VM

Program received signal SIGSEGV, Segmentation fault.
0xffffffffceeef54c in ?? ()
(gdb) bt full
#0 0xffffffffceeef54c in ?? ()
No symbol table info available.
#1 0x00007fffffffdfa0 in ?? ()
No symbol table info available.
#2 0x000000000059896f in tb_find_slow (pc=Cannot access memory at address 0xffffffffffffffbe
) at /home/rsalveti/projects/ubuntu/maverick/packages/qemu-kvm-0.12.4+noroms/cpu-exec.c:172
        tb = Cannot access memory at address 0xffffffffffffffd2

Ricardo Salveti (rsalveti) wrote :

I'm also able to reproduce with upstream Qemu, hash aa5fb7b3bf388d643bd9c6e6fee9ace5db2e590f

I: Base system installed successfully.
I: Starting basic services in VM

Program received signal SIGSEGV, Segmentation fault.
0x000000008be2208b in ?? ()
(gdb) bt full
#0 0x000000008be2208b in ?? ()
No symbol table info available.
#1 0x00007fffffffe110 in ?? ()
No symbol table info available.
#2 0x00000000004f211e in tb_find_slow (pc=Cannot access memory at address 0xffffffffffffffbe
) at /home/rsalveti/projects/qemu/trunk/cpu-exec.c:170
        tb = Cannot access memory at address 0xffffffffffffffe2

Ricardo Salveti (rsalveti) wrote :

If you're on Lucid or Maverick, you can also create the rootfs image by running the attached rootstock script.

Please install all rootstock dependencies by installing the official version provided by the distro:
sudo apt-get install rootstock

Using the attached script:
sudo bash ./rootstock --fqdn beagleboard --login ubuntu --password temppwd --imagesize 512M --seed ubuntu-minimal --dist lucid --serial ttyS2 --components "main universe multiverse"

You'll get the rootfs image and the command that rootstock would call, which triggers the segfault, e.g.:
qemu-system-arm -M versatilepb -cpu cortex-a8 -kernel /tmp/tmp.OxxUOBq4B0/qemu-vmlinuz -no-reboot -nographic -pidfile /tmp/tmp.OxxUOBq4B0/qemu.pid -drive file=/tmp/tmp.OxxUOBq4B0/qemu-armel-201007122016.img,aio=native,cache=none -m 256 -append 'console=ttyAMA0,115200n8 root=/dev/sda rw mem=256M devtmpfs.mount=0 init=/bin/installer quiet'

Oliver Grawert (ogra) on 2010-07-13
tags: added: armel
Changed in qemu-kvm (Ubuntu):
importance: Undecided → Medium
Changed in qemu-kvm (Ubuntu):
importance: Medium → Undecided
Scott Moser (smoser) on 2010-07-15
Changed in qemu-kvm (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
Ricardo Salveti (rsalveti) wrote :

Tested with Maverick's vmlinuz-2.6.35-10-versatile and I'm still able to reproduce the problem (qemu cdcf9153e5e17dde340135fee5dcc7c299f2d4f5 this time).

Peter Maydell (pmaydell) wrote :

I've analysed this segfault. The problem is that we're not correctly taking account of the IT state on entry to a Thumb translation block if we're retranslating it for cpu_restore_state().

The offending TB here is:
0x0003dc00: movle r2, #0
0x0003dc02: ldr r1, [pc, #644] (0x3de88)
0x0003dc04: cmp r3, #2
0x0003dc06: str r2, [r1, #0]
0x0003dc08: it eq
0x0003dc0a: ldreq r3, [r5, #8]
0x0003dc0c: beq.w 0x3ddce

where the 'le' is because the TB before that ended with an 'it le'. When we execute this the str gets a data abort. qemu handles this by calling cpu_restore_state(), which reruns the translation process but this time generating a mapping between target and host addresses, so we can turn the host PC of the fault into a target PC. Unfortunately we retranslate without taking account of what the IT state at the start of the TB should have been:

0x0003dc00: movs r2, #0
0x0003dc02: ldr r1, [pc, #644] (0x3de88)
0x0003dc04: cmp r3, #2
0x0003dc06: str r2, [r1, #0]
0x0003dc08: it eq
0x0003dc0a: ldreq r3, [r5, #8]
0x0003dc0c: beq.w 0x3ddce

...note that that mov has become unconditional. (It's not just the disassembly, the generated intermediate code changes too.)
Since cpu_restore_state() works by (a) actually rewriting the translated code into the buffer and (b) stopping when we get to the PC which faulted, this means we end up writing over the old generated code with half of a different version of the generated code. This is never going to go well, and we end up jumping off into the weeds the next time we execute the TB.
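
The failure mode described above can be illustrated with a toy model. This is a sketch under heavy assumptions and not QEMU code: the translate function, the "skip_unless_<cond>" pseudo-op, and the rule that an IT instruction guards exactly one following instruction are all invented for the example, and the retranslation here stops at a guest instruction index rather than at a host PC. It also ignores the flag-setting difference between the conditional and unconditional mov. The point it shows is only that retranslating into the same buffer with a different IT state at TB entry leaves a mix of two translations behind.

```python
def translate(insns, entry_it=None, buf=None, stop_at=None):
    """Toy translator: emit 'host ops' for a list of guest instructions.

    entry_it -- IT condition still pending when the TB is entered, or None
    buf      -- existing code buffer to overwrite in place (restore-state mode)
    stop_at  -- guest instruction index at which retranslation stops
    """
    out = [] if buf is None else buf
    pos = 0
    pending = entry_it
    for idx, insn in enumerate(insns):
        if stop_at is not None and idx == stop_at:
            break
        if insn.startswith("it "):
            pending = insn.split()[1]   # remember the condition, emit nothing
            ops = []
        elif pending is not None:
            ops = ["skip_unless_" + pending, insn]  # guarded instruction
            pending = None
        else:
            ops = [insn]
        for op in ops:
            if pos < len(out):
                out[pos] = op           # overwrite previously generated code
            else:
                out.append(op)
            pos += 1
    return out


# The TB from the analysis above, written with base mnemonics.
TB = [
    "mov r2, #0",
    "ldr r1, [pc, #644]",
    "cmp r3, #2",
    "str r2, [r1, #0]",     # this store takes the data abort
    "it eq",
    "ldr r3, [r5, #8]",
    "beq.w 0x3ddce",
]
FAULTING_INDEX = 3

# First translation: the previous TB ended with 'it le', so 'le' is pending.
original = translate(TB, entry_it="le")

# Buggy restore: retranslate over the same buffer, forgetting the IT state.
buggy = translate(TB, entry_it=None, buf=list(original), stop_at=FAULTING_INDEX)

# Fixed restore: retranslate with the same IT state as the first translation.
fixed = translate(TB, entry_it="le", buf=list(original), stop_at=FAULTING_INDEX)

print(buggy == original, fixed == original)  # prints: False True
```

In the buggy case the guard on the mov is gone and every later op is shifted by one slot up to the point where retranslation stopped, so the buffer no longer matches either translation.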

I think this is related to but not the same as https://bugs.launchpad.net/qemu/+bug/581335.

Peter Maydell (pmaydell) wrote :

I have a patchset which fixes this bug, which I need to do a bit more cleanup and testing with before I post it to the list.

Changed in qemu:
status: New → In Progress
Ricardo Salveti (rsalveti) wrote :

Can you post your patch, or a link to it, on this bug later? Then I can help test it with rootstock.

Jani Monoses (jani) wrote :

Should the fixes be applied to qemu-kvm if we plan on packaging qemu-meego for ARM support anyway?

Loïc Minier (lool) wrote :

qemu-kvm is what's currently in the Ubuntu archive; I'm sure Peter will also arrange for the ubuntu-qemu-omap branch to get these fixes once they are in suitable shape.

In any case, these fixes are also going upstream and will eventually bubble up to derived trees.

On Fri, Jan 7, 2011 at 7:53 AM, Jani Monoses <email address hidden> wrote:
> should the fixes be applied to qemu-kvm if we plan on packaging qemu-
> meego for ARM support anyway?

It depends on the size of the fix; if it's something simple we can
certainly also update the qemu-kvm package. But as Loïc pointed out,
these fixes are landing upstream soon, so they'll eventually end up in
this package later on.

Peter Maydell (pmaydell) wrote :

I've now posted this patchset; it comes in 7 parts:

http://patchwork.ozlabs.org/patch/77887/
http://patchwork.ozlabs.org/patch/77882/
http://patchwork.ozlabs.org/patch/77884/
http://patchwork.ozlabs.org/patch/77885/
http://patchwork.ozlabs.org/patch/77888/
http://patchwork.ozlabs.org/patch/77881/
http://patchwork.ozlabs.org/patch/77883/

An upstream qemu with those patches applied successfully runs the test case given in this bug.

(it is patch 5/7 http://patchwork.ozlabs.org/patch/77888/ in particular which is dealing with the specific case you've hit here, but I haven't tested with that patch alone.)

Aurelien Jarno (aurel32) on 2011-01-14
Changed in qemu:
status: In Progress → Fix Committed
Dustin Kirkland  (kirkland) wrote :

Moving this bug over to the qemu-linaro package, which now provides qemu-system-arm

affects: qemu-kvm (Ubuntu) → qemu-linaro (Ubuntu)
Peter Maydell (pmaydell) on 2011-02-11
Changed in qemu-linaro (Ubuntu):
status: Triaged → Fix Released
Loïc Minier (lool) on 2011-02-11
Changed in qemu-linaro (Ubuntu):
status: Fix Released → Triaged
Steve Langasek (vorlon) wrote :

Peter, is this targeted for the next monthly Linaro QEMU release?

(No need to worry about this for qemu-kvm any longer; the qemu-linaro package now handles qemu-system-arm exclusively.)

Peter Maydell (pmaydell) wrote :

The fix is already in qemu-linaro 2011.02.

Loïc Minier (lool) wrote :

Hmm, Ubuntu has 2011.02, but I think you had asked me to flip this bug back to Triaged; I'm confused now, is this fixed in Ubuntu?

I didn't find http://patchwork.ozlabs.org/patch/77888/ applied in qemu-linaro 0.13.50-2011.02-0-0ubuntu1, which is based on 2011.02.

Peter Maydell (pmaydell) wrote :

The commit in qemu-linaro is:
http://git.linaro.org/gitweb?p=qemu/qemu-linaro.git;a=commit;h=98eac7cab4392ab28fa22265e27906f5b9c6c9da

I asked you to undo the status change just because I don't "own" "qemu-linaro (Ubuntu)" and don't know what counts as "fix released" (eg maybe that only happens when it goes into a stable ubuntu release?). It's "fix released" as far as "Linaro QEMU" is concerned, though.

Loïc Minier (lool) wrote :

OK; the commit you point at is in the current qemu-linaro package.

"Fix Released" in Ubuntu means that a fixed source package has been uploaded to the Ubuntu development release.

The usual way to close Ubuntu bugs fixed with an upload is via debian/changelog: when Launchpad processes the .changes file with the list of fixed bugs, it marks them Fix Released.

If we need to track status in stable releases of Ubuntu, we use "Target to series" to flag the ones where we need a different bug state.
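
As an illustration of the changelog convention mentioned above, bug references in Ubuntu changelogs are conventionally written as "LP: #NNNNNN". The sketch below is not Launchpad's own parser: the changelog entry is hypothetical (the package name, version, and uploader are placeholders, not the actual upload that fixed this bug), and the helper name launchpad_bugs is an assumption. It only shows how such references can be pulled out of a changelog entry.

```python
import re

# Hypothetical changelog entry; package, version, and uploader are placeholders.
CHANGELOG_ENTRY = """\
examplepkg (1.0-0ubuntu1) maverick; urgency=low

  * Fix segfault when retranslating Thumb TBs with IT state (LP: #604872)

 -- Example Uploader <uploader@example.com>  Sat, 01 Jan 2011 00:00:00 +0000
"""

# Matches "LP: #1" as well as comma-separated lists such as "LP: #1, #22".
LP_RE = re.compile(r"\bLP:\s*((?:#\d+(?:,\s*)?)+)", re.IGNORECASE)


def launchpad_bugs(text):
    """Return the Launchpad bug numbers referenced in a changelog text."""
    bugs = []
    for match in LP_RE.finditer(text):
        bugs.extend(int(number) for number in re.findall(r"#(\d+)", match.group(1)))
    return bugs


print(launchpad_bugs(CHANGELOG_ENTRY))
```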

Changed in qemu-linaro (Ubuntu):
status: Triaged → Fix Released
Aurelien Jarno (aurel32) on 2011-02-20
Changed in qemu:
status: Fix Committed → Fix Released