Arm64 fails to run a binary which runs OK on real hardware

Bug #1263747 reported by Richard Jones
This bug affects 1 person
Affects  Status        Importance  Assigned to  Milestone
QEMU     Fix Released  Undecided   Unassigned

Bug Description

Note this is using the not-yet-upstream aarch64 patches from:

https://github.com/susematz/qemu/tree/aarch64-1.6

---- ----

This binary:

http://oirase.annexia.org/tmp/test.gz

runs OK on real aarch64 hardware. It is a statically linked Linux binary which (if successful) will print "hello, world" and exit cleanly.

On qemu-arm64 userspace emulator it doesn't print anything and loops forever using 100% CPU.
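
A minimal sketch of how the "loops forever" symptom could be checked automatically, by running a command under a timeout and classifying the result. The helper name `run_with_timeout` and the 10-second default are assumptions made for illustration; the command line in the final comment takes the emulator name `qemu-arm64` and the binary name `test` from this report and is not executed here.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s=10.0):
    """Run cmd and return (status, stdout_text).

    status is "ok" (exit code 0), "failed" (non-zero exit code),
    or "hung" (still running when the timeout expired).
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before re-raising; any output
        # collected so far is attached to the exception (may be None).
        partial = exc.stdout or b""
        return "hung", partial.decode(errors="replace")
    status = "ok" if proc.returncode == 0 else "failed"
    return status, proc.stdout.decode(errors="replace")


# Stand-in command so the sketch runs without an emulator installed.
demo_status, demo_out = run_with_timeout(
    [sys.executable, "-c", "print('hello, world')"]
)

# Hypothetical use against the binary from this report:
#   run_with_timeout(["qemu-arm64", "./test"])
```

With the behaviour described above, the emulated run would be expected to come back as "hung" with empty output, while the run on real hardware would be "ok" with "hello, world".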

---- ----

The following section applies only if you wish to compile this binary from source; otherwise you can ignore it.

First compile OCaml from:

https://github.com/ocaml/ocaml

(note that you have to compile it on aarch64 or in qemu; it's not possible to cross-compile). You will have to apply the one-line patch from:

https://sympa.inria.fr/sympa/arc/caml-list/2013-12/msg00179.html

    ./configure
    make -j1 world.opt

Then do:

    echo 'print_endline "hello, world"' > test.ml
    ./boot/ocamlrun ./ocamlopt -I stdlib stdlib.cmxa test.ml -o test
    ./test

Peter Maydell (pmaydell) wrote : Re: [Qemu-devel] [Bug 1263747] [NEW] Arm64 fails to run a binary which runs OK on real hardware

On 23 December 2013 18:38, Richard Jones <email address hidden> wrote:
> This binary:
>
> http://oirase.annexia.org/tmp/test.gz
>
> runs OK on real aarch64 hardware. It is a statically linked Linux
> binary which (if successful) will print "hello, world" and exit cleanly.
>
> On qemu-arm64 userspace emulator it doesn't print anything and loops
> forever using 100% CPU.

Does the equivalent binary run OK in 32 bit ARM QEMU?
Does the binary use multiple threads?

If you have the time to investigate more closely what the binary
is actually doing when it loops (eg by running under a host gdb,
or using the debug log tracing of TCG input and output code and
execution) that would be helpful. Otherwise it's likely to be quite a
long time before I get round to looking at this kind of thing, because
"runs complex binaries/runtimes like ocaml" is not very high up the
priority list, I'm afraid.

thanks
-- PMM

Richard Jones (rjones-redhat) wrote :

It's an Aarch64 binary so it won't run on 32 bit ARM at all. However I guess you meant does the equivalent program run on 32 bit ARM, and the answer is yes, but that doesn't tell us much because OCaml uses separate code generators for 32 and 64 bit ARM.

The binary is single threaded.

I enabled tracing on qemu and got this:

http://oirase.annexia.org/tmp/arm64-call-trace.txt

The associated disassembly of the binary is here:

http://oirase.annexia.org/tmp/arm64-disassembly.txt

I'm not exactly sure which instruction fails to be emulated properly, but it looks like one of the ones in the caml_c_call function.
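
One way to narrow down where an emulated program is spinning is to count how often each address appears in the execution trace and then look those addresses up in the disassembly. The sketch below assumes only that each relevant trace line contains a hexadecimal address written as `0x...`; the real layout of the linked trace file is not reproduced here, and the helper name `hottest_addresses` is hypothetical.

```python
import re
from collections import Counter

# Assumption: the first "0x..." token on a line is the address of interest.
ADDR_RE = re.compile(r"0x[0-9a-fA-F]+")


def hottest_addresses(trace_lines, top=5):
    """Return the `top` most frequent addresses as (address, count) pairs."""
    counts = Counter()
    for line in trace_lines:
        match = ADDR_RE.search(line)
        if match:
            counts[int(match.group(0), 16)] += 1
    return counts.most_common(top)


# Made-up sample lines, only to show the shape of the result.
sample = [
    "Trace 0x400100",
    "Trace 0x400200",
    "Trace 0x400100",
    "a line without an address",
    "Trace 0x400100",
]
for addr, count in hottest_addresses(sample, top=2):
    print(f"{addr:#x} executed {count} times")
```

In a tight loop a handful of addresses dominate the counts, which points at the function to inspect.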

Richard Jones (rjones-redhat) wrote :

One thing I notice is that caml_c_call is the only function that uses the instruction "ret xM" (in all other places the code uses the default "ret" with implicit x30). Hmmm .. do we emulate "ret xM"?

Richard Jones (rjones-redhat) wrote :

The attached patch fixes the ret xM variant of ret. I verified that it fixes the bug.
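
For context, the A64 RET instruction carries its target register in bits [9:5], and the plain "ret" form is simply the encoding with register 30. The sketch below illustrates that encoding and one hypothetical failure mode, a decoder that ignores the register field and always returns through x30. It is not the attached patch and is not taken from the susematz tree; all function names are made up for illustration.

```python
RET_BASE = 0xD65F0000  # A64 "RET Xn" with the Rn field set to 0
RET_MASK = 0xFFFFFC1F  # every bit except the Rn field, bits [9:5]


def encode_ret(rn=30):
    """Encode "ret xN"; the default register 30 gives the plain "ret"."""
    if not 0 <= rn <= 30:
        raise ValueError("register number must be in 0..30")
    return RET_BASE | (rn << 5)


def decode_ret_target(insn):
    """Return the register number of a RET, or None if insn is not a RET."""
    if insn & RET_MASK != RET_BASE:
        return None
    return (insn >> 5) & 0x1F


def emulate_ret(insn, regs):
    """Return the next PC, honouring the encoded register."""
    rn = decode_ret_target(insn)
    if rn is None:
        raise ValueError("not a RET instruction")
    # Register number 31 reads as zero in this instruction.
    return 0 if rn == 31 else regs[rn]


def emulate_ret_ignoring_rn(insn, regs):
    """Hypothetical buggy variant: always returns through x30."""
    if decode_ret_target(insn) is None:
        raise ValueError("not a RET instruction")
    return regs[30]


regs = [0x1000 + 8 * i for i in range(31)]  # made-up register contents
insn = encode_ret(15)                        # "ret x15"
good_pc = emulate_ret(insn, regs)            # jumps to the value in x15
bad_pc = emulate_ret_ignoring_rn(insn, regs) # jumps to the value in x30
```

A program that only ever uses the plain "ret" would behave identically under both variants, which is consistent with the problem showing up only in the one function that uses "ret xM".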

Peter Maydell (pmaydell) wrote : Re: [Qemu-devel] [Bug 1263747] Re: Arm64 fails to run a binary which runs OK on real hardware

On 23 December 2013 21:27, Richard Jones <email address hidden> wrote:
> It's an Aarch64 binary so it won't run on 32 bit ARM at all. However I
> guess you meant does the equivalent program run on 32 bit ARM, and the
> answer is yes, but that doesn't tell us much because OCaml uses separate
> code generators for 32 and 64 bit ARM.

Yes, that's why I said "equivalent binary". It's a useful check because it
can tell us whether the program is using things our linux-user emulation
doesn't get right at all (examples: multiple threads; some interactions of
signals and blocking syscalls); so it divides the bug into "probably in
linux-user" vs "probably a target-arm bug".

I see you've tracked the issue down in this case, though.

thanks
-- PMM

sumanth (sgundapa) wrote :

>> runs OK on real aarch64 hardware.
May I know which hardware you are talking about? Is there an aarch64 hardware target available?

Peter Maydell (pmaydell) wrote :

The (re)implementation of this instruction for mainline never had this bug.

Changed in qemu:
status: New → Fix Committed
Thomas Huth (th-huth)
Changed in qemu:
status: Fix Committed → Fix Released