AARCH64 to ARMv7 mistranslation in TCG

Bug #1830872 reported by Laszlo Ersek (Red Hat)
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
QEMU
Fix Released
Undecided
Unassigned

Bug Description

The following guest code:

  https://github.com/tianocore/edk2/blob/3604174718e2afc950c3cc64c64ba5165c8692bd/MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S

implements, in hand-optimized aarch64 assembly, the CopyMem() edk2 (EFI
Development Kit II) library function. (CopyMem() basically has memmove()
semantics, to provide a standard C analog here.) The relevant functions
are InternalMemCopyMem() and __memcpy().

When TCG translates this aarch64 code to x86_64, everything works fine.

When TCG translates this aarch64 code to ARMv7, the destination area of
the translated CopyMem() function becomes corrupted -- it differs from
the intended source contents. Namely, in every 4096 byte block, the
8-byte word at offset 4032 (0xFC0) is zeroed out in the destination,
instead of receiving the intended source value.

I'm attaching two hexdumps of the same destination area:

- "good.txt" is a hexdump of the destination area when CopyMem() was
  translated to x86_64,

- "bad.txt" is a hexdump of the destination area when CopyMem() was
  translated to ARMv7.

In order to assist with the analysis of this issue, I disassembled the
aarch64 binary with "objdump". Please find the listing in
"DxeCore.objdump", attached. The InternalMemCopyMem() function starts at
hex offset 2b2ec. The __memcpy() function starts at hex offset 2b180.

And, I ran the guest on the ARMv7 host with "-d
in_asm,op,op_opt,op_ind,out_asm". Please find the log in
"tcg.in_asm.op.op_opt.op_ind.out_asm.log", attached.

The TBs that correspond to (parts of) the InternalMemCopyMem() and
__memcpy() functions are scattered over the TCG log file, but the offset
between the "nice" disassembly from "DxeCore.objdump", and the in-RAM
TBs in the TCG log, can be determined from the fact that there is a
single prfm instruction in the entire binary. The instruction's offset
is 0x2b180 in "DxeCore.objdump" -- at the beginning of the __memcpy()
function --, and its RAM address is 0x472d2180 in the TCG log. Thus the
difference (= the load address of DxeCore.efi) is 0x472a7000.

QEMU was built at commit a4f667b67149 ("Merge remote-tracking branch
'remotes/cohuck/tags/s390x-20190521-3' into staging", 2019-05-21).

The reproducer command line is (on an ARMv7 host):

  qemu-system-aarch64 \
    -display none \
    -machine virt,accel=tcg \
    -nodefaults \
    -nographic \
    -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-aarch64-code.fd,readonly \
    -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-arm-vars.fd,snapshot=on \
    -cpu cortex-a57 \
    -chardev stdio,signal=off,mux=on,id=char0 \
    -mon chardev=char0,mode=readline \
    -serial chardev:char0

The apparent symptom is an assertion failure *in the guest*, such as

> ASSERT [DxeCore]
> /home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090):
> Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength

but that is only a (distant) consequence of the CopyMem()
mistranslation, and resultant destination area corruption.

Originally reported in the following two mailing list messages:
- http://<email address hidden>
- http://<email address hidden>

Tags: arm testcase
Revision history for this message
Laszlo Ersek (Red Hat) (lersek) wrote :
tags: added: arm
Revision history for this message
Laszlo Ersek (Red Hat) (lersek) wrote :

Possibly related:
[Qemu-devel] "accel/tcg: demacro cputlb" break qemu-system-x86_64
https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg07362.html

(qemu-system-x86_64 fails to boot 64-bit kernel under TCG accel when QEMU is built for i686)

Note to self: try to reprodouce the present issue with QEMU built at eed5664238ea^ -- this LP has originally been filed about the tree at a4f667b67149, and that commit contains eed5664238ea. So checking at eed5664238ea^ might reveal a difference.

Revision history for this message
Alex Bennée (ajbennee) wrote : Re: [Qemu-devel] [Bug 1830872] Re: AARCH64 to ARMv7 mistranslation in TCG

Laszlo Ersek (Red Hat) <email address hidden> writes:

> Possibly related:
> [Qemu-devel] "accel/tcg: demacro cputlb" break qemu-system-x86_64
> https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg07362.html
>
> (qemu-system-x86_64 fails to boot 64-bit kernel under TCG accel when
> QEMU is built for i686)
>
> Note to self: try to reprodouce the present issue with QEMU built at
> eed5664238ea^ -- this LP has originally been filed about the tree at
> a4f667b67149, and that commit contains eed5664238ea. So checking at
> eed5664238ea^ might reveal a difference.

Oops. Looks like tests/tcg/multiarch/system/memory.c didn't cover enough
cases.

--
Alex Bennée

Alex Bennée (ajbennee)
tags: added: testcase
Revision history for this message
Alex Bennée (ajbennee) wrote :

Alex Bennée <email address hidden> writes:

> Laszlo Ersek (Red Hat) <email address hidden> writes:
>
>> Possibly related:
>> [Qemu-devel] "accel/tcg: demacro cputlb" break qemu-system-x86_64
>> https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg07362.html
>>
>> (qemu-system-x86_64 fails to boot 64-bit kernel under TCG accel when
>> QEMU is built for i686)
>>
>> Note to self: try to reprodouce the present issue with QEMU built at
>> eed5664238ea^ -- this LP has originally been filed about the tree at
>> a4f667b67149, and that commit contains eed5664238ea. So checking at
>> eed5664238ea^ might reveal a difference.
>
> Oops. Looks like tests/tcg/multiarch/system/memory.c didn't cover enough
> cases.

Actually I do see something with i386 host running the aarch64 memory
test (although so far not with a armv7 host):

  ./qemu-system-aarch64 -monitor none -display none -M virt -cpu max -display none -semihosting -kernel tests/memory

Gives:

  Reading u64 from 0x40213004 (offset 4):....Error 0, 0, 0, 0, 250, 249, 248, 255Test complete: FAILED

--
Alex Bennée

Revision history for this message
Alex Bennée (ajbennee) wrote : [RFC PATCH] cputlb: use uint64_t for interim values for unaligned load

When running on 32 bit TCG backends a wide unaligned load ends up
truncating data before returning to the guest. We specifically have
the return type as uint64_t to avoid any premature truncation so we
should use the same for the interim types.

Hopefully fixes #1830872

Signed-off-by: Alex Bennée <email address hidden>
---
 accel/tcg/cputlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index cdcc3771020..b796ab1cbea 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1303,7 +1303,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
         && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1
                     >= TARGET_PAGE_SIZE)) {
         target_ulong addr1, addr2;
- tcg_target_ulong r1, r2;
+ uint64_t r1, r2;
         unsigned shift;
     do_unaligned_access:
         addr1 = addr & ~(size - 1);
--
2.20.1

Revision history for this message
Laszlo Ersek (Red Hat) (lersek) wrote :

I confirm that QEMU works fine (for the use case originally reported in this LP ticket) when built at commit a6ae23831b, i.e. at the parent of eed5664238ea.

Revision history for this message
Andrew Randrianasulu (andrew-randrianasulu) wrote :

В сообщении от Monday 03 June 2019 18:01:20 Alex Bennée написал(а):
> When running on 32 bit TCG backends a wide unaligned load ends up
> truncating data before returning to the guest. We specifically have
> the return type as uint64_t to avoid any premature truncation so we
> should use the same for the interim types.
>
> Hopefully fixes #1830872
>
> Signed-off-by: Alex Bennée <email address hidden>
> ---
> accel/tcg/cputlb.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index cdcc3771020..b796ab1cbea 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -1303,7 +1303,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
> && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1
> >= TARGET_PAGE_SIZE)) {
> target_ulong addr1, addr2;
> - tcg_target_ulong r1, r2;
> + uint64_t r1, r2;
> unsigned shift;
> do_unaligned_access:
> addr1 = addr & ~(size - 1);

Unfortunatly, this doesn't fix 32-bit qemu-system-x86_64 .... so, my bug is separate from #1830872 ?

I also was unable to convince qemu to use my kernel-only x86_64 gcc 6.5.0 cross-compiler ..
probably x86-64 testing on i686 requires either docker (I don't have this
) or 'real' cross-compiler (build with glibc support).

Revision history for this message
Laszlo Ersek (Red Hat) (lersek) wrote :

Sorry the patch in comment #5 wasn't visible when I wrote what would end up as comment #6. I'll test the patch later. Thanks!

Revision history for this message
Alex Bennée (ajbennee) wrote :

I managed to tweak the memory test enough to detect the failure on aarch64-on-armv7 and I the attached patch fixes it. Could you please double check with your test case?

Revision history for this message
Andrew Randrianasulu (andrew-randrianasulu) wrote : Re: [Bug 1830872] Re: AARCH64 to ARMv7 mistranslation in TCG
Download full text (8.9 KiB)

В сообщении от Monday 03 June 2019 18:51:40 Alex Bennée написал(а):
> I managed to tweak the memory test enough to detect the failure on
> aarch64-on-armv7 and I the attached patch fixes it. Could you please
> double check with your test case?
>

Hm, I manually applied path from LP(git diff disliked copypasted patch),
so for now git diff in qemu tree shows:

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index cdcc377102..b796ab1cbe 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1303,7 +1303,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
         && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1
                     >= TARGET_PAGE_SIZE)) {
         target_ulong addr1, addr2;
- tcg_target_ulong r1, r2;
+ uint64_t r1, r2;
         unsigned shift;
     do_unaligned_access:
         addr1 = addr & ~(size - 1);
lines 1-13/13 (END)

---------

but x86_64-softmmu/qemu-system-x86_64 -kernel /boot/bzImage-4.12.0-x64 -accel tcg
still hangs at 'booting the kernel" (it decompress OK)

I make distclean'ed source tree and reconfigured it:
 ./configure --target-list=x86_64-softmmu --disable-werror --enable-debug-tcg --cross-cc-x86_64="/opt/kgcc64/bin/x86_64-unknown-linux-gnu-gcc-6.5.0"

next, make -j 5 and test.

Hm.

I tried debug switches, it seems to hang a bit differently for two runs:

x86_64-softmmu/qemu-system-x86_64 -kernel /boot/bzImage-4.12.0-x64 -accel tcg -nographic -d in_asm,op,op_opt,op_ind,out_asm

=====================

IN:
0xffffffff810e8a63: 48 83 c3 64 addq $0x64, %rbx
0xffffffff810e8a67: eb c2 jmp 0xffffffff810e8a2b

OP:
 ld_i32 tmp18,env,$0xfffffff0
 movi_i32 tmp19,$0x0
 brcond_i32 tmp18,tmp19,lt,$L0

 ---- ffffffff810e8a63 0000000000000000
 movi_i32 tmp2,$0x64
 movi_i32 tmp3,$0x0
 mov_i32 tmp0,rbx_0
 mov_i32 tmp1,rbx_1
 add2_i32 tmp0,tmp1,tmp0,tmp1,tmp2,tmp3
 mov_i32 rbx_0,tmp0
 mov_i32 rbx_1,tmp1
 mov_i32 cc_src_0,tmp2
 mov_i32 cc_src_1,tmp3
 mov_i32 cc_dst_0,tmp0
 mov_i32 cc_dst_1,tmp1
 discard cc_src2_0
 discard cc_src2_1
 discard cc_op

 ---- ffffffff810e8a67 0000000000000009
 movi_i32 cc_op,$0x9
 goto_tb $0x0
 movi_i32 tmp6,$0x810e8a2b
 movi_i32 tmp7,$0xffffffff
 st_i32 tmp6,env,$0x80
 st_i32 tmp7,env,$0x84
 exit_tb $0xf2f1c080
 set_label $L0
 exit_tb $0xf2f1c083

OP after optimization and liveness analysis:
 ld_i32 tmp18,env,$0xfffffff0 dead: 1 pref=0xff
 movi_i32 tmp19,$0x0 pref=0xff
 brcond_i32 tmp18,tmp19,lt,$L0 dead: 0 1

 ---- ffffffff810e8a63 0000000000000000
 movi_i32 tmp2,$0x64 pref=0xff
 movi_i32 tmp3,$0x0 pref=0xff
 add2_i32 tmp0,tmp1,rbx_0,rbx_1,tmp2,tmp3 dead: 2 3 pref=0xff,0xff
 mov_i32 rbx_0,tmp0 sync: 0 dead: 1 pref=0xff
 mov_i32 rbx_1,tmp1 sync: 0 dead: 1 pref=0xff
 mov_i32 cc_src_0,tmp2 sync: 0 dead: 0 1 pref=0xff
 mov_i32 cc_src_1,tmp3 sync: 0 dead: 0 1 pref=0xff
 mov_i32 cc_dst_0,rbx_0 sync: 0 dead: 0 1 pref=0xff
 mov_i32 cc_dst_1,rbx_1 sync: 0 dead: 0 1 pref=0xff
 ...

Read more...

Revision history for this message
Laszlo Ersek (Red Hat) (lersek) wrote : Re: [Qemu-devel] [RFC PATCH] cputlb: use uint64_t for interim values for unaligned load

(+Igor)

On 06/03/19 17:01, Alex Bennée wrote:
> When running on 32 bit TCG backends a wide unaligned load ends up
> truncating data before returning to the guest. We specifically have
> the return type as uint64_t to avoid any premature truncation so we
> should use the same for the interim types.
>
> Hopefully fixes #1830872
>
> Signed-off-by: Alex Bennée <email address hidden>
> ---
> accel/tcg/cputlb.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index cdcc3771020..b796ab1cbea 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -1303,7 +1303,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
> && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1
> >= TARGET_PAGE_SIZE)) {
> target_ulong addr1, addr2;
> - tcg_target_ulong r1, r2;
> + uint64_t r1, r2;
> unsigned shift;
> do_unaligned_access:
> addr1 = addr & ~(size - 1);
>

Applied on top of commit ad88e4252f09c2956b99c90de39e95bab2e8e7af:

Tested-by: Laszlo Ersek <email address hidden>

Thanks!
Laszlo

Revision history for this message
Alex Bennée (ajbennee) wrote : Re: [RFC PATCH] cputlb: use uint64_t for interim values for unaligned load

Andrew Randrianasulu <email address hidden> writes:

> В сообщении от Monday 03 June 2019 18:01:20 Alex Bennée написал(а):
>> When running on 32 bit TCG backends a wide unaligned load ends up
>> truncating data before returning to the guest. We specifically have
>> the return type as uint64_t to avoid any premature truncation so we
>> should use the same for the interim types.
>>
>> Hopefully fixes #1830872
>>
>> Signed-off-by: Alex Bennée <email address hidden>
>> ---
>> accel/tcg/cputlb.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
>> index cdcc3771020..b796ab1cbea 100644
>> --- a/accel/tcg/cputlb.c
>> +++ b/accel/tcg/cputlb.c
>> @@ -1303,7 +1303,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>> && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1
>> >= TARGET_PAGE_SIZE)) {
>> target_ulong addr1, addr2;
>> - tcg_target_ulong r1, r2;
>> + uint64_t r1, r2;
>> unsigned shift;
>> do_unaligned_access:
>> addr1 = addr & ~(size - 1);
>
> Unfortunatly, this doesn't fix 32-bit qemu-system-x86_64 .... so, my
> bug is separate from #1830872 ?

I think you've hit two - one of which we have just fixed. With my
expanded memory test on i386 I'm seeing a hang but it's ok @
pull-demacro-softmmu-100519-1. Unfortunately bisecting through the slirp
move and other i386 Werror stuff is proving painful.

>
> I also was unable to convince qemu to use my kernel-only x86_64 gcc 6.5.0 cross-compiler ..
> probably x86-64 testing on i686 requires either docker (I don't have this
> ) or 'real' cross-compiler (build with glibc support).

--
Alex Bennée

Revision history for this message
Igor (imammedo) wrote : Re: [Qemu-devel] [RFC PATCH] cputlb: use uint64_t for interim values for unaligned load

On Mon, 3 Jun 2019 16:01:20 +0100
Alex Bennée <email address hidden> wrote:

> When running on 32 bit TCG backends a wide unaligned load ends up
> truncating data before returning to the guest. We specifically have
> the return type as uint64_t to avoid any premature truncation so we
> should use the same for the interim types.
>
> Hopefully fixes #1830872
>
> Signed-off-by: Alex Bennée <email address hidden>

Fixes arm/virt bios-tables-test for me, so

Tested-by: Igor Mammedov <email address hidden>

> ---
> accel/tcg/cputlb.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index cdcc3771020..b796ab1cbea 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -1303,7 +1303,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
> && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1
> >= TARGET_PAGE_SIZE)) {
> target_ulong addr1, addr2;
> - tcg_target_ulong r1, r2;
> + uint64_t r1, r2;
> unsigned shift;
> do_unaligned_access:
> addr1 = addr & ~(size - 1);

Alex Bennée (ajbennee)
Changed in qemu:
status: New → Fix Committed
Revision history for this message
Thomas Huth (th-huth) wrote :
Changed in qemu:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.