QEMU

AARCH64 to ARMv7 mistranslation in TCG

Bug #1830872 reported by Laszlo Ersek (Red Hat) on 2019-05-29

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	QEMU	Fix Released	Undecided	Unassigned

Bug Description

The following guest code:

https://github.com/tianocore/edk2/blob/3604174718e2afc950c3cc64c64ba5165c8692bd/MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S

implements, in hand-optimized aarch64 assembly, the CopyMem() edk2 (EFI
Development Kit II) library function. (CopyMem() basically has memmove()
semantics, to provide a standard C analog here.) The relevant functions
are InternalMemCopyMem() and __memcpy().

When TCG translates this aarch64 code to x86_64, everything works fine.

When TCG translates this aarch64 code to ARMv7, the destination area of
the translated CopyMem() function becomes corrupted -- it differs from
the intended source contents. Namely, in every 4096 byte block, the
8-byte word at offset 4032 (0xFC0) is zeroed out in the destination,
instead of receiving the intended source value.

I'm attaching two hexdumps of the same destination area:

- "good.txt" is a hexdump of the destination area when CopyMem() was
translated to x86_64,

- "bad.txt" is a hexdump of the destination area when CopyMem() was
translated to ARMv7.

In order to assist with the analysis of this issue, I disassembled the
aarch64 binary with "objdump". Please find the listing in
"DxeCore.objdump", attached. The InternalMemCopyMem() function starts at
hex offset 2b2ec. The __memcpy() function starts at hex offset 2b180.

And, I ran the guest on the ARMv7 host with "-d
in_asm,op,op_opt,op_ind,out_asm". Please find the log in
"tcg.in_asm.op.op_opt.op_ind.out_asm.log", attached.

The TBs that correspond to (parts of) the InternalMemCopyMem() and
__memcpy() functions are scattered over the TCG log file, but the offset
between the "nice" disassembly from "DxeCore.objdump", and the in-RAM
TBs in the TCG log, can be determined from the fact that there is a
single prfm instruction in the entire binary. The instruction's offset
is 0x2b180 in "DxeCore.objdump" -- at the beginning of the __memcpy()
function --, and its RAM address is 0x472d2180 in the TCG log. Thus the
difference (= the load address of DxeCore.efi) is 0x472a7000.

QEMU was built at commit a4f667b67149 ("Merge remote-tracking branch
'remotes/cohuck/tags/s390x-20190521-3' into staging", 2019-05-21).

The reproducer command line is (on an ARMv7 host):

  qemu-system-aarch64 \
    -display none \
    -machine virt,accel=tcg \
    -nodefaults \
    -nographic \
    -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-aarch64-code.fd,readonly \
    -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-arm-vars.fd,snapshot=on \
    -cpu cortex-a57 \
    -chardev stdio,signal=off,mux=on,id=char0 \
    -mon chardev=char0,mode=readline \
    -serial chardev:char0

The apparent symptom is an assertion failure *in the guest*, such as

> ASSERT [DxeCore]
> /home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090):
> Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength

but that is only a (distant) consequence of the CopyMem()
mistranslation, and resultant destination area corruption.

Originally reported in the following two mailing list messages:
- http://<email address hidden>
- http://<email address hidden>

Tags:

Revision history for this message

Laszlo Ersek (Red Hat) (lersek) wrote on 2019-05-29:

aarch64-to-arm-mistranslation.tar.xz Edit (4.3 MiB, application/x-tar)

Philippe Mathieu-Daudé (philmd) on 2019-05-29

tags:

added: arm

Revision history for this message

Laszlo Ersek (Red Hat) (lersek) wrote on 2019-06-02:

Possibly related:
[Qemu-devel] "accel/tcg: demacro cputlb" break qemu-system-x86_64
https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg07362.html

(qemu-system-x86_64 fails to boot 64-bit kernel under TCG accel when QEMU is built for i686)

Note to self: try to reprodouce the present issue with QEMU built at eed5664238ea^ -- this LP has originally been filed about the tree at a4f667b67149, and that commit contains eed5664238ea. So checking at eed5664238ea^ might reveal a difference.

Revision history for this message

Alex Bennée (ajbennee) wrote on 2019-06-02: Re: [Qemu-devel] [Bug 1830872] Re: AARCH64 to ARMv7 mistranslation in TCG

Laszlo Ersek (Red Hat) <email address hidden> writes:

> Possibly related:
> [Qemu-devel] "accel/tcg: demacro cputlb" break qemu-system-x86_64
> https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg07362.html
>
> (qemu-system-x86_64 fails to boot 64-bit kernel under TCG accel when
> QEMU is built for i686)
>
> Note to self: try to reprodouce the present issue with QEMU built at
> eed5664238ea^ -- this LP has originally been filed about the tree at
> a4f667b67149, and that commit contains eed5664238ea. So checking at
> eed5664238ea^ might reveal a difference.

Oops. Looks like tests/tcg/multiarch/system/memory.c didn't cover enough
cases.

--
Alex Bennée

Alex Bennée (ajbennee) on 2019-06-02

tags:

added: testcase

Revision history for this message

Alex Bennée (ajbennee) wrote on 2019-06-03:

Alex Bennée <email address hidden> writes:

> Laszlo Ersek (Red Hat) <email address hidden> writes:
>
>> Possibly related:
>> [Qemu-devel] "accel/tcg: demacro cputlb" break qemu-system-x86_64
>> https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg07362.html
>>
>> (qemu-system-x86_64 fails to boot 64-bit kernel under TCG accel when
>> QEMU is built for i686)
>>
>> Note to self: try to reprodouce the present issue with QEMU built at
>> eed5664238ea^ -- this LP has originally been filed about the tree at
>> a4f667b67149, and that commit contains eed5664238ea. So checking at
>> eed5664238ea^ might reveal a difference.
>
> Oops. Looks like tests/tcg/multiarch/system/memory.c didn't cover enough
> cases.

Actually I do see something with i386 host running the aarch64 memory
test (although so far not with a armv7 host):

./qemu-system-aarch64 -monitor none -display none -M virt -cpu max -display none -semihosting -kernel tests/memory

Gives:

Reading u64 from 0x40213004 (offset 4):....Error 0, 0, 0, 0, 250, 249, 248, 255Test complete: FAILED

--
Alex Bennée

Revision history for this message

Alex Bennée (ajbennee) wrote on 2019-06-03: [RFC PATCH] cputlb: use uint64_t for interim values for unaligned load

When running on 32 bit TCG backends a wide unaligned load ends up
truncating data before returning to the guest. We specifically have
the return type as uint64_t to avoid any premature truncation so we
should use the same for the interim types.

Hopefully fixes #1830872

Signed-off-by: Alex Bennée <email address hidden>
---
accel/tcg/cputlb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index cdcc3771020..b796ab1cbea 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1303,7 +1303,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
         && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1
                     >= TARGET_PAGE_SIZE)) {
         target_ulong addr1, addr2;
- tcg_target_ulong r1, r2;
+ uint64_t r1, r2;
         unsigned shift;
     do_unaligned_access:
         addr1 = addr & ~(size - 1);
--
2.20.1

Revision history for this message

Laszlo Ersek (Red Hat) (lersek) wrote on 2019-06-03:

I confirm that QEMU works fine (for the use case originally reported in this LP ticket) when built at commit a6ae23831b, i.e. at the parent of eed5664238ea.

Revision history for this message

Andrew Randrianasulu (andrew-randrianasulu) wrote on 2019-06-03:

В сообщении от Monday 03 June 2019 18:01:20 Alex Bennée написал(а):
> When running on 32 bit TCG backends a wide unaligned load ends up
> truncating data before returning to the guest. We specifically have
> the return type as uint64_t to avoid any premature truncation so we
> should use the same for the interim types.
>
> Hopefully fixes #1830872
>
> Signed-off-by: Alex Bennée <email address hidden>
> ---
> accel/tcg/cputlb.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index cdcc3771020..b796ab1cbea 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -1303,7 +1303,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
> && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1
> >= TARGET_PAGE_SIZE)) {
> target_ulong addr1, addr2;
> - tcg_target_ulong r1, r2;
> + uint64_t r1, r2;
> unsigned shift;
> do_unaligned_access:
> addr1 = addr & ~(size - 1);

Unfortunatly, this doesn't fix 32-bit qemu-system-x86_64 .... so, my bug is separate from #1830872 ?

I also was unable to convince qemu to use my kernel-only x86_64 gcc 6.5.0 cross-compiler ..
probably x86-64 testing on i686 requires either docker (I don't have this
) or 'real' cross-compiler (build with glibc support).

Revision history for this message

Laszlo Ersek (Red Hat) (lersek) wrote on 2019-06-03:

Sorry the patch in comment #5 wasn't visible when I wrote what would end up as comment #6. I'll test the patch later. Thanks!

Revision history for this message

Alex Bennée (ajbennee) wrote on 2019-06-03:

I managed to tweak the memory test enough to detect the failure on aarch64-on-armv7 and I the attached patch fixes it. Could you please double check with your test case?

Revision history for this message

Andrew Randrianasulu (andrew-randrianasulu) wrote on 2019-06-03: Re: [Bug 1830872] Re: AARCH64 to ARMv7 mistranslation in TCG

#10

bzImage-4.12.0-x64 Edit (2.8 MiB, application/octet-stream; name="bzImage-4.12.0-x64")

Download full text (8.9 KiB)

В сообщении от Monday 03 June 2019 18:51:40 Alex Bennée написал(а):
> I managed to tweak the memory test enough to detect the failure on
> aarch64-on-armv7 and I the attached patch fixes it. Could you please
> double check with your test case?
>

Hm, I manually applied path from LP(git diff disliked copypasted patch),
so for now git diff in qemu tree shows:

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index cdcc377102..b796ab1cbe 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1303,7 +1303,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
         && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1
                     >= TARGET_PAGE_SIZE)) {
         target_ulong addr1, addr2;
- tcg_target_ulong r1, r2;
+ uint64_t r1, r2;
         unsigned shift;
     do_unaligned_access:
         addr1 = addr & ~(size - 1);
lines 1-13/13 (END)

---------

but x86_64-softmmu/qemu-system-x86_64 -kernel /boot/bzImage-4.12.0-x64 -accel tcg
still hangs at 'booting the kernel" (it decompress OK)

I make distclean'ed source tree and reconfigured it:
./configure --target-list=x86_64-softmmu --disable-werror --enable-debug-tcg --cross-cc-x86_64="/opt/kgcc64/bin/x86_64-unknown-linux-gnu-gcc-6.5.0"

next, make -j 5 and test.

Hm.

I tried debug switches, it seems to hang a bit differently for two runs:

x86_64-softmmu/qemu-system-x86_64 -kernel /boot/bzImage-4.12.0-x64 -accel tcg -nographic -d in_asm,op,op_opt,op_ind,out_asm

=====================

IN:
0xffffffff810e8a63: 48 83 c3 64 addq $0x64, %rbx
0xffffffff810e8a67: eb c2 jmp 0xffffffff810e8a2b

OP:
ld_i32 tmp18,env,$0xfffffff0
movi_i32 tmp19,$0x0
brcond_i32 tmp18,tmp19,lt,$L0

---- ffffffff810e8a63 0000000000000000
movi_i32 tmp2,$0x64
movi_i32 tmp3,$0x0
mov_i32 tmp0,rbx_0
mov_i32 tmp1,rbx_1
add2_i32 tmp0,tmp1,tmp0,tmp1,tmp2,tmp3
mov_i32 rbx_0,tmp0
mov_i32 rbx_1,tmp1
mov_i32 cc_src_0,tmp2
mov_i32 cc_src_1,tmp3
mov_i32 cc_dst_0,tmp0
mov_i32 cc_dst_1,tmp1
discard cc_src2_0
discard cc_src2_1
discard cc_op

---- ffffffff810e8a67 0000000000000009
movi_i32 cc_op,$0x9
goto_tb $0x0
movi_i32 tmp6,$0x810e8a2b
movi_i32 tmp7,$0xffffffff
st_i32 tmp6,env,$0x80
st_i32 tmp7,env,$0x84
exit_tb $0xf2f1c080
set_label $L0
exit_tb $0xf2f1c083

OP after optimization and liveness analysis:
ld_i32 tmp18,env,$0xfffffff0 dead: 1 pref=0xff
movi_i32 tmp19,$0x0 pref=0xff
brcond_i32 tmp18,tmp19,lt,$L0 dead: 0 1

Hm, I manually applied path from LP(git diff disliked copypasted patch), 
so for now git diff in qemu tree shows:

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index cdcc377102..b796ab1cbe 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1303,7 +1303,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
         && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1
                     >= TARGET_PAGE_SIZE)) {
         target_ulong addr1, addr2;
-        tcg_target_ulong r1, r2;
+        uint64_t r1, r2;
         unsigned shift;
     do_unaligned_access:
         addr1 = addr & ~(size - 1);
lines 1-13/13 (END)

---------

but x86_64-softmmu/qemu-system-x86_64 -kernel /boot/bzImage-4.12.0-x64 -accel tcg
still hangs at 'booting the kernel" (it decompress OK)

I make distclean'ed source tree and reconfigured it:
 ./configure --target-list=x86_64-softmmu --disable-werror --enable-debug-tcg  --cross-cc-x86_64="/opt/kgcc64/bin/x86_64-unknown-linux-gnu-gcc-6.5.0"

next, make -j 5 and test.

Hm.

I tried debug switches, it seems to hang a bit differently for two runs:

x86_64-softmmu/qemu-system-x86_64 -kernel /boot/bzImage-4.12.0-x64 -accel tcg -nographic -d in_asm,op,op_opt,op_ind,out_asm

=====================

IN:
0xffffffff810e8a63:  48 83 c3 64              addq     $0x64, %rbx
0xffffffff810e8a67:  eb c2                    jmp      0xffffffff810e8a2b

OP:
 ld_i32 tmp18,env,$0xfffffff0
 movi_i32 tmp19,$0x0
 brcond_i32 tmp18,tmp19,lt,$L0

---- ffffffff810e8a63 0000000000000000
 movi_i32 tmp2,$0x64
 movi_i32 tmp3,$0x0
 mov_i32 tmp0,rbx_0
 mov_i32 tmp1,rbx_1
 add2_i32 tmp0,tmp1,tmp0,tmp1,tmp2,tmp3
 mov_i32 rbx_0,tmp0
 mov_i32 rbx_1,tmp1
 mov_i32 cc_src_0,tmp2
 mov_i32 cc_src_1,tmp3
 mov_i32 cc_dst_0,tmp0
 mov_i32 cc_dst_1,tmp1
 discard cc_src2_0
 discard cc_src2_1
 discard cc_op

---- ffffffff810e8a67 0000000000000009
 movi_i32 cc_op,$0x9
 goto_tb $0x0
 movi_i32 tmp6,$0x810e8a2b
 movi_i32 tmp7,$0xffffffff
 st_i32 tmp6,env,$0x80
 st_i32 tmp7,env,$0x84
 exit_tb $0xf2f1c080
 set_label $L0
 exit_tb $0xf2f1c083

OP after optimization and liveness analysis:
 ld_i32 tmp18,env,$0xfffffff0             dead: 1  pref=0xff
 movi_i32 tmp19,$0x0                      pref=0xff
 brcond_i32 tmp18,tmp19,lt,$L0            dead: 0 1

---- ffffffff810e8a63 0000000000000000
 movi_i32 tmp2,$0x64                      pref=0xff
 movi_i32 tmp3,$0x0                       pref=0xff
 add2_i32 tmp0,tmp1,rbx_0,rbx_1,tmp2,tmp3  dead: 2 3  pref=0xff,0xff
 mov_i32 rbx_0,tmp0                       sync: 0  dead: 1  pref=0xff
 mov_i32 rbx_1,tmp1                       sync: 0  dead: 1  pref=0xff
 mov_i32 cc_src_0,tmp2                    sync: 0  dead: 0 1  pref=0xff
 mov_i32 cc_src_1,tmp3                    sync: 0  dead: 0 1  pref=0xff
 mov_i32 cc_dst_0,rbx_0                   sync: 0  dead: 0 1  pref=0xff
 mov_i32 cc_dst_1,rbx_1                   sync: 0  dead: 0 1  pref=0xff
 discard cc_src2_0                        pref=0xff
 discard cc_src2_1                        pref=0xff
 discard cc_op                            pref=0xff mov_i32 cc_dst_0,tmp0
 mov_i32 cc_dst_1,tmp1
 discard cc_src2_0
 discard cc_src2_1
 discard cc_op

---- ffffffff810e8a55 0000000000000021
 movi_i32 cc_op,$0x21
 movi_i32 tmp20,$0x0
 movi_i32 tmp21,$0x0
 brcond2_i32 cc_dst_0,cc_dst_1,tmp20,tmp21,eq,$L1
 goto_tb $0x0
 movi_i32 tmp6,$0x810e8a57
 movi_i32 tmp7,$0xffffffff
 st_i32 tmp6,env,$0x80
 st_i32 tmp7,env,$0x84
 exit_tb $0xf2f1c180
 set_label $L1
 goto_tb $0x1
 movi_i32 tmp6,$0x810e8a63
 movi_i32 tmp7,$0xffffffff
 st_i32 tmp6,env,$0x80
 st_i32 tmp7,env,$0x84
 exit_tb $0xf2f1c181
 set_label $L0
 exit_tb $0xf2f1c183

---- ffffffff810e8a4c 0000000000000000

---- ffffffff810e8a52 0000000000000000
 movi_i32 tmp1,$0x0                       pref=0xff
 movi_i32 tmp0,$0x64                      pref=0xff
 mov_i32 r14_0,tmp0                       sync: 0  dead: 1  pref=0xf8
 mov_i32 r14_1,tmp1                       sync: 0  dead: 1  pref=0xf8
 call cc_compute_c,$0x5,$2,cc_src_0,cc_src_1,cc_dst_0,cc_dst_1,cc_src_0,cc_src_1,cc_src2_0,cc_src2_1,cc_op  sync: 0 1  dead: 0 1 2 3 4 5 6 7 8  pref=none,none
 mov_i32 cc_dst_0,r14_0                   sync: 0  dead: 0 1  pref=0xff
 mov_i32 cc_dst_1,r14_1                   sync: 0  dead: 0 1  pref=0xffУбито

(killed by me)

==================

IN:
0xffffffff810e8a61:  eb ef                    jmp      0xffffffff810e8a52

OP:
 ld_i32 tmp18,env,$0xfffffff0
 movi_i32 tmp19,$0x0
 brcond_i32 tmp18,tmp19,lt,$L0

---- ffffffff810e8a61 0000000000000000
 goto_tb $0x0
 movi_i32 tmp6,$0x810e8a52
 movi_i32 tmp7,$0xffffffff
 st_i32 tmp6,env,$0x80
 st_i32 tmp7,env,$0x84
 exit_tb $0xf2f22900
 set_label $L0
 exit_tb $0xf2f22903

---- ffffffff810e8a61 0000000000000000
 goto_tb $0x0
 movi_i32 tmp6,$0x810e8a52                pref=0xff
 movi_i32 tmp7,$0xffffffff                pref=0xff
 st_i32 tmp6,env,$0x80                    dead: 0
 st_i32 tmp7,env,$0x84                    dead: 0 1
 exit_tb $0xf2f22900
 set_label $L0
 exit_tb $0xf2f22903

OUT: [size=56]
0xf2f22980:  8b 5d f0                 movl     -0x10(%ebp), %ebx
0xf2f22983:  85 db                    testl    %ebx, %ebx
0xf2f22985:  0f 8c 23 00 00 00        jl       0xf2f229ae
0xf2f2298b:  e9 00 00 00 00           jmp      0xf2f22990
0xf2f22990:  c7 85 80 00 00 00 52 8a  movl     $0x810e8a52, 0x80(%ebp)
0xf2f22998:  0e 81
0xf2f2299a:  c7 85 84 00 00 00 ff ff  movl     $0xffffffff, 0x84(%ebp)
0xf2f229a2:  ff ff
0xf2f229a4:  b8 00 29 f2 f2           movl     $0xf2f22900, %eax
0xf2f229a9:  e9 69 46 c9 ff           jmp      0xf2bb7017
0xf2f229ae:  b8 03 29 f2 f2           movl     $0xf2f22903, %eax
0xf2f229b3:  e9 5f 46 c9 ff           jmp      0xf2bb7017

----------------
IN:
0xffffffff810e8a52:  49 ff ce                 decq     %r14
0xffffffff810e8a55:  74 0c                    je       0xffffffff810e8a63

OP:
 ld_i32 tmp18,env,$0xfffffff0
 movi_i32 tmp19,$0x0
 brcond_i32 tmp18,tmp19,lt,$L0

---- ffffffff810e8a52 0000000000000000
 mov_i32 tmp0,r14_0
 mov_i32 tmp1,r14_1
 mov_i32 tmp0,r14_0
 mov_i32 tmp1,r14_1
 movi_i32 tmp20,$0xffffffff
 movi_i32 tmp21,$0xffffffff
 add2_i32 tmp0,tmp1,tmp0,tmp1,tmp20,tmp21
 mov_i32 r14_0,tmp0
 mov_i32 r14_1,tmp1
 call cc_compute_c,$0x5,$2,cc_src_0,cc_src_1,cc_dst_0,cc_dst_1,cc_src_0,cc_src_1,cc_src2_0,cc_src2_1,cc_op
 mov_i32 cc_dst_0,tmp0
 mov_i32 cc_dst_1,tmp1
 discard cc_src2_0
 discard cc_src2_1
 discard cc_op

---- ffffffff810e8a55 0000000000000021 mov_i32 cc_dst_0,r14_0                   sync: 0  dead: 1  pref=0xff
 mov_i32 cc_dst_1,r14_1                   sync: 0  dead: 1  pref=0xff
 discard cc_src2_0                        pref=0xff
 discard cc_src2_1                        pref=0xff
 discard cc_op                            pref=0xff

---- ffffffff810e8a55 0000000000000021
 movi_i32 cc_op,$0x21                     sync: 0  dead: 0  pref=0xff
 movi_i32 tmp20,$0x0                      pref=0xff
 movi_i32 tmp21,$0x0                      pref=0xff
 brcond2_i32 cc_dst_0,cc_dst_1,tmp20,tmp21,eq,$L1  dead: 0 1 2 3
 goto_tb $0x0
 movi_i32 tmp6,$0x810e8a57                pref=0xff
 movi_i32 tmp7,$0xffffffff                pref=0xff
 st_i32 tmp6,env,$0x80                    dead: 0
 st_i32 tmp7,env,$0x84                    dead: 0 1
 exit_tb $0xf2f229c0
 set_label $L1
 goto_tb $0x1
 movi_i32 tmp6,$0x810e8a63                pref=0xff movi_i32 cc_op,$0x9                      sync: 0  dead: 0  pref=0xff
 goto_tb $0x0
 movi_i32 tmp6,$0x810e8a2b                pref=0xff
 movi_i32 tmp7,$0xffffffff                pref=0xff
 st_i32 tmp6,env,$0x80                    dead: 0
 st_i32 tmp7,env,$0x84                    dead: 0 1
 exit_tb $0xf2f22b40
 set_label $L0
 exit_tb $0xf2f22b43

OUT: [size=116]
0xf2f22bc0:  8b 5d f0                 movl     -0x10(%ebp), %ebx
0xf2f22bc3:  85 db                    testl    %ebx, %ebx
0xf2f22bc5:  0f 8c 5f 00 00 00        jl       0xf2f22c2a
0xf2f22bcb:  8b 5d 18                 movl     0x18(%ebp), %ebx
0xf2f22bce:  8b 75 1c                 movl     0x1c(%ebp), %esi
0xf2f22bd1:  83 c3 64                 addl     $0x64, %ebx
0xf2f22bd4:  83 d6 00                 adcl     $0, %esi
0xf2f22bd7:  89 5d 18                 movl     %ebx, 0x18(%ebp)
Убито

=============================

try kernel I use (it works with qemu compiiled under 64-bit Slackware, and also with kvm on 32-bit x86)

sha256sum /boot/bzImage-4.12.0-x64
b4183376de17e8ea7a25094b7a526e99bcb8339b8703090684c93e0e0a50d284  /boot/bzImage-4.12.0-x64

Revision history for this message

Laszlo Ersek (Red Hat) (lersek) wrote on 2019-06-03: Re: [Qemu-devel] [RFC PATCH] cputlb: use uint64_t for interim values for unaligned load

#11

(+Igor)

On 06/03/19 17:01, Alex Bennée wrote:
> When running on 32 bit TCG backends a wide unaligned load ends up
> truncating data before returning to the guest. We specifically have
> the return type as uint64_t to avoid any premature truncation so we
> should use the same for the interim types.
>
> Hopefully fixes #1830872
>
> Signed-off-by: Alex Bennée <email address hidden>
> ---
> accel/tcg/cputlb.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index cdcc3771020..b796ab1cbea 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -1303,7 +1303,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
> && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1
> >= TARGET_PAGE_SIZE)) {
> target_ulong addr1, addr2;
> - tcg_target_ulong r1, r2;
> + uint64_t r1, r2;
> unsigned shift;
> do_unaligned_access:
> addr1 = addr & ~(size - 1);
>

Applied on top of commit ad88e4252f09c2956b99c90de39e95bab2e8e7af:

Tested-by: Laszlo Ersek <email address hidden>

Thanks!
Laszlo

Revision history for this message

Alex Bennée (ajbennee) wrote on 2019-06-04: Re: [RFC PATCH] cputlb: use uint64_t for interim values for unaligned load

#12

Andrew Randrianasulu <email address hidden> writes:

> В сообщении от Monday 03 June 2019 18:01:20 Alex Bennée написал(а):
>> When running on 32 bit TCG backends a wide unaligned load ends up
>> truncating data before returning to the guest. We specifically have
>> the return type as uint64_t to avoid any premature truncation so we
>> should use the same for the interim types.
>>
>> Hopefully fixes #1830872
>>
>> Signed-off-by: Alex Bennée <email address hidden>
>> ---
>> accel/tcg/cputlb.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
>> index cdcc3771020..b796ab1cbea 100644
>> --- a/accel/tcg/cputlb.c
>> +++ b/accel/tcg/cputlb.c
>> @@ -1303,7 +1303,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>> && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1
>> >= TARGET_PAGE_SIZE)) {
>> target_ulong addr1, addr2;
>> - tcg_target_ulong r1, r2;
>> + uint64_t r1, r2;
>> unsigned shift;
>> do_unaligned_access:
>> addr1 = addr & ~(size - 1);
>
> Unfortunatly, this doesn't fix 32-bit qemu-system-x86_64 .... so, my
> bug is separate from #1830872 ?

I think you've hit two - one of which we have just fixed. With my
expanded memory test on i386 I'm seeing a hang but it's ok @
pull-demacro-softmmu-100519-1. Unfortunately bisecting through the slirp
move and other i386 Werror stuff is proving painful.

>
> I also was unable to convince qemu to use my kernel-only x86_64 gcc 6.5.0 cross-compiler ..
> probably x86-64 testing on i686 requires either docker (I don't have this
> ) or 'real' cross-compiler (build with glibc support).

--
Alex Bennée

Revision history for this message

Igor (imammedo) wrote on 2019-06-04: Re: [Qemu-devel] [RFC PATCH] cputlb: use uint64_t for interim values for unaligned load

#13

On Mon, 3 Jun 2019 16:01:20 +0100
Alex Bennée <email address hidden> wrote:

> When running on 32 bit TCG backends a wide unaligned load ends up
> truncating data before returning to the guest. We specifically have
> the return type as uint64_t to avoid any premature truncation so we
> should use the same for the interim types.
>
> Hopefully fixes #1830872
>
> Signed-off-by: Alex Bennée <email address hidden>

Fixes arm/virt bios-tables-test for me, so

Tested-by: Igor Mammedov <email address hidden>

> ---
> accel/tcg/cputlb.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index cdcc3771020..b796ab1cbea 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -1303,7 +1303,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
> && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1
> >= TARGET_PAGE_SIZE)) {
> target_ulong addr1, addr2;
> - tcg_target_ulong r1, r2;
> + uint64_t r1, r2;
> unsigned shift;
> do_unaligned_access:
> addr1 = addr & ~(size - 1);