Bad Neon intrinsics code gen when using ld4/st4 on AArch64

Bug #1234146 reported by Matthew Gretton-Dann
This bug affects 1 person
Affects: Linaro GCC
Status: Confirmed
Importance: Undecided
Assigned to: Michael Collison

Bug Description

The attached test case produces the following code for arm-none-eabi:

gcc -S -o- -mcpu=cortex-a9 -mfpu=neon -mfloat-abi=hard /tmp/t.c:

test4:
        add r1, r0, r1
        vmov.i32 d24, #0 @ v8qi
        cmp r0, r1
        bxeq lr
.L7:
        vld4.8 {d20-d23}, [r0]
        vadd.i8 d25, d24, d20
        vmov d16, d25 @ v8qi
        vadd.i8 d25, d25, d21
        vmov d17, d25 @ v8qi
        vadd.i8 d25, d25, d22
        vadd.i8 d24, d25, d23
        vmov d18, d25 @ v8qi
        vmov d19, d24 @ v8qi
        vst4.32 {d16[0], d17[0], d18[0], d19[0]}, [r0]!
        cmp r1, r0
        bne .L7
        bx lr

(This is not perfect, but the extraneous vmovs are understood and are being investigated elsewhere.)
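The attached test case is not reproduced here. For reference, the following portable C sketch reads the arm-none-eabi loop back into scalar form. It is a hypothetical model of what the listing computes, not the attached source. The name `test4_model` and its signature are assumptions: a byte pointer plus a 32-bit length, as suggested by the `uxtw` in the AArch64 prologue. Each iteration does four things:

- De-interleaves 32 bytes, as `vld4.8` does.
- Forms four chained byte-wise sums (the `vadd.i8` sequence), whose last value carries into the next iteration in `d24`.
- Stores 32-bit lane 0 of each sum back to back, as `vst4.32 {d16[0], d17[0], d18[0], d19[0]}` does.
- Advances the pointer by 16 bytes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of the arm-none-eabi loop; not the attached
 * test case. Assumes len is a multiple of 16 (otherwise the loop, like the
 * listing, never reaches end) and that 16 bytes past p + len are readable,
 * because each iteration reads 32 bytes but advances by only 16.
 * Lane-to-byte mapping assumes little-endian, as on arm-none-eabi. */
void test4_model(uint8_t *p, uint32_t len)
{
    uint8_t *end = p + len;
    uint8_t acc[8] = {0};              /* vmov.i32 d24, #0 */

    while (p != end) {
        uint8_t v[4][8];               /* d20-d23 after vld4.8 */
        uint8_t s[4][8];               /* d16-d19: the four running sums */

        /* vld4.8: byte 4*i + j of the input goes to lane i of register j. */
        for (int i = 0; i < 8; i++)
            for (int j = 0; j < 4; j++)
                v[j][i] = p[4 * i + j];

        /* vadd.i8 chain, modulo 256: s0 = acc + v0, s1 = s0 + v1, ... */
        for (int i = 0; i < 8; i++) {
            uint8_t run = acc[i];
            for (int j = 0; j < 4; j++) {
                run = (uint8_t)(run + v[j][i]);
                s[j][i] = run;
            }
            acc[i] = run;              /* d24 carries s3 into the next iteration */
        }

        /* vst4.32 lane 0: the low 32 bits (byte lanes 0-3) of each sum
         * register, written consecutively to p[0..15]. */
        for (int j = 0; j < 4; j++)
            memcpy(p + 4 * j, s[j], 4);

        p += 16;                       /* post-increment of [r0]! */
    }
}
```

For example, with a 48-byte buffer holding 0..47, `test4_model(buf, 32)` runs two iterations. It leaves `buf[0..15]` as 0, 4, 8, 12, 1, 9, 17, 25, 3, 15, 27, 39, 6, 22, 38, 54. The bytes from `buf[32]` onward stay untouched.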

For aarch64-none-elf, it produces:

aarch64-none-elf-gcc -S -o- /tmp/t.c -O3:

test4:
        add x1, x0, x1, uxtw
        cmp x0, x1
        sub sp, sp, #96
        beq .L1
        movi v0.2s, 0
        add x4, sp, 8
        add x3, sp, 16
        add x2, sp, 24
.L3:
        ld4 {v1.8b - v4.8b}, [x0]
        add x5, sp, 32
        st1 {v1.16b - v4.16b}, [x5]
        ld1 {v3.8b}, [x5]
        add x5, sp, 48
        add v3.8b, v0.8b, v3.8b
        ld1 {v2.8b}, [x5]
        add x5, sp, 64
        ld1 {v1.8b}, [x5]
        add v2.8b, v2.8b, v3.8b
        add x5, sp, 80
        add v1.8b, v1.8b, v2.8b
        ld1 {v0.8b}, [x5]
        add v0.8b, v0.8b, v1.8b
        st1 {v3.8b}, [sp]
        st1 {v2.8b}, [x4]
        st1 {v1.8b}, [x3]
        st1 {v0.8b}, [x2]
        // Start of user assembly
// 15030 "/work/builds/gcc-fsf-master/tools/lib/gcc/aarch64-none-elf/4.9.0/include/arm_neon.h" 1
        ld1 {v16.2s - v19.2s}, [sp]
        st4 {v16.s - v19.s}[0], [x0]

// 0 "" 2
        // End of user assembly
        add x0, x0, 16
        cmp x1, x0
        bne .L3
.L1:
        add sp, sp, 96
        ret
        .size test4, .-test4
        .ident "GCC: (GNU) 4.9.0 20130930 (experimental)"

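This problem occurs in both Linaro GCC 4.8 and FSF trunk. The AArch64 code has significantly more loads and stores. Its loop contains six loads and six stores, compared with one of each (vld4.8 and vst4.32) on ARM, and the function reserves 96 bytes of stack. The ld4 result is first written to the stack with st1 and read back one register at a time with ld1. The four sums are then stored to the stack again, so that the st4 in the arm_neon.h inline assembly can reload them with ld1.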

Matthew Gretton-Dann (matthew-gretton-dann) wrote:
Viktor (vchong)
Changed in gcc-linaro:
status: New → Confirmed
Changed in gcc-linaro:
assignee: nobody → Michael Collison (michael-collison)
Christophe Lyon (christophe-lyon) wrote: