Unnecessary code is generated when using 8 bit and 16 bit variables

Bug #1806157 reported by Harjit Singh on 2018-12-01
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
GNU Arm Embedded Toolchain
Undecided
Unassigned

Bug Description

Using gcc 7.2.1 for ARM thumb with -O3, I see unnecessary instructions being emitted. I tried it on gcc 8.2 and see the same output.

Code is shown below and can be found here: https://godbolt.org/z/AgFwNE

void shortMult()
{
    volatile short int a;
    volatile short int b;

    a *= b;
}

void charMult()
{
    volatile char x;
    volatile char y;

    x *= y;
}

void shortIntMult(void)
{
    volatile short int a;
    volatile int b;

    b *= a;
}

This generates:
shortMult():
        sub sp, sp, #8
        ldrh r1, [sp, #6]
        ldrh r2, [sp, #4]
        mul r3, r2, r1
        lsl r3, r3, #16
        asr r3, r3, #16
        strh r3, [sp, #4] @ movhi
        add sp, sp, #8
        bx lr
charMult():
        sub sp, sp, #8
        ldrb r2, [sp, #7] @ zero_extendqisi2
        ldrb r1, [sp, #6] @ zero_extendqisi2
        mul r3, r2, r1
        and r3, r3, #255
        strb r3, [sp, #6]
        add sp, sp, #8
        bx lr
shortIntMult():
        sub sp, sp, #8
        ldrh r3, [sp, #2]
        ldr r2, [sp, #4]
        lsl r1, r3, #16
        asr r1, r1, #16
        mul r3, r2, r1
        str r3, [sp, #4]
        add sp, sp, #8
        bx lr

Looking at shortMult():
The strh r3, [sp, #4] writes out 16 bits. There is no need to do the lsl and asr - they don't modify the bits we write out.

Looking at charMult():
Similar issue as above - strb r3, [sp, #6] writes out 8 bits. There is no need to do the and r3, r3, #255 - it doesn't modify the bits we write out.

Looking at shortIntMult():
a is loaded into r3 using ldrh and then sign extended using lsl r1, r3, #16 and asr r1, r1, #16
a can be loaded with sign extension using ldrsh

Thank you.

Changed in gcc-arm-embedded:
status: New → Confirmed

Confirmed. I've compiled for Armv7E-M where smullbb is used but otherwise the same happens. The middle end seemed to do something to do something funky already since it converted operand to unsigned short and then back to short int for the result. However changing a and b to unsigned short does not change the assembly (it does improve gimple).

Not sure what's happening but expand looks particularly dumb:

(insn 12 11 13 2 (set (reg:HI 117)
        (subreg:HI (reg:SI 118) 0)) "test.c":6 -1
     (expr_list:REG_EQUAL (mult:HI (subreg/s/v:HI (reg:SI 110 [ a.1_1 ]) 0)
            (subreg/s/v:HI (reg:SI 112 [ b.0_4 ]) 0))
        (nil)))
(insn 13 12 14 2 (set (reg:SI 111 [ _2 ])
        (zero_extend:SI (reg:HI 117))) "test.c":6 -1
     (nil))
(insn 14 13 15 2 (set (reg:HI 119)
        (subreg/s/v:HI (reg:SI 111 [ _2 ]) 0)) "test.c":6 -1
     (nil))
(insn 15 14 0 2 (set (mem/v/c:HI (plus:SI (reg/f:SI 105 virtual-stack-vars)
                (const_int -4 [0xfffffffffffffffc])) [1 a+0 S2 A32])
        (reg:HI 119)) "test.c":6 -1
     (nil))

So it thinks it can do a 16bit multiply (later the compiler realizes that smullbb produces a 32bit register so the multiplication dest becomes SImode), then extend it to SI and reduce it to HI in the next instruction. This is probably unrelated though as the zero_extend gets removed as it figures out smulbb produces a 32bit output. The real problem is that GCC does not see the subreg as redundant.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers