Failure to use ARMv6 / Cortex-M4 DSP MAC instructions

Bug #643479 reported by Andrew Stubbs
This bug affects 2 people
Affects              Status        Importance  Assigned to    Milestone
Linaro GCC           Fix Released  Undecided   Andrew Stubbs
Linaro GCC Tracking  Fix Released  Undecided   Andrew Stubbs

Bug Description

GCC does not produce optimal code for 16-bit multiply-accumulate operations, where the DSP multiply-and-accumulate instructions (SMLAxy, SMLALxy) would be a direct fit.

Consider this test code:

   int footrunc (int x, int a, int b)
   {
     return x + (short) a * (short) b;
   }

   int fooshort (int x, short *a, short *b)
   {
     return x + *a * *b;
   }

   long long foolong (long long x, short *a, short *b)
   {
     return x + *a * *b;
   }

Compile as follows:

   gcc -S test.c -O2 -mcpu=cortex-a8

With the current Linaro GCC 4.5, we get this output:

  footrunc:
        uxth r1, r1
        uxth r2, r2
        smlabb r0, r2, r1, r0
        bx lr

  fooshort:
        ldrh r3, [r1, #0]
        ldrh r2, [r2, #0]
        smlabb r0, r2, r3, r0
        bx lr

  foolong:
        ldrh r2, [r2, #0]
        push {r4}
        .save {r4}
        ldrh r4, [r3, #0]
        smulbb r4, r4, r2
        adds r2, r0, r4
        mov r0, r2
        adc r3, r1, r4, asr #31
        mov r1, r3
        pop {r4}
        bx lr

Upstream GCC 4.6 is a bit better:

  footrunc:
        uxth r1, r1
        uxth r2, r2
        smlabb r0, r1, r2, r0
        bx lr

  fooshort:
        ldrh r1, [r1, #0]
        ldrh r3, [r2, #0]
        smlabb r0, r1, r3, r0
        bx lr

  foolong:
        ldrh r2, [r2, #0]
        ldrh r3, [r3, #0]
        smulbb r3, r2, r3
        adds r0, r0, r3
        adc r1, r1, r3, asr #31
        bx lr

But the ideal output *should* be this:

  footrunc:
    @ The uxth instructions GCC generates are redundant.
    smlabb r0, r1, r2, r0
    bx lr

  fooshort:
    @ GCC gets this right (register allocation differences should be harmless).
    ldrh r1, [r1]
    ldrh r2, [r2]
    smlabb r0, r1, r2, r0
    bx lr

  foolong:
    @ GCC does not use the long-accumulate version.
    ldrh r2, [r2]
    ldrh r3, [r3]
    smlalbb r0, r1, r2, r3
    bx lr
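
To make it easier to see why the uxth instructions are redundant and why the long-accumulate form is equivalent, here is a rough C model of the two instructions. It is only a sketch: the function names (sext16, model_uxth, model_smlabb, model_smlalbb) are made up for this illustration, registers are modelled as unsigned values, and the Q flag that SMLABB sets on accumulate overflow is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the bottom halfword of a register value (illustrative helper). */
int32_t sext16(uint32_t r)
{
    int32_t v = (int32_t)(r & 0xFFFFu);
    return (v ^ 0x8000) - 0x8000;
}

/* uxth: zero-extend the bottom halfword, clearing the top 16 bits. */
uint32_t model_uxth(uint32_t r)
{
    return r & 0xFFFFu;
}

/* smlabb rd, rn, rm, ra: rd = ra + (signed bottom half of rn) *
   (signed bottom half of rm). The top halves of rn and rm are ignored.
   The result is returned as raw 32-bit register contents. */
uint32_t model_smlabb(uint32_t rn, uint32_t rm, uint32_t ra)
{
    int32_t prod = sext16(rn) * sext16(rm);   /* always fits in 32 bits */
    return ra + (uint32_t)prod;
}

/* smlalbb rdlo, rdhi, rn, rm: the 64-bit value rdhi:rdlo is incremented by
   the sign-extended 16x16 product. */
uint64_t model_smlalbb(uint64_t acc, uint32_t rn, uint32_t rm)
{
    int64_t prod = (int64_t)sext16(rn) * sext16(rm);
    return acc + (uint64_t)prod;
}

/* Because only the bottom halfwords are read, applying uxth to the
   operands first cannot change the result. */
void check_uxth_is_redundant(uint32_t rn, uint32_t rm, uint32_t ra)
{
    assert(model_smlabb(model_uxth(rn), model_uxth(rm), ra)
           == model_smlabb(rn, rm, ra));
}
```

In this model, model_smlalbb computes the same 64-bit sum as the smulbb / adds / adc sequence in the GCC 4.6 output for foolong, which is why a single smlalbb can replace those three instructions.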

[CodeSourcery Tracker ID #8610]

Michael Hope (michaelh1)
tags: added: speed task
Andrew Stubbs (ams-codesourcery) wrote :

Note that the above is true for Cortex-A8, but for some other cores the decision might go another way (e.g. Cortex-R4 should prefer MLA over SMLAxy). Care should be taken not to make the changes unconditional.
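
As a rough illustration of what "not unconditional" means here, the choice could be keyed on the core being tuned for. The sketch below is purely hypothetical: the enum, its members and example_prefer_smlaxy are invented names and do not reflect how GCC's ARM backend actually represents tuning decisions. The default for cores not listed is also an assumption (leave the existing behaviour alone).

```c
#include <stdbool.h>

/* Hypothetical core identifiers, for illustration only. */
enum example_core {
    EXAMPLE_CORTEX_A8,
    EXAMPLE_CORTEX_R4,
    EXAMPLE_OTHER
};

/* Returns true when a 16x16 multiply-accumulate should be emitted as
   SMLAxy rather than as MLA on already-extended operands. */
bool example_prefer_smlaxy(enum example_core core)
{
    switch (core) {
    case EXAMPLE_CORTEX_A8:
        return true;    /* the case described in this report */
    case EXAMPLE_CORTEX_R4:
        return false;   /* MLA is preferred on this core */
    default:
        return false;   /* assumption: do not change other cores */
    }
}
```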

Andrew Stubbs (ams-codesourcery) wrote :

I'm currently testing a patch for these problems.

Changed in gcc-linaro:
assignee: nobody → Andrew Stubbs (ams-codesourcery)
Andrew Stubbs (ams-codesourcery) wrote :
Changed in gcc-linaro:
status: New → In Progress
Andrew Stubbs (ams-codesourcery) wrote :

The first patch has now been approved and committed to GCC 4.6.

Still awaiting any word on the other one.

Andrew Stubbs (ams-codesourcery) wrote :
Changed in gcc-linaro-tracking:
milestone: none → 4.7.0
status: New → In Progress
Changed in gcc-linaro-tracking:
assignee: nobody → Andrew Stubbs (ams-codesourcery)
tags: added: 46merge
Andrew Stubbs (ams-codesourcery) wrote :

The second patch is still stuck. Richard Earnshaw has reviewed the patch and come up with some concerns. I have looked at the problem, but not yet found time to fix it. I did post a query, but Richard did not reply. I'll ping him now.

Changed in gcc-linaro:
status: In Progress → Fix Released
Changed in gcc-linaro-tracking:
status: In Progress → Fix Released