Linaro GCC

Failure to use ARMv6 / Cortex-M4 DSP MAC instructions

Bug #643479 reported by Andrew Stubbs on 2010-09-20

This bug affects 2 people

Affects              Status        Importance  Assigned to    Milestone
Linaro GCC           Fix Released  Undecided   Andrew Stubbs
Linaro GCC Tracking  Fix Released  Undecided   Andrew Stubbs  Linaro GCC Tracking 4.7.0

Bug Description

GCC does not produce optimal code for 16-bit multiply-and-accumulate operations, where the DSP multiply-and-accumulate instructions (SMLAxy, SMLALxy) would be very useful.

Consider this test code:

   int footrunc (int x, int a, int b)
   {
     return x + (short) a * (short) b;
   }

   int fooshort (int x, short *a, short *b)
   {
     return x + *a * *b;
   }

   long long foolong (long long x, short *a, short *b)
   {
     return x + *a * *b;
   }

Compile as follows:

gcc -S test.c -O2 -mcpu=cortex-a8

With the current Linaro GCC 4.5, we get this output:

  footrunc:
        uxth r1, r1
        uxth r2, r2
        smlabb r0, r2, r1, r0
        bx lr

  fooshort:
        ldrh r3, [r1, #0]
        ldrh r2, [r2, #0]
        smlabb r0, r2, r3, r0
        bx lr

  foolong:
        ldrh r2, [r2, #0]
        push {r4}
        .save {r4}
        ldrh r4, [r3, #0]
        smulbb r4, r4, r2
        adds r2, r0, r4
        mov r0, r2
        adc r3, r1, r4, asr #31
        mov r1, r3
        pop {r4}
        bx lr

Upstream GCC 4.6 is a bit better:

  footrunc:
        uxth r1, r1
        uxth r2, r2
        smlabb r0, r1, r2, r0
        bx lr

  fooshort:
        ldrh r1, [r1, #0]
        ldrh r3, [r2, #0]
        smlabb r0, r1, r3, r0
        bx lr

  foolong:
        ldrh r2, [r2, #0]
        ldrh r3, [r3, #0]
        smulbb r3, r2, r3
        adds r0, r0, r3
        adc r1, r1, r3, asr #31
        bx lr
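
To make the cost of the foolong sequence above concrete: the adds / adc ... asr #31 pair is a hand-written 64-bit accumulate of a 32-bit signed product. The following C sketch models that pair next to a plain 64-bit add. It is only an illustration; the names model_adds_adc, model_add64 and check_equivalent are made up for this note and do not come from GCC or from any patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of:
 *     adds r0, r0, r3
 *     adc  r1, r1, r3, asr #31
 * where r1:r0 holds the 64-bit accumulator and r3 the 32-bit product. */
uint64_t model_adds_adc(uint32_t lo, uint32_t hi, int32_t prod)
{
    uint32_t p = (uint32_t)prod;
    uint32_t sign = (prod < 0) ? 0xFFFFFFFFu : 0u; /* r3, asr #31 */
    uint32_t new_lo = lo + p;                      /* adds: low word */
    uint32_t carry = (new_lo < lo) ? 1u : 0u;      /* carry out of adds */
    uint32_t new_hi = hi + sign + carry;           /* adc: high word */
    return ((uint64_t)new_hi << 32) | new_lo;
}

/* The same operation written as one 64-bit add of the sign-extended product. */
uint64_t model_add64(uint64_t acc, int32_t prod)
{
    return acc + (uint64_t)(int64_t)prod;
}

/* Hypothetical self-check: both formulations agree for the given inputs. */
void check_equivalent(uint64_t acc, int32_t prod)
{
    uint32_t lo = (uint32_t)acc;
    uint32_t hi = (uint32_t)(acc >> 32);
    assert(model_adds_adc(lo, hi, prod) == model_add64(acc, prod));
}
```

Both functions compute the same value, so the multiply followed by these two instructions is doing work that a single long multiply-accumulate instruction could do in one step.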

But the ideal output *should* be this:

  footrunc:
    @ The uxth instructions GCC generates are redundant.
    smlabb r0, r1, r2, r0
    bx lr

  fooshort:
    @ GCC gets this right (register allocation differences should be harmless).
    ldrh r1, [r1]
    ldrh r2, [r2]
    smlabb r0, r1, r2, r0
    bx lr

  foolong:
    @ GCC does not use the long-accumulate version.
    ldrh r2, [r2]
    ldrh r3, [r3]
    smlalbb r0, r1, r2, r3
    bx lr
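
As a rough illustration of why the uxth instructions are redundant, the C sketch below models the arithmetic of smlabb and smlalbb as described in the ARM architecture documentation: both read only the bottom 16 bits of each multiplicand and treat them as signed, so whatever sits in the top halfword cannot affect the result. The helper names (sext16, model_uxth, model_smlabb, model_smlalbb, check_uxth_redundant) are hypothetical and exist only for this example, and the Q flag that smlabb sets on accumulate overflow is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the bottom 16 bits of a register value. */
int32_t sext16(uint32_t v)
{
    v &= 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* uxth: zero-extend the bottom halfword. */
uint32_t model_uxth(uint32_t v)
{
    return v & 0xFFFFu;
}

/* smlabb rd, rn, rm, ra:
 * rd = ra + sext(rn[15:0]) * sext(rm[15:0]), wrapping modulo 2^32. */
uint32_t model_smlabb(uint32_t rn, uint32_t rm, uint32_t ra)
{
    int32_t prod = sext16(rn) * sext16(rm); /* magnitude <= 2^30, no overflow */
    return ra + (uint32_t)prod;
}

/* smlalbb rdlo, rdhi, rn, rm:
 * the 64-bit value rdhi:rdlo += sext(rn[15:0]) * sext(rm[15:0]). */
uint64_t model_smlalbb(uint64_t acc, uint32_t rn, uint32_t rm)
{
    int32_t prod = sext16(rn) * sext16(rm);
    return acc + (uint64_t)(int64_t)prod;
}

/* Hypothetical self-check: zero-extending the inputs first never changes
 * the smlabb result, which is why the uxth instructions can be dropped. */
void check_uxth_redundant(uint32_t a, uint32_t b, uint32_t x)
{
    assert(model_smlabb(model_uxth(a), model_uxth(b), x)
           == model_smlabb(a, b, x));
}
```

For example, check_uxth_redundant(0x12348000u, 2u, 10u) passes even though the top halfword of the first argument is non-zero.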

[CodeSourcery Tracker ID #8610]

Related branches

lp:gcc-linaro/4.5

lp:~ams-codesourcery/gcc-linaro/smlalbb-4.6

Rejected for merging into lp:gcc-linaro/4.6

Linaro Toolchain Developers: Pending requested 2011-01-11

Michael Hope (michaelh1) on 2010-09-28

tags:	added: speed task

Revision history for this message

Andrew Stubbs (ams-codesourcery) wrote on 2010-11-24:

Note that the above is true for Cortex-A8, but for some other cores the decision might go another way (e.g. Cortex-R4 should prefer MLA over SMLAxy). Care should be taken not to make the changes unconditional.

Andrew Stubbs (ams-codesourcery) wrote on 2010-11-25:

I'm currently testing a patch for these problems.

Changed in gcc-linaro:
assignee:	nobody → Andrew Stubbs (ams-codesourcery)

Andrew Stubbs (ams-codesourcery) wrote on 2010-12-09:

I've posted the first patch upstream here:
http://old.nabble.com/-patch--ARM--Don't-generate-redundant-zero_extend-before-smlabb-to30417908.html

And the second patch here:
http://old.nabble.com/-patch--ARM--Fix-16-bit--%3E-64-bit-multiply-and-accumulate-to30418128.html

Changed in gcc-linaro:
status:	New → In Progress

Andrew Stubbs (ams-codesourcery) wrote on 2011-01-05:

The first patch has now been approved and committed to GCC 4.6.

Still awaiting any word on the other one.

Andrew Stubbs (ams-codesourcery) wrote on 2011-01-06:

Related: lp:gcc-linaro/4.5,revno=99454

Changed in gcc-linaro-tracking:
milestone:	none → 4.7.0
status:	New → In Progress

Andrew Stubbs (ams-codesourcery) on 2011-01-06

Changed in gcc-linaro-tracking:
assignee:	nobody → Andrew Stubbs (ams-codesourcery)

Andrew Stubbs (ams-codesourcery) on 2011-01-28

tags:	added: 46merge

Andrew Stubbs (ams-codesourcery) wrote on 2011-04-15:

The second patch is still stuck. Richard Earnshaw has reviewed the patch and raised some concerns. I have looked at the problem, but have not yet found time to fix it. I did post a query, but Richard did not reply, so I'll ping him now.

Ramana Radhakrishnan (ramana) on 2012-05-28

Changed in gcc-linaro:
status:	In Progress → Fix Released
Changed in gcc-linaro-tracking:
status:	In Progress → Fix Released
