Improve constant handling of thumb2 instructions

Bug #663939 reported by Yao Qi
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Linaro GCC
Fix Released
Undecided
Andrew Stubbs
gcc
Fix Released
Wishlist

Bug Description

This bug is reported in upstreams GCC, and the problem exists in Linaro GCC 4.5 also.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46092

Compile the following code with options -march=armv7-a -mthumb -Os

unsigned long tv(unsigned long x)
{
  if (x >= 65521)
    x = x - 65521;
  return x;
}

gcc generates:

tv:
        movw r3, #65520
        cmp r0, r3
        ittt hi
        subhi r0, r0, #65024 // A
        subhi r0, r0, #496 // B
        subhi r0, r0, #1 // C
        bx lr

Notice that (x = x - 65521) is translated into 3 instructions. Actually 65521
can be loaded into register with one instruction. So the shorter result is:

tv:
        movw r3, #65520
        cmp r0, r3
        ittt hi
        movwhi r1, #65521
        subhi r0, r1
        bx lr

This should be improved by function arm_gen_constant. The function is already
commented with
/* ??? This needs more work for thumb2. */

Tags: size speed task

Related branches

Revision history for this message
Yao Qi (yao-codesourcery) wrote :

This bug is reported in upstreams GCC, and the problem exists in Linaro GCC 4.5 also.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46092

Compile the following code with options -march=armv7-a -mthumb -Os

unsigned long tv(unsigned long x)
{
  if (x >= 65521)
    x = x - 65521;
  return x;
}

gcc generates:

tv:
        movw r3, #65520
        cmp r0, r3
        ittt hi
        subhi r0, r0, #65024 // A
        subhi r0, r0, #496 // B
        subhi r0, r0, #1 // C
        bx lr

Notice that (x = x - 65521) is translated into 3 instructions. Actually 65521
can be loaded into register with one instruction. So the shorter result is:

tv:
        movw r3, #65520
        cmp r0, r3
        ittt hi
        movwhi r1, #65521
        subhi r0, r1
        bx lr

This should be improved by function arm_gen_constant. The function is already
commented with
/* ??? This needs more work for thumb2. */

Revision history for this message
In , Ibolton (ibolton) wrote :

Confirmed on trunk, revision 165513.

In your example, another way would be to subtract r3 (#65520) and then #1.

Note: neither -O3 or -O2 can avoid the three subhi instruction, currently.

Revision history for this message
Andrew Stubbs (ams-codesourcery) wrote :

Another constant problem: thumb2 replicated constants of the form 0xXY00XY00 are rejected incorrectly.

I've posted a patch for this here:
 http://gcc.gnu.org/ml/gcc-patches/2010-10/msg02338.html

Revision history for this message
Andrew Stubbs (ams-codesourcery) wrote :

Another problem: SUBW and ADDW are not supported. These support 12-bit constants, and are therefore useful. Note that MOVW is supported (and that that has a 16-bit constant), so is not the same.

Revision history for this message
Andrew Stubbs (ams-codesourcery) wrote :

Another problem: arm_gen_constant can build constants from multiple immediates, but only by means of the 8-bit shifted format immediates. Using the replicated constants (of the form 0xXYXYXYXY, 0xXY00XY00 and 0x00XY00XY) would provide extra opportunities for optimal instruction sequences.

This would be much less straightforward, however.

Revision history for this message
Andrew Stubbs (ams-codesourcery) wrote :

Another size optimization opportunity (I think): currently the compiler always(?) does 32-bit constant loads to registers by using a movw/movt. This is the minimum number of instructions, and is probably as fast as it can be, but for some subset of possible constants there may be a size benefit from using a different pair of instructions including a 16-bit thumb1 mov/add/sub instruction.

Thumb1 mov/add/sub can only be used to set/add the bottom 8 bits of the register, that's ok for constants with bit patterns permitting the rest of the constant to be encoded in a thumb2 immediate (perhaps using replicated patterns).

Revision history for this message
Andrew Stubbs (ams-codesourcery) wrote :

To fix the original problem reported, we need to detect the 3 instruction case, and, if the extra registers are available, and the constant fits movw, emit the movw as a special case. We should prefer solutions that don't add to register pressure.

Changed in gcc-linaro:
assignee: nobody → Andrew Stubbs (ams-codesourcery)
status: New → In Progress
Revision history for this message
Andrew Stubbs (ams-codesourcery) wrote :

I've posted a patch upstream here:
 http://old.nabble.com/-PATCH--ARM--Thumb2-constant-loading-optimization-to30405142.html

This covers comment #2, comment #3, and the original problem, although not by the solution suggested in the original description, or in comment #5.

Revision history for this message
In , Andrew Stubbs (ams-codesourcery) wrote :

I've posted a patch to fix this issue (among others):

  http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00652.html

Revision history for this message
In , Andrew Stubbs (ams-codesourcery) wrote :

Oh, I should say, what it actually does is this:

         subhi r0, r0, #65280 ; 0xff00
         subhi r0, r0, #241 ; 0x00f1

Changed in gcc:
importance: Unknown → Medium
status: Unknown → In Progress
tags: added: speed
Revision history for this message
Ramana Radhakrishnan (ramana) wrote :

This is what we generate today which appears to be much better than the original bug report. This should be marked as fixed in FSF 4.7 and Linaro GCC 4.6

tv:
 @ args = 0, pretend = 0, frame = 0
 @ frame_needed = 0, uses_anonymous_args = 0
 @ link register save eliminated.
 movw r3, #65520
 cmp r0, r3
 itt hi
 subhi r0, r0, #65280
 subhi r0, r0, #241
 bx lr

Revision history for this message
In , Ramana-gcc (ramana-gcc) wrote :

Trunk generates this today. Thus fixed.

Ramana

tv:
 @ args = 0, pretend = 0, frame = 0
 @ frame_needed = 0, uses_anonymous_args = 0
 @ link register save eliminated.
 movw r3, #65520
 cmp r0, r3
 itt hi
 subhi r0, r0, #65280
 subhi r0, r0, #241
 bx lr
 .size tv, .-tv
 .ident "GCC: (GNU) 4.7.0 20120116 (experimental)"

Revision history for this message
Andrew Stubbs (ams-codesourcery) wrote :

Yeah, I'm terrible at cleaning up bug trackers ....

Changed in gcc-linaro:
milestone: none → 4.6-2011.09
status: In Progress → Fix Released
Changed in gcc:
importance: Medium → Wishlist
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.