bitfields poorly optimized

Bug #641397 reported by Andrew Stubbs
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Linaro GCC
New
Medium
Unassigned
gcc
Fix Released
Wishlist

Bug Description

Consider the following code:

    struct bits {
      unsigned a : 5;
      unsigned b : 5;
      unsigned c : 5;
      unsigned d : 5;
    };

    struct bits
    f (unsigned int a) {
      struct bits bits;
      bits.a = 1;
      bits.b = 2;
      bits.c = 3;
      bits.d = a;
      return bits;
    }

The function 'f' creates a new struct with three constant initialized bitfields, and one from the argument.

You might expect GCC to compile this to one constant load, and a single "bfi" instruction for "a", but it doesn't. Instead it produces code that builds the struct from component bit fields at run time. This is both speed and space inefficient.

[CodeSourcery Tracker ID #6753]

Related branches

Michael Hope (michaelh1)
tags: added: speed task
Revision history for this message
Andrew Stubbs (ams-codesourcery) wrote :

The compiler has some code to deal with this sort of case, but it appears bit-rotted.

It doesn't work when:
 * the target cpu has no bit-field-insert instruction that takes an immediate constant.
 * the assignment is to an uninitialized (or zero-initialized) destination.

Both are true in this case.

Changed in gcc-linaro:
status: New → Triaged
Revision history for this message
Andrew Stubbs (ams-codesourcery) wrote :

I'm testing a patch for this problem.

Changed in gcc-linaro:
assignee: nobody → Andrew Stubbs (ams-codesourcery)
status: Triaged → In Progress
Revision history for this message
Andrew Stubbs (ams-codesourcery) wrote :
Revision history for this message
Chung-Lin Tang (cltang) wrote :
Revision history for this message
In , Andrew Stubbs (ams-codesourcery) wrote :

The compiler fails to do constant folding for bit fields, at least on ARM targets, and instead builds the constant at run time.

Test case:

  struct bits
  {
     unsigned a:5;
     unsigned b:5;
     unsigned c:5;
     unsigned d:5;
  };

  struct bits
  f (unsigned int a)
  {
     struct bits bits = {0,0,0,0};
     bits.a = 1;
     bits.b = 2;
     bits.c = 3;
     bits.d = a;
     return bits;
  }

Output, compiled for ARM with "-O2 -mcpu=cortex-a8 -mthumb":
        movs r2, #1
        movs r3, #0
        bfi r3, r2, #0, #5
        movs r2, #2
        bfi r3, r2, #5, #5
        movs r2, #3
        bfi r3, r2, #10, #5
        bfi r3, r0, #15, #5
        mov r0, r3
        bx lr

Revision history for this message
In , Andrew Stubbs (ams-codesourcery) wrote :

Two different patches have been posted to fix this:

 http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00778.html

 http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00784.html

One, or both should be applied to GCC 4.7, when the time comes.

Revision history for this message
In , Pinskia (pinskia) wrote :

Here is a simpler example. Take:
struct b
{
  int t;
  int tt;
};
union a
{
  long long t;
  struct b b;
};
long long f(int i, int t)
{
  long long ii = ((long long)i) << 32;
  union a a;
  a.t = 0;
  a.b.t = i;
  a.b.tt = t;
  return a.t;
}

--- CUT ---
For MIPS64r2 (both n64 and n32):
 move $2,$0
 dins $2,$4,32,32
 j $31
 dins $2,$5,0,32
--- CUT ---
is produced but the first two instructions can really be done as one shift:
        dsll $2,$4,32
 j $31
 dins $2,$5,0,32

Revision history for this message
In , Pinskia (pinskia) wrote :

(In reply to comment #1)
> Two different patches have been posted to fix this:
>
> http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00778.html
>
> http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00784.html

The testcase in comment #2 is not optimized with the CSE patch but is with the combine patch.

Changed in gcc:
importance: Unknown → Wishlist
status: Unknown → Confirmed
Michael Hope (michaelh1)
Changed in gcc-linaro:
importance: Undecided → Medium
importance: Medium → Low
Revision history for this message
Andrew Stubbs (ams-codesourcery) wrote :

This is still stuck.

There was some enthusiasm upstream for the part of my patch (patch 1) that reorders the passes, and I need to get around to testing/benchmark that to make sure it has no undesirable side-effects. The other part of that patch was not approved.

So reviewers upstream expressed a preference for Chung-Lin's patch (patch 2, above), not least because it appears to catch another case mine does not. However, this patch had some bugs that would need fixing before it could be committed.

Changed in gcc-linaro:
importance: Low → Medium
Changed in gcc-linaro:
assignee: Andrew Stubbs (ams-codesourcery) → nobody
Michael Hope (michaelh1)
Changed in gcc-linaro:
status: In Progress → New
Revision history for this message
In , Pinskia (pinskia) wrote :

Created attachment 28428
A third patch

Here is a third patch which improves the code a different way, one which seems better as combine.c already had the code which is supposed to do the combining of:
(set (x) (const))
(set (zero_extract (x a b) (const))
But it forgot that x can be a non subreg if we are using a zero extract.

Also it is easy add the case where the second set does not have a const there and optimize it to a shift followed by an and which gets my testcase too.

Revision history for this message
In , Pinskia (pinskia) wrote :

It actually does not work for the arm case but it can be improved to work for it which I am doing now.

Revision history for this message
In , Pinskia (pinskia) wrote :

Created attachment 28431
New patch

Still does not fix the ARM case but similar thing can be done for the three combine case. It does fix the mips case which I was looking into; though really there should be only one insert and no shift.

Revision history for this message
In , Pinskia (pinskia) wrote :

Oh with my current patch, we are able to remove at least one bfi, the first one.

Changed in gcc:
status: Confirmed → In Progress
Revision history for this message
In , Pinskia (pinskia) wrote :

http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00952.html
Will improve the testcase at the tree level for little-endian:
  _4 = (unsigned char) a_3(D);
  _5 = (<unnamed-unsigned:5>) _4;
  BIT_FIELD_REF <D.1727, 10, 0> = 65;
  D.1727.c = 3;
  D.1727.d = _5;
  return D.1727;

And will fix it on big-endian:
  D.1354_2 = (unsigned char) a_1(D);
  D.1355_3 = (<unnamed-unsigned:5>) D.1354_2;
  BIT_FIELD_REF <D.1356, 15, 0> = 1091;
  D.1356.d = D.1355_3;

Revision history for this message
In , Pinskia (pinskia) wrote :

No longer working on this.

Revision history for this message
In , Pinskia (pinskia) wrote :

Fixed fully in GCC 10+:
        movw r3, #3137
        bfi r3, r0, #15, #5
        mov r0, r3
        bx lr

Had also improved in GCC 8:
        movw r2, #3137
        movs r3, #0
        bfi r3, r2, #0, #16
        bfi r3, r0, #15, #5
        mov r0, r3
        bx lr

Changed in gcc:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.