Bug #637797 “[size] Failure to optimise (a/b) and (a%b) into sin...” : Bugs : Linaro GCC

Revision history for this message

In GCC Bugzilla #43721, Alexander Strange (astrange) wrote on 2010-04-12:

#1

Still the case with 4.5.

> arm-none-linux-gnueabi-gcc -Os -S divmod.c
> cat divmod.s
.cpu arm10tdmi
.fpu softvfp
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 4
.eabi_attribute 18, 4
.file "divmod.c"
.global __aeabi_idivmod
.global __aeabi_idiv
.text
.align 2
.global divmod
.type divmod, %function
divmod:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
stmfd sp!, {r4, r5, r6, lr}
mov r6, r0
mov r5, r1
bl __aeabi_idivmod
mov r0, r6
mov r4, r1
mov r1, r5
bl __aeabi_idiv
add r0, r4, r0
ldmfd sp!, {r4, r5, r6, pc}
.size divmod, .-divmod
.ident "GCC: (GNU) 4.5.0 20100325 (experimental)"
.section .note.GNU-stack,"",%progbits

Revision history for this message

In GCC Bugzilla #43721, Rearnsha (rearnsha) wrote on 2010-04-14:

#2

Before confirming this bug, we need to determine if the following program has well defined behaviour:

int count_div0 = 0;

int __aeabi_div0()
{
count_div0++;
return 0;
}

int divmod(int a, int b)
{
    int q = a / b;
    int r = a % b;
    return q + r;
}

int main()
{
  divmod(33, 0);
  printf("%d", count_div0);
  return 0;
}

If the above program should always print "2", then the compiler is correct and this report is invalid. If it may print any value, then the compiler is wrong and should be fixed to optimize this case. Note that __aeabi_div0 will be called if either __aeabi_idiv or __aeabi_idivmod are called with 0 as the second parameter.

Revision history for this message

In GCC Bugzilla #43721, Rguenth (rguenth) wrote on 2010-04-14:

#3

The policy has been AFAIR that we can collapse multiple traps into a single one,
but not remove traps completely (or of course not introduce new ones).

Revision history for this message

In GCC Bugzilla #43721, Mans Rullgard (mansr) wrote on 2010-04-14:

#4

The C99 standard says this about division by zero:

  The result of the / operator is the quotient from the division
  of the first operand by the second; the result of the % operator
  is the remainder. In both operations, if the value of the second
  operand is zero, the behavior is undefined.

The ARM ABI states the following about the __aeabi_div0() function:

  The *div0 functions:
  - Return the value passed to them as a parameter.
  - Or, return a fixed value defined by the execution environment (such as 0).
  - Or, raise a signal (often SIGFPE) or throw an exception, and do not return.
  [...]
  The *div and *divmod functions may be inlined by a tool chain. It is
  Q-o-I whether an inlined version calls *div0 out of line or returns the
  values that would have been returned by a particular value-returning
  version of *div0.

I interpret these as saying the application may make no assumptions about
the behaviour after a division (or modulus) by zero.

Revision history for this message

In GCC Bugzilla #43721, Rearnsha (rearnsha) wrote on 2010-04-14:

#5

I think that's enough evidence to confirm.

Revision history for this message

Yao Qi (yao-codesourcery) wrote on 2010-09-14:

#6

Track GCC PR http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43721

Once this bug is fixed, code size of CINT2000/252.eon/src/myrand.cc:myrand() can be reduced to some extent.

GCC FSF turnk (20100802) generates code like this,
0000000c <_Z7my_randv>:
   c: b570 push {r4, r5, r6, lr}
   e: 4c0d ldr r4, [pc, #52] ; (44 <_Z7my_randv+0x38>)
  10: 490d ldr r1, [pc, #52] ; (48 <_Z7my_randv+0x3c>)
  12: 6825 ldr r5, [r4, #0]
  14: 4e0d ldr r6, [pc, #52] ; (4c <_Z7my_randv+0x40>)
  16: 4628 mov r0, r5
  18: f7ff fffe bl 0 <__aeabi_idiv>
  1c: 4346 muls r6, r0
  1e: 490a ldr r1, [pc, #40] ; (48 <_Z7my_randv+0x3c>)
  20: 4628 mov r0, r5
  22: f7ff fffe bl 0 <__aeabi_idivmod>
  26: f244 13a7 movw r3, #16807 ; 0x41a7
  2a: fb03 6601 mla r6, r3, r1, r6
  2e: 2e00 cmp r6, #0
  30: bfdc itt le
  32: f106 4600 addle.w r6, r6, #2147483648 ; 0x80000000
  36: 3e01 suble r6, #1
  38: 6026 str r6, [r4, #0]
  3a: 6820 ldr r0, [r4, #0]
  3c: f3c0 200e ubfx r0, r0, #8, #15
  40: bd70 pop {r4, r5, r6, pc}
  42: bf00 nop

Once bug is fixed, code can be better,
00000006 <_Z7my_randv>:
   6: b510 push {r4, lr}
   8: 4c0a ldr r4, [pc, #40] ; (34 <_Z7my_randv+0x2e>)
   a: 490b ldr r1, [pc, #44] ; (38 <_Z7my_randv+0x32>)
   c: 6820 ldr r0, [r4, #0]
   e: f7ff fffe bl 0 <__aeabi_idivmod>
  12: f244 12a7 movw r2, #16807 ; 0x41a7
  16: 4351 muls r1, r2
  18: f46f 7231 mvn.w r2, #708 ; 0x2c4
  1c: 4350 muls r0, r2
  1e: eb01 0080 add.w r0, r1, r0, lsl #2
  22: 2800 cmp r0, #0
  24: dc02 bgt.n 2c <_Z7my_randv+0x26>
  26: f06f 4100 mvn.w r1, #2147483648 ; 0x80000000
  2a: 4408 add r0, r1
  2c: 6020 str r0, [r4, #0]
  2e: f3c0 200e ubfx r0, r0, #8, #15
  32: bd10 pop {r4, pc}

Track GCC PR http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43721

Once this bug is fixed, code size of CINT2000/252.eon/src/myrand.cc:myrand() can be reduced to some extent.

GCC FSF turnk (20100802) generates code like this,
0000000c <_Z7my_randv>:
   c:   b570            push    {r4, r5, r6, lr}
   e:   4c0d            ldr     r4, [pc, #52]   ; (44 <_Z7my_randv+0x38>)
  10:   490d            ldr     r1, [pc, #52]   ; (48 <_Z7my_randv+0x3c>)
  12:   6825            ldr     r5, [r4, #0]
  14:   4e0d            ldr     r6, [pc, #52]   ; (4c <_Z7my_randv+0x40>)
  16:   4628            mov     r0, r5
  18:   f7ff fffe       bl      0 <__aeabi_idiv>
  1c:   4346            muls    r6, r0
  1e:   490a            ldr     r1, [pc, #40]   ; (48 <_Z7my_randv+0x3c>)
  20:   4628            mov     r0, r5
  22:   f7ff fffe       bl      0 <__aeabi_idivmod>
  26:   f244 13a7       movw    r3, #16807      ; 0x41a7
  2a:   fb03 6601       mla     r6, r3, r1, r6
  2e:   2e00            cmp     r6, #0
  30:   bfdc            itt     le
  32:   f106 4600       addle.w r6, r6, #2147483648     ; 0x80000000
  36:   3e01            suble   r6, #1
  38:   6026            str     r6, [r4, #0]
  3a:   6820            ldr     r0, [r4, #0]
  3c:   f3c0 200e       ubfx    r0, r0, #8, #15
  40:   bd70            pop     {r4, r5, r6, pc}
  42:   bf00            nop

Once bug is fixed, code can be better,
00000006 <_Z7my_randv>:
   6:   b510            push    {r4, lr}
   8:   4c0a            ldr     r4, [pc, #40]   ; (34 <_Z7my_randv+0x2e>)
   a:   490b            ldr     r1, [pc, #44]   ; (38 <_Z7my_randv+0x32>)
   c:   6820            ldr     r0, [r4, #0]
   e:   f7ff fffe       bl      0 <__aeabi_idivmod>
  12:   f244 12a7       movw    r2, #16807      ; 0x41a7
  16:   4351            muls    r1, r2
  18:   f46f 7231       mvn.w   r2, #708        ; 0x2c4
  1c:   4350            muls    r0, r2
  1e:   eb01 0080       add.w   r0, r1, r0, lsl #2
  22:   2800            cmp     r0, #0
  24:   dc02            bgt.n   2c <_Z7my_randv+0x26>
  26:   f06f 4100       mvn.w   r1, #2147483648 ; 0x80000000
  2a:   4408            add     r0, r1
  2c:   6020            str     r0, [r4, #0]
  2e:   f3c0 200e       ubfx    r0, r0, #8, #15
  32:   bd10            pop     {r4, pc}

Revision history for this message

Yao Qi (yao-codesourcery) wrote on 2010-09-14:

#7

This optimization should be also useful to 256.bzip2/spec.c:ran().

Michael Hope (michaelh1) on 2010-09-28

tags:

added: size task

Revision history for this message

In GCC Bugzilla #43721, Andrew Stubbs (ams-codesourcery) wrote on 2010-10-20:

#8

Here's the problem: expand_divmod always prefers the straight "div" library call over the "divmod" library call, no exceptions.

Yes, "divmodsi4" in a .md is indeed preferred over "divsi4", so the optimization works fine on targets that have those, but ARM doesn't, so those rules are irrelevant.

ARM does not provide a separate library function for "mod", so expand_divmod does use "divmod" for mod operations, but never for div operations.

The reason for the apparent opposite rules appears to be that the divmodsi4 rule can be coded to auto-detect the most optimal kind of underlying operation to use, whereas the library call is fixed, once chosen.

I see two possible ways to fix this:
1. Teach CSE (or somewhere) that div and divmod library calls have some overlap.
2. Always prefer divmod, and find some other way to convert it to div, where appropriate.

I don't see any way to generate the RTL with the right call, so I'm pretty sure it does have to be fixed up after the fact.

Or, I could be way off. :)

Revision history for this message

Andrew Stubbs (ams-codesourcery) wrote on 2010-10-20:

#9

Here's the problem: expand_divmod always prefers the straight "div" library
call over the "divmod" library call, no exceptions.

Yes, "divmodsi4" in a .md is indeed preferred over "divsi4", so the
optimization works fine on targets that have those, but ARM doesn't, so those
rules are irrelevant.

ARM does not provide a separate library function for "mod", so expand_divmod
does use "divmod" for mod operations, but never for div operations.

The reason for the apparent opposite rules appears to be that the divmodsi4
rule can be coded to auto-detect the most optimal kind of underlying operation
to use, whereas the library call is fixed, once chosen.

I see two possible ways to fix this:
1. Teach CSE (or somewhere) that div and divmod library calls have some
overlap.
2. Always prefer divmod, and find some other way to convert it to div, where
appropriate.

I don't see any way to generate the RTL with the right call, so I'm pretty sure
it does have to be fixed up after the fact.

Or, I could be way off. :)

Revision history for this message

Yao Qi (yao-codesourcery) wrote on 2010-10-21: Re: [Bug 637797] Re: [size] Failure to optimise (a/b) and (a%b) into single __aeabi_idivmod call

#10

On 10/21/2010 12:54 AM, Andrew Stubbs wrote:
> Here's the problem: expand_divmod always prefers the straight "div" library
> call over the "divmod" library call, no exceptions.
>

Can you explain a little bit further? I go through
expmed.c:expand_divmod, but didn't get such conclusion that
"expand_divmod always prefers the straight div over divmod".

> Yes, "divmodsi4" in a .md is indeed preferred over "divsi4", so the
> optimization works fine on targets that have those, but ARM doesn't, so those
> rules are irrelevant.
>

What is "divsi4"? Is it a RTL pattern or something else? I can't find
it in gcc source by grep.

> ARM does not provide a separate library function for "mod", so expand_divmod
> does use "divmod" for mod operations, but never for div operations.
>
> The reason for the apparent opposite rules appears to be that the divmodsi4
> rule can be coded to auto-detect the most optimal kind of underlying operation
> to use, whereas the library call is fixed, once chosen.
>
> I see two possible ways to fix this:
> 1. Teach CSE (or somewhere) that div and divmod library calls have some
> overlap.
> 2. Always prefer divmod, and find some other way to convert it to div, where
> appropriate.

Can we define a peephole2 here? or we can do something on tree level?

--
Yao Qi
CodeSourcery
<email address hidden>
(650) 331-3385 x739

Revision history for this message

Andrew Stubbs (ams-codesourcery) wrote on 2010-10-21:

#11

> Can we define a peephole2 here? or we can do something on tree level?

I believe a peephole optimization would work, if we set expand_divmod to always prefer the full divmod library call. We'd also need to either arrange for the divmod/div decision to be target specific, or else fix up any other affected backends.

However, a peephole wouldn't get to do anything until after register allocation, I think. Would that not make it suboptimal?

I'm not sure a tree level change would be appropriate. Apart from anything else, it might get rejected upstream simply because the existing system works (for most people). Presumably you'd want it to convert div+mod to a builtin function, or something, that would then be translated to the right backend function later?

Revision history for this message

Ramana Radhakrishnan (ramana) wrote on 2011-01-21:

#12

This is still a size and speed optimization at Os but not for this benchmark at O2 and O3. It's also a speed optimization in case you are generating quotients and remainders of 2 different numbers in a routine. It's not a speed optimization for this particular benchmark because you are still generating the multiplies at O2 and O3 but is alright for the same. For this case we won't get a further benefit with code generation and improvements in this particular benchmark for speed.

I've prototyped this on trunk with the branch

https://code.launchpad.net/~ramana/gcc-linaro/divmodsi4-experiments

Ramana

Changed in gcc-linaro:
assignee:	nobody → Ramana Radhakrishnan (ramana)
milestone:	none → 11.05-final
milestone:	11.05-final → none

Bug Watch Updater (bug-watch-updater) on 2011-01-25

Changed in gcc:
importance:	Unknown → Wishlist
status:	Unknown → Confirmed

Ramana Radhakrishnan (ramana) on 2011-01-27

Changed in gcc-linaro:
milestone:	none → 4.5-2011.02-0

Michael Hope (michaelh1) on 2011-02-08

Changed in gcc-linaro:
milestone:	4.5-2011.02-0 → 4.5-2011.03-0
status:	New → Confirmed

Revision history for this message

In GCC Bugzilla #43721, Pinskia (pinskia) wrote on 2011-02-17:

#13

*** Bug 47777 has been marked as a duplicate of this bug. ***

Michael Hope (michaelh1) on 2011-03-08

Changed in gcc-linaro:
milestone:	4.5-2011.03-0 → none

Ramana Radhakrishnan (ramana) on 2011-03-28

tags:

added: speed

Ramana Radhakrishnan (ramana) on 2012-05-29

Changed in gcc-linaro:
importance:	Undecided → Medium

Ulrich Weigand (uweigand) on 2012-05-29

Changed in gcc-linaro:
status:	Confirmed → Triaged

Revision history for this message

In GCC Bugzilla #43721, Prathamesh3492 (prathamesh3492) wrote on 2016-08-27:

#14

*** Bug 65668 has been marked as a duplicate of this bug. ***

Revision history for this message

In GCC Bugzilla #43721, Prathamesh3492 (prathamesh3492) wrote on 2016-10-28:

#15

Author: prathamesh3492
Date: Fri Oct 28 19:05:12 2016
New Revision: 241660

URL: https://gcc.gnu.org/viewcvs?rev=241660&root=gcc&view=rev
Log:
2016-10-28 Prathamesh Kulkarni <email address hidden>
Kugan Vivekanandarajah <email address hidden>
Jim Wilson <email address hidden>

PR tree-optimization/43721
* target.def: New hook expand_divmod_libfunc.
* doc/tm.texi.in: Add hook for TARGET_EXPAND_DIVMOD_LIBFUNC
* doc/tm.texi: Regenerate.
* internal-fn.def: Add new entry for DIVMOD ifn.
* internal-fn.c (expand_DIVMOD): New.
* tree-ssa-math-opts.c: Include optabs-libfuncs.h, tree-eh.h,
targhooks.h.
(widen_mul_stats): Add new field divmod_calls_inserted.
(target_supports_divmod_p): New.
(divmod_candidate_p): Likewise.
(convert_to_divmod): Likewise.
(pass_optimize_widening_mul::execute): Call
calculate_dominance_info(), renumber_gimple_stmt_uids() at
beginning of function. Call convert_to_divmod()
and record stats for divmod.
* config/arm/arm.c (arm_expand_divmod_libfunc): Override hook
TARGET_EXPAND_DIVMOD_LIBFUNC.
* doc/sourcebuild.texi: Add items for arm_divmod_simode, divmod,
divmod_simode.

testsuite/
* lib/target-supports.exp (check_effective_target_divmod): New.
(check_effective_target_divmod_simode): Likewise.
(check_effective_target_arm_divmod_simode): Likewise.
* gcc.dg/divmod-1-simode.c: New test.
* gcc.dg/divmod-1.c: Likewise.
* gcc.dg/divmod-2-simode.c: Likewise.
* gcc.dg/divmod-2.c: Likewise.
* gcc.dg/divmod-3-simode.c: Likewise.
* gcc.dg/divmod-3.c: Likewise.
* gcc.dg/divmod-4-simode.c: Likewise.
* gcc.dg/divmod-4.c: Likewise.
* gcc.dg/divmod-5.c: Likewise.
* gcc.dg/divmod-6-simode.c: Likewise.
* gcc.dg/divmod-6.c: Likewise.
* gcc.dg/divmod-7.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.dg/divmod-1-simode.c
    trunk/gcc/testsuite/gcc.dg/divmod-1.c
    trunk/gcc/testsuite/gcc.dg/divmod-2-simode.c
    trunk/gcc/testsuite/gcc.dg/divmod-2.c
    trunk/gcc/testsuite/gcc.dg/divmod-3-simode.c
    trunk/gcc/testsuite/gcc.dg/divmod-3.c
    trunk/gcc/testsuite/gcc.dg/divmod-4-simode.c
    trunk/gcc/testsuite/gcc.dg/divmod-4.c
    trunk/gcc/testsuite/gcc.dg/divmod-5.c
    trunk/gcc/testsuite/gcc.dg/divmod-6-simode.c
    trunk/gcc/testsuite/gcc.dg/divmod-6.c
    trunk/gcc/testsuite/gcc.dg/divmod-7.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/arm/arm.c
    trunk/gcc/doc/sourcebuild.texi
    trunk/gcc/doc/tm.texi
    trunk/gcc/doc/tm.texi.in
    trunk/gcc/internal-fn.c
    trunk/gcc/internal-fn.def
    trunk/gcc/target.def
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/lib/target-supports.exp
    trunk/gcc/tree-ssa-math-opts.c

Author: prathamesh3492
Date: Fri Oct 28 19:05:12 2016
New Revision: 241660

URL: https://gcc.gnu.org/viewcvs?rev=241660&root=gcc&view=rev
Log:
2016-10-28  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>
	    Kugan Vivekanandarajah  <kuganv@linaro.org>
	    Jim Wilson  <jim.wilson@linaro.org>

PR tree-optimization/43721
	* target.def: New hook expand_divmod_libfunc.
	* doc/tm.texi.in: Add hook for TARGET_EXPAND_DIVMOD_LIBFUNC
	* doc/tm.texi: Regenerate.
	* internal-fn.def: Add new entry for DIVMOD ifn.
	* internal-fn.c (expand_DIVMOD): New.
	* tree-ssa-math-opts.c: Include optabs-libfuncs.h, tree-eh.h,
	targhooks.h.
	(widen_mul_stats): Add new field divmod_calls_inserted.
	(target_supports_divmod_p): New.
	(divmod_candidate_p): Likewise.
	(convert_to_divmod): Likewise.
	(pass_optimize_widening_mul::execute): Call
	calculate_dominance_info(), renumber_gimple_stmt_uids() at
	beginning of function. Call convert_to_divmod()
	and record stats for divmod.
	* config/arm/arm.c (arm_expand_divmod_libfunc): Override hook
	TARGET_EXPAND_DIVMOD_LIBFUNC.
	* doc/sourcebuild.texi: Add items for arm_divmod_simode, divmod,
	divmod_simode.

testsuite/
	* lib/target-supports.exp (check_effective_target_divmod): New.
	(check_effective_target_divmod_simode): Likewise.
	(check_effective_target_arm_divmod_simode): Likewise.
	* gcc.dg/divmod-1-simode.c: New test.
	* gcc.dg/divmod-1.c: Likewise.
	* gcc.dg/divmod-2-simode.c: Likewise.
	* gcc.dg/divmod-2.c: Likewise.
	* gcc.dg/divmod-3-simode.c: Likewise.
	* gcc.dg/divmod-3.c: Likewise.
	* gcc.dg/divmod-4-simode.c: Likewise.
	* gcc.dg/divmod-4.c: Likewise.
	* gcc.dg/divmod-5.c: Likewise.
	* gcc.dg/divmod-6-simode.c: Likewise.
	* gcc.dg/divmod-6.c: Likewise.
	* gcc.dg/divmod-7.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.dg/divmod-1-simode.c
    trunk/gcc/testsuite/gcc.dg/divmod-1.c
    trunk/gcc/testsuite/gcc.dg/divmod-2-simode.c
    trunk/gcc/testsuite/gcc.dg/divmod-2.c
    trunk/gcc/testsuite/gcc.dg/divmod-3-simode.c
    trunk/gcc/testsuite/gcc.dg/divmod-3.c
    trunk/gcc/testsuite/gcc.dg/divmod-4-simode.c
    trunk/gcc/testsuite/gcc.dg/divmod-4.c
    trunk/gcc/testsuite/gcc.dg/divmod-5.c
    trunk/gcc/testsuite/gcc.dg/divmod-6-simode.c
    trunk/gcc/testsuite/gcc.dg/divmod-6.c
    trunk/gcc/testsuite/gcc.dg/divmod-7.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/arm/arm.c
    trunk/gcc/doc/sourcebuild.texi
    trunk/gcc/doc/tm.texi
    trunk/gcc/doc/tm.texi.in
    trunk/gcc/internal-fn.c
    trunk/gcc/internal-fn.def
    trunk/gcc/target.def
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/lib/target-supports.exp
    trunk/gcc/tree-ssa-math-opts.c

Revision history for this message

In GCC Bugzilla #43721, Prathamesh3492 (prathamesh3492) wrote on 2016-10-28:

#16

Fixed on trunk.

Revision history for this message

In GCC Bugzilla #43721, Marxin-m (marxin-m) wrote on 2018-11-19:

#17

Can the bug be marked as resolved?

Revision history for this message

In GCC Bugzilla #43721, Ramana-gcc (ramana-gcc) wrote on 2018-11-19:

#18

Fixed in GCC7 then.

Bug Watch Updater (bug-watch-updater) on 2018-11-20

Changed in gcc:
status:	Confirmed → Fix Released

Linaro GCC

[size] Failure to optimise (a/b) and (a%b) into single __aeabi_idivmod call

Bug Description

Other bug subscribers

Remote bug watches

Affects		Status	Importance	Assigned to	Milestone
	Linaro GCC	Triaged	Medium	Ramana Radhakrishnan
	gcc	Fix Released	Wishlist	gcc-bugzilla #43721