Inner loop can be optimized better in autcor00

Bug #662692 reported by Yao Qi
This bug affects 1 person
Affects: Linaro GCC
Status: Triaged
Importance: Medium
Assigned to: Unassigned

Bug Description

In the inner loop of autcor00:fxpAutoCorrelation, GCC currently generates code like this:

.L4:
        adds r2, r3, #2
        ldrh ip, [r0, r3]
        ldrh r7, [r5, r3]
        adds r3, r3, #4
        ldrh r6, [r0, r2]
        ldrh r2, [r5, r2]
        smulbb r7, ip, r7
        smulbb r2, r6, r2
        asrs r7, r7, r4
        adds r7, r1, r7
        asrs r6, r2, r4
        cmp r3, r8
        add r1, r7, r6
        bne .L4

r3 is the loop variable: it is incremented by 4 on each iteration, and r8 holds its upper bound. If the loop variable were instead transformed to count down, a single subs could replace the adds/cmp pair, because subs sets the condition flags that bne tests.

Change
 adds r3, r3, #4
 .....
 cmp r3, r8
 bne .L4
to
 subs XX XX #1
 bne .L4
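
At the source level, the two loop shapes can be sketched roughly as follows. This is only an illustration, not the actual autcor00 source: the function names, parameter names, and the multiply/shift/accumulate body are assumptions reconstructed from the assembly above. Whether the compiler really emits subs/bne for the second form depends on its induction variable choices.

```c
#include <stdint.h>

/* Hypothetical count-up form: the index runs from 0 to n, so the loop
 * needs an increment plus a separate compare against the upper bound.
 * Note: >> on a negative value is implementation-defined in C; GCC
 * performs an arithmetic shift, matching the asrs in the listing. */
int32_t mac_shift_count_up(const int16_t *a, const int16_t *b,
                           int n, int shift)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += ((int32_t)a[i] * b[i]) >> shift;
    return acc;
}

/* Hypothetical count-down form: a dedicated counter runs from n down
 * to 0, so the decrement itself can set the flags that the loop branch
 * tests (a subs/bne pair on ARM, if the compiler chooses it). */
int32_t mac_shift_count_down(const int16_t *a, const int16_t *b,
                             int n, int shift)
{
    int32_t acc = 0;
    for (int remaining = n; remaining > 0; remaining--) {
        acc += ((int32_t)*a * *b) >> shift;
        a++;
        b++;
    }
    return acc;
}
```

Both functions compute the same result; only the shape of the loop control differs.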

Tags: speed task
Ulrich Weigand (uweigand) wrote :

This is also related to IVOPTS choices.

Changed in gcc-linaro:
status: New → Triaged
importance: Undecided → Medium