Drop in STREAM performance with gcc-linaro 4.8 without -fschedule-insns(2) flags

Bug #1211330 reported by Viswanath Puttagunta
Affects: Linaro GCC
Status: Fix Committed
Importance: Undecided
Assigned to: Maxim Kuvyrkov
Milestone: (none listed)

Bug Description

Release observed on
- gcc-linaro 4.8 based release: 13.06: (gcc-linaro-arm-linux-gnueabihf-4.8-2013.06_linux)
- Drop in performance first observed in gcc-linaro 4.7 based 13.03 release.

Description of Issue:
- Observing STREAM benchmark degradation on TI’s Keystone 2 device (Cortex-A15 based).
- Digging into various flags found that adding ‘-fschedule-insns’, ‘-fschedule-insns2’ are causing improvement in performance. Trying to understand why.

More info regarding STREAM benchmark
- STREAM benchmark: http://www.streambench.org
- Source code: http://www.cs.virginia.edu/stream/FTP/
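For readers unfamiliar with STREAM, the four reported functions are simple loops over three large arrays. The sketch below shows only the arithmetic of each kernel; it is written in Python for illustration, whereas the benchmark itself is C/Fortran, and the function name `stream_kernels` is made up for this note rather than taken from the STREAM sources.

```python
def stream_kernels(a, b, c, scalar):
    """Apply the four STREAM operations in order, in place, and return the arrays."""
    n = len(a)
    # Copy: c = a
    for j in range(n):
        c[j] = a[j]
    # Scale: b = scalar * c
    for j in range(n):
        b[j] = scalar * c[j]
    # Add: c = a + b
    for j in range(n):
        c[j] = a[j] + b[j]
    # Triad: a = b + scalar * c
    for j in range(n):
        a[j] = b[j] + scalar * c[j]
    return a, b, c


a, b, c = stream_kernels([1.0] * 4, [2.0] * 4, [0.0] * 4, 3.0)
print(a, b, c)
```

Scale differs from Copy only by one multiply per element, so the two would normally be expected to report similar rates.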

Observations based on gcc-linaro: gcc-linaro-arm-linux-gnueabihf-4.8-2013.06_linux
Note the 3rd run below has much better ‘Scale’ numbers.

FLAGS = $(DEFINES) -O3 -march=armv7-a -ffast-math -mfpu=neon -ftree-vectorize -funsafe-math-optimizations -mfloat-abi=hard -fprefetch-loop-arrays -fomit-frame-pointer -fforce-addr -mthumb
Function Rate (MB/s)
Copy: 3206.9840
Scale: 1402.7693
Add: 2526.0525
Triad: 2642.3057

FLAGS = $(DEFINES) -O3
Function Rate (MB/s)
Copy: 3216.0284
Scale: 1399.2090
Add: 2499.4611
Triad: 2616.1260

FLAGS = $(DEFINES) -O3 -march=armv7-a -ffast-math -mfpu=neon -ftree-vectorize -funsafe-math-optimizations -mfloat-abi=hard -fprefetch-loop-arrays -fomit-frame-pointer -fforce-addr -mthumb -fno-schedule-insns -fno-schedule-insns2
Function Rate (MB/s)
Copy: 3230.9757
Scale: 3129.9206
Add: 2607.2765
Triad: 2557.7856
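
To put the three runs in proportion, the per-function ratio between the first run (scheduling passes left at their -O3 defaults) and the third run (both passes disabled) can be computed directly from the rates above. This is only a small arithmetic sketch; the dictionary and function names are chosen for this note.

```python
# Rates in MB/s copied from the first and third runs reported above.
with_scheduling = {
    "Copy": 3206.9840, "Scale": 1402.7693, "Add": 2526.0525, "Triad": 2642.3057,
}
without_scheduling = {
    "Copy": 3230.9757, "Scale": 3129.9206, "Add": 2607.2765, "Triad": 2557.7856,
}


def speedups(baseline, candidate):
    """Return candidate/baseline for every function present in baseline."""
    return {name: candidate[name] / baseline[name] for name in baseline}


ratios = speedups(with_scheduling, without_scheduling)
for name, ratio in ratios.items():
    print(f"{name:6s} {ratio:.2f}x")
```

Scale comes out at roughly 2.2x with the scheduling passes disabled, while Copy, Add and Triad stay within about 4% of the first run in either direction, so the regression appears to be confined to the Scale loop.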

Bernhard Rosenkraenzer (berolinux) wrote :

Looking at the benchmark results posted, I think "Digging into various flags found that adding ‘-fschedule-insns’, ‘-fschedule-insns2’ are causing improvement in performance." should actually be -fno-schedule-insns and -fno-schedule-insns2?

Do the results change if you add -mcpu=cortex-a15 -mtune=cortex-a15? There are a couple of differences between A15 and generic v7-a that might affect instruction scheduling.

Lalindra Jayatilleke (lalindra) wrote :

Yes, -fno-schedule-insns and -fno-schedule-insns2 were causing improvement. -mcpu=cortex-a15 -mtune=cortex-a15 did not improve numbers. Any other updates on this bug?
Thanks.

Changed in gcc-linaro:
assignee: nobody → Maxim Kuvyrkov (maxim-kuvyrkov)
Yvan Roux (yvan-roux) wrote :
Maxim Kuvyrkov (maxim-kuvyrkov) wrote :

Fix committed to FSF trunk, which will become GCC 5.0. It is planned to be backported to Linaro GCC 4.9.

Changed in gcc-linaro:
status: New → Fix Committed
Maxim Kuvyrkov (maxim-kuvyrkov) wrote :

Backported to linaro-4.9-branch in rev. 221634.
