Some workloads experience more measurement variation with scaling_governor=performance than ondemand

Bug #1470404 reported by bugproxy
This bug affects 1 person
Affects         Status        Importance  Assigned to    Milestone
linux (Ubuntu)  Fix Released  Medium      Unassigned
  Utopic        Fix Released  Medium      Chris J Arges
  Vivid         Fix Released  Medium      Chris J Arges

Bug Description

SRU Justification:
[Impact]
Certain workloads can exhibit a large variance in behavior due to how cpus are idled on power8 systems.

[Fix]

For 3.16:
74aa51b5ccd3975392e30d11820dc073c5f2cd32
92c83ff5b42b109c94fdeee53cb31f674f776d75
70734a786acfd1998e47d40df19cba5c29469bdf

For 3.16, 3.19:
78eaa10f027cf69f9bd409e64eaff902172b2327

$ git describe 78eaa10f027cf69f9bd409e64eaff902172b2327
v4.1-rc2-9-g78eaa10
Once we rebase to something v4.1+ we'll have this fixed in Wily.

[Test Case]
Set the system to SMT8 mode with scaling_governor=performance or ondemand.
Run the workload 100 times.

--

== Comment: #0 - Peter W. Wong <email address hidden> - 2015-04-15 21:30:31 ==
---Problem Description---
Many workloads experience wide measurement variation, more with scaling_governor=performance than ondemand.

Contact Information = <email address hidden>, <email address hidden>

---uname output---
Linux c656f7n04 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 17:42:36 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = 20-core and 24-core Tuleta systems

---Debugger---
A debugger is not configured

---Steps to Reproduce---
Set the system to SMT8 mode with scaling_governor=performance or ondemand.
Run the workload 100 times.
Get 100 data points and sort them.
Compare the spread of results with two governor modes.
The source and scripts to run a simple test case will be provided.
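
The comparison step above can be sketched in Python. This is only an illustration: the function name spread_summary and the sample values are hypothetical, not the reporter's scripts or measured data.

```python
import statistics


def spread_summary(samples):
    """Sort the data points and summarize how widely they are spread."""
    ordered = sorted(samples)
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "max": ordered[-1],
        "range": ordered[-1] - ordered[0],
    }


# Hypothetical elapsed times in seconds (illustrative only, not measured data).
runs = {
    "performance": [3.76, 3.77, 3.76, 6.52, 3.76],
    "ondemand": [3.93, 3.94, 3.93, 3.95, 3.93],
}
for governor, samples in runs.items():
    print(governor, spread_summary(samples))
```

A larger "range" for one governor than the other, with a similar median, is the kind of spread difference this report describes.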

Stack trace output:
 no

Oops output:
 no

Userspace tool common name: not sure what it is.

Userspace rpm: ??

The userspace tool has the following bit modes: These are 64-bit programs.

System Dump Info:
  The system is not configured to capture a system dump.

Userspace tool obtained from project website: na

*Additional Instructions for <email address hidden>, <email address hidden>:
-Attach sysctl -a output to the bug.
-Attach ltrace and strace of userspace application.

== Comment: #2 - Paul A. Clarke <email address hidden> - 2015-04-16 08:47:41 ==
This problem has a number of variables we were trying to reduce:
- endianness
- operating system
- kernel level
- compiler

Bob Walkup says he's seen the variability in a bunch of CPU-intensive test cases, in various languages, using various compilers, which would seem to eliminate the "compiler" variable.

We had not looked at the performance governor setting to this point. Interesting results, and yet another variable to add to the above mix. Perhaps two more runs? (LE-ondemand, LE-performance, BE-ondemand, BE-performance)

== Comment: #3 - Paul A. Clarke <email address hidden> - 2015-04-16 08:50:09 ==
Also, Bob says he can reproduce this with and without vectorization (the stalls move from the VSU to the FPU), and with and without floating point (the stalls move from the FPU to the FXU). Very odd.

== Comment: #4 - Andrea M. Davis <email address hidden> - 2015-04-16 10:10:01 ==
Peter, what version of Ubuntu are you running?

== Comment: #5 - Peter W. Wong <email address hidden> - 2015-04-16 10:32:58 ==
Andrea,

Ubuntu 14.04.2 LTS.

#uname -a
Linux c656f7n04 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 17:42:36 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux

#lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.2 LTS
Release: 14.04
Codename: trusty

== Comment: #6 - Peter W. Wong <email address hidden> - 2015-04-16 10:50:11 ==
There are a few more things we have tried.

(1) For STREAM, it was originally compiled with gfortran and its corresponding OpenMP. I compiled it with xlf and its corresponding OpenMP. There is no difference in performance.

(2) There was a concern about NUMA, meaning is it possible the CPU binding by OpenMP is incorrect so that there are remote memory accesses behind the scene? By disabling one DCM and using 10 or 12 cores only in the other DCM, we can still see occasional drops in performance, although not often. We can conclude it is not due to NUMA.

(3) Farid and I also tried out different scheduler parameters (sched_min_granularity_ns, sched_wakeup_granularity_ns, sched_latency_ns, and others) and matched them to the other distro's corresponding values, but did not see performance changes.

(4) For the workload AMG2006, the use of scaling_governor=ondemand also reduces the spread of data significantly.

(5) For all the above investigations, I used a 20-core Tuleta and a 24-core Tuleta, although they are configured identically with Ubuntu 14.04.2. I mean two systems paint a consistent picture.

So far, we looked at compiler, NUMA, scheduler, memory test, CPU test, ST vs SMT, etc. There is a significant difference in variation between scaling_governor=performance and scaling_governor=ondemand with the same application and system configurations.

Hopefully, the data point us to the right direction, i.e., there could be some unexpected behaviour with the implementation of scaling_governor=performance.

== Comment: #7 - Peter W. Wong <email address hidden> - 2015-04-16 14:30:21 ==
Note that Bob Walkup does not see the improvement using scaling_governor=ondemand on a borrowed POK lab system. However, he still suggested that I open a bug based on my findings. I guess he is not totally sure about the system he got.

It would be good to have data independently collected by others to verify my observations.

Bob's serial_loop.c program can be compiled and run very easily. The examination of data is straightforward too.

== Comment: #10 - JENIFER HOPPER <email address hidden> - 2015-04-17 16:33:38 ==
I was able to reproduce the problem with the serial_loop test described in comment 1 (my system is Ubuntu 15.04); however, disabling the nap cpuidle state seemed to resolve the variance:

cpupower idle-set -d 0

Can others reproduce? I am not sure why nap behavior would be any different w/ the performance governor though.. Note, to re-enable: cpupower idle-set -E

== Comment: #11 - JENIFER HOPPER <email address hidden> - 2015-04-20 13:09:34 ==
(In reply to comment #10)
> disabling the nap cpuidle
> state seemed to resolve the variance:
>
> cpupower idle-set -d 0

just want to clarify state0 is actually snooze, not nap:
# cat /sys/devices/system/cpu/cpu0/cpuidle/state0/name
snooze
# cat /sys/devices/system/cpu/cpu0/cpuidle/state1/name
Nap
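
The same check can be done for all idle states at once. The following is a minimal Python sketch, assuming the usual cpuidle sysfs layout (stateN/name and stateN/disable under a CPU's cpuidle directory); the helper name idle_states is hypothetical.

```python
from pathlib import Path


def idle_states(cpuidle_dir):
    """Return {index: (name, disabled)} for every stateN directory found.

    cpuidle_dir is expected to look like /sys/devices/system/cpu/cpu0/cpuidle.
    """
    states = {}
    for state in Path(cpuidle_dir).glob("state[0-9]*"):
        index = int(state.name[len("state"):])
        name = (state / "name").read_text().strip()
        # The "disable" file holds "0" when the state is enabled.
        disabled = (state / "disable").read_text().strip() != "0"
        states[index] = (name, disabled)
    return states
```

On a live system, idle_states("/sys/devices/system/cpu/cpu0/cpuidle") would be expected to map 0 to snooze and 1 to Nap, matching the output above.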

== Comment: #12 - Peter W. Wong <email address hidden> - 2015-04-20 16:26:32 ==
Jenifer, thanks for the suggestion.

"cpupower idle-set -d 0" works for Bob's serial_loop.c program.

There are 24 identical processes running serial_loop in parallel, each bound to one core. With 100 iterations, there are 2400 elapsed times collected for each run. Each elapsed time over 5 seconds is counted as an outlier.

The following data were collected on a 24-core Tuleta system.

Scaling_governor = P(erformance) or O(ndemand)
snooze (state0) = default (enabled) and disabled

P and default = 34-35 outliers
P and snooze disabled = 0 outliers

O and default = 2-4 outliers
O and snooze disabled = 0 outliers
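
The outlier rule described above (any elapsed time over 5 seconds) can be written as a short Python sketch. The function name count_outliers and the sample values are hypothetical; only the 5-second threshold comes from this comment.

```python
def count_outliers(elapsed_times, threshold=5.0):
    """Count elapsed times (in seconds) strictly above the threshold."""
    return sum(1 for t in elapsed_times if t > threshold)


# Hypothetical elapsed times in seconds (illustrative only, not measured data).
sample = [3.76, 3.77, 5.2, 3.76, 9.01]
print(count_outliers(sample))  # prints 2
```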

As you asked, why do we need to disable snooze in order to reduce measurement variation when scaling_governor=performance?

== Comment: #13 - JENIFER HOPPER <email address hidden> - 2015-04-20 16:40:46 ==
Vaidy, could your team comment on this? In SMT8 mode, more measurement variation is seen using the performance governor compared to the ondemand governor when snooze is enabled, but disabling snooze seems to resolve the problem. Does it make sense that snooze impacts would be higher in performance mode?

Stewart mentioned some latency improvements in the new 830 OPAL firmware, is that related to this type of sleep state wakeup?

== Comment: #14 - Peter W. Wong <email address hidden> - 2015-04-21 12:23:01 ==
"cpupower idle-set -d 0" also fixes the measurement variation of STREAM on a 24-core Tuleta system.

scaling_governor=performance and default snooze = 65 outliers out of 400 runs.

scaling_governor=performance and snooze disabled = 0 outliers out of 400 runs.

== Comment: #15 - Peter W. Wong <email address hidden> - 2015-04-21 23:21:22 ==
"cpupower idle-set -d 0" also fixes the measurement variation of AMG2006 on a 24-core Tuleta system.

It means when scaling_governor=performance, disabling snooze (state0, shallow sleep) while still enabling Nap (state1, deep sleep) can stabilize measurements.

Vaidy, please help understand this behaviour.

== Comment: #17 - VAIDYANATHAN SRINIVASAN <email address hidden> - 2015-04-22 14:22:11 ==
Hi Team,

Interesting observation. Let me give possible contributing factors:

(a) When running on ondemand, cpu frequency changed from min to max including turbo frequencies.
(b) When running performance governor, frequency is set to constantly run turbo.

Based on temperature, CPU may not be able to sustain turbo since we are constantly running at the frequency and burning more power. The variation could actually come from the fact that the platform (OCC) could drop the frequency periodically due to over temperature.

While running ondemand, turning down the power could help sustain the turbo frequency longer.
Disabling snooze will further increase the power consumption and push for more variation at turbo frequency.

Our systems are designed to run consistently at nominal frequency and hence I would suggest that you run your experiment by setting nominal frequency to all cores using performance governor+max limit or userspace governor.

You could use "Throughput-performance profile" using tuned-adm for this purpose.

If running in "Nominal" Frequency gives you consistent performance, then the above theory of turbo mode variation holds good. We can confirm them with additional traces in cpufreq back-end driver code. We are currently improving our instrumentation to detect frequency variation and throttling. This is a good scenario to validate our trace design as well.

Let me know what you find.

--Vaidy

== Comment: #18 - JENIFER HOPPER <email address hidden> - 2015-04-22 14:28:15 ==
(In reply to comment #17)

> Disabling snooze will further increase the power consumption and push for
> more variation at turbo frequency.

We actually see the opposite effect, disabling snooze makes the variability at turbo freq go away :)

== Comment: #19 - Basu Vaidyanathan <email address hidden> - 2015-04-22 14:44:43 ==
Additionally, this is not a problem when running BE kernel, on the same P8 configuration box. I suspect
it is more to do with configuration settings on LE before we start pointing finger at the FW codepath
when using Ubuntu LE.

== Comment: #20 - Paul A. Clarke <email address hidden> - 2015-04-22 15:23:43 ==
Bob is finding another distro LE does _not_ exhibit variation.

This would seem to eliminate LE as the culprit.

Looking at the settings of /sys/devices/system/cpu/cpu*/cpuidle/state0/disable, they all report "0", which I believe is the same as having "snooze" enabled, correct? That would seem to eliminate "snooze" in and of itself as a culprit, *at least with this kernel level (3.10.0-210.ael7a)*.

I'm starting to suspect it's an issue with the kernel in Ubuntu (3.16...)

== Comment: #21 - VAIDYANATHAN SRINIVASAN <email address hidden> - 2015-04-22 15:31:41 ==

Running at constant nominal frequency will help you eliminate turbo mode variation and focus on the Linux issues and root-cause faster.

The behavior I described above is not a bug or problem in firmware. It is the expected and correct behavior where throttling can happen. I am only trying to help you to reduce the number of variables that are affecting this experiment.

--Vaidy

== Comment: #22 - VAIDYANATHAN SRINIVASAN <email address hidden> - 2015-04-22 15:35:45 ==
(In reply to comment #20)

This is good input. The other distro does not have fast-sleep support. We will have only snooze and nap. On the Ubuntu system do you see /sys/devices/system/cpu/cpu*/cpuidle/state2/name ?

Disabling fast-sleep state if present in your Ubuntu setup could help us to the next step.

--Vaidy

== Comment: #23 - Robert E. Walkup <email address hidden> - 2015-04-22 16:30:28 ==
On the different distro LE system provided by Paul Clarke, the observed behavior is different than what I have seen on Ubuntu LE systems, but one of the tests ... the MPI-enabled simple loop ... shows huge timing variations core-to-core for nearly every job. That system has 24 cores in smt8 mode

ppc64_cpu --frequency
Power Savings Mode: Dynamic, Favor Performance
min: 3.961 GHz (cpu 175)
max: 3.963 GHz (cpu 1)
avg: 3.962 GHz

and nearly every job provides output that looks like this :
out.10:tmin = 3.757, tmax = 6.519 on rank 17, tavg = 5.126

meaning that it takes anywhere from 3.757 to 6.519 seconds to get through the timed loop :

   MPI_Barrier(MPI_COMM_WORLD);
   t1 = MPI_Wtime();
   sum = 0.0;
   for (i=0; i<2000000000; i++) sum += ((double) (i%10));
   t2 = MPI_Wtime();
   elapsed = t2 - t1;

There are no loads or stores in that loop ... there is a separate process bound to each core, and they work independently. Additional instrumentation shows that the slow processes are in the run queue the whole time.

So far, the other work loads that I have tried on the different distro LE system showed significantly lower timing variations than what I had recorded on Ubuntu LE ... but not this one.

== Comment: #24 - Robert E. Walkup <email address hidden> - 2015-04-22 16:54:07 ==
Just adding that on the same different distro LE system, after turning off SMT via the command : ppc64_cpu --smt=1, all instances of the simple loop test have outputs like this :

tmin = 3.756, tmax = 3.757 on rank 5, tavg = 3.757

in other words it takes the same time to complete the work in the loop on every core ... every time, within the limits of what I have had the patience to check.

== Comment: #25 - Peter W. Wong <email address hidden> - 2015-04-22 17:03:16 ==
Bob, the use of ST mode reduces variation on Ubuntu 14.04.2 as well.

With SMT8 on another distro LE, I wonder whether "cpupower idle-set -d 0" helps reduce variation for the MPI-enabled simple loop?

Is it correct to say that both Ubuntu LE 14.04.2 (kernel 3.16.0) and another distro LE (kernel) exhibit variation?

Vaidy, Ubuntu 14.04.2 does not have cpuidle/state2 (fastsleep state).

== Comment: #26 - Robert E. Walkup <email address hidden> - 2015-04-22 17:11:42 ==
I ran the command :

[root@tuleta ~]# cpupower idle-set -d 0
Idlestate 0 disabled on CPU 0
Idlestate 0 disabled on CPU 1
...

on the different distro LE system after setting the state back to smt8, and the timing variability is still there :

out.2:tmin = 3.757, tmax = 9.010 on rank 4, tavg = 4.619
out.3:tmin = 3.757, tmax = 11.518 on rank 2, tavg = 4.684
out.4:tmin = 3.757, tmax = 9.398 on rank 3, tavg = 4.773

Essentially every job is showing truly huge timing variations.

== Comment: #27 - Peter W. Wong <email address hidden> - 2015-04-22 17:24:46 ==
Does "cpupower idle-set -d 1", to disable Nap too, make any difference?

I think we only have snooze and Nap on LE.

== Comment: #28 - Basu Vaidyanathan <email address hidden> - 2015-04-22 17:46:14 ==
(In reply to comment #27)

I have a p8 box running ubuntu 14.10 and I do see
cat /sys/devices/system/cpu/cpu0/cpuidle/state2/name
FastSleep

== Comment: #29 - Preeti U. Murthy <email address hidden> - 2015-04-23 06:01:57 ==
I see that there are hotplug operations being carried out simultaneously with running the benchmark. If so, the performance degradation could be due to the tasks not being allowed to run on the freshly onlined cpus.

I would suggest booting the system with all hardware threads and not doing hotplug operations, in order to keep the above issue away while verifying the performance of the benchmarks, if the intention is to profile the cpufreq governors.

Regards
Preeti U Murthy

== Comment: #31 - Peter W. Wong <email address hidden> - 2015-04-28 00:27:52 ==
On Ubuntu 14.04.2, there are two states in cpuidle: snooze and Nap.

Are the enabling and disabling of these two states independent?

== Comment: #32 - Robert E. Walkup <email address hidden> - 2015-04-28 16:16:23 ==
Adding an observation on ubuntu le systems, using the simple-loop example above and the userspace governor (chosen so that one can set the frequency to a desired value). When using one thread per core with the system in SMT8 state, the time for the loop varies from ~3.7 sec to over 8 sec. However, if a lot of iterations (10-20) of the same loop are done before starting the timed section of the code (adding a warmup loop), the variations in the timed section are dramatically reduced. There are still some outliers, but a much smaller number of them; and the timing spread is a fraction of one second, instead of several seconds. So there is a clear dependence on history, with the largest timing variations occurring immediately after job startup. I should mention that this remains a problem for many performance benchmarks in the HPC area, which often run in a total time of less than one minute. I would hope that with the userspace governor, or the performance governor, the power and frequency settings would remain constant. Can someone confirm that?

== Comment: #33 - Peter W. Wong <email address hidden> - 2015-04-29 17:16:58 ==
Vaidy, would you help answer my question on Comment 31?

== Comment: #34 - George A. Chochia <email address hidden> - 2015-05-13 11:52:53 ==
Vaidy, I am currently seeing a 2.5x performance degradation in the Message Rate benchmark on p8, Ubuntu 14.04.02 LE.

Performance was normal back in February, when we had 14.04.01 and older FW.

The degradation goes away once snooze state is disabled. There have been two FW updates: 1/13 and 2/17.

== Comment: #35 - VAIDYANATHAN SRINIVASAN <email address hidden> - 2015-05-13 14:35:37 ==
(In reply to comment #31)
> On Ubuntu 14.04.2, there are two states in cpuidle: snooze and Nap.
>
> Are the enabling and disabling of these two states independent?

Hi Peter,

Yes, the enable/disable for idle states are independent. At least 1 idle state is expected to be enabled; if not, the CPU may busy loop at idle and not reduce the thread priority like snooze.

You can disable snooze and have nap enabled or the other way, but having both disabled will lead to idle threads burning more cycles.

--Vaidy

== Comment: #36 - VAIDYANATHAN SRINIVASAN <email address hidden> - 2015-05-13 14:58:07 ==
(In reply to comment #34)

Hi George,

The idle state management code is same for both the kernels. You have only snooze and nap as idle states right?

As I explained over email, when snooze and nap are enabled, the cpuidle logic should choose nap for idle sibling threads after a short period in snooze.

Can you guys analyse and confirm the following points:

* The workload always runs on the primary thread of each core
* The remaining 7 sibling threads should be in nap (state1)
* Time spent in the 'nap' state for each of the sibling threads can be obtained from sysfs:
/sys/devices/system/cpu/cpuN/cpuidle/state1/time (unit is microseconds)
* Workload variation is related to the nap residency of sibling threads on that core

If the nap residency (time spent in nap) is not uniform then workload performance would be proportionally non uniform.
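
One way to collect the nap residency described above is to sample the state1/time counters before and after a run and take the difference. This is a minimal Python sketch, not the tooling used by the team. The helper names are hypothetical, and sibling_threads assumes that the threads of a core are numbered consecutively after the primary thread (consistent with primary threads 0, 8, 16, ... in SMT8 mode).

```python
from pathlib import Path


def sibling_threads(primary, threads_per_core=8):
    """CPU numbers of the non-primary threads on the same core (assumed contiguous)."""
    return list(range(primary + 1, primary + threads_per_core))


def read_residency(cpu_root, cpus, state=1):
    """Return {cpu: cumulative microseconds} from cpuN/cpuidle/stateM/time."""
    root = Path(cpu_root)
    return {
        cpu: int((root / f"cpu{cpu}" / "cpuidle" / f"state{state}" / "time").read_text())
        for cpu in cpus
    }


def residency_delta(before, after):
    """Microseconds spent in the state between two samples, per CPU."""
    return {cpu: after[cpu] - before[cpu] for cpu in before}
```

On a live system the intended use would be to call read_residency("/sys/devices/system/cpu", sibling_threads(0)) before and after the workload and compare the deltas across cores.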

The above statement (if proven) is one possible root-cause, that can help us move forward and design a fix.

--Vaidy

== Comment: #37 - Peter W. Wong <email address hidden> - 2015-05-13 17:45:33 ==
Hi Vaidy,

Let's use Bob's serial_loop.c as an example. There are 24 copies of his program running on 24 cores in parallel. Only the primary threads of the cores are used.

Did Shilpa use Bob's program to re-create the problem and find out that some unused sibling threads do not sleep fast enough and take away cycles from the primary thread to cause variability?

It is great to know that we can study the sleep time by examining the /sys/devices/system/cpu/cpuN/cpuidle/state1/time. Did Shilpa use this method to come up with the above understanding?

Based on George's finding, do you know whether there are thermal code changes in the old firmware that affects the thermal behavior in the current version?

Thanks,
Peter

== Comment: #38 - Preeti U. Murthy <email address hidden> - 2015-05-13 23:24:18 ==
Is this really related to snooze? Jenifer mentioned in Comment 10 that disabling nap and not snooze also reduced the variance? Can you please confirm if this is the case? This will help us narrow down the issue.

Regards
Preeti U Murthy

== Comment: #39 - JENIFER HOPPER <email address hidden> - 2015-05-14 10:19:09 ==
(In reply to comment #38)
Hi Preeti, sorry I corrected myself in comment 11, I was disabling state0 which is snooze, not nap:
# cpupower idle-set -d 0
# cat /sys/devices/system/cpu/cpu0/cpuidle/state0/name
snooze

Still might be interesting to try some tests w/ nap disabled.

== Comment: #40 - Shilpasri G. Bhat <email address hidden> - 2015-05-14 11:15:45 ==
(In reply to comment #37)
Yes. I also used perf-trace events to get the same info.

Regards,
Shilpa

== Comment: #42 - Anton Blanchard <email address hidden> - 2015-05-19 19:40:45 ==
If I am reading that trace right, we spent over 200ms in snooze on a secondary thread of a badly performing core. That is an enormous amount of time to be chewing up the core.

== Comment: #43 - Peter W. Wong <email address hidden> - 2015-05-19 21:45:20 ==
Vaidy,

Could you provide more information on your proposed solution which is in the kernel, not in OPAL?

Does it mean that you need to upstream different patches to the sets of kernels for Ubuntu and the other distro?

Peter

== Comment: #44 - VAIDYANATHAN SRINIVASAN <email address hidden> - 2015-05-20 10:56:48 ==
(In reply to comment #42)
Hi Anton,

That is right, exit from snooze state is the problem. In the proposed fix Shilpa has added a forced exit from snooze loop after the target residency so that the cpuidle governor can select nap.

We have to rewrite the snooze loop to exit after the first interrupt or timer, or after the target residency (100us), so that the idle state promotion can happen.
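
The idea of a bounded snooze loop can be illustrated with a user-space Python sketch. This is not the kernel patch: need_resched is a hypothetical callable standing in for "work has arrived for this thread", and only the 100us target residency is taken from this comment.

```python
import time


def snooze(need_resched, target_residency_us=100):
    """Poll until work arrives or the target residency has elapsed.

    Returns "work" if need_resched() became true, or "timeout" if the loop
    was forced to exit so that a deeper idle state (nap) can be chosen.
    """
    deadline = time.monotonic() + target_residency_us / 1_000_000
    while not need_resched():
        if time.monotonic() >= deadline:
            return "timeout"  # forced exit instead of polling indefinitely
    return "work"


print(snooze(lambda: False))  # an idle thread leaves the loop after about 100us
```

Without the deadline check, an idle sibling thread would keep polling for as long as no work arrives, which is the behaviour this thread identifies as taking cycles from the primary thread.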

--Vaidy

== Comment: #45 - Shilpasri G. Bhat <email address hidden> - 2015-05-20 11:02:06 ==
 Hi,

I am sharing the link for ubuntu kernel packages with the fix:

1) http://kernel.stglabs.ibm.com/~shilpa/ubuntu-14-04.tar
    This file contains the following packages:
    a)linux-headers-3.16.0-38-generic_3.16.0-38.52~14.04.1_ppc64el.deb
    b)linux-image-3.16.0-38-generic_3.16.0-38.52~14.04.1_ppc64el.deb
    c)linux-image-extra-3.16.0-38-generic_3.16.0-38.52~14.04.1_ppc64el.deb
    d)linux-tools-3.16.0-38-generic_3.16.0-38.52~14.04.1_ppc64el.deb
    The fix is based on top of ubuntu-14.04.02 3.16.0-38-generic + upstream commit (92c83ff5b42b cpuidle: powernv: Read target_residency value of idle states from DT if available)

2) http://kernel.stglabs.ibm.com/~shilpa/ubuntu-15.04.tar
    This file contains the following packages:
    linux-headers-3.19.0-17-generic_3.19.0-17.17+snooze_ppc64el.deb
    linux-image-3.19.0-17-generic_3.19.0-17.17+snooze_ppc64el.deb
    linux-image-extra-3.19.0-17-generic_3.19.0-17.17+snooze_ppc64el.deb
    linux-tools-3.19.0-17-generic_3.19.0-17.17+snooze_ppc64el.deb
    The fix is based on top of ubuntu-15.04 3.19.0-17-generic

== Comment: #46 - VAIDYANATHAN SRINIVASAN <email address hidden> - 2015-05-20 11:21:07 ==
(In reply to comment #43)

Hi Peter,

Sure. As per our discussion yesterday, we agreed on the following:

* The issue is not machine specific, the problem was recreated by Jenifer on S822L also even though other teams believe the issue is S824L specific.

* The key issue observed is the sibling thread's snooze time variation which chews cycles from primary thread.

* The fix is to force exit snooze loop after target residency (100us) and allow the cpuidle governor to enter nap.

* This fix is completely in Linux kernel cpuidle driver code and does not require change in OPAL.

Yes, once we verify the solution, we will design the correct idle state auto-promotion logic in cpuidle driver and get it upstream and then push to the other distro and ubuntu distros that run bare-metal.

--Vaidy

== Comment: #47 - JENIFER HOPPER <email address hidden> - 2015-05-20 12:44:17 ==
I tested Shilpa's kernel packages w/ the fix and can confirm I no longer see the variation issue w/ the serial loop program running on primary threads in SMT8 mode when the performance governor is set. I will get with Peter to test with another benchmark that previously hit the variation issue.

----

System:
8247-42L
20 cores, SMT8
FW830_041
Ubuntu 15.04

Run script:
#!/bin/bash

for iter in `seq 1 100`
do
  for cpu in 0 8 16 24 32 40 48 56 64 72 80 88 96 104 112 120 128 136 144 152
  do
  taskset -c ${cpu} ./serial_loop > out.${cpu}.${iter} &
  done
  echo $iter
  wait
done
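
The CPU list in the script above corresponds to one hardware thread per core on a 20-core SMT8 system. A small Python sketch can generate such a list for other core counts; primary_threads is a hypothetical helper, and it assumes cores are numbered contiguously with 8 threads each.

```python
def primary_threads(cores, threads_per_core=8):
    """First hardware thread of each core, assuming contiguous CPU numbering."""
    return [core * threads_per_core for core in range(cores)]


print(" ".join(str(cpu) for cpu in primary_threads(20)))
```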

Results:

-- 3.19.0-17 fix --

Performance
-----------
Loop elapsed:     User time:
Min     Max       Min     Max
3.885   3.92      3.877   3.914
3.885   3.892     3.877   3.886
3.885   3.908     3.877   3.901

Ondemand
--------
Loop elapsed:     User time:
Min     Max       Min     Max
3.933   3.949     3.901   3.912

-- orig 3.19.0-16 kernel --

Performance
-----------
Loop elapsed:     User time:
Min     Max       Min     Max
3.886   4.507     3.88    4.498
3.884   10.404    3.877   10.39

Ondemand
--------
Loop elapsed:     User time:
Min     Max       Min     Max
3.932   3.994     3.901   3.959

== Comment: #49 - JENIFER HOPPER <email address hidden> - 2015-05-21 18:59:33 ==
The fix from comment #45 also resolves large variance issues w/ STREAM and DGEMM workloads. Results listed below.

=========================================
STREAM:

MB/sec
SMT8, 1 thread per core, 100 loop

-------- orig 3.19.0-16 kernel --------

Performance:
____________
 Min Max %diff
run1: 304384.6341 308199.3341 1.25%
run2: 150096.0562 308516.5557 69.09%

Performance
+ disable snooze:
_________________
 Min Max %diff
run1: 305700.3257 308403.9185 0.88%
run2: 305547.2215 308771.2772 1.05%

Ondemand:
_________
 Min Max %diff
run1: 298386.1295 302209.7456 1.27%

----------- 3.19.0-17 fix -----------

Performance:
____________
 Min Max %diff
run1: 303486.8368 308433.0545 1.62%
run2: 304768.6159 308410.2177 1.19%
run3: 304723.2556 308847.065 1.34%

Ondemand:
_________
 Min Max %diff
run1: 297364.385 302473.0888 1.70%

=========================================

=========================================
DGEMM:

GFlops
SMT8, 1 thread per core, 20 loop

-------- orig 3.19.0-16 kernel --------

Performance:
____________
 Min Max %diff
run1: 479.53 520.2 8.14%

Performance
+ disable snooze:
_________________
 Min Max %diff
run1: 511.18 520.49 1.80%

Ondemand:
_________
 Min Max %diff
run1: 505.64 509.88 0.84%

----------- 3.19.0-17 fix -----------

Performance:
____________
 Min Max %diff
run1: 512.77 520.84 1.56%
run2: 517.19 520.34 0.61%
run3: 517.93 520.35 0.47%

Ondemand:
_________
 Min Max %diff
run1: 505.72 508.53 0.55%
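
The %diff column in the tables above is not defined in the comment. The values appear consistent with the difference between Max and Min divided by their mean; this is inferred from the numbers, not stated by the author. A Python sketch under that assumption, with a hypothetical function name:

```python
def percent_diff(low, high):
    """Difference between two values relative to their mean, in percent."""
    return (high - low) / ((high + low) / 2) * 100


# STREAM, orig 3.19.0-16 kernel, performance governor, run2
print(f"{percent_diff(150096.0562, 308516.5557):.2f}%")  # 69.09%
# DGEMM, orig 3.19.0-16 kernel, performance governor, run1
print(f"{percent_diff(479.53, 520.2):.2f}%")  # 8.14%
```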

== Comment: #51 - Peter W. Wong <email address hidden> - 2015-06-14 22:53:05 ==
Vaidy, is this fix being reviewed by the Linux kernel community? Can you give some estimates as to when this kernel fix will get into mainline and also when it will get into Ubuntu distro?

== Comment: #52 - Shilpasri G. Bhat <email address hidden> - 2015-06-24 07:18:28 ==
The patch can be found in the upstream kernel 4.2
78eaa10f027c cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state

bugproxy (bugproxy) wrote : ftrace of badly performing run

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-124023 severity-high targetmilestone-inin14043
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1470404/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
affects: ubuntu → linux (Ubuntu)
Chris J Arges (arges)
Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux (Ubuntu Utopic):
status: New → In Progress
Changed in linux (Ubuntu Vivid):
status: New → In Progress
Changed in linux (Ubuntu Utopic):
importance: Undecided → Medium
Changed in linux (Ubuntu Vivid):
importance: Undecided → Medium
Changed in linux (Ubuntu Utopic):
assignee: nobody → Chris J Arges (arges)
Changed in linux (Ubuntu Vivid):
assignee: nobody → Chris J Arges (arges)
description: updated
Chris J Arges (arges) wrote :

Hi,
Is 78eaa10f027cf69f9bd409e64eaff902172b2327 alone sufficient to solve this issue, or are any other patches required? Thanks,
--chris

Chris J Arges (arges) wrote :

The last question was for both 3.16 and 3.19.

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2015-07-02 06:52 EDT-------
(In reply to comment #57)
> Hi,
> Is 78eaa10f027cf69f9bd409e64eaff902172b2327 alone sufficient to solve this
> issue, or are any other patches required? Thanks,
> --chris
>
> The last question was for both 3.16 and 3.19.
Hi,

For 3.19: Yes 78eaa10f027cf69f9bd409e64eaff902172b2327 is alone sufficient.

For 3.16:
The following commits are required to properly read the target residency values of idle states from device tree.
3.16..3.19 drivers/cpuidle/cpuidle-powernv.c

a)3488cb1262f636cbbbfde90a33ed65f8d314bf9c cpuidle: powernv: Avoid endianness conversions while parsing DT
b)fcb96cf761458981368cc93110ed1133999d0435 cpuidle: powernv: Read target_residency value of idle states from DT if available
c)74aa51b5ccd3975392e30d11820dc073c5f2cd32 cpuidle: powernv: Populate cpuidle state details by querying the device-tree

Thanks and Regards,
Shilpa

Chris J Arges (arges)
description: updated
Chris J Arges (arges) wrote :

Sent SRU to Ubuntu kernel team ML for Utopic/Vivid.

description: updated
Brad Figg (brad-figg)
Changed in linux (Ubuntu Utopic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Vivid):
status: In Progress → Fix Committed
penalvch (penalvch)
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Triaged
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-utopic' to 'verification-done-utopic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-utopic
tags: added: verification-needed-vivid
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-vivid' to 'verification-done-vivid'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-07-10 17:12 EDT-------
Hi,

Tested on 3.19.0-23-generic and 3.16.0-44-generic and found no variation with the performance governor.

Thanks and Regards,
Shilpa
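
For anyone repeating this verification, the test case in this bug (run the workload 100 times, sort the data points, compare the spread between governors) can be summarized with a few statistics. The following is a minimal sketch, not the script used for the results above; the function name `spread_summary` and the sample numbers are made up for illustration.

```python
import statistics


def spread_summary(samples):
    """Summarize the spread of a list of run times (needs at least 2 samples)."""
    ordered = sorted(samples)
    lowest, highest = ordered[0], ordered[-1]
    mean = statistics.mean(ordered)
    return {
        "min": lowest,
        "max": highest,
        "median": statistics.median(ordered),
        # Full range as a percentage of the mean.
        "range_pct": 100.0 * (highest - lowest) / mean,
        # Coefficient of variation (sample standard deviation / mean).
        "cv_pct": 100.0 * statistics.stdev(ordered) / mean,
    }


if __name__ == "__main__":
    # Hypothetical run times in seconds, one list per governor.
    runs = {
        "performance": [10.1, 10.0, 10.2, 10.1],
        "ondemand": [10.0, 10.1, 10.1, 10.2],
    }
    for governor, samples in runs.items():
        print(governor, spread_summary(samples))
```

A large `range_pct` or `cv_pct` for one governor relative to the other would correspond to the variation originally reported.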

Chris J Arges (arges)
tags: added: verification-done-utopic verification-done-vivid
removed: verification-needed-utopic verification-needed-vivid
Chris J Arges (arges)
Changed in linux (Ubuntu):
status: Triaged → In Progress
bugproxy (bugproxy) wrote : ftrace of badly performing run

Default Comment by Bridge

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.19.0-23.24

---------------
linux (3.19.0-23.24) vivid; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1472346

  [ Chris J Arges ]

  * SAUCE: Don't use atomic read in evlist.c
    - LP: #1410673

linux (3.19.0-23.23) vivid; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1472048

  [ Chris J Arges ]

  * [Config] Add CRYPTO_DEV_NX_*, 842_* as modules
    - LP: #1454687

  [ Lu, Han ]

  * SAUCE: i915_bpo: drm/i915/audio: add codec wakeup override
    enabled/disable callback
    - LP: #1460674

  [ Timo Aaltonen ]

  * SAUCE: Backport I915_OVERLAY_DISABLE_DEST_COLORKEY
    - LP: #1460674
  * SAUCE: i915_bpo: Rebase to drm-intel-next-fixes-2015-05-29
    - LP: #1460674
  * SAUCE: i915_bpo: Revert "drm/i915: Implement the intel_dp_autotest_edid
    function for DP EDID complaince tests"
    - LP: #1460674
  * SAUCE: i915_bpo: Revert "drm/i915: Add debugfs test control files for
    Displayport compliance testing"
    - LP: #1460674
  * SAUCE: Load i915_bpo from the hda driver on SKL/CHV
    - LP: #1460674
  * SAUCE: i915_bpo: Don't try to support BXT
    - LP: #1460674
  * SAUCE: i915_bpo: drm/i915/skl: Fix DMC API version.

  [ Upstream Kernel Changes ]

  * Revert "usb: dwc2: add bus suspend/resume for dwc2"
    - LP: #1471252
  * Revert "HID: logitech-hidpp: support combo keyboard touchpad TK820"
    - LP: #1471252
  * Revert "KVM: x86: drop fpu_activate hook"
    - LP: #1471252
  * Revert "libceph: clear r_req_lru_item in __unregister_linger_request()"
    - LP: #1471252
  * drm/i915: add component support
    - LP: #1460661
  * ALSA: hda: export struct hda_intel
    - LP: #1460661
  * ALSA: hda: pass intel_hda to all i915 interface functions
    - LP: #1460661
  * ALSA: hda: add component support
    - LP: #1460661
  * drm/atomic-helpers: Fix documentation typos and wrong copy&paste
    - LP: #1460674
  * drm/atomic: Rename drm_atomic_helper_commit_pre_planes() state argument
    - LP: #1460674
  * drm/atomic-helper: Rename commmit_post/pre_planes
    - LP: #1460674
  * drm/atomic-helpers: make mode_set hooks optional
    - LP: #1460674
  * drm/atomic-helper: Fix kerneldoc for prepare_planes
    - LP: #1460674
  * drm: Complete moving rotation property to core
    - LP: #1460674
  * drm: Share plane pixel format check code between legacy and atomic
    - LP: #1460674
  * drm/atomic: Constify a bunch of functions pointer structs
    - LP: #1460674
  * drm: Fix some typo mistake of the annotations
    - LP: #1460674
  * drm: change connector to tmp_connector
    - LP: #1460674
  * drm: atomic: Expose CRTC active property
    - LP: #1460674
  * drm: atomic: Allow setting CRTC active property
    - LP: #1460674
  * drm/atomic-helpers: Properly avoid full modeset dance
    - LP: #1460674
  * drm/atomic: Add helpers for state-subclassing drivers
    - LP: #1460674
  * drm: Fix some typos
    - LP: #1460674
  * drm/atomic: Add for_each_{connector,crtc,plane}_in_state helper macros
    - LP: #1460674
  * drm/atomic-helper: Don't call atomic_update_plane when it stays off
    - LP: #1460674
  * drm/atomic-helper: Really recover pre-atomic plane/cursor behavior
 ...

Changed in linux (Ubuntu Vivid):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.16.0-44.59

---------------
linux (3.16.0-44.59) utopic; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1472030

  [ Iyappan Subramanian ]

  * SAUCE: (no-up) drivers: net: xgene: fix: Out of order descriptor bytes
    read
    - LP: #1425576

  [ Upstream Kernel Changes ]

  * Revert "tools/vm: fix page-flags build"
    - LP: #1471170
  * NVMe: Add shutdown timeout as module parameter.
    - LP: #1465136
  * Drivers: hv: vmbus: Add support for VMBus panic notifier handler
    - LP: #1463584
  * Drivers: hv: vmbus: Correcting truncation error for constant
    HV_CRASH_CTL_CRASH_NOTIFY
    - LP: #1463584
  * KVM: nVMX: fix lifetime issues for vmcs02
    - LP: #1448269
  * KVM: nVMX: Fix nested vmexit ack intr before load vmcs01
    - LP: #1448269
  * mm/slab_common: support the slub_debug boot option on specific object
    size
    - LP: #1456952
  * kvm: x86: fix kvm_apic_has_events to check for NULL pointer
  * cpuidle: powernv: Populate cpuidle state details by querying the
    device-tree
    - LP: #1470404
  * cpuidle: powernv: Read target_residency value of idle states from DT if
    available
    - LP: #1470404
  * cpuidle: powernv: Avoid endianness conversions while parsing DT
    - LP: #1470404
  * cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state
    - LP: #1470404
  * iio: adis16400: Report pressure channel scale
    - LP: #1471170
  * iio: adis16400: Use != channel indices for the two voltage channels
    - LP: #1471170
  * iio: adis16400: Compute the scan mask from channel indices
    - LP: #1471170
  * iio: adis16400: Remove unused variable
    - LP: #1471170
  * iio: adis16400: Fix burst mode
    - LP: #1471170
  * iio: adis16400: Fix burst transfer for adis16448
    - LP: #1471170
  * USB: serial: ftdi_sio: Add support for a Motion Tracker Development
    Board
    - LP: #1471170
  * iio: adc: twl6030-gpadc: Fix modalias
    - LP: #1471170
  * serial: imx: Fix DMA handling for IDLE condition aborts
    - LP: #1471170
  * usb: dwc3: gadget: Fix incorrect DEPCMD and DGCMD status macros
    - LP: #1471170
  * ALSA: usb-audio: Add mic volume fix quirk for Logitech Quickcam Fusion
    - LP: #1471170
  * n_tty: Fix auditing support for cannonical mode
    - LP: #1471170
  * drm/i915/hsw: Fix workaround for server AUX channel clock divisor
    - LP: #1471170
  * x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers
    - LP: #1471170
  * lib: Fix strnlen_user() to not touch memory after specified maximum
    - LP: #1471170
  * Input: elantech - fix detection of touchpads where the revision matches
    a known rate
    - LP: #1471170
  * ALSA: hda/realtek - Add a fixup for another Acer Aspire 9420
    - LP: #1471170
  * ALSA: usb-audio: add MAYA44 USB+ mixer control names
    - LP: #1471170
  * ALSA: usb-audio: fix missing input volume controls in MAYA44 USB(+)
    - LP: #1471170
  * USB: cp210x: add ID for HubZ dual ZigBee and Z-Wave dongle
    - LP: #1471170
  * Input: elantech - add new icbody type
    - LP: #1471170
  * MIPS: Fix enabling of DEBUG_STACKOVERFLOW
    - LP: #1471170
  * xfrm: fix a race in xfrm_state_lookup_byspi
    ...


Changed in linux (Ubuntu Utopic):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu):
status: In Progress → Fix Released