HWP and C1E are incompatible - Intel processors

Bug #1917813 reported by Doug Smythies
This bug affects 1 person
Affects          Status     Importance  Assigned to  Milestone
Linux            Confirmed  Medium
linux (Ubuntu)   Confirmed  Undecided   Unassigned

Bug Description

Modern Intel processors (since Skylake) with HWP (Hardware P-state) control enabled and idle state 2 (C1E) enabled can incorrectly drop the CPU frequency, with an extremely slow recovery time.

The fault is not within HWP itself, but within the internal idle detection logic. One difference between OS-driven p-state control and HWP-driven p-state control is that the OS knows the system was not actually idle, but HWP does not. Another difference is the incredibly sluggish recovery with HWP.

The problem only occurs when Idle State 2, C1E, is involved. Not all processors have the C1E idle state. The issue is independent of C1E auto-promotion, which is turned off in general, as far as I know.
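For reference, a minimal sketch of checking whether a given machine has HWP and the C1E idle state (standard cpuinfo flags and sysfs paths; exact locations may vary by kernel version):

# HWP capability flags (hwp, hwp_notify, hwp_act_window, hwp_epp):
grep -m1 -o 'hwp[a-z_]*' /proc/cpuinfo
# intel_pstate operation mode (active, passive, or off):
cat /sys/devices/system/cpu/intel_pstate/status
# list the idle state names; look for C1E:
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name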

With all idle states enabled the issue is rare. The issue would manifest itself in periodic workflows, and would be extremely difficult to isolate (it took me over half a year).

The purpose of this bug report is to link to the upstream bug report, where readers can find tons of detail. I'll also set it to confirmed, as it has already been verified on 4 different processor models, and I do not want the bot asking me for files that are not required.

Workarounds include (a sketch is given after this list):
. don't use HWP.
. disable idle state 2, C1E.
. change the C1E idle state to use MWAIT 0x03 instead of MWAIT 0x01 (still in test; documentation on the MWAIT least significant nibble is scant).
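A minimal sketch of the first two workarounds, assuming the kernel parameters and sysfs paths documented for intel_pstate, intel_idle, and cpuidle (the third workaround requires the intel_idle.c source change shown later in this report):

# don't use HWP: boot with the kernel parameter:
#   intel_pstate=no_hwp
# disable idle state 2 (C1E) at run time, on all CPUs (root required):
echo 1 | sudo tee /sys/devices/system/cpu/cpu*/cpuidle/state2/disable
# or disable it at boot (bit mask, bit 2 = idle state 2):
#   intel_idle.states_off=4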

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 294171
Graph of load sweep up and down at 347 Hertz.

Consider a steady-state periodic single-threaded workflow, with a work/sleep frequency of 347 Hertz and a load somewhere in the ~75% range at the steady-state operating point.
For the intel-cpufreq CPU frequency scaling driver with the powersave governor and HWP disabled, it runs indefinitely without any issues.
For the acpi-cpufreq CPU frequency scaling driver with the ondemand governor, it runs indefinitely without any issues.
For the intel-cpufreq CPU frequency scaling driver with the powersave governor and HWP enabled, it suffers from overruns.
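For reference, a minimal sketch (an assumption based on the kernel's intel_pstate documentation, not part of the original report) of how the three configurations above can be selected:

# intel-cpufreq (intel_pstate passive mode), HWP disabled:
#   boot with: intel_pstate=passive intel_pstate=no_hwp
# acpi-cpufreq:
#   boot with: intel_pstate=disable
# intel-cpufreq, HWP enabled:
#   boot with: intel_pstate=passive
# verify the resulting driver, then set the governor named above, e.g.:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
echo powersave | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor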

Why?

For unknown reasons, HWP seems to incorrectly decide that the processor is idle and spins the PLL down to a very low frequency. Upon exit from the sleep portion of the periodic workflow, it takes a very long time to recover (on the order of 20 milliseconds; supporting data for that statement will be added in a later posting), resulting in the periodic job not being able to complete its work before the next interval, whereas it normally has plenty of time to do its work. Actually, typical worst-case overruns are around 12 milliseconds, or several work/sleep periods (i.e. it takes a very long time to catch up).

The probability of this occurring is about 3%, but varies significantly. Obviously, the recovery time is also a function of EPP, but mostly this work has been done with the default EPP of 128. I believe this to be a sampling and anti-aliasing issue, but cannot prove it because HWP is a black box. My best GUESS is:

If the periodic load is busy on a jiffy boundary, such that the tick is on,
Then if it is sleeping at the next jiffy boundary, with a pending wake, such that idle state 2 was used,
  Then if the rest of the system was idle, such that HWP decides to spin down the PLL,
    Then it is highly probable that upon that idle state 2 exit the PLL is too slow to ramp up, and the task will overrun as a result,
Else everything will be fine.

For a 1000 Hz kernel the above suggests that a work/sleep frequency of 500 Hz should behave in a binary way, either lots of overruns or none, because the integer ratio of tick rate to work/sleep frequency fixes the phase of the cycle relative to jiffy boundaries. It does.
For a 1000 Hz kernel the above suggests that a work/sleep frequency of 333.333 Hz should likewise behave in a binary way, either lots of overruns or none. It does.
Note: in all cases the sleep time has to be within the window of opportunity.

Now, I can not actually prove whether the idle state 2 part is a cause or a consequence, but the issue never happens with it disabled, albeit at the cost of significant power.

Another way this issue would manifest itself is as a seemingly extraordinary idle exit latency, which would be rather difficult to isolate as the cause.

Processors tested:
Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz (mine)
Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz (not mine)

HWP has been around for years; why am I just reporting this now?

I never owned an HWP-capable processor before. My i7-2600K based test computer was getting a little old, so I built a new test computer. I noticed this issue the same day I first enabled HWP. That was months ago (notice the dates on the graphs that will eventually be added to this), and I tried, repeatedly, to get help from Intel via...


Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Intel: Kristen hasn't been the maintainer for years. Please update the auto-assigned thing.

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 294173
Graph of an area of concern breaking down.

An experiment was done looking around the area initially found at a 347 hertz work/sleep frequency of the periodic workflow and load.

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 294175
Graph of overruns from the same experiment as the previous post

There should not be any overruns (sometimes there are 1 or 2 from first-time start-up).

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 294177
inverse impulse test - short short sleep exit response

Good and bad inverse impulse response exits all on one graph.

The graph mentions 5 milliseconds a lot. At that time I did not know that the frequency step times are a function of EPP. I have since mapped the entire EPP space, getting:

0 <= EPP <= 1 : unable to measure.
2 <= EPP <= 39 : 2 milliseconds between frequency steps
40 <= EPP <= 55 : 3 milliseconds between frequency steps
56 <= EPP <= 79 : 4 milliseconds between frequency steps
80 <= EPP <= 133 : 5 milliseconds between frequency steps
134 <= EPP <= 143 : 6 milliseconds between frequency steps
144 <= EPP <= 154 : 7 milliseconds between frequency steps
155 <= EPP <= 175 : 8 milliseconds between frequency steps
176 <= EPP <= 255 : 9 milliseconds between frequency steps
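For reference, with the intel_pstate driver in active mode and HWP enabled, EPP can be read and written through sysfs, either as a named preference or as a raw 0-255 value (per the kernel's intel_pstate documentation); a minimal sketch:

# read the current EPP on CPU 0:
cat /sys/devices/system/cpu/cpu0/cpufreq/energy_performance_preference
# set the raw default EPP of 128 on all CPUs:
echo 128 | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/energy_performance_preference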

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 294179
inverse impulse response - multiple (like 1000) bad exits

By capturing a great many bad exits, one can begin to observe the width of the timing race window (which I already knew from other work, but don't think I wrote herein yet). The next few attachments will drill down into some details of this same data.

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 294181
inverse impulse response - multiple (like 1000) bad exits - detail A

Just a zoomed-in graph of an area of interest, so I could verify that the window size was the same (close enough) as what I asked for. The important point is that the window is always exactly around the frequency step point.

Now, we already know that the frequency step points are an HWP thing, so this data supports the argument that HWP is doing this stuff on its own.

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 294185
inverse impulse response - multiple (like 1000) bad exits - detail B

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 294187
inverse impulse response - multiple (like 1000) bad exits - detail C

The previous attachment and this one are details B and C, zoomed-in looks at another two spots, again calculating the window width.

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 294189
inverse impulse response - i5-6200u multi all bad

This is the other computer. There are also detail graphs, if needed.

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 294201
Just an example of inverse impulse response versus some different EPPs

See also:

https://marc.info/?l=linux-pm&m=159354421400342&w=2

On that old thread, I just added a link back to this report.

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

> A 250 hertz kernel was tested, and it did not have this
> issue in this area. Perhaps elsewhere, I didn't look.

Correction: the same thing happens for a 250 Hertz kernel.

Some summary data for the periodic workflow manifestation of the issue: 347 hertz work/sleep frequency, fixed packet of work to do per cycle, 5 minutes per run, kernel 5.10, both 1000 Hz and 250 Hz, teo and menu idle governors, idle state 2 enabled and disabled.

1000 Hz, teo, idle state 2 enabled:
overruns 28399
maximum catch up 13334 uSec
Ave. work percent: 76.767
Power: ~14.5 watts

1000 Hz, menu, idle state 2 enabled:
overruns 835
maximum catch up 10934 uSec
Ave. work percent: 68.106
Power: ~16.3 watts

1000 Hz, teo, idle state 2 disabled:
overruns 0
maximum catch up 0 uSec
Ave. work percent: 67.453
Power: ~16.8 watts (+2.3 watts)

1000 Hz, menu, idle state 2 disabled:
overruns 0
maximum catch up 0 uSec
Ave. work percent: 67.849
Power: ~16.4 watts (and yes, the 0.1 watt difference is relevant)

250 Hz, teo, idle state 2 enabled:
overruns 193
maximum catch up 10768 uSec
Ave. work percent: 68.618
Power: ~16.1 watts

250 Hz, menu, idle state 2 enabled:
overruns 22
maximum catch up 10818 uSec
Ave. work percent: 68.607
Power: ~16.1 watts

250 Hz, teo, idle state 2 disabled:
overruns 0
maximum catch up 0 uSec
Ave. work percent: 68.550
Power: ~16.1 watts

250 Hz, menu, idle state 2 disabled:
overruns 0
maximum catch up 0 uSec
Ave. work percent: 68.586
Power: ~16.1 watts

So, the reason I missed the 250 hertz kernel in my earlier work was that the probability was so much lower. The probability is lower because the operating point is so different between the teo and menu governors and between the 1000 and 250 Hz kernels, i.e. there is much more spin-down margin for the menu case.

The operating point difference between the 250 Hz and 1000 Hz kernels for teo is worth a deeper look.

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Additionally, and all other things being equal, the use of idle state 2 is dramatically different between the 1000 Hz (0.66%) and 250 Hz (0.03%) kernels, resulting in differing probabilities of hitting the timing window while in idle state 2.

HWP does not work correctly in these scenarios.
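For reference, per-state idle usage such as the percentages above can be estimated from the cumulative counters that cpuidle exposes in sysfs; a minimal sketch:

# cumulative entry count for idle state 2 on CPU 0:
cat /sys/devices/system/cpu/cpu0/cpuidle/state2/usage
# cumulative residency in idle state 2 on CPU 0, in microseconds:
cat /sys/devices/system/cpu/cpu0/cpuidle/state2/time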

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 294275
Graph of load sweep at 200 Hertz for various idle states

> Now, I can not actually prove whether the idle state 2 part
> is a cause or a consequence, but the issue never happens with it
> disabled, albeit at the cost of significant power.

Idle state 2, combined with the timing window, which is much, much larger than previously known, is the cause.

The CPU load is increased to max, then decreased. As a side note, there is a staggering amount of hysteresis and very long time constants involved here.

If one just sits and watches turbostat with the system supposedly in steady state operation, HWP can be observed very gradually (10s of seconds) deciding that it can reduce the CPU frequency, thus saving power. Then it has one of these false frequency drops, HWP struggles to catch up, raising the CPU frequency as it does so, and the cycle repeats.
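For reference, a minimal sketch of watching this with turbostat (the column names below are from recent turbostat versions and may differ):

# once per second: busy percent, average busy-CPU frequency, package power:
sudo turbostat --quiet --interval 1 --show Busy%,Bzy_MHz,PkgWatt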

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 294399
step function system response - overview

1514 step function tests were done.
The system response was monitored each time.
For 93% of the tests, the system response was as expected
(do not confuse "as expected" with "ideal" or "best").
For 7% of the tests the system response was not as expected, being much, much too slow and taking far too long thereafter to completely come up to speed.

Note: The y-axis of these graphs is now "gap-time" instead of CPU frequency. This was not done to confuse the reader; rather, the reverse frequency calculation was deliberately omitted. It is preferable to observe the data in units of time, without introducing frequency errors due to ISR and other latency gaps. Approximate CPU frequency conversions have been added.

While I will post about 5 graphs for this experiment, I have hundreds and have done many different EPPs and on and on ...

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 294401
step function system response - detail A

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 294403
step function system response - detail B

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 294405
step function system response - detail B-1

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 294407
step function system response - detail B-2

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 294469
step function system response - idle state 2 disabled

1552 test runs with idle state 2 disabled, no failures.

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 294685
a set of tools for an automated test

At this point, I have provided 3 different methods that reveal the same HWP issue. Herein, tools are provided to perform an automated quick test to answer the question "does my processor have this HWP issue?"

The motivation for this automation is to make it easier to test other HWP-capable Intel processors. Until now, the other methods of manifesting the issue have required "tweaking", and have probabilities of occurrence even lower than 0.01%, requiring unbearably long testing times (many hours) in order to acquire enough data to be statistically valid. Typically, this test provides PASS/FAIL results in about 5 minutes.

The test changes idle state enabled/disabled status, requiring root rights to do so. The scale for the fixed workpacket periodic workflow is both arbitrary and different between processors. The test runs in two steps: the first finds the operating point for the test (i.e. it does the "tweaking" automatically); the second does the actual tests, one without idle state 2 and one with only idle state 2 (recall that the issue is linked with the use of idle state 2). Forcing idle state 2 greatly increases the probability of the issue occurring. While this test has been created specifically for the intel_pstate CPU frequency scaling driver with HWP enabled and the powersave governor, it doesn't check for that. Therefore, one way to test the test is to try it with HWP disabled.

Note: the subject test computer must be able to run one CPU at 100% without needing to throttle (power or thermal or any other reason), including with only idle state 2 enabled.
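For reference, the idle state toggling that the test performs can also be done by hand through sysfs; a minimal sketch of forcing only idle state 2 (root required; write 0 instead of 1 to re-enable a state):

# disable every idle state except state 2, on all CPUs:
for s in /sys/devices/system/cpu/cpu*/cpuidle/state*; do
  case "$s" in *state2) v=0 ;; *) v=1 ;; esac
  echo $v | sudo tee "$s/disable" > /dev/null
done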

Results so far: 3 of 3 processors FAIL: i5-9600K, i5-6200U, i7-10610U.

Use this command:

./job-control-periodic 347 6 6 900 10

Legend:
347 hertz work/sleep frequency
6 seconds per iteration run.
6 seconds per test run.
try for approximately 900 uSec average sleep time.
10 test loops at that 6 seconds per test.

The test will take about 5 minutes.

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 294687
an example run of the quick test tools

The example contains results for:
HWP disabled: PASS (as expected)
HWP enabled: FAIL (as expected)

But tests were also done with a 250 Hertz kernel, turbo disabled, and the EEO and RHO bits changed... all give FAIL for HWP enabled with idle state 2 forced, and PASS for all other conditions.

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Some other results for the quick test:

i5-9600k (Doug): FAIL. (Ubuntu 20.04; kernel any)
i5-6200U (Alin): FAIL. (Debian.)
i7-7700HQ (Gunnar): FAIL (Ubuntu 20.10)
i7-10610U (Russell) : FAIL. (CentOS (RedHat 8), 4.18.0-240.10.1.el8_3.x86_64 #1 SMP).
Another Skylake (Rick): still waiting to hear back.

So, 4 out of 4 so far (and I gave them no guidance at all, on purpose, as to any particular kernel to try).

I have been picking away at this thread (pun intended) for months, and I think it is finally starting to unravel. Somewhere above I said:

> For unknown reasons, HWP seems to incorrectly decide
> that the processor is idle and spins the PLL down to
> a very low frequency.

I now believe it to be something inside the processor, but maybe not part of HWP. I think that non-HWP processors, or ones with it disabled, also misdiagnose that the entire processor is idle. My evidence is neither very thorough nor currently in a presentable form, but this issue only ever occurs a short time after, or immediately after, every core has been idle, with at least one in idle state 2. The huge difference between HWP and OS-driven p-states is that the OS knows the system wasn't actually idle and HWP doesn't. Even though package C1E is disabled, it behaves, perhaps, similarly to it being enabled.

There is some small timing window where this really screws up. Mostly it works fine, and either the CPU frequency doesn't even ramp down at all, or it recovers quickly, within about 120 uSec.

And as far as I know, it exits the idle state O.K., but it takes an incredibly long time for HWP to ramp up the CPU frequency again. Meanwhile, any non-HWP approach doesn't drop the p-state request to minimum, nor re-start any sluggish ramp up.

Now, this issue is rare and would be extremely difficult to diagnose, appearing as occasional glitches, e.g. a frame rate drop in a game, dropped data, or unbelievably long latency if any kind of performance is required. I consider this issue to be of the utmost importance.

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 295137
An example idle trace capture of the issue

These are very difficult to find.

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 295139
Just for reference, a good example of some idle trace data

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 295155
graph of inverse impulse response measured versus theoretical failure probabilities.

As recently as late yesterday, I was still attempting to refine the gap time definition from comment #1. Through this entire process, I just assumed the processor would require at least 2 samples before deciding the entire system was idle. Why? Because it was beyond my comprehension that it would be based on one instant in time. Well, that was wrong, and it is actually based on one sample only, at the HWP loop time (see attachment #294201), if idle state 2 is involved.

Oh, only idle state 2 was enabled for this. The reason I could not originally refine the gap definition was that I did not yet know enough. I had to force idle state 2 to increase the failure probabilities enough to find these limits without tests that would otherwise have run for days.

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 295159
Forgot to label my axes on the previous post

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 295211
Take one point, 4500 uSec, from the previous graph and add a couple of other configurations

Observe that the recovery time, which does not include the actual idle state exit latency, just the extra time needed to get to an adequate CPU frequency, is on average 87 times slower for HWP versus no-HWP, and 44 times slower than passive/ondemand/no-HWP.

Yes, there are a few interesting spikes on the passive/ondemand/no-HWP graph, but those are things we can debug relatively easily (which I will not do).

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 295533
changing the MWAIT definition of C1E fixes the problem

I only changed the one definition relevant to my test computer. The documentation on these bits is rather scant. Other potential fixes include getting rid of idle state 2 (C1E) altogether, or booting with it disabled: "intel_idle.states_off=4".

I observe that Rui fixed the "assigned" field. Thanks, not that it helps, as Srinivas has been aware of this for over half a year.

Revision history for this message
In , srinivas.pandruvada (srinivas.pandruvada-linux-kernel-bugs) wrote :

I tried to reproduce with your scripts on CFL-S systems almost half a year back and did not observe the same. Systems can be configured in different ways, which impacts the HWP algorithm. So it is possible that my lab system is configured differently.

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

(In reply to Srinivas Pandruvada from comment #29)
> I tried to reproduce with your scripts on CFL-S systems almost half a year
> back and did not observe the same. Systems can be configured in different
> ways, which impacts the HWP algorithm. So it is possible that my lab system
> is configured differently.

By "CFL-S" I assume you mean "Coffee Lake".

I wish you had reported back to me your findings, as we could have figured out the difference.

Anyway, try the automated quick test I posted in comment 20. Keep in mind that it needs to be HWP enabled, active, powersave, default epp=128. It is on purpose that the tool does not check for this configuration.
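For reference, a minimal sketch of verifying that configuration before running the test (the expected values in the comments are assumptions based on the defaults described above):

cat /sys/devices/system/cpu/intel_pstate/status                        # expect: active
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor              # expect: powersave
cat /sys/devices/system/cpu/cpu0/cpufreq/energy_performance_preference # expect: balance_performance (EPP 128)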

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

(In reply to Doug Smythies from comment #28)
> Created attachment 295533 [details]
> changing the MWAIT definition of C1E fixes the problem

Conversely, I have tried to determine if other idle states can be broken by setting the least significant bit of the MWAIT hint.

I did idle state 3, C3, and could not detect any change in system response.

I did idle state 5, C7s, which already had the least significant bit set, along with bit 1, so I set bit 1 to 0:

  .name = "C7s",
- .desc = "MWAIT 0x33",
- .flags = MWAIT2flg(0x33) | CPUIDLE_FLAG_TLB_FLUSHED,
+ .desc = "MWAIT 0x31",
+ .flags = MWAIT2flg(0x31) | CPUIDLE_FLAG_TLB_FLUSHED,
  .exit_latency = 124,
  .target_residency = 800,
  .enter = &intel_idle,

I could not detect any change in system response.

I am also unable to detect any difference in system response between idle state 1, C1, and idle state 2, C1E, with this change. I do not know if the change merely makes idle state 2 = idle state 1.

Changed in linux (Ubuntu):
status: New → Confirmed
summary: - HWP and C1E are incompatible - Intel prcoessors
+ HWP and C1E are incompatible - Intel processors
Changed in linux:
importance: Unknown → Medium
status: Unknown → Confirmed
Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 295827
wult statistics for c1,c1e for stock and mwait modified kernels

Attempting to measure exit latency using Artem Bityutskiy's wult tool, tdt method.
Kernel 5.12-rc2, stock and with the MWAIT change from 0x01 to 0x03.
Statistics.

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 295829
graph of wult test results

graph of wult tdt method results.
graph of wult tdt method results.
If an I210-based NIC can be sourced, it will be tried, if pre-wake needs to be eliminated. I do not know if it is needed or not.

Revision history for this message
In , srinivas.pandruvada (srinivas.pandruvada-linux-kernel-bugs) wrote :

(In reply to Doug Smythies from comment #30)
> (In reply to Srinivas Pandruvada from comment #29)
> > I tried to reproduce with your scripts on CFL-S systems almost half a year
> > back and did not observe the same. Systems can be configured in different
> > ways, which impacts the HWP algorithm. So it is possible that my lab system
> > is configured differently.
>
> By "CFL-S" I assume you mean "Coffee Lake".
Yes, desktop part.

>
> I wish you had reported back to me your findings, as we could have figured
> out the difference.
>
I thought I had responded to you; I will have to search my emails. I had to specially get a system arranged, but it had a 200 MHz higher turbo. You did share your scripts at that time.

These algorithms are tuned on a system, so small variations can have a bigger impact.

Let's see if ChenYu has a system the same as yours.

> Anyway, try the automated quick test I posted in comment 20. Keep in mind
> that it needs to be HWP enabled, active, powersave, default epp=128. It is
> on purpose that the tool does not check for this configuration.

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 295853
wult statistics for c1,c1e for stock and mwait modified kernels - version 2

Artem advised that I lock the CPU frequencies at some high value, in order to show some difference. Frequencies were locked at 4.6 GHz for this attempt.

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 295855
wult graph c1e for stock and mwait modified kernels - version 2

As Artem advised, with locked CPU frequencies.

Other data (kernel 5.12-rc2):

Phoronix dbench 1.0.2 0 client count 1:

stock: 264.8 MB/S
stock, idle state 2 disabled: 311.3 MB/S (+18%)
stock, HWP boost: 417.9 MB/S (+58%)
stock, idle state 2 disabled & HWP boost: 434.3 MB/S (+64%)
stock, performance governor: 420 MB/S (+59%)
stock, performance governor & idle state 2 disabled: 435 MB/S (+64%)

inverse impulse response, 847 uSec gap:
stock: 2302 tests 38 fails, 98.35% pass rate.
+ MWAIT change: 1072 tests, 0 fails, 100% pass rate.

@Srinivas: The whole point of the quick test stuff is that it self-adjusts to the system under test.

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

For this:
Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
The quick test gives indeterminate results.
However, it is also not using any idle state that involves the least significant bit of MWAIT being set.

$ grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
/sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1_ACPI
/sys/devices/system/cpu/cpu0/cpuidle/state2/name:C2_ACPI
/sys/devices/system/cpu/cpu0/cpuidle/state3/name:C3_ACPI

$ grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/desc
/sys/devices/system/cpu/cpu0/cpuidle/state0/desc:CPUIDLE CORE POLL IDLE
/sys/devices/system/cpu/cpu0/cpuidle/state1/desc:ACPI FFH MWAIT 0x0
/sys/devices/system/cpu/cpu0/cpuidle/state2/desc:ACPI FFH MWAIT 0x30
/sys/devices/system/cpu/cpu0/cpuidle/state3/desc:ACPI FFH MWAIT 0x60

If there is a way to make idle work the way it did on all previous processors, i.e.:

$ grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
/sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1
/sys/devices/system/cpu/cpu0/cpuidle/state2/name:C1E
/sys/devices/system/cpu/cpu0/cpuidle/state3/name:C3
/sys/devices/system/cpu/cpu0/cpuidle/state4/name:C6

$ grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/desc
/sys/devices/system/cpu/cpu0/cpuidle/state0/desc:CPUIDLE CORE POLL IDLE
/sys/devices/system/cpu/cpu0/cpuidle/state1/desc:MWAIT 0x00
/sys/devices/system/cpu/cpu0/cpuidle/state2/desc:MWAIT 0x01
/sys/devices/system/cpu/cpu0/cpuidle/state3/desc:MWAIT 0x10
/sys/devices/system/cpu/cpu0/cpuidle/state4/desc:MWAIT 0x20

I have not been able to figure out how.

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

I found this thread:
https://patchwork.<email address hidden>/

And somehow figured out that an i5-10600K is COMETLAKE, and so did the same as that link:

doug@s19:~/temp-k-git/linux$ git diff
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 3273360f30f7..770660d777c4 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -1155,6 +1155,7 @@ static const struct x86_cpu_id intel_idle_ids[] __initconst = {
        X86_MATCH_INTEL_FAM6_MODEL(KABYLAKE_L, &idle_cpu_skl),
        X86_MATCH_INTEL_FAM6_MODEL(KABYLAKE, &idle_cpu_skl),
        X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_X, &idle_cpu_skx),
+ X86_MATCH_INTEL_FAM6_MODEL(COMETLAKE, &idle_cpu_skl),
        X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, &idle_cpu_icx),
        X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNL, &idle_cpu_knl),
        X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNM, &idle_cpu_knl),

And got back the original types of idle states.
I did not want to beat up my NVMe drive with dbench, so I installed an old Intel SSD I had lying around:

Phoronix dbench 1.0.2 0 client count 1: (MB/S)
Intel_pstate HWP enabled, active powersave:

Kernel 5.12-rc2 stock:
All idle states enabled: 416.5
Only Idle State 0: 400.1
Only Idle State 1: 294.2
Only idle State 2: 401.6
Only idle State 3: 403.0

Kernel 5.12-rc2 patched as above:
All idle states enabled: 396.8
Only Idle State 0: 400.4
Only Idle State 1: 294.4
Only idle State 2: 245.9
Only idle State 3: 405.3
Only idle State 4: 402.8
quick test: FAIL.

Intel_pstate HWP disabled, active powersave:
Kernel 5.12-rc2 patched as above:
All idle states enabled: 340.0
Only Idle State 0: 399.5
Only Idle State 1: 358.5
Only idle State 2: 353.1
Only idle State 3: 346.9
Only idle State 4: 344.2
quick test: PASS.

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

It is true that the quick test should at least check that idle state 2 is indeed C1E.

I ran the inverse impulse response test. Kernel 5.12-rc2. Processor i5-10600K. Inverse gap 842 nSec:

1034 tests 0 fails.

With the patch as per comment 38 above, i.e. with C1E:

1000 tests 16 fails. 98.40% pass 1.60% fail.

I ran just the generic periodic test at 347 hertz and light load, stock kernel, i.e. no C1E:

HWP disabled: active/powersave:
doug@s19:~/freq-scalers$ /home/doug/c/consume 32.0 347 300 1
consume: 32.0 347 300 PID: 1280
 - fixed workpacket method: Elapsed: 300000158 Now: 1617030857155911
Total sleep: 169222343
Overruns: 0 Max ovr: 0
Loops: 104094 Ave. work percent: 43.592582

HWP enabled: active/powersave:
doug@s19:~$ /home/doug/c/consume 32.0 347 300 1
consume: 32.0 347 300 PID: 1293
 - fixed workpacket method: Elapsed: 300000654 Now: 1617031529268276
Total sleep: 171458395
Overruns: 725 Max ovr: 1449
Loops: 104094 Ave. work percent: 42.847326

The above was NOT due to CPU migration:

doug@s19:~$ taskset -c 10 /home/doug/c/consume 32.0 347 3600 1
consume: 32.0 347 3600 PID: 1341
 - fixed workpacket method: Elapsed: 3600002498 Now: 1617036391455519
Total sleep: 2086618739
Overruns: 3189 Max ovr: 1864
Loops: 1249133 Ave. work percent: 42.038409

Conclusion: there is still something very minor going on even without C1E being involved.

Notes:

I think HWPBOOST was, at least partially, programming around the C1E issue.

In addition to the ultimate rejection of the patch from the thread referenced in comment 38, I think other processors should be rolled back to the same state. I have never been able to measure any energy consumption or performance difference for all of those deep idle states on my i5-9600K processor.

Call me dense, but I only figured out yesterday that HWP is called "Speed Shift" in other literature and BIOS.

It does not make sense that we spent so much effort a few years ago to make sure that we did not dwell in shallow idle states for long periods, only to have HWP set the requested p-state to minimum upon its (C1E) use, albeit under some other conditions. By definition the system is NOT actually idle; if it were, we would have asked for a deep idle state.

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 305516
an updated set of tools for an automated quick test

Now checks if idle state 2 is C1E, and aborts if not.

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 305517
Quick test runs on Kernel 6.7-rc3

Summary:
HWP disabled: PASS
HWP enabled: FAIL

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 305540
CPU frequency recovery time versus inactivity gap time.

Using only Idle state 2; using all except idle state 2; all idle states with and without HWP.

The maximum inactivity gap of ~400 mSec is different from a few years ago, when it didn't have an upper limit.

The C1E-dependent stuff is at the lower end, less than ~60 mSec.

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 305541
A more detailed example at a 250 mSec inactivity gap, HWP and no-HWP

Revision history for this message
In , dsmythies (dsmythies-linux-kernel-bugs) wrote :

Created attachment 305542
All drivers and governors, HWP and no-HWP, execution times.

No disabled idle states.
250 mSec inactivity followed by the exact same work packet for every test.
