Comment 23 for bug 1917813

dsmythies (dsmythies-linux-kernel-bugs) wrote:

Some other results for the quick test:

i5-9600K (Doug): FAIL (Ubuntu 20.04; any kernel).
i5-6200U (Alin): FAIL (Debian).
i7-7700HQ (Gunnar): FAIL (Ubuntu 20.10).
i7-10610U (Russell): FAIL (CentOS (Red Hat 8), 4.18.0-240.10.1.el8_3.x86_64 #1 SMP).
Another Skylake (Rick): still waiting to hear back.

So that is 4 failures out of 4 so far (and, on purpose, I gave them no guidance at all as to which kernel to try).

I have been picking away at this thread (pun intended) for months, and I think it is finally starting to unravel. Somewhere above I said:

> For unknown reasons, HWP seems to incorrectly decide
> that the processor is idle and spins the PLL down to
> a very low frequency.

I now believe the cause is something inside the processor, but maybe not part of HWP. I think that non-HWP processors, or ones with HWP disabled, also misdiagnose the entire processor as idle. My evidence is neither very thorough nor currently in a presentable form, but this issue only ever occurs a short time after, or immediately after, every core has been idle, with at least one of them in idle state 2. The huge difference between HWP and OS-driven pstates is that the OS knows the system wasn't actually idle, and HWP doesn't. Even though package C1E is disabled, the processor perhaps behaves similarly to how it would with it enabled.
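To make the trigger condition above concrete, here is a minimal Python sketch that restates it as a predicate over per-core idle states. The state encoding (`None` for a busy core, an integer idle-state index otherwise) and the names `whole_package_idle` and `trigger_times` are hypothetical, made up for illustration; they are not a kernel interface, and the sketch says nothing about how such per-core snapshots would actually be collected.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

# Hypothetical encoding: None means the core is busy; an int is the index
# of the idle state the core is currently in (0, 1, 2, ...).
CoreState = Optional[int]


def whole_package_idle(states: Sequence[CoreState], idle_state: int = 2) -> bool:
    """True when every core is idle and at least one is in `idle_state`."""
    all_idle = all(s is not None for s in states)
    one_deep = any(s == idle_state for s in states)
    return all_idle and one_deep


def trigger_times(snapshots: Iterable[Tuple[float, Sequence[CoreState]]]) -> List[float]:
    """Return the timestamps at which the condition becomes true.

    `snapshots` is a time-ordered iterable of (timestamp, per-core states).
    Only rising edges are reported, i.e. the first snapshot of each run of
    snapshots for which whole_package_idle() holds.
    """
    times: List[float] = []
    prev = False
    for t, states in snapshots:
        cur = whole_package_idle(states)
        if cur and not prev:
            times.append(t)
        prev = cur
    return times
```

For example, with the four-core snapshots `[(0, (None, 1, 2, 1)), (10, (1, 1, 2, 1)), (20, (1, 2, 2, 1)), (30, (None, 2, 2, 1)), (40, (1, 1, 1, 1)), (50, (2, 1, 1, 1))]`, `trigger_times` returns `[10, 50]`: at 0 and 30 one core is busy, and at 40 every core is idle but none is in idle state 2.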

There is some small timing window where this really screws up. Mostly it works fine: either the CPU frequency doesn't ramp down at all, or it recovers quickly, within about 120 uSec.

And as far as I know, it exits the idle state O.K., but it then takes an incredibly long time for HWP to ramp the CPU frequency up again. Meanwhile, any non-HWP approach neither drops the pstate request to minimum nor restarts any sluggish ramp-up.
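One rough way to separate the normal case (recovery within about 120 uSec) from the bad case is to scan a frequency trace for busy stretches that stay at low frequency for longer than that. The Python sketch below assumes a hypothetical trace of `Sample` records (timestamp in microseconds, observed frequency in MHz, busy flag); the record format, the `low_mhz` cutoff and the function name `slow_recoveries` are assumptions for illustration, not the method used to obtain the results above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Threshold from the observation above: normal recoveries complete within
# about 120 uSec.
NORMAL_RECOVERY_US = 120.0


@dataclass
class Sample:
    t_us: float       # timestamp in microseconds (hypothetical trace format)
    freq_mhz: float   # observed CPU frequency at that time
    busy: bool        # True if the CPU was running work, False if idle


def slow_recoveries(samples: Iterable[Sample], low_mhz: float,
                    threshold_us: float = NORMAL_RECOVERY_US) -> List[Tuple[float, float]]:
    """Return (start, duration) of busy stretches stuck at or below low_mhz
    for longer than threshold_us.

    A stretch ends at the first sample that is either above low_mhz or idle.
    A stretch still open at the end of the trace is not reported, because
    its true length is unknown.
    """
    episodes: List[Tuple[float, float]] = []
    start = None
    for s in samples:
        if s.busy and s.freq_mhz <= low_mhz:
            if start is None:
                start = s.t_us           # a busy, low-frequency stretch begins
        elif start is not None:
            duration = s.t_us - start    # the stretch has just ended
            if duration > threshold_us:
                episodes.append((start, duration))
            start = None
    return episodes
```

For example, a trace that dips below 1000 MHz for 100 uSec and later for 1100 uSec while busy reports only the second dip, `[(200, 1100)]`, since the first one recovered within the normal window.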

Now, this issue is rare and would be extremely difficult to diagnose, because it would appear only as occasional glitches, e.g. a frame rate drop in a game, dropped data, or unbelievably long latency if any kind of performance is required. I consider this issue to be of the utmost importance.