Comment 482 for bug 1690085

In , kernel (kernel-linux-kernel-bugs) wrote :

I have a Ryzen 7 1800X on an Asus Prime X370-Pro. I upgraded the BIOS to v4011 (Update AGESA 1.0.0.2a + SMU 43.18) and:

  1) turned on the "Typical Current Idle" option.

  2) stopped using zenstates.py -- which I had been using to enable "C6 Core"
     but disable "C6 Package" (to no avail).

  3) did *not* change the kernel -- which stayed at Linux 4.16.5 on Fedora 27.

  4) continued to use CONFIG_RCU_NOCB_CPU and rcu_nocbs=0-15
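
For anyone wanting to confirm that rcu_nocbs is actually in effect on the running kernel, here is a minimal sketch that pulls the CPU list out of a kernel command line. The function name parse_rcu_nocbs is made up for illustration, and it only handles plain comma-separated numbers and ranges such as 0-15, not the more elaborate CPU-list forms the kernel also accepts.

```python
import re


def parse_rcu_nocbs(cmdline):
    """Return the sorted CPU numbers named by rcu_nocbs=, or None if absent.

    Assumption: only simple lists such as "0-15" or "0,2-3" are handled.
    """
    match = re.search(r"(?:^|\s)rcu_nocbs=(\S+)", cmdline)
    if match is None:
        return None
    cpus = set()
    for part in match.group(1).split(","):
        # "0-15" gives first="0", last="15"; a bare "4" gives last="".
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return sorted(cpus)
```

On a live system the input would come from /proc/cmdline, e.g. parse_rcu_nocbs(open("/proc/cmdline").read()).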

After 67 days uptime (leaving the system completely idle and changing nothing), I became convinced that the "Typical Current Idle" option has dealt with the "freezing when idle" problem.

When I say "freezing when idle", what I mean is: if the machine is left idle (typically overnight) it simply stops responding. Nothing at all is logged -- no application, driver or kernel errors or warnings -- and the machine is still powered up, but frozen solid. The only way to restart the machine is to power it down and up again.

Having reviewed this thread, I find it is mostly concerned with the "freezing when idle" issue.

The symptoms of the original "Random Soft Lockup" include log messages of the form:

     NMI watchdog: BUG: soft lockup - CPU#12 stuck for 23s!

Is that related to "freezing when idle", or is it a separate issue?

I get the impression that CONFIG_RCU_NOCB_CPU and rcu_nocbs=0-15 may be related to the "Random Soft Lockup", but not to "freezing when idle". Is that right?
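
One way to tell the two symptoms apart after a reboot would be to scan the saved kernel log for soft lockup messages: a "freezing when idle" hang, as described above, leaves none. A rough sketch, assuming the messages have exactly the form quoted above; the name find_soft_lockups is hypothetical.

```python
import re

# Matches the watchdog message quoted above and captures the CPU number
# and the number of seconds the CPU was reported stuck.
SOFT_LOCKUP = re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s!")


def find_soft_lockups(lines):
    """Return a list of (cpu, seconds) for every soft lockup line found."""
    hits = []
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match:
            hits.append((int(match.group(1)), int(match.group(2))))
    return hits
```

Feeding it the lines of a log saved from the boot that froze gives an empty list if the watchdog never fired before the hang.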

It seems that other crashes/lockups are trying to attach themselves to this thread.

I note that this bug is assigned to <email address hidden>. This bug is very nearly 1 year old. Is this a good moment for the assignee to address this thread and say:

  * what, if any, Kernel issues have been identified

  * what, if any, Kernel fixes have been applied

related to this thread.

If the root cause of (some or all of) the issues in this thread is fixed or worked around by the "Typical Current Idle" BIOS option, does the assignee think that this "bug" can now be closed, or are there actual Kernel issues that remain, waiting to be fixed?

Is it significant that W*nd*rs does not seem to suffer?