Comment 138 for bug 1690085

In , malakudi (malakudi-linux-kernel-bugs) wrote :

I want to confirm this issue on my system (AMD Ryzen 1700X, segfault-free chip after RMA) and offer some more information:

- First of all, let's define what CONFIG_RCU_NOCB_CPU does. It enables support for the rcuo kernel threads, which handle RCU callback processing. The option is automatically enabled if you select CONFIG_NO_HZ_FULL=y, which is why it is enabled in Fedora kernels. Fedora also chose to set CONFIG_RCU_NOCB_CPU_ALL=y in their 4.12 series kernels, while the default is CONFIG_RCU_NOCB_CPU_NONE=y, which only offloads RCU callbacks to rcuo kernel threads for the CPUs listed in the rcu_nocbs boot parameter. The latter is also the default behaviour in 4.13 kernels, where the CONFIG_RCU_NOCB_CPU_* options have been dropped. So, to get the behaviour of CONFIG_RCU_NOCB_CPU_ALL=y in a 4.13 kernel, you need to supply the boot parameter rcu_nocbs=0-XX, where XX is your number of CPUs minus 1. In short, by setting CONFIG_RCU_NOCB_CPU=y and CONFIG_RCU_NOCB_CPU_ALL=y (or rcu_nocbs=0-XX), we are telling the kernel to offload RCU callbacks to separate kernel threads called rcuo (rcu offload), one for each CPU. The affinity of those kernel threads is set to ffff, so they can run on any CPU.

- C6 is not effectively disabled by using rcuo kernel threads for RCU callback processing. When C6 is disabled, the CPU voltage never drops below the voltage defined for the P2 state (0.9V for my CPU). With both C6 and rcuo kernel threads enabled, my voltage frequently drops to 0.4V. Also, disabling C6 prevents a single core from reaching XFR turbo speeds: the maximum frequency of my 1700X is 3500 MHz when C6 is disabled, whereas with C6 enabled I can run single-threaded processes at 3900 MHz (the XFR turbo speed of the 1700X). Enabling rcuo kernel threads for RCU callback processing does not disable XFR turbo speeds. So the claim that this feature effectively disables C6 is not true.

- C6 can be disabled manually with the zenstates.py script, even if your BIOS does not offer this option. It can be found at https://github.com/r4m0n/ZenStates-Linux

- In my case, idle freezes are prevented either by disabling C6, or by leaving C6 enabled and enabling rcuo kernel threads for RCU callback processing. I had 14 days of uptime with rcuo kernel threads enabled, whereas my system usually freezes overnight without them.

- And finally, I have no idle freezes in Windows 10 with C6 enabled.
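
To make the rcu_nocbs=0-XX rule and the ffff affinity mask from the first point concrete, here is a minimal Python sketch. The function names rcu_nocbs_param and cpus_in_mask are my own illustrative helpers, not part of the kernel or of zenstates.py, and the sketch assumes CPUs are numbered contiguously from 0.

```python
import os


def rcu_nocbs_param(cpu_count):
    """Build the rcu_nocbs= boot parameter covering CPUs 0..cpu_count-1."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    if cpu_count == 1:
        # A single CPU is written as a plain number, not a range.
        return "rcu_nocbs=0"
    return "rcu_nocbs=0-{}".format(cpu_count - 1)


def cpus_in_mask(hex_mask):
    """Return the CPU numbers whose bits are set in a hexadecimal affinity mask."""
    mask = int(hex_mask, 16)
    return [cpu for cpu in range(mask.bit_length()) if mask & (1 << cpu)]


if __name__ == "__main__":
    # os.cpu_count() reports logical CPUs and may return None.
    print(rcu_nocbs_param(os.cpu_count() or 1))
    print(cpus_in_mask("ffff"))
```

On a 1700X (16 logical CPUs) this gives rcu_nocbs=0-15, and the mask ffff decodes to CPUs 0 through 15, i.e. the rcuo threads are allowed to run on every CPU.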

My opinion is that this idle freeze is a hardware issue of the AMD Zen processor, and that the rcuo kernel threads for RCU callback processing merely hide it or make it less likely to be triggered. Probably something similar happens in the Windows 10 kernel.

I think AMD should be contacted.