Comment 690 for bug 1690085

In , itanium_de (itaniumde-linux-kernel-bugs) wrote :

(In reply to Liu Liu from comment #597)
> Some updates since I last posted. I've updated to gcc-8 and enabled
> idle=halt. Even though idle=halt + gcc-7 with the original reprod steps can
> still cause a lockup. By defaulting to gcc-8 and idle=halt, in day-to-day
> uses, I haven't encountered any system lockup in the past 2 months. I
> concluded that idle=halt should mitigate this problem for normal uses.

I have a Ryzen Threadripper 2990WX setup and experience random lockups similar to the ones you describe! Setting idle=halt improved things but did not fix the problem entirely for me. When I run a "btrfs scrub" on my NVMe SSD holding 250 GB of data, I can still get reproducible freezes even with idle=halt set. On average, every second btrfs scrub over the 250 GB of data caused a freeze, so this was a very effective way to reproduce the error condition. (Scrubbing twice through 250 GB of data takes only about 3 minutes on my system.)

The "good" news is that since I changed the idle parameter to idle=poll (which obviously burns electricity like crazy) I can now do many (20+) btrfs scrub runs in a row without provoking any lockups, and so far the system has been stable.

Maybe someone here who also has btrfs on an SSD can check whether they can reproduce the freezes in the same way. The btrfs scrub may be a more "evil" workload than parallel kernel compilations with gcc-7/8.
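
For anyone who wants to try this, here is a minimal sketch of a reproduction loop. It is not the exact procedure I used, just one way to run scrubs back to back. It assumes the btrfs-progs "btrfs" tool is installed and that the script runs with root privileges. The function names and the mount point in the usage note are hypothetical. The "-B" option keeps "btrfs scrub start" in the foreground so that each run finishes before the next one starts.

```python
import subprocess


def build_scrub_command(mount_point):
    # "-B" makes "btrfs scrub start" wait until the scrub has finished
    # instead of returning immediately and scrubbing in the background.
    return ["btrfs", "scrub", "start", "-B", mount_point]


def run_scrubs(mount_point, runs=20, runner=subprocess.run):
    """Run up to `runs` foreground scrubs in a row.

    Returns the number of scrubs that exited with status 0. The loop stops
    at the first non-zero exit status. A hard lockup never returns at all,
    so the last progress line printed shows which run was in flight.
    `runner` is injectable (an assumption of this sketch) so the loop can
    be exercised without touching a real filesystem.
    """
    completed = 0
    for i in range(1, runs + 1):
        # flush=True so the progress line is written out before the scrub starts
        print(f"scrub run {i}/{runs}", flush=True)
        result = runner(build_scrub_command(mount_point))
        if result.returncode != 0:
            break
        completed += 1
    return completed
```

Calling run_scrubs("/mnt/data") as root would start up to 20 scrubs on a (hypothetical) btrfs mount at /mnt/data. Running it over ssh from a second machine keeps the progress output visible if the box freezes.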

A final thought is that this thread may be reporting two independent issues. There is the "completely unstable system" issue, which initially froze my system during two out of three boot cycles and which I could solve by setting idle=nomwait and disabling C6 states in the BIOS.

And then there is this random once-in-a-while freeze issue, which so far has only been resolved by going to idle=poll.
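
Since results in this thread depend on which idle= value was actually active at boot, it may help to check that before reporting. Below is a minimal sketch that reads the kernel command line from /proc/cmdline and extracts the idle= value. The function names are hypothetical, and taking the last occurrence when the parameter appears more than once is an assumption of this sketch.

```python
def idle_setting(cmdline):
    """Return the value of the idle= parameter in a kernel command line.

    Returns None if no idle= parameter is present. If the parameter is
    given several times, the last occurrence is returned (an assumption
    of this sketch).
    """
    value = None
    for token in cmdline.split():
        if token.startswith("idle="):
            value = token[len("idle="):]
    return value


def read_cmdline(path="/proc/cmdline"):
    # /proc/cmdline holds the parameters the running kernel was booted with.
    with open(path) as f:
        return f.read()
```

On a Linux machine, idle_setting(read_cmdline()) gives the value to quote in a report, for example "poll", "halt" or "nomwait", or None if idle= was not set.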