Comment 432 for bug 1690085

devurandom (devurandom-linux-kernel-bugs) wrote:

(In reply to Dennis Schridde from comment #295)

As an update from my earlier post, here is what I wrote to AMD support recently:

I can only partially confirm the effectiveness of the non-default firmware option "Power Supply Idle Control = Typical Current Idle" introduced with AGESA 1.0.0.2a. With this option set, the machine often still needs several attempts to boot. During the failing attempts I either get 3 long beeps from the mainboard followed by an automatic reboot, or I see varying Linux kernel stack traces during early boot, seemingly related to firmware / EFI and CPU / idle. Sometimes the system can be restarted from this state with a soft reboot (ctrl+alt+del). At other times a hard reset is required, e.g. because the system froze completely (similar to the original problem reported here) or because the init process died. I have the feeling that in such situations the system, even if it does not freeze completely, works with corrupt data and then writes it to the hard disk [3,4]. I destroyed my installation several times in the last month due to these problems, e.g. because the system had destroyed all (?!) superblocks of the file system, or lately the LVM cache. Surprisingly, once the system has booted properly, it runs stably for days without any further issues.

I could not reproduce the problem synthetically with any of PassMark memtest86 [5], TU Dresden Firestarter [6] or Google Stressful Application Test [7], the latter including CPU, RAM, hard drive and file-system tests. My hardware supplier also tested all components (CPU, RAM, mainboard) again using Windows with Prime95 and Furmark, as well as with Memtest, and assured me that they could not detect an issue either. During regular use I can reliably reproduce it on Gentoo (with Linux 4.16), Fedora 27 (Linux 4.14), Fedora 28 Beta (Linux 4.15), Fedora 28 (Linux 4.16) and Arch 2018.03 (Linux 4.15). Before the introduction of the "Power Supply Idle Control = Typical Current Idle" firmware option, I first noticed the issue while compiling large amounts of software on Gentoo, but was later able to reliably reproduce the freeze with simple `rsync` operations on all the other operating systems, too. This happened even when no X server was running and the GPU was not utilised in any other way (as far as I know), so I do not see a connection to possibly incomplete Vega GPU support in Linux. After I set "Power Supply Idle Control = Typical Current Idle", these freezes stopped and I now see the problem only during early boot.

[3]: https://www.redhat.com/archives/linux-lvm/2018-May/msg00006.html
[4]: https://bugzilla.redhat.com/show_bug.cgi?id=1585670
[5]: https://www.memtest86.com/
[6]: https://tu-dresden.de/zih/forschung/projekte/firestarter
[7]: https://github.com/stressapptest/stressapptest