Comment 346 for bug 1690085

In , robert (robert-linux-kernel-bugs) wrote :

Hello guys, (new here, but not to unix/linux)

I have at the moment a new Epyc-based server, and also a new Ryzen PC.
No soft lockups on the Epyc, only a very evil error under heavy load that I had been hunting down for a while (https://forums.fedoraforum.org/showthread.php?317537-first-server-error-reboot-what-is-this-UUID), until I decided to move away from FC27 (not certified on this server) and install CentOS 7. So far no errors, but I have yet to put more load on it.

Specs, Epyc:
Kernel: 3.10.0-693.21.1.el7.x86_64 x86_64 bits: 64 gcc: 4.8.5
           Desktop: Openbox Distro: CentOS Linux release 7.4.1708 (Core)
Machine: Device: kvm System: Supermicro product: AS -2023US-TR4 v: 0123456789 serial: <filter>
           Mobo: Supermicro model: H11DSU-iN v: 1.02A serial: <filter>
           UEFI [Legacy]: American Megatrends v: 1.1 date: 02/07/2018
CPU(s): 2 16 core AMD EPYC 7351s (-MCP-SMP-) arch: Zen rev.2 cache: 16384 KB

Ryzen:
Kernel: 4.15.10-300.fc27.x86_64 x86_64 bits: 64 gcc: 7.3.1 Console: tty 0
           Distro: Fedora release 27 (Twenty Seven)
Machine: Device: desktop Mobo: ASUSTeK model: PRIME B350M-A v: Rev X.0x serial: <filter>
           UEFI [Legacy]: American Megatrends v: 3402 date: 12/11/2017
CPU: 6 core AMD Ryzen 5 1600X Six-Core (-MT-MCP-) arch: Zen rev.1 cache: 3072 KB

I can say that I have seen the soft lockup on the Ryzen several times in the past 3 months, maybe once a week or so (but then again, the machine is always doing stuff in my home, e.g. downloading backups from servers).

The boot options on both are now:
Epyc: GRUB_CMDLINE_LINUX="rhgb selinux=0 rcu_nocbs=0-63"
Ryzen: GRUB_CMDLINE_LINUX="rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1 rhgb quiet selinux=0 nmi_watchdog=0 nohpet pci=biosirq rcu_nocbs=0-11"
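
In case anyone wants to double-check that an rcu_nocbs= range really covers every logical CPU, here is a minimal Python sketch. The function names (parse_cpu_list, rcu_nocbs_cpus) are my own and hypothetical, not part of any tool, and it only handles plain numbers and N-M ranges, not the stride or "all" forms the kernel also accepts.

```python
import re


def parse_cpu_list(spec):
    """Expand a simple CPU list such as '0-3,8' into a sorted list of ints.

    Assumption: only single numbers and N-M ranges are handled.
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def rcu_nocbs_cpus(cmdline):
    """Return the CPUs named by rcu_nocbs= on a command line, or None if absent."""
    match = re.search(r"(?:^|\s)rcu_nocbs=(\S+)", cmdline)
    if match is None:
        return None
    return parse_cpu_list(match.group(1))


# The two command lines quoted above, without the surrounding GRUB quotes.
epyc = "rhgb selinux=0 rcu_nocbs=0-63"
ryzen = ("rd.driver.blacklist=nouveau modprobe.blacklist=nouveau "
         "nvidia-drm.modeset=1 rhgb quiet selinux=0 nmi_watchdog=0 "
         "nohpet pci=biosirq rcu_nocbs=0-11")

print(len(rcu_nocbs_cpus(epyc)))   # 64
print(len(rcu_nocbs_cpus(ryzen)))  # 12
```

On a running system the same function could be fed the contents of /proc/cmdline and the result compared against os.cpu_count(), to confirm the option was actually picked up after the reboot.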

I have not been home yet to upgrade the BIOS on the B350 mobo, but I will be there in 7 days; I will do it then and post my findings. In the meantime, I will leave the Ryzen completely idle, waiting for a crash.
I have not (ever) touched the C6 issue, although I do remember setting my mobo BIOS to "Standard" (not Performance, and not Power Saving).