Comment 62 for bug 1690085

Huygens (huygens-25) wrote :

We are experiencing the same behaviour and error messages with a Ryzen 5 1600 CPU and an ASUS PRIME A320M-K motherboard.

What we can add is:

1. When no VMs are running, the machines stay stable for several days, even through long idle periods, on Ubuntu 16.04.3 using the HWE kernel 4.10.
2. When we run a single VM on the machine and both the VM and the host are idle, we get a crash. The machine becomes unresponsive and we see CPU soft lockups on the screen, but no logs. We tested with Ubuntu 16.04.3 using the stable and edge HWE kernels (4.10 and 4.13), and also with Ubuntu 17.10 and kernel 4.13. A further test used Ubuntu 16.04.3 with a custom-built kernel based on HWE edge (4.13) with CONFIG_RCU_NOCB_CPU enabled; one of the 2 machines still crashed during the following night (note that ASLR was not deactivated for that test).

For our latest test we used:
- Ubuntu 16.04.3
- HWE Edge custom built kernel with CONFIG_RCU_NOCB_CPU
- ASLR deactivated
- boot parameters: rcu_nocbs=0-11 processor.max_cstate=1

We set this up last Friday. By Monday morning both machines were down, so it did not help.
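
For anyone reproducing this setup, here is a minimal sketch of how the settings listed above could be verified on a running host. It is not part of the original test. It assumes the usual Linux locations (/proc/cmdline for boot parameters, /proc/sys/kernel/randomize_va_space for ASLR, where 0 means disabled, and /boot/config-<release> for the kernel config, as on Ubuntu). The function names are hypothetical.

```python
import platform


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def check_settings(cmdline, aslr_value, kernel_config):
    """Report the settings used in the test above, given raw file contents."""
    params = parse_cmdline(cmdline)
    config_lines = {line.strip() for line in kernel_config.splitlines()}
    return {
        "rcu_nocbs": params.get("rcu_nocbs"),
        "max_cstate": params.get("processor.max_cstate"),
        # randomize_va_space == 0 means ASLR is switched off
        "aslr_disabled": aslr_value.strip() == "0",
        "rcu_nocb_cpu_built_in": "CONFIG_RCU_NOCB_CPU=y" in config_lines,
    }


def read_text(path):
    """Return the file contents, or an empty string if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    # Assumed path layout: Ubuntu installs the kernel config under /boot.
    config_path = "/boot/config-" + platform.release()
    report = check_settings(
        read_text("/proc/cmdline"),
        read_text("/proc/sys/kernel/randomize_va_space"),
        read_text(config_path),
    )
    for key, value in report.items():
        print(key, "=", value)
```

On a host configured as described, the report would be expected to show rcu_nocbs as 0-11, max_cstate as 1, and both boolean checks as True. A missing or unreadable file simply yields None or False for the corresponding entry.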

PS: our motherboard had the latest "BIOS/UEFI" update, which includes AMD AGESA 1.0.0.6B. We even upgraded to a newer firmware (which ASUS has since removed) that was stated to contain AMD AGESA 1.0.0.7A. This morning we saw yet another firmware update from ASUS; we are going to apply it on top of our other changes, but if it does not work, we will try to return the machine. That would be a pity, but stability is the number 1 feature, more important than the number of cores or throughput.