random freeze of AMD Ryzen based workstations

Bug #1704718 reported by Tommy Giesler
This bug affects 8 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

I'm currently looking into stability issues on several AMD Ryzen-based PCs in our company running Ubuntu 16.04 HWE and 17.04.

So far I have found only a single set of journal entries that may be related to the problem:
----------------->%-----------------
Jul 17 06:03:24 workstation kernel: BUG: Bad rss-counter state mm:ffff94692f171f00 idx:0 val:18739
Jul 17 06:03:24 workstation kernel: BUG: Bad rss-counter state mm:ffff94692f171f00 idx:1 val:9897
Jul 17 06:03:24 workstation kernel: BUG: Bad rss-counter state mm:ffff94692f171f00 idx:3 val:229
Jul 17 06:03:24 workstation kernel: BUG: non-zero nr_ptes on freeing mm: 496
Jul 17 06:03:24 workstation kernel: BUG: non-zero nr_pmds on freeing mm: 27
-----------------%<-----------------
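To check other affected machines for the same kind of entries, the kernel log can be scanned for these messages. The following is only a minimal sketch: the function name `parse_mm_bugs` is hypothetical, and the regular expressions are modelled solely on the five lines quoted above, so other kernel versions may word the messages differently.

```python
import re
from collections import defaultdict

# Patterns modelled on the log lines quoted in this report (an assumption:
# other kernel versions may format these messages differently).
RSS_RE = re.compile(
    r"BUG: Bad rss-counter state mm:(?P<mm>[0-9a-f]+) "
    r"idx:(?P<idx>\d+) val:(?P<val>-?\d+)"
)
NR_RE = re.compile(
    r"BUG: non-zero (?P<kind>nr_ptes|nr_pmds) on freeing mm: (?P<count>-?\d+)"
)


def parse_mm_bugs(lines):
    """Collect rss-counter and nr_ptes/nr_pmds reports from kernel log lines.

    Returns a tuple (rss, leftovers):
      rss       maps the mm address to {counter index: value}
      leftovers maps "nr_ptes"/"nr_pmds" to the last reported count
    """
    rss = defaultdict(dict)
    leftovers = {}
    for line in lines:
        match = RSS_RE.search(line)
        if match:
            rss[match["mm"]][int(match["idx"])] = int(match["val"])
            continue
        match = NR_RE.search(line)
        if match:
            leftovers[match["kind"]] = int(match["count"])
    return dict(rss), leftovers
```

The function accepts any iterable of lines, for example the saved output of `journalctl -k -b -1` (kernel messages from the previous boot), which is only available if the journal is stored persistently.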

The issue appears to occur mostly after the system has been idle for a long time (e.g. several days over a weekend).

When you return to the frozen system, the screen still shows the last image that was displayed before the freeze.

The attached files were taken from the same workstation from which I recovered these log entries.

All systems are using the latest BIOS version available from ASUS or MSI. All memory was chosen from the boards' QVL and has already been replaced several times.

Here is the hardware configuration:
AMD Ryzen 7 1700X or AMD Ryzen 5 1600X
ASUS Prime B350M-A or MSI B350M PRO-VDH

Let me know if you need any additional information.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.13 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.13-rc1/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Tommy Giesler (guardion) wrote :

I have installed kernel 4.13-rc1 on 3 workstations so far, and tonight one of them froze once again, this time leaving the attached kernel panics in the journal.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: kernel-bug-exists-upstream
Kai-Heng Feng (kaihengfeng) wrote :

You need to file an upstream bug to escalate this issue to AMD developers.
