AMD

Prevent timer value 0 for MWAITX

Bug #1729442 reported by Kim Naru
This bug affects 1 person
Affects          Status        Importance  Assigned to       Milestone
AMD              Fix Released  Undecided   Unassigned
linux (Ubuntu)   Fix Released  Medium      Joseph Salisbury
  Xenial         Won't Fix     Medium      Joseph Salisbury
  Zesty          Won't Fix     Medium      Joseph Salisbury

Bug Description

Newer hardware has uncovered a bug in the software implementation of using MWAITX for the delay function. A value of 0 for the timer is meant to indicate that a timeout will not be used to exit MWAITX. On newer hardware this can cause MWAITX to never return, so NMI soft lockup messages are printed. On older hardware, some of the other conditions under which MWAITX can exit masked this issue.

The AMD APM (Architecture Programmer's Manual) does not currently document this and will be updated. Please refer to http://marc.info/?l=kvm&m=148950623231140 for information regarding NMI soft lockup messages on an AMD Ryzen 1800X. The root cause was traced to a timer value of 0 being passed to MWAITX, causing it to wait indefinitely. This change has the added benefit of avoiding the unnecessary setup of MONITORX/MWAITX when the delay value is zero.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?id=88d879d29f9cc0de2d930b584285638cdada6625
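As a rough illustration of the fix, the sketch below simulates an MWAITX-based delay loop in ordinary C instead of issuing the real instructions, which cannot be exercised portably from user space. Every name here is hypothetical: `sim_mwaitx()` stands in for MWAITX with its timer enabled and models "timer 0 never returns" as a flag, `fake_tsc` stands in for the TSC, and `MAX_WAIT_PER_CALL` is an illustrative cap on a single wait. Setting `guard_zero` applies the early return on a zero delay that the description above calls for; this is a model of the idea, not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-call cap on one wait (illustrative, not the kernel's value). */
#define MAX_WAIT_PER_CALL 1000ULL

static uint64_t fake_tsc; /* simulated time-stamp counter */
static int hung;          /* set when the simulated MWAITX would never return */

/* Reset the simulated machine state between runs. */
static void sim_reset(void)
{
    fake_tsc = 0;
    hung = 0;
}

/*
 * Simulated MWAITX with the timer enabled. A timer value of 0 means
 * "no timeout"; on the affected hardware that can wait forever, which
 * is modelled here by setting `hung` instead of advancing time.
 */
static void sim_mwaitx(uint64_t timer)
{
    if (timer == 0) {
        hung = 1;
        return;
    }
    fake_tsc += timer;
}

/*
 * Hypothetical delay loop: waits in capped chunks until `loops` ticks
 * have elapsed. Returns the number of simulated MWAITX calls, or -1 if
 * a call would have hung. guard_zero != 0 applies the fix.
 */
static int sim_delay(uint64_t loops, int guard_zero)
{
    int calls = 0;
    uint64_t start, end, delay;

    /* The fix: nothing to wait for, so skip the MONITORX/MWAITX setup. */
    if (guard_zero && loops == 0)
        return 0;

    start = fake_tsc;
    for (;;) {
        delay = loops < MAX_WAIT_PER_CALL ? loops : MAX_WAIT_PER_CALL;
        if (guard_zero)
            assert(delay != 0); /* with the guard, a 0 timer is never issued */
        sim_mwaitx(delay);
        calls++;
        if (hung)
            return -1;
        end = fake_tsc;
        if (loops <= end - start)
            break;
        /* loops stays > 0 here, so later chunks are nonzero too. */
        loops -= end - start;
        start = end;
    }
    return calls;
}
```

With the guard, `sim_delay(0, 1)` returns 0 without issuing any wait, while `sim_delay(0, 0)` reports a hang; a nonzero request such as 2500 ticks completes in three capped waits either way, because the loop only subtracts elapsed time while the remaining count is still larger than it.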

Tags: xenial zesty
information type: Proprietary → Public
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote: Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1729442

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu):
importance: Undecided → Medium
assignee: nobody → Joseph Salisbury (jsalisbury)
status: Incomplete → In Progress
tags: added: artful xenial zesty
Changed in linux (Ubuntu Xenial):
status: New → In Progress
Changed in linux (Ubuntu Zesty):
status: New → In Progress
Changed in linux (Ubuntu Artful):
status: New → In Progress
Changed in linux (Ubuntu Xenial):
importance: Undecided → Medium
Changed in linux (Ubuntu Zesty):
importance: Undecided → Medium
Changed in linux (Ubuntu Artful):
importance: Undecided → Medium
Changed in linux (Ubuntu Xenial):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Zesty):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Artful):
assignee: nobody → Joseph Salisbury (jsalisbury)
no longer affects: linux (Ubuntu Artful)
tags: removed: artful
Joseph Salisbury (jsalisbury) wrote:

I built Xenial and Zesty test kernels, both with a pick of commit 88d879d29f9cc0d. The test kernels can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1729442/xenial
http://kernel.ubuntu.com/~jsalisbury/lp1729442/zesty

Can you test these kernels and see if they resolve this bug?

Kim Naru (kim-naru) wrote: RE: [Bug 1729442] Re: Prevent timer value 0 for MWAITX

Joe,
I will take a look.

-kim

-----Original Message-----
From: <email address hidden> [mailto:<email address hidden>] On Behalf Of Joseph Salisbury
Sent: Thursday, November 02, 2017 1:56 PM
To: Naru, Kim <email address hidden>
Subject: [Bug 1729442] Re: Prevent timer value 0 for MWAITX

Changed in linux (Ubuntu):
status: In Progress → Incomplete
Changed in linux (Ubuntu Xenial):
status: In Progress → Incomplete
Changed in linux (Ubuntu Zesty):
status: In Progress → Incomplete
Timo Aaltonen (tjaalton)
Changed in linux (Ubuntu Zesty):
status: Incomplete → Won't Fix
Changed in linux (Ubuntu Xenial):
status: Incomplete → Won't Fix
Changed in linux (Ubuntu):
status: Incomplete → Fix Released
Changed in amd:
status: New → Fix Released