Performing function level reset of AMD onboard USB and audio devices causes system lockup
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | Undecided | You-Sheng Yang |
Bionic | Fix Released | Undecided | You-Sheng Yang |
Eoan | Fix Released | Undecided | You-Sheng Yang |
Focal | Fix Released | Undecided | You-Sheng Yang |
Groovy | Fix Released | Undecided | You-Sheng Yang |
linux-oem-5.6 (Ubuntu) | Fix Released | Undecided | Unassigned |
Bionic | Invalid | Undecided | Unassigned |
Eoan | Invalid | Undecided | Unassigned |
Focal | Fix Released | Undecided | You-Sheng Yang |
Groovy | Fix Released | Undecided | Unassigned |
linux-oem-osp1 (Ubuntu) | Invalid | Undecided | Unassigned |
Bionic | Fix Released | Undecided | You-Sheng Yang |
Eoan | Fix Released | Undecided | Unassigned |
Focal | Invalid | Undecided | Unassigned |
Groovy | Invalid | Undecided | Unassigned |
Bug Description
[SRU Justification]
[Impact]
Devices affected:
* [1022:148c] USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Starship
USB 3.0 Host Controller
* [1022:149c] USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse
USB 3.0 Host Controller
* [1022:1487] Audio device [0403]: Advanced Micro Devices, Inc. [AMD]
Starship/Matisse HD Audio Controller
Despite advertising the FLReset device capability, performing a function level
reset of any of these devices causes the system to lock up. This is a
particular problem where these devices appear in their own IOMMU groups and
would otherwise be well suited to VFIO passthrough.
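To check whether a machine contains one of the devices listed above, the vendor:device IDs can be matched against `lspci -nn` output. The following is a minimal sketch and not part of the original report: the function name `find_affected` is hypothetical, and it matches only the three IDs named in this bug.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): read `lspci -nn` output
# on stdin and print only the lines whose [vendor:device] ID is one of the
# three IDs listed in this bug (1022:148c, 1022:149c, 1022:1487).
find_affected() {
    grep -E '\[1022:(148c|149c|1487)\]'
}

# Usage on a live system (requires pciutils):
#   lspci -nn | find_affected
```

For a device that shows up, `sudo lspci -vv -s <address>` should list `FLReset+` on its `DevCap` line if the device advertises the capability; the address to use depends on the system.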
Issue was introduced in AMD's "AGESA Combo-AM4 1.0.0.4 Patch B" microcode
update, and affects dozens of motherboard models across various vendors.
Additional discussion of this issue:
https:/
[Fix]
Two commits currently landed in linux-pci pci/virutualiza
* 0d14f06cd665 PCI: Avoid FLR for AMD Matisse HD Audio & USB 3.0
* 5727043c73fd PCI: Avoid FLR for AMD Starship USB 3.0
[Test Case]
Perform the test on an impacted system:
* B350, B450, X370, X470, X570 motherboards (practically anything with an AM4
socket);
* Ryzen 3000-series CPU (2000-series possibly also affected);
* BIOS/UEFI firmware that includes "AGESA Combo-AM4 1.0.0.4 Patch B" (check
vendor release notes)
In the above case where '0000:10:00.3' is the USB controller '1022:149c', issue
a reset command:
$ echo 1 | sudo tee /sys/bus/
On impacted systems the command does not return successfully and the system
becomes unstable, requiring a reboot. `/var/log/syslog` will show something
resembling the following:
xhci_hcd 0000:10:00.3: not ready 1023ms after FLR; waiting
xhci_hcd 0000:10:00.3: not ready 2047ms after FLR; waiting
xhci_hcd 0000:10:00.3: not ready 4095ms after FLR; waiting
xhci_hcd 0000:10:00.3: not ready 8191ms after FLR; waiting
xhci_hcd 0000:10:00.3: not ready 16383ms after FLR; waiting
xhci_hcd 0000:10:00.3: not ready 32767ms after FLR; waiting
xhci_hcd 0000:10:00.3: not ready 65535ms after FLR; giving up
clocksource: timekeeping watchdog on CPU14: Marking clocksource 'tsc' as unstable because the skew is too large:
clocksource: 'hpet' wd_now: f63fcfe wd_last: d468894 mask: ffffffff
clocksource: 'tsc' cs_now: 60e67e17758 cs_last: 60d2a81ce24 mask: ffffffffffffffff
tsc: Marking TSC unstable due to clocksource watchdog
TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
sched_clock: Marking unstable (1817664630139, 314261908)
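The log pattern above can be searched for mechanically after a reboot. The following is a minimal sketch, assuming the kernel messages keep the wording shown in this report; the function name `flr_giveups` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): read kernel log text on
# stdin and print the PCI address of every device whose function level reset
# timed out, i.e. every "not ready ...ms after FLR; giving up" line.
# Assumes the message wording shown in this bug report.
flr_giveups() {
    sed -n 's/.* \([0-9a-f]\{4\}:[0-9a-f]\{2\}:[0-9a-f]\{2\}\.[0-7]\): not ready [0-9]*ms after FLR; giving up.*/\1/p' | sort -u
}

# Usage:
#   flr_giveups < /var/log/syslog
```

Only the final "giving up" line is matched, so a reset that merely took a long time but eventually completed is not reported. Because the machine becomes unstable, the last messages may not always reach the log on disk.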
[Regression Risk]
Low. These two patches affect only systems that contain one of the devices
listed above.
========== Original Bug Description ==========
$ lsb_release -rd
Description: Ubuntu 19.10
Release: 19.10
[Impact]
Devices affected:
* [1022:149c] USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
* [1022:1487] Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller
Despite advertising the FLReset device capability, performing a function level reset of either of these devices causes the system to lock up. This is a particular problem where these devices appear in their own IOMMU groups and would otherwise be well suited to VFIO passthrough.
Issue was introduced in AMD's "AGESA Combo-AM4 1.0.0.4 Patch B" microcode update, and affects dozens of motherboard models across various vendors.
Additional discussion of this issue:
https:/
[Fix]
Add a quirk to disable FLR on these devices. Sample patch attached.
[Test Case]
Perform the test on an impacted system:
* B350, B450, X370, X470, X570 motherboards (practically anything with an AM4 socket);
* Ryzen 3000-series CPU (2000-series possibly also affected);
* BIOS/UEFI firmware that includes "AGESA Combo-AM4 1.0.0.4 Patch B" (check vendor release notes)
In the above case where '0000:10:00.3' is the USB controller '1022:149c', issue a reset command:
$ echo 1 | sudo tee /sys/bus/
On impacted systems the command does not return successfully and the system becomes unstable, requiring a reboot. `/var/log/syslog` will show something resembling the following:
Mar 4 14:51:26 bunty kernel: [ 1745.043914] xhci_hcd 0000:10:00.3: not ready 1023ms after FLR; waiting
Mar 4 14:51:28 bunty kernel: [ 1747.091910] xhci_hcd 0000:10:00.3: not ready 2047ms after FLR; waiting
Mar 4 14:51:32 bunty kernel: [ 1750.163972] xhci_hcd 0000:10:00.3: not ready 4095ms after FLR; waiting
Mar 4 14:51:37 bunty kernel: [ 1755.283933] xhci_hcd 0000:10:00.3: not ready 8191ms after FLR; waiting
Mar 4 14:51:46 bunty kernel: [ 1764.499943] xhci_hcd 0000:10:00.3: not ready 16383ms after FLR; waiting
Mar 4 14:52:04 bunty kernel: [ 1782.164126] xhci_hcd 0000:10:00.3: not ready 32767ms after FLR; waiting
Mar 4 14:52:39 bunty kernel: [ 1816.979432] xhci_hcd 0000:10:00.3: not ready 65535ms after FLR; giving up
Mar 4 14:52:39 bunty kernel: [ 1817.978790] clocksource: timekeeping watchdog on CPU14: Marking clocksource 'tsc' as unstable because the skew is too large:
Mar 4 14:52:39 bunty kernel: [ 1817.978806] clocksource: 'hpet' wd_now: f63fcfe wd_last: d468894 mask: ffffffff
Mar 4 14:52:39 bunty kernel: [ 1817.978809] clocksource: 'tsc' cs_now: 60e67e17758 cs_last: 60d2a81ce24 mask: ffffffffffffffff
Mar 4 14:52:39 bunty kernel: [ 1817.978818] tsc: Marking TSC unstable due to clocksource watchdog
Mar 4 14:52:40 bunty kernel: [ 1817.978892] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
Mar 4 14:52:40 bunty kernel: [ 1817.978894] sched_clock: Marking unstable (1817664630139, 314261908)
[Regression Risk]
Unknown
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu8.2
Architecture: amd64
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/
DistroRelease: Ubuntu 19.10
MachineType: To Be Filled By O.E.M. To Be Filled By O.E.M.
NonfreeKernelMo
Package: linux (not installed)
ProcFB: 0 EFI VGA
ProcKernelCmdLine: BOOT_IMAGE=
ProcVersionSign
RelatedPackageV
linux-
linux-
linux-firmware 1.183.3
Tags: eoan
Uname: Linux 5.3.0-40+
UnreportableReason: This report is about a package that is not installed.
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip libvirt lpadmin lxd plugdev sambashare sudo
_MarkForUpload: False
dmi.bios.date: 11/14/2019
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: L3.77
dmi.board.name: X470 Taichi
dmi.board.vendor: ASRock
dmi.chassis.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.
dmi.modalias: dmi:bvnAmerican
dmi.product.family: To Be Filled By O.E.M.
dmi.product.name: To Be Filled By O.E.M.
dmi.product.sku: To Be Filled By O.E.M.
dmi.product.
dmi.sys.vendor: To Be Filled By O.E.M.
tags: | added: patch |
Changed in linux (Ubuntu): | |
assignee: | nobody → You-Sheng Yang (vicamo) |
Changed in linux-oem-osp1 (Ubuntu Eoan): | |
status: | New → Invalid |
Changed in linux-oem-osp1 (Ubuntu Focal): | |
status: | New → Invalid |
Changed in linux-oem-osp1 (Ubuntu Groovy): | |
status: | New → Invalid |
Changed in linux-oem-5.6 (Ubuntu Bionic): | |
status: | New → Invalid |
Changed in linux-oem-5.6 (Ubuntu Eoan): | |
status: | New → Invalid |
Changed in linux-oem-5.6 (Ubuntu Groovy): | |
status: | New → Invalid |
Changed in linux (Ubuntu Bionic): | |
assignee: | nobody → You-Sheng Yang (vicamo) |
status: | New → In Progress |
Changed in linux (Ubuntu Eoan): | |
assignee: | nobody → You-Sheng Yang (vicamo) |
status: | New → In Progress |
Changed in linux (Ubuntu Focal): | |
assignee: | nobody → You-Sheng Yang (vicamo) |
status: | New → In Progress |
Changed in linux (Ubuntu Groovy): | |
status: | Confirmed → In Progress |
Changed in linux-oem-5.6 (Ubuntu Focal): | |
assignee: | nobody → You-Sheng Yang (vicamo) |
status: | New → In Progress |
Changed in linux-oem-osp1 (Ubuntu Bionic): | |
assignee: | nobody → You-Sheng Yang (vicamo) |
status: | New → In Progress |
description: | updated |
Changed in linux (Ubuntu Bionic): | |
status: | In Progress → Fix Committed |
Changed in linux (Ubuntu Eoan): | |
status: | In Progress → Fix Committed |
Changed in linux (Ubuntu Focal): | |
status: | In Progress → Fix Committed |
tags: |
added: verification-done-eoan removed: verification-needed-eoan |
Changed in linux-oem-5.6 (Ubuntu Focal): | |
status: | In Progress → Fix Committed |
Changed in linux-oem-osp1 (Ubuntu Bionic): | |
status: | In Progress → Fix Committed |
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1865988
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.