$ lsb_release -rd
Description: Ubuntu 19.10
Release: 19.10
[Impact]
Devices affected:
* [1022:149c] USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
* [1022:1487] Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller
Despite advertising the FLReset device capability, performing a function level reset (FLR) of either of these devices causes the system to lock up. This is a particular problem where these devices appear in their own IOMMU groups and would otherwise be well suited to VFIO passthrough.
The issue was introduced in AMD's "AGESA Combo-AM4 1.0.0.4 Patch B" firmware update and affects dozens of motherboard models across various vendors.
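Whether a device advertises FLR and which IOMMU group it belongs to can be checked from userspace. The following is only a minimal sketch and is not part of the attached patch. The function names are made up for illustration, and it assumes that `lspci -vv` (usually run as root) prints an `FLReset+` flag in the device capabilities and that sysfs exposes an `iommu_group` symlink for the device.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not from the report.

# Succeeds if the `lspci -vv` text on stdin advertises Function Level Reset.
advertises_flr() {
    grep -q 'FLReset+'
}

# Prints the IOMMU group number of a PCI device.
# $1: PCI devices directory (normally /sys/bus/pci/devices)
# $2: PCI address, e.g. 0000:10:00.3
iommu_group_of() {
    link="$1/$2/iommu_group"
    [ -L "$link" ] || return 1
    basename "$(readlink "$link")"
}

# Prints the addresses of all devices that share the device's IOMMU group.
# Same arguments as iommu_group_of.
iommu_group_members() {
    ls "$1/$2/iommu_group/devices"
}

# Possible usage on a real system:
#   sudo lspci -vv -s 0000:10:00.3 | advertises_flr && echo "FLR advertised"
#   iommu_group_of /sys/bus/pci/devices 0000:10:00.3
#   iommu_group_members /sys/bus/pci/devices 0000:10:00.3
```

A device that is the only member of its group is the case described above, where passthrough would be convenient if the reset worked.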
[Fix]
Add a quirk to disable FLR on these devices. Sample patch attached.
[Test Case]
Perform the test on an impacted system:
* B350, B450, X370, X470, X570 motherboards (practically anything with an AM4 socket);
* Ryzen 3000-series CPU (2000-series possibly also affected);
* BIOS/UEFI firmware that includes "AGESA Combo-AM4 1.0.0.4 Patch B" (check vendor release notes)
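The PCI address of an affected device differs between boards, so it helps to look it up by vendor and device ID first. `lspci -nn -d 1022:149c` does this directly. The sketch below does the same by reading sysfs. The helper name `find_pci_by_id` is hypothetical, and it assumes the usual sysfs layout in which each device directory contains `vendor` and `device` files holding `0x`-prefixed lowercase hex IDs.

```shell
#!/bin/sh
# Hypothetical helper: list PCI addresses matching a vendor:device ID.
# $1: PCI devices directory (normally /sys/bus/pci/devices)
# $2: vendor ID in lowercase hex without prefix, e.g. 1022
# $3: device ID in lowercase hex without prefix, e.g. 149c
find_pci_by_id() {
    for dev in "$1"/*; do
        # Skip entries that do not look like PCI device directories.
        [ -r "$dev/vendor" ] && [ -r "$dev/device" ] || continue
        # sysfs stores the IDs with a 0x prefix, e.g. 0x1022.
        if [ "$(cat "$dev/vendor")" = "0x$2" ] &&
           [ "$(cat "$dev/device")" = "0x$3" ]; then
            basename "$dev"
        fi
    done
}

# Possible usage on a real system:
#   find_pci_by_id /sys/bus/pci/devices 1022 149c   # USB controller
#   find_pci_by_id /sys/bus/pci/devices 1022 1487   # HD Audio controller
```

Taking the devices directory as a parameter keeps the helper testable against a fake directory tree, without touching real hardware.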
On an impacted system where '0000:10:00.3' is the address of the USB controller '1022:149c', issue a reset command:
$ echo 1 | sudo tee /sys/bus/pci/devices/0000\:10\:00.3/reset
On impacted systems the command does not return successfully and the system becomes unstable, requiring a reboot. `/var/log/syslog` will show something resembling the following:
Mar 4 14:51:26 bunty kernel: [ 1745.043914] xhci_hcd 0000:10:00.3: not ready 1023ms after FLR; waiting
Mar 4 14:51:28 bunty kernel: [ 1747.091910] xhci_hcd 0000:10:00.3: not ready 2047ms after FLR; waiting
Mar 4 14:51:32 bunty kernel: [ 1750.163972] xhci_hcd 0000:10:00.3: not ready 4095ms after FLR; waiting
Mar 4 14:51:37 bunty kernel: [ 1755.283933] xhci_hcd 0000:10:00.3: not ready 8191ms after FLR; waiting
Mar 4 14:51:46 bunty kernel: [ 1764.499943] xhci_hcd 0000:10:00.3: not ready 16383ms after FLR; waiting
Mar 4 14:52:04 bunty kernel: [ 1782.164126] xhci_hcd 0000:10:00.3: not ready 32767ms after FLR; waiting
Mar 4 14:52:39 bunty kernel: [ 1816.979432] xhci_hcd 0000:10:00.3: not ready 65535ms after FLR; giving up
Mar 4 14:52:39 bunty kernel: [ 1817.978790] clocksource: timekeeping watchdog on CPU14: Marking clocksource 'tsc' as unstable because the skew is too large:
Mar 4 14:52:39 bunty kernel: [ 1817.978806] clocksource: 'hpet' wd_now: f63fcfe wd_last: d468894 mask: ffffffff
Mar 4 14:52:39 bunty kernel: [ 1817.978809] clocksource: 'tsc' cs_now: 60e67e17758 cs_last: 60d2a81ce24 mask: ffffffffffffffff
Mar 4 14:52:39 bunty kernel: [ 1817.978818] tsc: Marking TSC unstable due to clocksource watchdog
Mar 4 14:52:40 bunty kernel: [ 1817.978892] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
Mar 4 14:52:40 bunty kernel: [ 1817.978894] sched_clock: Marking unstable (1817664630139, 314261908)<-(1817981099530, -2209419)
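If the machine stays up long enough to inspect the log (or the log is read after the reboot), the failure signature can be searched for rather than read by eye. The sketch below is a hedged illustration only. The helper name `flr_timeouts` is hypothetical, and it assumes the kernel message has the same wording as in the excerpt above ("not ready <N>ms after FLR; giving up"), which may differ between kernel versions.

```shell
#!/bin/sh
# Hypothetical helper: print the PCI addresses that timed out after FLR.
# $1: log file to scan (e.g. /var/log/syslog, or a saved `dmesg` dump)
flr_timeouts() {
    # Match only the final "giving up" message, not the intermediate
    # "waiting" ones, and print the PCI address that precedes it.
    sed -n 's/.* \([0-9a-f]\{4\}:[0-9a-f]\{2\}:[0-9a-f]\{2\}\.[0-7]\): not ready [0-9]*ms after FLR; giving up.*/\1/p' "$1" |
        sort -u
}

# Possible usage on a real system:
#   flr_timeouts /var/log/syslog
```

An empty result only means that no matching line was found in that file; it does not prove that the system is unaffected.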
[Regression Risk]
Unknown