xhci_hcd: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13

Bug #1667750 reported by l3iggs
This bug affects 56 people
Affects              Status        Importance  Assigned to    Milestone
HWE Next             Fix Released  Undecided   Unassigned
Linux                Confirmed     High
linux (Arch Linux)   New           Undecided   Unassigned
linux (Debian)       New           Undecided   Unassigned
linux (Fedora)       Confirmed     Undecided
linux (Ubuntu)       Fix Released  Medium      Kai-Heng Feng
  Xenial             Fix Released  Undecided   Unassigned
  Zesty              Fix Released  Undecided   Unassigned
  Artful             Fix Released  Medium      Kai-Heng Feng

Bug Description

[SRU Justification]

[Impact]
The Dell TB16 docking station has trouble with gigabit Ethernet: the
connection drops unless the link speed is lowered to 100Mb/s.

[Test Case]
Download some big files from the web.
User confirms the patch fixes the issue.

[Regression Potential]
This patch only affects ASMedia's ASM1042A.
The regression potential is low and limited to that specific device.

---

My system contains a Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter on the usb3 bus of my docking station (Dell TB16), which is attached to my laptop (Dell XPS9550) via Thunderbolt 3.

I get usb related kernel error messages when I initiate a high speed transfer (by issuing wget http://cdimage.ubuntu.com/daily-live/current/zesty-desktop-amd64.iso) and the download fails.

This does not happen when the Ethernet adapter is connected to a 100Mb/s switch, but only when connected to 1000Mb/s. It also does not happen with slow traffic (e.g. web page browsing). This is not a new bug with kernel 4.10, but has been going on since at least 4.7 and maybe (probably?) since forever. I'm aware of several others with this configuration (RTL8153 on usb3 behind thunderbolt 3) that have the same issue. This bug is also not specific to Ubuntu; I also get it on Arch Linux. I've also tested and seen this bug with several different models of thunderbolt 3 docks.

Here are the relevant kernel log messages:

Feb 24 16:42:38 ubuntu kernel: xhci_hcd 0000:0e:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
Feb 24 16:42:38 ubuntu kernel: xhci_hcd 0000:0e:00.0: Looking for event-dma 00000004777d9010 trb-start 0000000475a14fe0 trb-end 0000000475a14fe0 seg-start 0000000475a14000 seg-end 0000000475a14ff0
Feb 24 16:42:38 ubuntu kernel: xhci_hcd 0000:0e:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
Feb 24 16:42:38 ubuntu kernel: xhci_hcd 0000:0e:00.0: Looking for event-dma 00000004777d9020 trb-start 0000000475a14fe0 trb-end 0000000475a14fe0 seg-start 0000000475a14000 seg-end 0000000475a14ff0
Feb 24 16:42:38 ubuntu kernel: xhci_hcd 0000:0e:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
Feb 24 16:42:38 ubuntu kernel: xhci_hcd 0000:0e:00.0: Looking for event-dma 00000004777d9030 trb-start 0000000475a14fe0 trb-end 0000000475a14fe0 seg-start 0000000475a14000 seg-end 0000000475a14ff0
Feb 24 16:42:38 ubuntu kernel: xhci_hcd 0000:0e:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
Feb 24 16:42:38 ubuntu kernel: xhci_hcd 0000:0e:00.0: Looking for event-dma 00000004777d9040 trb-start 0000000475a14fe0 trb-end 0000000475a14fe0 seg-start 0000000475a14000 seg-end 0000000475a14ff0
Feb 24 16:42:38 ubuntu kernel: xhci_hcd 0000:0e:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
Feb 24 16:42:38 ubuntu kernel: xhci_hcd 0000:0e:00.0: Looking for event-dma 00000004777d9050 trb-start 0000000475a14fe0 trb-end 0000000475a14fe0 seg-start 0000000475a14000 seg-end 0000000475a14ff0
Feb 24 16:42:38 ubuntu kernel: xhci_hcd 0000:0e:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
Feb 24 16:42:38 ubuntu kernel: xhci_hcd 0000:0e:00.0: Looking for event-dma 00000004777d9060 trb-start 0000000475a14fe0 trb-end 0000000475a14fe0 seg-start 0000000475a14000 seg-end 0000000475a14ff0
Feb 24 16:42:39 ubuntu kernel: xhci_hcd 0000:0e:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
Feb 24 16:42:39 ubuntu kernel: xhci_hcd 0000:0e:00.0: Looking for event-dma 00000004777d9070 trb-start 0000000475a14fe0 trb-end 0000000475a14fe0 seg-start 0000000475a14000 seg-end 0000000475a14ff0
Feb 24 16:42:39 ubuntu kernel: xhci_hcd 0000:0e:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
Feb 24 16:42:39 ubuntu kernel: xhci_hcd 0000:0e:00.0: Looking for event-dma 00000004777d9080 trb-start 0000000475a14fe0 trb-end 0000000475a14fe0 seg-start 0000000475a14000 seg-end 0000000475a14ff0
Feb 24 16:43:06 ubuntu kernel: r8152 4-1.2:1.0 enx204747f8f471: Tx timeout
Feb 24 16:43:06 ubuntu kernel: r8152 4-1.2:1.0 enx204747f8f471: Tx status -2
Feb 24 16:43:06 ubuntu kernel: r8152 4-1.2:1.0 enx204747f8f471: Tx status -2
Feb 24 16:43:06 ubuntu kernel: r8152 4-1.2:1.0 enx204747f8f471: Tx status -2
Feb 24 16:43:06 ubuntu kernel: r8152 4-1.2:1.0 enx204747f8f471: Tx status -2
Feb 24 16:43:09 ubuntu kernel: usb 4-1.2: reset SuperSpeed USB device number 3 using xhci_hcd
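
As a reading aid for the log above, here is a minimal Python sketch that parses one of the "Looking for event-dma" lines and checks whether the reported event DMA address lies inside the ring segment named on the same line. The function names parse_looking_for and event_in_segment are hypothetical, and treating seg-end as an inclusive bound is an assumption; this is not how the xhci_hcd driver itself is written.

```python
import re

# Matches the address fields of the xhci_hcd "Looking for ..." diagnostic line.
LOOKING_FOR = re.compile(
    r"event-dma (?P<event>[0-9a-f]+) "
    r"trb-start (?P<trb_start>[0-9a-f]+) "
    r"trb-end (?P<trb_end>[0-9a-f]+) "
    r"seg-start (?P<seg_start>[0-9a-f]+) "
    r"seg-end (?P<seg_end>[0-9a-f]+)"
)


def parse_looking_for(line):
    """Return the five addresses as integers, or None if the line does not match."""
    match = LOOKING_FOR.search(line)
    if match is None:
        return None
    return {name: int(value, 16) for name, value in match.groupdict().items()}


def event_in_segment(fields):
    """True if the event DMA address lies within [seg-start, seg-end] (assumed inclusive)."""
    return fields["seg_start"] <= fields["event"] <= fields["seg_end"]


# First "Looking for" line from the log in this report.
sample = (
    "Feb 24 16:42:38 ubuntu kernel: xhci_hcd 0000:0e:00.0: Looking for "
    "event-dma 00000004777d9010 trb-start 0000000475a14fe0 "
    "trb-end 0000000475a14fe0 seg-start 0000000475a14000 "
    "seg-end 0000000475a14ff0"
)
fields = parse_looking_for(sample)
print(event_in_segment(fields))  # False: the event address is outside the segment
```

For the sample line the event address (0x4777d9010) is above seg-end (0x475a14ff0), which matches the "not part of current TD" wording of the error.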

I can't seem to make this bug appear with any other type of USB traffic. I've reported it to the realtek kernel dev team and they don't think their RTL8153 driver (in this case the r8152 module) is to blame, but instead that it's an xhci_hcd issue.

If you look through the dmesg log attached here, you'll see that at 45.967025 I plugged the thunderbolt 3 cable from my dock into my laptop.

ProblemType: Bug
DistroRelease: Ubuntu 17.04
Package: linux-image-4.10.0-8-generic 4.10.0-8.10
ProcVersionSignature: Ubuntu 4.10.0-8.10-generic 4.10.0-rc8
Uname: Linux 4.10.0-8-generic x86_64
ApportVersion: 2.20.4-0ubuntu2
Architecture: amd64
CasperVersion: 1.380
CurrentDesktop: Unity:Unity7
Date: Fri Feb 24 16:53:35 2017
LiveMediaBuild: Ubuntu 17.04 "Zesty Zapus" - Alpha amd64 (20170224)
MachineType: Dell Inc. XPS 15 9550
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/casper/vmlinuz.efi file=/cdrom/preseed/username.seed boot=casper quiet splash ---
RelatedPackageVersions:
 linux-restricted-modules-4.10.0-8-generic N/A
 linux-backports-modules-4.10.0-8-generic N/A
 linux-firmware 1.163
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 12/22/2016
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.2.19
dmi.board.name: 0N7TVV
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 9
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.2.19:bd12/22/2016:svnDellInc.:pnXPS159550:pvr:rvnDellInc.:rn0N7TVV:rvrA00:cvnDellInc.:ct9:cvr:
dmi.product.name: XPS 15 9550
dmi.sys.vendor: Dell Inc.

l3iggs (l3iggs) wrote :

Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
l3iggs (l3iggs)
description: updated
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.10 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
l3iggs (l3iggs) wrote :

Hi Joseph. Are you a robot?

I believe I've answered your questions in my bug report (when I wrote that this bug has been going on since forever). Also, this bug report was made with linux-image-4.10.0-8-generic so it seems that your request to test with a newer kernel does not apply.

l3iggs (l3iggs)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
l3iggs (l3iggs)
tags: added: kernel-bug-exists-upstream
l3iggs (l3iggs) wrote :

Jonathan Booth (svirpridon+ubuntu) wrote :

+1, same configuration, same behavior, 16.10.

Harald Nordgård-Hansen (hhansen) wrote :

A bit more info, as I've gotten hold of a separate usb network card with the same chip in it (Realtek 8153). Plugging it into the ASMedia ASM1042A USB controller found in the TB16 docking station gives the same result as reported here. But plugging it directly into the computer (Intel Sunrise Point-H USB controller) works without problems.

So the problem seems to be in the combination of the Realtek and ASMedia chips.

David Ibarra (dibarra) wrote :

Hey all, just +1'ing this- seeing this on ubuntu 16.10, and Fedora 25 (kernel 4.9). TB16 dock and Dell Precision 5510.

André Düwel (aduewel) wrote :

+1, Dell XPS15 9550 + Dell TB16 + Ubuntu 16.10.

Workaround:
Limiting the connection speed to 100MBit full duplex via "ethtool -s eth..... speed 100 duplex full autoneg on" also circumvents the problem.

On Windows 10 it's working without issues at full speed (Gigabit).

André Düwel (aduewel) wrote :

Reloading the Realtek kernel module r8152 and restarting the network-manager also fixes the problem temporarily:
sudo rmmod r8152.ko
sudo modprobe r8152.ko
sudo service network-manager restart

Kaz Wolfe (kazwolfe) wrote :

Seems related to Bug #1663975. Same problem, I'd think...

Also, just for the sake of completeness, yet another error log: http://pastebin.com/z8U9usDY

4.8.0-41-generic #44~16.04.1-Ubuntu, HWE because reasons. Kernel is tainted (NVIDIA, VirtualBox), but this issue seems to exist anyways.

I posted the same comment on the other bug report; sorry for any spam this may cause.

Hordur Heidarsson (hordur-z) wrote :

+1, Dell Precision M5510 + Dell TB16 + Ubuntu 16.10 + 4.8.0-41-generic #44~16.04.1-Ubuntu SMP

@aduewel: thanks for the speed downgrade workaround!

Karlyn Fielding (karlyn) wrote :

I can confirm that I have the same issue being reported here. I have a Dell XPS13 Developer Edition (9360) with Ubuntu 16.04, TB16 Dock and an upgraded Ubuntu Mainline build kernel of 4.10.4.

Additionally, I tried downloading the source code from Realtek for their v2.08.0 r8152 driver. I compiled that driver and manually removed and re-inserted the new kernel module into my running system. With that driver running, I see the exact same behavior described here.

I'd be happy to volunteer for any testing on my hardware that might help debug the issue.

I can also confirm that changing the connection speed to 100Mb with ethtool provides a workaround for the problem.

Also, I have some kernel output I have saved while experiencing the issue if there is interest in it.

l3iggs (l3iggs) wrote :

I'm starting to think this issue stems not from our Realtek RTL8153 Ethernet chip but rather from something upstream of it.

My best guess now is that there's something wrong with the handling of the "ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller" that's in our docks. This is a usb3.0 <--> PCIe bridge which the RTL8153 hangs off of.

Robert Sandberg (srobban) wrote :

Can confirm same issue on Dell Precision 5520 + TB16 running pre-installed Ubuntu 16.04 LTS.

Same issue on KDE Neon with different kernels 4.8.x, 4.10.x

I am also experiencing other USB issues when connecting various devices to the TB16, e.g. all other USB devices freeze.

l3iggs (l3iggs) wrote :

Also, I wonder if this could somehow be a thunderbolt 3 bandwidth allocation issue.
This is pure uneducated speculation though ;-)

l3iggs (l3iggs) wrote :

By the way, removing and reinserting the r8152 module as suggested above does not seem to prevent or work around this issue.

l3iggs (l3iggs) wrote :

Higher transfer rates seem to have some impact here:

wget http://cdimage.ubuntu.com/daily-live/current/zesty-desktop-amd64.iso
errors out in a few seconds

wget --limit-rate=10k http://cdimage.ubuntu.com/daily-live/current/zesty-desktop-amd64.iso
might run for a few 10s of seconds before erroring out

wget --limit-rate=1k http://cdimage.ubuntu.com/daily-live/current/zesty-desktop-amd64.iso
continues to work until I run out of patience (forever?). I've not waited the 18 days required for this to complete though :P
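
The "18 days" figure can be checked with a little arithmetic. The report does not state the size of the ISO, so the sketch below works backward from the 18-day estimate, assuming wget's "k" suffix means 1024 bytes per second. The helper names are hypothetical.

```python
SECONDS_PER_DAY = 86400
KIB = 1024  # assumption: wget's "k" suffix is 1024 bytes


def implied_size_bytes(days, rate_bytes_per_s):
    """Size of a download that takes `days` at a constant rate."""
    return days * SECONDS_PER_DAY * rate_bytes_per_s


def download_days(size_bytes, rate_bytes_per_s):
    """Days needed to download `size_bytes` at a constant rate."""
    return size_bytes / rate_bytes_per_s / SECONDS_PER_DAY


# 18 days at --limit-rate=1k implies an image of roughly 1.5 GiB.
size = implied_size_bytes(18, 1 * KIB)
print(round(size / 1024**3, 2))        # 1.48
# The same image at --limit-rate=10k would need about 1.8 days.
print(download_days(size, 10 * KIB))
```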

l3iggs (l3iggs) wrote :

I've attached a trace for the 4.11 kernel.

First
echo xhci-hcd >> /sys/kernel/debug/tracing/set_event

Then initiate network transport to create the bug.

/sys/kernel/debug/tracing/trace (as 4.11.trace.txt)
and
dmesg (as 4.11.dmesg.txt) are attached.

l3iggs (l3iggs) wrote :

4.11.dmesg.txt

Alex Shchagin (qalex) wrote :

+1 here Precision 5510 + TB16 + 16.10 4.8.0-41
However, I also think this is not an r8152 bug. When I plug a Dell USB-C Ethernet adapter into the TB16, it works fine at full speed with the same r8152 module.

l3iggs (l3iggs) wrote :

Alex, what's the model number of that Dell USB-C Ethernet adapter?

Alex Shchagin (qalex) wrote :

@l3iggs Nothing is written on it, but it seems to be 470-ABQJ. It came with my Precision.

I think the culprit is the USB host controller 'ASM1042A USB 3.0 Host Controller' embedded in the TB16. See here:
> lshw -short
...
/0/100/1d.6/0 bridge DSL6340 Thunderbolt 3 Bridge [Alpine Ridge 2C 2015]
/0/100/1d.6/0/0 bridge DSL6340 Thunderbolt 3 Bridge [Alpine Ridge 2C 2015]
/0/100/1d.6/0/0/0 generic DSL6340 Thunderbolt 3 NHI [Alpine Ridge 2C 2015]
/0/100/1d.6/0/1 bridge DSL6340 Thunderbolt 3 Bridge [Alpine Ridge 2C 2015]
/0/100/1d.6/0/1/0 bridge DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
/0/100/1d.6/0/1/0/1 bridge DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
/0/100/1d.6/0/1/0/4 bridge DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
/0/100/1d.6/0/1/0/4/0 bridge DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
/0/100/1d.6/0/1/0/4/0/1 bridge DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
/0/100/1d.6/0/1/0/4/0/1/0 bus ASM1042A USB 3.0 Host Controller
/0/100/1d.6/0/1/0/4/0/1/0/0 usb3 bus xHCI Host Controller
/0/100/1d.6/0/1/0/4/0/1/0/0/1 bus USB2137B
/0/100/1d.6/0/1/0/4/0/1/0/0/1/5 multimedia USB Audio
/0/100/1d.6/0/1/0/4/0/1/0/1 usb4 bus xHCI Host Controller
/0/100/1d.6/0/1/0/4/0/1/0/1/1 bus USB5537B
/0/100/1d.6/0/1/0/4/0/1/0/1/1/2 generic USB 10/100/1000 LAN <<-- EMBEDDED, NOT WORKING
/0/100/1d.6/0/1/0/4/0/4 bridge DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
/0/100/1d.6/0/1/0/4/0/4/0 bus DSL6540 USB 3.1 Controller [Alpine Ridge]
/0/100/1d.6/0/1/0/4/0/4/0/0 usb5 bus xHCI Host Controller
/0/100/1d.6/0/1/0/4/0/4/0/1 usb6 bus xHCI Host Controller
/0/100/1d.6/0/1/0/4/0/4/0/1/2 generic USB 10/100/1000 LAN <<-- EXTERNAL, WORKING
...
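
To make the tree above easier to follow, here is a minimal Python sketch that walks up an lshw hardware path until it reaches a known controller node. The two controller paths and the two LAN device paths are copied from the listing; the function name controller_for and the idea of keying a dictionary on the path are illustrative assumptions, not something lshw provides.

```python
# Controller nodes taken from the lshw -short listing above.
controllers = {
    "/0/100/1d.6/0/1/0/4/0/1/0": "ASM1042A USB 3.0 Host Controller",
    "/0/100/1d.6/0/1/0/4/0/4/0": "DSL6540 USB 3.1 Controller [Alpine Ridge]",
}


def controller_for(path, known_controllers):
    """Return the nearest ancestor controller of an lshw path, or None."""
    node = path
    while node:
        if node in known_controllers:
            return known_controllers[node]
        # Drop the last path component to move one level up the tree.
        node = node.rsplit("/", 1)[0]
    return None


embedded = "/0/100/1d.6/0/1/0/4/0/1/0/1/1/2"  # embedded LAN, not working
external = "/0/100/1d.6/0/1/0/4/0/4/0/1/2"    # external LAN, working

print(controller_for(embedded, controllers))  # ASM1042A USB 3.0 Host Controller
print(controller_for(external, controllers))  # DSL6540 USB 3.1 Controller [Alpine Ridge]
```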

By the way, Dell listed some special driver for Windows at the TB16 page for this ASM controller.

Karlyn Fielding (karlyn) wrote :

I can confirm that the issue remains with the latest 4.10.7 mainline kernel build for Ubuntu.

Same specs as before:
Dell XPS13 DE (9360)
TB16 Dock
Ubuntu 16.04 ( installed as shipped from Dell )

imperia (imperia777) wrote :

I have the same problem with my USB 3.1 controller:
ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller.

I pass the controller through to a Xen VM and then connect a USB TV tuner card to it.
I am using VDR, which is TV software that scans for new channels when not in use.

Sometimes after a few hours, sometimes after a few days, it crashes with the following error:
[131382.068144] xhci_hcd 0000:00:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 3
[131382.068182] xhci_hcd 0000:00:00.0: Looking for event-dma 0000000210196600 trb-start 0000000210196740 trb-end 0000000210196760 seg-start 0000000210196000 seg-end 0000000210196ff0

The same problem is not present with the onboard USB 3.0 controller.
If I pass it through to the Xen VM, it works without any problems.

So this must be some problem with USB 3.1 driver (not 3.0) or ASMedia firmware.

I can provide whatever information is necessary to fix this bug.
I can provide shell account to my VM also if somebody wants to debug it.

Li Dongyang (dongyang-li) wrote :

Could someone try:
ethtool --offload <eth interface> tx off
ethtool --offload <eth interface> rx off

And then see if it works?

Robert Sandberg (srobban) wrote :

I've tried:
ethtool --offload <eth interface> tx off
ethtool --offload <eth interface> rx off

But the issue remains.

The only workaround that works is to limit speed to 100, as suggested previously.

André Düwel (aduewel) wrote :

Since I upgraded to Ubuntu 17.04 (fresh install), I can confirm that this bug also affects the (now) current release and therefore kernel version 4.10.0-19-generic.

I have also now implemented another "workaround" and bought a 7€ USB3->1Gb Ethernet dongle, which works without issues.

Additional Information:
lsusb
Bus 004 Device 004: ID 0bda:8153 Realtek Semiconductor Corp.
Bus 004 Device 003: ID 0bda:8153 Realtek Semiconductor Corp.
Bus 004 Device 002: ID 0424:5537 Standard Microsystems Corp.
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 007: ID 03f0:094a Hewlett-Packard Optical Mouse [672662-001]
Bus 003 Device 006: ID 046d:c31c Logitech, Inc. Keyboard K120
Bus 003 Device 005: ID 2109:2811 VIA Labs, Inc. Hub
Bus 003 Device 004: ID 2109:2811 VIA Labs, Inc. Hub
Bus 003 Device 003: ID 0bda:4014 Realtek Semiconductor Corp.
Bus 003 Device 002: ID 0424:2137 Standard Microsystems Corp.
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 003: ID 04f3:21d5 Elan Microelectronics Corp.
Bus 001 Device 002: ID 0a5c:6410 Broadcom Corp.
Bus 001 Device 004: ID 0c45:6713 Microdia
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

lshw --short: (see attachment)

André Düwel (aduewel) wrote :

sorry, attached wrong file in last comment. here is the right one

André Düwel (aduewel) wrote :

I need to correct myself: I'm having issues under high load on the USB3 Ethernet adapter, too.

The only workaround is limiting the speed to 100MBit.

Bram Biesbrouck (b-m) wrote :

André,

I followed your advice and bought an inexpensive USB3 1gbit ethernet adapter and noticed the same drops and corruptions as the built-in ethernet port. However, when I plug it in the USB-C port of the dock (using a little USB3 to USB-C cable), everything seems to work correctly.

B.

Alex Shchagin (qalex) wrote :

Bram,

This is because USB3 and USB-C ports in TB16 are connected to different controllers. See my lshw output here - https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1667750/comments/23 - I've marked Ethernet cards with <<--. Working one is USB-C and it is under this one:
/0/100/1d.6/0/1/0/4/0/4 bridge DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
/0/100/1d.6/0/1/0/4/0/4/0 bus DSL6540 USB 3.1 Controller [Alpine Ridge]

Alex

André Düwel (aduewel) wrote :

Ohh okay, thanks for this advice. I will order an adapter and try it out.

This seems to verify that the problem exists somewhere in the usb3 controller/driver (ASM1042A) in the TB16 and not in the Ethernet controller/driver itself.

Bram Biesbrouck (b-m) wrote :

Ah, cool, didn't know that, thanks!

zwigno (zwigno) wrote :

I have the same issue. I'm using a Dell Precision 5510 with the Dell TB16 Dock. I'm running Ubuntu 17.04 with kernel version 4.10.0-21-generic. What I first noticed is that some SSL-enabled websites failed to load with errors like "SSL_ERROR_BAD_MAC_READ". Setting the speed of the r8152 to 100Mb fixes the issue.

Does anyone have a solution for setting the speed to 100Mb upon plugin of the Thunderbolt connector or when the interface comes up? Setting it at boot time isn't ideal because I don't often have the dock plugged in at first boot.

Zhenfang Wei (kopkop) wrote :

the same issue, xps9360 + ubuntu 16.04 with kernel 4.4.78 + tb16

Mario Limonciello (superm1) wrote :

This is an issue with the host controller. The vendor (ASMedia) has submitted a patch here that fixes the issue:
http://www.spinics.net/lists/linux-usb/msg157958.html

Kai-Heng Feng (kaihengfeng) wrote :

I can confirm the patch works on the TB15 I have on hand. Can you guys try the patched 4.11 kernel [1] on the TB16?

I applied the patch to 4.11 - the patch cannot be cleanly applied to Xenial/Yakkety/Zesty kernel.

I'll do the proper backport once the patch is accepted by the upstream maintainers.

[1] http://people.canonical.com/~khfeng/lp1667750/

André Düwel (aduewel) wrote :

Hi Kai-Heng,

I can confirm your Kernel is working on my XPS 15 9550 + TB16 running Zesty and it fixes the Ethernet issue.

But the whole TB16 USB3 controller, including keyboard, Ethernet and other USB devices, is still not working when the dock is connected during system start. I need to disconnect and reconnect it after booting.

Thanks again! :)

Kai-Heng Feng (kaihengfeng) wrote :

Sounds like another issue. Can you file another bug?

AceLan Kao (acelankao)
tags: added: originate-from-1696057 somerville
Changed in linux (Fedora):
importance: Undecided → Unknown
status: New → Unknown
Changed in linux (Ubuntu):
assignee: nobody → Kai-Heng Feng (kaihengfeng)
description: updated
Seth Forshee (sforshee)
Changed in linux (Ubuntu Artful):
status: Confirmed → Fix Committed
Changed in linux (Ubuntu Xenial):
status: New → In Progress
status: In Progress → Fix Committed
Changed in linux (Ubuntu Zesty):
status: New → Fix Committed
Bram Biesbrouck (b-m)
Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Xenial):
status: Fix Released → Fix Committed
Changed in linux (Ubuntu Artful):
status: Fix Committed → Fix Released
tags: added: verification-needed-xenial
tags: added: verification-needed-zesty
Corey Schuhen (cschuhen)
tags: added: verification-done-zesty
removed: verification-needed-zesty
tags: added: verification-done-xenial
removed: verification-needed-xenial
Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released
status: Fix Committed → Fix Released
Changed in linux (Fedora):
importance: Unknown → Undecided
status: Unknown → Confirmed
Changed in hwe-next:
status: New → Fix Released
Brad Figg (brad-figg)
tags: added: cscc
292 comments hidden (372 comments in total)
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Stanislaw, short notice for you. Now, I'm running the fresh kernel (the RYZEN is really fast compiling it). Patch v2 is applied.
Everything is working fine and all Bogus messages are gone.
Thanks again.

In , wgh (wgh-linux-kernel-bugs) wrote :

(In reply to Mathias Nyman from comment #139)
> rewritten URB cancel, endpoint stop and set trb deq can be found in my tree
> in rewrite_halt_stop_handling branch
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git
> rewrite_halt_stop_handling
>
> https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/log/
> ?h=rewrite_halt_stop_handling
>
> Does that help?

I applied the patch to 5.10.11-gentoo, and it did help with my HackRF One (see comment #136 for details and hardware)! No ill effects so far.

In , stf_xl (stfxl-linux-kernel-bugs) wrote :

After discussion on my posted patch here:

https://<email address hidden>/t/#u

it was concluded that this should rather be an xhci quirk instead of an rt2800usb driver flag.

If the change from comment 147 helps you with the problem, please provide the PCI ID of your xHCI controller. This can be done with the command:

lspci -k -nn | grep -B2 xhci

If you have more than one xHCI controller, please make sure you provide the PCI IDs of the one that actually has the problem (the 'lspci -t' command can be useful as well)

In , stf_xl (stfxl-linux-kernel-bugs) wrote :

(In reply to Stanislaw Gruszka from comment #173)
> If you have more than one xHCI controller please assure you provide PCI-id's
> of one that actually has the problem ('lspci -t' command can be useful as
> well)

I meant 'lsusb -t'

In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset USB 3.1 xHCI Controller [1022:43b9] (rev 02)
Subsystem: ASMedia Technology Inc. Device [1b21:1142]
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci

In , stf_xl (stfxl-linux-kernel-bugs) wrote :

Created attachment 295055
0001-usb-xhci-do-not-perform-Soft-Retry-for-some-xHCI-hos.patch

This is the next proposed fix. It is supposed to disable Soft Retry for affected xHCI controllers, currently only for the xHCI device reported by Michael:
PCI_VENDOR_ID_AMD = 0x1022 , PCI_DEVICE_ID_AMD_PROMONTORYA_4 = 0x43b9

If you want to test and have a different xHCI host, you need to add your PCI IDs to the
drivers/usb/host/xhci-pci.c part of the patch.

In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

@Stanislaw, I followed the discussion you mentioned here:
https://bugzilla.kernel.org/show_bug.cgi?id=202541#c173

Other devices than rt2800usb devices are affected, too.
Tested this one before applying your patch:
ID 7392:7710 Edimax Technology Co., Ltd Edimax Wi-Fi
and running into the same xhci issue on USB controller mentioned here:
https://bugzilla.kernel.org/show_bug.cgi?id=202541#c175

[10214.423508] usb 1-2: new high-speed USB device number 3 using xhci_hcd
[10214.602833] usb 1-2: New USB device found, idVendor=7392, idProduct=7710, bcdDevice= 0.00
[10214.602838] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[10214.602841] usb 1-2: Product: Edimax Wi-Fi
[10214.602843] usb 1-2: Manufacturer: MediaTek
[10214.602845] usb 1-2: SerialNumber: 1.0
[10214.931553] usb 1-2: reset high-speed USB device number 3 using xhci_hcd
[10215.102895] mt7601u 1-2:1.0: ASIC revision: 76010001 MAC revision: 76010500
[10215.132670] mt7601u 1-2:1.0: Firmware Version: 0.1.00 Build: 7640 Build time: 201302052146____
[10216.101346] mt7601u 1-2:1.0: EEPROM ver:0d fae:00
[10216.111983] mt7601u 1-2:1.0: EEPROM country region 01 (channels 1-13)
[10217.189574] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
[10217.190361] usbcore: registered new interface driver mt7601u
[10217.199429] mt7601u 1-2:1.0 wlp3s0f0u2: renamed from wlan0
[10296.419053] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[10296.419228] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

In , jg.staffel (jg.staffel-linux-kernel-bugs) wrote :

The same problem (with ID 04a9:220d Canon, Inc. CanoScan N670U/N676U/LiDE 20):

Feb 03 09:48:54 [kernel] [34974.104606] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Feb 03 09:49:49 [kernel] [35029.419748] usb 1-6: USB disconnect, device number 3
Feb 03 09:49:52 [kernel] [35031.994403] usb 1-6: new full-speed USB device number 6 using xhci_hcd
Feb 03 09:50:45 [kernel] [35085.400634] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Feb 03 09:50:45 [kernel] [35085.404278] xhci_hcd 0000:01:00.0: WARN Successful completion on short TX
Feb 03 09:50:45 [kernel] [35085.404398] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 4 comp_code 1
Feb 03 09:50:45 [kernel] [35085.404401] xhci_hcd 0000:01:00.0: Looking for event-dma 00000008146ff050 trb-start 00000008146ff060 trb-end 00000008146ff060 seg-start 00000008146ff000 seg-end 00000008146ffff0

$ lspci -k -nn | grep -B2 xhci
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6 [1022:1466]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
01:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 XHCI Controller [1022:43d5] (rev 01)
 Subsystem: ASMedia Technology Inc. 400 Series Chipset USB 3.1 XHCI Controller [1b21:1142]
 Kernel driver in use: xhci_hcd
--
09:00.2 USB controller [0c03]: NVIDIA Corporation TU116 USB 3.1 Host Controller [10de:1aec] (rev a1)
 Subsystem: NVIDIA Corporation TU116 USB 3.1 Host Controller [10de:139d]
 Kernel driver in use: xhci_hcd
--
0a:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Zeppelin USB 3.0 Host controller [1022:145f]
 Subsystem: Advanced Micro Devices, Inc. [AMD] Zeppelin USB 3.0 Host controller [1022:7914]
 Kernel driver in use: xhci_hcd

$ uname -a
Linux Gentoo 5.4.92-gentoo #1 SMP PREEMPT Thu Jan 28 20:45:52 MSK 2021 x86_64 AMD Ryzen 5 2600 Six-Core Processor AuthenticAMD GNU/Linux

In , stf_xl (stfxl-linux-kernel-bugs) wrote :

(In reply to Michael from comment #177)
> Other devices than rt2800usb devices are affected, too.
> Tested this one before applying your patch:
> ID 7392:7710 Edimax Technology Co., Ltd Edimax Wi-Fi
> and running into the same xhci issue on USB controller mentioned here:
> https://bugzilla.kernel.org/show_bug.cgi?id=202541#c175

Ok, so it makes sense to disable Soft Retry per xHCI.

In , stf_xl (stfxl-linux-kernel-bugs) wrote :

(In reply to alpir from comment #178)
> The same problem (with ID 04a9:220d Canon, Inc. CanoScan N670U/N676U/LiDE
> 20):
>
> Feb 03 09:48:54 [kernel] [34974.104606] xhci_hcd 0000:01:00.0: WARN Set TR
> Deq Ptr cmd failed due to incorrect slot or ep state.

alpir, does the change from comment 147 help you?

In , stf_xl (stfxl-linux-kernel-bugs) wrote :

alpir, you have a different device ID than Michael, but you both have the same subsystem device: ASMedia 1b21:1142. So perhaps the patch should be based on subsystem IDs. Let's wait for other users' reports regarding their xHCI controllers; we will see then.
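
The comparison of device IDs and subsystem IDs can be sketched in Python. The snippet below is only an illustration, not part of any proposed patch: it pulls both IDs out of "lspci -k -nn" text for USB controller entries, using the two outputs posted earlier in this thread as input. The function name parse_usb_controllers and the parsing rules are assumptions about the output format as it appears in these comments.

```python
import re

# Matches bracketed vendor:device pairs such as [1022:43b9].
PCI_ID = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]")


def parse_usb_controllers(lspci_text):
    """Return a list of {'device', 'subsystem'} dicts for USB controller entries."""
    results = []
    current = None
    for line in lspci_text.splitlines():
        stripped = line.strip()
        ids = PCI_ID.findall(stripped)
        if stripped.startswith("Subsystem:"):
            # Subsystem lines belong to the most recent USB controller, if any.
            if current is not None and ids:
                current["subsystem"] = ":".join(ids[-1])
        elif "USB controller [0c03]" in stripped and ids:
            current = {"device": ":".join(ids[-1]), "subsystem": None}
            results.append(current)
        elif ids:
            # Header line of some other PCI device: stop attributing subsystems.
            current = None
    return results


michael = """USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset USB 3.1 xHCI Controller [1022:43b9] (rev 02)
Subsystem: ASMedia Technology Inc. Device [1b21:1142]
Kernel driver in use: xhci_hcd"""

alpir = """00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
01:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 XHCI Controller [1022:43d5] (rev 01)
 Subsystem: ASMedia Technology Inc. 400 Series Chipset USB 3.1 XHCI Controller [1b21:1142]
 Kernel driver in use: xhci_hcd"""

a = parse_usb_controllers(michael)[0]
b = parse_usb_controllers(alpir)[0]
print(a["device"], b["device"])                # 1022:43b9 1022:43d5
print(a["subsystem"] == b["subsystem"])        # True
```

Matching on the device ID would need one entry per chipset variant, while the shared subsystem ID 1b21:1142 covers both reports with a single entry.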

In , jg.staffel (jg.staffel-linux-kernel-bugs) wrote :

I tried the patch from comment 147. The error "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state" is gone, but the USB 3.1 behavior is still the same.

The reason I started looking into the strange behavior of the USB ports in the first place: my two JetFlash Transcend 8GB flash drives, when connected to the USB3 port, are sometimes not detected by the system as mountable (FAT32). When I run a disk check (8 GB) with the command badblocks -nvs /dev/sdd, the check ends after a while with the following error: Pass completed, 5662144 bad blocks found. (5662144/0/0 errors). This happens with both flash drives.

But if you connect them to USB2, then there are no errors at all.

At the same time, when looking at the logs, I found errors: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

Now, after the patch, I get the following in the logs:

Feb 03 17:47:14 [kernel] [ 52.603587] usb 2-3: new SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:47:14 [kernel] [ 52.636130] usb-storage 2-3:1.0: USB Mass Storage device detected
Feb 03 17:47:14 [kernel] [ 52.636242] scsi host11: usb-storage 2-3:1.0
Feb 03 17:47:14 [kernel] [ 52.651996] usbcore: registered new interface driver uas
Feb 03 17:47:16 [kernel] [ 54.013780] scsi 11:0:0:0: Direct-Access JetFlash Transcend 8GB 1100 PQ: 0 ANSI: 6
Feb 03 17:47:16 [kernel] [ 54.014688] sd 11:0:0:0: [sdd] 15425536 512-byte logical blocks: (7.90 GB/7.36 GiB)
Feb 03 17:47:16 [kernel] [ 54.015150] sd 11:0:0:0: [sdd] Write Protect is off
Feb 03 17:47:16 [kernel] [ 54.015156] sd 11:0:0:0: [sdd] Mode Sense: 43 00 00 00
Feb 03 17:47:16 [kernel] [ 54.015625] sd 11:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 03 17:47:16 [kernel] [ 54.028165] sdd: sdd1
Feb 03 17:47:16 [kernel] [ 54.045687] sd 11:0:0:0: [sdd] Attached SCSI removable disk
Feb 03 17:48:04 [kernel] [ 102.221862] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:51:52 [kernel] [ 330.009696] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:55:55 [kernel] [ 573.644576] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:56:01 [kernel] [ 579.149875] usb 2-3: device descriptor read/8, error -110
Feb 03 17:56:01 [kernel] [ 579.254204] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:56:06 [kernel] [ 584.781836] usb 2-3: device descriptor read/8, error -110
Feb 03 17:56:07 [kernel] [ 585.073435] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:56:12 [kernel] [ 590.413816] usb 2-3: device descriptor read/8, error -110
Feb 03 17:56:12 [kernel] [ 590.518146] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:56:18 [kernel] [ 596.046034] usb 2-3: device descriptor read/8, error -110
Feb 03 17:56:18 [kernel] [ 596.336445] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:56:23 [kernel] [ 601.677932] usb 2-3: device descriptor read/8, error -110
Feb 03 17:56:23 [kernel] [ 601.782091] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:56:29 [kernel] [ 607.309722] usb 2-3: device descr...

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

My controller has the PCI ID 43bb, so I've added "PCI_DEVICE_ID_AMD_PROMONTORYA_2" to the patch from #176, and that fixed the issue for me.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

@Stanislaw, I'm running an older mobo and a RYZEN 1700.
I don't need CPU power - GPU power is more important for me (crypto analysis).

Revision history for this message
In , biopsin (biopsin-linux-kernel-bugs) wrote :

[Continuing my first report in comment:https://bugzilla.kernel.org/show_bug.cgi?id=202541#c107]

$ lspci -k -nn | grep -B2 xhci
02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 XHCI Controller [1022:43d5] (rev 01)
        Subsystem: ASMedia Technology Inc. Device [1b21:1142]
        Kernel driver in use: xhci_hcd

I have adapted the patch by Mr. Gruszka [https://bugzilla.kernel.org/show_bug.cgi?id=202541#c176] for my current system and needs

$ uname -a
Linux voidx 5.4.95_1 #1 SMP PREEMPT 1612063540 x86_64 GNU/Linux

If someone has some spare time to glance at it or comment on my error ;)
(diff available for 30 days) @
https://p.teknik.io/lIBbA

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

(In reply to alpir from comment #182)
> I tried the patch from comment 147. The error "WARN Set TR Deq Ptr cmd failed
> due to incorrect slot or ep state" is gone, but the USB 3.1 behavior is still
> the same.
[snip]
> But if you connect them to USB2, then there are no errors at all.

alpir, I think you are experiencing a different issue that cannot be solved by simply disabling Soft Retry. Some more fixes are possibly needed for handling your xHCI/USB hardware. Maybe you can try the patch from comment 139? If this is a regression, maybe you can bisect to find the offending commit? In any case, your problems will most likely require the expertise of Mathias Nyman, the xhci driver maintainer.

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

(In reply to biopsin from comment #185)
> [Continuing my first report in
> comment:https://bugzilla.kernel.org/show_bug.cgi?id=202541#c107]

As in alpir's case, this will most likely require some different fixes, but you can check whether disabling Soft Retry works. You can just disable it as shown in comment 147.

> $ lspci -k -nn | grep -B2 xhci
> 02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series
> Chipset USB 3.1 XHCI Controller [1022:43d5] (rev 01)
> Subsystem: ASMedia Technology Inc. Device [1b21:1142]
> Kernel driver in use: xhci_hcd
>
[snip]
> If someone has some spare time to glance at it or comment on my error ;)
> (diff available for 30 days) @
> https://p.teknik.io/lIBbA

ASMedia is the subsystem_{vendor,device}, so most likely the quirk flag is not set properly for you. You can print the values with a patch like this to see:

diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 906a0e08821e..0ec9c3637b7a 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -102,6 +102,9 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)

        id = pci_match_id(pdev->driver->id_table, pdev);

+       printk("vendor: 0x%04x device 0x%04x subvendor 0x%04x subdevice 0x%04x\n",
+              pdev->vendor, pdev->device, pdev->subsystem_vendor, pdev->subsystem_device);
+
        if (id && id->driver_data) {
                driver_data = (struct xhci_driver_data *)id->driver_data;
                xhci->quirks |= driver_data->quirks;

If those are indeed subsystem IDs, I think there is a bug in the existing xhci-pci.c quirks code:

        if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
                pdev->device == PCI_DEVICE_ID_ASMEDIA_1042_XHCI)
                xhci->quirks |= XHCI_BROKEN_STREAMS;
        if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
                pdev->device == PCI_DEVICE_ID_ASMEDIA_1042A_XHCI)
                xhci->quirks |= XHCI_TRUST_TX_LENGTH;
        if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
            (pdev->device == PCI_DEVICE_ID_ASMEDIA_1142_XHCI ||
             pdev->device == PCI_DEVICE_ID_ASMEDIA_2142_XHCI))
                xhci->quirks |= XHCI_NO_64BIT_SUPPORT;

and those checks should use pdev->subsystem_vendor and pdev->subsystem_device instead.
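
To illustrate the distinction between primary and subsystem IDs, here is a minimal user-space sketch in C. It is not the xhci-pci.c code: `struct pci_ids`, `VENDOR_ASMEDIA` and both helper functions are hypothetical names introduced only for this example, and the ID values are the ones from biopsin's lspci output.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the four ID fields of struct pci_dev. */
struct pci_ids {
    uint16_t vendor;
    uint16_t device;
    uint16_t subsystem_vendor;
    uint16_t subsystem_device;
};

#define VENDOR_ASMEDIA 0x1b21 /* ASMedia vendor ID, as printed by lspci in this thread */

/* Matches only when ASMedia is the primary vendor. */
static bool matches_primary_vendor(const struct pci_ids *d)
{
    return d->vendor == VENDOR_ASMEDIA;
}

/* Also matches when ASMedia appears only as the subsystem vendor. */
static bool matches_primary_or_subsystem_vendor(const struct pci_ids *d)
{
    return d->vendor == VENDOR_ASMEDIA ||
           d->subsystem_vendor == VENDOR_ASMEDIA;
}

/* IDs reported by biopsin: AMD 400 Series controller with an ASMedia subsystem. */
static const struct pci_ids biopsin_controller = {
    .vendor = 0x1022, .device = 0x43d5,
    .subsystem_vendor = 0x1b21, .subsystem_device = 0x1142,
};
```

For `biopsin_controller` the first check returns false and the second returns true, which is why a quirk keyed only on the primary vendor ID would not be applied to this controller.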

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

Created attachment 295065
asmedia_subsytem_quirks.patch

This patch also applies the existing xhci ASMedia quirks to ASMedia subdevices.

Looking at the changelog history, those quirks helped with some USB disk issues, so perhaps the patch could help with the disk issues reported here, i.e. the alpir and biopsin cases. Please test.

Revision history for this message
In , jg.staffel (jg.staffel-linux-kernel-bugs) wrote :

None of the patches (comments 139, 147, 188) solved my problem.

Revision history for this message
In , biopsin (biopsin-linux-kernel-bugs) wrote :

@Gruszka
Your patch [https://bugzilla.kernel.org/show_bug.cgi?id=202541#c188] makes a lot of sense, thank you.
I'm currently testing it with my setup and kernel 5.4.95_x86_64.
I tested against one PATA and one SATA drive; so far I see no ill effects, but I also can't confirm or deny that it does anything in this short timespan, and much has changed since my initial post last year. I will at least continue applying it now and then throughout this year and report anything newsworthy. Thank you for your time and help!

Revision history for this message
In , raulvior.bcn (raulvior.bcn-linux-kernel-bugs) wrote :

Created attachment 295151
Dmesg of a Toshiba USB 3.0 HDD connected to USB 3.0 front port and back port.

I am having this error on Linux 5.10.10-051010 while trying to connect a USB 3.0 hard disk, Toshiba Touro 4TB (HitachiGST). If I connect the disk to a USB 2.0 port it works flawlessly.

The kernel shows a different kind of error depending on whether I connect the HDD to the front or back USB 3.0 ports of the motherboard MSI X470 Gaming Plus MAX.

lspci -vnnt:
> -[0000:00]-+-00.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-0fh) Root Complex [1022:1450]
> +-00.2 Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-0fh) I/O Memory Management Unit [1022:1451]
> +-01.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
> +-01.1-[01]----00.0 Samsung Electronics Co Ltd NVMe SSD
> Controller SM981/PM981/PM983 [144d:a808]
> +-01.3-[03-26]--+-00.0 Advanced Micro Devices, Inc. [AMD] Device
> [1022:43d0]
> | +-00.1 Advanced Micro Devices, Inc. [AMD] 400
> Series Chipset SATA Controller [1022:43c8]
> | \-00.2-[20-26]--+-00.0-[21]--
> | +-01.0-[22]----00.0 Realtek
> Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit
> Ethernet Controller [10ec:8168]
> | +-02.0-[23]--
> | +-03.0-[24]--
> | +-04.0-[25]--
> | \-08.0-[26]----00.0 ASMedia
> Technology Inc. ASM1142 USB 3.1 Host Controller [1b21:1242]
> +-02.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
> +-03.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
> +-03.1-[27]--+-00.0 Advanced Micro Devices, Inc. [AMD/ATI]
> Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] [1002:67df]
> | \-00.1 Advanced Micro Devices, Inc. [AMD/ATI]
> Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] [1002:aaf0]
> +-04.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
> +-07.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
> +-07.1-[28]--+-00.0 Advanced Micro Devices, Inc. [AMD]
> Zeppelin/Raven/Raven2 PCIe Dummy Function [1022:145a]
> | +-00.2 Advanced Micro Devices, Inc. [AMD] Family 17h
> (Models 00h-0fh) Platform Security Processor [1022:1456]
> | \-00.3 Advanced Micro Devices, Inc. [AMD] Zeppelin
> USB 3.0 Host controller [1022:145f]
> +-08.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
> +-08.1-[29]--+-00.0 Advance...

Revision history for this message
In , raulvior.bcn (raulvior.bcn-linux-kernel-bugs) wrote :

Created attachment 295183
Dmesg of a OnePlus 7 Pro connecting in USB 3.1 gen1 mode. No errors.

(In reply to raul from comment #191)
Connecting a OnePlus 7 Pro smartphone does not show any error. This phone has a USB 3.1 gen1 port and connects in that mode without errors. I can navigate the filesystem as one would expect.

Changed in linux:
importance: Unknown → High
status: Unknown → Confirmed
Revision history for this message
In , tisaak (tisaak-linux-kernel-bugs) wrote :

Same issue with a Seagate Portable 4 TB USB 3.0 drive that I connect with usb-storage quirks as its UAS implementation is problematic. Random hangs that flood dmesg with errors.

lsusb -tv
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
    ID 1d6b:0003 Linux Foundation 3.0 root hub
    |__ Port 3: Dev 2, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
        ID 0bc2:231a Seagate RSS LLC Expansion Portable

Errors in dmesg start like this...

xhci_hcd 0000:00:10.0: WARN Cannot submit Set TR Deq Ptr
xhci_hcd 0000:00:10.0: A Set TR Deq Ptr command is pending.
usb 3-3: reset SuperSpeed Gen 1 USB device number 3 using xhci_hcd
sd 5:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=31s
sd 5:0:0:0: [sdd] tag#0 CDB: Read(16) 88 00 00 00 00 00 a4 01 ed 78 00 00 00 10 00 00

After that:

task:usb-storage state:D stack: 0 pid: 286 ppid: 2 flags:0x00004000
Call Trace:
  __schedule+0x282/0x870
  ? usleep_range+0x80/0x80
  schedule+0x46/0xb0
  schedule_timeout+0xff/0x140
  ? __prepare_to_swait+0x4b/0x70
  __wait_for_common+0xae/0x160
  usb_sg_wait+0xe0/0x1a0 [usbcore]
  usb_stor_bulk_transfer_sglist.part.0+0x64/0xb0 [usb_storage]
  usb_stor_Bulk_transport+0x188/0x410 [usb_storage]
  usb_stor_invoke_transport+0x3a/0x520 [usb_storage]
  ? __prepare_to_swait+0x4b/0x70
  ? __wait_for_common+0xed/0x160
  usb_stor_control_thread+0x185/0x280 [usb_storage]
  ? storage_probe+0x2a0/0x2a0 [usb_storage]
  kthread+0x11b/0x140
  ? __kthread_bind_mask+0x60/0x60
  ret_from_fork+0x22/0x30

Revision history for this message
In , mathias.nyman (mathias.nyman-linux-kernel-bugs) wrote :

(In reply to Zak from comment #193)
>
>
> Errors in dmesg start like this...
>
> xhci_hcd 0000:00:10.0: WARN Cannot submit Set TR Deq Ptr
> xhci_hcd 0000:00:10.0: A Set TR Deq Ptr command is pending.

There are recent major changes in this area in the xhci driver.
The above message no longer exists, new message in this case is
"Set TR Deq already pending, don't submit for x"

Can you try this on a 5.12-rc kernel?

Thanks
Mathias

Revision history for this message
In , mlkcampion (mlkcampion-linux-kernel-bugs) wrote :

Created attachment 296259
xhci no soft retry for Intel xhci 8086:06ed and 8086:31a8

Hi

I am having this issue on 2 systems when I plug in
a Hoco Hub HB16. The Hoco Hub HB16 is a 6 in 1 adapter that
includes
Type-C to USB3.0 x3
Type-C to HDMI
Type-C to RJ45 Ethernet (RealTek RTL8153, linux loads driver rtl8153b-2)
Type-C to Type-C(PD2.0)
USB Billboard device

Also when the device is plugged into a Windows10 machine
for the first time it presents a disk that contains the RTL8153
drivers, the user is provided with an option to install these. This
"disk" is not visible later.

The 2 systems where this device failed both reported
"WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state."
Both systems have Ubuntu Mate 20.10

$ uname -a
5.8.0-48-generic #54-Ubuntu SMP Fri Mar 19 14:25:20 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

1. Dell XPS 9500 (Intel(R) Core(TM) i5-10300H CPU @ 2.50GHz)
$ sudo lspci -k -nn | grep -B2 xhci
    00:14.0 USB controller [0c03]: Intel Corporation Comet Lake USB 3.1 xHCI Host Controller [8086:06ed]
 Subsystem: Dell Comet Lake USB 3.1 xHCI Host Controller [1028:097d]
 Kernel driver in use: xhci_hcd
 Kernel modules: xhci_pci
--
    7:00.0 USB controller [0c03]: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] [8086:15ec] (rev 06)
 Subsystem: Dell JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] [1028:097d]
 Kernel driver in use: xhci_hcd
 Kernel modules: xhci_pci

2. Seeed Studio Odyssey J4105 (Intel(R) Celeron(R) J4105 CPU @ 1.50GHz)
$ sudo lspci -k -nn | grep -B3 xhci
    00:15.0 USB controller [0c03]: Intel Corporation Device [8086:31a8] (rev 03)
 DeviceName: Onboard - Other
 Subsystem: Intel Corporation Device [8086:7270]
 Kernel driver in use: xhci_hcd
 Kernel modules: xhci_pci

I applied the changes in Stanislaw's patch from comment 176 and added the
PCI IDs to match both my systems.

I can confirm that with the patch applied both systems no longer report the
issue "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state."

Just to note that on the Dell XPS I use the Dell DA20 Adapter which is a Type-C
to USB and HDMI adapter. This appears to have an ASIX Elec. Corp. AX88179
USB 3.0 to Gigabit Ethernet which I don't have any issues with.
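
Matching a controller against a list of PCI IDs can be sketched in plain user-space C as below. This is only an illustration and not the attached patch: `struct pci_id_pair`, `no_soft_retry_ids` and `wants_no_soft_retry` are hypothetical names, and the two entries are the IDs named in the attachment title (8086:06ed and 8086:31a8).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vendor/device pair, for illustration only. */
struct pci_id_pair {
    uint16_t vendor;
    uint16_t device;
};

/* The two Intel controllers named in the attachment title. */
static const struct pci_id_pair no_soft_retry_ids[] = {
    { 0x8086, 0x06ed }, /* Comet Lake xHCI in the Dell XPS 9500 */
    { 0x8086, 0x31a8 }, /* xHCI in the Odyssey J4105 */
};

/* Returns true if the given controller is on the illustrative list. */
static bool wants_no_soft_retry(uint16_t vendor, uint16_t device)
{
    size_t n = sizeof(no_soft_retry_ids) / sizeof(no_soft_retry_ids[0]);

    for (size_t i = 0; i < n; i++) {
        if (no_soft_retry_ids[i].vendor == vendor &&
            no_soft_retry_ids[i].device == device)
            return true;
    }
    return false;
}
```

With this list, `wants_no_soft_retry(0x8086, 0x06ed)` is true, while the Thunderbolt controller 8086:15ec from the same lspci output is not matched.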

Revision history for this message
In , luke-jr+linuxbugs (luke-jr+linuxbugs-linux-kernel-bugs) wrote :

Encountered this with a PCI-e card using ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller

Moved to my native "Intel Corporation Device a3af" USB bus, this error disappeared (though other problems remain in my case)

Linux 5.10.33

Of potential noteworthiness: When I got my Talos II, I tried to move this ASMedia USB PCI-e card to it, and found it was immediately shutdown by the IOMMU whenever I would try to use it at all. It seems the firmware is garbage.

IIRC, someone was getting close to an open source firmware replacement without those issues... would be interesting to see if it helps with this bug as well.

Revision history for this message
In , dront78 (dront78-linux-kernel-bugs) wrote :

same problem
5.12.12-arch1-1 #1 SMP PREEMPT Fri, 18 Jun 2021 21:59:22 +0000 x86_64 GNU/Linux

GPD Pocket

00:00.0 Host bridge [0600]: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Series SoC Transaction Register [8086:2280] (rev 34)
 Subsystem: Intel Corporation Device [8086:7270]
 Kernel driver in use: iosf_mbi_pci
00:02.0 VGA compatible controller [0300]: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Integrated Graphics Controller [8086:22b0] (rev 34)
 DeviceName: Onboard IGD
 Subsystem: Intel Corporation Device [8086:7270]
 Kernel driver in use: i915
 Kernel modules: i915
00:0b.0 Signal processing controller [1180]: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Series Power Management Controller [8086:22dc] (rev 34)
 Subsystem: Intel Corporation Device [8086:7270]
 Kernel driver in use: proc_thermal
 Kernel modules: processor_thermal_device
00:14.0 USB controller [0c03]: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Series USB xHCI Controller [8086:22b5] (rev 34)
 Subsystem: Intel Corporation Device [8086:7270]
 Kernel driver in use: xhci_hcd
 Kernel modules: xhci_pci
00:1a.0 Encryption controller [1080]: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Series Trusted Execution Engine [8086:2298] (rev 34)
 Subsystem: Intel Corporation Device [8086:7270]
 Kernel modules: mei_txe
00:1c.0 PCI bridge [0604]: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Series PCI Express Port #1 [8086:22c8] (rev 34)
 Kernel driver in use: pcieport
00:1f.0 ISA bridge [0601]: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Series PCU [8086:229c] (rev 34)
 Subsystem: Intel Corporation Device [8086:7270]
 Kernel modules: lpc_ich
01:00.0 Network controller [0280]: Broadcom Inc. and subsidiaries BCM4356 802.11ac Wireless Network Adapter [14e4:43ec] (rev 02)
 Subsystem: Gemtek Technology Co., Ltd Device [17f9:0036]
 Kernel driver in use: brcmfmac
 Kernel modules: brcmfmac

# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.
Table at 0x5B8DE000.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
 Vendor: American Megatrends Inc.
 Version: 5.11
 Release Date: 06/28/2017
 Address: 0xF0000
 Runtime Size: 64 kB
 ROM Size: 4 MB
 Characteristics:
  PCI is supported
  BIOS is upgradeable
  BIOS shadowing is allowed
  Boot from CD is supported
  Selectable boot is supported
  BIOS ROM is socketed
  EDD is supported
  5.25"/1.2 MB floppy services are supported (int 13h)
  3.5"/720 kB floppy services are supported (int 13h)
  3.5"/2.88 MB floppy services are supported (int 13h)
  Print screen service is supported (int 5h)
  Serial services are supported (int 14h)
  Printer services are supported (int 17h)
  ACPI is supported
  USB legacy is supported
  BIOS boot specification is supported
  Targeted content distribution is supported
  UEFI is supported
 BIOS Revision: 5.11

Handle 0x0001, DMI type 1, 27 bytes
System Information
 Manufacturer: Default string
 Product Name: Default string
 Version: Default string
 Serial Number: Default string
 UUID: 03000200-0400-0500-0006-000700080009
 Wake-up ...

Revision history for this message
In , antdev66 (antdev66-linux-kernel-bugs) wrote :

I have the same problem with kernels 5.13.12 and 5.14.0-rc7:

dmesg:
xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

journalctl:
ago 24 18:38:40 SERVER kernel: sd 4:0:0:0: [sda] tag#3 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=30s

Revision history for this message
In , stulluk (stulluk-linux-kernel-bugs) wrote :

I also experience exactly the same issue with multiple USB devices (a USB-WIFI adapter or a USB webcam), but only on my brand new AMD mainboard (ASRock model: B550M-HDV).

I tried both ubuntu focal and hirsute with latest kernels on my OldPC (ASUSTeK model: M5A78L-M LX3) and on my IntelNUC (NUC8BEB) and this issue does not happen (Tried with same USB-WIFI and USB-Webcam devices).

Issue is easily reproducible by inserting USB-WIFI and then executing "ip a" on a shell.

Revision history for this message
In , dion (dion-linux-kernel-bugs) wrote :

I also have exactly the same problem, but with slightly different hardware.

In my case it's a USB DAC branded "Qudelix-5K". As far as I understand, it's a USB1 device.

[ 174.358189] usb 5-2.3.2.2.1.1: new full-speed USB device number 17 using xhci_hcd
[ 174.475229] usb 5-2.3.2.2.1.1: New USB device found, idVendor=0a12, idProduct=4025, bcdDevice=19.70
[ 174.475232] usb 5-2.3.2.2.1.1: New USB device strings: Mfr=1, Product=8, SerialNumber=3
[ 174.475233] usb 5-2.3.2.2.1.1: Product: Qudelix-5K USB DAC/MIC 48KHz
[ 174.475234] usb 5-2.3.2.2.1.1: Manufacturer: QTIL
[ 174.475235] usb 5-2.3.2.2.1.1: SerialNumber: ABCDEF0123456789

It produces corrupted sound (actually just noise) after only a few seconds of playback if connected to a Dell WD19TB Thunderbolt docking station. The issue happens with the USB-A ports on the dock plus one Type-C port (front). The second Type-C port (labelled "Type-C with Thunderbolt 3 port") works.

When such noise happens I get the following in dmesg:

xhci_hcd 0000:3a:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 5 comp_code 1
xhci_hcd 0000:3a:00.0: Looking for event-dma 00000000ffe940f0 trb-start 00000000ffe94100 trb-end 00000000ffe94100 seg-start 00000000ffe94000 seg-end 00000000ffe94ff0
xhci_hcd 0000:3a:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 5 comp_code 1
xhci_hcd 0000:3a:00.0: Looking for event-dma 00000000ffe949b0 trb-start 00000000ffe949c0 trb-end 00000000ffe949c0 seg-start 00000000ffe94000 seg-end 00000000ffe94ff0
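
The "Looking for event-dma" lines can be read with a little arithmetic: in both of them the event pointer sits 0x10 bytes below trb-start, and since an xHCI TRB is 16 bytes long, that is exactly one TRB before the single-TRB TD the driver expected. A minimal user-space sketch in C follows; the function names are hypothetical, this is not the driver's own check, and it assumes a TD that does not wrap across ring segments.

```c
#include <stdbool.h>
#include <stdint.h>

#define TRB_SIZE 16u /* an xHCI TRB is 16 bytes long */

/* True if event_dma lies within [trb_start, trb_end].
 * Assumption: the TD does not wrap across ring segments. */
static bool dma_in_td(uint64_t event_dma, uint64_t trb_start, uint64_t trb_end)
{
    return event_dma >= trb_start && event_dma <= trb_end;
}

/* Signed distance, in TRBs, from the start of the TD to the event pointer. */
static int64_t trbs_from_td_start(uint64_t event_dma, uint64_t trb_start)
{
    return ((int64_t)event_dma - (int64_t)trb_start) / (int64_t)TRB_SIZE;
}
```

For the first logged pair, `dma_in_td(0xffe940f0, 0xffe94100, 0xffe94100)` is false and `trbs_from_td_start(0xffe940f0, 0xffe94100)` is -1; the second pair (0xffe949b0 against 0xffe949c0) gives the same result.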

I've tried adding and removing extra USB hubs (originally the Qudelix was plugged into the internal USB3 hub of my monitor). But even when plugged directly into the dock, it produces corrupted sound.

Another important thing: this dock has built-in Ethernet with the r8153 chipset, like the ones mentioned above.

After reading the comments here I tried disabling soft retry using the following patch:

diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 1c9a7957c45c..07cbcf50160c 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -189,10 +189,11 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)

        if (pdev->vendor == PCI_VENDOR_ID_INTEL) {
                xhci->quirks |= XHCI_LPM_SUPPORT;
                xhci->quirks |= XHCI_INTEL_HOST;
                xhci->quirks |= XHCI_AVOID_BEI;
+               xhci->quirks |= XHCI_NO_SOFT_RETRY;
        }
        if (pdev->vendor == PCI_VENDOR_ID_INTEL &&
                        pdev->device == PCI_DEVICE_ID_INTEL_PANTHERPOINT_XHCI) {
                xhci->quirks |= XHCI_EP_LIMIT_QUIRK;
                xhci->limit_active_eps = 64;

And it completely fixed the issue for me. The DAC produces clear sound even when connected through a chain of two hubs!

PS.
lspci -k -nn | grep -B2 xhci
00:14.0 USB controller [0c03]: Intel Corporation Comet Lake PCH-LP USB 3.1 xHCI Host Controller [8086:02ed]
        Subsystem: Hewlett-Packard Company Comet Lake PCH-LP USB 3.1 xHCI Host Controller [103c:8724]
        Kernel driver in use: xhci_hcd
        Kernel modules: xhci_pci
--
37:00.0 USB controller [0c03]: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] [8086:15ec] (rev 06)
        Subsystem: Hewlett-P...

Revision history for this message
In , raulvior.bcn (raulvior.bcn-linux-kernel-bugs) wrote :

(In reply to raul from comment #191)

It turns out the problem was the cable: it was too long. A shorter USB 3.0 cable (1.8 m) allowed a stable connection. On the same Linux 5.13 kernel (the previous dmesg was from Linux 5.10), the longer 3 m cable kept failing, while with the 1.8 m cable the HDD works without issue.

Revision history for this message
In , S.Braendlin (s.braendlin-linux-kernel-bugs) wrote :

Hi,
I also have issues with USB3 on my Debian 10 system with kernel 5.10.0-0.bpo.5-amd64, which do not appear when using a USB2 port:

Aug 6 13:20:14 media-server kernel: [ 964.069355] scsi host17: uas_eh_device_reset_handler start
Aug 6 13:20:14 media-server kernel: [ 964.197532] usb 2-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Aug 6 13:20:14 media-server kernel: [ 964.219053] scsi host17: uas_eh_device_reset_handler success
Aug 6 13:20:18 media-server kernel: [ 968.137601] task:sync state:D stack: 0 pid:12237 ppid: 11291 flags:0x00004324
Aug 6 13:20:18 media-server kernel: [ 968.137607] Call Trace:
Aug 6 13:20:18 media-server kernel: [ 968.137621] __schedule+0x2be/0x770
Aug 6 13:20:18 media-server kernel: [ 968.137630] schedule+0x3c/0xa0
Aug 6 13:20:18 media-server kernel: [ 968.137635] io_schedule+0x12/0x40
Aug 6 13:20:18 media-server kernel: [ 968.137644] wait_on_page_bit+0x127/0x230
Aug 6 13:20:18 media-server kernel: [ 968.137651] ? __page_cache_alloc+0x80/0x80
Aug 6 13:20:18 media-server kernel: [ 968.137657] wait_on_page_writeback+0x25/0x70
Aug 6 13:20:18 media-server kernel: [ 968.137663] __filemap_fdatawait_range+0x89/0xf0
Aug 6 13:20:18 media-server kernel: [ 968.137673] ? sync_inodes_one_sb+0x20/0x20
Aug 6 13:20:18 media-server kernel: [ 968.137679] filemap_fdatawait_keep_errors+0x1a/0x40
Aug 6 13:20:18 media-server kernel: [ 968.137684] iterate_bdevs+0xad/0x150
Aug 6 13:20:18 media-server kernel: [ 968.137691] ksys_sync+0x7c/0xb0
Aug 6 13:20:18 media-server kernel: [ 968.137697] __do_sys_sync+0xa/0x10
Aug 6 13:20:18 media-server kernel: [ 968.137704] do_syscall_64+0x33/0x80
Aug 6 13:20:18 media-server kernel: [ 968.137709] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Aug 6 13:20:18 media-server kernel: [ 968.137714] RIP: 0033:0x7fc4ec0529aa
Aug 6 13:20:18 media-server kernel: [ 968.137717] RSP: 002b:00007ffcddf49048 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
Aug 6 13:20:18 media-server kernel: [ 968.137723] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc4ec0529aa
Aug 6 13:20:18 media-server kernel: [ 968.137725] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000a8002000
Aug 6 13:20:18 media-server kernel: [ 968.137728] RBP: 0000000000000000 R08: 0000555ba9703dcf R09: 00007ffcddf4afe2
Aug 6 13:20:18 media-server kernel: [ 968.137730] R10: 00007fc4ec01a201 R11: 0000000000000246 R12: 0000000000000001
Aug 6 13:20:18 media-server kernel: [ 968.137733] R13: 0000000000000001 R14: 00007ffcddf49158 R15: 0000000000000000

Revision history for this message
In , pupilla (pupilla-linux-kernel-bugs) wrote :

Hello everyone,

I encountered the problem with kernel 6.0.0-rc3 on a Lenovo T470 laptop with a USB3 ASIX network card. The system was started with the parameter intel_idle.max_cstate=1, and this appears to affect whether the bug occurs. I have now rebooted the system without this parameter.

I have another similar setup (same laptop and same usb3 network card, but with linux 6.0.0-rc2) that has been active for 8 days started without the parameter intel_idle.max_cstate=1 and the problem has not occurred to date.

The distribution is Slackware 15 (64 bit).

This is the full output of dmesg.

Any feedback is welcome.

Marco

[ 0.000000] Linux version 6.0.0-rc3 (root@Cherepakha) (gcc (GCC) 11.2.0, GNU ld version 2.37-slack15) #1 SMP PREEMPT_DYNAMIC Tue Aug 30 16:07:18 CEST 2022
[ 0.000000] Command line: auto BOOT_IMAGE=Linux ro root=10303 intel_idle.max_cstate=1
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: xstate_offset[3]: 832, xstate_sizes[3]: 64
[ 0.000000] x86/fpu: xstate_offset[4]: 896, xstate_sizes[4]: 64
[ 0.000000] x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format.
[ 0.000000] signal: max sigframe size: 1616
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009cfff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009d000-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000003fffffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000040000000-0x00000000403fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000040400000-0x000000008b79bfff] usable
[ 0.000000] BIOS-e820: [mem 0x000000008b79c000-0x0000000090652fff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000090653000-0x0000000090653fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x0000000090654000-0x000000009b52cfff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000009b52d000-0x000000009b599fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x000000009b59a000-0x000000009b5fefff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x000000009b5ff000-0x000000009f7fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000f0000000-0x00000000f3ffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fd000000-0x00000000fe7fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed10000-0x00000000fed19fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed84000-0x00000000fed84fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00...

Revision history for this message
In , pupilla (pupilla-linux-kernel-bugs) wrote :

Hello everyone,

unfortunately it happened again (system started without parameters):

[ 9.561808] br0: port 2(eth1) entered forwarding state
[95735.974041] usb 2-1: USB disconnect, device number 2
[95735.974215] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[95735.974439] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[95735.974471] ax88179_178a 2-1:1.0 eth1: unregister 'ax88179_178a' usb-0000:00:14.0-1, ASIX AX88179 USB 3.0 Gigabit Ethernet
[95735.974523] ax88179_178a 2-1:1.0 eth1: Failed to read reg index 0x0002: -19
[95735.974532] ax88179_178a 2-1:1.0 eth1: Failed to write reg index 0x0002: -19
[95735.974595] br0: port 2(eth1) entered disabled state
[95735.974783] device eth1 left promiscuous mode
[95735.974790] br0: port 2(eth1) entered disabled state
[95735.992489] ax88179_178a 2-1:1.0 eth1 (unregistered): Failed to write reg index 0x0002: -19
[95735.992503] ax88179_178a 2-1:1.0 eth1 (unregistered): Failed to write reg index 0x0001: -19
[95735.992510] ax88179_178a 2-1:1.0 eth1 (unregistered): Failed to write reg index 0x0002: -19
[95736.215301] usb 2-1: new SuperSpeed USB device number 4 using xhci_hcd
[95736.566562] ax88179_178a 2-1:1.0 eth1: register 'ax88179_178a' at usb-0000:00:14.0-1, ASIX AX88179 USB 3.0 Gigabit Ethernet, 00:0e:c6:81:79:01

Marco
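
The trailing numbers in log lines such as "Failed to read reg index 0x0002: -19" are negated errno values. On Linux, 19 is ENODEV, which fits a device that has just disconnected, and the -110 seen in "device descriptor read/8, error -110" elsewhere in this thread is ETIMEDOUT. A minimal C sketch for decoding them; the helper name `kernel_err_text` is hypothetical and the exact message text depends on the C library:

```c
#include <errno.h>
#include <string.h>

/* Kernel log lines print negated errno values; return the
 * C library's description for one of them (e.g. -19). */
static const char *kernel_err_text(int logged)
{
    return strerror(-logged);
}
```

For example, `kernel_err_text(-19)` returns the description of ENODEV.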

Revision history for this message
In , ske5074 (ske5074-linux-kernel-bugs) wrote :

I also have the issue. I am using Proxmox 7.2 (Debian Bullseye) on a Lenovo M910q (Core i7-7700T) with two TP-Link UE300 (RTL8153) USB to 1GbE Ethernet adapters. Each one is stable in a lower USB slot. Swapping the adapters does not change the behavior: only the USB device in the higher slot is affected. Moving them to different ports makes no difference.

Easily reproducible with the following commands. Basically I'm trying to plumb bond0 again, which works initially, I get the xhci_hcd warning, and the link is down again. System details are also below.

root@higgins:~# dmesg -C ; ifup -a ; ip link | grep enx ; \
> dmesg -H ; dmesg -C ; sleep 70 ; \
> ip link | grep enx ; dmesg -H
3: enxd03745be5afc: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP mode DEFAULT group default qlen 1000
16: enx54af9786ab11: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP mode DEFAULT group default qlen 1000

[Sep 3 11:05] device enx54af9786ab11 entered promiscuous mode
[ +0.001236] bond0: (slave enx54af9786ab11): Enslaving as a backup interface with a down link
[ +0.006363] vmbr0: the hash_elasticity option has been deprecated and is always 16
[ +0.013972] r8152 2-4:1.0 enx54af9786ab11: Promiscuous mode enabled
[ +0.001344] r8152 2-4:1.0 enx54af9786ab11: carrier on

3: enxd03745be5afc: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP mode DEFAULT group default qlen 1000
17: enx54af9786ab11: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000

[Sep 3 11:05] bond0: (slave enx54af9786ab11): link status definitely up, 1000 Mbps full duplex
[Sep 3 11:06] usb 2-4: USB disconnect, device number 12
[ +0.001544] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[ +0.001435] bond0: (slave enx54af9786ab11): Releasing backup interface
[ +0.029081] device enx54af9786ab11 left promiscuous mode
[ +0.316190] usb 2-4: new SuperSpeed USB device number 13 using xhci_hcd
[ +0.022053] usb 2-4: New USB device found, idVendor=2357, idProduct=0601, bcdDevice=30.00
[ +0.001297] usb 2-4: New USB device strings: Mfr=1, Product=2, SerialNumber=6
[ +0.001337] usb 2-4: Product: USB 10/100/1000 LAN
[ +0.001261] usb 2-4: Manufacturer: TP-Link
[ +0.001208] usb 2-4: SerialNumber: 000001
[ +0.137200] usb 2-4: reset SuperSpeed USB device number 13 using xhci_hcd
[ +0.049197] r8152 2-4:1.0: load rtl8153a-4 v2 02/07/20 successfully
[ +0.030905] r8152 2-4:1.0 eth0: v1.12.12
[ +0.007834] r8152 2-4:1.0 enx54af9786ab11: renamed from eth0
root@higgins:~#

-------
System Details
-------

root@higgins:~# uname -a
Linux higgins 5.15.39-4-pve #1 SMP PVE 5.15.39-4 (Mon, 08 Aug 2022 15:11:15 +0200) x86_64 GNU/Linux

root@higgins:~# lspci -k -nn | grep -B2 xhci
00:14.0 USB controller [0c03]: Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller [8086:a2af]
        Subsystem: Lenovo 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller [17aa:310b]
        Kernel driver in use: xhci_hcd
        Kernel modules: xhci_pci

root@higgins:~# lsusb -tv
/: Bus 02.Port 1: D...

Revision history for this message
In , ske5074 (ske5074-linux-kernel-bugs) wrote :

(In reply to Sean Kennedy from comment #205)
> I also have the issue. Using Proxmox 7.2 (Debian Bullseye) with a Lenovo
> M910q core-i7-7700T, using two TPLink UE300 (RTL8153) USB to 1Gbe Ethernet
> adapters. Each one is stable in a lower USB slot. Swapping the adapters does
> not change the behavior and only impacts the USB device in the higher slot.
> Changes to different ports without change.

Update: I tried a different dongle (a 2.5GbE one) and have two hard drives attached to the system. It doesn't matter where the 2.5GbE dongle is attached; it eventually errors with "WARN Set TR Deq Ptr cmd failed". The error rate is only around six times a day right now:

8156 Realtek Semiconductor Corp. USB 10/100/1G/2.5G LAN

# dmesg -T | grep xhci
[Tue Sep 6 13:37:13 2022] xhci_hcd 0000:00:14.0: xHCI Host Controller
[Tue Sep 6 13:37:13 2022] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 1
[Tue Sep 6 13:37:13 2022] xhci_hcd 0000:00:14.0: hcc params 0x200077c1 hci version 0x100 quirks 0x0000000000009810
[Tue Sep 6 13:37:13 2022] usb usb1: Manufacturer: Linux 5.15.39-4-pve xhci-hcd
[Tue Sep 6 13:37:13 2022] xhci_hcd 0000:00:14.0: xHCI Host Controller
[Tue Sep 6 13:37:13 2022] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 2
[Tue Sep 6 13:37:13 2022] xhci_hcd 0000:00:14.0: Host supports USB 3.0 SuperSpeed
[Tue Sep 6 13:37:13 2022] usb usb2: Manufacturer: Linux 5.15.39-4-pve xhci-hcd
[Tue Sep 6 13:37:13 2022] usb 2-1: new SuperSpeed USB device number 2 using xhci_hcd
[Tue Sep 6 13:37:14 2022] usb 2-3: new SuperSpeed USB device number 3 using xhci_hcd
[Tue Sep 6 13:37:14 2022] usb 2-4: new SuperSpeed USB device number 4 using xhci_hcd
[Tue Sep 6 14:39:22 2022] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[Tue Sep 6 14:39:22 2022] usb 2-4: new SuperSpeed USB device number 5 using xhci_hcd
[Tue Sep 6 18:44:01 2022] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[Tue Sep 6 18:44:01 2022] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[Tue Sep 6 18:44:02 2022] usb 2-4: new SuperSpeed USB device number 6 using xhci_hcd
[Tue Sep 6 22:19:06 2022] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[Tue Sep 6 22:19:07 2022] usb 2-4: new SuperSpeed USB device number 7 using xhci_hcd
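
A rate like the one quoted above can be tallied directly from the log. The following is only a minimal sketch: it assumes `dmesg -T` style timestamps as in the excerpt above (weekday, month, day as the first three fields), and `warnings_per_day` is a hypothetical helper name, not part of any tool mentioned in this report.

```shell
#!/bin/sh
# Tally "Set TR Deq Ptr" warnings per calendar day.
# Reads kernel log text on stdin, in `dmesg -T` format, e.g.
#   [Tue Sep 6 14:39:22 2022] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr ...
# With that format awk sees $1="[Tue", $2="Sep", $3="6".
warnings_per_day() {
  awk '/WARN Set TR Deq Ptr cmd failed/ { count[$2 " " $3]++ }
       END { for (day in count) print day ": " count[day] }' | sort
}
```

Typical use would be `dmesg -T | warnings_per_day`, which prints one `Month Day: N` line per day that saw at least one warning.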

Since this drops the device from the system and takes the link offline, I created a simple script, run from cron once a minute, that detects when no Ethernet device is UP and runs ifup -a. It's clunky but works.

crontab:
# m h dom mon dow command
* * * * * /root/fixnet.sh >/dev/null 2>&1

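fixnet.sh:
#!/bin/sh
# Watchdog: re-run ifup when no enx* interface is UP; reboot as a last resort.

# Count the enx* lines of "ip link" that contain UP.
STATE=$(ip link | grep " enx" | grep -c UP)
if [ "$STATE" -gt 0 ]; then
  # All good. Exit
  exit 0
fi

# No interface is up: try to bring everything back.
/usr/sbin/ifup -a
sleep 20

# Check connectivity to 10.0.0.1.
if ping -c 1 10.0.0.1 | grep "1 received"; then
  # Network looks good. Exit.
  exit 0
fi

# Give the link more time, then test once more.
sleep 310
if ! ping -c 1 10.0.0.1 | grep "1 received"; then
  # The network is still down.
  systemctl reboot
fi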
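
One caveat with matching `UP` in `ip link` output is that the string also appears in flags such as `LOWER_UP` and in the `<...,UP>` flag list, not only in `state UP`. A possible alternative, sketched here under the assumption that the adapters keep their `enx*` names, is to read each interface's `operstate` file in sysfs. `count_up_enx` is a hypothetical helper name, and the optional argument exists only so the function can be pointed at a test directory.

```shell
#!/bin/sh
# Count enx* interfaces whose operational state is "up".
# $1: base directory to scan (defaults to /sys/class/net).
count_up_enx() {
  base=${1:-/sys/class/net}
  n=0
  for f in "$base"/enx*/operstate; do
    # Skip the unexpanded glob when no enx* interface exists.
    [ -r "$f" ] || continue
    if [ "$(cat "$f")" = "up" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}
```

In a watchdog like the one above, the result could replace the `ip link | grep` count, for example `STATE=$(count_up_enx)`.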

Revision history for this message
In , james (james-linux-kernel-bugs) wrote :

I'm using a 2.5Gb Ethernet USB device and getting this error intermittently (a dozen times per day).

$ uname -a
Linux hephaestus 5.4.0-135-generic #152-Ubuntu SMP Wed Nov 23 20:19:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

$ lsusb
<snip>
Bus 003 Device 016: ID 0bda:8156 Realtek Semiconductor Corp. USB 10/100/1G/2.5G

This is what plays out via /var/log/syslog each time:

Dec 21 10:26:47 hephaestus kernel: [346923.166782] usb 3-4: USB disconnect, device number 15
Dec 21 10:26:47 hephaestus kernel: [346923.166913] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Dec 21 10:26:47 hephaestus kernel: [346923.166927] cdc_ncm 3-4:2.0 eth1: unregister 'cdc_ncm' usb-0000:00:14.0-4, CDC NCM
Dec 21 10:26:47 hephaestus kernel: [346923.167071] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Dec 21 10:26:47 hephaestus kernel: [346923.170644] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Dec 21 10:26:47 hephaestus dhclient[320734]: receive_packet failed on eth1: Network is down
Dec 21 10:26:47 hephaestus systemd[1]: Stopping ifup for eth1...
Dec 21 10:26:47 hephaestus dhclient[325522]: Killed old client process
Dec 21 10:26:47 hephaestus ifdown[325522]: Killed old client process
Dec 21 10:26:47 hephaestus kernel: [346923.478913] usb 3-4: new SuperSpeed Gen 1 USB device number 16 using xhci_hcd
Dec 21 10:26:47 hephaestus kernel: [346923.499567] usb 3-4: New USB device found, idVendor=0bda, idProduct=8156, bcdDevice=31.00
Dec 21 10:26:47 hephaestus kernel: [346923.499573] usb 3-4: New USB device strings: Mfr=1, Product=2, SerialNumber=6
Dec 21 10:26:47 hephaestus kernel: [346923.499577] usb 3-4: Product: USB 10/100/1G/2.5G LAN
Dec 21 10:26:47 hephaestus kernel: [346923.499580] usb 3-4: Manufacturer: Realtek
Dec 21 10:26:47 hephaestus kernel: [346923.499583] usb 3-4: SerialNumber: 001000001
Dec 21 10:26:47 hephaestus kernel: [346923.523736] cdc_ncm 3-4:2.0: MAC-Address: xx:xx:xx:xx:xx:xx
Dec 21 10:26:47 hephaestus kernel: [346923.523742] cdc_ncm 3-4:2.0: setting rx_max = 16384
Dec 21 10:26:47 hephaestus kernel: [346923.523836] cdc_ncm 3-4:2.0: setting tx_max = 16384
Dec 21 10:26:47 hephaestus kernel: [346923.524578] cdc_ncm 3-4:2.0 eth1: register 'cdc_ncm' at usb-0000:00:14.0-4, CDC NCM, xx:xx:xx:xx:xx:xx
Dec 21 10:26:47 hephaestus systemd-udevd[325501]: Using default interface naming scheme 'v245'.
Dec 21 10:26:47 hephaestus systemd-udevd[325501]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Dec 21 10:26:47 hephaestus systemd[1]: Found device USB_10_100_1G_2.5G_LAN.
(then things start back up and the ethernet link goes live again after about 10 seconds)
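
To see how long the kernel itself takes to re-register the adapter after each drop, the bracketed kernel timestamps can be subtracted. This is only a rough sketch: it assumes each log line carries a `[seconds.microseconds]` stamp as in the excerpt above and that a driver "register" line follows each "USB disconnect". `reconnect_gap` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the seconds between each "USB disconnect" line and the next
# driver "register" line, based on the [seconds.micros] kernel timestamp.
reconnect_gap() {
  awk '
    match($0, /\[ *[0-9]+\.[0-9]+\]/) {
      # Strip the surrounding brackets and convert to a number.
      ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
      if ($0 ~ /USB disconnect/) {
        start = ts
      } else if ($0 ~ /: register / && start != "") {
        printf "%.6f\n", ts - start
        start = ""
      }
    }
  '
}
```

For example, `grep kernel: /var/log/syslog | reconnect_gap` would print one gap per disconnect/register pair. The time until the network is usable again can be longer, since DHCP and interface setup happen afterwards.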

Revision history for this message
In , james (james-linux-kernel-bugs) wrote :

FYI: I built a 5.4 kernel with the patch discussed earlier in this thread, and I still get the error multiple times per day.

(In reply to James H from comment #207)

Revision history for this message
Chris Adams (fkmjo) wrote :

I am also having this issue with an ASMedia controller, and I think this is a duplicate bug: https://bugs.launchpad.net/ubuntu/+source/linux-signed-hwe/+bug/1874442

Revision history for this message
In , svmohr (svmohr-linux-kernel-bugs) wrote :

I also get random disconnects on kernel 6.3.0-7-generic with a Samsung T7 Shield external SSD. Unfortunately, this error is hard to reproduce: it usually takes hours before it occurs for the first time.

System:
  Kernel: 6.3.0-7-generic arch: x86_64 bits: 64 compiler: N/A Console: pty pts/10 Distro: Ubuntu
    23.10 (Mantic Minotaur)
Machine:
  Type: Server System: Supermicro product: C9Z390-PGW v: 0123456789 serial: <filter>
  Mobo: Supermicro model: C9Z390-PGW v: 1.01A serial: <filter> UEFI: American Megatrends v: 1.3
    date: 06/03/2020
CPU:
  Info: 8-core model: Intel Core i9-9900K bits: 64 type: MT MCP arch: Coffee Lake rev: D cache:
    L1: 512 KiB L2: 2 MiB L3: 16 MiB
  Speed (MHz): avg: 3687 high: 5002 min/max: 800/5000 cores: 1: 5002 2: 3600 3: 3600 4: 3600
    5: 3600 6: 3600 7: 3600 8: 3600 9: 3600 10: 3600 11: 3600 12: 3600 13: 3600 14: 3600 15: 3600
    16: 3600 bogomips: 115200
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx

/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 10000M
    ID 1d6b:0003 Linux Foundation 3.0 root hub
    |__ Port 4: Dev 10, If 0, Class=Mass Storage, Driver=uas, 10000M
        ID 04e8:61fb Samsung Electronics Co., Ltd

BOOT_IMAGE=/boot/vmlinuz-6.3.0-7-generic root=UUID=2c8c7990-bb1d-47dc-a70c-0272867b1807 ro quiet splash intel_iommu=on iommu=pt pcie_aspm=off initcall_blacklist=sysfb_init rd.modules-load=vfio-pci vfio_pci.ids=10de:1e07,10de:10f7,10de:1ad6,10de:1ad7,1462:3710 vt.handoff=7

[349280.239403] usb 2-4: USB disconnect, device number 9
[349280.239689] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[349280.239695] usb 2-4: cmd cmplt err -108
[349280.239702] sd 9:0:0:0: [sdh] tag#13 uas_zap_pending 0 uas-tag 1 inflight: CMD
[349280.239705] sd 9:0:0:0: [sdh] tag#13 CDB: Write(16) 8a 00 00 00 00 00 d3 28 e4 00 00 00 00 d8 00 00
[349280.239724] sd 9:0:0:0: [sdh] tag#13 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=0s
[349280.239726] sd 9:0:0:0: [sdh] tag#13 CDB: Write(16) 8a 00 00 00 00 00 d3 28 e4 00 00 00 00 d8 00 00
[349280.239728] I/O error, dev sdh, sector 3542672384 op 0x1:(WRITE) flags 0x8800 phys_seg 27 prio class 2
[349280.239741] device offline error, dev sdh, sector 3542674432 op 0x1:(WRITE) flags 0x8800 phys_seg 35 prio class 2
[349280.239747] device offline error, dev sdh, sector 3542672640 op 0x1:(WRITE) flags 0x8800 phys_seg 24 prio class 2
[349280.239750] device offline error, dev sdh, sector 3542677504 op 0x1:(WRITE) flags 0x8800 phys_seg 45 prio class 2
[349280.239753] device offline error, dev sdh, sector 3542680576 op 0x1:(WRITE) flags 0x8800 phys_seg 41 prio class 2
[349280.239788] device offline error, dev sdh, sector 3542663168 op 0x1:(WRITE) flags 0x8800 phys_seg 35 prio class 2
[349280.239793] device offline error, dev sdh, sector 3542663680 op 0x1:(WRITE) flags 0x8800 phys_seg 29 prio class 2
[349280.239799] device offline error, dev sdh, sector 3542663936 op 0x1:(WRITE) flags 0x8800 phys_seg 26 prio class 2
[349280.299534] sd 9:0:0:0: [sdh] Synchronizing SCSI cache
[349280.523475] sd 9:0:0:0: [sdh] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVE...
