xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state

Bug #1940004 reported by M K S
This bug affects 3 people
Affects: Linux
Status: Confirmed
Importance: High

Bug Description

Recently my external USB drive enclosure has started to stop working after a bit of I/O activity (copy jobs etc.). This wasn't the case not too long ago. I use this enclosure as an archive backup and plug it in every month or so.

Important to note: this issue is/was being tracked here: https://bugzilla.kernel.org/show_bug.cgi?id=202541 and a few patches have been suggested. I have a Dell XPS 7590 laptop and, according to comment #195, the patch in comment #176 fixes the issue.

$ uname -a
Linux kambuntu 5.11.0-25-generic #27~20.04.1-Ubuntu SMP Tue Jul 13 17:41:23 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

$ dmesg -T
[Sun Aug 15 10:47:19 2021] usb 2-2: new SuperSpeed Gen 1 USB device number 10 using xhci_hcd
[Sun Aug 15 10:47:19 2021] usb 2-2: New USB device found, idVendor=152d, idProduct=0539, bcdDevice= 1.00
[Sun Aug 15 10:47:19 2021] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=5
[Sun Aug 15 10:47:19 2021] usb 2-2: Product: USB to ATA/ATAPI Bridge
[Sun Aug 15 10:47:19 2021] usb 2-2: Manufacturer: JMicron
[Sun Aug 15 10:47:19 2021] usb 2-2: SerialNumber: DCC10435415F
[Sun Aug 15 10:47:19 2021] usb-storage 2-2:1.0: USB Mass Storage device detected
[Sun Aug 15 10:47:19 2021] usb-storage 2-2:1.0: Quirks match for vid 152d pid 0539: 4000000
[Sun Aug 15 10:47:19 2021] scsi host4: usb-storage 2-2:1.0
[Sun Aug 15 10:47:20 2021] scsi 4:0:0:0: Direct-Access WDC WD30 EFRX-68AX9N0 PQ: 0 ANSI: 5
[Sun Aug 15 10:47:20 2021] scsi 4:0:0:1: Direct-Access WDC WD30 EFRX-68AX9N0 PQ: 0 ANSI: 5
[Sun Aug 15 10:47:20 2021] scsi 4:0:0:2: Direct-Access WDC WD30 EFRX-68AX9N0 PQ: 0 ANSI: 5
[Sun Aug 15 10:47:20 2021] scsi 4:0:0:3: Direct-Access WDC WD30 EFRX-68AX9N0 PQ: 0 ANSI: 5
[Sun Aug 15 10:47:20 2021] sd 4:0:0:0: Attached scsi generic sg1 type 0
[Sun Aug 15 10:47:20 2021] scsi 4:0:0:1: Attached scsi generic sg2 type 0
[Sun Aug 15 10:47:20 2021] sd 4:0:0:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
[Sun Aug 15 10:47:20 2021] sd 4:0:0:0: [sdb] 5860533168 512-byte logical blocks: (3.00 TB/2.73 TiB)
[Sun Aug 15 10:47:20 2021] sd 4:0:0:2: Attached scsi generic sg3 type 0
[Sun Aug 15 10:47:20 2021] sd 4:0:0:1: [sdc] Very big device. Trying to use READ CAPACITY(16).
[Sun Aug 15 10:47:20 2021] sd 4:0:0:3: Attached scsi generic sg4 type 0
[Sun Aug 15 10:47:20 2021] sd 4:0:0:0: [sdb] Write Protect is off
[Sun Aug 15 10:47:20 2021] sd 4:0:0:0: [sdb] Mode Sense: 28 00 00 00
[Sun Aug 15 10:47:20 2021] sd 4:0:0:1: [sdc] 5860533168 512-byte logical blocks: (3.00 TB/2.73 TiB)
[Sun Aug 15 10:47:20 2021] sd 4:0:0:0: [sdb] No Caching mode page found
[Sun Aug 15 10:47:20 2021] sd 4:0:0:0: [sdb] Assuming drive cache: write through
[Sun Aug 15 10:47:20 2021] sd 4:0:0:1: [sdc] Write Protect is off
[Sun Aug 15 10:47:20 2021] sd 4:0:0:1: [sdc] Mode Sense: 28 00 00 00
[Sun Aug 15 10:47:20 2021] sd 4:0:0:2: [sdd] Very big device. Trying to use READ CAPACITY(16).
[Sun Aug 15 10:47:20 2021] sd 4:0:0:1: [sdc] No Caching mode page found
[Sun Aug 15 10:47:20 2021] sd 4:0:0:1: [sdc] Assuming drive cache: write through
[Sun Aug 15 10:47:20 2021] sd 4:0:0:3: [sde] Very big device. Trying to use READ CAPACITY(16).
[Sun Aug 15 10:47:20 2021] sd 4:0:0:3: [sde] 5860533168 512-byte logical blocks: (3.00 TB/2.73 TiB)
[Sun Aug 15 10:47:20 2021] sd 4:0:0:3: [sde] Write Protect is off
[Sun Aug 15 10:47:20 2021] sd 4:0:0:3: [sde] Mode Sense: 28 00 00 00
[Sun Aug 15 10:47:20 2021] sd 4:0:0:3: [sde] No Caching mode page found
[Sun Aug 15 10:47:20 2021] sd 4:0:0:3: [sde] Assuming drive cache: write through
[Sun Aug 15 10:47:20 2021] sd 4:0:0:2: [sdd] 5860533168 512-byte logical blocks: (3.00 TB/2.73 TiB)
[Sun Aug 15 10:47:20 2021] sd 4:0:0:2: [sdd] Write Protect is off
[Sun Aug 15 10:47:20 2021] sd 4:0:0:2: [sdd] Mode Sense: 28 00 00 00
[Sun Aug 15 10:47:20 2021] sd 4:0:0:2: [sdd] No Caching mode page found
[Sun Aug 15 10:47:20 2021] sd 4:0:0:2: [sdd] Assuming drive cache: write through
[Sun Aug 15 10:47:22 2021] sdc: sdc1
[Sun Aug 15 10:47:22 2021] sde: sde1
[Sun Aug 15 10:47:22 2021] sdb: sdb1
[Sun Aug 15 10:47:22 2021] sdd: sdd1
[Sun Aug 15 10:47:22 2021] sd 4:0:0:1: [sdc] Attached SCSI disk
[Sun Aug 15 10:47:22 2021] sd 4:0:0:3: [sde] Attached SCSI disk
[Sun Aug 15 10:47:22 2021] sd 4:0:0:0: [sdb] Attached SCSI disk
[Sun Aug 15 10:47:22 2021] sd 4:0:0:2: [sdd] Attached SCSI disk
[Sun Aug 15 11:00:35 2021] usb 2-2: USB disconnect, device number 10

[Sun Aug 15 11:00:35 2021] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: ubuntu-release-upgrader-core 1:20.04.36
ProcVersionSignature: Ubuntu 5.11.0-25.27~20.04.1-generic 5.11.22
Uname: Linux 5.11.0-25-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu27.18
Architecture: amd64
CasperMD5CheckResult: skip
CrashDB: ubuntu
CurrentDesktop: KDE
Date: Sun Aug 15 11:30:57 2021
InstallationDate: Installed on 2021-03-26 (142 days ago)
InstallationMedia: Kubuntu 20.04.2.0 LTS "Focal Fossa" - Release amd64 (20210209.1)
PackageArchitecture: all
ProcEnviron:
 LANGUAGE=en_CA:en
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_CA.UTF-8
 SHELL=/bin/bash
SourcePackage: ubuntu-release-upgrader
Symptom: ubuntu-release-upgrader
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

After upgrading to the 4.20 kernel (I was using 4.19 previously), my USB wifi stick doesn't work until I reboot the system. This issue happens every time I start my PC (only when the system was shut down; it doesn't happen after rebooting). The wifi driver in use is rt2800usb. I tried restarting NetworkManager, but this didn't change anything.

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

Hmm, that's strange; perhaps this is some USB host problem. Please provide the dmesg output of your system.

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

Created attachment 281677
dmesg output before reboot

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

Created attachment 281679
dmesg output after reboot

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

We have this xhci_hcd warning in the bad case:

 xhci_hcd 0000:15:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state

Not sure where it comes from. But I notice you are using AMD IOMMU, which we have had trouble with in different drivers.

You could try disabling the IOMMU via a kernel boot parameter and check whether that improves things. You could also try testing this patch if possible:
https://bugzilla.kernel.org/attachment.cgi?id=281675

If none of that helps, I will prepare some rt2800 patches to see whether this is caused by some of the v4.19 .. v4.20 rt2800 commits:

0240564430c0 rt2800: flush and txstatus rework for rt2800mmio
adf26a356f13 rt2x00: use different txstatus timeouts when flushing
5022efb50f62 rt2x00: do not check for txstatus timeout every time on tasklet
0b0d556e0ebb rt2800mmio: use txdone/txstatus routines from lib
5c656c71b1bf rt2800: move usb specific txdone/txstatus routines to rt2800lib
f483039cf51a rt2x00: use simple_read_from_buffer()

But I rather suspect a problem introduced in the AMD IOMMU or usb/xhci drivers.

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

I tried disabling the IOMMU, and I also compiled the 4.20.15 kernel from source with that patch applied, but the wifi didn't work in either case.

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

Created attachment 281711
rt2x00_revert_4.20_changes.patch

Please test this patch and report whether it makes the problem go away.

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

The problem is still there after applying that patch.

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

You need to report this bug to the USB maintainers. I'm changing the topic and component, but USB bugs should be reported directly to the mailing list.

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

Please send bug report to <email address hidden>

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

I can confirm this issue. I can also confirm that other USB devices are affected, too (mostly if plugged into a USB3 port).
For example:
ID 7392:7710 Edimax Technology Co., Ltd (mt7601u)
WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

dmesg doesn't show IOMMU warnings, so I assume it is a problem introduced in usb/xhci driver.

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

(In reply to Michael from comment #10)
> I can confirm this issue. Also I can confirm that other USB devices are
> effected, too (mostly if plugged into an USB3 port).
> For example:
> ID 7392:7710 Edimax Technology Co., Ltd (mt7601u)
> WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
>
> dmesg doesn't show IOMMU warnings, so I assume it is a problem introduced in
> usb/xhci driver.

I think this affects only a specific hardware configuration (I've tried using my wifi stick on a different machine and it worked without problems).
Which hardware are you using? Maybe there are some parts we have in common.

My hardware configuration:
CPU: AMD Ryzen 3 2200G, Motherboard: MSI B350 PC MATE
GPU: AMD Radeon RX 580 8GB

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

@ Bernhard
The parts we have in common : AMD RYZEN

AMD RYZEN 1700 MSI X370 KRAIT, MSI AERO GTX1080Ti, 5.0.6-arch1-1-ARCH (system was also affected by IOMMU issue - but that is fixed)

Affected USB WiFi devices (tested):
ID 148f:3070 Ralink Technology, Corp. RT2870/RT3070 Wireless Adapter (ALFA AWUS036NH - rt2800usb)
ID 148f:3070 Ralink Technology, Corp. RT2870/RT3070 Wireless Adapter (ipTime/ zioncom - rt2800usb)
ID 7392:7710 Edimax Technology Co., Ltd (mt7601u)
ID 7392:a812 Edimax Technology Co., Ltd (Edimax EW-7811USC - rtl88xxau)
ID 148f:761a Ralink Technology, Corp. MT7610U ("Archer T2U" 2.4G+5G WLAN Adapter - mt76x0)
ID 0b05:17d1 ASUSTek Computer, Inc. AC51 802.11a/b/g/n/ac Wireless Adapter [Mediatek MT7610U]
ID 0a12:0001 Cambridge Silicon Radio, Ltd Bluetooth Dongle (HCI mode)
I'm sure there are more.

After fixing some driver/IOMMU issues, Stanislaw found that it could possibly be an xhci driver issue. I share his opinion.

You can read more about the issues here:
https://github.com/ZerBea/hcxdumptool/issues/42
and the fixed IOMMU issue here:
https://bugzilla.kernel.org/show_bug.cgi?id=202241

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

FTR: I think those two commits could help:

commit 6cbcf596934c8e16d6288c7cc62dfb7ad8eadf15
Author: Mathias Nyman <email address hidden>
Date: Fri Mar 22 17:50:15 2019 +0200

    xhci: Fix port resume done detection for SS ports with LPM enabled

commit d92f2c59cc2cbca6bfb2cc54882b58ba76b15fd4
Author: Mathias Nyman <email address hidden>
Date: Fri Mar 22 17:50:17 2019 +0200

    xhci: Don't let USB3 ports stuck in polling state prevent suspend

Also, I'm not sure whether the issue was reported to the proper maintainer. If not, and the problem is not already fixed in the latest upstream, either a bisection will be needed to proceed with this bug, or a properly informative bug report must be filed with the proper maintainer.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

@ Stanislaw, thanks for additional information.

@ Bernhard, have you already sent this bug report to the linux-usb mailing list?

Can we change the affected kernel version from 4.20 to >= 4.20, because 5.0.6 is affected, too?

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

Yes, I already sent this to the mailing list, but I got no response unfortunately.

I've changed the affected kernel version btw.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

@ Bernhard, thanks for your answer. So there is no need for me to report this issue, too.

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

I just tried the two patches Stanislaw mentioned, but the problem is still there.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Tried them, too, some days ago, but they didn't solve the issue.
Just downloaded 5.1rc3, but I don't expect a working driver (usb/host) inside.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Tested an ASUS X555U (Intel i5-6200 - 5.0.6-arch1-1-ARCH) and that system is affected, if the device is plugged into one of the USB3 ports. The device is working, if plugged into the USB2 port.

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

I just tried replacing the xhci-ring.c file with the version from the 4.19 kernel, and that solved the problem. Then I started patching the code until the problem occurred again.
The change in the function "static int process_bulk_intr_td" is causing the problem, it's part of this patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/drivers/usb/host/xhci-ring.c?id=9703fc8caf36ac65dca1538b23dd137de0b53233
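
The search described above (start from a known-good file, re-apply changes until the problem returns) can be sped up with a binary search over the list of changes. The following is only a minimal sketch of that idea, not what was actually run here: the change names other than process_bulk_intr_td are hypothetical placeholders, and is_bad stands in for "build the kernel with these changes and check whether the device fails". It assumes the bug stays present once the offending change is applied.

```python
def first_bad_change(changes, is_bad):
    """Return the first change whose addition makes the bug appear.

    Assumes: with no changes applied the bug is absent, with all changes
    applied it is present, and it never disappears again in between.
    """
    lo, hi = 0, len(changes) - 1  # the culprit index lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(changes[:mid + 1]):
            hi = mid          # bug already present: culprit is at mid or earlier
        else:
            lo = mid + 1      # still good: culprit is later
    return changes[lo]


# Hypothetical change list; only process_bulk_intr_td is named in this report.
CHANGES = ["change_a", "change_b", "process_bulk_intr_td", "change_d"]


def simulated_is_bad(applied):
    # Stand-in for a real build-and-test cycle.
    return "process_bulk_intr_td" in applied
```

With N changes this needs about log2(N) build-and-test cycles instead of up to N.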

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

(In reply to Bernhard from comment #20)
> I just tried replacing the xhci_ring.c file with the version from the 4.19
> kernel, that solved the problem. Then I started patching the code until the
> problem occurs again.
> The change in the function "static int process_bulk_intr_td" is causing the
> problem, it's part of this patch:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/
> drivers/usb/host/xhci-ring.c?id=9703fc8caf36ac65dca1538b23dd137de0b53233

Good findings, great. This seems to be part of

commit f8f80be501aa2f10669585c3e328fad079d8cb3a
Author: Mathias Nyman <email address hidden>
Date: Thu Sep 20 19:13:37 2018 +0300

    xhci: Use soft retry to recover faster from transaction errors

Just add the information you found to the linux-usb email you posted and CC "Mathias Nyman <email address hidden>" to make sure he is aware of the problem.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

The issue isn't fixed in 5.1rc3, so it looks like Mathias Nyman isn't aware of the problem yet.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Still present in 5.1.2

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

This issue is really funny:
running
ID 0b05:17d1 ASUSTek Computer, Inc. AC51 802.11a/b/g/n/ac Wireless Adapter [Mediatek MT7610U]

on kernel
$ uname -r
5.1.7-arch1-1-ARCH

will spam the log after the known WARN
[43163.034783] mt76x0u 1-10.2:1.0 wlp3s0f0u10u2: renamed from wlan0
[43163.351656] usb 1-10.2: USB disconnect, device number 6
[43163.352176] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

with tons of failed vendor requests:
[43160.683383] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3dc failed:-71
[43160.813398] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3e0 failed:-71
[43160.943415] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3e4 failed:-71
[43161.073440] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3e8 failed:-71
[43161.203439] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3ec failed:-71
[43161.333458] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3f0 failed:-71
[43161.463468] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3f4 failed:-71
[43161.593561] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3f8 failed:-71
[43161.723502] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3fc failed:-71
[43161.853512] mt76x0u 1-10.2:1.0: vendor request req:06 off:108c failed:-71
....
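
For readers of these logs, the trailing negative numbers are negated Linux errno values. Below is a small sketch that pulls the code off the end of a log line and names it. The table is hard-coded with the Linux values for the three codes that appear in this report (5 = EIO, 71 = EPROTO, 110 = ETIMEDOUT); the helper name decode_kernel_error and the regular expression are my own assumptions, not part of any kernel tooling.

```python
import re

# Linux errno numbers for the codes seen in this report.
LINUX_ERRNO = {
    5: "EIO",
    71: "EPROTO",
    110: "ETIMEDOUT",
}

# A negative integer at the end of the line, optionally followed by ")".
# The lookbehind skips bus addresses such as "usb 2-2".
TRAILING_CODE = re.compile(r"(?<![\w.])(-\d+)\)?\s*$")


def decode_kernel_error(line):
    """Return (code, name) for a log line ending in a negative errno, else None."""
    match = TRAILING_CODE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, LINUX_ERRNO.get(-code, "unknown")
```

For example, decode_kernel_error applied to a "vendor request ... failed:-71" line gives (-71, "EPROTO").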

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

If the same device is connected to an Intel Core I5-6200 system (USB3 port), the log looks different to the AMD RYZEN system.

[ 204.231872] mt76x0u 1-1:1.0: rx urb failed: -71
[ 204.231901] mt76x0u 1-1:1.0: rx urb failed: -71
[ 204.231940] mt76x0u 1-1:1.0: rx urb failed: -71
[ 204.231980] mt76x0u 1-1:1.0: rx urb failed: -71
[ 204.232020] mt76x0u 1-1:1.0: rx urb failed: -71
[ 204.232188] mt76x0u 1-1:1.0: rx urb failed: -71
[ 204.232226] mt76x0u 1-1:1.0: rx urb failed: -71
[ 204.232275] mt76x0u 1-1:1.0: rx urb failed: -71
[ 204.232304] mt76x0u 1-1:1.0: rx urb failed: -71
[ 204.232345] mt76x0u 1-1:1.0: rx urb failed: -71
[ 204.233284] xhci_hcd 0000:00:14.0: WARN Cannot submit Set TR Deq Ptr
[ 204.233291] xhci_hcd 0000:00:14.0: A Set TR Deq Ptr command is pending.
[ 204.263427] mt76x0u 1-1:1.0: TX DMA did not stop
[ 207.596726] mt76x0u 1-1:1.0: Warning: MAC TX did not stop!
[ 209.650050] mt76x0u 1-1:1.0: Warning: MAC RX did not stop!
[ 209.651133] mt76x0u 1-1:1.0: RX DMA did not stop

Also I noticed some changes in xhci-ring.c between 5.1.7 and 5.2-rc4. Maybe they'll fix the problem. I haven't tested it yet.

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

I already tried the 5.2-rc3 kernel and the problem isn't fixed yet. There were no changes in the xhci driver between rc3 and rc4, so it's very unlikely that the problem doesn't occur in the 5.2-rc4 kernel.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Thanks for the information. I skipped 5.2rc1 ... rc3.

But with your information, there is no real need for me to run some more tests.

Unfortunately it looks like the issue has been backported to older kernel versions (4.19), because I got some issue reports here, too:
https://github.com/ZerBea/hcxdumptool/issues/57

and 90% of my devices don't work any longer.

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

When did it get back ported? I'm on 4.19.48 and haven't had a problem with this version...

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

It's just a guess, because of this post:
https://github.com/ZerBea/hcxdumptool/issues/57#issuecomment-483964293

But it looks like the device was working before that post.
I can't test it, because I don't have such a device.

I tested a TP-LINK Archer T2UH and this device is not working on 4.19.46 arm (Raspberry Pi).

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Yes, rt2800usb is working fine on 4.19.46.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

hcxdumptool running on kernel 4.19.46 arm doesn't receive packets on several different devices. In this case:
ID 0b05:17d1 ASUSTek Computer, Inc. AC51 802.11a/b/g/n/ac Wireless Adapter [Mediatek MT7610U]
INFO: cha=1, rx=0, rx(dropped)=0, tx=18, err=0, aps=0 (0 in range)

while a few other devices are still working:
INFO: cha=1, rx=805, rx(dropped)=0, tx=93, err=0, aps=29 (21 in range)

BTW:
I'm running/testing only devices whose drivers support monitor mode and packet injection.

Very interesting on that arm kernel is that dmesg doesn't show any WARNs.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Still no fix?
$ uname -r
5.1.11-arch1-1-ARCH

and most of the USB devices (WiFi, Bluetooth, ...) are still not working:
[32942.700591] usb 1-10.4: new full-speed USB device number 7 using xhci_hcd
[32944.721410] usb 1-10.4: New USB device found, idVendor=0a12, idProduct=0001, bcdDevice=52.76
[32944.721412] usb 1-10.4: New USB device strings: Mfr=0, Product=2, SerialNumber=0
[32945.069015] Bluetooth: hci0: hardware error 0x37

How about kernel 5.2?

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Some USB card readers are also affected (connected to USB 3 port):

$ uname -r
5.1.12-arch1-1-ARCH

[ 3510.100114] usb 2-2: new SuperSpeed Gen 1 USB device number 2 using xhci_hcd
[ 3510.134121] usb 2-2: New USB device found, idVendor=058f, idProduct=6387, bcdDevice= 0.02
[ 3510.134126] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 3510.134128] usb 2-2: Product: Intenso Ultra Line
[ 3510.134130] usb 2-2: Manufacturer: ALCOR
...
[ 5129.997608] usb 1-1: reset high-speed USB device number 7 using xhci_hcd
[ 5130.218618] sd 9:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[ 5130.218631] sd 9:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 20 c3 c0 00 00 20 00
[ 5130.218637] print_req_error: I/O error, dev sdb, sector 2147264 flags 80700

I really wonder why that issue hasn't been fixed, yet, because many, many devices are affected.

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

The list of changes for 5.2-rc6 contains these two commits:

Mathias Nyman (2):
      usb: xhci: Don't try to recover an endpoint if port is in error state.
      xhci: detect USB 3.2 capable host controllers correctly

I think this could be the fix for this issue.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Great, thanks for the information. The issue is really ugly, because many USB devices are affected (HDD, card reader, Bluetooth, WLAN, ... the list is long).
I'll check 5.2-rc6.

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

Just tried 5.2-rc6, but unfortunately I still have the same issue.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Thanks for the information. I tested 5.2-rc6, too. Even a USB 3.0 HDD isn't working.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Now running mainline kernel 5.2 and the issue still exists.
Tested on this device:
ID 148f:3070 Ralink Technology, Corp. RT2870/RT3070 Wireless Adapter
but the same applies to many other devices, too

dmesg after plugging in the device:

[75.482165] usb 1-2: new high-speed USB device number 6 using xhci_hcd
[75.639236] usb 1-2: New USB device found, idVendor=148f, idProduct=3070, bcdDevice= 1.01
[75.639238] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[75.639239] usb 1-2: Product: 802.11 n WLAN
[75.639240] usb 1-2: Manufacturer: Ralink
[75.639241] usb 1-2: SerialNumber: 1.0
[75.952611] usb 1-2: reset high-speed USB device number 6 using xhci_hcd
[76.107232] ieee80211 phy1: rt2x00_set_rt: Info - RT chipset 3070, rev 0201 detected
[76.120228] ieee80211 phy1: rt2x00_set_rf: Info - RF chipset 0005 detected
[76.121079] ieee80211 phy1: Selected rate control algorithm 'minstrel_ht'
[76.130873] usbcore: registered new interface driver rt2800usb
[76.194447] audit: type=1130 audit(1562833499.983:49): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[76.195313] rt2800usb 1-2:1.0 wlp0s20f0u2: renamed from wlan0
[76.216178] ieee80211 phy1: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[76.241382] ieee80211 phy1: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36
[76.544022] ieee80211 phy1: rt2x00usb_vendor_request: Error - Vendor Request 0x06 failed for offset 0x0404 with error -71
[77.562305] ieee80211 phy1: rt2800_wait_csr_ready: Error - Unstable hardware
[77.562316] ieee80211 phy1: rt2800usb_set_device_state: Error - Device failed to enter state 4 (-5)
...
followed by this message on access to the interface:
[341.598563] xhci_hcd 0000:00:14.0: WARN Cannot submit Set TR Deq Ptr
[341.598573] xhci_hcd 0000:00:14.0: A Set TR Deq Ptr command is pending.

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

After discussion on my posted patch here:

https://<email address hidden>/t/#u

it was concluded that this should rather be an xhci quirk instead of an rt2800usb driver flag.

If the change from comment 147 helps you with the problem, please provide the PCI ID of your xHCI controller. This can be done with the command:

lspci -k -nn | grep -B2 xhci

If you have more than one xHCI controller, please make sure you provide the PCI IDs of the one that actually has the problem (the 'lspci -t' command can be useful as well).
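
If it helps, the IDs can also be pulled out of the full lspci output with a short script. This is only a sketch under my own assumptions: the function names and regular expressions are illustrative, it expects the default text layout of `lspci -k -nn` (device header line, then indented Subsystem and "Kernel driver in use" lines), and the sample text reuses a controller reported later in this thread.

```python
import re
import subprocess

# Slot at the start of a device header line, with an optional PCI domain.
SLOT = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")
# A [vendor:device] pair as printed by the -nn option.
PCI_ID = re.compile(r"\[([0-9a-f]{4}:[0-9a-f]{4})\]")


def xhci_controllers(lspci_text):
    """Return (slot, pci_id, subsystem_id) for every device bound to xhci_hcd."""
    found = []
    current = None  # [slot, pci_id, subsystem_id] of the device block being read
    for line in lspci_text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Header line of a new device block (or a "--" separator from grep).
            slot = SLOT.match(line)
            ids = PCI_ID.findall(line)
            current = [slot.group(1), ids[-1], None] if slot and ids else None
        elif current is not None:
            field = line.strip()
            if field.startswith("Subsystem:"):
                ids = PCI_ID.findall(field)
                current[2] = ids[-1] if ids else None
            elif field == "Kernel driver in use: xhci_hcd":
                found.append(tuple(current))
    return found


def read_lspci():
    """Run lspci on the local machine (requires pciutils)."""
    result = subprocess.run(["lspci", "-k", "-nn"],
                            capture_output=True, text=True, check=True)
    return result.stdout


SAMPLE = """\
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
01:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 XHCI Controller [1022:43d5] (rev 01)
\tSubsystem: ASMedia Technology Inc. 400 Series Chipset USB 3.1 XHCI Controller [1b21:1142]
\tKernel driver in use: xhci_hcd
"""
```

Calling xhci_controllers(read_lspci()) on the affected machine lists each xHCI controller with its PCI ID and subsystem ID.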

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

(In reply to Stanislaw Gruszka from comment #173)
> If you have more than one xHCI controller please assure you provide PCI-id's
> of one that actually has the problem ('lspci -t' command can be useful as
> well)

I meant 'lsusb -t'

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset USB 3.1 xHCI Controller [1022:43b9] (rev 02)
Subsystem: ASMedia Technology Inc. Device [1b21:1142]
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

Created attachment 295055
0001-usb-xhci-do-not-perform-Soft-Retry-for-some-xHCI-hos.patch

This is the next proposed fix. It is supposed to disable Soft Retry for affected xHCI controllers. Currently it covers only the xHCI device reported by Michael:
PCI_VENDOR_ID_AMD = 0x1022 , PCI_DEVICE_ID_AMD_PROMONTORYA_4 = 0x43b9

If you want to test and have a different xHCI host, you need to add your PCI IDs to the drivers/usb/host/xhci-pci.c part of the patch.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

@Stanislaw, I followed the discussion you mentioned here:
https://bugzilla.kernel.org/show_bug.cgi?id=202541#c173

Other devices than rt2800usb devices are affected, too.
Tested this one before applying your patch:
ID 7392:7710 Edimax Technology Co., Ltd Edimax Wi-Fi
and running into the same xhci issue on USB controller mentioned here:
https://bugzilla.kernel.org/show_bug.cgi?id=202541#c175

[10214.423508] usb 1-2: new high-speed USB device number 3 using xhci_hcd
[10214.602833] usb 1-2: New USB device found, idVendor=7392, idProduct=7710, bcdDevice= 0.00
[10214.602838] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[10214.602841] usb 1-2: Product: Edimax Wi-Fi
[10214.602843] usb 1-2: Manufacturer: MediaTek
[10214.602845] usb 1-2: SerialNumber: 1.0
[10214.931553] usb 1-2: reset high-speed USB device number 3 using xhci_hcd
[10215.102895] mt7601u 1-2:1.0: ASIC revision: 76010001 MAC revision: 76010500
[10215.132670] mt7601u 1-2:1.0: Firmware Version: 0.1.00 Build: 7640 Build time: 201302052146____
[10216.101346] mt7601u 1-2:1.0: EEPROM ver:0d fae:00
[10216.111983] mt7601u 1-2:1.0: EEPROM country region 01 (channels 1-13)
[10217.189574] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
[10217.190361] usbcore: registered new interface driver mt7601u
[10217.199429] mt7601u 1-2:1.0 wlp3s0f0u2: renamed from wlan0
[10296.419053] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[10296.419228] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

Revision history for this message
In , jg.staffel (jg.staffel-linux-kernel-bugs) wrote :

The same problem (with ID 04a9:220d Canon, Inc. CanoScan N670U/N676U/LiDE 20):

Feb 03 09:48:54 [kernel] [34974.104606] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Feb 03 09:49:49 [kernel] [35029.419748] usb 1-6: USB disconnect, device number 3
Feb 03 09:49:52 [kernel] [35031.994403] usb 1-6: new full-speed USB device number 6 using xhci_hcd
Feb 03 09:50:45 [kernel] [35085.400634] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Feb 03 09:50:45 [kernel] [35085.404278] xhci_hcd 0000:01:00.0: WARN Successful completion on short TX
Feb 03 09:50:45 [kernel] [35085.404398] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 4 comp_code 1
Feb 03 09:50:45 [kernel] [35085.404401] xhci_hcd 0000:01:00.0: Looking for event-dma 00000008146ff050 trb-start 00000008146ff060 trb-end 00000008146ff060 seg-start 00000008146ff000 seg-end 00000008146ffff0

$ lspci -k -nn | grep -B2 xhci
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6 [1022:1466]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
01:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 XHCI Controller [1022:43d5] (rev 01)
 Subsystem: ASMedia Technology Inc. 400 Series Chipset USB 3.1 XHCI Controller [1b21:1142]
 Kernel driver in use: xhci_hcd
--
09:00.2 USB controller [0c03]: NVIDIA Corporation TU116 USB 3.1 Host Controller [10de:1aec] (rev a1)
 Subsystem: NVIDIA Corporation TU116 USB 3.1 Host Controller [10de:139d]
 Kernel driver in use: xhci_hcd
--
0a:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Zeppelin USB 3.0 Host controller [1022:145f]
 Subsystem: Advanced Micro Devices, Inc. [AMD] Zeppelin USB 3.0 Host controller [1022:7914]
 Kernel driver in use: xhci_hcd

$ uname -a
Linux Gentoo 5.4.92-gentoo #1 SMP PREEMPT Thu Jan 28 20:45:52 MSK 2021 x86_64 AMD Ryzen 5 2600 Six-Core Processor AuthenticAMD GNU/Linux

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

(In reply to Michael from comment #177)
> Other devices than rt2800usb devices are affected, too.
> Tested this one before applying your patch:
> ID 7392:7710 Edimax Technology Co., Ltd Edimax Wi-Fi
> and running into the same xhci issue on USB controller mentioned here:
> https://bugzilla.kernel.org/show_bug.cgi?id=202541#c175

Ok, so it makes sense to disable Soft Retry per xHCI.

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

(In reply to alpir from comment #178)
> The same problem (with ID 04a9:220d Canon, Inc. CanoScan N670U/N676U/LiDE
> 20):
>
> Feb 03 09:48:54 [kernel] [34974.104606] xhci_hcd 0000:01:00.0: WARN Set TR
> Deq Ptr cmd failed due to incorrect slot or ep state.

alpir, does the change from comment 147 help you?

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

alpir, you have a different device ID than Michael, but you both have the same subsystem device: ASMedia 1b21:1142. So perhaps the patch should be based on subsystem IDs. Let's wait for other users' reports regarding their xHCI controllers; we will see then.
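
To make the trade-off concrete, here is a small sketch comparing the two matching strategies on the two controllers reported so far. It is an illustration only: the real quirk would be C code in drivers/usb/host/xhci-pci.c, and the table and function names below are hypothetical.

```python
# Controllers reported in this thread: reporter -> (PCI ID, subsystem ID).
REPORTED = {
    "Michael": ("1022:43b9", "1b21:1142"),
    "alpir": ("1022:43d5", "1b21:1142"),
}

# Hypothetical quirk tables for illustration.
QUIRK_BY_DEVICE = {"1022:43b9"}      # what the current proposed patch lists
QUIRK_BY_SUBSYSTEM = {"1b21:1142"}   # the alternative discussed here


def disable_soft_retry(pci_id, subsystem_id, match_subsystem):
    """Decide whether Soft Retry would be disabled for a controller."""
    if match_subsystem:
        return subsystem_id in QUIRK_BY_SUBSYSTEM
    return pci_id in QUIRK_BY_DEVICE


def covered(match_subsystem):
    """Names of the reporters whose controller the chosen strategy matches."""
    return sorted(name for name, (pci_id, sub_id) in REPORTED.items()
                  if disable_soft_retry(pci_id, sub_id, match_subsystem))
```

Matching on the device ID covers only Michael's controller, while matching on the subsystem ID covers both reports with a single entry.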

Revision history for this message
In , jg.staffel (jg.staffel-linux-kernel-bugs) wrote :

I tried the patch from comment 147. The error "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state" is gone, but the USB 3.1 behavior is still the same.

Why I even started looking for the reason for the strange behavior of the USB ports: my two JetFlash Transcend 8GB flash drives, when connected to a USB3 port, are sometimes not detected by the system as mountable (fat32). When I run a disk check (8 GB) with the command badblocks -nvs /dev/sdd, the check ends after a while with the following error: Pass completed, 5662144 bad blocks found. (5662144/0/0 errors). This happens with both flash drives.

But if you connect them to USB2, then there are no errors at all.

At the same time, when looking at the logs, I found errors: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

Now, after the patch, I get the following in the logs:

Feb 03 17:47:14 [kernel] [ 52.603587] usb 2-3: new SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:47:14 [kernel] [ 52.636130] usb-storage 2-3:1.0: USB Mass Storage device detected
Feb 03 17:47:14 [kernel] [ 52.636242] scsi host11: usb-storage 2-3:1.0
Feb 03 17:47:14 [kernel] [ 52.651996] usbcore: registered new interface driver uas
Feb 03 17:47:16 [kernel] [ 54.013780] scsi 11:0:0:0: Direct-Access JetFlash Transcend 8GB 1100 PQ: 0 ANSI: 6
Feb 03 17:47:16 [kernel] [ 54.014688] sd 11:0:0:0: [sdd] 15425536 512-byte logical blocks: (7.90 GB/7.36 GiB)
Feb 03 17:47:16 [kernel] [ 54.015150] sd 11:0:0:0: [sdd] Write Protect is off
Feb 03 17:47:16 [kernel] [ 54.015156] sd 11:0:0:0: [sdd] Mode Sense: 43 00 00 00
Feb 03 17:47:16 [kernel] [ 54.015625] sd 11:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 03 17:47:16 [kernel] [ 54.028165] sdd: sdd1
Feb 03 17:47:16 [kernel] [ 54.045687] sd 11:0:0:0: [sdd] Attached SCSI removable disk
Feb 03 17:48:04 [kernel] [ 102.221862] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:51:52 [kernel] [ 330.009696] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:55:55 [kernel] [ 573.644576] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:56:01 [kernel] [ 579.149875] usb 2-3: device descriptor read/8, error -110
Feb 03 17:56:01 [kernel] [ 579.254204] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:56:06 [kernel] [ 584.781836] usb 2-3: device descriptor read/8, error -110
Feb 03 17:56:07 [kernel] [ 585.073435] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:56:12 [kernel] [ 590.413816] usb 2-3: device descriptor read/8, error -110
Feb 03 17:56:12 [kernel] [ 590.518146] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:56:18 [kernel] [ 596.046034] usb 2-3: device descriptor read/8, error -110
Feb 03 17:56:18 [kernel] [ 596.336445] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:56:23 [kernel] [ 601.677932] usb 2-3: device descriptor read/8, error -110
Feb 03 17:56:23 [kernel] [ 601.782091] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:56:29 [kernel] [ 607.309722] usb 2-3: device descr...

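The reset/timeout loop in the log above can be summarized with a small script. This is only a minimal triage sketch: the function name `summarize`, the regular expressions, and the sample list are illustrative assumptions, and the errno table is hardcoded with the Linux values (110 is ETIMEDOUT, 19 is ENODEV) rather than read from the running system.

```python
import re
from collections import Counter

# Linux errno values that appear in the logs of this thread.
LINUX_ERRNO = {19: "ENODEV", 110: "ETIMEDOUT"}

# "usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd"
RESET_RE = re.compile(r"usb (\S+): reset .+ USB device number \d+")
# "usb 2-3: device descriptor read/8, error -110"
DESC_ERR_RE = re.compile(r"usb (\S+): device descriptor read/\d+, error (-?\d+)")


def summarize(lines):
    """Count resets and descriptor-read errors per USB port path."""
    resets = Counter()
    errors = Counter()
    for line in lines:
        m = RESET_RE.search(line)
        if m:
            resets[m.group(1)] += 1
            continue
        m = DESC_ERR_RE.search(line)
        if m:
            code = int(m.group(2))
            # Kernel logs print negative errno values; look up the magnitude.
            name = LINUX_ERRNO.get(-code, str(code))
            errors[(m.group(1), name)] += 1
    return resets, errors


# Three lines copied from the log above.
SAMPLE = [
    "Feb 03 17:55:55 [kernel] [ 573.644576] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd",
    "Feb 03 17:56:01 [kernel] [ 579.149875] usb 2-3: device descriptor read/8, error -110",
    "Feb 03 17:56:01 [kernel] [ 579.254204] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd",
]
```

Calling `summarize(SAMPLE)` returns two counters; feeding it a full journal dump gives a quick picture of which port is stuck in the reset loop.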

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

My controller has the PCI ID 43bb, so I've added "PCI_DEVICE_ID_AMD_PROMONTORYA_2" to the patch from #176, and that fixed the issue for me.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

@Stanislaw, I'm running an older mobo and a RYZEN 1700.
I don't need CPU power - GPU power is more important for me (crypto analysis).

Revision history for this message
In , biopsin (biopsin-linux-kernel-bugs) wrote :

[Continuing my first report in comment:https://bugzilla.kernel.org/show_bug.cgi?id=202541#c107]

$ lspci -k -nn | grep -B2 xhci
02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 XHCI Controller [1022:43d5] (rev 01)
        Subsystem: ASMedia Technology Inc. Device [1b21:1142]
        Kernel driver in use: xhci_hcd

I have adapted the patch by Mr. Gruszka [https://bugzilla.kernel.org/show_bug.cgi?id=202541#c176] for my current system and needs

$ uname -a
Linux voidx 5.4.95_1 #1 SMP PREEMPT 1612063540 x86_64 GNU/Linux

If someone has some spare time to glance at it or comment on my error ;)
(diff available for 30 days) @
https://p.teknik.io/lIBbA

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

(In reply to alpir from comment #182)
> I tried patch from comment 147. The error "WARN Set TR Deq Ptr cmd failed
> due to incorrect slot or ep state" has gone. But behavior USDB3.1 still the
> same.
[snip]
> But if you connect them to USB2, then there are no errors at all.

alpir, I think you are experiencing a different issue that cannot be solved by simply disabling Soft Retry. Some more fixes are possibly needed for handling your xHCI/USB hardware. Maybe you can try the patch from comment 139? If this is a regression, maybe you can bisect to find the offending commit? Anyway, your problems will most likely require the expertise of Mathias Nyman, the xhci driver maintainer.

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

(In reply to biopsin from comment #185)
> [Continuing my first report in
> comment:https://bugzilla.kernel.org/show_bug.cgi?id=202541#c107]

As in alpir's case, this will most likely require some different fixes, but you can check whether disabling Soft Retry works. You can disable it as shown in comment 147.

> $ lspci -k -nn | grep -B2 xhci
> 02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series
> Chipset USB 3.1 XHCI Controller [1022:43d5] (rev 01)
> Subsystem: ASMedia Technology Inc. Device [1b21:1142]
> Kernel driver in use: xhci_hcd
>
[snip]
> If someone has some spare time to glance at it or comment on my error ;)
> (diff availible for 30 days) @
> https://p.teknik.io/lIBbA

ASMedia is subsystem_{vendor,device}, so most likely the quirk flag is not set properly for you. You can print the values with a patch like this to see:

diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 906a0e08821e..0ec9c3637b7a 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -102,6 +102,9 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)

        id = pci_match_id(pdev->driver->id_table, pdev);

+       printk("vendor: 0x%04x device 0x%04x subvendor 0x%04x subdevice 0x%04x\n",
+              pdev->vendor, pdev->device, pdev->subsystem_vendor, pdev->subsystem_device);
+
        if (id && id->driver_data) {
                driver_data = (struct xhci_driver_data *)id->driver_data;
                xhci->quirks |= driver_data->quirks;

If those are indeed subsystem IDs, I think there is a bug in the existing xhci-pci.c quirks code:

        if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
                pdev->device == PCI_DEVICE_ID_ASMEDIA_1042_XHCI)
                xhci->quirks |= XHCI_BROKEN_STREAMS;
        if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
                pdev->device == PCI_DEVICE_ID_ASMEDIA_1042A_XHCI)
                xhci->quirks |= XHCI_TRUST_TX_LENGTH;
        if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
            (pdev->device == PCI_DEVICE_ID_ASMEDIA_1142_XHCI ||
             pdev->device == PCI_DEVICE_ID_ASMEDIA_2142_XHCI))
                xhci->quirks |= XHCI_NO_64BIT_SUPPORT;

and those checks should use pdev->subsystem_vendor and pdev->subsystem_device instead.
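The difference between matching on the primary IDs and matching on the subsystem IDs can be modeled outside the kernel. The sketch below is a user-space illustration only, not kernel code: the `PciIds` class, both helper functions, and the one-element device set are hypothetical, while the numeric IDs are the ones shown in biopsin's lspci output (controller 1022:43d5, subsystem 1b21:1142).

```python
from dataclasses import dataclass

# ASMedia vendor ID as shown in the lspci output quoted in this thread.
ASMEDIA_VENDOR_ID = 0x1B21


@dataclass(frozen=True)
class PciIds:
    vendor: int
    device: int
    subsystem_vendor: int
    subsystem_device: int


def matches_primary(ids, vendor, devices):
    """Model of a check on pdev->vendor / pdev->device."""
    return ids.vendor == vendor and ids.device in devices


def matches_subsystem(ids, vendor, devices):
    """Model of a check on pdev->subsystem_vendor / pdev->subsystem_device."""
    return ids.subsystem_vendor == vendor and ids.subsystem_device in devices


# biopsin's controller: AMD 400 Series xHCI with an ASMedia subsystem.
BIOPSIN = PciIds(vendor=0x1022, device=0x43D5,
                 subsystem_vendor=0x1B21, subsystem_device=0x1142)

# Hypothetical device set, used only to exercise the two helpers.
EXAMPLE_DEVICES = {0x1142}
```

With these inputs `matches_primary(BIOPSIN, ASMEDIA_VENDOR_ID, EXAMPLE_DEVICES)` is false and `matches_subsystem(...)` is true, which is the situation described above: a quirk keyed on the primary ASMedia IDs would never be applied to this controller. Whether the kernel should match on subsystem IDs remains the open question of this comment.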

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

Created attachment 295065
asmedia_subsytem_quirks.patch

This patch applies the existing xhci ASMedia quirks to ASMedia subdevices as well.

Looking at the changelog history, those quirks helped with some USB disk issues, so perhaps the patch could help with the disk issues reported here, i.e. the alpir and biopsin cases. Please test.

Revision history for this message
In , jg.staffel (jg.staffel-linux-kernel-bugs) wrote :

None of the patches (comments 139, 147, 188) solved my problem.

Revision history for this message
In , biopsin (biopsin-linux-kernel-bugs) wrote :

@Gruszka
Your patch [https://bugzilla.kernel.org/show_bug.cgi?id=202541#c188] makes a lot of sense, thank you.
I'm currently testing it with my setup and kernel 5.4.95_x86_64.
Tested against one PATA and one SATA drive; so far I see no ill effects, but I also can't confirm or deny that it does anything in this short timespan, and much has changed since my initial post last year. I will at least continue applying it now and then throughout this year and report anything newsworthy. Thank you for your time and help!

Revision history for this message
In , raulvior.bcn (raulvior.bcn-linux-kernel-bugs) wrote :

Created attachment 295151
Dmesg of a Toshiba USB 3.0 HDD connected to USB 3.0 front port and back port.

I am having this error on Linux 5.10.10-051010 while trying to connect a USB 3.0 hard disk, Toshiba Touro 4TB (HitachiGST). If I connect the disk to a USB 2.0 port it works flawlessly.

The kernel shows a different kind of error depending on whether I connect the HDD to the front or back USB 3.0 ports of the motherboard MSI X470 Gaming Plus MAX.

lspci -vnnt:
> -[0000:00]-+-00.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-0fh) Root Complex [1022:1450]
> +-00.2 Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-0fh) I/O Memory Management Unit [1022:1451]
> +-01.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
> +-01.1-[01]----00.0 Samsung Electronics Co Ltd NVMe SSD
> Controller SM981/PM981/PM983 [144d:a808]
> +-01.3-[03-26]--+-00.0 Advanced Micro Devices, Inc. [AMD] Device
> [1022:43d0]
> | +-00.1 Advanced Micro Devices, Inc. [AMD] 400
> Series Chipset SATA Controller [1022:43c8]
> | \-00.2-[20-26]--+-00.0-[21]--
> | +-01.0-[22]----00.0 Realtek
> Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit
> Ethernet Controller [10ec:8168]
> | +-02.0-[23]--
> | +-03.0-[24]--
> | +-04.0-[25]--
> | \-08.0-[26]----00.0 ASMedia
> Technology Inc. ASM1142 USB 3.1 Host Controller [1b21:1242]
> +-02.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
> +-03.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
> +-03.1-[27]--+-00.0 Advanced Micro Devices, Inc. [AMD/ATI]
> Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] [1002:67df]
> | \-00.1 Advanced Micro Devices, Inc. [AMD/ATI]
> Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] [1002:aaf0]
> +-04.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
> +-07.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
> +-07.1-[28]--+-00.0 Advanced Micro Devices, Inc. [AMD]
> Zeppelin/Raven/Raven2 PCIe Dummy Function [1022:145a]
> | +-00.2 Advanced Micro Devices, Inc. [AMD] Family 17h
> (Models 00h-0fh) Platform Security Processor [1022:1456]
> | \-00.3 Advanced Micro Devices, Inc. [AMD] Zeppelin
> USB 3.0 Host controller [1022:145f]
> +-08.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
> +-08.1-[29]--+-00.0 Advance...


Revision history for this message
In , raulvior.bcn (raulvior.bcn-linux-kernel-bugs) wrote :

Created attachment 295183
Dmesg of a OnePlus 7 Pro connecting in USB 3.1 gen1 mode. No errors.

(In reply to raul from comment #191)
Connecting a OnePlus 7 Pro smartphone does not show any error. This phone has a USB 3.1 gen1 port and connects in that mode without errors. I can navigate the filesystem as one would expect.

Revision history for this message
In , tisaak (tisaak-linux-kernel-bugs) wrote :

Same issue with a Seagate Portable 4 TB USB 3.0 drive that I connect with usb-storage quirks as its UAS implementation is problematic. Random hangs that flood dmesg with errors.

lsusb -tv
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
    ID 1d6b:0003 Linux Foundation 3.0 root hub
    |__ Port 3: Dev 2, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
        ID 0bc2:231a Seagate RSS LLC Expansion Portable

Errors in dmesg start like this...

xhci_hcd 0000:00:10.0: WARN Cannot submit Set TR Deq Ptr
xhci_hcd 0000:00:10.0: A Set TR Deq Ptr command is pending.
usb 3-3: reset SuperSpeed Gen 1 USB device number 3 using xhci_hcd
sd 5:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=31s
sd 5:0:0:0: [sdd] tag#0 CDB: Read(16) 88 00 00 00 00 00 a4 01 ed 78 00 00 00 10 00 00

After that:

task:usb-storage state:D stack: 0 pid: 286 ppid: 2 flags:0x00004000
Call Trace:
  __schedule+0x282/0x870
  ? usleep_range+0x80/0x80
  schedule+0x46/0xb0
  schedule_timeout+0xff/0x140
  ? __prepare_to_swait+0x4b/0x70
  __wait_for_common+0xae/0x160
  usb_sg_wait+0xe0/0x1a0 [usbcore]
  usb_stor_bulk_transfer_sglist.part.0+0x64/0xb0 [usb_storage]
  usb_stor_Bulk_transport+0x188/0x410 [usb_storage]
  usb_stor_invoke_transport+0x3a/0x520 [usb_storage]
  ? __prepare_to_swait+0x4b/0x70
  ? __wait_for_common+0xed/0x160
  usb_stor_control_thread+0x185/0x280 [usb_storage]
  ? storage_probe+0x2a0/0x2a0 [usb_storage]
  kthread+0x11b/0x140
  ? __kthread_bind_mask+0x60/0x60
  ret_from_fork+0x22/0x30

Revision history for this message
In , mathias.nyman (mathias.nyman-linux-kernel-bugs) wrote :

(In reply to Zak from comment #193)
>
>
> Errors in dmesg start like this...
>
> xhci_hcd 0000:00:10.0: WARN Cannot submit Set TR Deq Ptr
> xhci_hcd 0000:00:10.0: A Set TR Deq Ptr command is pending.

There are recent major changes in this area in the xhci driver.
The above message no longer exists, new message in this case is
"Set TR Deq already pending, don't submit for x"

Can you try this on a 5.12-rc kernel?

Thanks
Mathias

Revision history for this message
In , mlkcampion (mlkcampion-linux-kernel-bugs) wrote :

Created attachment 296259
xhci no soft retry for Intel xhci 8086:06ed and 8086:31a8

Hi

I am having this issue on 2 systems when I plug in
a Hoco Hub HB16. The Hoco Hub HB16 is a 6 in 1 adapter that
includes
Type-C to USB3.0 x3
Type-C to HDMI
Type-C to RJ45 Ethernet (RealTek RTL8153, linux loads driver rtl8153b-2)
Type-C to Type-C(PD2.0)
USB Billboard device

Also when the device is plugged into a Windows10 machine
for the first time it presents a disk that contains the RTL8153
drivers, the user is provided with an option to install these. This
"disk" is not visible later.

The 2 systems where this device failed both reported
"WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state."
Both systems have Ubuntu Mate 20.10

$ uname -a
5.8.0-48-generic #54-Ubuntu SMP Fri Mar 19 14:25:20 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

1. Dell XPS 9500 (Intel(R) Core(TM) i5-10300H CPU @ 2.50GHz)
$ sudo lspci -k -nn | grep -B2 xhci
    00:14.0 USB controller [0c03]: Intel Corporation Comet Lake USB 3.1 xHCI Host Controller [8086:06ed]
 Subsystem: Dell Comet Lake USB 3.1 xHCI Host Controller [1028:097d]
 Kernel driver in use: xhci_hcd
 Kernel modules: xhci_pci
--
    7:00.0 USB controller [0c03]: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] [8086:15ec] (rev 06)
 Subsystem: Dell JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] [1028:097d]
 Kernel driver in use: xhci_hcd
 Kernel modules: xhci_pci

2. Seeed Studio Odyssey J4105 (Intel(R) Celeron(R) J4105 CPU @ 1.50GHz)
$ sudo lspci -k -nn | grep -B3 xhci
    00:15.0 USB controller [0c03]: Intel Corporation Device [8086:31a8] (rev 03)
 DeviceName: Onboard - Other
 Subsystem: Intel Corporation Device [8086:7270]
 Kernel driver in use: xhci_hcd
 Kernel modules: xhci_pci

I applied the changes in Stanislaw's patch at comment 176, I added the
PCI IDs to match both my systems.

I can confirm that with the patch applied both systems no longer reported the
issue "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state."

Just to note that on the Dell XPS I use the Dell DA20 Adapter which is a Type-C
to USB and HDMI adapter. This appears to have an ASIX Elec. Corp. AX88179
USB 3.0 to Gigabit Ethernet which I don't have any issues with.
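Since several comments in this thread adapt the patch by adding their own PCI IDs, a small parser for `lspci -k -nn` output can help collect the controller and subsystem IDs without copying them by hand. This is a sketch under assumptions: the function name `parse_lspci` and the dictionary layout are made up for illustration, and the parser only understands the output shape quoted in this thread.

```python
import re

# Slot line, e.g. "00:14.0 USB controller [0c03]: ... [8086:06ed]"
SLOT_RE = re.compile(
    r"^\s*((?:[0-9a-f]{4}:)?[0-9a-f]{1,2}:[0-9a-f]{2}\.[0-7])\s+(.*)$")
# A "[vendor:device]" pair; class codes such as "[0c03]" have no colon
# and are therefore not matched.
ID_RE = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]")


def parse_lspci(text):
    """Return one dict per device with slot, IDs, subsystem IDs and driver."""
    devices = []
    current = None
    for line in text.splitlines():
        slot = SLOT_RE.match(line)
        if slot:
            ids = ID_RE.findall(slot.group(2))
            current = {"slot": slot.group(1),
                       "id": ids[-1] if ids else None,
                       "subsystem": None,
                       "driver": None}
            devices.append(current)
            continue
        if current is None:
            continue
        stripped = line.strip()
        if stripped.startswith("Subsystem:"):
            ids = ID_RE.findall(stripped)
            current["subsystem"] = ids[-1] if ids else None
        elif stripped.startswith("Kernel driver in use:"):
            current["driver"] = stripped.split(":", 1)[1].strip()
    return devices


# Output copied from the Dell XPS 9500 listing above.
SAMPLE = """\
00:14.0 USB controller [0c03]: Intel Corporation Comet Lake USB 3.1 xHCI Host Controller [8086:06ed]
 Subsystem: Dell Comet Lake USB 3.1 xHCI Host Controller [1028:097d]
 Kernel driver in use: xhci_hcd
 Kernel modules: xhci_pci
"""
```

Filtering the result on `driver == "xhci_hcd"` leaves the controllers relevant to this bug, with both ID pairs ready to paste into a report.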

Revision history for this message
In , luke-jr+linuxbugs (luke-jr+linuxbugs-linux-kernel-bugs) wrote :

Encountered this with a PCI-e card using ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller

Moved to my native "Intel Corporation Device a3af" USB bus, this error disappeared (though other problems remain in my case)

Linux 5.10.33

Of potential noteworthiness: When I got my Talos II, I tried to move this ASMedia USB PCI-e card to it, and found it was immediately shutdown by the IOMMU whenever I would try to use it at all. It seems the firmware is garbage.

IIRC, someone was getting close to an open source firmware replacement without those issues... would be interesting to see if it helps with this bug as well.

Revision history for this message
In , dront78 (dront78-linux-kernel-bugs) wrote :

same problem
5.12.12-arch1-1 #1 SMP PREEMPT Fri, 18 Jun 2021 21:59:22 +0000 x86_64 GNU/Linux

GPD Pocket

00:00.0 Host bridge [0600]: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Series SoC Transaction Register [8086:2280] (rev 34)
 Subsystem: Intel Corporation Device [8086:7270]
 Kernel driver in use: iosf_mbi_pci
00:02.0 VGA compatible controller [0300]: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Integrated Graphics Controller [8086:22b0] (rev 34)
 DeviceName: Onboard IGD
 Subsystem: Intel Corporation Device [8086:7270]
 Kernel driver in use: i915
 Kernel modules: i915
00:0b.0 Signal processing controller [1180]: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Series Power Management Controller [8086:22dc] (rev 34)
 Subsystem: Intel Corporation Device [8086:7270]
 Kernel driver in use: proc_thermal
 Kernel modules: processor_thermal_device
00:14.0 USB controller [0c03]: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Series USB xHCI Controller [8086:22b5] (rev 34)
 Subsystem: Intel Corporation Device [8086:7270]
 Kernel driver in use: xhci_hcd
 Kernel modules: xhci_pci
00:1a.0 Encryption controller [1080]: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Series Trusted Execution Engine [8086:2298] (rev 34)
 Subsystem: Intel Corporation Device [8086:7270]
 Kernel modules: mei_txe
00:1c.0 PCI bridge [0604]: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Series PCI Express Port #1 [8086:22c8] (rev 34)
 Kernel driver in use: pcieport
00:1f.0 ISA bridge [0601]: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Series PCU [8086:229c] (rev 34)
 Subsystem: Intel Corporation Device [8086:7270]
 Kernel modules: lpc_ich
01:00.0 Network controller [0280]: Broadcom Inc. and subsidiaries BCM4356 802.11ac Wireless Network Adapter [14e4:43ec] (rev 02)
 Subsystem: Gemtek Technology Co., Ltd Device [17f9:0036]
 Kernel driver in use: brcmfmac
 Kernel modules: brcmfmac

# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.
Table at 0x5B8DE000.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
 Vendor: American Megatrends Inc.
 Version: 5.11
 Release Date: 06/28/2017
 Address: 0xF0000
 Runtime Size: 64 kB
 ROM Size: 4 MB
 Characteristics:
  PCI is supported
  BIOS is upgradeable
  BIOS shadowing is allowed
  Boot from CD is supported
  Selectable boot is supported
  BIOS ROM is socketed
  EDD is supported
  5.25"/1.2 MB floppy services are supported (int 13h)
  3.5"/720 kB floppy services are supported (int 13h)
  3.5"/2.88 MB floppy services are supported (int 13h)
  Print screen service is supported (int 5h)
  Serial services are supported (int 14h)
  Printer services are supported (int 17h)
  ACPI is supported
  USB legacy is supported
  BIOS boot specification is supported
  Targeted content distribution is supported
  UEFI is supported
 BIOS Revision: 5.11

Handle 0x0001, DMI type 1, 27 bytes
System Information
 Manufacturer: Default string
 Product Name: Default string
 Version: Default string
 Serial Number: Default string
 UUID: 03000200-0400-0500-0006-000700080009
 Wake-up ...

Revision history for this message
M K S (muhkamsad) wrote :
description: updated
M K S (muhkamsad)
description: updated
Changed in ubuntu-release-upgrader:
importance: Unknown → High
status: Unknown → Confirmed
Revision history for this message
In , antdev66 (antdev66-linux-kernel-bugs) wrote :

I have the same problem with kernels 5.13.12 and 5.14.0-rc7:

dmesg:
xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

journalctl:
ago 24 18:38:40 SERVER kernel: sd 4:0:0:0: [sda] tag#3 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=30s

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ubuntu-release-upgrader (Ubuntu):
status: New → Confirmed
Revision history for this message
Sertac TULLUK (stulluk) wrote :

I also experience exactly the same issue on multiple USB devices (USB-WIFI or a USB-Webcam), only on my brand new AMD mainboard (ASRock model: B550M-HDV)

I tried both focal and hirsute with latest kernels on my OldPC (ASUSTeK model: M5A78L-M LX3) and on my IntelNUC (NUC8BEB) and this issue does not happen (Tried with same USB-WIFI and USB-Webcam devices).

Issue is easily reproducible by inserting USB-WIFI and then executing "ip a" on a shell.

Revision history for this message
In , stulluk (stulluk-linux-kernel-bugs) wrote :

I also experience exactly the same issue on multiple USB devices (USB-WIFI or a USB-Webcam), only on my brand new AMD mainboard (ASRock model: B550M-HDV)

I tried both ubuntu focal and hirsute with latest kernels on my OldPC (ASUSTeK model: M5A78L-M LX3) and on my IntelNUC (NUC8BEB) and this issue does not happen (Tried with same USB-WIFI and USB-Webcam devices).

Issue is easily reproducible by inserting USB-WIFI and then executing "ip a" on a shell.

Revision history for this message
In , dion (dion-linux-kernel-bugs) wrote :

I also have exactly the same problem, but with slightly different hardware.

In my case it's a USB DAC branded as "Qudelix-5K". As far as I understand, it's a USB1 device.

[ 174.358189] usb 5-2.3.2.2.1.1: new full-speed USB device number 17 using xhci_hcd
[ 174.475229] usb 5-2.3.2.2.1.1: New USB device found, idVendor=0a12, idProduct=4025, bcdDevice=19.70
[ 174.475232] usb 5-2.3.2.2.1.1: New USB device strings: Mfr=1, Product=8, SerialNumber=3
[ 174.475233] usb 5-2.3.2.2.1.1: Product: Qudelix-5K USB DAC/MIC 48KHz
[ 174.475234] usb 5-2.3.2.2.1.1: Manufacturer: QTIL
[ 174.475235] usb 5-2.3.2.2.1.1: SerialNumber: ABCDEF0123456789

It produces corrupted sound (actually some noise) after just a few seconds of playback if connected to a Dell WD19TB Thunderbolt docking station. The issue happens with the USB-A ports on the dock plus one Type-C port (front). The second Type-C port (named "Type-C with Thunderbolt 3 port") works.

When the noise happens, I get the following in dmesg:

xhci_hcd 0000:3a:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 5 comp_code 1
xhci_hcd 0000:3a:00.0: Looking for event-dma 00000000ffe940f0 trb-start 00000000ffe94100 trb-end 00000000ffe94100 seg-start 00000000ffe94000 seg-end 00000000ffe94ff0
xhci_hcd 0000:3a:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 5 comp_code 1
xhci_hcd 0000:3a:00.0: Looking for event-dma 00000000ffe949b0 trb-start 00000000ffe949c0 trb-end 00000000ffe949c0 seg-start 00000000ffe94000 seg-end 00000000ffe94ff0

I've tried to add/remove extra USB hubs (originally Qudelix was plugged to internal USB3 hub of monitor). But even if plugged directly to dock, it produces corrupted sound.

Another important thing: this dock has built-in Ethernet with r8153 chipset like mentioned above.

After reading the comments here, I tried to disable soft retry using the following patch:

diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 1c9a7957c45c..07cbcf50160c 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -189,10 +189,11 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)

        if (pdev->vendor == PCI_VENDOR_ID_INTEL) {
                xhci->quirks |= XHCI_LPM_SUPPORT;
                xhci->quirks |= XHCI_INTEL_HOST;
                xhci->quirks |= XHCI_AVOID_BEI;
+               xhci->quirks |= XHCI_NO_SOFT_RETRY;
        }
        if (pdev->vendor == PCI_VENDOR_ID_INTEL &&
                        pdev->device == PCI_DEVICE_ID_INTEL_PANTHERPOINT_XHCI) {
                xhci->quirks |= XHCI_EP_LIMIT_QUIRK;
                xhci->limit_active_eps = 64;

And it completely fixed the issue for me. The DAC produces clear sound even when connected through a chain of two hubs!
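The patch above sets the quirk for every Intel host, whereas earlier comments added individual PCI IDs. The two scopes can be compared with a toy model. Everything here is an illustrative assumption: the `Quirk` bit positions are not the kernel's values, the function names are invented, and the per-device set only contains the two IDs named in the title of attachment 296259.

```python
from enum import IntFlag


class Quirk(IntFlag):
    # Bit positions are illustrative only, not the kernel's values.
    LPM_SUPPORT = 1 << 0
    INTEL_HOST = 1 << 1
    AVOID_BEI = 1 << 2
    NO_SOFT_RETRY = 1 << 3


INTEL_VENDOR_ID = 0x8086
INTEL_BASE = Quirk.LPM_SUPPORT | Quirk.INTEL_HOST | Quirk.AVOID_BEI

# IDs named in the title of attachment 296259 (8086:06ed and 8086:31a8).
NO_SOFT_RETRY_DEVICES = {(0x8086, 0x06ED), (0x8086, 0x31A8)}


def quirks_vendor_wide(vendor, device):
    """Model of the patch above: every Intel host gets NO_SOFT_RETRY."""
    quirks = Quirk(0)
    if vendor == INTEL_VENDOR_ID:
        quirks |= INTEL_BASE | Quirk.NO_SOFT_RETRY
    return quirks


def quirks_per_device(vendor, device):
    """Model of a per-device list: only listed hosts get NO_SOFT_RETRY."""
    quirks = Quirk(0)
    if vendor == INTEL_VENDOR_ID:
        quirks |= INTEL_BASE
    if (vendor, device) in NO_SOFT_RETRY_DEVICES:
        quirks |= Quirk.NO_SOFT_RETRY
    return quirks
```

For the 8086:02ed controller from the lspci output in this comment, the vendor-wide model sets the flag while the per-device model does not, which is why a per-device list has to grow with each new report.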

PS.
lspci -k -nn | grep -B2 xhci
00:14.0 USB controller [0c03]: Intel Corporation Comet Lake PCH-LP USB 3.1 xHCI Host Controller [8086:02ed]
        Subsystem: Hewlett-Packard Company Comet Lake PCH-LP USB 3.1 xHCI Host Controller [103c:8724]
        Kernel driver in use: xhci_hcd
        Kernel modules: xhci_pci
--
37:00.0 USB controller [0c03]: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] [8086:15ec] (rev 06)
        Subsystem: Hewlett-P...


Revision history for this message
In , raulvior.bcn (raulvior.bcn-linux-kernel-bugs) wrote :

Turns out the problem was the cable: it was too long. A shorter USB 3.0 cable (1.8 m) allowed a stable connection. On the same Linux 5.13 (the previous dmesg was on Linux 5.10), the longer 3-meter cable kept failing, while with the 1.8-meter cable the HDD works without issue.

(In reply to raul from comment #191)

Revision history for this message
In , S.Braendlin (s.braendlin-linux-kernel-bugs) wrote :

Hi,
I also have issues with USB3 on my Debian 10 with kernel 5.10.0-0.bpo.5-amd64, which do not appear when using a USB2 port:

Aug 6 13:20:14 media-server kernel: [ 964.069355] scsi host17: uas_eh_device_reset_handler start
Aug 6 13:20:14 media-server kernel: [ 964.197532] usb 2-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Aug 6 13:20:14 media-server kernel: [ 964.219053] scsi host17: uas_eh_device_reset_handler success
Aug 6 13:20:18 media-server kernel: [ 968.137601] task:sync state:D stack: 0 pid:12237 ppid: 11291 flags:0x00004324
Aug 6 13:20:18 media-server kernel: [ 968.137607] Call Trace:
Aug 6 13:20:18 media-server kernel: [ 968.137621] __schedule+0x2be/0x770
Aug 6 13:20:18 media-server kernel: [ 968.137630] schedule+0x3c/0xa0
Aug 6 13:20:18 media-server kernel: [ 968.137635] io_schedule+0x12/0x40
Aug 6 13:20:18 media-server kernel: [ 968.137644] wait_on_page_bit+0x127/0x230
Aug 6 13:20:18 media-server kernel: [ 968.137651] ? __page_cache_alloc+0x80/0x80
Aug 6 13:20:18 media-server kernel: [ 968.137657] wait_on_page_writeback+0x25/0x70
Aug 6 13:20:18 media-server kernel: [ 968.137663] __filemap_fdatawait_range+0x89/0xf0
Aug 6 13:20:18 media-server kernel: [ 968.137673] ? sync_inodes_one_sb+0x20/0x20
Aug 6 13:20:18 media-server kernel: [ 968.137679] filemap_fdatawait_keep_errors+0x1a/0x40
Aug 6 13:20:18 media-server kernel: [ 968.137684] iterate_bdevs+0xad/0x150
Aug 6 13:20:18 media-server kernel: [ 968.137691] ksys_sync+0x7c/0xb0
Aug 6 13:20:18 media-server kernel: [ 968.137697] __do_sys_sync+0xa/0x10
Aug 6 13:20:18 media-server kernel: [ 968.137704] do_syscall_64+0x33/0x80
Aug 6 13:20:18 media-server kernel: [ 968.137709] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Aug 6 13:20:18 media-server kernel: [ 968.137714] RIP: 0033:0x7fc4ec0529aa
Aug 6 13:20:18 media-server kernel: [ 968.137717] RSP: 002b:00007ffcddf49048 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
Aug 6 13:20:18 media-server kernel: [ 968.137723] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc4ec0529aa
Aug 6 13:20:18 media-server kernel: [ 968.137725] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000a8002000
Aug 6 13:20:18 media-server kernel: [ 968.137728] RBP: 0000000000000000 R08: 0000555ba9703dcf R09: 00007ffcddf4afe2
Aug 6 13:20:18 media-server kernel: [ 968.137730] R10: 00007fc4ec01a201 R11: 0000000000000246 R12: 0000000000000001
Aug 6 13:20:18 media-server kernel: [ 968.137733] R13: 0000000000000001 R14: 00007ffcddf49158 R15: 0000000000000000

Nick Rosbrook (enr0n)
affects: ubuntu-release-upgrader → linux
Changed in ubuntu-release-upgrader (Ubuntu):
status: Confirmed → Invalid
Revision history for this message
In , pupilla (pupilla-linux-kernel-bugs) wrote :

Hello everyone,

I encountered the problem with kernel 6.0.0-rc3 on a Lenovo T470 laptop with a USB3 ASIX network card. The system was started with the parameter intel_idle.max_cstate=1, and this appears to affect the likelihood of the bug appearing. I have now rebooted the system without this parameter.

I have another similar setup (same laptop and same usb3 network card, but with linux 6.0.0-rc2) that has been active for 8 days started without the parameter intel_idle.max_cstate=1 and the problem has not occurred to date.

The distribution is Slackware 15 (64 bit).

This is the full output of dmesg.

Any feedback is welcome.

Marco

[ 0.000000] Linux version 6.0.0-rc3 (root@Cherepakha) (gcc (GCC) 11.2.0, GNU ld version 2.37-slack15) #1 SMP PREEMPT_DYNAMIC Tue Aug 30 16:07:18 CEST 2022
[ 0.000000] Command line: auto BOOT_IMAGE=Linux ro root=10303 intel_idle.max_cstate=1
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: xstate_offset[3]: 832, xstate_sizes[3]: 64
[ 0.000000] x86/fpu: xstate_offset[4]: 896, xstate_sizes[4]: 64
[ 0.000000] x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format.
[ 0.000000] signal: max sigframe size: 1616
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009cfff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009d000-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000003fffffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000040000000-0x00000000403fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000040400000-0x000000008b79bfff] usable
[ 0.000000] BIOS-e820: [mem 0x000000008b79c000-0x0000000090652fff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000090653000-0x0000000090653fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x0000000090654000-0x000000009b52cfff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000009b52d000-0x000000009b599fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x000000009b59a000-0x000000009b5fefff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x000000009b5ff000-0x000000009f7fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000f0000000-0x00000000f3ffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fd000000-0x00000000fe7fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed10000-0x00000000fed19fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed84000-0x00000000fed84fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00...

Revision history for this message
In , pupilla (pupilla-linux-kernel-bugs) wrote :

Hello everyone,

unfortunately it happened again (system started without parameters):

[ 9.561808] br0: port 2(eth1) entered forwarding state
[95735.974041] usb 2-1: USB disconnect, device number 2
[95735.974215] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[95735.974439] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[95735.974471] ax88179_178a 2-1:1.0 eth1: unregister 'ax88179_178a' usb-0000:00:14.0-1, ASIX AX88179 USB 3.0 Gigabit Ethernet
[95735.974523] ax88179_178a 2-1:1.0 eth1: Failed to read reg index 0x0002: -19
[95735.974532] ax88179_178a 2-1:1.0 eth1: Failed to write reg index 0x0002: -19
[95735.974595] br0: port 2(eth1) entered disabled state
[95735.974783] device eth1 left promiscuous mode
[95735.974790] br0: port 2(eth1) entered disabled state
[95735.992489] ax88179_178a 2-1:1.0 eth1 (unregistered): Failed to write reg index 0x0002: -19
[95735.992503] ax88179_178a 2-1:1.0 eth1 (unregistered): Failed to write reg index 0x0001: -19
[95735.992510] ax88179_178a 2-1:1.0 eth1 (unregistered): Failed to write reg index 0x0002: -19
[95736.215301] usb 2-1: new SuperSpeed USB device number 4 using xhci_hcd
[95736.566562] ax88179_178a 2-1:1.0 eth1: register 'ax88179_178a' at usb-0000:00:14.0-1, ASIX AX88179 USB 3.0 Gigabit Ethernet, 00:0e:c6:81:79:01

Marco

Revision history for this message
ske5074 (ske5074-linux-kernel-bugs) wrote :

I also have the issue. Using Proxmox 7.2 (Debian Bullseye) with a Lenovo M910q core-i7-7700T, using two TPLink UE300 (RTL8153) USB to 1Gbe Ethernet adapters. Each one is stable in a lower USB slot. Swapping the adapters does not change the behavior and only impacts the USB device in the higher slot. Moving them to different ports changes nothing.

Easily reproducible with the following commands. Basically, I bring bond0 up again, which works initially; then I get the xhci_hcd warning and the link goes down again. System details are also below.

root@higgins:~# dmesg -C ; ifup -a ; ip link | grep enx ; \
> dmesg -H ; dmesg -C ; sleep 70 ; \
> ip link | grep enx ; dmesg -H
3: enxd03745be5afc: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP mode DEFAULT group default qlen 1000
16: enx54af9786ab11: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP mode DEFAULT group default qlen 1000

[Sep 3 11:05] device enx54af9786ab11 entered promiscuous mode
[ +0.001236] bond0: (slave enx54af9786ab11): Enslaving as a backup interface with a down link
[ +0.006363] vmbr0: the hash_elasticity option has been deprecated and is always 16
[ +0.013972] r8152 2-4:1.0 enx54af9786ab11: Promiscuous mode enabled
[ +0.001344] r8152 2-4:1.0 enx54af9786ab11: carrier on

3: enxd03745be5afc: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP mode DEFAULT group default qlen 1000
17: enx54af9786ab11: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000

[Sep 3 11:05] bond0: (slave enx54af9786ab11): link status definitely up, 1000 Mbps full duplex
[Sep 3 11:06] usb 2-4: USB disconnect, device number 12
[ +0.001544] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[ +0.001435] bond0: (slave enx54af9786ab11): Releasing backup interface
[ +0.029081] device enx54af9786ab11 left promiscuous mode
[ +0.316190] usb 2-4: new SuperSpeed USB device number 13 using xhci_hcd
[ +0.022053] usb 2-4: New USB device found, idVendor=2357, idProduct=0601, bcdDevice=30.00
[ +0.001297] usb 2-4: New USB device strings: Mfr=1, Product=2, SerialNumber=6
[ +0.001337] usb 2-4: Product: USB 10/100/1000 LAN
[ +0.001261] usb 2-4: Manufacturer: TP-Link
[ +0.001208] usb 2-4: SerialNumber: 000001
[ +0.137200] usb 2-4: reset SuperSpeed USB device number 13 using xhci_hcd
[ +0.049197] r8152 2-4:1.0: load rtl8153a-4 v2 02/07/20 successfully
[ +0.030905] r8152 2-4:1.0 eth0: v1.12.12
[ +0.007834] r8152 2-4:1.0 enx54af9786ab11: renamed from eth0
root@higgins:~#

-------
System Details
-------

root@higgins:~# uname -a
Linux higgins 5.15.39-4-pve #1 SMP PVE 5.15.39-4 (Mon, 08 Aug 2022 15:11:15 +0200) x86_64 GNU/Linux

root@higgins:~# lspci -k -nn | grep -B2 xhci
00:14.0 USB controller [0c03]: Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller [8086:a2af]
        Subsystem: Lenovo 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller [17aa:310b]
        Kernel driver in use: xhci_hcd
        Kernel modules: xhci_pci

root@higgins:~# lsusb -tv
/: Bus 02.Port 1: D...


Revision history for this message
ske5074 (ske5074-linux-kernel-bugs) wrote :

(In reply to Sean Kennedy from comment #205)
> I also have the issue. Using Proxmox 7.2 (Debian Bullseye) with a Lenovo
> M910q core-i7-7700T, using two TPLink UE300 (RTL8153) USB to 1Gbe Ethernet
> adapters. Each one is stable in a lower USB slot. Swapping the adapters does
> not change the behavior and only impacts the USB device in the higher slot.
> Moving them to different ports changes nothing.

Update: I tried a different dongle, a 2.5GbE one, and I also have two hard drives attached to the system. No matter where the 2.5GbE dongle is attached, it eventually errors with "WARN Set TR Deq Ptr cmd failed". The error rate is only around six times a day right now:

8156 Realtek Semiconductor Corp. USB 10/100/1G/2.5G LAN

# dmesg -T | grep xhci
[Tue Sep 6 13:37:13 2022] xhci_hcd 0000:00:14.0: xHCI Host Controller
[Tue Sep 6 13:37:13 2022] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 1
[Tue Sep 6 13:37:13 2022] xhci_hcd 0000:00:14.0: hcc params 0x200077c1 hci version 0x100 quirks 0x0000000000009810
[Tue Sep 6 13:37:13 2022] usb usb1: Manufacturer: Linux 5.15.39-4-pve xhci-hcd
[Tue Sep 6 13:37:13 2022] xhci_hcd 0000:00:14.0: xHCI Host Controller
[Tue Sep 6 13:37:13 2022] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 2
[Tue Sep 6 13:37:13 2022] xhci_hcd 0000:00:14.0: Host supports USB 3.0 SuperSpeed
[Tue Sep 6 13:37:13 2022] usb usb2: Manufacturer: Linux 5.15.39-4-pve xhci-hcd
[Tue Sep 6 13:37:13 2022] usb 2-1: new SuperSpeed USB device number 2 using xhci_hcd
[Tue Sep 6 13:37:14 2022] usb 2-3: new SuperSpeed USB device number 3 using xhci_hcd
[Tue Sep 6 13:37:14 2022] usb 2-4: new SuperSpeed USB device number 4 using xhci_hcd
[Tue Sep 6 14:39:22 2022] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[Tue Sep 6 14:39:22 2022] usb 2-4: new SuperSpeed USB device number 5 using xhci_hcd
[Tue Sep 6 18:44:01 2022] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[Tue Sep 6 18:44:01 2022] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[Tue Sep 6 18:44:02 2022] usb 2-4: new SuperSpeed USB device number 6 using xhci_hcd
[Tue Sep 6 22:19:06 2022] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[Tue Sep 6 22:19:07 2022] usb 2-4: new SuperSpeed USB device number 7 using xhci_hcd

Since this drops the device from the system and takes the link offline, I created a simple script, run from cron once a minute, that detects when zero ethernet devices are UP and runs ifup -a. It's clunky but works.

crontab:
# m h dom mon dow command
* * * * * /root/fixnet.sh >/dev/null 2>&1

fixnet.sh:
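#!/bin/sh
# Watchdog: bring the network back after the USB adapter is dropped.

# Count the enx* (USB Ethernet) interfaces that report UP.
STATE=$(ip link | grep " enx" | grep -c UP)
if [ "$STATE" -gt 0 ]; then
  # All good. Exit
  exit 0
fi

# No link is up: bring every interface up again and let it settle.
/usr/sbin/ifup -a
sleep 20

if ping -c 1 10.0.0.1 | grep -q "1 received"; then
  # Network looks good. Exit.
  exit 0
fi

# Wait about five more minutes before the final check.
sleep 310
if ! ping -c 1 10.0.0.1 | grep -q "1 received"; then
  # The network is still down.
  systemctl reboot
fi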
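
One caveat with the first test in fixnet.sh: matching the bare string UP also matches the UP flag of an interface whose operational state is DOWN (for example, one with no carrier). The following is a minimal sketch of a stricter count, assuming the adapters keep the enx name prefix seen in this thread; the function name count_up_enx is hypothetical.

```shell
#!/bin/sh
# Count USB Ethernet interfaces (enx* names) whose operational state is UP.
# Reads `ip link` output on stdin. count_up_enx is a hypothetical helper name.
count_up_enx() {
  # Interface header lines look like:
  #   3: enxd03745be5afc: <...,UP,LOWER_UP> mtu 1500 ... state UP mode DEFAULT ...
  # Require both the enx name prefix and the literal "state UP", so a set UP
  # flag on a link that reports "state DOWN" is not counted.
  # "n + 0" prints 0 instead of an empty string when nothing matches.
  awk '/^[0-9]+: enx/ && / state UP / { n++ } END { print n + 0 }'
}
```

It could replace the STATE assignment as `STATE=$(ip link | count_up_enx)`; the rest of the script would stay the same.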

no longer affects: ubuntu-release-upgrader (Ubuntu)
Revision history for this message
james (james-linux-kernel-bugs) wrote :

I'm using a 2.5gb ethernet usb device and getting this error intermittently (a dozen times per day).

$ uname -a
Linux hephaestus 5.4.0-135-generic #152-Ubuntu SMP Wed Nov 23 20:19:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

$ lsusb
<snip>
Bus 003 Device 016: ID 0bda:8156 Realtek Semiconductor Corp. USB 10/100/1G/2.5G

This is what plays out via /var/log/syslog each time:

Dec 21 10:26:47 hephaestus kernel: [346923.166782] usb 3-4: USB disconnect, device number 15
Dec 21 10:26:47 hephaestus kernel: [346923.166913] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Dec 21 10:26:47 hephaestus kernel: [346923.166927] cdc_ncm 3-4:2.0 eth1: unregister 'cdc_ncm' usb-0000:00:14.0-4, CDC NCM
Dec 21 10:26:47 hephaestus kernel: [346923.167071] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Dec 21 10:26:47 hephaestus kernel: [346923.170644] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Dec 21 10:26:47 hephaestus dhclient[320734]: receive_packet failed on eth1: Network is down
Dec 21 10:26:47 hephaestus systemd[1]: Stopping ifup for eth1...
Dec 21 10:26:47 hephaestus dhclient[325522]: Killed old client process
Dec 21 10:26:47 hephaestus ifdown[325522]: Killed old client process
Dec 21 10:26:47 hephaestus kernel: [346923.478913] usb 3-4: new SuperSpeed Gen 1 USB device number 16 using xhci_hcd
Dec 21 10:26:47 hephaestus kernel: [346923.499567] usb 3-4: New USB device found, idVendor=0bda, idProduct=8156, bcdDevice=31.00
Dec 21 10:26:47 hephaestus kernel: [346923.499573] usb 3-4: New USB device strings: Mfr=1, Product=2, SerialNumber=6
Dec 21 10:26:47 hephaestus kernel: [346923.499577] usb 3-4: Product: USB 10/100/1G/2.5G LAN
Dec 21 10:26:47 hephaestus kernel: [346923.499580] usb 3-4: Manufacturer: Realtek
Dec 21 10:26:47 hephaestus kernel: [346923.499583] usb 3-4: SerialNumber: 001000001
Dec 21 10:26:47 hephaestus kernel: [346923.523736] cdc_ncm 3-4:2.0: MAC-Address: xx:xx:xx:xx:xx:xx
Dec 21 10:26:47 hephaestus kernel: [346923.523742] cdc_ncm 3-4:2.0: setting rx_max = 16384
Dec 21 10:26:47 hephaestus kernel: [346923.523836] cdc_ncm 3-4:2.0: setting tx_max = 16384
Dec 21 10:26:47 hephaestus kernel: [346923.524578] cdc_ncm 3-4:2.0 eth1: register 'cdc_ncm' at usb-0000:00:14.0-4, CDC NCM, xx:xx:xx:xx:xx:xx
Dec 21 10:26:47 hephaestus systemd-udevd[325501]: Using default interface naming scheme 'v245'.
Dec 21 10:26:47 hephaestus systemd-udevd[325501]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Dec 21 10:26:47 hephaestus systemd[1]: Found device USB_10_100_1G_2.5G_LAN.
(then things start back up and the ethernet link goes live again after about 10 seconds)

Revision history for this message
james (james-linux-kernel-bugs) wrote :

FYI: I built a 5.4 kernel with the patch discussed earlier in this thread, and I still get the error multiple times per day.

(In reply to James H from comment #207)
> I'm using a 2.5gb ethernet usb device and getting this error intermittently
> (a dozen times per day).
>
> $ uname -a
> Linux hephaestus 5.4.0-135-generic #152-Ubuntu SMP Wed Nov 23 20:19:22 UTC
> 2022 x86_64 x86_64 x86_64 GNU/Linux
>
>
> $ lsusb
> <snip>
> Bus 003 Device 016: ID 0bda:8156 Realtek Semiconductor Corp. USB
> 10/100/1G/2.5G
>
>
>
> This is what plays out via /var/log/syslog each time:
>
> Dec 21 10:26:47 hephaestus kernel: [346923.166782] usb 3-4: USB disconnect,
> device number 15
> Dec 21 10:26:47 hephaestus kernel: [346923.166913] xhci_hcd 0000:00:14.0:
> WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
> Dec 21 10:26:47 hephaestus kernel: [346923.166927] cdc_ncm 3-4:2.0 eth1:
> unregister 'cdc_ncm' usb-0000:00:14.0-4, CDC NCM
> Dec 21 10:26:47 hephaestus kernel: [346923.167071] xhci_hcd 0000:00:14.0:
> WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
> Dec 21 10:26:47 hephaestus kernel: [346923.170644] xhci_hcd 0000:00:14.0:
> WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
> Dec 21 10:26:47 hephaestus dhclient[320734]: receive_packet failed on eth1:
> Network is down
> Dec 21 10:26:47 hephaestus systemd[1]: Stopping ifup for eth1...
> Dec 21 10:26:47 hephaestus dhclient[325522]: Killed old client process
> Dec 21 10:26:47 hephaestus ifdown[325522]: Killed old client process
> Dec 21 10:26:47 hephaestus kernel: [346923.478913] usb 3-4: new SuperSpeed
> Gen 1 USB device number 16 using xhci_hcd
> Dec 21 10:26:47 hephaestus kernel: [346923.499567] usb 3-4: New USB device
> found, idVendor=0bda, idProduct=8156, bcdDevice=31.00
> Dec 21 10:26:47 hephaestus kernel: [346923.499573] usb 3-4: New USB device
> strings: Mfr=1, Product=2, SerialNumber=6
> Dec 21 10:26:47 hephaestus kernel: [346923.499577] usb 3-4: Product: USB
> 10/100/1G/2.5G LAN
> Dec 21 10:26:47 hephaestus kernel: [346923.499580] usb 3-4: Manufacturer:
> Realtek
> Dec 21 10:26:47 hephaestus kernel: [346923.499583] usb 3-4: SerialNumber:
> 001000001
> Dec 21 10:26:47 hephaestus kernel: [346923.523736] cdc_ncm 3-4:2.0:
> MAC-Address: xx:xx:xx:xx:xx:xx
> Dec 21 10:26:47 hephaestus kernel: [346923.523742] cdc_ncm 3-4:2.0: setting
> rx_max = 16384
> Dec 21 10:26:47 hephaestus kernel: [346923.523836] cdc_ncm 3-4:2.0: setting
> tx_max = 16384
> Dec 21 10:26:47 hephaestus kernel: [346923.524578] cdc_ncm 3-4:2.0 eth1:
> register 'cdc_ncm' at usb-0000:00:14.0-4, CDC NCM, xx:xx:xx:xx:xx:xx
> Dec 21 10:26:47 hephaestus systemd-udevd[325501]: Using default interface
> naming scheme 'v245'.
> Dec 21 10:26:47 hephaestus systemd-udevd[325501]: ethtool: autonegotiation
> is unset or enabled, the speed and duplex are not writable.
> Dec 21 10:26:47 hephaestus systemd[1]: Found device USB_10_100_1G_2.5G_LAN.
> (then things start back up and the ethernet link goes live again after about
> 10 seconds)

Revision history for this message
Sven Mohr (svmohr) wrote :

I also get random disconnects on kernel 6.3.0-7-generic with a Samsung T7 Shield external SSD drive. Unfortunately, this error is hard to reproduce; it usually takes hours before it occurs the first time.

System:
  Kernel: 6.3.0-7-generic arch: x86_64 bits: 64 compiler: N/A Console: pty pts/10 Distro: Ubuntu
    23.10 (Mantic Minotaur)
Machine:
  Type: Server System: Supermicro product: C9Z390-PGW v: 0123456789 serial: <filter>
  Mobo: Supermicro model: C9Z390-PGW v: 1.01A serial: <filter> UEFI: American Megatrends v: 1.3
    date: 06/03/2020
CPU:
  Info: 8-core model: Intel Core i9-9900K bits: 64 type: MT MCP arch: Coffee Lake rev: D cache:
    L1: 512 KiB L2: 2 MiB L3: 16 MiB
  Speed (MHz): avg: 3687 high: 5002 min/max: 800/5000 cores: 1: 5002 2: 3600 3: 3600 4: 3600
    5: 3600 6: 3600 7: 3600 8: 3600 9: 3600 10: 3600 11: 3600 12: 3600 13: 3600 14: 3600 15: 3600
    16: 3600 bogomips: 115200
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx

/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 10000M
    ID 1d6b:0003 Linux Foundation 3.0 root hub
    |__ Port 4: Dev 10, If 0, Class=Mass Storage, Driver=uas, 10000M
        ID 04e8:61fb Samsung Electronics Co., Ltd

BOOT_IMAGE=/boot/vmlinuz-6.3.0-7-generic root=UUID=2c8c7990-bb1d-47dc-a70c-0272867b1807 ro quiet splash intel_iommu=on iommu=pt pcie_aspm=off initcall_blacklist=sysfb_init rd.modules-load=vfio-pci vfio_pci.ids=10de:1e07,10de:10f7,10de:1ad6,10de:1ad7,1462:3710 vt.handoff=7

[349280.239403] usb 2-4: USB disconnect, device number 9
[349280.239689] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[349280.239695] usb 2-4: cmd cmplt err -108
[349280.239702] sd 9:0:0:0: [sdh] tag#13 uas_zap_pending 0 uas-tag 1 inflight: CMD
[349280.239705] sd 9:0:0:0: [sdh] tag#13 CDB: Write(16) 8a 00 00 00 00 00 d3 28 e4 00 00 00 00 d8 00 00
[349280.239724] sd 9:0:0:0: [sdh] tag#13 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=0s
[349280.239726] sd 9:0:0:0: [sdh] tag#13 CDB: Write(16) 8a 00 00 00 00 00 d3 28 e4 00 00 00 00 d8 00 00
[349280.239728] I/O error, dev sdh, sector 3542672384 op 0x1:(WRITE) flags 0x8800 phys_seg 27 prio class 2
[349280.239741] device offline error, dev sdh, sector 3542674432 op 0x1:(WRITE) flags 0x8800 phys_seg 35 prio class 2
[349280.239747] device offline error, dev sdh, sector 3542672640 op 0x1:(WRITE) flags 0x8800 phys_seg 24 prio class 2
[349280.239750] device offline error, dev sdh, sector 3542677504 op 0x1:(WRITE) flags 0x8800 phys_seg 45 prio class 2
[349280.239753] device offline error, dev sdh, sector 3542680576 op 0x1:(WRITE) flags 0x8800 phys_seg 41 prio class 2
[349280.239788] device offline error, dev sdh, sector 3542663168 op 0x1:(WRITE) flags 0x8800 phys_seg 35 prio class 2
[349280.239793] device offline error, dev sdh, sector 3542663680 op 0x1:(WRITE) flags 0x8800 phys_seg 29 prio class 2
[349280.239799] device offline error, dev sdh, sector 3542663936 op 0x1:(WRITE) flags 0x8800 phys_seg 26 prio class 2
[349280.299534] sd 9:0:0:0: [sdh] Synchronizing SCSI cache
[349280.523475] sd 9:0:0:0: [sdh] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=...


Displaying first 40 and last 40 comments. View all 214 comments or add a comment.