xhci_hcd: TRB DMA errors reported with ASMedia ASM1142 USB 3.1 Controller

Bug #1749961 reported by Guilherme G. Piccoli
This bug affects 21 people
Affects          Status       Importance  Assigned to  Milestone
Linux            Confirmed    High
linux (Debian)   Confirmed    Undecided   Unassigned
linux (Ubuntu)   In Progress  Medium      Unassigned
  Trusty         Won't Fix    Medium      Unassigned
  Xenial         Confirmed    Medium      Unassigned
  Bionic         Confirmed    Medium      Unassigned
  Focal          Confirmed    Medium      Unassigned

Bug Description

While trying to use a 4K USB webcam connected to a USB port provided by an ASMedia ASM1142 USB 3.1 controller, we observed that the webcam does not work and the kernel log shows the following messages:

[431.928016] xhci_hcd 0000:12:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
[431.928021] xhci_hcd 0000:12:00.0: Looking for event-dma 0000003f3330e020 trb-start 0000003f3330e000 trb-end 0000003f3330e000 seg-start 0000003f3330e000 seg-end 0000003f3330eff0
[431.928024] xhci_hcd 0000:12:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
[431.928026] xhci_hcd 0000:12:00.0: Looking for event-dma 0000003f3330e030 trb-start 0000003f3330e000 trb-end 0000003f3330e000 seg-start 0000003f3330e000 seg-end 0000003f3330eff0
[431.928027] xhci_hcd 0000:12:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
[431.928029] xhci_hcd 0000:12:00.0: Looking for event-dma 0000003f3330e050 trb-start 0000003f3330e000 trb-end 0000003f3330e000 seg-start 0000003f3330e000 seg-end 0000003f3330eff0
[431.928386] xhci_hcd 0000:12:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
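
To make the "Looking for event-dma" lines easier to read, here is a minimal Python sketch that parses one of them and reports where the event pointer sits relative to the TD the driver expected. The function names are made up for illustration, and the comparison assumes the TD does not wrap around the end of the ring segment, which is a simplification of what the driver really checks.

```python
import re

# Field names as they appear in the xhci_hcd "Looking for event-dma" message.
FIELDS = ("event-dma", "trb-start", "trb-end", "seg-start", "seg-end")

def parse_lookup_line(line):
    """Return the five DMA addresses from a 'Looking for event-dma' line as ints."""
    values = {}
    for name in FIELDS:
        match = re.search(re.escape(name) + r" ([0-9a-fA-F]+)", line)
        if match is None:
            raise ValueError(f"field {name!r} not found")
        values[name] = int(match.group(1), 16)
    return values

def classify(values):
    """Say where the event pointer sits relative to the expected TD.

    Assumption: the TD does not wrap past the end of the segment.
    """
    event = values["event-dma"]
    if values["trb-start"] <= event <= values["trb-end"]:
        return "inside current TD"
    if values["seg-start"] <= event <= values["seg-end"]:
        return "same ring segment, outside current TD"
    return "outside ring segment"

line = ("xhci_hcd 0000:12:00.0: Looking for event-dma 0000003f3330e020 "
        "trb-start 0000003f3330e000 trb-end 0000003f3330e000 "
        "seg-start 0000003f3330e000 seg-end 0000003f3330eff0")
values = parse_lookup_line(line)
print(classify(values))                                # same ring segment, outside current TD
print(hex(values["event-dma"] - values["trb-start"]))  # 0x20
```

For the first message pair above, the event pointer lands 0x20 bytes past the single-TRB TD the driver was waiting on, but still inside the same segment.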

A similar issue was already reported on Launchpad: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1667750

The fix for that issue appears to be the following patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9da5a109

In our tests with this patch applied, the problem still reproduced. Our next approach is to modify the patch slightly and re-test.

This LP will be used to document our progress in the investigation.

no longer affects: linux-meta (Ubuntu)
description: updated
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1749961

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu):
status: Incomplete → In Progress
importance: Undecided → Medium
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
Changed in linux (Ubuntu Trusty):
importance: Undecided → Medium
status: New → In Progress
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
importance: Undecided → Medium
status: New → In Progress
Changed in linux (Ubuntu Artful):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
importance: Undecided → Medium
status: New → In Progress
Guilherme G. Piccoli (gpiccoli) wrote :

The patch was modified (by adding the PCI ID of device 1142A, which confusingly is 1242!), and the problem still reproduces.

New approaches to be tried soon.

tags: added: kernel-da-key
imperia (imperia777) wrote :

Hello,
Looks like I am having the same problem.

After some hours (a random amount of time), my USB 3.1 ASMedia controller crashes the driver with the following error:
[ 873.661534] xhci_hcd 0000:00:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 3
[ 873.661629] xhci_hcd 0000:00:00.0: Looking for event-dma 00000002722ed630 trb-start 00000002722ed9b0 trb-end 00000002722ed9d0 seg-start 00000002722ed000 seg-end 00000002722edff0
[ 875.673409] xhci_hcd 0000:00:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
I have been struggling with this error for more than a year. It's very annoying to have to restart the PC every few hours. A USB tuner card is connected to the port.
I would like to provide whatever information and support is necessary to fix this damn bug: logs, SSH access to the affected box, and anything else that is needed.

Please ask me here or write to my email, imperia777_yahoo.com
Thanks.

Guilherme G. Piccoli (gpiccoli) wrote :

Nice imperia, thanks for the report here. First we need to be sure it's exactly the same adapter.
Can you provide the output of "lspci -nn"?

Then, if it's the same adapter:

0) Which Ubuntu version are you running? Which kernel version are you using? Can you try the latest 4.13 for Xenial (or, even better, the hwe-edge 4.15)?
Instructions to run the latest 4.15 version: https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/unstable?field.series_filter=xenial

1) You said "after some hours" - can you provide some details? You've been using the USB tuner for like 2 hours? 12 hours? The tuner is in constant use and suddenly the issue happens?

2) If possible, enable xhci dynamic debug and provide logs after the issue; in order to do this, run the following command as root:
echo "module xhci_hcd +flpt" > /sys/kernel/debug/dynamic_debug/control

After the issue reproduces, collect the /var/log/kern.log file.

Thanks,

Guilherme

imperia (imperia777) wrote :

Hello,

00:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller [1b21:1242]
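
As a quick way to confirm "same adapter" from `lspci -nn` output, the following Python sketch pulls the trailing [vendor:device] pair out of a line and compares it with the 1b21:1242 pair shown here. The helper name `pci_ids` is hypothetical, and the sketch only assumes the bracketed ID format visible in the line above.

```python
import re

# Vendor:device pair reported in this thread for the ASM1142 controller.
ASM1142_ID = (0x1B21, 0x1242)

def pci_ids(lspci_line):
    """Extract the trailing [vendor:device] pair from one `lspci -nn` line."""
    matches = re.findall(r"\[([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\]", lspci_line)
    if not matches:
        return None
    vendor, device = matches[-1]
    return int(vendor, 16), int(device, 16)

line = ("00:00.0 USB controller [0c03]: ASMedia Technology Inc. "
        "ASM1142 USB 3.1 Host Controller [1b21:1242]")
print(pci_ids(line) == ASM1142_ID)  # True
```

The class code in the first bracket ([0c03]) has no colon, so the pattern skips it and only the ID pair is returned.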

Actually, I am on Debian Buster. I am running kernel 4.16-rc6 from the experimental repository.

I am running a program for watching satellite channels called vdr.
When I am not watching TV, while idle, vdr scans for channel list updates from the satellites every few minutes. It is safe to say that the tuner is occupied every few minutes for a scan, but not loaded with bandwidth the way it is when watching TV. In this mode, vdr is able to crash the driver in ~6-30 hours.

There is a program that you use to initially create your channel list for vdr. When I use it, I am able to crash the driver in ~1-2 hours.

But when I just watch one channel and don't change it for hours, the driver is least likely to crash.

I think something in the repeated opening (initializing) of the USB port/driver triggers this error, because the program that scans for channels crashes it much faster.
This program works like this:

:go
open port
scan some frequency
write to file new channels
close port
goto go
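
The open/scan/close loop above could be approximated with a small stress helper like the one below. This is only a sketch: the function name is invented, the short read is a placeholder for "scan some frequency", and a temporary file stands in for the tuner's device node, which the thread does not name.

```python
import os
import tempfile

def open_close_cycles(path, cycles, read_size=4096):
    """Open `path`, read a little, and close it again, `cycles` times.

    Returns the number of cycles that completed without an OSError.
    """
    completed = 0
    for _ in range(cycles):
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            break
        try:
            os.read(fd, read_size)  # placeholder for the real scan step
        except OSError:
            break
        finally:
            os.close(fd)
        completed += 1
    return completed

# Stand-in target: a temporary file instead of a real device node.
with tempfile.NamedTemporaryFile() as stand_in:
    stand_in.write(b"x" * 16)
    stand_in.flush()
    print(open_close_cycles(stand_in.name, cycles=5))
```

Pointing it at a real device node and a large cycle count would mimic the scanner's behaviour of repeatedly initializing the port, which is the pattern suspected of triggering the error.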

I made this script that I will use to capture the log.

echo "module xhci_hcd +flpt" > /sys/kernel/debug/dynamic_debug/control
(tail -F -n0 /var/log/kern.log &) | grep -q "TRB DMA"
cp /var/log/kern.log /home/imperia/log1.log

I will also run the initial channel list scan to trigger it faster.

I will be back later with the logs.
Thanks for your help.

imperia (imperia777) wrote :

Mar 29 20:20:03 vdr kernel: [119370.230528] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Removing canceled TD starting at 0x2ae36c590 (dma).
Mar 29 20:20:03 vdr kernel: [119370.230533] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Finding endpoint context
Mar 29 20:20:03 vdr kernel: [119370.230537] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Cycle state = 0x0
Mar 29 20:20:03 vdr kernel: [119370.230542] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: New dequeue segment = 00000000573583cc (virtual)
Mar 29 20:20:03 vdr kernel: [119370.230547] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: New dequeue pointer = 0x2ae36c5a0 (DMA)
Mar 29 20:20:03 vdr kernel: [119370.230553] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Set TR Deq Ptr cmd, new deq seg = 00000000573583cc (0x2ae36c000 dma), new deq ptr = 0000000041e92668 (0x2ae36c5a0 dma), new cycle = 0
Mar 29 20:20:03 vdr kernel: [119370.230558] <intr> xhci_ring_cmd_db:282: xhci_hcd 0000:00:00.0: // Ding dong!
Mar 29 20:20:03 vdr kernel: [119370.230631] [27868] xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Cancel URB 0000000060641c50, dev 2, ep 0x82, starting at offset 0x2ae36c5a0
Mar 29 20:20:03 vdr kernel: [119370.230638] [27868] xhci_ring_cmd_db:282: xhci_hcd 0000:00:00.0: // Ding dong!
Mar 29 20:20:03 vdr kernel: [119370.230650] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Successful Set TR Deq Ptr cmd, deq = @2ae36c5a0
Mar 29 20:20:03 vdr kernel: [119370.230700] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Removing canceled TD starting at 0x2ae36c5a0 (dma).
Mar 29 20:20:03 vdr kernel: [119370.230705] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Finding endpoint context
Mar 29 20:20:03 vdr kernel: [119370.230710] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Cycle state = 0x0
Mar 29 20:20:03 vdr kernel: [119370.230715] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: New dequeue segment = 00000000573583cc (virtual)
Mar 29 20:20:03 vdr kernel: [119370.230719] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: New dequeue pointer = 0x2ae36c5b0 (DMA)
Mar 29 20:20:03 vdr kernel: [119370.230725] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Set TR Deq Ptr cmd, new deq seg = 00000000573583cc (0x2ae36c000 dma), new deq ptr = 0000000050070757 (0x2ae36c5b0 dma), new cycle = 0
Mar 29 20:20:03 vdr kernel: [119370.230730] <intr> xhci_ring_cmd_db:282: xhci_hcd 0000:00:00.0: // Ding dong!
Mar 29 20:20:03 vdr kernel: [119370.230798] [27868] xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Cancel URB 00000000588cca08, dev 2, ep 0x82, starting at offset 0x2ae36c5b0
Mar 29 20:20:03 vdr kernel: [119370.230805] [27868] xhci_ring_cmd_db:282: xhci_hcd 0000:00:00.0: // Ding dong!
Mar 29 20:20:03 vdr kernel: [119370.230816] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Successful Set TR Deq Ptr cmd, deq = @2ae36c5b0
Mar 29 20:20:03 vdr kernel: [119370.230865] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Removing canceled TD starting at 0x2ae36c5b0 (dma).
Mar 29 20:20:03 vdr kernel: [119370.230870] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Finding endpoint context
Mar 29 20:20:03 vdr kernel: [119370.230874] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Cycle state = 0x0
Mar 29 20:...
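
When reading a debug log like this one, it can help to extract the "New dequeue pointer" addresses and see how far the ring advances between cancellations. The sketch below does that in Python; the helper names are illustrative, and the only outside fact used is that an xHCI TRB is 16 bytes.

```python
import re

TRB_SIZE = 16  # bytes per TRB in the xHCI specification

def dequeue_pointers(log_lines):
    """Collect the DMA addresses from 'New dequeue pointer = 0x... (DMA)' lines."""
    pattern = re.compile(r"New dequeue pointer = 0x([0-9a-fA-F]+) \(DMA\)")
    pointers = []
    for line in log_lines:
        match = pattern.search(line)
        if match:
            pointers.append(int(match.group(1), 16))
    return pointers

def strides(pointers):
    """Distance in TRBs between consecutive dequeue pointers."""
    return [(b - a) // TRB_SIZE for a, b in zip(pointers, pointers[1:])]

log = [
    "xhci_hcd 0000:00:00.0: New dequeue pointer = 0x2ae36c5a0 (DMA)",
    "xhci_hcd 0000:00:00.0: Cycle state = 0x0",
    "xhci_hcd 0000:00:00.0: New dequeue pointer = 0x2ae36c5b0 (DMA)",
]
print(strides(dequeue_pointers(log)))  # [1]
```

In the excerpt above, each cancelled URB moves the dequeue pointer forward by exactly one TRB.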

Guilherme G. Piccoli (gpiccoli) wrote :

Thanks a lot Imperia! It's indeed the same PCI adapter, and it's even better you're running an upstream kernel like this.

I'll analyze your logs in order to match with the ones I have here.
I might need some xhci traces to understand the TRB operations (like the enqueue and completion of TRBs). I'll comment here in case I need them.

Cheers,

Guilherme

imperia (imperia777) wrote :

echo xhci-hcd >> /sys/kernel/debug/tracing/set_event
(tail -F -n0 /var/log/kern.log &) | grep -q "TRB DMA"
cp /var/log/kern.log /home/imperia/log1.log

Is this the correct command to get traces?
I will run it in advance.

Somebody told me to run this before, when I was looking for help.

BTW, did you download the full logs, so I can remove them from the web page?

I can provide SSH access to the affected box if needed.

Guilherme G. Piccoli (gpiccoli) wrote :

Wow Imperia, you're being really helpful here, thank you very much!

To enable traces, these are the instructions I've provided to other people affected so far:

0) Reboot the machine in order to put it in a consistent state;
1) echo "module xhci_hcd +flpt" > /sys/kernel/debug/dynamic_debug/control
2) echo nop > /sys/kernel/debug/tracing/current_tracer
3) echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
4) echo 0 > /sys/kernel/debug/tracing/trace
5) echo 1 > /sys/kernel/debug/tracing/tracing_on
6) echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable

After reproducing the issue, you should collect /sys/kernel/debug/tracing/trace. The problem is that the file might be huge, much larger than the kernel log you provided, for instance.
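
For anyone who prefers to script steps 1 to 6 rather than type them, here is a rough Python equivalent. It is a sketch, not a tested tool: the function name and the `debugfs_root` parameter are invented for illustration, it must run as root against the real debugfs mount, and it assumes the control files already exist.

```python
from pathlib import Path

# (path relative to the debugfs mount, value) pairs, in the order of steps 1-6.
TRACE_SETUP = [
    ("dynamic_debug/control", "module xhci_hcd +flpt"),
    ("tracing/current_tracer", "nop"),
    ("tracing/buffer_size_kb", "81920"),
    ("tracing/trace", "0"),
    ("tracing/tracing_on", "1"),
    ("tracing/events/xhci-hcd/enable", "1"),
]

def apply_trace_setup(debugfs_root="/sys/kernel/debug"):
    """Write each setting to its control file; return the paths written."""
    written = []
    root = Path(debugfs_root)
    for relative, value in TRACE_SETUP:
        target = root / relative
        # Same effect as `echo value > target`: truncate, then write a line.
        target.write_text(value + "\n")
        written.append(str(target))
    return written
```

Calling `apply_trace_setup()` with no argument targets /sys/kernel/debug; passing another directory makes it possible to dry-run the sequence against a scratch tree first.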

About the SSH access, I'm interested in getting it next week, if it doesn't annoy you too much. It'll be really helpful, but I might need to reboot the machine.

Oh, I've downloaded the logs from your website, so you can delete them now.
Cheers,

Guilherme

imperia (imperia777) wrote :

Hello,

I think I am ready with the trace log. Hopefully it is complete, because the machine ran out of disk space :)
http://imperia.mine.nu/trace1.log.bz2
Interestingly, it took ~12 hours to crash this time.

The problem with SSH access is that this is a virtual machine under Xen, and when you reboot it, the USB controller is gone (no longer assigned to the virtual machine). I have to re-assign the USB controller for passthrough from the Xen host. (This is a Xen bug, I think; it wasn't like this before.)

This is what I do when I have to restart the vdr virtual machine:
xl pci-assignable-remove 03:00.0
xl pci-assignable-add 03:00.0
xl create /etc/xen/vdr.cfg

Anyway, we can get in touch on IRC and I can do the restarts for you.

BTW, I shut down the whole Xen server, then switched off the power at the PSU and pressed the power button on the case to discharge any remaining electricity, to put it in a consistent state before getting the trace logs.

Guilherme G. Piccoli (gpiccoli) wrote :

Thanks again Imperia, the traces are fine. They're only 25MB, so they shouldn't have caused any kind of disk issue such as an out-of-space condition. Also, I'd like to see the corresponding kernel log, to match the problematic TRBs from the kernel log with the trace information. Can you provide me with the relevant kern.log file?

I've already downloaded the traces from your server, in case you want to remove the file.

About the SSH access, thanks for the offer; let's talk on IRC in case I need it. I'll try the logs first, though I'm not sure they're enough for me to understand the issue completely.

Cheers,

Guilherme

Guilherme G. Piccoli (gpiccoli) wrote :

Hi Imperia, I built a mainline kernel (version 4.16) with a different quirk that I think might help here. Can you test it? Thanks in advance!

Instructions (run all as root):

1) wget people.canonical.com/~gpiccoli/imperia416.tgz
2) mv imperia416.tgz /
3) tar -zxf imperia416.tgz
4) update-initramfs -c -k 4.16.0-imperia+

Now, this is important: if you have access to a serial console on the machine (or if you have physical access), you can reboot into this new kernel. In case _you only have ssh_, I'd suggest removing the kernel boot entry from grub and booting through kexec for safety reasons:

a) Remove boot entries from grub.cfg (you can copy away vmlinuz-4.16-imperia+ to some place outside /boot and run "update-grub" for this)
b) apt-get install kexec-tools
c) kexec vmlinuz-4.16-imperia+ --initrd initrd.img-4.16-imperia+ --append="$(cat /proc/cmdline)"
----

After the machine (hopefully!) boots into the new kernel, check in dmesg whether the quirk is there:
#$ dmesg|grep QUIRK
[0.813486] QUIRK: XHCI_AVOID_BEI

If you can see that output ("QUIRK: XHCI_AVOID_BEI"), then the quirk was applied.
Now we just need to try to reproduce the issue again.

Thanks a lot,

Guilherme

imperia (imperia777) wrote :

Hello,

I am unable to test with the kernel you provided, because my tuner card doesn't have a driver in the mainline kernel tree. I have to compile it myself, and I need the kernel headers for that.

So I compiled kernel 4.16 from the Debian linux-source-4.16 package and applied the patch you provided:

From dd0375ffba55172194999d40b35344e9dc2682df Mon Sep 17 00:00:00 2001
From: "Guilherme G. Piccoli" <email address hidden>
Date: Wed, 11 Apr 2018 11:04:13 +0000
Subject: [PATCH] xhci: Add quirk to ASMedia 0x1242 adapter to avoid BEI

Signed-off-by: Guilherme G. Piccoli <email address hidden>
---
 drivers/usb/host/xhci-pci.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index d9f831b..0654461 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -213,6 +213,12 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
                xhci->quirks |= XHCI_TRUST_TX_LENGTH;

        if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
+               pdev->device == 0x1242) {
+               xhci->quirks |= XHCI_AVOID_BEI;
+               pr_warn("QUIRK: XHCI_AVOID_BEI");
+       }
+
+       if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
                pdev->device == PCI_DEVICE_ID_ASMEDIA_1042A_XHCI)
                xhci->quirks |= XHCI_ASMEDIA_MODIFY_FLOWCONTROL;

--
2.7.4

I have now compiled my tuner card driver and am testing.

Andy Whitcroft (apw) wrote : Closing unsupported series nomination.

This bug was nominated against a series that is no longer supported, ie artful. The bug task representing the artful nomination is being closed as Won't Fix.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu Artful):
status: In Progress → Won't Fix
imperia (imperia777) wrote :

This is the dmidecode output of my machine. In case the fix is FW-related, it may be useful for contacting the motherboard vendor.

Roy Thompson (royt77) wrote :

I am running into this same issue with an ASMedia 2142 USB board. Was a fix ever identified?

Guilherme G. Piccoli (gpiccoli) wrote :

Hi Roy, thanks for the report. What is your motherboard? What kernel are you running? And what tests are triggering this issue for you?
If you have logs, it'll be pretty useful.

Maybe it's a similar but different case, or the logs may help to confirm it's exactly the same issue.

ASMedia seems to have a FW fix, but it is up to your motherboard vendor to provide it. ASMedia doesn't provide the fix itself; it needs some cooking from the vendor, to match subsystem IDs and whatnot.

Cheers,

Guilherme

Changed in linux (Debian):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
status: New → Confirmed
Roy Thompson (royt77) wrote :

Hi Guilherme,

Thanks for the response. I have several (3) quad port ASMedia 2142 PCIe/USB 3.1 cards installed in a Dell R740 rack server. I am using the standard Ubuntu 18.04 kernel (Linux dell-PowerEdge-R740 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux).

One of my applications runs a loop that opens and closes a high-speed connection to a USB device connected through the ASMedia board. After this goes on for several minutes without any issues, I see this in dmesg:

[Oct 5 10:12] xhci_hcd 0000:be:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[ +3.418076] xhci_hcd 0000:be:00.0: WARN Successful completion on short TX
[ +0.000035] xhci_hcd 0000:be:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 12 comp_code 1
[ +0.000003] xhci_hcd 0000:be:00.0: Looking for event-dma 0000001fe9759610 trb-start 0000001fe9759620 trb-end 0000001fe9759620 seg-start 0000001fe9759000 seg-end 0000001fe9759ff0

This is then followed shortly after by several kernel dump messages, and then the whole system starts behaving erratically, requiring a hard reboot to recover.

The condition is easy for me to reproduce and I will happily provide any logs that may be of use to help debug this. Please just let me know what you would like and how to get them (as I am not a kernel expert).

Thanks,
Roy

Guilherme G. Piccoli (gpiccoli) wrote :

Hi Roy, thanks for your quick response. First, I'd like to ask you to attach the output of "lspci -vvv" and "dmidecode" to this LP, so we can validate that the adapters are exactly the same and also check the motherboard type. Run both commands as root.

After that, I'll ask you to reproduce the issue and attach the output of the "dmesg" command right after reproduction. If you can also elaborate on the test you're running, I'd be really glad.

I'll then provide you with custom commands to use the kernel trace system to infer more about the issue. One final thing: are you willing to test with a mainline kernel in order to check whether there's some upstream fix for your instance of the issue?
If so, you can get it here: https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/unstable
This PPA provides a build of kernel 4.18.

Thanks in advance,

Guilherme

In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

After upgrading to the 4.20 kernel (I was using 4.19 previously), my USB wifi stick doesn't work until I reboot the system. This issue happens every time I start my PC (only when the system was shut down; it doesn't happen after rebooting). The wifi driver in use is rt2800usb. I tried restarting NetworkManager, but this didn't change anything.

In , stf_xl (stfxl-linux-kernel-bugs) wrote :

Hmm, that's strange; perhaps this is some USB host problem. Please provide the dmesg of your system.

In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

Created attachment 281677
dmesg output before reboot

In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

Created attachment 281679
dmesg output after reboot

In , stf_xl (stfxl-linux-kernel-bugs) wrote :

We have this xhci_hcd warning in the bad case:

 xhci_hcd 0000:15:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state

I'm not sure where it comes from, but I notice you are using the AMD IOMMU, which we have had trouble with in different drivers.

You could try disabling the IOMMU via a kernel boot parameter and check whether that improves things. You could also try testing this patch if possible:
https://bugzilla.kernel.org/attachment.cgi?id=281675

If none of that helps, I will prepare some rt2800 patches to see whether this is caused by some of the v4.19 .. v4.20 rt2800 commits:

0240564430c0 rt2800: flush and txstatus rework for rt2800mmio
adf26a356f13 rt2x00: use different txstatus timeouts when flushing
5022efb50f62 rt2x00: do not check for txstatus timeout every time on tasklet
0b0d556e0ebb rt2800mmio: use txdone/txstatus routines from lib
5c656c71b1bf rt2800: move usb specific txdone/txstatus routines to rt2800lib
f483039cf51a rt2x00: use simple_read_from_buffer()

But I rather suspect a problem introduced in the AMD IOMMU or usb/xhci drivers.

In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

I tried disabling the IOMMU, and I also compiled the 4.20.15 kernel from source with that patch applied, but the wifi didn't work in either case.

In , stf_xl (stfxl-linux-kernel-bugs) wrote :

Created attachment 281711
rt2x00_revert_4.20_changes.patch

Please test this patch and report whether it makes the problem go away.

In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

The problem is still there after applying that patch.

In , stf_xl (stfxl-linux-kernel-bugs) wrote :

You need to report this bug to the USB maintainers. I'm changing the topic and component, but USB bugs should be reported directly to the mailing list.

In , stf_xl (stfxl-linux-kernel-bugs) wrote :

Please send bug report to <email address hidden>

In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

I can confirm this issue. I can also confirm that other USB devices are affected, too (mostly if plugged into a USB3 port).
For example:
ID 7392:7710 Edimax Technology Co., Ltd (mt7601u)
WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

dmesg doesn't show IOMMU warnings, so I assume it is a problem introduced in usb/xhci driver.

In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

(In reply to Michael from comment #10)
> I can confirm this issue. Also I can confirm that other USB devices are
> effected, too (mostly if plugged into an USB3 port).
> For example:
> ID 7392:7710 Edimax Technology Co., Ltd (mt7601u)
> WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
>
> dmesg doesn't show IOMMU warnings, so I assume it is a problem introduced in
> usb/xhci driver.

I think this affects only a specific hardware configuration (I've tried using my wifi stick on a different machine and it worked without problems).
Which hardware are you using? Maybe there are some parts we have in common.

My hardware configuration:
CPU: AMD Ryzen 3 2200G, Motherboard: MSI B350 PC MATE
GPU: AMD Radeon RX 580 8GB

In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

@ Bernhard
The parts we have in common : AMD RYZEN

AMD RYZEN 1700 MSI X370 KRAIT, MSI AERO GTX1080Ti, 5.0.6-arch1-1-ARCH (system was also affected by IOMMU issue - but that is fixed)

Affected USB WiFi devices (tested):
ID 148f:3070 Ralink Technology, Corp. RT2870/RT3070 Wireless Adapter (ALFA AWUS036NH - rt2800usb)
ID 148f:3070 Ralink Technology, Corp. RT2870/RT3070 Wireless Adapter (ipTime/ zioncom - rt2800usb)
ID 7392:7710 Edimax Technology Co., Ltd (mt7601u)
ID 7392:a812 Edimax Technology Co., Ltd (Edimax EW-7811USC - rtl88xxau)
ID 148f:761a Ralink Technology, Corp. MT7610U ("Archer T2U" 2.4G+5G WLAN Adapter - mt76x0)
ID 0b05:17d1 ASUSTek Computer, Inc. AC51 802.11a/b/g/n/ac Wireless Adapter [Mediatek MT7610U]
ID 0a12:0001 Cambridge Silicon Radio, Ltd Bluetooth Dongle (HCI mode)
I'm sure there are more.

After fixing some driver/IOMMU issues, Stanislaw found that it could possibly be an xhci driver issue. I share his opinion.

You can read more about the issues here:
https://github.com/ZerBea/hcxdumptool/issues/42
and the fixed IOMMU issue here:
https://bugzilla.kernel.org/show_bug.cgi?id=202241

In , stf_xl (stfxl-linux-kernel-bugs) wrote :

FTR: I think those two commits could help:

commit 6cbcf596934c8e16d6288c7cc62dfb7ad8eadf15
Author: Mathias Nyman <email address hidden>
Date: Fri Mar 22 17:50:15 2019 +0200

    xhci: Fix port resume done detection for SS ports with LPM enabled

commit d92f2c59cc2cbca6bfb2cc54882b58ba76b15fd4
Author: Mathias Nyman <email address hidden>
Date: Fri Mar 22 17:50:17 2019 +0200

    xhci: Don't let USB3 ports stuck in polling state prevent suspend

Also, I'm not sure whether the issue was reported to the proper maintainer. If not, and the problem is not already fixed in the latest upstream, either a bisection will be needed to proceed with this bug, or a properly informative bug report must be filed with the proper maintainer.

In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

@ Stanislaw, thanks for additional information.

@ Bernhard, have you already sent this bug report to the linux-usb mailing list?

Can we change the affected kernel version from 4.20 to >= 4.20, because 5.0.6 is affected, too?

In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

Yes, I already sent this to the mailing list, but I got no response unfortunately.

I've changed the affected kernel version btw.

In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

@ Bernhard, thanks for your answer. So there is no need for me to report this issue, too.

In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

I just tried the two patches Stanislaw mentioned, but the problem is still there.

In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

I tried them, too, some days ago, but they didn't solve the issue.
I just downloaded 5.1rc3, but I don't expect a working driver (usb/host) inside.

In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

I tested an ASUS X555U (Intel i5-6200 - 5.0.6-arch1-1-ARCH), and that system is affected if the device is plugged into one of the USB3 ports. The device works if plugged into the USB2 port.

In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

I just tried replacing the xhci-ring.c file with the version from the 4.19 kernel, and that solved the problem. Then I started patching the code until the problem occurred again.
The change in the function "static int process_bulk_intr_td" is causing the problem, it's part of this patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/drivers/usb/host/xhci-ring.c?id=9703fc8caf36ac65dca1538b23dd137de0b53233

In , stf_xl (stfxl-linux-kernel-bugs) wrote :

(In reply to Bernhard from comment #20)
> I just tried replacing the xhci_ring.c file with the version from the 4.19
> kernel, that solved the problem. Then I started patching the code until the
> problem occurs again.
> The change in the function "static int process_bulk_intr_td" is causing the
> problem, it's part of this patch:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/
> drivers/usb/host/xhci-ring.c?id=9703fc8caf36ac65dca1538b23dd137de0b53233

Good findings, great. This seems to be part of

commit f8f80be501aa2f10669585c3e328fad079d8cb3a
Author: Mathias Nyman <email address hidden>
Date: Thu Sep 20 19:13:37 2018 +0300

    xhci: Use soft retry to recover faster from transaction errors

Just add the information you found to the linux-usb email you posted, and CC "Mathias Nyman <email address hidden>" to make sure he is aware of the problem.

In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

The issue isn't fixed in 5.1rc3, so it looks like Mathias Nyman isn't aware of the problem yet.

Bryan Walsh (yetanotherbryan) wrote :

Hello,

I think I am seeing the same or related issue with the ASM1142 controller on my Razer Core Chroma EGPU enclosure. I'm running Ubuntu 19.04, kernel version 5.0.0-13-generic. Ethernet on the enclosure stops working while downloading large files. Dmesg produces the following error messages:

[ 569.641475] xhci_hcd 0000:0f:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 4 comp_code 13
[ 569.641487] xhci_hcd 0000:0f:00.0: Looking for event-dma 000000048d9c5770 trb-start 000000048d9c5750 trb-end 000000048d9c5750 seg-start 000000048d9c5000 seg-end 000000048d9c5ff0

lspci output:

00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 08)
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 620 (rev 07)
00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 08)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th Gen Core Processor Gaussian Mixture Model
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI #1 (rev 21)
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #1 (rev f1)
00:1c.4 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1)
00:1d.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #9 (rev f1)
00:1f.0 ISA bridge: Intel Corporation Intel(R) 100 Series Chipset Family LPC Controller/eSPI Controller - 9D4E (rev 21)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
00:1f.3 Audio device: Intel Corporation Sunrise Point-LP HD Audio (rev 21)
00:1f.4 SMBus: Intel Corporation Sunrise Point-LP SMBus (rev 21)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (4) I219-V (rev 21)
02:00.0 Network controller: Intel Corporation Wireless 8265 / 8275 (rev 78)
04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981
05:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
06:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
06:01.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
06:02.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
06:04.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
07:00.0 System peripheral: Intel Corporation JHL6540 Thunderbolt 3 NHI (C step) [Alpine Ridge 4C 2016] (rev 02)
08:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
09:01.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
09:04.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
0a:00....


Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Hi Bryan, thanks for the report. It could be the same issue, can you provide the full dmesg, and also the outputs of the following commands: "lspci -nnvvv", "lspci -t" and "ls -l /sys/class/net"?
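For anyone else asked for the same data, the requested outputs can be gathered into a single attachment. This is only a convenience sketch: the helper name `collect` and the output file name are hypothetical, and `dmesg` may need root on systems that restrict kernel log access.

```python
import subprocess

# The commands requested in the comment above.
COMMANDS = [
    ["dmesg"],
    ["lspci", "-nnvvv"],
    ["lspci", "-t"],
    ["ls", "-l", "/sys/class/net"],
]


def collect(commands, out_path):
    """Run each command and append its output to out_path under a header."""
    with open(out_path, "w", encoding="utf-8") as out:
        for cmd in commands:
            out.write("===== " + " ".join(cmd) + " =====\n")
            try:
                proc = subprocess.run(
                    cmd, capture_output=True, text=True,
                    errors="replace", check=False,
                )
            except OSError as exc:
                # Command missing or not executable: note it and carry on.
                out.write(f"(could not run: {exc})\n")
                continue
            out.write(proc.stdout)
            if proc.stderr:
                out.write(proc.stderr)
    return out_path
```

Calling `collect(COMMANDS, "asm1142-report.txt")` would write one file that can be attached to the bug.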

The issue was fixed for the first reporter via a FW update in the ASMedia adapter; unfortunately this FW update comes from the vendor, so the way of getting it varies according to the HW presenting the problem.

Cheers,

Guilherme

Revision history for this message
Bryan Walsh (yetanotherbryan) wrote :

Please see attached log for the outputs that you requested.

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Great, Bryan. The model of your USB controller is the same as the one reported in this LP; also, given the outputs you provided, the network interface "enx90203a19dcb6" is under one of those USB controllers. You mentioned that you see the TRB DMA errors and that the interface stops responding. Is the problematic interface that one, "enx90203a19dcb6"?
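One way to confirm which controller a USB network interface hangs off is to resolve its sysfs link and look at the PCI device directly above the `usbN` root hub. The sketch below assumes the usual sysfs layout (`/sys/class/net/<iface>` being a symlink into `/sys/devices/...`); the function names are made up for illustration.

```python
import os
import re

PCI_ADDR_RE = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def usb_controller_for(sysfs_path):
    """Return the PCI address of the host controller above a USB device path.

    Returns None when the path does not pass through a USB root hub.
    """
    parts = sysfs_path.split("/")
    for i, part in enumerate(parts):
        # The root hub directory is named usb1, usb2, ...; its parent
        # directory is the PCI function of the host controller.
        if i > 0 and re.fullmatch(r"usb\d+", part) and PCI_ADDR_RE.match(parts[i - 1]):
            return parts[i - 1]
    return None


def controller_for_interface(iface):
    """Resolve /sys/class/net/<iface> and report its USB host controller."""
    return usb_controller_for(os.path.realpath(f"/sys/class/net/{iface}"))
```

The returned address can then be compared with the `xhci_hcd 0000:xx:yy.z` prefix of the error messages in dmesg.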

Who is the vendor of your device? I'd suggest you seek help from them, mentioning this LP and that ASMedia may have a potential firmware fix for the issue.

Thanks,

Guilherme

Revision history for this message
Gabe Esposito (gabespo) wrote :

I'm also experiencing the same issue with the ASM1142 controller on the Core X Chroma and can reproduce consistently. I'm running kernel 5.0.9.

Guilherme, thanks for your work diagnosing this. This device is sold by Razer. I will try and reach out but they do not claim Linux support on any of their devices so I worry this may go unfixed. Barring a firmware fix, is there any hope of this being fixed with a quirk, as the other controller was? I realize this LP is not the ideal place for such a fix to take place, but I am happy to participate in finding a solution.

Revision history for this message
Bryan Walsh (yetanotherbryan) wrote :

In an attempt to update the firmware, I installed the Razer software on my newly created Windows partition to see if it could be updated through there. No luck.

I emailed Razer support to ask about obtaining updated firmware. I'll let everyone know what I hear back.

And yes, "enx90203a19dcb6" is the problematic interface.

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Thanks Gabe! I agree with you, it would be really nice to have a quirk for that. It would be easier to analyze that possibility with a datasheet for this adapter, which unfortunately I don't have.
I'm on vacation until next week; I'll try to discuss that on linux-usb when I'm back, and pursue a kernel quirk instead of a firmware-only fix.

@Bryan, thanks for checking with the vendor, let us know the outcome.
Cheers,

Guilherme

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Still present in 5.1.2

Revision history for this message
Alex Lourenco (nyb-2017) wrote :

I am experiencing the exact same issue first reported in this LP (ASMedia ASM1142 USB 3.1 Controller with a Logitech Brio 4k, ERROR Transfer event TRB DMA ptr not part of current TD ...). In my case the controller is provided by a StarTech.com 4 Port USB 3.1 PCIe Card 3x USB-A and 1x USB-C [PEXUS313AC2V].

While searching online I found a couple of LPs and forum posts with similar issues. The common factor seems to be high-speed USB devices (e.g., 4K webcams, USB Ethernet adapters) connected to ASMedia controllers.

I have compiled 5.0.0 with a variety of existing quirks but nothing has done the trick so far. There are a couple of ASMedia firmwares posted on station-drivers. Unfortunately none of them seem to fix the issue either.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

This issue is really funny:
running
ID 0b05:17d1 ASUSTek Computer, Inc. AC51 802.11a/b/g/n/ac Wireless Adapter [Mediatek MT7610U]

on kernel
$ uname -r
5.1.7-arch1-1-ARCH

will spam the log after the known WARN
[43163.034783] mt76x0u 1-10.2:1.0 wlp3s0f0u10u2: renamed from wlan0
[43163.351656] usb 1-10.2: USB disconnect, device number 6
[43163.352176] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

with tons of failed vendor requests:
[43160.683383] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3dc failed:-71
[43160.813398] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3e0 failed:-71
[43160.943415] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3e4 failed:-71
[43161.073440] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3e8 failed:-71
[43161.203439] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3ec failed:-71
[43161.333458] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3f0 failed:-71
[43161.463468] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3f4 failed:-71
[43161.593561] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3f8 failed:-71
[43161.723502] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3fc failed:-71
[43161.853512] mt76x0u 1-10.2:1.0: vendor request req:06 off:108c failed:-71
....
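The negative numbers in these messages are Linux errno values. The following sketch decodes the ones that appear in this thread; the values are hard-coded from the generic Linux errno headers so the result does not depend on the machine running the script, and the function name `decode_errors` is hypothetical. In the kernel's USB error-code documentation, -EPROTO on a transfer generally points to a low-level bus problem (bit-stuffing error, no response from the device, or an unknown USB error), and -EOVERFLOW to "babble".

```python
import re

# Linux errno values seen in this thread (generic Linux numbering).
LINUX_ERRNO = {
    5: "EIO",
    71: "EPROTO",
    75: "EOVERFLOW",
    110: "ETIMEDOUT",
}

# A minus sign followed by digits, not glued to a preceding word character,
# so that port names such as "1-10.2" are not mistaken for error codes.
NEG_INT_RE = re.compile(r"(?<![\w.])-(\d+)\b")


def decode_errors(line):
    """Return a list of (code, errno_name) pairs found in a log line."""
    return [
        (-int(num), LINUX_ERRNO.get(int(num), "unknown"))
        for num in NEG_INT_RE.findall(line)
    ]
```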

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

If the same device is connected to an Intel Core I5-6200 system (USB3 port), the log looks different to the AMD RYZEN system.

[ 204.231872] mt76x0u 1-1:1.0: rx urb failed: -71
[ 204.231901] mt76x0u 1-1:1.0: rx urb failed: -71
[ 204.231940] mt76x0u 1-1:1.0: rx urb failed: -71
[ 204.231980] mt76x0u 1-1:1.0: rx urb failed: -71
[ 204.232020] mt76x0u 1-1:1.0: rx urb failed: -71
[ 204.232188] mt76x0u 1-1:1.0: rx urb failed: -71
[ 204.232226] mt76x0u 1-1:1.0: rx urb failed: -71
[ 204.232275] mt76x0u 1-1:1.0: rx urb failed: -71
[ 204.232304] mt76x0u 1-1:1.0: rx urb failed: -71
[ 204.232345] mt76x0u 1-1:1.0: rx urb failed: -71
[ 204.233284] xhci_hcd 0000:00:14.0: WARN Cannot submit Set TR Deq Ptr
[ 204.233291] xhci_hcd 0000:00:14.0: A Set TR Deq Ptr command is pending.
[ 204.263427] mt76x0u 1-1:1.0: TX DMA did not stop
[ 207.596726] mt76x0u 1-1:1.0: Warning: MAC TX did not stop!
[ 209.650050] mt76x0u 1-1:1.0: Warning: MAC RX did not stop!
[ 209.651133] mt76x0u 1-1:1.0: RX DMA did not stop

I also noticed some changes in xhci-ring.c between 5.1.7 and 5.2-rc4. Maybe they'll fix the problem. I haven't tested it yet.

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

I already tried the 5.2-rc3 kernel and the problem isn't fixed yet. There were no changes in the xhci driver between rc3 and rc4, so it's very unlikely that the 5.2-rc4 kernel is free of the problem.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Thanks for the information. I skipped 5.2rc1 ... rc3.

But with your information, there is no real need for me to run some more tests.

Unfortunately, it looks like the issue has been backported to older kernel versions (4.19), because I got some issue reports here, too:
https://github.com/ZerBea/hcxdumptool/issues/57

and 90% of my devices don't work any longer.

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

When did it get backported? I'm on 4.19.48 and haven't had a problem with this version...

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

It's just a guess, because of this post:
https://github.com/ZerBea/hcxdumptool/issues/57#issuecomment-483964293

But it looks like the device was working before that post.
I can't test it, because I don't have such a device.

I tested a TP-LINK Archer T2UH and this device is not working on 4.19.46 arm (Raspberry Pi).

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Yes, rt2800usb is working fine on 4.19.46.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

hcxdumptool running on kernel 4.19.46 arm doesn't receive packets on several different devices. In this case:
ID 0b05:17d1 ASUSTek Computer, Inc. AC51 802.11a/b/g/n/ac Wireless Adapter [Mediatek MT7610U]
INFO: cha=1, rx=0, rx(dropped)=0, tx=18, err=0, aps=0 (0 in range)

while a few other devices are still working
INFO: cha=1, rx=805, rx(dropped)=0, tx=93, err=0, aps=29 (21 in range)

BTW:
I'm only running/testing devices whose drivers support monitor mode and packet injection.

What is very interesting on that ARM kernel is that dmesg doesn't show any WARNs.
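Since dmesg stays silent in this case, the status counters are the only signal. A small sketch for spotting the "transmits but never receives" pattern from status lines in the format shown above; the helper names are made up, and the field names are simply those that appear in the two INFO lines.

```python
import re


def parse_info(line):
    """Parse 'INFO: cha=1, rx=0, rx(dropped)=0, tx=18, ...' into a dict of ints."""
    return {key: int(value) for key, value in re.findall(r"([a-z()]+)=(\d+)", line)}


def looks_deaf(stats):
    """True when frames were transmitted but nothing at all was received."""
    return stats.get("tx", 0) > 0 and stats.get("rx", 0) == 0
```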

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Still no fix?
$ uname -r
5.1.11-arch1-1-ARCH

and most of the USB devices (WiFi, Bluetooth, ...) are still not working:
[32942.700591] usb 1-10.4: new full-speed USB device number 7 using xhci_hcd
[32944.721410] usb 1-10.4: New USB device found, idVendor=0a12, idProduct=0001, bcdDevice=52.76
[32944.721412] usb 1-10.4: New USB device strings: Mfr=0, Product=2, SerialNumber=0
[32945.069015] Bluetooth: hci0: hardware error 0x37

How about kernel 5.2?

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Some USB card readers are also affected (connected to USB 3 port):

$ uname -r
5.1.12-arch1-1-ARCH

[ 3510.100114] usb 2-2: new SuperSpeed Gen 1 USB device number 2 using xhci_hcd
[ 3510.134121] usb 2-2: New USB device found, idVendor=058f, idProduct=6387, bcdDevice= 0.02
[ 3510.134126] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 3510.134128] usb 2-2: Product: Intenso Ultra Line
[ 3510.134130] usb 2-2: Manufacturer: ALCOR
...
[ 5129.997608] usb 1-1: reset high-speed USB device number 7 using xhci_hcd
[ 5130.218618] sd 9:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[ 5130.218631] sd 9:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 20 c3 c0 00 00 20 00
[ 5130.218637] print_req_error: I/O error, dev sdb, sector 2147264 flags 80700

I really wonder why this issue hasn't been fixed yet, because so many devices are affected.

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

The list of changes for 5.2-rc6 contains these two commits:

Mathias Nyman (2):
      usb: xhci: Don't try to recover an endpoint if port is in error state.
      xhci: detect USB 3.2 capable host controllers correctly

I think this could be the fix for this issue.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Great, thanks for the information. The issue is really ugly, because many USB devices are affected (HDDs, card readers, Bluetooth, WLAN, ... the list is long).
I'll check 5.2-rc6.

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

Just tried 5.2-rc6, but unfortunately I still have the same issue.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Thanks for the information. I tested 5.2-rc6, too. Even a USB 3.0 HDD isn't working.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Now running mainline kernel 5.2 and the issue still exists.
Tested on this device:
ID 148f:3070 Ralink Technology, Corp. RT2870/RT3070 Wireless Adapter
but the same applies to many other devices, too.

dmesg after plugging in the device:

[75.482165] usb 1-2: new high-speed USB device number 6 using xhci_hcd
[75.639236] usb 1-2: New USB device found, idVendor=148f, idProduct=3070, bcdDevice= 1.01
[75.639238] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[75.639239] usb 1-2: Product: 802.11 n WLAN
[75.639240] usb 1-2: Manufacturer: Ralink
[75.639241] usb 1-2: SerialNumber: 1.0
[75.952611] usb 1-2: reset high-speed USB device number 6 using xhci_hcd
[76.107232] ieee80211 phy1: rt2x00_set_rt: Info - RT chipset 3070, rev 0201 detected
[76.120228] ieee80211 phy1: rt2x00_set_rf: Info - RF chipset 0005 detected
[76.121079] ieee80211 phy1: Selected rate control algorithm 'minstrel_ht'
[76.130873] usbcore: registered new interface driver rt2800usb
[76.194447] audit: type=1130 audit(1562833499.983:49): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[76.195313] rt2800usb 1-2:1.0 wlp0s20f0u2: renamed from wlan0
[76.216178] ieee80211 phy1: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[76.241382] ieee80211 phy1: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36
[76.544022] ieee80211 phy1: rt2x00usb_vendor_request: Error - Vendor Request 0x06 failed for offset 0x0404 with error -71
[77.562305] ieee80211 phy1: rt2800_wait_csr_ready: Error - Unstable hardware
[77.562316] ieee80211 phy1: rt2800usb_set_device_state: Error - Device failed to enter state 4 (-5)
...
followed by this message on access to the interface:
[341.598563] xhci_hcd 0000:00:14.0: WARN Cannot submit Set TR Deq Ptr
[341.598573] xhci_hcd 0000:00:14.0: A Set TR Deq Ptr command is pending.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

BTW:
The tested device is an ALFA AWUS036NH and I really can't see "Unstable hardware" here.

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

I don't really think the problem is caused by the WiFi stick itself. Maybe the cause is the xHCI controller on the motherboard? We're both using a 300-series AM4 board (even the same brand), so we probably have the same controller.

Btw., I've already tried the git snapshot of 5.3-rc1; the problem isn't fixed there either.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

No, I don't think it's the controller. I'm running three different systems here:
RYZEN 1700, MSI X370 KRAIT
INTEL I5-6200U, ASUS X555U (notebook)
INTEL i7-3930K, ASUS P9X79
and all of them run into the same issue. Also, not all of the test devices are affected. Some devices are still working as expected (for example the TENDA W311U+), while others fail epically (ALFA AWUS036NH). The same applies to several Bluetooth devices.
Absolutely new (and really funny) is the error message "Unstable hardware" on 5.2.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

And 5.2 makes things worse. Most of my adapters are not working.

EDIMAX EW-7711UAN V2
ID 7392:7710 Edimax Technology Co., Ltd

[ 228.451035] usb 1-2: new high-speed USB device number 53 using xhci_hcd
[ 228.629543] usb 1-2: New USB device found, idVendor=7392, idProduct=7710, bcdDevice= 0.00
[ 228.629548] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 228.629550] usb 1-2: Product: Edimax Wi-Fi
[ 228.629552] usb 1-2: Manufacturer: MediaTek
[ 228.629554] usb 1-2: SerialNumber: 1.0
[ 228.779827] usb 1-2: reset high-speed USB device number 53 using xhci_hcd
[ 229.037761] mt7601u 1-2:1.0: ASIC revision: 76010001 MAC revision: 76010500
[ 229.064654] mt7601u 1-2:1.0: Firmware Version: 0.1.00 Build: 7640 Build time: 201302052146____
[ 230.045089] mt7601u 1-2:1.0: EEPROM ver:0d fae:00
[ 230.055724] mt7601u 1-2:1.0: EEPROM country region 01 (channels 1-13)
[ 230.763955] mt7601u 1-2:1.0: Warning: mt7601u_mcu_wait_resp retrying
[ 231.084339] mt7601u 1-2:1.0: Warning: mt7601u_mcu_wait_resp retrying
[ 231.404311] mt7601u 1-2:1.0: Warning: mt7601u_mcu_wait_resp retrying
[ 231.724294] mt7601u 1-2:1.0: Warning: mt7601u_mcu_wait_resp retrying
[ 232.044298] mt7601u 1-2:1.0: Warning: mt7601u_mcu_wait_resp retrying
[ 232.044301] mt7601u 1-2:1.0: Error: mt7601u_mcu_wait_resp timed out
[ 232.044485] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[ 232.046810] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[ 232.197641] mt7601u 1-2:1.0: Vendor request req:07 off:0080 failed:-71
[ 232.347631] mt7601u 1-2:1.0: Vendor request req:02 off:0080 failed:-71
[ 232.497630] mt7601u 1-2:1.0: Vendor request req:02 off:0080 failed:-71
[ 232.497675] mt7601u: probe of 1-2:1.0 failed with error -110

LOGILINK WL0150
ID 148f:5370 Ralink Technology, Corp. RT5370 Wireless Adapter

[ 527.994013] usb 1-2: new high-speed USB device number 86 using xhci_hcd
[ 528.238517] usb 1-2: New USB device found, idVendor=148f, idProduct=5370, bcdDevice= 1.01
[ 528.238519] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 528.238521] usb 1-2: Product: 802.11 n WLAN
[ 528.238522] usb 1-2: Manufacturer: Ralink
[ 528.238523] usb 1-2: SerialNumber: 1.0
[ 528.495914] usb 1-2: reset high-speed USB device number 86 using xhci_hcd
[ 528.747058] ieee80211 phy81: rt2x00_set_rt: Info - RT chipset 5390, rev 0502 detected
[ 529.426163] ieee80211 phy81: rt2x00_set_rf: Info - RF chipset 5370 detected
[ 529.432544] ieee80211 phy81: Selected rate control algorithm 'minstrel_ht'
[ 529.433131] usbcore: registered new interface driver rt2800usb
[ 529.447058] audit: type=1130 audit(1562850994.757:43): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 529.447260] rt2800usb 1-2:1.0 wlp3s0f0u2: renamed from wlan0
[ 534.453471] audit: type=1131 audit(1562850999.761:44): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 560.993915] ieee80211 phy81: r...


Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Sorry, there was a copy-and-paste error in the last dmesg log. Due to several tests, the dmesg log was flooded with warnings and error messages.
I'll stop the tests and wait for the next LTS kernel.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

BTW:
For me the issue started at this point:
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/build&id=72a9c673636b779e370983fea08e40f97039b981
when the Linux kernel's default i386/x86_64 kernel configurations shipped with USB 3.0 support enabled (CONFIG_USB_XHCI_HCD).

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

It looks like debug tracing was requested, and the request was ignored:

https://<email address hidden>/

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

I didn't ignore it; I sent it to Mathias Nyman only, and not to the whole mailing list ("Send output to me" didn't sound like I should send it to the whole mailing list, but idk). I have to admit that the first traces weren't really useful, though: when I ran the commands he gave me, the traces started too late (the error happens immediately after system startup, so when I run these commands after startup the important part is missing).

Then he gave me instructions on how to enable tracing at startup, which only resulted in this error: [ 0.172042] Failed to enable trace event: xhci-hcd
and the tracing file was empty afterwards.

Just about one week ago I had another idea for how I could get it working, and it actually worked. The solution was to just unplug the WiFi stick at boot, then enable tracing and plug in the stick again (I don't know why I didn't try that a few months ago, tbh). I've sent the two files (dmesg and tracing) to Mathias Nyman again, but this time he didn't respond (I sent the mail on July 11th).
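The "unplug, enable tracing, plug in again" sequence described above can be sketched as follows. This is an assumption-laden illustration, not the maintainer's exact instructions: it assumes debugfs is mounted at `/sys/kernel/debug`, that the tracing directory is at the conventional `/sys/kernel/debug/tracing` path, and that the script runs as root. The buffer size and function names are arbitrary choices for the example.

```python
from pathlib import Path

DEFAULT_TRACING_ROOT = "/sys/kernel/debug/tracing"  # assumed mount point


def enable_xhci_tracing(tracing_root=DEFAULT_TRACING_ROOT, buffer_kb=81920):
    """Enlarge the trace buffer and enable all xhci-hcd trace events."""
    root = Path(tracing_root)
    (root / "buffer_size_kb").write_text(f"{buffer_kb}\n")
    (root / "events" / "xhci-hcd" / "enable").write_text("1\n")


def save_trace(dest, tracing_root=DEFAULT_TRACING_ROOT):
    """Copy the current trace buffer contents to a regular file."""
    with open(Path(tracing_root) / "trace", "r", errors="replace") as src, \
            open(dest, "w") as out:
        for line in src:
            out.write(line)
    return dest
```

The intended order is: boot with the device unplugged, call `enable_xhci_tracing()`, plug the device in and reproduce the failure, then call `save_trace("xhci-trace.txt")` and save dmesg alongside it.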

Should I send the whole tracing file and dmesg log to the mailing list instead? What is the preferred way to send files that are too big for an e-mail (the trace is around 17.6MB in size)?

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

Bernhard, thanks for the update and for providing debug data to the maintainer.

I think you should ping him on the mailing list and ask whether anything else needs to be provided, or how to proceed otherwise. Maybe we can just revert the patch?

This issue is annoying and I see more users running into it (and blaming the mt76x0u or rt2800usb drivers). It should not be hard to fix, since it is a regression (the commit causing it is known) and it is reproducible.

Please also point out that the changes in process_bulk_intr_td() are the main cause of the problem, as stated in comment 20.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

This issue affects nearly everything (like this SAMSUNG Galaxy S3):

[34385.294067] usb 1-2: new high-speed USB device number 6 using xhci_hcd
[34385.465017] usb 1-2: New USB device found, idVendor=18d1, idProduct=4ee7, bcdDevice= 2.26
[34385.465022] usb 1-2: New USB device strings: Mfr=2, Product=3, SerialNumber=4
[34385.465025] usb 1-2: Product: GT-I9300
[34385.465028] usb 1-2: Manufacturer: samsung
...
[35074.182055] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Bernhard, I have been running my RYZEN for some days and noticed that the xhci issue also
affected the USB keyboard and the USB mouse.

At this point, the system had already been running for 2 days:
Aug 24 08:38:41.665376 tux1 kernel: usb 1-12: new low-speed USB device number 19 using xhci_hcd
Aug 24 08:38:42.001609 tux1 kernel: usb 1-12: New USB device found, idVendor=046a, idProduct=0011, bcdDevice= 1.00
Aug 24 08:38:42.001850 tux1 kernel: usb 1-12: New USB device strings: Mfr=0, Product=0, SerialNumber=0
Aug 24 08:38:42.098291 tux1 kernel: hid-generic 0003:046A:0011.0003: input,hidraw0: USB HID v1.11 Keyboard [HID 046a:0011] on usb-0000:03:00.0-12/input0
Aug 24 08:38:43.631091 tux1 kernel: usb 1-12: input irq status -75 received
Aug 24 08:38:43.631384 tux1 kernel: usb usb1-port12: disabled by hub (EMI?), re-enabling...
Aug 24 08:38:43.631409 tux1 kernel: usb 1-12: USB disconnect, device number 19
Aug 24 08:38:44.025057 tux1 kernel: usb 1-12: new low-speed USB device number 20 using xhci_hcd
Aug 24 08:38:44.361600 tux1 kernel: usb 1-12: New USB device found, idVendor=046a, idProduct=0011, bcdDevice= 1.00
Aug 24 08:38:44.361839 tux1 kernel: usb 1-12: New USB device strings: Mfr=0, Product=0, SerialNumber=0
Aug 24 08:38:44.401604 tux1 kernel: input: HID 046a:0011 as /devices/pci0000:00/0000:00:01.3/0000:03:00.0/usb1/1-12/1-12:1.0/0003:046A:0011.0004/input/input18
Aug 24 08:38:44.458277 tux1 kernel: hid-generic 0003:046A:0011.0004: input,hidraw0: USB HID v1.11 Keyboard [HID 046a:0011] on usb-0000:03:00.0-12/input0
Aug 24 08:38:49.031776 tux1 kernel: usb 1-12: input irq status -75 received
Aug 24 08:38:49.032082 tux1 kernel: usb usb1-port12: disabled by hub (EMI?), re-enabling...
Aug 24 08:38:49.032099 tux1 kernel: usb 1-12: USB disconnect, device number 20
Aug 24 08:38:49.425365 tux1 kernel: usb 1-12: new low-speed USB device number 21 using xhci_hcd
Aug 24 08:39:04.905175 tux1 kernel: usb 1-12: device descriptor read/64, error -110
Aug 24 08:39:20.478280 tux1 kernel: usb 1-12: device descriptor read/64, error -110
Aug 24 08:39:20.774967 tux1 kernel: usb 1-12: new low-speed USB device number 22 using xhci_hcd
Aug 24 08:39:36.331757 tux1 kernel: usb 1-12: device descriptor read/64, error -110
Aug 24 08:39:51.838723 tux1 kernel: usb 1-12: device descriptor read/64, error -110
Aug 24 08:39:51.945370 tux1 kernel: usb usb1-port12: attempt power cycle
Aug 24 08:39:52.588394 tux1 kernel: usb 1-12: new low-speed USB device number 23 using xhci_hcd
Aug 24 08:39:57.415723 tux1 kernel: usb 1-12: Device not responding to setup address.
Aug 24 08:40:02.448295 tux1 kernel: usb 1-12: Device not responding to setup address.
Aug 24 08:40:02.655042 tux1 kernel: usb 1-12: device not accepting address 23, error -71
Aug 24 08:40:02.778269 tux1 kernel: usb 1-12: new low-speed USB device number 24 using xhci_hcd
Aug 24 08:40:07.604975 tux1 kernel: usb 1-12: Device not responding to setup address.
Aug 24 08:40:12.638751 tux1 kernel: usb 1-12: Device not responding to setup address.
Aug 24 08:40:12.845561 tux1 kernel: usb 1-12: device not accepting address 24, error -71
Aug 24 08:40:12.845696 tux1 kernel: usb usb1-port12: unable to enumerate USB device
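A quick way to spot this kind of re-enumeration loop in a long journal is to count how often each port announces a "new ... USB device number". The following is only a sketch; the function name is hypothetical and the pattern is based solely on the message format visible in the log above.

```python
import re
from collections import defaultdict

NEW_DEVICE_RE = re.compile(
    r"usb (?P<port>\d+-[\d.]+): new .+? USB device number (?P<num>\d+)"
)


def enumeration_counts(lines):
    """Count 'new ... USB device number N' messages per port path."""
    counts = defaultdict(int)
    for line in lines:
        match = NEW_DEVICE_RE.search(line)
        if match:
            counts[match.group("port")] += 1
    return dict(counts)
```

In the excerpt above, port 1-12 enumerates six times (device numbers 19 to 24) in about a minute and a half, which is the loop described in this comment.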

At this time only hard power o...


Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

Was one of the affected USB devices plugged in and you rebooted to get the wifi working? Or did that happen even without the device plugged in?

I noticed once that even after I rebooted my system to get WiFi working, my external HDD didn't work after plugging it in, so I had to reboot again to get that working...

I'm just using the LTS kernel right now, which works fine for me, but because of that bug I'm kinda limited when choosing a distribution since most distros don't offer different kernel versions and I don't really want to recompile my kernel every time.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

No, it happened without warning. The keyboard LED flashed a few times, in step with the device descriptor errors. This was the first time I noticed something like that, and only on the RYZEN machine.
We talked about that xhci issue in other (git) threads, too:
https://github.com/aircrack-ng/rtl8812au/issues/376#issuecomment-522169478

BTW:
The LTS kernel (4.19) is still working fine here, too. In my opinion, the xhci host has been unstable since 4.20. I notice that every time I test or improve a driver.
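When sorting reports by kernel, it helps to compare `uname -r` strings numerically rather than as text. A minimal sketch, taking the "unstable since 4.20" observation above as an assumed boundary rather than a confirmed bisection result; the function names are made up for illustration.

```python
import re


def kernel_version(release):
    """Turn a release string such as '5.1.7-arch1-1-ARCH' into (5, 1, 7)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


def in_suspected_range(release, first_bad=(4, 20, 0)):
    """True if the release is at or after the assumed first bad version."""
    return kernel_version(release) >= first_bad
```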

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

I noticed the same behavior. Not on a USB HDD, but on a USB stick:

This is an INTENSO USB 2 ALU LINE 64 GB USB stick:

[ 1032.600762] usb 1-11.4: new high-speed USB device number 15 using xhci_hcd
[ 1032.626487] hub 1-11:1.0: hub_ext_port_status failed (err = -71)
[ 1032.629487] usb 1-11-port4: cannot reset (err = -71)
[ 1032.632491] usb 1-11-port4: cannot reset (err = -71)
[ 1032.635486] usb 1-11-port4: cannot reset (err = -71)
[ 1032.638482] usb 1-11-port4: cannot reset (err = -71)
[ 1032.638483] usb 1-11-port4: Cannot enable. Maybe the USB cable is bad?

The stick works fine when plugged into another port:
[ 1465.770379] usb 1-11.4: USB disconnect, device number 23
[ 1708.302214] usb 1-2: new high-speed USB device number 24 using xhci_hcd
[ 1708.471933] usb 1-2: New USB device found, idVendor=058f, idProduct=6387, bcdDevice= 1.ff
[ 1708.471938] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1708.471940] usb 1-2: Product: Intenso Alu Line
[ 1708.471943] usb 1-2: Manufacturer: 6989
[ 1708.471945] usb 1-2: SerialNumber: 21F84CE8
[ 1708.479111] usb-storage 1-2:1.0: USB Mass Storage device detected

Plugged in again on 1-11-port4:
[ 1776.661289] usb 1-11.4: new high-speed USB device number 25 using xhci_hcd
[ 1776.810678] usb 1-11.4: New USB device found, idVendor=058f, idProduct=6387, bcdDevice= 1.ff
[ 1776.810684] usb 1-11.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1776.810687] usb 1-11.4: Product: Intenso Alu Line
[ 1776.810691] usb 1-11.4: Manufacturer: 6989
[ 1776.810694] usb 1-11.4: SerialNumber: 21F84CE8
[ 1776.817710] usb-storage 1-11.4:1.0: USB Mass Storage device detected

That leads me to the assumption that the xhci host is unstable, at least in combination with my controller:

[ 1.325164] xhci_hcd 0000:03:00.0: hcc params 0x0200ef81 hci version 0x110 quirks 0x0000000008000410
[ 1.325319] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.02
[ 1.325321] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 1.325322] usb usb1: Product: xHCI Host Controller
[ 1.325323] usb usb1: Manufacturer: Linux 5.2.9-arch1-1-ARCH xhci-hcd
[ 1.325323] usb usb1: SerialNumber: 0000:03:00.0
[ 1.325428] hub 1-0:1.0: USB hub found
[ 1.325443] hub 1-0:1.0: 14 ports detected
[ 1.325922] xhci_hcd 0000:03:00.0: xHCI Host Controller
[ 1.325925] xhci_hcd 0000:03:00.0: new USB bus registered, assigned bus number 2
[ 1.325927] xhci_hcd 0000:03:00.0: Host supports USB 3.1 Enhanced SuperSpeed
[ 1.325958] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[ 1.325974] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 5.02
[ 1.325976] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 1.325977] usb usb2: Product: xHCI Host Controller
[ 1.325978] usb usb2: Manufacturer: Linux 5.2.9-arch1-1-ARCH xhci-hcd
[ 1.325979] usb usb2: SerialNumber: 0000:03:00.0
[ 1.326046] hub 2-0:1.0: USB hub found
[ 1.326057] hub 2-0:1.0: 8 ports detected
[ 1.326289] usb: port power management may be unreliable
[ 1.326451] xhci_hcd 0000:25:00.0: xHCI Host Controller
[ 1.326454] xhci_hcd 000...


Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

I tested another USB controller (this time USB 3.1) and the results are even worse than on USB 3.0:
USB controller: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset USB 3.1 xHCI Controller (rev 02)
and
TENDA W311U+
ID 148f:3070 Ralink Technology, Corp. RT2870/RT3070 Wireless Adapter
This device is one of the few that work on a USB 3.0 controller
Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh)

but it failed epically on USB 3.1:
[ 1213.285622] rt2800usb 5-3.1.2:1.0 wlp39s0f3u3u1u2: renamed from wlan0
[ 1218.918384] ieee80211 phy6: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[ 1218.918427] ieee80211 phy6: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36
[ 1219.222282] device wlp39s0f3u3u1u2 entered promiscuous mode
[ 1220.797413] rt2800usb_tx_sta_fifo_read_completed: 186 callbacks suppressed
[ 1220.797417] ieee80211 phy6: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
[ 1220.797452] ieee80211 phy6: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
[ 1220.797531] ieee80211 phy6: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
[ 1220.797611] ieee80211 phy6: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
[ 1220.797692] ieee80211 phy6: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
[ 1220.797772] ieee80211 phy6: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
[ 1220.797851] ieee80211 phy6: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
[ 1220.797931] ieee80211 phy6: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
[ 1220.798011] ieee80211 phy6: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
[ 1220.798091] ieee80211 phy6: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
[ 1220.814661] xhci_hcd 0000:27:00.3: WARN Cannot submit Set TR Deq Ptr
[ 1220.814663] xhci_hcd 0000:27:00.3: A Set TR Deq Ptr command is pending.
[ 1221.378769] ieee80211 phy6: rt2x00queue_flush_queue: Warning - Queue 0 failed to flush
[ 1221.409201] device wlp39s0f3u3u1u2 left promiscuous mode

I really hope it will be fixed before we reach the next LTS kernel.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

@ Stanislaw Gruszka
We once talked about a rt2800usb issue (rt2800usb stops receiving) here:
https://bugzilla.kernel.org/show_bug.cgi?id=202243#c19

Now I'm not sure whether it is related to this xhci issue or not, because I sometimes get it on kernel 4.19, too.

After doing setsockopt PACKET_MR_PROMISC:
https://github.com/ZerBea/hcxdumptool/blob/master/hcxdumptool.c#L5513

dmesg will show this warning (in this case on a USB 2.0 controller):
[ 1687.106514] device wlp3s0f0u2 entered promiscuous mode
[ 1687.106551] audit: type=1700 audit(1567932110.523:46): dev=wlp3s0f0u2 prom=256 old_prom=0 auid=1000 uid=0 gid=0 ses=2
[ 1718.525815] ieee80211 phy0: rt2x00queue_flush_queue: Warning - Queue 14 failed to flush
[ 1718.558846] device wlp3s0f0u2 left promiscuous mode
[ 1718.558888] audit: type=1700 audit(1567932141.974:47): dev=wlp3s0f0u2 prom=0 old_prom=256 auid=1000 uid=0 gid=0 ses=2

The adapter stops working until it is unplugged and plugged in again:
[ 1722.950110] usb 1-2: USB disconnect, device number 5

If you think it is not related to this issue, I can open a new rt2800usb issue.

Revision history for this message
In , k.j.vanmierlo (k.j.vanmierlo-linux-kernel-bugs) wrote :

Hi,

A Google search led me here. I'm getting the same error on my Lenovo Thinkpad X220 running Kubuntu 19.04. Every time I plug in a USB memory stick or an SD card, I get the following messages in dmesg:

[ 9649.078958] xhci_hcd 0000:05:00.0: WARN Cannot submit Set TR Deq Ptr
[ 9649.078966] xhci_hcd 0000:05:00.0: A Set TR Deq Ptr command is pending.

Linux koen-ThinkPad-X220 5.0.0-29-generic #31-Ubuntu SMP Thu Sep 12 13:05:32 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Koen

Revision history for this message
In , doug16k (doug16k-linux-kernel-bugs) wrote :

Got this issue on 5.0.0-29-generic, host hardware is Ryzen 2700X on B350 chipset (Asus Prime B350-Plus).

USB Device is Samsung Galaxy A5, Model SM-A520W, Android 8.0

[57460.411327] usb 1-4.1.4: USB disconnect, device number 10
[57460.411566] xhci_hcd 0000:02:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[57460.685963] usb 1-4.1.4: new high-speed USB device number 11 using xhci_hcd
[57460.830379] usb 1-4.1.4: New USB device found, idVendor=04e8, idProduct=6860, bcdDevice= 4.00
[57460.830382] usb 1-4.1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[57460.830383] usb 1-4.1.4: Product: SAMSUNG_Android
[57460.830385] usb 1-4.1.4: Manufacturer: SAMSUNG
[57460.830386] usb 1-4.1.4: SerialNumber: **withheld**

doug@doug-dt:~$ sudo lspci -s 2:0.0 -vvvvvv
[sudo] password for doug:
02:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset USB 3.1 xHCI Controller (rev 02) (prog-if 30 [XHCI])
 Subsystem: ASMedia Technology Inc. 300 Series Chipset USB 3.1 xHCI Controller

I have this kernel parameter to prevent other USB issues: usbcore.autosuspend=-1
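
To confirm that the parameter above actually took effect on a running system, the value can be read back from sysfs. A minimal sketch, assuming the usual module parameter path under /sys; the function name show_usb_autosuspend and its optional root argument are made up for illustration:

```shell
#!/bin/sh
# Sketch: print the current usbcore autosuspend delay (in seconds; a negative
# value disables autosuspend). The function name and the optional sysfs-root
# argument are hypothetical.
show_usb_autosuspend() {
    sysfs="${1:-/sys}"
    cat "$sysfs/module/usbcore/parameters/autosuspend"
}
```

Calling show_usb_autosuspend without arguments reads the live value; with usbcore.autosuspend=-1 on the kernel command line it should print -1.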

Linux doug-dt 5.0.0-29-generic #31~18.04.1-Ubuntu SMP Thu Sep 12 18:29:21 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

This issue still exists on
$ uname -r
5.3.1-arch1-1-ARCH

Sep 24 08:14:00.374050 tux1 kernel: device wlp3s0f0u2 entered promiscuous mode
Sep 24 08:14:39.757848 tux1 kernel: xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Sep 24 08:14:39.758158 tux1 kernel: xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Sep 24 08:14:39.770950 tux1 kernel: mt7601u 1-2:1.0: Warning: TX DMA did not stop!

The xhci host becomes completely unstable after the first warning:
WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state
If this warning is ignored, the whole system freezes. At that point only a "hard" power-off helps.

BTW:
Shouldn't we increase the importance? The next kernel will be LTS, and this issue will reach the major distributions.

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

I have noticed that I don't get that error ("WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state") anymore, even though I still have the same USB issues (maybe something in the rt2800usb driver changed, I don't know). I've even tried applying all the patches in the "for-usb-linus" branch from Mathias Nyman's git repo, but I still have the same issue.

Maybe more people should send a message to the USB kernel mailing list (<email address hidden>)? I didn't get a response the last time, but maybe they will address this issue if they see that more users are affected by this regression.

BTW @Michael:
There is a commit in the for-usb-linus branch that could fix the system freezes you've encountered: https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/commit/?h=for-usb-linus&id=750ed908bbb57153c75b79c50135e7cc94feb4a5

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

@Bernhard.
Thanks. I'll check it. Also thanks for setting prio to high.
Until the system freezes, I receive the funniest warnings from the xhci system: bad cable, bad device, firmware not loaded, ...

"WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state" also depends on the device:
Running a
148f:3070 Ralink Technology, Corp. RT2870/RT3070 Wireless Adapter
I got no "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state"

Running
148f:5370 Ralink Technology, Corp. RT5370 Wireless Adapter
I got the warning.

Both of them use the rt2800usb driver.

That and the varying warnings lead me to assume that the xhci host is running completely unstable, especially when hcxdumptool puts it under high load.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

It seems that the commit is working - no freeze, up to now.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Nope, it doesn't work as expected. No freezes, but:

[ 2914.285601] ieee80211 phy77: Atheros AR9271 Rev:1
[ 2914.286229] ath9k_htc 1-3:1.0 wlp0s20f0u3: renamed from wlan0
[ 2914.389748] usb 1-3: USB disconnect, device number 83
[ 2914.749819] ath: phy77: Failed to wakeup in 500us
[ 2914.760221] ath: phy77: Failed to wakeup in 500us
[ 2914.770309] ath: phy77: Failed to wakeup in 500us
[ 2914.780411] ath: phy77: Failed to wakeup in 500us
[ 2915.283332] usb 1-3: ath9k_htc: Firmware ath9k_htc/htc_9271-1.4.0.fw requested
[ 2915.531824] usb 1-3: ath9k_htc: Firmware - ath9k_htc/htc_9271-1.4.0.fw download failed
[ 2915.532206] usb 1-3: ath9k_htc: USB layer deinitialized
[ 2928.339410] ------------[ cut here ]------------
[ 2928.339505] WARNING: CPU: 1 PID: 704 at net/mac80211/rx.c:804 ieee80211_rx_napi.cold+0xc/0x67 [mac80211]
[ 2928.339506] Modules linked in: ath9k_htc ath9k_common ath9k_hw ath nfnetlink_queue nfnetlink_log nfnetlink ccm uas usb_storage rt2800usb rt2x00usb rt2800lib rt2x00lib fuse nls_iso8859_1 nls_cp437 vfat fat nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) uvcvideo snd_soc_skl videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 snd_soc_hdac_hda snd_hda_codec_hdmi videobuf2_common snd_hda_ext_core videodev snd_soc_skl_ipc snd_hda_codec_realtek rtsx_usb_ms memstick mc snd_soc_sst_ipc x86_pkg_temp_thermal snd_soc_sst_dsp r8169 intel_powerclamp snd_soc_acpi_intel_match snd_soc_acpi coretemp snd_soc_core kvm_intel snd_hda_codec_generic ledtrig_audio realtek snd_compress rtl8821ae ac97_bus kvm libphy irqbypass snd_pcm_dmaengine ipmi_devintf btcoexist ipmi_msghandler crct10dif_pclmul crc32_pclmul i915 rtl_pci rtlwifi mac80211 ghash_clmulni_intel joydev cfg80211 mousedev aesni_intel mei_hdcp libarc4 iTCO_wdt aes_x86_64 i2c_hid crypto_simd snd_hda_intel i2c_algo_bit cryptd asus_nb_wmi iTCO_vendor_support
.....

This time xhci crashed a TP-LINK TL722WN v1.
And that device worked before...
xhci is still completely unstable, and the warnings it delivers are unpredictable.

Revision history for this message
Felix Moreno (felix-justdust) wrote :

Having the same problem with Bus 002 Device 004: ID 174c:55aa ASMedia Technology Inc. Name: ASM1051E SATA 6Gb

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

So Felix, can you provide more details like the machine or device you're using, a dmesg showing the problem, and a bit more information about the device itself? I guess you're the first reporter with a "SATA" device showing that.

Thanks,

Guilherme

Revision history for this message
In , viniciuspython (viniciuspython-linux-kernel-bugs) wrote :

Just providing some information that could be helpful to debug the issue. It is also affecting me.

Kernel version:
# uname -a
Linux arch 5.3.1-arch1-1-ARCH #1 SMP PREEMPT Sat Sep 21 11:33:49 UTC 2019 x86_64 GNU/Linux

Hardware specs: AMD Ryzen 5 2400G

The issue happens when I plug in an Alfa AWUS036NH (148f:3070 Ralink Technology, Corp. RT2870/RT3070) - It uses the module rt2800usb

Below you can find my dmesg output when I plug in the Alfa device:

---
[ 1130.410091] usb 1-10: new high-speed USB device number 5 using xhci_hcd
[ 1130.653103] usb 1-10: New USB device found, idVendor=148f, idProduct=3070, bcdDevice= 1.01
[ 1130.653108] usb 1-10: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1130.653111] usb 1-10: Product: 802.11 n WLAN
[ 1130.653113] usb 1-10: Manufacturer: Ralink
[ 1130.653114] usb 1-10: SerialNumber: 1.0
[ 1130.864470] usb 1-10: reset high-speed USB device number 5 using xhci_hcd
[ 1131.110058] ieee80211 phy1: rt2x00_set_rt: Info - RT chipset 3070, rev 0201 detected
[ 1131.788103] ieee80211 phy1: rt2x00_set_rf: Info - RF chipset 0005 detected
[ 1131.794331] ieee80211 phy1: Selected rate control algorithm 'minstrel_ht'
[ 1131.833798] rt2800usb 1-10:1.0 wlp1s0f0u10: renamed from wlan0
[ 1131.834234] audit: type=1130 audit(1569896348.109:56): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 1131.867763] ieee80211 phy1: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[ 1131.867797] ieee80211 phy1: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36
[ 1136.117228] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[ 1136.840084] audit: type=1131 audit(1569896353.117:57): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
---

I don't know if this is useful, but I do have another USB WiFi adapter that uses a different module and doesn't trigger the issue when I plug it in:
lsusb output: 2357:010c TP-Link TL-WN722N v2

Below is the dmesg output when I plug in the TP-LINK:

---
[ 1697.619576] usb 1-7: new high-speed USB device number 9 using xhci_hcd
[ 1697.846601] usb 1-7: New USB device found, idVendor=2357, idProduct=010c, bcdDevice= 0.00
[ 1697.846603] usb 1-7: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1697.846605] usb 1-7: Product: 802.11n NIC
[ 1697.846606] usb 1-7: Manufacturer: Realtek
[ 1697.846607] usb 1-7: SerialNumber: 00E04C0001
[ 1697.858603] Chip Version Info: CHIP_8188E_Normal_Chip_TSMC_D_CUT_1T1R_RomVer(0)
[ 1698.262531] r8188eu 1-7:1.0 wlp1s0f0u7: renamed from wlan0
[ 1711.847379] MAC Address = c0:25:e9:1f:5c:3c
[ 1712.075372] R8188EU: indicate disassoc
---

Additionally, I see the warning when I plug in a Samsung Galaxy S5 device, but the warning appears only when I select certain "USB modes" in Android. Below you can see the dmesg log for each one of the USB modes:

--- dmesg log for "No data transfer" USB mode ---
[ 2523.666729] usb 1-7: USB disconnect, device number 32
[ 2524.157919] usb 1-7: new high-speed U...

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

@Vinicius
Which motherboard do you have?

Maybe the issue is related to 300-series motherboards...

Revision history for this message
In , viniciuspython (viniciuspython-linux-kernel-bugs) wrote :

My motherboard is a Biostar B350GT3.

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

I've sent another mail to the kernel USB mailing list, and this time I got a response. I sent them kernel debugging logs/traces from xhci. Unfortunately I have one of the devices where the error "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state." doesn't get shown anymore, which makes it harder to find the cause of the problem.

@Michael
Could you do the following steps, upload the dmesg log and trace file somewhere, and post the link to the files here (or send them directly to the mailing list yourself, if you prefer that)? Obviously, use one of the devices where the error gets shown.

1. start the PC with an affected kernel, but without the affected device plugged in, then run the following commands as root
2. mount -t debugfs none /sys/kernel/debug
3. echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
4. echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
5. echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
6. echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
7. Plug in the affected device
8. Send the output of dmesg and the /sys/kernel/debug/tracing/trace file (upload them somewhere; the trace file especially will be big)
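
For convenience, steps 2 to 6 above can be collected into a small script. This is only a sketch, assuming it is run as root from a POSIX shell; the function name enable_xhci_tracing and its optional debugfs-root argument are illustrative and not part of any kernel tooling. It does nothing beyond the writes listed in the steps.

```shell
#!/bin/sh
# Sketch: enable xhci/usbcore dynamic debug and the xhci-hcd trace events.
# The function name and the optional root argument are hypothetical; the
# paths and values written are the ones from the steps above.
enable_xhci_tracing() {
    dbg="${1:-/sys/kernel/debug}"

    # Step 2: mount debugfs if the control file is not visible yet.
    if [ ! -e "$dbg/dynamic_debug/control" ]; then
        mount -t debugfs none "$dbg" || return 1
    fi

    # Steps 3-4: turn on debug messages for both modules.
    echo 'module xhci_hcd =p' > "$dbg/dynamic_debug/control" || return 1
    echo 'module usbcore =p'  > "$dbg/dynamic_debug/control" || return 1

    # Step 5: enlarge the trace ring buffer (value in KiB).
    echo 81920 > "$dbg/tracing/buffer_size_kb" || return 1

    # Step 6: enable all xhci-hcd trace events.
    echo 1 > "$dbg/tracing/events/xhci-hcd/enable" || return 1
}
```

After calling enable_xhci_tracing, plug in the affected device (step 7) and then save dmesg and the trace file (step 8).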

Thanks in advance

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Here it goes:
https://www.sendspace.com/file/413hlj

ALFA AWUS036NH connected to USB 3.x port running stress test using hcxdumptool.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Once the error has occurred, xhci is unusable for all other devices:

[20480.414467] usb 1-2: new full-speed USB device number 6 using xhci_hcd
[20480.717690] usb 1-2: New USB device found, idVendor=1546, idProduct=01a7, bcdDevice= 1.00
[20480.717695] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[20480.717698] usb 1-2: Product: u-blox 7 - GPS/GNSS Receiver
[20480.717700] usb 1-2: Manufacturer: u-blox AG - www.u-blox.com
[20480.726485] cdc_acm 1-2:1.0: ttyACM0: USB ACM device
[20480.760327] audit: type=1130 audit(1570274963.323:75): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=gpsdctl@ttyACM0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[20486.732259] usb 1-2: USB disconnect, device number 6
[20486.746846] audit: type=1131 audit(1570274969.310:76): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=gpsdctl@ttyACM0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[20487.027593] usb 1-2: new full-speed USB device number 7 using xhci_hcd
[20487.244298] usb 1-2: device descriptor read/64, error -71
[20487.540954] usb 1-2: device descriptor read/64, error -71
[20487.837571] usb 1-2: new full-speed USB device number 8 using xhci_hcd
[20487.991378] usb 1-2: device descriptor read/64, error -71
[20488.287616] usb 1-2: device descriptor read/64, error -71
[20488.394301] usb usb1-port2: attempt power cycle
[20489.037910] usb 1-2: new full-speed USB device number 9 using xhci_hcd
[20489.065424] usb 1-2: Device not responding to setup address.
[20489.271605] usb 1-2: Device not responding to setup address.
[20489.477900] usb 1-2: device not accepting address 9, error -71

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

Good News!
After reading a bit in the xhci spec sheet I've figured out what the problem is. I've already created a patch and sent it to the mailing list, so it will hopefully be fixed in 5.4.

If you want to see or try the patch, you can find it here: https://marc.info/?l=linux-usb&m=157092844415047

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

Never mind, I misunderstood something in the xhci spec sheet. Apparently the xhci slot id isn't the same as the "TT Hub slot id".

Revision history for this message
In , mathias.nyman (mathias.nyman-linux-kernel-bugs) wrote :

Created attachment 285501
Patch adding doorbell tracing

Patch that adds even more tracing. This will show whether the xhci driver
correctly rings the endpoint doorbell to start the endpoint after a soft retry.

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

Created attachment 285505
Dmesg log and trace file

Not sure how useful the logs from my device are, because the error "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state" only gets shown after unplugging the device, but it looks like the error messages are mostly the same.

There are still some differences compared to Michael's device though:
Dmesg from him (the one I've also sent to the mailing list):
[ 96.789306] xhci_hcd 0000:03:00.0: Resetting device with slot ID 4
[ 96.789313] xhci_hcd 0000:03:00.0: // Ding dong!
[ 96.791053] xhci_hcd 0000:03:00.0: Completed reset device command.
[ 96.791111] xhci_hcd 0000:03:00.0: Successful reset device command.

compared to mine:
[ 91.777887] xhci_hcd 0000:15:00.0: Resetting device with slot ID 4
[ 91.777892] xhci_hcd 0000:15:00.0: // Ding dong!
[ 91.777940] xhci_hcd 0000:15:00.0: Completed reset device command.
[ 91.777950] xhci_hcd 0000:15:00.0: Can't reset device (slot ID 4) in default state
[ 91.777951] xhci_hcd 0000:15:00.0: Not freeing device rings.
[ 91.777956] xhci_hcd 0000:15:00.0: // Ding dong!

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Sometimes the warning doesn't appear. Instead, the driver crashes:
$ dmidecode
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: X555UB

$ cat /proc/cpuinfo
model name : Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz

device connected to USB 3:

[10799.155340] usb 1-2: reset high-speed USB device number 12 using xhci_hcd
[10799.310446] ieee80211 phy5: rt2x00_set_rt: Info - RT chipset 3070, rev 0201 detected
[10799.364982] ieee80211 phy5: rt2x00_set_rf: Info - RF chipset 0005 detected
[10799.365842] ieee80211 phy5: Selected rate control algorithm 'minstrel_ht'
[10799.412456] rt2800usb 1-2:1.0 wlp0s20f0u2: renamed from wlan0
[10799.432236] ieee80211 phy5: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[10799.432263] ieee80211 phy5: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36
[10799.728051] ieee80211 phy5: rt2x00usb_vendor_request: Error - Vendor Request 0x06 failed for offset 0x0404 with error -71
[10800.745185] ieee80211 phy5: rt2800_wait_csr_ready: Error - Unstable hardware
[10800.745197] ieee80211 phy5: rt2800usb_set_device_state: Error - Device failed to enter state 4 (-5)
...
[11237.887923] xhci_hcd 0000:00:14.0: WARN Cannot submit Set TR Deq Ptr
[11237.887929] xhci_hcd 0000:00:14.0: A Set TR Deq Ptr command is pending.

xhci is unstable - not the hardware.

The same device, connected to the same notebook, but to a USB 2 port:
[11243.042957] usb 1-3: reset high-speed USB device number 13 using xhci_hcd
[11243.197261] ieee80211 phy6: rt2x00_set_rt: Info - RT chipset 3070, rev 0201 detected
[11243.251969] ieee80211 phy6: rt2x00_set_rf: Info - RF chipset 0005 detected
[11243.253036] ieee80211 phy6: Selected rate control algorithm 'minstrel_ht'
[11243.272919] rt2800usb 1-3:1.0 wlp0s20f0u3: renamed from wlan0
[11243.293056] ieee80211 phy6: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[11243.293082] ieee80211 phy6: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

@Michael
Could you apply the patch from Mathias (comment 71) to the kernel, enable xhci tracing (steps in comment 66), and upload the dmesg and trace file?
The patch adds more tracing which will make it easier to find the exact issue.

Revision history for this message
In , mathias.nyman (mathias.nyman-linux-kernel-bugs) wrote :

@Bernhard

Logs with added tracing show that the driver does ring the endpoint doorbell, so
the host controller should start processing the pending requests. The endpoint is
in the stopped state, as it should be after an endpoint reset, before we ring the doorbell.

So this part looks like the hardware isn't doing its part.

When the class driver starts cancelling transfer requests after some timeout, we can see that the endpoint is in the halted state. The host controller didn't issue any
event when the endpoint turned into the halted state, so the driver is unaware of this state.

There is also a bug in how the driver handles the error later. After the timeout, when the class driver starts cancelling transfers and the xhci driver tries to stop the endpoint to cancel transfers, it should react to the context state error,
check the endpoint state, and detect and handle the halted endpoint before attempting to set a new dequeue pointer. Now it just bluntly tries to set
a new dequeue pointer, and fails.

Details:
* We get a transaction error event, for transfer request (TRB) at 0xf61a0000

96.985254: xhci_handle_event: EVENT: TRB 00000000f61a0000 status 'USB Transaction Error' len 3860 slot 4 ep 3 type 'Transfer Event' flags e:C
96.985262: xhci_handle_transfer: BULK: Buffer 00000000ff32b04c length 3860 TD size 0 intr 0 type 'Normal' flags b:i:I:c:s:I:e:C

* We issue a Reset endpoint command to resolve the halted endpoint
 (move endpoint from halted to stopped state)

96.985264: xhci_queue_trb: CMD: Reset Endpoint Command: ctx 0000000000000000 slot 4 ep 3 flags C
96.985265: xhci_inc_enq: CMD 0000000090dd7572: enq 0x00000000fff7e550(0x00000000fff7e000) deq 0x00000000fff7e540(0x00000000fff7e000) segs 1
96.985266: xhci_ring_host_doorbell: Ring doorbell for Command Ring 0
96.985268: xhci_inc_deq: EVENT 000000005715d3fc: enq 0x00000000fff7c000(0x00000000fff7c000) deq 0x00000000fff7c4a0(0x00000000fff7c000) segs 1

* Reset endpoint command completes successfully, endpoint state is now "stopped"

96.985395: xhci_handle_event: EVENT: TRB 00000000fff7e540 status 'Success' len 0 slot 4 ep 0 type 'Command Completion Event' flags e:C
96.985396: xhci_handle_command: CMD: Reset Endpoint Command: ctx 0000000000000000 slot 4 ep 3 flags C
96.985397: xhci_handle_cmd_reset_ep: State stopped mult 1 max P. Streams 0 interval 125 us max ESIT payload 0 CErr 3 Type Bulk IN burst 0 maxp 512 deq 00000000f61a0001 avg \
trb len 0

* We ring the doorbell, xHC hardware should start processing events on ring,

96.985402: xhci_ring_ep_doorbell: Ring doorbell for Slot 4 ep1in

* but nothing happens; this endpoint is silent until the class driver starts cancelling transfers ~25 seconds later

122.813121: xhci_urb_dequeue: ep1in-bulk: urb 00000000790ce3f7 pipe 3221259648 slot 4 length 0/3860 sgs 0/0 stream 0 flags 00010200
122.813134: xhci_dbg_cancel_urb: Cancel URB 00000000790ce3f7, dev 4, ep 0x81, starting at offset 0xf61a07f0

* stop the endpoint to cancel the pending transfers

122.813137: xhci_queue_trb: CMD: Stop Ring Command: slot 4 sp 0 ep 3 flags C
122.813137: xhci_inc_enq: CMD 0000000090dd7572: enq 0x00000000fff7e560(0x00000000fff7e000) deq 0x00000000fff7e550(0x0...

Revision history for this message
In , mathias.nyman (mathias.nyman-linux-kernel-bugs) wrote :

You could try to flush the PCI write that rings the endpoint doorbell and see if it helps
starting the endpoint, but I don't have high hopes for this; a PCI write should
be flushed anyway, especially within 25 seconds.

Maybe also add a trace to re-read the endpoint state after flushing the PCI write:
(untested)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index e74518e7de6a..20e209b64551 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -408,6 +408,7 @@ void xhci_ring_ep_doorbell(struct xhci_hcd *xhci,
        trace_xhci_ring_ep_doorbell(slot_id, DB_VALUE(ep_index, stream_id));

        writel(DB_VALUE(ep_index, stream_id), db_addr);
+       readl(db_addr);
        /* The CPU has better things to do at this point than wait for a
         * write-posting flush. It'll get there soon enough.
         */
@@ -1176,6 +1177,8 @@ static void xhci_handle_cmd_reset_ep(struct xhci_hcd *xhci, int slot_id,
        /* if this was a soft reset, then restart */
        if ((le32_to_cpu(trb->generic.field[3])) & TRB_TSP)
                ring_doorbell_for_active_rings(xhci, slot_id, ep_index);
+
+       trace_xhci_handle_cmd_reset_ep(ep_ctx);
 }

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

Created attachment 285527
Logs after flushing endpoint

I've applied the patch, but it seems like the endpoint doesn't get started even after flushing the endpoint.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

@Bernhard, I can't do further tests at the moment, because I'm on vacation until November.

Revision history for this message
In , mathias.nyman (mathias.nyman-linux-kernel-bugs) wrote :

Created attachment 285709
Patch handling halted endpoints at completion of stop endpoint command

Patch to handle a context state error at stop endpoint completion,
where endpoint TRB processing hit an error/stall and the hardware halted the
endpoint just before completing a normal stop endpoint command.

This won't fix the initial issue of the endpoint not restarting after
soft retry, but it should resolve the flood of "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state" messages.

The code is completely untested, as I can't trigger this codepath manually.
It requires the hardware to halt an endpoint just before completing a stop
endpoint command.

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

Created attachment 285713
Logs after applying the patch

After applying the patch the "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state" messages are indeed gone, and the issue is (as expected) still there.

Revision history for this message
In , mathias.nyman (mathias.nyman-linux-kernel-bugs) wrote :

(In reply to Bernhard from comment #80)
> Created attachment 285713 [details]
> Logs after applying the patch

Did you by mistake attach some old logs?

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

Created attachment 285717
Logs after applying the patch

Yes, looks like I've uploaded the zip file from the wrong folder. The new file should be the right one.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

The "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state" message doesn't flood the log file. It appears only when the device is disconnected (after xhci died):

Connected the device:
[42407.193511] usb 1-2: ath9k_htc: USB layer deinitialized
[42410.956671] usb 1-2: new high-speed USB device number 9 using xhci_hcd
[42411.214091] usb 1-2: New USB device found, idVendor=0cf3, idProduct=9271, bcdDevice= 1.08
[42411.214095] usb 1-2: New USB device strings: Mfr=16, Product=32, SerialNumber=48
[42411.214098] usb 1-2: Product: USB2.0 WLAN
[42411.214100] usb 1-2: Manufacturer: ATHEROS
[42411.214102] usb 1-2: SerialNumber: 12345
[42411.232116] usb 1-2: ath9k_htc: Firmware ath9k_htc/htc_9271-1.4.0.fw requested
[42412.308181] usb 1-2: ath9k_htc: Transferred FW: ath9k_htc/htc_9271-1.4.0.fw, size: 51008
[42412.558320] ath9k_htc 1-2:1.0: ath9k_htc: HTC initialized with 33 credits
[42412.784721] ath9k_htc 1-2:1.0: ath9k_htc: FW Version: 1.4
[42412.784724] ath9k_htc 1-2:1.0: FW RMW support: On
[42412.784726] ath: EEPROM regdomain: 0x809c
[42412.784727] ath: EEPROM indicates we should expect a country code
[42412.784728] ath: doing EEPROM country->regdmn map search
[42412.784729] ath: country maps to regdmn code: 0x52
[42412.784730] ath: Country alpha2 being used: CN
[42412.784731] ath: Regpair used: 0x52
[42412.788460] ieee80211 phy2: Atheros AR9271 Rev:1
[42412.791852] ath9k_htc 1-2:1.0 wlp3s0f0u2: renamed from wlan0

and everything is looking fine.

after running the device for a few minutes
[42445.806367] device wlp3s0f0u2 entered promiscuous mode

we receive the first indication that xhci died
[42911.706734] ath: phy2: Unable to set channel

and the device stops working. There are absolutely no other error messages shown by dmesg or the running application (in this case hcxdumptool).

Now we disconnect the device and get the final warning:
[43082.759737] usb 1-2: USB disconnect, device number 9
[43082.760434] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[43082.760607] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[43082.764275] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[43082.784722] device wlp3s0f0u2 left promiscuous mode

At this point xhci is dead. No other device connected to the same port is working.

Revision history for this message
In , mathias.nyman (mathias.nyman-linux-kernel-bugs) wrote :

(In reply to Michael from comment #83)
> The "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state"
> doesn't flood the log file. The message appear only if the device is
> disconnected (after xhci died):
>

Could you take full logs and traces of this:

mount -t debugfs none /sys/kernel/debug
echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
< Trigger the issue >
Send output of dmesg
Send content of /sys/kernel/debug/tracing/trace

In Bernhard's case there were issues both with the hardware not starting the
ring after soft retry, and with software not handling the context state error when stopping an endpoint. The second issue can be fixed in the driver.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

I'll try to trigger it. That isn't so easy, because different devices show different behavior and the occurrence of the issue is totally random. Sometimes it happens immediately after connecting the device, and sometimes it happens after a while or after heavily stressing the device.

BTW:
mount -t debugfs none /sys/kernel/debug
is done by default here.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

I'm doing several runs, using different devices. So we have the chance to compare them against each other.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Here you go.
https://www.sendspace.com/file/8ybhnk

Unfortunately it looks like this stress test was too heavy for dmesg's ring buffer.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

After several tests, I assume that this warning:
"rt2x00queue_flush_queue: Warning - Queue 14 failed to flush"
is also related to the xhci issue. I don't think that the issue is related to power management (https://bugzilla.kernel.org/show_bug.cgi?id=61621), because power management is disabled here.

affected: rt2800usb
[ 7384.825764] usb 1-2: new high-speed USB device number 8 using xhci_hcd
[ 7385.069208] usb 1-2: New USB device found, idVendor=148f, idProduct=3070, bcdDevice= 1.01
[ 7385.069211] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 7385.069214] usb 1-2: Product: 802.11 n WLAN
[ 7385.069216] usb 1-2: Manufacturer: Ralink
[ 7385.069217] usb 1-2: SerialNumber: 1.0
[ 7385.280539] usb 1-2: reset high-speed USB device number 8 using xhci_hcd
[ 7385.526260] ieee80211 phy3: rt2x00_set_rt: Info - RT chipset 3070, rev 0201 detected
[ 7386.204480] ieee80211 phy3: rt2x00_set_rf: Info - RF chipset 0005 detected
[ 7386.210679] ieee80211 phy3: Selected rate control algorithm 'minstrel_ht'
[ 7386.227147] rt2800usb 1-2:1.0 wlp3s0f0u2: renamed from wlan0
[ 7386.227812] audit: type=1130 audit(1572610437.724:150): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 7387.737404] ieee80211 phy3: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[ 7387.737440] ieee80211 phy3: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

The bad thing about this issue is that an application cannot detect it while the device is plugged in. The device doesn't start, or stops working, without any warning. The application reports that everything is fine and dmesg shows no warning at all.
Only when the device is unplugged do we get a bunch of warnings, depending on the device (tested on INTEL and AMD systems):
"WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state"
"rt2x00queue_flush_queue: Warning - Queue 14 failed to flush"
"rx urb failed: -71"
"A Set TR Deq Ptr command is pending."
and more (bad cable, hardware error, ...).
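
Since these messages only show up after unplugging, one way to compare runs is to count them in a saved dmesg dump. This is just a rough sketch: the function name `count_symptoms` is hypothetical, and the list holds only the message fragments quoted above, so it is illustrative rather than exhaustive.

```python
# Message fragments quoted in this report; not an exhaustive list.
KNOWN_SYMPTOMS = [
    "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state",
    "rt2x00queue_flush_queue: Warning - Queue 14 failed to flush",
    "rx urb failed: -71",
    "A Set TR Deq Ptr command is pending.",
]


def count_symptoms(dmesg_text):
    """Return how many dmesg lines contain each known message fragment."""
    counts = {symptom: 0 for symptom in KNOWN_SYMPTOMS}
    for line in dmesg_text.splitlines():
        for symptom in KNOWN_SYMPTOMS:
            if symptom in line:
                counts[symptom] += 1
    return counts
```

For example, `count_symptoms(open("dmesg.txt").read())` would give per-message counts for a dump saved as dmesg.txt (the file name is only an example).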

BTW:
I'm running kernel 4.19.80 in parallel and everything is fine there. This issue first appeared on 4.20.

Revision history for this message
In , mathias.nyman (mathias.nyman-linux-kernel-bugs) wrote :

It seems it was a known issue that xHCI on AMD platforms can fail to restart an endpoint if it wasn't running when the stop command was issued. This also applies to Bernhard's case, where the endpoint stop command raced with an error halting the endpoint.
See patch:

commit 28a2369f7d72ece55089f33e7d7b9c1223673cc3
Author: Shyam Sundar S K <email address hidden>
Date: Thu Jul 20 14:48:28 2017 +0300

    usb: xhci: Issue stop EP command only when the EP state is running

    on AMD platforms with SNPS 3.1 USB controller if stop endpoint command is
    issued the controller does not respond, when the EP is not in running
    state. HW completes the command execution and reports
    "Context State Error" completion code. This is as per the spec. However
    HW on receiving the second command additionally marks EP to Flow control
    state in HW which is RTL bug. This bug causes the HW not to respond
    to any further doorbells that are rung by the driver. This makes the EP
    to not functional anymore and causes gross functional failures.

    As a workaround, not to hit this problem, it's better to check the EP state
    and issue a stop EP command only when the EP is in running state.

    As a sidenote, even with this patch there is still a possibility of
    triggering the RTL bug if the context state races with the stop endpoint
    command as described in xHCI spec 4.6.9

    [code simplification and reworded sidenote in commit message -Mathias]
    Signed-off-by: Shyam Sundar S K <email address hidden>
    Signed-off-by: Nehal Shah <email address hidden>
    Signed-off-by: Mathias Nyman <email address hidden>
    Signed-off-by: Greg Kroah-Hartman <email address hidden>
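
The workaround described in the commit message boils down to a state check before issuing the command. The following is only a toy model in Python, not the kernel code: the names `EpState` and `should_issue_stop_endpoint` are made up for illustration, and the numeric values follow the endpoint states in the xHCI specification's endpoint context.

```python
from enum import IntEnum


class EpState(IntEnum):
    """Endpoint context states as defined by the xHCI specification."""
    DISABLED = 0
    RUNNING = 1
    HALTED = 2
    STOPPED = 3
    ERROR = 4


def should_issue_stop_endpoint(ep_state):
    """Toy model of the workaround: only stop an endpoint that is running.

    As the commit's sidenote says, this cannot close the race where the
    state changes between the check and the command being executed.
    """
    return ep_state == EpState.RUNNING
```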

Does anybody have a link to that errata?

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

I can't confirm that, because this issue happens on all platforms if the device is connected to a USB 3 port:
RYZEN 1700, MSI X370 KRAIT
INTEL I5-6200U, ASUS X555U (notebook)
INTEL i7-3930K, ASUS P9X79

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

The only systems which are running without this issue are my Raspberry Pi's:
$ uname -r
4.19.80-2-ARCH

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Generating a lot of traffic on the socket causes xhci to die very early.

Here it happened on an AMD RYZEN system, running hcxdumptool:
[ 8316.184018] device wlp3s0f0u2 entered promiscuous mode
[ 8372.392206] ath: phy0: Unable to remove monitor interface at idx: 0
[ 8374.525500] ath: phy0: Unable to remove station entry for monitor mode
[ 8381.692889] usb 1-2: USB disconnect, device number 5
[ 8381.693576] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or

and here on an INTEL notebook running NetworkManager:
[ 166.174157] usb 1-1: new high-speed USB device number 8 using xhci_hcd
[ 166.330703] usb 1-1: New USB device found, idVendor=148f, idProduct=761a, bcdDevice= 1.00
[ 166.330713] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 166.330719] usb 1-1: Product: WiFi
[ 166.330725] usb 1-1: Manufacturer: MediaTek
[ 166.330729] usb 1-1: SerialNumber: 1.0
[ 166.458249] usb 1-1: reset high-speed USB device number 8 using xhci_hcd
[ 166.607874] usb 1-1: ASIC revision: 76100002 MAC revision: 76502000
[ 167.669762] usb 1-1: EEPROM ver:02 fae:01
[ 203.846465] mt76u_complete_rx: 13 callbacks suppressed
[ 203.846479] usb 1-1: rx urb failed: -71
[ 203.846552] usb 1-1: rx urb failed: -71
[ 203.846614] usb 1-1: rx urb failed: -71
[ 203.846667] usb 1-1: rx urb failed: -71
[ 203.846712] usb 1-1: rx urb failed: -71
[ 203.846799] usb 1-1: rx urb failed: -71
[ 203.846874] usb 1-1: rx urb failed: -71
[ 203.846924] usb 1-1: rx urb failed: -71
[ 203.846998] usb 1-1: rx urb failed: -71
[ 203.847069] usb 1-1: rx urb failed: -71
[ 203.848249] usb 1-1: USB disconnect, device number 8
[ 203.850032] xhci_hcd 0000:00:14.0: WARN Cannot submit Set TR Deq Ptr
[ 203.850040] xhci_hcd 0000:00:14.0: A Set TR Deq Ptr command is pending.
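
For reading the "-71" in these logs: the kernel reports a negated errno value, and on Linux errno 71 is EPROTO ("Protocol error"), which the kernel's USB error-code documentation associates with low-level bus problems such as no response from the device. A small sketch for decoding such values follows; the helper name `describe_urb_status` is hypothetical, and the mapping it returns is the one of the platform the script runs on, so it should be run on Linux to match kernel logs.

```python
import errno
import os


def describe_urb_status(status):
    """Map a negative kernel-style status (e.g. -71) to (name, message)."""
    code = -status
    # errno.errorcode maps numeric codes to names for the current platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)
```

On a Linux system, `describe_urb_status(-71)` should give the EPROTO name together with its message text.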

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Running really heavy traffic, first xhci caused the driver to crash, then the whole system crashed:
System: ASUS X555UB (INTEL)

[ 1564.588784] mt7601u 1-2:1.0: Error: TSSI upper saturation
[ 1614.221860] ------------[ cut here ]------------
[ 1614.221923] WARNING: CPU: 1 PID: 0 at net/mac80211/rx.c:804 ieee80211_rx_napi.cold+0xc/0x67 [mac80211]
[ 1614.221924] Modules linked in: nfnetlink_queue nfnetlink_log nfnetlink uas usb_storage ccm mt7601u hid_generic usbhid fuse nls_iso8859_1 nls_cp437 vfat fat nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev snd_soc_skl x86_pkg_temp_thermal intel_powerclamp snd_soc_hdac_hda coretemp mc kvm_intel snd_hda_ext_core rtl8821ae snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_soc_acpi_intel_match snd_soc_acpi kvm btcoexist snd_soc_core snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic joydev ledtrig_audio snd_compress mousedev ac97_bus snd_pcm_dmaengine irqbypass rtsx_usb_ms rtl_pci r8169 memstick rtlwifi i915 btusb mac80211 btrtl ipmi_devintf realtek ipmi_msghandler libphy i2c_algo_bit cfg80211 btbcm crct10dif_pclmul drm_kms_helper crc32_pclmul btintel ghash_clmulni_intel snd_hda_intel drm bluetooth snd_hda_codec libarc4 aesni_intel asus_nb_wmi snd_hda_core asus_wmi intel_gtt aes_x86_64
[ 1614.221947] intel_rapl_msr agpgart ecdh_generic crypto_simd sparse_keymap i2c_hid cryptd rfkill iTCO_wdt mei_hdcp hid snd_hwdep glue_helper syscopyarea ecc sysfillrect iTCO_vendor_support sysimgblt fb_sys_fops snd_pcm pcspkr intel_cstate intel_uncore mxm_wmi intel_rapl_perf input_leds elan_i2c tpm_crb snd_timer tpm_tis snd tpm_tis_core tpm int3403_thermal soundcore intel_xhci_usb_role_switch evdev i2c_i801 roles processor_thermal_device mei_me mei rng_core idma64 mac_hid intel_lpss_pci intel_lpss intel_rapl_common int340x_thermal_zone intel_soc_dts_iosf intel_pch_thermal int3400_thermal acpi_thermal_rel asus_wireless wmi ac battery sg crypto_user ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 rtsx_usb_sdmmc mmc_core rtsx_usb sr_mod cdrom sd_mod serio_raw atkbd ahci libps2 libahci libata xhci_pci crc32c_intel scsi_mod xhci_hcd i8042 serio
[ 1614.221975] CPU: 1 PID: 0 Comm: swapper/1 Tainted: P W OE 5.3.8-arch1-1 #1
[ 1614.221976] Hardware name: ASUSTeK COMPUTER INC. X555UB/X555UB, BIOS X555UB.301 02/20/2017
[ 1614.221993] RIP: 0010:ieee80211_rx_napi.cold+0xc/0x67 [mac80211]
[ 1614.221994] Code: 38 48 81 c1 70 04 00 00 48 81 c6 38 01 00 00 e8 0a 40 a1 d1 b8 01 00 00 00 e9 26 4b fb ff 48 c7 c7 60 7b c1 c0 e8 b7 53 4f d1 <0f> 0b 48 89 ef e8 7f 28 b4 d1 e9 d1 5b fb ff 48 c7 c7 60 7b c1 c0
[ 1614.221995] RSP: 0018:ffffa50840120e10 EFLAGS: 00010246
[ 1614.221996] RAX: 0000000000000024 RBX: ffff92206bc407a0 RCX: 0000000000000000
[ 1614.221997] RDX: 0000000000000000 RSI: ffff92207ba97708 RDI: 00000000ffffffff
[ 1614.221998] RBP: ffff922034510400 R08: 0000000000001137 R09: 0000000000000001
[ 1614.221998] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[ 1614.221999] R13: 0000000000000001 R14: 0000000000000006 R15: 0000000000000000
[ 1614.222000] FS: 0000000000000000(0000) GS:ffff92207ba8000...


Revision history for this message
In , mathias.nyman (mathias.nyman-linux-kernel-bugs) wrote :

Michael, I've been looking at the traces and can't find anything xhci related in your logs that could cause this. xhci isn't dying, crashing or causing other drivers to crash in the above logs either. It doesn't seem related to Bernhard's case.

Have you tried bisecting what patch causes the problems between 4.19 and 4.20 kernels?

The "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state" is related to unplugging of the device. In short, while unplugging the device we get a transaction error for each running endpoint before the hub thread notices the disconnect, so the xhci driver tries to recover the endpoint before everything is torn down and returned for the device. It should be harmless at this stage.

There are several disconnect events, either initiated by the device or actual physical
disconnects. Could they be related to firmware loading?

Traces also show many bulk-in URBs being queued but none completed until they are cancelled at disconnect, so we are waiting 49 seconds to get data from the device before the disconnect.

URB b2383f4 TRB is queued from ep4in, waiting for data from device:

  13714.468994: xhci_urb_enqueue: ep4in-bulk: urb 000000000b2383f4 pipe 3221360512 slot 14 length 0/4096 sgs 1/1 stream 0 flags 00040200
  13714.468996: xhci_queue_trb: BULK: Buffer 00000000ff5df000 length 4096 TD size 0 intr 0 type 'Normal' flags b:i:I:c:s:I:e:c
  13714.468996: xhci_inc_enq: BULK 0000000096dfdec9: enq 0x00000000feaec010(0x00000000feaec000) deq 0x00000000feaec000(0x00000000feaec000) segs 2 stream 0 free_trbs 508 bounce 512

49 seconds later, a transaction error occurs on ep4in at disconnect:

   13763.472759: xhci_handle_event: EVENT: TRB 00000000feaec000 status 'USB Transaction Error' len 4096 slot 14 ep 9 type 'Transfer Event' flags e:c
   ...
   13763.472787: xhci_handle_event: EVENT: TRB 000000000a000000 status 'Success' len 0 slot 0 ep 0 type 'Port Status Change Event' flags e:c
   13763.472792: xhci_handle_port_status: port-1: Powered Not-connected Disabled Link:RxDetect PortSpeed:0 Change: CSC Wake:

After this urb b2383f4 is canceled and given back:

  13763.474221: xhci_urb_dequeue: ep4in-bulk: urb 000000000b2383f4 pipe 3221360512 slot 14 length 0/4096 sgs 1/1 stream 0 flags 00040200
  13763.474225: xhci_dbg_cancel_urb: Cancel URB 000000000b2383f4, dev 2, ep 0x84, starting at offset 0xfeaec000
   ...
   13763.474673: xhci_urb_giveback: ep4in-bulk: urb 000000000b2383f4 pipe 3221360512 slot 14 length 0/4096 sgs 1/1 stream 0 flags 00040200
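
The 49 second gap above can be found mechanically by pairing xhci_urb_enqueue and xhci_urb_giveback events per URB in the trace file. The following is only a minimal sketch: the function name `urb_latencies` is made up for illustration, the regular expression is written against the line format quoted above, and it assumes the printed URB value stays unique between enqueue and giveback.

```python
import re

# Matches trace lines such as:
#   13714.468994: xhci_urb_enqueue: ep4in-bulk: urb 000000000b2383f4 pipe ...
TRACE_RE = re.compile(
    r"(?P<ts>\d+\.\d+): "
    r"(?P<event>xhci_urb_enqueue|xhci_urb_giveback): "
    r".*?urb (?P<urb>[0-9a-f]+)"
)


def urb_latencies(trace_lines):
    """Return {urb: seconds between its enqueue and its giveback}."""
    enqueued = {}
    latencies = {}
    for line in trace_lines:
        match = TRACE_RE.search(line)
        if not match:
            continue
        timestamp = float(match.group("ts"))
        urb = match.group("urb")
        if match.group("event") == "xhci_urb_enqueue":
            enqueued[urb] = timestamp
        elif urb in enqueued:
            # Giveback for a URB we saw being queued: record the wait time.
            latencies[urb] = timestamp - enqueued.pop(urb)
    return latencies
```

Sorting the result by latency would point at URBs like b2383f4 that sat on the ring for a long time without any data from the device.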

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Mathias, it is really hard to find the cause of that issue. dmesg shows nothing until something crashes. I'm not able to detect the cause:
https://bugzilla.kernel.org/show_bug.cgi?id=202541#c89
At this point, I know:
- the driver stops working (regardless of which driver: rt2800usb as well as mt76)
- no warning, no error message
- the system becomes unstable (AMD as well as INTEL)
- kernel 4.20 up to 5.3

It is very unlikely that the driver caused this, because it doesn't happen on USB2 and it happens on different drivers and different systems.

I can try to bisect to identify the patch, but that will take a while.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

"WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state" only appears when something went wrong.
If everything's fine and I unplug the device, this warning is not shown.

Here are the results from another device
ID 148f:3070 Ralink Technology, Corp. RT2870/RT3070 Wireless Adapter
running on an INTEL system.

dmesg output if everything is ok:
[14492.749187] usb 1-1: New USB device found, idVendor=148f, idProduct=3070, bcdDevice= 1.01
[14492.749197] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[14492.749203] usb 1-1: Product: 802.11 n WLAN
[14492.749208] usb 1-1: Manufacturer: Ralink
[14492.749213] usb 1-1: SerialNumber: 1.0
[14492.881097] usb 1-1: reset high-speed USB device number 20 using xhci_hcd
[14493.035766] ieee80211 phy11: rt2x00_set_rt: Info - RT chipset 3070, rev 0201 detected
[14493.090480] ieee80211 phy11: rt2x00_set_rf: Info - RF chipset 0005 detected
[14493.091489] ieee80211 phy11: Selected rate control algorithm 'minstrel_ht'
[14493.113656] rt2800usb 1-1:1.0 wlp0s20f0u1: renamed from wlan0
[14493.116525] audit: type=1130 audit(1573227592.687:137): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[14493.141430] ieee80211 phy11: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[14493.141456] ieee80211 phy11: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36
[14498.126056] audit: type=1131 audit(1573227597.697:138): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[14506.300174] usb 1-1: USB disconnect, device number 20
[14506.463603] audit: type=1130 audit(1573227606.037:139): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'

dmesg output if the device stops working and something went wrong:
[14565.489976] usb 1-1: new high-speed USB device number 21 using xhci_hcd
[14565.648114] usb 1-1: New USB device found, idVendor=148f, idProduct=3070, bcdDevice= 1.01
[14565.648124] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[14565.648130] usb 1-1: Product: 802.11 n WLAN
[14565.648135] usb 1-1: Manufacturer: Ralink
[14565.648140] usb 1-1: SerialNumber: 1.0
[14565.773934] usb 1-1: reset high-speed USB device number 21 using xhci_hcd
[14565.927986] ieee80211 phy12: rt2x00_set_rt: Info - RT chipset 3070, rev 0201 detected
[14565.982385] ieee80211 phy12: rt2x00_set_rf: Info - RF chipset 0005 detected
[14565.983295] ieee80211 phy12: Selected rate control algorithm 'minstrel_ht'
[14566.002249] rt2800usb 1-1:1.0 wlp0s20f0u1: renamed from wlan0
[14566.004829] audit: type=1130 audit(1573227665.577:141): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[14566.018308] ieee80211 phy12: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[14566.018335] ieee80211 phy12: rt2x00lib_request_firmware: Info - Firmware detected - versi...

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

(In reply to Michael from comment #96)
> I can try to bisect to identify the patch, but that will take a while.

Tbh I would try reverting the commit that caused the problem for me first, just to make sure you're not spending multiple hours bisecting this issue and then find out that you're affected by the same commit.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Bernhard, that will be great. I'm not at home and my ASUS notebook is really too slow to perform a bisect.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

@Bernhard, @Mathias
I'm not sure anymore if the issue is related to xhci, because of the latest WARNINGs and traces.
I tested a PCIe card
Network controller: Realtek Semiconductor Co., Ltd. RTL8821AE 802.11ac PCIe Wireless Network Adapter
and ran into similar issues:
[12506.901197] wlp3s0: deauthenticating from 00:24:d4:9e:e8:c4 by local choice (Reason: 3=DEAUTH_LEAVING)
[12506.902535] ------------[ cut here ]------------
[12506.902589] WARNING: CPU: 1 PID: 15941 at net/mac80211/rx.c:804 ieee80211_rx_napi.cold+0xc/0x67 [mac80211]
[12506.902590] Modules linked in: nfnetlink_queue nfnetlink_log nfnetlink mt7601u ccm fuse nls_iso8859_1 nls_cp437 vfat fat nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc snd_soc_skl snd_soc_hdac_hda snd_hda_ext_core snd_soc_skl_ipc snd_soc_sst_ipc rtsx_usb_ms snd_soc_sst_dsp rtl8821ae snd_soc_acpi_intel_match x86_pkg_temp_thermal memstick snd_soc_acpi intel_powerclamp btcoexist coretemp r8169 kvm_intel snd_soc_core rtl_pci rtlwifi snd_hda_codec_hdmi snd_compress ac97_bus kvm mac80211 snd_hda_codec_realtek ipmi_devintf ipmi_msghandler snd_pcm_dmaengine snd_hda_codec_generic irqbypass ledtrig_audio cfg80211 realtek libphy joydev mousedev iTCO_wdt i915 iTCO_vendor_support crct10dif_pclmul crc32_pclmul libarc4 ghash_clmulni_intel btusb btrtl aesni_intel btbcm btintel bluetooth aes_x86_64 snd_hda_intel snd_hda_codec crypto_simd cryptd i2c_algo_bit i2c_hid glue_helper drm_kms_helper asus_nb_wmi intel_rapl_msr
[12506.902624] asus_wmi drm sparse_keymap mei_hdcp snd_hda_core intel_cstate mxm_wmi intel_uncore intel_rapl_perf intel_gtt agpgart ecdh_generic snd_hwdep pcspkr rfkill syscopyarea snd_pcm sysfillrect ecc sysimgblt fb_sys_fops tpm_crb input_leds snd_timer elan_i2c tpm_tis tpm_tis_core snd int3403_thermal tpm i2c_i801 evdev rng_core soundcore processor_thermal_device intel_rapl_common mac_hid idma64 intel_xhci_usb_role_switch int340x_thermal_zone roles intel_soc_dts_iosf mei_me int3400_thermal mei acpi_thermal_rel intel_pch_thermal intel_lpss_pci intel_lpss asus_wireless wmi battery ac sg crypto_user ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 rtsx_usb_sdmmc mmc_core rtsx_usb hid_generic usbhid hid sr_mod cdrom sd_mod serio_raw atkbd libps2 ahci libahci libata xhci_pci crc32c_intel i8042 xhci_hcd scsi_mod serio
[12506.902660] CPU: 1 PID: 15941 Comm: Netlink Monitor Tainted: P W OE 5.3.8-arch1-1 #1
[12506.902661] Hardware name: ASUSTeK COMPUTER INC. X555UB/X555UB, BIOS X555UB.301 02/20/2017
[12506.902684] RIP: 0010:ieee80211_rx_napi.cold+0xc/0x67 [mac80211]
[12506.902687] Code: 38 48 81 c1 70 04 00 00 48 81 c6 38 01 00 00 e8 0a 10 77 dc b8 01 00 00 00 e9 26 4b fb ff 48 c7 c7 60 ab eb c0 e8 b7 23 25 dc <0f> 0b 48 89 ef e8 7f f8 89 dc e9 d1 5b fb ff 48 c7 c7 60 ab eb c0
[12506.902688] RSP: 0000:ffffb624c0120e10 EFLAGS: 00010246
[12506.902690] RAX: 0000000000000024 RBX: ffff8ee22cae07a0 RCX: 0000000000000000
[12506.902691] RDX: 0000000000000000 RSI: ffff8ee23ba97708 RDI: 00000000ffffffff
[12506.902692] RBP: ffff8ee1ab8b8400 R08: 00000000000014eb R09: 0000000000000001
[12506.902692] R10: 0000000000000000 R1...

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Here is a new log (dmesg and trace):
https://www.sendspace.com/file/hy2puw

Device: ALFA AWUS036NH
ID 148f:3070 Ralink Technology, Corp. RT2870/RT3070 Wireless Adapter

The device is connected and entered promiscuous mode
[76538.089897] xhci_hcd 0000:03:00.0: Waiting for status stage event
[76541.048223] xhci_hcd 0000:03:00.0: Transfer error for slot 23 ep 2 on endpoint
[76541.048233] xhci_hcd 0000:03:00.0: // Ding dong!
[76541.048356] xhci_hcd 0000:03:00.0: Ignoring reset ep completion code of 1
[76542.194353] device wlp3s0f0u2 entered promiscuous mode
...
we do not receive data via AF_PACKET socket.
...
[76542.194385] audit: type=1700 audit(1573639400.432:141): dev=wlp3s0f0u2 prom=256 old_prom=0 auid=1000 uid=0 gid=0 ses=2
[76554.680919] xhci_hcd 0000:03:00.0: Cancel URB 00000000e8c9ee79, dev 2, ep 0x81, starting at offset 0xff05d000
[76554.680929] xhci_hcd 0000:03:00.0: // Ding dong!

I can't find anything that caused it, except for the transfer error at 76541.048223.

If we connect the device to a USB2 port, everything is fine:
https://www.sendspace.com/file/azoa4a
we receive data via AF_PACKET socket.
The device is working as expected.

Revision history for this message
Erik Davidson (aphistic) wrote :

I'm also seeing this issue on a fresh install of Ubuntu 19.10 with a Razer Core X Chroma and a Lenovo X1 Extreme Gen2. I was seeing it on a fully updated Arch Linux install and installed Ubuntu in hopes it would fix the issue. Here's some info from my current install. Let me know if you need anything else!

uname:
Linux fate 5.3.0-24-generic #26-Ubuntu SMP Thu Nov 14 01:33:18 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

I've attached all the same info you were asking for earlier.

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Hi Erik, thanks for your report! Can you attach a dmesg right after the issue reproduces?
Also, are you willing to run debug kernels in your machine?

The problem was narrowed down to a firmware issue that ASMedia fixed in a firmware upgrade, but the upgrade does not seem to be available from ASMedia themselves; instead, the motherboard vendor is usually the path for obtaining such a fix.

That said, I'd be really glad if we could quirk this from kernel perspective to get the fix to a wider audience, not relying on unresponsive motherboard vendors. So let me know if you (also applies to anybody that reported the issue) are willing to run debug kernels.

Cheers,

Guilherme

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

For reference, here's the analysis from xHCI maintainer:
https://<email address hidden>/

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Thanks a lot @kaihengfeng! Quite great discussion with Mathias - it seems there's a potential quirk for IN packets, but the right approach indeed is getting the HW fixed by ASMedia.

Cheers,

Guilherme

Revision history for this message
Bryan Walsh (yetanotherbryan) wrote :

I would be willing to try a debug kernel.

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Thank you Bryan! We can try the "hackish" approach proposed by Mathias in that thread. Let me study the code and get back to you in the next few weeks!

Cheers,

Guilherme

Revision history for this message
Bryan Walsh (yetanotherbryan) wrote :

Sounds good. I'm not sure if it matters, but I'm now on Ubuntu 19.10. I'm seeing the exact same behavior as before.

Revision history for this message
Erik Davidson (aphistic) wrote :

Guilherme, I've attached a dmesg that ends as soon as my ethernet in the egpu disconnects. It's just a matter of running something like "fast.com" a couple times to trigger it.

I'd also be willing to try a debug kernel or whatever else I can do to help get this fixed!

Revision history for this message
Erik Davidson (aphistic) wrote :

I also wanted to mention that in my case after the issue is triggered I can unplug the cable from the ethernet jack on the eGPU I have, then plug it back in and it'll work again for a little bit until I trigger it again.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Still present running

$ uname -r
5.5.2-arch1-1

[16300.890097] mt76x0u 5-3.1.2:1.0: ASIC revision: 76100002 MAC revision: 76502000
[16301.239555] mt76x0u 5-3.1.2:1.0: EEPROM ver:02 fae:01
[16301.578393] ieee80211 phy6: Selected rate control algorithm 'minstrel_ht'
[16301.595805] mt76x0u 5-3.1.2:1.0 wlp39s0f3u3u1u2: renamed from wlan0
[16316.881303] device wlp39s0f3u3u1u2 entered promiscuous mode
[16316.881347] audit: type=1700 audit(1581158632.980:189): dev=wlp39s0f3u3u1u2 prom=256 old_prom=0 auid=1000 uid=0 gid=0 ses=2
[16316.882150] mt76x0u 5-3.1.2:1.0: tx urb failed: -71
[16316.882187] mt76u_complete_rx: 1989 callbacks suppressed
[16316.882190] mt76x0u 5-3.1.2:1.0: rx urb failed: -71
[16316.882227] mt76x0u 5-3.1.2:1.0: tx urb failed: -71
[16316.882267] mt76x0u 5-3.1.2:1.0: rx urb failed: -71
[16316.882346] mt76x0u 5-3.1.2:1.0: rx urb failed: -71
[16316.882426] mt76x0u 5-3.1.2:1.0: rx urb failed: -71
[16316.882505] mt76x0u 5-3.1.2:1.0: rx urb failed: -71
[16316.882586] mt76x0u 5-3.1.2:1.0: rx urb failed: -71
[16316.882666] mt76x0u 5-3.1.2:1.0: rx urb failed: -71
[16316.882745] mt76x0u 5-3.1.2:1.0: rx urb failed: -71
[16316.882825] mt76x0u 5-3.1.2:1.0: rx urb failed: -71
[16316.882905] mt76x0u 5-3.1.2:1.0: rx urb failed: -71
[16316.911559] usb 5-3.1.2: USB disconnect, device number 8
[16316.911980] xhci_hcd 0000:27:00.3: WARN Cannot submit Set TR Deq Ptr
[16316.911982] xhci_hcd 0000:27:00.3: A Set TR Deq Ptr command is pending.
[16316.921294] mt76x0u 5-3.1.2:1.0: mac specific condition occurred
[16316.948240] device wlp39s0f3u3u1u2 left promiscuous mode

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

From now on this USB port is unusable.
E.g. connecting a USB memory stick to the same USB port
ID 13fe:6300 Kingston Technology Company Inc. USB DISK 3.0

spams the dmesg log:
[16924.494936] usb 1-2: device descriptor read/8, error -71
[16924.625947] usb 1-2: device descriptor read/8, error -71
[16925.060354] usb 1-2: new high-speed USB device number 10 using xhci_hcd
[16925.339024] usb 1-2: device descriptor read/8, error -71
[16925.439343] usb usb2-port2: config error
[16925.469057] usb 1-2: device descriptor read/8, error -71
[16925.573848] usb usb1-port2: attempt power cycle
[16926.217012] usb 1-2: new high-speed USB device number 11 using xhci_hcd
[16926.890380] usb 1-2: device descriptor read/64, error -71
[16927.837037] usb 1-2: device descriptor read/64, error -71
[16928.067117] usb 1-2: new high-speed USB device number 12 using xhci_hcd
[16928.390350] usb usb2-port2: config error
[16928.783690] usb 1-2: device descriptor read/64, error -71
[16929.730336] usb 1-2: device descriptor read/64, error -71

I noticed this behavior only on AMD RYZEN systems.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

A Garmin eTrex 30 connected to a USB 3.0 port of an AMD RYZEN system shows the same behavior:
[23803.507473] usb 1-2: new high-speed USB device number 14 using xhci_hcd
[23803.547562] usb 1-2: New USB device found, idVendor=05e3, idProduct=0727, bcdDevice= 2.50
[23803.547566] usb 1-2: New USB device strings: Mfr=3, Product=4, SerialNumber=2
[23803.547568] usb 1-2: Product: USB Storage
[23803.547570] usb 1-2: Manufacturer: Generic
[23803.547572] usb 1-2: SerialNumber: 000000000250
[23803.554609] usb-storage 1-2:1.0: USB Mass Storage device detected
[23803.554796] scsi host9: usb-storage 1-2:1.0
[23804.580523] scsi 9:0:0:0: Direct-Access Generic STORAGE DEVICE 0250 PQ: 0 ANSI: 0
[23804.580860] sd 9:0:0:0: Attached scsi generic sg2 type 0
[23804.818580] sd 9:0:0:0: [sdb] 30392320 512-byte logical blocks: (15.6 GB/14.5 GiB)
[23804.820914] sd 9:0:0:0: [sdb] Write Protect is off
[23804.820918] sd 9:0:0:0: [sdb] Mode Sense: 0b 00 00 08
[23804.822987] sd 9:0:0:0: [sdb] No Caching mode page found
[23804.822991] sd 9:0:0:0: [sdb] Assuming drive cache: write through
[23804.849969] sdb: sdb1
[23804.854844] sd 9:0:0:0: [sdb] Attached SCSI removable disk
[24257.645365] usb 1-1: new full-speed USB device number 15 using xhci_hcd
[24257.862068] usb 1-1: device descriptor read/64, error -71

Connected to a USB 2.0 port or to an INTEL system (using the same cable!), everything is fine.

Revision history for this message
In , sapier (sapier-linux-kernel-bugs) wrote :

Hello,
I found this bug through a Google search for the error message. I see quite similar behaviour when trying to clear an IDE disk by writing urandom data to it. I'm using a USB<->IDE converter. It works fine on the USB 2.0 ports but fails with the error message above in most USB 3.1 port scenarios.

System:
Vanilla Kernel 5.4.21 (Debian bullseye configuration)
Ryzen 7 1800X
Gigabyte AX370 Gaming 5
 - X370 Series Chipset USB 3.1 xHCI Controller (rev 02)
 - ASMedia Technology Inc. ASM1143 USB 3.1 Host Controller (doesn't work at all)
 - Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller

The issue seems to have some sort of "cable" component as I do get scenarios which seem to work even on USB3.0 Ports, I'll just write what I observed.

Case 1 USB 3.1:
USB<->SATA device -> USB3.1 Port -> worked at least once

Case 2 USB 3.1:
USB<->SATA device -> 4-Port USB hub (10cm cable) -> worked at least once

Case 3 USB 3.1:
USB<->SATA device -> 4-Port USB hub (40cm cable) -> USB 3.1 Port --> never managed to clean disk

Case 4 USB 3.1:
USB<->SATA device -> 2m usb cable -> USB 3.1 Port --> never managed to clean disk

Case 5 USB 2.0:
USB<->SATA device -> USB 2.0 Port -> works

Case 6 USB 2.0:
USB<->SATA device -> 4-Port USB hub (10cm cable) -> USB 2.0 Port -> works

Case 7 USB 2.0:
USB<->SATA device -> 4-Port USB hub (40cm cable) -> USB 2.0 Port -> works

Case 8 USB 2.0:
USB<->SATA device -> 2m cable -> USB 2.0 Port -> works

Case 9 USB 2.0:
USB<->SATA device -> 2m cable -> 4-Port USB hub (40cm cable) -> USB 2.0 Port -> works

All tested USB hubs are 2.0 hubs. The ASMedia USB controller doesn't work at all; its port is dead right after booting, yet this seems to be unrelated to the issue here.

To me this does look like the 3.1 Ports are extremely sensitive to cable issues.

Revision history for this message
Danny Pacheco (vfdb67) wrote :

I am seeing this same issue on my system. Any help would be greatly appreciated. I am using Ubuntu 16.04 with the 4.15.0-88-generic kernel. I have seen it on both host controllers on the motherboard.

Here is the info for the host controllers.

00:14.0 USB controller [0c03]: Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller [8086:a2af] (prog-if 30 [XHCI])
        Subsystem: ASRock Incorporation Device [1849:a2af]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 32
        Region 0: Memory at 92f30000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [70] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [80] MSI: Enable+ Count=1/8 Maskable- 64bit+
                Address: 00000000fee00278 Data: 0000
        Kernel driver in use: xhci_hcd

b3:00.0 USB controller [0c03]: ASMedia Technology Inc. Device [1b21:2142] (prog-if 30 [XHCI])
        Subsystem: ASRock Incorporation Device [1849:2142]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 33
        Region 0: Memory at fbe00000 (64-bit, non-prefetchable) [size=32K]
        Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000 Data: 0000
        Capabilities: [68] MSI-X: Enable+ Count=8 Masked-
                Vector table: BAR=0 offset=00002000
                PBA: BAR=0 offset=00002080
        Capabilities: [78] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0+,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME+
        Capabilities: [80] Express (v2) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 <2us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x2, ASPM L0s L1, Exit Latency L0s <2us, L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x2, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supp...


Revision history for this message
In , biopsin (biopsin-linux-kernel-bugs) wrote :

5.4.25_1 - ROG STRIX B450-I GAMING (RYZEN)

Hi,
The bug is still present with an external HDD connected via a DELTACO USB3.0 TO SATAII + 3.5*IDE cable.

dmesg output:

[ 9540.086599] usb 2-3: device descriptor read/8, error -110
[ 9545.717826] usb 2-3: device descriptor read/8, error -110
[ 9551.350614] usb 2-3: device descriptor read/8, error -110
[ 9556.982468] usb 2-3: device descriptor read/8, error -110
[ 9562.614549] usb 2-3: device descriptor read/8, error -110
[ 9568.246248] usb 2-3: device descriptor read/8, error -110
[ 9573.878494] usb 2-3: device descriptor read/8, error -110
[ 9579.510536] usb 2-3: device descriptor read/8, error -110
[ 9579.658663] blk_update_request: I/O error, dev sdc, sector 319807200 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
[ 9579.658779] blk_update_request: I/O error, dev sdc, sector 319807456 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
[ 9579.658809] blk_update_request: I/O error, dev sdc, sector 319807200 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 9579.658842] blk_update_request: I/O error, dev sdc, sector 2048 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
[ 9579.658846] Buffer I/O error on dev sdc1, logical block 0, lost async page write
[ 9580.671946] EXT4-fs error (device sdc1): __ext4_find_entry:1531: inode #2: comm udevil: reading directory lblock 0
[ 9580.674408] EXT4-fs error (device sdc1): __ext4_find_entry:1531: inode #2: comm udevil: reading directory lblock 0
[ 9585.142455] usb 2-3: device descriptor read/8, error -110
[ 9590.773921] usb 2-3: device descriptor read/8, error -110
[ 9590.773935] xhci_hcd 0000:02:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[ 9592.045019] Buffer I/O error on dev sdc1, logical block 30441472, lost sync page write
[ 9592.045023] JBD2: Error -5 detected when updating journal superblock for sdc1-8.
[ 9592.045025] Buffer I/O error on dev sdc1, logical block 30441472, lost sync page write
[ 9592.045026] JBD2: Error -5 detected when updating journal superblock for sdc1-8.
[ 9596.406564] usb 2-3: device descriptor read/8, error -110

Revision history for this message
In , erickperez (erickperez-linux-kernel-bugs) wrote :

Hello,

This bug is present on an ARM64 SBC system too.

uname -r
5.4.26-rockchip64 (Ubuntu 18.04.4 LTS)

Device: Realtek Ethernet 8152 USB 3.0 Gigabit adapter

dmesg:
[11519.368679] xhci-hcd xhci-hcd.1.auto: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[11519.997949] usb 8-1: reset SuperSpeed Gen 1 USB device number 2 using xhci-hcd
[11522.779552] xhci-hcd xhci-hcd.1.auto: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[11523.389784] usb 8-1: reset SuperSpeed Gen 1 USB device number 2 using xhci-hcd
[11528.980290] xhci-hcd xhci-hcd.1.auto: xHCI host not responding to stop endpoint command.
[11528.993885] xhci-hcd xhci-hcd.1.auto: xHCI host controller not responding, assume dead
[11528.994627] xhci-hcd xhci-hcd.1.auto: HC died; cleaning up
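
The escalation visible in the log above (failed Set TR Deq Ptr command, device reset, host not responding, HC died) can be picked out of a longer dmesg capture with a small filter. The following is only a sketch: the function name `tally_xhci` and the regular expression are illustrative assumptions, not part of any kernel tool. It counts host-controller messages per controller, and accepts both the PCI form (`xhci_hcd 0000:03:00.0`) and the platform form (`xhci-hcd xhci-hcd.1.auto`) seen in this report.

```python
import re
from collections import Counter

# Matches "xhci_hcd <device>: <message>" and "xhci-hcd <device>: <message>".
# Lines that merely end in "using xhci_hcd" do not match.
XHCI_RE = re.compile(r"xhci[_-]hcd (\S+): (.*)")

def tally_xhci(lines):
    """Count identical xHCI driver messages per controller (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        m = XHCI_RE.search(line)
        if m:
            counts[(m.group(1), m.group(2).strip())] += 1
    return counts

sample = """\
[11519.368679] xhci-hcd xhci-hcd.1.auto: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[11519.997949] usb 8-1: reset SuperSpeed Gen 1 USB device number 2 using xhci-hcd
[11522.779552] xhci-hcd xhci-hcd.1.auto: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[11528.980290] xhci-hcd xhci-hcd.1.auto: xHCI host not responding to stop endpoint command.
[11528.994627] xhci-hcd xhci-hcd.1.auto: HC died; cleaning up
""".splitlines()

for (device, message), n in tally_xhci(sample).items():
    print(f"{n:3d}  {device}  {message}")
```

Messages that embed addresses (such as the "Looking for event-dma" lines) are counted as distinct entries, so this kind of tally is most useful for the fixed-text warnings.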

Revision history for this message
Aki Nyrhinen (aki-n) wrote :

I'm also getting the above errors when trying to read from a 1080p60 USB3 webcam. It seems I have some information not mentioned above.

[ 8.822044] usb 4-2: new SuperSpeed Gen 1 USB device number 2 using xhci_hcd
[ 8.851879] usb 4-2: New USB device found, idVendor=15aa, idProduct=1555, bcdDevice=10.02
[ 8.851883] usb 4-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[ 8.851886] usb 4-2: Product: 3.0 USB Camera

When opening /dev/video0 with an application, I immediately get the following error repeatedly (with event-dma incrementing by 0x10 on each line), as seems to be the case for everyone.

[ 54.007465] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
[ 54.010154] xhci_hcd 0000:01:00.0: Looking for event-dma 0000000849690010 trb-start 0000000849690000 trb-end 0000000849690000 seg-start 0000000849690000 seg-end 0000000849690ff0
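
For readers comparing these addresses by hand: each xHCI TRB is 16 bytes, so an event-dma that advances by 0x10 per line is moving one TRB at a time. The sketch below parses a "Looking for event-dma" line and reports how many TRBs the event pointer lies past the end of the TD the driver expected. The helper names are illustrative assumptions, and the arithmetic is a simplification that ignores ring-segment wrap-around, which the kernel's own check has to handle.

```python
import re

TRB_SIZE = 16  # bytes per xHCI TRB

LOOKING_RE = re.compile(
    r"Looking for event-dma (?P<event>[0-9a-f]+) "
    r"trb-start (?P<trb_start>[0-9a-f]+) "
    r"trb-end (?P<trb_end>[0-9a-f]+) "
    r"seg-start (?P<seg_start>[0-9a-f]+) "
    r"seg-end (?P<seg_end>[0-9a-f]+)"
)

def parse_looking_line(line):
    """Return the five addresses as integers, or None if the line does not match."""
    m = LOOKING_RE.search(line)
    if m is None:
        return None
    return {name: int(value, 16) for name, value in m.groupdict().items()}

def trbs_past_td(addrs):
    """Number of TRBs between the end of the expected TD and the event pointer."""
    return (addrs["event"] - addrs["trb_end"]) // TRB_SIZE

line = ("[ 54.010154] xhci_hcd 0000:01:00.0: Looking for event-dma 0000000849690010 "
        "trb-start 0000000849690000 trb-end 0000000849690000 "
        "seg-start 0000000849690000 seg-end 0000000849690ff0")
addrs = parse_looking_line(line)
print(trbs_past_td(addrs))  # 1
```

For the line quoted above the result is 1: the event refers to the TRB immediately after the one the driver is still waiting on.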

I added some previously posted diagnostic code from Mathias Nyman to drivers/usb/host/xhci-ring.c and got the following, which is similar to the previously reported bug with some ASMedia chip, except that transfer events seem to go missing much more frequently: a "Transfer Event" appears to be missing after each "Normal" packet. It's not about the LINK TRB, because it doesn't work for long enough to get to it (there are plenty of zeroed-out descriptors at the end of both rings, which I've omitted).

This is likely an ASMedia firmware/hardware issue, though this card (& camera) works fine on the same machine when running Windows. I saw mentions that ASMedia would be releasing a firmware fix, but I've updated to what I believe to be the latest and it didn't help at all. Has anyone found a firmware that works?

Attached dmesg also contains the event & endpoint ring dumps

Revision history for this message
Aki Nyrhinen (aki-n) wrote :

Beyond all of that, does anyone happen to know of a USB3 card that does work (on Linux)?

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Just a small notice: bug is still alive running
$ uname -r
5.6.3-arch1-1

[21762.874883] usb 1-2: new high-speed USB device number 5 using xhci_hcd
[21762.928994] usb 1-2: New USB device found, idVendor=148f, idProduct=5370, bcdDevice= 1.01
[21762.928997] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[21762.929000] usb 1-2: Product: 802.11 n WLAN
[21762.929002] usb 1-2: Manufacturer: Ralink
[21762.929003] usb 1-2: SerialNumber: 1.0
[21763.223013] usb 1-2: reset high-speed USB device number 5 using xhci_hcd
[21763.266041] ieee80211 phy0: rt2x00_set_rt: Info - RT chipset 5390, rev 0502 detected
[21763.944109] ieee80211 phy0: rt2x00_set_rf: Info - RF chipset 5370 detected
[21763.950281] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
[21763.950768] usbcore: registered new interface driver rt2800usb
[21763.962966] rt2800usb 1-2:1.0 wlp3s0f0u2: renamed from wlan0
[21807.879713] ieee80211 phy0: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[21807.879968] ieee80211 phy0: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36
[21811.980951] device wlp3s0f0u2 entered promiscuous mode
[21831.708072] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[21831.813186] device wlp3s0f0u2 left promiscuous mode

Revision history for this message
In , mathias.nyman (mathias.nyman-linux-kernel-bugs) wrote :

Created attachment 288441
Patch v2 1/2 handling halted endpoints at completion of stop endpoint command

patch 1/2 of two patch series to fix this issue

Revision history for this message
In , mathias.nyman (mathias.nyman-linux-kernel-bugs) wrote :

Created attachment 288443
Patch v2 2/2 handling halted endpoints at completion of stop endpoint command

patch 2/2 of series to fix this issue

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Thanks for the patches. Added both of them and the issue is still present:

[ 21.783543] usb 1-1: new high-speed USB device number 5 using xhci_hcd
[ 21.837511] usb 1-1: New USB device found, idVendor=148f, idProduct=5370, bcdDevice= 1.01
[ 21.837515] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 21.837518] usb 1-1: Product: 802.11 n WLAN
[ 21.837520] usb 1-1: Manufacturer: Ralink
[ 21.837522] usb 1-1: SerialNumber: 1.0
[ 22.165094] usb 1-1: reset high-speed USB device number 5 using xhci_hcd
[ 22.207584] ieee80211 phy0: rt2x00_set_rt: Info - RT chipset 5390, rev 0502 detected
[ 22.886244] ieee80211 phy0: rt2x00_set_rf: Info - RF chipset 5370 detected
[ 22.891860] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
[ 22.892493] usbcore: registered new interface driver rt2800usb
[ 22.904005] rt2800usb 1-1:1.0 wlp3s0f0u1: renamed from wlan0
[ 43.634949] device wlp3s0f0u1 entered promiscuous mode
[ 43.635031] audit: type=1700 audit(1586871720.135:70): dev=wlp3s0f0u1 prom=256 old_prom=0 auid=1000 uid=0 gid=0 ses=2
[ 65.039630] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[ 65.091107] device wlp3s0f0u1 left promiscuous mode

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

ID 148f:5370 Ralink Technology, Corp. RT5370 Wireless Adapter

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Not all devices are affected in the same way.
Same USB3 port and not affected:
ID 148f:3070 Ralink Technology, Corp. RT2870/RT3070 Wireless Adapter

[ 765.080527] usb 1-1: new high-speed USB device number 7 using xhci_hcd
[ 765.133195] usb 1-1: New USB device found, idVendor=148f, idProduct=3070, bcdDevice= 1.01
[ 765.133199] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 765.133201] usb 1-1: Product: 802.11 n WLAN
[ 765.133203] usb 1-1: Manufacturer: Ralink
[ 765.133204] usb 1-1: SerialNumber: 1.0
[ 765.345171] usb 1-1: reset high-speed USB device number 7 using xhci_hcd
[ 765.388223] ieee80211 phy2: rt2x00_set_rt: Info - RT chipset 3070, rev 0201 detected
[ 766.066341] ieee80211 phy2: rt2x00_set_rf: Info - RF chipset 0005 detected
[ 766.072528] ieee80211 phy2: Selected rate control algorithm 'minstrel_ht'
[ 766.091676] audit: type=1130 audit(1586872442.909:94): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 766.097778] rt2800usb 1-1:1.0 wlp3s0f0u1: renamed from wlan0
[ 771.664919] ieee80211 phy2: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[ 771.664959] ieee80211 phy2: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36
[ 775.893631] device wlp3s0f0u1 entered promiscuous mode
[ 775.893663] audit: type=1700 audit(1586872452.713:99): dev=wlp3s0f0u1 prom=256 old_prom=0 auid=1000 uid=0 gid=0 ses=2
[ 777.876925] device wlp3s0f0u1 left promiscuous mode

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

I just applied both patches. The xHCI error message went away, but the device still didn't work.

Logs after unplugging & plugging in the device with the patches:
[ 72.648791] usb 1-4: USB disconnect, device number 3
[ 72.650675] ieee80211 phy0: rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for offset 0x101c with error -19
[ 72.753779] wlan0: deauthenticating from cc:ce:1e:99:77:ed by local choice (Reason: 3=DEAUTH_LEAVING)
[ 72.781608] audit: type=1130 audit(1586883567.129:93): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 72.793722] audit: type=1130 audit(1586883567.139:94): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 73.317939] audit: type=1131 audit(1586883567.665:95): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=geoclue comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 77.799300] audit: type=1131 audit(1586883572.149:96): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 80.933744] usb 1-4: new high-speed USB device number 6 using xhci_hcd
[ 80.988187] usb 1-4: New USB device found, idVendor=148f, idProduct=5370, bcdDevice= 1.01
[ 80.988190] usb 1-4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 80.988191] usb 1-4: Product: 802.11 n WLAN
[ 80.988192] usb 1-4: Manufacturer: Ralink
[ 80.988193] usb 1-4: SerialNumber: 1.0
[ 81.131897] usb 1-4: reset high-speed USB device number 6 using xhci_hcd
[ 81.174210] ieee80211 phy1: rt2x00_set_rt: Info - RT chipset 5390, rev 0502 detected
[ 81.852225] ieee80211 phy1: rt2x00_set_rf: Info - RF chipset 5370 detected
[ 81.858386] ieee80211 phy1: Selected rate control algorithm 'minstrel_ht'
[ 81.920689] ieee80211 phy1: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[ 81.920711] ieee80211 phy1: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36

Compared to the output without patches:
[ 67.093338] usb 1-4: USB disconnect, device number 3
[ 67.093964] xhci_hcd 0000:15:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[ 67.096166] ieee80211 phy0: rt2x00usb_vendor_request: Error - Vendor Request 0x06 failed for offset 0x101c with error -19
[ 67.168604] wlan0: deauthenticating from cc:ce:1e:99:77:ed by local choice (Reason: 3=DEAUTH_LEAVING)
[ 67.179973] audit: type=1130 audit(1586883250.510:93): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 67.231226] audit: type=1130 audit(1586883250.560:94): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 72.236839] audit: type=1131 audit(1586883255.570:95): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=...


Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Tested another device against the applied patches to make sure the issue isn't related to the combination rt2800usb - usb host:
ID 7392:7710 Edimax Technology Co., Ltd Edimax Wi-Fi

[ 68.126337] usb 1-2: new high-speed USB device number 17 using xhci_hcd
[ 68.181565] usb 1-2: New USB device found, idVendor=7392, idProduct=7710, bcdDevice= 0.00
[ 68.181568] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 68.181571] usb 1-2: Product: Edimax Wi-Fi
[ 68.181573] usb 1-2: Manufacturer: MediaTek
[ 68.181575] usb 1-2: SerialNumber: 1.0
[ 68.398420] usb 1-2: reset high-speed USB device number 17 using xhci_hcd
[ 68.446602] mt7601u 1-2:1.0: ASIC revision: 76010001 MAC revision: 76010500
[ 68.473662] mt7601u 1-2:1.0: Firmware Version: 0.1.00 Build: 7640 Build time: 201302052146____
[ 69.461098] mt7601u 1-2:1.0: EEPROM ver:0d fae:00
[ 69.472103] mt7601u 1-2:1.0: EEPROM country region 01 (channels 1-13)
[ 70.152995] mt7601u 1-2:1.0: Warning: mt7601u_mcu_wait_resp retrying
[ 70.472567] mt7601u 1-2:1.0: Warning: mt7601u_mcu_wait_resp retrying
[ 70.792966] mt7601u 1-2:1.0: Warning: mt7601u_mcu_wait_resp retrying
[ 71.112927] mt7601u 1-2:1.0: Warning: mt7601u_mcu_wait_resp retrying
[ 71.432909] mt7601u 1-2:1.0: Warning: mt7601u_mcu_wait_resp retrying
[ 71.432913] mt7601u 1-2:1.0: Error: mt7601u_mcu_wait_resp timed out
[ 71.433388] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[ 71.435442] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[ 71.582930] mt7601u 1-2:1.0: Vendor request req:07 off:0080 failed:-71
[ 71.729217] mt7601u 1-2:1.0: Vendor request req:02 off:0080 failed:-71

After the device is unplugged, dmesg log is spammed:
[ 363.312561] mt7601u 1-2:1.0: Vendor request req:07 off:0730 failed:-71
[ 363.479252] mt7601u 1-2:1.0: Vendor request req:07 off:0730 failed:-71
[ 363.649243] mt7601u 1-2:1.0: Vendor request req:07 off:0730 failed:-71
...
[ 380.069000] mt7601u 1-2:1.0: Vendor request req:02 off:0080 failed:-71
[ 380.069055] mt7601u: probe of 1-2:1.0 failed with error -110
[ 380.069272] usb 1-2: USB disconnect, device number 90

@Bernhard: I can confirm that the error message is missing on some devices, too. Those devices still do not work.
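
The negative numbers in these logs (-110, -71, and elsewhere in this report -19 and -5) are Linux errno values. As a reading aid, here is a minimal sketch that annotates such lines. The table is hard-coded with the asm-generic values used on x86 and ARM, which is an assumption about the reporter's platform (a few other architectures number some of these differently), and the helper name is hypothetical.

```python
import re

# Linux errno values (asm-generic numbering) that appear in this report.
LINUX_ERRNO = {
    5: ("EIO", "I/O error"),
    19: ("ENODEV", "No such device"),
    71: ("EPROTO", "Protocol error"),
    110: ("ETIMEDOUT", "Connection timed out"),
}

# Covers "error -110", "Error -5 detected", "failed:-71", "with error -19".
ERR_RE = re.compile(r"(?:error|failed:)\s*-(\d+)", re.IGNORECASE)

def explain_errno(line):
    """Return a short description of the first errno in a log line, or None."""
    m = ERR_RE.search(line)
    if m is None:
        return None
    code = int(m.group(1))
    name, text = LINUX_ERRNO.get(code, ("unknown", "not in this table"))
    return f"-{code} {name}: {text}"

for sample in (
    "mt7601u 1-2:1.0: Vendor request req:07 off:0730 failed:-71",
    "mt7601u: probe of 1-2:1.0 failed with error -110",
    "usb 2-3: device descriptor read/8, error -110",
):
    print(explain_errno(sample))
```

Read this way, the mt7601u log shows protocol errors on the vendor requests followed by a probe timeout.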

Revision history for this message
John Jackson (mrjohnengineer) wrote :

Hi Ubuntu and Linux Team, I have been observing the attached dmesg output with several different ASMedia 3142 cards while doing USB bulk transfers to a USB device using the Linux read and write APIs. This tends to break the software invoking those APIs. I am using Ubuntu Server 18.04.3 with kernel 4.15.0-55-generic. I see this issue on both Intel- and AMD-based computers.

Has there been an update, patch, or change in priority that might resolve this bug?

[155247.078810] xhci_hcd 0000:07:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
[155247.083037] xhci_hcd 0000:07:00.0: Looking for event-dma 0000000892c11000 trb-start 0000000892c08fe0 trb-end 0000000892c08fe0 seg-start 0000000892c08000 seg-end 0000000892c08ff0
[155247.091767] xhci_hcd 0000:07:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
[155247.096313] xhci_hcd 0000:07:00.0: Looking for event-dma 0000000892c11010 trb-start 0000000892c08fe0 trb-end 0000000892c08fe0 seg-start 0000000892c08000 seg-end 0000000892c08ff0
[155247.106027] xhci_hcd 0000:07:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
[155247.110869] xhci_hcd 0000:07:00.0: Looking for event-dma 0000000890436000 trb-start 0000000893b73fe0 trb-end 0000000893b73fe0 seg-start 0000000893b73000 seg-end 0000000893b73ff0
[155247.120830] xhci_hcd 0000:07:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
[155247.125985] xhci_hcd 0000:07:00.0: Looking for event-dma 0000000890436010 trb-start 0000000893b73fe0 trb-end 0000000893b73fe0 seg-start 0000000893b73000 seg-end 0000000893b73ff0
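
Two details in this log differ from the original report and are easy to miss: comp_code is 1 rather than 13, and event-dma falls outside the seg-start..seg-end range altogether. The sketch below decodes a few xHCI completion codes and checks whether the event pointer is inside the ring segment the driver printed. Only a subset of the codes is listed, the helper names are hypothetical, and the segment check is a plain range comparison rather than the kernel's own logic.

```python
import re

# Subset of xHCI completion codes, enough for the values seen in this report.
COMP_CODES = {
    1: "Success",
    3: "Babble Detected Error",
    4: "USB Transaction Error",
    6: "Stall Error",
    13: "Short Packet",
}

def comp_code_name(line):
    """Name of the completion code in an 'ERROR Transfer event' line, or None."""
    m = re.search(r"comp_code (\d+)", line)
    if m is None:
        return None
    return COMP_CODES.get(int(m.group(1)), "not in this table")

def event_in_segment(line):
    """True if event-dma lies within [seg-start, seg-end] of a 'Looking for' line."""
    fields = dict(re.findall(r"(event-dma|seg-start|seg-end) ([0-9a-f]+)", line))
    event = int(fields["event-dma"], 16)
    return int(fields["seg-start"], 16) <= event <= int(fields["seg-end"], 16)

error_line = ("xhci_hcd 0000:07:00.0: ERROR Transfer event TRB DMA ptr "
              "not part of current TD ep_index 2 comp_code 1")
looking_line = ("xhci_hcd 0000:07:00.0: Looking for event-dma 0000000892c11000 "
                "trb-start 0000000892c08fe0 trb-end 0000000892c08fe0 "
                "seg-start 0000000892c08000 seg-end 0000000892c08ff0")

print(comp_code_name(error_line))      # Success
print(event_in_segment(looking_line))  # False
```

By contrast, the webcam logs earlier in this report carry comp_code 13 (Short Packet) and an event-dma that stays inside the printed segment.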

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Hi John and all, thanks for updating the logs and enhancing the report! Due to a lot of other work, and the nature of this problem (a FW issue that we'd try to alleviate using a hack in Linux), I haven't been able to work on the tentative hack approach yet.

The issue was "resolved" through a FW update by ASMedia, but this was only worked out with a specific motherboard vendor, not as a general release. This is the problem with FW fixes: they are quite scattered and vendor-dependent. So, can you, John, or any of the other reporters try to reproduce the problem with:

(a) Ubuntu 20.04, just released?
(b) Ubuntu 18.04 running the current HWE kernel (5.3)?

That'd be good data points. Also, if you could try Ubuntu 20.04 with latest mainline kernel (from [0]), that would be a gigantic help!
Thanks in advance,

Guilherme

[0] https://kernel.ubuntu.com/~kernel-ppa/mainline/?C=N;O=D

no longer affects: linux (Ubuntu Artful)
Changed in linux (Ubuntu Trusty):
status: In Progress → Won't Fix
Revision history for this message
Bryan Walsh (yetanotherbryan) wrote :

Hi Guilherme,

First, congrats on the release of Ubuntu 20.04. I've been following Ubuntu for over 15 years now and it is one of the most polished releases I've seen. Job well done.

...but yes, I'm still seeing the same behavior on Ubuntu 20.04.

Thanks,

Bryan

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Awesome Bryan, thanks for your kind words!!
I'll nominate this bug for Focal then.
Cheers,

Guilherme

Changed in linux (Ubuntu Focal):
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
Revision history for this message
John Jackson (mrjohnengineer) wrote :

Hi Ubuntu Community,

I would like to find one of the COTS products that reproduces this issue. From reading the thread above, it looks like it is a camera and a USB Ethernet adapter. Does anyone have more specifics or a product link where I can purchase these? So far I haven't seen the issue with a USB block device such as a JMicron NVMe-to-USB Type-C device, but I will continue testing.

Where can I find this camera?
Camera 1080p60 USB3 webcam
idVendor=15aa, idProduct=1555

Thanks,
John

Revision history for this message
Lior Lobel (alf20) wrote : Re: [Bug 1749961] Re: xhci_hcd: TRB DMA errors reported with ASMedia ASM1142 USB 3.1 Controller

Hi John,

I see this issue in a Razer Core X Chroma, related to its Ethernet
adapter. Obviously, it is expensive to buy just for testing
purposes, but I wanted to mention it.

Best,

Lior


Revision history for this message
Aki Nyrhinen (aki-n) wrote :

Hi John,

The webcam in my trace is
https://www.amazon.com/gp/product/B07P7YSZV1/ref=ppx_yo_dt_b_asin_title_o01_s00?ie=UTF8&psc=1
The PCIe card looks identical to this one:
https://www.newegg.com/model-orico-pa31-2p-pci-express-to-usb-card/p/17Z-0003-00008?Description=usb%203.1%20&cm_re=usb_3.1-_-9SIA1DS4TW2779-_-Product

I could mail you one of each if you're interested in working on a fix.


Revision history for this message
John Jackson (mrjohnengineer) wrote :

Thanks Lior and Aki. I fear the Razer Core X Chroma may require extensive setup and a graphics card that I do not have handy, and I am not prepared to take that on right now. I also cannot get the IMX291 CMOS camera overnight. Aki, would you mind looking at the camera and trying to decipher which device-side USB controller or MCU is on the board? Maybe I can find something similar that I can get overnight.

Revision history for this message
Aki Nyrhinen (aki-n) wrote :

John,

Unfortunately the chip is unmarked except for the letters "V3" and a dot to
mark a corner. It's a QFP88 package of some kind.


Revision history for this message
John Jackson (mrjohnengineer) wrote :

Hi Aki and Linux Community, I was able to get the camera, and yes, it does produce the issue reported above with an ASMedia 3142. I have captured a LeCroy trace that I can share; I hope it helps someone. In it you can observe the enumeration and the timing between packets, and see where the flow control error occurs.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Just got a new variant of that issue:
[39799.493322] usb usb1-port9: disabled by hub (EMI?), re-enabling...
[39799.493328] usb 1-9: USB disconnect, device number 5
[39799.833153] usb 1-9: new low-speed USB device number 6 using xhci_hcd
[39815.286287] usb 1-9: device descriptor read/64, error -110
[39826.307239] xhci_hcd 0000:03:00.0: ERROR Transfer event pointed to bad slot 4
[39826.307247] xhci_hcd 0000:03:00.0: @00000000dffed510 dff3d720 00000000 03000005 04038001
[39826.307267] xhci_hcd 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000f address=0x60 flags=0x0020]
[39826.307395] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 3
[39826.307399] xhci_hcd 0000:03:00.0: Looking for event-dma 00000000dfeff000 trb-start 00000000dfeff0f0 trb-end 00000000dfeff110 seg-start 00000000dfeff000 seg-end 00000000dfeffff0

Affected: CHERRY RS 6000 USB ON keyboard and Logitech RX1000 mouse stopped working - reboot required.

$ uname -r
5.6.15-arch1-1
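
For anyone comparing these logs, the "Looking for event-dma" line can be checked mechanically. Below is a minimal Python sketch (not part of the kernel or any tool) that parses the line and reports whether the event pointer lies inside the current TD and inside the ring segment. The helper name `classify_event` is made up for illustration, and the check assumes the TD does not wrap across segments. The 16-byte TRB size comes from the xHCI specification.

```python
import re

# Matches the xhci_hcd "Looking for event-dma ..." line quoted in this thread.
_LOOKUP_RE = re.compile(
    r"event-dma\s+(?P<event>[0-9a-f]+)\s+"
    r"trb-start\s+(?P<trb_start>[0-9a-f]+)\s+"
    r"trb-end\s+(?P<trb_end>[0-9a-f]+)\s+"
    r"seg-start\s+(?P<seg_start>[0-9a-f]+)\s+"
    r"seg-end\s+(?P<seg_end>[0-9a-f]+)"
)

TRB_SIZE = 16  # bytes per TRB (xHCI specification)


def classify_event(line):
    """Hypothetical helper: locate the event pointer relative to TD and segment.

    Assumes the TD sits inside a single segment (trb-start <= trb-end).
    Returns None if the line is not a "Looking for event-dma" line.
    """
    m = _LOOKUP_RE.search(line)
    if m is None:
        return None
    v = {name: int(text, 16) for name, text in m.groupdict().items()}
    return {
        "in_segment": v["seg_start"] <= v["event"] <= v["seg_end"],
        "in_td": v["trb_start"] <= v["event"] <= v["trb_end"],
        # Signed distance from the first TRB of the TD, counted in TRBs.
        "offset_trbs": (v["event"] - v["trb_start"]) // TRB_SIZE,
    }


if __name__ == "__main__":
    line = ("xhci_hcd 0000:03:00.0: Looking for event-dma 00000000dfeff000 "
            "trb-start 00000000dfeff0f0 trb-end 00000000dfeff110 "
            "seg-start 00000000dfeff000 seg-end 00000000dfeffff0")
    print(classify_event(line))
```

For the line above, the pointer is inside the segment but 15 TRBs before the start of the TD. For the first line of the original report (event-dma 0000003f3330e020, trb-start and trb-end both 0000003f3330e000), it is 2 TRBs past a single-TRB TD.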

Revision history for this message
In , oyvind (oyvind-linux-kernel-bugs) wrote :

My Linux server just crashed rather hard with these errors:
juni 02 19:26:17 nori kernel: xhci_hcd 0000:04:00.0: WARN Cannot submit Set TR Deq Ptr
juni 02 19:26:17 nori kernel: xhci_hcd 0000:04:00.0: A Set TR Deq Ptr command is pending.
juni 02 19:26:17 nori kernel: usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
juni 02 19:26:37 nori kernel: xhci_hcd 0000:04:00.0: WARN Cannot submit Set TR Deq Ptr
juni 02 19:26:37 nori kernel: xhci_hcd 0000:04:00.0: A Set TR Deq Ptr command is pending.
juni 02 19:26:37 nori kernel: usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd

It affected a USB3-connected hard drive (Bus 004 Device 002: ID 059f:1057 LaCie, Ltd), which became unresponsive, leaving several hung processes blocked on I/O to the drive. The drive itself has zero logged SMART errors, so it's likely not failing. Another, USB2-connected drive was also affected, but recoverably: its hung processes could be killed. The server has been stable for several years, but this time I had to do a hard power-off because the soft reboot could not complete.

Running Ubuntu 18.04.4.
[ 0.000000] Linux version 5.3.0-53-generic (buildd@lgw01-amd64-016) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #47~18.04.1-Ubuntu SMP Thu May 7 13:10:50 UTC 2020 (Ubuntu 5.3.0-53.47~18.04.1-generic 5.3.18)

With USB controller:
04:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller (prog-if 30 [XHCI])
        Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer ASM1042 SuperSpeed USB Host Controller
        Flags: bus master, fast devsel, latency 0, IRQ 18
        Memory at f7c00000 (64-bit, non-prefetchable) [size=32K]
        Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
        Capabilities: [68] MSI-X: Enable+ Count=8 Masked-
        Capabilities: [78] Power Management version 3
        Capabilities: [80] Express Legacy Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Kernel driver in use: xhci_hcd

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Definitely a weak xHCI host system (this time with a Logitech keyboard connected to a USB 3 port):

[ 9241.775664] usb 1-9: new low-speed USB device number 7 using xhci_hcd
[ 9242.327943] xhci_hcd 0000:03:00.0: ERROR unknown event type 2
[ 9246.875619] xhci_hcd 0000:03:00.0: ERROR mismatched command completion event
[ 9249.008917] xhci_hcd 0000:03:00.0: Timeout while waiting for setup device command
[ 9264.462045] xhci_hcd 0000:03:00.0: Abort failed to stop command ring: -110
[ 9264.462080] xhci_hcd 0000:03:00.0: xHCI host controller not responding, assume dead
[ 9264.462093] xhci_hcd 0000:03:00.0: HC died; cleaning up
[ 9264.462128] xhci_hcd 0000:03:00.0: Timeout while waiting for setup device command
[ 9264.668691] usb 1-9: device not accepting address 7, error -62
[ 9264.668723] usb usb1-port9: couldn't allocate usb_device
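
The negative numbers in these messages are Linux errno values. Here is a small Python sketch for decoding them when reading the logs. The table is hard-coded with the generic Linux values for the codes that appear in this thread (-22, -32, -62, -110), and `decode_errors` is a made-up helper name. It does not try to explain what the USB stack means by each code.

```python
import re

# Generic Linux errno values for the codes seen in this thread.
LINUX_ERRNO = {
    22: "EINVAL",      # Invalid argument
    32: "EPIPE",       # Broken pipe
    62: "ETIME",       # Timer expired
    110: "ETIMEDOUT",  # Connection timed out
}

# Matches "... error -110" and "...: -110" as printed by the USB core and xhci_hcd.
_ERR_RE = re.compile(r"(?:error|:)\s+-(\d+)\b")


def decode_errors(line):
    """Hypothetical helper: return [(code, name)] for each errno in a log line."""
    found = []
    for m in _ERR_RE.finditer(line):
        number = int(m.group(1))
        found.append((-number, LINUX_ERRNO.get(number, "unknown")))
    return found


if __name__ == "__main__":
    for line in (
        "usb 1-9: device descriptor read/64, error -110",
        "xhci_hcd 0000:03:00.0: Abort failed to stop command ring: -110",
        "usb 1-9: device not accepting address 7, error -62",
    ):
        print(decode_errors(line), line)
```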

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Still present on
$ uname -r
5.7.2-arch1-1

Revision history for this message
Mikko Rantalainen (mira) wrote :

Could somebody who is seeing this problem try booting with the kernel flag "iommu=off"? Some hardware that worked with older kernels may be broken and fail to work with modern kernels, which default to using the IOMMU.

Note that disabling the IOMMU is a heavy-handed workaround, not a proper fix. For more information about IOMMU, see https://heiko-sieger.info/iommu-groups-what-you-need-to-consider/
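
When reporting results of this test, it helps to confirm that the flag actually reached the running kernel. Here is a minimal Python sketch that looks for IOMMU-related parameters in /proc/cmdline. The function names are made up for illustration. "iommu" and "intel_iommu" are the parameters mentioned in this thread; "amd_iommu" is added as an assumption for AMD systems.

```python
def iommu_params(cmdline):
    """Return the IOMMU-related parameters found on a kernel command line."""
    # "amd_iommu" is not mentioned in this thread; it is included as an assumption.
    keys = {"iommu", "intel_iommu", "amd_iommu"}
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in keys:
            found[key] = value
    return found


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    params = iommu_params(read_cmdline())
    print(params or "no IOMMU parameters on the kernel command line")
```

An empty result means the kernel is running with its default IOMMU settings, so a test labelled "iommu=off" would not have been valid.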

Revision history for this message
Aki Nyrhinen (aki-n) wrote :

I tried with iommu=off when I last looked at this and the problem
persisted.


Revision history for this message
In , himanshu.xt (himanshu.xt-linux-kernel-bugs) wrote :

I have the same error. The output of
$ dmesg | grep xhci
includes:
[40557.207677] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state

Revision history for this message
Bryan Walsh (yetanotherbryan) wrote :

I tried running with iommu=off with my eGPU: exact same behavior as before.

Revision history for this message
In , R.E.Wolff (r.e.wolff-linux-kernel-bugs) wrote :

I'm using "stock Ubuntu 20.04"

I have this happening on kernel 5.4.0.

[ 4063.051692] usb 3-10.4: New USB device found, idVendor=0483, idProduct=5740, bcdDevice= 2.00
[ 4063.051695] usb 3-10.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 4063.051696] usb 3-10.4: Product: ChibiOS/RT Virtual COM Port
[ 4063.051698] usb 3-10.4: Manufacturer: STMicroelectronics
[ 4063.051699] usb 3-10.4: SerialNumber: 400
[ 4063.058680] cdc_acm 3-10.4:1.0: ttyACM1: USB ACM device
[ 4073.043695] xhci_hcd 0000:00:14.0: WARN Cannot submit Set TR Deq Ptr
[ 4073.043697] xhci_hcd 0000:00:14.0: A Set TR Deq Ptr command is pending.
[ 4073.059819] xhci_hcd 0000:00:14.0: WARN Cannot submit Set TR Deq Ptr
[ 4073.059822] xhci_hcd 0000:00:14.0: A Set TR Deq Ptr command is pending.

I plugged in my development board, which provides a virtual COM port. I then hit the boot and reset buttons on the board to make it boot into DFU bootloader mode.

This has worked for the last decade or so. I didn't read everything above, but I saw something about AMD... I have:

00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 05)

as the USB controller, not AMD. So this is not hardware-vendor related, IMHO.

Revision history for this message
In , sixerjman (sixerjman-linux-kernel-bugs) wrote :

Happening at ~30 second intervals on Debian kernel 5.7.0-1-amd64 with Dell XHCI Controller and USB 3.0 hub:

xhci_hcd 0000:00:10.0: WARN Cannot submit Set TR Deq Ptr
Jul 4 05:02:32 hostname kernel: [33164.415980] xhci_hcd 0000:00:10.0: A Set TR Deq Ptr command is pending.
Jul 4 05:02:32 hostname kernel: [33164.497202] usb 3-3.1: reset SuperSpeed Gen 1 USB device number 3 using xhci_hcd

Revision history for this message
In , R.E.Wolff (r.e.wolff-linux-kernel-bugs) wrote :

I've now managed to get a workable situation for myself:
Using an old USB2-hub instead of the USB3-hub that I was using before.

The DFU download takes ages when there is no hub between my computer and the STM32. Using the USB3-hub worked a few months ago when I was still on Ubuntu 16.04.

[ 5382.799225] usb 3-4.1: Product: ChibiOS/RT Virtual COM Port
[ 5382.799226] usb 3-4.1: Manufacturer: STMicroelectronics
[ 5382.799227] usb 3-4.1: SerialNumber: 400
[ 5382.807282] cdc_acm 3-4.1:1.0: ttyACM3: USB ACM device
[ 5387.003761] usb 3-4: clear tt 1 (91a1) error -32
About 12 identical messages in the same millisecond deleted.
[ 5387.004976] usb 3-4: clear tt 1 (91a1) error -32
[ 5387.222030] usb 3-4.1: USB disconnect, device number 22
[ 5387.224061] cdc_acm 3-4.1:1.0: failed to set dtr/rts
[ 5387.522299] usb 3-4.1: new full-speed USB device number 23 using xhci_hcd
[ 5387.627345] usb 3-4.1: New USB device found, idVendor=0483, idProduct=df11, bcdDevice=22.00
[ 5387.627348] usb 3-4.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 5387.627350] usb 3-4.1: Product: STM32 BOOTLOADER
[ 5387.627352] usb 3-4.1: Manufacturer: STMicroelectronics
[ 5387.627353] usb 3-4.1: SerialNumber: FFFFFFFEFFFF

So this is a mostly normal switchover between the user code running the ACM USB code and the bootloader, through a USB2 switch.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

I was told by ASMedia that only a firmware update can fix this issue. Please ask the device/board vendor to roll out a new firmware update.

Revision history for this message
Bryan Walsh (yetanotherbryan) wrote :

I'll try to reach out to Razer again to see if they will push out a firmware update. Is there any documentation on the underlying firmware bug that I can point them towards?

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

No, I don't think there's any public information on this issue.

Changed in linux (Ubuntu Bionic):
status: In Progress → Confirmed
Changed in linux (Ubuntu Xenial):
status: In Progress → Confirmed
Revision history for this message
In , mail2lawi (mail2lawi-linux-kernel-bugs) wrote :

I was getting similar freezes with an HP ENVY x360 Convertible 15 running openSUSE Leap 15.2.
This laptop model doesn't come with an RJ45 (LAN) port, so I use a USB Type-C Ethernet adapter. It was exhibiting the same problems: after some time it would just stop working. Initially I thought NetworkManager was at fault.
For me, even running the command 'ip add' would lock up, yet most other commands and even the desktop manager would keep working fine. But I couldn't get the network back or even switch to WiFi. Basically, every time this happened I had to restart the laptop.

Anyway, after going through the accounts here and on other sites, I found one suggesting that the issue could be power management suspending the USB device.

So I added that particular USB device to the TLP blacklist to prevent it from being suspended, and so far I've gone 24 hours without a lockup.

Link to forum: https://forum.manjaro.org/t/usb-ethernet-dongle-stopped-working/125717
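
To check whether autosuspend is in play on a given machine, each USB device's runtime power setting can be read from sysfs. Here is a minimal Python sketch, assuming the standard /sys/bus/usb/devices layout where each device directory has idVendor, idProduct and power/control files. The function name and the `sysfs_root` parameter are made up for illustration. A value of "auto" means the kernel may autosuspend the device, "on" means it is kept awake.

```python
from pathlib import Path


def usb_power_control(sysfs_root="/sys/bus/usb/devices"):
    """Map "vendor:product" to a list of (device name, power/control value)."""
    result = {}
    for dev in sorted(Path(sysfs_root).iterdir()):
        try:
            vid = (dev / "idVendor").read_text().strip()
            pid = (dev / "idProduct").read_text().strip()
            control = (dev / "power" / "control").read_text().strip()
        except OSError:
            # Interface entries such as "1-2:1.0" have no idVendor file; skip them.
            continue
        result.setdefault(f"{vid}:{pid}", []).append((dev.name, control))
    return result


if __name__ == "__main__":
    for usb_id, entries in usb_power_control().items():
        for name, control in entries:
            print(f"{usb_id}  {name}  power/control={control}")
```

If the adapter shows "auto" before the TLP change and "on" afterwards, that supports the autosuspend explanation, at least on that system.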

Revision history for this message
3esmit (3esmit) wrote :

I am having the same exact error with `USB controller: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller` and a `Logitech StreamCam`.

I tried updating my kernel from `4.15.0-112` to `5.4.0-42`, but that didn't help.

I bought a "dodocool DC26 SuperSpeed USB 3.1 PCI-Express Card with Dual Type-C Ports", which was supposed to work with Linux, but it only works for slow-speed connections; if I try to use a high-speed connection, it does not work.

The problem is not in the device I'm connecting. If I use an "Adaptor Usb-C 3.1 Type-c Female to Usb-a 3.0 Male" in the onboard USB 3.0 port, it works fine. I only have issues with the ASM1142.

If I keep trying to connect, I run into an error that totally disables the ASM1142.

These are the dmesg logs when I connect the high speed device:
```
[ 2229.576801] usb 6-1: new SuperSpeed Gen 1 USB device number 3 using xhci_hcd
[ 2229.604832] usb 6-1: New USB device found, idVendor=046d, idProduct=0893, bcdDevice= 3.17
[ 2229.604837] usb 6-1: New USB device strings: Mfr=0, Product=2, SerialNumber=3
[ 2229.604841] usb 6-1: Product: Logitech StreamCam
[ 2229.604844] usb 6-1: SerialNumber: 5085D605
[ 2229.609976] uvcvideo: Found UVC 1.00 device Logitech StreamCam (046d:0893)
[ 2229.626309] uvcvideo 6-1:1.0: Entity type for entity Processing 3 was not initialized!
[ 2229.626313] uvcvideo 6-1:1.0: Entity type for entity Extension 14 was not initialized!
[ 2229.626316] uvcvideo 6-1:1.0: Entity type for entity Extension 6 was not initialized!
[ 2229.626318] uvcvideo 6-1:1.0: Entity type for entity Extension 8 was not initialized!
[ 2229.626321] uvcvideo 6-1:1.0: Entity type for entity Extension 9 was not initialized!
[ 2229.626323] uvcvideo 6-1:1.0: Entity type for entity Extension 10 was not initialized!
[ 2229.626325] uvcvideo 6-1:1.0: Entity type for entity Extension 11 was not initialized!
[ 2229.626327] uvcvideo 6-1:1.0: Entity type for entity Camera 1 was not initialized!
[ 2229.626514] input: Logitech StreamCam as /devices/pci0000:00/0000:00:01.0/0000:01:00.0/usb6/6-1/6-1:1.0/input/input29
[ 2229.635309] usb 6-1: current rate 16000 is different from the runtime rate 24000
[ 2229.639323] usb 6-1: current rate 16000 is different from the runtime rate 32000
[ 2229.643585] usb 6-1: current rate 16000 is different from the runtime rate 48000
[ 2229.661927] hid-generic 0003:046D:0893.0007: hiddev0,hidraw0: USB HID v1.11 Device [Logitech StreamCam] on usb-0000:01:00.0-1/input5
[ 2229.790807] usb 6-1: current rate 16000 is different from the runtime rate 48000
[ 2229.797295] usb 6-1: current rate 16000 is different from the runtime rate 48000
[ 2229.806037] usb 6-1: current rate 16000 is different from the runtime rate 48000
```

When I try to access the device (opening Cheese), the camera LED turns on but no image is displayed, and these are the dmesg logs:
```
[ 2282.173723] usb 6-1: reset SuperSpeed Gen 1 USB device number 3 using xhci_hcd
[ 2282.913814] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
[ 2282.913834] xhci_hcd 0000:01:00.0: Looking for event-dma 000000036400e020 trb-start 000000036400e000 trb-end 000000036400e000 seg-start 000000036...
```

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Still present on:
$ uname -r
5.8.5-arch1-1

device connected to USB3 port:
[15005.134111] usb 1-2: new high-speed USB device number 13 using xhci_hcd
[15005.311803] usb 1-2: New USB device found, idVendor=148f, idProduct=5370, bcdDevice= 1.01
[15005.311807] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[15005.311810] usb 1-2: Product: 802.11 n WLAN
[15005.311812] usb 1-2: Manufacturer: Ralink
[15005.311814] usb 1-2: SerialNumber: 1.0
[15005.602591] usb 1-2: reset high-speed USB device number 13 using xhci_hcd
[15005.834856] ieee80211 phy0: rt2x00_set_rt: Info - RT chipset 5390, rev 0502 detected
[15006.513400] ieee80211 phy0: rt2x00_set_rf: Info - RF chipset 5370 detected
[15006.519415] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
[15006.520103] usbcore: registered new interface driver rt2800usb
[15006.532103] rt2800usb 1-2:1.0 wlp3s0f0u2: renamed from wlan0
...
[15062.425086] Bluetooth: Core ver 2.22
[15062.425100] NET: Registered protocol family 31
[15062.425101] Bluetooth: HCI device and connection manager initialized
[15062.425103] Bluetooth: HCI socket layer initialized
[15062.425105] Bluetooth: L2CAP socket layer initialized
[15062.425107] Bluetooth: SCO socket layer initialized
[15068.677302] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

Any idea why "HCI device and connection manager" was initialized?
The device doesn't have Bluetooth:
$ lsusb
Bus 001 Device 013: ID 148f:5370 Ralink Technology, Corp. RT5370 Wireless Adapter

device connected to USB2 port:
[15317.960151] usb 5-1.1.2: new high-speed USB device number 6 using xhci_hcd
[15318.186487] usb 5-1.1.2: New USB device found, idVendor=148f, idProduct=5370, bcdDevice= 1.01
[15318.186492] usb 5-1.1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[15318.186495] usb 5-1.1.2: Product: 802.11 n WLAN
[15318.186497] usb 5-1.1.2: Manufacturer: Ralink
[15318.186498] usb 5-1.1.2: SerialNumber: 1.0
[15318.376954] usb 5-1.1.2: reset high-speed USB device number 6 using xhci_hcd
[15318.593739] ieee80211 phy1: rt2x00_set_rt: Info - RT chipset 5390, rev 0502 detected
[15318.603488] ieee80211 phy1: rt2x00_set_rf: Info - RF chipset 5370 detected
[15318.603596] ieee80211 phy1: Selected rate control algorithm 'minstrel_ht'
[15318.626103] rt2800usb 5-1.1.2:1.0 wlp39s0f3u1u1u2: renamed from wlan0
...
[15336.958194] ieee80211 phy1: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[15336.958238] ieee80211 phy1: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36

everything is fine!

Revision history for this message
In , mkj (mkj-linux-kernel-bugs) wrote :

I'm experiencing the same kind of issues on my local Gentoo system with Linux 5.4.60, running on an AMD Threadripper ASUS X399-A system.

[59417.351322] Bluetooth: HCI device and connection manager initialized
[59417.351326] Bluetooth: HCI socket layer initialized
[59417.351327] Bluetooth: L2CAP socket layer initialized
[59417.351330] Bluetooth: SCO socket layer initialized
[59417.356567] usbcore: registered new interface driver btusb
[59417.401190] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[59417.401191] Bluetooth: BNEP filters: protocol multicast
[59417.401195] Bluetooth: BNEP socket layer initialized
[59746.085438] debugfs: File 'le_min_key_size' in directory 'hci0' already present!
[59746.085445] debugfs: File 'le_max_key_size' in directory 'hci0' already present!
[59746.085450] debugfs: File 'force_bredr_smp' in directory 'hci0' already present!
[59814.624020] input: WH-1000XM2 (AVRCP) as /devices/virtual/input/input23
[59902.644488] snd_hda_intel 0000:0b:00.3: Too big adjustment 128
[59962.175286] snd_hda_intel 0000:0b:00.3: Too big adjustment 128
[60729.936646] Bluetooth: RFCOMM TTY layer initialized
[60729.936651] Bluetooth: RFCOMM socket layer initialized
[60729.936655] Bluetooth: RFCOMM ver 1.11
[61319.535471] usb 1-3: USB disconnect, device number 20
[61319.536250] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[61333.523244] usb 1-3: new full-speed USB device number 21 using xhci_hcd
[61333.947814] usb 1-3: New USB device found, idVendor=0a12, idProduct=0001, bcdDevice=88.91
[61333.947818] usb 1-3: New USB device strings: Mfr=0, Product=2, SerialNumber=0
[61333.947819] usb 1-3: Product: CSR8510 A10
[61454.719213] usb 1-3: USB disconnect, device number 21
[61454.719807] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

Revision history for this message
In , mkj (mkj-linux-kernel-bugs) wrote :

For my system, it seems to trigger after resuming from suspend.

Revision history for this message
Nick Klein (sledgebullet) wrote :

Has anyone gotten their hardware vendor to cough up a new firmware yet? Razer has shown no movement.

Revision history for this message
Bryan Walsh (yetanotherbryan) wrote :

I've heard nothing back from Razer. I first submitted the issue to them over two years ago now. Not holding my breath.

Revision history for this message
In , koparebu (koparebu-linux-kernel-bugs) wrote :

Hi,

I'd like to add some information about this issue, which on one of my computers happens when using a Software Defined Radio (SDR) device on some of the motherboard USB ports:

* The device works OK when using the back panel USB 3.2 Gen 1 ports
* It doesn't work OK when using the back panel USB 3.2 Gen 2 ports, or when using the front headers (either USB 3.2 Gen 1 or USB 2.0)

When I finish using the device, the following message gets logged. I have to disconnect it and plug it in again in order to use it:

xhci_hcd 0000:02:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

This motherboard is an ASUS Prime B550M.

Kernel 5.4.0-45-generic (from Ubuntu 20.04).

I've tried booting with "intel_iommu=off" or "iommu=off" just for testing, but the result is the same.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Just received an AMD notebook to run some tests on 5.8.12-arch1-1:

ASUS TUF Gaming FX505D
05:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1

WiFi adapter:
ID 148f:3070 Ralink Technology, Corp. RT2870/RT3070 Wireless Adapter

[ 308.749192] usb 1-2: new high-speed USB device number 46 using xhci_hcd
[ 308.909139] usb 1-2: New USB device found, idVendor=148f, idProduct=3070, bcdDevice= 1.01
[ 308.909145] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 308.909148] usb 1-2: Product: 802.11 n WLAN
[ 308.909150] usb 1-2: Manufacturer: Ralink
[ 308.909153] usb 1-2: SerialNumber: 1.0
[ 309.032719] usb 1-2: reset high-speed USB device number 46 using xhci_hcd
[ 309.188373] ieee80211 phy40: rt2x00_set_rt: Info - RT chipset 3070, rev 0201 detected
[ 309.276422] ieee80211 phy40: rt2x00_set_rf: Info - RF chipset 0005 detected
[ 309.277177] ieee80211 phy40: Selected rate control algorithm 'minstrel_ht'
[ 309.297319] rt2800usb 1-2:1.0 wlp5s0f3u2: renamed from wlan0
[ 309.338896] ieee80211 phy40: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[ 309.338980] ieee80211 phy40: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36
[ 310.619294] usb 1-2: USB disconnect, device number 46
[ 311.185849] ieee80211 phy40: rt2x00queue_flush_queue: Warning - Queue 0 failed to flush

Then xhci died.

Revision history for this message
In , github (github-linux-kernel-bugs) wrote :

Hi,

I have the same problem with an Argus KVM switch. The mouse stopped working after the first click.

uname -a
Linux sysiphus 5.8.0-2-amd64 #1 SMP Debian 5.8.10-1 (2020-09-19) x86_64 GNU/Linux

Debian Sid

❯ lsusb
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 004: ID 046d:c245 Logitech, Inc. G400 Optical Mouse
Bus 003 Device 003: ID 046d:c326 Logitech, Inc. Washable Keyboard K310
Bus 003 Device 002: ID 1a86:8072 QinHeng Electronics
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

journald
Okt 07 19:29:43 sysiphus kernel: usb 3-2: USB disconnect, device number 2
Okt 07 19:29:43 sysiphus kernel: usb 3-2.1: USB disconnect, device number 3
Okt 07 19:29:43 sysiphus acpid[587]: input device has been disconnected, fd 5
Okt 07 19:29:43 sysiphus kernel: xhci_hcd 0000:08:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Okt 07 19:29:43 sysiphus acpid[587]: input device has been disconnected, fd 6
Okt 07 19:29:43 sysiphus acpid[587]: input device has been disconnected, fd 12
Okt 07 19:29:43 sysiphus kernel: usb 3-2.2: USB disconnect, device number 4
Okt 07 19:29:43 sysiphus kernel: xhci_hcd 0000:08:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

dmesg with debug
[ 2539.095820] xhci_hcd 0000:00:14.0: // Ding dong!
[ 2539.095824] xhci_hcd 0000:00:14.0: // Ding dong!
[ 2539.095840] xhci_hcd 0000:00:14.0: Slot 17 output ctx = 0x3eac96000 (dma)
[ 2539.095841] xhci_hcd 0000:00:14.0: Slot 17 input ctx = 0x3ef030000 (dma)
[ 2539.095843] xhci_hcd 0000:00:14.0: Set slot id 17 dcbaa entry 000000003007eed2 to 0x3eac96000
[ 2539.173763] usb 1-10.2: new full-speed USB device number 18 using xhci_hcd
[ 2539.173772] xhci_hcd 0000:00:14.0: Set root hub portnum to 10
[ 2539.173775] xhci_hcd 0000:00:14.0: Set fake root hub portnum to 10
[ 2539.173778] xhci_hcd 0000:00:14.0: udev->tt = 000000002e0e39e0
[ 2539.173781] xhci_hcd 0000:00:14.0: udev->ttport = 0xa
[ 2539.173789] xhci_hcd 0000:00:14.0: // Ding dong!
[ 2539.174915] xhci_hcd 0000:00:14.0: Successful setup address command
[ 2539.174922] xhci_hcd 0000:00:14.0: Op regs DCBAA ptr = 0x000003ead2c000
[ 2539.174928] xhci_hcd 0000:00:14.0: Slot ID 17 dcbaa entry @000000003007eed2 = 0x000003eac96000
[ 2539.174933] xhci_hcd 0000:00:14.0: Output Context DMA address = 0x3eac96000
[ 2539.174937] xhci_hcd 0000:00:14.0: Internal device address = 17
[ 2539.198845] xhci_hcd 0000:00:14.0: Max Packet Size for ep 0 changed.
[ 2539.198851] xhci_hcd 0000:00:14.0: Max packet size in usb_device = 8
[ 2539.198854] xhci_hcd 0000:00:14.0: Max packet size in xHCI HW = 64
[ 2539.198857] xhci_hcd 0000:00:14.0: Issuing evaluate context command.
[ 2539.198867] xhci_hcd 0000:00:14.0: // Ding dong!
[ 2539.198886] xhci_hcd 0000:00:14.0: Successful evaluate context command
[ 2539.200465] usb 1-10.2: no configurations
[ 2539.200472] usb 1-10.2: can't read configurations, error -22
[ 2539.200634] xhci_hcd 0000:00:14.0: // Ding dong!
[ 2539.200641] usb 1-10-port2: unable to enumerate USB device

https://www.sendspace.com/file/d923jl...


Revision history for this message
Nick Klein (sledgebullet) wrote :

Confirmed this continues on Ubuntu 20.10.

Revision history for this message
In , ehoffman (ehoffman-linux-kernel-bugs) wrote :

I have the same issue with a HackRF SDR, and there's a bug report on their side:

https://github.com/mossmann/hackrf/issues/783

I connect device:

[ 428.013129] usb 1-10: USB disconnect, device number 4
[ 2163.462098] usb 1-10: new high-speed USB device number 6 using xhci_hcd
[ 2163.699532] usb 1-10: New USB device found, idVendor=1d50, idProduct=6089, bcdDevice= 1.02
[ 2163.699535] usb 1-10: New USB device strings: Mfr=1, Product=2, SerialNumber=4
[ 2163.699536] usb 1-10: Product: HackRF One
[ 2163.699538] usb 1-10: Manufacturer: Great Scott Gadgets
[ 2163.699539] usb 1-10: SerialNumber: 0000000000000000b65c67dc32a3535f

I run test once, and after device close:

[ 2187.589321] xhci_hcd 0000:02:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

Then, if I try to run other tests, the device no longer responds until I reset it (it has a reset button) or disconnect and reconnect it.

Here, I have the same setup as some other people reporting the issue, a B350 chipset (Ryzen 2700X), on ASUS ROG STRIX B350-F GAMING motherboard.

Kernel version: uname -a
Linux lx-ryzen 5.4.0-53-generic #59-Ubuntu SMP Wed Oct 21 09:38:44 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

The issue occurs with both USB3 and USB2 ports.

The same device works fine on another computer (a non-Ryzen laptop) with the same kernel.

Revision history for this message
In , ehoffman (ehoffman-linux-kernel-bugs) wrote :

Same result with B450 chipset, same kernel.
ASUS PRIME B450M-A motherboard.

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

Could this issue be caused by USB Controllers / Chipsets made by ASMedia?
B350 / X370 / B450 / X470 Chipsets are all manufactured by ASMedia.

And @Michael mentioned that one of his Intel systems (I'm assuming the one with an Asus P9X79 motherboard?) has issues with USB 3, but not with USB 2. So I looked up that particular motherboard, and guess what:

ASMedia® USB 3.0 controller :
4 x USB 3.1 Gen 1 port(s) (4 at back panel, blue)
Intel® X79 chipset :
14 x USB 2.0 port(s) (6 at back panel, black+white, 8 at mid-board)

Only the USB 3.1 Gen 1 Ports are using the ASMedia controller.

Revision history for this message
In , github (github-linux-kernel-bugs) wrote :

My board has two controllers, and both show the same behaviour.

lspci|grep "USB controller"
00:14.0 USB controller: Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller
08:00.0 USB controller: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller

Asus PRIME Z270-K, BIOS 1207 06/22/2018

Revision history for this message
In , wgh (wgh-linux-kernel-bugs) wrote :

The same problem with HackRF: it stops working after using it once (presumably due to transfers being cancelled on exit). hackrf_info still detects it though, which is probably because only the bulk transfer endpoint becomes broken.

Kernel 5.9.11

ASRock B550 Extreme4, BIOS P1.20 08/13/2020

01:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Device 43ee
0c:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller

The debug messages on stopping the tool:

[ 3944.894483] xhci_hcd 0000:01:00.0: Transfer error for slot 28 ep 2 on endpoint
[ 3944.894494] xhci_hcd 0000:01:00.0: // Ding dong!
[ 3944.894602] xhci_hcd 0000:01:00.0: Ignoring reset ep completion code of 1
[ 3945.396550] xhci_hcd 0000:01:00.0: Cancel URB 000000008085c3f5, dev 5, ep 0x81, starting at offset 0x1fa7ae1c90
[ 3945.396558] xhci_hcd 0000:01:00.0: // Ding dong!
[ 3945.396690] xhci_hcd 0000:01:00.0: Removing canceled TD starting at 0x1fa7ae1c90 (dma).
[ 3945.396710] xhci_hcd 0000:01:00.0: Cancel URB 000000000098c2b5, dev 5, ep 0x81, starting at offset 0x1fa7ae1990
[ 3945.396712] xhci_hcd 0000:01:00.0: // Ding dong!
[ 3945.396836] xhci_hcd 0000:01:00.0: Removing canceled TD starting at 0x1fa7ae1990 (dma).
[ 3945.396839] xhci_hcd 0000:01:00.0: Finding endpoint context
[ 3945.396841] xhci_hcd 0000:01:00.0: Cycle state = 0x1
[ 3945.396843] xhci_hcd 0000:01:00.0: New dequeue segment = 000000008a0bf921 (virtual)
[ 3945.396845] xhci_hcd 0000:01:00.0: New dequeue pointer = 0x1fa7ae1a90 (DMA)
[ 3945.396847] xhci_hcd 0000:01:00.0: Set TR Deq Ptr cmd, new deq seg = 000000008a0bf921 (0x1fa7ae1000 dma), new deq ptr = 00000000e5711e6d (0x1fa7ae1a90 dma), new cycle = 1
[ 3945.396851] xhci_hcd 0000:01:00.0: // Ding dong!
[ 3945.396869] xhci_hcd 0000:01:00.0: Cancel URB 000000008b1032dd, dev 5, ep 0x81, starting at offset 0x1fa7ae1a90
[ 3945.396871] xhci_hcd 0000:01:00.0: // Ding dong!
[ 3945.396904] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[ 3945.396909] xhci_hcd 0000:01:00.0: Slot state = 3, EP state = 2
[ 3945.397028] xhci_hcd 0000:01:00.0: Removing canceled TD starting at 0x1fa7ae1a90 (dma).
[ 3945.397040] xhci_hcd 0000:01:00.0: Cancel URB 00000000b1b43562, dev 5, ep 0x81, starting at offset 0x1fa7ae1b90
[ 3945.397042] xhci_hcd 0000:01:00.0: // Ding dong!
[ 3945.397172] xhci_hcd 0000:01:00.0: Removing canceled TD starting at 0x1fa7ae1b90 (dma).
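
A note for decoding the "Slot state = 3, EP state = 2" line above: in the xHCI specification, EP state 2 is Halted, and the Set TR Dequeue Pointer command is only accepted while the endpoint is Stopped or in Error, which would be consistent with the WARN. Here is a minimal C sketch of that decoding. The helper names are illustrative and are not taken from the kernel source.

```c
#include <stdbool.h>

/* Endpoint Context "EP State" values as defined by the xHCI specification. */
enum ep_state {
    EP_STATE_DISABLED = 0,
    EP_STATE_RUNNING  = 1,
    EP_STATE_HALTED   = 2,
    EP_STATE_STOPPED  = 3,
    EP_STATE_ERROR    = 4,
};

/* Human-readable name for an EP state value copied from the log. */
static const char *ep_state_name(unsigned int state)
{
    switch (state) {
    case EP_STATE_DISABLED: return "disabled";
    case EP_STATE_RUNNING:  return "running";
    case EP_STATE_HALTED:   return "halted";
    case EP_STATE_STOPPED:  return "stopped";
    case EP_STATE_ERROR:    return "error";
    default:                return "reserved";
    }
}

/* Set TR Dequeue Pointer is only valid for a Stopped or Error endpoint. */
static bool set_tr_deq_allowed(unsigned int state)
{
    return state == EP_STATE_STOPPED || state == EP_STATE_ERROR;
}
```

With the value from the log, set_tr_deq_allowed(2) is false, so the command is rejected rather than moving the dequeue pointer.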

Messages on attempts to use the device again:

[ 4076.243019] xhci_hcd 0000:01:00.0: WARN halted endpoint, queueing URB anyway.
[ 4076.243029] xhci_hcd 0000:01:00.0: WARN halted endpoint, queueing URB anyway.
[ 4076.243044] xhci_hcd 0000:01:00.0: WARN halted endpoint, queueing URB anyway.
[ 4076.243051] xhci_hcd 0000:01:00.0: WARN halted endpoint, queueing URB anyway.
[ 4077.749450] xhci_hcd 0000:01:00.0: Cancel URB 0000000063c2cde4, dev 5, ep 0x81, starting at offset 0x1fa7ae1d90
[ 4077.749456] xhci_hcd 0000:01:00.0: // Ding dong!
[ 4077.749592] xhci_hcd 0000:01:00.0: Removing canceled TD starting at 0x1fa7ae1d90 (dma).
[ 4077.749620] xhci_hcd 0000:01:00.0: Cancel URB 00000000564ffbd2, dev 5, ep 0x81, starting at offset 0x1fa7ae1e90
[ 4077.749622] xhci_hcd 0000:01...


Revision history for this message
In , ehoffman (ehoffman-linux-kernel-bugs) wrote :

I think I've narrowed it down to a minimum.

Using libusb, if you happen to call libusb_control_transfer(), which is synchronous I/O, while there's asynchronous I/O already in progress (usually caused by application error), then the cleanup of the in-progress asynchronous I/O will cause the error.

Ex:
...
libusb_submit_transfer(...); // Start async I/O
...
libusb_control_transfer(...); // Sync I/O
...
libusb_cancel_transfer(...); // .... WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
...

Or, simply

...
libusb_submit_transfer(...); // Start async I/O
...
libusb_control_transfer(...); // Sync I/O
...
libusb_release_interface(...); // Will cleanup async I/O in progress .... WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
...

The thing is that in the example above, even though there's an application error, the behavior differs between chipset drivers.

In the case of the HackRF application (which I mentioned above), the application called libusb_cancel_transfer() on exit, followed by libusb_control_transfer(). That should work, since the async (streaming) I/O was terminated before the sync I/O. However, a final callback mistakenly called libusb_submit_transfer() because it did not check that the driver was in the process of shutting down. This created a race condition that made it possible to trigger the above bug (easily reproducible with a properly placed 'sleep()' call).
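
The kind of guard that avoids this resubmission can be sketched in C as follows. This is only a model under stated assumptions: struct stream, submit(), on_transfer_done() and stop_stream() are hypothetical stand-ins, not libusb or HackRF code, and a multi-threaded application would additionally need an atomic flag or a lock.

```c
#include <stdbool.h>

/* Hypothetical application-side streaming state; not a libusb type. */
struct stream {
    bool shutting_down;   /* raised before any transfer is cancelled */
    int  submitted;       /* counts (re)submissions, for illustration only */
};

/* Stand-in for libusb_submit_transfer(); a real app would call libusb here. */
static void submit(struct stream *s)
{
    s->submitted++;
}

/* Completion callback: resubmit only while the stream is still live. */
static void on_transfer_done(struct stream *s)
{
    if (s->shutting_down)
        return;           /* never queue new async I/O during teardown */
    submit(s);
}

/* Teardown: raise the flag first, then cancel and do the synchronous I/O. */
static void stop_stream(struct stream *s)
{
    s->shutting_down = true;
    /* libusb_cancel_transfer() and the final control transfer would follow */
}
```

The point of the ordering is that the flag is set before cancellation starts, so a late callback cannot start new asynchronous I/O on the endpoint being torn down.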

Revision history for this message
Rafael Waldo (lordrafa) wrote :

Hello, I have a Razer Core Chroma with this USB controller, and I can confirm that this issue is still present in 5.9.16.

Revision history for this message
In , sandro.stross (sandro.stross-linux-kernel-bugs) wrote :

Same problems here:

ASUS X470-F Gaming
Ryzen 2700x

Most of my problems are with the rt2800usb driver.

Any progress?

Revision history for this message
In , mathias.nyman (mathias.nyman-linux-kernel-bugs) wrote :

rewritten URB cancel, endpoint stop and set trb deq can be found in my tree
in rewrite_halt_stop_handling branch

git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git rewrite_halt_stop_handling

https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/log/?h=rewrite_halt_stop_handling

Does that help?

Revision history for this message
In , t.clastres (t.clastres-linux-kernel-bugs) wrote :

Created attachment 294739
Patch from Nyman's rewrite_halt_stop_handling branch

Revision history for this message
In , t.clastres (t.clastres-linux-kernel-bugs) wrote :

(In reply to Mathias Nyman from comment #139)
> rewritten URB cancel, endpoint stop and set trb deq can be found in my tree
> in rewrite_halt_stop_handling branch
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git
> rewrite_halt_stop_handling
>
> https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/log/
> ?h=rewrite_halt_stop_handling
>
> Does that help?

Just created the corresponding patch to easily apply your changes to linux 5.10.y.

I don't get "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state." anymore, but the problem is still here.

After connecting my android phone, I start to get `hub 2-8:1.0: hub_ext_port_status failed (err = -110)` and `device descriptor read/8, error -110` spammed for a while.
The immediate issue is the usb port in question not working but what's worrying is the issue seems to propagate to other usb ports like the ones used by my mouse or keyboard. I guess it's because they are part of the same hub?
Maybe this problem is unrelated but in any case let me know.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Thanks for your effort. Unfortunately it doesn't fix the issue.
Tested two WiFi devices on
$ uname -r
5.10.7-arch1-1

First device:
$ lsusb
ID 148f:5370 Ralink Technology, Corp. RT5370 Wireless Adapter

[ 109.165827] usb 1-2: new high-speed USB device number 4 using xhci_hcd
[ 109.410190] usb 1-2: New USB device found, idVendor=148f, idProduct=5370, bcdDevice= 1.01
[ 109.410195] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 109.410197] usb 1-2: Product: 802.11 n WLAN
[ 109.410199] usb 1-2: Manufacturer: Ralink
[ 109.410201] usb 1-2: SerialNumber: 1.0
[ 109.624366] usb 1-2: reset high-speed USB device number 4 using xhci_hcd
[ 109.858679] ieee80211 phy1: rt2x00_set_rt: Info - RT chipset 5390, rev 0502 detected
[ 110.536313] ieee80211 phy1: rt2x00_set_rf: Info - RF chipset 5370 detected
[ 110.542761] ieee80211 phy1: Selected rate control algorithm 'minstrel_ht'
[ 110.555906] rt2800usb 1-2:1.0 wlp3s0f0u2: renamed from wlan0
[ 117.628420] ieee80211 phy1: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[ 117.628459] ieee80211 phy1: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36

Now running WiFi driver test (we use monitor mode to produce heavy workload):
$ sudo hcxdumptool -i wlp3s0f0u2 --check_driver

[ 121.752121] device wlp3s0f0u2 entered promiscuous mode
[ 121.771509] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[ 121.822851] device wlp3s0f0u2 left promiscuous mode

Second device:
$ lsusb
ID 148f:5572 Ralink Technology, Corp. RT5572 Wireless Adapter

[ 419.565208] usb 1-2: new high-speed USB device number 5 using xhci_hcd
[ 419.741196] usb 1-2: New USB device found, idVendor=148f, idProduct=5572, bcdDevice= 1.01
[ 419.741201] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 419.741204] usb 1-2: Product: 802.11 n WLAN
[ 419.741206] usb 1-2: Manufacturer: Ralink
[ 419.741208] usb 1-2: SerialNumber: 1.0
[ 419.950046] usb 1-2: reset high-speed USB device number 5 using xhci_hcd
[ 420.181669] ieee80211 phy2: rt2x00_set_rt: Info - RT chipset 5592, rev 0222 detected
[ 420.859692] ieee80211 phy2: rt2x00_set_rf: Info - RF chipset 000f detected
[ 420.868375] ieee80211 phy2: Selected rate control algorithm 'minstrel_ht'
[ 420.887633] rt2800usb 1-2:1.0 wlp3s0f0u2: renamed from wlan0
[ 434.285018] ieee80211 phy2: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[ 434.285066] ieee80211 phy2: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36

Now running WiFi driver test (we use monitor mode to produce heavy workload):
$ sudo hcxdumptool -i wlp3s0f0u2 --check_driver

[ 463.468004] device wlp3s0f0u2 entered promiscuous mode
[ 537.382571] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[ 537.384485] ieee80211 phy2: rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for offset 0x0500 with error -19
[ 537.411446] device wlp3s0f0u2 left promiscuous mode

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

The message "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state." is gone for me, but the device still doesn't work after unplugging it and plugging it in again.

After unplugging I get this message in dmesg:
ieee80211 phy1: rt2x00usb_vendor_request: Error - Vendor Request 0x06 failed for offset 0x1004 with error -19

Revision history for this message
In , sandro.stross (sandro.stross-linux-kernel-bugs) wrote :

I am on Kali Linux 2020.4 and tried to use the patch that Mathias Nyman released, but it failed.

Does anyone know of a tutorial on how to do this on Kali?

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

An ASUS TUF Gaming notebook (AMD Ryzen 5 3550H), showing a different behavior on the same device:
$ lsusb
ID 148f:5572 Ralink Technology, Corp. RT5572 Wireless Adapter

$ uname -r
5.10.7-arch1-1

$ lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:01.7 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus A
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 7
01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce GTX 1650 Mobile / Max-Q] (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 10fa (rev a1)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
03:00.0 Non-Volatile memory controller: Micron Technology Inc Device 5410 (rev 01)
04:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8821CE 802.11ac PCIe Wireless Network Adapter
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Picasso (rev c2)
05:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
05:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
05:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
05:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller

We end up in an endless "rt2x00queue_flush_queue: Warning - Queue 0 failed to flush" / "USB disconnect, device number x" loop, which spams dmesg:
[ 61.139415] usb 3-2: new high-speed USB device number 3 using xhci_hcd
[ 61.297702] usb 3-2: New USB device found, idVendor=148f, idProduct=5572, bcdDevice= 1.01
[ 61.297708] ...


Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

@ S4ndm4n KALI is not the best choice to do module tests. It is designed to perform penetration tests and many modules are either patched or third party modules.

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

Since the issue mostly affects rt2800usb devices, maybe we can add a quirk to xhci to restore the pre-4.20 behaviour for endpoints that are used by rt2800usb.

Please check whether a patch like this makes the problem go away:

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index dfa61de7c83f..b75a16e5cc9d 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2568,6 +2568,8 @@ static int process_bulk_intr_td(struct xhci_hcd *xhci, struct xhci_td *td,
                remaining = 0;
                break;
        case COMP_USB_TRANSACTION_ERROR:
+                if (1) /* this will be quirk for disable Soft Retry */
+                        break;
                if ((ep_ring->err_count++ > MAX_SOFT_RETRY) ||
                    le32_to_cpu(slot_ctx->tt_info) & TT_SLOT)
                        break;

If it does, I could then prepare a patch that changes this part only for rt2800usb.
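
The decision that the hunk above short-circuits can be modelled in plain C as below. This is a simplified sketch, not kernel code: struct ring_model and want_soft_retry() are hypothetical names, the behind_tt argument stands in for the TT_SLOT test, and the MAX_SOFT_RETRY value of 3 is an assumption that should be checked against xhci.h in your tree.

```c
#include <stdbool.h>

#define MAX_SOFT_RETRY 3   /* assumed value; verify in drivers/usb/host/xhci.h */

/* Simplified stand-in for the per-endpoint transfer ring state. */
struct ring_model {
    unsigned int err_count;
};

/*
 * Returns true when a Soft Retry would be attempted after a
 * USB Transaction Error, false when the transfer is simply failed.
 */
static bool want_soft_retry(struct ring_model *ring, bool behind_tt,
                            bool no_soft_retry_quirk)
{
    if (no_soft_retry_quirk)
        return false;      /* proposed quirk: skip Soft Retry entirely */
    if (ring->err_count++ > MAX_SOFT_RETRY || behind_tt)
        return false;      /* retry budget spent, or device is behind a TT */
    return true;
}
```

With the quirk set, the function returns before touching err_count, which mirrors the early break in the test patch.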

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

This patch fixes the issue for me

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Doesn't work on
$ uname -r
5.10.8-arch1-1
and
ID 148f:5572 Ralink Technology, Corp. RT5572 Wireless Adapter

Warning "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state." is gone, but the interface freezes.

To reproduce the issue:
$ hcxdumptool -I
wlan interfaces:
dc4ef4086e71 wlp3s0f0u2 (rt2800usb)

$ sudo hcxdumptool -i wlp3s0f0u2 --check_injection
initialization...
interface freeze and must be disconnected

expected result (we use a chipset known to work regardless of the xhci issue, connected to the same USB3 port):
ID 148f:3070 Ralink Technology, Corp. RT2870/RT3070 Wireless Adapter

$ hcxdumptool -I
wlan interfaces:
c83a35cb08e3 wlp3s0f0u2 (rt2800usb)

$ sudo hcxdumptool -i wlp3s0f0u2 --check_injection
initialization...
starting antenna test and packet injection test (that can take up to two minutes)...
available channels: 1,2,3,4,5,6,7,8,9,10,11,12,13,14
packet injection is working on 2.4GHz!
injection ratio: 72% (BEACON: 87 PROBERESPONSE: 63)
your injection ratio is good
antenna ratio: 83% (NETWORK: 24 PROBERESPONSE: 20)
your antenna ratio is excellent, let's ride!
4 driver errors encountered during the test

terminating...

BTW:
Don't worry about the 4 driver errors. The first received packets (via raw socket) after entering monitor mode don't contain a radiotap header. This could be driver related.

BTW:
Now we connect this device
ID 148f:5572 Ralink Technology, Corp. RT5572 Wireless Adapter
to a USB2 port:

$ sudo hcxdumptool -i wlp39s0f3u1u1u2 --check_injection
initialization...
starting antenna test and packet injection test (that can take up to two minutes)...
available channels: 1,2,3,4,5,6,7,8,9,10,11,12,13,14,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,100,102,104,106,108,110,112,114,116,118,120,122,124,126,128,130,132,134,136,138,140,149,151,153,155,157,159,161,165,184,188,192,196
packet injection is working on 2.4GHz!
injection ratio: 31% (BEACON: 16 PROBERESPONSE: 5)
your injection ratio is average, but there is still room for improvement
antenna ratio: 66% (NETWORK: 3 PROBERESPONSE: 2)
your antenna ratio is good

terminating...

No warning, no disconnect, everything is fine.

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

Michael, you are experiencing a different problem than Bernhard. Perhaps you can bisect this, or just check whether it ever worked on some older kernel (in this broken case of RT5572 + USB3).

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Stanislaw, are you sure?

Same device, tested on Intel system (xhci unpatched):
Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
06:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
07:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller

ASM1042 should be USB3, too - but I'm not sure.

$ uname -r
5.10.8-arch1-1
and
ID 148f:5572 Ralink Technology, Corp. RT5572 Wireless Adapter

$ sudo hcxdumptool -I
wlan interfaces:
dc4ef4086e71 wlp0s26u1u2 (rt2800usb)

$ sudo hcxdumptool -i wlp0s26u1u2 --check_injection
initialization...
starting antenna test and packet injection test (that can take up to two minutes)...
available channels: 1,2,3,4,5,6,7,8,9,10,11,12,13,14,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,100,102,104,106,108,110,112,114,116,118,120,122,124,126,128,130,132,134,136,138,140,149,151,153,155,157,159,161,165,184,188,192,196
packet injection is working on 2.4GHz!
injection ratio: 35% (BEACON: 176 PROBERESPONSE: 62)
your injection ratio is average, but there is still room for improvement
antenna ratio: 35% (NETWORK: 20 PROBERESPONSE: 7)
your antenna ratio is average, but there is still room for improvement
2 driver errors encountered during the test

terminating...

BTW:
5GHz injection not shown as working, because I haven't set up a 5GHz ACCESS POINT to respond to hcxdumptool

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

I forgot to mention for the RYZEN system:
03:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset USB 3.1 xHCI Controller (rev 02)

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Stanislaw, the difference between the two systems is the host controller driver:
ehci-pci on the Intel system
vs
xhci_hcd on the RYZEN.
Thanks for pointing me in this direction.

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

> Stanislaw, are you sure?

Well, I asked you to check whether the hardware combination that is now broken for you ever worked. In Bernhard's case it worked on 4.19 and stopped working on 4.20, and he was able to identify the commit that broke it.

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

Created attachment 294785
rt2800usb_no_soft_retry.patch

This is a patch that restores the old xhci behaviour only for rt2800usb. I use urb->transfer_flags to add a "quirk" flag. Mathias, do you think it's ok to avoid Soft Retry this way, or do you have a better idea for a solution?

Bernhard, please test whether it still fixes the issue for you.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

/home/zerobeat/Downloads/linux-5.10.8/drivers/net/wireless/ralink/rt2x00/rt2x00usb.c:217:25: error: 'URB_SOFT_RETRY_NOT_OK' undeclared (first use in this function); did you mean 'URB_SHORT_NOT_OK'?
  217 | urb->transfer_flags |= URB_SOFT_RETRY_NOT_OK;
      |                        ^~~~~~~~~~~~~~~~~~~~~
      |                        URB_SHORT_NOT_OK

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Like Bernhard, I can confirm that this patch
https://bugzilla.kernel.org/show_bug.cgi?id=202541#c147
is working for me.
This device
https://bugzilla.kernel.org/show_bug.cgi?id=202541#c10
now is working after applying that patch.
I can also confirm that the issue on the RT5572 is related to USB3 and that it is a new one.

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

When applying the patch from https://bugzilla.kernel.org/show_bug.cgi?id=202541#c155 the device works, but now this message shows up very often in dmesg:

[ 194.130691] usb 1-3: BOGUS urb flags, 1000200 --> 200
[ 194.130704] WARNING: CPU: 0 PID: 113 at drivers/usb/core/urb.c:517 usb_submit_urb+0x1c9/0x5e0
[ 194.130705] Modules linked in: rt2800usb rt2x00usb rt2800lib rt2x00lib snd_usb_audio mac80211 btusb btrtl btbcm snd_usbmidi_lib btintel snd_rawmidi snd_seq_device bluetooth xpad libarc4 mc joydev ff_memless mousedev ecdh_generic ecc cfg80211 ccm algif_aead cbc des_generic libdes ecb edac_mce_amd kvm_amd algif_skcipher rfkill kvm cmac md4 algif_hash af_alg ppdev wmi_bmof irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd snd_hda_codec_realtek cryptd glue_helper snd_hda_codec_generic amdgpu rapl ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence snd_hda_codec vfat fat snd_hda_core snd_hwdep soundwire_bus pcspkr r8169 snd_soc_core k10temp sp5100_tco snd_compress ccp nf_log_ipv6 realtek mdio_devres gpu_sched ac97_bus i2c_piix4 ip6t_REJECT libphy nf_reject_ipv6 i2c_algo_bit rng_core snd_pcm_dmaengine ttm snd_pcm drm_kms_helper snd_timer cec snd syscopyarea xt_hl sysfillrect sysimgblt
[ 194.130748] fb_sys_fops soundcore ip6t_rt wmi parport_pc parport pinctrl_amd gpio_amdpt video gpio_generic mac_hid nf_log_ipv4 nf_log_common ipt_REJECT nf_reject_ipv4 xt_LOG xt_limit xt_addrtype xt_tcpudp xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter pkcs8_key_parser crypto_user drm fuse agpgart bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 hid_steam usbhid dm_mod crc32c_intel xhci_pci xhci_pci_renesas
[ 194.130769] CPU: 0 PID: 113 Comm: kworker/u32:8 Tainted: G W 5.10.8-arch1-1 #3
[ 194.130770] Hardware name: Micro-Star International Co., Ltd. MS-7A34/B350 PC MATE (MS-7A34), BIOS A.J0 01/23/2019
[ 194.130774] Workqueue: phy1 rt2x00usb_work_rxdone [rt2x00usb]
[ 194.130776] RIP: 0010:usb_submit_urb+0x1c9/0x5e0
[ 194.130777] Code: bc 24 a0 00 00 00 48 89 54 24 08 e8 41 c1 f3 ff 48 8b 54 24 08 45 89 f0 44 89 f9 48 89 c6 48 c7 c7 60 49 ff 96 e8 d1 9c 2c 00 <0f> 0b 83 e3 01 0f 85 f1 00 00 00 8b 74 24 04 48 83 c4 18 48 89 ef
[ 194.130778] RSP: 0018:ffffb97d00777d70 EFLAGS: 00010282
[ 194.130779] RAX: 0000000000000000 RBX: 0000000000000002 RCX: ffff9a5d0ec18bb8
[ 194.130780] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9a5d0ec18bb0
[ 194.130780] RBP: ffff9a5a337980c0 R08: 0000000000000000 R09: ffffb97d00777ba8
[ 194.130781] R10: ffffb97d00777ba0 R11: ffffffff976ca568 R12: ffff9a5a191a1800
[ 194.130782] R13: 0000000000000002 R14: 0000000000000200 R15: 0000000001000200
[ 194.130783] FS: 0000000000000000(0000) GS:ffff9a5d0ec00000(0000) knlGS:0000000000000000
[ 194.130783] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 194.130784] CR2: 00007fb1c6c7e7c8 CR3: 000000011c504000 CR4: 00000000003506f0
[ 194.130784] Call Trace:
[ 194.130788] rt2x00usb_kick_rx...


Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

@ Stanislaw, may I ask a question?
I purchased the RT5572 adapter several days ago. It never worked due to the xhci issue. Now (after your help) it is working, and we (with lots of your help) encountered a new issue. It affects exclusively this combination:
RT5572 - USB3 - xhci - rt2800usb
Should I report this issue?
If yes, is it a xhci issue or a rt2800usb issue?

@ Bernhard, I didn't notice these messages in combination with monitor mode, but I did notice them when running in managed mode with NetworkManager.

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

Created attachment 294799
rt2800usb_no_soft_retry_v2.patch

This one should make the "BOGUS urb flags" messages go away. Please test.

The patch is for 5.11-rc; for 5.10 it may require some changes.

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

(In reply to Michael from comment #159)
> @ Stanislaw, may I ask a question?
> I purchased the RT5572 adapter several days before. It never worked due to
> the xhci issue. Now (after your help), it is working and we (with lots of
> your help) encountered a new issue. Affected combination exclusively
> RT5572 - USB3 - xhci - rt2800usb
> Should I report this issue?
> If yes, is it a xhci issue or a rt2800usb issue?

Given that this happens only on some particular hardware, it can be a driver, firmware or even hardware issue (on either the rt2800usb side or the USB host side). If you can find out whether this worked on some older kernel version and bisect it, you could report the issue. Otherwise (without bisection) I do not see any chance of fixing this problem.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

@ Stanislaw, thanks for your reply.
At the moment, too many screws are being turned at once.
First I'll wait until the "WARN Set TR Deq Ptr cmd failed" issue has received a final fix. Then I'll dive into the driver code to find out what is going wrong there.

Revision history for this message
In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

The last patch fixes the issue for me and the BOGUS messages are now gone too. Thanks

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

patch v2 causes monitor mode to crash (on ioctl() system calls):

[ 602.100650] usb 1-2: BOGUS urb flags, 208 --> 200
[ 602.100691] WARNING: CPU: 10 PID: 15060 at drivers/usb/core/urb.c:517 usb_submit_urb+0x1c9/0x5e0
[ 602.100692] Modules linked in: mt7601u rt2800usb(OE) rt2x00usb(OE) rt2800lib(OE) rt2x00lib(OE) mac80211 libarc4 nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) cfg80211 snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel snd_intel_dspcfg rfkill soundwire_intel soundwire_generic_allocation soundwire_cadence 8021q snd_hda_codec garp mrp edac_mce_amd stp llc snd_hda_core snd_hwdep soundwire_bus r8169 snd_soc_core kvm realtek snd_compress nls_iso8859_1 ac97_bus snd_pcm_dmaengine vfat mdio_devres irqbypass mousedev crct10dif_pclmul fat ppdev snd_pcm crc32_pclmul ghash_clmulni_intel wmi_bmof mxm_wmi drm_kms_helper aesni_intel snd_timer cec ccp snd syscopyarea crypto_simd sysfillrect sp5100_tco cryptd usbhid sysimgblt glue_helper libphy soundcore fb_sys_fops pcspkr i2c_piix4 rng_core k10temp rapl parport_pc parport wmi pinctrl_amd gpio_amdpt gpio_generic mac_hid acpi_cpufreq drm sg fuse crypto_user agpgart bpf_preload ip_tables x_tables ext4 crc32c_generic
[ 602.100876] crc16 mbcache jbd2 crc32c_intel sr_mod xhci_pci cdrom xhci_pci_renesas
[ 602.100879] CPU: 10 PID: 15060 Comm: hcxdumptool Tainted: P W OE 5.10.9-arch1-1 #1
[ 602.100880] Hardware name: Micro-Star International Co., Ltd. MS-7A33/X370 KRAIT GAMING (MS-7A33), BIOS 1.F0 11/06/2018
[ 602.100881] RIP: 0010:usb_submit_urb+0x1c9/0x5e0
[ 602.100882] Code: bc 24 a0 00 00 00 48 89 54 24 08 e8 01 c1 f3 ff 48 8b 54 24 08 45 89 f0 44 89 f9 48 89 c6 48 c7 c7 f8 47 bf b9 e8 51 99 2c 00 <0f> 0b 83 e3 01 0f 85 f1 00 00 00 8b 74 24 04 48 83 c4 18 48 89 ef
[ 602.100882] RSP: 0018:ffffb69648f5fb10 EFLAGS: 00010282
[ 602.100883] RAX: 0000000000000000 RBX: 0000000000000002 RCX: ffff88a90ee98bb8
[ 602.100884] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff88a90ee98bb0
[ 602.100884] RBP: ffff88a60f3fb500 R08: 0000000000000000 R09: ffffb69648f5f948
[ 602.100926] R10: ffffb69648f5f940 R11: ffffffffba2c0500 R12: ffff88a6152db800
[ 602.100926] R13: 0000000000000002 R14: 0000000000000200 R15: 0000000000000208
[ 602.100927] FS: 00007fef23ab0280(0000) GS:ffff88a90ee80000(0000) knlGS:0000000000000000
[ 602.101008] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 602.101009] CR2: 00007fef2403eff8 CR3: 000000014b076000 CR4: 00000000003506e0
[ 602.101050] Call Trace:
[ 602.101132] rt2x00usb_kick_rx_entry+0xa0/0x100 [rt2x00usb]
[ 602.101175] rt2x00queue_init_queues+0xb3/0x100 [rt2x00lib]
[ 602.101257] rt2x00lib_enable_radio+0x25/0xa0 [rt2x00lib]
[ 602.101300] rt2x00lib_start+0x7c/0xc0 [rt2x00lib]
[ 602.101391] drv_start+0x3d/0x100 [mac80211]
[ 602.101444] ieee80211_do_open+0x1c4/0x9c0 [mac80211]
[ 602.101536] ? ieee80211_check_concurrent_iface+0x14f/0x1c0 [mac80211]
[ 602.101577] __dev_open+0xfb/0x1b0
[ 602.101658] __dev_change_flags+0x1a6/0x210
[ 602.101699] ? enqueue_task_fair+0x8a/0x5d0
[ 602.101780] dev_change_flags+0x21/0x60
[ 602.101821] devinet_ioctl+0x641/0x810
[ 602.101823] ? preempt_schedule_...


Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

(In reply to Michael from comment #164)
> patch v2 causes monitor mode to crash (on ioctl() system calls:
>
> [ 602.100650] usb 1-2: BOGUS urb flags, 208 --> 200
> [ 602.100691] WARNING: CPU: 10 PID: 15060 at drivers/usb/core/urb.c:517
> usb_submit_urb+0x1c9/0x5e0
[snip]
> [ 602.100879] CPU: 10 PID: 15060 Comm: hcxdumptool Tainted: P W OE
> 5.10.9-arch1-1 #1

Those are the same "BOGUS urb flags" messages that Bernhard reported before. I think you did not correctly apply the v2 patch on top of 5.10. Please double check that this hunk is present in your backported patch:

diff --git a/drivers/usb/core/urb.c b/drivers/usb/core/urb.c
index 357b149b20d3..140bac59dc32 100644
--- a/drivers/usb/core/urb.c
+++ b/drivers/usb/core/urb.c
@@ -495,7 +495,7 @@ int usb_submit_urb(struct urb *urb, gfp_t mem_flags)

        /* Check against a simple/standard policy */
        allowed = (URB_NO_TRANSFER_DMA_MAP | URB_NO_INTERRUPT | URB_DIR_MASK |
-                        URB_FREE_BUFFER);
+                        URB_SOFT_RETRY_NOT_OK | URB_FREE_BUFFER);
        switch (xfertype) {
        case USB_ENDPOINT_XFER_BULK:
        case USB_ENDPOINT_XFER_INT:
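
For illustration, the effect of this hunk on the "BOGUS urb flags" warning can be modelled in a few lines of C. This is a sketch, not the kernel function: it covers only the base policy mask shown in the hunk, filter_flags() and flags_are_bogus() are hypothetical helpers, and the 0x0008 value for URB_SOFT_RETRY_NOT_OK is an assumption inferred from the "208 --> 200" log line rather than read from the patch.

```c
#include <stdbool.h>

/* URB transfer flag bits as defined in include/linux/usb.h. */
#define URB_NO_TRANSFER_DMA_MAP 0x0004u
#define URB_NO_INTERRUPT        0x0080u
#define URB_FREE_BUFFER         0x0100u
#define URB_DIR_IN              0x0200u
#define URB_DIR_MASK            URB_DIR_IN

/* Assumed value, inferred from the "208 --> 200" log line. */
#define URB_SOFT_RETRY_NOT_OK   0x0008u

/* Keep only the flags that the base policy allows. */
static unsigned int filter_flags(unsigned int flags, bool patched)
{
    unsigned int allowed = URB_NO_TRANSFER_DMA_MAP | URB_NO_INTERRUPT |
                           URB_DIR_MASK | URB_FREE_BUFFER;

    if (patched)
        allowed |= URB_SOFT_RETRY_NOT_OK;   /* what the hunk adds */
    return flags & allowed;
}

/* The warning fires when filtering drops at least one requested flag. */
static bool flags_are_bogus(unsigned int flags, bool patched)
{
    return filter_flags(flags, patched) != flags;
}
```

If the USB core running on the machine lacks the hunk, the new flag is filtered out and the warning appears even though the rt2800usb side of the patch is applied.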

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

 The patch was applied to urb.c (5.10.9):
 /* Check against a simple/standard policy */
 allowed = (URB_NO_TRANSFER_DMA_MAP | URB_NO_INTERRUPT | URB_DIR_MASK |
     URB_SOFT_RETRY_NOT_OK | URB_FREE_BUFFER);
 switch (xfertype) {
 case USB_ENDPOINT_XFER_BULK:
 case USB_ENDPOINT_XFER_INT:

At this moment, I don't know what exactly went wrong. I'll try to identify the issue.

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

Maybe the USB layer was compiled into the kernel and you only reloaded the modules.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Thanks. Now the modules are loaded correctly and the BOGUS messages disappeared.

Unfortunately monitor mode is not working with v2:
ID 148f:5370 Ralink Technology, Corp. RT5370 Wireless Adapter

before:
$ sudo hcxdumptool -i wlp3s0f0u2 --check_injection
initialization...
starting antenna test and packet injection test (that can take up to two minutes)...
available channels: 1,2,3,4,5,6,7,8,9,10,11,12,13,14
packet injection is working on 2.4GHz!
injection ratio: 54% (BEACON: 123 PROBERESPONSE: 67)
your injection ratio is good
antenna ratio: 45% (NETWORK: 20 PROBERESPONSE: 9)
your antenna ratio is average, but there is still room for improvement

terminating...

after v2:
$ sudo hcxdumptool -i wlp3s0f0u2 --check_injection
initialization...
starting antenna test and packet injection test (that can take up to two minutes)...
available channels: 1,2,3,4,5,6,7,8,9,10,11,12,13,14
warning: no PROBERESPONSE received - packet injection is probably not working!
8 driver errors encountered during the test

terminating...

Revision history for this message
In , stf_xl (stfxl-linux-kernel-bugs) wrote :

Michael, at this point I really doubt the reliability of your testing.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Stanislaw, and you're not the only one. I doubt it, too.
Maybe I patched my kernel to death and it is time for me to compile a fresh one.
But anyway, thanks for your effort and for your patience.

Revision history for this message
In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

Stanislaw, short notice for you. Now, I'm running the fresh kernel (the RYZEN is really fast compiling it). Patch v2 is applied.
Everything is working fine and all Bogus messages are gone.
Thanks again.

Revision history for this message
In , wgh (wgh-linux-kernel-bugs) wrote :

(In reply to Mathias Nyman from comment #139)
> rewritten URB cancel, endpoint stop and set trb deq can be found in my tree
> in rewrite_halt_stop_handling branch
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git
> rewrite_halt_stop_handling
>
> https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/log/
> ?h=rewrite_halt_stop_handling
>
> Does that help?

I applied the patch to 5.10.11-gentoo, and it did help with my HackRF One (see comment #136 for details and hardware)! No ill effects so far.

In , stf_xl (stfxl-linux-kernel-bugs) wrote :

After discussion on my posted patch here:

https://<email address hidden>/t/#u

it was concluded that this should rather be an xhci quirk instead of an rt2800usb driver flag.

If the change from comment 147 helps with your problem, please provide the PCI ID of your xHCI controller. This can be done with the command:

lspci -k -nn | grep -B2 xhci

If you have more than one xHCI controller, please make sure you provide the PCI IDs of the one that actually has the problem (the 'lspci -t' command can be useful as well)

In , stf_xl (stfxl-linux-kernel-bugs) wrote :

(In reply to Stanislaw Gruszka from comment #173)
> If you have more than one xHCI controller please assure you provide PCI-id's
> of one that actually has the problem ('lspci -t' command can be useful as
> well)

I meant 'lsusb -t'

In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset USB 3.1 xHCI Controller [1022:43b9] (rev 02)
Subsystem: ASMedia Technology Inc. Device [1b21:1142]
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci

In , stf_xl (stfxl-linux-kernel-bugs) wrote :

Created attachment 295055
0001-usb-xhci-do-not-perform-Soft-Retry-for-some-xHCI-hos.patch

This is the next proposed fix. It is supposed to disable Soft Retry for affected xHCI controllers, currently only for the xHCI device reported by Michael:
PCI_VENDOR_ID_AMD = 0x1022 , PCI_DEVICE_ID_AMD_PROMONTORYA_4 = 0x43b9

If you want to test and have a different xHCI host, you need to add your PCI IDs to
the drivers/usb/host/xhci-pci.c part of the patch.
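
For anyone adding their own IDs, the match can be pictured with the stand-alone sketch below. It is a user-space illustration only, not the attached patch: `sketch_pci_id`, `sketch_quirks_for` and `SKETCH_NO_SOFT_RETRY` are made-up names, and the flag value is a placeholder rather than the kernel's real bit.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit, for illustration only; not the kernel's real flag value. */
#define SKETCH_NO_SOFT_RETRY (1ull << 0)

/* Hypothetical (vendor, device) pair as printed by "lspci -nn". */
struct sketch_pci_id {
    uint16_t vendor;
    uint16_t device;
};

/* Hosts for which Soft Retry would be disabled; add your own pair here. */
static const struct sketch_pci_id sketch_no_soft_retry_hosts[] = {
    { 0x1022, 0x43b9 }, /* PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_PROMONTORYA_4 */
};

/* Return the quirk bits that would be set for the given controller. */
uint64_t sketch_quirks_for(uint16_t vendor, uint16_t device)
{
    size_t n = sizeof(sketch_no_soft_retry_hosts) /
               sizeof(sketch_no_soft_retry_hosts[0]);
    size_t i;

    for (i = 0; i < n; i++) {
        if (sketch_no_soft_retry_hosts[i].vendor == vendor &&
            sketch_no_soft_retry_hosts[i].device == device)
            return SKETCH_NO_SOFT_RETRY;
    }
    return 0; /* not listed: no quirk applied */
}
```

Calling `sketch_quirks_for(0x1022, 0x43b9)` returns the placeholder flag, while any pair that is not in the table returns 0, which is why a different xHCI host needs its own entry.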

In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

@Stanislaw, I followed the discussion you mentioned here:
https://bugzilla.kernel.org/show_bug.cgi?id=202541#c173

Other devices than rt2800usb devices are affected, too.
Tested this one before applying your patch:
ID 7392:7710 Edimax Technology Co., Ltd Edimax Wi-Fi
and running into the same xhci issue on USB controller mentioned here:
https://bugzilla.kernel.org/show_bug.cgi?id=202541#c175

[10214.423508] usb 1-2: new high-speed USB device number 3 using xhci_hcd
[10214.602833] usb 1-2: New USB device found, idVendor=7392, idProduct=7710, bcdDevice= 0.00
[10214.602838] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[10214.602841] usb 1-2: Product: Edimax Wi-Fi
[10214.602843] usb 1-2: Manufacturer: MediaTek
[10214.602845] usb 1-2: SerialNumber: 1.0
[10214.931553] usb 1-2: reset high-speed USB device number 3 using xhci_hcd
[10215.102895] mt7601u 1-2:1.0: ASIC revision: 76010001 MAC revision: 76010500
[10215.132670] mt7601u 1-2:1.0: Firmware Version: 0.1.00 Build: 7640 Build time: 201302052146____
[10216.101346] mt7601u 1-2:1.0: EEPROM ver:0d fae:00
[10216.111983] mt7601u 1-2:1.0: EEPROM country region 01 (channels 1-13)
[10217.189574] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
[10217.190361] usbcore: registered new interface driver mt7601u
[10217.199429] mt7601u 1-2:1.0 wlp3s0f0u2: renamed from wlan0
[10296.419053] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[10296.419228] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

In , jg.staffel (jg.staffel-linux-kernel-bugs) wrote :

The same problem (with ID 04a9:220d Canon, Inc. CanoScan N670U/N676U/LiDE 20):

Feb 03 09:48:54 [kernel] [34974.104606] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Feb 03 09:49:49 [kernel] [35029.419748] usb 1-6: USB disconnect, device number 3
Feb 03 09:49:52 [kernel] [35031.994403] usb 1-6: new full-speed USB device number 6 using xhci_hcd
Feb 03 09:50:45 [kernel] [35085.400634] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Feb 03 09:50:45 [kernel] [35085.404278] xhci_hcd 0000:01:00.0: WARN Successful completion on short TX
Feb 03 09:50:45 [kernel] [35085.404398] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 4 comp_code 1
Feb 03 09:50:45 [kernel] [35085.404401] xhci_hcd 0000:01:00.0: Looking for event-dma 00000008146ff050 trb-start 00000008146ff060 trb-end 00000008146ff060 seg-start 00000008146ff000 seg-end 00000008146ffff0

$ lspci -k -nn | grep -B2 xhci
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6 [1022:1466]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
01:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 XHCI Controller [1022:43d5] (rev 01)
 Subsystem: ASMedia Technology Inc. 400 Series Chipset USB 3.1 XHCI Controller [1b21:1142]
 Kernel driver in use: xhci_hcd
--
09:00.2 USB controller [0c03]: NVIDIA Corporation TU116 USB 3.1 Host Controller [10de:1aec] (rev a1)
 Subsystem: NVIDIA Corporation TU116 USB 3.1 Host Controller [10de:139d]
 Kernel driver in use: xhci_hcd
--
0a:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Zeppelin USB 3.0 Host controller [1022:145f]
 Subsystem: Advanced Micro Devices, Inc. [AMD] Zeppelin USB 3.0 Host controller [1022:7914]
 Kernel driver in use: xhci_hcd

$ uname -a
Linux Gentoo 5.4.92-gentoo #1 SMP PREEMPT Thu Jan 28 20:45:52 MSK 2021 x86_64 AMD Ryzen 5 2600 Six-Core Processor AuthenticAMD GNU/Linux

In , stf_xl (stfxl-linux-kernel-bugs) wrote :

(In reply to Michael from comment #177)
> Other devices than rt2800usb devices are affected, too.
> Tested this one before applying your patch:
> ID 7392:7710 Edimax Technology Co., Ltd Edimax Wi-Fi
> and running into the same xhci issue on USB controller mentioned here:
> https://bugzilla.kernel.org/show_bug.cgi?id=202541#c175

Ok, so it makes sense to disable Soft Retry per xHCI.

In , stf_xl (stfxl-linux-kernel-bugs) wrote :

(In reply to alpir from comment #178)
> The same problem (with ID 04a9:220d Canon, Inc. CanoScan N670U/N676U/LiDE
> 20):
>
> Feb 03 09:48:54 [kernel] [34974.104606] xhci_hcd 0000:01:00.0: WARN Set TR
> Deq Ptr cmd failed due to incorrect slot or ep state.

alpir, does the change from comment 147 help you?

In , stf_xl (stfxl-linux-kernel-bugs) wrote :

alpir, you have a different device ID than Michael, but you both have the same subsystem device: ASMedia 1b21:1142. So perhaps the patch should be based on subdevice IDs. Let's wait for other users' reports regarding their xHCI controllers; we will see then.

In , jg.staffel (jg.staffel-linux-kernel-bugs) wrote :

I tried the patch from comment 147. The error "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state" is gone, but the USB 3.1 behavior is still the same.

The reason I started looking into the strange behavior of the USB ports in the first place: my two JetFlash Transcend 8GB flash drives, when connected to the USB3 port, are sometimes not detected by the system as mountable (fat32). When I run a disk check (8 GB) with the command badblocks -nvs /dev/sdd, the check ends after a while with the following error: Pass completed, 5662144 bad blocks found. (5662144/0/0 errors). This happens with both flash drives.

But if you connect them to USB2, then there are no errors at all.

At the same time, when looking at the logs, I found errors: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

Now, after the patch, I get the following in the logs:

Feb 03 17:47:14 [kernel] [ 52.603587] usb 2-3: new SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:47:14 [kernel] [ 52.636130] usb-storage 2-3:1.0: USB Mass Storage device detected
Feb 03 17:47:14 [kernel] [ 52.636242] scsi host11: usb-storage 2-3:1.0
Feb 03 17:47:14 [kernel] [ 52.651996] usbcore: registered new interface driver uas
Feb 03 17:47:16 [kernel] [ 54.013780] scsi 11:0:0:0: Direct-Access JetFlash Transcend 8GB 1100 PQ: 0 ANSI: 6
Feb 03 17:47:16 [kernel] [ 54.014688] sd 11:0:0:0: [sdd] 15425536 512-byte logical blocks: (7.90 GB/7.36 GiB)
Feb 03 17:47:16 [kernel] [ 54.015150] sd 11:0:0:0: [sdd] Write Protect is off
Feb 03 17:47:16 [kernel] [ 54.015156] sd 11:0:0:0: [sdd] Mode Sense: 43 00 00 00
Feb 03 17:47:16 [kernel] [ 54.015625] sd 11:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 03 17:47:16 [kernel] [ 54.028165] sdd: sdd1
Feb 03 17:47:16 [kernel] [ 54.045687] sd 11:0:0:0: [sdd] Attached SCSI removable disk
Feb 03 17:48:04 [kernel] [ 102.221862] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:51:52 [kernel] [ 330.009696] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:55:55 [kernel] [ 573.644576] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:56:01 [kernel] [ 579.149875] usb 2-3: device descriptor read/8, error -110
Feb 03 17:56:01 [kernel] [ 579.254204] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:56:06 [kernel] [ 584.781836] usb 2-3: device descriptor read/8, error -110
Feb 03 17:56:07 [kernel] [ 585.073435] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:56:12 [kernel] [ 590.413816] usb 2-3: device descriptor read/8, error -110
Feb 03 17:56:12 [kernel] [ 590.518146] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:56:18 [kernel] [ 596.046034] usb 2-3: device descriptor read/8, error -110
Feb 03 17:56:18 [kernel] [ 596.336445] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:56:23 [kernel] [ 601.677932] usb 2-3: device descriptor read/8, error -110
Feb 03 17:56:23 [kernel] [ 601.782091] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:56:29 [kernel] [ 607.309722] usb 2-3: device descr...

In , bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote :

My controller has the PCI ID 43bb, so I've added "PCI_DEVICE_ID_AMD_PROMONTORYA_2" to the patch from #176, and that fixed the issue for me.

In , ZeroBeat (zerobeat-linux-kernel-bugs) wrote :

@Stanislaw, I'm running an older mobo and a RYZEN 1700.
I don't need CPU power - GPU power is more important for me (crypto analysis).

In , biopsin (biopsin-linux-kernel-bugs) wrote :

[Continuing my first report in comment:https://bugzilla.kernel.org/show_bug.cgi?id=202541#c107]

$ lspci -k -nn | grep -B2 xhci
02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 XHCI Controller [1022:43d5] (rev 01)
        Subsystem: ASMedia Technology Inc. Device [1b21:1142]
        Kernel driver in use: xhci_hcd

I have adapted the patch by Mr. Gruszka [https://bugzilla.kernel.org/show_bug.cgi?id=202541#c176] for my current system and needs

$ uname -a
Linux voidx 5.4.95_1 #1 SMP PREEMPT 1612063540 x86_64 GNU/Linux

If someone has some spare time to glance at it or comment on my error ;)
(diff available for 30 days) @
https://p.teknik.io/lIBbA

In , stf_xl (stfxl-linux-kernel-bugs) wrote :

(In reply to alpir from comment #182)
> I tried patch from comment 147. The error "WARN Set TR Deq Ptr cmd failed
> due to incorrect slot or ep state" has gone. But behavior USDB3.1 still the
> same.
[snip]
> But if you connect them to USB2, then there are no errors at all.

alpir, I think you are experiencing a different issue that cannot be solved by simply disabling Soft Retry. Some more fixes are possibly needed for handling your xHCI/USB hardware. Maybe you can try the patch from comment 139? If this is a regression, maybe you can bisect to find the offending commit? Anyway, your problems will most likely require the expertise of Mathias Nyman, the xhci driver maintainer.

In , stf_xl (stfxl-linux-kernel-bugs) wrote :

(In reply to biopsin from comment #185)
> [Continuing my first report in
> comment:https://bugzilla.kernel.org/show_bug.cgi?id=202541#c107]

As in alpir's case, this will most likely require some different fixes, but you can check whether disabling Soft Retry works. You can simply disable it as shown in comment 147.

 > $ lspci -k -nn | grep -B2 xhci
> 02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series
> Chipset USB 3.1 XHCI Controller [1022:43d5] (rev 01)
> Subsystem: ASMedia Technology Inc. Device [1b21:1142]
> Kernel driver in use: xhci_hcd
>
[snip]
> If someone has some spare time to glance at it or comment on my error ;)
> (diff availible for 30 days) @
> https://p.teknik.io/lIBbA

ASMedia is subsystem_{vendor,device}, so most likely the quirk flag is not set properly for you. You can print the values with a patch like this to see:

diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 906a0e08821e..0ec9c3637b7a 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -102,6 +102,9 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)

        id = pci_match_id(pdev->driver->id_table, pdev);

+       printk("vendor: 0x%04x device 0x%04x subvendor 0x%04x subdevice 0x%04x\n",
+              pdev->vendor, pdev->device, pdev->subsystem_vendor, pdev->subsystem_device);
+
        if (id && id->driver_data) {
                driver_data = (struct xhci_driver_data *)id->driver_data;
                xhci->quirks |= driver_data->quirks;

If those are indeed subsystem IDs, I think there is a bug in the existing xhci-pci.c quirks code:

        if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
                pdev->device == PCI_DEVICE_ID_ASMEDIA_1042_XHCI)
                xhci->quirks |= XHCI_BROKEN_STREAMS;
        if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
                pdev->device == PCI_DEVICE_ID_ASMEDIA_1042A_XHCI)
                xhci->quirks |= XHCI_TRUST_TX_LENGTH;
        if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
            (pdev->device == PCI_DEVICE_ID_ASMEDIA_1142_XHCI ||
             pdev->device == PCI_DEVICE_ID_ASMEDIA_2142_XHCI))
                xhci->quirks |= XHCI_NO_64BIT_SUPPORT;

and those checks should be replaced by checks on pdev->subsystem_vendor and pdev->subsystem_device.
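
The difference between the two kinds of check can be seen in a small user-space sketch. It assumes the IDs from the lspci output quoted above (device 1022:43d5, subsystem 1b21:1142). The struct and function names are hypothetical and only mirror the four ID fields used in the snippet, and accepting either field is just one possible way to write the new check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the four ID fields of a PCI device. */
struct sketch_pci_dev {
    uint16_t vendor;
    uint16_t device;
    uint16_t subsystem_vendor;
    uint16_t subsystem_device;
};

#define SKETCH_VENDOR_ASMEDIA 0x1b21 /* shown as 1b21 in the lspci output */

/* Check in the style of the existing code: primary vendor ID only. */
bool sketch_primary_is_asmedia(const struct sketch_pci_dev *d)
{
    return d->vendor == SKETCH_VENDOR_ASMEDIA;
}

/* Check that also looks at the subsystem vendor ID. */
bool sketch_any_is_asmedia(const struct sketch_pci_dev *d)
{
    return d->vendor == SKETCH_VENDOR_ASMEDIA ||
           d->subsystem_vendor == SKETCH_VENDOR_ASMEDIA;
}
```

For a device filled in as `{ 0x1022, 0x43d5, 0x1b21, 0x1142 }`, the primary check is false and the subsystem-aware check is true, which would explain why the ASMedia quirks are never applied on such a chipset controller.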

In , stf_xl (stfxl-linux-kernel-bugs) wrote :

Created attachment 295065
asmedia_subsytem_quirks.patch

This patch applies the existing xhci ASMedia quirks to ASMedia subdevices as well.

Looking at the changelog history, those quirks helped with some USB disk issues, so perhaps the patch could help with the disk issues reported here, i.e. the alpir and biopsin cases. Please test.

In , jg.staffel (jg.staffel-linux-kernel-bugs) wrote :

None of the patches (comments 139, 147, 188) solved my problem.

Rafael Waldo (lordrafa) wrote :

I did a little bit of research on the Razer Core Chroma usb card.

This card uses 3 ASM1142 pcie to usb bridges (let's call them IC_0, IC_1, IC_2) and the PCIe 4x connector is wired in a non-standard way.

IC_0 is wired up to the PCIe in a normal fashion using only the pcie lane 0 (TX and RX) and the "usual" clock lane.

IC_1 data lanes are wired to pci lane 1 (TX and RX) but the clock lane has been wired to lane 3 RX.

IC_2 data lanes are wired to pci lane 2 (TX and RX) but the clock lane has been wired to lane 3 TX.

Furthermore IC_2 has one of its two USB interfaces wired to a USB to Ethernet bridge (ASIX AX88179), while the other one is connected to another micro-controller on the Chroma's main board that I presume controls the LED colours. In Windows this IC is detected as some sort of mouse, so I would say that Razer has recycled some code to implement this...

All clock lanes merge together in another IC (IC_4) that is on the Chroma's main board.

I would say that they are doing some sort of time-division to share the same PCIe port between three different devices, and this is causing all the mess. I have read about people having trouble with this device on every single OS (macOS, Windows and Linux). I would expect such a junky design from a low-budget development, but never from a company like Razer... And their response to the issue has been horrendous... something to bear in mind for future acquisitions...

Anyway, it would be interesting to know whether other people having trouble have more than one ASM1142 or not...

In , biopsin (biopsin-linux-kernel-bugs) wrote :

@Gruszka
Your patch [https://bugzilla.kernel.org/show_bug.cgi?id=202541#c188] makes a lot of sense, thank you.
I'm currently testing it with my setup and kernel 5.4.95_x86_64.
Tested against one PATA and one SATA drive; so far I see no ill effects, but I also can't confirm or deny that it does anything in this short timespan, and much has changed since my initial post last year. I will at least continue applying it now and then throughout this year and report anything newsworthy. Thank you for your time and help!

In , raulvior.bcn (raulvior.bcn-linux-kernel-bugs) wrote :

Created attachment 295151
Dmesg of a Toshiba USB 3.0 HDD connected to USB 3.0 front port and back port.

I am having this error on Linux 5.10.10-051010 while trying to connect a USB 3.0 hard disk, Toshiba Touro 4TB (HitachiGST). If I connect the disk to a USB 2.0 port it works flawlessly.

The kernel shows a different kind of error depending on whether I connect the HDD to the front or back USB 3.0 ports of the motherboard MSI X470 Gaming Plus MAX.

lspci -vnnt:
> -[0000:00]-+-00.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex [1022:1450]
>            +-00.2 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit [1022:1451]
>            +-01.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
>            +-01.1-[01]----00.0 Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
>            +-01.3-[03-26]--+-00.0 Advanced Micro Devices, Inc. [AMD] Device [1022:43d0]
>            |               +-00.1 Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller [1022:43c8]
>            |               \-00.2-[20-26]--+-00.0-[21]--
>            |                               +-01.0-[22]----00.0 Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168]
>            |                               +-02.0-[23]--
>            |                               +-03.0-[24]--
>            |                               +-04.0-[25]--
>            |                               \-08.0-[26]----00.0 ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller [1b21:1242]
>            +-02.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
>            +-03.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
>            +-03.1-[27]--+-00.0 Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] [1002:67df]
>            |            \-00.1 Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] [1002:aaf0]
>            +-04.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
>            +-07.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
>            +-07.1-[28]--+-00.0 Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function [1022:145a]
>            |            +-00.2 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor [1022:1456]
>            |            \-00.3 Advanced Micro Devices, Inc. [AMD] Zeppelin USB 3.0 Host controller [1022:145f]
>            +-08.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
>            +-08.1-[29]--+-00.0 Advance...

In , raulvior.bcn (raulvior.bcn-linux-kernel-bugs) wrote :

Created attachment 295183
Dmesg of a OnePlus 7 Pro connecting in USB 3.1 gen1 mode. No errors.

(In reply to raul from comment #191)
Connecting a OnePlus 7 Pro smartphone does not show any error. This phone has a USB 3.1 gen1 port and connects in that mode without errors. I can navigate the filesystem as one would expect.

Kevin Hester (kevinh) wrote :

I can confirm @lamalas "Lenovo Thunderbolt Dock Gen 2" shows these same problems with a high speed USB (Samsung SSD T7) disk on Ubuntu 20.10. No need to involve the gige port to see the problem.

Kevin Hester (kevinh) wrote :

Btw - I just checked 20.10 even using kernel 5.11.8 (latest via "mainline --install 5.11.8"), and that current kernel shows the same problem and error message "ERROR Transfer event TRB DMA ptr not part of current TD ep_index 1".

Test hardware: "Lenovo Thunderbolt Dock Gen 2" shows these same problems with a high speed USB (Samsung SSD T7) disk on Ubuntu 20.10. No need to involve the gige port to see the problem.

This is probably still a bug in the kernel but I'm not sure how you want to report it upstream so it is linked here...

Changed in linux:
importance: Unknown → High
status: Unknown → Confirmed
In , tisaak (tisaak-linux-kernel-bugs) wrote :

Same issue with a Seagate Portable 4 TB USB 3.0 drive that I connect with usb-storage quirks as its UAS implementation is problematic. Random hangs that flood dmesg with errors.

lsusb -tv
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
    ID 1d6b:0003 Linux Foundation 3.0 root hub
    |__ Port 3: Dev 2, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
        ID 0bc2:231a Seagate RSS LLC Expansion Portable

Errors in dmesg start like this...

xhci_hcd 0000:00:10.0: WARN Cannot submit Set TR Deq Ptr
xhci_hcd 0000:00:10.0: A Set TR Deq Ptr command is pending.
usb 3-3: reset SuperSpeed Gen 1 USB device number 3 using xhci_hcd
sd 5:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=31s
sd 5:0:0:0: [sdd] tag#0 CDB: Read(16) 88 00 00 00 00 00 a4 01 ed 78 00 00 00 10 00 00

After that:

task:usb-storage state:D stack: 0 pid: 286 ppid: 2 flags:0x00004000
Call Trace:
  __schedule+0x282/0x870
  ? usleep_range+0x80/0x80
  schedule+0x46/0xb0
  schedule_timeout+0xff/0x140
  ? __prepare_to_swait+0x4b/0x70
  __wait_for_common+0xae/0x160
  usb_sg_wait+0xe0/0x1a0 [usbcore]
  usb_stor_bulk_transfer_sglist.part.0+0x64/0xb0 [usb_storage]
  usb_stor_Bulk_transport+0x188/0x410 [usb_storage]
  usb_stor_invoke_transport+0x3a/0x520 [usb_storage]
  ? __prepare_to_swait+0x4b/0x70
  ? __wait_for_common+0xed/0x160
  usb_stor_control_thread+0x185/0x280 [usb_storage]
  ? storage_probe+0x2a0/0x2a0 [usb_storage]
  kthread+0x11b/0x140
  ? __kthread_bind_mask+0x60/0x60
  ret_from_fork+0x22/0x30

In , mathias.nyman (mathias.nyman-linux-kernel-bugs) wrote :

(In reply to Zak from comment #193)
>
>
> Errors in dmesg start like this...
>
> xhci_hcd 0000:00:10.0: WARN Cannot submit Set TR Deq Ptr
> xhci_hcd 0000:00:10.0: A Set TR Deq Ptr command is pending.

There are recent major changes in this area in the xhci driver.
The above message no longer exists, new message in this case is
"Set TR Deq already pending, don't submit for x"

Can you try this on a 5.12-rc kernel?

Thanks
Mathias

In , mlkcampion (mlkcampion-linux-kernel-bugs) wrote :

Created attachment 296259
xhci no soft retry for Intel xhci 8086:06ed and 8086:31a8

Hi

I am having this issue on 2 systems when I plug in
a Hoco Hub HB16. The Hoco Hub HB16 is a 6 in 1 adapter that
includes
Type-C to USB3.0 x3
Type-C to HDMI
Type-C to RJ45 Ethernet (RealTek RTL8153, linux loads driver rtl8153b-2)
Type-C to Type-C(PD2.0)
USB Billboard device

Also when the device is plugged into a Windows10 machine
for the first time it presents a disk that contains the RTL8153
drivers, the user is provided with an option to install these. This
"disk" is not visible later.

The 2 systems where this device failed both reported
"WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state."
Both systems have Ubuntu Mate 20.10

$ uname -a
5.8.0-48-generic #54-Ubuntu SMP Fri Mar 19 14:25:20 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

1. Dell XPS 9500 (Intel(R) Core(TM) i5-10300H CPU @ 2.50GHz)
$ sudo lspci -k -nn | grep -B2 xhci
    00:14.0 USB controller [0c03]: Intel Corporation Comet Lake USB 3.1 xHCI Host Controller [8086:06ed]
 Subsystem: Dell Comet Lake USB 3.1 xHCI Host Controller [1028:097d]
 Kernel driver in use: xhci_hcd
 Kernel modules: xhci_pci
--
    7:00.0 USB controller [0c03]: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] [8086:15ec] (rev 06)
 Subsystem: Dell JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] [1028:097d]
 Kernel driver in use: xhci_hcd
 Kernel modules: xhci_pci

2. Seeed Studio Odyssey J4105 (Intel(R) Celeron(R) J4105 CPU @ 1.50GHz)
$ sudo lspci -k -nn | grep -B3 xhci
    00:15.0 USB controller [0c03]: Intel Corporation Device [8086:31a8] (rev 03)
 DeviceName: Onboard - Other
 Subsystem: Intel Corporation Device [8086:7270]
 Kernel driver in use: xhci_hcd
 Kernel modules: xhci_pci

I applied the changes in Stanislaw's patch at comment 176, I added the
PCI IDs to match both my systems.

I can confirm that with the patch applied both systems no longer reported the
issue "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state."

Just to note that on the Dell XPS I use the Dell DA20 Adapter which is a Type-C
to USB and HDMI adapter. This appears to have an ASIX Elec. Corp. AX88179
USB 3.0 to Gigabit Ethernet which I don't have any issues with.

In , luke-jr+linuxbugs (luke-jr+linuxbugs-linux-kernel-bugs) wrote :

Encountered this with a PCI-e card using ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller

Moved to my native "Intel Corporation Device a3af" USB bus, this error disappeared (though other problems remain in my case)

Linux 5.10.33

Of potential noteworthiness: When I got my Talos II, I tried to move this ASMedia USB PCI-e card to it, and found it was immediately shut down by the IOMMU whenever I tried to use it at all. It seems the firmware is garbage.

IIRC, someone was getting close to an open source firmware replacement without those issues... would be interesting to see if it helps with this bug as well.

In , dront78 (dront78-linux-kernel-bugs) wrote :

same problem
5.12.12-arch1-1 #1 SMP PREEMPT Fri, 18 Jun 2021 21:59:22 +0000 x86_64 GNU/Linux

GPD Pocket

00:00.0 Host bridge [0600]: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Series SoC Transaction Register [8086:2280] (rev 34)
 Subsystem: Intel Corporation Device [8086:7270]
 Kernel driver in use: iosf_mbi_pci
00:02.0 VGA compatible controller [0300]: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Integrated Graphics Controller [8086:22b0] (rev 34)
 DeviceName: Onboard IGD
 Subsystem: Intel Corporation Device [8086:7270]
 Kernel driver in use: i915
 Kernel modules: i915
00:0b.0 Signal processing controller [1180]: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Series Power Management Controller [8086:22dc] (rev 34)
 Subsystem: Intel Corporation Device [8086:7270]
 Kernel driver in use: proc_thermal
 Kernel modules: processor_thermal_device
00:14.0 USB controller [0c03]: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Series USB xHCI Controller [8086:22b5] (rev 34)
 Subsystem: Intel Corporation Device [8086:7270]
 Kernel driver in use: xhci_hcd
 Kernel modules: xhci_pci
00:1a.0 Encryption controller [1080]: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Series Trusted Execution Engine [8086:2298] (rev 34)
 Subsystem: Intel Corporation Device [8086:7270]
 Kernel modules: mei_txe
00:1c.0 PCI bridge [0604]: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Series PCI Express Port #1 [8086:22c8] (rev 34)
 Kernel driver in use: pcieport
00:1f.0 ISA bridge [0601]: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Series PCU [8086:229c] (rev 34)
 Subsystem: Intel Corporation Device [8086:7270]
 Kernel modules: lpc_ich
01:00.0 Network controller [0280]: Broadcom Inc. and subsidiaries BCM4356 802.11ac Wireless Network Adapter [14e4:43ec] (rev 02)
 Subsystem: Gemtek Technology Co., Ltd Device [17f9:0036]
 Kernel driver in use: brcmfmac
 Kernel modules: brcmfmac

# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.
Table at 0x5B8DE000.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
 Vendor: American Megatrends Inc.
 Version: 5.11
 Release Date: 06/28/2017
 Address: 0xF0000
 Runtime Size: 64 kB
 ROM Size: 4 MB
 Characteristics:
  PCI is supported
  BIOS is upgradeable
  BIOS shadowing is allowed
  Boot from CD is supported
  Selectable boot is supported
  BIOS ROM is socketed
  EDD is supported
  5.25"/1.2 MB floppy services are supported (int 13h)
  3.5"/720 kB floppy services are supported (int 13h)
  3.5"/2.88 MB floppy services are supported (int 13h)
  Print screen service is supported (int 5h)
  Serial services are supported (int 14h)
  Printer services are supported (int 17h)
  ACPI is supported
  USB legacy is supported
  BIOS boot specification is supported
  Targeted content distribution is supported
  UEFI is supported
 BIOS Revision: 5.11

Handle 0x0001, DMI type 1, 27 bytes
System Information
 Manufacturer: Default string
 Product Name: Default string
 Version: Default string
 Serial Number: Default string
 UUID: 03000200-0400-0500-0006-000700080009
 Wake-up ...

Changed in linux (Debian):
assignee: Guilherme G. Piccoli (gpiccoli) → nobody
Changed in linux (Ubuntu):
assignee: Guilherme G. Piccoli (gpiccoli) → nobody
Changed in linux (Ubuntu Trusty):
assignee: Guilherme G. Piccoli (gpiccoli) → nobody
Changed in linux (Ubuntu Bionic):
assignee: Guilherme G. Piccoli (gpiccoli) → nobody
Changed in linux (Ubuntu Focal):
assignee: Guilherme G. Piccoli (gpiccoli) → nobody
Changed in linux (Ubuntu Xenial):
assignee: Guilherme G. Piccoli (gpiccoli) → nobody
In , antdev66 (antdev66-linux-kernel-bugs) wrote :

I have the same problem with kernels 5.13.12 and 5.14.0-rc7:

dmesg:
xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

journalctl:
ago 24 18:38:40 SERVER kernel: sd 4:0:0:0: [sda] tag#3 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=30s

In , stulluk (stulluk-linux-kernel-bugs) wrote :

I also experience exactly the same issue on multiple USB devices (a USB-WIFI or a USB-Webcam), but only on my brand new AMD mainboard (ASRock model: B550M-HDV).

I tried both ubuntu focal and hirsute with latest kernels on my OldPC (ASUSTeK model: M5A78L-M LX3) and on my IntelNUC (NUC8BEB) and this issue does not happen (Tried with same USB-WIFI and USB-Webcam devices).

Issue is easily reproducible by inserting USB-WIFI and then executing "ip a" on a shell.

In , dion (dion-linux-kernel-bugs) wrote :

I also have exactly the same problem, but with slightly different HW.

In my case it's a USB DAC branded as "Qudelix-5K". As far as I understand, it's a USB 1 device.

[ 174.358189] usb 5-2.3.2.2.1.1: new full-speed USB device number 17 using xhci_hcd
[ 174.475229] usb 5-2.3.2.2.1.1: New USB device found, idVendor=0a12, idProduct=4025, bcdDevice=19.70
[ 174.475232] usb 5-2.3.2.2.1.1: New USB device strings: Mfr=1, Product=8, SerialNumber=3
[ 174.475233] usb 5-2.3.2.2.1.1: Product: Qudelix-5K USB DAC/MIC 48KHz
[ 174.475234] usb 5-2.3.2.2.1.1: Manufacturer: QTIL
[ 174.475235] usb 5-2.3.2.2.1.1: SerialNumber: ABCDEF0123456789

It produces corrupted sound (actually just noise) after a few seconds of playback when connected to a Dell WD19TB Thunderbolt docking station. The issue happens with the USB-A ports on the dock and with one Type-C port (front). The second Type-C port (labelled "Type-C with Thunderbolt 3 port") works.

When the noise occurs, I get the following in dmesg:

xhci_hcd 0000:3a:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 5 comp_code 1
xhci_hcd 0000:3a:00.0: Looking for event-dma 00000000ffe940f0 trb-start 00000000ffe94100 trb-end 00000000ffe94100 seg-start 00000000ffe94000 seg-end 00000000ffe94ff0
xhci_hcd 0000:3a:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 5 comp_code 1
xhci_hcd 0000:3a:00.0: Looking for event-dma 00000000ffe949b0 trb-start 00000000ffe949c0 trb-end 00000000ffe949c0 seg-start 00000000ffe94000 seg-end 00000000ffe94ff0
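
The two "Looking for" lines above can be checked by hand: in both, event-dma is exactly 0x10 bytes (one 16-byte TRB) below trb-start, so the event points at the TRB just before the TD the driver expects. Below is a minimal shell sketch of that comparison. The function name trb_check is made up for illustration, and it assumes a TD that sits inside a single ring segment without wrapping.

```sh
#!/bin/sh
# trb_check EVENT_DMA TRB_START TRB_END
# Arguments are the hex values from a "Looking for event-dma ..." line,
# written without a 0x prefix. Prints where the event pointer sits
# relative to the TD the driver was expecting.
# Hypothetical helper for reading logs, not part of any kernel tooling.
trb_check() {
    ev=$((0x$1)); start=$((0x$2)); end=$((0x$3))
    if [ "$ev" -ge "$start" ] && [ "$ev" -le "$end" ]; then
        echo "inside TD"
    elif [ "$ev" -lt "$start" ]; then
        # Each TRB is 16 bytes, so byte distance / 16 is a TRB count.
        echo "$(( (start - ev) / 16 )) TRB(s) before TD"
    else
        echo "$(( (ev - end) / 16 )) TRB(s) after TD"
    fi
}

# Values taken from the two log lines above.
trb_check 00000000ffe940f0 00000000ffe94100 00000000ffe94100
trb_check 00000000ffe949b0 00000000ffe949c0 00000000ffe949c0
```

The same helper can be applied to the lines in the bug description, where the event pointer lies after the expected TD rather than before it.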

I tried adding and removing extra USB hubs (originally the Qudelix was plugged into the internal USB3 hub of a monitor), but even when plugged directly into the dock it produces corrupted sound.

Another important detail: this dock has built-in Ethernet with the r8153 chipset mentioned above.

After reading the comments here I tried disabling soft retry with the following patch:

diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 1c9a7957c45c..07cbcf50160c 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -189,10 +189,11 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)

        if (pdev->vendor == PCI_VENDOR_ID_INTEL) {
                xhci->quirks |= XHCI_LPM_SUPPORT;
                xhci->quirks |= XHCI_INTEL_HOST;
                xhci->quirks |= XHCI_AVOID_BEI;
+               xhci->quirks |= XHCI_NO_SOFT_RETRY;
        }
        if (pdev->vendor == PCI_VENDOR_ID_INTEL &&
                        pdev->device == PCI_DEVICE_ID_INTEL_PANTHERPOINT_XHCI) {
                xhci->quirks |= XHCI_EP_LIMIT_QUIRK;
                xhci->limit_active_eps = 64;

And it completely fixed the issue for me. The DAC produces clear sound even when connected through a chain of two hubs!
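
Whether a quirk such as XHCI_NO_SOFT_RETRY is active can be read from the "quirks" mask that xhci_hcd prints at probe time (a line of the form "hcc params ... hci version ... quirks 0x..." appears later in this thread). A minimal sketch of testing one bit of that mask follows. The helper name quirk_set is hypothetical, and bit 40 for XHCI_NO_SOFT_RETRY is an assumption based on mainline drivers/usb/host/xhci.h; check the header of the kernel you are actually running.

```sh
#!/bin/sh
# quirk_set MASK BIT
# Succeeds (exit status 0) if bit number BIT is set in MASK.
# MASK is the hex value printed after "quirks" in the xhci_hcd probe line,
# including its 0x prefix. Hypothetical helper for illustration only.
quirk_set() {
    [ $(( ($1 >> $2) & 1 )) -eq 1 ]
}

# Mask copied from a stock-kernel probe line posted in this thread.
mask=0x0000000000009810
# ASSUMPTION: bit 40 is XHCI_NO_SOFT_RETRY in the running kernel's xhci.h.
if quirk_set "$mask" 40; then
    echo "bit 40 set"
else
    echo "bit 40 clear"
fi
```

With the patch above applied on an Intel host, the same check against the new probe line would be expected to report the bit as set, provided the bit number matches the kernel source in use.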

PS.
lspci -k -nn | grep -B2 xhci
00:14.0 USB controller [0c03]: Intel Corporation Comet Lake PCH-LP USB 3.1 xHCI Host Controller [8086:02ed]
        Subsystem: Hewlett-Packard Company Comet Lake PCH-LP USB 3.1 xHCI Host Controller [103c:8724]
        Kernel driver in use: xhci_hcd
        Kernel modules: xhci_pci
--
37:00.0 USB controller [0c03]: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] [8086:15ec] (rev 06)
        Subsystem: Hewlett-P...

Revision history for this message
In , raulvior.bcn (raulvior.bcn-linux-kernel-bugs) wrote :

It turns out the problem was the cable: it was too long. A shorter USB 3.0 cable (1.8 m) gave a stable connection. On the same Linux 5.13 kernel (the previous dmesg was from Linux 5.10), the longer 3-meter cable kept failing, while with the 1.8-meter cable the HDD works without issue.

(In reply to raul from comment #191)

Revision history for this message
In , S.Braendlin (s.braendlin-linux-kernel-bugs) wrote :

Hi,
I also have issues with USB3 on my Debian 10 system with kernel 5.10.0-0.bpo.5-amd64; they do not appear when using a USB2 port:

Aug 6 13:20:14 media-server kernel: [ 964.069355] scsi host17: uas_eh_device_reset_handler start
Aug 6 13:20:14 media-server kernel: [ 964.197532] usb 2-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Aug 6 13:20:14 media-server kernel: [ 964.219053] scsi host17: uas_eh_device_reset_handler success
Aug 6 13:20:18 media-server kernel: [ 968.137601] task:sync state:D stack: 0 pid:12237 ppid: 11291 flags:0x00004324
Aug 6 13:20:18 media-server kernel: [ 968.137607] Call Trace:
Aug 6 13:20:18 media-server kernel: [ 968.137621] __schedule+0x2be/0x770
Aug 6 13:20:18 media-server kernel: [ 968.137630] schedule+0x3c/0xa0
Aug 6 13:20:18 media-server kernel: [ 968.137635] io_schedule+0x12/0x40
Aug 6 13:20:18 media-server kernel: [ 968.137644] wait_on_page_bit+0x127/0x230
Aug 6 13:20:18 media-server kernel: [ 968.137651] ? __page_cache_alloc+0x80/0x80
Aug 6 13:20:18 media-server kernel: [ 968.137657] wait_on_page_writeback+0x25/0x70
Aug 6 13:20:18 media-server kernel: [ 968.137663] __filemap_fdatawait_range+0x89/0xf0
Aug 6 13:20:18 media-server kernel: [ 968.137673] ? sync_inodes_one_sb+0x20/0x20
Aug 6 13:20:18 media-server kernel: [ 968.137679] filemap_fdatawait_keep_errors+0x1a/0x40
Aug 6 13:20:18 media-server kernel: [ 968.137684] iterate_bdevs+0xad/0x150
Aug 6 13:20:18 media-server kernel: [ 968.137691] ksys_sync+0x7c/0xb0
Aug 6 13:20:18 media-server kernel: [ 968.137697] __do_sys_sync+0xa/0x10
Aug 6 13:20:18 media-server kernel: [ 968.137704] do_syscall_64+0x33/0x80
Aug 6 13:20:18 media-server kernel: [ 968.137709] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Aug 6 13:20:18 media-server kernel: [ 968.137714] RIP: 0033:0x7fc4ec0529aa
Aug 6 13:20:18 media-server kernel: [ 968.137717] RSP: 002b:00007ffcddf49048 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
Aug 6 13:20:18 media-server kernel: [ 968.137723] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc4ec0529aa
Aug 6 13:20:18 media-server kernel: [ 968.137725] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000a8002000
Aug 6 13:20:18 media-server kernel: [ 968.137728] RBP: 0000000000000000 R08: 0000555ba9703dcf R09: 00007ffcddf4afe2
Aug 6 13:20:18 media-server kernel: [ 968.137730] R10: 00007fc4ec01a201 R11: 0000000000000246 R12: 0000000000000001
Aug 6 13:20:18 media-server kernel: [ 968.137733] R13: 0000000000000001 R14: 00007ffcddf49158 R15: 0000000000000000

Revision history for this message
In , pupilla (pupilla-linux-kernel-bugs) wrote :

Hello everyone,

I encountered the problem with kernel 6.0.0-rc3 on a Lenovo T470 laptop with an ASIX USB3 network card. The system was booted with the parameter intel_idle.max_cstate=1, which appears to affect how likely the bug is to appear. I have now rebooted the system without this parameter.

I have another similar setup (same laptop and same USB3 network card, but with Linux 6.0.0-rc2) that was booted without intel_idle.max_cstate=1 and has been up for 8 days; the problem has not occurred there so far.

The distribution is Slackware 15 (64 bit).

This is the full output of dmesg.

Any feedback is welcome.

Marco

[ 0.000000] Linux version 6.0.0-rc3 (root@Cherepakha) (gcc (GCC) 11.2.0, GNU ld version 2.37-slack15) #1 SMP PREEMPT_DYNAMIC Tue Aug 30 16:07:18 CEST 2022
[ 0.000000] Command line: auto BOOT_IMAGE=Linux ro root=10303 intel_idle.max_cstate=1
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: xstate_offset[3]: 832, xstate_sizes[3]: 64
[ 0.000000] x86/fpu: xstate_offset[4]: 896, xstate_sizes[4]: 64
[ 0.000000] x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format.
[ 0.000000] signal: max sigframe size: 1616
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009cfff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009d000-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000003fffffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000040000000-0x00000000403fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000040400000-0x000000008b79bfff] usable
[ 0.000000] BIOS-e820: [mem 0x000000008b79c000-0x0000000090652fff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000090653000-0x0000000090653fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x0000000090654000-0x000000009b52cfff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000009b52d000-0x000000009b599fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x000000009b59a000-0x000000009b5fefff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x000000009b5ff000-0x000000009f7fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000f0000000-0x00000000f3ffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fd000000-0x00000000fe7fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed10000-0x00000000fed19fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed84000-0x00000000fed84fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00...

Revision history for this message
In , pupilla (pupilla-linux-kernel-bugs) wrote :

Hello everyone,

unfortunately it happened again (system started without parameters):

[ 9.561808] br0: port 2(eth1) entered forwarding state
[95735.974041] usb 2-1: USB disconnect, device number 2
[95735.974215] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[95735.974439] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[95735.974471] ax88179_178a 2-1:1.0 eth1: unregister 'ax88179_178a' usb-0000:00:14.0-1, ASIX AX88179 USB 3.0 Gigabit Ethernet
[95735.974523] ax88179_178a 2-1:1.0 eth1: Failed to read reg index 0x0002: -19
[95735.974532] ax88179_178a 2-1:1.0 eth1: Failed to write reg index 0x0002: -19
[95735.974595] br0: port 2(eth1) entered disabled state
[95735.974783] device eth1 left promiscuous mode
[95735.974790] br0: port 2(eth1) entered disabled state
[95735.992489] ax88179_178a 2-1:1.0 eth1 (unregistered): Failed to write reg index 0x0002: -19
[95735.992503] ax88179_178a 2-1:1.0 eth1 (unregistered): Failed to write reg index 0x0001: -19
[95735.992510] ax88179_178a 2-1:1.0 eth1 (unregistered): Failed to write reg index 0x0002: -19
[95736.215301] usb 2-1: new SuperSpeed USB device number 4 using xhci_hcd
[95736.566562] ax88179_178a 2-1:1.0 eth1: register 'ax88179_178a' at usb-0000:00:14.0-1, ASIX AX88179 USB 3.0 Gigabit Ethernet, 00:0e:c6:81:79:01

Marco

Revision history for this message
In , ske5074 (ske5074-linux-kernel-bugs) wrote :

I also have the issue. I am running Proxmox 7.2 (Debian Bullseye) on a Lenovo M910q (Core i7-7700T) with two TP-Link UE300 (RTL8153) USB to 1GbE Ethernet adapters. Each one is stable in a lower USB slot. Swapping the adapters does not change the behavior; only the USB device in the higher slot is affected. Moving to different ports makes no difference.

It is easily reproducible with the following commands. Basically, I bring bond0 up again, which works initially; then I get the xhci_hcd warning and the link goes down again. System details are also below.

root@higgins:~# dmesg -C ; ifup -a ; ip link | grep enx ; \
> dmesg -H ; dmesg -C ; sleep 70 ; \
> ip link | grep enx ; dmesg -H
3: enxd03745be5afc: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP mode DEFAULT group default qlen 1000
16: enx54af9786ab11: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP mode DEFAULT group default qlen 1000

[Sep 3 11:05] device enx54af9786ab11 entered promiscuous mode
[ +0.001236] bond0: (slave enx54af9786ab11): Enslaving as a backup interface with a down link
[ +0.006363] vmbr0: the hash_elasticity option has been deprecated and is always 16
[ +0.013972] r8152 2-4:1.0 enx54af9786ab11: Promiscuous mode enabled
[ +0.001344] r8152 2-4:1.0 enx54af9786ab11: carrier on

3: enxd03745be5afc: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP mode DEFAULT group default qlen 1000
17: enx54af9786ab11: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000

[Sep 3 11:05] bond0: (slave enx54af9786ab11): link status definitely up, 1000 Mbps full duplex
[Sep 3 11:06] usb 2-4: USB disconnect, device number 12
[ +0.001544] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[ +0.001435] bond0: (slave enx54af9786ab11): Releasing backup interface
[ +0.029081] device enx54af9786ab11 left promiscuous mode
[ +0.316190] usb 2-4: new SuperSpeed USB device number 13 using xhci_hcd
[ +0.022053] usb 2-4: New USB device found, idVendor=2357, idProduct=0601, bcdDevice=30.00
[ +0.001297] usb 2-4: New USB device strings: Mfr=1, Product=2, SerialNumber=6
[ +0.001337] usb 2-4: Product: USB 10/100/1000 LAN
[ +0.001261] usb 2-4: Manufacturer: TP-Link
[ +0.001208] usb 2-4: SerialNumber: 000001
[ +0.137200] usb 2-4: reset SuperSpeed USB device number 13 using xhci_hcd
[ +0.049197] r8152 2-4:1.0: load rtl8153a-4 v2 02/07/20 successfully
[ +0.030905] r8152 2-4:1.0 eth0: v1.12.12
[ +0.007834] r8152 2-4:1.0 enx54af9786ab11: renamed from eth0
root@higgins:~#

-------
System Details
-------

root@higgins:~# uname -a
Linux higgins 5.15.39-4-pve #1 SMP PVE 5.15.39-4 (Mon, 08 Aug 2022 15:11:15 +0200) x86_64 GNU/Linux

root@higgins:~# lspci -k -nn | grep -B2 xhci
00:14.0 USB controller [0c03]: Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller [8086:a2af]
        Subsystem: Lenovo 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller [17aa:310b]
        Kernel driver in use: xhci_hcd
        Kernel modules: xhci_pci

root@higgins:~# lsusb -tv
/: Bus 02.Port 1: D...

Revision history for this message
In , ske5074 (ske5074-linux-kernel-bugs) wrote :

(In reply to Sean Kennedy from comment #205)
> I also have the issue. Using Proxmox 7.2 (Debian Bullseye) with a Lenovo
> M910q core-i7-7700T, using two TPLink UE300 (RTL8153) USB to 1Gbe Ethernet
> adapters. Each one is stable in a lower USB slot. Swapping the adapters does
> not change the behavior and only impacts the USB device in the higher slot.
> Changes to different ports without change.

Update: I tried a different dongle (a 2.5GbE one) and have two hard drives attached to the system. It doesn't matter where the 2.5GbE dongle is attached; it eventually errors with "WARN Set TR Deq Ptr cmd failed". The error rate is only around six times a day right now:

8156 Realtek Semiconductor Corp. USB 10/100/1G/2.5G LAN

# dmesg -T | grep xhci
[Tue Sep 6 13:37:13 2022] xhci_hcd 0000:00:14.0: xHCI Host Controller
[Tue Sep 6 13:37:13 2022] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 1
[Tue Sep 6 13:37:13 2022] xhci_hcd 0000:00:14.0: hcc params 0x200077c1 hci version 0x100 quirks 0x0000000000009810
[Tue Sep 6 13:37:13 2022] usb usb1: Manufacturer: Linux 5.15.39-4-pve xhci-hcd
[Tue Sep 6 13:37:13 2022] xhci_hcd 0000:00:14.0: xHCI Host Controller
[Tue Sep 6 13:37:13 2022] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 2
[Tue Sep 6 13:37:13 2022] xhci_hcd 0000:00:14.0: Host supports USB 3.0 SuperSpeed
[Tue Sep 6 13:37:13 2022] usb usb2: Manufacturer: Linux 5.15.39-4-pve xhci-hcd
[Tue Sep 6 13:37:13 2022] usb 2-1: new SuperSpeed USB device number 2 using xhci_hcd
[Tue Sep 6 13:37:14 2022] usb 2-3: new SuperSpeed USB device number 3 using xhci_hcd
[Tue Sep 6 13:37:14 2022] usb 2-4: new SuperSpeed USB device number 4 using xhci_hcd
[Tue Sep 6 14:39:22 2022] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[Tue Sep 6 14:39:22 2022] usb 2-4: new SuperSpeed USB device number 5 using xhci_hcd
[Tue Sep 6 18:44:01 2022] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[Tue Sep 6 18:44:01 2022] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[Tue Sep 6 18:44:02 2022] usb 2-4: new SuperSpeed USB device number 6 using xhci_hcd
[Tue Sep 6 22:19:06 2022] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[Tue Sep 6 22:19:07 2022] usb 2-4: new SuperSpeed USB device number 7 using xhci_hcd
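
An error rate like the one quoted above can be tallied from the log itself. The sketch below counts "Set TR Deq Ptr cmd failed" warnings per calendar day in `dmesg -T` output. The function name count_warn_per_day is made up for illustration, and the field positions assume the "[Tue Sep 6 13:37:13 2022]" timestamp layout shown in this comment.

```sh
#!/bin/sh
# count_warn_per_day: read `dmesg -T` output on stdin and print
# "<month> <day> <count>" for every day that has at least one
# "Set TR Deq Ptr cmd failed" warning.
# Hypothetical helper; assumes timestamps like "[Tue Sep 6 13:37:13 2022]".
count_warn_per_day() {
    grep 'Set TR Deq Ptr cmd failed' |
        # With whitespace splitting, $2 is the month and $3 is the day.
        awk '{ n[$2 " " $3]++ } END { for (d in n) print d, n[d] }' |
        sort
}

# Usage (reading the kernel log may need root):
#   dmesg -T | count_warn_per_day
```

One disconnect can log the warning more than once, as the 18:44:01 entries show, so this counts warning lines and gives an upper bound on the number of disconnects.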

Since this drops the device from the system and takes the link offline, I created a simple script, run from cron once a minute, that detects when no Ethernet device is UP and runs ifup -a. It's clunky but it works.

crontab:
# m h dom mon dow command
* * * * * /root/fixnet.sh >/dev/null 2>&1

fixnet.sh:
#!/bin/sh

# Count enx* interfaces whose "ip link" line mentions UP
STATE=$(ip link | grep " enx" | grep -c UP)
if [ "$STATE" -gt 0 ]; then
  # All good. Exit
  exit 0
fi

/usr/sbin/ifup -a
sleep 20

ping -c 1 10.0.0.1 | grep "1 received"
if [ $? -eq 0 ]; then
  # Network looks good. Exit.
  exit 0
fi

sleep 310
ping -c 1 10.0.0.1 | grep "1 received"
if [ $? -ne 0 ]; then
  # The network is still down.
  systemctl reboot
fi
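
The detection step of fixnet.sh can also be factored into a function that reads `ip link` output on stdin, which makes it possible to try it against captured output such as the listings earlier in this thread. This is only a sketch of a stricter variant, not the script above: the name count_up_enx is hypothetical, and it matches the administrative UP flag inside the <...> flag list rather than any occurrence of "UP" on the line.

```sh
#!/bin/sh
# count_up_enx: read `ip link` output on stdin and print how many
# enx* interfaces carry the UP flag in their <...> flag list.
# LOWER_UP is deliberately not counted as UP.
# Hypothetical helper for illustration only.
count_up_enx() {
    awk '$2 ~ /^enx/ && $3 ~ /[<,]UP[,>]/ { n++ } END { print n + 0 }'
}

# Usage:
#   STATE=$(ip link | count_up_enx)
```

A freshly re-enumerated adapter shows up as <BROADCAST,MULTICAST> with state DOWN, so it is not counted until ifup -a has configured it again.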

Revision history for this message
In , james (james-linux-kernel-bugs) wrote :

I'm using a 2.5GbE USB Ethernet device and getting this error intermittently (a dozen times per day).

$ uname -a
Linux hephaestus 5.4.0-135-generic #152-Ubuntu SMP Wed Nov 23 20:19:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

$ lsusb
<snip>
Bus 003 Device 016: ID 0bda:8156 Realtek Semiconductor Corp. USB 10/100/1G/2.5G

This is what plays out via /var/log/syslog each time:

Dec 21 10:26:47 hephaestus kernel: [346923.166782] usb 3-4: USB disconnect, device number 15
Dec 21 10:26:47 hephaestus kernel: [346923.166913] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Dec 21 10:26:47 hephaestus kernel: [346923.166927] cdc_ncm 3-4:2.0 eth1: unregister 'cdc_ncm' usb-0000:00:14.0-4, CDC NCM
Dec 21 10:26:47 hephaestus kernel: [346923.167071] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Dec 21 10:26:47 hephaestus kernel: [346923.170644] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Dec 21 10:26:47 hephaestus dhclient[320734]: receive_packet failed on eth1: Network is down
Dec 21 10:26:47 hephaestus systemd[1]: Stopping ifup for eth1...
Dec 21 10:26:47 hephaestus dhclient[325522]: Killed old client process
Dec 21 10:26:47 hephaestus ifdown[325522]: Killed old client process
Dec 21 10:26:47 hephaestus kernel: [346923.478913] usb 3-4: new SuperSpeed Gen 1 USB device number 16 using xhci_hcd
Dec 21 10:26:47 hephaestus kernel: [346923.499567] usb 3-4: New USB device found, idVendor=0bda, idProduct=8156, bcdDevice=31.00
Dec 21 10:26:47 hephaestus kernel: [346923.499573] usb 3-4: New USB device strings: Mfr=1, Product=2, SerialNumber=6
Dec 21 10:26:47 hephaestus kernel: [346923.499577] usb 3-4: Product: USB 10/100/1G/2.5G LAN
Dec 21 10:26:47 hephaestus kernel: [346923.499580] usb 3-4: Manufacturer: Realtek
Dec 21 10:26:47 hephaestus kernel: [346923.499583] usb 3-4: SerialNumber: 001000001
Dec 21 10:26:47 hephaestus kernel: [346923.523736] cdc_ncm 3-4:2.0: MAC-Address: xx:xx:xx:xx:xx:xx
Dec 21 10:26:47 hephaestus kernel: [346923.523742] cdc_ncm 3-4:2.0: setting rx_max = 16384
Dec 21 10:26:47 hephaestus kernel: [346923.523836] cdc_ncm 3-4:2.0: setting tx_max = 16384
Dec 21 10:26:47 hephaestus kernel: [346923.524578] cdc_ncm 3-4:2.0 eth1: register 'cdc_ncm' at usb-0000:00:14.0-4, CDC NCM, xx:xx:xx:xx:xx:xx
Dec 21 10:26:47 hephaestus systemd-udevd[325501]: Using default interface naming scheme 'v245'.
Dec 21 10:26:47 hephaestus systemd-udevd[325501]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Dec 21 10:26:47 hephaestus systemd[1]: Found device USB_10_100_1G_2.5G_LAN.
(then things start back up and the ethernet link goes live again after about 10 seconds)

Revision history for this message
In , james (james-linux-kernel-bugs) wrote :

FYI: I built a 5.4 kernel with the patch discussed earlier in this thread, and I still get the error multiple times per day.

(In reply to James H from comment #207)

Revision history for this message
In , svmohr (svmohr-linux-kernel-bugs) wrote :

I also get random disconnects on kernel 6.3.0-7-generic with a Samsung T7 Shield external SSD. Unfortunately the error is hard to reproduce; it usually takes hours before it occurs for the first time.

System:
  Kernel: 6.3.0-7-generic arch: x86_64 bits: 64 compiler: N/A Console: pty pts/10 Distro: Ubuntu
    23.10 (Mantic Minotaur)
Machine:
  Type: Server System: Supermicro product: C9Z390-PGW v: 0123456789 serial: <filter>
  Mobo: Supermicro model: C9Z390-PGW v: 1.01A serial: <filter> UEFI: American Megatrends v: 1.3
    date: 06/03/2020
CPU:
  Info: 8-core model: Intel Core i9-9900K bits: 64 type: MT MCP arch: Coffee Lake rev: D cache:
    L1: 512 KiB L2: 2 MiB L3: 16 MiB
  Speed (MHz): avg: 3687 high: 5002 min/max: 800/5000 cores: 1: 5002 2: 3600 3: 3600 4: 3600
    5: 3600 6: 3600 7: 3600 8: 3600 9: 3600 10: 3600 11: 3600 12: 3600 13: 3600 14: 3600 15: 3600
    16: 3600 bogomips: 115200
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx

/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 10000M
    ID 1d6b:0003 Linux Foundation 3.0 root hub
    |__ Port 4: Dev 10, If 0, Class=Mass Storage, Driver=uas, 10000M
        ID 04e8:61fb Samsung Electronics Co., Ltd

BOOT_IMAGE=/boot/vmlinuz-6.3.0-7-generic root=UUID=2c8c7990-bb1d-47dc-a70c-0272867b1807 ro quiet splash intel_iommu=on iommu=pt pcie_aspm=off initcall_blacklist=sysfb_init rd.modules-load=vfio-pci vfio_pci.ids=10de:1e07,10de:10f7,10de:1ad6,10de:1ad7,1462:3710 vt.handoff=7

[349280.239403] usb 2-4: USB disconnect, device number 9
[349280.239689] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[349280.239695] usb 2-4: cmd cmplt err -108
[349280.239702] sd 9:0:0:0: [sdh] tag#13 uas_zap_pending 0 uas-tag 1 inflight: CMD
[349280.239705] sd 9:0:0:0: [sdh] tag#13 CDB: Write(16) 8a 00 00 00 00 00 d3 28 e4 00 00 00 00 d8 00 00
[349280.239724] sd 9:0:0:0: [sdh] tag#13 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=0s
[349280.239726] sd 9:0:0:0: [sdh] tag#13 CDB: Write(16) 8a 00 00 00 00 00 d3 28 e4 00 00 00 00 d8 00 00
[349280.239728] I/O error, dev sdh, sector 3542672384 op 0x1:(WRITE) flags 0x8800 phys_seg 27 prio class 2
[349280.239741] device offline error, dev sdh, sector 3542674432 op 0x1:(WRITE) flags 0x8800 phys_seg 35 prio class 2
[349280.239747] device offline error, dev sdh, sector 3542672640 op 0x1:(WRITE) flags 0x8800 phys_seg 24 prio class 2
[349280.239750] device offline error, dev sdh, sector 3542677504 op 0x1:(WRITE) flags 0x8800 phys_seg 45 prio class 2
[349280.239753] device offline error, dev sdh, sector 3542680576 op 0x1:(WRITE) flags 0x8800 phys_seg 41 prio class 2
[349280.239788] device offline error, dev sdh, sector 3542663168 op 0x1:(WRITE) flags 0x8800 phys_seg 35 prio class 2
[349280.239793] device offline error, dev sdh, sector 3542663680 op 0x1:(WRITE) flags 0x8800 phys_seg 29 prio class 2
[349280.239799] device offline error, dev sdh, sector 3542663936 op 0x1:(WRITE) flags 0x8800 phys_seg 26 prio class 2
[349280.299534] sd 9:0:0:0: [sdh] Synchronizing SCSI cache
[349280.523475] sd 9:0:0:0: [sdh] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVE...

Revision history for this message
imperia (imperia777) wrote :

I have known for years that there is a firmware fix for this issue, but it was not public.
I knew that firmware files are leaked on station-drivers, and I finally decided to give it a try, following this excellent guide:
https://forum-en.msi.com/index.php?threads/asmedia-usb-3-1-controller-firmware-update-for-ge62-72-xxx.380024/
Instead of editing the INI file, I think you can modify the SVID and SSID directly in the update tool by unlocking it with the password "asmedia".
First make a backup with the DOS tool, and gather information about your device, such as its SVID and SSID, before flashing.
Since updating I haven't had any TRB errors.
Good luck.
