xhci_hcd: TRB DMA errors reported with ASMedia ASM1142 USB 3.1 Controller

Bug #1749961 reported by Guilherme G. Piccoli on 2018-02-16
26
This bug affects 4 people
Affects Status Importance Assigned to Milestone
linux (Debian)
Confirmed
Undecided
Guilherme G. Piccoli
linux (Ubuntu)
Medium
Guilherme G. Piccoli
Trusty
Medium
Guilherme G. Piccoli
Xenial
Medium
Guilherme G. Piccoli
Artful
Medium
Guilherme G. Piccoli
Bionic
Medium
Guilherme G. Piccoli

Bug Description

It was observed that while trying to use a 4K USB webcam connected to USB port provided by ASMedia ASM1142 USB 3.1 Controller, the webcam does not work and kernel log shows the following messages:

[431.928016] xhci_hcd 0000:12:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
[431.928021] xhci_hcd 0000:12:00.0: Looking for event-dma 0000003f3330e020 trb-start 0000003f3330e000 trb-end 0000003f3330e000 seg-start 0000003f3330e000 seg-end 0000003f3330eff0
[431.928024] xhci_hcd 0000:12:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
[431.928026] xhci_hcd 0000:12:00.0: Looking for event-dma 0000003f3330e030 trb-start 0000003f3330e000 trb-end 0000003f3330e000 seg-start 0000003f3330e000 seg-end 0000003f3330eff0
[431.928027] xhci_hcd 0000:12:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
[431.928029] xhci_hcd 0000:12:00.0: Looking for event-dma 0000003f3330e050 trb-start 0000003f3330e000 trb-end 0000003f3330e000 seg-start 0000003f3330e000 seg-end 0000003f3330eff0
[431.928386] xhci_hcd 0000:12:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13

A similar issue was already reported on Launchpad: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1667750

The fix to this issue seems to be the following patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9da5a109

Tests in our scenario with this patch proved still broken. Our next approach is to modify the patch a bit and re-test.

This LP will be used to document our progress in the investigation.

no longer affects: linux-meta (Ubuntu)
description: updated

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1749961

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu):
status: Incomplete → In Progress
importance: Undecided → Medium
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
Changed in linux (Ubuntu Trusty):
importance: Undecided → Medium
status: New → In Progress
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
importance: Undecided → Medium
status: New → In Progress
Changed in linux (Ubuntu Artful):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
importance: Undecided → Medium
status: New → In Progress
Guilherme G. Piccoli (gpiccoli) wrote :

Patch was modified (by adding the PCI_ID of device 1142A, which confusingly is 1242!) and still the problem reproduces.

New approaches to be tried soon.

tags: added: kernel-da-key
imperia (imperia777) wrote :

Hello,
Looks like I am having the same problem.

After some hours(random time) my USB 3.1 asmedia controller crashes the driver with following error:
[ 873.661534] xhci_hcd 0000:00:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 3
[ 873.661629] xhci_hcd 0000:00:00.0: Looking for event-dma 00000002722ed630 trb-start 00000002722ed9b0 trb-end 00000002722ed9d0 seg-start 00000002722ed000 seg-end 00000002722edff0
[ 875.673409] xhci_hcd 0000:00:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
I am struggling with this error for more than year. Its very annoying to have to restart the PC every few hours. USB tuner card is connected to the port.
I would like to provide whatever information and support is necessary to fix this damn bug. Logs, ssh access to the affected box and everything else what is needed.

Please ask me here or write to my e mail imperia777_yahoo.com
Thanks.

Guilherme G. Piccoli (gpiccoli) wrote :

Nice imperia, thanks for the report here. First we need to be sure it's exactly the same adapter.
Can you provide the the output of "lspci -nn" ?

Then, if it's the same adapter:

0) Which Ubuntu version are you running? Which kernel version are you using? Can you try in the latest 4.13 for Xenial? (or even better, the hwe-edge 4.15)
Instructions to run the latest 4.15 version: https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/unstable?field.series_filter=xenial

1) You said "after some hours" - can you provide some details? You've been using the USB tuner for like 2 hours? 12 hours? The tuner is in constant use and suddenly the issue happens?

2) If possible, enable xhci dynamic debug and provide logs after the issue; in order to do this, run the following command as root:
echo "module xhci_hcd +flpt" > /sys/kernel/debug/dynamic_debug/control

After issue reproduces, collect the /var/log/kern.log file.

Thanks,

Guilherme

imperia (imperia777) wrote :

Hello,

00:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller [1b21:1242]

Actually I am on debian buster. I am running kernel 4.16-rc6 from experimental repository.

I am running program for watching satellite channels called vdr.
When I am not watching TV, while idle, every few minutes vdr scans for channel list updates from satellites. It is safe to say that tuner is occupied every few minutes for a scan, but not occupied with bandwidth like when watching TV. While in this mode vdr is able to crash the driver in ~6-30 hours.

There is program that you use to initially create your channels list for vdr. When I use it I am able to crash the driver in ~1-2 hours.

But when I just watch one channel and don't change it for hours, driver is least likely to crash.

I think something in consecutive opening (initializing) of the usb port/driver forces this error.
Because the program that scans for channels crash it much faster.
This program work like this:

:go
open port
scan some frequency
write to file new channels
close port
goto go

I made this script that I will use to capture the log.

echo "module xhci_hcd +flpt" > /sys/kernel/debug/dynamic_debug/control
(tail -F -n0 /var/log/kern.log &) | grep -q "TRB DMA"
cp /var/log/kern.log /home/imperia/log1.log

And I will run initial channels list scan to force it faster.

I will be back later with the logs.
Thanks for your help.

imperia (imperia777) wrote :
Download full text (8.6 KiB)

Mar 29 20:20:03 vdr kernel: [119370.230528] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Removing canceled TD starting at 0x2ae36c590 (dma).
Mar 29 20:20:03 vdr kernel: [119370.230533] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Finding endpoint context
Mar 29 20:20:03 vdr kernel: [119370.230537] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Cycle state = 0x0
Mar 29 20:20:03 vdr kernel: [119370.230542] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: New dequeue segment = 00000000573583cc (virtual)
Mar 29 20:20:03 vdr kernel: [119370.230547] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: New dequeue pointer = 0x2ae36c5a0 (DMA)
Mar 29 20:20:03 vdr kernel: [119370.230553] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Set TR Deq Ptr cmd, new deq seg = 00000000573583cc (0x2ae36c000 dma), new deq ptr = 0000000041e92668 (0x2ae36c5a0 dma), new cycle = 0
Mar 29 20:20:03 vdr kernel: [119370.230558] <intr> xhci_ring_cmd_db:282: xhci_hcd 0000:00:00.0: // Ding dong!
Mar 29 20:20:03 vdr kernel: [119370.230631] [27868] xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Cancel URB 0000000060641c50, dev 2, ep 0x82, starting at offset 0x2ae36c5a0
Mar 29 20:20:03 vdr kernel: [119370.230638] [27868] xhci_ring_cmd_db:282: xhci_hcd 0000:00:00.0: // Ding dong!
Mar 29 20:20:03 vdr kernel: [119370.230650] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Successful Set TR Deq Ptr cmd, deq = @2ae36c5a0
Mar 29 20:20:03 vdr kernel: [119370.230700] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Removing canceled TD starting at 0x2ae36c5a0 (dma).
Mar 29 20:20:03 vdr kernel: [119370.230705] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Finding endpoint context
Mar 29 20:20:03 vdr kernel: [119370.230710] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Cycle state = 0x0
Mar 29 20:20:03 vdr kernel: [119370.230715] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: New dequeue segment = 00000000573583cc (virtual)
Mar 29 20:20:03 vdr kernel: [119370.230719] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: New dequeue pointer = 0x2ae36c5b0 (DMA)
Mar 29 20:20:03 vdr kernel: [119370.230725] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Set TR Deq Ptr cmd, new deq seg = 00000000573583cc (0x2ae36c000 dma), new deq ptr = 0000000050070757 (0x2ae36c5b0 dma), new cycle = 0
Mar 29 20:20:03 vdr kernel: [119370.230730] <intr> xhci_ring_cmd_db:282: xhci_hcd 0000:00:00.0: // Ding dong!
Mar 29 20:20:03 vdr kernel: [119370.230798] [27868] xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Cancel URB 00000000588cca08, dev 2, ep 0x82, starting at offset 0x2ae36c5b0
Mar 29 20:20:03 vdr kernel: [119370.230805] [27868] xhci_ring_cmd_db:282: xhci_hcd 0000:00:00.0: // Ding dong!
Mar 29 20:20:03 vdr kernel: [119370.230816] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Successful Set TR Deq Ptr cmd, deq = @2ae36c5b0
Mar 29 20:20:03 vdr kernel: [119370.230865] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Removing canceled TD starting at 0x2ae36c5b0 (dma).
Mar 29 20:20:03 vdr kernel: [119370.230870] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Finding endpoint context
Mar 29 20:20:03 vdr kernel: [119370.230874] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Cycle state = 0x0
Mar 29 20:...

Read more...

Guilherme G. Piccoli (gpiccoli) wrote :

Thanks a lot Imperia! It's indeed the same PCI adapter, and it's even better you're running an upstream kernel like this.

I'll analyze your logs in order to match with the ones I have here.
I might need some xhci traces to understand the TRBs operations (like the enqueue and completion of TRBs). I'll comment here in case I need it.

Cheers,

Guilherme

imperia (imperia777) wrote :

echo xhci-hcd >> /sys/kernel/debug/tracing/set_event
(tail -F -n0 /var/log/kern.log &) | grep -q "TRB DMA"
cp /var/log/kern.log /home/imperia/log1.log

Is this correct command to get traces?
I will run it in advance.

Somebody told me to run this before when I was looking for help.

BTW did you download the full logs so I can remove it from web page?

I will can provide ssh access to box affected if needed.

Guilherme G. Piccoli (gpiccoli) wrote :

Wow Imperia, you're being really helpful here, thank you very much!

To enable traces, these are the instructions I've provided to other people affected so far:

0) Reboot the machine in order to put it in a consistent state;
1) echo "module xhci_hcd +flpt" > /sys/kernel/debug/dynamic_debug/control
2) echo nop > /sys/kernel/debug/tracing/current_tracer
3) echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
4) echo 0 > /sys/kernel/debug/tracing/trace
5) echo 1 > /sys/kernel/debug/tracing/tracing_on
6) echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable

After reproduce the issue, you should collect /sys/kernel/debug/tracing/trace. Problem is that the file might be huge, much larger than the kernel log you provided for instance.

About the SSH access, I'm interested in getting it next week, if it doesn't annoy you too much. It'll be really helpful, but I might need to reboot the machine.

Oh, I've downloaded the logs from your website, so you can delete it now.
Cheers,

Guilherme

imperia (imperia777) wrote :

Hello,

I think I am ready with the trace log. Hopefully it is full, because machine run out of disk space :)
http://imperia.mine.nu/trace1.log.bz2
Interesting is that it took ~12 hours to crash it this time.

The problem with ssh access is that this is virtual machine under XEN and when you reboot it, the USB controller is gone(not assigned to virtual machine anymore). I have to re-assign the USB controller for passthrough from xen host. (this is xen bug I think, it wasn't like this before).

This is what I do when I have to restart vdr virtual machine:
xl pci-assignable-remove 03:00.0
xl pci-assignable-add 03:00.0
xl create /etc/xen/vdr.cfg

Anyway we can get in touch on irc and I can do restarts for you.

BTW, I shutdown the whole xen server. Then I turn off the power button on PSU and pressed the power button on the case to discharge any electricity left and put it in consistent state before getting the trace logs.

Thanks again Imperia, the traces are fine. They're only 25MB, shouldn't have caused any kind of disk issues, like out of space condition. Also, I'd like to see the correlated kernel log to match the problematic TRBs from the kernel log with trace information. Can you provide me the relevant kern.log file?

I've already downloaded the traces from your server, in case you want to remove the file.

About the SSH, thanks for the offering and let's talk on IRC in case I need it. I'll try the logs first, not sure they're enough for me to understand the issue completely.

Cheers,

Guilherme

Hi Imperia, I built a mainline kernel (version 4.16) with a different quirk that I think might help here. Can you test it? Thanks in advance!

Instructions (run all as root):

1) wget people.canonical.com/~gpiccoli/imperia416.tgz
2) mv imperia416.tgz /
3) tar -zxf imperia416.tgz
4) update-initramfs -c -k 4.16.0-imperia+

Now that's important: if you have access to some serial console in the machine (or if you have physical access), you can reboot into this new kernel. In case _you only have ssh_, I'd suggest to remove the kernel boot entry from grub, and boot through kexec for safety reasons:

a) Remove boot entries from grub.cfg (you can copy away vmlinuz-4.16-imperia+ to some place outside /boot and run "update-grub" for this)
b) apt-get install kexec-tools
c) kexec vmlinuz-4.16-imperia+ --initrd initrd.img-4.16-imperia+ --append="$(cat /proc/cmdline)"
----

After machine (hopefully!) boot to the new kernel, check in dmesg if the quirk is there:
#$ dmesg|grep QUIRK
[0.813486] QUIRK: XHCI_AVOID_BEI

If you can see that output ("QUIRK: XHCI_AVOID_BEI"), then the quirk was applied.
Now, just need to try to reproduce the issue again.

Thanks a lot,

Guilherme

imperia (imperia777) wrote :

Hello,

I am unable to test with the kernel you provided, because my tuner card doesn't have driver in mainline kernel tree. So I have to compile it myself and I need kernel headers for this.

So I compiled kernel 4.16 from debian linux-source-4.16 package and applied the patch you provided:

From dd0375ffba55172194999d40b35344e9dc2682df Mon Sep 17 00:00:00 2001
From: "Guilherme G. Piccoli" <email address hidden>
Date: Wed, 11 Apr 2018 11:04:13 +0000
Subject: [PATCH] xhci: Add quirk to ASMedia 0x1242 adapter to avoid BEI

Signed-off-by: Guilherme G. Piccoli <email address hidden>
---
 drivers/usb/host/xhci-pci.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index d9f831b..0654461 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -213,6 +213,12 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
                xhci->quirks |= XHCI_TRUST_TX_LENGTH;

        if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
+ pdev->device == 0x1242) {
+ xhci->quirks |= XHCI_AVOID_BEI;
+ pr_warn("QUIRK: XHCI_AVOID_BEI");
+ }
+
+ if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
                pdev->device == PCI_DEVICE_ID_ASMEDIA_1042A_XHCI)
                xhci->quirks |= XHCI_ASMEDIA_MODIFY_FLOWCONTROL;

--
2.7.4

Compiled my tuner card driver now and I am testing.

This bug was nominated against a series that is no longer supported, ie artful. The bug task representing the artful nomination is being closed as Won't Fix.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu Artful):
status: In Progress → Won't Fix
imperia (imperia777) wrote :

this is dmidecode output of my machine, in case the fix is FW related, it may be useful in order to contact the motherboard vendor

Roy Thompson (royt77) wrote :

I am running into this same issue with an ASMedia 2142 USB board. Was a fix ever identified?

Hi Roy, thanks for the report. What is your motherboard? What kernel are you running? And what tests are triggering this issue for you?
If you have logs, it'll be pretty useful.

Maybe it's a similar but different case..or the logs may help to confirm it's exact the same issue.

ASMedia seems to have a FW fix but that depends on your motherboard vendor to provide it. They don't provide the fix themselves...it needs some cooking from the vendor, to match subsystem IDs and whatnot.

Cheers,

Guilherme

Changed in linux (Debian):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
status: New → Confirmed
Roy Thompson (royt77) wrote :

Hi Guilherme,

Thanks for the response. I have several (3) quad port ASMedia 2142 PCIe/USB 3.1 cards installed in a Dell R740 rack server. I am using the standard Ubuntu 18.04 kernel (Linux dell-PowerEdge-R740 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux).

For one of my applications, it runs a loop that opens and closes a high speed connection to a USB device connected through the ASMedia board. After this goes on for several minutes without any issues, I see this in dmesg:

[Oct 5 10:12] xhci_hcd 0000:be:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[ +3.418076] xhci_hcd 0000:be:00.0: WARN Successful completion on short TX
[ +0.000035] xhci_hcd 0000:be:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 12 comp_code 1
[ +0.000003] xhci_hcd 0000:be:00.0: Looking for event-dma 0000001fe9759610 trb-start 0000001fe9759620 trb-end 0000001fe9759620 seg-start 0000001fe9759000 seg-end 0000001fe9759ff0

This is then followed shortly after by several kernel dump messages, and then the whole system starts behaving erratically, requiring a hard reboot to recover.

The condition is easy for me to reproduce and I will happily provide any logs that may be of use to help debug this. Please just let me know what you would like and how to get them (as I am not a kernel expert).

Thanks,
Roy

Hi Roy, thanks for your quick response. First thing, I'd like to ask you to attach the output of "lspci -vvv" and "dmidecode" in this LP so we can validate the adapters and be sure they are exactly the same, and also the motherboard type. Run both commands as root user.

After that, i'll ask you to reproduce the issue and attach the output of "dmesg" command right after reproduction. If you can also elaborate more about the test you're running, I'd really be glad.

I'll then provide you custom commands to use the kernel trace system to infer more about the issue. One final thing: are you willing to test with mainline kernel in order to check if there's some upstream fix for your instance of the issue?
If so, you can get it here: https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/unstable
This PPA provides a build from kernel 4.18.

Thanks in advance,

Guilherme

Bryan Walsh (yetanotherbryan) wrote :
Download full text (4.0 KiB)

Hello,

I think I am seeing the same or related issue with the ASM1142 controller on my Razer Core Chroma EGPU enclosure. I'm running Ubuntu 19.04, kernel version 5.0.0-13-generic. Ethernet on the enclosure stops working while downloading large files. Dmesg produces the following error messages:

[ 569.641475] xhci_hcd 0000:0f:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 4 comp_code 13
[ 569.641487] xhci_hcd 0000:0f:00.0: Looking for event-dma 000000048d9c5770 trb-start 000000048d9c5750 trb-end 000000048d9c5750 seg-start 000000048d9c5000 seg-end 000000048d9c5ff0

lspci output:

00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 08)
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 620 (rev 07)
00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 08)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th Gen Core Processor Gaussian Mixture Model
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI #1 (rev 21)
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #1 (rev f1)
00:1c.4 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1)
00:1d.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #9 (rev f1)
00:1f.0 ISA bridge: Intel Corporation Intel(R) 100 Series Chipset Family LPC Controller/eSPI Controller - 9D4E (rev 21)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
00:1f.3 Audio device: Intel Corporation Sunrise Point-LP HD Audio (rev 21)
00:1f.4 SMBus: Intel Corporation Sunrise Point-LP SMBus (rev 21)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (4) I219-V (rev 21)
02:00.0 Network controller: Intel Corporation Wireless 8265 / 8275 (rev 78)
04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981
05:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
06:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
06:01.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
06:02.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
06:04.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
07:00.0 System peripheral: Intel Corporation JHL6540 Thunderbolt 3 NHI (C step) [Alpine Ridge 4C 2016] (rev 02)
08:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
09:01.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
09:04.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
0a:00....

Read more...

Hi Bryan, thanks for the report. It could be the same issue, can you provide the full dmesg, and also the outputs of the following commands: "lspci -nnvvv", "lspci -t" and "ls -l /sys/class/net"?

The issue was fixed for the first reporter via a FW update in the ASMedia adapter; unfortunately this FW update comes from the vendor, so the way of getting it varies according to the HW presenting the problem.

Cheers,

Guilherme

Bryan Walsh (yetanotherbryan) wrote :

Please see attached log for the outputs that you requested.

Bryan Walsh (yetanotherbryan) wrote :

Please see attached log for the outputs that you requested.

Great Bryan, the model of your USB controller is the same reported in this LP; also, given the outputs you provided, the network interface "enx90203a19dcb6" in under one of those USB controllers - you mentioned you see the TRB DMA errors and the interface stops responding. Is the problematic interface that one, "enx90203a19dcb6" ?

Who is the vendor of your device? I'd suggest you to seek help from them, mentioning this LP and that ASMedia may have a potential firmware fix for the issue.

Thanks,

Guilherme

Gabe Esposito (gabespo) wrote :

I'm also experiencing the same issue with the ASM1142 controller on the Core X Chroma and can reproduce consistently. I'm running kernel 5.0.9.

Guilherme, thanks for your work diagnosing this. This device is sold by Razer. I will try and reach out but they do not claim Linux support on any of their devices so I worry this may go unfixed. Barring a firmware fix, is there any hope of this being fixed with a quirk, as the other controller was? I realize this LP is not the ideal place for such a fix to take place, but I am happy to participate in finding a solution.

Bryan Walsh (yetanotherbryan) wrote :

In attempt to update the firmware I installed the razer software on my newly created windows partition, to see if it could be updated through there. No luck.

I emailed Razer support to ask about obtaining updated firmware. I'll let everyone know what I hear back.

And yes, "enx90203a19dcb6" is the problematic interface.

Thanks Gabe! I agree with you, would be really nice to have a quirk for that. It would be more easy to analyze that possibility with a datasheet for this adapter, which unfortunately I don't have.
I'm on vacation until next week, I'll try to discuss that in linux-usb when I'm back, and pursue a kernel quirk instead of firmware-only fix.

@Bryan, thanks for checking with the vendor, let us know the outcome.
Cheers,

Guilherme

Alex Lourenco (nyb-2017) wrote :

I am experiencing the exact same issue first reported in this LP (ASMedia ASM1142 USB 3.1 Controller with a Logitech Brio 4k, ERROR Transfer event TRB DMA ptr not part of current TD ...). In my case the controller is provided by a StarTech.com 4 Port USB 3.1 PCIe Card 3x USB-A and 1x USB-C [PEXUS313AC2V].

While searching online I found a couple of LP's and forum posts with similar issues. The common factor seems to be high speed usb devices (e.g 4k webcam, usb ethernet adapters) connected to ASMedia controllers.

I have compiled 5.0.0 with a variety of existing quirks but nothing has done the trick so far. There are a couple of ASMedia firmwares posted on station-drivers. Unfortunately none of them seem to fix the issue either.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers