ERROR Transfer event TRB DMA ptr not part of current TD

Bug #1874442 reported by DjMix
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux-signed-hwe (Ubuntu)
New
Undecided
Unassigned

Bug Description

I'm developing a software that talks with a device based on Cypress FX3 USB3 controller. In my laptop all works well, I developed it for months by now. In my desktop computer it crashes the xhci module:

Apr 23 12:39:11 phoenix kernel: [ 679.386166] xhci_hcd 0000:04:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 6 comp_code 13
Apr 23 12:39:11 phoenix kernel: [ 679.386169] xhci_hcd 0000:04:00.0: Looking for event-dma 000000003cd0f020 trb-start 000000003cd0f000 trb-end 000000003cd0f000 seg-start 000000003cd0f000 seg-end 000000003cd0fff0
Apr 23 12:39:11 phoenix kernel: [ 679.386287] xhci_hcd 0000:04:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 6 comp_code 13
Apr 23 12:39:11 phoenix kernel: [ 679.386289] xhci_hcd 0000:04:00.0: Looking for event-dma 000000003cd0f050 trb-start 000000003cd0f000 trb-end 000000003cd0f000 seg-start 000000003cd0f000 seg-end 000000003cd0fff0
Apr 23 12:39:11 phoenix kernel: [ 679.386411] xhci_hcd 0000:04:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 6 comp_code 13
Apr 23 12:39:11 phoenix kernel: [ 679.386413] xhci_hcd 0000:04:00.0: Looking for event-dma 000000003cd0f070 trb-start 000000003cd0f000 trb-end 000000003cd0f000 seg-start 000000003cd0f000 seg-end 000000003cd0fff0

and so on (one per libusb transfer, it seems).

Sometimes it even gives a core dump:

Apr 23 13:33:23 phoenix kernel: [ 122.974398] swiotlb_tbl_map_single: 6 callbacks suppressed
Apr 23 13:33:23 phoenix kernel: [ 122.974401] xhci_hcd 0000:04:00.0: swiotlb buffer is full (sz: 1572864 bytes), total 32768 (slots), used 516 (slots)
Apr 23 13:33:23 phoenix kernel: [ 122.974403] xhci_hcd 0000:04:00.0: overflow 0x0000000154200000+1572864 of DMA mask ffffffff bus mask 0
Apr 23 13:33:23 phoenix kernel: [ 122.974412] WARNING: CPU: 7 PID: 3280 at /build/linux-hwe-3vURZB/linux-hwe-5.3.0/kernel/dma/direct.c:35 report_addr+0x33/0x90
Apr 23 13:33:23 phoenix kernel: [ 122.974413] Modules linked in: rfcomm hid_logitech_hidpp bnep intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp nvidia_uvm(POE) coretemp nls_iso8859_1 kvm_intel nvidia_drm(POE) kvm
joydev input_leds mei_hdcp nvidia_modeset(POE) irqbypass nvidia(POE) snd_hda_codec_hdmi snd_hda_codec_realtek hid_logitech_dj snd_hda_codec_generic ledtrig_audio crct10dif_pclmul crc32_pclmul snd_hda_intel ghash_clmulni_intel snd_usb_aud
io snd_intel_nhlt snd_hda_codec snd_usbmidi_lib uvcvideo snd_hda_core aesni_intel btusb btrtl aes_x86_64 btbcm crypto_simd cryptd videobuf2_vmalloc snd_hwdep btintel glue_helper snd_seq_midi eeepc_wmi snd_seq_midi_event bluetooth intel_c
state videobuf2_memops snd_rawmidi asus_wmi intel_rapl_perf videobuf2_v4l2 videobuf2_common snd_seq snd_pcm sparse_keymap ecdh_generic videodev ecc hid_generic ipmi_devintf mc serio_raw wmi_bmof lpc_ich snd_seq_device ipmi_msghandler snd
_timer snd mei_me soundcore mei mac_hid sch_fq_codel parport_pc ppdev lp binfmt_misc
Apr 23 13:33:23 phoenix kernel: [ 122.974436] parport ip_tables x_tables autofs4 usbhid hid btrfs xor zstd_compress raid6_pq libcrc32c pata_acpi i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops psmouse ahc
i drm libahci atl1c pata_via wmi video
Apr 23 13:33:23 phoenix kernel: [ 122.974445] CPU: 7 PID: 3280 Comm: dumper Tainted: P OE 5.3.0-46-generic #38~18.04.1-Ubuntu
Apr 23 13:33:23 phoenix kernel: [ 122.974446] Hardware name: System manufacturer System Product Name/P8H67-V, BIOS 3707 07/12/2013
Apr 23 13:33:23 phoenix kernel: [ 122.974447] RIP: 0010:report_addr+0x33/0x90
Apr 23 13:33:23 phoenix kernel: [ 122.974449] Code: 48 83 ec 08 48 8b 87 28 02 00 00 48 89 75 f8 48 85 c0 74 26 4c 8b 00 b8 fe ff ff ff 49 39 c0 76 0d 80 3d 5f fd 71 01 00 74 2e <0f> 0b c9 c3 48 83 bf 38 02 00 00 00 75 e9 eb f0 80 3d 47
 fd 71 01
Apr 23 13:33:23 phoenix kernel: [ 122.974449] RSP: 0018:ffffae0601a97b48 EFLAGS: 00010286
Apr 23 13:33:23 phoenix kernel: [ 122.974451] RAX: 0000000000000000 RBX: ffff9045a540c0b0 RCX: 0000000000000006
Apr 23 13:33:23 phoenix kernel: [ 122.974451] RDX: 0000000000000000 RSI: 0000000000000086 RDI: 0000000000000000
Apr 23 13:33:23 phoenix kernel: [ 122.974452] RBP: ffffae0601a97b50 R08: 0000000000000431 R09: ffffffffb5794698
Apr 23 13:33:23 phoenix kernel: [ 122.974452] R10: 0000000000000000 R11: ffffae0601a97880 R12: 0000000000180000
Apr 23 13:33:23 phoenix kernel: [ 122.974453] R13: 0000000000000000 R14: 0000000000000cc0 R15: ffff9045146cc000
Apr 23 13:33:23 phoenix kernel: [ 122.974454] FS: 00007f05b6ab6080(0000) GS:ffff9045a75c0000(0000) knlGS:0000000000000000
Apr 23 13:33:23 phoenix kernel: [ 122.974454] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 23 13:33:23 phoenix kernel: [ 122.974455] CR2: 00007f0595707000 CR3: 0000000161a46006 CR4: 00000000000606e0
Apr 23 13:33:23 phoenix kernel: [ 122.974456] Call Trace:
Apr 23 13:33:23 phoenix kernel: [ 122.974459] dma_direct_map_page+0xe2/0xf0
Apr 23 13:33:23 phoenix kernel: [ 122.974462] usb_hcd_map_urb_for_dma+0x576/0x5e0
Apr 23 13:33:23 phoenix kernel: [ 122.974464] xhci_map_urb_for_dma+0x4e/0x60
Apr 23 13:33:23 phoenix kernel: [ 122.974465] usb_hcd_submit_urb+0x97/0xba0
Apr 23 13:33:23 phoenix kernel: [ 122.974467] ? alloc_pages_current+0x6a/0xe0
Apr 23 13:33:23 phoenix kernel: [ 122.974470] ? kmalloc_order+0x18/0x40
Apr 23 13:33:23 phoenix kernel: [ 122.974471] usb_submit_urb+0x338/0x5d0
Apr 23 13:33:23 phoenix kernel: [ 122.974472] ? __kmalloc+0x208/0x260
Apr 23 13:33:23 phoenix kernel: [ 122.974474] proc_do_submiturb+0x714/0xdc0
Apr 23 13:33:23 phoenix kernel: [ 122.974475] usbdev_do_ioctl+0xacc/0x1300
Apr 23 13:33:23 phoenix kernel: [ 122.974478] ? lru_cache_add_active_or_unevictable+0x39/0xb0
Apr 23 13:33:23 phoenix kernel: [ 122.974480] ? __handle_mm_fault+0x9ad/0x1230
Apr 23 13:33:23 phoenix kernel: [ 122.974481] usbdev_ioctl+0xe/0x20
Apr 23 13:33:23 phoenix kernel: [ 122.974482] do_vfs_ioctl+0xa9/0x640
Apr 23 13:33:23 phoenix kernel: [ 122.974483] ? handle_mm_fault+0xcb/0x210
Apr 23 13:33:23 phoenix kernel: [ 122.974484] ksys_ioctl+0x75/0x80
Apr 23 13:33:23 phoenix kernel: [ 122.974485] __x64_sys_ioctl+0x1a/0x20
Apr 23 13:33:23 phoenix kernel: [ 122.974488] do_syscall_64+0x5a/0x130
Apr 23 13:33:23 phoenix kernel: [ 122.974490] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 23 13:33:23 phoenix kernel: [ 122.974491] RIP: 0033:0x7f05b3c625d7
Apr 23 13:33:23 phoenix kernel: [ 122.974492] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
Apr 23 13:33:23 phoenix kernel: [ 122.974493] RSP: 002b:00007fff3801cc68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Apr 23 13:33:23 phoenix kernel: [ 122.974494] RAX: ffffffffffffffda RBX: 000055c895487420 RCX: 00007f05b3c625d7
Apr 23 13:33:23 phoenix kernel: [ 122.974494] RDX: 000055c8957c95f0 RSI: 000000008038550a RDI: 0000000000000012
Apr 23 13:33:23 phoenix kernel: [ 122.974495] RBP: 0000000000000001 R08: 000055c8957c95f0 R09: 0000000000000000
Apr 23 13:33:23 phoenix kernel: [ 122.974495] R10: 000055c8957c8f10 R11: 0000000000000246 R12: 000000008038550a
Apr 23 13:33:23 phoenix kernel: [ 122.974496] R13: 000055c8953ed710 R14: 0000000000000000 R15: 0000000000000000
Apr 23 13:33:23 phoenix kernel: [ 122.974497] ---[ end trace 1b4e76a018be9583 ]---
Apr 23 13:33:23 phoenix kernel: [ 122.974499] usb 4-2: usbfs: usb_submit_urb returned -11

Lot of times it crashes so badly that the controller becomes unusable:

Apr 23 11:28:36 phoenix kernel: [10415.197865] xhci_hcd 0000:04:00.0: xHCI host not responding to stop endpoint command.
Apr 23 11:28:36 phoenix kernel: [10415.197880] xhci_hcd 0000:04:00.0: xHCI host controller not responding, assume dead
Apr 23 11:28:36 phoenix kernel: [10415.197903] xhci_hcd 0000:04:00.0: HC died; cleaning up
Apr 23 11:28:36 phoenix kernel: [10415.197949] usb 4-1: USB disconnect, device number 2
Apr 23 11:28:36 phoenix kernel: [10415.197957] usb 4-1: Not enough bandwidth for altsetting 0

To have it back, a reboot is necessary because it seems that xhci is not a module but it's compiled in.

System informations:

# uname -a
Linux phoenix 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

# dpkg -l | grep linux-image-5.3.0-46-generic
ii linux-image-5.3.0-46-generic 5.3.0-46.38~18.04.1 amd64 Signed kernel image generic

# lsb_release -rd
Description: KDE neon User Edition 5.18
Release: 18.04

# dmesg | grep xhci
[ 1.209460] xhci_hcd 0000:04:00.0: xHCI Host Controller
[ 1.209463] xhci_hcd 0000:04:00.0: new USB bus registered, assigned bus number 3
[ 1.216386] xhci_hcd 0000:04:00.0: hcc params 0x0200f180 hci version 0x96 quirks 0x0000000000080000
[ 1.216613] usb usb3: Manufacturer: Linux 5.3.0-46-generic xhci-hcd
[ 1.216810] xhci_hcd 0000:04:00.0: xHCI Host Controller
[ 1.216813] xhci_hcd 0000:04:00.0: new USB bus registered, assigned bus number 4
[ 1.216815] xhci_hcd 0000:04:00.0: Host supports USB 3.0 SuperSpeed
[ 1.221652] usb usb4: Manufacturer: Linux 5.3.0-46-generic xhci-hcd

Any help will be appreciated.

Revision history for this message
DjMix (ilario-gottardello) wrote :

I also tried with these kernels:

5.0.0-44.48~18.04.1
4.18.0-25.26~18.04.1
4.15.0-96.97

The same behaviour happens.

Revision history for this message
DjMix (ilario-gottardello) wrote :

Just verified again in my laptop, all works as expected. It has the same kernel, the same distribution, but a different controller:

[ 1.299706] xhci_hcd 0000:00:14.0: xHCI Host Controller
[ 1.299712] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 1
[ 1.300780] xhci_hcd 0000:00:14.0: hcc params 0x200077c1 hci version 0x100 quirks 0x0000000081109810
[ 1.300784] xhci_hcd 0000:00:14.0: cache line size of 64 is not supported
[ 1.300967] usb usb1: Manufacturer: Linux 5.3.0-46-generic xhci-hcd
[ 1.302403] xhci_hcd 0000:00:14.0: xHCI Host Controller
[ 1.302405] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 2
[ 1.302407] xhci_hcd 0000:00:14.0: Host supports USB 3.0 SuperSpeed
[ 1.302440] usb usb2: Manufacturer: Linux 5.3.0-46-generic xhci-hcd

Revision history for this message
DjMix (ilario-gottardello) wrote :

Just to add info, I am able to reset the controller with these two commands:

# echo 1 > /sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/remove
# echo 1 > /sys/devices/pci0000:00/0000:00:1c.5/rescan

Revision history for this message
Chris Adams (fkmjo) wrote :

I am having the same problem with the same hardware. On my laptop also it works which is an Intel controller. On every AMD EPYC machine (starship USB controller) I have tried I get libusb timeouts and nothing in the kernel logs. Using a PCIe riser card ASMedia ASM3142 I get the same errors as you. By turning debug on I get some more output

[20047.573189] xhci_hcd 0000:68:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
[20047.573198] xhci_hcd 0000:68:00.0: Looking for event-dma 00000000fffc30b0 trb-start 00000000fffc4fb0 trb-end 0000000000000000 seg-start 00000000fffc4000 seg-end 00000000fffc4ff0
[20047.573204] xhci_hcd 0000:68:00.0: Looking for event-dma 00000000fffc30b0 trb-start 00000000fffc3000 trb-end 00000000fffc3070 seg-start 00000000fffc3000 seg-end 00000000fffc3ff0
[20052.571651] xhci_hcd 0000:68:00.0: Cancel URB 00000000d0831cdb, dev 1, ep 0x81, starting at offset 0xfffc3040
[20052.571671] xhci_hcd 0000:68:00.0: // Ding dong!
[20052.571790] xhci_hcd 0000:68:00.0: Stopped on No-op or Link TRB for slot 8 ep 2
[20052.571837] xhci_hcd 0000:68:00.0: Removing canceled TD starting at 0xfffc3040 (dma) in stream 0 URB 00000000d0831cdb
[20052.571850] xhci_hcd 0000:68:00.0: xhci_giveback_invalidated_tds: Giveback cancelled URB 00000000d0831cdb TD
[20053.572166] xhci_hcd 0000:68:00.0: Cancel URB 000000004ef649ab, dev 1, ep 0x81, starting at offset 0xfffc3080
[20053.572188] xhci_hcd 0000:68:00.0: // Ding dong!
[20053.572308] xhci_hcd 0000:68:00.0: Stopped on No-op or Link TRB for slot 8 ep 2
[20053.572355] xhci_hcd 0000:68:00.0: Removing canceled TD starting at 0xfffc3080 (dma) in stream 0 URB 000000004ef649ab
[20053.572364] xhci_hcd 0000:68:00.0: xhci_giveback_invalidated_tds: Giveback cancelled URB 000000004ef649ab TD

There are a number of related issues, https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1667750 and https://<email address hidden>/t/ amongst others.

It seems in some instances it is solved with a patch that was made to the 4 series kernel but it has resurfaced recently.

Did you ever resolve the issue?

Btw I am on Ubuntu 20.04, 5.15.0-76-generic kernel

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.