Kernel crash during USB device enumeration or mounting

Bug #1871143 reported by Dan Halbert
18
This bug affects 4 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Expired
Undecided
Unassigned

Bug Description

Device is an Adafruit microcontroller board running https://github.com/adafruit/circuitpython, using the TinyUSB USB stack: https://github.com/hathach/tinyusb.

Device presents a composite USB device that includes MSC, CDC, HID, and MIDI
Device was commanded to forcibly reformat its MSC FAT12 filesystem and then reset, which causes a USB reset and re-enumeration.

This causes some USB kernel code to crash, much of the time, though not quite all the time.

I work for Adafruit and the maintainer of TinyUSB also works for Adafruit, so we can provide further help to track this down.

Attached is the dmesg file for the crash. I tried uploading the dump file from kdump, but launchpad is timing out. The dump file is available at:
https://drive.google.com/open?id=1ka3ySSccEMRSmnYgI9X16IndPDmIKTD6

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.4.0-21-generic 5.4.0-21.25
ProcVersionSignature: Ubuntu 5.4.0-21.25-generic 5.4.27
Uname: Linux 5.4.0-21-generic x86_64
ApportVersion: 2.20.11-0ubuntu24
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: halbert 1857 F.... pulseaudio
 /dev/snd/controlC2: halbert 1857 F.... pulseaudio
CurrentDesktop: X-Cinnamon
Date: Mon Apr 6 11:09:43 2020
HibernationDevice: RESUME=UUID=55694d39-bafe-45f8-ba05-ea5d54a93a85
InstallationDate: Installed on 2017-09-19 (929 days ago)
InstallationMedia: Ubuntu 16.04.2 LTS "Xenial Xerus" - Release amd64 (20170215.2)
IwConfig:
 eno1 no wireless extensions.

 lo no wireless extensions.
MachineType: Dell Inc. OptiPlex 7010
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-21-generic root=UUID=b3868153-c620-4efe-9dfc-8059dc6288a8 ro quiet splash crashkernel=512M-:192M vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-5.4.0-21-generic N/A
 linux-backports-modules-5.4.0-21-generic N/A
 linux-firmware 1.187
RfKill:
 0: hci0: Bluetooth
  Soft blocked: yes
  Hard blocked: no
SourcePackage: linux
UpgradeStatus: Upgraded to focal on 2020-03-08 (28 days ago)
dmi.bios.date: 06/28/2018
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A29
dmi.board.name: 0WR7PY
dmi.board.vendor: Dell Inc.
dmi.board.version: A01
dmi.chassis.type: 15
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA29:bd06/28/2018:svnDellInc.:pnOptiPlex7010:pvr01:rvnDellInc.:rn0WR7PY:rvrA01:cvnDellInc.:ct15:cvr:
dmi.product.name: OptiPlex 7010
dmi.product.sku: OptiPlex 7010
dmi.product.version: 01
dmi.sys.vendor: Dell Inc.

Revision history for this message
Dan Halbert (dhalbert) wrote :
Revision history for this message
Dan Halbert (dhalbert) wrote :
Download full text (6.7 KiB)

Key part of dmesg is:

[76704.227672] cdc_acm 1-1.4:1.0: failed to set dtr/rts
[76704.936301] usb 1-1.4: USB disconnect, device number 11
[76704.951638] blk_update_request: I/O error, dev sdc, sector 1110 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[76704.951644] Buffer I/O error on dev sdc1, logical block 1109, lost async page write
[76704.951647] Buffer I/O error on dev sdc1, logical block 1110, lost async page write
[76704.951656] blk_update_request: I/O error, dev sdc, sector 1128 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[76704.951658] Buffer I/O error on dev sdc1, logical block 1127, lost async page write
[76704.951661] Buffer I/O error on dev sdc1, logical block 1128, lost async page write
[76704.951679] blk_update_request: I/O error, dev sdc, sector 1150 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[76704.951682] Buffer I/O error on dev sdc1, logical block 1149, lost async page write
[76704.951684] Buffer I/O error on dev sdc1, logical block 1150, lost async page write
[76704.951689] blk_update_request: I/O error, dev sdc, sector 1174 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[76704.951691] Buffer I/O error on dev sdc1, logical block 1173, lost async page write
[76704.951693] Buffer I/O error on dev sdc1, logical block 1174, lost async page write
[76704.951717] blk_update_request: I/O error, dev sdc, sector 1194 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[76704.951718] Buffer I/O error on dev sdc1, logical block 1193, lost async page write
[76704.951719] Buffer I/O error on dev sdc1, logical block 1194, lost async page write
[76704.951722] blk_update_request: I/O error, dev sdc, sector 1212 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[76704.951734] blk_update_request: I/O error, dev sdc, sector 1232 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[76704.951739] blk_update_request: I/O error, dev sdc, sector 1252 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[76704.951748] blk_update_request: I/O error, dev sdc, sector 1272 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[76704.951751] blk_update_request: I/O error, dev sdc, sector 1290 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[76707.579590] usb 1-1.4: new full-speed USB device number 12 using ehci-pci
[76707.691410] usb 1-1.4: New USB device found, idVendor=239a, idProduct=8014, bcdDevice= 1.00
[76707.691413] usb 1-1.4: New USB device strings: Mfr=2, Product=3, SerialNumber=1
[76707.691415] usb 1-1.4: Product: Metro M0 Express
[76707.691417] usb 1-1.4: Manufacturer: Adafruit Industries LLC
[76707.691418] usb 1-1.4: SerialNumber: DFAE9D970323D4050213E2932141E0FF
[76707.692717] general protection fault: 0000 [#1] SMP PTI
[76707.692723] CPU: 3 PID: 75883 Comm: kworker/3:1 Kdump: loaded Not tainted 5.4.0-21-generic #25-Ubuntu
[76707.692725] Hardware name: Dell Inc. OptiPlex 7010/0WR7PY, BIOS A29 06/28/2018
[76707.692732] Workqueue: usb_hub_wq hub_event
[76707.692738] RIP: 0010:__kmalloc+0xa5/0x270
[76707.692741] Code: 65 49 8b 50 08 65 4c 03 05 40 a0 76 4d 4d 8b 38 4d 85 ff 0f 84 93 01 00 00 41 8b 59 20 49 8b 39 48 8d 4a 01 4c 89 f8 4c 01 fb <48> 33 1b 49 33 99 70 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0 74 bd
[76707.692743] RSP: 0018:ffffa85b4...

Read more...

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Dan Halbert (dhalbert) wrote :
Download full text (6.2 KiB)

Another crash, with a different backtrace. with a different board, but related USB implementation. Since the report above, the underlying USB implementation on the boards has changed somewhat, but the circumstances of the crash are similar:

[40859.232293] usb 1-1.3: USB disconnect, device number 34
[40859.250102] FAT-fs (sdc): unable to read boot sector to mark fs as dirty
[40862.270362] usb 1-1.3: new full-speed USB device number 35 using ehci-pci
[40862.387543] usb 1-1.3: New USB device found, idVendor=239a, idProduct=8014, bcdDevice= 1.00
[40862.387546] usb 1-1.3: New USB device strings: Mfr=2, Product=3, SerialNumber=1
[40862.387548] usb 1-1.3: Product: Metro M0 Express
[40862.387550] usb 1-1.3: Manufacturer: Adafruit Industries LLC
[40862.387552] usb 1-1.3: SerialNumber: DFAE9D970323D4050213E2932141E0FF
[40862.388969] cdc_acm 1-1.3:1.0: ttyACM0: USB ACM device
[40862.392200] usb-storage 1-1.3:1.2: USB Mass Storage device detected
[40862.392599] scsi host6: usb-storage 1-1.3:1.2
[40862.397276] input: Adafruit Industries LLC Metro M0 Express Keyboard as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.3/1-1.3:1.3/0003:239A:8014.0016/input/input97
[40862.454787] input: Adafruit Industries LLC Metro M0 Express Mouse as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.3/1-1.3:1.3/0003:239A:8014.0016/input/input98
[40862.455133] input: Adafruit Industries LLC Metro M0 Express Consumer Control as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.3/1-1.3:1.3/0003:239A:8014.0016/input/input99
[40862.455322] input: Adafruit Industries LLC Metro M0 Express as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.3/1-1.3:1.3/0003:239A:8014.0016/input/input100
[40862.455689] hid-generic 0003:239A:8014.0016: input,hidraw3: USB HID v1.11 Keyboard [Adafruit Industries LLC Metro M0 Express] on usb-0000:00:1a.0-1.3/input3
[40863.416330] scsi host6: scsi scan: INQUIRY result too short (5), using 36
[40863.416338] scsi 6:0:0:0: Direct-Access Adafruit Metro M0 Express 1.0 PQ: 0 ANSI: 2
[40863.416732] sd 6:0:0:0: Attached scsi generic sg3 type 0
[40863.418307] sd 6:0:0:0: [sdc] 4089 512-byte logical blocks: (2.09 MB/2.00 MiB)
[40863.419611] sd 6:0:0:0: [sdc] Write Protect is off
[40863.419615] sd 6:0:0:0: [sdc] Mode Sense: 03 00 00 00
[40863.420650] sd 6:0:0:0: [sdc] No Caching mode page found
[40863.420656] sd 6:0:0:0: [sdc] Assuming drive cache: write through
[40863.464434] sdc: sdc1
[40863.472071] sd 6:0:0:0: [sdc] Attached SCSI removable disk
[40864.285807] FAT-fs (sdc1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[40866.329111] cdc_acm 1-1.3:1.0: failed to set dtr/rts
[40866.349335] general protection fault: 0000 [#1] SMP PTI
[40866.349339] CPU: 3 PID: 134345 Comm: systemd-udevd Kdump: loaded Not tainted 5.4.0-28-generic #32-Ubuntu
[40866.349340] Hardware name: Dell Inc. OptiPlex 7010/0WR7PY, BIOS A29 06/28/2018
[40866.349346] RIP: 0010:kmem_cache_alloc_trace+0x88/0x230
[40866.349348] Code: 65 49 8b 50 08 65 4c 03 05 fd 95 36 60 4d 8b 38 4d 85 ff 0f 84 80 01 00 00 41 8b 59 20 49 8b 39 48 8d 4a 01 4c 89 f8 4c 01 fb <48> 33 1b 49 33 99 70 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0 74 bd
[40866.349349] RSP: 0018:ffffa5f589357bf0 EFLAGS: ...

Read more...

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Disconnect:
[76704.227672] cdc_acm 1-1.4:1.0: failed to set dtr/rts
[76704.936301] usb 1-1.4: USB disconnect, device number 11

Reprobe:
[76707.579590] usb 1-1.4: new full-speed USB device number 12 using ehci-pci

Then:
[76707.692717] general protection fault: 0000 [#1] SMP PTI

Could be use after free, or stack corruption. Probably some resources aren't freed in USB stack.

Does TinyUSB touch anything inside kernel?

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Dan Halbert (dhalbert) wrote :

Thanks for the reply.

TinyUSB runs on the microcontroller board; it is the USB stack implementation used by the software on the board. So it doesn't touch the kernel, by definition. It is behaving in some way that is unexpected to the Linux driver, and causing bad behavior on the part of the driver.

It is possible this is a timing or kernel driver state machine. These crashes occur when I reset the board. So the USB device is forcibly disconnected, and then immediately tries to re-enumerate. This is much faster than the usual unplug/replug time you might see, if say, you unplugged a replugged a USB keyboard or USB flash drive.

Would a USB trace be helpful to you? I can use a Beagle and get a trace file. Is this the right place to continue discussing this bug, or should it be in some Linux kernel bug system?

Thanks.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Since the system is upgraded from 16.04, have you tried any older kernel? This could be a regression.

Also, please test latest mainline kernel:
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.7

Revision history for this message
Dan Halbert (dhalbert) wrote :

I was able to reproduce the problem after a few tries with the 5.7.0-050700-generic kernel. I then started trying older kernels. It appears the problem started somewhere between 4.10.0 an 4.18.0. But sometimes it takes several tries to reproduce, so I'll continue some ad-hoc bisecting to see if I can narrow it down.

For posterity, here is how I'm reproducing. It may take several tries, but it seems to happen about 1/3 of the time. I don't expect you to need to follow all of this for now:

Using CircuitPython 5.3.0 on Adafruit Metro M0 board. Connect to REPL and do:
>>> import storage
>>> storage.erase_filesystem()

can cause an immediate kernel crash, or it might be delayed by several seconds.

Revision history for this message
Dan Halbert (dhalbert) wrote :

I'm wrong, I got this to happen on 4.10, which is quite old, so I'm not sure if it's worth going back much further in time.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Please raise the issue to linux-usb mailing list. Possible use-after-free bug here.

Revision history for this message
Dan Halbert (dhalbert) wrote :

Thank you. Attaching lspci-v output in preparation for reporting to the linux-usb mailing list. lsusb -v output is already attached to the first port.

Revision history for this message
Dan Halbert (dhalbert) wrote :

I brought this up on the linux-usb email list and was told it was more likely to be an issue in the filesystem code. In addition, that code (and the USB code) don't expect an MSC USB device to behave badly, and there's no guarantee that if it does, Linux will not crash. In this case, the filesystem changes abruptly out from under the filesystem code.

linux-usb thread:
https://marc.info/?l=linux-usb&m=159387610928589&w=2

I suppose you can close this now. I wish Linux were more robust about this, but it appears to be a "won't fix" from their point of view.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
Revision history for this message
Colin Godsey (crgodsey) wrote :

This still seems to be happening on 5.8. Different board yet again, but same situation. Crashes every 1/X times when resetting the board. I don't have logs of the actual crash (ZFS root maybe?), but it locks up right after the "failed to set dtr/rts" like the others reported here, with similar logs up to that point.

Revision history for this message
Colin Godsey (crgodsey) wrote :

For the other folks that see this, it does seem to help to use the reset button slower, for what its worth. Possibly offers a better disconnect/reconnect timing.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.