Kernel panic in usb-storage w/Quantal kernel in Precise

Bug #1096802 reported by Jani Uusitalo
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Invalid
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

Last week I replaced the internals of my desktop computer with a new ASUS P8H77-M PRO, Intel G2120 and 16 GB RAM. With one round of Memtest passed, I booted into my old Precise install and got bit hard by what appears to be Bug #993187: frequent hard lockups (multiple within a few hours of use). I installed linux-image-generic-lts-quantal (currently 3.5.0.22.29) and that seemed to resolve the lockups: I got more than two days of uptime (ending with an intended shutdown), half of which I was actively using the computer.

This morning, soon (half an hour?) after login, the kernel panicked with a reference to usb-storage (I'll attach a picture). This was different from the lockups with the stock Precise (3.2) kernel: with those I never saw a panic message (only the frozen desktop) and the system had to be powered off to reboot, whereas after this panic I could reboot using the chassis reset button.

There's some USB activity in syslog just prior to the panic (Jan 7 at around 10:50), but I wasn't using any USB devices at the time. I have used them with this kernel previously though, without issues, and currently am too (to transfer the panic picture from my phone). There's a memory card reader/USB port panel (Akasa AK-ICR-17) permanently plugged into internal USB 2 and 3.

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-generic-lts-quantal 3.5.0.22.29
ProcVersionSignature: Ubuntu 3.5.0-22.33~precise1-generic 3.5.7.2
Uname: Linux 3.5.0-22-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.25.
ApportVersion: 2.0.1-0ubuntu17.1
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: jani 2648 F.... pulseaudio
 /dev/snd/pcmC0D0p: jani 2648 F...m pulseaudio
 /dev/snd/timer: jani 2648 f.... pulseaudio
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
Card0.Amixer.info:
 Card hw:0 'PCH'/'HDA Intel PCH at 0xf7d10000 irq 47'
   Mixer name : 'Intel PantherPoint HDMI'
   Components : 'HDA:10ec0892,1043841b,00100302 HDA:80862806,80860101,00100000'
   Controls : 57
   Simple ctrls : 22
CheckboxSubmission: 09ae689090491ca53449589269e4bfd8
CheckboxSystem: edda5d4f616ca792bf437989cb597002
Date: Mon Jan 7 11:11:10 2013
HibernationDevice: RESUME=UUID=a4e1a2bf-e095-47a9-aff8-d30f458a69b7
InstallationMedia: Ubuntu 9.10 "Karmic Koala" - Release amd64 (20091027)
IwConfig:
 eth1 no wireless extensions.

 lo no wireless extensions.
MachineType: System manufacturer System Product Name
MarkForUpload: True
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.5.0-22-generic root=/dev/mapper/hostname-root ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.5.0-22-generic N/A
 linux-backports-modules-3.5.0-22-generic N/A
 linux-firmware 1.79.2
RfKill:

SourcePackage: linux
UpgradeStatus: Upgraded to precise on 2011-11-21 (413 days ago)
dmi.bios.date: 12/10/2012
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 1005
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: P8H77-M PRO
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev X.0x
dmi.chassis.asset.tag: Asset-1234567890
dmi.chassis.type: 3
dmi.chassis.vendor: Chassis Manufacture
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr1005:bd12/10/2012:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKCOMPUTERINC.:rnP8H77-MPRO:rvrRevX.0x:cvnChassisManufacture:ct3:cvrChassisVersion:
dmi.product.name: System Product Name
dmi.product.version: System Version
dmi.sys.vendor: System manufacturer

Revision history for this message
Jani Uusitalo (uusijani) wrote :

I don't see the syslog I mentioned attached above, so I'm attaching it here. Also, the USB-related stuff just preceding the panic starts slightly earlier than what I claimed above, with this reset attempt:

Jan 7 10:45:01 saegusa kernel: [ 413.991127] usb 4-4: reset SuperSpeed USB device number 2 using xhci_hcd

Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.8 kernel[0] (Not a kernel in the daily directory) and install both the linux-image and linux-image-extra .deb packages.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8-rc2-raring/

Changed in linux (Ubuntu):
importance: Undecided → High
status: Confirmed → Incomplete
Revision history for this message
Jani Uusitalo (uusijani) wrote :

There's a definite pattern here, and it's definitely tied to xhci_hcd, usb-storage and the Akasa/Genesys card reader. Here's what I've done since reporting this:

1) Set "Legacy USB 3.0" in the BIOS from "Enabled" to "Disabled", and "Intel xHCI Mode" from "Smart Auto" to "Enabled". I tested briefly with the latter set to "Disabled" and Legacy USB 3.0 "Enabled", but then the card reader wasn't detected at all, and I'd prefer a working xHCI anyway.

2) Switched to mainline kernel 3.8.0-030800rc2-generic #201301022235. With the earlier kernels (3.5 and 3.2 from the Precise repo) things seemed similar to my findings below with mainline, but I have too little data from 3.2 and 3.5 to say conclusively that there's no difference at all. I've concentrated my testing on mainline just to keep things simpler.

Here's the pattern with mainline:

1) Cold boot. Early in the boot, the card reader in USB #4 is asked to reset. This results in a flood of "xHCI xhci_drop_endpoint called with disabled ep ffff880403c8d500":

Jan 8 10:32:12 saegusa kernel: [ 721.015249] usb 4-4: reset SuperSpeed USB device number 2 using xhci_hcd
Jan 8 10:32:12 saegusa kernel: [ 721.032596] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880403c8d500
Jan 8 10:32:12 saegusa kernel: [ 721.032599] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880403c8d540

When these messages appear, the session will crash (panic or freeze) at some point. Using the card reader isn't necessary (it panics eventually without it too), but the crash is easy to trigger just by sticking an SD card into the reader: the reader reports buffer I/O errors on the card and then boom.

Jan 8 10:32:13 saegusa kernel: [ 721.713295] sd 8:0:0:2: [sdf] Unhandled error code
Jan 8 10:32:13 saegusa kernel: [ 721.713299] sd 8:0:0:2: [sdf]
Jan 8 10:32:13 saegusa kernel: [ 721.713300] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Jan 8 10:32:13 saegusa kernel: [ 721.713303] sd 8:0:0:2: [sdf] CDB:
Jan 8 10:32:13 saegusa kernel: [ 721.713304] Read(10): 28 00 00 00 20 00 00 00 08 00
Jan 8 10:32:13 saegusa kernel: [ 721.713314] end_request: I/O error, dev sdf, sector 8192
Jan 8 10:32:13 saegusa kernel: [ 721.713318] Buffer I/O error on device sdf1, logical block 0

Another USB #4-related message that often precedes a panic in syslog is this:
Jan 8 17:38:21 saegusa kernel: [ 148.282256] usb 4-4: Disable of device-initiated U1 failed.
Jan 8 17:38:21 saegusa kernel: [ 148.285747] usb 4-4: Disable of device-initiated U2 failed.

The panics, when visible, are always in Pid: usb-storage, and mostly of the "ring_doorbell_for_active_rings" type (above), but I did have at least one "warn_slowpath_common" too (will attach a picture if requested).

2) Reboot after the panic, USB #4 doesn't get reset and no "disabled ep" or "Disable of device-initiated ..." messages appear in syslog. The card reader works perfectly (i.e. SD card can be inserted and read/written without problems).

3) The panic isn't necessary to get the card reader working: it's enough to reboot after one cold boot and the reset signal being sent in that session. Just don't stay in that cold boot session, because it'll p...
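
For anyone wanting to check whether a given boot shows the same pattern, here is a minimal Python sketch that counts the message types quoted above. The substrings are copied from the syslog excerpts in this report. The label names, the function names and the rule for flagging a "bad" session are illustrative assumptions, not anything the kernel or apport provides.

```python
from collections import Counter

# Substrings taken from the syslog excerpts in this report.
# The labels on the left are hypothetical names chosen for this sketch.
SIGNATURES = {
    "reset": "reset SuperSpeed USB device",
    "disabled_ep": "xhci_drop_endpoint called with disabled ep",
    "lpm_disable_failed": "Disable of device-initiated U",
    "io_error": "end_request: I/O error",
}

# A few lines quoted from the report, used as sample input.
SAMPLE = [
    "Jan 8 10:32:12 saegusa kernel: [ 721.015249] usb 4-4: reset SuperSpeed USB device number 2 using xhci_hcd",
    "Jan 8 10:32:12 saegusa kernel: [ 721.032596] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880403c8d500",
    "Jan 8 10:32:12 saegusa kernel: [ 721.032599] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880403c8d540",
    "Jan 8 10:32:13 saegusa kernel: [ 721.713314] end_request: I/O error, dev sdf, sector 8192",
    "Jan 8 17:38:21 saegusa kernel: [ 148.282256] usb 4-4: Disable of device-initiated U1 failed.",
    "Jan 8 17:38:21 saegusa kernel: [ 148.285747] usb 4-4: Disable of device-initiated U2 failed.",
]


def scan_syslog(lines):
    """Count how many lines contain each known message substring."""
    counts = Counter()
    for line in lines:
        for label, needle in SIGNATURES.items():
            if needle in line:
                counts[label] += 1
    return counts


def looks_like_bad_boot(counts):
    """Assumed rule of thumb: flag the session if either warning type appears."""
    return counts["disabled_ep"] > 0 or counts["lpm_disable_failed"] > 0


if __name__ == "__main__":
    result = scan_syslog(SAMPLE)
    print(dict(result))
    print("bad boot session?", looks_like_bad_boot(result))
```

To run it against a real log, pass the lines of the log file to `scan_syslog` instead of `SAMPLE`, e.g. `scan_syslog(open("/var/log/syslog", errors="replace"))`. Note that the flag only reflects the correlation observed in this report; it is not a general diagnosis.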


tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Jani Uusitalo (uusijani) wrote :

The lsusb listing attached by apport above doesn't seem to list the card reader at all. This did happen in some sessions. IIRC there were no panics or "disabled ep" messages then either, but naturally the reader also wouldn't read any cards; it was as if disconnected.

I'm attaching output of `sudo lsusb -v` here, with the card reader (004:002) detected and showing.

Revision history for this message
Jani Uusitalo (uusijani) wrote :

I'm happy to report that this seems to have been a firmware issue. Using a temporary install of MS Windows, which the card reader manufacturer's firmware upgrade software required [1], I managed to upgrade the card reader from the firmware it shipped with (version 551) to the manufacturer's latest version, 563 (released just last month). After this there were no more "disabled ep" messages in any boot, the reader works just fine and there have been no kernel panics of any kind.

This was with the mainline 3.8 kernel so I'm not marking this bug invalid just yet. I've now switched back to the Quantal kernel I initially reported this with and will report here next week on how it goes.

[1] http://www.akasa.com.tw/update.php?tpl=product/cpu.product.tpl&no=181&type=Card%20Reader/Hub&type_sub=Card%20Reader&model=AK-ICR-17

Revision history for this message
Jani Uusitalo (uusijani) wrote :

No problems with 3.5 either, marking as invalid.

Changed in linux (Ubuntu):
status: Confirmed → Invalid