USB3 ports cause kernel crash and oops since 3.5.0-24-generic (3.5.0-23 ok), USB2 OK.

Bug #1132129 reported by Stéphane Gourichon
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Expired
Importance: High
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Summary: upgrading from 3.5.0-23 to 3.5.0-24 appears to have suddenly reduced kernel stability from weeks or months of uptime to about a day.

Context: Hardware is an Asus N55SF laptop. A Galaxy Nexus phone is plugged in via USB about daily. For months I have observed that USB sometimes breaks (e.g. the optical mouse goes completely off, unplugging and replugging does nothing, a reboot fixes it), but with no oops.
Using suspend to ram regularly (~ daily). Typical uptime : weeks to 1-2 months.
Using Ubuntu upgrades regularly.
On 2013-02-01, upgraded to 3.5.0-23, no problem.
On 2013-02-21, upgraded to 3.5.0-24.

1) The release of Ubuntu you are using, via 'lsb_release -rd' or System -> About Ubuntu

12.10 AMD64

2) The version of the package you are using, via 'apt-cache policy pkgname' or by checking in Software Center

LC_ALL=C apt-cache policy linux-image

linux-image:
  Installed: (none)
  Candidate: 3.5.0.25.31
  Version table:
     3.5.0.25.31 0
        500 http://fr.archive.ubuntu.com/ubuntu/ quantal-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu/ quantal-security/main amd64 Packages
     3.5.0.17.19 0
        500 http://fr.archive.ubuntu.com/ubuntu/ quantal/main amd64 Packages

3) What you expected to happen

No oops, no crash.

4) What happened instead

On 2013-02-22 (that is, kernel update + 1 day), observed a kernel panic: switch to text mode with panic report (but X mouse cursor still visible), LED blinking. The text says: fatal exception interrupt.
Action: took a photograph of the screen (can attach if useful), power-cycled the machine.

On 2013-02-23 (that is, kernel update + 2 days), observed a kernel oops: switch to text mode with oops report, but X mouse cursor still visible and music still playing.
Action: took a photograph of the screen (can attach if useful, but dmesg is attached so it is probably not needed). Observed that VT-switching key combinations could restore the display. Did not reboot the system (using it to write this very report), but several features are disabled (laptop-integrated card reader, USB mouse). lsusb still reports the in-laptop Bluetooth device as present, but bluetooth-sendto cannot detect the Galaxy Nexus nearby even when following the proper steps.

Considered actions:
* continue normal use to confirm / narrow oops circumstances
* downgrade to 3.5.0-23 to confirm the symptoms disappear

Thank you for your attention.

$ lsusb

Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 003 Device 002: ID 0b05:17a9 ASUSTek Computer, Inc.
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 007: ID 0cf3:3005 Atheros Communications, Inc. AR3011 Bluetooth
Bus 001 Device 004: ID 1bcf:2883 Sunplus Innovation Technology Inc.

ProblemType: Bug
DistroRelease: Ubuntu 12.10
Package: linux-image (not installed)
ProcVersionSignature: Ubuntu 3.5.0-24.37-generic 3.5.7.4
Uname: Linux 3.5.0-24-generic x86_64
ApportVersion: 2.6.1-0ubuntu10
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: stephane 2793 F.... xfce4-volumed
                      stephane 2797 F.... pulseaudio
                      stephane 2876 F.... xfce4-mixer-plu
 /dev/snd/pcmC0D0p: stephane 2797 F...m pulseaudio
Date: Sat Feb 23 15:03:04 2013
HibernationDevice: RESUME=UUID=d05e8def-662d-4d3a-b3b3-4d9d2997966d
InstallationDate: Installed on 2012-12-27 (58 days ago)
InstallationMedia: Ubuntu 12.10 "Quantal Quetzal" - Release amd64 (20121017.5)
MachineType: ASUSTeK Computer Inc. N55SF
MarkForUpload: True
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=fr_FR.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.5.0-24-generic root=UUID=0ac65d3d-9ad8-4b02-8650-5df01d16640a ro quiet splash
PulseList:
 Error: command ['pacmd', 'list'] failed with exit code 1: Home directory /home/stephane not ours.
 No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-3.5.0-24-generic N/A
 linux-backports-modules-3.5.0-24-generic N/A
 linux-firmware 1.95
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 08/29/2011
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: N55SF.207
dmi.board.asset.tag: ATN12345678901234567
dmi.board.name: N55SF
dmi.board.vendor: ASUSTeK Computer Inc.
dmi.board.version: 1.0
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: ASUSTeK Computer Inc.
dmi.chassis.version: 1.0
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrN55SF.207:bd08/29/2011:svnASUSTeKComputerInc.:pnN55SF:pvr1.0:rvnASUSTeKComputerInc.:rnN55SF:rvr1.0:cvnASUSTeKComputerInc.:ct10:cvr1.0:
dmi.product.name: N55SF
dmi.product.version: 1.0
dmi.sys.vendor: ASUSTeK Computer Inc.

Stéphane Gourichon (stephane-gourichon-lpad) wrote :
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote : Re: Kernel crash and oops since 3.5.0-24-generic (3.5.0-23 ok).

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.8 kernel[0] (Not a kernel in the daily directory) and install both the linux-image and linux-image-extra .deb packages.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8-raring/

tags: added: regression-update
Changed in linux (Ubuntu):
importance: Undecided → High
status: Confirmed → Incomplete
Stéphane Gourichon (stephane-gourichon-lpad) wrote :

Hi Joseph,

Thank you for your relevant suggestion. Yes, I can test latest upstream kernel (actually I did that for another kernel bug on same machine after you asked https://bugs.launchpad.net/ubuntu/+source/linux/+bug/987056/comments/6 ).

To effectively confirm that changing kernel changes the problem it's wise to be able to reproduce the problem.

So I'll first continue usual usage patterns and figure out if some more specific circumstances (e.g. patterns of plugging usb devices) trigger the crash or oops. The fact is: both happened when I was away from the machine for a few minutes, that may be part of a pattern.

Then I'll report here and start trying latest upstream kernel.

Regards.

Stéphane Gourichon (stephane-gourichon-lpad) wrote :

Hi,

This week a new kernel became available for quantal, 3.5.0-25-generic, and I have been running it.
After 1 day 2:59 of uptime and usual activity, another oops occurred.

To help figure out a possible culprit, here is a summary of recent activity.
* using for days a heavily loaded firefox (total of 27 windows and 91 tabs, session restored after reboot to change kernel)
* opening a 16000x16000 JPEG, namely http://www.eso.org/public/archives/images/large/eso1309a.jpg
* plugging Galaxy Nexus smartphone on USB
* observing USB mouse suddenly not working, but mouse laser light still on
* unplugging-replugging USB mouse. Mouse laser light now off.
* observing system freeze for about 5 seconds (probably music, too)
* observing switch to text mode and oops text (like described in original bug report)
* switching VTs using Ctrl-Alt-Fn to regain X session access
* observing firefox taking 100% of one cpu core and memory 3.2GB (varying, 2.9GB after a while, still 100% cpu)
* not observing significant system slowdown or LED activity showing disk thrashing
* machine has 6GB physical RAM + 6GB swap, using 4.7GB RAM and virtually no swap (58MB)
* observing rhythmbox has stopped playing when oops occurred. Pausing/playing changes the "play/pause" status in rhythmbox window, but time indicator does not move and no sound produced.
* opening another firefox window to write this very report
* while I type, rhythmbox resumed playing (after about a minute without touching it)

On previous occurrences of the bug, Firefox was also heavily loaded and the Galaxy Nexus had recently been plugged in via USB.
Note that Firefox was loaded all week (though probably a bit less), and the phone is plugged and unplugged more than daily.

Attached dmesg from command line because /var/log/dmesg does not contain oops text.

I am considering trying the latest kernel soon.

Hope this helps. Thank you for your attention.

Stéphane Gourichon (stephane-gourichon-lpad) wrote :

Another oops yesterday evening with kernel 3.5.0-25-generic.
Took photograph of screen. Can attach if useful.
Could not restore session access or login on any text-mode tty. Had to force shut off.

It looks like the /var/log/dmesg* files do not record the oops, probably because they only hold the dmesg output captured just after boot.

$ ls -al /var/log/dmesg*

-rw-r----- 1 root adm 67739 mars 2 14:05 /var/log/dmesg
-rw-r----- 1 root adm 72992 mars 2 08:52 /var/log/dmesg.0
-rw-r----- 1 root adm 16475 févr. 28 10:52 /var/log/dmesg.1.gz
-rw-r----- 1 root adm 16973 févr. 23 22:58 /var/log/dmesg.2.gz
-rw-r----- 1 root adm 16890 févr. 22 18:26 /var/log/dmesg.3.gz
-rw-r----- 1 root adm 16671 févr. 21 16:48 /var/log/dmesg.4.gz

$ zgrep -i oops /var/log/dmesg* || echo no match
no match

I am considering trying the latest kernel soon, but the start of the holidays may delay this.

Stéphane Gourichon (stephane-gourichon-lpad) wrote :

Hi,

Just to add that the machine hits this oops regularly. I seldom switch it off (I normally suspend), but I often have to force a reboot when I cannot recover the console or even reach the machine over the network.

Here is a filtered result of reboot log:

zgrep "Linux version" syslog.*.gz syslog.1 syslog -h | grep -o '^[a-zA-Z]* *[0-9: ]*' | while read date ; do date --date="$date" --rfc-3339=date ; done | sort | uniq -c

      2 2013-03-02
      1 2013-03-04
      1 2013-03-05
      1 2013-03-10
      1 2013-03-11
      1 2013-03-13
      1 2013-03-21
      1 2013-03-22
      1 2013-03-27
      1 2013-03-28
      4 2013-03-29
      1 2013-03-30
      1 2013-04-01
      1 2013-04-02
      1 2013-04-08
      2 2013-04-09
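
The per-day reboot count above can also be produced without the shell pipeline. What follows is a minimal Python sketch, not the command that was actually run: the function name `boots_per_day`, the `year` parameter (classic syslog timestamps carry no year) and the sample lines are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime


def boots_per_day(lines, year):
    """Count kernel boot banners ("Linux version ...") per calendar day.

    `year` must be supplied by the caller because classic syslog
    timestamps such as "Mar  2 14:05:33" do not include one.
    """
    counts = Counter()
    for line in lines:
        if "Linux version" not in line:
            continue
        # The first two whitespace-separated fields are month and day.
        month, day = line.split()[:2]
        stamp = datetime.strptime(f"{year} {month} {day}", "%Y %b %d")
        counts[stamp.date().isoformat()] += 1
    return dict(sorted(counts.items()))


# Illustrative lines only; they are NOT copied from the reporter's syslog.
sample = [
    "Mar  2 08:52:10 n55sf-l kernel: [    0.000000] Linux version 3.5.0-25-generic",
    "Mar  2 14:05:33 n55sf-l kernel: [    0.000000] Linux version 3.5.0-25-generic",
    "Mar  4 09:12:01 n55sf-l kernel: [    0.000000] Linux version 3.5.0-25-generic",
    "Mar  4 09:12:02 n55sf-l kernel: [    1.234567] usb 1-1: new high-speed USB device",
]
print(boots_per_day(sample, 2013))
```

Fed with the real syslog files (the rotated ones can be read with `gzip.open`), this should give the same kind of per-day counts as the `uniq -c` listing, provided all entries fall in the supplied year.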

Cannot try latest kernel at the moment, will get back.

Stéphane Gourichon (stephane-gourichon-lpad) wrote :

Also observed crashes on suspend recently, including with linux-image-3.5.0-26-generic 3.5.0-26.42

Now automatically upgraded to 3.5.0-27. Wait and see.

Cannot try latest upstream kernel at the moment as instructed, will get back.

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
Changed in linux (Ubuntu):
status: Expired → Incomplete
Stéphane Gourichon (stephane-gourichon-lpad) wrote :

Bug still occurs from time to time. Applying regular updates (now running 3.5.0-32-generic #53-Ubuntu SMP Wed May 29 20:23:04 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux).

Will probably have time to test latest upstream kernel after July 8th.

Stéphane Gourichon (stephane-gourichon-lpad) wrote :

Here are some hints about frequency of occurrence.

( cd /var/log/ ; zgrep "Oops" $( ls -1b syslog syslog.1 syslog.?.gz syslog.??.gz syslog.???.gz | tac ; ) ; )
syslog.91.gz:Feb 23 14:53:20 n55sf-l kernel: [35002.379231] Oops: 0002 [#1] SMP
syslog.85.gz:Mar 1 13:47:35 n55sf-l kernel: [68865.856716] Oops: 0002 [#1] SMP
syslog.68.gz:Mar 28 12:54:24 n55sf-l kernel: [23266.494142] Oops: 0002 [#1] SMP
syslog.67.gz:Mar 29 20:21:43 n55sf-l kernel: [11758.414392] Oops: 0002 [#1] SMP
syslog.59.gz:Apr 8 21:33:06 n55sf-l kernel: [ 5516.858854] Oops: 0002 [#1] SMP
syslog.58.gz:Apr 9 12:01:16 n55sf-l kernel: [14994.873453] Oops: 0002 [#1] SMP
syslog.58.gz:Apr 9 21:38:43 n55sf-l kernel: [34407.716338] Oops: 0002 [#1] SMP
syslog.57.gz:Apr 10 16:51:33 n55sf-l kernel: [30054.201816] Oops: 0002 [#1] SMP
syslog.31.gz:May 9 08:26:27 n55sf-l kernel: [184444.450319] Oops: 0002 [#1] SMP
syslog.26.gz:May 14 15:11:42 n55sf-l kernel: [268666.886048] Oops: 0002 [#1] SMP
syslog.18.gz:May 22 09:51:16 n55sf-l kernel: [328070.610715] Oops: 0002 [#1] SMP
syslog.18.gz:May 22 18:16:48 n55sf-l kernel: [14544.693352] Oops: 0002 [#1] SMP
syslog.10.gz:May 30 22:50:54 n55sf-l kernel: [68868.692654] Oops: 0002 [#1] SMP
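
The bracketed number in each line is the kernel timestamp, roughly seconds since boot, so the same grep output also hints at how long the machine had been up when each oops hit. A minimal Python sketch of that reading follows; `OOPS_RE` and `oops_events` are hypothetical names, and the three sample lines are copied from the listing above.

```python
import re

# Matches an "Oops:" syslog line, with or without the "file:" prefix that
# zgrep adds when it searches several files.
OOPS_RE = re.compile(
    r"^(?:\S+:)?(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d+)\s+\S+\s+\S+\s+kernel:\s+"
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+Oops:"
)


def oops_events(lines):
    """Return (month, day, uptime_hours) for every Oops line found."""
    events = []
    for line in lines:
        match = OOPS_RE.match(line)
        if match:
            hours = float(match["uptime"]) / 3600.0
            events.append((match["month"], int(match["day"]), round(hours, 1)))
    return events


sample = [
    "syslog.91.gz:Feb 23 14:53:20 n55sf-l kernel: [35002.379231] Oops: 0002 [#1] SMP",
    "syslog.59.gz:Apr 8 21:33:06 n55sf-l kernel: [ 5516.858854] Oops: 0002 [#1] SMP",
    "syslog.18.gz:May 22 09:51:16 n55sf-l kernel: [328070.610715] Oops: 0002 [#1] SMP",
]
for month, day, hours in oops_events(sample):
    print(f"{month} {day}: oops after about {hours} h of kernel uptime")
```

On these three lines the kernel uptime at the oops ranges from under two hours to several days, which suggests the trigger is an event rather than accumulated uptime. That is an interpretation, not something the logs prove.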

May 30 22:50:54 n55sf-l kernel: [68868.692631] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
May 30 22:50:54 n55sf-l kernel: [68868.692642] IP: [<ffffffff814d86d3>] xhci_free_dev+0x63/0x160
May 30 22:50:54 n55sf-l kernel: [68868.692650] PGD 0
May 30 22:50:54 n55sf-l kernel: [68868.692654] Oops: 0002 [#1] SMP
May 30 22:50:54 n55sf-l kernel: [68868.692659] CPU 2
May 30 22:50:54 n55sf-l kernel: [68868.692661] Modules linked in: bbswitch(O) pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) rfcomm parport_pc ppdev bnep binfmt_misc nls_iso8859_1 dm_crypt arc4 ath9k mac80211 uvcvideo snd_hda_codec_realtek snd_hda_intel videobuf2_core ath9k_common videodev ath9k_hw snd_hda_codec videobuf2_vmalloc snd_hwdep snd_pcm videobuf2_memops ath ath3k snd_seq_midi joydev btusb snd_rawmidi bluetooth snd_seq_midi_event snd_seq kvm_intel snd_timer mei snd_seq_device kvm mac_hid asus_nb_wmi lpc_ich asus_wmi snd cfg80211 mxm_wmi soundcore sparse_keymap microcode psmouse wmi snd_page_alloc serio_raw coretemp lp parport usb_storage hid_generic usbhid hid ghash_clmulni_intel cryptd i915 drm_kms_helper ahci libahci atl1c drm i2c_algo_bit video
May 30 22:50:54 n55sf-l kernel: [68868.692733]
May 30 22:50:54 n55sf-l kernel: [68868.692735] Pid: 46, comm: khubd Tainted: G W O 3.5.0-31-generic #52-Ubuntu ASUSTeK Computer Inc. N55SF/N55SF
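
The two most telling fields in this excerpt are the faulting instruction pointer (`xhci_free_dev`, a function of the xHCI, i.e. USB3, host controller driver) and the task that was running (`khubd`, the USB hub kernel thread). A minimal Python sketch for pulling those fields out of such an excerpt is given below; `summarize_oops` and both regular expressions are hypothetical helpers written for this note, and the sample lines are copied from the excerpt above.

```python
import re

# "IP: [<address>] symbol+0xoffset/0xsize" as printed by 3.x kernels.
IP_RE = re.compile(
    r"IP: \[<(?P<addr>[0-9a-f]+)>\] "
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)
# "comm: <task name>" from the "Pid: ..." line.
COMM_RE = re.compile(r"comm: (?P<comm>\S+)")


def summarize_oops(lines):
    """Extract the faulting symbol, offset, function size and task name."""
    summary = {}
    for line in lines:
        ip = IP_RE.search(line)
        if ip:
            summary["symbol"] = ip["symbol"]
            summary["offset"] = int(ip["offset"], 16)
            summary["size"] = int(ip["size"], 16)
        comm = COMM_RE.search(line)
        if comm:
            summary["comm"] = comm["comm"]
    return summary


sample = [
    "May 30 22:50:54 n55sf-l kernel: [68868.692642] IP: [<ffffffff814d86d3>] xhci_free_dev+0x63/0x160",
    "May 30 22:50:54 n55sf-l kernel: [68868.692735] Pid: 46, comm: khubd Tainted: G W O 3.5.0-31-generic #52-Ubuntu ASUSTeK Computer Inc. N55SF/N55SF",
]
print(summarize_oops(sample))
```

Running the same extraction over every oops in the rotated syslogs would show whether all of them fault in the same function; this report only quotes the trace from May 30.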

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
Stéphane Gourichon (stephane-gourichon-lpad) wrote :

Bugs still observed though not reported every time, including several times in July with kernel panic.

One thing noticed : the laptop has 4 USB ports.
On the left two USB3 ports (blue colored with USB SuperSpeed logo).
On the right two USB2 ports (black with regular USB2 logo).

The problem happens when plugging devices into one of the USB3 ports.
No problem happens when plugging the devices into one of the USB2 ports.

Currently running on
3.5.0-36-generic #57-Ubuntu SMP Wed Jun 19 15:10:49 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Changed in linux (Ubuntu):
status: Expired → Incomplete
summary: - Kernel crash and oops since 3.5.0-24-generic (3.5.0-23 ok).
+ USB3 ports causes kernel crash and oops since 3.5.0-24-generic (3.5.0-23
+ ok), USB2 OK.
summary: - USB3 ports causes kernel crash and oops since 3.5.0-24-generic (3.5.0-23
+ USB3 ports cause kernel crash and oops since 3.5.0-24-generic (3.5.0-23
ok), USB2 OK.
Stéphane Gourichon (stephane-gourichon-lpad) wrote :

Also, I don't use or have any USB3 devices.
I do use USB2 devices (external hard drive enclosures, Samsung Galaxy Nexus) and USB 1.1 devices (mouse) -- checked with lsusb -v and bcdUSB field.

So to summarize :
* the tested crash scenario is "plugging a USB2 device in a USB*3* port"
* the tested OK scenario is "plugging a USB2 (or 1.1) device in a USB*2* port"
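
For anyone wanting to repeat the bcdUSB check on their own devices, here is a minimal Python sketch that reads the text printed by `lsusb -v` and maps each device ID to its bcdUSB value. The helper name `usb_versions` is hypothetical, and the sample text is an abbreviated, made-up excerpt with placeholder IDs, not output from this laptop.

```python
import re

HEADER_RE = re.compile(r"^Bus (\d+) Device (\d+): ID ([0-9a-f]{4}:[0-9a-f]{4})")
BCD_RE = re.compile(r"^\s*bcdUSB\s+(\d+\.\d+)")


def usb_versions(lsusb_verbose_text):
    """Map 'vendor:product' to the first bcdUSB value listed for it."""
    versions = {}
    current = None
    for line in lsusb_verbose_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(3)
            continue
        bcd = BCD_RE.match(line)
        # Keep only the first bcdUSB per device (the device descriptor);
        # later descriptor blocks may repeat the field.
        if bcd and current is not None and current not in versions:
            versions[current] = bcd.group(1)
    return versions


# Made-up, abbreviated excerpt in the layout of `lsusb -v`; IDs are placeholders.
SAMPLE = """\
Bus 003 Device 002: ID 1234:5678 Example Mouse
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               1.10
Bus 003 Device 003: ID 1234:abcd Example Disk Enclosure
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
"""
print(usb_versions(SAMPLE))
```

In practice the text could come from `subprocess.run(["lsusb", "-v"], capture_output=True, text=True).stdout`; running `lsusb -v` as root usually gives more complete descriptors.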

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
Stéphane Gourichon (stephane-gourichon-lpad) wrote :

Still happens on 3.5.0-41-generic.

I have recently observed a variant (and the symptoms are active now on the laptop I'm writing with).
When bug happens, plugging a device on one of the two USB3 ports powers the device but Linux does not detect anything (nothing in dmesg).
Plugging the same device on one of the two USB2 ports (which never suffered the problem) works normally as it always did.

Here's the output of lsusb when all 4 USB ports are free (nothing plugged):

Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 044: ID 0cf3:3005 Atheros Communications, Inc. AR3011 Bluetooth
Bus 001 Device 004: ID 1bcf:2883 Sunplus Innovation Technology Inc.
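
To see at a glance which buses exist and at what USB version, the root hubs can be pulled out of this listing. A minimal Python sketch follows; `root_hubs` is a hypothetical helper and the input is the listing above, copied verbatim. Which bus each physical port maps to is not stated in this report, so the sketch only lists the hubs.

```python
import re

ROOT_HUB_RE = re.compile(
    r"^Bus (?P<bus>\d+) Device \d+: ID 1d6b:\d{4} Linux Foundation "
    r"(?P<version>\d\.\d) root hub$"
)


def root_hubs(lsusb_text):
    """Map bus number to the USB version of its root hub."""
    hubs = {}
    for line in lsusb_text.splitlines():
        match = ROOT_HUB_RE.match(line.strip())
        if match:
            hubs[int(match["bus"])] = match["version"]
    return hubs


LSUSB_IDLE = """\
Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 044: ID 0cf3:3005 Atheros Communications, Inc. AR3011 Bluetooth
Bus 001 Device 004: ID 1bcf:2883 Sunplus Innovation Technology Inc.
"""
print(root_hubs(LSUSB_IDLE))
```

The listing shows three 2.0 root hubs and a single 3.0 root hub (bus 004).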

Regards

Changed in linux (Ubuntu):
status: Expired → New
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired