rt2x00 oopses in 2.6.26-4, regression against 2.6.24-3

Bug #249242 reported by Christoph Orsinger on 2008-07-16
Affects: linux (Ubuntu)
Importance: High
Assigned to: Colin Ian King

Bug Description

I have an Edimax EW-7318UG WLAN USB stick that works fine with linux-2.6.26-3 and older.
Under 2.6.26-4 the kernel oopses as soon as the stick is plugged in.

uname -r: 2.6.26-4-generic

Oops log:
[ 1144.232094] usb 3-5: new high speed USB device using ehci_hcd and address 3
[ 1144.503721] usb 3-5: configuration #1 chosen from 1 choice
[ 1145.184628] phy0 -> rt2500usb_init_eeprom: Error - Invalid RT chipset detected.
[ 1145.184646] phy0 -> rt2x00lib_probe_dev: Error - Failed to allocate device.
[ 1145.184690] BUG: unable to handle kernel NULL pointer dereference at 00000010
[ 1145.184693] IP: [<c013cd9a>] flush_workqueue+0xa/0x50
[ 1145.184703] *pde = 00000000
[ 1145.184709] Oops: 0000 [#1] SMP
[ 1145.184713] Modules linked in: rt2500usb(+) rt2x00usb rt2x00lib rfkill led_class input_polldev mac80211 cfg80211 ipv6 binfmt_misc rfcomm l2cap bluetooth ppdev cpufreq_conservative cpufreq_powersave cpufreq_ondemand cpufreq_stats freq_table cpufreq_userspace sbs container sbshc video output battery af_packet iptable_filter ip_tables x_tables ac parport_pc lp parport nvidia(P) snd_via82xx gameport snd_ac97_codec ac97_bus snd_mpu401_uart snd_seq_dummy snd_pcsp snd_pcm_oss snd_mixer_oss snd_seq_oss snd_pcm snd_page_alloc snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device via_ircc button snd irda crc_ccitt soundcore i2c_viapro i2c_core shpchp pci_hotplug via_agp agpgart evdev ext3 jbd mbcache usb_storage usbhid hid sg libusual sr_mod sd_mod cdrom pata_acpi ata_generic pata_via uhci_hcd libata scsi_mod dock ehci_hcd ohci_hcd tulip usbcore thermal processor fan fbcon tileblit font bitblit softcursor uvesafb cn fuse
[ 1145.184771]
[ 1145.184775] Pid: 5582, comm: modprobe Tainted: P (2.6.26-4-generic #1)
[ 1145.184779] EIP: 0060:[<c013cd9a>] EFLAGS: 00010246 CPU: 0
[ 1145.184783] EIP is at flush_workqueue+0xa/0x50
[ 1145.184786] EAX: 00000000 EBX: d5ebd0a0 ECX: 00000096 EDX: 00000000
[ 1145.184789] ESI: c047b4f8 EDI: 00000000 EBP: de6fca00 ESP: dec21e44
[ 1145.184791] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 1145.184795] Process modprobe (pid: 5582, ti=dec20000 task=df883400 task.ti=dec20000)
[ 1145.184798] Stack: d5ebd0a0 d5ebd0a0 ffffffed e0cbaf48 00000000 e0cbb056 e0cbd7d0 d5ebc0f0
[ 1145.184804] e0cbd620 e0cbd744 00000000 d5ebd0a0 d5ebc1a0 e0c91307 dee34360 e0cc96ec
[ 1145.184810] c0361a38 d2a01c00 d2a01c00 00000000 de6fca00 e0cc96ec e0cc94a0 e08a6151
[ 1145.184816] Call Trace:
[ 1145.184821] [<e0cbaf48>] rt2x00lib_remove_dev+0x38/0x60 [rt2x00lib]
[ 1145.184838] [<e0cbb056>] rt2x00lib_probe_dev+0xe6/0x1b0 [rt2x00lib]
[ 1145.184850] [<e0c91307>] rt2x00usb_probe+0xe7/0x170 [rt2x00usb]
[ 1145.184860] [<c0361a38>] mutex_lock+0x8/0x20
[ 1145.184871] [<e08a6151>] usb_probe_interface+0xa1/0x110 [usbcore]
[ 1145.184917] [<c029ac60>] really_probe+0x60/0x180
[ 1145.184926] [<e08a5441>] usb_match_id+0x41/0x60 [usbcore]
[ 1145.184943] [<e08a5680>] usb_device_match+0x40/0x80 [usbcore]
[ 1145.184961] [<c029ae51>] __driver_attach+0x71/0x80
[ 1145.184967] [<c029a5c4>] bus_for_each_dev+0x44/0x70
[ 1145.184977] [<c029ab16>] driver_attach+0x16/0x20
[ 1145.184981] [<c029ade0>] __driver_attach+0x0/0x80
[ 1145.184985] [<c0299f57>] bus_add_driver+0x1a7/0x220
[ 1145.184996] [<c029afec>] driver_register+0x5c/0x130
[ 1145.185007] [<e08a63f1>] usb_register_driver+0x81/0x100 [usbcore]
[ 1145.185027] [<c0152ab8>] sys_init_module+0x88/0x1b0
[ 1145.185037] [<c0103f73>] sysenter_past_esp+0x78/0xb1
[ 1145.185055] =======================
[ 1145.185057] Code: 90 8d 50 10 e9 78 fe ff ff 90 8d b4 26 00 00 00 00 31 d2 e9 69 fe ff ff 89 f6 8d bc 27 00 00 00 00 57 89 c7 56 be f8 b4 47 c0 53 <8b> 58 10 b8 f0 b4 47 c0 85 db 0f 45 f0 e8 54 46 22 00 89 f0 e8
[ 1145.185083] EIP: [<c013cd9a>] flush_workqueue+0xa/0x50 SS:ESP 0068:dec21e44
[ 1145.185090] ---[ end trace 591fddf59e09f337 ]---

With linux-2.6.26-3-generic, dmesg shows this instead:
[ 114.997119] usb 3-5: new high speed USB device using ehci_hcd and address 2
[ 115.268033] usb 3-5: configuration #1 chosen from 1 choice
[ 115.617564] phy0 -> rt2500usb_init_eeprom: Error - Invalid RT chipset detected.
[ 115.617579] phy0 -> rt2x00lib_probe_dev: Error - Failed to allocate device.
[ 115.617661] usbcore: registered new interface driver rt2500usb
[ 115.922985] phy1: Selected rate control algorithm 'pid'
[ 116.030008] Registered led device: rt73usb-phy1:radio
[ 116.030046] Registered led device: rt73usb-phy1:assoc
[ 116.030069] Registered led device: rt73usb-phy1:quality
[ 116.030975] usbcore: registered new interface driver rt73usb
[ 116.084534] firmware: requesting rt73.bin
[ 116.256835] ADDRCONF(NETDEV_UP): wlan0: link is not ready

lsusb:
Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 004 Device 002: ID 046d:c012 Logitech, Inc. Mouseman Dual Optical
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 003 Device 002: ID 148f:2573 Ralink Technology, Corp. RT2501USB Wireless Adapter
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub

Is any additional information needed?

Changed in linux:
assignee: nobody → ubuntu-kernel-team
importance: Undecided → High
status: New → Triaged
Changed in linux:
assignee: ubuntu-kernel-team → colin-king
status: Triaged → In Progress
Colin Ian King (colin-king) wrote :

Hi,

I've examined the OOPS. It occurs when the driver attempts to create a workqueue and this fails; the error path then tries to free the workqueue anyway, which causes the OOPS.

I was wondering whether this problem occurs every time, or whether it was a transient problem caused by a lack of resources on the one occasion you tried the Edimax EW-7318UG WLAN USB stick. Could you insert the USB stick again and let me know whether the same OOPS occurs and is always repeatable?

Thanks. Colin.

description: updated
Christoph Orsinger (c-orsinger) wrote :

Hi Colin,

Yes, the bug is reproducible. (That's why I wrote this bug report in the first place :) )
The new 2.6.26-5-generic kernel oopses as well. The new oops message doesn't really differ from the original except for some different memory addresses. (Well, at least I think those are memory addresses.)

I managed to work around the problem by blacklisting the rt2500usb driver.
But why is this driver loaded in the first place? I'm a little confused here. As an experiment I ran rmmod rt2500usb under 2.6.26-3 and didn't notice any difference. It seems this driver isn't used at all. (Is there any way to verify this?)

Short:
2.6.26-3-generic rt2500usb and rt73usb are loaded --> works
2.6.26-4-generic and 2.6.26-5-generic rt2500usb is loaded --> oopses
2.6.26-4-generic and 2.6.26-5-generic with blacklisted rt2500usb --> works

Regards,
Christoph

Christoph Orsinger (c-orsinger) wrote :

Update:

It seems there is no bug at all.

I just downloaded the latest daily live CD (kernel 2.6.26-5) and retested my WLAN adapter on my housemate's PC. I couldn't reproduce the error there.
Out of curiosity, I booted my own PC with the live CD. Again, no OOPS occurred.

So my hard disk installation must have gotten corrupted somehow.
It didn't take me long to find a possible cause.
free:
             total
Mem:        514544   (an odd value, about 502M; it should be 512M)

Some traces in dmesg:
[ 0.615711] system 00:00: iomem range 0xd0000-0xd3fff has been reserved
[ 0.615716] system 00:00: iomem range 0xf0000-0xf7fff could not be reserved
[ 0.615719] system 00:00: iomem range 0xf8000-0xfbfff could not be reserved
[ 0.615722] system 00:00: iomem range 0xfc000-0xfffff could not be reserved
[ 0.615726] system 00:00: iomem range 0x1fff0000-0x1fffffff could not be reserved
[ 0.615729] system 00:00: iomem range 0xfec00000-0xfec00fff has been reserved
[ 0.615733] system 00:00: iomem range 0xffee0000-0xffef2fff has been reserved
[ 0.615736] system 00:00: iomem range 0xffef4000-0xffef8fff has been reserved
[ 0.615740] system 00:00: iomem range 0xffefa000-0xffefafff has been reserved
[ 0.615743] system 00:00: iomem range 0xffefc000-0xffefffff has been reserved
[ 0.615746] system 00:00: iomem range 0xffff0000-0xffffffff could not be reserved
[ 0.615750] system 00:00: iomem range 0x0-0x9ffff could not be reserved
[ 0.615753] system 00:00: iomem range 0x100000-0x1ffeffff could not be reserved
[ 0.615756] system 00:00: iomem range 0xfee00000-0xfee00fff has been reserved
[ 0.615760] system 00:00: iomem range 0xfff80000-0xfffeffff has been reserved

I ran memtest afterwards (from the live CD, to be sure).
It crashes repeatedly after ~58000 errors in test #2 "Moving Inversions, ones & zeros", regardless of which of my two memory modules is inserted.

I suspect a hardware error in the memory controller.

        Christoph

Colin Ian King (colin-king) wrote :

The bug occurs when the driver tries to allocate some memory and fails to do so, causing the oops. Maybe your system hit this error path, which is not reached 99.999% of the time.

I will put a fix in to catch this corner case anyway to make the driver more robust.

Colin

Changed in linux:
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.26-5.14

---------------
linux (2.6.26-5.14) intrepid; urgency=low

  [ Ben Collins ]

  * SAUCE: applesmc: Add MacBookAir
  * build: Do not build ddeb unless we are on the buildd
  * build: control: Consistency in arch fields.
  * SAUCE: Update toshiba_acpi.c to version 0.19a
    - LP: #77026
  * build: Added perm blacklist support and per-module support to abi-check
    - Blacklist p80211 module from abi checks
  * ubuntu/lirc: Get rid of drivers symlink and use real include stuff

  [ Colin Ian King ]

  * SAUCE: acerhk module - add support for Amilo A1650g keyboard
    - LP: #84159
  * SAUCE: rt2x00: Fix OOPS on failed creation of rt2x00lib workqueue
    - LP: #249242

  [ Mario Limonciello ]

  * Add LIRC back in

  [ Tim Gardner ]

  * Makefile race condition can lead to ndiswrapper build failure
    - LP: #241547
  * update linux-wlan-ng (prism2_usb) to upstream version 1861
    - LP: #245026

  [ Upstream Kernel Changes ]

  * Fix typos from signal_32/64.h merge

 -- Ben Collins <email address hidden> Fri, 01 Aug 2008 00:05:01 -0400

Changed in linux:
status: Fix Committed → Fix Released