radio_maestro module generates kernel oops and crash

Bug #357724 reported by Ian McMichael
This bug affects 7 people
Affects          Status        Importance  Assigned to     Milestone
linux (Ubuntu)   Fix Released  High        Unassigned
Jaunty           Fix Released  High        Colin Ian King

Bug Description

I'm currently testing the Jaunty beta on a Compaq Armada M300 laptop. As it does not have a floppy or optical device I'm booting over PXE and using the netboot installer. Due to its lack of horsepower I have only verified this bug on the Xubuntu and Ubuntu Netbook Remix flavours of 9.04 so far.

Unless I add "blacklist radio_maestro" to /etc/modprobe.d/blacklist.conf at the end of the installation procedure the system will hang during start-up after a reboot. It is very difficult to collect information about this crash. Occasionally I can get to a root prompt via recovery mode and a few CTRL-C and CTRL-ALT-DEL at strategic points during start-up.

What I have seen in dmesg is that after the radio_maestro module is loaded the kernel logs around 15-20 oops messages related to SMP, which is odd as this machine only has a single (very slow) processor core. The filesystem mounts read-only with the module loaded, which means I cannot save anything anywhere. After a while I get lots of segmentation faults and the system hangs. I can reboot it with the magic SysRq keystrokes (raising elephants, etc...) but then the information about the error is lost.

With my blacklist workaround in place the system functions perfectly including the ESS Technology ES1978 Maestro 2E built-in audio. I was previously running Xubuntu 8.10 on this system without similar issues.

Please let me know if I can collect any other information to assist in fixing this issue.

Ian.

ProblemType: Bug
Architecture: i386
CurrentDmesg:
 [ 40.384314] ADDRCONF(NETDEV_UP): eth0: link is not ready
 [ 40.388302] e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
 [ 40.397082] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
 [ 50.852045] eth0: no IPv6 routers present
 [ 99.301569] ondemand governor failed, too long transition latency of HW, fallback to performance governor
DistroRelease: Ubuntu 9.04
Lsusb: Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: Compaq Armada M300
Package: linux-image-2.6.28-11-generic 2.6.28-11.41
ProcCmdLine: root=UUID=4af0312c-560c-47fe-9d75-9930d3cd54c0 ro quiet splash quiet
ProcEnviron:
 PATH=(custom, no user)
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.28-11.41-generic
SourcePackage: linux

Leann Ogasawara (leannogasawara) wrote :

Hi Ian,

I don't suppose you'd be able to take a digital photo of the very first oops that occurs (and particularly be able to capture the beginning of the oops)? Could you attach that photo to this bug report? You also mentioned you are able to blacklist the module which allows the boot to succeed. Just curious what happens if you then load the module afterwards? Do you get an oops as well? Thanks.

Changed in linux (Ubuntu):
status: New → Incomplete

Ian McMichael (ian-sigma-uk) wrote :

Why didn't I think of anything as obvious as a photograph? I was looking for a high-tech way of collecting the information! I've attached one as requested.

Loading the radio_maestro module after booting with it blacklisted appears to work fine. I have no idea what it does for me but the system is stable and operates normally after loading it. Presumably it is an order thing during startup? Maybe it just needs to be loaded after snd_es1968?

Here's the only output I get when loading the module:

root@m300:~# modprobe radio_maestro
WARNING: All config files need .conf: /etc/modprobe.d/oss-compat, it will be ignored in a future release.
root@m300:~# lsmod | grep maestro
radio_maestro 14724 0
videodev 41600 1 radio_maestro

Let me know if there is any other information you need or tests you'd like me to run.

Ian.

Changed in linux (Ubuntu):
importance: Undecided → High
status: Incomplete → Triaged
tags: added: kernel-oops

Ian McMichael (ian-sigma-uk) wrote :

I have confirmed that it is the order in which the modules get loaded that causes the kernel to crash. I've blacklisted both snd_es1968 and radio_maestro to bring the system up with no audio support. If I then modprobe radio_maestro first, the kernel immediately hangs and locks the system solid. When I load snd_es1968 first, I can then load the radio_maestro module and the system is stable with working audio.

Simon Schneider (schneida-simon) wrote :

I can confirm this bug on an Armada E500. After blacklisting radio_maestro, booting via PXE works fine for me. I didn't have any trouble while using Ubuntu 8.04.

Architecture: i386
DistroRelease: Ubuntu 9.04
MachineType: Compaq Armada E500
Kernel: 2.6.28-11-generic

kaliffo (7repip) wrote :

The same happens to me on a Compaq Armada E500 with kernel 2.6.28-11-generic: kernel panic during the boot process.
After blacklisting radio_maestro I get a normal boot.
I didn't have any trouble while using the previous version of the kernel (2.6.27.... Ubuntu 8.10).

eboy (eboy) wrote :

I can also confirm this on a Compaq Armada M700 PIII-1000 after upgrading from Xubuntu 8.10 to Xubuntu 9.04: during boot a number of oops messages are displayed; later on segmentation faults occur, after which the system hangs. Following the solution in this bug report, I blacklisted radio_maestro, which resolved all of these issues. Modprobing radio_maestro later on (after snd_es1968 was loaded) also worked.

Another solution that worked for me was to use the 2.6.27-11-generic kernel image that was still present on my system after the upgrade from 8.10 to 9.04. However, I much prefer the blacklist solution as a temporary workaround.

Architecture: i386
DistroRelease: Xubuntu 9.04
MachineType: Compaq Armada M700 PIII-1000
Kernel: 2.6.28-11-generic

Paolo (paolo-notari) wrote :

Hello,

I'm experiencing the same problem on an Armada M700 PIII 700 with ESS Maestro 2E (rev 10).
Thank you very much for the instructions on how to fix it.

The live CD doesn't boot 97%;
the alternate install worked 100%, but booting from the PC didn't work until I blacklisted radio_maestro.

After boot, I can modprobe radio_maestro without problems (snd_es1968 has already been loaded automatically).

After all that, I can't hear any sound. Sound worked perfectly in every previous release from 5.10 until now.

I also tried to boot forcing ACPI, but it made no difference.

Paolo

DoomWarrior (doomwarriorx) wrote :

I also had the problem described here on a Toshiba Satellite 4100XCDT.

After blacklisting radio_maestro the system is bootable again (like on 8.10).

Bitrot (yoyo42) wrote :

I have a Compaq Armada M700 PIII-800 which booted fine with 9.04 using kernel 2.6.27-11, failed to boot with 2.6.28-11 and now it boots successfully again with 2.6.30-rc4.

I do have the radio_maestro module enabled and loaded, NOT blacklisted. This laptop doesn't have built-in wireless. I don't know if it was an optional extra when new, or if the radio is just an optional bit on the ESS sound card which Compaq never used and the driver doesn't expect to be missing.

Either way, I seem to be booting and running fine for a day or so now with kernel 2.6.30-rc4.

If you want dmesg or logs let me know.

Golden Tiger (goldentiger24) wrote :

Compaq Armada M700, 700 MHz model. Blacklisting radio_maestro works, and I still have full sound functionality. My laptop does not have built-in wireless.

Paolo (paolo-notari) wrote :

I can get sound working by blacklisting radio_maestro, but I found no way to:

1) get the microphone working
2) get aumix working without taking all the CPU
3) keep volume settings stable across reboots.

I got all this working properly in ALSA by eliminating PulseAudio:

http://idyllictux.wordpress.com/2009/04/21/ubuntu-904-jaunty-keeping-the-beast-pulseaudio-at-bay/

Colin Ian King (colin-king) wrote :

@Ian McMichael

I'd like you to reproduce the kernel oops and produce more of the stack dump for me so that I can debug this a little further.

1) Blacklist both snd_es1968 and radio_maestro
2) Reboot
3) Change to console #1 (control-alt-f1) and login
4) Set to a smaller font (so that we can capture more of the Oops message):
    setfont /usr/share/consolefonts/Uni1-VGA8.psf.gz
    clear
5) sudo modprobe radio_maestro
6) Take a photo and attach the jpeg.

Thanks!

Changed in linux (Ubuntu):
assignee: nobody → Colin King (colin-king)
status: Triaged → Incomplete

Ian McMichael (ian-sigma-uk) wrote :

@Colin King

I've re-created the panic as requested and it looks like the whole stack dump fits on the screen this way. The JPEG is attached. I'm going to be away for a few days but feel free to ask for anything else I can be of assistance with and I'll get it to you when I return.

Changed in linux (Ubuntu):
status: Incomplete → In Progress

Colin Ian King (colin-king) wrote :

@Ian,

I've put some debug kernels at: http://people.ubuntu.com/~cking/jaunty-357724/

Can you install the appropriate one, then boot the machine and re-test as per the instructions in comment 13: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/357724/comments/13

Thanks

Changed in linux (Ubuntu):
status: In Progress → Incomplete

Aaronious (regisupp) wrote :

I am very new to Linux, or at least to toying around with it. I have an Armada M700 and I am having the same boot problems. How do I blacklist?

Erik (lnchpd-elaan) wrote :

@ Aaronious,

cd /etc/modprobe.d
sudo cp blacklist.conf blacklist.conf.backup
sudoedit blacklist.conf
- add a line "blacklist radio_maestro"
- add a line "blacklist snd_es1968"
save and exit.

Erik (lnchpd-elaan) wrote :

Colin,

I have an Armada E500 and had the same problem. I have just booted the generic debug kernel. After the modprobe command I get an oops, but the prompt is returned, so I can send you the output of dmesg. If you need more, please tell me what. I have to reboot to the 2.6.28-11-generic kernel to get networking because I have not yet recompiled the driver for my WiFi USB stick for this debug kernel. If you need more info, do you also need a new dmesg, since they need to be from the same boot?

TIA, Erik.

Ian McMichael (ian-sigma-uk) wrote :

@Colin King

Sorry for the delay in getting this to you. As requested, I booted from the kernel in linux-image-2.6.28-13-generic_2.6.28-13.44_i386.deb and have attached a new stack dump. Again I'm going to be away for a few days but after that you'll have my solid attention for the majority of June...

linhurst (linhurst) wrote :

I have an old Compaq Armada E700. Jaunty (Ubuntu & Xubuntu) & Linux Mint Gloria live CDs will not boot on the E700. I have used the same live CDs to install elsewhere so I know they are all good.

The E700 worked well with Intrepid (Ubuntu, Xubuntu), Hardy (Ubuntu) and Feisty (Ubuntu). I (rather recklessly) used the Update feature in Intrepid Ubuntu to upgrade to Jaunty and ended up with a system that did not boot, well out of my depth. Fortunately I had lost no data, so I took the opportunity to try out Intrepid Xubuntu, which I will stick with for now as it works OK for me.

My Compaq Armada E700 is beyond all but basic use, so I would be happy to experiment. I would be happy to spend some time following instructions to explore this further if this would be helpful.

GertjanVD (gertjanvd) wrote :

I also have an Armada E500 and had the same kernel oops in jaunty and did not have that problem with hardy.

I think I've found a possible problem with the radio-maestro module. With this patch I did not have a kernel oops and could boot without adding any modules to /etc/modprobe.d/blacklist.conf.

The problem seems to be that video_device_release is called twice (once in video_unregister_device, and once at the errfr1 label). One of the last commits in 2.6.28 for this module was "add all missing video_device_release callbacks". As a result, if "radio_power_on(radio_unit)" fails, this function is called twice. In kernel version 2.6.30 this module was converted to a v4l2_device and the problem is probably fixed there.

However I didn't compile a debug version, I only used info that was available here, so I am not entirely sure this solves the problem. Can someone confirm this is correct and really fixes the problem?

Colin Ian King (colin-king) wrote :

@GertjanVD,

Thanks for the patch and this observation. Are you 100% sure that video_device_release() is being called twice? I'm unclear that it's being called in the video_unregister_device() call. I will look into this, but if you can clarify this, it would help. Thanks! Colin

GertjanVD (gertjanvd) wrote :

I am not 100% sure of this, but here is what I found.

I checked video_unregister_device, which calls device_unregister.

In drivers/base/core.c:
 * device_unregister - unregister device from system.
 * We do this in two parts, like we do device_register(). First,
 * we remove it from all the subsystems with device_del(), then
 * we decrement the reference count via put_device(). If that
 * is the final reference count, the device will be cleaned up
 * via device_release() above. Otherwise, the structure will
 * stick around until the final reference to the device is dropped.

and

 * device_release - free device structure.
 * This is called once the reference count for the object
 * reaches 0. We forward the call to the device's release
 * method, which should handle actually freeing the structure.

In device_release:
        if (dev->release)
                dev->release(dev);

I think this causes the release to be called twice if the device fails to register. Is this correct?
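The call chain described above can be modelled in plain userspace C. This is only a sketch for illustration, not code from the radio-maestro driver: `toy_device`, `toy_put_device`, `toy_unregister` and `toy_probe_error_path` are hypothetical names, and the release callback only counts its invocations instead of freeing anything, so the double call can be observed safely.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the kernel's struct device: a reference count plus a
 * release callback. All names here are hypothetical, not kernel APIs. */
struct toy_device {
    int refcount;
    int release_calls;                        /* how often release() ran */
    void (*release)(struct toy_device *dev);
};

/* Stand-in for the driver's release method. It only counts calls, so the
 * double release can be observed without freeing memory twice. */
void toy_release(struct toy_device *dev)
{
    dev->release_calls++;
}

/* Mirrors the put_device() -> device_release() chain quoted above:
 * dropping the final reference forwards to dev->release(). */
void toy_put_device(struct toy_device *dev)
{
    assert(dev->refcount > 0);
    if (--dev->refcount == 0 && dev->release != NULL)
        dev->release(dev);
}

/* Stand-in for video_unregister_device(): drops the registration
 * reference, which here is also the last one. */
void toy_unregister(struct toy_device *dev)
{
    toy_put_device(dev);
}

/* Error path as described in this thread: unregister first, then fall
 * through to an explicit release. Returns the number of release calls. */
int toy_probe_error_path(void)
{
    struct toy_device dev = {
        .refcount = 1,
        .release_calls = 0,
        .release = toy_release,
    };

    toy_unregister(&dev);   /* release call #1, via the reference count */
    dev.release(&dev);      /* release call #2, the explicit cleanup */
    return dev.release_calls;
}
```

In this model `toy_probe_error_path()` returns 2: one release from dropping the final reference and one from the explicit cleanup. In the real driver the second call would act on a structure that has already been released, which is consistent with the oops seen here.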

Colin Ian King (colin-king) wrote :

@GertjanVD:

I concur with your findings. I've reworked your patch a little and uploaded some test kernels with a fix in at:

http://people.canonical.com/~cking/sru-357724

Can you install the appropriate kernel and give this a test? If this fixes the problem I will try to get the patch incorporated as an SRU fix.

Thanks, Colin.

Colin Ian King (colin-king) wrote :

Post script: Please let me know if these kernels fix the problem. Thanks!

GertjanVD (gertjanvd) wrote :

I installed linux-image-2.6.28-14-generic_2.6.28-14.47_i386.deb, no kernel oops. So the reworked patch seems to work fine.

Colin Ian King (colin-king) wrote :

SRU Justification:

Impact: A failed radio-maestro probe always generates a kernel oops.

Probe failures in maestro_probe cause a double call to
video_device_release():

 a) indirectly when calling video_unregister_device(),
    via dev->release(dev) in device_release()
 b) explicitly in the error handling at label errfr1

The second call causes the oops. This bug was introduced
with commit aa5e90af7d78d1711f8f4275ce3638817c0023dc when the
release method was added to struct video_device maestro_radio
but the error handling was not updated to take this into
consideration.

The fix: Correct the error handling path to avoid the
double release.

Test case: Without this fix, the kernel generates an oops when a probe
fails. With the fix, there is no oops. Tested by user GertjanVD.
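One way to structure an error path so that every exit releases the device exactly once is sketched below in userspace C. This is not the patch that was applied to the driver. The names `toy_device`, `toy_unregister` and `toy_probe`, and the `register_ok` / `power_ok` switches that simulate the two failure points, are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the video device; release_calls records how
 * often the release method ran. */
struct toy_device {
    int refcount;
    int release_calls;
};

void toy_release(struct toy_device *dev)
{
    dev->release_calls++;
}

/* Stand-in for video_unregister_device(): dropping the last reference
 * runs the release method. */
void toy_unregister(struct toy_device *dev)
{
    assert(dev->refcount > 0);
    if (--dev->refcount == 0)
        toy_release(dev);
}

/* Simulated probe. Returns 0 on success and -1 on failure, and stores
 * the number of release calls made on the way out. */
int toy_probe(bool register_ok, bool power_ok, int *release_calls)
{
    struct toy_device dev = { .refcount = 0, .release_calls = 0 };
    int ret = 0;

    if (!register_ok) {
        ret = -1;
        goto err_release;     /* never registered: release explicitly */
    }
    dev.refcount = 1;         /* registration holds one reference */

    if (!power_ok) {
        ret = -1;
        goto err_unregister;  /* registered: let unregister release it */
    }
    goto out;                 /* success: device stays registered */

err_unregister:
    toy_unregister(&dev);     /* final put already runs the release */
    goto out;                 /* so skip the explicit release below */
err_release:
    toy_release(&dev);
out:
    *release_calls = dev.release_calls;
    return ret;
}
```

With this layout a failure after registration goes through `toy_unregister()` only, and a failure before registration goes through the explicit release only, so neither failure case releases the device twice.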

Colin Ian King (colin-king) wrote :
Changed in linux (Ubuntu):
status: Incomplete → Fix Committed

Ian McMichael (ian-sigma-uk) wrote :

@Colin

Sorry for the delay in testing. I've now tried your linux-image-2.6.28-14-generic_2.6.28-14.47_i386 on the original M300 I reported the error from. Without blacklisting any modules and with the system fully updated (apart from the kernel package, clearly!) the sound now works fine and there's no sign of a panic.

Thanks for the update. How does this filter through to the official repositories from here?

Colin Ian King (colin-king) wrote :

@Ian. This will work its way through the SRU process and will be released in a Jaunty kernel update sometime in the near future.

Erik (lnchpd-elaan) wrote :

Sorry for chiming in late. I just tested on my Armada E500 with the test kernel (-generic) from the link in post #25 and had no crashes with the modules deleted from /etc/modprobe.d/blacklist.conf. Sound works, so the fix also works on my E500. I also tested the 2.6.28-14.47 kernel that has recently been released for USN-807-1, but that does not yet contain this SRU. I thought I would just throw that out here for anybody who may be confused by both the kernels from USN-807-1 and Colin's test kernels being 2.6.28-14.47. Thanks to everybody who made this fix. Hope to see it in the official kernel soon.

Stefan Bader (smb) wrote :

There was no problem with 2.6.30, setting to released for Karmic.

Changed in linux (Ubuntu Jaunty):
assignee: nobody → Colin King (colin-king)
importance: Undecided → High
status: New → Fix Committed
Changed in linux (Ubuntu):
assignee: Colin King (colin-king) → nobody
status: Fix Committed → Fix Released

Martin Pitt (pitti) wrote :

Accepted linux into jaunty-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

tags: added: verification-needed

Ian McMichael (ian-sigma-uk) wrote :

I've installed linux-image-2.6.28-15-generic (v2.6.28-15.51) on my M300 from the jaunty-proposed repository. The system boots without a panic and sound operates correctly. Thanks for fixing this bug.

Martin Pitt (pitti)
tags: added: verification-done
removed: verification-needed

Erik (lnchpd-elaan) wrote :

Hi, I just verified the -proposed kernel version 2.6.28-15.51 solves the problem on my Armada E500 too. Thanks!

GertjanVD (gertjanvd) wrote :

I've also verified the new -proposed kernel on jaunty on an Armada E500. The kernel oops is fixed.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.28-15.52

---------------
linux (2.6.28-15.52) jaunty-proposed; urgency=low

  [ Stefan Bader ]

  * Revert "SAUCE: ACPI: Populate DIDL before registering ACPI video device
    on Intel"
    - LP: #423296
  * SAUCE: Allow less restrictive acpi video detection
    - LP: #333386

  [ Upstream Kernel Changes ]

  * include drivers/pci/hotplug/* in -virtual package
    - LP: #364916
  * ext4: don't call jbd2_journal_force_commit_nested without journal
    - LP: #418197
  * ext4: fix ext4_free_inode() vs. ext4_claim_inode() race
    - LP: #418197
  * ext4: fix bogus BUG_ONs in in mballoc code
    - LP: #418197
  * ext4: fix typo which causes a memory leak on error path
    - LP: #418197
  * ext4: Fix softlockup caused by illegal i_file_acl value in on-disk
    inode
    - LP: #418197
  * ext4: Fix sub-block zeroing for writes into preallocated extents
    - LP: #418197
  * jbd2: Call journal commit callback without holding j_list_lock
    - LP: #418197
  * ext4: Print the find_group_flex() warning only once
    - LP: #367065
  * ext4: really print the find_group_flex fallback warning only once
    - LP: #367065

linux (2.6.28-15.51) jaunty-proposed; urgency=low

  [ Colin Ian King ]

  * SAUCE: wireless: hostap, fix oops due to early probing interrupt
    - LP: #254837

  [ Leann Ogasawara ]

  * Add the atl1c driver to support Atheros AR8132
    - LP: #415358
  * Updating configs to enable the atl1c driver
    - LP: #415358

  [ Stefan Bader ]

  * Revert "SAUCE: input: Blacklist digitizers from joydev.c"
    - LP: #300143
  * SAUCE: Fix the exported name for e1000e-next
    - LP: #402890
  * SAUCE: Fix incorrect stable backport to bas_gigaset
    - LP: #417732
  * SAUCE: Remove the atl2 driver from the ubuntu subdirectory
    - LP: #419438

linux (2.6.28-15.50) jaunty-proposed; urgency=low

  [ Colin Ian King ]

  * SAUCE: radio-maestro: fix panics on probe failure
    - LP: #357724
  * SAUCE: HDA Intel, sigmatel: Enable speakers on HP Mini 1000
    - LP: #318942

  [ Jerone Young ]

  * SAUCE: Fix Soltech TA12 volume hotkeys not sending key release in
    Jaunty
    - LP: #397499

  [ John Johansen ]

  * SAUCE: remove AppArmor debug check for calls from interrupt context
    - LP: #350789

  [ Manoj Iyer ]

  * SAUCE: Fix kernel panic when SELinux is enabled.
    - LP: #395219

  [ Matthew Garrett ]

  * SAUCE: ACPI: Populate DIDL before registering ACPI video device on
    Intel

  [ Michael Frey (Senior Manager, MID ]

  * SAUCE: Fix for internal microphone for Dell Mini10V
    - LP: #394793

  [ Tim Gardner ]

  * SAUCE: Added e1000e from sourceforge.
    - LP: #402890

  [ Upstream Kernel Changes ]

  * Input: synaptics - report multi-taps only if supported by the device
    - LP: #399787
  * ftdi_sio: fix kref leak
    - LP: #396930, #376128
  * IPv6: add "disable" module parameter support to ipv6.ko
    - LP: #351656

 -- Stefan Bader <email address hidden> Thu, 27 Aug 2009 15:09:06 +0200

Changed in linux (Ubuntu Jaunty):
status: Fix Committed → Fix Released