Oops while dumping core, caused by crash handler

Bug #60183 reported by Colin Watson on 2006-09-13
Affects: linux-source-2.6.17 (Ubuntu)
Importance: Medium
Assigned to: Ben Collins

Bug Description

I've been doing a fair amount of work on usplash lately (on a fully up-to-date edgy powerpc system, so kernel 2.6.17-7.20), and I keep running into kernel oopses triggered by usplash. Originally I got this simply by running 'usplash -c -x 640 -y 480', but I expect that case to go away once my usplash fix is uploaded; it makes usplash refuse to start when no available theme fits within the specified resolution. Now it seems to happen practically every time I run usplash with my extra debugging printfs in it, which obviously makes it quite difficult to get anything done.

Since the oops is apparently in mutex_unlock, I built a kernel with CONFIG_DEBUG_MUTEXES=y, but that didn't appear to help much.

Here's a trace from the stock kernel:

[ 4349.637093] Oops: Kernel access of bad area, sig: 11 [#1]
[ 4349.637101]
[ 4349.637105] Modules linked in: binfmt_misc rfcomm l2cap ppdev lp parport radeon drm cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand cpufreq_conservative dm_mod md_mod ipv6 therm_adt746x sr_mod snd_powermac af_packet sbp2 scsi_mod apm_emu prism54 pcmcia tsdev bcm43xx ieee80211softmac ieee80211 ieee80211_crypt snd_aoa_i2sbus snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc sungem sungem_phy hci_usb yenta_socket rsrc_nonstatic pcmcia_core pmac_zilog serial_core uninorth_agp agpgart snd soundcore snd_aoa_soundbus bluetooth evdev ext3 jbd ehci_hcd ohci1394 ieee1394 ohci_hcd usbcore ide_disk ide_cd cdrom capability commoncap
[ 4349.637203] NIP: C022B994 LR: C008E3F0 CTR: C0076040
[ 4349.637215] REGS: d0239c80 TRAP: 0300 Not tainted (2.6.17-7-powerpc)
[ 4349.637223] MSR: 0000B032 <EE,FP,ME,IR,DR> CR: 24000422 XER: 20000000
[ 4349.637241] DAR: 00000074, DSISR: 40000000
[ 4349.637249] TASK = d5d58600[6727] 'usplash' THREAD: d0238000
[ 4349.637257] GPR00: C008E3E0 D0239D30 D5D58600 00000074 C034C6E0 00000001 7FFFFFFF C0344048
[ 4349.637276] GPR08: D0238000 D0238028 0000B032 00000027 24000422 1001ACCC C0240000 D0239F50
[ 4349.637296] GPR16: 00000000 00000000 00000001 00000000 C188D7D0 C02AD2E8 0000000B 00000000
[ 4349.637313] GPR24: 00000001 C00B0AD0 C0260000 D02A1854 D0239D8C D02A1854 C21D08C0 00000000
[ 4349.637333] NIP [C022B994] mutex_unlock+0x0/0x1c
[ 4349.637361] LR [C008E3F0] vfs_unlink+0x100/0x14c
[ 4349.637380] Call Trace:
[ 4349.637387] [D0239D30] [C008E3E0] vfs_unlink+0xf0/0x14c (unreliable)
[ 4349.637403] [D0239D50] [C008A3A8] do_coredump+0x3a8/0x870
[ 4349.637418] [D0239E40] [C00382A0] get_signal_to_deliver+0x2b4/0x388
[ 4349.637445] [D0239E70] [C0006D3C] do_signal+0x4c/0x6d4
[ 4349.637464] [D0239F40] [C00122A8] do_user_signal+0x74/0xc4
[ 4349.637486] --- Exception: 300 at 0xff950c4
[ 4349.637503] LR = 0xff95068
[ 4349.637508] Instruction dump:
[ 4349.637514] 90010010 39291454 91610020 91210018 90410014 9161001c 4bfffd4d 80010044
[ 4349.637533] bb410028 38210040 7c0803a6 4e800020 <7d201828> 31290001 7d20192d 40a2fff4

And here's a trace from the mutex-debugging kernel:

[ 189.646108] Oops: Kernel access of bad area, sig: 11 [#1]
[ 189.646116]
[ 189.646120] Modules linked in: binfmt_misc rfcomm l2cap ppdev lp parport radeon drm cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand cpufreq_conservative ipv6 dm_mod md_mod therm_adt746x sr_mod snd_powermac sbp2 scsi_mod apm_emu prism54 pcmcia af_packet tsdev snd_aoa_i2sbus snd_pcm_oss snd_mixer_oss yenta_socket rsrc_nonstatic pcmcia_core pmac_zilog serial_core snd_pcm snd_timer snd_page_alloc hci_usb bluetooth uninorth_agp agpgart evdev snd soundcore snd_aoa_soundbus bcm43xx ieee80211softmac ieee80211 ieee80211_crypt sungem sungem_phy ext3 jbd ehci_hcd ohci1394 ieee1394 ohci_hcd usbcore ide_disk ide_cd cdrom capability commoncap
[ 189.646217] NIP: C022CCF4 LR: C008F390 CTR: C0076FC0
[ 189.646229] REGS: d54d5c70 TRAP: 0300 Not tainted (2.6.17-7-powerpc)
[ 189.646236] MSR: 0000B032 <EE,FP,ME,IR,DR> CR: 22000482 XER: 20000000
[ 189.646254] DAR: 00000080, DSISR: 40000000
[ 189.646262] TASK = d6fe3420[4934] 'usplash' THREAD: d54d4000
[ 189.646270] GPR00: C008F390 D54D5D20 D6FE3420 00000074 C008F390 00000001 7FFFFFFF C0346048
[ 189.646289] GPR08: D54D4000 D54D4028 C0310000 D54D4000 22000482 1001ACCC C0240000 D54D5F50
[ 189.646308] GPR16: 00000000 00000000 00000001 00000000 DF66B530 C02AF300 0000000B 00000000
[ 189.646325] GPR24: 00000001 C00B1A8C C0260000 D12404B4 D54D5D8C D12404B4 00000074 00000000
[ 189.646345] NIP [C022CCF4] __mutex_unlock_slowpath+0x1c/0x154
[ 189.646376] LR [C008F390] vfs_unlink+0x100/0x14c
[ 189.646397] Call Trace:
[ 189.646404] [D54D5D20] [D12404B4] 0xd12404b4 (unreliable)
[ 189.646422] [D54D5D30] [C008F390] vfs_unlink+0x100/0x14c
[ 189.646435] [D54D5D50] [C008B344] do_coredump+0x3a8/0x870
[ 189.646449] [D54D5E40] [C0038290] get_signal_to_deliver+0x2b4/0x388
[ 189.646476] [D54D5E70] [C0006D3C] do_signal+0x4c/0x6d4
[ 189.646495] [D54D5F40] [C0012288] do_user_signal+0x74/0xc4
[ 189.646516] --- Exception: 300 at 0xff950c4
[ 189.646528] LR = 0xff95068
[ 189.646534] Instruction dump:
[ 189.646540] 7f64db78 bb61000c 38210020 7c0803a6 4bfffcb0 9421fff0 7c0802a6 3d40c031
[ 189.646559] bfc10008 542b0024 7c7e1b78 90010014 <8003000c> 7f805800 419e0010 800a3674
[ 189.646579] BUG: usplash/4934, lock held at task exit time!
[ 189.658792] [d5685210] {inode_init_once}
[ 189.658802] .. held by: usplash: 4934 [d6fe3420, 116]
[ 189.658813] ... acquired at: vfs_unlink+0x98/0x14c

Ben Collins (ben-collins) wrote:

I can tell you how to stop it:

echo "" > /proc/sys/kernel/crashdump-helper

This is a crash in my code, added for apport. Sucks that a crash helper is causing a crash. I'll look into it.
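The workaround above amounts to clearing one /proc setting. A minimal sketch of doing that a little more carefully, assuming the /proc/sys/kernel/crashdump-helper path from the comment and that an empty value disables the helper as described there; the function name disable_crash_helper is hypothetical. The file presumably exists only on kernels carrying the apport crash-helper patch, so the sketch checks for it first and prints the old value so it can be saved.

```shell
#!/bin/sh
# Hypothetical helper: disable_crash_helper is an illustrative name, not part
# of any Ubuntu tool. The default path comes from the workaround above and is
# assumed to exist only on kernels carrying the apport crash-helper patch.
disable_crash_helper() {
    helper_file=${1:-/proc/sys/kernel/crashdump-helper}

    # Bail out cleanly on kernels (or users) without a writable knob.
    if [ ! -w "$helper_file" ]; then
        echo "no writable $helper_file on this system; nothing to do" >&2
        return 1
    fi

    # Print the current value first so the caller can save it.
    cat "$helper_file"

    # Same effect as: echo "" > /proc/sys/kernel/crashdump-helper
    echo "" > "$helper_file"
}
```

Run as root, `disable_crash_helper > /root/crashdump-helper.orig` keeps the previous value in a file, so it can presumably be written back once a fixed kernel is installed.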

Matt Zimmerman (mdz) wrote:

Happened to me with GIMP as well.

Changed in linux-source-2.6.17:
importance: Untriaged → Medium
status: Unconfirmed → Confirmed
Matt Zimmerman (mdz) on 2006-09-18
Changed in linux-source-2.6.17:
assignee: nobody → ben-collins
Changed in linux-source-2.6.17:
status: Confirmed → Fix Committed
Matt Zimmerman (mdz) wrote:

I wasn't able to test the packages you provided yet, but is the fix in edgy now?

Martin Pitt (pitti) wrote:

FWIW, I cannot get any oops with either the -8 or the -10 kernel.

Ben Collins (ben-collins) wrote:

On Tue, 2006-09-26 at 17:59 +0000, Matt Zimmerman wrote:
> I wasn't able to test the packages you provided yet, but is the fix in
> edgy now?

Yes, and Martin says he can't cause it in latest kernel. Please confirm.

Matt Zimmerman (mdz) wrote:

On Thu, Sep 28, 2006 at 11:41:54AM -0000, Ben Collins wrote:
> On Tue, 2006-09-26 at 17:59 +0000, Matt Zimmerman wrote:
> > I wasn't able to test the packages you provided yet, but is the fix in
> > edgy now?
>
> Yes, and Martin says he can't cause it in latest kernel. Please confirm.

He wasn't able to cause it in the old one either. I've just reproduced it
on -8 in my environment and will now update to -10 and try again.

--
 - mdz

Matt Zimmerman (mdz) wrote:

I can't reproduce the problem using the same test case on 2.6.17-10.

--
 - mdz

Changed in linux-source-2.6.17:
status: Fix Committed → Fix Released