Another kernel bug at mm/rmap.c, process wedged

Bug #73982 reported by Christian Hudon
Affects: linux-source-2.6.15 (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

kernel BUG at mm/mmap.c:439!
invalid operand: 0000 [#1]
SMP
Modules linked in: nfs nfsd lockd sunrpc esp4 ppdev lp autofs4 ip6table_mangle ip6table_filter ip6_tables ipt_LOG ipt_state ipt_MARK iptable_mangle iptable_nat ip_nat ip_conntrack nfnetlink iptable_filter ip_tables pppoe pppox ipv6 af_packet ppp_generic slhc deflate zlib_deflate twofish serpent aes blowfish des sha256 sha1 crypto_null af_key dm_mod 8139too mii sk98lin snd_seq_dummy snd_seq_oss snd_seq_midi snd_seq_midi_event snd_seq tsdev snd_via82xx gameport psmouse snd_ac97_codec snd_ac97_bus serio_raw snd_pcm_oss snd_mixer_oss parport_pc i2c_viapro parport pcspkr floppy i2c_core snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device via_agp agpgart snd soundcore tulip skge shpchp pci_hotplug evdev usbhid xfs exportfs raid1 md_mod ide_generic ehci_hcd uhci_hcd usbcore ide_cd cdrom ide_disk via82cxxx generic sata_via sata_promise libata scsi_mod thermal processor fan capability commoncap vga16fb vgastate fbcon tileblit font bitblit softcurso

CPU: 0
EIP: 0060:[__insert_vm_struct+66/128] Not tainted VLI
EFLAGS: 00010287 (2.6.15-27-server)
EIP is at __insert_vm_struct+0x42/0x80
eax: b7dae000 ebx: f32dc8e0 ecx: d4ad28d0 edx: f4f87120
esi: ea7cc3a4 edi: 00000000 ebp: f53ce120 esp: c3979e44
ds: 007b es: 007b ss: 0068
Process mysqlhotcopy (pid: 2177, threadinfo=c3978000 task=cee25ab0)
Stack: f32dc8e0 b7dad000 c3979e60 c3979e5c c3979e58 c015d449 f3e3af0c f53ce14c
       f4f87b30 ea7cc3a4 c01639ba f32dc8e0 ea7cc3a4 ffffffea f32dc8e0 f3e3aef4
       f3e3af0c d82badc0 00000000 00000000 00000000 f53ce120 b7dad000 00000000
Call Trace:
 [vma_prio_tree_insert+41/96] vma_prio_tree_insert+0x29/0x60
 [vma_adjust+426/1024] vma_adjust+0x1aa/0x400
 [split_vma+259/272] split_vma+0x103/0x110
 [do_munmap+177/336] do_munmap+0xb1/0x150
 [do_mmap_pgoff+867/2096] do_mmap_pgoff+0x363/0x830
 [old_mmap+224/304] old_mmap+0xe0/0x130
 [syscall_call+7/11] syscall_call+0x7/0xb
Code: 24 2c 89 44 24 0c 8d 44 24 1c 89 44 24 08 8b 46 04 89 1c 24 89 44 24 04 e8 8c fd ff ff 85 c0 89 c2 74 10 8b 46 08 39 42 04 73 08 <0f> 0b b7 01 aa df 31 c0 8b 44 24 14 89 74 24 04 89 1c 24 89 44
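
As a reading aid for the trace above, here is a minimal sketch of how the "Code:" line can be cross-checked against the "kernel BUG at" header. It assumes this i386 kernel was built with the verbose BUG() layout of that era (a ud2 opcode followed by a 16-bit line number and a 32-bit pointer to the file name string); that layout is an assumption about the build, not something the report states. The function name decode_bug_trap is hypothetical.

```python
import struct

# The "Code:" line copied verbatim from the oops above.
CODE_LINE = (
    "Code: 24 2c 89 44 24 0c 8d 44 24 1c 89 44 24 08 8b 46 04 89 1c 24 "
    "89 44 24 04 e8 8c fd ff ff 85 c0 89 c2 74 10 8b 46 08 39 42 04 73 08 "
    "<0f> 0b b7 01 aa df 31 c0 8b 44 24 14 89 74 24 04 89 1c 24 89 44"
)


def decode_bug_trap(code_line):
    """Return (line_number, file_pointer), or None if the faulting bytes are not ud2.

    Assumes the layout: ud2 (0f 0b), then a little-endian 16-bit line
    number, then a little-endian 32-bit pointer to the file name.
    """
    tokens = code_line.split()[1:]  # drop the leading "Code:" label
    # The byte wrapped in <...> is the one EIP pointed at when the trap fired.
    fault = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    raw = bytes(int(tok.strip("<>"), 16) for tok in tokens[fault:fault + 8])
    if raw[:2] != b"\x0f\x0b":  # 0f 0b is the x86 ud2 opcode
        return None
    line_number, file_pointer = struct.unpack("<HI", raw[2:8])
    return line_number, file_pointer


result = decode_bug_trap(CODE_LINE)
print(result[0], hex(result[1]))
```

Under that assumption, the bytes after the marked <0f> decode to line 439, which agrees with the "kernel BUG at mm/mmap.c:439!" header. In other words, the trap is a deliberate BUG() assertion in __insert_vm_struct rather than a stray invalid instruction.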

Andrew Ash (ash211) wrote :

Sorry for the huge delay, Christian. Is this still an issue for you?

Changed in linux-source-2.6.15:
status: New → Incomplete
Christian Hudon (chrish) wrote :

Well, it was for a long while. The dapper server kernel oopsed or crashed a couple of times a month on that machine... which just wasn't acceptable, so I moved to a hand-compiled 2.6.16.x kernel, and that machine has been much more stable since then. And with the new LTS release coming out soon, we'll be moving to that in a couple of months. So no, this isn't really an issue for us anymore, although not quite for the right reasons, IMHO.

I don't mean to complain (especially given that we're not paying anything for what is a very nice OS), but given that the bug report was a detailed oops trace instead of something unspecific like "the machine crashes from time to time under load", I would have expected someone who can decode and understand kernel oopses (Canonical does have at least one of those on staff, don't you?) to have a look at the oops, at least to see if it pointed directly to an easy kernel bug. Or in the trickier case of "data structure had been corrupted by something else previously", at least to know what got corrupted.

Did this one just fall through the cracks, or are kernel oops reports just not a priority? Is there something I should have done to raise the importance of the oopses I reported? I must admit this makes me a bit nervous about moving to the new LTS release. What if I hit another bug like this one? (We can move this discussion out of the bug report if you want.) Thanks.

Andrew Ash (ash211) wrote :

I can't speak directly for Canonical, since I'm just a volunteer with the Ubuntu project, but I'm sure they do have one. With the huge popularity of Ubuntu, there are now way more users and bug reports than there are developers to adequately address each one, so some will inevitably fall through the cracks, like this one did. One of the things that volunteers have done is create a team called BugSquad ( https://wiki.ubuntu.com/BugSquad ) to try to address issues before they sit around for two years.

Another thing to consider is that most of the work done by Ubuntu developers is getting all the pieces of the GNU/Linux stack to work together nicely. From the kernel and drivers to X and Gnome/KDE to the applications on top of those, everything is pieced together by the Ubuntu team. Since this is such a monumental effort in itself, improving these components is more the job of the upstream teams, the people who create the products that Ubuntu packages. So if you have an issue, report it here to Launchpad first. If no one does anything, you can try to bring attention to it on IRC channels. I think #ubuntu-bugs would be the best place to start (that's where the bug squad hangs out). Sometimes though, it just comes down to developers having too many things on their minds to follow up on every single bug, which is unfortunate, but sadly the case. What I would do then is open a bug in the upstream component, in this case at the kernel bugtracker, and then post a link to the Launchpad report. The upstream bugtracker is where the specialists are who are more likely to be able to actually help you. Feel free to contact me personally via my Launchpad profile ( https://bugs.launchpad.net/~ash211 ) if you're having trouble with something.

So the best thing to do to make sure that the latest version will support your hardware is to run the development version every now and then (if possible) and post any bugs you find. If you don't start checking until the new version is already released, then obviously it's too late to get it fixed in that one!

I'll close this bug then, since apparently it got fixed by kernel devs somewhere between 2.6.15 and 2.6.16. Thanks for the bug report!

Changed in linux-source-2.6.15:
status: Incomplete → Invalid