Xen dom0 oops when using LVM: "NULL pointer dereference"

Bug #316355 reported by Gordon Syme
Affects: linux (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

I am using Xen with my images stored on LVM volumes. Generally, when provisioning a new domU, I snapshot the image filesystem and use the snapshot as the domU filesystem.

I occasionally get these oopses when removing the snapshot volume after shutting down the domU. Once one has occurred, I need to reboot before any LVM-related operation will return.
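
For illustration, the snapshot cycle described above might look roughly like the sketch below. The volume group `vg0`, the origin volume `base-image`, the snapshot name and the 2G snapshot size are hypothetical placeholders, not the names used on the affected machine. With `DRY_RUN=1` (the default here) the commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch of the snapshot-per-domU cycle. All names and sizes are placeholders.
VG="${VG:-vg0}"                  # hypothetical volume group
ORIGIN="${ORIGIN:-base-image}"   # hypothetical origin LV holding the image
DRY_RUN="${DRY_RUN:-1}"          # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Create a copy-on-write snapshot of the origin for a new domU.
# $1 = snapshot name, $2 = snapshot size (defaults to 2G, an arbitrary value)
provision_snapshot() {
    run lvcreate --snapshot --size "${2:-2G}" --name "$1" "/dev/$VG/$ORIGIN"
}

# Remove the snapshot after the domU has been shut down.
# This is the step during which the oops is sometimes seen.
remove_snapshot() {
    run lvremove -f "/dev/$VG/$1"
}
```

A typical cycle would be `provision_snapshot domu1-root`, then running and shutting down the domU, then `remove_snapshot domu1-root`.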

The 2.6.24-22-xen kernel does not seem to be very stable. I am seeing these LVM-related oopses fairly frequently (mostly when running MS Windows 2003 Server under HVM). I also get the occasional dom0 kernel panic when performing operations with heavy disk I/O (such as using dd to copy one filesystem to another).

Relevant portion of dmesg, full log also attached:

[267304.050713] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000127
[267304.050770] printing eip: c01a8cc5
[267304.050795] 2bb2d000 -> *pde = 00000001:096f3001
[267304.050822] 264f3000 -> *pme = 00000000:00000000
[267304.050849] Oops: 0000 [#1] SMP
[267304.050879] Modules linked in: tun af_packet bridge iptable_filter ip_tables x_tables parport_pc lp parport loop ipv6 serio_raw psmouse button pcspkr 8250_pnp 8250 serial_core dcdbas shpchp pci_hotplug evdev iTCO_wdt iTCO_vendor_support dm_multipath ext3 jbd mbcache sg sr_mod cdrom sd_mod ata_generic usbhid hid ata_piix pata_acpi libata scsi_mod ehci_hcd tg3 uhci_hcd usbcore dm_mirror dm_snapshot dm_mod thermal processor fan fuse
[267304.051167]
[267304.051188] Pid: 213, comm: pdflush Not tainted (2.6.24-22-xen #1)
[267304.051218] EIP: 0061:[<c01a8cc5>] EFLAGS: 00010297 CPU: 0
[267304.051254] EIP is at __block_write_full_page+0xe5/0x350
[267304.051287] EAX: c2028140 EBX: 000be4d1 ECX: 00000000 EDX: 000be4d0
[267304.051317] ESI: 00000000 EDI: 00000127 EBP: ed002270 ESP: c1021de0
[267304.051346] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
[267304.051374] Process pdflush (pid: 213, ti=c1020000 task=ed5a4070 task.ti=c1020000)
[267304.051406] Stack: 00000005 00000110 c1021e84 c1f8d980 c01ad250 c1f8d980 00000000 001bffff
[267304.051474] 00000000 c0477178 00001000 00000001 ed002270 c1f8d980 00000000 c01a9027
[267304.051542] c1021f74 f578e000 c1021f74 c01ad250 c1f8d980 ed002314 0000000b c1021f74
[267304.051611] Call Trace:
[267304.051661] [<c01ad250>] blkdev_get_block+0x0/0x70
[267304.051692] [<c01a9027>] block_write_full_page+0xf7/0x100
[267304.051724] [<c01ad250>] blkdev_get_block+0x0/0x70
[267304.051754] [<c01636d8>] __writepage+0x8/0x30
[267304.051784] [<c0163d6f>] write_cache_pages+0x21f/0x310
[267304.051815] [<c01636d0>] __writepage+0x0/0x30
[267304.051847] [<c0163e80>] generic_writepages+0x20/0x30
[267304.051877] [<c0163ebb>] do_writepages+0x2b/0x50
[267304.051907] [<c01a2ad9>] __writeback_single_inode+0x89/0x320
[267304.051938] [<ee09e2cd>] dm_table_any_congested+0xd/0x60 [dm_mod]
[267304.051977] [<ee09e2e1>] dm_table_any_congested+0x21/0x60 [dm_mod]
[267304.052013] [<c01a310f>] sync_sb_inodes+0x19f/0x270
[267304.052044] [<c01a33f9>] writeback_inodes+0x89/0xc0
[267304.052074] [<c0164925>] wb_kupdate+0x85/0xf0
[267304.052104] [<c0164dd0>] pdflush+0x0/0x250
[267304.052132] [<c0164f16>] pdflush+0x146/0x250
[267304.052161] [<c01648a0>] wb_kupdate+0x0/0xf0
[267304.052190] [<c013b7b2>] kthread+0x42/0x70
[267304.052220] [<c013b770>] kthread+0x0/0x70
[267304.052248] [<c0105bb7>] kernel_thread_helper+0x7/0x10
[267304.052279] =======================
[267304.054071] Code: 24 24 eb 21 77 06 3b 5c 24 1c 76 1f f0 0f ba 37 01 f0 0f ba 2f 00 8b 7f 04 39 7c 24 24 74 68 83 c3 01 83 d6 00 3b 74 24 20 73 d9 <8b> 07 a8 20 75 e5 8b 07 a8 02 74 df 8b 54 24 28 39 57 14 0f 85
[267304.054333] EIP: [<c01a8cc5>] __block_write_full_page+0xe5/0x350 SS:ESP 0069:c1021de0
[267304.054846] ---[ end trace c4666ba89cf2be08 ]---

$ uname -a
Linux potenza 2.6.24-22-xen #1 SMP Mon Nov 24 21:30:37 UTC 2008 i686 GNU/Linux

$ cat /proc/version_signature
Ubuntu 2.6.24-4.6-generic

Gordon Syme (gordon-syme) wrote :

This bug is starting to become a bit of a pain. The system it occurs on is our build/test system; we use Xen for testing on different platforms, and reliability is becoming a problem.

Gordon Syme (gordon-syme) wrote :

I still see this on 2.6.24-23-xen after dist-upgrading this morning.

To test it I ran dd of a 7G LVM volume to another 7G LVM volume. This is a fairly reliable method of causing the NULL pointer dereference.

Once this bug has occurred I cannot perform any further LVM operations; any attempt to do so puts the process into an uninterruptible sleep.

A reboot seems to be the only way to get the system usable again.
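
A minimal sketch of the dd-based reproduction follows. The device paths are hypothetical placeholders, and `bs=1M` is an assumption, since the block size used originally is not stated. The destination volume is overwritten, so only scratch volumes should be used. With `DRY_RUN=1` (the default here) the dd command is only printed.

```shell
#!/bin/sh
# Sketch of the dd-based reproduction. Device paths are placeholders;
# the destination volume is overwritten, so use scratch volumes only.
SRC="${SRC:-/dev/vg0/scratch-src}"   # hypothetical 7G source LV
DST="${DST:-/dev/vg0/scratch-dst}"   # hypothetical 7G destination LV
DRY_RUN="${DRY_RUN:-1}"              # 1 = print the command, 0 = run it

# Copy one volume onto the other to generate heavy disk I/O.
copy_volume() {
    # bs=1M is an assumption; the original block size is not stated.
    if [ "$DRY_RUN" = "1" ]; then
        echo dd "if=$SRC" "of=$DST" bs=1M
    else
        dd "if=$SRC" "of=$DST" bs=1M
    fi
}

# Count NULL pointer dereference reports in kernel log text read from stdin.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_null_derefs() {
    grep -c 'unable to handle kernel NULL pointer dereference' || true
}
```

After `copy_volume` has run, `dmesg | count_null_derefs` gives a quick indication of whether the oops was triggered.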

Stéphane Ludwig (sludwig) wrote :

Same problem here. My server is running fine with the default kernel (2.6.24-23-server) but not with the xen kernel (2.6.24-23-xen).

Gordon Syme (gordon-syme) wrote :

This is still happening:

$ uname -a
Linux potenza 2.6.24-23-xen #1 SMP Mon Jan 26 03:12:59 UTC 2009 i686 GNU/Linux

All LVM-related processes are in state D (uninterruptible sleep).

I have attached the dump from alt-sysrq-t and the output of ps auxfww.
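
As a rough sketch, processes stuck in state D can be picked out of `ps` output as shown below. The function names are made up for illustration; the filter assumes input with a header line followed by lines of the form "PID STAT COMMAND".

```shell
#!/bin/sh
# Keep only processes whose state starts with D (uninterruptible sleep).
# Reads "PID STAT COMMAND..." lines from stdin and skips the header line.
filter_d_state() {
    awk 'NR > 1 && $2 ~ /^D/'
}

# List the D-state processes on the running system.
list_d_state() {
    ps -eo pid,stat,args | filter_d_state
}
```

Running `list_d_state` after the oops should show the hung LVM commands, if any are present.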

Andy Whitcroft (apw) wrote :

This is not a bug in the linux-meta package, moving to the linux package.

affects: linux-meta (Ubuntu) → linux (Ubuntu)
tags: added: xen
Jeremy Foshee (jeremyfoshee) wrote :

Hi Gordon,

This bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? Can you try with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/ .

If it remains an issue, could you run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 316355

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Incomplete and has not had any updated comments for quite some time. As a result this bug is being closed. Please reopen if this is still an issue in the current Ubuntu release http://www.ubuntu.com/getubuntu/download . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-expired
Changed in linux (Ubuntu):
status: Incomplete → Expired