Bug #235783: [hardy][xen] Oops in free_hot_cold_cache, probably drbd8-related (Ubuntu linux package)

Bernhard Schmidt (berni) wrote on 2008-05-29:

#1

uname-a.log (75 bytes, text/plain)

Bernhard Schmidt (berni) wrote on 2008-05-29:

#2

version.log (26 bytes, text/plain)

Bernhard Schmidt (berni) wrote on 2008-05-29:

#3

dmesg.log (21.7 KiB, text/plain)

Bernhard Schmidt (berni) wrote on 2008-05-29:

#4

lspci-vvnn.log (9.0 KiB, text/plain)

Bernhard Schmidt (berni) wrote on 2008-05-30:

#5

Another series of Oopses I just got when running bonnie++ on a local partition (not even drbd8):

[   91.876570] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
[   91.876587] printing eip: de12d234
[   91.876594] 1c1c3000 -> *pde = 00000000:09ae4001
[   91.876598] 15ee4000 -> *pme = 00000000:00000000
[   91.876604] Oops: 0000 [#1] SMP
[   91.876611] Modules linked in: drbd cn bridge sbs container battery sbshc video output ac dock iptable_filter ip_tables x_tables parpe
[   91.876699]
[   91.876704] Pid: 161, comm: kswapd0 Not tainted (2.6.24-17-xen #1)
[   91.876708] EIP: 0061:[<de12d234>] EFLAGS: 00010202 CPU: 0
[   91.876721] EIP is at __journal_remove_checkpoint+0x14/0xb0 [jbd]
[   91.876725] EAX: 000001c0 EBX: 00000008 ECX: dc803640 EDX: 00924925
[   91.876730] ESI: dc803640 EDI: dc803640 EBP: c192da94 ESP: db71dda8
[   91.876734]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
[   91.876738] Process kswapd0 (pid: 161, ti=db71c000 task=db67ceb0 task.ti=db71c000)
[   91.876743] Stack: c19749e0 c192da94 dc803640 de12ba89 dafaa8b8 c192c340 dafaa800 de2102d0
[   91.876789]        000000d0 db71df7c db71df7c c015f70c c192c340 db71df0c c0167c15 00000000
[   91.876804]        00000000 dc76c908 db71de7c db71de54 00000000 00f52873 00000000 00000015
[   91.876819] Call Trace:
[   91.876826]  [<de12ba89>] journal_try_to_free_buffers+0xe9/0x140 [jbd]
[   91.876842]  [<de2102d0>] ext3_releasepage+0x0/0xa0 [ext3]
[   91.876860]  [<c015f70c>] try_to_release_page+0x2c/0x40
[   91.876874]  [<c0167c15>] shrink_page_list+0x4c5/0x600
[   91.876888]  [<c0166daf>] isolate_lru_pages+0x5f/0x1c0
[   91.876899]  [<c0167e6f>] shrink_inactive_list+0x11f/0x3b0
[   91.876914]  [<c016819c>] shrink_zone+0x9c/0x100
[   91.876923]  [<c016883c>] kswapd+0x44c/0x490
[   91.876938]  [<c013bb90>] autoremove_wake_function+0x0/0x40
[   91.876949]  [<c011e260>] complete+0x40/0x60
[   91.876958]  [<c01683f0>] kswapd+0x0/0x490
[   91.876966]  [<c013b8d2>] kthread+0x42/0x70
[   91.876972]  [<c013b890>] kthread+0x0/0x70
[   91.876980]  [<c0105bb7>] kernel_thread_helper+0x7/0x10
[   91.876991]  =======================
[   91.876994] Code: 0b eb fe 8d 74 26 00 0f 0b eb fe 0f 0b eb fe 90 8d b4 26 00 00 00 00 56 89 c1 53 83 ec 04 8b 58 28 85 db 74 29 8b 4
[   91.877080] EIP: [<de12d234>] __journal_remove_checkpoint+0x14/0xb0 [jbd] SS:ESP 0069:db71dda8
[   91.877668] ---[ end trace 276bea9ce4a4d4b9 ]---
[   94.370990] BUG: unable to handle kernel paging request at virtual address b3578bd4
[   94.371007] printing eip: c020ff0d
[   94.371015] 015c5000 -> *pde = 00000000:1d5c8001
[   94.371021] 015c8000 -> *pme = 00000000:00000000
[   94.371028] Oops: 0000 [#2] SMP
[   94.371036] Modules linked in: drbd cn bridge sbs container battery sbshc video output ac dock iptable_filter ip_tables x_tables parpe
[   94.371147]
[   94.371152] Pid: 4686, comm: getty Tainted: G      D (2.6.24-17-xen #1)
[   94.371159] EIP: 0061:[<c020ff0d>] EFLAGS: 00010446 CPU: 1
[   94.371171] EIP is at memmove+0x1d/0x40
[   94.371189] EAX: db578bd5 EBX: db578bd5 ECX: d8000000 EDX: db578bd5
[   94.371195] ESI: b3578bd4 EDI: b3578bd4 EBP: dbed6000 ESP: dbed7f4c
[   94.371201]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
[   94.371208] Process getty (pid: 4686, ti=dbed6000 task=daf19210 task.ti=dbed6000)
[   94.371213] Stack: db578bc1 da899040 00000000 c01811c6 00000001 daf19210 00000100 c0129d95
[   94.371233]        c155b46c 00000000 c03267d6 74746567 74690079 00000000 00000000 0000124e
[   94.371253]        00000000 dbed7fa8 dae72200 00000100 b7fcf288 dbed6000 c012a50a 00000001
[   94.371272] Call Trace:
[   94.371281]  [<c01811c6>] kmem_cache_free+0xa6/0xb0
[   94.371294]  [<c0129d95>] do_exit+0x165/0x8b0
[   94.371307]  [<c03267d6>] do_nanosleep+0x46/0x70
[   94.371323]  [<c012a50a>] do_group_exit+0x2a/0xa0
[   94.371335]  [<c0105832>] syscall_call+0x7/0xb
[   94.371348]  [<c0320000>] vcc_create+0x90/0x110
[   94.371360]  =======================
[   94.371364] Code: 7c 24 08 83 c4 0c c3 8d b4 26 00 00 00 00 83 ec 0c 39 d0 89 1c 24 89 c3 89 74 24 04 89 7c 24 08 72 1d 8d 74 0a ff 8
[   94.371468] EIP: [<c020ff0d>] memmove+0x1d/0x40 SS:ESP 0069:dbed7f4c
[   94.371480] ---[ end trace 276bea9ce4a4d4b9 ]---
[   94.371484] Fixing recursive fault but reboot is needed!
[   95.090693] ------------[ cut here ]------------
[   95.090706] kernel BUG at /build/buildd/linux-2.6.24/debian/build/custom-source-xen/mm/slab.c:602!
[   95.090714] invalid opcode: 0000 [#3] SMP
[   95.090721] Modules linked in: drbd cn bridge sbs container battery sbshc video output ac dock iptable_filter ip_tables x_tables parpe
[   95.090832]
[   95.090836] Pid: 10, comm: events/1 Tainted: G      D (2.6.24-17-xen #1)
[   95.090843] EIP: 0061:[<c01813ca>] EFLAGS: 00010002 CPU: 1
[   95.090852] EIP is at free_block+0x12a/0x130
[   95.090857] EAX: 00080008 EBX: db578bd5 ECX: c0da8b3a EDX: c15f6500
[   95.090863] ESI: db578bc1 EDI: 05666667 EBP: db578bd5 ESP: db5b9f24
[   95.090869]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
[   95.090879] Process events/1 (pid: 10, ti=db5b8000 task=db5b7730 task.ti=db5b8000)
[   95.090901] Stack: 00000011 00000000 05666667 dc803640 00000000 db578bd5 db578bc1 05666667
[   95.090920]        dc810d40 c018145d 00000000 00000000 00000000 dc803640 dc810d40 dc803640
[   95.090939]        c155b6c0 c0182950 c01829d9 00000000 00000000 c155b6c4 db57de40 c155b6c0
[   95.090958] Call Trace:
[   95.090969]  [<c018145d>] drain_array+0x8d/0x110
[   95.090982]  [<c0182950>] cache_reap+0x0/0x110
[   95.090991]  [<c01829d9>] cache_reap+0x89/0x110
[   95.091002]  [<c0137c83>] run_workqueue+0x93/0x160
[   95.091015]  [<c0138780>] worker_thread+0x0/0xe0
[   95.091025]  [<c0138804>] worker_thread+0x84/0xe0
[   95.091036]  [<c013bb90>] autoremove_wake_function+0x0/0x40
[   95.091048]  [<c0138780>] worker_thread+0x0/0xe0
[   95.091058]  [<c013b8d2>] kthread+0x42/0x70
[   95.091066]  [<c013b890>] kthread+0x0/0x70
[   95.091076]  [<c0105bb7>] kernel_thread_helper+0x7/0x10
[   95.091089]  =======================
[   95.091093] Code: 2b 42 38 89 f2 89 47 18 8b 44 24 0c e8 80 fe ff ff e9 28 ff ff ff 83 c4 14 5b 5e 5f 5d c3 8b 52 0c 8b 02 84 c0 0f 8
[   95.091194] EIP: [<c01813ca>] free_block+0x12a/0x130 SS:ESP 0069:db5b9f24
[   95.091207] ---[ end trace 276bea9ce4a4d4b9 ]---
[  103.181745] BUG: soft lockup - CPU#0 stuck for 11s! [bonnie++:4677]
[  103.181757]
[  103.181762] Pid: 4677, comm: bonnie++ Tainted: G      D (2.6.24-17-xen #1)
[  103.181767] EIP: 0061:[<c03275b7>] EFLAGS: 00200286 CPU: 0
[  103.181777] EIP is at _spin_lock+0x7/0x10
[  103.181781] EAX: dafaa8b8 EBX: d9f64908 ECX: dc76c908 EDX: 00000000
[  103.181785] ESI: dafaa800 EDI: dc6427e8 EBP: c92e8fa8 ESP: da6ebcf0
[  103.181790]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
[  103.181798] CR0: 8005003b CR2: 00000008 CR3: 1c1c3000 CR4: 00000660
[  103.181805] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  103.181810] DR6: ffff0ff0 DR7: 00000400
[  103.181821]  [<de12b592>] journal_dirty_data+0x62/0x1b0 [jbd]
[  103.181841]  [<de20e6f8>] ext3_journal_dirty_data+0x18/0x50 [ext3]
[  103.181858]  [<de20f300>] ext3_get_block+0x0/0x100 [ext3]
[  103.181874]  [<de20d942>] walk_page_buffers+0x32/0x70 [ext3]
[  103.181890]  [<de210544>] ext3_ordered_write_end+0x74/0x170 [ext3]
[  103.181906]  [<de20e6e0>] ext3_journal_dirty_data+0x0/0x50 [ext3]
[  103.181944]  [<c011958c>] kmap_atomic+0x1c/0x30
[  103.181956]  [<c015e816>] generic_file_buffered_write+0x176/0x640
[  103.181972]  [<c0108333>] sched_clock+0x23/0x70
[  103.181984]  [<c015ef84>] __generic_file_aio_write_nolock+0x2a4/0x540
[  103.181995]  [<c0122746>] scheduler_tick+0xf6/0x140
[  103.182005]  [<c0109030>] timer_interrupt+0x3a0/0x770
[  103.182014]  [<c015f285>] generic_file_aio_write+0x65/0xe0
[  103.182027]  [<de20c690>] ext3_file_write+0x30/0xc0 [ext3]
[  103.182043]  [<c01852e5>] do_sync_write+0xd5/0x120
[  103.182055]  [<c013bb90>] autoremove_wake_function+0x0/0x40
[  103.182067]  [<c012bfe2>] __do_softirq+0x92/0x130
[  103.182083]  [<c0185210>] do_sync_write+0x0/0x120
[  103.182095]  [<c0185be9>] vfs_write+0xb9/0x170
[  103.182108]  [<c0186321>] sys_write+0x41/0x70
[  103.182117]  [<c0105832>] syscall_call+0x7/0xb
[  103.182128]  =======================
[  114.997708] BUG: soft lockup - CPU#0 stuck for 11s! [bonnie++:4677]
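
To see the sequence of faults at a glance, the faulting function of each Oops can be pulled out of a saved copy of a log like this one. The following is only a minimal sketch: the helper name summarize_oops and the file name oops.log are hypothetical, and it assumes the "EIP is at" line format shown in the log.

```shell
#!/bin/sh
# summarize_oops FILE: print the faulting function of every Oops in FILE.
# (hypothetical helper, not part of any Ubuntu or kernel tool)
summarize_oops() {
    # Keep only the "EIP is at ..." lines and strip everything up to the symbol.
    sed -n 's/^.*EIP is at //p' "$1"
}

# Example (oops.log is an assumed file name for a saved copy of the log):
#   summarize_oops oops.log
```

Run against the log in this comment, it would list __journal_remove_checkpoint (jbd), memmove, free_block and _spin_lock, in that order.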

Bernhard Schmidt (berni) wrote on 2008-05-30:

#6

Okay, I think I have figured this out. Pretty scary if I'm right.

The box runs headless and has a serial console. I basically copied the configuration for this from a gutsy box, which gives me the following settings in grub/menu.lst:

---
## Xen hypervisor options to use with the default Xen boot option
# xenhopt=console=com1,vga com1=57600,8n1

## Xen Linux kernel options to use with the default Xen boot option
# xenkopt=console=ttyS0,57600n1
---

This shows all xen, kernel and bootup messages right up to "Running local boot scripts", but obviously no login prompt. For that I copied the file /etc/event.d/ttyS0 from a gutsy box as well:

---
start on stopped rc2
start on stopped rc3
start on stopped rc4
start on stopped rc5

stop on runlevel 0
stop on runlevel 1
stop on runlevel 6

respawn
exec /sbin/getty -L ttyS0 57600 vt102
---

I still did not get a prompt, but I did not take a second look because the box was crashing very frequently and I tried to figure out that problem first.

Today I learned that I need to use xvc0 instead of ttyS0. I changed my config, got a login prompt and the box was suddenly way more stable. Before the change it usually died within minutes when I did something I/O intensive (e.g. bonnie++, no matter whether on a local partition or a drbd8 volume, or debootstrap).
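
The changed job file is not shown in this report. A minimal sketch of what the xvc0 variant might look like, assuming it mirrors the ttyS0 job above with only the device name swapped (the file name /etc/event.d/xvc0 and the reuse of the 57600 speed and vt102 terminal type are assumptions, not taken from the reporter's box):

```
# /etc/event.d/xvc0 (hypothetical file name; contents assumed to mirror ttyS0)
start on stopped rc2
start on stopped rc3
start on stopped rc4
start on stopped rc5

stop on runlevel 0
stop on runlevel 1
stop on runlevel 6

respawn
# Same getty invocation as the ttyS0 job, pointed at the Xen virtual console.
exec /sbin/getty -L xvc0 57600 vt102
```

With a job like this in place, the old /etc/event.d/ttyS0 would have to be removed or disabled so that upstart no longer respawns getty on ttyS0.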

I tested this: after re-enabling ttyS0, the box crashed as violently and as often as before. It does not appear to happen when the ttyS0 getty is started after the bootup process. Every access to ttyS0 fails with an Input/Output error, so getty is restarted in a loop by upstart.

My guess is that accessing or writing to ttyS0 from within dom0 corrupts kernel memory and leads to a crash at some point.

I have removed the ttyS0 job now and will be running bonnie++ on a drbd volume overnight. I hope I can confirm this tomorrow.

Leann Ogasawara (leannogasawara) wrote on 2008-08-29:

#7

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

kernel-janitor (kernel-janitor) wrote on 2009-07-15:

#8

Hi berni,

This bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? Can you try with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/ .

If it remains an issue, could you run the following command from a Terminal (Applications->Accessories->Terminal)? It will automatically gather and attach updated debug information to this report.

apport-collect -p linux-image-`uname -r` 235783

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags:	added: needs-kernel-logs
tags:	added: needs-upstream-testing
tags:	added: kj-triage
Changed in linux (Ubuntu):
status:	New → Incomplete

Jeremy Foshee (jeremyfoshee) wrote on 2010-03-13:

#9

This bug report was marked as Incomplete and has not had any updated comments for quite some time. As a result this bug is being closed. Please reopen if this is still an issue in the current Ubuntu release http://www.ubuntu.com/getubuntu/download . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags:	added: kj-expired
Changed in linux (Ubuntu):
status:	Incomplete → Invalid
