I'm also seeing this bug (almost exactly the original trace) on two physical servers since upgrading to 12.04 LTS. The same machines ran 10.04 LTS without any errors for over a year, and since I'm seeing the same BUG on both servers, I believe it to be related to the 3.2.0 kernel and not the hardware. Notably, the "bad_page.part.61+0x9f/0xf0" line exactly matches the original trace in this bug report.
Generally, the system stays up when this happens, but the baseline load average increases because the apache2 process that triggered the bug gets stuck. Stopping apache, running kill -9 on any apache processes that did not exit, and then starting apache again brings the load back down to normal.
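For anyone who wants to script that workaround, here is a minimal sketch in Python. It assumes it runs as root on a system where the `service` and `pkill` commands are available; the grace period and the `run` parameter are illustrative choices of mine, not part of the procedure itself.

```python
import subprocess
import time


def restart_apache(run=subprocess.call, grace_seconds=5):
    """Stop apache2, force-kill leftover processes, then start it again.

    `run` is any callable that accepts an argv list. It defaults to
    subprocess.call and is a parameter only so the steps can be dry-run.
    `grace_seconds` is an assumed pause, not a measured value.
    """
    # 1. Ask apache to shut down normally.
    run(["service", "apache2", "stop"])
    # 2. Give well-behaved processes a moment to exit.
    time.sleep(grace_seconds)
    # 3. kill -9 any process still named exactly "apache2".
    #    pkill exits non-zero when nothing matched; that is not an error here.
    run(["pkill", "-KILL", "-x", "apache2"])
    # 4. Start apache again.
    run(["service", "apache2", "start"])
```

Calling `restart_apache()` performs the three steps in order; passing `run=print` shows the commands without executing them.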
About every two weeks, the servers become completely unresponsive and must be reset.
I hope this helps track down the issue. This bug is keeping us from upgrading any further systems, and we may even have to downgrade these machines to 10.04 until a fix becomes available.
The systems are completely up-to-date with 12.04.1 LTS.
Example from today:
[1309944.336646] BUG: Bad page state in process apache2 pfn:1334cc
[1309944.349965] page:ffffea0004cd3300 count:0 mapcount:0 mapping: (null) index:0x1a9c
[1309944.375260] page flags: 0x200000002001008(uptodate|private_2|0x2000000)
[1309944.388104] Modules linked in: ipt_REJECT xt_tcpudp xt_multiport iptable_filter ip_tables x_tables cachefiles nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc ext2 vesafb psmouse serio_raw joydev i5100_edac ioatdma dca edac_core mac_hid lp parport pata_it8213 usbhid floppy hid e1000e 3w_9xxx
[1309944.439976] Pid: 11497, comm: apache2 Tainted: G B D 3.2.0-32-generic #51-Ubuntu
[1309944.465951] Call Trace:
[1309944.478272] [<ffffffff8111ebff>] bad_page.part.61+0x9f/0xf0
[1309944.490451] [<ffffffff8111ec68>] bad_page+0x18/0x30
[1309944.502755] [<ffffffff8111f6ee>] free_pages_prepare+0x10e/0x120
[1309944.514555] [<ffffffff8111f859>] free_hot_cold_page+0x49/0x1a0
[1309944.526060] [<ffffffff81012728>] ? __switch_to+0x138/0x360
[1309944.537454] [<ffffffff8111fbd4>] __pagevec_free+0x54/0xd0
[1309944.548556] [<ffffffff816588dc>] ? __schedule+0x3cc/0x6f0
[1309944.559282] [<ffffffff81123c1c>] release_pages+0x24c/0x280
[1309944.569964] [<ffffffff8116f79a>] ? mem_cgroup_add_lru_list+0x1a/0x20
[1309944.580545] [<ffffffff81123da0>] ? pagevec_move_tail+0x40/0x40
[1309944.590917] [<ffffffff81123d2a>] pagevec_lru_move_fn+0xda/0xf0
[1309944.601225] [<ffffffff81123d57>] ____pagevec_lru_add+0x17/0x20
[1309944.611199] [<ffffffff81123fd8>] __lru_cache_add+0x68/0x90
[1309944.620860] [<ffffffff811676f7>] ? __unmap_and_move+0x107/0x270
[1309944.630305] [<ffffffff8112448d>] lru_cache_add_lru+0x2d/0x50
[1309944.639530] [<ffffffff8112a709>] putback_lru_page+0x69/0xe0
[1309944.648441] [<ffffffff811678f4>] unmap_and_move+0x94/0x150
[1309944.657237] [<ffffffff81167bae>] migrate_pages+0x9e/0x140
[1309944.665861] [<ffffffff8115b590>] ? isolate_freepages+0x210/0x210
[1309944.674300] [<ffffffff8115bd91>] compact_zone.part.14+0x121/0x270
[1309944.682777] [<ffffffff8115bfc7>] compact_zone+0x37/0x50
[1309944.691109] [<ffffffff8115c153>] compact_zone_order+0x83/0xb0
[1309944.699220] [<ffffffff8115c24d>] try_to_compact_pages+0xcd/0x100
[1309944.707068] [<ffffffff81645796>] __alloc_pages_direct_compact+0xb2/0x170
[1309944.714904] [<ffffffff811208a5>] __alloc_pages_nodemask+0x535/0x8f0
[1309944.722401] [<ffffffff81157ce6>] alloc_pages_current+0xb6/0x120
[1309944.729869] [<ffffffff81160c8d>] allocate_slab+0x13d/0x1a0
[1309944.737041] [<ffffffff81160d20>] new_slab+0x30/0x180
[1309944.743957] [<ffffffff81647199>] __slab_alloc+0x165/0x269
[1309944.750992] [<ffffffff81218c26>] ? ext4_get_block+0x16/0x20
[1309944.757996] [<ffffffffa019ce80>] ? nfs_readdata_alloc+0x20/0xa0 [nfs]
[1309944.765108] [<ffffffffa019ce80>] ? nfs_readdata_alloc+0x20/0xa0 [nfs]
[1309944.771973] [<ffffffff81164666>] kmem_cache_alloc+0x136/0x140
[1309944.779081] [<ffffffffa019a9d1>] ? nfs_create_request+0x41/0x160 [nfs]
[1309944.786319] [<ffffffffa019cbb0>] ? nfs_return_empty_page+0x70/0x70 [nfs]
[1309944.793434] [<ffffffffa019ce80>] nfs_readdata_alloc+0x20/0xa0 [nfs]
[1309944.800585] [<ffffffffa019cf34>] nfs_pagein_one+0x34/0x200 [nfs]
[1309944.807563] [<ffffffffa019a9d1>] ? nfs_create_request+0x41/0x160 [nfs]
[1309944.814653] [<ffffffffa019cbb0>] ? nfs_return_empty_page+0x70/0x70 [nfs]
[1309944.821932] [<ffffffffa019d5f8>] nfs_generic_pagein+0x18/0x30 [nfs]
[1309944.829008] [<ffffffffa019d639>] nfs_generic_pg_readpages+0x29/0xa0 [nfs]
[1309944.836234] [<ffffffffa019a742>] __nfs_pageio_add_request+0x22/0xb0 [nfs]
[1309944.843457] [<ffffffffa019ad03>] nfs_pageio_add_request+0x23/0x40 [nfs]
[1309944.850815] [<ffffffffa019cc33>] readpage_async_filler+0x83/0x130 [nfs]
[1309944.858064] [<ffffffffa019cbb0>] ? nfs_return_empty_page+0x70/0x70 [nfs]
[1309944.865326] [<ffffffff81122e1a>] read_cache_pages+0xba/0x120
[1309944.872611] [<ffffffffa019db31>] nfs_readpages+0x131/0x1a0 [nfs]
[1309944.880134] [<ffffffff81122a78>] read_pages+0x48/0x100
[1309944.887456] [<ffffffff81122c93>] __do_page_cache_readahead+0x163/0x180
[1309944.894820] [<ffffffff81123001>] ra_submit+0x21/0x30
[1309944.902272] [<ffffffff81123125>] ondemand_readahead+0x115/0x230
[1309944.909891] [<ffffffff8152edfd>] ? release_sock+0x6d/0x80
[1309944.917433] [<ffffffff811232c8>] page_cache_async_readahead+0x88/0xb0
[1309944.924981] [<ffffffff813108fe>] ? radix_tree_lookup_slot+0xe/0x10
[1309944.932569] [<ffffffff81117b6e>] ? find_get_page+0x1e/0x90
[1309944.940263] [<ffffffff811184a9>] do_generic_file_read.constprop.33+0x269/0x440
[1309944.956115] [<ffffffff8111941f>] generic_file_aio_read+0xef/0x280
[1309944.964185] [<ffffffff81528ec2>] ? alloc_sock_iocb+0x12/0x60
[1309944.972206] [<ffffffff8152a443>] ? sock_aio_write+0x63/0x90
[1309944.980355] [<ffffffffa018ee49>] nfs_file_read+0x89/0x100 [nfs]
[1309944.988937] [<ffffffff8117792a>] do_sync_read+0xda/0x120
[1309944.997074] [<ffffffff8129d5f3>] ? security_file_permission+0x93/0xb0
[1309945.005354] [<ffffffff81177db1>] ? rw_verify_area+0x61/0xf0
[1309945.013604] [<ffffffff81178290>] vfs_read+0xb0/0x180
[1309945.021840] [<ffffffff811783aa>] sys_read+0x4a/0x90
[1309945.029904] [<ffffffff81663442>] system_call_fastpath+0x16/0x1b
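To keep track of how often this happens on each server, something like the following Python sketch could pull the "Bad page state" reports out of saved kernel log text. The function name `find_bad_pages` and the regular expression are my own; the pattern is only modelled on the first line of the report above and may need adjusting for other kernels.

```python
import re

# Modelled on a line such as:
# [1309944.336646] BUG: Bad page state in process apache2 pfn:1334cc
BAD_PAGE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] BUG: Bad page state in process "
    r"(?P<comm>\S+)\s+pfn:(?P<pfn>[0-9a-f]+)"
)


def find_bad_pages(log_text):
    """Return (timestamp, process name, pfn) for each bad-page report.

    `log_text` is kernel log text, e.g. the saved output of dmesg.
    The pfn is printed in hexadecimal and is returned as an int.
    """
    hits = []
    for line in log_text.splitlines():
        match = BAD_PAGE.search(line)
        if match:
            hits.append(
                (
                    float(match.group("ts")),
                    match.group("comm"),
                    int(match.group("pfn"), 16),
                )
            )
    return hits
```

Feeding it the log from each server gives one tuple per occurrence, which makes it easy to see whether the stuck process is always apache2.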