Comment 29 for bug 1007082

Justin Dossey (jbd) wrote :

I'm also seeing this bug (almost exactly the original trace) on two physical servers since upgrading to 12.04 LTS. The same machines ran 10.04 LTS without any errors for over a year, and since I'm seeing the same BUG on both servers, I believe it to be related to the 3.2.0 kernel and not the hardware. Notably, the "bad_page.part.61+0x9f/0xf0" line exactly matches the original trace in this bug report.

Generally, the system stays up when this happens, but the baseline load average increases because the apache2 process that triggered the bug gets stuck. Stopping apache, sending kill -9 to any apache processes that did not exit, and then starting apache again brings the load back down to normal.
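The stop / force-kill / start sequence described above could be scripted roughly as follows. This is only a sketch: the `service` and `pkill` invocations, the `apache2` service name, and the helper names `restart_commands` and `run` are assumptions for illustration, not the exact commands used on these servers. It defaults to a dry run that only prints the commands.

```python
import subprocess


def restart_commands(service="apache2"):
    """Build the stop / force-kill / start sequence as argument lists.

    The service name and the use of `service` and `pkill` are assumptions.
    """
    return [
        ["service", service, "stop"],
        # SIGKILL any process with exactly this name that survived the stop.
        ["pkill", "-9", "-x", service],
        ["service", service, "start"],
    ]


def run(commands, dry_run=True):
    """Print the commands (dry run) or execute them in order."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # pkill exits non-zero when nothing matched; that is fine here,
            # so the return code is deliberately not treated as an error.
            subprocess.call(cmd)


if __name__ == "__main__":
    run(restart_commands())
```

Calling `run(restart_commands(), dry_run=False)` as root would actually execute the sequence. Note that this only clears the stuck processes; it does nothing about the underlying kernel bug.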

About every two weeks, the servers become completely unresponsive and must be reset.

I hope this helps track down the issue. This bug is keeping us from upgrading any further systems until it is resolved, and we may even have to downgrade these servers to 10.04 until a fix becomes available.

The systems are completely up-to-date with 12.04.1 LTS.

Example from today:

[1309944.336646] BUG: Bad page state in process apache2 pfn:1334cc
[1309944.349965] page:ffffea0004cd3300 count:0 mapcount:0 mapping: (null) index:0x1a9c
[1309944.375260] page flags: 0x200000002001008(uptodate|private_2|0x2000000)
[1309944.388104] Modules linked in: ipt_REJECT xt_tcpudp xt_multiport iptable_filter ip_tables x_tables cachefiles nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc ext2 vesafb psmouse serio_raw joydev i5100_edac ioatdma dca edac_core mac_hid lp parport pata_it8213 usbhid floppy hid e1000e 3w_9xxx
[1309944.439976] Pid: 11497, comm: apache2 Tainted: G B D 3.2.0-32-generic #51-Ubuntu
[1309944.465951] Call Trace:
[1309944.478272] [<ffffffff8111ebff>] bad_page.part.61+0x9f/0xf0
[1309944.490451] [<ffffffff8111ec68>] bad_page+0x18/0x30
[1309944.502755] [<ffffffff8111f6ee>] free_pages_prepare+0x10e/0x120
[1309944.514555] [<ffffffff8111f859>] free_hot_cold_page+0x49/0x1a0
[1309944.526060] [<ffffffff81012728>] ? __switch_to+0x138/0x360
[1309944.537454] [<ffffffff8111fbd4>] __pagevec_free+0x54/0xd0
[1309944.548556] [<ffffffff816588dc>] ? __schedule+0x3cc/0x6f0
[1309944.559282] [<ffffffff81123c1c>] release_pages+0x24c/0x280
[1309944.569964] [<ffffffff8116f79a>] ? mem_cgroup_add_lru_list+0x1a/0x20
[1309944.580545] [<ffffffff81123da0>] ? pagevec_move_tail+0x40/0x40
[1309944.590917] [<ffffffff81123d2a>] pagevec_lru_move_fn+0xda/0xf0
[1309944.601225] [<ffffffff81123d57>] ____pagevec_lru_add+0x17/0x20
[1309944.611199] [<ffffffff81123fd8>] __lru_cache_add+0x68/0x90
[1309944.620860] [<ffffffff811676f7>] ? __unmap_and_move+0x107/0x270
[1309944.630305] [<ffffffff8112448d>] lru_cache_add_lru+0x2d/0x50
[1309944.639530] [<ffffffff8112a709>] putback_lru_page+0x69/0xe0
[1309944.648441] [<ffffffff811678f4>] unmap_and_move+0x94/0x150
[1309944.657237] [<ffffffff81167bae>] migrate_pages+0x9e/0x140
[1309944.665861] [<ffffffff8115b590>] ? isolate_freepages+0x210/0x210
[1309944.674300] [<ffffffff8115bd91>] compact_zone.part.14+0x121/0x270
[1309944.682777] [<ffffffff8115bfc7>] compact_zone+0x37/0x50
[1309944.691109] [<ffffffff8115c153>] compact_zone_order+0x83/0xb0
[1309944.699220] [<ffffffff8115c24d>] try_to_compact_pages+0xcd/0x100
[1309944.707068] [<ffffffff81645796>] __alloc_pages_direct_compact+0xb2/0x170
[1309944.714904] [<ffffffff811208a5>] __alloc_pages_nodemask+0x535/0x8f0
[1309944.722401] [<ffffffff81157ce6>] alloc_pages_current+0xb6/0x120
[1309944.729869] [<ffffffff81160c8d>] allocate_slab+0x13d/0x1a0
[1309944.737041] [<ffffffff81160d20>] new_slab+0x30/0x180
[1309944.743957] [<ffffffff81647199>] __slab_alloc+0x165/0x269
[1309944.750992] [<ffffffff81218c26>] ? ext4_get_block+0x16/0x20
[1309944.757996] [<ffffffffa019ce80>] ? nfs_readdata_alloc+0x20/0xa0 [nfs]
[1309944.765108] [<ffffffffa019ce80>] ? nfs_readdata_alloc+0x20/0xa0 [nfs]
[1309944.771973] [<ffffffff81164666>] kmem_cache_alloc+0x136/0x140
[1309944.779081] [<ffffffffa019a9d1>] ? nfs_create_request+0x41/0x160 [nfs]
[1309944.786319] [<ffffffffa019cbb0>] ? nfs_return_empty_page+0x70/0x70 [nfs]
[1309944.793434] [<ffffffffa019ce80>] nfs_readdata_alloc+0x20/0xa0 [nfs]
[1309944.800585] [<ffffffffa019cf34>] nfs_pagein_one+0x34/0x200 [nfs]
[1309944.807563] [<ffffffffa019a9d1>] ? nfs_create_request+0x41/0x160 [nfs]
[1309944.814653] [<ffffffffa019cbb0>] ? nfs_return_empty_page+0x70/0x70 [nfs]
[1309944.821932] [<ffffffffa019d5f8>] nfs_generic_pagein+0x18/0x30 [nfs]
[1309944.829008] [<ffffffffa019d639>] nfs_generic_pg_readpages+0x29/0xa0 [nfs]
[1309944.836234] [<ffffffffa019a742>] __nfs_pageio_add_request+0x22/0xb0 [nfs]
[1309944.843457] [<ffffffffa019ad03>] nfs_pageio_add_request+0x23/0x40 [nfs]
[1309944.850815] [<ffffffffa019cc33>] readpage_async_filler+0x83/0x130 [nfs]
[1309944.858064] [<ffffffffa019cbb0>] ? nfs_return_empty_page+0x70/0x70 [nfs]
[1309944.865326] [<ffffffff81122e1a>] read_cache_pages+0xba/0x120
[1309944.872611] [<ffffffffa019db31>] nfs_readpages+0x131/0x1a0 [nfs]
[1309944.880134] [<ffffffff81122a78>] read_pages+0x48/0x100
[1309944.887456] [<ffffffff81122c93>] __do_page_cache_readahead+0x163/0x180
[1309944.894820] [<ffffffff81123001>] ra_submit+0x21/0x30
[1309944.902272] [<ffffffff81123125>] ondemand_readahead+0x115/0x230
[1309944.909891] [<ffffffff8152edfd>] ? release_sock+0x6d/0x80
[1309944.917433] [<ffffffff811232c8>] page_cache_async_readahead+0x88/0xb0
[1309944.924981] [<ffffffff813108fe>] ? radix_tree_lookup_slot+0xe/0x10
[1309944.932569] [<ffffffff81117b6e>] ? find_get_page+0x1e/0x90
[1309944.940263] [<ffffffff811184a9>] do_generic_file_read.constprop.33+0x269/0x440
[1309944.956115] [<ffffffff8111941f>] generic_file_aio_read+0xef/0x280
[1309944.964185] [<ffffffff81528ec2>] ? alloc_sock_iocb+0x12/0x60
[1309944.972206] [<ffffffff8152a443>] ? sock_aio_write+0x63/0x90
[1309944.980355] [<ffffffffa018ee49>] nfs_file_read+0x89/0x100 [nfs]
[1309944.988937] [<ffffffff8117792a>] do_sync_read+0xda/0x120
[1309944.997074] [<ffffffff8129d5f3>] ? security_file_permission+0x93/0xb0
[1309945.005354] [<ffffffff81177db1>] ? rw_verify_area+0x61/0xf0
[1309945.013604] [<ffffffff81178290>] vfs_read+0xb0/0x180
[1309945.021840] [<ffffffff811783aa>] sys_read+0x4a/0x90
[1309945.029904] [<ffffffff81663442>] system_call_fastpath+0x16/0x1b
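To count how often this happens on a machine, something like the following could pull the report headers out of the kernel log. It is a minimal sketch that assumes the header line always has the same shape as the "BUG: Bad page state in process ... pfn:..." line above; the names `BAD_PAGE_RE` and `find_bad_pages` are made up for this example.

```python
import re
import sys

# Matches the first line of a report, e.g.
# [1309944.336646] BUG: Bad page state in process apache2 pfn:1334cc
BAD_PAGE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: Bad page state in process "
    r"(?P<comm>\S+)\s+pfn:(?P<pfn>[0-9a-fA-F]+)"
)


def find_bad_pages(lines):
    """Return (timestamp, process name, pfn) for each report header found."""
    reports = []
    for line in lines:
        match = BAD_PAGE_RE.search(line)
        if match:
            reports.append(
                (
                    float(match.group("ts")),
                    match.group("comm"),
                    int(match.group("pfn"), 16),  # pfn is printed in hex
                )
            )
    return reports


if __name__ == "__main__":
    # Read kernel log text from standard input and print one line per report.
    for ts, comm, pfn in find_bad_pages(sys.stdin):
        print("{:.6f} {} pfn=0x{:x}".format(ts, comm, pfn))
```

Saved under a name of your choice, it can be fed from `dmesg` or from a saved kernel log on standard input. Lines from the call trace itself are ignored, so only one entry is produced per report.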