"BUG: Bad page state in process" when running on EC2

Bug #1052275 reported by Matt Wilson
This bug affects 6 people
Affects: linux-ec2 (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

After running for some time, several m1.large 64-bit instances started repeatedly hitting this BUG_ON():

[525758.322281] BUG: Bad page state in process pdnsd pfn:1d1a6f
[525758.322290] page:ffff88000b26f848 flags:800000000000087c count:2 mapcount:0 mapping:ffff8800d2da0860 index:99
[525758.322294] Pid: 731, comm: pdnsd Not tainted 2.6.32-346-ec2 #51-Ubuntu
[525758.322296] Call Trace:
[525758.322305] [<ffffffff810b39c0>] bad_page+0xd0/0x130
[525758.322307] [<ffffffff810b48aa>] prep_new_page+0x1aa/0x1c0
[525758.322310] [<ffffffff810b3d75>] ? zone_watermark_ok+0x25/0xe0
[525758.322312] [<ffffffff810b4a2b>] get_page_from_freelist+0x16b/0x550
[525758.322315] [<ffffffff810b5586>] __alloc_pages_nodemask+0xd6/0x180
[525758.322319] [<ffffffff810cc37d>] do_anonymous_page+0x21d/0x540
[525758.322321] [<ffffffff810cee87>] handle_mm_fault+0x427/0x4f0
[525758.322333] [<ffffffff814b5fe7>] do_page_fault+0x147/0x390
[525758.322335] [<ffffffff814b3d28>] page_fault+0x28/0x30
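
As a reading aid for the page dump line in the trace above, here is a minimal Python sketch that splits that line into fields and lists which flag bits are set. The function names are hypothetical, and it assumes (this report does not confirm it) that the allocator's check on a freshly allocated page expects a zero count, a zero mapcount, and a NULL mapping. Flag bits are reported by number only, because their names depend on the kernel version and configuration.

```python
import re

# Matches the "page:... flags:... count:... mapcount:... mapping:... index:..."
# line that follows "BUG: Bad page state" in the log above.
PAGE_RE = re.compile(
    r"page:(?P<page>[0-9a-f]+) flags:(?P<flags>[0-9a-f]+) "
    r"count:(?P<count>-?\d+) mapcount:(?P<mapcount>-?\d+) "
    r"mapping:(?P<mapping>\(null\)|[0-9a-f]+) index:(?P<index>[0-9a-f]+)"
)


def parse_page_dump(line):
    """Return the fields of a page dump line as a dict, or None if absent."""
    m = PAGE_RE.search(line)
    if m is None:
        return None
    flags = int(m.group("flags"), 16)
    mapping_text = m.group("mapping")
    return {
        "page": int(m.group("page"), 16),
        "flags": flags,
        # Bit positions only; naming them would require the kernel config.
        "set_bits": [b for b in range(64) if (flags >> b) & 1],
        "count": int(m.group("count")),
        "mapcount": int(m.group("mapcount")),
        "mapping": 0 if mapping_text == "(null)" else int(mapping_text, 16),
        "index": m.group("index"),  # kept as raw text
    }


def unexpected_fields(info):
    """Names of fields that are non-zero where a free page is ASSUMED to have zero."""
    return [name for name in ("count", "mapcount", "mapping") if info[name] != 0]


line = ("[525758.322290] page:ffff88000b26f848 flags:800000000000087c "
        "count:2 mapcount:0 mapping:ffff8800d2da0860 index:99")
info = parse_page_dump(line)
print(unexpected_fields(info), info["set_bits"])
```

Under that assumption, the non-zero count and the non-NULL mapping are the fields that stand out in this report's dump.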

One instance ultimately hit a GPF:
[525758.336588] general protection fault: 0000 [#1] SMP
[525758.336598] last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
[525758.336601] CPU 1
[525758.336603] Modules linked in: ipv6 raid0 md_mod
[525758.336610] Pid: 731, comm: pdnsd Tainted: G B 2.6.32-346-ec2 #51-Ubuntu
[525758.336613] RIP: e030:[<ffffffff810b4ab6>] [<ffffffff810b4ab6>] get_page_from_freelist+0x1f6/0x550
[525758.336623] RSP: e02b:ffff8801dce4bce8 EFLAGS: 00010096
[525758.336625] RAX: ffffffff816b1570 RBX: ffffffff816b1480 RCX: 0000000000000040
[525758.336628] RDX: dead000000100100 RSI: 0000000000000000 RDI: 0000000000000005
[525758.336630] RBP: ffff8801dce4bdb8 R08: 0000000000010ffa R09: 0000000000000000
[525758.336633] R10: 0000000000000005 R11: 0000000000000000 R12: ffff88000b26f848
[525758.336636] R13: 0000000000000001 R14: dead000000200200 R15: 0000000000000002
[525758.336642] FS: 00007f4b928ee700(0000) GS:ffff880002e7e000(0000) knlGS:0000000000000000
[525758.336645] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[525758.336647] CR2: 00007f4b900e9ff8 CR3: 00000001debdf000 CR4: 0000000000002660
[525758.336650] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[525758.336653] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000000
[525758.336656] Process pdnsd (pid: 731, threadinfo ffff8801dce4a000, task ffff8801dce40300)
[525758.336659] Stack:
[525758.336660] ffff8801dcdaf0c0 00000002dcdaf0c0 0000000000000000 000000000000a3c0
[525758.336665] <0> ffff880100000041 ffff8801dcdaf0c0 ffffffffdce4be28 0000000100000000
[525758.336670] <0> 0000000300000040 0000000000000000 ffffffff816b6088 ffffffff816b34c0
[525758.336677] Call Trace:
[525758.336682] [<ffffffff810b5586>] __alloc_pages_nodemask+0xd6/0x180
[525758.336687] [<ffffffff810cc37d>] do_anonymous_page+0x21d/0x540
[525758.336690] [<ffffffff810cee87>] handle_mm_fault+0x427/0x4f0
[525758.336695] [<ffffffff814b5fe7>] do_page_fault+0x147/0x390
[525758.336698] [<ffffffff814b3d28>] page_fault+0x28/0x30
[525758.336701] Code: 84 b0 00 00 00 4b 8d 44 ef 05 48 c1 e0 04 4c 8b 64 18 08 49 83 ec 28 49 8b 44 24 30 49 8b 54 24 28 49 be 00 02 20 00 00 00 ad de <48> 89 42 08 48 89 10 48 b8 00 01 10 00 00 00 ad de 49 89 44 24
[525758.336747] RIP [<ffffffff810b4ab6>] get_page_from_freelist+0x1f6/0x550
[525758.336752] RSP <ffff8801dce4bce8>
[525758.336757] ---[ end trace 371c569b99678b87 ]---
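
The register dump above contains the values dead000000100100 and dead000000200200. A minimal Python sketch for spotting them in an oops follows; it assumes these are the x86_64 list poison constants (LIST_POISON1 and LIST_POISON2 from include/linux/poison.h, offset by the architecture's poison pointer delta). The helper name is hypothetical. A hit is only a hint, since list deletion code also loads these constants into registers during normal operation.

```python
import re

# ASSUMPTION: x86_64 list poison values (see include/linux/poison.h).
LIST_POISON = {
    0xdead000000100100: "LIST_POISON1",
    0xdead000000200200: "LIST_POISON2",
}

# General-purpose registers as printed in the oops, e.g. "RDX: dead000000100100".
# The leading \b keeps CR0/DR0-style names from matching.
REG_RE = re.compile(r"\b(R[A-Z0-9]{1,3}): ([0-9a-f]{16})\b")


def poisoned_registers(oops_text):
    """Map register name -> poison label for registers holding a poison value."""
    hits = {}
    for name, value in REG_RE.findall(oops_text):
        label = LIST_POISON.get(int(value, 16))
        if label is not None:
            hits[name] = label
    return hits


sample = """\
[525758.336628] RDX: dead000000100100 RSI: 0000000000000000 RDI: 0000000000000005
[525758.336636] R13: 0000000000000001 R14: dead000000200200 R15: 0000000000000002
[525758.336647] CR2: 00007f4b900e9ff8 CR3: 00000001debdf000 CR4: 0000000000002660
"""
print(poisoned_registers(sample))
```

The same function can be run over a saved console log from each affected instance to see whether the crashes share this pattern.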

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-ec2 (Ubuntu):
status: New → Confirmed
Ilan (ilan) wrote :

The reports Matt provided are from a few of our instances. We are seeing this issue with relative frequency, although we do not have a reproducible case for triggering this condition. These particular crashes were with 2.6.32-346-ec2 #51.

Ilan (ilan) wrote :

We are seeing this occur quite frequently. Is there additional data we can provide to help with resolution?

chinmay asarawala (chinmay-asarawala) wrote :

We are seeing the following errors on two of our nodes, which cause the nodes to freeze.

Node 1:

May 8 11:19:05 kernel: [89019.043852] Pid: 15145, comm: python Tainted: GB 2.6.32-344-ec2 #46-Ubuntu
May 8 11:19:05 kernel: [89019.043852] Call Trace:
May 8 11:19:05 kernel: [89019.043852] [<ffffffff810b2d90>] bad_page+0xd0/0x130
May 8 11:19:05 kernel: [89019.043852] [<ffffffff810b5964>] free_hot_cold_page+0x2a4/0x300
May 8 11:19:05 kernel: [89019.043852] [<ffffffff810b5a11>] __pagevec_free+0x51/0xb0
May 8 11:19:05 kernel: [89019.043852] [<ffffffff8101d472>] ? ___pte_free_tlb+0x22/0x90
May 8 11:19:05 kernel: [89019.043852] [<ffffffff810b853c>] release_pages+0x22c/0x280
May 8 11:19:05 kernel: [89019.043852] [<ffffffff810de6e4>] free_pages_and_swap_cache+0xb4/0xe0
May 8 11:19:05 kernel: [89019.043852] [<ffffffff810d40bf>] unmap_region+0x14f/0x170
May 8 11:19:05 kernel: [89019.043852] [<ffffffff810d46c5>] do_munmap+0x295/0x3b0
May 8 11:19:05 kernel: [89019.043852] [<ffffffff810d4831>] sys_munmap+0x51/0x80
May 8 11:19:05 kernel: [89019.043852] [<ffffffff81009bb8>] system_call_fastpath+0x16/0x1b
May 8 11:19:05 kernel: [89019.043852] [<ffffffff81009b50>] ? system_call+0x0/0x52

Node 2:
May 8 07:58:42 kernel: [1986597.770228] CPU 7
May 8 07:58:42 kernel: [1986597.770231] Modules linked in: ipv6
May 8 07:58:42 kernel: [1986597.770237] Pid: 25928, comm: python Not tainted 2.6.32-344-ec2 #46-Ubuntu
May 8 07:58:42 kernel: [1986597.770240] RIP: e030:[<ffffffff810b521f>] [<ffffffff810b521f>] free_pcppages_bulk+0x17f/0x360
May 8 07:58:42 kernel: [1986597.770251] RSP: e02b:ffff8800bd85fa38 EFLAGS: 00010002
May 8 07:58:42 kernel: [1986597.770254] RAX: dead000000200200 RBX: 0000000000000001 RCX: ffffffff816af2c0
May 8 07:58:42 kernel: [1986597.770257] RDX: ffffffff816af2f0 RSI: 0000000000000030 RDI: ffffffff816b0fc0
May 8 07:58:42 kernel: [1986597.770259] RBP: ffff8800bd85faa8 R08: ffff880006fbe970 R09: 4000000000020068
May 8 07:58:42 kernel: [1986597.770262] R10: 0000000000000000 R11: dead000000200200 R12: ffff880004cff8f0
May 8 07:58:42 kernel: [1986597.770264] R13: ffff880004cff918 R14: ffffffff816aef00 R15: dead000000100100
May 8 07:58:42 kernel: [1986597.770272] FS: 00007f9c5ffef700(0000) GS:ffff880002ccd000(0000) knlGS:0000000000000000
May 8 07:58:42 kernel: [1986597.770275] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
May 8 07:58:42 kernel: [1986597.770278] CR2: 00007f604762b000 CR3: 00000000580b7000 CR4: 0000000000002660
May 8 07:58:42 kernel: [1986597.770281] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 8 07:58:42 kernel: [1986597.770284] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 8 07:58:42 kernel: [1986597.770287] Process python (pid: 25928, threadinfo ffff8800bd85e000, task ffff8800bdbec0c0)
May 8 07:58:42 kernel: [1986597.770291] ffffffff816b0f40 0000000200000026 ffffffff816af2f0 0000000000000030
May 8 07:58:42 kernel: [1986597.770296] <0> ffffffff816af2f8 00000007ffffffff ffffffff816af2c0 000000010000e033
May 8 07:58:42 kernel: [1986597.770301] <0> 0000000000000246 00000000000003c0 0000000000002600 0000000000000001
May 8 07:58:42 kernel: [198659...
