We have been experiencing something similar with a custom image built with python-vmbuilder, using kernel aki-2407f24d (2.6.32-308.15).
During the hang, the following repeats in kern.log a number of times:
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057125] INFO: task java:4479 blocked for more than 120 seconds.
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057139] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057144] java D 0000000000000002 0 4479 1230 0x00000000
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057149] ffff8801dda81e00 0000000000000282 0000000000000000 ffff8801dda81d80
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057152] ffffffff81338023 ffff8801dda81dc8 ffff8801dcfccab8 ffff8801dda81fd8
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057155] ffff8801dcfcc700 ffff8801dcfcc700 ffff8801dcfcc700 ffff8801dda81fd8
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057158] Call Trace:
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057169] [<ffffffff81338023>] ? cpumask_next_and+0x23/0x40
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057175] [<ffffffff813a695b>] ? xen_spin_kick+0x4b/0x130
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057181] [<ffffffff810383f8>] ? check_preempt_wakeup+0x2a8/0x3b0
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057186] [<ffffffff814b0587>] ? _spin_unlock_irqrestore+0x77/0x90
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057189] [<ffffffff814aff8d>] rwsem_down_failed_common+0xbd/0x240
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057191] [<ffffffff814b0166>] rwsem_down_read_failed+0x26/0x30
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057195] [<ffffffff81341ec4>] call_rwsem_down_read_failed+0x14/0x30
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057197] [<ffffffff814af302>] ? down_read+0x12/0x20
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057200] [<ffffffff814b2dc4>] do_page_fault+0x2f4/0x390
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057203] [<ffffffff814b0a48>] page_fault+0x28/0x30
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057205] INFO: task java:4480 blocked for more than 120 seconds.
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057209] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057217] java D 0000000000000001 0 4480 1230 0x00000000
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057220] ffff8801dd603e00 0000000000000282 0000000000000035 ffff8801dd603d80
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057223] 0000000000000000 ffff8801dd603dc8 ffff8801dd70aa38 ffff8801dd603fd8
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057226] ffff8801dd70a680 ffff8801dd70a680 ffff8801dd70a680 ffff8801dd603fd8
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057228] Call Trace:
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057231] [<ffffffff814aff8d>] rwsem_down_failed_common+0xbd/0x240
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057234] [<ffffffff81037178>] ? set_next_entity+0x88/0x90
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057236] [<ffffffff814b0166>] rwsem_down_read_failed+0x26/0x30
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057241] [<ffffffff81341ec4>] call_rwsem_down_read_failed+0x14/0x30
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057244] [<ffffffff814af302>] ? down_read+0x12/0x20
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057246] [<ffffffff814b2dc4>] do_page_fault+0x2f4/0x390
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057248] [<ffffffff814b0a48>] page_fault+0x28/0x30
After this, absolutely nothing happens for a number of minutes. Then things start working normally again.
The total stall can exceed 30 minutes. Sometimes it lasts even longer, and we eventually give up waiting and reboot the instance.
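In case it helps anyone compare their own logs, here is a minimal sketch for counting how often each task is reported as hung. It assumes the warning lines have the same "INFO: task NAME:PID blocked for more than N seconds" shape as in the excerpt above; the function name count_hung_tasks is just illustrative, and the two sample lines are copied from that excerpt.

```python
import re
from collections import Counter

# Matches the hung-task warning as it appears in the kern.log excerpt.
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")


def count_hung_tasks(lines):
    """Return a Counter mapping (task name, pid) to the number of warnings seen."""
    counts = Counter()
    for line in lines:
        match = HUNG_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# Two warning lines taken from the log excerpt.
sample = [
    "Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057125] "
    "INFO: task java:4479 blocked for more than 120 seconds.",
    "Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057205] "
    "INFO: task java:4480 blocked for more than 120 seconds.",
]

for (name, pid), seen in sorted(count_hung_tasks(sample).items()):
    print(f"{name}:{pid} reported blocked {seen} time(s)")
```

The function accepts any iterable of lines, so an open file handle for kern.log can be passed in place of the sample list.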