Comment 4 for bug 741224

Peter Værlien (VT) (peter-verdandetechnology) wrote :

We have been experiencing something similar with a custom image built with python-vmbuilder, using kernel aki-2407f24d (2.6.32-308.15).

During the hang, the following repeats in kern.log a number of times:

Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057125] INFO: task java:4479 blocked for more than 120 seconds.
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057139] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057144] java D 0000000000000002 0 4479 1230 0x00000000
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057149] ffff8801dda81e00 0000000000000282 0000000000000000 ffff8801dda81d80
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057152] ffffffff81338023 ffff8801dda81dc8 ffff8801dcfccab8 ffff8801dda81fd8
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057155] ffff8801dcfcc700 ffff8801dcfcc700 ffff8801dcfcc700 ffff8801dda81fd8
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057158] Call Trace:
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057169] [<ffffffff81338023>] ? cpumask_next_and+0x23/0x40
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057175] [<ffffffff813a695b>] ? xen_spin_kick+0x4b/0x130
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057181] [<ffffffff810383f8>] ? check_preempt_wakeup+0x2a8/0x3b0
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057186] [<ffffffff814b0587>] ? _spin_unlock_irqrestore+0x77/0x90
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057189] [<ffffffff814aff8d>] rwsem_down_failed_common+0xbd/0x240
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057191] [<ffffffff814b0166>] rwsem_down_read_failed+0x26/0x30
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057195] [<ffffffff81341ec4>] call_rwsem_down_read_failed+0x14/0x30
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057197] [<ffffffff814af302>] ? down_read+0x12/0x20
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057200] [<ffffffff814b2dc4>] do_page_fault+0x2f4/0x390
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057203] [<ffffffff814b0a48>] page_fault+0x28/0x30
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057205] INFO: task java:4480 blocked for more than 120 seconds.
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057209] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057217] java D 0000000000000001 0 4480 1230 0x00000000
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057220] ffff8801dd603e00 0000000000000282 0000000000000035 ffff8801dd603d80
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057223] 0000000000000000 ffff8801dd603dc8 ffff8801dd70aa38 ffff8801dd603fd8
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057226] ffff8801dd70a680 ffff8801dd70a680 ffff8801dd70a680 ffff8801dd603fd8
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057228] Call Trace:
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057231] [<ffffffff814aff8d>] rwsem_down_failed_common+0xbd/0x240
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057234] [<ffffffff81037178>] ? set_next_entity+0x88/0x90
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057236] [<ffffffff814b0166>] rwsem_down_read_failed+0x26/0x30
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057241] [<ffffffff81341ec4>] call_rwsem_down_read_failed+0x14/0x30
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057244] [<ffffffff814af302>] ? down_read+0x12/0x20
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057246] [<ffffffff814b2dc4>] do_page_fault+0x2f4/0x390
Aug 24 18:13:38 ip-10-83-47-71 kernel: [28107.057248] [<ffffffff814b0a48>] page_fault+0x28/0x30

After this, absolutely nothing happens for a number of minutes. Then things start working normally again.
The total stall can last more than 30 minutes. Sometimes it lasts even longer, and we finally give up waiting and reboot the instance.
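
For anyone comparing logs from other affected instances, here is a minimal sketch that counts how often each task shows up in the "blocked for more than N seconds" messages. It assumes the header line has the same shape as in the excerpt above. The function name summarize_hung_tasks is made up for illustration and is not part of any existing tool.

```python
import re
from collections import Counter

# Matches the hung-task header line, e.g.
# "INFO: task java:4479 blocked for more than 120 seconds."
# (format assumed from the kern.log excerpt in this comment)
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def summarize_hung_tasks(lines):
    """Return a Counter mapping (command, pid) to the number of reports."""
    counts = Counter()
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            counts[(match.group("comm"), int(match.group("pid")))] += 1
    return counts
```

It can be fed any iterable of lines, for example an open file handle on /var/log/kern.log (path assumed; adjust for your image), and the resulting counts printed per task.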