Kernel hang under memory stress
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | Medium | Unassigned |
Bug Description
When running the memhog test program (which allocates lots of memory and then randomly touches pages), 9.10 alpha 6 gets stuck in read_swap_
<3>[50313.402614] BUG: soft lockup - CPU#0 stuck for 61s! [memhog:16322]
<4>[50313.402614] Modules linked in: vmblock vsock vmci vmmemctl vmhgfs pvscsi acpiphp binfmt_misc snd_ens1371 gameport snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device ppdev psmouse serio_raw snd soundcore
snd_page_alloc i2c_piix4 parport_pc lp intel_agp parport shpchp mptspi mptscsih mptbase e1000 scsi_transport_spi floppy
<6>[50313.402614] CPU 0:
<4>[50313.402614] Modules linked in: vmblock vsock vmci vmmemctl vmhgfs pvscsi acpiphp binfmt_misc snd_ens1371 gameport snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device ppdev psmouse serio_raw snd soundcore
snd_page_alloc i2c_piix4 parport_pc lp intel_agp parport shpchp mptspi mptscsih mptbase e1000 scsi_transport_spi floppy
<6>[50313.402614] Pid: 16322, comm: memhog Not tainted 2.6.31-10-generic #34-Ubuntu GT5414E
<6>[50313.402614] RIP: 0010:[<
<6>[50313.402614] RSP: 0018:ffff88002f
<6>[50313.402614] RAX: 00000000ffffffef RBX: ffff88002f54bd68 RCX: 0000000000000000
<6>[50313.402614] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffff81980210
<6>[50313.402614] RBP: ffffffff81012b6e R08: 0000000000000000 R09: 0000000000000efd
<6>[50313.402614] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
<6>[50313.402614] R13: ffffc90010664f42 R14: 0000000000000000 R15: 0000000000000efd
<6>[50313.402614] FS: 00007ff09d9316f
<6>[50313.402614] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
<6>[50313.402614] CR2: 00000000e8fa9000 CR3: 000000001a82e000 CR4: 00000000000006b0
<6>[50313.402614] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<6>[50313.402614] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[50313.402614] Call Trace:
<4>[50313.402614] [<ffffffff810ff
<4>[50313.402614] [<ffffffff810ff
<4>[50313.402614] [<ffffffff810ff
<4>[50313.402614] [<ffffffff810f3
<4>[50313.402614] [<ffffffff8104a
<4>[50313.402614] [<ffffffff81012
<4>[50313.402614] [<ffffffff81272
<4>[50313.402614] [<ffffffff810f3
<4>[50313.402614] [<ffffffff81032
<4>[50313.402614] [<ffffffff81525
<4>[50313.402614] [<ffffffff81523
In my current hang this function calls find_get_page(..., 0x17A1), which in turn returns NULL because radix_tree_
Problem does not occur on 9.04.
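For readers without the attachment, the workload described above (allocate a lot of memory, then touch pages at random) can be sketched roughly as follows. This is not the actual memhog source: `hog_and_touch` and all of its parameters are hypothetical names chosen for illustration, and the sketch is single-threaded.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not the real memhog: allocate npages pages of
 * page_size bytes, then write one byte to `touches` randomly chosen pages.
 * Returns the number of writes performed, or 0 if the arguments are
 * invalid or the allocation fails. */
size_t hog_and_touch(size_t npages, size_t page_size, size_t touches,
                     unsigned seed)
{
    if (npages == 0 || page_size == 0)
        return 0;
    if (npages > SIZE_MAX / page_size)   /* avoid overflow in the size */
        return 0;

    unsigned char *buf = malloc(npages * page_size);
    if (buf == NULL)
        return 0;

    /* volatile keeps the compiler from dropping the writes */
    volatile unsigned char *p = buf;
    size_t done = 0;

    srand(seed);
    for (size_t i = 0; i < touches; i++) {
        size_t page = (size_t)rand() % npages;
        p[page * page_size] = (unsigned char)i;  /* first byte of the page */
        done++;
    }

    free(buf);
    return done;
}
```

To generate real swap pressure, the total size (npages times page_size) would have to exceed the RAM available to the machine, for example `hog_and_touch(npages, 4096, touches, 1)` with a suitably large `npages`.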
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
I've reproduced it with all daemons and programs killed (see new attachment). Apparently one of the memhog threads went to sleep in a memory allocation while adding an entry to the radix tree:
<6>[ 1789.980812] memhog R running task 0 3039 3036 0x00020000
<4>[ 1789.980812] ffff88003309dc38 0000000000000086 ffff880000010800 0000000000015580
<4>[ 1789.980812] ffff88003992c7c0 0000000000015580 0000000000015580 0000000000015580
<4>[ 1789.980812] 0000000000015580 ffff88003992c7c8 0000000000015580 0000000000015580
<4>[ 1789.980812] Call Trace:
<4>[ 1789.980812] [<ffffffff8105630c>] __cond_resched+0x1c/0x50
<4>[ 1789.980812] [<ffffffff815215ab>] _cond_resched+0x2b/0x40
<4>[ 1789.980812] [<ffffffff811103e9>] kmem_cache_alloc+0xd9/0x150
<4>[ 1789.980812] [<ffffffff81271db6>] ? radix_tree_preload+0x36/0xa0
<4>[ 1789.980812] [<ffffffff81271db6>] radix_tree_preload+0x36/0xa0
<4>[ 1789.980812] [<ffffffff810ff5d1>] add_to_swap_cache+0x21/0xd0
<4>[ 1789.980812] [<ffffffff810ff706>] read_swap_cache_async+0x86/0x120
<4>[ 1789.980812] [<ffffffff810ffe96>] ? valid_swaphandles+0x166/0x190
<4>[ 1789.980812] [<ffffffff810ff81f>] swapin_readahead+0x7f/0xb0
<4>[ 1789.980812] [<ffffffff810f373e>] do_swap_page+0x2ce/0x420
<4>[ 1789.980812] [<ffffffff8104a155>] ? finish_task_switch+0x65/0x120
<4>[ 1789.980812] [<ffffffff810129ce>] ? common_interrupt+0xe/0x13
<4>[ 1789.980812] [<ffffffff810f3b5b>] handle_mm_fault+0x2cb/0x3c0
<4>[ 1789.980812] [<ffffffff81032439>] ? default_spin_lock_flags+0x9/0x10
<4>[ 1789.980812] [<ffffffff81525fba>] do_page_fault+0x16a/0x370
<4>[ 1789.980812] [<ffffffff81523975>] page_fault+0x25/0x30
while another thread then wants to read this very same entry. I have no idea why kswapd does not get woken up to swap something out and unblock this thread.
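The suspected interaction can be illustrated with a small user-space model. This is only a sketch under assumptions, not kernel code. It assumes that the sleeping thread has already reserved the swap slot but has not yet made the page visible to lookups, and that the second thread retries whenever its own reservation attempt fails with -EEXIST. All names (`slot_model`, `model_lookup`, `model_claim`, `model_reader`, `owner_finishes`) are hypothetical. As a side observation, RAX in the first register dump, 0x00000000ffffffef, reads as -17 when taken as a 32-bit signed value, which is -EEXIST on Linux; whether RAX actually holds such a return value there is a guess.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a single swap slot; not kernel code. */
struct slot_model {
    bool claimed;   /* some thread has reserved the slot for the cache */
    bool in_cache;  /* the page is visible to lookups */
};

/* Models the lookup that returns NULL in the report. */
bool model_lookup(const struct slot_model *s)
{
    return s->in_cache;
}

/* Models reserving the slot: fails with -EEXIST if it is already taken. */
int model_claim(struct slot_model *s)
{
    if (s->claimed)
        return -EEXIST;
    s->claimed = true;
    return 0;
}

/* Models what happens when the reader gives up the CPU. */
typedef void (*yield_fn)(struct slot_model *);

/* Models the owner thread completing its insertion once it runs again. */
void owner_finishes(struct slot_model *s)
{
    s->in_cache = true;
}

/* Reader loop: retries until the page is found, the reader inserts it
 * itself, or max_spins retries have been used. Returns the retry count. */
size_t model_reader(struct slot_model *s, size_t max_spins, yield_fn yield)
{
    size_t spins = 0;

    while (spins < max_spins) {
        if (model_lookup(s))
            break;                  /* page found */
        if (model_claim(s) == 0) {
            s->in_cache = true;     /* reader owns the slot and inserts */
            break;
        }
        spins++;                    /* -EEXIST: the owner is not done yet */
        if (yield != NULL)
            yield(s);               /* let the owner make progress */
    }
    return spins;
}
```

In this model, a reader that never yields exhausts its whole retry budget while the owner stays asleep, whereas a reader that lets the owner run finds the page after a single retry. On one CPU without preemption, the first case would look like the soft lockup reported above.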