Comment 2 for bug 1084469

warmcat (andy-warmcat) wrote :

I see from the log that it times out waiting on telnet, but the kernel bug is pointing elsewhere.

Can you tell me what update-apt-xapi is doing at that point, in terms of where the filesystems are? Are the filesystems on USB, on SD card, or on both in heavy use at that point?

The first backtrace seems to be secondary fallout from trying to handle the real exception after it's reached a state of brain damage.

CPU #0 is here, saving wget content:

[ 756.669464] [<c02570fd>] (do_raw_spin_lock+0xb5/0xd4) from [<c009b8e7>] (rmqueue_bulk.constprop.43+0x23/0x8a)
[ 756.679962] [<c009b8e7>] (rmqueue_bulk.constprop.43+0x23/0x8a) from [<c009bef3>] (get_page_from_freelist+0xd3/0x1fc)
[ 756.691101] [<c009bef3>] (get_page_from_freelist+0xd3/0x1fc) from [<c009c0cd>] (__alloc_pages_nodemask+0xb1/0x3f8)
[ 756.702056] [<c009c0cd>] (__alloc_pages_nodemask+0xb1/0x3f8) from [<c009821b>] (grab_cache_page_write_begin+0x4b/0x88)
[ 756.713378] [<c009821b>] (grab_cache_page_write_begin+0x4b/0x88) from [<c011ef25>] (ext4_da_write_begin+0xed/0x170)
[ 756.724426] [<c011ef25>] (ext4_da_write_begin+0xed/0x170) from [<c0097e37>] (generic_perform_write+0x83/0x154)

but I am not sure whether that's where it originally blew chunks, or whether the spinlock held by what actually died just deadlocked this at that point. The other traces seem to be telling us it had a hard time carrying on after the data abort.

In any event, if we can translate the region of activity where it dies into a reproducer, like "wget big file to USB memory stick and watch it blow up", that will be helpful.
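
As a rough starting point for that kind of reproducer, here is a minimal sketch that streams a large file to a chosen path in chunks, which should exercise the same buffered-write path as wget saving its download. It is only an assumption that a plain sequential write is enough to trigger the problem; the function name write_big_file, the chunk size, and any target path are hypothetical and not taken from the log.

```python
import os


def write_big_file(path, total_bytes, chunk_bytes=1 << 20):
    """Stream total_bytes of data to path in chunks and return the byte count.

    This only approximates what wget does when saving content: repeated
    buffered writes that force the kernel to allocate page cache pages.
    """
    # One random chunk is reused so generating data is not the bottleneck.
    chunk = os.urandom(chunk_bytes)
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            f.write(chunk[:n])
            written += n
        # Push the data out to the device rather than leaving it cached.
        f.flush()
        os.fsync(f.fileno())
    return written
```

Pointing it at a file on the USB stick or SD card mount with a size well above free RAM, for example write_big_file("<mount point>/stress.bin", 1 << 30), and watching dmesg while it runs would show whether storage writes alone are enough to reproduce the failure.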

Jassi, does it make any more sense to you?