Comment 3 for bug 1389787

Rafael David Tinoco (rafaeldtinoco) wrote : Re: 3.11 memory consumption leads to HANG (not sure if 3.13 suffers from this).

Continuing with the backtrace:

#8 [ffff880092791dd0] page_fault at ffffffff81747e98
[exception RIP: kmem_cache_alloc+102]
RIP: ffffffff8119c316 RSP: ffff880092791e88 RFLAGS: 00010286
RAX: 0000000000000000 RBX: ffffffffffffffea RCX: 00000000260000be
RDX: 00000000260000bd RSI: 00000000000000d0 RDI: 00000000000173c0
RBP: ffff880092791ed8 R8: ffff880c0bb173c0 R9: 0000000000001165
R10: 00007f2e4ac88dcc R11: 0000000000000246 R12: ffff880c0b403800
R13: ffff880ce78ccc00 R14: ffffffff8108f636 R15: 00000000000000d0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff880092791ee0] prepare_creds at ffffffff8108f636
#10 [ffff880092791f00] sys_faccessat at ffffffff811b22b4
#11 [ffff880092791f70] sys_access at ffffffff811b24b8

While executing "prepare_creds" we called "kmem_cache_alloc", which triggered a page fault that was handled by the same task (so we had a page fault in kernel context). The rest of the backtrace:

#3 [ffff880092791bd0] no_context at ffffffff8172dd02
#4 [ffff880092791c20] __bad_area_nosemaphore at ffffffff8172dee4
#5 [ffff880092791c80] bad_area_nosemaphore at ffffffff8172df16
#6 [ffff880092791c90] __do_page_fault at ffffffff8174bc12
#7 [ffff880092791da0] do_page_fault at ffffffff8174bde7

But the kernel couldn't handle this page fault. Things that might have happened:

0) the error code may have PF_RSVD, PF_USER or PF_PROT set (to be checked)
if not:
0.1) the faulting address is not inside the vmalloc area (the check returns -1) (since it's a page fault for a virtual address)
1) this wasn't a spurious fault (for sure) (caused by the CPU walking the page table by itself)
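
To make the checks above concrete, here is a rough Python model of how I read the kernel-address path of __do_page_fault. It is a simplified sketch, not the kernel code, and it assumes (as the list above does) that the faulting address was a kernel address. The function name kernel_address_fault_outcome and its two boolean parameters are made up for illustration: they stand in for the kernel's vmalloc and spurious-fault checks, and other hooks in the real path are left out. The PF_* values are the x86 page-fault error-code bits.

```python
# x86 page-fault error-code bits.
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access, 1: write access
PF_USER = 1 << 2   # 0: fault in kernel mode, 1: fault in user mode
PF_RSVD = 1 << 3   # reserved bit set in a page-table entry
PF_INSTR = 1 << 4  # fault on instruction fetch


def kernel_address_fault_outcome(error_code, vmalloc_fault_ok, spurious):
    """Hypothetical model of the checks for a fault on a kernel address.

    vmalloc_fault_ok: assumed result of the vmalloc-area check
                      (True if the fault could be fixed up there).
    spurious:         assumed result of the spurious-fault check.
    """
    # Step 0 / 0.1: the vmalloc fix-up is only tried when none of
    # PF_RSVD, PF_USER, PF_PROT is set in the error code.
    if not (error_code & (PF_RSVD | PF_USER | PF_PROT)):
        if vmalloc_fault_ok:
            return "handled: vmalloc fault"
    # Step 1: a spurious fault is simply ignored.
    if spurious:
        return "handled: spurious fault"
    # Nothing could handle it: this leads to no_context and the oops.
    return "bad_area_nosemaphore"
```

For example, a not-present read in kernel mode (error code 0) on an address outside the vmalloc area, and not spurious, falls through to "bad_area_nosemaphore", which is the path seen in frames #4 and #5 above. The actual error code for this crash still has to be checked.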

So the kernel oopsed via no_context...

I'm still working on this. I will come back here with comments soon.