Comment 6 for bug 1389787

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote : Re: 3.11 memory consumption leads to HANG (not sure if 3.13 suffers from this).

For the kmem_cache_alloc problem I found a really promising fix saying that the commit:

""""
commit ba5bb147330a8737b6b5a812cc774c79c070704b
Author: Al Viro <email address hidden>
Date: Thu Mar 21 02:21:19 2013 -0400

pipe: take allocation and freeing of pipe_inode_info out of ->i_mutex
""""

present from tags v3.10 to today....

caused a use-after-free problem on VERY rare occasions (maybe until your workload was discovered). This problem really looks like the problem we are having here and is fixed by the following commit:

""""
commit b0d8d2292160bb63de1972361ebed100c64b5b37
Author: Linus Torvalds <email address hidden>
Date: Mon Dec 2 09:44:51 2013 -0800

vfs: fix subtle use-after-free of pipe_inode_info

The pipe code was trying (and failing) to be very careful about freeing
the pipe info only after the last access, with a pattern like:

spin_lock(&inode->i_lock);
if (!--pipe->files) {
inode->i_pipe = NULL;
kill = 1;
}
spin_unlock(&inode->i_lock);
__pipe_unlock(pipe);
if (kill)
free_pipe_info(pipe);

where the final freeing is done last.

HOWEVER. The above is actually broken, because while the freeing is
done at the end, if we have two racing processes releasing the pipe
inode info, the one that *doesn't* free it will decrement the ->files
count, and unlock the inode i_lock, but then still use the
"pipe_inode_info" afterwards when it does the "__pipe_unlock(pipe)".

This is *very* hard to trigger in practice, since the race window is
very small, and adding debug options seems to just hide it by slowing
things down.

Simon originally reported this way back in July as an Oops in
kmem_cache_allocate due to a single bit corruption (due to the final
"spin_unlock(pipe->mutex.wait_lock)" incrementing a field in a different
allocation that had re-used the free'd pipe-info), it's taken this long
to figure out.

Since the 'pipe->files' accesses aren't even protected by the pipe lock
(we very much use the inode lock for that), the simple solution is to
just drop the pipe lock early. And since there were two users of this
pattern, create a helper function for it.

Introduced commit ba5bb147330a ("pipe: take allocation and freeing of
pipe_inode_info out of ->i_mutex").

Reported-by: Simon Kirby <email address hidden>
Reported-by: Ian Applegate <email address hidden>
Acked-by: Al Viro <email address hidden>
Cc: <email address hidden> # v3.10+
Signed-off-by: Linus Torvalds <email address hidden>
""""

The fix is contained in the following tags:

inaddy@inerddy:/bugs/kernel/upstream$ git tag --contains b0d8d2292160bb63de1972361ebed100c64b5b37
v3.13
v3.13-rc3
v3.13-rc4
v3.13-rc5
v3.13-rc6
v3.13-rc7
v3.13-rc8
v3.14
v3.14-rc1
v3.14-rc2

And v3.13 might not suffer from this issue.