SRU Justification
-----------------
[Impact]
A kernel BUG is sometimes observed when using fscache:
Jun 25 11:32:08 kernel: [4740718.880898] FS-Cache:
Jun 25 11:32:08 kernel: [4740718.880920] FS-Cache: Assertion failed
Jun 25 11:32:08 kernel: [4740718.880934] FS-Cache: 0 > 0 is false
Jun 25 11:32:08 kernel: [4740718.881001] ------------[ cut here ]------------
Jun 25 11:32:08 kernel: [4740718.881017] kernel BUG at /usr/src/linux-4.4.0/fs/fscache/operation.c:449!
Jun 25 11:32:08 kernel: [4740718.881040] invalid opcode: 0000 [#1] SMP
...
Jun 25 11:32:08 kernel: [4740718.892659] Call Trace:
Jun 25 11:32:08 kernel: [4740718.893506] [<ffffffffc1464cf9>] cachefiles_read_copier+0x3a9/0x410 [cachefiles]
Jun 25 11:32:08 kernel: [4740718.894374] [<ffffffffc037e272>] fscache_op_work_func+0x22/0x50 [fscache]
Jun 25 11:32:08 kernel: [4740718.895180] [<ffffffff81096da0>] process_one_work+0x150/0x3f0
Jun 25 11:32:08 kernel: [4740718.895966] [<ffffffff8109751a>] worker_thread+0x11a/0x470
Jun 25 11:32:08 kernel: [4740718.896753] [<ffffffff81808e59>] ? __schedule+0x359/0x980
Jun 25 11:32:08 kernel: [4740718.897783] [<ffffffff81097400>] ? rescuer_thread+0x310/0x310
Jun 25 11:32:08 kernel: [4740718.898581] [<ffffffff8109cdd6>] kthread+0xd6/0xf0
Jun 25 11:32:08 kernel: [4740718.899469] [<ffffffff8109cd00>] ? kthread_park+0x60/0x60
Jun 25 11:32:08 kernel: [4740718.900477] [<ffffffff8180d0cf>] ret_from_fork+0x3f/0x70
Jun 25 11:32:08 kernel: [4740718.901514] [<ffffffff8109cd00>] ? kthread_park+0x60/0x60
[Problem]
In include/linux/fscache-cache.h, fscache_retrieval_complete() reads, in part:
atomic_sub(n_pages, &op->n_pages);
if (atomic_read(&op->n_pages) <= 0) fscache_op_complete(&op->op, true);
The code calls atomic_sub() and then, separately, atomic_read(). Each call is atomic on its own, but the pair is not: two threads decrementing the page count can both finish their subtraction before either one reads the counter. Both then see op->n_pages <= 0 and both call fscache_op_complete(), which triggers the kernel BUG.
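The interleaving can be replayed deterministically in userspace. The following is only an illustrative sketch using C11 atomics, not kernel code: the names n_pages_demo, completions_demo, racy_sub, racy_check and simulate_racy_interleaving are hypothetical, and a plain counter stands in for the call to fscache_op_complete().

```c
#include <stdatomic.h>

/* Hypothetical stand-in for op->n_pages. */
static atomic_int n_pages_demo;
/* Counts how often the "complete" path would have been taken. */
static int completions_demo;

/* First half of the racy pattern: the subtraction alone. */
static void racy_sub(int n)
{
    atomic_fetch_sub(&n_pages_demo, n);
}

/* Second half: a separate read and test of the counter. */
static void racy_check(void)
{
    if (atomic_load(&n_pages_demo) <= 0)
        completions_demo++;   /* stands in for fscache_op_complete() */
}

/* Replays, in one thread, the order in which two racing threads
 * A and B could execute the two halves. */
int simulate_racy_interleaving(void)
{
    atomic_store(&n_pages_demo, 2);
    completions_demo = 0;

    racy_sub(1);    /* thread A: 2 -> 1 */
    racy_sub(1);    /* thread B: 1 -> 0 */
    racy_check();   /* thread A reads 0 and completes */
    racy_check();   /* thread B reads 0 and completes again */

    return completions_demo;
}
```

With two pages outstanding and each simulated thread retiring one, simulate_racy_interleaving() returns 2: the operation is completed twice.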
[Fix]
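The fix is trivial: use atomic_sub_return() instead of the two separate calls, so that the subtraction and the read of the resulting value happen as a single atomic operation and only the thread that actually brings the count to zero completes the operation.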
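A userspace sketch of the same idea, again with C11 atomics rather than the kernel API. C11 has no atomic_sub_return(); atomic_fetch_sub() returns the value before the subtraction, so subtracting n from its result gives the post-subtraction value that the kernel helper would return. The names pages_left, times_completed, retrieval_complete_fixed and simulate_fixed are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for op->n_pages. */
static atomic_int pages_left;
/* Counts how often the "complete" path would have been taken. */
static int times_completed;

/* Subtract and test in one atomic step. Each caller tests the value
 * produced by its own subtraction, not a later re-read of the counter. */
static void retrieval_complete_fixed(int n)
{
    int remaining = atomic_fetch_sub(&pages_left, n) - n;

    if (remaining <= 0)
        times_completed++;   /* stands in for fscache_op_complete() */
}

/* Two callers each retire one of two outstanding pages. Because each
 * call is a single atomic step, any concurrent execution is equivalent
 * to some sequential order like this one. */
int simulate_fixed(void)
{
    atomic_store(&pages_left, 2);
    times_completed = 0;

    retrieval_complete_fixed(1);   /* 2 -> 1: not yet complete */
    retrieval_complete_fixed(1);   /* 1 -> 0: completes */

    return times_completed;
}
```

Here simulate_fixed() returns 1: only the caller whose subtraction reaches zero takes the completion path.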
[Testcase]
The user has tested the patch successfully on their fscache/cachefiles setup.
[Regression Potential]
Limited to fscache. Small, comprehensible change.