------- Comment From <email address hidden> 2016-07-06 20:56 EDT-------
From what I can see, the following is the root cause of the issue:
cgroup_threadgroup_rwsem almost serializes accesses on the system.
1. stress-ng-brk holds cgroup_threadgroup_rwsem in read mode via copy_process() and calls schedule_timeout() from __alloc_pages_nodemask(), which never seems to return.
00003fff96ceeab0     0  2799   2701 0x00040002
[ 4401.831972] Call Trace:
[ 4401.831973] [c000000ee71433c0] [c000000ee7143400] 0xc000000ee7143400 (unreliable)
[ 4401.831975] [c000000ee7143590] [c000000000017c64] __switch_to+0x204/0x360
[ 4401.831977] [c000000ee71435e0] [c000000000bb917c] __schedule+0x40c/0xe70
[ 4401.831979] [c000000ee71436a0] [c000000000bb9c34] schedule+0x54/0xd0
[ 4401.831981] [c000000ee71436d0] [c000000000bc0524] schedule_timeout+0x384/0x4f0
[ 4401.831983] [c000000ee7143800] [c00000000027de1c] __alloc_pages_nodemask+0xd0c/0xf40
[ 4401.831985] [c000000ee7143a10] [c0000000002e8d40] alloc_pages_current+0xc0/0x240
[ 4401.831988] [c000000ee7143a70] [c000000000056b6c] page_table_alloc+0xcc/0x1e0
[ 4401.831989] [c000000ee7143ac0] [c0000000002b5824] __pte_alloc+0x54/0x1e0
[ 4401.831991] [c000000ee7143b10] [c0000000002b8584] copy_page_range+0x754/0x8f0
[ 4401.831993] [c000000ee7143c40] [c0000000000bcee4] copy_process.isra.6+0x1834/0x1ab0
[ 4401.831995] [c000000ee7143d60] [c0000000000bd33c] _do_fork+0xac/0x980
[ 4401.831997] [c000000ee7143e30] [c00000000000946c] ppc_clone+0x8/0xc
[ 4401.861569] cfs_rq[23]:/user.slice
[ 4401.861570] .exec_clock : 1725230.642232
[ 4401.861571] .MIN_vruntime : 0.000001
[ 4401.861572] .min_vruntime : 1154678.434341
[ 4401.861573] .max_vruntime : 0.000001
[ 4401.861573] .spread : 0.000000
[ 4401.861574] .spread0 : -97866589.605918
[ 4401.861575] .nr_spread_over : 11
[ 4401.861575] .nr_running : 0
[ 4401.862187] stress-ng-brk 2799 1154678.007061 854611 120 688670.967816 1148995.803148 2318289.407734 0 0 /user.slice
2. Since cgroup_threadgroup_rwsem is held, we are unable to make any processes exit.
[ 4177.396262] Showing all locks held in the system:
[ 4177.396263] 4 locks held by systemd/1:
[ 4177.396268] #0: (sb_writers#9){.+.+.+}, at: [<c000000000328810>] __sb_start_write+0x100/0x130
[ 4177.396272] #1: (&of->mutex){+.+.+.}, at: [<c0000000003e76ec>] kernfs_fop_write+0x7c/0x1f0
[ 4177.396275] #2: (cgroup_mutex){+.+.+.}, at: [<c0000000001b0b8c>] cgroup_kn_lock_live+0x14c/0x280
[ 4177.396278] #3: (&cgroup_threadgroup_rwsem){++++++}, at: [<c00000000013aa10>] percpu_down_write+0x50/0x180
I think at #3 we are waiting for all readers to leave cgroup_threadgroup_rwsem; this in turn blocks exiting threads:
[ 4177.396548] 1 lock held by kworker/dying/1348:
[ 4177.396548] #0: (&cgroup_threadgroup_rwsem){++++++}, at: [<c0000000000d9b00>] exit_signals+0x50/0x1a0
[ 4177.396551] 1 lock held by kworker/dying/1919:
[ 4177.396552] #0: (&cgroup_threadgroup_rwsem){++++++}, at: [<c0000000000d9b00>] exit_signals+0x50/0x1a0
[ 4177.396555] 1 lock held by kworker/19:2/1930:
[ 4177.396555] #0: (&cgroup_threadgroup_rwsem){++++++}, at: [<c0000000000d9b00>] exit_signals+0x50/0x1a0
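Each of the three parties above is waiting on one of the others, so this is a three-way deadlock. A minimal sketch of that wait-for cycle (the edges are my interpretation of the traces, not kernel code):

```python
# A toy model (not kernel code) of the wait-for cycle described above.
# Task names come from the logs; the edges are my reading of the hang.
waits_for = {
    # fork is stuck in page allocation; progress needs the dying
    # workers to finish exiting and release their resources
    "stress-ng-brk": "kworker/dying",
    # exit_signals() needs the read side of cgroup_threadgroup_rwsem,
    # but a pending writer blocks new readers
    "kworker/dying": "systemd",
    # percpu_down_write() waits for every existing reader to drain
    "systemd": "stress-ng-brk",
}

def find_cycle(graph, start):
    """Follow wait-for edges from `start`; return the cycle if a node repeats."""
    path, node = [], start
    while node not in path:
        path.append(node)
        if node not in graph:      # task is runnable: no deadlock on this chain
            return None
        node = graph[node]
    return path[path.index(node):] + [node]

print(" -> ".join(find_cycle(waits_for, "systemd")))
# prints: systemd -> stress-ng-brk -> kworker/dying -> systemd
```

No task in the cycle can make progress until another one does, which matches the system-wide hang seen here.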
A similar deadlock was seen and solved in 4.5 (see https://lkml.org/lkml/2016/4/17/56).
More debugging is in progress.
------- Comment From <email address hidden> 2016-07-07 09:52 EDT-------
After debugging, the following seems to work fine for me:
Apply the fixes mentioned at https://lkml.org/lkml/2016/4/17/56 and disable the block cgroup controller.
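For reference, the block controller can be kept from being enabled at boot with the kernel's documented cgroup_disable= parameter. A sketch of the change, assuming a GRUB-based system (the file path and the existing option string vary by distribution):

```
# /etc/default/grub -- append cgroup_disable=blkio to the existing options,
# then regenerate the grub configuration (e.g. run update-grub) and reboot.
GRUB_CMDLINE_LINUX_DEFAULT="... cgroup_disable=blkio"
```

After rebooting, /proc/cgroups should list the blkio controller with enabled=0.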
The block cgroup controller has no specific changes to fix any deadlocks that I am aware of, so it needs more testing and root cause analysis. I expected the can_attach callback to potentially cause this, but it does not seem to be the case.