------- Comment From <email address hidden> 2016-07-06 20:56 EDT-------
From what I can see, the following is the root cause of the issue:
cgroup_threadgroup_rwsem almost serializes accesses on the system.
1. stress-ng-brk holds cgroup_threadgroup_rwsem in read mode via copy_process() and calls schedule_timeout() from __alloc_pages_nodemask(), which never seems to return.
00003fff96ceeab0     0  2799   2701 0x00040002
[ 4401.831972] Call Trace:
[ 4401.831973] [c000000ee71433c0] [c000000ee7143400] 0xc000000ee7143400 (unreliable)
[ 4401.831975] [c000000ee7143590] [c000000000017c64] __switch_to+0x204/0x360
[ 4401.831977] [c000000ee71435e0] [c000000000bb917c] __schedule+0x40c/0xe70
[ 4401.831979] [c000000ee71436a0] [c000000000bb9c34] schedule+0x54/0xd0
[ 4401.831981] [c000000ee71436d0] [c000000000bc0524] schedule_timeout+0x384/0x4f0
[ 4401.831983] [c000000ee7143800] [c00000000027de1c] __alloc_pages_nodemask+0xd0c/0xf40
[ 4401.831985] [c000000ee7143a10] [c0000000002e8d40] alloc_pages_current+0xc0/0x240
[ 4401.831988] [c000000ee7143a70] [c000000000056b6c] page_table_alloc+0xcc/0x1e0
[ 4401.831989] [c000000ee7143ac0] [c0000000002b5824] __pte_alloc+0x54/0x1e0
[ 4401.831991] [c000000ee7143b10] [c0000000002b8584] copy_page_range+0x754/0x8f0
[ 4401.831993] [c000000ee7143c40] [c0000000000bcee4] copy_process.isra.6+0x1834/0x1ab0
[ 4401.831995] [c000000ee7143d60] [c0000000000bd33c] _do_fork+0xac/0x980
[ 4401.831997] [c000000ee7143e30] [c00000000000946c] ppc_clone+0x8/0xc
[ 4401.861569] cfs_rq[23]:/user.slice
[ 4401.861570] .exec_clock : 1725230.642232
[ 4401.861571] .MIN_vruntime : 0.000001
[ 4401.861572] .min_vruntime : 1154678.434341
[ 4401.861573] .max_vruntime : 0.000001
[ 4401.861573] .spread : 0.000000
[ 4401.861574] .spread0 : -97866589.605918
[ 4401.861575] .nr_spread_over : 11
[ 4401.861575] .nr_running : 0
[ 4401.862187] stress-ng-brk 2799 1154678.007061 854611 120 688670.967816 1148995.803148 2318289.407734 0 0 /user.slice
2. Since cgroup_threadgroup_rwsem is held, we are unable to make any processes exit.
[ 4177.396262] Showing all locks held in the system:
[ 4177.396263] 4 locks held by systemd/1:
[ 4177.396268] #0: (sb_writers#9){.+.+.+}, at: [<c000000000328810>] __sb_start_write+0x100/0x130
[ 4177.396272] #1: (&of->mutex){+.+.+.}, at: [<c0000000003e76ec>] kernfs_fop_write+0x7c/0x1f0
[ 4177.396275] #2: (cgroup_mutex){+.+.+.}, at: [<c0000000001b0b8c>] cgroup_kn_lock_live+0x14c/0x280
[ 4177.396278] #3: (&cgroup_threadgroup_rwsem){++++++}, at: [<c00000000013aa10>] percpu_down_write+0x50/0x180
I think at #3 we are waiting for all readers to leave cgroup_threadgroup_rwsem; this in turn blocks exiting threads:
[ 4177.396548] 1 lock held by kworker/dying/1348:
[ 4177.396548] #0: (&cgroup_threadgroup_rwsem){++++++}, at: [<c0000000000d9b00>] exit_signals+0x50/0x1a0
[ 4177.396551] 1 lock held by kworker/dying/1919:
[ 4177.396552] #0: (&cgroup_threadgroup_rwsem){++++++}, at: [<c0000000000d9b00>] exit_signals+0x50/0x1a0
[ 4177.396555] 1 lock held by kworker/19:2/1930:
[ 4177.396555] #0: (&cgroup_threadgroup_rwsem){++++++}, at: [<c0000000000d9b00>] exit_signals+0x50/0x1a0
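Each of the three parties above is waiting on one of the others, so this is a three-way deadlock. A minimal sketch of that wait-for cycle (the edges are my interpretation of the traces, not kernel code):

```python
# A toy model (not kernel code) of the wait-for cycle described above.
# Task names come from the logs; the edges are my reading of the hang.
waits_for = {
    # fork is stuck in page allocation; progress needs the dying
    # workers to finish exiting and release their resources
    "stress-ng-brk": "kworker/dying",
    # exit_signals() needs the read side of cgroup_threadgroup_rwsem,
    # but a pending writer blocks new readers
    "kworker/dying": "systemd",
    # percpu_down_write() waits for every existing reader to drain
    "systemd": "stress-ng-brk",
}

def find_cycle(graph, start):
    """Follow wait-for edges from `start`; return the cycle if a node repeats."""
    path, node = [], start
    while node not in path:
        path.append(node)
        if node not in graph:      # task is runnable: no deadlock on this chain
            return None
        node = graph[node]
    return path[path.index(node):] + [node]

print(" -> ".join(find_cycle(waits_for, "systemd")))
# prints: systemd -> stress-ng-brk -> kworker/dying -> systemd
```

No task in the cycle can make progress until another one does, which matches the system-wide hang seen here.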
A similar deadlock was seen and solved in 4.5 (see https://lkml.org/lkml/2016/4/17/56).
More debugging is in progress.
------- Comment From <email address hidden> 2016-07-07 09:52 EDT-------
After debugging, the following seems to work fine for me:
Apply the fixes mentioned at https://lkml.org/lkml/2016/4/17/56 and disable the block cgroup controller.
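For reference, the block controller can be kept from being enabled at boot with the kernel's documented cgroup_disable= parameter. A sketch of the change, assuming a GRUB-based system (the file path and the existing option string vary by distribution):

```
# /etc/default/grub -- append cgroup_disable=blkio to the existing options,
# then regenerate the grub configuration (e.g. run update-grub) and reboot.
GRUB_CMDLINE_LINUX_DEFAULT="... cgroup_disable=blkio"
```

After rebooting, /proc/cgroups should list the blkio controller with enabled=0.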
The block cgroup controller has no specific changes to fix any deadlocks that I am aware of, so it needs more testing and root cause analysis. I expected the can_attach callback to potentially cause this, but it does not seem to be the case.