Ubuntu
linux package

Bug #1627108
Comment #42

Comment 42 for bug 1627108

Revision history for this message

Joseph Salisbury (jsalisbury) wrote on 2016-10-13:

#42

Update and request from upstream:

"I have looked at the dump and there is something very odd for
system.slice task group where the display manager is running.
system.slice->tg_load_avg is around 381697 but tg_load_avg is
normally equal to Sum of system.slice[cpu]->tg_load_avg_contrib
whereas Sum of system.slice[cpu]->tg_load_avg_contrib = 1013 in our
case. We can have some differences because the dump of
/proc/shed_debug is not atomic and some changes can happen but nothing
like this difference.

The main effect of this quite high value is that the weight/prio of
the sched_entity that represents system.slice in root cfs_rq is very
low (lower than task with the smallest nice prio) so the system.slice
task group will not get the CPU quite often compared to the user.slice
task group: less than 1% for the system.slice where lightDM and xorg
are running compared 99% for the user.slice where the stress tasks are
running. This is confirmed by the se->avg.util_avg value of the task
groups which reflect how much time each task group is effectively
running on a CPU:
system.slice[CPU3].se->avg.util_avg = 8 whereas
user.slice[CPU3].se->avg.util_avg = 991

This difference of weight/priority explains why the system becomes
unresponsive. For now, I can't explain is why
system.slice->tg_load_avg = 381697 whereas is should be around 1013
and how the patch can generate this situation.

Is it possible to have a dump of /proc/sched_debug before starting
stress command ? to check if the problem is there from the beginning
but not seen because not overloaded. Or if it the problem comes when
user starts to load the system"

Ubuntulinux package

Comment 42 for bug 1627108

Ubuntu
linux package