Activity log for bug #1836971

Date Who What changed Old value New value Message
2019-07-17 23:04:09 Matthew Ruffell bug added bug
2019-07-17 23:04:18 Matthew Ruffell nominated for series Ubuntu Bionic
2019-07-17 23:04:18 Matthew Ruffell bug task added linux (Ubuntu Bionic)
2019-07-17 23:04:30 Matthew Ruffell tags sts
2019-07-17 23:06:01 Matthew Ruffell description
Old value: identical to the new value below, except that the BugLink URL ("https://bugs.launchpad.net/bugs/") was missing the bug number.
New value:

BugLink: https://bugs.launchpad.net/bugs/1836971

[Impact]

On machines with extremely high CPU usage, parent task groups which have a large number of children can make the for loop in sched_cfs_period_timer() run until the watchdog fires when the cfs_period_us setting is too short.

In this particular case, it is unlikely that the call to hrtimer_forward_now() will return 0, meaning the for loop is never exited and tasks are never rescheduled. The large number of children makes do_sched_cfs_period_timer() take longer than the period, which impacts subsequent calls to hrtimer_forward_now().

The kernel will produce this call trace:

CPU: 51 PID: 0 Comm: swapper/51 Tainted: P OELK 4.15.0-50-generic #54-Ubuntu
Call Trace:
<IRQ>
? sched_clock+0x9/0x10
walk_tg_tree_from+0x61/0xd0
? task_rq_unlock+0x30/0x30
unthrottle_cfs_rq+0xcb/0x1a0
distribute_cfs_runtime+0xd7/0x100
sched_cfs_period_timer+0xd9/0x1a0
? sched_cfs_slack_timer+0xe0/0xe0
__hrtimer_run_queues+0xdf/0x230
hrtimer_interrupt+0xa0/0x1d0
smp_apic_timer_interrupt+0x6f/0x130
apic_timer_interrupt+0x84/0x90
</IRQ>

This has been hit in production in a particularly highly utilised Hadoop cluster which powers an analytics platform. About 30% of the cluster experiences this issue every week, and the machines need a manual reboot to get back online.

[Fix]

This was fixed in 5.1 upstream with the following commit:

commit 2e8e19226398db8265a8e675fcc0118b9e80c9e8
Author: Phil Auld <pauld@redhat.com>
Date: Tue Mar 19 09:00:05 2019 -0400
subject: sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup

This commit adds a check to see if the loop has run too many times and, if it has, scales up the period and quota so the timer can complete before the next period expires, which enables the task to be rescheduled normally.

Note: 2e8e19226398db8265a8e675fcc0118b9e80c9e8 was included in upstream stable versions 4.4.179, 4.9.171, 4.14.114, 4.19.37 and 5.0.10.

Please cherry pick 2e8e19226398db8265a8e675fcc0118b9e80c9e8 to all bionic kernels.

[Testcase]

This is hard to reproduce, so it was tested on a production Hadoop cluster with extremely high CPU load. I built a test kernel, which is available here:

https://launchpad.net/~mruffell/+archive/ubuntu/sf232784-test

For unpatched kernels, expect the machine to lock up and print the call trace shown in the Impact section.

For patched kernels, if the machine hits the condition, it will print a warning to the kernel log with the new period and quota that were used. Example from the same Hadoop cluster with a machine running the test kernel:

% uname -a
4.15.0-50-generic #54+hf232784v20190626b1-Ubuntu
% sudo grep cfs /var/log/kern.log.*
cfs_period_timer[cpu40]: period too short, scaling up (new cfs_period_us 67872, cfs_quota_us = 3475091)
cfs_period_timer[cpu48]: period too short, scaling up (new cfs_period_us 22430, cfs_quota_us = 1148437)
cfs_period_timer[cpu48]: period too short, scaling up (new cfs_period_us 25759, cfs_quota_us = 1318908)
cfs_period_timer[cpu68]: period too short, scaling up (new cfs_period_us 29583, cfs_quota_us = 1514684)
cfs_period_timer[cpu49]: period too short, scaling up (new cfs_period_us 33974, cfs_quota_us = 1739519)
cfs_period_timer[cpu3]: period too short, scaling up (new cfs_period_us 39017, cfs_quota_us = 1997729)
cfs_period_timer[cpu10]: period too short, scaling up (new cfs_period_us 44809, cfs_quota_us = 2294267)
cfs_period_timer[cpu3]: period too short, scaling up (new cfs_period_us 51460, cfs_quota_us = 2634823)
cfs_period_timer[cpu3]: period too short, scaling up (new cfs_period_us 59099, cfs_quota_us = 3025929)
cfs_period_timer[cpu3]: period too short, scaling up (new cfs_period_us 67872, cfs_quota_us = 3475091)

[Regression Potential]

This patch was accepted into upstream stable versions 4.4.179, 4.9.171, 4.14.114, 4.19.37 and 5.0.10, and is thus treated as stable and trusted by the community.

Xenial received this patch in 4.4.0-150.176, as per LP #1828420
Disco will receive this patch in the next version, as per LP #1830922
Eoan already has the patch, being based on 5.2.

While this does affect a core part of the kernel, the scheduler, the patch has been extensively tested and proven in production environments, so the overall risk is low.
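To make the livelock described in the [Impact] section concrete, here is a minimal user-space C model of the loop it refers to. It is an illustration only: the helper name forward_now(), the period and handler timings, and the iteration cap are invented for the demo; the real loop in sched_cfs_period_timer() runs in hard-IRQ context and has no such cap.

/*
 * Minimal user-space model (not kernel code) of the livelock: the real
 * hrtimer_forward_now() reports how many whole periods have elapsed since
 * the timer expired, and the loop only exits when that count is 0.  If
 * servicing one expiry (do_sched_cfs_period_timer) takes longer than one
 * period, the count never reaches 0.  All numbers are illustrative.
 */
#include <stdio.h>
#include <stdint.h>

static uint64_t now_us;                 /* simulated clock */

/* model of hrtimer_forward_now(): push 'expires' past 'now' in whole
 * periods and return how many periods were skipped (0 == not yet due) */
static unsigned int forward_now(uint64_t *expires, uint64_t period_us)
{
    unsigned int overrun = 0;
    while (*expires <= now_us) {
        *expires += period_us;
        overrun++;
    }
    return overrun;
}

int main(void)
{
    const uint64_t period_us  = 100;    /* illustrative: very short period */
    const uint64_t handler_us = 250;    /* handler slower than one period  */
    uint64_t expires = 0;

    for (int iter = 0; iter < 8; iter++) {      /* capped only for the demo */
        unsigned int overrun = forward_now(&expires, period_us);
        printf("iteration %d: overrun=%u\n", iter, overrun);
        if (!overrun)
            break;                       /* the only way out of the loop */
        now_us += handler_us;            /* cost of do_sched_cfs_period_timer() */
    }
    return 0;
}

Because the simulated handler takes longer than one period, forward_now() always returns at least 1, so the exit condition is never met; on the real machines this is what the watchdog eventually catches.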
2019-07-17 23:06:19 Matthew Ruffell linux (Ubuntu Bionic): importance Undecided Medium
2019-07-17 23:06:24 Matthew Ruffell linux (Ubuntu Bionic): status New In Progress
2019-07-17 23:06:27 Matthew Ruffell linux (Ubuntu Bionic): assignee Matthew Ruffell (mruffell)
2019-07-17 23:08:16 Matthew Ruffell description (old and new values are both identical to the description text in the 2019-07-17 23:06:01 entry above)
2019-07-17 23:30:05 Ubuntu Kernel Bot linux (Ubuntu): status New Incomplete
2019-07-17 23:30:07 Ubuntu Kernel Bot tags sts bionic sts
2019-07-18 02:44:25 Matthew Ruffell description
Old value: the description text shown in the 2019-07-17 23:06:01 entry above.
New value: the same description, with the cherry-pick request at the end of the [Fix] section replaced by:

This patch requires minor backporting for 4.15, so please cherry pick d069fe4844f8d799d771659a745fe91870c93fda from upstream stable 4.14.y, where the backport has been done by the original author, to all bionic kernels.
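For reference, here is a small user-space C sketch of the scale-up behaviour the [Fix] section describes: once the loop has run too many times, the period and the quota are scaled up together so the group's quota/period ratio stays roughly the same. The 147/128 (~15%) factor, the 1 second ceiling, and the starting values are assumptions chosen to match the ratios visible in the kern.log excerpt above (for example 22430 -> 25759 -> 29583); the authoritative logic is in commit 2e8e19226398db8265a8e675fcc0118b9e80c9e8.

/*
 * Minimal user-space sketch (not kernel code) of the scale-up behaviour
 * described in the [Fix] section.  The 147/128 factor, the 1 second
 * period ceiling and the starting values are illustrative assumptions.
 */
#include <stdio.h>
#include <stdint.h>

#define MAX_CFS_PERIOD_US 1000000ULL  /* assumed upper bound of 1 second */

int main(void)
{
    /* hypothetical starting point, taken from the cpu48 lines above */
    uint64_t period_us = 22430;
    uint64_t quota_us  = 1148437;

    for (int step = 0; step < 10; step++) {
        printf("period too short, scaling up (new cfs_period_us %llu, "
               "cfs_quota_us = %llu)\n",
               (unsigned long long)period_us,
               (unsigned long long)quota_us);

        /* scale period and quota by the same factor so the group's
         * quota/period ratio (its allowed CPU bandwidth) is preserved */
        uint64_t new_period = period_us * 147 / 128;
        if (new_period > MAX_CFS_PERIOD_US)
            new_period = MAX_CFS_PERIOD_US;

        quota_us  = quota_us * new_period / period_us;
        period_us = new_period;
    }
    return 0;
}

Scaling both values by the same factor leaves the cgroup's allowed CPU bandwidth roughly unchanged while making each period long enough for the timer handler to complete, which is why the warnings above show cfs_period_us and cfs_quota_us growing in step.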
2019-07-18 03:34:14 Mark Sergeant bug added subscriber Mark Sergeant
2019-07-18 03:34:38 David Mitchell bug added subscriber David Mitchell
2019-07-19 23:02:27 Terry Rudd bug added subscriber Terry Rudd
2019-07-23 05:02:26 Khaled El Mously linux (Ubuntu Bionic): status In Progress Fix Committed
2019-07-25 18:32:48 Ubuntu Kernel Bot tags bionic sts bionic sts verification-needed-bionic
2019-07-25 20:51:47 Dmitry S. Fedorov bug added subscriber Dmitry S. Fedorov
2019-07-31 05:53:18 Matthew Ruffell tags bionic sts verification-needed-bionic bionic sts verification-done-bionic
2019-08-07 08:34:44 Ubuntu Kernel Bot tags bionic sts verification-done-bionic bionic sts verification-done-bionic verification-needed-xenial
2019-08-07 22:58:59 Matthew Ruffell tags bionic sts verification-done-bionic verification-needed-xenial bionic sts verification-done-bionic verification-done-xenial
2019-08-13 11:27:47 Launchpad Janitor linux (Ubuntu Bionic): status Fix Committed Fix Released
2019-08-13 11:27:47 Launchpad Janitor cve linked 2000-1134
2019-08-13 11:27:47 Launchpad Janitor cve linked 2007-3852
2019-08-13 11:27:47 Launchpad Janitor cve linked 2008-0525
2019-08-13 11:27:47 Launchpad Janitor cve linked 2009-0416
2019-08-13 11:27:47 Launchpad Janitor cve linked 2011-4834
2019-08-13 11:27:47 Launchpad Janitor cve linked 2015-1838
2019-08-13 11:27:47 Launchpad Janitor cve linked 2015-7442
2019-08-13 11:27:47 Launchpad Janitor cve linked 2016-7489
2019-08-13 11:27:47 Launchpad Janitor cve linked 2018-5383
2019-08-13 11:27:47 Launchpad Janitor cve linked 2019-10126
2019-08-13 11:27:47 Launchpad Janitor cve linked 2019-1125
2019-08-13 11:27:47 Launchpad Janitor cve linked 2019-12614
2019-08-13 11:27:47 Launchpad Janitor cve linked 2019-12818
2019-08-13 11:27:47 Launchpad Janitor cve linked 2019-12819
2019-08-13 11:27:47 Launchpad Janitor cve linked 2019-12984
2019-08-13 11:27:47 Launchpad Janitor cve linked 2019-13233
2019-08-13 11:27:47 Launchpad Janitor cve linked 2019-13272
2019-08-13 11:27:47 Launchpad Janitor cve linked 2019-2101
2019-08-13 11:27:47 Launchpad Janitor cve linked 2019-3846
2019-12-27 09:12:43 Po-Hsu Lin linux (Ubuntu): status Incomplete Fix Released