Unexpected CFS throttling

Bug #1832151 reported by Khaled El Mously on 2019-06-10
This bug affects 1 person
Affects               Importance  Assigned to
linux (Ubuntu)        Undecided   Khaled El Mously
  Bionic              Undecided   Khaled El Mously
  Disco               Undecided   Marcelo Cerri
  Eoan                Undecided   Marcelo Cerri
linux-azure (Ubuntu)  Undecided   Unassigned
  Bionic              Undecided   Unassigned
  Disco               Undecided   Marcelo Cerri
  Eoan                Undecided   Unassigned

Bug Description

Basically, this issue:

https://bugzilla.kernel.org/show_bug.cgi?id=198197

... is affecting a cloud provider running a 4.15 kernel.

Customer testing with a kernel patched with the fix (upstream commit de53fd7aedb100f03e5d2231cfce0e4993282425) confirms that it resolves the issue.

More details in the SalesForce ticket: https://canonical.my.salesforce.com/5003z00001yUmC1

[Impact]

 * Aggressive throttling by CFS causes severe performance degradation in some cases.

[Test Case]

 * This reproducer clearly shows the problem before the fix and shows no problem after the fix:
  https://gist.github.com/bobrik/2030ff040fad360327a5fab7a09c4ff1
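When running a reproducer like this, one way to quantify throttling is to sample the cgroup's cpu.stat file before and after the run. The sketch below is not part of the linked reproducer; it assumes the cgroup v1 cpu controller, whose cpu.stat file reports nr_periods, nr_throttled and throttled_time (in nanoseconds). The helper names and the sample values are hypothetical, and the path mentioned in the usage note depends on how the cgroup hierarchy is mounted.

```python
def parse_cpu_stat(text):
    """Parse the text of a cgroup v1 cpu.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Each line is expected to look like "<name> <integer>".
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(before, after):
    """Fraction of enforcement periods in which the group was throttled
    between two samples of cpu.stat."""
    periods = after["nr_periods"] - before["nr_periods"]
    throttled = after["nr_throttled"] - before["nr_throttled"]
    return throttled / periods if periods > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical samples; in practice read the file twice, for example
    # /sys/fs/cgroup/cpu/<group>/cpu.stat (path is an assumption).
    sample_before = "nr_periods 100\nnr_throttled 10\nthrottled_time 500000000\n"
    sample_after = "nr_periods 200\nnr_throttled 60\nthrottled_time 2500000000\n"
    ratio = throttled_ratio(parse_cpu_stat(sample_before),
                            parse_cpu_stat(sample_after))
    print(f"throttled in {ratio:.0%} of periods")
```

A high ratio while the workload uses well below its quota would be consistent with the behaviour reported here; the same measurement on a patched kernel gives a before/after comparison.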

[Regression Potential]

 * The patch touches core scheduler code, so a regression could be severe, but the risk is low: the patch has been accepted in mainline and tested separately by myself, the cloud provider, and at least two others.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1832151

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Bionic):
status: New → In Progress
description: updated
Khaled El Mously (kmously) wrote :

Targeted the bug to Eoan and Disco.

I'm not actually sure that Eoan or Disco are affected by this. Targeting them anyway to make sure they are looked into before the bug is closed.

Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
tags: added: verification-done-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-69.78

---------------
linux (4.15.0-69.78) bionic; urgency=medium

  * KVM NULL pointer deref (LP: #1851205)
    - KVM: nVMX: handle page fault in vmread fix

  * CVE-2018-12207
    - KVM: MMU: drop vcpu param in gpte_access
    - kvm: Convert kvm_lock to a mutex
    - kvm: x86: Do not release the page inside mmu_set_spte()
    - KVM: x86: make FNAME(fetch) and __direct_map more similar
    - KVM: x86: remove now unneeded hugepage gfn adjustment
    - KVM: x86: change kvm_mmu_page_get_gfn BUG_ON to WARN_ON
    - KVM: x86: add tracepoints around __direct_map and FNAME(fetch)
    - kvm: x86, powerpc: do not allow clearing largepages debugfs entry
    - SAUCE: KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is
      active
    - SAUCE: x86: Add ITLB_MULTIHIT bug infrastructure
    - SAUCE: kvm: mmu: ITLB_MULTIHIT mitigation
    - SAUCE: kvm: Add helper function for creating VM worker threads
    - SAUCE: kvm: x86: mmu: Recovery of shattered NX large pages
    - SAUCE: cpu/speculation: Uninline and export CPU mitigations helpers
    - SAUCE: kvm: x86: mmu: Apply global mitigations knob to ITLB_MULTIHIT

  * CVE-2019-11135
    - KVM: x86: use Intel speculation bugs and features as derived in generic x86
      code
    - x86/msr: Add the IA32_TSX_CTRL MSR
    - x86/cpu: Add a helper function x86_read_arch_cap_msr()
    - x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default
    - x86/speculation/taa: Add mitigation for TSX Async Abort
    - x86/speculation/taa: Add sysfs reporting for TSX Async Abort
    - kvm/x86: Export MDS_NO=0 to guests when TSX is enabled
    - x86/tsx: Add "auto" option to the tsx= cmdline parameter
    - x86/speculation/taa: Add documentation for TSX Async Abort
    - x86/tsx: Add config options to set tsx=on|off|auto
    - SAUCE: x86/speculation/taa: Call tsx_init()
    - SAUCE: x86/cpu: Include cpu header from bugs.c
    - [Config] Disable TSX by default when possible

  * CVE-2019-0154
    - SAUCE: drm/i915: Lower RM timeout to avoid DSI hard hangs
    - SAUCE: drm/i915/gen8+: Add RC6 CTX corruption WA

  * CVE-2019-0155
    - drm/i915/gtt: Add read only pages to gen8_pte_encode
    - drm/i915/gtt: Read-only pages for insert_entries on bdw+
    - drm/i915/gtt: Disable read-only support under GVT
    - drm/i915: Prevent writing into a read-only object via a GGTT mmap
    - drm/i915/cmdparser: Check reg_table_count before derefencing.
    - drm/i915/cmdparser: Do not check past the cmd length.
    - drm/i915: Silence smatch for cmdparser
    - drm/i915: Move engine->needs_cmd_parser to engine->flags
    - SAUCE: drm/i915: Rename gen7 cmdparser tables
    - SAUCE: drm/i915: Disable Secure Batches for gen6+
    - SAUCE: drm/i915: Remove Master tables from cmdparser
    - SAUCE: drm/i915: Add support for mandatory cmdparsing
    - SAUCE: drm/i915: Support ro ppgtt mapped cmdparser shadow buffers
    - SAUCE: drm/i915: Allow parsing of unsized batches
    - SAUCE: drm/i915: Add gen9 BCS cmdparsing
    - SAUCE: drm/i915/cmdparser: Use explicit goto for error paths
    - SAUCE: drm/i915/cmdparser: Add support for backward jumps
    - SAUCE: drm/i915/cmdpar...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
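
To check whether a Bionic machine is already running a kernel at or above the fixed release, the ABI number in the kernel release string can be compared against 69 (from 4.15.0-69.78 above). This is a minimal sketch, assuming the release string has the usual Ubuntu form such as "4.15.0-69-generic"; the function names are hypothetical, the upload number (.78) is normally not part of that string, and the check only covers the 4.15 series.

```python
import re


def kernel_abi(release):
    """Split a release string like '4.15.0-69-generic' into ((4, 15, 0), 69)."""
    m = re.match(r"^(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch, abi = (int(g) for g in m.groups())
    return (major, minor, patch), abi


def is_fixed_bionic_4_15(release):
    """True if the release is a 4.15.0 kernel with ABI >= 69.

    Other kernel series (for example 5.0 linux-azure) are not covered
    and return False.
    """
    base, abi = kernel_abi(release)
    return base == (4, 15, 0) and abi >= 69


if __name__ == "__main__":
    import platform
    # platform.release() returns the same string as `uname -r`.
    print(is_fixed_bionic_4_15(platform.release()))
```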
Marcelo Cerri (mhcerri) on 2019-11-22
Changed in linux (Ubuntu Eoan):
assignee: Khaled El Mously (kmously) → Marcelo Cerri (mhcerri)
Changed in linux (Ubuntu Disco):
assignee: nobody → Marcelo Cerri (mhcerri)
Changed in linux (Ubuntu Bionic):
assignee: nobody → Khaled El Mously (kmously)
Changed in linux-azure (Ubuntu Bionic):
status: New → Invalid
Changed in linux-azure (Ubuntu Eoan):
status: New → Invalid
Changed in linux (Ubuntu Disco):
status: New → In Progress
Changed in linux-azure (Ubuntu Disco):
status: New → In Progress
assignee: nobody → Marcelo Cerri (mhcerri)
Marcelo Cerri (mhcerri) on 2019-11-27
Changed in linux (Ubuntu Disco):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Eoan):
status: Incomplete → Fix Committed
Changed in linux-azure (Ubuntu Disco):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure - 5.0.0-1027.29

---------------
linux-azure (5.0.0-1027.29) disco; urgency=medium

  * disco/linux-azure: 5.0.0-1027.29 -proposed tracker (LP: #1853901)

  * Unexpected CFS throttling (LP: #1832151) // Disco update: upstream stable
    patchset 2019-11-18 (LP: #1853067)
    - sched/fair: Fix low cpu usage with high throttling by removing expiration of
      cpu-local slices
    - sched/fair: Fix -Wunused-but-set-variable warnings

 -- Marcelo Henrique Cerri <email address hidden> Mon, 25 Nov 2019 17:04:59 -0300

Changed in linux-azure (Ubuntu Disco):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure - 5.0.0-1027.29~18.04.1

---------------
linux-azure (5.0.0-1027.29~18.04.1) bionic; urgency=medium

  * bionic/linux-azure: 5.0.0-1027.29~18.04.1 -proposed tracker (LP: #1853900)

  [ Ubuntu: 5.0.0-1027.29 ]

  * disco/linux-azure: 5.0.0-1027.29 -proposed tracker (LP: #1853901)
  * Unexpected CFS throttling (LP: #1832151) // Disco update: upstream stable
    patchset 2019-11-18 (LP: #1853067)
    - sched/fair: Fix low cpu usage with high throttling by removing expiration of
      cpu-local slices
    - sched/fair: Fix -Wunused-but-set-variable warnings

 -- Marcelo Henrique Cerri <email address hidden> Mon, 25 Nov 2019 17:48:07 -0300

Changed in linux-azure (Ubuntu Bionic):
status: Invalid → Fix Released