[UBUNTU 20.04] PSI generates overhead on s390x

Bug #1876044 reported by bugproxy
Affects                   Status         Importance   Assigned to             Milestone
Ubuntu on IBM z Systems   Fix Released   Medium       Skipper Bug Screeners
linux (Ubuntu)            Fix Released   Undecided    Canonical Kernel Team
Focal                     Fix Released   Undecided    Unassigned

Bug Description

SRU Justification:
==================

[Impact]

* PSI is enabled by default for all architectures in Ubuntu.

* On s390x this leads to performance degradation on popular workloads such as web serving (nginx).

[Fix]

* Leave 'CONFIG_PSI=y', but change 'CONFIG_PSI_DEFAULT_DISABLED=n' to 'CONFIG_PSI_DEFAULT_DISABLED=y'

[Test Case]

* Measure the overhead with 'CONFIG_PSI_DEFAULT_DISABLED=n' and with 'CONFIG_PSI_DEFAULT_DISABLED=y' in the same environment, using nginx as the workload.
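
A minimal sketch of how the two measurements could be compared, assuming the benchmark reports throughput (for example requests per second) for each kernel configuration. The function name and the sample numbers are hypothetical and are not measurements from this bug.

```python
def relative_overhead_percent(baseline, measured):
    """Throughput lost by `measured` relative to `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return (baseline - measured) / baseline * 100.0


# Hypothetical throughput values, for illustration only.
psi_default_disabled = 100_000.0  # run with CONFIG_PSI_DEFAULT_DISABLED=y
psi_default_enabled = 99_000.0    # run with CONFIG_PSI_DEFAULT_DISABLED=n

overhead = relative_overhead_percent(psi_default_disabled, psi_default_enabled)
print(f"PSI overhead: {overhead:.2f}%")
```

Both runs should use the same guests, cores and nginx load so that the PSI default is the only difference.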

[Regression Potential]

* The regression potential can be considered moderate, since PSI (pressure stall information tracking) is only used to collect metrics on CPU overcommitment, memory pressure and I/O pressure.

* PSI can be enabled again with a kernel argument.
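
For context on what those metrics look like, here is a hedged sketch that parses the text format exposed under /proc/pressure/ (cpu, memory, io). The helper names and the sample line are illustrative assumptions. On a kernel where PSI is disabled, the files may be missing or may fail on read, so the reader function returns None in that case.

```python
def parse_pressure(text):
    """Parse lines like 'some avg10=0.00 avg60=0.00 avg300=0.00 total=0'."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        result[kind] = {
            key: float(value)
            for key, value in (pair.split("=", 1) for pair in pairs)
        }
    return result


def read_pressure(resource):
    """Return parsed pressure data for 'cpu', 'memory' or 'io', or None."""
    try:
        with open(f"/proc/pressure/{resource}") as handle:
            return parse_pressure(handle.read())
    except OSError:
        # File absent or PSI disabled at boot.
        return None


# Hypothetical sample line, not taken from the reporter's system.
sample = "some avg10=1.50 avg60=0.75 avg300=0.10 total=123456\n"
print(parse_pressure(sample))
```

If read_pressure("cpu") returns None on a kernel built with 'CONFIG_PSI=y', PSI is most likely disabled at boot.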
__________

PSI is always enabled in Ubuntu 20.04.
On a test system with 72 guests on 8 cores running an nginx workload, this created an overhead of ~1%.

Can we change this back to
CONFIG_PSI=y
CONFIG_PSI_DEFAULT_DISABLED=y

so that the overhead is not there by default, but PSI can still be enabled via a kernel parameter for debugging or when otherwise needed?
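
As a rough illustration of how the build-time default and the boot-time switch interact, the sketch below computes whether PSI would be active for a given kernel command line. It assumes the kernel was built with 'CONFIG_PSI=y' and that the switch is the `psi=` boot parameter taking a boolean value. The boolean parsing only approximates the kernel's, and the sample command lines are hypothetical.

```python
def parse_kernel_bool(value):
    """Approximate kernel boolean parsing; None if the value is unrecognised."""
    v = value.lower()
    if v[:1] in ("y", "1"):
        return True
    if v[:1] in ("n", "0"):
        return False
    if v[:2] == "on":
        return True
    if v[:2] == "of":
        return False
    return None


def psi_enabled_at_boot(cmdline, default_disabled):
    """Effective PSI state, assuming CONFIG_PSI=y.

    default_disabled mirrors CONFIG_PSI_DEFAULT_DISABLED; a valid psi=<bool>
    token on the command line overrides it (the last occurrence wins).
    """
    enabled = not default_disabled
    for token in cmdline.split():
        if token.startswith("psi="):
            parsed = parse_kernel_bool(token[len("psi="):])
            if parsed is not None:
                enabled = parsed
    return enabled


# Hypothetical command lines, for illustration only.
print(psi_enabled_at_boot("root=/dev/dasda1", default_disabled=True))
print(psi_enabled_at_boot("root=/dev/dasda1 psi=1", default_disabled=True))
```

With the requested configuration, only the second command line would turn PSI on.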

Maybe there has been a reason for this - so feel free to discuss.

---uname output---
Linux t35lp76 5.4.0-26-generic #30-Ubuntu SMP Mon Apr 20 16:57:22 UTC 2020 s390x s390x s390x GNU/Linux

Machine Type = All s390x architecture

---Debugger---
A debugger is not configured

---Steps to Reproduce---
 root@t35lp76:/boot# grep PSI config-5.4.0-26-generic
CONFIG_PSI=y
# CONFIG_PSI_DEFAULT_DISABLED is not set
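
The grep output above can be interpreted programmatically. The following sketch (the function name is hypothetical) reads kernel config text and reports whether PSI is built in and whether it is disabled by default. The "before" text is the output shown above, and the "after" text is the configuration requested in this bug.

```python
def psi_build_defaults(config_text):
    """Return (psi_built_in, default_disabled) from kernel config text."""
    options = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Lines such as '# CONFIG_FOO is not set' start with '#' and are skipped.
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    psi_built_in = options.get("CONFIG_PSI") == "y"
    default_disabled = options.get("CONFIG_PSI_DEFAULT_DISABLED") == "y"
    return psi_built_in, default_disabled


before = "CONFIG_PSI=y\n# CONFIG_PSI_DEFAULT_DISABLED is not set\n"
after = "CONFIG_PSI=y\nCONFIG_PSI_DEFAULT_DISABLED=y\n"

print(psi_build_defaults(before))  # (True, False)
print(psi_build_defaults(after))   # (True, True)
```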

Stack trace output:
 no

Oops output:
 no

System Dump Info:
  The system is not configured to capture a system dump.

*Additional Instructions for epasch@de.ibm.com:
-Attach sysctl -a output to the bug.

bugproxy (bugproxy)
tags: added: architecture-s3903164 bugnameltc-185697 severity-medium targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes)
Changed in linux (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → Canonical Kernel Team (canonical-kernel-team)
Changed in ubuntu-z-systems:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
importance: Undecided → Medium
status: New → Triaged
Frank Heimes (fheimes)
summary: - [UBUNTU 20.04] Overhead introduced by PSI
+ [UBUNTU 20.04] PSI generated overhead on s390x
summary: - [UBUNTU 20.04] PSI generated overhead on s390x
+ [UBUNTU 20.04] PSI generates overhead on s390x
Frank Heimes (fheimes) wrote :

Request submitted to kernel team's mailing list:
https://lists.ubuntu.com/archives/kernel-team/2020-May/thread.html#109592
changing status to 'In Progress'

description: updated
Changed in linux (Ubuntu):
status: New → In Progress
Changed in ubuntu-z-systems:
status: Triaged → In Progress
Changed in linux (Ubuntu Focal):
status: New → In Progress
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-37.41

---------------
linux (5.4.0-37.41) focal; urgency=medium

  * CVE-2020-0543
    - SAUCE: x86/speculation/spectre_v2: Exclude Zhaoxin CPUs from SPECTRE_V2
    - SAUCE: x86/cpu: Add a steppings field to struct x86_cpu_id
    - SAUCE: x86/cpu: Add 'table' argument to cpu_matches()
    - SAUCE: x86/speculation: Add Special Register Buffer Data Sampling (SRBDS)
      mitigation
    - SAUCE: x86/speculation: Add SRBDS vulnerability and mitigation documentation
    - SAUCE: x86/speculation: Add Ivy Bridge to affected list

 -- Marcelo Henrique Cerri <email address hidden> Wed, 03 Jun 2020 11:24:23 -0300

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (linux-oracle-5.4/5.4.0-1019.19~18.04.1)

All autopkgtests for the newly accepted linux-oracle-5.4 (5.4.0-1019.19~18.04.1) for bionic have finished running.
The following regressions have been reported in tests triggered by the package:

zfs-linux/unknown (armhf)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/bionic/update_excuses.html#linux-oracle-5.4

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-42.46

---------------
linux (5.4.0-42.46) focal; urgency=medium

  * focal/linux: 5.4.0-42.46 -proposed tracker (LP: #1887069)

  * linux 4.15.0-109-generic network DoS regression vs -108 (LP: #1886668)
    - SAUCE: Revert "netprio_cgroup: Fix unlimited memory leak of v2 cgroups"

linux (5.4.0-41.45) focal; urgency=medium

  * focal/linux: 5.4.0-41.45 -proposed tracker (LP: #1885855)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * CVE-2019-19642
    - kernel/relay.c: handle alloc_percpu returning NULL in relay_open

  * CVE-2019-16089
    - SAUCE: nbd_genl_status: null check for nla_nest_start

  * CVE-2020-11935
    - aufs: do not call i_readcount_inc()

  * ip_defrag.sh in net from ubuntu_kernel_selftests failed with 5.0 / 5.3 / 5.4
    kernel (LP: #1826848)
    - selftests: net: ip_defrag: ignore EPERM

  * Update lockdown patches (LP: #1884159)
    - SAUCE: acpi: disallow loading configfs acpi tables when locked down

  * seccomp_bpf fails on powerpc (LP: #1885757)
    - SAUCE: selftests/seccomp: fix ptrace tests on powerpc

  * Introduce the new NVIDIA 418-server and 440-server series, and update the
    current NVIDIA drivers (LP: #1881137)
    - [packaging] add signed modules for the 418-server and the 440-server
      flavours

 -- Khalid Elmously <email address hidden> Thu, 09 Jul 2020 19:50:26 -0400

Changed in linux (Ubuntu):
status: In Progress → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-07-28 02:36 EDT-------
IBM Bugzilla status-> closed, Fix Released with focal

tags: added: targetmilestone-inin2004
removed: targetmilestone-inin---