set-cpufreq: 'powersave' governor configuration sanity on ubuntu server

Bug #1806012 reported by James Page
This bug affects 2 people
Affects            Status     Importance  Assigned to        Milestone
systemd (Ubuntu)   Won't Fix  Medium      Ioanna Alifieraki
  Xenial           Won't Fix  Medium      Ioanna Alifieraki
  Bionic           Won't Fix  Medium      Ioanna Alifieraki
  Cosmic           Won't Fix  Medium      Ioanna Alifieraki
  Disco            Won't Fix  Medium      Ioanna Alifieraki

Bug Description

Whilst debugging 'slow instance performance' on an Ubuntu Bionic based cloud, I observed that the default CPU governor was set to 'powersave'; toggling this to 'performance' (while not in any way a particularly green thing to do) made the instance slowness disappear, and cloud performance returned to what we expected (based on a prior version of the deployment on Ubuntu Xenial).

AFAICT Xenial does the same thing, albeit in a slightly different way, but we definitely did not see the same performance lag under a Xenial based cloud.

Raising against systemd (as this package sets the governor to 'powersave') - I feel that the switch to 'performance' although appropriate then obscures what might be a performance/behavioural difference in the underlying kernel when a machine is under load.

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: systemd 237-3ubuntu10.9
ProcVersionSignature: Ubuntu 4.15.0-39.42-generic 4.15.18
Uname: Linux 4.15.0-39-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.5
Architecture: amd64
Date: Fri Nov 30 10:05:46 2018
Lsusb:
 Bus 002 Device 002: ID 8087:8002 Intel Corp.
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 001 Device 003: ID 413c:a001 Dell Computer Corp. Hub
 Bus 001 Device 002: ID 8087:800a Intel Corp.
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Dell Inc. PowerEdge R630
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=C.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.15.0-39-generic root=UUID=a361a524-47eb-46c3-8a04-e5eaa65188c9 ro hugepages=103117 iommu=pt intel_iommu=on
SourcePackage: systemd
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 11/08/2016
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 2.3.4
dmi.board.name: 02C2CP
dmi.board.vendor: Dell Inc.
dmi.board.version: A03
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr2.3.4:bd11/08/2016:svnDellInc.:pnPowerEdgeR630:pvr:rvnDellInc.:rn02C2CP:rvrA03:cvnDellInc.:ct23:cvr:
dmi.product.name: PowerEdge R630
dmi.sys.vendor: Dell Inc.

Dan Streetman (ddstreet)
tags: added: sts sts-sponsor-ddstreet
Dan Streetman (ddstreet)
Changed in systemd (Ubuntu Disco):
importance: Undecided → Medium
Changed in systemd (Ubuntu Cosmic):
importance: Undecided → Medium
Changed in systemd (Ubuntu Bionic):
importance: Undecided → Medium
Changed in systemd (Ubuntu Xenial):
importance: Undecided → Medium
Changed in systemd (Ubuntu Disco):
status: New → In Progress
Changed in systemd (Ubuntu Cosmic):
status: New → In Progress
Changed in systemd (Ubuntu Bionic):
status: New → In Progress
Changed in systemd (Ubuntu Xenial):
status: New → In Progress
Changed in systemd (Ubuntu Disco):
assignee: nobody → Ioanna Alifieraki (joalif)
Changed in systemd (Ubuntu Cosmic):
assignee: nobody → Ioanna Alifieraki (joalif)
Changed in systemd (Ubuntu Bionic):
assignee: nobody → Ioanna Alifieraki (joalif)
Changed in systemd (Ubuntu Xenial):
assignee: nobody → Ioanna Alifieraki (joalif)
James Page (james-page) wrote :

Just to add some more detail to this bug; for the impacted deployment we actually ended up re-configuring the power regulator settings via the BIOS to delegate to the OS for control; after a reboot we've just stuck with the default ondemand behaviour and performance has been consistent/better than before.

Haw Loeung (hloeung) wrote :

See also LP: #1732696 and LP: #1579278.

Trent Lloyd (lathiat) wrote :

(as context to this information, apparently this particularly bad performance experienced with 'powersave' happens when the BIOS power control is set to the default, and goes away when in the BIOS you set power management to 'os control' - so there is some additional information needed to determine why this particular case offers bad performance, when as shown below, powersave/performance governors should not normally present more than a few percent performance difference)

I would not have expected the governor choice (powersave or otherwise) to limit performance so severely as to prevent a VM from booting/working usefully. I would expect the frequency governor setting to make a difference in benchmarks and power usage, not in general interactive performance. The Phoronix data referred to later supports that view (the performance difference is generally minimal). The behavior you experienced is really a bug in my view.

On modern Intel CPUs (Sandy Bridge and newer, many 2011/2012+ models but varies depending on the exact CPU) the Intel "Pstate" driver is used which is significantly different to the older "cpufreq" driver. This is important to note as you have the two different drivers in use based on which CPU you have - rather than OS (Xenial/Bionic use the same settings).

Although both drivers have governor modes named "powersave" and "performance", they are similar in name only: their behavior is quite different and they do not share any code. To that end you may find different behavior between a test/lab environment, which is likely to have much older hardware, and the current new hardware FCBs. It would also be good to know, for this specific badly broken system, which scaling_driver was in use and what the precise processor model information from /proc/cpuinfo is.
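
For anyone wanting to collect that information, here is a minimal sketch (not part of any Ubuntu package) that reads the scaling driver and governor of each CPU from sysfs, plus the model name from /proc/cpuinfo. The function names are made up for illustration, and the "model name" key assumes the x86 layout of /proc/cpuinfo.

```python
from pathlib import Path
from typing import Dict, Optional


def read_first_line(path: Path) -> Optional[str]:
    """Return the stripped first line of a file, or None if it cannot be read."""
    try:
        with path.open() as handle:
            return handle.readline().strip()
    except OSError:
        return None


def cpu_model(cpuinfo: Path = Path("/proc/cpuinfo")) -> Optional[str]:
    """Return the first 'model name' value (assumes the x86 cpuinfo layout)."""
    try:
        text = cpuinfo.read_text()
    except OSError:
        return None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            return value.strip()
    return None


def cpufreq_summary(
    sysfs_cpu: Path = Path("/sys/devices/system/cpu"),
) -> Dict[str, Dict[str, Optional[str]]]:
    """Map each CPU that has a cpufreq directory to its driver and governor."""
    summary = {}
    # cpu[0-9]* matches cpu0, cpu1, ... but not the top-level "cpufreq" directory.
    for cpufreq_dir in sorted(sysfs_cpu.glob("cpu[0-9]*/cpufreq")):
        summary[cpufreq_dir.parent.name] = {
            "driver": read_first_line(cpufreq_dir / "scaling_driver"),
            "governor": read_first_line(cpufreq_dir / "scaling_governor"),
        }
    return summary


if __name__ == "__main__":
    print("model:", cpu_model())
    for cpu, info in cpufreq_summary().items():
        print(cpu, info["driver"], info["governor"])
```

Running it on the affected host and attaching the output to the bug should answer both questions at once.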

This recently released Phoronix article compares the performance (across various benchmarks) and the power usage of the various driver and governor mode combinations (it's a good read separately):
https://www.phoronix.com/scan.php?page=article&item=linux50-pstate-cpufreq

It has a few interesting observations. In the majority of benchmarks the performance between the two is very similar, and in fact the p-state powersave governor is slightly faster (!) than the pstate performance governor in many of the tests by a small margin. Another major observation from the phoronix data is that the CPUfreq-powersave governor is VERY significantly slower, by a factor of 4-5 times in most cases.

While the *cpufreq* powersave governor is very slow (remember, it is different from the intel_pstate powersave governor, which is the one that should be used), it should also not be used by default on any Xenial or Bionic system from what I can see, unless I am missing another script/tool that is changing the governor somewhere (I couldn't see any scripts in the charm or qemu packages that do so). If we read the code of the systemd service on Bionic that sets the CPU governor, we find that the script /lib/systemd/set-cpufreq (which is an Ubuntu/Debian addition, not systemd upstream; Xenial uses more or less the same script at /etc/init.d/ondemand) is quite simple and ...

Trent Lloyd (lathiat) wrote :

Something I was not previously aware of that informs this a bit more, is that in some BIOS modes (apparently HP uses this extensively, unsure about Dell and others) you get a "Collaborative Power Control" mode, which sets the scaling_driver to pcc-cpufreq (as opposed to cpufreq) and is some weird hybrid of OS+BIOS defined behavior.

In the case of these collaborative modes, the exact behavior is probably wildly different based on what the BIOS is doing and likely would explain why we get weird and inconsistent performance behavior. Unclear to me if such BIOS modes will still use intel_pstate or not.. something I'd have to look into. Or whether it's specific to pre-pstate.

Some more information about collaborative mode in this bug:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1447763

Trent Lloyd (lathiat) wrote :

Confirmed that pcc-cpufreq *can* be used in preference to intel_pstate, even on a CPU that supports intel_pstate, if the firmware tables are set up to request it. One such server is an E5-2630 v3 HP DL360 G9 (shuckle).

On the default "dynamic" firmware setting you get driver=pcc-cpufreq + governor=ondemand, with the "OS Control" setting you get driver=intel_pstate + governor=powersave.

As above this would explain why the very poor performance is only seen without "OS Control" set, and then, only on some hardware. Since the firmware is in control of the CPU power states in pcc-cpufreq mode the exact frequencies / the rate they are changed / etc are partly under BIOS control. Secondarily it's using an entirely separate kernel path for when and how to choose these frequencies.

Note that when pcc-cpufreq is in use the startup script (xenial:/etc/init.d/ondemand, bionic:/lib/systemd/set-cpufreq) will use ondemand and not powersave (contrary to what the bug report description states). If a system using 'cpufreq' is somehow getting the powersave governor set, this is a bug, but I haven't seen any case where that would be true as of yet.

Also note that in Xenial, the ondemand script runs "sleep 60" before setting the governor, apparently to let most desktops boot to the login screen. So any method that tries to override this setting may fail on Xenial if it runs before the 60 seconds is up (e.g. /etc/rc.local, an init script, sysctl, etc)
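
A rough sketch of how one might detect that race: write the desired governor, wait longer than the 60 second sleep, then check whether anything changed it back. The helper names and the 90 second default are assumptions for illustration only; writing to sysfs needs root.

```python
import time
from pathlib import Path
from typing import Callable, List


def set_governor(
    governor: str, sysfs_cpu: Path = Path("/sys/devices/system/cpu")
) -> List[Path]:
    """Write the governor to every CPU's scaling_governor; return the files written."""
    written = []
    for target in sorted(sysfs_cpu.glob("cpu[0-9]*/cpufreq/scaling_governor")):
        target.write_text(governor + "\n")
        written.append(target)
    return written


def reverted(
    governor: str,
    files: List[Path],
    delay_seconds: float = 90.0,  # assumption: comfortably past the 60 s sleep
    sleep: Callable[[float], None] = time.sleep,
) -> List[Path]:
    """Wait, then return the files whose governor no longer matches."""
    sleep(delay_seconds)
    return [f for f in files if f.read_text().strip() != governor]


if __name__ == "__main__":
    files = set_governor("performance")
    for path in reverted("performance", files):
        print("governor was changed back on", path)
```

If the second call reports any files, something (on Xenial, most likely the ondemand script) ran after the override and undid it.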

I did find that we have 1 other method of setting the governor, which is a charm ~canonical-bootstack/sysconfig which had an option added to allow setting the governor to performance (though it doesn't default to that). This charm installs the cpufrequtils package which also seems to default to 'ondemand'. However if this charm was configured with governor=powersave on such a cpufreq system, we would expect very poor performance. Secondly when configured with governor=performance on Xenial it runs before the 'ondemand' script finishes its 60 second wait, so the change gets reverted. But it will work when first deployed if no reboot is done. (Bug: https://bugs.launchpad.net/bootstack-ops/+bug/1822774)

To my mind this leaves two remaining questions:
 - Are we ever getting into a state where we have scaling-driver=pcc-cpufreq or acpi-cpufreq, but governor=powersave? Such a case is likely a bug. I haven't found any such case as yet, unless someone deployed the sysconfig charm with governor=powersave explicitly set (which I have not ruled out).

 - Is there some specific hardware where scaling-driver=pcc-cpufreq and scaling-governor=ondemand performs poorly? I have yet to run a benchmark on my example hardware to find out.
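
The two questions above can be expressed as a small check over (driver, governor) pairs, e.g. to sweep a fleet for affected machines. This is only a sketch of the reasoning in this comment; the function name and the labels it returns are invented for illustration.

```python
# Drivers for which this thread considers governor=powersave a likely bug.
LEGACY_CPUFREQ_DRIVERS = {"pcc-cpufreq", "acpi-cpufreq"}


def classify(driver: str, governor: str) -> str:
    """Label a scaling driver/governor pair according to the two open questions."""
    if driver in LEGACY_CPUFREQ_DRIVERS and governor == "powersave":
        # Question 1: the cpufreq powersave governor is expected to be very slow.
        return "suspect"
    if driver == "pcc-cpufreq" and governor == "ondemand":
        # Question 2: may perform poorly on some hardware; needs a benchmark.
        return "needs-benchmark"
    return "expected"


if __name__ == "__main__":
    for pair in [
        ("intel_pstate", "powersave"),
        ("pcc-cpufreq", "ondemand"),
        ("acpi-cpufreq", "powersave"),
    ]:
        print(pair, "->", classify(*pair))
```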

Bryan Quigley (bryanquigley) wrote :

I tried to get the ondemand script dropped in 2015 (LP: #1503773) and got NACKed.

Ioanna Alifieraki (joalif) wrote :

@james-page
What was the BIOS setting before setting it to "OS control"?
Also, what do you mean by "we've just stuck with the default ondemand behaviour"?
Do you mean that you let the systemd ondemand service do whatever it does by default, or
that you use the ondemand governor?

James Page (james-page) wrote :

TBH it was probably whatever the default mode was in the BIOS - we saw the same issue on Dell and HP servers.

This one >> "You mean that you let systemd ondemand service do whatever it does by default"

Haw Loeung (hloeung) wrote :

@lathiat:

> - Is there some specific hardware where scaling-driver=pcc-cpufreq
> and scaling-governor=ondemand performs poorly. I have yet to run a
> benchmark on my example hardware to find out.

Yes, we first started seeing this when deploying new Ubuntu Archive
servers where we had two servers in the same DC taking on the same
amount of traffic/requests. One was showing much higher load and
performing much worse than the other.

We brought up others in another DC and saw the same. The internal
ticket, RT#90571, has some details.

The specs differ with the one without issues being:

| economy - HP ProLiant DL380 G7

The ones that were showing issues are:

| hanger - HP ProLiant DL360p Gen8
| steelix - ProLiant DL380 Gen9
| keeton - ProLiant DL380p Gen8

By default, they're using the pcc-cpufreq driver, but we also tried
acpi-cpufreq, which didn't seem to make any difference.

This led to us filing LP: #1579278 and the change to a piece of
software we use to deploy disabling the 'ondemand' CPU governor:

| https://bazaar.launchpad.net/~canonical-sysadmins/basenode/trunk/revision/98

Trent Lloyd (lathiat) wrote :

On the previously mentioned HP server today I was able to get closer to reproducing the situation by testing with bionic (4.15.0-47-generic) instead of xenial (4.4).

On bionic, unlike xenial, even with the BIOS set to "BIOS controlled dynamic" mode, the intel_pstate driver is loaded instead of pcc-cpufreq.

Found this kernel commit: https://github.com/torvalds/linux/commit/bfa54a3a00e2f7ff051a50f3957e4fca3d73f6e7

Pull power management fix from Rafael Wysocki:
 "Fix a relatively old initialization issue in intel_pstate causing the
  pcc-cpufreq driver to be used instead of it on some HP Proliant
  systems.

  This turned into a functional regression during the 4.17 cycle,
  because pcc-cpufreq is a scalability disaster and that was amplified
  by the idle loop rework done at that time (Rafael Wysocki)."

This suggests there has definitely been a related change in this area that sounds very similar to this issue and is worth further research.

Steve Langasek (vorlon) wrote :

Based on the analysis in the log, there does not appear to be a bug in systemd - which is behaving as intended, and defaulting to the best available cpufreq governor for low power consumption - but there may be a bug in the kernel resulting in the wrong scaling driver being used on the hardware in question. Reassigning to the kernel.

affects: systemd (Ubuntu) → linux (Ubuntu)
Dan Streetman (ddstreet) wrote :

> Reassigning to the kernel

please don't. This is a bug in systemd because its set-cpufreq script provides no way for the user to override its hardcoded choice of which governor it selects. We will be providing a systemd patch to add user configurability.

affects: linux (Ubuntu) → systemd (Ubuntu)
Colin Ian King (colin-king) wrote :

It is normally preferable to use the intel-pstate driver rather than pcc-cpufreq or acpi-cpufreq on modern Intel hardware.

Some HP ProLiant platforms implement the PCC interface [1] which can be disabled by a BIOS setting in which case the PCC driver will not load and the acpi-cpufreq driver can be used instead.

The intel-pstate driver is presumed to be better for Sandybridge CPUs and later. Unlike the cpufreq drivers, it uses P-states rather than CPU frequency [2]. It also has access to CPU performance metrics, so in theory it has finer control than the traditional BIOS table driven frequency scaling.

So for HP Proliants that are pre-Sandybridge, pcc-cpufreq may be the best bet, provided the firmware is doing the right thing. If not, acpi-cpufreq may be better, as long as the BIOS has the correct control data in the ACPI tables.

[1] Processor Clocking Control, https://acpica.org/sites/acpica/files/Processor-Clocking-Control-v1p0.pdf
[2] https://events.static.linuxfound.org/sites/events/files/slides/LinuxConEurope_2015.pdf

Dan Streetman (ddstreet) wrote :

> > Reassigning to the kernel
>
> please don't. This is a bug in systemd

with that said (that we believe this should stay open against systemd so end users can configure ondemand), there certainly may also be a bug in the kernel causing the slowness; if anyone (@lathiat?) has found that to be the case, this bug should be targeted to linux as well.

Ioanna Alifieraki (joalif) wrote :

Just opened an MP against systemd to make the 'ondemand' service configurable through
the /etc/default/cpufrequtils file.
This change has two purposes:

1) Make the 'ondemand' service configurable.
It is important for the 'ondemand' service to be configurable because, depending on the use case
and the CPU model, 'ondemand' may not select the optimal governor for the user's needs.

2) Fix an existing bug when cpufrequtils is installed.
If cpufrequtils is installed and the user has chosen a different
governor (by editing the /etc/default/cpufrequtils file) than the one selected
by the ondemand service, the ondemand service will overwrite the user's setting and
stick to its own selection.

With this change the ondemand service will first check whether the /etc/default/cpufrequtils
file exists and, if a governor is defined there, the ondemand service will select
the defined governor.
If there is no such file, the ondemand service will behave as it does currently.
The /etc/default/cpufrequtils file is chosen on purpose to provide
compatibility between the ondemand service and the cpufrequtils package.
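
To illustrate the selection logic described above, here is a sketch in Python (the actual change is to a shell script; this is not the code from the MP). It assumes the defaults file uses shell-style GOVERNOR="..." assignments, and the fallback order below is my assumption about what the existing script prefers, not something confirmed in this bug.

```python
import shlex
from pathlib import Path
from typing import Optional, Sequence

# Assumption: the order in which the current script picks a governor by default.
ASSUMED_FALLBACK_ORDER = ("interactive", "ondemand", "powersave")


def configured_governor(defaults_file: Path) -> Optional[str]:
    """Return the GOVERNOR value from a shell-style defaults file, if any."""
    try:
        text = defaults_file.read_text()
    except OSError:
        return None  # no such file (or unreadable): nothing configured
    governor = None
    for raw in text.splitlines():
        try:
            tokens = shlex.split(raw, comments=True)
        except ValueError:
            continue  # skip lines with unbalanced quotes
        for token in tokens:
            name, sep, value = token.partition("=")
            if sep and name == "GOVERNOR" and value:
                governor = value  # last assignment wins, as when sourced by a shell
    return governor


def choose_governor(
    available: Sequence[str],
    defaults_file: Path = Path("/etc/default/cpufrequtils"),
    fallback_order: Sequence[str] = ASSUMED_FALLBACK_ORDER,
) -> Optional[str]:
    """Prefer the user's configured governor; otherwise keep the default choice."""
    configured = configured_governor(defaults_file)
    if configured:
        return configured
    # No file, or no GOVERNOR defined in it: fall back to the default preference.
    for candidate in fallback_order:
        if candidate in available:
            return candidate
    return None
```

For example, choose_governor(["performance", "powersave"]) returns "powersave" when the file is absent, and "performance" once the file contains GOVERNOR="performance".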

Dan Streetman (ddstreet) wrote :

Anyone with any opinions on whether the 'ondemand' service should be end-user configurable should please comment in the merge proposal.

https://code.launchpad.net/~joalif/ubuntu/+source/systemd/+git/systemd/+merge/367469

tags: added: rls-x-notfixing
Dan Streetman (ddstreet)
tags: removed: sts sts-sponsor-ddstreet
Dan Streetman (ddstreet) wrote :

as the merge request for this was rejected as not needed, i'm marking this bug as wontfix.

Changed in systemd (Ubuntu Disco):
status: In Progress → Won't Fix
Changed in systemd (Ubuntu Bionic):
status: In Progress → Won't Fix
Changed in systemd (Ubuntu Xenial):
status: In Progress → Won't Fix
Changed in systemd (Ubuntu Cosmic):
status: In Progress → Won't Fix
Changed in systemd (Ubuntu):
status: In Progress → Won't Fix