Keep powersave CPU frequency scaling governor for CPUs that support intel_pstate
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Invalid | Wishlist | Unassigned |
Xenial | Invalid | Wishlist | Unassigned |
systemd (Ubuntu) | Fix Released | Medium | Unassigned |
Xenial | Invalid | Undecided | Martin Pitt |
sysvinit (Ubuntu) | Invalid | Undecided | Unassigned |
Xenial | Invalid | Medium | Unassigned |
Bug Description
Hi,
On the new Ubuntu archive servers we saw constantly high load. After some tinkering, we found that it was mostly caused by CPUs being woken up to check whether they should enter idle states. Changing the CPU frequency scaling governor to "performance" produced a considerable drop in load.
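For anyone wanting to check or reproduce the governor change, here is a minimal Python sketch that reads and writes the per-CPU cpufreq settings through sysfs. It assumes the standard `/sys/devices/system/cpu/cpuN/cpufreq/` layout; the helper names `read_cpufreq` and `set_governor` are made up for illustration and are not part of any Ubuntu tool. Writing the governor needs root on a real machine.

```python
from pathlib import Path

# Standard sysfs location for per-CPU cpufreq settings (assumed layout).
SYSFS_CPU = Path("/sys/devices/system/cpu")


def read_cpufreq(root=SYSFS_CPU):
    """Return {cpu_name: (scaling_driver, scaling_governor)} for CPUs exposing cpufreq."""
    result = {}
    for gov_file in sorted(Path(root).glob("cpu[0-9]*/cpufreq/scaling_governor")):
        cpufreq_dir = gov_file.parent
        driver_file = cpufreq_dir / "scaling_driver"
        driver = driver_file.read_text().strip() if driver_file.exists() else "unknown"
        result[cpufreq_dir.parent.name] = (driver, gov_file.read_text().strip())
    return result


def set_governor(governor, root=SYSFS_CPU):
    """Write `governor` for every CPU that lists it as available.

    Returns the names of the CPUs that were changed. CPUs whose
    scaling_available_governors file does not list the governor are skipped.
    """
    changed = []
    for gov_file in sorted(Path(root).glob("cpu[0-9]*/cpufreq/scaling_governor")):
        available = gov_file.parent / "scaling_available_governors"
        if available.exists() and governor not in available.read_text().split():
            continue
        gov_file.write_text(governor + "\n")
        changed.append(gov_file.parent.parent.name)
    return changed
```

Calling `read_cpufreq()` first shows which scaling driver (for example `intel_pstate` or `acpi-cpufreq`) each CPU uses, which matters here because the available governors differ between drivers. `set_governor("performance")` then applies the change described above, and it does not persist across reboots.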
Perf report, captured with the following commands:
| perf record -g -a sleep 10
| perf report
| Samples: 287K of event 'cycles:pp', Event count (approx.): 124776998906
| Children Self Command Shared Object Symbol
| + 55.24% 0.20% swapper [kernel.kallsyms] [k] cpu_startup_entry
| + 53.51% 0.00% swapper [kernel.kallsyms] [k] start_secondary
| + 53.02% 0.08% swapper [kernel.kallsyms] [k] call_cpuidle
| + 52.94% 0.02% swapper [kernel.kallsyms] [k] cpuidle_enter
| + 31.81% 0.67% swapper [kernel.kallsyms] [k] cpuidle_enter_state
| + 29.59% 0.12% swapper [kernel.kallsyms] [k] acpi_idle_enter
| + 29.45% 0.05% swapper [kernel.kallsyms] [k] acpi_idle_do_entry
| + 29.43% 29.43% swapper [kernel.kallsyms] [k] acpi_processor_
| + 20.51% 0.04% swapper [kernel.kallsyms] [k] ret_from_intr
| + 20.47% 0.12% swapper [kernel.kallsyms] [k] do_IRQ
| + 19.30% 0.07% swapper [kernel.kallsyms] [k] irq_exit
| + 19.18% 0.07% apache2 [kernel.kallsyms] [k] entry_SYSCALL_
| + 18.80% 0.17% swapper [kernel.kallsyms] [k] __do_softirq
| + 16.45% 0.11% swapper [kernel.kallsyms] [k] net_rx_action
| + 16.25% 0.43% swapper [kernel.kallsyms] [k] be_poll
| + 14.74% 0.21% swapper [kernel.kallsyms] [k] be_process_rx
| + 13.61% 0.07% swapper [kernel.kallsyms] [k] napi_gro_frags
| + 12.58% 0.04% swapper [kernel.kallsyms] [k] netif_receive_
| + 12.48% 0.03% swapper [kernel.kallsyms] [k] __netif_receive_skb
| + 12.42% 0.24% swapper [kernel.kallsyms] [k] __netif_
| + 12.41% 0.00% apache2 [unknown] [k] 0x00007f27983b5028
| + 12.41% 0.00% apache2 [unknown] [k] 0x00007f2798369028
| + 11.49% 0.16% swapper [kernel.kallsyms] [k] ip_rcv
| + 11.29% 0.09% swapper [kernel.kallsyms] [k] ip_rcv_finish
| + 10.77% 0.05% swapper [kernel.kallsyms] [k] ip_local_deliver
| + 10.70% 0.06% swapper [kernel.kallsyms] [k] ip_local_
| + 10.55% 0.22% swapper [kernel.kallsyms] [k] tcp_v4_rcv
| + 10.10% 0.00% apache2 [unknown] [k] 0000000000000000
| + 10.01% 0.04% swapper [kernel.kallsyms] [k] tcp_v4_do_rcv
Expanding in a few of those, you'll see:
| - 55.24% 0.20% swapper [kernel.kallsyms] [k] cpu_startup_entry
| - 55.04% cpu_startup_entry
| - 52.98% call_cpuidle
| + 52.93% cpuidle_enter
| + 0.00% ret_from_intr
| 0.00% cpuidle_enter_state
| 0.00% irq_entries_start
| + 1.14% cpuidle_select
| + 0.47% schedule_
| 0.10% rcu_idle_enter
| 0.09% rcu_idle_exit
| + 0.05% ret_from_intr
| + 0.05% tick_nohz_
| + 0.04% arch_cpu_idle_enter
| 0.02% cpuidle_enter
| 0.02% tick_check_
| + 0.01% cpuidle_reflect
| 0.01% menu_reflect
| 0.01% atomic_
| 0.01% local_touch_nmi
| 0.01% cpuidle_
| 0.01% menu_select
| 0.01% cpuidle_
| + 0.01% tick_nohz_idle_exit
| + 0.01% sched_ttwu_pending
| 0.00% set_cpu_
| 0.00% native_
| 0.00% schedule
| + 0.00% arch_cpu_idle_exit
| 0.00% __tick_
| 0.00% irq_entries_start
| 0.00% sched_clock_
| 0.00% reschedule_
| + 0.00% apic_timer_
| + 0.20% start_secondary
| + 0.00% x86_64_start_kernel
| + 53.51% 0.00% swapper [kernel.kallsyms] [k] start_secondary
| + 53.02% 0.08% swapper [kernel.kallsyms] [k] call_cpuidle
| - 52.94% 0.02% swapper [kernel.kallsyms] [k] cpuidle_enter
| - 52.92% cpuidle_enter
| + 31.81% cpuidle_enter_state
| + 20.01% ret_from_intr
| + 0.51% apic_timer_
| 0.28% native_
| + 0.09% reschedule_
| 0.05% irq_entries_start
| 0.05% do_IRQ
| 0.05% common_interrupt
| 0.02% sched_idle_
| 0.01% acpi_idle_enter
| 0.01% ktime_get
| 0.01% restore_
| 0.01% restore_
| + 0.01% call_function_
| 0.00% native_iret
| + 0.00% call_function_
| 0.00% smp_apic_
| 0.00% smp_reschedule_
| 0.00% smp_call_
| + 0.02% start_secondary
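A note on reading the output above: the "Children" column is cumulative over callees, so the large numbers at the top mostly repeat the same call chain. The "Self" column shows where cycles are actually spent, and in the summary only the (truncated) `acpi_processor_` symbol has a large Self value, 29.43%. The following Python sketch pulls that out of pasted summary lines. The regular expression is an assumption based on the line shape in this report (two percentages, command, shared object, symbol type, symbol) and will not cover every `perf report` layout, for instance command names containing spaces.

```python
import re

# Matches summary lines such as:
# | + 29.43% 29.43% swapper [kernel.kallsyms] [k] acpi_processor_
# Call-chain lines with a single percentage do not match and are skipped.
LINE_RE = re.compile(
    r"^\|?\s*[+-]?\s*"
    r"(?P<children>\d+\.\d+)%\s+"
    r"(?P<self>\d+\.\d+)%\s+"
    r"(?P<command>\S+)\s+"
    r"(?P<dso>\S+)\s+"
    r"\[(?P<kind>.)\]\s+"
    r"(?P<symbol>\S+)"
)


def parse_perf_lines(lines):
    """Parse perf report summary lines into dicts; skip lines that do not match."""
    rows = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        row = match.groupdict()
        row["children"] = float(row["children"])
        row["self"] = float(row["self"])
        rows.append(row)
    return rows


def top_self(rows, n=3):
    """Return the n rows with the highest Self percentage."""
    return sorted(rows, key=lambda r: r["self"], reverse=True)[:n]
```

Feeding the summary block into `parse_perf_lines` and then `top_self` ranks the entries by Self time rather than by cumulative time.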
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu Xenial):
status: Incomplete → Confirmed
summary:
- Consider changing default CPU frequency scaling governor back to "performance"
+ Consider changing default CPU frequency scaling governor back to "performance" (Ubuntu Server)
Changed in linux (Ubuntu):
importance: Undecided → Wishlist
Changed in linux (Ubuntu Xenial):
importance: Undecided → Wishlist
tags: added: kernel-da-key xenial
Changed in linux (Ubuntu):
status: Confirmed → Triaged
Changed in linux (Ubuntu Xenial):
status: Confirmed → Triaged
As Theodore Ts'o has pointed out[1]:
"""
... with modern Intel processors, the ondemand CPU governor is actually counterproductive because waking up to decide whether the CPU is idle keeps it from entering the deepest sleep states, and so (somewhat counterintuitively) the performance governor will actually result in the best battery life.
"""
[1] https://plus.google.com/+TheodoreTso/posts/2vEekAsG2QT