KVM on PowerVM: L2 Guest-Aggressively entering CEDE results in low performance. Possible tuning opportunity.

Bug #2070253 reported by bugproxy
Affects                           Status  Importance  Assigned to                             Milestone
The Ubuntu-power-systems project  New     Medium      Ubuntu on IBM Power Systems Bug Triage
linux (Ubuntu)                    New     High        Unassigned

Bug Description

---uname output---
Linux rhel86edb1 #1 SMP Sun Jan 21 11:45:44 EST 2024 ppc64le ppc64le ppc64le GNU/Linux

---Steps to Reproduce---
Example: run a read-only test using the EDB-PGBENCH and DT7 workloads on:
 1. L1-Host
 2. L2-Guest CEDE ON
 3. L2-Guest CEDE OFF

A significant performance drop is observed in the L2-Guest CEDE ON case compared with the L2-Guest CEDE OFF case.
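
The report does not say how CEDE was switched on and off inside the L2 guest. One common way on Linux is the cpuidle sysfs interface, where writing 1 to an idle state's `disable` file stops the governor from selecting that state. The following is a minimal Python sketch under that assumption; the function name `set_idle_state` and its `cpu_root` parameter are illustrative and not taken from the report, and writing to the real sysfs tree requires root.

```python
from pathlib import Path

# Assumption: the standard Linux cpuidle sysfs layout is present in the guest.
CPU_ROOT = Path("/sys/devices/system/cpu")


def set_idle_state(name, disabled, cpu_root=CPU_ROOT):
    """Enable or disable the idle state called `name` on every CPU.

    Returns the number of per-CPU state entries that were updated.
    """
    changed = 0
    # Each CPU exposes its idle states as cpuN/cpuidle/stateM/{name,disable}.
    pattern = "cpu[0-9]*/cpuidle/state[0-9]*/name"
    for name_file in sorted(Path(cpu_root).glob(pattern)):
        if name_file.read_text().strip() == name:
            # "1" tells the governor to skip this state, "0" re-enables it.
            (name_file.parent / "disable").write_text("1" if disabled else "0")
            changed += 1
    return changed
```

Under this assumption, `set_idle_state("CEDE", disabled=True)` would correspond to the CEDE OFF case and `set_idle_state("CEDE", disabled=False)` to the CEDE ON case. The setting does not persist across reboots.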

Note: The host and guest configurations used for the performance experiments are listed below.

Location of EDB-PGBENCH:
#wget http://ci-http-results.aus.stglabs.ibm.com/perfTest/scripts/Bug_Scripts/pgbench_install.sh
#chmod 777 pgbench_install.sh
#./pgbench_install.sh -->> installs EDB (pgbench) and runs it on the target LPAR.

Location of DT7 workload:

#wget http://ci-http-results.aus.stglabs.ibm.com/perfTest/scripts/Bug_Scripts/DT7-Install.sh
#chmod 777 DT7-Install.sh
#./DT7-Install.sh -->> installs DT7.

Sample commands: once the installation has succeeded, run the commands below on the target LPAR.

EDB-PGBENCH Commands :

# su - enterprisedb
# vi t1.tc -->> copy below lines to t1.tc file .

##########t1.tc##########
runname=select
SCALE=100
runtime=300
thread="40"
smtlist="8"
mode=select
recreateinstance=yes
recreateduringrun=yes
warmup=no
perf_stat=yes
PGSQL=/usr/local/pgsql/bin
#PGSQL=/usr/edb/as14/bin
#PGPORT=5432
cores=5
##########t1.tc##########

#cp t1.tc tc/
#./auto-run-test.sh

DT7 Commands :

After installing DT7, run the commands below:
#cd /root
#./DayTrader7_Run.sh -u 20 -l 900 -i 2

######################################################################
Machine Type: Power 10 LPAR (RHEL9.3)
gcc : 11.4.1
Memory : 300GB
Test type : pgbench-edb, DT7
######################################################################
KVM Host lscpu output :

# lscpu
Architecture: ppc64le
  Byte Order: Little Endian
CPU(s): 96
  On-line CPU(s) list: 0-39
  Off-line CPU(s) list: 40-95
Model name: POWER10 (architected), altivec supported
  Model: 2.0 (pvr 0080 0200)
  Thread(s) per core: 8
  Core(s) per socket: 5
  Socket(s): 1
  Physical sockets: 1
  Physical chips: 4
  Physical cores/chip: 12
Virtualization features:
  Hypervisor vendor: pHyp
  Virtualization type: para
Caches (sum of all):
  L1d: 320 KiB (10 instances)
  L1i: 480 KiB (10 instances)
  L2: 10 MiB (10 instances)
  L3: 40 MiB (10 instances)
NUMA:
  NUMA node(s): 1
  NUMA node2 CPU(s): 0-39
Vulnerabilities:
  Gather data sampling: Not affected
  Itlb multihit: Not affected
  L1tf: Not affected
  Mds: Not affected
  Meltdown: Not affected
  Mmio stale data: Not affected
  Retbleed: Not affected
  Spec rstack overflow: Not affected
  Spec store bypass: Not affected
  Spectre v1: Vulnerable, ori31 speculation barrier enabled
  Spectre v2: Vulnerable
  Srbds: Not affected
  Tsx async abort: Not affected

##############################################

KVM on PowerVM setup:

KVM (Kernel-based Virtual Machine) is a virtualization module for Linux that allows the kernel to function as a hypervisor.

We used a P10 2S4U system for this experiment.

Workloads: DT7 and PGBENCH in detail:

DT7 (DayTrader 7) is an open source benchmark application emulating an online stock trading system.
DT7 consists of 3 components:
1) JMeter
2) WAS (WebSphere Application Server)
3) DB2

The DayTrader benchmark application is deployed on WAS and uses DB2 as its backing database. JMeter generates the requests and interacts with WAS, which acts as the middleware.

PGBENCH :
pgbench is a simple program for running benchmark tests on PostgreSQL. It runs the same sequence of SQL commands over and over, possibly in multiple concurrent database sessions, and then calculates the average transaction rate (transactions per second).

Config of KVM Host and L2-Guest:

KVM Host Config :
# uname -a
Linux #1 SMP Sun Jan 21 11:45:44 EST 2024 ppc64le ppc64le ppc64le GNU/Linux
# numactl -H
available: 1 nodes (1)
node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
node 1 size: 292860 MB
node 1 free: 290979 MB
node distances:
node 1
  1: 10
# cat /proc/cmdline
BOOT_IMAGE=(ieee1275//pci@800000020000021/pci1014\\,683@0/namespace@1,msdos2)/vmlinuz-6.7.0-nested.1.1a946fcde971.up.ibm.el9.ppc64le root=/dev/mapper/rhel_rhel86edb-root ro crashkernel=2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G rd.lvm.lv=rhel_rhel86edb/root rd.lvm.lv=rhel_rhel86edb/swap biosdevname=0 mitigations=off doorbell=off
# ppc64_cpu --dscr
DSCR is 23
# cpupower idle-info
CPUidle driver: pseries_idle
CPUidle governor: menu
analyzing CPU 0:

Number of idle states: 2
Available idle states: snooze CEDE
snooze:
Flags/Description: snooze
Latency: 0
Usage: 2656
Duration: 297483
CEDE:
Flags/Description: CEDE
Latency: 12
Usage: 159981
Duration: 95235883853
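
The Usage and Duration counters above can be turned into a mean time per idle-state entry, which gives a rough feel for how long each CEDE lasts on the host. The following is a minimal Python sketch, assuming Duration is reported in microseconds (as the underlying cpuidle sysfs `time` counter is) and that the counters are cumulative since boot rather than limited to the benchmark window. The helper name `mean_residency_us` is illustrative.

```python
import re

# CPU 0 counters copied from the `cpupower idle-info` output in this report.
SAMPLE = """\
snooze:
Flags/Description: snooze
Latency: 0
Usage: 2656
Duration: 297483
CEDE:
Flags/Description: CEDE
Latency: 12
Usage: 159981
Duration: 95235883853
"""


def mean_residency_us(idle_info):
    """Return {state name: mean microseconds per entry} from idle-info text."""
    result = {}
    state, usage = None, None
    for raw in idle_info.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"(\w+):", line)  # e.g. "snooze:" or "CEDE:"
        if header:
            state, usage = header.group(1), None
        elif line.startswith("Usage:"):
            usage = int(line.split(":", 1)[1])
        elif line.startswith("Duration:") and state and usage:
            result[state] = int(line.split(":", 1)[1]) / usage
    return result


for name, mean_us in mean_residency_us(SAMPLE).items():
    print(f"{name}: {mean_us:.0f} us per entry")
```

For the host numbers shown here this works out to roughly 112 us per snooze entry and roughly 0.6 s per CEDE entry. Running the same calculation on counters sampled in the L2 guest before and after a benchmark run would be more relevant to the reported slowdown.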

# qemu-system-ppc64 --version
QEMU emulator version 7.1.0
Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers

#Libvirt version : libvirt-8.7.0

L2 GUEST CONFIG :

CPUs : unpinned

# cat /proc/cmdline
BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.7.0-nested.1.1a946fcde971.up.ibm.el9.ppc64le root=/dev/mapper/rhel-root ro crashkernel=2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap mitigations=off doorbell=off
# ppc64_cpu --dscr
DSCR is 23
# numactl -H
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
node 0 size: 106739 MB
node 0 free: 105211 MB
node distances:
node 0
  0: 10

We ran the DT7 and PGBENCH read-only tests on the L2-Guest with CEDE ON and with CEDE OFF. Performance degrades with CEDE ON compared with CEDE OFF.

The DT7 and EDB-PGBENCH results follow.

L2-GUEST 5Cores with CEDE on:

1) EDB-PGBENCH Data :
+ /usr/local/pgsql/bin/pgbench -n -S -T 120 -c 40 -j 40 pgbench
pgbench (14.5)
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 40
number of threads: 40
duration: 120 s
number of transactions actually processed: 21811958
latency average = 0.220 ms
initial connection time = 16.004 ms
tps = 181761.468180 (without initial connection time)

2) DT7 Data:
DayTrader7 Report

 Run Group ID=0
 Run ID=40
 Run Description=Test Run
 Host=127.0.0.1 Users=40 Run_time=900

 Total Instances 2
 Total Throughputs 2340.6

L2-GUEST 5Cores with CEDE Off:

1) EDB-PGBENCH Data :
+ /usr/local/pgsql/bin/pgbench -n -S -T 120 -c 40 -j 40 pgbench
pgbench (14.5)
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 40
number of threads: 40
duration: 120 s
number of transactions actually processed: 37804765
latency average = 0.127 ms
initial connection time = 5.910 ms
tps = 315015.313022 (without initial connection time)

2) DT7 Results:
==================================================================================
 DayTrader7 Report

 Run Group ID=0
 Run ID=41
 Run Description=Test Run
 Host=127.0.0.1 Users=40 Run_time=900

 Total Instances 2

 Total Throughputs 3569.6
===================================================================================

EDB-PGBENCH Performance Summary:

CEDE ON EDB-PGBENCH Data : 181761.46818 tps
CEDE OFF EDB-PGBENCH Data : 315015.31302 tps

Percentage Drop: (315015.31302-181761.46818)*100/315015.31302 = 42.3%
With CEDE turned ON, the guest under-performed by about 42% compared with CEDE turned OFF.

DT7 Performance Summary:

CEDE ON DT7 Data : 2340.6 tps
CEDE OFF DT7 Data : 3569.6 tps

Percentage Drop: (3569.6-2340.6)*100/3569.6 = 34.4%
With CEDE turned ON, the guest under-performed by about 34% compared with CEDE turned OFF.
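
The two percentage drops can be rechecked from the throughput figures reported above. The following is a small Python sketch; the helper name `percent_drop` is illustrative, and the CEDE OFF result is taken as the baseline.

```python
def percent_drop(baseline, observed):
    """Percentage by which `observed` falls short of `baseline`."""
    return (baseline - observed) * 100.0 / baseline


# (CEDE OFF throughput, CEDE ON throughput), copied from the results above.
results = {
    "EDB-PGBENCH": (315015.313022, 181761.468180),
    "DT7": (3569.6, 2340.6),
}

for workload, (cede_off, cede_on) in results.items():
    drop = percent_drop(cede_off, cede_on)
    print(f"{workload}: {drop:.1f}% drop with CEDE ON")
```

This gives about 42.3% for EDB-PGBENCH and about 34.4% for DT7, matching the summary figures.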

The data above shows that performance drops when CEDE is ON in the L2-Guest compared with when it is OFF. It is well understood that the solution cannot be offered with Shared CEDE disabled. However, it would be ideal to reduce how aggressively the guest enters CEDE so that performance scales to an acceptable level.

.........................................................................

The patch for this fix has been merged into upstream kernel via commit

7be6ce7043b4cf293c8826a48fd9f56931cef2cf("KVM: PPC: Book3S HV nestedv2: Cancel pending DEC exception")

bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-207093 severity-high targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes) wrote :

Hello and thanks for having reported this issue.

I was able to find the patch "KVM: PPC: Book3S HV nestedv2: Cancel pending DEC exception" accepted in the 'linux-next' tree (7be6ce7043b4cf293c8826a48fd9f56931cef2cf).

Do you know whether it was also marked upstream for stable updates, and will therefore be added to mainline 6.8 as well?
I'm asking because, if so, the Canonical kernel team will pick it up automatically. If not, it needs to be submitted manually to the Ubuntu kernel team's mailing list (which we can help with).

Changed in ubuntu-power-systems:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
Changed in linux (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → nobody
importance: Undecided → High
Changed in ubuntu-power-systems:
importance: Undecided → Medium