Ubuntu 16.04.02: powerpc-ibm-utils: drmgr does not scale with large number of virtual adapters

Bug #1692837 reported by bugproxy on 2017-05-23
This bug affects 3 people
Affects                            Importance  Assigned to
The Ubuntu-power-systems project   Medium      Unassigned
powerpc-utils (Ubuntu)             Medium      Ubuntu on IBM Power Systems Bug Triage
  Xenial                           Medium      Steve Langasek
  Yakkety                          Medium      Steve Langasek
  Zesty                            Medium      Steve Langasek

Bug Description

[SRU Justification]
On a NovaLink system, the time drmgr takes to complete increases linearly with the number of virtual adapters, reaching many seconds per call once a partition has a few hundred adapters. This is unreasonable.

[Test case]
To be completed by IBM, who have access to the hardware.
1. Add 200 virtual adapters to an LPAR
2. Run time /usr/sbin/drmgr -c slot -s <slot_name> -Q
3. Confirm that this takes multiple seconds to return.
4. Install powerpc-utils from -proposed.
5. Run time /usr/sbin/drmgr -c slot -s <slot_name> -Q again
6. Confirm that this takes less than a second to return.
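
The before/after timing in steps 2 to 6 could be scripted roughly as follows. This is only a sketch: the helper names `time_command` and `within_budget` and the one-second default budget are assumptions made for illustration, not part of powerpc-utils or of the SRU process.

```python
import subprocess
import time


def time_command(argv):
    """Run argv once and return (elapsed_seconds, exit_status)."""
    start = time.perf_counter()
    result = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, result.returncode


def within_budget(argv, budget_seconds=1.0):
    """Return True if argv finishes in under budget_seconds.

    The exit status is deliberately ignored: the test case only
    asks how long the query takes to return.
    """
    elapsed, _status = time_command(argv)
    return elapsed < budget_seconds
```

On the affected LPAR, calling `within_budget(["/usr/sbin/drmgr", "-c", "slot", "-s", slot_name, "-Q"])` with a real slot name would be expected to return False before installing the package from -proposed and True afterwards.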

[Regression potential]
Any bugs introduced in this code could cause drmgr/lsslot to fail to operate correctly on the slots at all. However, the code is reasonably generic, so the risk of intermittent failure is low: if it passes verification, it is reasonable to expect it to work everywhere.

== Comment: #0 - Jeremy A. Arnold
---Problem Description---
The time to run commands such as "drmgr -c slot -s U8247.22L.211E15A-V1-C210 -a -w 1" increases linearly with the number of slots on the system. For cloud environments with a large number of VMs hosted by a single NovaLink partition (or a small number of VIOS partitions), the number of virtual slots in the NovaLink partition can grow large, and the long drmgr time can be a major factor in the time to deploy new VMs.

In one recent test, the call above took about 29 seconds to complete on a system with around 100 VMs. Earlier in the run (when there were fewer than 10 VMs), it took only about 2 seconds.

I'm not at all an expert in this area, but it appears that drmgr iterates through all of the slots to find the one that was requested. Some evidence for this is provided by running:

sudo time /usr/sbin/drmgr -c slot -s U8247.22L.211E15A-V1-C210 -Q

This is taking about 14 seconds elapsed time (it may have been slower during the actual run due to concurrent executions of drmgr) on a system with about 232 virtual adapter slots. The time is similar if I make a request for a slot that does not exist (e.g. change C210 to C250), so it would appear that nearly all of the runtime is for looking up the slot and not for actually retrieving information about it.

Adding -d20 to the above command provides additional debug data. This shows that the majority of the time is between these two lines of output:

---
Could not find DRC property group in path: /proc/device-tree/pci@80000002000001f.
DR nodes list
---

For reference, the following command can identify the correct entry in the device tree in about 0.02 seconds. Obviously drmgr has more to do than just this, but this suggests that there is no fundamental reason the time has to scale with the number of slots:

time (find /sys/firmware/devicetree/base/vdevice -name "ibm\,loc-code" | xargs grep "^U8247.22L.211E15A-V1-C210-T1$")
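
For illustration, the same direct lookup can be sketched in Python. The function name `find_loc_code` and its parameters are hypothetical. The only facts relied on are the property file name `ibm,loc-code` from the command above and the fact that device-tree string properties are NUL-terminated.

```python
import os


def find_loc_code(base, loc_code, prop_name="ibm,loc-code"):
    """Return paths of prop_name files under base whose value equals loc_code."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(base):
        if prop_name not in filenames:
            continue
        path = os.path.join(dirpath, prop_name)
        try:
            with open(path, "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # node disappeared or is unreadable; skip it
        # Strip the trailing NUL terminator before comparing.
        value = raw.rstrip(b"\x00").decode("ascii", errors="replace")
        if value == loc_code:
            matches.append(path)
    return matches
```

A call such as `find_loc_code("/sys/firmware/devicetree/base/vdevice", "U8247.22L.211E15A-V1-C210-T1")` visits each node once, so its cost depends on the size of the tree and not on how many candidate slots are being checked.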

---uname output---
Linux cs-tul10-neo 4.4.21-customv1.29 #6 SMP Wed Apr 12 14:40:02 CDT 2017 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = 8247-22L

---Debugger---
A debugger is not configured

---Steps to Reproduce---
I used PowerVC to deploy 100 VMs to a NovaLink system and viewed /var/log/drmgr to observe how long the drmgr calls took during the test.

I believe it would be sufficient to add 200 virtual adapters to an LPAR and then run "/usr/sbin/drmgr -c slot -s <slot_name> -Q"

I'm happy to collect additional data in my environment if it would be helpful.

Userspace tool common name: /usr/sbin/drmgr

The userspace tool has the following bit modes: 64-bit

Userspace rpm: powerpc-ibm-utils

Userspace tool obtained from project website: na

== Comment: #4 - Amartey S. Pearson
I have a proposed fix for this in a GitHub fork. In short, the algorithm used to populate the dr_nodes needs to be fixed. It currently walks the entire bus for every theoretical DRC (thousands of times). The fix is to walk the bus once.

https://github.com/apearson-ibm/powerpc-utils/commit/6fefb6acb6fb302c97d71faef75a12674a50209a

This addresses both drmgr and lsslot as the change is in common code. An example of the improvement we see:

Here we have a system with 196 populated virtual slots. An lsslot takes 6.5 seconds.

root@neo33-2:/usr/sbin# time /usr/sbin/lsslot -c slot | wc -l
196
real 0m6.495s
user 0m1.108s
sys 0m5.384s

With the fix, the lsslot now takes 0.18 seconds, and scales well as more slots are added.

root@neo33-2:~/powerpc-utils# time /usr/local/sbin/lsslot -c slot | wc -l
196
real 0m0.186s
user 0m0.028s
sys 0m0.156s
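
The difference between the two approaches described in this comment can be illustrated with a small model. This is not the powerpc-utils code: the function names, the list-of-dicts representation of the bus, and the `drc_name` key are all hypothetical, chosen only to show why walking the bus once per DRC scales poorly while a single walk does not.

```python
def lookup_per_drc(drc_names, bus_nodes):
    """Old pattern: rescan the entire bus for every candidate DRC name."""
    visits = 0
    found = {}
    for name in drc_names:
        for node in bus_nodes:
            visits += 1  # one node examined
            if node["drc_name"] == name:
                found[name] = node
    return found, visits


def lookup_single_pass(drc_names, bus_nodes):
    """Fixed pattern: walk the bus once, then resolve names by dict lookup."""
    visits = 0
    index = {}
    for node in bus_nodes:
        visits += 1  # one node examined
        index[node["drc_name"]] = node
    found = {name: index[name] for name in drc_names if name in index}
    return found, visits
```

With 196 populated nodes and an illustrative 1000 candidate DRC names, the first function examines 196,000 nodes and the second examines 196, while both return the same matches. On real hardware each examination involves reading the device tree, which would be consistent with the high `sys` time in the first lsslot measurement above.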

== Comment: #7 - Anna A. Sortland
We tried the patch in our test environment and it worked great.

== Comment: #11 - Nathan D. Fontenot <email address hidden> - 2017-05-18 13:30:42 ==
Patch submitted upstream.

https://groups.google.com/forum/#!topic/powerpc-utils-devel/sd1gdvbQp0w

bugproxy (bugproxy) on 2017-05-23
tags: added: architecture-ppc64le bugnameltc-153564 severity-medium targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → powerpc-ibm-utils (Ubuntu)
bugproxy (bugproxy) on 2017-05-23
tags: added: targetmilestone-inin16042
removed: targetmilestone-inin---
summary: - drmgr does not scale with large number of virtual adapters
+ Ubuntu 16.04.02: powerpc-ibm-utils: drmgr does not scale with large
+ number of virtual adapters
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in powerpc-ibm-utils (Ubuntu):
status: New → Confirmed
Manoj Iyer (manjo) on 2017-06-01
tags: added: ubuntu-16.04
Breno Leitão (breno-leitao) wrote :

I created a new package version containing this fix. It is version 1.3.1-2ubuntu0.3, and you can find it in my PPA:

https://launchpad.net/~breno-leitao/+archive/ubuntu/powerpc-ibm-utils

Please install the package using:

 sudo add-apt-repository ppa:breno-leitao/powerpc-ibm-utils
 sudo apt-get update
 sudo apt-get install powerpc-ibm-utils

If it fixes the problem, I will ask someone to sponsor this package.

------- Comment From <email address hidden> 2017-06-11 23:26 EDT-------
I've tested the fix from the PPA (version 1.3.1-2ubuntu0.3) and it appears to resolve the issue. No functional issues were noted in my testing. Performance of the drmgr query operation mentioned in the original bug report was less than 0.2 seconds, compared to 14 seconds with the old version.

Changed in ubuntu-power-systems:
status: New → Confirmed
Amartey Pearson (apearson) wrote :

I added this in the other bug report (1696434), but want to make sure it gets reflected here too:

Looking at the diff, it appears you brought in:

0005-in-kernel-dlpar.patch
Subject: [PATCH] drmgr: Start using in-kernel DLPAR functionality for
 cpu/memory

I believe you'll need to bring in the following as well:
https://groups.google.com/forum/#!topic/powerpc-utils-devel/LkrB6tIvs_Y
[PATCH] drmgr: Disable use of in-kernel cpu hotplug

It appears that CPU hotplug is not enabled in the stock 4.4 kernel, so without that second commit CPU hotplug will fail with this new package.

Steve Langasek (vorlon) on 2017-06-19
affects: powerpc-ibm-utils (Ubuntu) → powerpc-utils (Ubuntu)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package powerpc-utils - 1.3.2-1ubuntu2

---------------
powerpc-utils (1.3.2-1ubuntu2) artful; urgency=medium

  * d/p/in-kernel-dlpar.patch: fix FTBFS.

 -- Steve Langasek <email address hidden> Mon, 19 Jun 2017 14:18:01 -0700

Changed in powerpc-utils (Ubuntu):
status: Confirmed → Fix Released
Steve Langasek (vorlon) on 2017-06-19
description: updated
Steve Langasek (vorlon) on 2017-06-19
description: updated
Steve Langasek (vorlon) on 2017-06-20
Changed in powerpc-utils (Ubuntu Xenial):
milestone: none → ubuntu-16.04.3
assignee: nobody → Steve Langasek (vorlon)
Changed in powerpc-utils (Ubuntu Yakkety):
assignee: nobody → Steve Langasek (vorlon)
Changed in powerpc-utils (Ubuntu Zesty):
assignee: nobody → Steve Langasek (vorlon)
Changed in powerpc-utils (Ubuntu Xenial):
status: New → In Progress
Changed in powerpc-utils (Ubuntu Yakkety):
status: New → In Progress
Changed in powerpc-utils (Ubuntu Zesty):
status: New → Incomplete
status: Incomplete → In Progress

Hello bugproxy, or anyone else affected,

Accepted powerpc-utils into zesty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/powerpc-utils/1.3.2-1ubuntu2~17.04 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in powerpc-utils (Ubuntu Zesty):
status: In Progress → Fix Committed
tags: added: verification-needed
Changed in powerpc-utils (Ubuntu Yakkety):
status: In Progress → Fix Committed
Brian Murray (brian-murray) wrote :

Hello bugproxy, or anyone else affected,

Accepted powerpc-utils into yakkety-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/powerpc-utils/1.3.2-1ubuntu2~16.10 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in powerpc-utils (Ubuntu Xenial):
status: In Progress → Fix Committed
Brian Murray (brian-murray) wrote :

Hello bugproxy, or anyone else affected,

Accepted powerpc-utils into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/powerpc-utils/1.3.1-2ubuntu0.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in ubuntu-power-systems:
status: Confirmed → Fix Committed
Breno Leitão (breno-leitao) wrote :

Amartey, could you please test this package from the -proposed archive? They are depending on that verification to release the package to the official archive.

Amartey Pearson (apearson) wrote :

Tested version 1.3.1-2ubuntu0.3 in 16.04.2 successfully.

Steve Langasek (vorlon) on 2017-06-21
tags: added: verification-done-xenial

As part of a recent change in the Stable Release Update verification policy, we would like to inform you that for a bug to be considered verified for a given release, a verification-done-$RELEASE tag needs to be added to the bug, where $RELEASE is the name of the series in which the package was tested (e.g. verification-done-xenial). Please note that the global 'verification-done' tag can no longer be used for this purpose.

Thank you!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package powerpc-utils - 1.3.1-2ubuntu0.3

---------------
powerpc-utils (1.3.1-2ubuntu0.3) xenial; urgency=medium

  * d/p/Improve-perf-of-drmgr-lsslot-with-large-num-of-virt.patch:
    Fix scaling with large number of virtual adapters. LP: #1692837
  * d/p/drmgr-Stale-errno-usage-corrections.patch,
    d/p/drmgr-Correct-errno-usage-use-in-validate_paltform.patch,
    d/p/drmgr-Correct-errno-usage-in-init_cpu_info.patch:
    Fix failures during scale-up test on Novalink System. LP: #1696434

 -- Breno Leitao <email address hidden> Fri, 09 Jun 2017 10:39:15 -0400

Changed in powerpc-utils (Ubuntu Xenial):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for powerpc-utils has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Steve Langasek (vorlon) wrote :

While this SRU has been released for 16.04, we are still awaiting verification of the bugfixes in 16.10 and 17.04 so that users do not experience regressions between 16.04 and later releases.

------- Comment From <email address hidden> 2017-07-11 14:55 EDT-------
Okay. We are checking on the verification for 16.10 and 17.04.

Changed in powerpc-utils (Ubuntu):
importance: Undecided → Medium
Changed in powerpc-utils (Ubuntu Xenial):
importance: Undecided → Medium
Changed in powerpc-utils (Ubuntu Yakkety):
importance: Undecided → Medium
Changed in powerpc-utils (Ubuntu Zesty):
importance: Undecided → Medium
Manoj Iyer (manjo) on 2017-07-19
Changed in ubuntu-power-systems:
importance: Undecided → Medium
Manoj Iyer (manjo) on 2017-07-24
tags: added: triage-g
Amartey Pearson (apearson) wrote :

Tested version 1.3.2-1ubuntu2 in 17.04 successfully.

I did not test in 16.10 as that is now EOL.

Steve Langasek (vorlon) on 2017-07-26
Changed in powerpc-utils (Ubuntu Yakkety):
status: Fix Committed → Won't Fix
tags: added: verification-done
removed: verification-needed

Thank you for taking the time to verify this stable release fix. We have noticed that you have used the verification-done tag for marking the bug as verified and would like to point out that due to a recent change in SRU bug verification policy fixes now have to be marked with per-release tags (i.e. verification-done-$RELEASE). Please remove the verification-done tag and add one for the release you have tested the package in. Thank you!

https://wiki.ubuntu.com/StableReleaseUpdates#Verification

Steve Langasek (vorlon) on 2017-08-11
tags: added: verification-done-zesty
removed: verification-done