Activity log for bug #1692837

Date Who What changed Old value New value Message
2017-05-23 08:39:12 bugproxy bug added bug
2017-05-23 08:39:14 bugproxy tags architecture-ppc64le bugnameltc-153564 severity-medium targetmilestone-inin---
2017-05-23 08:39:15 bugproxy ubuntu: assignee Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
2017-05-23 08:39:18 bugproxy affects ubuntu powerpc-ibm-utils (Ubuntu)
2017-05-23 14:01:15 Manoj Iyer bug task added ubuntu-power-systems
2017-05-23 17:39:19 bugproxy tags architecture-ppc64le bugnameltc-153564 severity-medium targetmilestone-inin--- architecture-ppc64le bugnameltc-153564 severity-medium targetmilestone-inin16042
2017-05-23 17:46:40 Vipin K Parashar summary drmgr does not scale with large number of virtual adapters Ubuntu 16.04.02: powerpc-ibm-utils: drmgr does not scale with large number of virtual adapters
2017-05-31 14:18:02 Launchpad Janitor powerpc-ibm-utils (Ubuntu): status New Confirmed
2017-06-01 15:42:03 Manoj Iyer tags architecture-ppc64le bugnameltc-153564 severity-medium targetmilestone-inin16042 architecture-ppc64le bugnameltc-153564 severity-medium targetmilestone-inin16042 ubuntu-16.04
2017-06-12 05:58:44 Frank Heimes ubuntu-power-systems: status New Confirmed
2017-06-12 10:52:45 Andrew Cloke bug added subscriber Andrew Cloke
2017-06-19 19:53:48 Steve Langasek affects powerpc-ibm-utils (Ubuntu) powerpc-utils (Ubuntu)
2017-06-19 21:26:27 Steve Langasek nominated for series Ubuntu Xenial
2017-06-19 21:26:27 Steve Langasek bug task added powerpc-utils (Ubuntu Xenial)
2017-06-19 21:26:27 Steve Langasek nominated for series Ubuntu Zesty
2017-06-19 21:26:27 Steve Langasek bug task added powerpc-utils (Ubuntu Zesty)
2017-06-19 21:26:27 Steve Langasek nominated for series Ubuntu Yakkety
2017-06-19 21:26:27 Steve Langasek bug task added powerpc-utils (Ubuntu Yakkety)
2017-06-19 22:16:53 Launchpad Janitor powerpc-utils (Ubuntu): status Confirmed Fix Released
2017-06-19 22:30:02 Steve Langasek description

Old value:

== Comment: #0 - Jeremy A. Arnold
---Problem Description---
The time to run commands such as "drmgr -c slot -s U8247.22L.211E15A-V1-C210 -a -w 1" increases linearly with the number of slots on the system. For cloud environments with a large number of VMs hosted by a single NovaLink partition (or a small number of VIOS partitions), the number of virtual slots in the NovaLink partition can grow large, and the long drmgr time can be a major factor in the time to deploy new VMs.

In one recent test, the call above took about 29 seconds to complete on a system with around 100 VMs. Earlier in the run (when there was less than 10 VMs) it only took about 2 seconds.

I'm not at all an expert on this area, but it would appear that drmgr is iterating through all of the slots in order to find the one that was requested. Some evidence for this is provided by running:

    sudo time /usr/sbin/drmgr -c slot -s U8247.22L.211E15A-V1-C210 -Q

This is taking about 14 seconds elapsed time (it may have been slower during the actual run due to concurrent executions of drmgr) on a system with about 232 virtual adapter slots. The time is similar if I make a request for a slot that does not exist (e.g. change C210 to C250), so it would appear that nearly all of the runtime is for looking up the slot and not for actually retrieving information about it.

Adding -d20 to the above command provides additional debug data. This shows that the majority of the time is between these two lines of output:

    ---
    Could not find DRC property group in path: /proc/device-tree/pci@80000002000001f.
    DR nodes list
    ---

For reference, the following command can identify the correct entry in the device tree in about 0.02 seconds. Obviously drmgr has more to do than just this, but this suggests that there is no fundamental reason the time has to scale with the number of slots:

    time (find /sys/firmware/devicetree/base/vdevice -name "ibm\,loc-code" | xargs grep "^U8247.22L.211E15A-V1-C210-T1$")

---uname output---
Linux cs-tul10-neo 4.4.21-customv1.29 #6 SMP Wed Apr 12 14:40:02 CDT 2017 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = 8247-22L

---Debugger---
A debugger is not configured

---Steps to Reproduce---
I used PowerVC to deploy 100 VMs to a NovaLink system and viewed /var/log/drmgr to observe how long the drmgr calls took during the test. I believe it would be sufficient to add 200 virtual adapters to an LPAR and then run "/usr/sbin/drmgr -c slot -s <slot_name> -Q"

I'm happy to collect additional data in my environment if it would be helpful.

Userspace tool common name: /usr/sbin/drmgr
The userspace tool has the following bit modes: 64-bit
Userspace rpm: powerpc-ibm-utils
Userspace tool obtained from project website: na

== Comment: #4 - Amartey S. Pearson
I have a proposed fix for this in a github fork. In short, the algorithm used to populate the dr_nodes needs to be fixed. It currently walks the entire bus for every theoretical DRC (1000's of times). The fix is to walk the bus once.

https://github.com/apearson-ibm/powerpc-utils/commit/6fefb6acb6fb302c97d71faef75a12674a50209a

This addresses both drmgr and lsslot as the change is in common code.

An example of the improvement we see: Here we have a system with 196 populated virtual slots. An lsslot takes 6.5 seconds.

    root@neo33-2:/usr/sbin# time /usr/sbin/lsslot -c slot | wc -l
    196
    real 0m6.495s
    user 0m1.108s
    sys 0m5.384s

With the fix, the lsslot now takes 0.18 seconds, and scales well as more slots are added.

    root@neo33-2:~/powerpc-utils# time /usr/local/sbin/lsslot -c slot | wc -l
    196
    real 0m0.186s
    user 0m0.028s
    sys 0m0.156s

== Comment: #7 - Anna A. Sortland
We tried the patch in our test environment and it worked great.

== Comment: #11 - Nathan D. Fontenot <nfonteno@us.ibm.com> - 2017-05-18 13:30:42 ==
Patch submitted upstream.
https://groups.google.com/forum/#!topic/powerpc-utils-devel/sd1gdvbQp0w

New value:

[SRU Justification]
On a NovaLink system, the time drmgr takes to complete increases linearly with the number of virtual adapters. This is unreasonable.

[Test case]
To be completed by IBM, who have access to the hardware.
1. Add 200 virtual adapters to an LPAR
2. Run time /usr/sbin/drmgr -c slot -s <slot_name> -Q
3. Confirm that this takes multiple seconds to return.
4. Install powerpc-utils from -proposed.
5. Run time /usr/sbin/drmgr -c slot -s <slot_name> -Q again
6. Confirm that this takes less than a second to return.

[Comment #0 through Comment #11 follow, unchanged from the old value.]
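Comment #4 in the description attributes the slowdown to walking the entire bus once per candidate DRC, and describes the fix as walking the bus once. The following is a minimal Python sketch of that idea only; it is not the powerpc-utils C code. The `Node` type, its `loc_code` field, and both function names are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for one device-tree node found on the bus."""
    path: str
    loc_code: str


def find_nodes_rescan(bus: List[Node],
                      drc_names: Iterable[str]) -> Dict[str, Optional[Node]]:
    """Slow shape: rescan the whole bus for every DRC name.

    Cost grows as len(drc_names) * len(bus).
    """
    result: Dict[str, Optional[Node]] = {}
    for name in drc_names:
        result[name] = next((n for n in bus if n.loc_code == name), None)
    return result


def find_nodes_single_walk(bus: List[Node],
                           drc_names: Iterable[str]) -> Dict[str, Optional[Node]]:
    """Faster shape: walk the bus once to build an index, then look names up.

    Cost grows as len(bus) + len(drc_names).
    """
    index: Dict[str, Node] = {}
    for n in bus:
        # Keep the first node per location code, matching the rescan version.
        index.setdefault(n.loc_code, n)
    return {name: index.get(name) for name in drc_names}
```

Both functions return the same mapping for the same inputs; only the amount of work differs, which is consistent with the lsslot timings quoted in comment #4 but does not reproduce them.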
2017-06-19 22:30:34 Steve Langasek bug added subscriber Steve Langasek
2017-06-19 23:42:26 Steve Langasek description

Old value:

[SRU Justification]
On a NovaLink system, the time drmgr takes to complete increases linearly with the number of virtual adapters. This is unreasonable.

[Test case]
To be completed by IBM, who have access to the hardware.
1. Add 200 virtual adapters to an LPAR
2. Run time /usr/sbin/drmgr -c slot -s <slot_name> -Q
3. Confirm that this takes multiple seconds to return.
4. Install powerpc-utils from -proposed.
5. Run time /usr/sbin/drmgr -c slot -s <slot_name> -Q again
6. Confirm that this takes less than a second to return.

[Comment #0 through Comment #11 follow, identical to the original description recorded in the 2017-06-19 22:30:02 entry.]

New value:

[SRU Justification]
On a NovaLink system, the time drmgr takes to complete increases linearly with the number of virtual adapters. This is unreasonable.

[Test case]
To be completed by IBM, who have access to the hardware.
1. Add 200 virtual adapters to an LPAR
2. Run time /usr/sbin/drmgr -c slot -s <slot_name> -Q
3. Confirm that this takes multiple seconds to return.
4. Install powerpc-utils from -proposed.
5. Run time /usr/sbin/drmgr -c slot -s <slot_name> -Q again
6. Confirm that this takes less than a second to return.

[Regression potential]
Any bugs introduced in this code could cause drmgr/lsslot to fail to correctly operate at all on the slots. However, the code is reasonably generic and the risk is low of this code failing intermittently: if it passes verification it's reasonable to expect it will work everywhere.

[Comment #0 through Comment #11 follow, identical to the original description recorded in the 2017-06-19 22:30:02 entry.]
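The description's `find ... | xargs grep` one-liner locates a slot by reading every `ibm,loc-code` property under the vdevice subtree. A rough Python equivalent is sketched below for readers who want to script the same check. The function name `find_loc_code` is hypothetical, and the sketch assumes that each property file holds a single NUL-terminated string; it has not been run on the hardware discussed in this bug.

```python
import os
from typing import Iterator


def find_loc_code(root: str, wanted: str,
                  prop: str = "ibm,loc-code") -> Iterator[str]:
    """Yield each directory under `root` whose `prop` file equals `wanted`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if prop not in filenames:
            continue
        with open(os.path.join(dirpath, prop), "rb") as fh:
            # Assumption: the property is a NUL-terminated ASCII string.
            value = fh.read().rstrip(b"\0").decode("ascii", errors="replace")
        if value == wanted:
            yield dirpath
```

On a system like the one in the report, it could be called as `list(find_loc_code("/sys/firmware/devicetree/base/vdevice", "U8247.22L.211E15A-V1-C210-T1"))`. Like the original one-liner, this is a single pass over the tree, so it only illustrates that the lookup itself need not be slow.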
2017-06-20 00:06:57 Steve Langasek powerpc-utils (Ubuntu Xenial): milestone ubuntu-16.04.3
2017-06-20 00:07:00 Steve Langasek powerpc-utils (Ubuntu Xenial): assignee Steve Langasek (vorlon)
2017-06-20 00:07:01 Steve Langasek powerpc-utils (Ubuntu Yakkety): assignee Steve Langasek (vorlon)
2017-06-20 00:07:03 Steve Langasek powerpc-utils (Ubuntu Zesty): assignee Steve Langasek (vorlon)
2017-06-20 00:07:05 Steve Langasek powerpc-utils (Ubuntu Xenial): status New In Progress
2017-06-20 00:07:07 Steve Langasek powerpc-utils (Ubuntu Yakkety): status New In Progress
2017-06-20 00:07:09 Steve Langasek powerpc-utils (Ubuntu Zesty): status New Incomplete
2017-06-20 00:07:10 Steve Langasek powerpc-utils (Ubuntu Zesty): status Incomplete In Progress
2017-06-20 00:12:09 Brian Murray powerpc-utils (Ubuntu Zesty): status In Progress Fix Committed
2017-06-20 00:12:11 Brian Murray bug added subscriber Ubuntu Stable Release Updates Team
2017-06-20 00:12:14 Brian Murray bug added subscriber SRU Verification
2017-06-20 00:12:19 Brian Murray tags architecture-ppc64le bugnameltc-153564 severity-medium targetmilestone-inin16042 ubuntu-16.04 architecture-ppc64le bugnameltc-153564 severity-medium targetmilestone-inin16042 ubuntu-16.04 verification-needed
2017-06-20 00:14:01 Brian Murray powerpc-utils (Ubuntu Yakkety): status In Progress Fix Committed
2017-06-20 00:18:31 Brian Murray powerpc-utils (Ubuntu Xenial): status In Progress Fix Committed
2017-06-20 06:05:49 Frank Heimes ubuntu-power-systems: status Confirmed Fix Committed
2017-06-21 16:19:32 Steve Langasek tags architecture-ppc64le bugnameltc-153564 severity-medium targetmilestone-inin16042 ubuntu-16.04 verification-needed architecture-ppc64le bugnameltc-153564 severity-medium targetmilestone-inin16042 ubuntu-16.04 verification-done-xenial verification-needed
2017-07-04 12:12:17 Launchpad Janitor powerpc-utils (Ubuntu Xenial): status Fix Committed Fix Released
2017-07-04 12:12:22 Łukasz Zemczak removed subscriber Ubuntu Stable Release Updates Team
2017-07-15 20:19:52 Mathew Hodson powerpc-utils (Ubuntu): importance Undecided Medium
2017-07-15 20:19:55 Mathew Hodson powerpc-utils (Ubuntu Xenial): importance Undecided Medium
2017-07-15 20:19:57 Mathew Hodson powerpc-utils (Ubuntu Yakkety): importance Undecided Medium
2017-07-15 20:19:59 Mathew Hodson powerpc-utils (Ubuntu Zesty): importance Undecided Medium
2017-07-19 16:48:05 Manoj Iyer ubuntu-power-systems: importance Undecided Medium
2017-07-24 15:48:27 Manoj Iyer tags architecture-ppc64le bugnameltc-153564 severity-medium targetmilestone-inin16042 ubuntu-16.04 verification-done-xenial verification-needed architecture-ppc64le bugnameltc-153564 severity-medium targetmilestone-inin16042 triage-g ubuntu-16.04 verification-done-xenial verification-needed
2017-07-26 15:25:58 Steve Langasek powerpc-utils (Ubuntu Yakkety): status Fix Committed Won't Fix
2017-08-01 14:07:51 Breno Leitão tags architecture-ppc64le bugnameltc-153564 severity-medium targetmilestone-inin16042 triage-g ubuntu-16.04 verification-done-xenial verification-needed architecture-ppc64le bugnameltc-153564 severity-medium targetmilestone-inin16042 triage-g ubuntu-16.04 verification-done verification-done-xenial
2017-08-11 19:48:53 Steve Langasek tags architecture-ppc64le bugnameltc-153564 severity-medium targetmilestone-inin16042 triage-g ubuntu-16.04 verification-done verification-done-xenial architecture-ppc64le bugnameltc-153564 severity-medium targetmilestone-inin16042 triage-g ubuntu-16.04 verification-done-xenial verification-done-zesty
2017-08-24 19:50:03 Launchpad Janitor powerpc-utils (Ubuntu Zesty): status Fix Committed Fix Released
2017-09-11 14:54:59 Frank Heimes ubuntu-power-systems: status Fix Committed Fix Released