Activity log for bug #1696434

Date Who What changed Old value New value Message
2017-06-07 14:19:21 bugproxy bug added bug
2017-06-07 14:19:24 bugproxy tags architecture-ppc64le bugnameltc-154853 severity-high targetmilestone-inin16043
2017-06-07 14:19:27 bugproxy ubuntu: assignee Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
2017-06-07 14:19:31 bugproxy affects ubuntu powerpc-ibm-utils (Ubuntu)
2017-06-08 05:19:00 Launchpad Janitor powerpc-ibm-utils (Ubuntu): status New Confirmed
2017-06-09 15:04:39 Manoj Iyer bug task added ubuntu-power-systems
2017-06-09 15:25:51 Frank Heimes ubuntu-power-systems: status New Confirmed
2017-06-12 10:52:59 Andrew Cloke bug added subscriber Andrew Cloke
2017-06-19 19:54:04 Steve Langasek affects powerpc-ibm-utils (Ubuntu) powerpc-utils (Ubuntu)
2017-06-19 19:59:46 Steve Langasek bug added subscriber Steve Langasek
2017-06-19 21:26:18 Steve Langasek nominated for series Ubuntu Zesty
2017-06-19 21:26:18 Steve Langasek bug task added powerpc-utils (Ubuntu Zesty)
2017-06-19 21:26:18 Steve Langasek nominated for series Ubuntu Yakkety
2017-06-19 21:26:18 Steve Langasek bug task added powerpc-utils (Ubuntu Yakkety)
2017-06-19 21:26:18 Steve Langasek nominated for series Ubuntu Xenial
2017-06-19 21:26:18 Steve Langasek bug task added powerpc-utils (Ubuntu Xenial)
2017-06-19 22:16:53 Launchpad Janitor powerpc-utils (Ubuntu): status Confirmed Fix Released
2017-06-19 23:54:56 Steve Langasek description

Old value:

Problem: During the scale-up test to 1000 VMs I could see that 20 deploys failed due to the following command failure.

Command /usr/sbin/pvmdrmgr drmgr -c slot -s 'U9119.MHE.1085B07-V1-C1030' -a -w 3 returned 19. Additional messages:
/usr/sbin/pvmdrmgr drmgr -c slot -s 'U9119.MHE.1085B07-V1-C1030' -a -w 3
Validating I/O DLPAR capability...yes.
kernel I/O op failed, rc = 26 len = 26.

I have been looking through the logs on this system to piece together what is happening when the dlpar add failures occur. From what I am seeing, we are trying to dlpar add a virtual network device and getting an error when trying to add the device to the system.

> ########## May 17 05:18:00 2017 ##########
> drmgr: -c slot -s U9119.MHE.1085B07-V1-C1030 -a -w 3
> Validating I/O DLPAR capability...yes.
> Getting node types 0x00000003
> Could not find DRC property group in path: /proc/device-tree/ibm,serial.
> Acquiring drc index 0x30000406
> get-sensor for 30000406: 0, 2
> Setting allocation state to 'alloc usable'
> Setting indicator state to 'unisolate'
> Configuring connector for drc index 30000406
> Adding device-tree node /proc/device-tree/vdevice/l-lan@30000406
> ofdt update: add_node /vdevice/l-lan@30000406 ibm,loc-code 30 U9119.MHE.1085B07-V1-C1030-T1
> Getting node types 0x00000003
> performing kernel op for U9119.MHE.1085B07-V1-C1030, file is /sys/bus/pci/slots/control/add_slot
> kernel I/O op failed, rc = 26 len = 26.
> No such device
> Releasing drc index 0x30000406
> get-sensor for 30000406: 0, 1
> Setting isolation state to 'isolate'
> Setting allocation state to 'alloc unusable'
> get-sensor for 30000406: 0, 2
> drc_index 30000406 sensor-state: 2
> Resource is not available to the partition.
> Removing device-tree node /proc/device-tree/vdevice/l-lan@30000406
> ########## May 17 05:20:11 2017 ##########

From the drmgr log, you can see that we get an ENODEV return code when performing the kernel operation to add the device to the system.

> performing kernel op for U9119.MHE.1085B07-V1-C1030, file is /sys/bus/pci/slots/control/add_slot
> kernel I/O op failed, rc = 26 len = 26.
> No such device

This indicates that the rpadlpar_io kernel module was unable to find the device in the device tree. This does not seem right, because earlier in the drmgr logs we add the device to the device tree. Additionally, the drmgr code validates that the add succeeds by retrieving the newly added device node from the device tree as a sanity check. There are no failures reported for this.

> Adding device-tree node /proc/device-tree/vdevice/l-lan@30000406
> ofdt update: add_node /vdevice/l-lan@30000406 ibm,loc-code 30 U9119.MHE.1085B07-V1-C1030-T1
> Getting node types 0x00000003

I started scale-up testing and I could see deploys are going fine. Will post a comment here if I see further drmgr failures.

Patches have been submitted upstream:
https://groups.google.com/forum/#!topic/powerpc-utils-devel/GNEi65WBwkQ
and
https://groups.google.com/forum/#!topic/powerpc-utils-devel/hJfUb5wYPsE

New value:

[SRU Justification]
drmgr fails intermittently when adding devices to the system.

[Test case]
To be completed by IBM, who have access to the hardware.
1. Run a scale test of launching 1000 VMs on a Novalink system.
2. Observe that some of the deployments fail with the following error: kernel I/O op failed, rc = 26 len = 26.
3. Install powerpc-utils from -proposed.
4. Run the scale test again.
5. Observe that all the deployments succeed.

[Regression potential]
This change, cherry-picked from upstream, corrects faulty handling of a 0 return code from syscalls. Regression potential appears to be minimal.

Problem: During the scale-up test to 1000 VMs I could see that 20 deploys failed due to the following command failure.

Command /usr/sbin/pvmdrmgr drmgr -c slot -s 'U9119.MHE.1085B07-V1-C1030' -a -w 3 returned 19. Additional messages:
/usr/sbin/pvmdrmgr drmgr -c slot -s 'U9119.MHE.1085B07-V1-C1030' -a -w 3
Validating I/O DLPAR capability...yes.
kernel I/O op failed, rc = 26 len = 26.

I have been looking through the logs on this system to piece together what is happening when the dlpar add failures occur. From what I am seeing, we are trying to dlpar add a virtual network device and getting an error when trying to add the device to the system.

> ########## May 17 05:18:00 2017 ##########
> drmgr: -c slot -s U9119.MHE.1085B07-V1-C1030 -a -w 3
> Validating I/O DLPAR capability...yes.
> Getting node types 0x00000003
> Could not find DRC property group in path: /proc/device-tree/ibm,serial.
> Acquiring drc index 0x30000406
> get-sensor for 30000406: 0, 2
> Setting allocation state to 'alloc usable'
> Setting indicator state to 'unisolate'
> Configuring connector for drc index 30000406
> Adding device-tree node /proc/device-tree/vdevice/l-lan@30000406
> ofdt update: add_node /vdevice/l-lan@30000406 ibm,loc-code 30 U9119.MHE.1085B07-V1-C1030-T1
> Getting node types 0x00000003
> performing kernel op for U9119.MHE.1085B07-V1-C1030, file is /sys/bus/pci/slots/control/add_slot
> kernel I/O op failed, rc = 26 len = 26.
> No such device
> Releasing drc index 0x30000406
> get-sensor for 30000406: 0, 1
> Setting isolation state to 'isolate'
> Setting allocation state to 'alloc unusable'
> get-sensor for 30000406: 0, 2
> drc_index 30000406 sensor-state: 2
> Resource is not available to the partition.
> Removing device-tree node /proc/device-tree/vdevice/l-lan@30000406
> ########## May 17 05:20:11 2017 ##########

From the drmgr log, you can see that we get an ENODEV return code when performing the kernel operation to add the device to the system.

> performing kernel op for U9119.MHE.1085B07-V1-C1030, file is /sys/bus/pci/slots/control/add_slot
> kernel I/O op failed, rc = 26 len = 26.
> No such device

This indicates that the rpadlpar_io kernel module was unable to find the device in the device tree. This does not seem right, because earlier in the drmgr logs we add the device to the device tree. Additionally, the drmgr code validates that the add succeeds by retrieving the newly added device node from the device tree as a sanity check. There are no failures reported for this.

> Adding device-tree node /proc/device-tree/vdevice/l-lan@30000406
> ofdt update: add_node /vdevice/l-lan@30000406 ibm,loc-code 30 U9119.MHE.1085B07-V1-C1030-T1
> Getting node types 0x00000003

I started scale-up testing and I could see deploys are going fine. Will post a comment here if I see further drmgr failures.

Patches have been submitted upstream:
https://groups.google.com/forum/#!topic/powerpc-utils-devel/GNEi65WBwkQ
and
https://groups.google.com/forum/#!topic/powerpc-utils-devel/hJfUb5wYPsE
2017-06-20 00:06:30 Steve Langasek powerpc-utils (Ubuntu Xenial): milestone ubuntu-16.04.3
2017-06-20 00:06:33 Steve Langasek powerpc-utils (Ubuntu Xenial): assignee Steve Langasek (vorlon)
2017-06-20 00:06:34 Steve Langasek powerpc-utils (Ubuntu Yakkety): assignee Steve Langasek (vorlon)
2017-06-20 00:06:36 Steve Langasek powerpc-utils (Ubuntu Zesty): assignee Steve Langasek (vorlon)
2017-06-20 00:06:45 Steve Langasek powerpc-utils (Ubuntu Xenial): status New In Progress
2017-06-20 00:06:47 Steve Langasek powerpc-utils (Ubuntu Yakkety): status New In Progress
2017-06-20 00:06:50 Steve Langasek powerpc-utils (Ubuntu Zesty): status New In Progress
2017-06-20 00:12:25 Brian Murray powerpc-utils (Ubuntu Zesty): status In Progress Fix Committed
2017-06-20 00:12:27 Brian Murray bug added subscriber Ubuntu Stable Release Updates Team
2017-06-20 00:12:31 Brian Murray bug added subscriber SRU Verification
2017-06-20 00:12:38 Brian Murray tags architecture-ppc64le bugnameltc-154853 severity-high targetmilestone-inin16043 architecture-ppc64le bugnameltc-154853 severity-high targetmilestone-inin16043 verification-needed
2017-06-20 00:14:14 Brian Murray powerpc-utils (Ubuntu Yakkety): status In Progress Fix Committed
2017-06-20 00:18:42 Brian Murray powerpc-utils (Ubuntu Xenial): status In Progress Fix Committed
2017-06-20 06:06:31 Frank Heimes ubuntu-power-systems: status Confirmed Fix Committed
2017-06-21 16:33:19 Steve Langasek tags architecture-ppc64le bugnameltc-154853 severity-high targetmilestone-inin16043 verification-needed architecture-ppc64le bugnameltc-154853 severity-high targetmilestone-inin16043 verification-done-xenial verification-needed
2017-07-04 12:12:17 Launchpad Janitor powerpc-utils (Ubuntu Xenial): status Fix Committed Fix Released
2017-07-04 12:12:24 Łukasz Zemczak removed subscriber Ubuntu Stable Release Updates Team
2017-07-15 20:20:22 Mathew Hodson powerpc-utils (Ubuntu): importance Undecided High
2017-07-15 20:20:23 Mathew Hodson powerpc-utils (Ubuntu Xenial): importance Undecided High
2017-07-15 20:20:25 Mathew Hodson powerpc-utils (Ubuntu Yakkety): importance Undecided High
2017-07-15 20:20:27 Mathew Hodson powerpc-utils (Ubuntu Zesty): importance Undecided High
2017-07-19 16:48:27 Manoj Iyer ubuntu-power-systems: importance Undecided High
2017-07-24 15:36:17 Manoj Iyer tags architecture-ppc64le bugnameltc-154853 severity-high targetmilestone-inin16043 verification-done-xenial verification-needed architecture-ppc64le bugnameltc-154853 severity-high targetmilestone-inin16043 triage-g verification-done-xenial verification-needed
2017-08-24 19:48:50 Brian Murray tags architecture-ppc64le bugnameltc-154853 severity-high targetmilestone-inin16043 triage-g verification-done-xenial verification-needed architecture-ppc64le bugnameltc-154853 severity-high targetmilestone-inin16043 triage-g verification-done-xenial verification-done-zesty
2017-08-24 19:50:03 Launchpad Janitor powerpc-utils (Ubuntu Zesty): status Fix Committed Fix Released
2017-08-25 20:00:01 Manoj Iyer ubuntu-power-systems: status Fix Committed Fix Released
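
The failure signature quoted in the bug description (command exit status 19, the line `kernel I/O op failed, rc = 26 len = 26.`, and `No such device`) can be checked mechanically when working through step 2 of the test case. The following is a minimal Python sketch, not part of powerpc-utils or of the upstream patches. The helper names `find_failed_kernel_ops` and `errno_name` are hypothetical, and the sample text is an abridged excerpt of the drmgr log quoted in the description. It assumes the log line format stays exactly as quoted there.

```python
import errno
import re

# Matches the failure line exactly as it is quoted in the bug description.
# The format is an assumption taken from that single quoted log.
FAILED_OP = re.compile(r"kernel I/O op failed, rc = (-?\d+) len = (\d+)")


def find_failed_kernel_ops(log_text):
    """Return a list of (rc, length) pairs, one per failed kernel op line."""
    return [(int(rc), int(length)) for rc, length in FAILED_OP.findall(log_text)]


def errno_name(code):
    """Map a numeric exit status to its symbolic errno name, if one exists."""
    return errno.errorcode.get(code, "UNKNOWN")


# Abridged excerpt of the drmgr log from the description.
SAMPLE_LOG = """\
Adding device-tree node /proc/device-tree/vdevice/l-lan@30000406
performing kernel op for U9119.MHE.1085B07-V1-C1030, file is /sys/bus/pci/slots/control/add_slot
kernel I/O op failed, rc = 26 len = 26.
No such device
Releasing drc index 0x30000406
"""

print(find_failed_kernel_ops(SAMPLE_LOG))
print(errno_name(19))
```

On Linux, status 19 corresponds to ENODEV, which matches the `No such device` message in the log. A scan like this only detects the symptom. It says nothing about why the kernel operation failed.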