Ironic failing to gracefully handle: ipmi error "insufficient resources for session"
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Ironic | Fix Released | Medium | Julia Kreger | | |
Bug Description
While testing with the current Ironic master branch, we used Ironic to deploy a number of nodes (ten to thirty-five) on a high-density hardware chassis (HP Moonshot) that uses a dual-bridged IPMI bus, where the only differences between the nodes are the MAC address and the bridging target number. A number of nodes failed to deploy because ipmitool returned "Error in open session response message : insufficient resources for session".
Apparent root cause:
_exec_ipmitool does not have retry logic wrapped around the execution of ipmitool; instead, the parameters are simply passed on the ipmitool command line. In essence, ipmitool and Ironic presently fail hard when the BMC says "give me a minute".
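To illustrate the kind of retry wrapper that is missing, here is a minimal sketch in Python. It is not Ironic's code and not the fix that was merged. The names `run_ipmitool` and `exec_with_retry`, the attempt count, and the linear back-off are all assumptions made for illustration. The only detail taken from this report is the error string matched in stderr.

```python
import subprocess
import time

# Substring taken from the ipmitool stderr quoted in this report.
RETRYABLE_ERRORS = ("insufficient resources for session",)


def run_ipmitool(args):
    """Run ipmitool with the given arguments; return (exit_code, stdout, stderr)."""
    proc = subprocess.run(["ipmitool"] + list(args),
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


def exec_with_retry(args, runner=run_ipmitool, max_attempts=5,
                    delay=1.0, sleep=time.sleep):
    """Re-run the command while the BMC reports a transient session error.

    Hypothetical helper: the names and defaults are illustrative only.
    """
    for attempt in range(1, max_attempts + 1):
        code, out, err = runner(args)
        if code == 0:
            return out
        transient = any(msg in err for msg in RETRYABLE_ERRORS)
        if not transient or attempt == max_attempts:
            raise RuntimeError("ipmitool failed after %d attempt(s): %s"
                               % (attempt, err.strip()))
        # Linear back-off: wait a little longer after each busy response.
        sleep(delay * attempt)
```

The `runner` and `sleep` parameters are injectable so the behaviour can be exercised without a BMC. Any error other than the session-resource message is raised immediately rather than retried.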
---
How to reproduce:
Have Ironic running in a stand-alone environment with no other OpenStack services.
Attempt to deploy, en masse, to a number of nodes that use dual ipmi_bridging. This is orchestrated by Python code that uses the python-ironicclient library to set each node's instance_info and then set the node to an active state, in series, without waiting for any of the operations or nodes to reach an active state.
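The orchestration described above can be sketched roughly as follows. This is not the actual reproduction script: `deploy_in_bulk` is a hypothetical helper, the JSON-patch shape used for instance_info is an assumption, and `client` is assumed to be an already constructed python-ironicclient client exposing `node.update` and `node.set_provision_state`.

```python
def deploy_in_bulk(client, node_uuids, instance_info):
    """Request deployment of every node in series, without waiting.

    Hypothetical helper: `client` is assumed to expose python-ironicclient's
    node manager as `client.node`.
    """
    # One JSON-patch "add" operation per instance_info key (assumed shape).
    patch = [{"op": "add", "path": "/instance_info/" + key, "value": value}
             for key, value in sorted(instance_info.items())]
    requested = []
    for uuid in node_uuids:
        client.node.update(uuid, patch)
        # Fire and forget: do not poll for the node to become active.
        client.node.set_provision_state(uuid, "active")
        requested.append(uuid)
    return requested
```

Because nothing waits between iterations, the conductor ends up issuing IPMI commands for many nodes behind the same BMC address at nearly the same time, which is the condition under which the session error was seen.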
Attempting to tune the ipmi settings, specifically increasing the min_command_
---
Ironic conductor log output for one of the nodes that failed to deploy in this manner:
2015-03-12 19:20:30.699 4307 DEBUG oslo_concurrenc
r chassis bootdev disk options=persistent execute /usr/local/
2015-03-12 19:20:30.766 4307 DEBUG oslo_concurrenc
k options=persistent" returned: 0 in 0.568s execute /usr/local/
2015-03-12 19:20:30.769 4307 DEBUG ironic.common.utils [-] Execution completed, command line is "ipmitool -I lanplus -H 10.0.1.5 -L ADMINISTRATOR -U administrator -B 0 -T 158 -b 7 -t 114 -R 12 -N 5 -f /tmp/tmp3uQ
DcS chassis bootdev disk options=persistent" execute /usr/local/
2015-03-12 19:20:30.772 4307 DEBUG ironic.common.utils [-] Command stdout is: "Set Boot Device to disk
" execute /usr/local/
2015-03-12 19:20:30.775 4307 DEBUG ironic.common.utils [-] Command stderr is: "" execute /usr/local/
2015-03-12 19:20:30.885 4307 DEBUG oslo_concurrenc
k options=persistent" returned: 1 in 0.185s execute /usr/local/
2015-03-12 19:20:30.888 4307 DEBUG oslo_concurrenc
ptions=persistent' failed. Not Retrying. execute /usr/local/
2015-03-12 19:20:30.891 4307 WARNING ironic.
Command: ipmitool -I lanplus -H 10.0.1.5 -L ADMINISTRATOR -U administrator -B 0 -T 160 -b 7 -t 114 -R 12 -N 5 -f /tmp/tmpgaadMr chassis bootdev disk options=persistent
Exit code: 1
Stdout: u''
Stderr: u'Error in open session response message : insufficient resources for session\n\nError: Unable to establish IPMI v2 / RMCP+ session\nError setting Chassis Boot Parameter 0\nError setting Chassis Boot Parameter 4\n'
2015-03-12 19:20:30.897 4307 ERROR ironic.
ed: chassis bootdev disk options=persistent.
2015-03-12 19:20:30.897 4307 TRACE ironic.
2015-03-12 19:20:30.897 4307 TRACE ironic.
2015-03-12 19:20:30.897 4307 TRACE ironic.
2015-03-12 19:20:30.897 4307 TRACE ironic.
2015-03-12 19:20:30.897 4307 TRACE ironic.
2015-03-12 19:20:30.897 4307 TRACE ironic.
2015-03-12 19:20:30.897 4307 TRACE ironic.
2015-03-12 19:20:30.897 4307 TRACE ironic.
2015-03-12 19:20:30.897 4307 TRACE ironic.
2015-03-12 19:20:30.897 4307 TRACE ironic.
2015-03-12 19:20:30.897 4307 TRACE ironic.
2015-03-12 19:20:30.897 4307 TRACE ironic.
2015-03-12 19:20:30.897 4307 TRACE ironic.
2015-03-12 19:20:30.897 4307 TRACE ironic.
2015-03-12 19:20:30.897 4307 TRACE ironic.
ironic node-list:
+------
| UUID | Instance UUID | Power State | Provisioning State | Maintenance |
+------
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
| a8cb6624-
+------
ironic node-show nodes 16, 17, 20:
+------
| Property | Value |
+------
| instance_uuid | None |
| target_power_state | None |
| properties | {u'memory_mb': u'8160', u'cpu_arch': u'x86_64', u'local_gb': u'450', |
| | u'cpus': u'3'} |
| maintenance | False |
| driver_info | {u'ipmi_
| | u'ipmi_
| | u'http://
| | u'ipmi_username': u'administrator', u'ipmi_address': u'10.0.1.5', |
| | u'ipmi_
| | u'ipmi_bridging': u'dual', u'deploy_ramdisk': u'http://
| | /coreos_
| extra | {} |
| last_error | Asynchronous exception for node a8cb6624-
| | Node failed to move to active state. exception: IPMI call failed: |
| | chassis bootdev disk options=persistent. |
| created_at | 2015-03-
| target_
| driver | agent_ipmitool |
| updated_at | 2015-03-
| maintenance_reason | None |
| instance_info | {u'root_gb': 10, u'image_source': |
| | u'http://
| | u'7f76ce7cbff05
| | u'http://
| | u'configdrive': u'http://
| | -affc-046ebb96e
| driver_
| chassis_uuid | |
| provision_state | deploy failed |
| reservation | None |
| power_state | power off |
| console_enabled | False |
| uuid | a8cb6624-
+------
+------
| Property | Value |
+------
| instance_uuid | None |
| target_power_state | None |
| properties | {u'memory_mb': u'8160', u'cpu_arch': u'x86_64', u'local_gb': u'450', |
| | u'cpus': u'3'} |
| maintenance | False |
| driver_info | {u'ipmi_
| | u'ipmi_
| | u'http://
| | u'ipmi_username': u'administrator', u'ipmi_address': u'10.0.1.5', |
| | u'ipmi_
| | u'ipmi_bridging': u'dual', u'deploy_ramdisk': u'http://
| | /coreos_
| extra | {} |
| last_error | Failed to deploy. Error: IPMI call failed: raw 0x00 0x08 0x03 0x08. |
| created_at | 2015-03-
| target_
| driver | agent_ipmitool |
| updated_at | 2015-03-
| maintenance_reason | None |
| instance_info | {u'root_gb': 10, u'image_source': |
| | u'http://
| | u'7f76ce7cbff05
| | u'http://
| | u'configdrive': u'http://
| | -affc-046ebb96e
| driver_
| chassis_uuid | |
| provision_state | deploy failed |
| reservation | None |
| power_state | power off |
| console_enabled | False |
| uuid | a8cb6624-
+------
+------
| Property | Value |
+------
| instance_uuid | None |
| target_power_state | None |
| properties | {u'memory_mb': u'8160', u'cpu_arch': u'x86_64', u'local_gb': u'450', |
| | u'cpus': u'3'} |
| maintenance | False |
| driver_info | {u'ipmi_
| | u'ipmi_
| | u'http://
| | u'ipmi_username': u'administrator', u'ipmi_address': u'10.0.1.5', |
| | u'ipmi_
| | u'ipmi_bridging': u'dual', u'deploy_ramdisk': u'http://
| | /coreos_
| extra | {} |
| last_error | Asynchronous exception for node a8cb6624-
| | Node failed to move to active state. exception: IPMI call failed: raw |
| | 0x00 0x08 0x03 0x08. |
| created_at | 2015-03-
| target_
| driver | agent_ipmitool |
| updated_at | 2015-03-
| maintenance_reason | None |
| instance_info | {u'root_gb': 10, u'image_source': |
| | u'http://
| | u'7f76ce7cbff05
| | u'http://
| | u'configdrive': u'http://
| | -affc-046ebb96e
| driver_
| chassis_uuid | |
| provision_state | deploy failed |
| reservation | None |
| power_state | power off |
| console_enabled | False |
| uuid | a8cb6624-
+------
description: | updated |
tags: | added: ipmi |
Changed in ironic: | |
assignee: | nobody → Julia Kreger (juliaashleykreger) |
Changed in ironic: | |
importance: | Undecided → Medium |
milestone: | none → kilo-rc1 |
Changed in ironic: | |
status: | Fix Committed → Fix Released |
Changed in ironic: | |
milestone: | kilo-rc1 → 2015.1.0 |
Fix proposed to branch: master
Review: https://review.openstack.org/168120