[Rocky] FS01 periodic job failed at overcloud prepare image giving Error: Unable to establish IPMI v2 / RMCP+ session\n'.: ProcessExecutionError: Unexpected error while running command

Bug #1792870 reported by chandan kumar
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Quique Llorente

Bug Description

Rocky periodic job FS001 failed at the overcloud prepare images step with the following errors:
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-rocky/4820d62/logs/undercloud/home/zuul/overcloud_prep_images.log.txt.gz#_2018-09-17_02_06_38

2018-09-17 02:06:38 | Exception registering nodes: {u'status': u'FAILED', u'message': [{u'result': u'Node d51d0e4d-c04d-4672-b560-405980bfe993 did not reach state "manageable", the state is "enroll", error: Failed to get power state for node d51d0e4d-c04d-4672-b560-405980bfe993. Error: IPMI call failed: power status.'}, {u'result': u'Node 50ab94b4-9a55-4c1b-9616-26428eec977c did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 50ab94b4-9a55-4c1b-9616-26428eec977c. Error: IPMI call failed: power status.'}, {u'result': u'Node 5a572d9d-590b-4c67-b37e-363ae4234060 did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 5a572d9d-590b-4c67-b37e-363ae4234060. Error: IPMI call failed: power status.'}, {u'result': u'Node d28945f3-06c2-4730-bb9b-d1124a1380f0 did not reach state "manageable", the state is "enroll", error: Failed to get power state for node d28945f3-06c2-4730-bb9b-d1124a1380f0. Error: IPMI call failed: power status.'}], u'result': u'Failure caused by error in tasks: send_message\n\n send_message [task_ex_id=cb21e549-04a7-4a93-9b84-84806d6be2a3] -> Workflow failed due to message status\n [wf_ex_id=31fbb0a4-8e79-4c28-8f16-4c12eeb981d3, idx=0]: Workflow failed due to message status\n'}

Digging deeper, we found the following in the ironic-conductor log:
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-rocky/4820d62/logs/undercloud/var/log/extra/docker/containers/ironic_pxe_http/log/ironic/ironic-conductor.log.txt.gz#_2018-09-17_02_06_28_644

 2018-09-17 02:06:28.644 7 ERROR ironic.drivers.modules.ipmitool [req-7276def4-b4d4-4695-a060-9b0ccc998e5c 70b4958c371e40e9b34bd1262ff1111e 7a684903197e4a9f8c8c82b1d39c4c53 - default default] IPMI Error while attempting "ipmitool -I lanplus -H 192.168.100.234 -L ADMINISTRATOR -U admin -R 12 -N 5 -f /tmp/tmpz15Zrd power status" for node d28945f3-06c2-4730-bb9b-d1124a1380f0. Error: Unexpected error while running command.
Command: ipmitool -I lanplus -H 192.168.100.234 -L ADMINISTRATOR -U admin -R 12 -N 5 -f /tmp/tmpz15Zrd power status
Exit code: 1
Stdout: u''
Stderr: u'Error: Unable to establish IPMI v2 / RMCP+ session\n': ProcessExecutionError: Unexpected error while running command.
2018-09-17 02:06:28.644 7 WARNING ironic.drivers.modules.ipmitool [req-7276def4-b4d4-4695-a060-9b0ccc998e5c 70b4958c371e40e9b34bd1262ff1111e 7a684903197e4a9f8c8c82b1d39c4c53 - default default] IPMI power status failed for node d28945f3-06c2-4730-bb9b-d1124a1380f0 with error: Unexpected error while running command.
Command: ipmitool -I lanplus -H 192.168.100.234 -L ADMINISTRATOR -U admin -R 12 -N 5 -f /tmp/tmpz15Zrd power status
Exit code: 1
Stdout: u''
Stderr: u'Error: Unable to establish IPMI v2 / RMCP+ session\n'.: ProcessExecutionError: Unexpected error while running command.
2018-09-17 02:06:28.644 7 ERROR ironic.conductor.manager [req-7276def4-b4d4-4695-a060-9b0ccc998e5c 70b4958c371e40e9b34bd1262ff1111e 7a684903197e4a9f8c8c82b1d39c4c53 - default default] Failed to get power state for node d28945f3-06c2-4730-bb9b-d1124a1380f0. Error: IPMI call failed: power status.: IPMIFailure: IPMI call failed: power status.

There might be an issue with the Ironic conductor that prevents it from getting the nodes' power status.

tags: added: alert promotion-blocker
Changed in tripleo:
assignee: nobody → Quique Llorente (quiquell)
Revision history for this message
Quique Llorente (quiquell) wrote :

Looks like we have overlapping IPs on eth0:

[ 88.307440] os-net-config[2426]: [2018/09/17 05:36:28 PM] [INFO] running ifup on interface: eth0
[ 92.370410] os-net-config[2426]: [2018/09/17 05:36:32 PM] [ERROR] Failure(s) occurred when applying configuration
[ 92.372815] os-net-config[2426]: [2018/09/17 05:36:32 PM] [ERROR] stdout: ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Error, some other host already uses address 192.168.100.189.
[ 92.375545] os-net-config[2426]: , stderr:
[ 92.376418] os-net-config[2426]: Traceback (most recent call last):
[ 92.377631] os-net-config[2426]: File "/bin/os-net-config", line 10, in <module>
[ 92.378907] os-net-config[2426]: sys.exit(main())
[ 92.379850] os-net-config[2426]: File "/usr/lib/python2.7/site-packages/os_net_config/cli.py", line 285, in main
[ 92.395047] os-net-config[2426]: activate=not opts.no_activate)
[

Revision history for this message
Quique Llorente (quiquell) wrote :

Looks like this started happening after the te-broker was rebuilt last week.

Revision history for this message
Quique Llorente (quiquell) wrote :

Looks like IP address 192.168.100.189 is taken elsewhere; all BMC instances trying to use it fail. From the nodes definition:
{
  "nodes": [
    {
      "pm_password": "password",
      "name": "baremetal-1505-0",
      "memory": 8192,
      "pm_addr": "192.168.100.189",
      "mac": [
        "fa:16:3e:9c:46:16"
      ],
      "capabili

And then at bmc-1505

[ 4274.037635] openstackbmc[30231]: self.serversocket = ipmisession.Session._assignsocket(addrinfo)
[ 4274.039199] openstackbmc[30231]: File "/usr/lib/python2.7/site-packages/pyghmi/ipmi/private/session.py", line 373, in _assignsocket
[ 4274.041213] openstackbmc[30231]: tmpsocket.bind(server[4])
[ 4274.042252] openstackbmc[30231]: File "/usr/lib64/python2.7/socket.py", line 224, in meth
[ 4274.043643] openstackbmc[30231]: return getattr(self._sock,name)(*args)
[ 4274.044853] openstackbmc[30231]: socket.error: [Errno 99] Cannot assign requested address
[ 4274.046509] openstackbmc[30230]: Traceback (most recent call last):
[ 4274.047768] openstackbmc[30230]: File "/usr/local/bin/openstackbmc", line 322, in <module>
[ 4274.0
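The `Cannot assign requested address` in the traceback is consistent with the duplicate IP: the address was never successfully configured on the instance, so the BMC process cannot bind to it. A minimal sketch of the same failure mode (assuming 192.0.2.1, a TEST-NET-1 documentation address, is not configured on the local host):

```python
import errno
import socket

# Binding to an IP that is not configured on any local interface
# fails with EADDRNOTAVAIL (errno 99 on Linux) -- the same error
# openstackbmc hits when its pm_addr was never assigned to the host.
# 192.0.2.1 is a TEST-NET-1 address, assumed not to be local here.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
    sock.bind(("192.0.2.1", 0))
except OSError as exc:
    print(errno.errorcode[exc.errno])  # EADDRNOTAVAIL on Linux
finally:
    sock.close()
```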

Revision history for this message
Quique Llorente (quiquell) wrote :

There are two stacks lingering at nodepool; let's remove them and see if the issue disappears.

Revision history for this message
Quique Llorente (quiquell) wrote :

Confirmed the lingering IP on those stacks (455 and 431):
{
      "pm_password": "password",
      "name": "baremetal-431-1",
      "memory": 8192,
      "pm_addr": "192.168.100.189",
      "mac": [
        "fa:16:3e:fc:e2:68"
      ],
      "capabilities": "boot_option:local",
      "pm_type": "pxe_ipmitool",
      "disk": 80,
      "arch": "x86_64",
      "cpu": 4,
      "pm_user": "admin" },
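A duplicate pm_addr like this can be caught before deployment with a quick sanity check over the nodes JSON (a minimal sketch; the sample data below is hypothetical, trimmed to the fields that matter):

```python
import json
from collections import defaultdict

# Group nodes from an instackenv-style file by pm_addr and flag any
# BMC address shared by more than one node (the situation with
# 192.168.100.189 above). Sample data is illustrative, not the real
# nodes file from the job.
nodes_json = """
{"nodes": [
  {"name": "baremetal-1505-0", "pm_addr": "192.168.100.189"},
  {"name": "baremetal-431-1",  "pm_addr": "192.168.100.189"},
  {"name": "baremetal-431-0",  "pm_addr": "192.168.100.190"}
]}
"""

by_addr = defaultdict(list)
for node in json.loads(nodes_json)["nodes"]:
    by_addr[node["pm_addr"]].append(node["name"])

duplicates = {addr: names for addr, names in by_addr.items() if len(names) > 1}
print(duplicates)
# → {'192.168.100.189': ['baremetal-1505-0', 'baremetal-431-1']}
```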

Revision history for this message
wes hayutin (weshayutin) wrote :

Only periodic jobs are impacted here; removing alert, keeping promotion-blocker.

tags: removed: alert
Revision history for this message
Quique Llorente (quiquell) wrote :

After cleaning up the lingering stacks we have no more issues like this; closing the bug.

Changed in tripleo:
status: Triaged → Fix Released