[Rocky] FS01 periodic job failed at overcloud prepare image giving Error: Unable to establish IPMI v2 / RMCP+ session\n'.: ProcessExecutionError: Unexpected error while running command

Bug #1792870 reported by chandan kumar on 2018-09-17
This bug affects 1 person
Affects: tripleo
Importance: Critical
Assigned to: Quique Llorente

Bug Description

The Rocky periodic job FS001 failed at the overcloud prepare images step with the following errors:
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-rocky/4820d62/logs/undercloud/home/zuul/overcloud_prep_images.log.txt.gz#_2018-09-17_02_06_38

2018-09-17 02:06:38 | Exception registering nodes: {u'status': u'FAILED', u'message': [{u'result': u'Node d51d0e4d-c04d-4672-b560-405980bfe993 did not reach state "manageable", the state is "enroll", error: Failed to get power state for node d51d0e4d-c04d-4672-b560-405980bfe993. Error: IPMI call failed: power status.'}, {u'result': u'Node 50ab94b4-9a55-4c1b-9616-26428eec977c did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 50ab94b4-9a55-4c1b-9616-26428eec977c. Error: IPMI call failed: power status.'}, {u'result': u'Node 5a572d9d-590b-4c67-b37e-363ae4234060 did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 5a572d9d-590b-4c67-b37e-363ae4234060. Error: IPMI call failed: power status.'}, {u'result': u'Node d28945f3-06c2-4730-bb9b-d1124a1380f0 did not reach state "manageable", the state is "enroll", error: Failed to get power state for node d28945f3-06c2-4730-bb9b-d1124a1380f0. Error: IPMI call failed: power status.'}], u'result': u'Failure caused by error in tasks: send_message\n\n send_message [task_ex_id=cb21e549-04a7-4a93-9b84-84806d6be2a3] -> Workflow failed due to message status\n [wf_ex_id=31fbb0a4-8e79-4c28-8f16-4c12eeb981d3, idx=0]: Workflow failed due to message status\n'}

Digging deeper, we found the following in the ironic-conductor log:
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-rocky/4820d62/logs/undercloud/var/log/extra/docker/containers/ironic_pxe_http/log/ironic/ironic-conductor.log.txt.gz#_2018-09-17_02_06_28_644

 2018-09-17 02:06:28.644 7 ERROR ironic.drivers.modules.ipmitool [req-7276def4-b4d4-4695-a060-9b0ccc998e5c 70b4958c371e40e9b34bd1262ff1111e 7a684903197e4a9f8c8c82b1d39c4c53 - default default] IPMI Error while attempting "ipmitool -I lanplus -H 192.168.100.234 -L ADMINISTRATOR -U admin -R 12 -N 5 -f /tmp/tmpz15Zrd power status" for node d28945f3-06c2-4730-bb9b-d1124a1380f0. Error: Unexpected error while running command.
Command: ipmitool -I lanplus -H 192.168.100.234 -L ADMINISTRATOR -U admin -R 12 -N 5 -f /tmp/tmpz15Zrd power status
Exit code: 1
Stdout: u''
Stderr: u'Error: Unable to establish IPMI v2 / RMCP+ session\n': ProcessExecutionError: Unexpected error while running command.
2018-09-17 02:06:28.644 7 WARNING ironic.drivers.modules.ipmitool [req-7276def4-b4d4-4695-a060-9b0ccc998e5c 70b4958c371e40e9b34bd1262ff1111e 7a684903197e4a9f8c8c82b1d39c4c53 - default default] IPMI power status failed for node d28945f3-06c2-4730-bb9b-d1124a1380f0 with error: Unexpected error while running command.
Command: ipmitool -I lanplus -H 192.168.100.234 -L ADMINISTRATOR -U admin -R 12 -N 5 -f /tmp/tmpz15Zrd power status
Exit code: 1
Stdout: u''
Stderr: u'Error: Unable to establish IPMI v2 / RMCP+ session\n'.: ProcessExecutionError: Unexpected error while running command.
2018-09-17 02:06:28.644 7 ERROR ironic.conductor.manager [req-7276def4-b4d4-4695-a060-9b0ccc998e5c 70b4958c371e40e9b34bd1262ff1111e 7a684903197e4a9f8c8c82b1d39c4c53 - default default] Failed to get power state for node d28945f3-06c2-4730-bb9b-d1124a1380f0. Error: IPMI call failed: power status.: IPMIFailure: IPMI call failed: power status.

There might be an issue with the Ironic conductor that prevents it from getting the power status of the nodes.

tags: added: alert promotion-blocker
Changed in tripleo:
assignee: nobody → Quique Llorente (quiquell)
Quique Llorente (quiquell) wrote :

Looks like we have overlapping IPs
eth0
[ 88.307440] os-net-config[2426]: [2018/09/17 05:36:28 PM] [INFO] running ifup on interface: eth0
[ 92.370410] os-net-config[2426]: [2018/09/17 05:36:32 PM] [ERROR] Failure(s) occurred when applying configuration
[ 92.372815] os-net-config[2426]: [2018/09/17 05:36:32 PM] [ERROR] stdout: ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Error, some other host already uses address 192.168.100.189.
[ 92.375545] os-net-config[2426]: , stderr:
[ 92.376418] os-net-config[2426]: Traceback (most recent call last):
[ 92.377631] os-net-config[2426]: File "/bin/os-net-config", line 10, in <module>
[ 92.378907] os-net-config[2426]: sys.exit(main())
[ 92.379850] os-net-config[2426]: File "/usr/lib/python2.7/site-packages/os_net_config/cli.py", line 285, in main
[ 92.395047] os-net-config[2426]: activate=not opts.no_activate)
[
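A minimal sketch of how the conflicting address could be pulled out of a console log like the one above. The function name and the regular expression are illustrative assumptions; only the "some other host already uses address" wording is taken from the log itself.

```python
import re

# Matches the duplicate-address message printed by ifup-eth, as quoted above.
CONFLICT_RE = re.compile(
    r"some other host already uses address (\d{1,3}(?:\.\d{1,3}){3})"
)


def find_conflicting_addresses(log_text):
    """Return the sorted, de-duplicated IPv4 addresses reported as in use."""
    return sorted(set(CONFLICT_RE.findall(log_text)))


sample = (
    "ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Error, "
    "some other host already uses address 192.168.100.189.\n"
)
print(find_conflicting_addresses(sample))
```

Running this over the console logs of several BMC instances would show whether they all trip over the same address.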

Quique Llorente (quiquell) wrote :

Looks like this is happening after the te-broker was re-constructed last week

Quique Llorente (quiquell) wrote :

Looks like IP address 192.168.100.189 is taken elsewhere; all BMC instances trying to use it fail.
cluded in the output.
{
  "nodes": [
    {
      "pm_password": "password",
      "name": "baremetal-1505-0",
      "memory": 8192,
      "pm_addr": "192.168.100.189",
      "mac": [
        "fa:16:3e:9c:46:16"
      ],
      "capabili

And then at bmc-1505

[ 4274.037635] openstackbmc[30231]: self.serversocket = ipmisession.Session._assignsocket(addrinfo)
[ 4274.039199] openstackbmc[30231]: File "/usr/lib/python2.7/site-packages/pyghmi/ipmi/private/session.py", line 373, in _assignsocket
[ 4274.041213] openstackbmc[30231]: tmpsocket.bind(server[4])
[ 4274.042252] openstackbmc[30231]: File "/usr/lib64/python2.7/socket.py", line 224, in meth
[ 4274.043643] openstackbmc[30231]: return getattr(self._sock,name)(*args)
[ 4274.044853] openstackbmc[30231]: socket.error: [Errno 99] Cannot assign requested address
[ 4274.046509] openstackbmc[30230]: Traceback (most recent call last):
[ 4274.047768] openstackbmc[30230]: File "/usr/local/bin/openstackbmc", line 322, in <module>
[ 4274.0
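On Linux, errno 99 is EADDRNOTAVAIL, which bind() returns when the requested address is not configured on any local interface. That fits the failed ifup on eth0: the BMC address never got assigned, so openstackbmc cannot bind to it. The sketch below reproduces the same error class with a plain UDP socket. The helper name is hypothetical, and 192.0.2.1 is a documentation-range address assumed not to be configured on the machine running it.

```python
import errno
import socket


def can_bind(address, port=0):
    """Return True if a UDP socket can bind to address, False on EADDRNOTAVAIL."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((address, port))  # port 0 lets the kernel pick a free port
        return True
    except OSError as exc:
        if exc.errno == errno.EADDRNOTAVAIL:
            return False
        raise  # any other failure is unrelated to this check
    finally:
        sock.close()


print(can_bind("127.0.0.1"))  # loopback is normally configured
print(can_bind("192.0.2.1"))  # assumed to be absent from local interfaces
```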

Quique Llorente (quiquell) wrote :

There are two stacks lingering in nodepool. Let's remove them and see if the issue disappears.

Quique Llorente (quiquell) wrote :

Confirmed lingering IP on those stacks (455, 431)
{
      "pm_password": "password",
      "name": "baremetal-431-1",
      "memory": 8192,
      "pm_addr": "192.168.100.189",
      "mac": [
        "fa:16:3e:fc:e2:68"
      ],
      "capabilities": "boot_option:local",
      "pm_type": "pxe_ipmitool",
      "disk": 80,
      "arch": "x86_64",
      "cpu": 4,
      "pm_user": "admin" },
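A check like this could be scripted. The following is a rough sketch, assuming the node definitions of all stacks have been collected into one list of dicts with the "name" and "pm_addr" keys shown above. The function name and the third sample node are made up for illustration.

```python
from collections import defaultdict


def find_duplicate_pm_addrs(nodes):
    """Map each pm_addr used by more than one node to the names of those nodes."""
    by_addr = defaultdict(list)
    for node in nodes:
        by_addr[node["pm_addr"]].append(node["name"])
    return {addr: names for addr, names in by_addr.items() if len(names) > 1}


nodes = [
    {"name": "baremetal-1505-0", "pm_addr": "192.168.100.189"},
    {"name": "baremetal-431-1", "pm_addr": "192.168.100.189"},
    # Hypothetical node with a unique address, for contrast.
    {"name": "baremetal-example-0", "pm_addr": "192.168.100.10"},
]
print(find_duplicate_pm_addrs(nodes))
```

Any address that shows up in the result is claimed by more than one stack, which is the situation seen here with 192.168.100.189.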

wes hayutin (weshayutin) wrote :

Only periodic jobs seem to be impacted here. Removing alert, keeping promotion-blocker.

tags: removed: alert
Quique Llorente (quiquell) wrote :

After cleaning up the lingering stacks we have no more issues like this, closing the bug.

Changed in tripleo:
status: Triaged → Fix Released