Failed to get power state for node FS01/02

Bug #1797526 reported by Martin Kopec
This bug affects 3 people
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: wes hayutin

Bug Description

Promotion jobs failed during the "Prepare the overcloud images for deploy" step with the following error (see logs [1] and [2] below):

Exception registering nodes: {u'status': u'FAILED', u'message': [{u'result': u'Node 7bc9d0d0-f57c-496a-ba58-d8de1441d42d did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 7bc9d0d0-f57c-496a-ba58-d8de1441d42d. Error: IPMI call failed: power status.'}, {u'result': u'Node 378f5490-5d15-4293-b91e-d3c5ae975aa2 did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 378f5490-5d15-4293-b91e-d3c5ae975aa2. Error: IPMI call failed: power status.'}], u'result': u'Failure caused by error in tasks: send_message\n\n send_message [task_ex_id=a9b4c588-ddf6-4d6b-819e-55ebcab6f3d9] -> Workflow failed due to message status\n [wf_ex_id=451f6911-7f5e-46af-8ac7-262519c4af37, idx=0]: Workflow failed due to message status\n'}
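To make a failure like this easier to triage, the per-node errors can be pulled out of the workflow payload. The following is a minimal sketch, assuming the payload has the shape shown above (a `message` list of dicts, each with a `result` string). The function name `failed_nodes` and the regular expression are illustrative only and are not part of TripleO.

```python
import re

# Matches the per-node text in the payload above. The pattern is an
# assumption based on that one sample, not a documented format.
NODE_ERROR = re.compile(
    r'Node (?P<uuid>[0-9a-f-]{36}) did not reach state "(?P<wanted>\w+)", '
    r'the state is "(?P<actual>\w+)", error: (?P<error>.*)',
    re.DOTALL,
)


def failed_nodes(payload):
    """Return (uuid, wanted_state, actual_state, error) for each failed node."""
    failures = []
    for entry in payload.get("message", []):
        match = NODE_ERROR.search(entry.get("result", ""))
        if match:
            failures.append(match.group("uuid", "wanted", "actual", "error"))
    return failures
```

Applied to the payload above, this would list both nodes as stuck in "enroll" instead of "manageable", each with the IPMI power status error.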

[1] https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset002-master-upload/23bb2e7/logs/undercloud/home/zuul/overcloud_prep_images.log.txt.gz#_2018-10-11_07_42_10

[2] https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master/a82274f/logs/undercloud/home/zuul/overcloud_prep_images.log.txt.gz#_2018-10-11_07_41_55

Martin Kopec (mkopec) wrote :

This may be related to https://bugs.launchpad.net/tripleo/+bug/1797527.
The job failed due to an ironic error:

https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master/a82274f/logs/undercloud/var/log/extra/errors.txt.gz#_2018-10-11_07_41_41_674

2018-10-11 07:41:41.674 ERROR /var/log/extra/docker/containers/ironic_conductor/log/ironic/ironic-conductor.log: 7 ERROR ironic.drivers.modules.ipmitool [req-ac5eef48-b4aa-455f-9aeb-f038f10b9853 df74e7d6dae246faa6021488d0444e22 9c9f0a8feaff4f2a895b4779564304fa - default default] IPMI Error while attempting "ipmitool -I lanplus -H 192.168.100.127 -L ADMINISTRATOR -U admin -R 12 -N 5 -f /tmp/tmpw7ebpU power status" for node d3cd473f-fa5a-4529-911c-87ef24f4b8b9. Error: Unexpected error while running command.

However, it may be related to the mistral error during introspection described in https://bugs.launchpad.net/tripleo/+bug/1797527.
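The conductor log line above shows the ipmitool invocation that failed, so re-running the same command by hand from the undercloud is one way to check whether the BMC answers at all. A minimal sketch, assuming ipmitool is installed and the BMC password is stored in a file. The helper names are hypothetical; the flags are copied from the log line, and the "Chassis Power is on/off" text is what ipmitool normally prints for `power status`.

```python
import subprocess


def build_power_status_cmd(host, user, password_file, retries=12, interval=5):
    """Build the same ipmitool command line that appears in the log above."""
    return [
        "ipmitool", "-I", "lanplus", "-H", host, "-L", "ADMINISTRATOR",
        "-U", user, "-R", str(retries), "-N", str(interval),
        "-f", password_file, "power", "status",
    ]


def parse_power_status(output):
    """Map ipmitool's 'Chassis Power is on/off' line to 'on', 'off' or None."""
    text = output.strip().lower()
    if text.endswith("is on"):
        return "on"
    if text.endswith("is off"):
        return "off"
    return None


def power_status(host, user, password_file):
    """Run ipmitool and return the parsed state.

    Raises subprocess.CalledProcessError if ipmitool exits non-zero and
    FileNotFoundError if the ipmitool binary is not installed.
    """
    cmd = build_power_status_cmd(host, user, password_file)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_power_status(result.stdout)
```

Keeping command construction and output parsing separate from the actual call means both can be checked without a reachable BMC.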

Changed in tripleo:
status: New → Triaged
importance: Undecided → Critical
milestone: none → stein-1
Martin Kopec (mkopec) wrote :
tags: added: promotion-blocker
wes hayutin (weshayutin) wrote :

https://logs.rdoproject.org/89/608589/6/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens-branch/b472dad/logs/undercloud/home/zuul/overcloud_prep_images.log.txt.gz

2018-10-12 10:24:41.808 24389 ERROR oslo.service.loopingcall [-] Dynamic backoff interval looping call 'ironic.conductor.utils._wait' failed: LoopingCallTimeOut: Looping call timed out after 18.52 seconds
2018-10-12 10:24:41.808 24389 ERROR oslo.service.loopingcall Traceback (most recent call last):
2018-10-12 10:24:41.808 24389 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 141, in _run_loop
2018-10-12 10:24:41.808 24389 ERROR oslo.service.loopingcall idle = idle_for_func(result, watch.elapsed())
2018-10-12 10:24:41.808 24389 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 338, in _idle_for
2018-10-12 10:24:41.808 24389 ERROR oslo.service.loopingcall % self._error_time)
2018-10-12 10:24:41.808 24389 ERROR oslo.service.loopingcall LoopingCallTimeOut: Looping call timed out after 18.52 seconds
2018-10-12 10:24:41.808 24389 ERROR oslo.service.loopingcall
2018-10-12 10:24:41.813 24389 ERROR ironic.conductor.utils [req-85e82205-d491-445f-bb14-f582b8c4e760 1da9e2fed7d74dbc844489cfd8f64bda 91ed5b464a604594bba717ce47c013bc - default default] Timed out after 30 secs waiting for power power on on node abd3ff9e-6657-4ae1-8892-2c6731481679.: LoopingCallTimeOut: Looping call timed out after 18.52 seconds

Later in the log:

https://logs.rdoproject.org/89/608589/6/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens-branch/b472dad/logs/undercloud/var/log/ironic/ironic-conductor.log.txt.gz#_2018-10-12_10_47_07_350

2018-10-12 10:47:07.349 24389 ERROR oslo.service.loopingcall [-] Dynamic backoff interval looping call 'ironic.conductor.utils._wait' failed: LoopingCallTimeOut: Looping call timed out after 19.96 seconds
2018-10-12 10:47:07.349 24389 ERROR oslo.service.loopingcall Traceback (most recent call last):
2018-10-12 10:47:07.349 24389 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 141, in _run_loop
2018-10-12 10:47:07.349 24389 ERROR oslo.service.loopingcall idle = idle_for_func(result, watch.elapsed())
2018-10-12 10:47:07.349 24389 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 338, in _idle_for
2018-10-12 10:47:07.349 24389 ERROR oslo.service.loopingcall % self._error_time)
2018-10-12 10:47:07.349 24389 ERROR oslo.service.loopingcall LoopingCallTimeOut: Looping call timed out after 19.96 seconds
2018-10-12 10:47:07.349 24389 ERROR oslo.service.loopingcall
2018-10-12 10:47:07.350 24389 ERROR ironic.conductor.utils [req-84ca31f4-442a-4268-b434-4299ffdbbb3d 1da9e2fed7d74dbc844489cfd8f64bda 91ed5b464a604594bba717ce47c013bc - default default] Timed out after 30 secs waiting for power power on on node abd3ff9e-6657-4ae1-8892-2c6731481679.: LoopingCallTimeOut: Looping call timed out after 19.96 seconds
2018-10-12 10:47:07.369 24389 ERROR ironic.conductor.manager [req-84ca31f4-442a-4268-b434-4299ffdbbb3...


wes hayutin (weshayutin) wrote :
wes hayutin (weshayutin) wrote :
Bogdan Dobrelya (bogdando) wrote :

The 'LoopingCallTimeOut: Looping call timed out' message is likely a red herring: elastic-recheck stats report the ironic-tempest-dsvm-ipa-partition-uefi-pxe_ipmitool-tinyipa job as 100% passing even though that error message appears in controller/logs/screen-ir-cond.txt. The message seems to indicate only that the operation will be retried later.
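The retry reading can be illustrated with a generic polling loop that backs off between attempts. This is a sketch under stated assumptions and is not oslo.service's implementation: `wait_until` and `WaitTimeout` are hypothetical names, and the give-up rule (stop when the next delay would exceed the budget) is chosen only for illustration.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition is not met within the time budget."""


def wait_until(condition, timeout, initial_delay=1.0, factor=2.0,
               max_delay=10.0, sleep=time.sleep):
    """Poll condition() with growing delays and return the total time slept.

    Gives up (raising WaitTimeout) when the next delay would push the
    accumulated sleep time past `timeout`.
    """
    waited = 0.0
    delay = initial_delay
    while not condition():
        if waited + delay > timeout:
            raise WaitTimeout("timed out after %.2f seconds" % waited)
        sleep(delay)
        waited += delay
        delay = min(delay * factor, max_delay)
    return waited
```

In a loop like this, a caller that catches the exception and tries again later will log a timeout even though the overall operation can still succeed. The reported time can also be lower than the configured timeout; whether that explains the 18.52 and 19.96 second figures in the logs above is not established here.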

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/610078

wes hayutin (weshayutin) wrote :
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/610087

Rafael Folco (rafaelfolco) wrote :
wes hayutin (weshayutin) wrote :
Ronelle Landy (rlandy) wrote :

Two things about the trace above:
 - ipmitool was not installed (RHEL minimal)
 - checking of upshift has been set up for OVB

So that may not be the same problem.

Martin Schuppert (mschuppert) wrote :
Changed in tripleo:
milestone: stein-1 → stein-2
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.openstack.org/610087
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=c1c6856316260412cf9ac6b27a41373229a538ba
Submitter: Zuul
Branch: master

commit c1c6856316260412cf9ac6b27a41373229a538ba
Author: Wes Hayutin <email address hidden>
Date: Fri Oct 12 09:35:06 2018 -0600

    Add multi ports to tcpdump for ironic debug

    This commit adds an array of ports to be monitored while debugging
    ironic issues via tcpdump.

    Related-Bug: #1797526
    Change-Id: Ib941412b7ccdc06242092ac784c287c10e39fdfb
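Monitoring several ports with tcpdump comes down to joining `port N` terms with `or` in the capture filter. The sketch below shows that idea only and is not the code from the commit: the function names are hypothetical, and the commit message does not say which ports are monitored, so any port numbers passed in are the caller's assumption (UDP 623 is the standard IPMI/RMCP port).

```python
def tcpdump_port_filter(ports):
    """Return a tcpdump filter expression matching any of the given ports."""
    unique = sorted({int(p) for p in ports})
    if not unique:
        raise ValueError("at least one port is required")
    for port in unique:
        if not 0 < port < 65536:
            raise ValueError("invalid port: %d" % port)
    return " or ".join("port %d" % port for port in unique)


def tcpdump_command(interface, ports, outfile):
    """Build an argument list for tcpdump that writes a capture file."""
    return ["tcpdump", "-n", "-i", interface, "-w", outfile,
            tcpdump_port_filter(ports)]
```

The filter is passed as a single argument, so it needs no shell quoting when the list is handed to a process launcher.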

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/610078
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=850f2242648662c6c0c601dc996c28d18cfa483d
Submitter: Zuul
Branch: master

commit 850f2242648662c6c0c601dc996c28d18cfa483d
Author: Wes Hayutin <email address hidden>
Date: Fri Oct 12 09:03:37 2018 -0600

    increase the available ram for the bmc node

    We may be running out of memory on the bmc node
    Match the CI settings that merged
    https://review.openstack.org/#/c/610113/

    Related-Bug: #1797526
    Change-Id: I45070d5eca7844f283111de52ac10806d3fe5b57

Changed in tripleo:
milestone: stein-2 → stein-3
wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Incomplete
tags: removed: promotion-blocker
Changed in tripleo:
milestone: stein-3 → stein-rc1
wes hayutin (weshayutin) wrote :

FS01/FS02 are no longer hitting this issue.

Changed in tripleo:
status: Incomplete → Fix Released
assignee: nobody → wes hayutin (weshayutin)