[queens] overcloud prepare image failed by giving IronicAction.node.set_provision_state failed: 'NoneType' object has no attribute '__getitem__'

Bug #1792872 reported by chandan kumar
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Marios Andreou
Milestone: (none)

Bug Description

Queens periodic job FS01 failed at the overcloud prepare-images step with the following error:
https://logs.rdoproject.org/openstack-periodic-24hr/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens/59f5f7c/logs/undercloud/home/zuul/overcloud_prep_images.log.txt.gz#_2018-09-16_06_30_22

2018-09-16 06:30:00 | + openstack overcloud node import instackenv.json
2018-09-16 06:30:05 | Waiting for messages on queue 'tripleo' with no timeout.
2018-09-16 06:30:22 | Exception registering nodes: {u'status': u'FAILED', u'message': [{u'result': u"Failed to run action [action_ex_id=bd0d73b4-6911-4f8f-ab48-41d9bc1d4e67, action_cls='<class 'mistral.actions.action_factory.IronicAction'>', attributes='{u'client_method_name': u'node.set_provision_state'}', params='{u'state': u'manage', u'node_uuid': u'5acd5ccb-7ea4-497b-aeb6-e4f6c352276c'}']\n IronicAction.node.set_provision_state failed: 'NoneType' object has no attribute '__getitem__' (HTTP 500)"}, {}, {}, {}], u'result': None}
2018-09-16 06:30:22 | Started Mistral Workflow tripleo.baremetal.v1.register_or_update. Execution ID: 797e80df-906a-44e8-a9fa-d37ee0a3f8df
2018-09-16 06:30:22 |
2018-09-16 06:30:22 | [{u'result': u"Failed to run action [action_ex_id=bd0d73b4-6911-4f8f-ab48-41d9bc1d4e67, action_cls='<class 'mistral.actions.action_factory.IronicAction'>', attributes='{u'client_method_name': u'node.set_provision_state'}', params='{u'state': u'manage', u'node_uuid': u'5acd5ccb-7ea4-497b-aeb6-e4f6c352276c'}']\n IronicAction.node.set_provision_state failed: 'NoneType' object has no attribute '__getitem__' (HTTP 500)"}, {}, {}, {}]
2018-09-16 06:30:22 | {u'status': u'FAILED', u'message': [{u'result': u"Failed to run action [action_ex_id=bd0d73b4-6911-4f8f-ab48-41d9bc1d4e67, action_cls='<class 'mistral.actions.action_factory.IronicAction'>', attributes='{u'client_method_name': u'node.set_provision_state'}', params='{u'state': u'manage', u'node_uuid': u'5acd5ccb-7ea4-497b-aeb6-e4f6c352276c'}']\n IronicAction.node.set_provision_state failed: 'NoneType' object has no attribute '__getitem__' (HTTP 500)"}, {}, {}, {}], u'result': None}

Below is the traceback from the ironic API log (app.log):
https://logs.rdoproject.org/openstack-periodic-24hr/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens/59f5f7c/logs/undercloud/var/log/ironic/app.log.txt.gz#_2018-09-16_06_30_14_147

2018-09-16 06:30:14.147 23784 ERROR wsme.api [req-0cf2f1f1-bdc0-4072-9467-f1d5a02d5704 f0f20bf416fc4313983b66ffbc830981 68c9abc94e5748b3b4cd999cdb0bd00f - default default] Server-side error: "'NoneType' object has no attribute '__getitem__'". Detail:
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/wsmeext/pecan.py", line 85, in callfunction
    result = f(self, *args, **kwargs)

  File "/usr/lib/python2.7/site-packages/ironic/api/controllers/v1/node.py", line 609, in provision
    topic = pecan.request.rpcapi.get_topic_for(rpc_node)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/rpcapi.py", line 136, in get_topic_for
    ring = self.ring_manager[node.driver]

  File "/usr/lib/python2.7/site-packages/ironic/common/hash_ring.py", line 67, in __getitem__
    return self.ring[driver_name]

TypeError: 'NoneType' object has no attribute '__getitem__'
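The TypeError in the last frame is easy to reproduce in isolation. A minimal sketch (class and driver names are illustrative, not the real ironic code) of why the trace ends this way: the API worker holds a hash-ring mapping that is still None, and indexing into None raises. Under Python 2 the message is exactly the one in the log, "'NoneType' object has no attribute '__getitem__'".

```python
class HashRingManager:
    def __init__(self):
        self.ring = None  # supposed to be populated before first use

    def __getitem__(self, driver_name):
        # mirrors ironic/common/hash_ring.py line 67 from the trace above
        return self.ring[driver_name]


try:
    HashRingManager()["pxe_ipmitool"]
except TypeError as exc:
    # py2: 'NoneType' object has no attribute '__getitem__'
    # py3: 'NoneType' object is not subscriptable
    print("TypeError: %s" % exc)
```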

Changed in tripleo:
assignee: nobody → Quique Llorente (quiquell)
Revision history for this message
chandan kumar (chkumar246) wrote :

We are seeing a different issue on queens https://logs.rdoproject.org/24/567224/108/openstack-check/legacy-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens/eb39d51/logs/undercloud/var/log/ironic-inspector/ironic-inspector.log.txt.gz#_2018-09-18_00_58_25_777

2018-09-18 00:58:25.777 1746 ERROR ironic_inspector.pxe_filter.base [-] The PXE filter DnsmasqFilter, state=initialized encountered an exception: Unexpected error while running command.
Command: sudo ironic-inspector-rootwrap /etc/ironic-inspector/rootwrap.conf systemctl start openstack-ironic-inspector-dnsmasq.service
Exit code: 99
Stdout: u''
Stderr: u'/usr/bin/ironic-inspector-rootwrap: Unauthorized command: systemctl start openstack-ironic-inspector-dnsmasq.service (no filter matched)\n'; resetting the filter: ProcessExecutionError: Unexpected error while running command.
Command: sudo ironic-inspector-rootwrap /etc/ironic-inspector/rootwrap.conf systemctl start openstack-ironic-inspector-dnsmasq.service
Exit code: 99
Stdout: u''
Stderr: u'/usr/bin/ironic-inspector-rootwrap: Unauthorized command: systemctl start openstack-ironic-inspector-dnsmasq.service (no filter matched)\n'
2018-09-18 00:58:25.777 1746 ERROR ironic_inspector.pxe_filter.base Traceback (most recent call last):
2018-09-18 00:58:25.777 1746 ERROR ironic_inspector.pxe_filter.base File "/usr/lib/python2.7/site-packages/ironic_inspector/pxe_filter/base.py", line 139, in fsm_reset_on_error
2018-09-18 00:58:25.777 1746 ERROR ironic_inspector.pxe_filter.base yield self.fsm
2018-09-18 00:58:25.777 1746 ERROR ironic_inspector.pxe_filter.base File "/usr/lib/python2.7/site-packages/ironic_inspector/pxe_filter/base.py", line 81, in inner
2018-09-18 00:58:25.777 1746 ERROR ironic_inspector.pxe_filter.base return method(self, *args, **kwargs)
2018-09-18 00:58:25.777 1746 ERROR ironic_inspector.pxe_filter.base File "/usr/lib/python2.7/site-packages/ironic_inspector/pxe_filter/dnsmasq.py", line 141, in init_filter
2018-09-18 00:58:25.777 1746 ERROR ironic_inspector.pxe_filter.base _execute(CONF.dnsmasq_pxe_filter.dnsmasq_start_command)
2018-09-18 00:58:25.777 1746 ERROR ironic_inspector.pxe_filter.base File "/usr/lib/python2.7/site-packages/ironic_inspector/pxe_filter/dnsmasq.py", line 313, in _execute
2018-09-18 00:58:25.777 1746 ERROR ironic_inspector.pxe_filter.base check_exit_code=not ignore_errors)
2018-09-18 00:58:25.777 1746 ERROR ironic_inspector.pxe_filter.base File "/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", line 424, in execute
2018-09-18 00:58:25.777 1746 ERROR ironic_inspector.pxe_filter.base cmd=sanitized_cmd)
2018-09-18 00:58:25.777 1746 ERROR ironic_inspector.pxe_filter.base ProcessExecutionError: Unexpected error while running command.
2018-09-18 00:58:25.777 1746 ERROR ironic_inspector.pxe_filter.base Command: sudo ironic-inspector-rootwrap /etc/ironic-inspector/rootwrap.conf systemctl start openstack-ironic-inspector-dnsmasq.service
2018-09-18 00:58:25.777 1746 ERROR ironic_inspector.pxe_filter.base Exit code: 99
2018-09-18 00:58:25.777 1746 ERROR ironic_inspector.pxe_filter.base Stdout: u''
2018-09-18 00:58:25.777 1746 ERROR ironic_inspector.pxe_filter....
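The "no filter matched" error means the rootwrap filter files installed on the undercloud do not whitelist this systemctl invocation. A hypothetical filter entry that would authorize it (the file path and entry name are illustrative, not the file actually shipped by the package):

```ini
# /usr/share/ironic-inspector/rootwrap/dnsmasq-filter.filters (hypothetical path)
[Filters]
# RegExpFilter matches the full argument vector, so only this exact
# systemctl invocation is authorized to run as root.
dnsmasq-start: RegExpFilter, systemctl, root, systemctl, start, openstack-ironic-inspector-dnsmasq\.service
```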


Revision history for this message
wes hayutin (weshayutin) wrote :

There is a more recent ironic issue.

Closing this bug:

Failed to run task [error=Failed to find action [action_name=baremetal_introspection.get_status], wf=tripleo.baremetal.v1.introspect, task=get_introspection_status]:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mistral/engine/task_handler.py", line 63, in run_task
    task.run()
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 158, in wrapper
    result = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/mistral/engine/tasks.py", line 328, in run
    self._run_new()
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 158, in wrapper
    result = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/mistral/engine/tasks.py", line 354, in _run_new
    self._schedule_actions()
  File "/usr/lib/python2.7/site-packages/mistral/engine/tasks.py", line 597, in _schedule_actions
    action = self._build_action()
  File "/usr/lib/python2.7/site-packages/mistral/engine/tasks.py", line 493, in _build_action
    self.wf_spec.get_name()
  File "/usr/lib/python2.7/site-packages/mistral/engine/actions.py", line 622, in resolve_action_definition
    "Failed to find action [action_name=%s]" %

Changed in tripleo:
status: Triaged → Incomplete
wes hayutin (weshayutin)
tags: removed: alert promotion-blocker
Revision history for this message
Marios Andreou (marios-b) wrote :

Adding a note here (rover). The issue is still being seen on Queens and Pike jobs so I think we need to reconsider the promotion blocker for this? There are 2 distinct traces:

1. What we have in the description of this bug "IronicAction.node.set_provision_state failed: 'NoneType' object has no attribute '__getitem__' (HTTP 500)"}, {}, {}, {}], u'result': None}"

2. What is in comment #2 from weshay "InvalidActionException: Failed to find action [action_name=baremetal_introspection.get_status]\n'}"

Recent examples of these are:

Queens:
  --> success from "2018-09-24 06:37:13" http://logs.rdoproject.org/openstack-periodic-24hr/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens/ba69b34/logs/undercloud/home/zuul/overcloud_prep_images.log.txt.gz

  --> failed on "2018-09-26 06:50:17 " with trace #2 above http://logs.rdoproject.org/openstack-periodic-24hr/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens/52d334f/logs/undercloud/home/zuul/overcloud_prep_images.log.txt.gz

Pike: failed on "2018-09-26 05:13:08" with trace #1 http://logs.rdoproject.org/openstack-periodic-24hr/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-pike/0862b6c/logs/undercloud/home/zuul/overcloud_prep_images.log.txt.gz

Not sure if we need a new bug for #2 yet?

Revision history for this message
Marios Andreou (marios-b) wrote :

First, this is definitely a thing; see today's run on master at [1] for example. Unfortunately it doesn't happen consistently: the run prior to [1] was a success (via [2]). Furthermore, the trace is not always the same, but the error always occurs during overcloud-prep-images.sh, specifically during the Ironic/Mistral introspection. I think this is a duplicate of https://bugs.launchpad.net/tripleo/+bug/1794952 but choosing this one since it was filed first and already has more info.

Did some digging today to collect info before reaching out to Ironic & Mistral folks. On the tripleo side the code involved lives here [3] and on the ironic side I believe it is this [4]. You can see more of the trace in the mistral engine [5] and executor [6] logs (e.g. like http://paste.openstack.org/raw/731354/ ).

[1] http://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master/034a1b1/logs/undercloud/home/zuul/overcloud_prep_images.log.txt.gz
[2] http://cistatus.tripleo.org/promotion/
[3] https://github.com/openstack/tripleo-common/blob/5e0e291e0bec62b0ee9f3a989b591960d3a28ac3/workbooks/baremetal.yaml#L8-L59
[4] https://github.com/openstack/python-ironicclient/blob/fb94fb825cccecd11c4a2f78beeff3b0b19956f3/ironicclient/v1/node.py#L503
[5] http://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master/034a1b1/logs/undercloud/var/log/containers/mistral/engine.log.txt.gz
[6] http://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master/034a1b1/logs/undercloud/var/log/containers/mistral/executor.log.txt.gz

wes hayutin (weshayutin)
tags: added: alert promotion-blocker
Changed in tripleo:
status: Incomplete → Triaged
Revision history for this message
Martin Kopec (mkopec) wrote :

FS02 master and FS01 rocky failed the same way as FS01 and FS35 here https://bugs.launchpad.net/tripleo/+bug/1794952

In the case of FS02 master, introspection passed [1]; however, the nodes are not found [2] when providing all manageable nodes with `openstack overcloud node provide --all-manageable`.

In the case of FS01 rocky, introspection again passed for all nodes [3]; however, one of the nodes did not change to the power-on state [4], which could have made providing all manageable nodes fail.

[1] https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset002-master-upload/e11ad49/logs/undercloud/home/zuul/overcloud_prep_images.log.txt.gz#_2018-10-03_02_14_41
[2] https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset002-master-upload/e11ad49/logs/undercloud/var/log/extra/errors.txt.gz#_2018-10-03_02_12_31_544

[3] https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-rocky/1c1457d/logs/undercloud/home/zuul/overcloud_prep_images.log.txt.gz#_2018-10-03_02_12_40
[4] https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-rocky/1c1457d/logs/undercloud/var/log/extra/errors.txt.gz#_2018-10-03_02_14_14_891

Revision history for this message
Brad P. Crochet (brad-9) wrote :

From what I can see, some, if not most [1], of the set_provision_state calls succeed, which tells me this isn't in Mistral or the workflow itself.

[1] https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master/034a1b1/logs/undercloud/var/log/containers/mistral/engine.log.txt.gz#_2018-10-03_01_59_13_491

Perhaps there is some underlying issue with the nodes given?

Revision history for this message
Julia Kreger (juliaashleykreger) wrote :

Something is definitely going on that is weird with the ironic API. See https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master/034a1b1/logs/undercloud/var/log/containers/ironic/app.log.txt.gz#_2018-10-03_01_59_13_952

The only thing that I can think that could explain this is if the hash ring has not been built yet or is presently set to None when the request is processed. It could be that it is a new request thread on a brand new worker... which could open that possibility. I need to head out and I'll be out tomorrow, but someone should consider adding a check into the ironic API code to check if the hash ring is None, and if it is, force it to build.
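The guard suggested above could look roughly like this. This is a sketch with hypothetical names and a plain dict standing in for the real hash ring, not the actual ironic code:

```python
import threading


class HashRingManager:
    """Sketch: rebuild the hash ring on demand if it is still None."""

    def __init__(self):
        self._ring = None
        self._lock = threading.Lock()

    def _build_ring(self):
        # Stand-in for building the ring from conductor DB state.
        return {"pxe_ipmitool": "conductor-1.localdomain"}

    def __getitem__(self, driver_name):
        # If a brand-new worker serves a request before the ring has been
        # built, force a build instead of letting None[...] raise TypeError.
        if self._ring is None:
            with self._lock:
                if self._ring is None:
                    self._ring = self._build_ring()
        return self._ring[driver_name]


manager = HashRingManager()
print(manager["pxe_ipmitool"])
```

The double-checked lock keeps concurrent request threads from building the ring more than once while leaving the common (already-built) path lock-free.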

Revision history for this message
Harald Jensås (harald-jensas) wrote :
Revision history for this message
Dmitry Tantsur (divius) wrote :

Ironic story for #1: https://storyboard.openstack.org/#!/story/2003966

#2 seems to be a purely Mistral problem at first glance.

Changed in tripleo:
assignee: Quique Llorente (quiquell) → Marios Andreou (marios-b)
Revision history for this message
Marios Andreou (marios-b) wrote :

from the ironic story in comment #9
https://review.openstack.org/#/c/607904/ "Fixes a race condition in the hash ring code"

Changed in tripleo:
status: Triaged → Fix Released