check-tripleo-ironic-undercloud-precise failing

Bug #1300589 reported by Robert Collins
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Ironic
Fix Released
High
Chris Krelle

Bug Description

e.g. http://logs.openstack.org/43/84043/2/check-tripleo/check-tripleo-ironic-undercloud-precise/a956ec9/

2014-04-01 02:55:51.973 | + echo 'Waiting for the undercloud stack to be ready'
2014-04-01 02:55:51.973 | Waiting for the undercloud stack to be ready
2014-04-01 02:55:51.974 | + wait_for_stack_ready 220 10 undercloud
2014-04-01 03:31:51.728 | Command output matched 'CREATE_FAILED'. Exiting...

Looking at nova-compute log:
2014-04-01 03:31:21.939 4490 ERROR nova.compute.manager [req-860b9cc8-d71a-4804-9384-7e8cb2c59057 771f2c6bbf2045a893d039346c8ab5c8 9eb55fac42dc4fcc928f7b18aa1a5d36] [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] Instance failed to spawn
2014-04-01 03:31:21.939 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] Traceback (most recent call last):
2014-04-01 03:31:21.939 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/manager.py", line 1718, in _spawn
2014-04-01 03:31:21.939 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] block_device_info)
2014-04-01 03:31:21.939 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/ironic/nova/virt/ironic/driver.py", line 472, in spawn
2014-04-01 03:31:21.939 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] timer.start(interval=10).wait()
2014-04-01 03:31:21.939 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
2014-04-01 03:31:21.939 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] return hubs.get_hub().switch()
2014-04-01 03:31:21.939 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 187, in switch
2014-04-01 03:31:21.939 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] return self.greenlet.switch()
2014-04-01 03:31:21.939 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/openstack/common/loopingcall.py", line 78, in _inner
2014-04-01 03:31:21.939 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] self.f(*self.args, **self.kw)
2014-04-01 03:31:21.939 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/ironic/nova/virt/ironic/driver.py", line 468, in _wait_for_active
2014-04-01 03:31:21.939 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] raise exception.InstanceDeployFailure(msg)
2014-04-01 03:31:21.939 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] InstanceDeployFailure: Failed to provision instance 05e3af7a-383a-4697-9cc9-bd7c572600f0: Deploy timed out, but an unhandled exception was encountered while aborting. More info may be found in the log file.
2014-04-01 03:31:21.939 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0]
2014-04-01 03:31:35.365 4490 ERROR nova.compute.manager [req-860b9cc8-d71a-4804-9384-7e8cb2c59057 771f2c6bbf2045a893d039346c8ab5c8 9eb55fac42dc4fcc928f7b18aa1a5d36] [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] Error: Failed to provision instance 05e3af7a-383a-4697-9cc9-bd7c572600f0: Deploy timed out, but an unhandled exception was encountered while aborting. More info may be found in the log file.
2014-04-01 03:31:35.365 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] Traceback (most recent call last):
2014-04-01 03:31:35.365 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/manager.py", line 1309, in _build_instance
2014-04-01 03:31:35.365 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] set_access_ip=set_access_ip)
2014-04-01 03:31:35.365 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/manager.py", line 399, in decorated_function
2014-04-01 03:31:35.365 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] return function(self, context, *args, **kwargs)
2014-04-01 03:31:35.365 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/manager.py", line 1721, in _spawn
2014-04-01 03:31:35.365 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] LOG.exception(_('Instance failed to spawn'), instance=instance)
2014-04-01 03:31:35.365 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/openstack/common/excutils.py", line 68, in __exit__
2014-04-01 03:31:35.365 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] six.reraise(self.type_, self.value, self.tb)
2014-04-01 03:31:35.365 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/manager.py", line 1718, in _spawn
2014-04-01 03:31:35.365 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] block_device_info)
2014-04-01 03:31:35.365 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/ironic/nova/virt/ironic/driver.py", line 472, in spawn
2014-04-01 03:31:35.365 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] timer.start(interval=10).wait()
2014-04-01 03:31:35.365 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
2014-04-01 03:31:35.365 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] return hubs.get_hub().switch()
2014-04-01 03:31:35.365 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 187, in switch
2014-04-01 03:31:35.365 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] return self.greenlet.switch()
2014-04-01 03:31:35.365 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/openstack/common/loopingcall.py", line 78, in _inner
2014-04-01 03:31:35.365 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] self.f(*self.args, **self.kw)
2014-04-01 03:31:35.365 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/ironic/nova/virt/ironic/driver.py", line 468, in _wait_for_active
2014-04-01 03:31:35.365 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] raise exception.InstanceDeployFailure(msg)
2014-04-01 03:31:35.365 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0] InstanceDeployFailure: Failed to provision instance 05e3af7a-383a-4697-9cc9-bd7c572600f0: Deploy timed out, but an unhandled exception was encountered while aborting. More info may be found in the log file.
2014-04-01 03:31:35.365 4490 TRACE nova.compute.manager [instance: 05e3af7a-383a-4697-9cc9-bd7c572600f0]

And Ironic-conductor log:
(stevedore.extension): 2014-04-01 02:40:13,499 ERROR Could not load 'pxe_seamicro': PXEAndSeaMicroDriver
(stevedore.extension): 2014-04-01 02:40:13,500 ERROR PXEAndSeaMicroDriver
Traceback (most recent call last):
  File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/stevedore/extension.py", line 162, in _load_plugins
    verify_requirements,
  File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/stevedore/enabled.py", line 66, in _load_one_plugin
    verify_requirements,
  File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/stevedore/extension.py", line 180, in _load_one_plugin
    obj = plugin(*invoke_args, **invoke_kwds)
  File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/ironic/drivers/pxe.py", line 102, in __init__
    raise exception.DriverNotFound('PXEAndSeaMicroDriver')
DriverNotFound: PXEAndSeaMicroDriver
(stevedore.extension): 2014-04-01 02:40:13,501 ERROR Could not load 'fake_seamicro': FakeSeaMicroDriver
(stevedore.extension): 2014-04-01 02:40:13,502 ERROR FakeSeaMicroDriver
Traceback (most recent call last):
  File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/stevedore/extension.py", line 162, in _load_plugins
    verify_requirements,
  File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/stevedore/enabled.py", line 66, in _load_one_plugin
    verify_requirements,
  File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/stevedore/extension.py", line 180, in _load_one_plugin
    obj = plugin(*invoke_args, **invoke_kwds)
  File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/ironic/drivers/fake.py", line 87, in __init__
    raise exception.DriverNotFound('FakeSeaMicroDriver')
DriverNotFound: FakeSeaMicroDriver
(ironic.conductor.manager): 2014-04-01 03:31:17,856 ERROR Timeout reached when waiting callback for node f36bc755-df36-40fc-8d57-9504e194eadd
(glanceclient.common.http): 2014-04-01 03:31:17,879 ERROR Request returned failure status.
(ironic.conductor.utils): 2014-04-01 03:31:17,879 ERROR Cleanup failed for node f36bc755-df36-40fc-8d57-9504e194eadd after deploy timeout: HTTPUnauthorized (HTTP 401)

Revision history for this message
Robert Collins (lifeless) wrote :

This is the problem:
       _create_token_file(task, node)
        _update_neutron(task, node)
        manager_utils.node_set_boot_device(task, node, 'pxe', persistent=True)
        manager_utils.node_power_action(task, node, states.REBOOT)

There's no synchronisation with neutron (neither poll nor call-back) to know that (all) the dnsmasq processes serving that port have been updated - and a VM that boots quickly may DHCP before the dnsmasq is hupped.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic (master)

Fix proposed to branch: master
Review: https://review.openstack.org/84361

Changed in ironic:
assignee: nobody → Robert Collins (lifeless)
status: New → In Progress
Changed in ironic:
assignee: Robert Collins (lifeless) → Lucas Alvares Gomes (lucasagomes)
Revision history for this message
Chris Krelle (nobodycam) wrote :

looking in to this I am finding that the test nodes in the gate tests appear to be in the power on state when they are added to ironic.

2014-04-04 22:07:15.971 3703 INFO ironic.conductor.manager [-] During sync_power_state, node 8dc1d889-bd74-4edf-a817-8e2cc89c50ce has no previous known state. Recording current state 'power on'.

ironic has the config option force_power_state_during_sync which in this case was not taking affect be cause the initial state in the database is none. I will add a patch to correct the issue.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/85529

Changed in ironic:
assignee: Lucas Alvares Gomes (lucasagomes) → Chris Krelle (nobodycam)
Changed in ironic:
importance: Undecided → High
Revision history for this message
Dmitry Tantsur (divius) wrote :

Hi! What's the state of this bug? Review https://review.openstack.org/#/c/85529/ has "DO NOT MERGE" in title and no activity for more than a month.

Revision history for this message
Michael Davies (mrda) wrote :

See also https://review.openstack.org/#/c/97693/ which has been WONTFIX'd because of this bug and it's proposed patch.

Revision history for this message
Adam Gandelman (gandelman-a) wrote :

Opened a separate bug to track the neutron port update race @ https://bugs.launchpad.net/ironic/+bug/1334447 Workaround patch merged at https://review.openstack.org/#/c/91719/, proposed spec for a better solution @ https://review.openstack.org/#/c/99770/

Revision history for this message
aeva black (tenbrae) wrote :

This was fixed by a combination of many patches addressing separate bugs -- the end result is that this jenkins job is currently passing. Closing this bug as the existing issues seem to have been resolved. Please re-open it if you feel this is incorrect.

Changed in ironic:
status: In Progress → Fix Committed
milestone: none → juno-2
Changed in ironic:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on ironic (master)

Change abandoned by lifeless (<email address hidden>) on branch: master
Review: https://review.openstack.org/84361
Reason: We've gotten it going without this.

Thierry Carrez (ttx)
Changed in ironic:
milestone: juno-2 → 2014.2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Chris Krelle (<email address hidden>) on branch: master
Review: https://review.openstack.org/85529
Reason: bug fixed with another patch

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.