VMware: InstanceNotRescuable hit during rescue tempest tests

Bug #1278149 reported by Ryan Hsu
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Critical
Gary Kotton

Bug Description

Minesweeper CI is seeing the following Tempest suites fail due to errors during performing rescue operations.

tempest.api.compute.servers.test_server_rescue.ServerRescueTestJSON
tempest.api.compute.v3.servers.test_server_rescue.ServerRescueTestXML
tempest.api.compute.v3.servers.test_server_rescue.ServerRescueV3Test

Error message seen in the nova cpu log is:

Traceback (most recent call last):
  File "/opt/stack/oslo.messaging/oslo/messaging/_executors/base.py", line 36, in _dispatch
    incoming.reply(self.callback(incoming.ctxt, incoming.message))
  File "/opt/stack/oslo.messaging/oslo/messaging/rpc/dispatcher.py", line 134, in __call__
    return self._dispatch(endpoint, method, ctxt, args)
  File "/opt/stack/oslo.messaging/oslo/messaging/rpc/dispatcher.py", line 104, in _dispatch
    result = getattr(endpoint, method)(ctxt, **new_args)
  File "/opt/stack/nova/nova/compute/manager.py", line 356, in decorated_function
    return function(self, context, *args, **kwargs)
  File "/opt/stack/nova/nova/exception.py", line 88, in wrapped
    payload)
  File "/opt/stack/nova/nova/openstack/common/excutils.py", line 68, in __exit__
    six.reraise(self.type_, self.value, self.tb)
  File "/opt/stack/nova/nova/exception.py", line 71, in wrapped
    return f(self, context, *args, **kw)
  File "/opt/stack/nova/nova/compute/manager.py", line 240, in decorated_function
    pass
  File "/opt/stack/nova/nova/openstack/common/excutils.py", line 68, in __exit__
    six.reraise(self.type_, self.value, self.tb)
  File "/opt/stack/nova/nova/compute/manager.py", line 226, in decorated_function
    return function(self, context, *args, **kwargs)
  File "/opt/stack/nova/nova/compute/manager.py", line 291, in decorated_function
    function(self, context, *args, **kwargs)
  File "/opt/stack/nova/nova/compute/manager.py", line 2731, in rescue_instance
    reason=_("Driver Error: %s") % unicode(e))
InstanceNotRescuable: Instance 3f317fe3-1777-4f0d-a60a-f2370d9b2fb0 cannot be rescued: Driver Error: can't set attribute

Full logs for a run that saw this error is available here: http://208.91.1.172//logs/nova/72125/1/

Ryan Hsu (rhsu)
description: updated
Changed in nova:
assignee: nobody → Shawn Hartsock (hartsock)
Revision history for this message
Shawn Hartsock (hartsock) wrote :

What's interesting is something is causing the InstanceNotRescuable fault to mask the deeper fault in vmops.py ... this could make field diagnostics very difficult. We should invest some time in another patch to address that problem. Here's the important part of the log files:

2014-02-08 14:27:04.706 DEBUG nova.virt.vmwareapi.vmops [req-725a0ee2-276d-4252-97dd-8e360e657226 ServerRescueTestXML-107000062 ServerRescueTestXML-601874270] [instance: 3f317fe3-1777-4f0d-a60a-f2370d9b2fb0] Powered off the VM power_off /opt/stack/nova/nova/virt/vmwareapi/vmops.py:1152
2014-02-08 14:27:04.708 ERROR nova.compute.manager [req-725a0ee2-276d-4252-97dd-8e360e657226 ServerRescueTestXML-107000062 ServerRescueTestXML-601874270] [instance: 3f317fe3-1777-4f0d-a60a-f2370d9b2fb0] Error trying to Rescue Instance
2014-02-08 14:27:04.708 24360 TRACE nova.compute.manager [instance: 3f317fe3-1777-4f0d-a60a-f2370d9b2fb0] Traceback (most recent call last):
2014-02-08 14:27:04.708 24360 TRACE nova.compute.manager [instance: 3f317fe3-1777-4f0d-a60a-f2370d9b2fb0] File "/opt/stack/nova/nova/compute/manager.py", line 2725, in rescue_instance
2014-02-08 14:27:04.708 24360 TRACE nova.compute.manager [instance: 3f317fe3-1777-4f0d-a60a-f2370d9b2fb0] rescue_image_meta, admin_password)
2014-02-08 14:27:04.708 24360 TRACE nova.compute.manager [instance: 3f317fe3-1777-4f0d-a60a-f2370d9b2fb0] File "/opt/stack/nova/nova/virt/vmwareapi/driver.py", line 670, in rescue
2014-02-08 14:27:04.708 24360 TRACE nova.compute.manager [instance: 3f317fe3-1777-4f0d-a60a-f2370d9b2fb0] _vmops.rescue(context, instance, network_info, image_meta)
2014-02-08 14:27:04.708 24360 TRACE nova.compute.manager [instance: 3f317fe3-1777-4f0d-a60a-f2370d9b2fb0] File "/opt/stack/nova/nova/virt/vmwareapi/vmops.py", line 1087, in rescue
2014-02-08 14:27:04.708 24360 TRACE nova.compute.manager [instance: 3f317fe3-1777-4f0d-a60a-f2370d9b2fb0] r_instance['name'] = r_instance['name'] + self._rescue_suffix
2014-02-08 14:27:04.708 24360 TRACE nova.compute.manager [instance: 3f317fe3-1777-4f0d-a60a-f2370d9b2fb0] File "/opt/stack/nova/nova/objects/base.py", line 388, in __setitem__
2014-02-08 14:27:04.708 24360 TRACE nova.compute.manager [instance: 3f317fe3-1777-4f0d-a60a-f2370d9b2fb0] setattr(self, name, value)
2014-02-08 14:27:04.708 24360 TRACE nova.compute.manager [instance: 3f317fe3-1777-4f0d-a60a-f2370d9b2fb0] AttributeError: can't set attribute
2014-02-08 14:27:04.708 24360 TRACE nova.compute.manager [instance: 3f317fe3-1777-4f0d-a60a-f2370d9b2fb0]
2014-02-08 14:27:04.710 24360 DEBUG oslo.messaging._drivers.amqpdriver [-] MSG_ID is b979f498fc1547a0bfff17b862ad0777 _send /opt/stack/oslo.messaging/oslo/messaging/_drivers/amqpdriver.py:358

Revision history for this message
Shawn Hartsock (hartsock) wrote :

Root cause is here:
* https://github.com/openstack/nova/blob/master/nova/virt/vmwareapi/vmops.py#L1087

Some rule about treatment of instance objects must have just recently changed. The vmwareapi/vmops.py file is the only code in Nova that does a deep copy of instance and then attempts to mutate the copy. We may have to rewrite any methods that assumed deep copy and mutate was okay ... or we will have to find and revert the change that made this strategy fail.

More on this later.

Gary Kotton (garyk)
Changed in nova:
status: New → Confirmed
importance: Undecided → Critical
Revision history for this message
Gary Kotton (garyk) wrote :

The bug is marked as critical as it is breaking minesweeper

tags: added: minesweeper vmware
Revision history for this message
Gary Kotton (garyk) wrote :
Gary Kotton (garyk)
Changed in nova:
assignee: Shawn Hartsock (hartsock) → Gary Kotton (garyk)
milestone: none → icehouse-3
Changed in nova:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/72285
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9bb9a77e41b009cb940e98feda666dd8d806c862
Submitter: Jenkins
Branch: master

commit 9bb9a77e41b009cb940e98feda666dd8d806c862
Author: Gary Kotton <email address hidden>
Date: Sun Feb 9 23:51:50 2014 -0800

    VMware: fix instance rescue bug

    Commit I6448cb929e9afce814d0580eada91b8fd37befa3 added object support
    to the rescue operations. This exposed a bug in the VMware driver
    which was caught by the minesweeper. A rescue instance is named
    '<instance-uuid>-rescue'. In the past this name was updated on the
    instance, now it is passed as a parameter to the spawn method.

    Change-Id: I633614c6c634d41a32a9b1b70d941e9bfdcadedd
    Closes-bug: #1278149

Changed in nova:
status: In Progress → Fix Committed
Tracy Jones (tjones-i)
Changed in openstack-vmwareapi-team:
status: New → Fix Committed
importance: Undecided → Critical
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-3 → 2014.1
Tracy Jones (tjones-i)
no longer affects: openstack-vmwareapi-team
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.