Instances stuck with task_state of unshelving after RPC call timeout.

Bug #1367186 reported by Takashi Natsume
This bug affects 5 people
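+--------------------------+--------------+------------+-----------------+-----------+
| Affects                  | Status       | Importance | Assigned to     | Milestone |
+--------------------------+--------------+------------+-----------------+-----------+
| OpenStack Compute (nova) | Fix Released | Undecided  | Takashi Natsume |           |
| Juno                     | Fix Released | Undecided  | Takashi Natsume |           |
+--------------------------+--------------+------------+-----------------+-----------+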

Bug Description

Instances get stuck with a task_state of unshelving when the RPC call between nova-conductor and nova-scheduler fails (for example, because of a timeout) during an unshelve operation.

The environment:
Ubuntu 14.04 LTS (64-bit)
stable/icehouse (2014.1.2)
(I could also reproduce it with master, commit a1fa42f2ad11258f8b9482353e078adcf73ee9c2.)

How to reproduce:
1. create a VM instance
2. shelve the VM instance
3. stop nova-scheduler process
4. unshelve the VM instance
(The nova-conductor calls the nova-scheduler, but the RPC call times out.)

The VM instance then gets stuck with a task_state of unshelving (see the output below).
It remains stuck even after the nova-scheduler process is started again.

stack@devstack-icehouse:/opt/devstack$ nova list
+--------------------------------------+---------+-------------------+------------+-------------+-------------------+
| ID                                   | Name    | Status            | Task State | Power State | Networks          |
+--------------------------------------+---------+-------------------+------------+-------------+-------------------+
| 12e488e8-1df1-479d-866e-51c3117e384b | server1 | SHELVED_OFFLOADED | unshelving | Shutdown    | public=10.0.2.194 |
+--------------------------------------+---------+-------------------+------------+-------------+-------------------+

nova-conductor.log:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2014-09-09 18:18:13.263 13087 ERROR oslo.messaging.rpc.dispatcher [-] Exception during message handling: Timed out waiting for a reply to message ID 934be80a9798443597f355d60fa08e56
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher Traceback (most recent call last):
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 134, in _dispatch_and_reply
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher incoming.message))
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 177, in _dispatch
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher return self._do_dispatch(endpoint, method, ctxt, args)
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 123, in _do_dispatch
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher result = getattr(endpoint, method)(ctxt, **new_args)
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/conductor/manager.py", line 849, in unshelve_instance
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher instance)
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/conductor/manager.py", line 816, in _schedule_instances
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher request_spec, filter_properties)
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/scheduler/rpcapi.py", line 103, in select_destinations
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher request_spec=request_spec, filter_properties=filter_properties)
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/client.py", line 152, in call
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher retry=self.retry)
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/transport.py", line 90, in _send
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher timeout=timeout, retry=retry)
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 404, in send
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher retry=retry)
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 393, in _send
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher result = self._waiter.wait(msg_id, timeout)
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 281, in wait
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher reply, ending = self._poll_connection(msg_id, timeout)
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 231, in _poll_connection
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher % msg_id)
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher MessagingTimeout: Timed out waiting for a reply to message ID 934be80a9798443597f355d60fa08e56
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher
2014-09-09 18:18:13.274 13087 ERROR oslo.messaging._drivers.common [-] Returning exception Timed out waiting for a reply to message ID 934be80a9798443597f355d60fa08e56 to caller
2014-09-09 18:18:13.275 13087 ERROR oslo.messaging._drivers.common [-] ['Traceback (most recent call last):\n', ' File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 134, in _dispatch_and_reply\n incoming.message))\n', ' File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 177, in _dispatch\n return self._do_dispatch(endpoint, method, ctxt, args)\n', ' File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 123, in _do_dispatch\n result = getattr(endpoint, method)(ctxt, **new_args)\n', ' File "/opt/stack/nova/nova/conductor/manager.py", line 849, in unshelve_instance\n instance)\n', ' File "/opt/stack/nova/nova/conductor/manager.py", line 816, in _schedule_instances\n request_spec, filter_properties)\n', ' File "/opt/stack/nova/nova/scheduler/rpcapi.py", line 103, in select_destinations\n request_spec=request_spec, filter_properties=filter_properties)\n', ' File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/client.py", line 152, in call\n retry=self.retry)\n', ' File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/transport.py", line 90, in _send\n timeout=timeout, retry=retry)\n', ' File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 404, in send\n retry=retry)\n', ' File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 393, in _send\n result = self._waiter.wait(msg_id, timeout)\n', ' File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 281, in wait\n reply, ending = self._poll_connection(msg_id, timeout)\n', ' File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 231, in _poll_connection\n % msg_id)\n', 'MessagingTimeout: Timed out waiting for a reply to message ID 934be80a9798443597f355d60fa08e56\n']
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Changed in nova:
assignee: nobody → Takashi NATSUME (natsume-takashi)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/120347

Changed in nova:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/120347
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Joe Gordon (jogo) wrote :

The patch is no longer in progress.

Changed in nova:
status: In Progress → Confirmed
assignee: Takashi NATSUME (natsume-takashi) → nobody
Changed in nova:
assignee: nobody → Takashi NATSUME (natsume-takashi)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/155654

Changed in nova:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/155654
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a84be486c80da690b627d99644a5ed656757097c
Submitter: Jenkins
Branch: master

commit a84be486c80da690b627d99644a5ed656757097c
Author: Takashi NATSUME <email address hidden>
Date: Fri Feb 13 14:33:11 2015 +0900

    Handle MessagingException in unshelving instance

    Add Handling MessagingException in nova-conductor
    when unshelving instance

    Change-Id: I4dd95ee08e9618b8fd51f043c0f89f4ddcf1cb35
    Closes-Bug: #1367186
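
The idea behind the commit message above can be sketched as follows. This is a minimal illustration, not the merged patch: `MessagingTimeout`, `Instance`, and `unshelve_instance` are hypothetical stand-ins defined locally so the example runs without nova or oslo.messaging, and it assumes that clearing task_state on failure is what lets the instance leave the unshelving state.

```python
class MessagingTimeout(Exception):
    """Hypothetical stand-in for the timeout error raised by the RPC layer."""


class Instance(object):
    """Hypothetical, minimal instance record; not nova's Instance object."""

    def __init__(self, uuid, vm_state, task_state=None):
        self.uuid = uuid
        self.vm_state = vm_state
        self.task_state = task_state
        self.save_count = 0

    def save(self):
        # A real implementation would persist the change to the database.
        self.save_count += 1


def unshelve_instance(instance, select_destinations):
    """Ask the scheduler for a host; clear task_state if the call times out."""
    try:
        return select_destinations(instance)
    except MessagingTimeout:
        # Without this reset the instance would keep task_state 'unshelving',
        # which is the stuck state described in this report.
        instance.task_state = None
        instance.save()
        raise


if __name__ == "__main__":
    def stopped_scheduler(instance):
        # Simulates step 3 of the reproduction: nova-scheduler is not running.
        raise MessagingTimeout("Timed out waiting for a reply")

    server = Instance("12e488e8-1df1-479d-866e-51c3117e384b",
                      "shelved_offloaded", task_state="unshelving")
    try:
        unshelve_instance(server, stopped_scheduler)
    except MessagingTimeout:
        pass
    print(server.vm_state, server.task_state)
```

In this sketch the exception is re-raised after the reset, so the caller still sees the failure while the instance is left in a state from which unshelve can be retried.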

Changed in nova:
status: In Progress → Fix Committed
tags: added: juno-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/157285

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/158159

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/juno)

Change abandoned by Takashi NATSUME (<email address hidden>) on branch: stable/juno
Review: https://review.openstack.org/157285
Reason: This patch is abandoned because the new one is pushed.

https://review.openstack.org/#/c/158159/

Thierry Carrez (ttx)
Changed in nova:
milestone: none → kilo-3
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/juno)

Reviewed: https://review.openstack.org/158159
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=33dcffcc42e8c0d52727458f536eaf28ac1748ae
Submitter: Jenkins
Branch: stable/juno

commit 33dcffcc42e8c0d52727458f536eaf28ac1748ae
Author: Takashi NATSUME <email address hidden>
Date: Fri Feb 13 14:33:11 2015 +0900

    Handle MessagingException in unshelving instance

    Add Handling MessagingException in nova-conductor
    when unshelving instance

    Change-Id: I4dd95ee08e9618b8fd51f043c0f89f4ddcf1cb35
    Closes-Bug: #1367186
    (cherry-pick from commit a84be486c80da690b627d99644a5ed656757097c)

tags: added: in-stable-juno
Thierry Carrez (ttx)
Changed in nova:
milestone: kilo-3 → 2015.1.0