nova.tests.functional.test_cross_cell_migrate.TestMultiCellMigrate.test_delete_while_in_verify_resize_status hits oslo.messaging._drivers.impl_fake.send failure

Bug #1938021 reported by Lee Yarwood
This bug affects 1 person
Affects                    Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)   Fix Released  Undecided   Unassigned
oslo.messaging             New           Undecided   Unassigned

Bug Description

https://a8ba7f0ac14669316775-62d3a5548ea094caef4a9963ba6c55d1.ssl.cf1.rackcdn.com/798145/4/gate/nova-tox-functional-centos8-py36/1ee0272/testr_results.html

2021-07-25 02:45:22,896 ERROR [nova.api.openstack.wsgi] Unexpected exception in API method
Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_messaging/_drivers/impl_fake.py", line 207, in _send
    reply, failure = reply_q.get(timeout=timeout)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/queue.py", line 322, in get
    return waiter.wait()
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/queue.py", line 141, in wait
    return get_hub().switch()
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 313, in switch
    return self.greenlet.switch()
queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/nova/nova/api/openstack/wsgi.py", line 658, in wrapped
    return f(*args, **kwargs)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/api/openstack/compute/servers.py", line 1070, in delete
    self._delete(req.environ['nova.context'], req, id)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/api/openstack/compute/servers.py", line 883, in _delete
    self.compute_api.delete(context, instance)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/api.py", line 226, in inner
    return function(self, context, instance, *args, **kwargs)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/api.py", line 153, in inner
    return f(self, context, instance, *args, **kw)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/api.py", line 2541, in delete
    self._delete_instance(context, instance)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/api.py", line 2533, in _delete_instance
    task_state=task_states.DELETING)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/api.py", line 2311, in _delete
    self._confirm_resize_on_deleting(context, instance)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/api.py", line 2405, in _confirm_resize_on_deleting
    context, instance, migration, do_cast=False)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/conductor/api.py", line 182, in confirm_snapshot_based_resize
    ctxt, instance, migration, do_cast=do_cast)
  File "/home/zuul/src/opendev.org/openstack/nova/nova/conductor/rpcapi.py", line 468, in confirm_snapshot_based_resize
    return cctxt.call(ctxt, 'confirm_snapshot_based_resize', **kw)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_messaging/rpc/client.py", line 179, in call
    transport_options=self.transport_options)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_messaging/transport.py", line 128, in _send
    transport_options=transport_options)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_messaging/_drivers/impl_fake.py", line 223, in send
    transport_options)
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_messaging/_drivers/impl_fake.py", line 214, in _send
    'No reply on topic %s' % target.topic)
oslo_messaging.exceptions.MessagingTimeout: No reply on topic conductor
2021-07-25 02:45:22,898 INFO [nova.api.openstack.wsgi] HTTP exception thrown: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'oslo_messaging.exceptions.MessagingTimeout'>
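The traceback boils down to one pattern: the caller blocks on a reply queue with a timeout (`reply_q.get(timeout=timeout)` in impl_fake), and when no reply arrives in time the resulting `queue.Empty` is re-raised as `MessagingTimeout`. The sketch below is a minimal, stdlib-only model of that pattern, not the real fake driver: `fake_call`, `slow_confirm_resize` and the local `MessagingTimeout` class are hypothetical, and the timings are scaled down from the test's 1 second.

```python
import queue
import threading
import time


class MessagingTimeout(Exception):
    """Hypothetical stand-in for oslo_messaging.exceptions.MessagingTimeout."""


def fake_call(topic, handler, timeout):
    """Run `handler` on a worker thread and block on a reply queue.

    If no reply arrives within `timeout` seconds, queue.Empty is turned
    into MessagingTimeout, mirroring the shape of the traceback.
    """
    reply_q = queue.Queue()

    def serve():
        reply_q.put(handler())

    threading.Thread(target=serve, daemon=True).start()
    try:
        return reply_q.get(timeout=timeout)
    except queue.Empty:
        # Raised inside the except block, so Python chains it to
        # queue.Empty ("During handling of the above exception ...").
        raise MessagingTimeout('No reply on topic %s' % topic)


def slow_confirm_resize():
    # Hypothetical conductor task that outlives a short RPC timeout.
    time.sleep(0.3)
    return 'confirmed'


if __name__ == '__main__':
    try:
        fake_call('conductor', slow_confirm_resize, timeout=0.1)
    except MessagingTimeout as exc:
        print('call failed:', exc)
    print(fake_call('conductor', slow_confirm_resize, timeout=2))
```

The same slow handler fails or succeeds purely depending on the caller's timeout, which is why raising the timeout in the test makes the failure go away.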

Balazs Gibizer (balazs-gibizer) wrote :

A recent hit shows that this is happening not just in the centos job:
https://zuul.opendev.org/t/openstack/build/c550c9be7e6d4681b8aa8b53462d97b1/log/job-output.txt#4332

Lee Yarwood (lyarwood) wrote :
Lee Yarwood (lyarwood)
summary: - oslo.messaging._drivers.impl_fake.send failure during nova functional
- tests
+ nova.tests.functional.test_cross_cell_migrate.TestMultiCellMigrate.test_delete_while_in_verify_resize_status
+ hits oslo.messaging._drivers.impl_fake.send failure
Lee Yarwood (lyarwood) wrote :

Okay, this looks like a simple RPC call timeout to the conductor: the test sets the RPC response timeout to 1 second [1], while the call, and the task it runs within the conductor, can easily take longer than that [2][3]. I'm not sure why we are suddenly seeing this, but I think the easiest thing to do here is to move the call over to long_rpc_timeout?

[1] https://github.com/openstack/nova/blob/72c8722e09a47794fc5412a14587a74e79195fca/nova/tests/functional/test_cross_cell_migrate.py#L73-L75
[2] https://github.com/openstack/nova/blob/72c8722e09a47794fc5412a14587a74e79195fca/nova/conductor/rpcapi.py#L459-L468
[3] https://github.com/openstack/nova/blob/72c8722e09a47794fc5412a14587a74e79195fca/nova/conductor/tasks/cross_cell_migrate.py#L1027-L1050
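Lee's suggested alternative is to move the call to long_rpc_timeout rather than only raising the test's timeout. Part of what makes a long timeout tolerable is that oslo.messaging's `RPCClient.prepare()` also accepts `call_monitor_timeout`, which, as we understand it, has the server heartbeat long-running calls so that a dead server is still noticed quickly; as far as we know, Nova usually pairs long_rpc_timeout with `call_monitor_timeout=CONF.rpc_response_timeout`. The toy below models that idea with threads and a queue. It is an illustrative assumption, not oslo.messaging's implementation: `monitored_call`, `slow_task`, `HEARTBEAT`, the half-interval heartbeat and the local `MessagingTimeout` class are all hypothetical, and the timings are scaled down.

```python
import queue
import threading
import time


class MessagingTimeout(Exception):
    """Hypothetical stand-in for oslo_messaging.exceptions.MessagingTimeout."""


HEARTBEAT = object()  # marker the toy server sends while it is still working


def monitored_call(handler, monitor_timeout, timeout):
    """Run `handler` on a worker thread and wait for its result.

    Fails if nothing (heartbeat or reply) arrives within `monitor_timeout`,
    or if no final reply arrives before the overall `timeout` deadline.
    """
    reply_q = queue.Queue()
    done = threading.Event()

    def heartbeat():
        # Heartbeat at half the monitor interval until the handler finishes.
        while not done.wait(monitor_timeout / 2):
            reply_q.put(HEARTBEAT)

    def serve():
        threading.Thread(target=heartbeat, daemon=True).start()
        result = handler()
        done.set()
        reply_q.put(result)

    threading.Thread(target=serve, daemon=True).start()
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise MessagingTimeout('overall timeout exceeded')
        try:
            msg = reply_q.get(timeout=min(monitor_timeout, remaining))
        except queue.Empty:
            if time.monotonic() >= deadline:
                raise MessagingTimeout('overall timeout exceeded') from None
            raise MessagingTimeout('no heartbeat or reply in time') from None
        if msg is not HEARTBEAT:
            return msg


def slow_task():
    # Hypothetical long-running conductor task (0.3 s, longer than the
    # 0.2 s monitor interval used below).
    time.sleep(0.3)
    return 'confirmed'


if __name__ == '__main__':
    # Heartbeats keep the call alive past monitor_timeout ...
    print(monitored_call(slow_task, monitor_timeout=0.2, timeout=5))
    # ... but the overall deadline still bounds it.
    try:
        monitored_call(slow_task, monitor_timeout=0.5, timeout=0.15)
    except MessagingTimeout as exc:
        print('call failed:', exc)
```

Keeping the two limits separate is the point of the design: the monitor interval stays short enough to detect a hung server, while the overall deadline can be as generous as a slow conductor task needs.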

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/803714

Changed in nova:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/803714
Committed: https://opendev.org/openstack/nova/commit/d4dbcd5fa05ac2f988b65d611f71805f90411581
Submitter: "Zuul (22348)"
Branch: master

commit d4dbcd5fa05ac2f988b65d611f71805f90411581
Author: Lee Yarwood <email address hidden>
Date: Fri Aug 6 10:02:15 2021 +0100

    func: Increase rpc_response_timeout in TestMultiCellMigrate tests

    This was previously set really low, to 1 second, which was leading
    more involved flows such as test_delete_while_in_verify_resize_status
    to time out when the target calls the conductor to confirm the resize
    on the source.

    This change simply increases the timeout in the test, but we might
    want to think about moving this call over to long_rpc_timeout, as this
    could be an issue in real world deployments.

    Closes-Bug: #1938021
    Change-Id: Ibba2d1506a0b026d35d7bf35384ec6439f438b01

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 24.0.0.0rc1

This issue was fixed in the openstack/nova 24.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/844200

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/nova/+/844200
Committed: https://opendev.org/openstack/nova/commit/4cf632338d1c97bff79961e6e47b50b0017cc67e
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 4cf632338d1c97bff79961e6e47b50b0017cc67e
Author: Lee Yarwood <email address hidden>
Date: Fri Aug 6 10:02:15 2021 +0100

    func: Increase rpc_response_timeout in TestMultiCellMigrate tests

    This was previously set really low, to 1 second, which was leading
    more involved flows such as test_delete_while_in_verify_resize_status
    to time out when the target calls the conductor to confirm the resize
    on the source.

    This change simply increases the timeout in the test, but we might
    want to think about moving this call over to long_rpc_timeout, as this
    could be an issue in real world deployments.

    Closes-Bug: #1938021
    Change-Id: Ibba2d1506a0b026d35d7bf35384ec6439f438b01
    (cherry picked from commit d4dbcd5fa05ac2f988b65d611f71805f90411581)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 23.2.1

This issue was fixed in the openstack/nova 23.2.1 release.
