evacuation failure causes POST_FAILURE in the nova-live-migration job

Bug #1970642 reported by Balazs Gibizer
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

The nova-live-migration job has a post task to test evacuation. This test fails intermittently causing the job to fail with POST_FAILURE status.

Evacuation fails due to libvirt connection error.

job log:

2022-04-27 14:24:41.092806 | controller | + /opt/stack/nova/roles/run-evacuate-hook/files/test_evacuate.sh:evacuate_and_wait_for_active:34 : echo 'Timed out waiting for server evacuate-bfv-test to go to ACTIVE status'

2022-04-27 14:24:41.092962 | controller | Timed out waiting for server evacuate-bfv-test to go to ACTIVE status

nova-compute log:

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server [None req-a012194e-8201-43fa-ae56-a9194a5fd5ad demo admin] Exception during message handling: nova.exception.HypervisorUnavailable: Connection to the hypervisor is broken on host

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/host.py", line 586, in get_connection

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server conn = self._get_connection()

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/host.py", line 566, in _get_connection

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server self._queue_conn_event_handler(

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 227, in __exit__

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server self.force_reraise()

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server raise self.value

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/host.py", line 558, in _get_connection

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server self._wrapped_conn = self._get_new_connection()

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/host.py", line 502, in _get_new_connection

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server wrapped_conn = self._connect(self._uri, self._read_only)

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/host.py", line 353, in _connect

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server return self._libvirt_proxy.openAuth(uri, auth, flags)

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 193, in doit

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server result = proxy_call(self._autowrap, f, *args, **kwargs)

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 151, in proxy_call

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server rv = execute(f, *args, **kwargs)

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 132, in execute

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server six.reraise(c, e, tb)

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/six.py", line 719, in reraise

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server raise value

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 86, in tworker

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server rv = meth(*args, **kwargs)

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/libvirt.py", line 104, in openAuth

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server if ret is None:raise libvirtError('virConnectOpenAuth() failed')

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server libvirt.libvirtError: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred:

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_messaging/rpc/server.py", line 241, in inner

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server return func(*args, **kwargs)

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/exception_wrapper.py", line 71, in wrapped

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server _emit_versioned_exception_notification(

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 227, in __exit__

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server self.force_reraise()

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server raise self.value

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/exception_wrapper.py", line 63, in wrapped

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server return f(self, context, *args, **kw)

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 184, in decorated_function

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server LOG.warning("Failed to revert task state for instance. "

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 227, in __exit__

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server self.force_reraise()

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server raise self.value

Apr 27 14:13:41.050639 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 155, in decorated_function

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/utils.py", line 1439, in decorated_function

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 212, in decorated_function

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server compute_utils.add_instance_fault_from_exc(context,

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 227, in __exit__

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server self.force_reraise()

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server raise self.value

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 201, in decorated_function

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 3548, in rebuild_instance

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server self._do_rebuild_instance_with_claim(

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 3625, in _do_rebuild_instance_with_claim

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server claim_context = rebuild_claim(

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py", line 391, in inner

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server return f(*args, **kwargs)

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/resource_tracker.py", line 204, in rebuild_claim

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server return self._move_claim(

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/resource_tracker.py", line 280, in _move_claim

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server if self.disabled(nodename):

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/resource_tracker.py", line 666, in disabled

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server not self.driver.node_is_available(nodename))

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/driver.py", line 1564, in node_is_available

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server if nodename in self.get_available_nodes():

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 11037, in get_available_nodes

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server return [self._host.get_hostname()]

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/host.py", line 1051, in get_hostname

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server hostname = self.get_connection().getHostname()

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/host.py", line 596, in get_connection

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server raise exception.HypervisorUnavailable()

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server nova.exception.HypervisorUnavailable: Connection to the hypervisor is broken on host

Apr 27 14:13:41.061728 ubuntu-focal-iweb-mtl01-0029462367 nova-compute[97475]: ERROR oslo_messaging.rpc.server

Example failure: https://zuul.opendev.org/t/openstack/build/6c792e4ee72542cbaf386710e57a4b01/log/controller/logs/screen-n-cpu.txt#12329

I see multiple matching failures from the last two weeks: https://paste.opendev.org/show/bbB6cuBQBGSUPyAxPXva/

Tags: gate-failure
tags: added: gate-failure
Revision history for this message
melanie witt (melwitt) wrote :

I was just looking at another failure I saw today [1] and realized that the "Connection refused" error is actually from the "Run negative evacuate tests" [2] portion of the post task (the timestamps match), and it is expected: that task stops libvirtd intentionally, so evacuation is supposed to fail.

It looks like the job actually fails later, after libvirtd has been started again, during the non-negative evacuate: cinder responds with a 500 error when we request an update of the volume attachment.

job-output.txt:

negative evacuate:

2022-04-22 16:37:06.914969 | controller | ++ /opt/stack/nova/roles/run-evacuate-hook/files/test_negative_evacuate.sh:evacuate_and_wait_for_error:23 : openstack server show evacuate-bfv-test -f value -c status
2022-04-22 16:37:09.101146 | controller | + /opt/stack/nova/roles/run-evacuate-hook/files/test_negative_evacuate.sh:evacuate_and_wait_for_error:23 : status=ERROR
2022-04-22 16:37:09.102811 | controller | + /opt/stack/nova/roles/run-evacuate-hook/files/test_negative_evacuate.sh:evacuate_and_wait_for_error:24 : '[' ERROR '!=' ERROR ']'

evacuate:

2022-04-22 16:40:27.696943 | controller | ++ /opt/stack/nova/roles/run-evacuate-hook/files/test_evacuate.sh:evacuate_and_wait_for_active:37 : openstack server show evacuate-bfv-test -f value -c status
2022-04-22 16:40:30.135972 | controller | + /opt/stack/nova/roles/run-evacuate-hook/files/test_evacuate.sh:evacuate_and_wait_for_active:37 : status=ERROR
2022-04-22 16:40:30.137853 | controller | + /opt/stack/nova/roles/run-evacuate-hook/files/test_evacuate.sh:evacuate_and_wait_for_active:29 : '[' ERROR '!=' ACTIVE ']'

controller/logs/screen-n-cpu.txt:

connection refused:

Apr 22 16:37:07.031399 ubuntu-focal-ovh-bhs1-0029416164 nova-compute[97536]: INFO nova.compute.manager [None req-4765007e-63de-4ac7-9f48-b69f22762b6f demo admin] [instance: a5255f10-86e4-4dad-809e-847899d57ae0] Evacuating instance
Apr 22 16:37:07.079532 ubuntu-focal-ovh-bhs1-0029416164 nova-compute[97536]: DEBUG oslo_concurrency.lockutils [None req-4765007e-63de-4ac7-9f48-b69f22762b6f demo admin] Lock "compute_resources" acquired by "nova.compute.resource_tracker.ResourceTracker.rebuild_claim" :: waited 0.000s {{(pid=97536) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:386}}
Apr 22 16:37:07.100782 ubuntu-focal-ovh-bhs1-0029416164 nova-compute[97536]: DEBUG nova.virt.libvirt.host [None req-4765007e-63de-4ac7-9f48-b69f22762b6f demo admin] Connecting to libvirt: qemu:///system {{(pid=97536) _get_new_connection /opt/stack/nova/nova/virt/libvirt/host.py:499}}
Apr 22 16:37:07.102855 ubuntu-focal-ovh-bhs1-0029416164 nova-compute[97536]: ERROR nova.virt.libvirt.host [None req-4765007e-63de-4ac7-9f48-b69f22762b6f demo admin] Connection to libvirt failed: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused: libvirt.libvirtError: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
Apr 22 16:37:07.102855 ubuntu-focal-ovh-bhs1-0029416164 nova-compute[97536]: ERROR nova.virt.libvirt.host Traceback (most recent call last):
Apr 22 16:37:07.102855 ubuntu-focal-ovh-bhs1-0029416164 nova-compute[97536]: ERROR nova.virt.libvirt.h...

Revision history for this message
melanie witt (melwitt) wrote :

Though the failure here is a 500 coming from cinder, we do have some precedent for retrying on 500s from cinder with this decorator, for example:

    @retrying.retry(stop_max_attempt_number=5,
                    retry_on_exception=lambda e:
                    (isinstance(e, cinder_exception.ClientException) and
                     e.code == 500))
    def attachment_delete(self, context, attachment_id):

So perhaps we need to put that on the attachment_update method as well.
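
A minimal sketch of what that could look like, assuming the same retrying decorator and a simplified attachment_update signature (the real method in nova/volume/cinder.py takes more arguments, and any eventual fix may differ):

    # Hypothetical sketch only: apply the same retry-on-500 behaviour to
    # attachment_update. Import paths follow nova/volume/cinder.py, but the
    # class and method bodies are simplified for illustration.
    import retrying

    from cinderclient import exceptions as cinder_exception


    class API(object):
        """Simplified stand-in for nova's cinder API wrapper class."""

        @retrying.retry(stop_max_attempt_number=5,
                        retry_on_exception=lambda e:
                        (isinstance(e, cinder_exception.ClientException) and
                         e.code == 500))
        def attachment_update(self, context, attachment_id, connector):
            # A transient 500 from cinder-api (such as the RPC messaging
            # timeout seen in this bug) would be retried up to 5 times
            # before the exception propagates and the evacuation fails.
            raise NotImplementedError('real implementation calls cinderclient')

If all 5 attempts fail, the ClientException is re-raised to the caller, so the behaviour in the persistent-failure case is unchanged.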

Error in nova-compute log:

Apr 22 16:40:28.215513 ubuntu-focal-ovh-bhs1-0029416164 nova-compute[97536]: ERROR nova.compute.manager [None req-6bc42ac9-9226-4c73-8fd5-79d8283141d6 demo admin] [instance: a5255f10-86e4-4dad-809e-847899d57ae0] Setting instance vm_state to ERROR: nova.exception.InvalidBDM: Unable to update the attachment. (HTTP 500) (Request-ID: req-ef871295-7384-4da8-abc7-2278b164fb40)

Changed in nova:
importance: Undecided → Medium
status: New → Triaged
Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

Thanks for looking into it. I missed the fact that there was an intentionally unsuccessful evacuation attempt. I see the cinder error now. It is an RPC messaging timeout[1] in the cinder-api, but there is nothing in the cinder volume logs.

It would be nice to have a better response code, like 409, to retry on. But yes, retrying from the nova side is an option.

I've reported a separate bug, https://bugs.launchpad.net/neutron/+bug/1971563, that has a similar evacuation failure, but the underlying cause there is a vif plug timeout.

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

Also, I refined my original signature based on your findings: https://paste.opendev.org/show/bbWBpA6P6ZO0gD8xjell/

So there are 4 failures from the last 14 days.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/901815

Changed in nova:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/901815
Committed: https://opendev.org/openstack/nova/commit/1f8ce3cc70ceaf8edd25b1ae2646e525cf2b3973
Submitter: "Zuul (22348)"
Branch: master

commit 1f8ce3cc70ceaf8edd25b1ae2646e525cf2b3973
Author: Rajesh Tailor <email address hidden>
Date: Fri Nov 24 13:25:25 2023 +0530

    Add retry on attachment_update

    In the nova-live-migration job, evacuation failures are causing
    POST_FAILURE.

    As per the discussion on the bug, the 500 that causes this failure
    is coming from cinder.

    Similar to the attachment_delete method, this change adds a retry
    mechanism to the cinder API's attachment_update method.

    Closes-Bug: #1970642
    Change-Id: I1da3c8481f7e7a1e8776cf03f5c4cf117b7fabaa

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 29.0.0.0rc1

This issue was fixed in the openstack/nova 29.0.0.0rc1 release candidate.
