Post live migration at destination failed:

Bug #1911891 reported by Jake Hill
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

nova 2:21.1.0-0ubuntu1

Recently migrated from OVS to OVN networking.

I am performing a series of live migrations (block migrations) to evacuate a broken hypervisor. Most of these have succeeded, or have failed for understandable reasons. One of the migrations is stuck in a strange state, however. The instance UUID in this case is 1eb3c33c-8c2b-4d28-9050-70ac740ceb34.
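
With the nova CLI, a block live migration of this kind looks roughly like the following (destination host optional; the exact invocations used aren't reproduced here):

$ nova live-migration --block-migrate 1eb3c33c-8c2b-4d28-9050-70ac740ceb34 oshv04.maas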

$ nova migration-list --instance-uuid 1eb3c33c-8c2b-4d28-9050-70ac740ceb34
Id:             943
UUID:           f0013171-b3bf-4884-a94e-b627c48b1530
Source Node:    oshv01.maas
Dest Node:      oshv04.maas
Source Compute: oshv01
Dest Compute:   oshv04.maas
Dest Host:      -
Status:         completed
Instance UUID:  1eb3c33c-8c2b-4d28-9050-70ac740ceb34
Old Flavor:     35
New Flavor:     35
Created At:     2021-01-14T14:20:34.000000
Updated At:     2021-01-14T18:00:21.000000
Type:           live-migration
Project ID:     d25b3d096e5c47a3b1701b3fff2c2823
User ID:        41f5b059d3bf4bebad08ffcabd0899bf

This shows the migration from oshv01 to oshv04 as "completed". However, the instance record still shows the old host and a task state of migrating:

$ openstack server show 1eb3c33c-8c2b-4d28-9050-70ac740ceb34 -fvalue -cOS-EXT-SRV-ATTR:host
oshv01

$ openstack server show 1eb3c33c-8c2b-4d28-9050-70ac740ceb34 -fvalue -cOS-EXT-STS:task_state
migrating

The instance is actually running on the intended destination oshv04, but nova thinks it is still running on the source host oshv01.
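
One way to confirm where the guest is actually running is to ask libvirt on each hypervisor directly (virsh accepts the instance UUID as the domain identifier):

$ ssh oshv04.maas sudo virsh domstate 1eb3c33c-8c2b-4d28-9050-70ac740ceb34
$ ssh oshv01.maas sudo virsh domstate 1eb3c33c-8c2b-4d28-9050-70ac740ceb34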

I can find some log messages from around the time the migration "almost completed". For the migration UUID I have:

/var/log/syslog.1:Jan 14 17:59:40 oshv04 nova-compute: 2021-01-14 17:59:40.714 2343417 INFO nova.compute.resource_tracker [req-4f8cf1b6-a022-4619-b1dc-2b4e92a35300 - - - - -] [instance: 1eb3c33c-8c2b-4d28-9050-70ac740ceb34] Updating resource usage from migration f0013171-b3bf-4884-a94e-b627c48b1530
/var/log/syslog.1:Jan 14 18:00:20 oshv01 nova-compute: 2021-01-14 18:00:20.511 2620747 INFO nova.compute.resource_tracker [req-e30fbdb7-7e5f-476b-b4f1-829ca65add60 - - - - -] [instance: 1eb3c33c-8c2b-4d28-9050-70ac740ceb34] Updating resource usage from migration f0013171-b3bf-4884-a94e-b627c48b1530
/var/log/syslog.1:Jan 14 18:00:21 oshv01 nova-compute: 2021-01-14 18:00:21.039 2620747 INFO nova.compute.resource_tracker [req-61749f8c-26d0-491a-8935-d8700d7a4de6 - - - - -] [instance: 1eb3c33c-8c2b-4d28-9050-70ac740ceb34] Updating resource usage from migration f0013171-b3bf-4884-a94e-b627c48b1530
/var/log/syslog.1:Jan 14 18:00:21 juju-533538-38-lxd-17 placement-api: 2021-01-14 18:00:21.193 1189353 INFO placement.requestlog [req-3ef495bb-1c3d-4a0c-9a2a-8c9253f3e842 74f0533fa60b4c03ba0e472b4a951b8c 0ccc5b9ded584deda3683e014cba8b67 - 174a4647a56b4d09bc6ab8952c0105f1 174a4647a56b4d09bc6ab8952c0105f1] 127.0.0.1 "GET /allocations/f0013171-b3bf-4884-a94e-b627c48b1530" status: 200 len: 264 microversion: 1.28
/var/log/syslog.1:Jan 14 18:00:21 juju-533538-38-lxd-17 placement-api: 2021-01-14 18:00:21.254 1189350 INFO placement.requestlog [req-3fc0ba8c-d8f8-4c23-8aea-563e9930df99 74f0533fa60b4c03ba0e472b4a951b8c 0ccc5b9ded584deda3683e014cba8b67 - 174a4647a56b4d09bc6ab8952c0105f1 174a4647a56b4d09bc6ab8952c0105f1] 127.0.0.1 "PUT /allocations/f0013171-b3bf-4884-a94e-b627c48b1530" status: 204 len: 0 microversion: 1.28
/var/log/syslog.1:Jan 14 18:00:21 oshv01 nova-compute: 2021-01-14 18:00:21.256 2620747 INFO nova.scheduler.client.report [req-61749f8c-26d0-491a-8935-d8700d7a4de6 - - - - -] Deleted allocation for migration f0013171-b3bf-4884-a94e-b627c48b1530
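
The allocations those placement requests refer to can also be inspected directly with the osc-placement plugin (assuming it is installed), using the migration and instance UUIDs as the consumers:

$ openstack resource provider allocation show f0013171-b3bf-4884-a94e-b627c48b1530
$ openstack resource provider allocation show 1eb3c33c-8c2b-4d28-9050-70ac740ceb34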

Also, for the instance UUID I have:

/var/log/syslog.1:Jan 14 18:00:19 oshv01 nova-compute: 2021-01-14 18:00:19.140 2620747 ERROR nova.compute.manager [req-61749f8c-26d0-491a-8935-d8700d7a4de6 - - - - -] [instance: 1eb3c33c-8c2b-4d28-9050-70ac740ceb34] Post live migration at destination oshv04.maas failed: nova.exception_Remote.NeutronAdminCredentialConfigurationInvalid_Remote: Networking client is experiencing an unauthorized exception.#012Traceback (most recent call last):#012#012 File "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 184, in wrapper#012 ret = obj(*args, **kwargs)#012#012 File "/usr/lib/python3/dist-packages/neutronclient/v2_0/client.py", line 269, in _handle_fault_response#012 exception_handler_v20(status_code, error_body)#012#012 File "/usr/lib/python3/dist-packages/neutronclient/v2_0/client.py", line 91, in exception_handler_v20#012 raise client_exc(message=error_message,#012#012neutronclient.common.exceptions.Unauthorized: 401-{'error': {'code': 401, 'title': 'Unauthorized', 'message': 'The request you have made requires authentication.'}}#012Neutron server returns request_ids: ['req-3d23fb06-e68f-4131-821d-10640141ba6e']#012#012#012During handling of the above exception, another exception occurred:#012#012#012Traceback (most recent call last):#012#012 File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming#012 res = self.dispatcher.dispatch(message)#012#012 File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/dispatcher.py", line 276, in dispatch#012 return self._do_dispatch(endpoint, method, ctxt, args)#012#012 File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/dispatcher.py", line 196, in _do_dispatch#012 result = func(ctxt, **new_args)#012#012 File "/usr/lib/python3/dist-packages/nova/exception_wrapper.py", line 77, in wrapped#012 _emit_exception_notification(#012#012 File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__#012 self.force_reraise()#012#012 File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise#012 six.reraise(self.type_, self.value, self.tb)#012#012 File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise#012 raise value#012#012 File "/usr/lib/python3/dist-packages/nova/exception_wrapper.py", line 69, in wrapped#012 return f(self, context, *args, **kw)#012#012 File "/usr/lib/python3/dist-packages/nova/compute/utils.py", line 1447, in decorated_function#012 return function(self, context, *args, **kwargs)#012#012 File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 216, in decorated_function#012 compute_utils.add_instance_fault_from_exc(context,#012#012 File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__#012 self.force_reraise()#012#012 File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise#012 six.reraise(self.type_, self.value, self.tb)#012#012 File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise#012 raise value#012#012 File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 205, in decorated_function#012 return function(self, context, *args, **kwargs)#012#012 File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 8557, in post_live_migration_at_destination#012 self.network_api.setup_networks_on_host(context, instance,#012#012 File "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 395, in setup_networks_on_host#012 data = self.list_ports(context, **search_opts)#012#012 File 
"/usr/lib/python3/dist-packages/nova/network/neutron.py", line 1800, in list_ports#012 return get_client(context).list_ports(**search_opts)#012#012 File "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 184, in wrapper#012 ret = obj(*args, **kwargs)#012#012 File "/usr/lib/python3/dist-packages/neutronclient/v2_0/client.py", line 797, in list_ports#012 return self.list('ports', self.ports_path, retrieve_all,#012#012 File "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 184, in wrapper#012 ret = obj(*args, **kwargs)#012#012 File "/usr/lib/python3/dist-packages/neutronclient/v2_0/client.py", line 369, in list#012 for r in self._pagination(collection, path, **params):#012#012 File "/usr/lib/python3/dist-packages/neutronclient/v2_0/client.py", line 384, in _pagination#012 res = self.get(path, params=params)#012#012 File "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 184, in wrapper#012 ret = obj(*args, **kwargs)#012#012 File "/usr/lib/python3/dist-packages/neutronclient/v2_0/client.py", line 353, in get#012 return self.retry_request("GET", action, body=body,#012#012 File "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 184, in wrapper#012 ret = obj(*args, **kwargs)#012#012 File "/usr/lib/python3/dist-packages/neutronclient/v2_0/client.py", line 330, in retry_request#012 return self.do_request(method, action, body=body,#012#012 File "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 184, in wrapper#012 ret = obj(*args, **kwargs)#012#012 File "/usr/lib/python3/dist-packages/neutronclient/v2_0/client.py", line 294, in do_request#012 self._handle_fault_response(status_code, replybody, resp)#012#012 File "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 200, in wrapper#012 raise exception.NeutronAdminCredentialConfigurationInvalid()#012#012nova.exception.NeutronAdminCredentialConfigurationInvalid: Networking client is experiencing an unauthorized exception.#0122021-01-14 18:00:19.140 2620747 ERROR nova.compute.manager [instance: 1eb3c33c-8c2b-4d28-9050-70ac740ceb34] Traceback (most recent call last):#0122021-01-14 18:00:19.140 2620747 ERROR nova.compute.manager [instance: 1eb3c33c-8c2b-4d28-9050-70ac740ceb34] File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 8477, in _post_live_migration#0122021-01-14 18:00:19.140 2620747 ERROR nova.compute.manager [instance: 1eb3c33c-8c2b-4d28-9050-70ac740ceb34] self.compute_rpcapi.post_live_migration_at_destination(ctxt,#0122021-01-14 18:00:19.140 2620747 ERROR nova.compute.manager [instance: 1eb3c33c-8c2b-4d28-9050-70ac740ceb34] File "/usr/lib/python3/dist-packages/nova/compute/rpcapi.py", line 915, in post_live_migration_at_destination#0122021-01-14 18:00:19.140 2620747 ERROR nova.compute.manager [instance: 1eb3c33c-8c2b-4d28-9050-70ac740ceb34] return cctxt.call(ctxt, 'post_live_migration_at_destination',#0122021-01-14 18:00:19.140 2620747 ERROR nova.compute.manager [instance: 1eb3c33c-8c2b-4d28-9050-70ac740ceb34] File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 177, in call#0122021-01-14 18:00:19.140 2620747 ERROR nova.compute.manager [instance: 1eb3c33c-8c2b-4d28-9050-70ac740ceb34] self.transport._send(self.target, msg_ctxt, msg,#0122021-01-14 18:00:19.140 2620747 ERROR nova.compute.manager [instance: 1eb3c33c-8c2b-4d28-9050-70ac740ceb34] File "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 124, in _send#0122021-01-14 18:00:19.140 2620747 ERROR nova.compute.manager [instance: 1eb3c33c-8c2b-4d28-9050-70ac740ceb34] return 
self._driver.send(target, ctxt, message,#0122021-01-14 18:00:19.140 2620747 ERROR nova.compute.manager [instance: 1eb3c33c-8c2b-4d28-9050-70ac740ceb34] File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 652, in send#0122021-01-14 18:00:19.140 2620747 ERROR nova.compute.manager [instance: 1eb3c33c-8c2b-4d28-9050-70ac740ceb34] return self._send(target, ctxt, message, wait_for_reply, timeout,#0122021-01-14 18:00:19.140 2620747 ERROR nova.compute.manager [instance: 1eb3c33c-8c2b-4d28-9050-70ac740ceb34] File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 644, in _send#0122021-01-14 18:00:19.140 2620747 ERROR nova.compute.manager [instance: 1eb3c33c-8c2b-4d28-9050-70ac740ceb34] raise result#0122021-01-14 18:00:19.140 2620747 ERROR nova.compute.manager [instance: 1eb3c33c-8c2b-4d28-9050-70ac740ceb34] nova.exception_Re

Revision history for this message
Lee Yarwood (lyarwood) wrote :

tl;dr This smells like a request token timeout issue?

Looking at the provided output and logs I see that the migration attempt was `Created At 2021-01-14T14:20:34.000000` and last `Updated At 2021-01-14T18:00:21.000000`. Buried deep in the logs you provided I see `Networking client is experiencing an unauthorized exception`, which to me suggests the original request token had timed out here. Are you using service tokens in your env?
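
For reference, service tokens are enabled on the nova side via a [service_user] section in nova.conf, along these lines (a sketch only, values illustrative):

[service_user]
send_service_user_token = true
auth_type = password
auth_url = https://keystone.example:5000/v3
username = nova
password = <service password>
project_name = services
user_domain_name = Default
project_domain_name = Default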

Changed in nova:
status: New → Incomplete
Revision history for this message
Jake Hill (routergod) wrote :

Thanks for the suggestion.

I don't think service tokens are in use here. It's a charmed install, and I can't find any evidence of the charms enabling service tokens. I have the keystone token expiry set rather long (10800s), if I remember correctly because of migration timeouts some time back.
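
For clarity, that expiry is just the standard token lifetime in keystone.conf, i.e. something like:

[token]
expiration = 10800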

It seems that restarting the compute service on the source host has resolved the data inconsistency.
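
On an Ubuntu/charm install like this, that restart amounts to something like:

$ ssh oshv01.maas sudo systemctl restart nova-compute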

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired