Octavia amphora VMs fail to migrate (scp fails for disk.config file)

Bug #1933981 reported by Steven Parker
Affects: OpenStack nova-compute charm
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Octavia amphora VM migration fails on a newly deployed Octavia install.

Non-amphora VMs seem to migrate without any issues.

The relevant error reported from nova-compute is

2021-02-11 18:46:16.194 898529 ERROR nova.compute.manager [instance: e3967a91-3f5d-45b3-9300-28f8ee936fa3] Command: scp -r HostOne.nonprod.maas:/var/lib/nova/instances/e3967a91-3f5d-45b3-9300-28f8ee936fa3/disk.config /var/lib/nova/instances/e3967a91-3f5d-45b3-9300-28f8ee936fa3
2021-02-11 18:46:16.194 898529 ERROR nova.compute.manager [instance: e3967a91-3f5d-45b3-9300-28f8ee936fa3] Exit code: 1
2021-02-11 18:46:16.194 898529 ERROR nova.compute.manager [instance: e3967a91-3f5d-45b3-9300-28f8ee936fa3] Stdout: ''
2021-02-11 18:46:16.194 898529 ERROR nova.compute.manager [instance: e3967a91-3f5d-45b3-9300-28f8ee936fa3] Stderr: 'Host key verification failed.\r\n'

Steps to reproduce:
  On an OpenStack Stein cloud with Octavia installed, simply deploy a load balancer and try to migrate the amphora instance to another node (a minimal sketch follows).
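
A minimal sketch of that reproduction, assuming admin credentials; the subnet ID and target hypervisor are placeholders, and the exact live-migration flags vary with the client version:

  openstack loadbalancer create --name lb1 --vip-subnet-id <subnet-id>
  # note the compute_id (nova server UUID) of the new amphora
  openstack loadbalancer amphora list
  # attempt to live-migrate that server to another hypervisor
  openstack server migrate --live <target-host> <amphora-compute-id>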

Versions used

dpkg -l | grep nova
ii nova-common 2:19.1.0-0ubuntu1~cloud0 all OpenStack Compute - common files
ii nova-compute 2:19.1.0-0ubuntu1~cloud0 all OpenStack Compute - compute node base
ii nova-compute-kvm 2:19.1.0-0ubuntu1~cloud0 all OpenStack Compute - compute node (KVM)
ii nova-compute-libvirt 2:19.1.0-0ubuntu1~cloud0 all OpenStack Compute - compute node libvirt support
ii python3-nova 2:19.1.0-0ubuntu1~cloud0 all OpenStack Compute Python 3 libraries
ii python3-novaclient 2:13.0.0-0ubuntu1~cloud0 all client library for OpenStack Compute API - 3.x

Workaround

I can log in to the target machine and do this to get it to work:
ubuntu@HostTwo:~$ sudo -u nova scp -r HostOne.pnp.maas:/var/lib/nova/instances/7c92fefb-61a7-41e0-a224-eb52a6a24f12/disk.config test.config
The authenticity of host 'cmooschstupOne.pnp.maas (10.55.33.148)' can't be established.
ECDSA key fingerprint is SHA256:XXXXXXXXX
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hostOne.pnp.maas,10.55.33.148' (ECDSA) to the list of known hosts.

    and now it works fine
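
A possible way to pre-populate the host key for the nova user without the interactive prompt (a hedged sketch, not the charm's own mechanism; assumes the nova user's home is /var/lib/nova and ECDSA host keys):

  # on the destination hypervisor
  sudo -u nova sh -c 'ssh-keyscan -t ecdsa HostOne.pnp.maas >> /var/lib/nova/.ssh/known_hosts'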

There is a somewhat unique execution path here: that disk.config file holds the metadata that would otherwise probably be served to other "regular" VMs via the metadata service.

We do not see that scp action for VMs that are not amphorae on our cloud, nor do we see that file under /var/lib/nova/instances/INSTANCE for instances other than amphorae.

Thanks,
  Steven

Revision history for this message
Steven Parker (sbparke) wrote :

Traceback as seen in the nova-compute logs:

2021-02-11 18:46:16.194 898529 ERROR nova.compute.manager [instance: e3967a91-3f5d-45b3-9300-28f8ee936fa3] Traceback (most recent call last):
2021-02-11 18:46:16.194 898529 ERROR nova.compute.manager [instance: e3967a91-3f5d-45b3-9300-28f8ee936fa3] File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 6722, in _do_live_migration
2021-02-11 18:46:16.194 898529 ERROR nova.compute.manager [instance: e3967a91-3f5d-45b3-9300-28f8ee936fa3] block_migration, disk, dest, migrate_data)
2021-02-11 18:46:16.194 898529 ERROR nova.compute.manager [instance: e3967a91-3f5d-45b3-9300-28f8ee936fa3] File "/usr/lib/python3/dist-packages/nova/compute/rpcapi.py", line 770, in pre_live_migration
2021-02-11 18:46:16.194 898529 ERROR nova.compute.manager [instance: e3967a91-3f5d-45b3-9300-28f8ee936fa3] disk=disk, migrate_data=migrate_data)
2021-02-11 18:46:16.194 898529 ERROR nova.compute.manager [instance: e3967a91-3f5d-45b3-9300-28f8ee936fa3] File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 178, in call
2021-02-11 18:46:16.194 898529 ERROR nova.compute.manager [instance: e3967a91-3f5d-45b3-9300-28f8ee936fa3] retry=self.retry)
2021-02-11 18:46:16.194 898529 ERROR nova.compute.manager [instance: e3967a91-3f5d-45b3-9300-28f8ee936fa3] File "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 128, in _send
2021-02-11 18:46:16.194 898529 ERROR nova.compute.manager [instance: e3967a91-3f5d-45b3-9300-28f8ee936fa3] retry=retry)
2021-02-11 18:46:16.194 898529 ERROR nova.compute.manager [instance: e3967a91-3f5d-45b3-9300-28f8ee936fa3] File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 645, in send
2021-02-11 18:46:16.194 898529 ERROR nova.compute.manager [instance: e3967a91-3f5d-45b3-9300-28f8ee936fa3] call_monitor_timeout, retry=retry)
2021-02-11 18:46:16.194 898529 ERROR nova.compute.manager [instance: e3967a91-3f5d-45b3-9300-28f8ee936fa3] File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 636, in _send
2021-02-11 18:46:16.194 898529 ERROR nova.compute.manager [instance: e3967a91-3f5d-45b3-9300-28f8ee936fa3] raise result
2021-02-11 18:46:16.194 898529 ERROR nova.compute.manager [instance: e3967a91-3f5d-45b3-9300-28f8ee936fa3] oslo_messaging.rpc.client.RemoteError: Remote error: ProcessExecutionError Unexpected error while running command.
2021-02-11 18:46:16.194 898529 ERROR nova.compute.manager [instance: e3967a91-3f5d-45b3-9300-28f8ee936fa3] Command: scp -r host:/var/lib/nova/instances/e3967a91-3f5d-45b3-9300-28f8ee936fa3/disk.config /var/lib/nova/instances/e3967a91-3f5d-45b3-9300-28f8ee936fa3
2021-02-11 18:46:16.194 898529 ERROR nova.compute.manager [instance: e3967a91-3f5d-45b3-9300-28f8ee936fa3] Exit code: 1
2021-02-11 18:46:16.194 898529 ERROR nova.compute.manager [instance: e3967a91-3f5d-45b3-9300-28f8ee936fa3] Stdout: ''
2021-02-11 18:46:16.194 898529 ERROR nova.compute.manager [instance: e3967a91-3f5d-45b3-9300-28f8ee936fa3] Stderr: 'Host key verification failed.\r\n'
2021-02-11 18:46:16.194 898529 ERROR nova.compute.manager [instance: e3967a91-3f5d-45b3-9300-28f8ee936fa3] ['Traceback (most recent call last):\n', ...


Revision history for this message
Steven Parker (sbparke) wrote :

This was originally filed against nova itself, but it is really a deployment issue.
https://bugs.launchpad.net/nova/+bug/1915441

summary: Octavia amphora VMs fail to migrate (scp fails for disk.config file)
tags: added: advocate developer
tags: added: developer-advocate
removed: advocate developer
tags: added: openstack-advocacy
removed: developer-advocate
Revision history for this message
Billy Olsen (billy-olsen) wrote :

The known_hosts files are propagated from nova-cloud-controller to the nova-compute hosts over the relation data; however, they are only shared among units of the same nova-compute application. E.g. if you have nova-compute-1 and nova-compute-2 as applications, the SSH known hosts information is shared only between the units of nova-compute-1 and, separately, between the units of nova-compute-2.

To determine what's going on here, more information is needed.

What is the nova-compute application layout? Are these migrations occurring between hypervisors within the same nova-compute application? Can you provide unit logs and possibly relation data between nova-cloud-controller and nova-compute? Are you able to migrate other instances between the same two hypervisors?
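
A hedged sketch of how that information could be collected, assuming a Juju 2.x client (juju show-unit prints the relation data the unit sees, including any known_hosts entries):

  # unit log from one of the compute nodes
  juju debug-log --replay --include nova-compute/0 > nova-compute-0.log
  # relation data between nova-cloud-controller and nova-compute as seen by this unit
  juju show-unit nova-compute/0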

Changed in charm-nova-compute:
status: New → Incomplete
Revision history for this message
Frode Nordahl (fnordahl) wrote :

Since we are talking about Octavia amphora VMs, it is also worth noting that the Octavia service itself supports setting up load balancers using two VMs in an ACTIVE-STANDBY configuration.

Upstream guidance for life-cycle management of these VMs also relies on this mode of deployment, i.e. you don't update the software of a running amphora; instead you update the image and fail over the running load balancer to a newly deployed amphora instance.

So while migration between named nova-compute applications is an issue to be resolved, I would argue that the use case for this with Octavia amphora VMs is less valid. I would urge you to use the built-in redundancy in the service itself instead.
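
For reference, that failover can typically be triggered from the Octavia CLI (command names from python-octaviaclient; availability of the amphora variant depends on the release in use):

  # fail over the whole load balancer to fresh amphorae
  openstack loadbalancer failover <load-balancer-id>
  # or fail over a single amphora
  openstack loadbalancer amphora failover <amphora-id>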

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack nova-compute charm because there has been no activity for 60 days.]

Changed in charm-nova-compute:
status: Incomplete → Expired