When we try to migrate an instance from one hypervisor to another (either standard or live migration), nova fails.
Description
===========
Instance migration cannot be performed.
Steps to reproduce
==================
I installed Ussuri with the latest binaries.
[UNDERCLOUD]
sudo rpm -qa | grep tripleo
openstack-tripleo-puppet-elements-12.3.1-0.20200527034212.22afbbd.el8.noarch
python3-tripleo-common-12.4.1-0.20200609211945.eff7e96.el8.noarch
ansible-tripleo-ipsec-9.3.0-0.20200521172422.0c8693c.el8.noarch
puppet-tripleo-12.3.1-0.20200611213425.b8568c4.el8.noarch
tripleo-ansible-1.4.1-0.20200619031429.acd760c.el8.noarch
python3-tripleo-repos-0.1.1-0.20200612045415.ecf6206.el8.noarch
ansible-role-tripleo-modify-image-1.2.1-0.20200616023947.4130a44.el8.noarch
ansible-tripleo-ipa-0.2.1-0.20200609201503.c22fc8d.el8.noarch
openstack-tripleo-common-containers-12.4.1-0.20200609211945.eff7e96.el8.noarch
openstack-tripleo-common-12.4.1-0.20200609211945.eff7e96.el8.noarch
openstack-tripleo-heat-templates-12.3.1-0.20200622111001.18baea4.el8.noarch
openstack-tripleo-validations-12.3.1-0.20200609043424.0bc2cad.el8.noarch
python3-tripleoclient-13.3.1-0.20200609124940.19a26f3.el8.noarch
openstack-tripleo-image-elements-12.0.1-0.20200527033931.e144560.el8.noarch
$ sudo rpm -qa | grep openstack
openstack-tripleo-puppet-elements-12.3.1-0.20200527034212.22afbbd.el8.noarch
openstack-heat-common-14.0.1-0.20200521081432.3c77011.el8.noarch
openstack-heat-monolith-14.0.1-0.20200521081432.3c77011.el8.noarch
openstack-ironic-python-agent-builder-2.0.1-0.20200622155937.e9d0443.el8.noarch
puppet-openstack_extras-16.3.1-0.20200528102452.c217890.el8.noarch
puppet-openstacklib-16.3.1-0.20200518081406.ac285a7.el8.noarch
openstack-tripleo-common-containers-12.4.1-0.20200609211945.eff7e96.el8.noarch
openstack-heat-agents-2.0.1-0.20200526185440.b639e78.el8.noarch
openstack-tripleo-common-12.4.1-0.20200609211945.eff7e96.el8.noarch
python3-openstackclient-5.2.0-0.20200604131927.c5719a1.el8.noarch
openstack-heat-api-14.0.1-0.20200521081432.3c77011.el8.noarch
ansible-role-openstack-operations-0.0.1-0.20200507053741.274739e.el8.noarch
openstack-tripleo-heat-templates-12.3.1-0.20200622111001.18baea4.el8.noarch
openstack-tripleo-validations-12.3.1-0.20200609043424.0bc2cad.el8.noarch
python-openstackclient-lang-5.2.0-0.20200604131927.c5719a1.el8.noarch
python3-openstacksdk-0.46.0-0.20200424132926.fc3b3d0.el8.noarch
openstack-tripleo-image-elements-12.0.1-0.20200527033931.e144560.el8.noarch
openstack-selinux-0.8.22-0.20200615172427.137ecf6.el8.noarch
openstack-heat-engine-14.0.1-0.20200521081432.3c77011.el8.noarch
[Compute node] -Origin of the instance
[root@compute-0 nova]# rpm -qa | grep tripleo
puppet-tripleo-13.0.0-0.20200610001441.e62b614.el8.noarch
[root@compute-0 nova]# rpm -qa | grep openstack
python3-openstacksdk-0.46.0-0.20200415112501.fc3b3d0.el8.noarch
python3-openstackclient-5.2.0-0.20200604131422.c5719a1.el8.noarch
openstack-heat-agents-2.1.0-0.20200513081051.40429ad.el8.noarch
puppet-openstack_extras-17.0.0-0.20200602173450.2d9c822.el8.noarch
puppet-openstacklib-17.0.0-0.20200602154731.6d39c44.el8.noarch
python-openstackclient-lang-5.2.0-0.20200604131422.c5719a1.el8.noarch
openstack-selinux-0.8.20-0.20200429132018.3300746.el8.noarch
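The two package listings above differ in places (for example puppet-tripleo and openstack-selinux). Whether that matters for this failure is not established by the report. As a minimal sketch for comparing such listings, assuming each `rpm -qa` output has been saved to a text file (the helper name `pkg_only_in` and the file names are hypothetical):

```shell
#!/bin/sh
# pkg_only_in A B: print the lines of file A that do not appear verbatim in file B.
# -F treats each line of B as a fixed string, -x requires a whole-line match,
# -v inverts the match so only the lines missing from B are printed.
pkg_only_in() {
    grep -Fxv -f "$2" "$1" || true   # grep exits 1 when nothing differs; not an error here
}

# Possible usage (commented out; the file names are placeholders):
#   rpm -qa | grep -E 'tripleo|openstack' > undercloud.txt   # on the undercloud
#   rpm -qa | grep -E 'tripleo|openstack' > compute.txt      # on the compute node
#   pkg_only_in undercloud.txt compute.txt
#   pkg_only_in compute.txt undercloud.txt
```

Running it in both directions shows which exact package builds exist on only one side.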
Podman containers on compute node
[root@compute-0 nova]# podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
dd35852b5788 under-ussuri02.ctlplane.DOMAIN-NAME:8787/tripleou/centos-binary-nova-compute:current-tripleo kolla_start 42 hours ago Up 23 hours ago nova_compute
5f4027872495 under-ussuri02.ctlplane.DOMAIN-NAME:8787/tripleou/centos-binary-collectd:current-tripleo kolla_start 42 hours ago Up 23 hours ago collectd
254b4d865b8e under-ussuri02.ctlplane.DOMAIN-NAME:8787/tripleou/centos-binary-nova-compute:current-tripleo kolla_start 42 hours ago Up 23 hours ago nova_migration_target
c24f7bc54064 under-ussuri02.ctlplane.DOMAIN-NAME:8787/tripleou/centos-binary-iscsid:current-tripleo kolla_start 43 hours ago Up 23 hours ago iscsid
ccaa293b0700 under-ussuri02.ctlplane.DOMAIN-NAME:8787/tripleou/centos-binary-nova-libvirt:current-tripleo kolla_start 43 hours ago Up 23 hours ago nova_virtlogd
4c401a029a9f under-ussuri02.ctlplane.DOMAIN-NAME:8787/tripleou/centos-binary-qdrouterd:current-tripleo kolla_start 43 hours ago Up 23 hours ago metrics_qdr
93bef9e27d57 under-ussuri02.ctlplane.DOMAIN-NAME:8787/tripleou/centos-binary-ceilometer-compute:current-tripleo kolla_start 47 hours ago Up 23 hours ago ceilometer_agent_compute
29eccba3f7ec under-ussuri02.ctlplane.DOMAIN-NAME:8787/tripleou/centos-binary-neutron-metadata-agent-ovn:current-tripleo /bin/bash -c HAPR... 2 days ago Up 23 hours ago neutron-haproxy-ovnmeta-fa49b61e-de5a-433f-9749-a48664a660c6
6ab110d743ad under-ussuri02.ctlplane.DOMAIN-NAME:8787/tripleou/centos-binary-neutron-metadata-agent-ovn:current-tripleo kolla_start 2 days ago Up 23 hours ago ovn_metadata_agent
9f57f5987fea under-ussuri02.ctlplane.DOMAIN-NAME:8787/tripleou/centos-binary-ovn-controller:current-tripleo kolla_start 2 days ago Up 23 hours ago ovn_controller
d95c86ff3992 under-ussuri02.ctlplane.DOMAIN-NAME:8787/tripleou/centos-binary-cron:current-tripleo kolla_start 2 days ago Up 23 hours ago logrotate_crond
We use the following steps to build our OpenStack Ussuri cluster (about 25 systems), in order to work around the known problems with an external Ceph cluster ( and Octavia (https://bugs.launchpad.net/tripleo/+bug/1881420):
1. We build the stack only
openstack overcloud deploy --templates ~/templates --stack-only \
-e environment files
2. We authorise the user heat-admin to all systems with
openstack overcloud admin authorize --overcloud-ssh-user heat-admin --overcloud-ssh-key ~/.ssh/id_rsa
3. We download the config to a directory
openstack overcloud config download --name overcloud --config-dir $OUTPUT_DIR
4. We create the inventory and ansible.cfg
tripleo-ansible-inventory --ansible_ssh_user heat-admin --static-yaml-inventory $OUTPUT_DIR"inventory.yaml"
openstack tripleo config generate ansible --output-dir $OUTPUT_DIR --deployment-user stack
5. We run the ansible-playbook-command.sh in the $OUTPUT_DIR
6. Cluster is created without any obvious problems
Expected result
===============
The instance is migrated to another hypervisor host.
Actual result
=============
I tried to live migrate a VM from Horizon and it failed.
I tried to live migrate from the CLI and it failed as well.
Environment
===========
1. OpenStack Ussuri with the latest binaries, as listed above
2. External Ceph: latest Nautilus release
3. Networking: OVN
Logs & Configs
==============
2020-06-27 12:22:11.465 7 INFO nova.compute.manager [-] [instance: acd052d7-b65e-462e-88ee-466fd8a03df0] Took 2.08 seconds for pre_live_migration on destination host compute-1.DOMAIN-NAME.
2020-06-27 12:22:13.148 7 ERROR nova.virt.libvirt.driver [-] [instance: acd052d7-b65e-462e-88ee-466fd8a03df0] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+ssh://<email address hidden>:2022/system?keyfile=/etc/nova/migration/identity: Cannot recv data: Could not create directory '/root/.ssh'.^M
"System is booting up. Unprivileged users are not permitted to log in yet. Please come back later. For technical details, see pam_nologin(8)."
Connection closed by 10.158.3.189 port 2022: Connection reset by peer: libvirt.libvirtError: operation failed: Failed to connect to remote libvirt URI qemu+ssh://<email address hidden>:2022/system?keyfile=/etc/nova/migration/identity: Cannot recv data: Could not create directory '/root/.ssh'.^M
2020-06-27 12:22:13.536 7 ERROR nova.virt.libvirt.driver [-] [instance: acd052d7-b65e-462e-88ee-466fd8a03df0] Migration operation has aborted
2020-06-27 12:22:13.555 7 INFO nova.compute.manager [-] [instance: acd052d7-b65e-462e-88ee-466fd8a03df0] Swapping old allocation on dict_keys(['87ec1aef-9d78-46d7-9767-4ff633d719d1']) held by migration f77d314e-67ed-45ba-a07e-54b445f4cfa7 for instance
2020-06-27 12:22:14.918 7 WARNING nova.compute.manager [req-35309488-77a3-43c7-8908-a2df5ba23a4b 0f76e7d44f584c5080a07f37219e2dac b42c617cd8724b3aa7e0cf6fdc3aad39 - default default] [instance: acd052d7-b65e-462e-88ee-466fd8a03df0] Received unexpected event network-vif-unplugged-8b81b46b-b726-40dc-a78d-648dd118351c for instance with vm_state active and task_state None.
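The sshd on port 2022 answers with the pam_nologin message in the log above. That message normally appears while /run/nologin or /etc/nologin exists in the environment where sshd runs. A minimal sketch for checking this on the destination host, assuming the migration sshd runs inside the nova_migration_target container shown in the `podman ps` output (an assumption; the helper name `find_nologin` is hypothetical):

```shell
#!/bin/sh
# find_nologin ROOT: print each nologin marker file that exists under ROOT.
# pam_nologin(8) denies non-root logins while one of these files is present.
# /var/run is usually a symlink to /run, so it is not checked separately.
find_nologin() {
    root="${1:-}"
    for f in /etc/nologin /run/nologin; do
        if [ -e "${root}${f}" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Possible usage on the destination compute node (commented out):
#   find_nologin ""                                        # check the host itself
#   podman exec nova_migration_target ls -l /run/nologin /etc/nologin
```

An empty result from the helper means neither marker file exists under the given root. If a file is listed, its contents are the message that sshd shows to the connecting user.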
Looks like a duplicate of https://bugs.launchpad.net/tripleo/+bug/1881642