nova VM migrate fails

Bug #1885371 reported by Vasileios Baousis
Affects   Status   Importance   Assigned to   Milestone
tripleo   New      Undecided    Unassigned

Bug Description

When we try to migrate an instance from one hypervisor to another (with either standard or live migration), nova fails.

Description
===========
Instance migration cannot be performed.

Steps to reproduce
==================

I installed Ussuri with the latest binaries.
[UNDERCLOUD]
$ sudo rpm -qa | grep tripleo
openstack-tripleo-puppet-elements-12.3.1-0.20200527034212.22afbbd.el8.noarch
python3-tripleo-common-12.4.1-0.20200609211945.eff7e96.el8.noarch
ansible-tripleo-ipsec-9.3.0-0.20200521172422.0c8693c.el8.noarch
puppet-tripleo-12.3.1-0.20200611213425.b8568c4.el8.noarch
tripleo-ansible-1.4.1-0.20200619031429.acd760c.el8.noarch
python3-tripleo-repos-0.1.1-0.20200612045415.ecf6206.el8.noarch
ansible-role-tripleo-modify-image-1.2.1-0.20200616023947.4130a44.el8.noarch
ansible-tripleo-ipa-0.2.1-0.20200609201503.c22fc8d.el8.noarch
openstack-tripleo-common-containers-12.4.1-0.20200609211945.eff7e96.el8.noarch
openstack-tripleo-common-12.4.1-0.20200609211945.eff7e96.el8.noarch
openstack-tripleo-heat-templates-12.3.1-0.20200622111001.18baea4.el8.noarch
openstack-tripleo-validations-12.3.1-0.20200609043424.0bc2cad.el8.noarch
python3-tripleoclient-13.3.1-0.20200609124940.19a26f3.el8.noarch
openstack-tripleo-image-elements-12.0.1-0.20200527033931.e144560.el8.noarch

$ sudo rpm -qa | grep openstack
openstack-tripleo-puppet-elements-12.3.1-0.20200527034212.22afbbd.el8.noarch
openstack-heat-common-14.0.1-0.20200521081432.3c77011.el8.noarch
openstack-heat-monolith-14.0.1-0.20200521081432.3c77011.el8.noarch
openstack-ironic-python-agent-builder-2.0.1-0.20200622155937.e9d0443.el8.noarch
puppet-openstack_extras-16.3.1-0.20200528102452.c217890.el8.noarch
puppet-openstacklib-16.3.1-0.20200518081406.ac285a7.el8.noarch
openstack-tripleo-common-containers-12.4.1-0.20200609211945.eff7e96.el8.noarch
openstack-heat-agents-2.0.1-0.20200526185440.b639e78.el8.noarch
openstack-tripleo-common-12.4.1-0.20200609211945.eff7e96.el8.noarch
python3-openstackclient-5.2.0-0.20200604131927.c5719a1.el8.noarch
openstack-heat-api-14.0.1-0.20200521081432.3c77011.el8.noarch
ansible-role-openstack-operations-0.0.1-0.20200507053741.274739e.el8.noarch
openstack-tripleo-heat-templates-12.3.1-0.20200622111001.18baea4.el8.noarch
openstack-tripleo-validations-12.3.1-0.20200609043424.0bc2cad.el8.noarch
python-openstackclient-lang-5.2.0-0.20200604131927.c5719a1.el8.noarch
python3-openstacksdk-0.46.0-0.20200424132926.fc3b3d0.el8.noarch
openstack-tripleo-image-elements-12.0.1-0.20200527033931.e144560.el8.noarch
openstack-selinux-0.8.22-0.20200615172427.137ecf6.el8.noarch
openstack-heat-engine-14.0.1-0.20200521081432.3c77011.el8.noarch

[Compute node] - origin of the instance
[root@compute-0 nova]# rpm -qa | grep tripleo
puppet-tripleo-13.0.0-0.20200610001441.e62b614.el8.noarch
[root@compute-0 nova]# rpm -qa | grep openstack
python3-openstacksdk-0.46.0-0.20200415112501.fc3b3d0.el8.noarch
python3-openstackclient-5.2.0-0.20200604131422.c5719a1.el8.noarch
openstack-heat-agents-2.1.0-0.20200513081051.40429ad.el8.noarch
puppet-openstack_extras-17.0.0-0.20200602173450.2d9c822.el8.noarch
puppet-openstacklib-17.0.0-0.20200602154731.6d39c44.el8.noarch
python-openstackclient-lang-5.2.0-0.20200604131422.c5719a1.el8.noarch
openstack-selinux-0.8.20-0.20200429132018.3300746.el8.noarch

Podman containers on compute node
[root@compute-0 nova]# podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
dd35852b5788 under-ussuri02.ctlplane.DOMAIN-NAME:8787/tripleou/centos-binary-nova-compute:current-tripleo kolla_start 42 hours ago Up 23 hours ago nova_compute
5f4027872495 under-ussuri02.ctlplane.DOMAIN-NAME:8787/tripleou/centos-binary-collectd:current-tripleo kolla_start 42 hours ago Up 23 hours ago collectd
254b4d865b8e under-ussuri02.ctlplane.DOMAIN-NAME:8787/tripleou/centos-binary-nova-compute:current-tripleo kolla_start 42 hours ago Up 23 hours ago nova_migration_target
c24f7bc54064 under-ussuri02.ctlplane.DOMAIN-NAME:8787/tripleou/centos-binary-iscsid:current-tripleo kolla_start 43 hours ago Up 23 hours ago iscsid
ccaa293b0700 under-ussuri02.ctlplane.DOMAIN-NAME:8787/tripleou/centos-binary-nova-libvirt:current-tripleo kolla_start 43 hours ago Up 23 hours ago nova_virtlogd
4c401a029a9f under-ussuri02.ctlplane.DOMAIN-NAME:8787/tripleou/centos-binary-qdrouterd:current-tripleo kolla_start 43 hours ago Up 23 hours ago metrics_qdr
93bef9e27d57 under-ussuri02.ctlplane.DOMAIN-NAME:8787/tripleou/centos-binary-ceilometer-compute:current-tripleo kolla_start 47 hours ago Up 23 hours ago ceilometer_agent_compute
29eccba3f7ec under-ussuri02.ctlplane.DOMAIN-NAME:8787/tripleou/centos-binary-neutron-metadata-agent-ovn:current-tripleo /bin/bash -c HAPR... 2 days ago Up 23 hours ago neutron-haproxy-ovnmeta-fa49b61e-de5a-433f-9749-a48664a660c6
6ab110d743ad under-ussuri02.ctlplane.DOMAIN-NAME:8787/tripleou/centos-binary-neutron-metadata-agent-ovn:current-tripleo kolla_start 2 days ago Up 23 hours ago ovn_metadata_agent
9f57f5987fea under-ussuri02.ctlplane.DOMAIN-NAME:8787/tripleou/centos-binary-ovn-controller:current-tripleo kolla_start 2 days ago Up 23 hours ago ovn_controller
d95c86ff3992 under-ussuri02.ctlplane.DOMAIN-NAME:8787/tripleou/centos-binary-cron:current-tripleo kolla_start 2 days ago Up 23 hours ago logrotate_crond
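
The listing above can be checked mechanically. This is a minimal sketch, not part of the original report: the helper name `missing_containers` is hypothetical, and the container names passed to it are taken from the NAMES column above. It reads container names on stdin (one per line) and reports which of the expected ones are absent.

```shell
# Hypothetical helper: reads container names (one per line) on stdin and
# prints "missing: NAME" for every expected name that is not in the list.
# Returns non-zero if at least one expected container is missing.
missing_containers() {
    names=$(cat)
    rc=0
    for want in "$@"; do
        # -x: match the whole line, -F: treat the name as a fixed string
        if ! printf '%s\n' "$names" | grep -qxF -- "$want"; then
            echo "missing: $want"
            rc=1
        fi
    done
    return "$rc"
}

# Possible usage on the compute node (names taken from the listing above):
#   podman ps --format '{{.Names}}' | missing_containers nova_compute nova_migration_target nova_virtlogd
```

The nova_migration_target container matters here because, in this deployment, it is the one listening on port 2022, the port that appears in the libvirt migration URI in the logs.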

We use the following steps to build our OpenStack Ussuri cluster (with ~25 systems), to work around the known problems with an external Ceph cluster and with Octavia (https://bugs.launchpad.net/tripleo/+bug/1881420):
1. We build the stack only
openstack overcloud deploy --templates ~/templates --stack-only \
 -e environment files
2. We authorise the user heat-admin to all systems with
openstack overcloud admin authorize --overcloud-ssh-user heat-admin --overcloud-ssh-key ~/.ssh/id_rsa
3. We download the config to a directory
openstack overcloud config download --name overcloud --config-dir $OUTPUT_DIR
4. We create the inventory and ansible.cfg
tripleo-ansible-inventory --ansible_ssh_user heat-admin --static-yaml-inventory $OUTPUT_DIR"inventory.yaml"
openstack tripleo config generate ansible --output-dir $OUTPUT_DIR --deployment-user stack
5. We run the ansible-playbook-command.sh in the $OUTPUT_DIR
6. The cluster is created without any obvious problems
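
The steps above can be collected into one script. This is only a sketch under stated assumptions: the function names `run` and `deploy_overcloud` and the `DRY_RUN` variable are hypothetical, the environment files are passed in by the caller as `-e FILE` arguments, and the output directory is assumed to be given without a trailing slash. The commands themselves are the ones listed in steps 1 to 5.

```shell
# Hypothetical wrapper: with DRY_RUN=1 the command is only printed,
# otherwise it is executed as given.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# deploy_overcloud OUTPUT_DIR [-e ENV_FILE ...]
# Runs steps 1-5 in order and stops at the first failing step.
deploy_overcloud() {
    output_dir="$1"
    shift   # the remaining arguments are the "-e <environment file>" pairs
    # 1. build the stack only
    run openstack overcloud deploy --templates ~/templates --stack-only "$@" &&
    # 2. authorise heat-admin on all systems
    run openstack overcloud admin authorize --overcloud-ssh-user heat-admin --overcloud-ssh-key ~/.ssh/id_rsa &&
    # 3. download the config
    run openstack overcloud config download --name overcloud --config-dir "$output_dir" &&
    # 4. create the inventory and the Ansible configuration
    run tripleo-ansible-inventory --ansible_ssh_user heat-admin --static-yaml-inventory "$output_dir/inventory.yaml" &&
    run openstack tripleo config generate ansible --output-dir "$output_dir" --deployment-user stack &&
    # 5. run the generated playbook command from the output directory
    (cd "$output_dir" && run ./ansible-playbook-command.sh)
}
```

Running it first with `DRY_RUN=1 deploy_overcloud "$OUTPUT_DIR" -e some-env.yaml` prints the six commands without touching the undercloud, which makes it easier to compare the exact invocation with the one used for this report.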

Expected result
===============
The instance is migrated to another hypervisor host.

Actual result
=============
I tried to live migrate a VM from Horizon and it failed.
I tried to live migrate from the CLI and it failed as well.
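
The report does not show the exact CLI command that was used. As a rough sketch, a live migration can be requested with python-openstackclient as below. This assumes the `--live-migration` flag of `openstack server migrate` is available in the installed client version; the helper name `live_migrate` and the `OPENSTACK` override variable are hypothetical and only exist to allow a dry run.

```shell
# Hypothetical helper: ask nova to live migrate a server and let the
# scheduler pick the destination host.
# Set OPENSTACK=echo to print the arguments instead of calling the client.
live_migrate() {
    server="$1"
    "${OPENSTACK:-openstack}" server migrate --live-migration "$server"
}

# Possible usage, with the instance ID from the logs in this report:
#   live_migrate acd052d7-b65e-462e-88ee-466fd8a03df0
```

After the request, `openstack server show <server>` can be used to see whether the instance changed host or returned to ACTIVE on the original one.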

Environment
===========
1. OpenStack Ussuri with the latest binaries, as above
2. External Ceph: Nautilus, latest version
3. Networking: OVN

Logs & Configs
==============
2020-06-27 12:22:11.465 7 INFO nova.compute.manager [-] [instance: acd052d7-b65e-462e-88ee-466fd8a03df0] Took 2.08 seconds for pre_live_migration on destination host compute-1.DOMAIN-NAME.
2020-06-27 12:22:13.148 7 ERROR nova.virt.libvirt.driver [-] [instance: acd052d7-b65e-462e-88ee-466fd8a03df0] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+ssh://<email address hidden>:2022/system?keyfile=/etc/nova/migration/identity: Cannot recv data: Could not create directory '/root/.ssh'.^M
"System is booting up. Unprivileged users are not permitted to log in yet. Please come back later. For technical details, see pam_nologin(8)."
Connection closed by 10.158.3.189 port 2022: Connection reset by peer: libvirt.libvirtError: operation failed: Failed to connect to remote libvirt URI qemu+ssh://<email address hidden>:2022/system?keyfile=/etc/nova/migration/identity: Cannot recv data: Could not create directory '/root/.ssh'.^M
2020-06-27 12:22:13.536 7 ERROR nova.virt.libvirt.driver [-] [instance: acd052d7-b65e-462e-88ee-466fd8a03df0] Migration operation has aborted
2020-06-27 12:22:13.555 7 INFO nova.compute.manager [-] [instance: acd052d7-b65e-462e-88ee-466fd8a03df0] Swapping old allocation on dict_keys(['87ec1aef-9d78-46d7-9767-4ff633d719d1']) held by migration f77d314e-67ed-45ba-a07e-54b445f4cfa7 for instance
2020-06-27 12:22:14.918 7 WARNING nova.compute.manager [req-35309488-77a3-43c7-8908-a2df5ba23a4b 0f76e7d44f584c5080a07f37219e2dac b42c617cd8724b3aa7e0cf6fdc3aad39 - default default] [instance: acd052d7-b65e-462e-88ee-466fd8a03df0] Received unexpected event network-vif-unplugged-8b81b46b-b726-40dc-a78d-648dd118351c for instance with vm_state active and task_state None.
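
The "System is booting up ... see pam_nologin(8)" text in the error above is the kind of message that pam_nologin shows to non-root users while a nologin marker file exists. According to pam_nologin(8), the module looks at /etc/nologin and /var/run/nologin (on current systems /var/run usually points to /run). A minimal sketch for checking this on the destination is shown below. The helper name `find_nologin_markers` is hypothetical, and whether the marker is seen inside the nova_migration_target container or comes from the host is an assumption that this report does not confirm.

```shell
# Hypothetical helper: print any nologin marker files found under ROOT
# (default: the real filesystem root). Paths follow pam_nologin(8).
# Returns 0 if at least one marker exists, 1 otherwise.
find_nologin_markers() {
    root="${1:-}"
    found=1
    for f in /etc/nologin /run/nologin /var/run/nologin; do
        if [ -e "${root}${f}" ]; then
            echo "${root}${f}"
            found=0
        fi
    done
    return "$found"
}

# Possible usage on the destination compute node:
#   find_nologin_markers
# and, for the container that serves port 2022:
#   podman exec nova_migration_target ls -l /etc/nologin /run/nologin
```

If a marker file is reported, its content is the message that gets printed at login, which can be compared with the text in the libvirt error.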
