live migration doesn't work with ssh

Bug #1904393 reported by Cyril Lopez
This bug affects 1 person
Affects      Status   Importance   Assigned to   Milestone
Packstack    New      Undecided    Unassigned

Bug Description

By default, live migration is based on SSH:
./hieradata/common.yaml:NOVA_MIGRATION_KEY_TYPE: ssh-rsa

In the Packstack Puppet module we write this .ssh/config file:
Host *
    User nova_migration
    UserKnownHostsFile /dev/null
    IdentityFile /etc/nova/migration/identity

but one parameter is missing: StrictHostKeyChecking no

If it is not set, live migration fails with:
2020-11-16 03:31:00.484 23472 ERROR nova.virt.libvirt.driver [-] [instance: 7116abfe-6804-4404-9902-29d6f44ab785] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+ssh://<email address hidden>/system?keyfile=/etc/nova/migration/identity: Cannot recv data: Host key verification failed.: Connection reset by peer: libvirt.libvirtError: operation failed: Failed to connect to remote libvirt URI qemu+ssh://<email address hidden>/system?keyfile=/etc/nova/migration/identity: Cannot recv data: Host key verification failed.: Connection reset by peer
2020-11-16 03:31:00.907 23472 ERROR nova.virt.libvirt.driver [-] [instance: 7116abfe-6804-4404-9902-29d6f44ab785] Migration operation has aborted
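
Until the module is fixed, the missing option described above could be added by hand. The following is a minimal sketch, not Packstack code: the function name is made up for illustration, and the config path is passed in as an argument because the report only names the file as .ssh/config. It appends the option at the end of the file, which is only correct when the last block is the "Host *" block shown above. Note that, together with UserKnownHostsFile /dev/null, this setting disables host key verification for the matching hosts.

```shell
#!/bin/sh
# Sketch: append "StrictHostKeyChecking no" to an ssh client config if absent.
# ensure_no_strict_host_key_checking is a hypothetical helper name.
ensure_no_strict_host_key_checking() {
    config_file="$1"

    # Nothing to do if the option is already set (ssh_config keywords are
    # case-insensitive and may be followed by whitespace or "=").
    if grep -qi '^[[:space:]]*StrictHostKeyChecking[[:space:]=]' "$config_file"; then
        return 0
    fi

    # Make sure the file ends with a newline before appending.
    if [ -n "$(tail -c 1 "$config_file")" ]; then
        printf '\n' >> "$config_file"
    fi

    # Indented so that it reads as part of the trailing "Host *" block.
    printf '    StrictHostKeyChecking no\n' >> "$config_file"
}
```

It would be called with the path of the config file used by the migration user, for example `ensure_no_strict_host_key_checking /path/to/.ssh/config`. Running it twice leaves a single copy of the option.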

Cyril Lopez (cylopez) wrote :

In addition, Packstack configures the SSH client for the nova user, but the migration runs as nova_migration.

To make it work:
mkdir /var/lib/nova_migration
cp -r /var/lib/nova/.ssh/ /var/lib/nova_migration/
chown nova_migration: /var/lib/nova_migration/ -R

Cyril Lopez (cylopez) wrote :

From :
2020-11-16 04:35:19.689 23472 ERROR nova.virt.libvirt.driver [-] [instance: 7116abfe-6804-4404-9902-29d6f44ab785] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+ssh://<email address hidden>/system?keyfile=/etc/nova/migration/identity: Cannot recv data: Host key verification failed.: Connection reset by peer: libvirt.libvirtError: operation failed: Failed to connect to remote libvirt URI qemu+ssh://<email address hidden>/system?keyfile=/etc/nova/migration/identity: Cannot recv data: Host key verification failed.: Connection reset by peer
2020-11-16 04:35:20.120 23472 ERROR nova.virt.libvirt.driver [-] [instance: 7116abfe-6804-4404-9902-29d6f44ab785] Migration operation has aborted

To :
2020-11-16 05:11:08.281 23472 INFO nova.compute.manager [-] [instance: 7116abfe-6804-4404-9902-29d6f44ab785] During sync_power_state the instance has a pending task (migrating). Skip.
2020-11-16 05:11:14.049 23472 INFO nova.compute.manager [-] [instance: 7116abfe-6804-4404-9902-29d6f44ab785] Took 7.85 seconds for pre_live_migration on destination host icare-cmp01.local.
2020-11-16 05:11:15.195 23472 INFO nova.virt.libvirt.migration [-] [instance: 7116abfe-6804-4404-9902-29d6f44ab785] Increasing downtime to 50 ms after 0 sec elapsed time
2020-11-16 05:11:15.396 23472 INFO nova.virt.libvirt.driver [-] [instance: 7116abfe-6804-4404-9902-29d6f44ab785] Migration running for 0 secs, memory 100% remaining (bytes processed=0, remaining=0, total=0); disk 100% remaining (bytes processed=0, remaining=0, total=0).
2020-11-16 05:11:26.628 23472 INFO nova.compute.manager [req-5e454d77-0bc5-423d-bc72-828344672e52 - - - - -] [instance: 7116abfe-6804-4404-9902-29d6f44ab785] VM Paused (Lifecycle Event)
2020-11-16 05:11:26.924 23472 INFO nova.virt.libvirt.driver [-] [instance: 7116abfe-6804-4404-9902-29d6f44ab785] Migration operation has completed
2020-11-16 05:11:26.924 23472 INFO nova.virt.libvirt.driver [-] [instance: 7116abfe-6804-4404-9902-29d6f44ab785] Migration operation has completed

Cyril Lopez (cylopez) wrote :

I forgot to mention this:

sed 's#Migration:/:#Migration:/var/lib/nova_migration:#' -i /etc/passwd
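
Since the command above edits /etc/passwd in place, it may be worth previewing what it would change first. A minimal sketch, assuming the same substitution pattern; the function name is hypothetical:

```shell
#!/bin/sh
# Sketch: print the passwd lines that the substitution above would rewrite,
# without modifying the file. preview_migration_home is a hypothetical name.
preview_migration_home() {
    passwd_file="$1"
    # -n suppresses normal output; the trailing "p" prints only changed lines.
    sed -n 's#Migration:/:#Migration:/var/lib/nova_migration:#p' "$passwd_file"
}
```

`preview_migration_home /etc/passwd` should print a single line, the nova_migration entry with its new home directory. The same home-directory change can also be made with the standard tool instead of sed: `usermod -d /var/lib/nova_migration nova_migration`.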

Javier Peña (jpena-c) wrote :

I have managed to migrate some instances, including block migration, just by applying https://review.opendev.org/762996, but it is not 100% consistent (it works from host A to host B, but not the other way).

I'm not sure where the error is in this case (need to test more), but I think it's not related to the nova_migration user configuration.

Cyril Lopez (cylopez) wrote :

Hi Javier, thanks for taking care of this.

My setup is one controller + compute node and two compute nodes, all deployed with a single packstack command line. I hit the issue on every compute node.

My first issue was indeed StrictHostKeyChecking and the SSH key.

At that point I could run ssh manually as the "nova" user as expected, but the issue was still there. I then understood that the user initiating the SSH session was not nova but nova_migration, so nova_migration needs the same configuration as nova: access to the key and the .ssh/config.

BR
Cyril

Javier Peña (jpena-c) wrote :

Hi Cyril,

I've run a second test, with the latest master Packstack code + https://review.opendev.org/762996.

My setup was one controller/compute and one compute. This time I used OVS instead of OVN, and I managed to live-migrate Cinder-backed and ephemeral-backed VMs back and forth.

Did you find the issue on any stable release? If so, let me know so I can test that, and see if it's fixed elsewhere.

Javier
