live-migration fails because of "Host key verification failed"

Bug #1624997 reported by KC Bi
This bug affects 2 people
Affects                                  Status       Importance   Assigned to   Milestone
Landscape Server                         New          Undecided    Unassigned
OpenStack Nova Compute Charm             Incomplete   Medium       Unassigned
nova-compute (Juju Charms Collection)    Invalid      Medium       Unassigned

Bug Description

OpenStack Mitaka deployed through Autopilot by following http://www.ubuntu.com/download/cloud

While migrating an instance between two nova-compute nodes, the following error is captured in the log:

2016-09-19 05:50:37.767 5987 ERROR nova.virt.libvirt.driver [req-1a482106-cd47-484a-bf1e-5e1c96dc2009 3473024639d04768a780d4e3b0d92438 72a68c40f9aa4a8b87100824ea708c2a - - -] [instance: 2f145635-bb76-4554-8385-861eb4fc849b] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+ssh://CA29/system: Cannot recv data: Host key verification failed.: Connection reset by peer
2016-09-19 05:50:37.768 5987 DEBUG nova.virt.libvirt.driver [req-1a482106-cd47-484a-bf1e-5e1c96dc2009 3473024639d04768a780d4e3b0d92438 72a68c40f9aa4a8b87100824ea708c2a - - -] [instance: 2f145635-bb76-4554-8385-861eb4fc849b] Migration operation thread notification thread_finished /usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py:6530
2016-09-19 05:50:37.777 5987 DEBUG nova.virt.libvirt.driver [req-1a482106-cd47-484a-bf1e-5e1c96dc2009 3473024639d04768a780d4e3b0d92438 72a68c40f9aa4a8b87100824ea708c2a - - -] [instance: 2f145635-bb76-4554-8385-861eb4fc849b] VM running on src, migration failed _live_migration_monitor /usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py:6319
2016-09-19 05:50:37.778 5987 DEBUG nova.virt.libvirt.driver [req-1a482106-cd47-484a-bf1e-5e1c96dc2009 3473024639d04768a780d4e3b0d92438 72a68c40f9aa4a8b87100824ea708c2a - - -] [instance: 2f145635-bb76-4554-8385-861eb4fc849b] Fixed incorrect job type to be 4 _live_migration_monitor /usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py:6339
2016-09-19 05:50:37.779 5987 ERROR nova.virt.libvirt.driver [req-1a482106-cd47-484a-bf1e-5e1c96dc2009 3473024639d04768a780d4e3b0d92438 72a68c40f9aa4a8b87100824ea708c2a - - -] [instance: 2f145635-bb76-4554-8385-861eb4fc849b] Migration operation has aborted
2016-09-19 05:51:07.633 5987 DEBUG nova.virt.libvirt.driver [req-1a482106-cd47-484a-bf1e-5e1c96dc2009 3473024639d04768a780d4e3b0d92438 72a68c40f9aa4a8b87100824ea708c2a - - -] [instance: 2f145635-bb76-4554-8385-861eb4fc849b] Live migration monitoring is all done _live_migration /usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py:6550

Revision history for this message
KC Bi (bikecheng) wrote :
Revision history for this message
Chuck Short (zulcss) wrote :

It looks like the ssh keys are not set up properly.
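
A quick way to confirm that from the source compute node is to attempt the same connection nova's libvirt driver uses. This is only a rough check: CA29 is the destination hostname taken from the log above (substitute your own), and it assumes the SSH hop is made as root, which is typically the case for qemu+ssh migration driven by libvirtd in this setup:

 # virsh -c qemu+ssh://CA29/system list
 # ssh -o BatchMode=yes CA29 true

If either command fails with "Host key verification failed", the destination's host key is missing from the relevant known_hosts file on the source node.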

Changed in nova (Ubuntu):
status: New → Triaged
Revision history for this message
KC Bi (bikecheng) wrote :

The default setup (qemu+ssh) triggers the problem. Live migration eventually succeeds with a qemu+tcp setup in the libvirtd configuration.

From my perspective, after an OpenStack deployment through Autopilot everything should already be set up. Otherwise, there should be a guide describing the exact steps needed to perform the operation.

Revision history for this message
Xiang Hui (xianghui) wrote :

Hi KC, I have hit the same problem. Could you provide the details of the fix? What do you mean by a qemu+tcp setup in the libvirtd configuration? Thanks.

Changed in nova (Ubuntu):
status: Triaged → Confirmed
Revision history for this message
KC Bi (bikecheng) wrote :

The following configuration has been used to work around the problem (a quick verification sketch follows the steps):

 1. Edit /etc/nova/nova.conf on the nova-compute nodes that participate in the live migration:

 [libvirt]
 rbd_user =
 rbd_secret_uuid =
 #live_migration_uri = qemu+ssh://%s/system
 live_migration_uri = qemu+tcp://%s/system

 # Disable tunnelled migration so that selective
 # live block migration can be supported.
 live_migration_tunnelled = True

 iscsi_use_multipath = True

 2. Edit /etc/libvirt/libvirtd.conf:
 listen_tcp = 1
 listen_addr = "0.0.0.0"
 auth_tcp = "none"

 3. Restart nova-compute and libvirt:
 # service libvirt-bin restart
 # service nova-compute restart
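
 After restarting, a rough way to verify the workaround is to check that libvirtd is now listening on its default TCP port (16509) and that the new URI is reachable from the other compute node. This is a sketch only; CA29 is the hostname from the log above and should be replaced with your destination host:

 # ss -ltnp | grep 16509
 # virsh -c qemu+tcp://CA29/system list

 Note that listening on TCP also assumes libvirtd was started with its --listen option; on Ubuntu 16.04 that may mean adding -l to libvirtd_opts in /etc/default/libvirt-bin before the restart (an assumption about the stock packaging, not a step confirmed in this bug).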

Meanwhile, such modifications won't work if you are using Ubuntu with a 3.x kernel such as the 3.13.0-100-generic I used previously; there seems to be an underlying kernel bug. The latest kernel, 4.4.0-45-generic, works well with the above configuration.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

I'm going to target this at autopilot and the nova-compute charm since this looks to be a config issue instead of a packaging issue.

no longer affects: autopilot
Changed in nova (Ubuntu):
status: Confirmed → Invalid
Revision history for this message
James Page (james-page) wrote :

I have a feeling this is related to bug 1628216 but we'll need to confirm a few details.

For those impacted by this bug, please can you describe the network configuration in your infrastructure; if 'unit-get private-address' does not match the DNS response from nslookup of the hostname, then we see this type of mismatch and live migration breaks due to mismatching SSH keys.
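
As a rough way to gather that, the private address reported by Juju can be compared with what DNS returns for the node's hostname. This is only a sketch; the unit name nova-compute/0 is an example and the juju run syntax may vary with your Juju version:

 $ juju run --unit nova-compute/0 'unit-get private-address'
 $ nslookup $(hostname)    # run this one on the compute node itself

If the two addresses differ, that points at the DNS/private-address mismatch described above.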

FWIW the solution in #5 might work, but it does disable any level of encryption and security on your compute nodes, allowing anyone with network access to tinker with the instances running on your hypervisors.

Marking 'Incomplete' for now - if we can confirm the DNS/private-address mismatch problem then we can mark as a dupe of bug 1628216.

Changed in nova-compute (Juju Charms Collection):
status: New → Incomplete
importance: Undecided → Medium
James Page (james-page)
Changed in charm-nova-compute:
importance: Undecided → Medium
status: New → Incomplete
Changed in nova-compute (Juju Charms Collection):
status: Incomplete → Invalid
Revision history for this message
Drew Freiberger (afreiberger) wrote :

James,

I just filed a related bug which narrows this down: it happens if ceph-osd configures its SSH key exchange before the nova-compute unit is added to the host.

https://bugs.launchpad.net/charm-nova-compute/+bug/1677707

This is also a Xenial Mitaka cloud; these nodes were added to a juju-deployer environment (Bootstack) manually by adding the ubuntu charm to provision the host, then ceph-osd add-unit, then nova-compute add-unit.

no longer affects: nova (Ubuntu)