live-migration fails if nova-compute is added to a machine after ceph-osd has been added
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Nova Compute Charm | Expired | Undecided | Unassigned |
Bug Description
While digging into a live-migration issue, I found that all of my previously deployed nova-compute units were unable to connect over qemu+ssh (because of host-key verification failures):
Failed to connect to remote libvirt URI qemu+ssh:
I was able to work around this by connecting to the source nova hypervisor, becoming root, running ssh to the destination host, and accepting its host key; after that, live migration to the new hypervisor succeeded.
Four of the 64 nova-compute instances had the ceph-osd charm installed and running before nova-compute was installed. Some nodes that had the ceph-osd charm installed earlier but not functioning (its install hook failed because of missing OSD path mounts) do not exhibit this live-migration failure.
I think there is a problem in the interaction between the ssh host-key sharing performed by both the ceph-osd and nova-compute charms: cross-host propagation of nova-compute ssh host keys appears to break if ceph-osd propagates its host keys before nova-compute does.
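Since the suspicion is that host keys never reached root's known_hosts (the file libvirt's qemu+ssh transport consults), one way to check a unit is to look for the destination hypervisor in that file. A minimal sketch, not the charms' actual logic; the default path and the hostname handling are assumptions:

```python
# Check whether a destination hypervisor appears in a known_hosts file.
# Illustrative only: ignores hashed (HashKnownHosts) entries and @-markers.
from pathlib import Path

def host_key_present(hostname: str,
                     known_hosts: str = "/root/.ssh/known_hosts") -> bool:
    """Return True if any known_hosts line lists `hostname` in its host field."""
    path = Path(known_hosts)
    if not path.exists():
        return False
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # First field is a comma-separated list of hostnames/addresses.
        hosts = line.split()[0]
        if hostname in hosts.split(","):
            return True
    return False
```

Running this on a working unit and a failing unit for the same destination hostname would confirm whether the failing units are simply missing the propagated entry.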
This is observed on xenial (16.04.2 LTS) units under juju 2.1.2.1, with nova-compute charm revision 135 and ceph-osd charm revision 17.
Any recommendations to troubleshoot or resolve this would be helpful.
I'm still digging into what is different on these nodes compared with the others, but I think a safeguard in the nova-compute charm may be warranted.