Did some investigation of the "could not resolve hostname" issue in nova.
Looking from inside a nova-compute pod in a standard config and trying to ping another compute, you get intermittent results:
controller-0:~$ kubectl exec -it -n openstack nova-compute-compute-0-75ea0372-rg9kk -c nova-compute /bin/bash
[root@compute-0 /]# while :; do (ping compute-1 -c 1; sleep 2;);done
PING compute-1 (192.168.204.122) 56(84) bytes of data.
64 bytes from 192.168.204.122 (192.168.204.122): icmp_seq=1 ttl=64 time=0.078 ms
--- compute-1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.078/0.078/0.078/0.000 ms
ping: compute-1: Name or service not known
PING compute-1 (192.168.204.122) 56(84) bytes of data.
64 bytes from 192.168.204.122 (192.168.204.122): icmp_seq=1 ttl=64 time=0.100 ms
--- compute-1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.100/0.100/0.100/0.000 ms
ping: compute-1: Name or service not known
(... 28 more identical "ping: compute-1: Name or service not known" lines elided ...)
ping: compute-1: Name or service not known
PING compute-1 (192.168.204.122) 56(84) bytes of data.
64 bytes from 192.168.204.122 (192.168.204.122): icmp_seq=1 ttl=64 time=0.106 ms
--- compute-1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.106/0.106/0.106/0.000 ms
PING compute-1 (192.168.204.122) 56(84) bytes of data.
64 bytes from 192.168.204.122 (192.168.204.122): icmp_seq=1 ttl=64 time=0.100 ms
--- compute-1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.100/0.100/0.100/0.000 ms
PING compute-1 (192.168.204.122) 56(84) bytes of data.
64 bytes from 192.168.204.122 (192.168.204.122): icmp_seq=1 ttl=64 time=0.101 ms
--- compute-1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.101/0.101/0.101/0.000 ms
PING compute-1 (192.168.204.122) 56(84) bytes of data.
64 bytes from compute-1 (192.168.204.122): icmp_seq=1 ttl=64 time=0.103 ms
I get the same result for the infra/cluster-host name, though note that in this lab (wcp99-103) they're on the same interface:
[root@compute-0 /]# while :; do (ping compute-1-infra -c 1; sleep 2;);done
ping: compute-1-infra: Name or service not known
(... 16 more identical "ping: compute-1-infra: Name or service not known" lines elided ...)
ping: compute-1-infra: Name or service not known
PING compute-1-infra (192.168.204.122) 56(84) bytes of data.
64 bytes from 192.168.204.122 (192.168.204.122): icmp_seq=1 ttl=64 time=0.107 ms
--- compute-1-infra ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.107/0.107/0.107/0.000 ms
ping: compute-1-infra: Name or service not known
PING compute-1-infra (192.168.204.122) 56(84) bytes of data.
64 bytes from 192.168.204.122 (192.168.204.122): icmp_seq=1 ttl=64 time=0.094 ms
--- compute-1-infra ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.094/0.094/0.094/0.000 ms
ping: compute-1-infra: Name or service not known
ping: compute-1-infra: Name or service not known
ping: compute-1-infra: Name or service not known
ping: compute-1-infra: Name or service not known
ping: compute-1-infra: Name or service not known
ping: compute-1-infra: Name or service not known
I don't see any problems when I attempt this from the compute-0 host, outside of the pod.
For reference, here's /etc/resolv.conf from inside the pod:
[root@compute-0 /]# cat /etc/resolv.conf
nameserver 10.96.0.10
search openstack.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
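With ndots:5, a short name like compute-1 has fewer than five dots, so the resolver tries each search domain before the bare name, and every attempt goes through the cluster DNS at 10.96.0.10, so any flakiness there surfaces as "Name or service not known". A small sketch to quantify how often resolution fails from inside the pod (the loop mirrors the ping test above; "compute-1" is the peer name from these logs and is an assumption about your environment):

```python
# Repro sketch: count getaddrinfo() failures for a short hostname, the same
# resolver path ping uses, to measure how intermittent resolution is.
import socket
import time

def measure_resolution(name, attempts=30, delay=0.5):
    """Return how many of `attempts` lookups of `name` failed."""
    failures = 0
    for _ in range(attempts):
        try:
            socket.getaddrinfo(name, None)
        except socket.gaierror:
            failures += 1
        time.sleep(delay)
    return failures

if __name__ == "__main__":
    # "compute-1" is the hypothetical peer from the logs above; substitute
    # any short name resolvable (sometimes) in your cluster.
    failed = measure_resolution("compute-1", attempts=5, delay=0.2)
    print("%d/5 lookups failed for compute-1" % failed)
```

Run inside and outside the pod, this gives a quick before/after number when experimenting with DNS config.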
Looking at the nova code, it uses the hostname (instance.host) directly, with no way to configure an alternative:

nova/virt/libvirt/driver.py:

    def pre_live_migration(self, context, instance, block_device_info,
                           network_info, disk_info, migrate_data):
        ...
        if not is_shared_block_storage:
            # Ensure images and backing files are present.
            LOG.debug('Checking to make sure images and backing files are '
                      'present before live migration.', instance=instance)
            self._create_images_and_backing(
                context, instance, instance_dir, disk_info,
                fallback_from_host=instance.host)

        if (configdrive.required_by(instance) and
                CONF.config_drive_format == 'iso9660'):
            # NOTE(pkoniszewski): Due to a bug in libvirt iso config
            # drive needs to be copied to destination prior to
            # migration when instance path is not shared and block
            # storage is not shared. Files that are already present
            # on destination are excluded from a list of files that
            # need to be copied to destination. If we don't do that
            # live migration will fail on copying iso config drive to
            # destination and writing to read-only device.
            # Please see bug/1246201 for more details.
            src = "%s:%s/disk.config" % (instance.host, instance_dir)
            self._remotefs.copy_file(src, instance_dir)
In stx-nova based on pike, we had hooked this to convert hostname to hostname-infra to ensure we used the correct network:
        if (configdrive.required_by(instance) and
                CONF.config_drive_format == 'iso9660'):
            # NOTE(pkoniszewski): Due to a bug in libvirt iso config
            # drive needs to be copied to destination prior to
            # migration when instance path is not shared and block
            # storage is not shared. Files that are already present
            # on destination are excluded from a list of files that
            # need to be copied to destination. If we don't do that
            # live migration will fail on copying iso config drive to
            # destination and writing to read-only device.
            # Please see bug/1246201 for more details.
            src = "%s:%s/disk.config" % (
                utils.safe_ip_format(instance.host),
                instance_dir)
            self._remotefs.copy_file(src, instance_dir)
nova/utils.py:

    def safe_ip_format(ip):
        """Transform ip string to "safe" format.

        Will return ipv4 addresses unchanged, but will nest ipv6 addresses
        inside square brackets.
        """
        try:
            if netaddr.IPAddress(ip).version == 6:
                return '[%s]' % ip
        except (TypeError, netaddr.AddrFormatError):  # hostname
            # In TiC, we set up ssh keys for passwordless ssh between
            # computes. If we have an infra interface present, the keys
            # will be associated with that interface rather than the
            # mgmt interface. We also always provide hostname
            # resolution for the mgmt interface (compute-n) and the
            # infra interface (compute-n-infra) irrespective of the
            # infra interface actually being provisioned. By ensuring
            # that we use the infra interface hostname we guarantee we
            # will align with the ssh keys.
            if '-infra' not in ip:
                return '%s-infra' % ip
        # it's IPv4 or hostname
        return ip
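The behaviour of that hook can be illustrated with a self-contained sketch; note this uses the stdlib ipaddress module as a stand-in for netaddr (an assumption, since the real code depends on netaddr), but the branching is the same:

```python
# Stdlib-only sketch of the stx-nova safe_ip_format behaviour above.
import ipaddress

def safe_ip_format(ip):
    """Return IPv4 unchanged, bracket IPv6, map hostnames to <name>-infra."""
    try:
        if ipaddress.ip_address(ip).version == 6:
            return '[%s]' % ip
    except ValueError:
        # Not an IP literal, so treat it as a hostname and steer it to the
        # infra interface name so it lines up with the provisioned ssh keys.
        if '-infra' not in ip:
            return '%s-infra' % ip
    return ip

print(safe_ip_format('192.168.204.122'))  # → 192.168.204.122
print(safe_ip_format('fd00::2'))          # → [fd00::2]
print(safe_ip_format('compute-1'))        # → compute-1-infra
```

Names already containing '-infra' pass through unchanged, so the mapping is idempotent.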
So assuming we want to support live migration of VMs with config drives, we'll need to fix hostname resolution inside the nova pods, and find a way for hostnames to resolve to the cluster-host network rather than the management network, since the ssh keys are set up for the cluster-host network only.
Alternatively, we would have to add configurability to nova so we can program the IP address we want, as is currently done for live and cold migration via the live_migration_inbound_addr and my_ip options, respectively.
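For reference, the existing per-host address options look like this in nova.conf (the addresses here are placeholder cluster-host values, not from this lab); a new option for the config-drive copy path would presumably follow the same pattern:

```
[DEFAULT]
# Address other hosts use to reach this compute for cold migration / resize.
my_ip = 192.168.206.122

[libvirt]
# Address the source host connects to for live migration traffic.
live_migration_inbound_addr = 192.168.206.122
```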