Comment 8 for bug 2023931

Jake Nabasny (slapcat) wrote :

I ran into this bug with sunbeam deployed in a VM. Parts of the configuration during bootstrap used the reverse DNS name for the host machine instead of the FQDN set on the VM:

""""""""""""
# cat /var/snap/openstack-hypervisor/common/etc/nova/nova.conf | grep -i host
host = syn-172-100-xx-xx.res.spectrum.com

# python3 -c "import socket; print(socket.getfqdn())"
syn-172-100-xx-xx.res.spectrum.com

# sudo snap get openstack-hypervisor node
Key Value
node.fqdn syn-172-100-xx-xx.res.spectrum.com
node.ip-address 10.162.57.152
""""""""""""

alanbach and I found the following workaround to correct the misconfiguration in nova/neutron/ovn, but it is quite involved. If someone runs into this on a new cluster, it would be easier to fix the hostname resolution issues and then redeploy.

=== Workaround ===

1. Fix hostname resolution so that forward and reverse lookups return the expected values:

# echo "10.162.57.152 sunbeam.nabasny.com" >> /etc/hosts
# snap set openstack-hypervisor node.fqdn=sunbeam.nabasny.com
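
Since `>>` adds a duplicate line every time it is re-run, the /etc/hosts part of step 1 can be made repeatable. This is only a minimal sketch: `add_hosts_entry` is a hypothetical helper, not something shipped with sunbeam or the snap, and it assumes the usual "IP name [aliases]" hosts format.

```shell
# Append "IP NAME" to a hosts file only if that mapping is not already there.
# add_hosts_entry is a hypothetical helper name used for illustration.
add_hosts_entry() {
    hosts_file=$1 entry_ip=$2 entry_name=$3
    # Look for a line whose first field is the IP and that lists the name
    # before any trailing comment.
    if awk -v ip="$entry_ip" -v name="$entry_name" '
        $1 == ip {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break
                if ($i == name) found = 1
            }
        }
        END { exit !found }' "$hosts_file"; then
        return 0
    fi
    printf '%s %s\n' "$entry_ip" "$entry_name" >> "$hosts_file"
}

# Usage (as root), followed by a lookup to confirm the mapping:
#   add_hosts_entry /etc/hosts 10.162.57.152 sunbeam.nabasny.com
#   getent hosts 10.162.57.152
```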

2. Update the hostname on openstack-hypervisor (run on the sunbeam machine):
# openstack-hypervisor.ovs-vsctl set open_vswitch . external_ids:hostname=$(hostname)

3. Update the OVN Gateway Chassis. First find the right LRP UUID for the gateway chassis:
# kubectl exec -it -n openstack neutron-0 -c neutron-server -- /bin/bash

root@neutron-0:/# neutron-ovn-db-sync-util --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini --ovn-neutron_sync_mode repair
...
2024-05-30 18:21:51.718 46 WARNING neutron.scheduler.l3_ovn_scheduler [-] Gateway lrp-9ada90ed-1d9f-4946-8823-ba54ce007464 was not scheduled on any chassis, no candidates are available
...

Then set it from ovn-central, using the LRP name reported in the warning above and the corrected FQDN:
# kubectl exec -it -n openstack ovn-central-0 -c ovn-nb-db-server -- /bin/bash

root@ovn-central-0:/# ovn-nbctl lrp-set-gateway-chassis lrp-9ada90ed-1d9f-4946-8823-ba54ce007464 <FQDN> 2

You can verify it is set correctly with:
root@ovn-central-0:/# ovn-nbctl find Gateway_Chassis
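
If there are several routers, copying LRP names out of the sync output by hand is error-prone. A rough sketch of pulling them out automatically follows. `unscheduled_lrps` is a hypothetical helper name, and the pattern assumes the warning text looks exactly like the line shown in step 3.

```shell
# Print the unique LRP names that the sync utility reported as unscheduled.
# Reads log output on stdin; unscheduled_lrps is a hypothetical helper name.
unscheduled_lrps() {
    grep 'was not scheduled on any chassis' |
        grep -oE 'lrp-[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}' |
        sort -u
}

# Usage inside neutron-0 (2>&1 because the log is assumed to go to stderr):
#   neutron-ovn-db-sync-util --config-file /etc/neutron/neutron.conf \
#       --config-file /etc/neutron/plugins/ml2/ml2_conf.ini \
#       --ovn-neutron_sync_mode repair 2>&1 | unscheduled_lrps
```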

4. Run the neutron-ovn-db-sync-util in neutron-0 again. This time it should complete successfully:
# kubectl exec -it -n openstack neutron-0 -c neutron-server -- /bin/bash

root@neutron-0:/# neutron-ovn-db-sync-util --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini --ovn-neutron_sync_mode repair

5. Update all mentions of the incorrect hostname(s) in nova-mysql-0. First get the password:

juju run nova-mysql/0 get-password

Then log in and update the relevant entries (pay attention to the specific id of the row you are updating):

# kubectl exec -it -n openstack nova-mysql-0 -c mysql -- mysql -uroot -p<password>

mysql> use nova;
mysql> select * from compute_nodes;
mysql> update compute_nodes set host='<hostname>' where id='<id>';
mysql> select * from services;
mysql> update services set host='<hostname>' where id='<id>';
mysql> use nova_api;
mysql> select * from host_mappings;
mysql> update host_mappings set host='<hostname>' where id='<id>';
mysql> quit;
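
As a possible variation on step 5, the statements can be generated from the old and new hostnames and reviewed before pasting them into the mysql prompt. Note that this sketch matches rows on the old host value rather than on the row id, which is an assumption on my part and not what we actually ran. `host_fix_sql` is a hypothetical helper name.

```shell
# Print UPDATE statements that rename OLD to NEW in the three tables from step 5.
# Matching on the old host value instead of the row id is an assumption of this sketch.
host_fix_sql() {
    old_host=$1 new_host=$2
    # Accept only plain hostnames so the values cannot break out of the SQL quotes.
    for h in "$old_host" "$new_host"; do
        case $h in
            ''|*[!A-Za-z0-9.-]*) echo "invalid hostname: $h" >&2; return 1 ;;
        esac
    done
    printf "update nova.compute_nodes set host='%s' where host='%s';\n" "$new_host" "$old_host"
    printf "update nova.services set host='%s' where host='%s';\n" "$new_host" "$old_host"
    printf "update nova_api.host_mappings set host='%s' where host='%s';\n" "$new_host" "$old_host"
}

# Usage: print the statements, check them against the select output, then paste:
#   host_fix_sql syn-172-100-xx-xx.res.spectrum.com sunbeam.nabasny.com
```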

6. Restart nova-0 and the nova-compute service:

# kubectl delete pod -n openstack nova-0
# systemctl restart snap.openstack-hypervisor.nova-compute

7. Check that the hostname is shown correctly for compute services and network agents, then try to launch an instance:

# openstack network agent list
# openstack compute service list
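
The check in step 7 can also be scripted so that a leftover reverse-DNS name is not missed by eye. A small sketch follows. `check_no_stale_host` is a hypothetical helper, and it assumes both list commands are run with `-f value -c Host` so that each output line is a bare host name.

```shell
# Read host names on stdin, one per line, and fail if any still equals the old name.
# check_no_stale_host is a hypothetical helper name used for illustration.
check_no_stale_host() {
    if grep -Fxq -- "$1"; then
        echo "stale host still present: $1"
        return 1
    fi
    echo "ok: $1 not found"
}

# Usage:
#   openstack compute service list -f value -c Host | check_no_stale_host syn-172-100-xx-xx.res.spectrum.com
#   openstack network agent list -f value -c Host | check_no_stale_host syn-172-100-xx-xx.res.spectrum.com
```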