Nova install playbook causes temporary inability to schedule to compute nodes

Bug #2056180 reported by Andrew Bonney
This bug affects 1 person
Affects: OpenStack-Ansible
Status: Confirmed
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

I noted the following when patching one of our production environments. This involved rebuilding the virtualenv.

For a source install, /etc/nova/ is a symlink into the virtualenv, and that symlink is removed in https://github.com/openstack/openstack-ansible-os_nova/blob/master/tasks/nova_pre_install.yml#L59. Whilst the new symlink gets created immediately afterwards, the files within this directory don't get created until the post_install step in https://github.com/openstack/openstack-ansible-os_nova/blob/master/tasks/nova_post_install.yml#L73. This leaves a potentially quite long period during which the contents of /etc/nova/ don't exist.

This is likely fine for files like nova.conf if these aren't dynamically re-read, but for 'vendor_data.json' which appears to be read when scheduling VMs to a compute node, this causes scheduling failures as follows:

Traceback (most recent call last):
  File "/openstack/venvs/nova-27.1.0/lib/python3.8/site-packages/nova/conductor/manager.py", line 690, in build_instances
    scheduler_utils.populate_retry(
  File "/openstack/venvs/nova-27.1.0/lib/python3.8/site-packages/nova/scheduler/utils.py", line 998, in populate_retry
    raise exception.MaxRetriesExceeded(reason=msg)
nova.exception.MaxRetriesExceeded: Exceeded maximum number of retries. Exceeded max scheduling attempts 5 for instance 3fb6fb0f-2212-4305-9285-6babce3acb48. Last exception: [Errno 2] No such file or directory: '/etc/nova/vendor_data.json'
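
The ordering described in this report can be replayed in miniature. The following is only a sketch under stated assumptions, not the playbook's actual logic: the directory names (`venv-old`, `venv-new`, `etc_nova`) and both helper functions are hypothetical stand-ins, and a temporary directory is used in place of the real /etc/nova/ and virtualenv paths. It shows a reader of `vendor_data.json` succeeding before the symlink swap, failing with ENOENT during the gap, and succeeding again once the file has been written.

```python
import errno
import json
import os
import tempfile


def try_read(etc_dir):
    """Return "ok" if vendor_data.json is readable via etc_dir, else "missing"."""
    try:
        with open(os.path.join(etc_dir, "vendor_data.json")) as fh:
            json.load(fh)
        return "ok"
    except OSError as exc:
        if exc.errno == errno.ENOENT:  # [Errno 2] No such file or directory
            return "missing"
        raise


def simulate_reinstall(root):
    """Replay the pre_install / post_install ordering inside `root`.

    All paths are hypothetical stand-ins for the real layout.
    """
    old_etc = os.path.join(root, "venv-old", "etc", "nova")
    new_etc = os.path.join(root, "venv-new", "etc", "nova")
    link = os.path.join(root, "etc_nova")  # stands in for /etc/nova

    os.makedirs(old_etc)
    os.makedirs(new_etc)
    with open(os.path.join(old_etc, "vendor_data.json"), "w") as fh:
        json.dump({}, fh)
    os.symlink(old_etc, link)

    observed = [("before", try_read(link))]

    # pre_install stand-in: the old symlink is removed and a new one is
    # created immediately, pointing at a directory that is still empty.
    os.unlink(link)
    os.symlink(new_etc, link)
    observed.append(("gap", try_read(link)))

    # post_install stand-in: the config files are finally written.
    with open(os.path.join(new_etc, "vendor_data.json"), "w") as fh:
        json.dump({}, fh)
    observed.append(("after", try_read(link)))
    return observed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for stage, status in simulate_reinstall(root):
            print(stage, status)
```

In the real playbook the "gap" stage spans everything that runs between the two tasks, which is why a scheduling request arriving in that window can exhaust its retries.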

Whilst this issue is specific to Nova and the fix may be as simple as moving the dynamically read file into /var/lib/nova/, it perhaps indicates a wider issue for files in /etc/<service name>/ which may be dynamically read (perhaps policy files and similar) and exist for services which are targeted via RPC and not protected by HAProxy maintenance mode.
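
To illustrate why moving the file might help, here is a small sketch, again with hypothetical names and a temporary directory standing in for the real paths. It assumes the dynamically read file is kept in a directory that is not behind the swapped symlink (`var_lib_nova` stands in for /var/lib/nova/), and checks that it stays readable while the stand-in for /etc/nova/ is empty. Whether Nova can be pointed at such a location in a given deployment is not verified here.

```python
import json
import os
import tempfile


def swap_with_stable_file(root):
    """Swap the config symlink while vendor data lives outside it.

    Returns (etc_is_empty_during_gap, vendor_data_read_during_gap).
    All paths are hypothetical stand-ins for the real layout.
    """
    state_dir = os.path.join(root, "var_lib_nova")  # stands in for /var/lib/nova
    old_etc = os.path.join(root, "venv-old", "etc", "nova")
    new_etc = os.path.join(root, "venv-new", "etc", "nova")
    link = os.path.join(root, "etc_nova")  # stands in for /etc/nova

    for path in (state_dir, old_etc, new_etc):
        os.makedirs(path)
    stable_file = os.path.join(state_dir, "vendor_data.json")
    with open(stable_file, "w") as fh:
        json.dump({"example": "value"}, fh)
    os.symlink(old_etc, link)

    # Same swap as in the report: the new target is empty for a while.
    os.unlink(link)
    os.symlink(new_etc, link)

    etc_is_empty = os.listdir(link) == []
    # The stable path is untouched by the swap, so the read still works.
    with open(stable_file) as fh:
        vendor_data = json.load(fh)
    return etc_is_empty, vendor_data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(swap_with_stable_file(root))
```

This only addresses the one file; other dynamically read files under /etc/<service name>/ would still be exposed to the same window.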

Dmitriy Rabotyagov (noonedeadpunk) wrote :

Yeah, ok, I think we've faced a similar issue in Neutron, but it was fixed there a while back with:
https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/788501

I didn't iterate over the other services back then, as I was not sure whether anything else was affected in the same way. But apparently it is...

Changed in openstack-ansible:
status: New → Confirmed
importance: Undecided → High