nova.conf [glance] api_servers configuration is not HA

Bug #1468393 reported by Serge van Ginderachter
This bug affects 1 person
Affects: OpenStack-Ansible
Status: Confirmed
Importance: High
Assigned to: Andy McCrae

Bug Description

-- nova.conf
[glance]
api_servers = ....

In Juno, this setting pointed to a load-balanced endpoint.
In Kilo, this setting was changed to a list of all glance containers (glance_api_servers: "{% for host in groups['glance_all'] %}{{ hostvars[host]['container_address'] }}:{{ glance_service_port }}{% if not loop.last %},{% endif %}{% endfor %}")
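
To make the resulting value concrete, here is a rough Python equivalent of that Jinja loop. The host names, container addresses and port below are made-up placeholders, not values from a real inventory; only the variable names (groups, hostvars, container_address, glance_service_port) come from the template.

```python
# Hypothetical inventory data, for illustration only.
groups = {"glance_all": ["glance1", "glance2", "glance3"]}
hostvars = {
    "glance1": {"container_address": "172.29.236.11"},
    "glance2": {"container_address": "172.29.236.12"},
    "glance3": {"container_address": "172.29.236.13"},
}
glance_service_port = 9292  # assumed port value

# Same result as the template: "address:port" entries joined by commas,
# with no trailing comma after the last host.
glance_api_servers = ",".join(
    f"{hostvars[host]['container_address']}:{glance_service_port}"
    for host in groups["glance_all"]
)
print(glance_api_servers)
```

So nova.conf ends up with one host:port entry per glance container rather than a single load-balanced address.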

Nova, however, accesses glance by cycling continuously through a randomized copy of this list of URLs (optionally with some retries, but this is left unconfigured, i.e. at its default, in the osad/kilo setup).

If the glance client gets an error on the currently contacted host, it will raise an error, and will not try to fall back on one of the other hosts.

The request at hand simply fails when one of the glance containers/services is down and nova happens to have picked that one for the current request.
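
The failure mode described above can be sketched with a small simulation. This is a simplified model and not nova's actual code: ServerDown, make_server_cycle, fetch_image and count_failures are hypothetical names, the host names are placeholders, and the num_retries parameter only stands in for the retry setting mentioned above.

```python
import itertools
import random


class ServerDown(Exception):
    """Stand-in for the connection error a glance client would raise."""


def make_server_cycle(api_servers, rng=random):
    """Shuffle the configured servers once, then cycle through them forever."""
    servers = list(api_servers)
    rng.shuffle(servers)
    return itertools.cycle(servers)


def fetch_image(server_cycle, down_hosts, num_retries=0):
    """Try one server per attempt; give up after num_retries extra attempts."""
    for _ in range(num_retries + 1):
        host = next(server_cycle)
        if host not in down_hosts:
            return host  # the request was served by this host
    raise ServerDown("every attempt hit an unreachable server")


def count_failures(api_servers, down_hosts, requests, num_retries):
    """Issue a number of requests and count how many of them fail."""
    cycle = make_server_cycle(api_servers)
    failures = 0
    for _ in range(requests):
        try:
            fetch_image(cycle, down_hosts, num_retries)
        except ServerDown:
            failures += 1
    return failures


servers = ["glance1:9292", "glance2:9292", "glance3:9292"]  # placeholders
down = {"glance2:9292"}
print(count_failures(servers, down, requests=6, num_retries=0))
print(count_failures(servers, down, requests=6, num_retries=1))
```

In this model, with no retries and one of three servers down, every third request fails regardless of the shuffle order, while a single retry is enough to mask one dead server because the retry moves on to the next entry in the cycle.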

This behaviour can easily be understood by looking at nova/image/glance.py:

https://github.com/openstack/nova/blob/stable/kilo/nova/image/glance.py#L219

As such this configuration in osad 11 is not highly available.

I couldn't find any good sources on this change from osad 10/juno or on the reasons for it. In IRC discussions I was told it was made because of issues with a load balancer setup (timeouts, e.g. when transferring large images?); however, I would think such issues should be resolved by tuning the LB.

For completeness, let me point out that this list of glance endpoints is also used in cinder.conf; however, we did not experience issues there, and I didn't check how cinder operates on that list. Please note, however, that this might be related to us using Ceph/RBD for both glance and cinder.

playbooks/roles/os_cinder/templates/cinder.conf.j2:glance_api_servers = {{ glance_api_servers }}

Revision history for this message
Andy McCrae (andrew-mccrae) wrote :
Changed in openstack-ansible:
assignee: nobody → Andy McCrae (andrew-mccrae)
importance: Undecided → High
Changed in openstack-ansible:
milestone: none → 11.0.4
Changed in openstack-ansible:
status: New → Confirmed
description: updated