Activity log for bug #1468393

Date Who What changed Old value New value Message
2015-06-24 14:45:07 Serge van Ginderachter bug added bug
2015-06-24 15:08:54 Andy McCrae openstack-ansible: assignee Andy McCrae (andrew-mccrae)
2015-06-24 15:08:57 Andy McCrae openstack-ansible: importance Undecided High
2015-06-24 15:47:59 Kevin Carter openstack-ansible: milestone 11.0.4
2015-06-25 03:56:33 Kevin Carter openstack-ansible: status New Confirmed
2015-06-25 03:57:27 Kevin Carter marked as duplicate 1461245
2016-03-15 15:14:05 Diana Clarke description

Old value: identical to the new value below, except the first config line read "apie_servers" (a typo for "api_servers").

New value:

nova.conf

[glance]
api_servers = ....

In juno this setting pointed to an endpoint that was load balanced. In kilo it was changed to a list of all glance containers:

glance_api_servers: "{% for host in groups['glance_all'] %}{{ hostvars[host]['container_address'] }}:{{ glance_service_port }}{% if not loop.last %},{% endif %}{% endfor %}"

The way nova accesses glance, however, is by iterating continuously over a randomized version of this list of URLs (optionally doing some retries, but that is left unconfigured and at its default in the osad/kilo setup). If the glance client gets an error from the host it is currently contacting, it raises the error and does not fall back to one of the other hosts. The request at hand simply fails whenever one of the glance containers/services is down and the client happens to have picked that one for the current request. This behaviour is easy to see in nova/image/glance.py: https://github.com/openstack/nova/blob/stable/kilo/nova/image/glance.py#L219

As such, this configuration in osad 11 is not highly available.

I couldn't find any good sources on this change from osad 10/juno, or on the reasons for it. From IRC discussions I was told it was made because of issues with a load-balancer setup (timeouts, e.g. when transferring large images?), but I would think such issues should be resolved by tuning the LB.

To be complete, let me point out that this list of glance endpoints is also used in cinder.conf; we did not experience issues there, however, and I didn't check how cinder operates on that list. Note that this might be related to us using Ceph/RBD for both glance and cinder.

playbooks/roles/os_cinder/templates/cinder.conf.j2:glance_api_servers = {{ glance_api_servers }}
2016-03-22 14:27:39 Diana Clarke bug added subscriber Diana Clarke
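
For reference, the Jinja2 expression quoted in the 2016-03-15 description change renders to a flat comma-separated host:port list, one entry per glance container. A minimal self-contained sketch of that rendering; the container addresses and the standard glance port 9292 are assumed example values, not taken from the bug:

    from jinja2 import Template

    # Hypothetical inventory: three glance containers (addresses made up).
    hostvars = {
        "glance1": {"container_address": "172.29.236.11"},
        "glance2": {"container_address": "172.29.236.12"},
        "glance3": {"container_address": "172.29.236.13"},
    }
    groups = {"glance_all": ["glance1", "glance2", "glance3"]}

    # The exact expression from the osad/kilo group vars quoted above.
    template = Template(
        "{% for host in groups['glance_all'] %}"
        "{{ hostvars[host]['container_address'] }}:{{ glance_service_port }}"
        "{% if not loop.last %},{% endif %}{% endfor %}"
    )
    print(template.render(groups=groups, hostvars=hostvars,
                          glance_service_port=9292))
    # -> 172.29.236.11:9292,172.29.236.12:9292,172.29.236.13:9292

So every consumer of glance_api_servers gets the raw container list rather than the load-balanced endpoint it got in juno.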
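And a rough sketch of the client-side behaviour the description attributes to nova/image/glance.py in stable/kilo: the configured list is shuffled once and cycled through, and with the retry option at its default of 0 a single connection failure propagates instead of triggering an attempt against the next server. This is an illustrative reconstruction of that logic, not nova's actual code:

    import itertools
    import random

    def get_api_servers(servers):
        # The configured list is shuffled once, then cycled through forever.
        shuffled = list(servers)
        random.shuffle(shuffled)
        return itertools.cycle(shuffled)

    def call(api_servers, do_request, num_retries=0):
        # num_retries stands in for the retry option the bug says is left
        # unconfigured in osad/kilo: at the default of 0, the first
        # connection error is re-raised with no fallback to the next host.
        attempts = num_retries + 1
        for attempt in range(1, attempts + 1):
            host = next(api_servers)
            try:
                return do_request(host)
            except ConnectionError:
                if attempt == attempts:
                    raise

    # Example: with three servers and zero retries, a request fails outright
    # whenever the randomly chosen server happens to be the one that is down.
    servers = get_api_servers(
        ["10.0.0.1:9292", "10.0.0.2:9292", "10.0.0.3:9292"])

    def fake_request(host):
        if host == "10.0.0.2:9292":      # pretend this container is down
            raise ConnectionError(host)
        return "image data from %s" % host

    print(call(servers, fake_request))   # may raise ConnectionError

Under these assumptions, raising the retry count to cover the other servers would let the client fall through to each remaining host before giving up, which is the obvious mitigation short of pointing api_servers back at the load-balanced endpoint.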