glance_api_servers should be set to the internal_lb_vip_address

Bug #1461245 reported by Bjoern
This bug affects 2 people

Affects: OpenStack-Ansible
Status: Fix Released
Importance: High
Assigned to: Andy McCrae

Bug Description

Unlike glance_host, glance_api_servers has been set to the full list of glance containers.
Ideally this would also be set to the VIP address, so we can easily disable a load balancer pool member without having to remove any servers from this list.
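For context, a hedged sketch of the two configurations being compared, in the `[glance]` section of nova.conf (all host addresses and the VIP value below are illustrative, not taken from any real deployment):

```ini
[glance]
# Current behaviour: every glance container is listed explicitly.
api_servers = http://172.29.236.11:9292,http://172.29.236.12:9292,http://172.29.236.13:9292

# Requested behaviour: point at the load balancer VIP instead, so a pool
# member can be drained on the LB without editing this list.
# api_servers = http://172.29.236.9:9292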

Tags: in-kilo
Revision history for this message
Bjoern (bjoern-t) wrote :

FYI, I also saw this inside cinder.conf.

Kevin Carter (kevin-carter) wrote :

This is intentional: the list of hosts lets an image be retrieved more directly. This ends up being faster than using the LB VIP, while also having the added benefit of not being subject to the connection timeouts associated with a traditional load balancer.

Changed in openstack-ansible:
status: New → Invalid
Bjoern (bjoern-t) wrote :

I disagree: if a host is down, for example, it will generate API errors and cause instances not to boot, etc.
So I still want this setting, since we also have customers using global load balancing.
At least give us an option.

Changed in openstack-ansible:
status: Invalid → New
Kevin Carter (kevin-carter) wrote :

I don't believe this should cause any API errors, as the list of servers should be iterated over; if a server is down or otherwise unavailable, the next one in the array should be used. Maybe we can expose it as an option that could be overridden if needed.
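For illustration, a minimal Python sketch of the fallback iteration described above: try each API server in order and fall through to the next on a connection failure. All names and addresses here are hypothetical, and (as later comments in this thread report) the actual glance client used by nova/cinder at the time did not behave this way.

```python
def fetch_with_fallback(api_servers, fetch):
    """Try each server in order; return the first successful result.

    `fetch` is any callable taking a server URL that raises
    ConnectionError when that server is unreachable.
    """
    last_err = None
    for server in api_servers:
        try:
            return fetch(server)
        except ConnectionError as err:
            last_err = err  # dead pool member: fall through to the next
    raise last_err  # every server failed


# Simulate the later test scenario: 5 servers, 4 shut off.
servers = [f"http://10.0.0.{i}:9292" for i in range(1, 6)]
up = {"http://10.0.0.5:9292"}

def fetch(server):
    if server not in up:
        raise ConnectionError(f"{server} is down")
    return "image-list-ok"

print(fetch_with_fallback(servers, fetch))  # image-list-ok
```

With this behaviour a single dead pool member would only add latency, not failures; the disagreement in this bug is over whether the client actually implements it.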

Changed in openstack-ansible:
status: New → Confirmed
importance: Undecided → Low
Changed in openstack-ansible:
milestone: none → 11.0.3
Changed in openstack-ansible:
milestone: 11.0.3 → 11.0.4
Andy McCrae (andrew-mccrae) wrote :

If we can show whether or not it works when servers are down, then we'll know how to properly address this - I'll try to test it.

If it does iterate through the list regardless of failures, then the option is (in my opinion) completely useless; but if it does cause failures, then the default should change to the LB VIP, because that isn't desirable behaviour for anybody.

Changed in openstack-ansible:
importance: Low → High
Changed in openstack-ansible:
assignee: nobody → Andy McCrae (andrew-mccrae)
Serge van Ginderachter (svg) wrote :

I can't comment directly on the cinder.conf case, but the same applies here for nova.conf.

Looking at the glance client implementation in nova, it is very clear that it does not iterate in some fallback mode but plainly fails, as Bjoern states.

See marked-as-duplicate #1468393 for more details.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-ansible-deployment (kilo)

Fix proposed to branch: kilo
Review: https://review.openstack.org/195497

Changed in openstack-ansible:
status: Confirmed → In Progress
Andy McCrae (andrew-mccrae) wrote :

I've proposed a fix to kilo and master.

I tested this and it does fail for both cinder and nova - I did cinder creates with --image-id from glance and it worked roughly 1 time in 5 (I set up 5 glance api servers and shut 4 off). nova image-list behaved the same, whilst glance image-list worked every time.

I tried increasing num_retries to 6 (1 greater than the number of api servers), but this didn't really work either.
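For reference, a hedged sketch of that retry tweak in cinder.conf. The exact option name and section are assumptions here - around this era cinder exposed it as glance_num_retries in [DEFAULT], while nova used num_retries under [glance] - and the server addresses are illustrative:

```ini
# cinder.conf - retry count set to one more than the number of API servers,
# as in the test above. As reported, this did not help with dead servers.
[DEFAULT]
glance_api_servers = http://172.29.236.11:9292,http://172.29.236.12:9292
glance_num_retries = 6
```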

This strikes me as a nova/cinder upstream bug - why allow a list to be specified if it can't be iterated through on a failure? We might want to look into that.

@kevin-carter I know you mentioned issues with the LB when using the VIP - can we clarify/get more detail on this? I think we should look at fixing that up too, but in my testing I didn't see anything out of the ordinary.

Andy McCrae (andrew-mccrae) wrote :

Another NB: since it's a group_vars change, it will have to be applied manually to existing installs - adjust the variable, then re-run the nova-conf and cinder-conf sections.

git-harry (git-harry) wrote :

@Andy - I think this does work. You don't give any details about your testing, but I think you forgot to restart your cinder-api services. cinder-api checks that the image exists and gets its metadata before handing the request to cinder-volume to download the image.

Changed in openstack-ansible:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (kilo)

Reviewed: https://review.openstack.org/195497
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=dacc294d406cede800878d76b6f4b11c8af8e22b
Submitter: Jenkins
Branch: kilo

commit dacc294d406cede800878d76b6f4b11c8af8e22b
Author: Andy McCrae <email address hidden>
Date: Thu Jun 25 11:01:56 2015 +0100

    Default to use host/port for glance_api_servers

    The glance_api_servers points to a list of glance_api_servers for both
    cinder and nova. This causes "nova image-list" to fail when glance api
    servers are unavailable. Pointing to the LB VIP works as intended, so
    removing this var in favour of the "host/port" vars ensures that only
    the glance servers that are available are used.

    glance_api_servers is still available and if specified will be used in
    favour of host/port, but default it is commented out and the host/port
    will be used - which uses the internal_lb_vip_address and default
    glance_api_service_port.

    Change-Id: I6794a1a266d22944be8d5634ee0c0ce6cd9f2c59
    Closes-Bug: #1461245
    (cherry picked from commit d8b4cf9e790f812b2cfafb237d5909e46c3744d9)

tags: added: in-kilo
Changed in openstack-ansible:
status: Fix Committed → Fix Released
Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/openstack-ansible 11.2.11

This issue was fixed in the openstack/openstack-ansible 11.2.11 release.

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/openstack-ansible 11.2.12

This issue was fixed in the openstack/openstack-ansible 11.2.12 release.

Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/openstack-ansible 11.2.14

This issue was fixed in the openstack/openstack-ansible 11.2.14 release.
