IndexError adding host to availability zone

Bug #1419115 reported by Chris Friesen
This bug affects 5 people
Affects                   Status        Importance  Assigned to          Milestone
OpenStack Compute (nova)  Fix Released  Low         Hans Lindgren
Kilo                      Won't Fix     Low         Gonzalo De La Torre

Bug Description

There appears to be a bug in the code dealing with adding a disabled host to an aggregate that is exported as an availability zone.

I disabled the nova-compute service on a host and then tried to add it to an aggregate that is exported as an availability zone. This resulted in the following error:

   File "/usr/lib64/python2.7/site-packages/oslo/utils/excutils.py", line 82, in __exit__
     six.reraise(self.type_, self.value, self.tb)
   File "/usr/lib64/python2.7/site-packages/nova/exception.py", line 71, in wrapped
     return f(self, context, *args, **kw)
   File "/usr/lib64/python2.7/site-packages/nova/compute/api.py", line 3673, in add_host_to_aggregate
     aggregate=aggregate)
   File "/usr/lib64/python2.7/site-packages/nova/compute/api.py", line 3591, in is_safe_to_update_az
     host_az = host_azs.pop()
 IndexError: pop from empty list

The code looks like this:

        if 'availability_zone' in metadata:
            _hosts = hosts or aggregate.hosts
            zones, not_zones = availability_zones.get_availability_zones(
                context, with_hosts=True)
            for host in _hosts:
                # NOTE(sbauza): Host can only be in one AZ, so let's take only
                # the first element
                host_azs = [az for (az, az_hosts) in zones
                            if host in az_hosts
                            and az != CONF.internal_service_availability_zone]
                host_az = host_azs.pop()

It appears that for a disabled host, host_azs can be empty, resulting in an error when we try to pop() from it.
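The failure mode can be shown in isolation with a minimal sketch. It assumes, as suggested above, that the zone listing simply does not contain a disabled host. The names `find_host_az`, `INTERNAL_AZ` and the sample `zones` data are hypothetical stand-ins, not nova code or real output, and the guard shown is not the fix that was eventually merged.

```python
# Hypothetical stand-in for CONF.internal_service_availability_zone.
INTERNAL_AZ = "internal"


def find_host_az(host, zones, internal_az=INTERNAL_AZ):
    """Return the AZ that lists `host`, or None if no AZ lists it."""
    host_azs = [az for (az, az_hosts) in zones
                if host in az_hosts and az != internal_az]
    # Guard against an empty list instead of calling pop() unconditionally.
    return host_azs[0] if host_azs else None


# Illustrative data: "compute-2" plays the disabled host that is missing
# from every zone's host list.
zones = [("az1", ["compute-1"]), ("internal", ["controller-1"])]

enabled_az = find_host_az("compute-1", zones)
disabled_az = find_host_az("compute-2", zones)

# The unguarded pattern fails exactly as in the traceback.
try:
    [].pop()
except IndexError as exc:
    pop_error = str(exc)
```

With the guard in place the caller gets `None` for the missing host and can decide how to handle it, rather than hitting an unhandled IndexError.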

It works fine if the service is enabled on the host, and it works fine if the service is disabled and I try to add the host to an aggregate that is not exported as an availability zone.

Tags: compute
jichenjc (jichenjc)
Changed in nova:
assignee: nobody → jichenjc (jichenjc)
jichenjc (jichenjc) wrote :

I confirmed the same behavior in my environment.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/154269

Changed in nova:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/154593

Changed in nova:
assignee: jichenjc (jichenjc) → Hans Lindgren (hanlind)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by jichenjc (<email address hidden>) on branch: master
Review: https://review.openstack.org/154269

Davanum Srinivas (DIMS) (dims-v) wrote :

Fix proposed in https://review.openstack.org/#/c/154593/ is still in progress

Changed in nova:
importance: Undecided → Low
Max Lobur (max-lobur) wrote :

I'm hitting this when I add any node to any AZ, if at least one node shown in nova service-list is in the down state.
Juno nova-api, 2014.2.2

Max Lobur (max-lobur) wrote :

I figured out that in my case the problem was caused by upper case in the node names. The nodes are actually named in lower case, and nova does not convert the case if you try to add a node using an upper-case name.
Once I switched to lowercase it worked.
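A minimal sketch of why a case mismatch can produce the same symptom: a plain Python membership test like the `host in az_hosts` check in the snippet quoted in the description is case-sensitive. The host names below are hypothetical, and the lower-casing step only illustrates the workaround; it is not something nova is known to do.

```python
# Hypothetical host names as registered (lower case).
registered_hosts = ["compute-1", "compute-2"]

# Name as typed when adding the host to the aggregate.
requested = "COMPUTE-1"

# Case-sensitive lookup finds nothing, so a list built from it stays empty.
exact_match = requested in registered_hosts

# Normalizing the name before the lookup finds the host.
normalized_match = requested.lower() in registered_hosts
```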

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/154593
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e35c6761132e1abfea70ec1283891660451ec556
Submitter: Jenkins
Branch: master

commit e35c6761132e1abfea70ec1283891660451ec556
Author: Hans Lindgren <email address hidden>
Date: Tue Feb 10 18:57:51 2015 +0100

    Fix logic for checking if az can be updated

    The existing logic was overly complicated and missed out on those hosts
    whose services were disabled.

    This is a complete rewrite that makes use of a single aggregate query,
    thereby bypassing a lot of the extra logic needed by the old code.

    Fixes an issue in objects.AggregateList.get_by_metadata_key() where
    filtering by an empty list of hosts will return metadata for all hosts.

    Also removes a call to db.aggregate_metadata_get_by_metadata_key() to
    avoid the need for special handling due to that method returning
    metadata as sets of values instead of strings and also because the same
    metadata is already fetched in another method.

    Change-Id: I514e63ce863f2c77dcd47af3e3674019033a77de
    Closes-Bug: #1419115
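The empty-list filtering issue mentioned in the commit message is a common pitfall, sketched below. This is an assumption about how such a bug typically arises, not the actual nova implementation: the function names and the `rows` data are hypothetical.

```python
def metadata_for_hosts_truthy(rows, hosts=None):
    """Filter rows by host, using a truthiness check on `hosts`."""
    # An empty list is falsy, so hosts=[] skips the filter entirely
    # and every row is returned.
    if hosts:
        rows = [r for r in rows if r["host"] in hosts]
    return rows


def metadata_for_hosts_explicit(rows, hosts=None):
    """Filter rows by host, treating only None as 'no filter'."""
    # hosts=[] now means "match no hosts" and yields an empty result.
    if hosts is not None:
        rows = [r for r in rows if r["host"] in hosts]
    return rows


# Illustrative data only.
rows = [
    {"host": "compute-1", "availability_zone": "az1"},
    {"host": "compute-2", "availability_zone": "az2"},
]

truthy_result = metadata_for_hosts_truthy(rows, hosts=[])
explicit_result = metadata_for_hosts_explicit(rows, hosts=[])
```

Distinguishing `None` (no filter requested) from `[]` (filter that matches nothing) keeps a caller with no hosts from silently receiving metadata for all of them.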

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → liberty-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-1 → 12.0.0
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/239274

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/kilo)

Change abandoned by Dave Walker (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/239274
Reason:
stable/kilo closed for 2015.1.4

This release is now pending its final release and no freeze exception has
been seen for this changeset. Therefore, I am now abandoning this change.

If this is not correct, please urgently raise a thread on openstack-dev.

More details at: https://wiki.openstack.org/wiki/StableBranch
