Boot from volume fails when cross_az_attach=False due to: ObjectActionError: Object action obj_load_attr failed because: attribute host not lazy-loadable

Bug #1693600 reported by Matt Riedemann on 2017-05-25
This bug affects 4 people
Affects                    Importance   Assigned to
OpenStack Compute (nova)   High         Matt Riedemann
Ocata                      High         Matt Riedemann

Bug Description

Reproduced here:

http://logs.openstack.org/74/467674/2/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/414bb96/logs/screen-n-api.txt.gz?level=TRACE#_May_25_02_46_02_139728

May 25 02:46:02.139728 ubuntu-xenial-infracloud-vanilla-8978830 nova-api[17948]: ERROR nova.compute.api [req-4a040119-d445-4eb8-8d56-a29653bb6866 tempest-TestVolumeBootPattern-1408924396 tempest-TestVolumeBootPattern-1408924396] Failed BDM validation for volume: 4eb7a1cf-1f6d-4b7c-ae37-99688a0c1e6d
May 25 02:46:02.140058 ubuntu-xenial-infracloud-vanilla-8978830 nova-api[17948]: ERROR nova.compute.api Traceback (most recent call last):
May 25 02:46:02.140638 ubuntu-xenial-infracloud-vanilla-8978830 nova-api[17948]: ERROR nova.compute.api File "/opt/stack/new/nova/nova/compute/api.py", line 1412, in _validate_bdm
May 25 02:46:02.140851 ubuntu-xenial-infracloud-vanilla-8978830 nova-api[17948]: ERROR nova.compute.api context, volume_id, instance)
May 25 02:46:02.141057 ubuntu-xenial-infracloud-vanilla-8978830 nova-api[17948]: ERROR nova.compute.api File "/opt/stack/new/nova/nova/compute/api.py", line 3711, in _check_attach_and_reserve_volume
May 25 02:46:02.141281 ubuntu-xenial-infracloud-vanilla-8978830 nova-api[17948]: ERROR nova.compute.api instance=instance)
May 25 02:46:02.141487 ubuntu-xenial-infracloud-vanilla-8978830 nova-api[17948]: ERROR nova.compute.api File "/opt/stack/new/nova/nova/volume/cinder.py", line 285, in check_availability_zone
May 25 02:46:02.141700 ubuntu-xenial-infracloud-vanilla-8978830 nova-api[17948]: ERROR nova.compute.api instance_az = az.get_instance_availability_zone(context, instance)
May 25 02:46:02.141997 ubuntu-xenial-infracloud-vanilla-8978830 nova-api[17948]: ERROR nova.compute.api File "/opt/stack/new/nova/nova/availability_zones.py", line 167, in get_instance_availability_zone
May 25 02:46:02.142308 ubuntu-xenial-infracloud-vanilla-8978830 nova-api[17948]: ERROR nova.compute.api host = instance.get('host')
May 25 02:46:02.142540 ubuntu-xenial-infracloud-vanilla-8978830 nova-api[17948]: ERROR nova.compute.api File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 771, in get
May 25 02:46:02.142762 ubuntu-xenial-infracloud-vanilla-8978830 nova-api[17948]: ERROR nova.compute.api return getattr(self, key)
May 25 02:46:02.142978 ubuntu-xenial-infracloud-vanilla-8978830 nova-api[17948]: ERROR nova.compute.api File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 67, in getter
May 25 02:46:02.143196 ubuntu-xenial-infracloud-vanilla-8978830 nova-api[17948]: ERROR nova.compute.api self.obj_load_attr(name)
May 25 02:46:02.143414 ubuntu-xenial-infracloud-vanilla-8978830 nova-api[17948]: ERROR nova.compute.api File "/opt/stack/new/nova/nova/objects/instance.py", line 1029, in obj_load_attr
May 25 02:46:02.143636 ubuntu-xenial-infracloud-vanilla-8978830 nova-api[17948]: ERROR nova.compute.api reason='attribute %s not lazy-loadable' % attrname)
May 25 02:46:02.144016 ubuntu-xenial-infracloud-vanilla-8978830 nova-api[17948]: ERROR nova.compute.api ObjectActionError: Object action obj_load_attr failed because: attribute host not lazy-loadable
May 25 02:46:02.144244 ubuntu-xenial-infracloud-vanilla-8978830 nova-api[17948]: ERROR nova.compute.api

This happens because the instance object here is not loaded from the database. It is created in the API code so it can be stored in the BuildRequest object, and its host field is never set:

https://github.com/openstack/nova/blob/4b732b5b3e7c1da63fa20af52028e7903986ba6a/nova/compute/api.py#L1023

So we need to handle the fact that 'host' might not be set in the instance object in this code.
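
The failure mode can be shown in isolation. The following is only a minimal sketch: FakeInstance and the simplified ObjectActionError are hypothetical stand-ins, not the real nova.objects.Instance or nova exception classes. They mimic the behaviour visible in the traceback, where reading an unset field attempts a lazy-load and 'host' is not lazy-loadable.

```python
class ObjectActionError(Exception):
    """Simplified stand-in for the exception type seen in the traceback."""


class FakeInstance(object):
    """Hypothetical stand-in for an Instance built in the API code.

    It is not the real nova.objects.Instance; it only mimics the behaviour
    from the traceback: reading an unset field attempts a lazy-load, and
    'host' is not lazy-loadable.
    """

    fields = ('uuid', 'host', 'availability_zone')

    def __init__(self, **values):
        # Only fields passed in explicitly count as "set".
        self._values = dict(values)

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails.
        if name.startswith('_') or name not in self.fields:
            raise AttributeError(name)
        if name in self._values:
            return self._values[name]
        raise ObjectActionError(
            'Object action obj_load_attr failed because: '
            'attribute %s not lazy-loadable' % name)

    def get(self, key):
        # Mirrors the call in the traceback, which passes no default and
        # ends up doing plain attribute access.
        return getattr(self, key)


# Build the object without a host, as the API code does for BuildRequest.
instance = FakeInstance(uuid='fake-uuid')
try:
    instance.get('host')
    error = None
except ObjectActionError as exc:
    error = str(exc)

print(error)
```

An object populated from a database row would have host set (possibly to None), so the same get('host') call would return a value instead of raising.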

Matt Riedemann (mriedem) wrote :

We stopped creating the instance in the API in Ocata, so the fix has to be backported to Ocata too.

Fix proposed to branch: master
Review: https://review.openstack.org/468147

Changed in nova:
status: Triaged → In Progress

Reviewed: https://review.openstack.org/468147
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=91bd79058abf79978335010a325952f17729d7a5
Submitter: Jenkins
Branch: master

commit 91bd79058abf79978335010a325952f17729d7a5
Author: Matt Riedemann <email address hidden>
Date: Thu May 25 15:46:22 2017 -0400

    Avoid lazy-load error when getting instance AZ

    When [cinder]cross_az_attach=False (not the default) and doing
    boot from volume, the API code validates the BDM by seeing if
    the instance and the volume are in the same availability zone.
    To get the AZ for the instance, the code is first trying to get
    the instance.host value.

    In Ocata we stopped creating the instance in the API and moved that
    to conductor for cells v2. So the Instance object in this case now
    is created in the _provision_instances method and stored in the
    BuildRequest object. Since there is no host to set on the instance
    yet and the Instance object wasn't populated from DB values, which
    before would set the host field on the instance object to None by
    default, trying to get instance.host will lazy-load the field and
    it blows up with ObjectActionError.

    The correct thing to do here is check if the host attribute is set
    on the Instance object. There is clear intent to assume host is
    not set in the instance since it was using instance.get('host'),
    probably from way back in the days when the instance in this case
    was a dict. So it's expecting to handle None, but we need to
    modernize how that is checked.

    Change-Id: I0dccb6a416dfe0eae4f7c52dfc28786a449b17bd
    Closes-Bug: #1693600
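
The check described in the commit message can be sketched as follows. This is an illustration under stated assumptions, not the merged patch: FakeInstance is a hypothetical stand-in for nova.objects.Instance, and safe_get_host is a made-up helper name. The stand-in exposes obj_attr_is_set(), the method oslo.versionedobjects provides for asking whether a field has a value, and routes the "in" operator through it.

```python
class FakeInstance(object):
    """Hypothetical stand-in for nova.objects.Instance (illustration only)."""

    fields = ('host', 'availability_zone')

    def __init__(self, **values):
        self._values = dict(values)

    def obj_attr_is_set(self, name):
        # True only for fields that were given a value, including None.
        return name in self._values

    def __contains__(self, name):
        return self.obj_attr_is_set(name)

    def __getattr__(self, name):
        if name.startswith('_') or name not in self.fields:
            raise AttributeError(name)
        if name in self._values:
            return self._values[name]
        # Stand-in for the failed lazy-load from the bug report.
        raise RuntimeError('attribute %s not lazy-loadable' % name)


def safe_get_host(instance):
    """Return instance.host, or None if the field was never set.

    Hypothetical helper: it checks whether the field is set before reading
    it, so no lazy-load is attempted.
    """
    return instance.host if 'host' in instance else None


unscheduled = FakeInstance(availability_zone='az1')  # no host yet
scheduled = FakeInstance(host='compute1')
print(safe_get_host(unscheduled))
print(safe_get_host(scheduled))
```

Checking whether the field is set keeps the old "host may be None" contract that instance.get('host') was relying on, without ever touching the unset attribute.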

Changed in nova:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/469548
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2868d52e42cccf539938afbb252650fb4ea4e3c5
Submitter: Jenkins
Branch: stable/ocata

commit 2868d52e42cccf539938afbb252650fb4ea4e3c5
Author: Matt Riedemann <email address hidden>
Date: Thu May 25 15:46:22 2017 -0400

    Avoid lazy-load error when getting instance AZ

    When [cinder]cross_az_attach=False (not the default) and doing
    boot from volume, the API code validates the BDM by seeing if
    the instance and the volume are in the same availability zone.
    To get the AZ for the instance, the code is first trying to get
    the instance.host value.

    In Ocata we stopped creating the instance in the API and moved that
    to conductor for cells v2. So the Instance object in this case now
    is created in the _provision_instances method and stored in the
    BuildRequest object. Since there is no host to set on the instance
    yet and the Instance object wasn't populated from DB values, which
    before would set the host field on the instance object to None by
    default, trying to get instance.host will lazy-load the field and
    it blows up with ObjectActionError.

    The correct thing to do here is check if the host attribute is set
    on the Instance object. There is clear intent to assume host is
    not set in the instance since it was using instance.get('host'),
    probably from way back in the days when the instance in this case
    was a dict. So it's expecting to handle None, but we need to
    modernize how that is checked.

    Change-Id: I0dccb6a416dfe0eae4f7c52dfc28786a449b17bd
    Closes-Bug: #1693600
    (cherry picked from commit 91bd79058abf79978335010a325952f17729d7a5)

This issue was fixed in the openstack/nova 16.0.0.0b2 development milestone.

This issue was fixed in the openstack/nova 15.0.6 release.

Change abandoned by David Moreau Simard (<email address hidden>) on branch: stable/newton
Review: https://review.openstack.org/485257
Reason: Correct, this doesn't seem required for newton.

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/newton
Review: https://review.openstack.org/485257
Reason: It's unclear what is causing this in newton and tomorrow is newton-eol, so I'm going to abandon this. I asked for recreate steps a couple of months ago with no response.
