Boot from volume fails when cross_az_attach=False and volume is provided to nova without an AZ for the instance

Bug #1694844 reported by Matt Riedemann
28
This bug affects 4 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Medium
Matt Riedemann

Bug Description

This was recreated with a devstack change:

http://logs.openstack.org/74/467674/4/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/3dbd6e9/logs/screen-n-api.txt.gz#_May_26_02_41_54_584798

In this failing test, Tempest creates a volume:

{"volume": {"status": "creating", "user_id": "2256bb66db8741aab58a20367b00bfa2", "attachments": [], "links": [{"href": "https://10.39.38.35:8776/v2/272882ba896341d483982dbcb1fde0f4/volumes/55a7c64a-f7b2-4b77-8f60-c1ccda8e0c30", "rel": "self"}, {"href": "https://10.39.38.35:8776/272882ba896341d483982dbcb1fde0f4/volumes/55a7c64a-f7b2-4b77-8f60-c1ccda8e0c30", "rel": "bookmark"}], "availability_zone": "nova", "bootable": "false", "encrypted": false, "created_at": "2017-05-26T02:41:45.617286", "description": null, "updated_at": null, "volume_type": "lvmdriver-1", "name": "tempest-TestVolumeBootPattern-volume-origin-1984626538", "replication_status": null, "consistencygroup_id": null, "source_volid": null, "snapshot_id": null, "multiattach": false, "metadata": {}, "id": "55a7c64a-f7b2-4b77-8f60-c1ccda8e0c30", "size": 1}}

And the AZ on the volume defaults to 'nova' because that's the default AZ in cinder.conf.

That volume ID is then passed to create the server:

{"server": {"block_device_mapping_v2": [{"source_type": "volume", "boot_index": 0, "destination_type": "volume", "uuid": "55a7c64a-f7b2-4b77-8f60-c1ccda8e0c30", "delete_on_termination": true}], "networks": [{"uuid": "da48954d-1f66-427b-892c-a7f2eb1b54a3"}], "imageRef": "", "name": "tempest-TestVolumeBootPattern-server-1371698056", "flavorRef": "42"}}

Which fails with the 400 InvalidVolume error because of this check in the API:

https://github.com/openstack/nova/blob/f112dc686dadd643410575cc3487cf1632e4f689/nova/volume/cinder.py#L286

The instance is not associated with a host yet so it's not in an aggregate, and since an AZ wasn't specified when creating an instance (and I don't think we want people passing 'nova' as the AZ), it fails when comparing None to 'nova'.

This is separate from bug 1497253 and change https://review.openstack.org/#/c/366724/ because in that case Nova is creating the volume during boot from volume and can specify the AZ for the volume. In this bug, the volume already exists and is provided to Nova.

We might need to be able to distinguish if the API or compute service is calling check_availability_zone and if so, pass a default AZ in the case of the API if one isn't defined.

Tags: volumes
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/469675

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: New → In Progress
Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/684141

Matt Riedemann (mriedem)
no longer affects: nova/ocata
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/684141
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f212b855835d226189b202001bba97bb8c289179
Submitter: Zuul
Branch: master

commit f212b855835d226189b202001bba97bb8c289179
Author: Matt Riedemann <email address hidden>
Date: Mon Sep 23 17:43:33 2019 -0400

    Add functional tests for [cinder]/cross_az_attach=False

    This adds some simple functional tests for the API validation
    behavior during server create when [cinder]/cross_az_attach=False
    meaning the server being created and any volumes attached to it
    must be in the same AZ.

    As part of this, bug 1694844 is recreated where the server is
    created without an AZ (or default_schedule_zone AZ) which results
    in a 400 response because None != whatever the volume's AZ is
    (which defaults to "nova" in Cinder). This is important for testing
    fixes for that bug later since the API interaction code is pretty
    hairy and unit tests are insufficient for validating a fix.

    Change-Id: I1b724f7ad3e2f6baa9fd865a8e22d87bf909b488
    Related-Bug: #1694844

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/469675
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=07a24dcef7ce6767b4b5bab0c8d3166cbe5b39c0
Submitter: Zuul
Branch: master

commit 07a24dcef7ce6767b4b5bab0c8d3166cbe5b39c0
Author: Matt Riedemann <email address hidden>
Date: Tue Oct 8 18:01:44 2019 -0400

    Default AZ for instance if cross_az_attach=False and checking from API

    If we're booting from an existing volume but the instance is not being
    created in a requested availability zone, and cross_az_attach=False,
    we'll fail with a 400 since by default the volume is in the 'nova'
    AZ and the instance does not have an AZ set - because one wasn't requested
    and because it's not in a host aggregate yet.

    This refactors that AZ validation during server create in the API to
    do it before calling _validate_bdm so we get the pre-existing volumes
    early and if cross_az_attach=False, we validate the volume zone(s) against
    the instance AZ. If the [DEFAULT]/default_schedule_zone (for instances) is
    not set and the volume AZ does not match the
    [DEFAULT]/default_availability_zone then we put the volume AZ in the request
    spec as if the user requested that AZ when creating the server.

    Since this is a change in how cross_az_attach is used and how the instance
    default AZ works when using BDMs for pre-existing volumes, the docs are
    updated and a release note is added.

    Note that not all of the API code paths are unit tested because the
    functional test coverage does most of the heavy lifting for coverage.
    Given the amount of unit tests that are impacted by this change, it is
    pretty obvious that (1) many unit tests are mocking at too low a level and
    (2) functional tests are better for validating these flows.

    Closes-Bug: #1694844

    Change-Id: Ib31ba2cbff0ebb22503172d8801b6e0c3d2aa68a

Changed in nova:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.