Deleting an instance before scheduling with BFV fails to detach volume

Bug #1750666 reported by Mohammed Naser
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Undecided
Mohammed Naser
Pike
Fix Committed
Undecided
Mohammed Naser
Queens
Fix Committed
Undecided
Unassigned

Bug Description

If you try to boot and instance and delete it early before scheduling, the '_delete_while_booting' codepath hits `_attempt_delete_of_buildrequest` which tries to remove the block device mappings.

However, if the cloud contains compute nodes before Pike, no block device mappings will be present in the database (because they are only saved if using the new attachment flow), which means the attachment IDs are empty and the volume delete fails:

2018-02-20 16:02:25,063 WARNING [nova.compute.api] Ignoring volume cleanup failure due to Object action obj_load_attr failed because: attribute attachment_id not lazy-loadable

Revision history for this message
Matt Riedemann (mriedem) wrote :

It should be noted that this was found during backporting of https://review.openstack.org/#/c/545132/ to stable/pike where we don't have this code to set the bdm.attachment_id:

https://github.com/openstack/nova/blob/stable/queens/nova/compute/api.py#L3765

This is also a problem for rocky/queens if there are older computes in the deployment.

So the delete code path needs to handle the case that the bdm in the build request doesn't have an attachment_id field set and ignore it, like:

if 'attachment_id' in bdm and bdm.attachment_id:
    volume_api.delete_attachment(...)
else:
    # do the old volume cleanup thing

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/546398

Changed in nova:
assignee: nobody → Mohammed Naser (mnaser)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/546803

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/546804

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/546812

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/546315
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3120627d9802ceda46c2db387fec8fbc80700338
Submitter: Zuul
Branch: master

commit 3120627d9802ceda46c2db387fec8fbc80700338
Author: Mohammed Naser <email address hidden>
Date: Tue Feb 20 16:47:06 2018 -0500

    Add functional test for deleting BFV server with old attach flow

    When creating a new instance and deleting it before it gets scheduled
    with the old attachment flow (reserve_volume), the block device mappings
    are not persisted to database which means that the clean up fails
    because it tries to lookup attachment_id which cannot be lazy loaded.

    This patch adds a (failing) functional test to check for this issue
    which will be addressed in a follow-up patch.

    Related-Bug: #1750666
    Change-Id: I294c54e5a22dd6e5b226a4b00e7cd116813f0704

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/546398
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=16c2c8b3ee9d70e928a61ceb1ef5931d40e509a4
Submitter: Zuul
Branch: master

commit 16c2c8b3ee9d70e928a61ceb1ef5931d40e509a4
Author: Mohammed Naser <email address hidden>
Date: Tue Feb 20 17:11:37 2018 -0500

    Ensure attachment_id always exists for block device mapping

    If an instance is deleted before it is scheduled, the BDM
    clean-up code uses the mappings from the build request as
    they don't exist in the database yet.

    When using the older attachment flow with reserve_volume,
    there is no attachment_id bound to the block device mapping
    and because it is not loaded from database but rather from
    the build request, accessing the attachment_id field raises
    an exception with 'attachment_id not lazy-loadable'

    If we did a new style attach, _validate_bdm will add the
    attachment_id from Cinder. If we did not, then this patch
    will make sure to set it to 'None' to avoid raising an
    exception when checking if we have an attachment_id set in
    the BDM clean-up code

    Change-Id: I3cc775fc7dafe691b97a15e50ae2e93c92f355be
    Closes-Bug: #1750666

melanie witt (melwitt)
tags: added: queens-rc-potential volumes
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/546803
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9b59abc04475e51458112541ca2bb50e51599cfd
Submitter: Zuul
Branch: stable/queens

commit 9b59abc04475e51458112541ca2bb50e51599cfd
Author: Mohammed Naser <email address hidden>
Date: Tue Feb 20 16:47:06 2018 -0500

    Add functional test for deleting BFV server with old attach flow

    When creating a new instance and deleting it before it gets scheduled
    with the old attachment flow (reserve_volume), the block device mappings
    are not persisted to database which means that the clean up fails
    because it tries to lookup attachment_id which cannot be lazy loaded.

    This patch adds a (failing) functional test to check for this issue
    which will be addressed in a follow-up patch.

    Related-Bug: #1750666
    Change-Id: I294c54e5a22dd6e5b226a4b00e7cd116813f0704
    (cherry picked from commit 3120627d9802ceda46c2db387fec8fbc80700338)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/546804
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=58af9ba39bda8753bb9d86d3587e2529911843f9
Submitter: Zuul
Branch: stable/queens

commit 58af9ba39bda8753bb9d86d3587e2529911843f9
Author: Mohammed Naser <email address hidden>
Date: Tue Feb 20 17:11:37 2018 -0500

    Ensure attachment_id always exists for block device mapping

    If an instance is deleted before it is scheduled, the BDM
    clean-up code uses the mappings from the build request as
    they don't exist in the database yet.

    When using the older attachment flow with reserve_volume,
    there is no attachment_id bound to the block device mapping
    and because it is not loaded from database but rather from
    the build request, accessing the attachment_id field raises
    an exception with 'attachment_id not lazy-loadable'

    If we did a new style attach, _validate_bdm will add the
    attachment_id from Cinder. If we did not, then this patch
    will make sure to set it to 'None' to avoid raising an
    exception when checking if we have an attachment_id set in
    the BDM clean-up code

    Change-Id: I3cc775fc7dafe691b97a15e50ae2e93c92f355be
    Closes-Bug: #1750666
    (cherry picked from commit 16c2c8b3ee9d70e928a61ceb1ef5931d40e509a4)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.0.0rc3

This issue was fixed in the openstack/nova 17.0.0.0rc3 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/546812
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b61bf73b0fe5e96ab2953df93444acd2db6de62f
Submitter: Zuul
Branch: stable/pike

commit b61bf73b0fe5e96ab2953df93444acd2db6de62f
Author: Mohammed Naser <email address hidden>
Date: Tue Feb 20 16:47:06 2018 -0500

    Add functional test for deleting BFV server with old attach flow

    When creating a new instance and deleting it before it gets scheduled
    with the old attachment flow (reserve_volume), the block device mappings
    are not persisted to database which means that the clean up fails
    because it tries to lookup attachment_id which cannot be lazy loaded.

    This patch adds a (failing) functional test to check for this issue
    which will be addressed in a follow-up patch.

    Conflicts:
     nova/tests/functional/wsgi/test_servers.py

    Related-Bug: #1750666
    Change-Id: I294c54e5a22dd6e5b226a4b00e7cd116813f0704
    (cherry picked from commit 3120627d9802ceda46c2db387fec8fbc80700338)

tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/546275
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e5a055d2c6f04330cb83b33f8ddd4c9875fdb1b7
Submitter: Zuul
Branch: stable/pike

commit e5a055d2c6f04330cb83b33f8ddd4c9875fdb1b7
Author: Mohammed Naser <email address hidden>
Date: Tue Feb 20 17:11:37 2018 -0500

    Ensure attachment_id always exists for block device mapping

    If an instance is deleted before it is scheduled, the BDM
    clean-up code uses the mappings from the build request as
    they don't exist in the database yet.

    When using the older attachment flow with reserve_volume,
    there is no attachment_id bound to the block device mapping
    and because it is not loaded from database but rather from
    the build request, accessing the attachment_id field raises
    an exception with 'attachment_id not lazy-loadable'

    If we did a new style attach, _validate_bdm will add the
    attachment_id from Cinder. If we did not, then this patch
    will make sure to set it to 'None' to avoid raising an
    exception when checking if we have an attachment_id set in
    the BDM clean-up code

    Conflicts:
          nova/tests/functional/wsgi/test_servers.py

    Change-Id: I3cc775fc7dafe691b97a15e50ae2e93c92f355be
    Closes-Bug: #1750666
    (cherry picked from commit 16c2c8b3ee9d70e928a61ceb1ef5931d40e509a4)

    Detach volumes when deleting a BFV server pre-scheduling

    If the user creates a volume-backed server from an existing
    volume, the API reserves the volume by creating an attachment
    against it. This puts the volume into 'attaching' status.

    If the user then deletes the server before it's created in a
    cell, by deleting the build request, the attached volume is
    orphaned and requires admin intervention in the block storage
    service.

    This change simply pulls the BDMs off the BuildRequest when
    we delete the server via the build request and does the same
    local cleanup of those volumes as we would in a "normal" local
    delete scenario that the instance was created in a cell but
    doesn't have a host.

    We don't have to worry about ports in this scenario since
    ports are created on the compute, in a cell, and if we're
    deleting a build request then we never got far enough to
    create ports.

    Conflicts:
          nova/tests/functional/wsgi/test_servers.py

    Change-Id: I1a576bdb16befabe06a9728d7adf008fc0667077
    Partial-Bug: #1404867
    (cherry picked from commit 0652e4ab3d506ea21f1bd80b6802c6df5e7b523e)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.1.1

This issue was fixed in the openstack/nova 16.1.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.0.0b1

This issue was fixed in the openstack/nova 18.0.0.0b1 development milestone.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.