Unhelpful invalid bdm error in compute logs when volume creation fails during boot from volume

Bug #1693315 reported by Matt Riedemann
This bug affects 1 person
Affects                   Status         Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released   Medium      Matt Riedemann
Newton                    Fix Committed  Medium      Matt Riedemann
Ocata                     Fix Committed  Medium      Lee Yarwood

Bug Description

This came up in IRC while debugging a separate problem with a user.

They were booting from volume, with nova creating the volume, and ended up with this unhelpful error message:

BuildAbortException: Build of instance 9484f5a7-3198-47ff-b728-178515a26277 aborted: Block Device Mapping is Invalid.

That comes from this generic exception, which is raised here:

https://github.com/openstack/nova/blob/81bdbd0b50aeac9a677a0cef9001081008a2c407/nova/compute/manager.py#L1595

The actual exception in the traceback is much more specific:

http://paste.as47869.net/p/9qbburh7z3w3toi

2017-05-24 16:33:26.127 2331 ERROR nova.compute.manager [instance: 9484f5a7-3198-47ff-b728-178515a26277] VolumeNotCreated: Volume da947c97-66c6-4b7e-9ae6-54eb8128bb75 did not finish being created even after we waited 3 seconds or 2 attempts. And its status is error.

That shows the volume creation failed almost immediately.
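For context, the wording of that message suggests a bounded polling loop that stops as soon as the volume reports an error status. Below is a minimal sketch of that idea, not nova's actual code: `wait_for_volume`, `get_status`, the default limits, and the local `VolumeNotCreated` class are all hypothetical stand-ins chosen for illustration.

```python
import time


class VolumeNotCreated(Exception):
    """Hypothetical stand-in for nova's exception of the same name."""


def wait_for_volume(volume_id, get_status, max_attempts=60, interval=1.0,
                    sleep=time.sleep):
    """Poll get_status(volume_id) until the volume is available.

    Returns the number of attempts used on success. Stops early if the
    volume reports 'error', which would explain a failure after only a
    couple of attempts.
    """
    start = time.monotonic()
    status = None
    attempts = 0
    for attempts in range(1, max_attempts + 1):
        status = get_status(volume_id)
        if status == "available":
            return attempts
        if status == "error":
            break  # no point waiting on a volume that already failed
        sleep(interval)
    waited = int(time.monotonic() - start)
    raise VolumeNotCreated(
        "Volume %s did not finish being created even after we waited "
        "%d seconds or %d attempts. And its status is %s."
        % (volume_id, waited, attempts, status))
```

With a status source that returns "creating" and then "error", this sketch gives up on the second attempt instead of exhausting `max_attempts`.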

It would be better to include that error message in the BuildAbortException, since that is what ultimately goes into the recorded instance fault:

https://github.com/openstack/nova/blob/81bdbd0b50aeac9a677a0cef9001081008a2c407/nova/compute/manager.py#L1878
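The idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the nova code: the exception classes are local stand-ins, and `prep_block_device`, `build_instance`, and the `preserve_message` flag are hypothetical names used only to contrast the two behaviours.

```python
class InvalidBDM(Exception):
    """Local stand-in for nova's InvalidBDM (assumed default message)."""

    def __init__(self, message="Block Device Mapping is Invalid."):
        super().__init__(message)


class BuildAbortException(Exception):
    """Local stand-in for nova's BuildAbortException."""

    def __init__(self, instance_uuid, reason):
        super().__init__(
            "Build of instance %s aborted: %s" % (instance_uuid, reason))


def prep_block_device(attach, preserve_message=True):
    """Run attach() and convert any failure into InvalidBDM."""
    try:
        attach()
    except Exception as exc:
        # The real code logs the original traceback at this point.
        if preserve_message:
            # Proposed behaviour: carry the original message along.
            raise InvalidBDM(str(exc)) from exc
        # Reported behaviour: the original message is dropped.
        raise InvalidBDM() from exc


def build_instance(instance_uuid, attach, preserve_message=True):
    """Convert InvalidBDM into the exception recorded as the fault."""
    try:
        prep_block_device(attach, preserve_message)
    except InvalidBDM as exc:
        raise BuildAbortException(instance_uuid, str(exc)) from exc
```

With `preserve_message=False` the resulting fault ends in "Block Device Mapping is Invalid."; with the default it ends in whatever message the underlying failure carried.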

Matt Riedemann (mriedem)
Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/467715

Changed in nova:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/467715
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=20c4715a49a44c642882618f102cd0fc9342978d
Submitter: Jenkins
Branch: master

commit 20c4715a49a44c642882618f102cd0fc9342978d
Author: Matt Riedemann <email address hidden>
Date: Thu Jun 15 11:46:44 2017 -0400

    Provide original fault message when BFV fails

    When booting from volume and Nova is creating the volume,
    it can fail (timeout, invalid AZ in Cinder, etc) and the
    generic Exception handling in _prep_block_device will log
    the original exception trace but then raise a generic
    InvalidBDM exception, which is handled higher up and converted
    to a BuildAbortException, which is recorded as an instance
    fault, but the original error message is lost from the fault.

    It would be better to include the original exception message that
    triggered the failure so that goes into the fault for debug.

    For example, this is a difference of getting an error like this:

      BuildAbortException: Build of instance
      9484f5a7-3198-47ff-b728-178515a26277 aborted:
      Block Device Mapping is Invalid.

    To something more useful like this:

      BuildAbortException: Build of instance
      9484f5a7-3198-47ff-b728-178515a26277 aborted:
      Volume da947c97-66c6-4b7e-9ae6-54eb8128bb75 did not finish
      being created even after we waited 3 seconds or 2 attempts.
      And its status is error.

    Change-Id: I20a5e8e5e10dd505c1b24c208f919c6550e9d1a4
    Closes-Bug: #1693315

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.0.0.0rc1

This issue was fixed in the openstack/nova 16.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/493141

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/493206

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/493141
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=50eb1c6df4ef067e2f3de1df4e89dfab5df66c47
Submitter: Zuul
Branch: stable/ocata

commit 50eb1c6df4ef067e2f3de1df4e89dfab5df66c47
Author: Matt Riedemann <email address hidden>
Date: Thu Jun 15 11:46:44 2017 -0400

    Provide original fault message when BFV fails

    When booting from volume and Nova is creating the volume,
    it can fail (timeout, invalid AZ in Cinder, etc) and the
    generic Exception handling in _prep_block_device will log
    the original exception trace but then raise a generic
    InvalidBDM exception, which is handled higher up and converted
    to a BuildAbortException, which is recorded as an instance
    fault, but the original error message is lost from the fault.

    It would be better to include the original exception message that
    triggered the failure so that goes into the fault for debug.

    For example, this is a difference of getting an error like this:

      BuildAbortException: Build of instance
      9484f5a7-3198-47ff-b728-178515a26277 aborted:
      Block Device Mapping is Invalid.

    To something more useful like this:

      BuildAbortException: Build of instance
      9484f5a7-3198-47ff-b728-178515a26277 aborted:
      Volume da947c97-66c6-4b7e-9ae6-54eb8128bb75 did not finish
      being created even after we waited 3 seconds or 2 attempts.
      And its status is error.

    Conflicts:
          nova/compute/manager.py
          nova/tests/unit/compute/test_compute_mgr.py

    NOTE(mriedem): Conflicts in manager.py were just due to no longer
    using _LE in master (pike). The test conflict was just some tests
    added to pike which aren't in ocata in the same area of the file.

    Change-Id: I20a5e8e5e10dd505c1b24c208f919c6550e9d1a4
    Closes-Bug: #1693315
    (cherry picked from commit 20c4715a49a44c642882618f102cd0fc9342978d)

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/newton)

Reviewed: https://review.openstack.org/493206
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=89b6da0c2832e739e6e0d30a0e205165005dc41c
Submitter: Zuul
Branch: stable/newton

commit 89b6da0c2832e739e6e0d30a0e205165005dc41c
Author: Matt Riedemann <email address hidden>
Date: Thu Jun 15 11:46:44 2017 -0400

    Provide original fault message when BFV fails

    When booting from volume and Nova is creating the volume,
    it can fail (timeout, invalid AZ in Cinder, etc) and the
    generic Exception handling in _prep_block_device will log
    the original exception trace but then raise a generic
    InvalidBDM exception, which is handled higher up and converted
    to a BuildAbortException, which is recorded as an instance
    fault, but the original error message is lost from the fault.

    It would be better to include the original exception message that
    triggered the failure so that goes into the fault for debug.

    For example, this is a difference of getting an error like this:

      BuildAbortException: Build of instance
      9484f5a7-3198-47ff-b728-178515a26277 aborted:
      Block Device Mapping is Invalid.

    To something more useful like this:

      BuildAbortException: Build of instance
      9484f5a7-3198-47ff-b728-178515a26277 aborted:
      Volume da947c97-66c6-4b7e-9ae6-54eb8128bb75 did not finish
      being created even after we waited 3 seconds or 2 attempts.
      And its status is error.

    Change-Id: I20a5e8e5e10dd505c1b24c208f919c6550e9d1a4
    Closes-Bug: #1693315
    (cherry picked from commit 20c4715a49a44c642882618f102cd0fc9342978d)
    (cherry picked from commit 0d5fd4356af38ca0487979ad614d294a2002911d)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 14.0.9

This issue was fixed in the openstack/nova 14.0.9 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.0.8

This issue was fixed in the openstack/nova 15.0.8 release.
