Boot volume creation failure leaves secondary volume attached to broken server

Bug #1633249 reported by iain MacDonnell
This bug affects 3 people
Affects                   Status         Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released   Medium      Ameed Ashour
  Ocata                   Fix Committed  Medium      Matt Riedemann
  Pike                    Fix Committed  Medium      Charlotte Han
  Queens                  Fix Committed  Medium      Lee Yarwood

Bug Description

Attempt to boot a server with a block device mapping that includes a boot volume created from an image, plus an existing data volume. If the boot-volume creation fails, the data volume is left in the "in-use" state, attached to the server, which is now in the "error" state. The user can't detach the volume because of the server's error state. They can delete the server, which then leaves the volume apparently attached to a server that no longer exists. The only way out is to ask an administrator to reset the state of the data volume (an option not available to regular users under the default policy).

The easiest way to reproduce this is to attempt to create the boot volume from a qcow2 image where the requested volume size is less than the image's virtual size.
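
A qcow2 image's virtual size is usually much larger than its file size, so the mismatch is easy to hit. The virtual size can be checked with qemu-img before booting (illustrative command and output; the image file name is hypothetical, not from the original report):

~$ qemu-img info OL7.qcow2
image: OL7.qcow2
file format: qcow2
virtual size: 12G (12884901888 bytes)
disk size: 1.4G

Requesting a 5 GB boot volume from an image like this makes the volume creation fail, as in the transcript below.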

~$ cinder list
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| ID                                   | Status    | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| 2e733722-8b19-4bff-bd8d-bb770554582a | available | data | 1    | -           | false    |             |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+

~$ nova boot --flavor m1.large --availability-zone=imot04-1 --block-device 'id=9e122d18-d7a4-406d-b8f2-446cfddaa7c7,source=image,dest=volume,device=vda,size=5,bootindex=0' --block-device 'id=2e733722-8b19-4bff-bd8d-bb770554582a,source=volume,dest=volume,device=vdb,size=1,bootindex=1' ol4
+--------------------------------------+--------------------------------------------------+
| Property                             | Value                                            |
+--------------------------------------+--------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                           |
| OS-EXT-AZ:availability_zone          | imot04-1                                         |
| OS-EXT-SRV-ATTR:host                 | -                                                |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | -                                                |
| OS-EXT-SRV-ATTR:instance_name        |                                                  |
| OS-EXT-STS:power_state               | 0                                                |
| OS-EXT-STS:task_state                | scheduling                                       |
| OS-EXT-STS:vm_state                  | building                                         |
| OS-SRV-USG:launched_at               | -                                                |
| OS-SRV-USG:terminated_at             | -                                                |
| accessIPv4                           |                                                  |
| accessIPv6                           |                                                  |
| adminPass                            | DNTr8MG3kVmC                                     |
| config_drive                         |                                                  |
| created                              | 2016-10-13T21:54:08Z                             |
| flavor                               | m1.large (4)                                     |
| hostId                               |                                                  |
| id                                   | 9541b63c-e003-4bcc-bcb8-5c0461522387             |
| image                                | Attempt to boot from volume - no image supplied  |
| key_name                             | -                                                |
| metadata                             | {}                                               |
| name                                 | ol4                                              |
| os-extended-volumes:volumes_attached | [{"id": "2e733722-8b19-4bff-bd8d-bb770554582a"}] |
| progress                             | 0                                                |
| security_groups                      | default                                          |
| status                               | BUILD                                            |
| tenant_id                            | 66234fea2ccc42398a1ae5300c594d49                 |
| updated                              | 2016-10-13T21:54:08Z                             |
| user_id                              | b2ae6b7bdac142ddb708a3550f61d998                 |
+--------------------------------------+--------------------------------------------------+

~$ cinder list
+--------------------------------------+----------+------+------+-------------+----------+--------------------------------------+
| ID                                   | Status   | Name | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+----------+------+------+-------------+----------+--------------------------------------+
| 2e733722-8b19-4bff-bd8d-bb770554582a | in-use   | data | 1    | -           | false    | 9541b63c-e003-4bcc-bcb8-5c0461522387 |
| a5a9f27b-8c8b-4cd5-bda8-998cc4cc6f32 | creating |      | 5    | -           | false    |                                      |
+--------------------------------------+----------+------+------+-------------+----------+--------------------------------------+

~$ cinder list
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
| ID                                   | Status | Name | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
| 2e733722-8b19-4bff-bd8d-bb770554582a | in-use | data | 1    | -           | false    | 9541b63c-e003-4bcc-bcb8-5c0461522387 |
| a5a9f27b-8c8b-4cd5-bda8-998cc4cc6f32 | error  |      | 5    | -           | false    |                                      |
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+

~$ nova volume-detach 9541b63c-e003-4bcc-bcb8-5c0461522387 2e733722-8b19-4bff-bd8d-bb770554582a
ERROR (Conflict): Cannot 'detach_volume' instance 9541b63c-e003-4bcc-bcb8-5c0461522387 while it is in vm_state error (HTTP 409) (Request-ID: req-c2855350-f06b-4c17-b429-87a068eddfb1)

~$ nova delete 9541b63c-e003-4bcc-bcb8-5c0461522387
Request to delete server 9541b63c-e003-4bcc-bcb8-5c0461522387 has been accepted.

~$ nova list
+----+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+----+------+--------+------------+-------------+----------+
+----+------+--------+------------+-------------+----------+

~$ cinder list
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
| ID                                   | Status | Name | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
| 2e733722-8b19-4bff-bd8d-bb770554582a | in-use | data | 1    | -           | false    | 9541b63c-e003-4bcc-bcb8-5c0461522387 |
| a5a9f27b-8c8b-4cd5-bda8-998cc4cc6f32 | error  |      | 5    | -           | false    |                                      |
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+

~$ nova show 9541b63c-e003-4bcc-bcb8-5c0461522387
ERROR (CommandError): No server with a name or ID of '9541b63c-e003-4bcc-bcb8-5c0461522387' exists.
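
At this point only an administrator can free the data volume, by resetting its state as the description notes. A minimal sketch of that workaround (requires admin credentials; exact flags vary across python-cinderclient releases, so treat this as illustrative):

~$ cinder reset-state --state available --attach-status detached 2e733722-8b19-4bff-bd8d-bb770554582a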

Tags: volumes
summary: - Boot volume creation leaves secondary volume attached to broken server
+ Boot volume creation failure leaves secondary volume attached to broken server
Revision history for this message
John Griffith (john-griffith) wrote :

Just ran into this while looking at another bug. Specifying a physical_block_size no longer seems to work, so you end up with a boot failure. The annoying parts are that, first, even though the boot failed, the instance is listed as Active and Up, and then, after detecting the failure and deleting the instance as reported in this bug, the volume is never detached.

Looking at the logs it appears that the cleanup never issues any of the Cinder calls to clean things up.
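
A generic way to check this (not part of the original report) is to search the compute log for the Cinder cleanup calls that should have been issued around the failure:

~$ grep -Ei 'terminate_connection|detach' /var/log/nova/nova-compute.log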

Revision history for this message
John Griffith (john-griffith) wrote :

Removing Cinder, as Nova never even made the terminate/detach calls to Cinder in this case.

no longer affects: cinder
Lee Yarwood (lyarwood)
Changed in nova:
assignee: nobody → Lee Yarwood (lyarwood)
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
Maciej Szankin (mszankin) wrote :

Lee, are you working on this? If not, please change status accordingly.

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
Sean Dague (sdague) wrote :

There are no currently open reviews on this bug, changing the status back to the previous state and unassigning. If there are active reviews related to this bug, please include links in comments.

Changed in nova:
status: In Progress → Confirmed
assignee: Lee Yarwood (lyarwood) → nobody
Changed in nova:
assignee: nobody → Raghad Qutteneh (raghadq)
Ameed Ashour (ameeda)
Changed in nova:
assignee: Raghad Qutteneh (raghadq) → Ameed Ashour (ameeda)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/528385

Matt Riedemann (mriedem)
tags: added: volumes
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/544143

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/544144

Changed in nova:
assignee: Ameed Ashour (ameeda) → Matt Riedemann (mriedem)
Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → Ameed Ashour (ameeda)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/545087

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/528385
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=61f6751a1807d3c3ee76d0351d17a82c6e1a915a
Submitter: Zuul
Branch: master

commit 61f6751a1807d3c3ee76d0351d17a82c6e1a915a
Author: Ameed Ashour <email address hidden>
Date: Wed Jan 24 09:32:24 2018 -0500

    Detach volumes when VM creation fails

    If the boot-volume creation fails, the data volume is left in state
    "in-use", attached to the server which is now in "error" state.
    The user can't detach the volume because of the server's error state.

    They can delete the server, which then leaves the volume apparently
    attached to a server that no longer exists, which is being fixed
    separately: https://review.openstack.org/#/c/340614/

    The only way out of this is to ask an administrator to reset the state of
    the data volume (this option is not available to regular users by
    default policy).

    This change fixes the problem in the compute service such that
    when the creation fails, compute manager detaches the created volumes
    before putting the VM into error state. Then you can delete the instance
    without caring about attached volumes.

    Change-Id: I8b1c05317734e14ea73dc868941351bb31210bf0
    Closes-bug: #1633249
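
With this change in place, the same failed boot should leave the data volume detached and reusable. Illustratively (expected behavior after the fix, not output captured from the original report):

~$ cinder list
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| ID                                   | Status    | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| 2e733722-8b19-4bff-bd8d-bb770554582a | available | data | 1    | -           | false    |             |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+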

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/pike)

Change abandoned by Rong Han (<email address hidden>) on branch: stable/pike
Review: https://review.openstack.org/544143

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/queens)

Change abandoned by Rong Han (<email address hidden>) on branch: stable/queens
Review: https://review.openstack.org/544144

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/544144
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=caf1d292dae225445119729d3a462267d860181a
Submitter: Zuul
Branch: stable/queens

commit caf1d292dae225445119729d3a462267d860181a
Author: Ameed Ashour <email address hidden>
Date: Wed Jan 24 09:32:24 2018 -0500

    Detach volumes when VM creation fails

    If the boot-volume creation fails, the data volume is left in state
    "in-use", attached to the server which is now in "error" state.
    The user can't detach the volume because of the server's error state.

    They can delete the server, which then leaves the volume apparently
    attached to a server that no longer exists, which is being fixed
    separately: https://review.openstack.org/#/c/340614/

    The only way out of this is to ask an administrator to reset the state of
    the data volume (this option is not available to regular users by
    default policy).

    This change fixes the problem in the compute service such that
    when the creation fails, compute manager detaches the created volumes
    before putting the VM into error state. Then you can delete the instance
    without caring about attached volumes.

    Change-Id: I8b1c05317734e14ea73dc868941351bb31210bf0
    Closes-bug: #1633249
    (cherry picked from commit 61f6751a1807d3c3ee76d0351d17a82c6e1a915a)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/544143
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4dbe72f976a67d442fd0e0489cadc3bc605ed012
Submitter: Zuul
Branch: stable/pike

commit 4dbe72f976a67d442fd0e0489cadc3bc605ed012
Author: Ameed Ashour <email address hidden>
Date: Wed Jan 24 09:32:24 2018 -0500

    Detach volumes when VM creation fails

    If the boot-volume creation fails, the data volume is left in state
    "in-use", attached to the server which is now in "error" state.
    The user can't detach the volume because of the server's error state.

    They can delete the server, which then leaves the volume apparently
    attached to a server that no longer exists, which is being fixed
    separately: https://review.openstack.org/#/c/340614/

    The only way out of this is to ask an administrator to reset the state of
    the data volume (this option is not available to regular users by
    default policy).

    This change fixes the problem in the compute service such that
    when the creation fails, compute manager detaches the created volumes
    before putting the VM into error state. Then you can delete the instance
    without caring about attached volumes.

    Change-Id: I8b1c05317734e14ea73dc868941351bb31210bf0
    Closes-bug: #1633249
    (cherry picked from commit 61f6751a1807d3c3ee76d0351d17a82c6e1a915a)
    (cherry picked from commit 22164d5118ea04321432432d89877aae91097e81)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.2

This issue was fixed in the openstack/nova 17.0.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.1.1

This issue was fixed in the openstack/nova 16.1.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.0.0b1

This issue was fixed in the openstack/nova 18.0.0.0b1 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/545087
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b188492ca4598af06c3fd4d8b0e905be980a29a3
Submitter: Zuul
Branch: stable/ocata

commit b188492ca4598af06c3fd4d8b0e905be980a29a3
Author: Ameed Ashour <email address hidden>
Date: Wed Jan 24 09:32:24 2018 -0500

    Detach volumes when VM creation fails

    If the boot-volume creation fails, the data volume is left in state
    "in-use", attached to the server which is now in "error" state.
    The user can't detach the volume because of the server's error state.

    They can delete the server, which then leaves the volume apparently
    attached to a server that no longer exists, which is being fixed
    separately: https://review.openstack.org/#/c/340614/

    The only way out of this is to ask an administrator to reset the state of
    the data volume (this option is not available to regular users by
    default policy).

    This change fixes the problem in the compute service such that
    when the creation fails, compute manager detaches the created volumes
    before putting the VM into error state. Then you can delete the instance
    without caring about attached volumes.

    Conflicts:
          nova/compute/manager.py

    NOTE(mriedem): The conflict in _delete_instance is due to restructuring
    the method in I9269ffa2b80e48db96c622d0dc0817738854f602 in Pike. Also
    note that _LW has to be used for the warning message since those
    translation markers are still required in Ocata.

    Change-Id: I8b1c05317734e14ea73dc868941351bb31210bf0
    Closes-bug: #1633249
    (cherry picked from commit 61f6751a1807d3c3ee76d0351d17a82c6e1a915a)
    (cherry picked from commit 22164d5118ea04321432432d89877aae91097e81)
    (cherry picked from commit 4dbe72f976a67d442fd0e0489cadc3bc605ed012)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.1.1

This issue was fixed in the openstack/nova 15.1.1 release.
