Boot volume creation failure leaves secondary volume attached to broken server
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Compute (nova) | Fix Released | Medium | Ameed Ashour | |
| Ocata | Fix Committed | Medium | Matt Riedemann | |
| Pike | Fix Committed | Medium | Charlotte Han | |
| Queens | Fix Committed | Medium | Lee Yarwood | |
Bug Description
Attempt to boot a server with a block device mapping that includes a boot volume created from an image, plus an existing data volume. If the boot-volume creation fails, the data volume is left in the "in-use" state, attached to the server, which is now in the "error" state. The user cannot detach the volume because of the server's error state. They can delete the server, which then leaves the volume apparently attached to a server that no longer exists. The only way out is to ask an administrator to reset the state of the data volume (an option not available to regular users under the default policy).
The easiest way to reproduce this is to attempt to create the boot volume from a qcow2 image where the requested volume size is less than the image's virtual size.
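The reproduction relies on Cinder rejecting a volume that is too small to hold the image's virtual disk. A minimal sketch of that size check follows; the function and names are illustrative, not Cinder's actual code:

```python
# Illustrative sketch of the size check that makes boot-volume creation
# fail; the real check lives in Cinder's image-to-volume flow.

GiB = 1024 ** 3

def check_volume_fits_image(volume_size_gb, image_virtual_size_bytes):
    """Reject a volume too small to hold the image's virtual disk."""
    if volume_size_gb * GiB < image_virtual_size_bytes:
        raise ValueError(
            "Volume size %d GiB is smaller than the image virtual size "
            "of %d bytes" % (volume_size_gb, image_virtual_size_bytes))

# A 1 GiB volume cannot hold a qcow2 image with a 10 GiB virtual disk, so
# the boot-volume creation fails and the server goes to the error state.
try:
    check_volume_fits_image(1, 10 * GiB)
except ValueError as exc:
    print(exc)
```

When this check fires during server create, nova has already reserved the pre-existing data volume, which is what leaves it stuck "in-use".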
~$ cinder list
+------
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+------
| 2e733722-
+------
~$ nova boot --flavor m1.large --availability-
+------
| Property | Value |
+------
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-
| OS-EXT-
| OS-EXT-
| OS-EXT-
| OS-EXT-
| OS-EXT-
| OS-EXT-STS:vm_state | building |
| OS-SRV-
| OS-SRV-
| accessIPv4 | |
| accessIPv6 | |
| adminPass | DNTr8MG3kVmC |
| config_drive | |
| created | 2016-10-
| flavor | m1.large (4) |
| hostId | |
| id | 9541b63c-
| image | Attempt to boot from volume - no image supplied |
| key_name | - |
| metadata | {} |
| name | ol4 |
| os-extended-
| progress | 0 |
| security_groups | default |
| status | BUILD |
| tenant_id | 66234fea2ccc423
| updated | 2016-10-
| user_id | b2ae6b7bdac142d
+------
~$ cinder list
+------
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+------
| 2e733722-
| a5a9f27b-
+------
~$ cinder list
+------
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+------
| 2e733722-
| a5a9f27b-
+------
~$ nova volume-detach 9541b63c-
ERROR (Conflict): Cannot 'detach_volume' instance 9541b63c-
~$ nova delete 9541b63c-
Request to delete server 9541b63c-
~$ nova list
+----+-
| ID | Name | Status | Task State | Power State | Networks |
+----+-
+----+-
~$ cinder list
+------
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+------
| 2e733722-
| a5a9f27b-
+------
~$ nova show 9541b63c-
ERROR (CommandError): No server with a name or ID of '9541b63c-
summary: | Boot volume creation leaves secondary volume attached to broken server → Boot volume creation failure leaves secondary volume attached to broken server |
Changed in nova: | |
assignee: | nobody → Lee Yarwood (lyarwood) |
status: | New → Confirmed |
importance: | Undecided → Medium |
Changed in nova: | |
assignee: | nobody → Raghad Qutteneh (raghadq) |
Changed in nova: | |
assignee: | Raghad Qutteneh (raghadq) → Ameed Ashour (ameeda) |
status: | Confirmed → In Progress |
tags: | added: volumes |
Changed in nova: | |
assignee: | Ameed Ashour (ameeda) → Matt Riedemann (mriedem) |
Changed in nova: | |
assignee: | Matt Riedemann (mriedem) → Ameed Ashour (ameeda) |
Just ran into this while looking at another bug. Specifying a physical_block_size no longer seems to work, so you end up with a boot failure. The annoying things are, first, that even though the boot failed, the instance is listed as Active and Up; and then, after detecting the failure and deleting the instance as reported in this bug, the volume is never detached.
Looking at the logs, it appears that the cleanup never issues any of the Cinder calls needed to clean things up.
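The cleanup the comment expects would walk the server's block device mappings on build failure, detach each attached volume, and either delete it (if nova created it for the boot) or return it to "available" (if the user supplied it). A sketch of that logic against a stub Cinder client; the stub and all names are assumptions for illustration, not nova's actual code:

```python
# Illustrative sketch of the missing cleanup on build failure. FakeCinder
# stands in for the real Cinder client; names are assumptions.

class FakeCinder:
    def __init__(self):
        self.volumes = {}  # volume_id -> status

    def detach(self, volume_id):
        # Detaching returns the volume to the "available" state.
        self.volumes[volume_id] = 'available'

    def delete(self, volume_id):
        self.volumes.pop(volume_id, None)

def cleanup_failed_build(cinder, block_device_mappings):
    """Detach every attached volume; delete only nova-created ones."""
    for bdm in block_device_mappings:
        if bdm['volume_id'] is None:
            continue  # the boot volume was never created; nothing to clean
        cinder.detach(bdm['volume_id'])
        if bdm['delete_on_termination']:
            cinder.delete(bdm['volume_id'])

cinder = FakeCinder()
cinder.volumes['data-vol'] = 'in-use'  # the user's pre-existing data volume
bdms = [
    {'volume_id': None, 'delete_on_termination': True},   # failed boot volume
    {'volume_id': 'data-vol', 'delete_on_termination': False},
]
cleanup_failed_build(cinder, bdms)
print(cinder.volumes['data-vol'])  # data volume is usable again
```

With this logic, the pre-existing data volume would end up "available" instead of stuck "in-use" against a deleted server.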