BDMNotFound raised and stale block devices left over when simultaneously rebooting and deleting an instance
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | | Undecided | Lee Yarwood |
Queens | | Undecided | Lee Yarwood |
Rocky | | Undecided | Lee Yarwood |
Stein | | Undecided | Lee Yarwood |
Train | | Undecided | Lee Yarwood |
Bug Description
Description
===========
Simultaneous requests to reboot and delete an instance _will_ race as only the call to delete takes a lock against the instance.uuid.
One possible outcome of this, seen in the wild with the Libvirt driver, is that the request to soft reboot eventually turns into a hard reboot, reconnecting volumes that the delete request has already disconnected. The delete request then unmaps these volumes on the Cinder side, leaving stale devices on the host. Additionally, BDMNotFound is raised by the reboot operation because the delete operation has already deleted the BDMs.
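The interleaving described above can be sketched with a toy model. This is only an illustration, not Nova code: `ToyHost` and its methods are hypothetical names, the delete is split into two steps to make the ordering visible, and `BDMNotFound` here is a local stand-in for the Nova exception of the same name.

```python
class BDMNotFound(Exception):
    """Local stand-in for the Nova exception of the same name."""


class ToyHost:
    """Hypothetical model of a compute host; none of this is Nova code."""

    def __init__(self, volumes):
        self.bdms = set(volumes)       # block device mappings in the database
        self.connected = set(volumes)  # block devices present on the host
        self.mapped = set(volumes)     # volumes still mapped on the Cinder side

    def delete_disconnect(self):
        # First half of the delete: disconnect the volumes from the host.
        self.connected.clear()

    def delete_finish(self):
        # Second half of the delete: unmap the volumes and remove the BDMs.
        self.mapped.clear()
        self.bdms.clear()

    def reboot_reconnect(self, cached_volumes):
        # Hard reboot reconnects the volumes it looked up before the delete.
        self.connected |= cached_volumes

    def reboot_save_bdms(self, cached_volumes):
        # The reboot then touches BDMs that may no longer exist.
        for vol in sorted(cached_volumes):
            if vol not in self.bdms:
                raise BDMNotFound(vol)

    def stale_devices(self):
        # Devices on the host that point at volumes no longer mapped.
        return self.connected - self.mapped


host = ToyHost({"vol-1"})
cached = set(host.bdms)        # reboot starts and reads the volume list
host.delete_disconnect()       # delete disconnects the volumes
host.reboot_reconnect(cached)  # soft reboot falls back to a hard reboot
host.delete_finish()           # delete unmaps volumes and removes BDMs
try:
    host.reboot_save_bdms(cached)
    error = None
except BDMNotFound as exc:
    error = exc

print("stale devices:", host.stale_devices())
print("reboot error:", repr(error))
```

With this ordering the model ends up with a connected but unmapped device and a BDMNotFound error, mirroring the two symptoms in the report.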
Steps to reproduce
==================
$ nova reboot $instance && nova delete $instance
Expected result
===============
The instance reboots and is then deleted without any errors raised.
Actual result
=============
BDMNotFound raised and stale block devices left over.
Environment
===========
1. Exact version of OpenStack you are running. See the following
list for all releases: http://
1599e3cf68779ea
2. Which hypervisor did you use?
(For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
What's the version of that?
Libvirt + QEMU/kvm
3. Which storage type did you use?
(For example: Ceph, LVM, GPFS, ...)
What's the version of that?
N/A
4. Which networking type did you use?
(For example: nova-network, Neutron with OpenVSwitch, ...)
N/A
Logs & Configs
==============
Changed in nova: | |
assignee: | nobody → Lee Yarwood (lyarwood) |
status: | New → In Progress |
Changed in nova: | |
status: | In Progress → Fix Released |
Fix proposed to branch: stable/train
Review: https:/
Fix proposed to branch: stable/stein
Review: https:/
Fix proposed to branch: stable/rocky
Review: https:/
Fix proposed to branch: stable/queens
Review: https:/
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/train
commit 939cd9b177db8f1
Author: Lee Yarwood <email address hidden>
Date: Mon Jul 29 16:25:45 2019 +0100
compute: Take an instance.uuid lock when rebooting
Previously simultaneous requests to reboot and delete an instance could
race as only the latter took a lock against the uuid of the instance.
With the Libvirt driver this race could potentially result in attempts
being made to reconnect previously disconnected volumes on the host.
Depending on the volume backend being used this could then result in
stale block devices pointing to unmapped volumes being left on the host
that in turn could cause failures later on when connecting newly mapped
volumes.
This change avoids this race by ensuring any request to reboot an
instance takes an instance.uuid lock within the compute manager,
serialising requests to reboot and then delete the instance.
Closes-Bug: #1838392
Change-Id: Ieb59de10c63bb0
(cherry picked from commit 9ad54f3dacbd372
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/stein
commit 304d3f62a4e3bdb
Author: Lee Yarwood <email address hidden>
Date: Mon Jul 29 16:25:45 2019 +0100
compute: Take an instance.uuid lock when rebooting
Closes-Bug: #1838392
Change-Id: Ieb59de10c63bb0
(cherry picked from commit 9ad54f3dacbd372
(cherry picked from commit 939cd9b177db8f1
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/rocky
commit 7d14b6a5170821c
Author: Lee Yarwood <email address hidden>
Date: Mon Jul 29 16:25:45 2019 +0100
compute: Take an instance.uuid lock when rebooting
Closes-Bug: #1838392
Change-Id: Ieb59de10c63bb0
(cherry picked from commit 9ad54f3dacbd372
(cherry picked from commit 939cd9b177db8f1
(cherry picked from commit 304d3f62a4e3bdb
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/queens
commit 16fb8ac3f4c2fe9
Author: Lee Yarwood <email address hidden>
Date: Mon Jul 29 16:25:45 2019 +0100
compute: Take an instance.uuid lock when rebooting
Closes-Bug: #1838392
Change-Id: Ieb59de10c63bb0
(cherry picked from commit 9ad54f3dacbd372
(cherry picked from commit 939cd9b177db8f1
(cherry picked from commit 304d3f62a4e3bdb
(cherry picked from commit 7d14b6a5170821c
This issue was fixed in the openstack/nova 20.1.0 release.
This issue was fixed in the openstack/nova 19.1.0 release.
This issue was fixed in the openstack/nova 18.3.0 release.
Reviewed: https://review.opendev.org/673463
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9ad54f3dacbd372271f441baea5380f913072dde
Submitter: Zuul
Branch: master
commit 9ad54f3dacbd372271f441baea5380f913072dde
Author: Lee Yarwood <email address hidden>
Date: Mon Jul 29 16:25:45 2019 +0100
compute: Take an instance.uuid lock when rebooting
Closes-Bug: #1838392
Change-Id: Ieb59de10c63bb067f92ec054535766cdd722dae2