API allows source compute service/node deletion while instances are pending a resize confirm/revert
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack Compute (nova) | | Medium | Matt Riedemann | |
Queens | | Low | Lee Yarwood | |
Rocky | | Medium | Matt Riedemann | |
Stein | | Medium | Matt Riedemann | |
Train | | Medium | Matt Riedemann | |
Bug Description
This is split off from bug 1829479, which is about deleting a compute service that had servers evacuated from it; doing so orphans resource providers in placement.
A similar scenario exists here: the API allows deleting a source compute service even though the source node resource provider has migration-based allocations and there are pending instance resizes involving the source node. A simple scenario is:
1. create a server on host1
2. resize or cold migrate it to a dest host2
3. delete the compute service for host1
At this point the resource provider for host1 is orphaned.
4. try to confirm/revert the resize of the server; this fails because the compute node for host1 is gone, and the server goes to ERROR status
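To make the orphaning in steps 1-3 concrete, here is a minimal in-memory sketch. It is not nova or placement code: the `Cloud` class, its fields, and `delete_service_unchecked` are hypothetical names invented for illustration, and it assumes (as a simplification) that a provider is only cleaned up when nothing is allocated against it.

```python
from dataclasses import dataclass, field


@dataclass
class Cloud:
    """Toy stand-in for the nova and placement records (all names hypothetical)."""
    compute_services: set = field(default_factory=set)    # hosts with a compute service
    resource_providers: set = field(default_factory=set)  # providers, keyed by host name
    allocations: dict = field(default_factory=dict)       # consumer -> provider host

    def add_host(self, host):
        self.compute_services.add(host)
        self.resource_providers.add(host)

    def create_server(self, server, host):
        self.allocations[server] = host

    def cold_migrate(self, server, dest):
        # The source allocation is held by the migration record while the
        # server itself consumes from the destination provider.
        self.allocations["migration-" + server] = self.allocations[server]
        self.allocations[server] = dest

    def delete_service_unchecked(self, host):
        # Behavior before the fix, simplified: the service is removed, but the
        # provider stays behind because something is still allocated against it.
        self.compute_services.discard(host)
        if host not in self.allocations.values():
            self.resource_providers.discard(host)

    def orphaned_providers(self):
        # Providers that no longer have a compute service behind them.
        return self.resource_providers - self.compute_services


cloud = Cloud()
cloud.add_host("host1")
cloud.add_host("host2")
cloud.create_server("server1", "host1")   # step 1
cloud.cold_migrate("server1", "host2")    # step 2
cloud.delete_service_unchecked("host1")   # step 3
print(cloud.orphaned_providers())
```

After step 3 the host1 provider is still present and still referenced by the migration record's allocation, which is the orphaned state described above.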
Based on the discussion in this mailing list thread:
http://
We should probably have the DELETE /os-services/
Changed in nova: | |
status: | New → Triaged |
importance: | Undecided → Medium |
OpenStack Infra (hudson-openstack) wrote : | #2 |
Related fix proposed to branch: master
Review: https:/
Matt Riedemann (mriedem) wrote : | #3 |
This goes back further than Rocky but since Queens is in extended maintenance mode upstream I figure it's best to just focus on Rocky+ for now.
Changed in nova: | |
assignee: | nobody → Matt Riedemann (mriedem) |
Fix proposed to branch: master
Review: https:/
Changed in nova: | |
status: | Triaged → In Progress |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit 94d3743b185d22c
Author: Matt Riedemann <email address hidden>
Date: Thu Nov 14 11:38:07 2019 -0500
Add functional recreate test for bug 1852610
It is possible to delete a source compute service which has
pending migration-based allocations and servers in VERIFY_RESIZE
status. Doing so deletes the compute service and compute node
but orphans the source node resource provider along with its
resource allocations held by the migration record while there
is a pending resized server.
This adds a simple cold migrate test which deletes the source
compute service while the server is in VERIFY_RESIZE status and
then tries to confirm the resize which fails.
Change-Id: I644608b4e197dd
Related-Bug: #1852610
Related fix proposed to branch: stable/train
Review: https:/
OpenStack Infra (hudson-openstack) wrote : | #7 |
Related fix proposed to branch: stable/train
Review: https:/
Fix proposed to branch: stable/train
Review: https:/
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit f7dde6054e55975
Author: Matt Riedemann <email address hidden>
Date: Thu Nov 14 12:16:53 2019 -0500
Add functional recreate revert resize test for bug 1852610
This builds on I644608b4e197dd
adds a revert resize test which deletes the source compute service
while the server is in VERIFY_RESIZE status and then reverts the
resize. The results are a bit different from the confirm scenario
because the confirm fails while the revert actually works which
is more dumb luck based on where the compute service drops the
move claim during the revert process (on the dest which still exists
rather than the source).
Change-Id: I2dcb1cb3e1f8ed
Related-Bug: #1852610
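The difference between the two outcomes can be sketched as follows. This is a toy model, not the nova compute manager: `drop_move_claim`, `confirm_resize`, and `revert_resize` here are simplified, hypothetical functions. It takes from the commit message that revert drops the move claim on the dest, and assumes confirm drops it on the source, whose compute node has been deleted.

```python
class ComputeNodeNotFound(Exception):
    """Raised when the host that must drop the claim has no compute node."""


def drop_move_claim(host, compute_nodes):
    # Dropping the claim needs the compute node record of the host doing it.
    if host not in compute_nodes:
        raise ComputeNodeNotFound(host)
    return host


def confirm_resize(source, dest, compute_nodes):
    # Assumption: confirm releases the claim on the source host.
    return drop_move_claim(source, compute_nodes)


def revert_resize(source, dest, compute_nodes):
    # Per the commit message: revert drops the claim on the dest host.
    return drop_move_claim(dest, compute_nodes)


# host1's compute service and node were deleted while the server
# was in VERIFY_RESIZE, so only host2 remains.
remaining_nodes = {"host2"}

try:
    confirm_resize("host1", "host2", remaining_nodes)
    confirm_ok = True
except ComputeNodeNotFound:
    confirm_ok = False

revert_host = revert_resize("host1", "host2", remaining_nodes)
print(confirm_ok, revert_host)
```

In this model the revert only succeeds because the host it happens to touch still exists, which matches the "dumb luck" description rather than any deliberate handling of the deleted source.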
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit 92fed026103b47f
Author: Matt Riedemann <email address hidden>
Date: Thu Nov 14 14:19:26 2019 -0500
Block deleting compute services with in-progress migrations
This builds on I0bd63b655ad3d3
which made DELETE /os-services/
response if the host has instances on it. This change checks
for in-progress migrations involving the nodes on the host,
either as the source or destination nodes, and returns a 409
error response if any are found.
Failing to do this can lead to orphaned resource providers
in placement and also failing to properly confirm or revert
a pending resize or cold migration.
A release note is included for the (justified) behavior
change in the API. A new microversion should not be required
for this since admins should not have to opt out of broken
behavior.
Change-Id: I70e06c607045a1
Closes-Bug: #1852610
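A rough sketch of the kind of check this commit describes is below. It is not the actual change to services.py: the `Migration` record, the `Conflict` exception, the function name, and the set of terminal statuses are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed set of statuses that mean a migration is over; anything else is
# treated as in progress for the purposes of this sketch.
TERMINAL_STATUSES = {"confirmed", "reverted", "error"}


@dataclass
class Migration:
    id: str
    source_node: str
    dest_node: str
    status: str


class Conflict(Exception):
    """Stands in for an HTTP 409 error response."""
    status_code = 409


def assert_no_in_progress_migrations(host_nodes, migrations):
    """Raise Conflict if any unfinished migration touches one of host_nodes.

    A node counts as involved whether it is the source or the destination.
    """
    for migration in migrations:
        if migration.status in TERMINAL_STATUSES:
            continue
        if migration.source_node in host_nodes or migration.dest_node in host_nodes:
            raise Conflict(
                f"migration {migration.id} is in progress on this host"
            )


# A resize waiting for confirm/revert blocks deletion of the source host.
pending = Migration("m1", "node1", "node2", "finished")
try:
    assert_no_in_progress_migrations({"node1"}, [pending])
    blocked = False
except Conflict:
    blocked = True
print(blocked)
```

The API handler would run a check like this before deleting the service, so the request is rejected up front instead of leaving a provider and a pending resize behind.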
Changed in nova: | |
status: | In Progress → Fix Released |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/train
commit 28d76cc7ae5c86d
Author: Matt Riedemann <email address hidden>
Date: Thu Nov 14 11:38:07 2019 -0500
Add functional recreate test for bug 1852610
It is possible to delete a source compute service which has
pending migration-based allocations and servers in VERIFY_RESIZE
status. Doing so deletes the compute service and compute node
but orphans the source node resource provider along with its
resource allocations held by the migration record while there
is a pending resized server.
This adds a simple cold migrate test which deletes the source
compute service while the server is in VERIFY_RESIZE status and
then tries to confirm the resize which fails.
Change-Id: I644608b4e197dd
Related-Bug: #1852610
(cherry picked from commit 94d3743b185d22c
tags: | added: in-stable-train |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/train
commit 3774952410f98bf
Author: Matt Riedemann <email address hidden>
Date: Thu Nov 14 12:16:53 2019 -0500
Add functional recreate revert resize test for bug 1852610
This builds on I644608b4e197dd
adds a revert resize test which deletes the source compute service
while the server is in VERIFY_RESIZE status and then reverts the
resize. The results are a bit different from the confirm scenario
because the confirm fails while the revert actually works which
is more dumb luck based on where the compute service drops the
move claim during the revert process (on the dest which still exists
rather than the source).
Change-Id: I2dcb1cb3e1f8ed
Related-Bug: #1852610
(cherry picked from commit f7dde6054e55975
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/train
commit a9650b3cbfc674e
Author: Matt Riedemann <email address hidden>
Date: Thu Nov 14 14:19:26 2019 -0500
Block deleting compute services with in-progress migrations
This builds on I0bd63b655ad3d3
which made DELETE /os-services/
response if the host has instances on it. This change checks
for in-progress migrations involving the nodes on the host,
either as the source or destination nodes, and returns a 409
error response if any are found.
Failing to do this can lead to orphaned resource providers
in placement and also failing to properly confirm or revert
a pending resize or cold migration.
A release note is included for the (justified) behavior
change in the API. A new microversion should not be required
for this since admins should not have to opt out of broken
behavior.
Conflicts:
NOTE(mriedem): The conflict is due to change
Iec61f56c05
NOTE(mriedem): services.py had to be updated to add the LOG
variable since change I8403a841f21a62
is not in Train.
Change-Id: I70e06c607045a1
Closes-Bug: #1852610
(cherry picked from commit 92fed026103b47f
Related fix proposed to branch: stable/stein
Review: https:/
Related fix proposed to branch: stable/stein
Review: https:/
Fix proposed to branch: stable/stein
Review: https:/
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/stein
commit 7d673872462f53d
Author: Matt Riedemann <email address hidden>
Date: Thu Nov 14 11:38:07 2019 -0500
Add functional recreate test for bug 1852610
It is possible to delete a source compute service which has
pending migration-based allocations and servers in VERIFY_RESIZE
status. Doing so deletes the compute service and compute node
but orphans the source node resource provider along with its
resource allocations held by the migration record while there
is a pending resized server.
This adds a simple cold migrate test which deletes the source
compute service while the server is in VERIFY_RESIZE status and
then tries to confirm the resize which fails.
Conflicts:
NOTE(mriedem): The conflict is due to change
If32bca0701
Change-Id: I644608b4e197dd
Related-Bug: #1852610
(cherry picked from commit 94d3743b185d22c
(cherry picked from commit 28d76cc7ae5c86d
tags: | added: in-stable-stein |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/stein
commit 9983b2462401176
Author: Matt Riedemann <email address hidden>
Date: Thu Nov 14 12:16:53 2019 -0500
Add functional recreate revert resize test for bug 1852610
This builds on I644608b4e197dd
adds a revert resize test which deletes the source compute service
while the server is in VERIFY_RESIZE status and then reverts the
resize. The results are a bit different from the confirm scenario
because the confirm fails while the revert actually works which
is more dumb luck based on where the compute service drops the
move claim during the revert process (on the dest which still exists
rather than the source).
Conflicts:
NOTE(mriedem): The conflict is due to change
If32bca0701
Change-Id: I2dcb1cb3e1f8ed
Related-Bug: #1852610
(cherry picked from commit f7dde6054e55975
(cherry picked from commit 3774952410f98bf
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/stein
commit a0290858b717b4c
Author: Matt Riedemann <email address hidden>
Date: Thu Nov 14 14:19:26 2019 -0500
Block deleting compute services with in-progress migrations
This builds on I0bd63b655ad3d3
which made DELETE /os-services/
response if the host has instances on it. This change checks
for in-progress migrations involving the nodes on the host,
either as the source or destination nodes, and returns a 409
error response if any are found.
Failing to do this can lead to orphaned resource providers
in placement and also failing to properly confirm or revert
a pending resize or cold migration.
A release note is included for the (justified) behavior
change in the API. A new microversion should not be required
for this since admins should not have to opt out of broken
behavior.
Conflicts:
NOTE(mriedem): The conflict in services.py is due to not
having change I9d257a003d315b
The conflict in integrated_
I4aac187283
Ibeb16ce162
test_services does not use _confirm_resize but just inlines the
call and wait for ACTIVE status in the test. The conflict in
test_
If32bca0701
Change-Id: I70e06c607045a1
Closes-Bug: #1852610
(cherry picked from commit 92fed026103b47f
(cherry picked from commit a9650b3cbfc674e
Related fix proposed to branch: stable/rocky
Review: https:/
Related fix proposed to branch: stable/rocky
Review: https:/
Fix proposed to branch: stable/rocky
Review: https:/
Related fix proposed to branch: stable/queens
Review: https:/
Related fix proposed to branch: stable/queens
Review: https:/
Fix proposed to branch: stable/queens
Review: https:/
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/rocky
commit 1563a15c8b4bcf1
Author: Matt Riedemann <email address hidden>
Date: Thu Nov 14 11:38:07 2019 -0500
Add functional recreate test for bug 1852610
It is possible to delete a source compute service which has
pending migration-based allocations and servers in VERIFY_RESIZE
status. Doing so deletes the compute service and compute node
but orphans the source node resource provider along with its
resource allocations held by the migration record while there
is a pending resized server.
This adds a simple cold migrate test which deletes the source
compute service while the server is in VERIFY_RESIZE status and
then tries to confirm the resize which fails.
NOTE(mriedem): A couple of methods are lifted from ServerMovingTests
since change Ie991d4b53e9bb5
Change-Id: I644608b4e197dd
Related-Bug: #1852610
(cherry picked from commit 94d3743b185d22c
(cherry picked from commit 28d76cc7ae5c86d
(cherry picked from commit 7d673872462f53d
tags: | added: in-stable-rocky |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/rocky
commit b6b2b3a35e1b954
Author: Matt Riedemann <email address hidden>
Date: Thu Nov 14 12:16:53 2019 -0500
Add functional recreate revert resize test for bug 1852610
This builds on I644608b4e197dd
adds a revert resize test which deletes the source compute service
while the server is in VERIFY_RESIZE status and then reverts the
resize. The results are a bit different from the confirm scenario
because the confirm fails while the revert actually works which
is more dumb luck based on where the compute service drops the
move claim during the revert process (on the dest which still exists
rather than the source).
Conflicts:
NOTE(mriedem): The conflict is due to not having change
Ie991d4b53e
Change-Id: I2dcb1cb3e1f8ed
Related-Bug: #1852610
(cherry picked from commit f7dde6054e55975
(cherry picked from commit 3774952410f98bf
(cherry picked from commit 9983b2462401176
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/rocky
commit 30a635068512be5
Author: Matt Riedemann <email address hidden>
Date: Thu Nov 14 14:19:26 2019 -0500
Block deleting compute services with in-progress migrations
This builds on I0bd63b655ad3d3
which made DELETE /os-services/
response if the host has instances on it. This change checks
for in-progress migrations involving the nodes on the host,
either as the source or destination nodes, and returns a 409
error response if any are found.
Failing to do this can lead to orphaned resource providers
in placement and also failing to properly confirm or revert
a pending resize or cold migration.
A release note is included for the (justified) behavior
change in the API. A new microversion should not be required
for this since admins should not have to opt out of broken
behavior.
Conflicts:
NOTE(mriedem): The conflict is due to not having change
Ie991d4b53e
Change-Id: I70e06c607045a1
Closes-Bug: #1852610
(cherry picked from commit 92fed026103b47f
(cherry picked from commit a9650b3cbfc674e
(cherry picked from commit a0290858b717b4c
This issue was fixed in the openstack/nova 20.1.0 release.
This issue was fixed in the openstack/nova 19.1.0 release.
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/queens
commit 922098044b37c66
Author: Matt Riedemann <email address hidden>
Date: Thu Nov 14 11:38:07 2019 -0500
Add functional recreate test for bug 1852610
It is possible to delete a source compute service which has
pending migration-based allocations and servers in VERIFY_RESIZE
status. Doing so deletes the compute service and compute node
but orphans the source node resource provider along with its
resource allocations held by the migration record while there
is a pending resized server.
This adds a simple cold migrate test which deletes the source
compute service while the server is in VERIFY_RESIZE status and
then tries to confirm the resize which fails.
Conflicts:
NOTE(mriedem): The conflict is due to not having change
Iea28332212
the helper methods are moved from ServerMovingTests to
ProviderUsa
Change-Id: I644608b4e197dd
Related-Bug: #1852610
(cherry picked from commit 94d3743b185d22c
(cherry picked from commit 28d76cc7ae5c86d
(cherry picked from commit 7d673872462f53d
(cherry picked from commit 1563a15c8b4bcf1
tags: | added: in-stable-queens |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/queens
commit 917b5d383829851
Author: Matt Riedemann <email address hidden>
Date: Thu Nov 14 12:16:53 2019 -0500
Add functional recreate revert resize test for bug 1852610
This builds on I644608b4e197dd
adds a revert resize test which deletes the source compute service
while the server is in VERIFY_RESIZE status and then reverts the
resize. The results are a bit different from the confirm scenario
because the confirm fails while the revert actually works which
is more dumb luck based on where the compute service drops the
move claim during the revert process (on the dest which still exists
rather than the source).
Conflicts:
NOTE(mriedem): The conflict is due to not having change
Iea28332212
the _resize_
ProviderUsa
Change-Id: I2dcb1cb3e1f8ed
Related-Bug: #1852610
(cherry picked from commit f7dde6054e55975
(cherry picked from commit 3774952410f98bf
(cherry picked from commit 9983b2462401176
(cherry picked from commit b6b2b3a35e1b954
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/queens
commit d88f353796813bf
Author: Matt Riedemann <email address hidden>
Date: Thu Nov 14 14:19:26 2019 -0500
Block deleting compute services with in-progress migrations
This builds on I0bd63b655ad3d3
which made DELETE /os-services/
response if the host has instances on it. This change checks
for in-progress migrations involving the nodes on the host,
either as the source or destination nodes, and returns a 409
error response if any are found.
Failing to do this can lead to orphaned resource providers
in placement and also failing to properly confirm or revert
a pending resize or cold migration.
A release note is included for the (justified) behavior
change in the API. A new microversion should not be required
for this since admins should not have to opt out of broken
behavior.
Conflicts:
NOTE(mriedem): The conflict is due to not having change
Iea28332212
_revert_resize is added to ProviderUsageBa
within test_servers.py.
Change-Id: I70e06c607045a1
Closes-Bug: #1852610
(cherry picked from commit 92fed026103b47f
(cherry picked from commit a9650b3cbfc674e
(cherry picked from commit a0290858b717b4c
(cherry picked from commit 30a635068512be5
This issue was fixed in the openstack/nova 18.3.0 release.
Related fix proposed to branch: master
Review: https://review.opendev.org/694351