Conductor: fails to clean up networking resources due to _destroy_build_request CantStartEngineError
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | | High | Matt Riedemann |
Newton | | High | Unassigned |
Ocata | | High | Matt Riedemann |
Pike | | High | Matt Riedemann |
Bug Description
If libvirt fails to deploy an instance, for example because a problematic VIF type was passed, the conductor fails to clean up resources. The cleanup fails with the exception below because the cell mapping was not invoked.
Dec 7 09:12:50 utu1604template nova-conductor[
Changed in nova: | |
assignee: | nobody → Gary Kotton (garyk) |
status: | New → In Progress |
This should be fixed by https:/
Changed in nova: | |
assignee: | Gary Kotton (garyk) → Matt Riedemann (mriedem) |
Matt Riedemann (mriedem) wrote : | #3 |
This regressed in Newton: https:/
Changed in nova: | |
importance: | Undecided → High |
tags: | added: cells |
summary: |
- Conductor: fails to clean up networking resources
+ Conductor: fails to clean up networking resources due to
+ _destroy_build_request CantStartEngineError
Change abandoned by garyk (<email address hidden>) on branch: master
Review: https:/
Reason: https:/
Matt Riedemann (mriedem) wrote : | #5 |
Newton is basically end of life at this point (we were actually supposed to do that in October I think).
While this bug was introduced in Newton, I don't think we need to hold newton-eol up for it. It only fails if your conductor service is not configured for the API database, and I assume a conductor configured for the API database was the common setup until at least Ocata, when we started requiring cells v2. That changed further in Pike, when we started pushing for split MQ in cells v2 deployments (where a top-level conductor service can talk to the API DB, and each cell has a conductor service with its own MQ and DB that is isolated from the top-level MQ and API DB).
So in the interest of newton-eol, I'm not going to hold that up for this.
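To make the failure mode concrete, here is a minimal, self-contained sketch of a cell conductor that has no API database connection. Every name in it is a hypothetical stand-in rather than nova code: `ApiDatabase` and `cleanup_failed_build` are invented for illustration, and `CantStartEngineError` is a local class that only mimics the exception named in the bug title. The ordering (delete the build request first, then clean up networks) is assumed from the bug summary.

```python
class CantStartEngineError(Exception):
    """Local stand-in for the exception named in the bug title."""


class ApiDatabase:
    """Hypothetical handle to the API database (not a nova class)."""

    def __init__(self, connection=None):
        self.connection = connection
        self.build_requests = {}

    def destroy_build_request(self, instance_uuid):
        # A cell conductor without an API DB connection cannot even start
        # a database engine, so the call fails before doing any work.
        if not self.connection:
            raise CantStartEngineError("no API database connection configured")
        self.build_requests.pop(instance_uuid, None)


def cleanup_failed_build(api_db, instance_uuid, cleanup_networks):
    """Assumed buggy ordering: build request first, networks second."""
    api_db.destroy_build_request(instance_uuid)
    # Never reached when the call above raises, so ports are leaked.
    cleanup_networks(instance_uuid)
```

With `ApiDatabase()` (no connection) the exception propagates and the `cleanup_networks` callback is never invoked, which matches the symptom in the bug title.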
Changed in nova: | |
assignee: | Matt Riedemann (mriedem) → Ed Leafe (ed-leafe) |
Changed in nova: | |
assignee: | Ed Leafe (ed-leafe) → Matt Riedemann (mriedem) |
Changed in nova: | |
assignee: | Matt Riedemann (mriedem) → Ed Leafe (ed-leafe) |
Changed in nova: | |
assignee: | Ed Leafe (ed-leafe) → Matt Riedemann (mriedem) |
Fix proposed to branch: stable/pike
Review: https:/
Fix proposed to branch: stable/ocata
Review: https:/
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit f8747407fc6ac0d
Author: Matt Riedemann <email address hidden>
Date: Mon Dec 18 17:41:26 2017 -0500
Don't try to delete build request during a reschedule
If populate_retry failed because of MaxRetriesExceeded,
don't try to delete build requests because they should
already be gone from the initial create attempt, plus
we should assume the cell conductor can't reach the API
database anyway.
Similar for hitting NoValidHost during a reschedule. We
can tell if we're doing a reschedule by the num_attempts
value in filter_properties, populated via populate_retry,
which will be >1 during a reschedule.
Change-Id: I0b3ec6bb098ca3
Closes-Bug: #1736946
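The check described in the commit message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the committed change: `is_reschedule` and `handle_build_failure` are hypothetical helpers, the callbacks stand in for the conductor's real cleanup methods, and the `retry` key holding `num_attempts` is an assumption (the commit message only says the value lives in filter_properties).

```python
def is_reschedule(filter_properties):
    """Return True when this build attempt is a reschedule.

    Assumption: the attempt counter is stored under a 'retry' key.
    A missing counter is treated as the first attempt.
    """
    retry = filter_properties.get("retry") or {}
    return retry.get("num_attempts", 1) > 1


def handle_build_failure(filter_properties, instance_uuid,
                         destroy_build_request, cleanup_networks):
    """Clean up after a failed build without touching the API DB on a
    reschedule."""
    if not is_reschedule(filter_properties):
        # Initial attempt: the build request may still exist, so delete it.
        destroy_build_request(instance_uuid)
    # Network cleanup runs in both cases.
    cleanup_networks(instance_uuid)
```

On a reschedule (`num_attempts` of 2 or more) only the network cleanup callback runs, so a cell conductor that cannot reach the API database no longer fails before releasing its networking resources.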
Changed in nova: | |
status: | In Progress → Fix Released |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/pike
commit 96acf3db0bc9eca
Author: Matt Riedemann <email address hidden>
Date: Mon Dec 18 17:41:26 2017 -0500
Don't try to delete build request during a reschedule
If populate_retry failed because of MaxRetriesExceeded,
don't try to delete build requests because they should
already be gone from the initial create attempt, plus
we should assume the cell conductor can't reach the API
database anyway.
Similar for hitting NoValidHost during a reschedule. We
can tell if we're doing a reschedule by the num_attempts
value in filter_properties, populated via populate_retry,
which will be >1 during a reschedule.
Change-Id: I0b3ec6bb098ca3
Closes-Bug: #1736946
(cherry picked from commit cf88a27c6250043
Gary Kotton (garyk) wrote : | #10 |
The bug still occurs when neutron returns an invalid vnic. Matt's change does not address that case.
This issue was fixed in the openstack/nova 17.0.0.0b3 development milestone.
This issue was fixed in the openstack/nova 16.1.0 release.
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/ocata
commit f70119c842958e7
Author: Matt Riedemann <email address hidden>
Date: Mon Dec 18 17:41:26 2017 -0500
Don't try to delete build request during a reschedule
If populate_retry failed because of MaxRetriesExceeded,
don't try to delete build requests because they should
already be gone from the initial create attempt, plus
we should assume the cell conductor can't reach the API
database anyway.
Similar for hitting NoValidHost during a reschedule. We
can tell if we're doing a reschedule by the num_attempts
value in filter_properties, populated via populate_retry,
which will be >1 during a reschedule.
Change-Id: I0b3ec6bb098ca3
Closes-Bug: #1736946
(cherry picked from commit cf88a27c6250043
(cherry picked from commit 96acf3db0bc9eca
This issue was fixed in the openstack/nova 15.1.1 release.
Fix proposed to branch: master
Review: https://review.openstack.org/526356