Compute node HA for ironic doesn't work due to Resource Provider name duplication
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Ironic | Invalid | Critical | Dmitry Tantsur | |
| OpenStack Compute (nova) | Fix Released | High | John Garbutt | |
| Ocata | Fix Released | High | Jay Pipes | |
| Pike | Fix Released | High | Matt Riedemann | |
Bug Description
Description
===========
In an environment with multiple compute nodes running the ironic driver, when a compute node goes down, another compute node cannot take over its ironic nodes.
Steps to reproduce
==================
1. Start multiple compute nodes with the ironic driver.
2. Register one node with ironic.
3. Stop the compute node that manages the ironic node.
4. Create an instance.
Expected result
===============
The instance is created.
Actual result
=============
The instance creation fails.
Environment
===========
1. Exact version of OpenStack you are running.
openstack-
openstack-
python2-
openstack-
openstack-
openstack-
python-
openstack-
openstack-
openstack-
2. Which hypervisor did you use?
ironic
Details
=======
When a nova-compute service goes down, another nova-compute takes over the ironic nodes managed by the failed service by re-balancing the hash ring. The active nova-compute then tries to create a new resource provider using a new ComputeNode object UUID and the hypervisor name (the ironic node UUID) [1][2][3]. This creation fails with a conflict (409) because a resource provider with the same name, created by the failed nova-compute, already exists (see the sketch after the reference links below).
When a new instance is requested, the scheduler only gets the old resource provider for the ironic node [4], so the ironic node is not selected:
WARNING nova.scheduler.
[1] https:/
[2] https:/
[3] https:/
[4] https:/
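To make the failure mode above concrete, here is a minimal, self-contained Python simulation (not nova or placement code; the FakePlacement class and all names are invented for illustration) of the two facts that collide: each nova-compute invents its own ComputeNode UUID for the ironic node, while placement rejects a second resource provider with an already-used name.

```python
# Toy simulation of the bug, under the assumption described above:
# RP uuid = ComputeNode.uuid (per-host), RP name = ironic node uuid (shared).
import uuid


class FakePlacement:
    """Stand-in for placement's uniqueness rules; not the real service."""

    def __init__(self):
        self.providers = {}   # rp uuid -> name
        self.names = set()    # enforces the unique-name constraint

    def create_resource_provider(self, rp_uuid, name):
        if rp_uuid in self.providers:
            return 409, 'uuid already exists'
        if name in self.names:
            return 409, 'name already exists'   # the failure in this bug
        self.providers[rp_uuid] = name
        self.names.add(name)
        return 201, 'created'


placement = FakePlacement()
ironic_node_uuid = str(uuid.uuid4())

# Compute host A registers the node: a fresh ComputeNode.uuid, with the
# ironic node uuid as the provider name.
print(placement.create_resource_provider(str(uuid.uuid4()), ironic_node_uuid))
# -> (201, 'created')

# Host A dies; the hash ring moves the node to host B, which creates a *new*
# ComputeNode with a new uuid, but the provider name stays the same.
print(placement.create_resource_provider(str(uuid.uuid4()), ironic_node_uuid))
# -> (409, 'name already exists')
```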
tags: | added: ironic placement |
description: | updated |
Chris Dent (cdent) wrote : | #1 |
Changed in nova: | |
status: | New → Confirmed |
importance: | Undecided → High |
Mark Goddard (mgoddard) wrote : | #2 |
I believe I am also seeing this issue. First, a little about the environment.
The control plane is containerised, and deployed using an Ocata release of kolla-ansible. The base OS and container OS are both CentOS 7.3. The RDO nova compute package is openstack-
I have seen the issue twice now, and as Hironori described, the main user visible symptom is that one of the ironic nodes becomes unschedulable. Digging into the logs, the compute service to which the ironic node has been mapped shows the following messages occurring every minute:
2017-09-13 09:49:42.618 7 INFO nova.scheduler.
2017-09-13 09:49:42.631 7 WARNING nova.scheduler.
2017-09-13 09:49:42.689 7 DEBUG nova.compute.
2017-09-13 09:49:42.690 7 INFO nova.compute.
2017-09-13 09:49:42.691 7 DEBUG nova.compute.
The placement logs are fairly lacking in useful information, even with logging set to debug. Picking out the relevant logs:
2017-09-13 09:51:43.604 20 DEBUG nova.api.
2017-09-13 09:51:43.612 20 INFO nova.api.
We can see here that the scheduler client first tries to GET the resource_provider for compute node 22787651-
Looking at the DB for further info, here's the troublesome RP:
+------
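For reference, the GET-then-create sequence Mark describes can be reproduced by hand against the placement REST API; below is a rough sketch using python-requests, where the endpoint, token, microversion header and UUIDs are all placeholders/assumptions rather than values from this bug.

```python
# Rough sketch of the report client's lookup-then-create flow, done by hand
# against the placement API. Endpoint, token and UUIDs are placeholders.
import requests

PLACEMENT = 'http://placement.example.com:8778'            # assumption
HEADERS = {'x-auth-token': '<admin-token>',                 # assumption
           'openstack-api-version': 'placement 1.10'}       # assumption

new_compute_node_uuid = '<new ComputeNode.uuid>'
ironic_node_uuid = '<ironic node uuid, used as the RP name>'

# 1. The scheduler report client looks the provider up by the *new* uuid...
r = requests.get('%s/resource_providers/%s' % (PLACEMENT, new_compute_node_uuid),
                 headers=HEADERS)
print(r.status_code)   # 404: no provider with that uuid exists yet

# 2. ...then tries to create it, re-using the ironic node uuid as the name.
r = requests.post('%s/resource_providers' % PLACEMENT, headers=HEADERS,
                  json={'uuid': new_compute_node_uuid,
                        'name': ironic_node_uuid})
print(r.status_code)   # 409: the old provider already owns that name
```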
Mark Goddard (mgoddard) wrote : | #3 |
Staring at the nova code a little longer, I think I've pieced together what happened.
* An instance was aborted during creation.
* Destroying the instance failed because the node was locked (possibly due to a long running neutron port update), and the retry mechanism maxed out.
* Shortly afterwards, during the compute service's update_
* Deleting the resource provider for the compute node failed because allocations still existed from the instance that wasn't cleaned up.
This raises a question: why was the compute node seen to be orphaned? This happened because the ironic virt driver did not include the node in the list returned by get_available_
Another point worth mentioning is that I ended up deleting the stale resource provider in the DB, and the compute service created another, allowing things to return to normal.
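As an aside, the same clean-up can be done through the placement API instead of the database, as long as the leftover allocations are removed first (deleting a provider that still has allocations is exactly what failed in comment #3). A hedged sketch, with every identifier below a placeholder:

```python
# Hedged sketch: removing a stale resource provider via the placement API
# rather than the DB. All endpoints/UUIDs below are placeholders.
import requests

PLACEMENT = 'http://placement.example.com:8778'        # assumption
HEADERS = {'x-auth-token': '<admin-token>',             # assumption
           'openstack-api-version': 'placement 1.10'}   # assumption

stale_rp_uuid = '<old ComputeNode.uuid>'
stale_consumer_uuid = '<uuid of the instance that was never cleaned up>'

# The API refuses to delete a provider with allocations (as seen in the
# periodic task failure above), so drop the stale allocations first...
requests.delete('%s/allocations/%s' % (PLACEMENT, stale_consumer_uuid),
                headers=HEADERS)

# ...then the provider itself can be deleted, and the compute service will
# recreate it on its next periodic resource update.
r = requests.delete('%s/resource_providers/%s' % (PLACEMENT, stale_rp_uuid),
                    headers=HEADERS)
print(r.status_code)   # 204 on success
```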
Mark Goddard (mgoddard) wrote : | #4 |
Actually, looking at the ironic logs, the node was being provisioned at the time of the abort and the conductor had locked the node as it was handling the IPA callback.
Eric Fried (efried) wrote : | #5 |
Pursuant to comment #3, why doesn't the new compute host react to the 409 by deleting the old RP?
Eric Fried (efried) wrote : | #6 |
Oh, I think I understand: at the moment, this code is out of the control of ironic itself; it's all handled by the resource tracker.
So... why does the new compute host try to create the node RP with a *different* UUID? Doesn't the node have an immutable UUID that ought to be used no matter which host is doing the registration, or when, or why?
John Garbutt (johngarbutt) wrote : | #7 |
The ironic driver just passes the node uuids up to Nova:
https:/
It's the resource tracker that decides to create resource providers from that. In a production Pike cloud, I see the RP name matching the ironic node uuid. Not sure where the other uuid came from :S
John Garbutt (johngarbutt) wrote : | #8 |
Oh wait, it's totally the compute node uuid: RP uuid = ComputeNode uuid and name = ironic node uuid.
Nova creates a new compute node uuid for the same ironic node uuid, because it's on a different nova-compute node. That works fine, see:
https:/
The ComputeNode unique constraint simply checks ('host', 'hypervisor_
If you restarted the dead n-cpu node (yes, I know, that is stupid), it would delete the orphan compute nodes, but in this case it is dead, so they are left in the DB.
But on the placement side, we have the (new-uuid, existing-node-uuid) pair, and it fails the unique constraint on the name:
https:/
... So someone needs to delete the old resource provider. I guess the logic that creates the new one could check whether the name is used by some other node, delete it as required, re-assign all the allocations to the new node, etc... Yuck.
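A toy illustration (invented names, not nova or placement code) of the mismatch John points out: nova's compute_nodes uniqueness is on (host, hypervisor_hostname), so each host can own its own row for the same ironic node, while placement's uniqueness is on the provider name alone.

```python
# Toy contrast between the two uniqueness rules described above.
compute_node_keys = set()   # nova: unique on (host, hypervisor_hostname)
rp_names = set()            # placement: unique on resource provider name


def register(host, ironic_node_uuid):
    cn_key = (host, ironic_node_uuid)
    if cn_key in compute_node_keys:
        return 'nova: duplicate compute node'
    compute_node_keys.add(cn_key)        # a second host gets its own row
    if ironic_node_uuid in rp_names:
        return 'placement: 409, name already in use'
    rp_names.add(ironic_node_uuid)
    return 'ok'


node = 'ironic-node-uuid'
print(register('compute-1', node))   # ok
# compute-1 dies and the hash ring hands the node to compute-2:
print(register('compute-2', node))   # placement: 409, name already in use
```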
Related fix proposed to branch: master
Review: https:/
John Garbutt (johngarbutt) wrote : | #10 |
So, an idea for a brute force fix...
In the above change, we detect the conflict. We see if the name is duplicated; if it is, we delete the old resource provider. To do that, we probably have to delete all its allocations first, possibly also its inventory, etc.
Once that is all done, we create a new resource provider, then add the appropriate allocations.
... But now we have a race where another instance will come in and try to get allocations on the new thing, just after we create it but before we add back in the allocations. Boom, dang, the sky is falling down.
Now, in the virt driver we know the node has instance_uuid set, so it's probably not a total disaster, but it seems a nasty race.
... not sure where to go with this.
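To show concretely where that race sits, here is a self-contained toy version of the brute-force approach (again invented names, not nova code): delete the old provider, recreate it, and only then restore the saved allocations, leaving a window in which the scheduler can claim the node a second time.

```python
# Toy model (not nova/placement code) of "delete old RP, recreate, restore
# allocations" and the window it opens for a double claim.
class ToyPlacement:
    def __init__(self):
        self.providers = {}   # name -> {'uuid': ..., 'allocations': {...}}

    def create(self, rp_uuid, name):
        if name in self.providers:
            raise RuntimeError('409: name already in use')
        self.providers[name] = {'uuid': rp_uuid, 'allocations': {}}

    def delete(self, name):
        self.providers.pop(name)

    def claim(self, name, consumer):
        rp = self.providers[name]
        if rp['allocations']:
            raise RuntimeError('node already consumed')
        rp['allocations'][consumer] = {'CUSTOM_BAREMETAL': 1}


placement = ToyPlacement()
placement.create('old-compute-node-uuid', 'ironic-node-uuid')
placement.claim('ironic-node-uuid', 'existing-instance')

# Brute-force recovery on the new compute host after hitting the 409:
saved = dict(placement.providers['ironic-node-uuid']['allocations'])
placement.delete('ironic-node-uuid')
placement.create('new-compute-node-uuid', 'ironic-node-uuid')

# >>> race window: the provider now looks empty, so the scheduler can claim
# the node for a brand-new instance before the old allocations come back.
placement.claim('ironic-node-uuid', 'racing-new-instance')   # wrongly succeeds

placement.providers['ironic-node-uuid']['allocations'].update(saved)
print(placement.providers['ironic-node-uuid']['allocations'])
# Two instances now hold allocations against a single ironic node.
```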
John Garbutt (johngarbutt) wrote : | #11 |
... another possible fix is to change the ComputeNode.uuid to match the ironic_node.uuid, but we hit the same race above on the first restart after we make this change, as we delete all the auto-generated ComputeNode.uuids and update them to ironic_node.uuids. Although at least we only hit the nasty re-calculate race once... Yuck again.
John Garbutt (johngarbutt) wrote : | #12 |
Better idea: after a rebalance, when we need to create compute nodes, let's first check whether a compute node already exists, and either create a new one or re-purpose the existing one depending on what we find. (A simplified sketch of this idea follows.)
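That is essentially what the eventual fix does. A simplified, hedged sketch of the idea (plain Python with invented names, not the actual resource tracker code): look up an existing ComputeNode for the same hypervisor_hostname and, if one exists, only move its host field, so its uuid, and therefore the placement resource provider, survives the rebalance.

```python
# Simplified sketch of comment #12's idea, not the real resource tracker
# code: re-use any existing ComputeNode record for the ironic node.
import uuid

compute_nodes = []   # stand-in for nova's compute_nodes table


def get_or_create_compute_node(host, hypervisor_hostname):
    # Re-purpose a record owned by *any* host for this ironic node.
    for cn in compute_nodes:
        if cn['hypervisor_hostname'] == hypervisor_hostname:
            cn['host'] = host        # only the host moves; the uuid (and so
            return cn                # the resource provider) stays the same
    cn = {'uuid': str(uuid.uuid4()),  # a genuinely new node gets a new uuid
          'host': host,
          'hypervisor_hostname': hypervisor_hostname}
    compute_nodes.append(cn)
    return cn


node = 'ironic-node-uuid'
first = get_or_create_compute_node('compute-1', node)
# compute-1 dies; the hash ring hands the node to compute-2:
second = get_or_create_compute_node('compute-2', node)
assert first['uuid'] == second['uuid']   # same uuid -> same resource provider
```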
Changed in nova: | |
assignee: | nobody → John Garbutt (johngarbutt) |
status: | Confirmed → In Progress |
Fix proposed to branch: master
Review: https:/
Change abandoned by John Garbutt (<email address hidden>) on branch: master
Review: https:/
Fix proposed to branch: stable/pike
Review: https:/
Dmitry Tantsur (divius) wrote : | #16 |
This started breaking ironic multinode CI, see investigation in https:/
Changed in ironic: | |
status: | New → Triaged |
importance: | Undecided → Critical |
Changed in nova: | |
assignee: | John Garbutt (johngarbutt) → Dmitry Tantsur (divius) |
Changed in ironic: | |
status: | Triaged → In Progress |
assignee: | nobody → Dmitry Tantsur (divius) |
Changed in nova: | |
assignee: | Dmitry Tantsur (divius) → Matt Riedemann (mriedem) |
Changed in nova: | |
assignee: | Matt Riedemann (mriedem) → John Garbutt (johngarbutt) |
Changed in ironic: | |
status: | In Progress → Invalid |
Fix proposed to branch: stable/pike
Review: https:/
Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/pike
Review: https:/
Reason: This is using the wrong change ID. I've re-proposed the cherry pick here:
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit e3c5e22d1fde7ca
Author: John Garbutt <email address hidden>
Date: Fri Sep 29 15:48:54 2017 +0100
Re-use existing ComputeNode on ironic rebalance
When a nova-compute service dies that is one of several ironic based
nova-compute services running, a node rebalance occurs to ensure there
is still an active nova-compute service dealing with requests for the
given instance that is running.
Today, when this occurs, we create a new ComputeNode entry. This change
alters that logic to detect the case of the ironic node rebalance and in
that case we re-use the existing ComputeNode entry, simply updating the
host field to match the new host it has been rebalanced onto.
Previously we hit problems with placement when we get a new
ComputeNode.uuid for the same ironic node. Re-using the
existing entry keeps the ComputeNode.uuid the same when the rebalance of
the ComputeNode occurs.
Without keeping the same ComputeNode.uuid placement errors out with a 409
because we attempt to create a ResourceProvider that has the same name
as an existing ResourceProvider. Had that worked, we would have noticed
the race that occurs after we create the ResourceProvider but before we
add back the existing allocations for existing instances. Keeping the
ComputeNode.uuid the same means the existing ResourceProvider, and its
allocations, are left untouched.
Closes-Bug: #1714248
Co-Authored-By: Dmitry Tantsur <email address hidden>
Change-Id: I4253cffca3dbf5
Changed in nova: | |
status: | In Progress → Fix Released |
This issue was fixed in the openstack/nova 17.0.0.0b3 development milestone.
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/pike
commit e95277fa3eba86a
Author: John Garbutt <email address hidden>
Date: Fri Sep 29 15:48:54 2017 +0100
Re-use existing ComputeNode on ironic rebalance
When a nova-compute service dies that is one of several ironic based
nova-compute services running, a node rebalance occurs to ensure there
is still an active nova-compute service dealing with requests for the
given instance that is running.
Today, when this occurs, we create a new ComputeNode entry. This change
alters that logic to detect the case of the ironic node rebalance and in
that case we re-use the existing ComputeNode entry, simply updating the
host field to match the new host it has been rebalanced onto.
Previously we hit problems with placement when we get a new
ComputeNode.uuid for the same ironic node. Re-using the
existing entry keeps the ComputeNode.uuid the same when the rebalance of
the ComputeNode occurs.
Without keeping the same ComputeNode.uuid placement errors out with a 409
because we attempt to create a ResourceProvider that has the same name
as an existing ResourceProvider. Had that worked, we would have noticed
the race that occurs after we create the ResourceProvider but before we
add back the existing allocations for existing instances. Keeping the
ComputeNode.uuid the same means the existing ResourceProvider, and its
allocations, are left untouched.
Closes-Bug: #1714248
Co-Authored-By: Dmitry Tantsur <email address hidden>
Change-Id: I4253cffca3dbf5
(cherry picked from commit e3c5e22d1fde7ca
This issue was fixed in the openstack/nova 16.1.1 release.
Fix proposed to branch: stable/ocata
Review: https:/
Change abandoned by Jay Pipes (<email address hidden>) on branch: stable/ocata
Review: https:/
Reason: Need to start over with -x
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/ocata
commit 028f03c8c3d3dac
Author: John Garbutt <email address hidden>
Date: Fri Sep 29 15:48:54 2017 +0100
Re-use existing ComputeNode on ironic rebalance
When a nova-compute service dies that is one of several ironic based
nova-compute services running, a node rebalance occurs to ensure there
is still an active nova-compute service dealing with requests for the
given instance that is running.
Today, when this occurs, we create a new ComputeNode entry. This change
alters that logic to detect the case of the ironic node rebalance and in
that case we re-use the existing ComputeNode entry, simply updating the
host field to match the new host it has been rebalanced onto.
Previously we hit problems with placement when we get a new
ComputeNode.uuid for the same ironic node. Re-using the
existing entry keeps the ComputeNode.uuid the same when the rebalance of
the ComputeNode occurs.
Without keeping the same ComputeNode.uuid placement errors out with a 409
because we attempt to create a ResourceProvider that has the same name
as an existing ResourceProvider. Had that worked, we would have noticed
the race that occurs after we create the ResourceProvider but before we
add back the existing allocations for existing instances. Keeping the
ComputeNode.uuid the same means the existing ResourceProvider, and its
allocations, are left untouched.
Conflicts:
Conflicts were small, primarily due to the addition in Pike of the virt
driver's requires_
not used in Ocata.
Closes-Bug: #1714248
Co-Authored-By: Dmitry Tantsur <email address hidden>
Change-Id: I4253cffca3dbf5
(cherry picked from commit e3c5e22d1fde7ca
(cherry picked from commit e95277fa3eba86a
This issue was fixed in the openstack/nova 15.1.5 release.
This isn't the first time we've seen something like this. I wonder if we should think about what the impact would be if we removed the uniqueness requirement on the name field of a resource provider. It seems like it will inevitably cause problems as people/services start doing things with placement that span arbitrary boundaries (like time, in this case) that matter to the client side but are meaningless to placement.