Ironic computes may not be discovered when node count is less than compute count
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Compute (nova) | Fix Released | Medium | Dan Smith | |
| Pike | Fix Released | Medium | Dan Smith | |
| Queens | Fix Released | Medium | Dan Smith | |
| tripleo | Fix Released | Medium | Oliver Walsh | |
Bug Description
In an ironic deployment being built from day zero, there is an ordering problem, which generates a race condition for operators. Consider this common example:
At config time, you create and start three nova-compute services pointing at your ironic deployment. These three provide HA via the ironic driver's hash ring functionality. At config time, there are no ironic nodes present yet, which means running discover_hosts will create no host mappings.
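To make the hash ring behaviour concrete, here is a minimal toy ring in Python. It is illustrative only: nova's ironic driver has its own hash ring implementation, and the `ToyRing` class, its replica count, and the host/node names are all invented for this sketch.

```python
import bisect
import hashlib

def _h(key):
    # Hash a string to an integer position on the ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ToyRing:
    """Toy consistent-hash ring: each compute service gets several
    points on the ring, and an ironic node is owned by the first
    compute point at or after the node's own hash."""

    def __init__(self, hosts, replicas=32):
        points = sorted((_h(f'{host}:{i}'), host)
                        for host in hosts for i in range(replicas))
        self._keys = [k for k, _ in points]
        self._hosts = [h for _, h in points]

    def owner(self, node_uuid):
        i = bisect.bisect(self._keys, _h(node_uuid)) % len(self._keys)
        return self._hosts[i]

ring = ToyRing(['compute1', 'compute2', 'compute3'])
print(ring.owner('node-uuid-1'))  # deterministic, but effectively arbitrary
```

The point is simply that ownership is hash-driven rather than sticky: which compute owns a given node is an accident of hashing, and it can change as ring membership changes.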
Next, a single ironic node is added, which is owned by one of the computes per the hash rules. At this point, you can run discover_hosts and whatever compute owns that node will get a host mapping. Then you add a second ironic node, which causes all three nova-computes to rebalance the hash ring. One or more of the ironic nodes will definitely land on one of the other nova-computes and will suddenly be unreachable because there is no host mapping until the next time discover_hosts is run. Since we track the "mapped" bit on compute nodes, and compute nodes move between hosts with ironic, we won't even notice that the new owner nova-compute needs a host mapping. In fact, we won't notice until we get lucky enough to land a never-mapped ironic node on a nova-compute for the first time and then run discover_hosts after that point.
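A minimal, runnable sketch of this race, assuming a hypothetical data model (the `ComputeNode` class and `host_mappings` set below are illustrative stand-ins, not nova's actual schema). The key behaviour it mirrors is that discover_hosts only examines compute-node records with mapped=0, so a record that changes hands while mapped=1 never triggers a new host mapping:

```python
from dataclasses import dataclass

@dataclass
class ComputeNode:
    hypervisor_hostname: str  # the ironic node
    host: str                 # the nova-compute service that owns it
    mapped: int = 0           # has discover_hosts already seen this record?

host_mappings = set()         # hosts with a HostMapping in the API DB

def discover_hosts(compute_nodes):
    # Mirrors the gist of the real logic: only never-mapped records
    # are examined; mapped=1 records are skipped entirely.
    for cn in compute_nodes:
        if not cn.mapped:
            host_mappings.add(cn.host)  # create HostMapping if missing
            cn.mapped = 1

# Day zero: one ironic node appears, owned by compute1.
nodes = [ComputeNode("ironic-node-1", host="compute1")]
discover_hosts(nodes)
print(host_mappings)          # {'compute1'}

# A second node arrives; the hash ring rebalances and node-1 is now
# reported by compute2, but its record still carries mapped=1.
nodes[0].host = "compute2"
nodes.append(ComputeNode("ironic-node-2", host="compute1"))
discover_hosts(nodes)
print(host_mappings)          # still {'compute1'}: compute2 never gets a
                              # HostMapping, so ironic-node-1 is unreachable
```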
For an automated config management system, this is a lot of complexity to handle in order to converge on a stable, working system. In many cases where you're using ironic to bootstrap another deployment (e.g. tripleo), the number of nodes may be small (fewer than the number of computes) for quite some time.
There are a couple of obvious options I see:
1. Add a --and-services flag to nova-manage, which will also look for all nova-compute services in the cell and make sure those have mappings (see the sketch after this list). This is ideal because we could get all services mapped at config time without even having an ironic node in place yet (which is not possible today). We can't do this efficiently right away because nova.services does not have a mapped flag, and thus the scheduler periodic should _not_ include services.
2. We could unset compute_node.mapped any time we re-home an ironic node to a different nova-compute. This would cause the scheduler periodic task to notice the change and create a host mapping if the node happens to move to an unmapped nova-compute. This generates extra work during normal operation, and it still leaves an interval during which a previously-usable ironic node is unusable until the host discovery periodic task runs again.
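A rough sketch of what option 1 amounts to, with hypothetical names (`Service`, `discover_services`; the real change would live in nova-manage and nova's objects): mapping is keyed off the services table, so it works before any compute_nodes record exists.

```python
from dataclasses import dataclass

@dataclass
class Service:
    binary: str   # e.g. 'nova-compute'
    host: str

def discover_services(cell, services, host_mappings):
    """Ensure a HostMapping exists for every nova-compute service in
    the cell, regardless of whether any compute nodes exist yet."""
    for svc in services:
        if svc.binary == 'nova-compute' and svc.host not in host_mappings:
            host_mappings[svc.host] = cell   # create the HostMapping

# At config time, before the first ironic node is enrolled:
mappings = {}
discover_services('cell1',
                  [Service('nova-compute', f'compute{i}') for i in (1, 2, 3)],
                  mappings)
print(sorted(mappings))   # ['compute1', 'compute2', 'compute3']
```

If something like this were wired into nova-manage behind the proposed --and-services flag, config tooling could run it once at deploy time, before any ironic node is enrolled, and stop caring about node arrival order.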
IMHO, we should do #1. It's a backportable change, and it's actually a better workflow for config automation tools than what we have today, even discounting this race. We can do what we did before: land the simple version for the backports, then add a mapped bit on master to make it efficient enough to include in the scheduler periodic task.
Changed in tripleo:
  assignee: nobody → Oliver Walsh (owalsh)
  milestone: none → rocky-1
  status: New → In Progress
  importance: Undecided → Medium

Changed in tripleo:
  milestone: rocky-1 → rocky-2

Changed in tripleo:
  milestone: rocky-2 → rocky-3

Changed in tripleo:
  milestone: rocky-3 → rocky-rc1

Changed in tripleo:
  milestone: rocky-rc1 → stein-1

Changed in tripleo:
  status: In Progress → Fix Released

Changed in tripleo:
  milestone: stein-1 → rocky-rc1
See this RHBZ for reproducer instructions: https://bugzilla.redhat.com/show_bug.cgi?id=1554460