Comment 6 for bug 1411660

Ryan Moe (rmoe) wrote :

This issue should occur only in an environment that has a single controller and also has nodes whose roles depend only on 'controller'.

With the environment configuration described here we end up with a deployment order of:
['mongo', 'compute', 'cinder', 'ceph-osd']
['primary-mongo']
['primary-controller']

instead of the correct order:
['mongo']
['primary-mongo']
['primary-controller']
['compute', 'cinder', 'ceph-osd']

I believe the root cause is incorrect dependencies on the ceph-osd, compute, and cinder groups, which cause them to be run out of order.

To order the groups for deployment we start with a list of root nodes in the graph (zabbix-server and base-os) and a list of all groups that have already been processed [0]. This initial list of already-processed groups is the difference between all available groups and the groups in our environment. For this particular deployment scenario we start with ['base-os', 'controller', 'zabbix-server'] (the roles that are not assigned to any node in the environment).
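As a rough illustration of how that initial list comes about, here is a minimal sketch. The names ALL_GROUPS and ENV_GROUPS are hypothetical, and the full group list is inferred from the roles mentioned in this comment rather than taken from the nailgun code.

```python
# Hypothetical list of every group known to the graph (inferred from this comment).
ALL_GROUPS = [
    "base-os", "zabbix-server", "mongo", "primary-mongo",
    "primary-controller", "controller", "compute", "cinder", "ceph-osd",
]

# Groups actually assigned to nodes in the failing environment:
# one controller (so only 'primary-controller'), plus mongo and the other roles.
ENV_GROUPS = [
    "mongo", "primary-mongo", "primary-controller",
    "compute", "cinder", "ceph-osd",
]

# Everything not present in the environment is treated as already processed.
already_processed = sorted(set(ALL_GROUPS) - set(ENV_GROUPS))
print(already_processed)  # ['base-os', 'controller', 'zabbix-server']
```

Note that 'controller' lands in this list simply because a single-controller environment only has a 'primary-controller'.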

After the priorities are processed for the current groups (zabbix-server and base-os) we get the next set of groups to process [1]. Getting the next groups involves iterating over all roles in the graph and checking [2]:
1. That the current role's predecessors are in our list of processed nodes.
2. That the current role has not already been processed.

The failure in this case is that 'controller' is in the list of already-processed groups. Because ceph-osd, compute, and cinder depend only on controller, they pass check number one. They are also not in the list of already-processed groups (as shown above), so the second check is satisfied. This means that cinder, compute, and ceph-osd all get added in parallel with the mongo group (which also passes both checks at this point). The primary-controller is not added at this time because its predecessors have not been processed (mongo and primary-mongo are in the environment and therefore not yet in the list of already-processed groups).
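The behavior described above can be reproduced with a small simulation. This is only a sketch under stated assumptions: the PREDECESSORS table is a toy graph reconstructed from this comment (in particular, that mongo depends on zabbix-server is my assumption), and next_groups and deployment_order are hypothetical helpers, not the functions in deployment_graph.py.

```python
# Toy graph: group -> direct predecessors. Assumed for illustration only.
PREDECESSORS = {
    "base-os": set(),
    "zabbix-server": set(),
    "mongo": {"zabbix-server"},  # assumption: lets mongo pass on the first pass
    "primary-mongo": {"mongo"},
    "primary-controller": {"mongo", "primary-mongo"},
    "controller": {"primary-controller"},
    "compute": {"controller"},
    "cinder": {"controller"},
    "ceph-osd": {"controller"},
}
ROOTS = ["zabbix-server", "base-os"]


def next_groups(processed):
    """Apply the two checks: predecessors processed, group itself not processed."""
    return [
        group
        for group, preds in PREDECESSORS.items()
        if preds <= processed and group not in processed
    ]


def deployment_order(env_groups):
    """Return batches of groups in the order this sketch would deploy them."""
    # Groups absent from the environment start out as already processed.
    processed = set(PREDECESSORS) - set(env_groups)
    current = list(ROOTS)
    order = []
    while current:
        in_env = [g for g in current if g in env_groups]
        if in_env:
            order.append(in_env)
        processed.update(current)
        current = next_groups(processed)
    return order


env = ["mongo", "primary-mongo", "primary-controller",
       "compute", "cinder", "ceph-osd"]
for batch in deployment_order(env):
    print(batch)
# ['mongo', 'compute', 'cinder', 'ceph-osd']
# ['primary-mongo']
# ['primary-controller']
```

Because 'controller' is pre-seeded into the processed set, the groups that depend only on it are released in the first pass, well before primary-controller. How roles inside a single batch are prioritized is outside the scope of this sketch.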

This is not a problem with one controller and no mongo, because in that case 'mongo' and 'primary-mongo' are in the list of already-processed groups (they are not in the environment). This means that the first time primary-controller is checked, its predecessors have already been processed and primary-controller is added to the groups to process.

[0] https://github.com/stackforge/fuel-web/blob/master/nailgun/nailgun/orchestrator/deployment_graph.py#L343-L344
[1] https://github.com/stackforge/fuel-web/blob/master/nailgun/nailgun/orchestrator/deployment_graph.py#L95
[2] https://github.com/stackforge/fuel-web/blob/master/nailgun/nailgun/orchestrator/deployment_graph.py#L103-L104