Comment 1 for bug 1677682

Adam Spiers (adam.spiers) wrote :

Briefly (for now):

http://blog.clusterlabs.org/blog/2016/next-openstack-ha-arch contends that Pacemaker's management of OpenStack services can be replaced by systemd under certain circumstances. Whilst the article makes several good points, I feel that it misses others. I have already talked to Andrew at length about this, and I think we are mostly on the same page by now.

My take is that systemd *may* be adequate for active/active services where:

1) systemd is configured to auto-restart the services on failure
2) no cross-node ordering is required
3) the services can keep functioning correctly (e.g. by failing gracefully) even if their dependencies go down
4) the services can recover correctly if their dependencies come back up
5) nothing more than pid-level monitoring is required
6) an external alerting / notification system is present

Whilst 1) should be easily satisfied, and 2) is true in some cases, especially if 3) and 4) hold, some caveats are as follows:

- 3) and 4) are all well and good in theory, but in practice I'm dubious that all OpenStack services have reached this level of robustness yet.

- Regarding 5), only doing pid-level monitoring misses some key failure cases such as a service hanging rather than crashing, or falling victim to a bug which renders it non-functional even though the process is still running. This is why I continue to believe that the openstack-resource-agents project still brings value, although as the (bad) maintainer I am of course horribly biased towards it.

- 6) can of course be satisfied, but requires a lot of additional work to ensure that such a system is deployed and configured alongside the services, in a way which matches their deployment. It's definitely worth doing, but the effort shouldn't be underestimated. Pacemaker serves as a poor man's alternative to a real monitoring system, which is not a good long-term solution, but is useful in the short term.
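
To make the caveat about 5) a bit more concrete, here is a rough sketch of the difference between a pid-level check and a functional check. It is illustrative only and is not taken from openstack-resource-agents; the function names, the endpoint URL and the timeout are all hypothetical, and it assumes a Unix-like host and a service which exposes an HTTP API.

```python
import http.client
import os
import urllib.error
import urllib.request

# Hypothetical helpers for illustration; not part of any existing project.

# Opener which ignores proxy environment variables, since this is meant
# to be a direct check against a local or cluster-internal endpoint.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def pid_alive(pid: int) -> bool:
    """Pid-level check: does a process with this pid exist? (Unix only)"""
    try:
        os.kill(pid, 0)  # signal 0 only performs error checking
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def api_responds(url: str, timeout: float = 5.0) -> bool:
    """Functional check: does the service answer an HTTP request in time?"""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers refused connections, timeouts and HTTP error statuses.
        return False


def classify(pid: int, url: str, timeout: float = 5.0) -> str:
    """Separate a crash from a hang or other non-functional state."""
    if not pid_alive(pid):
        return "crashed"
    if not api_responds(url, timeout):
        return "hung-or-broken"  # invisible to pid-level monitoring
    return "healthy"
```

A pid-only monitor would treat the "hung-or-broken" case as healthy, which is exactly the failure mode described under 5). A real check would usually want to exercise more than a single request, e.g. authentication or a dependency round trip.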