Missing HA Guide Content: Systemd (vs Cluster Managers)
Bug #1677682 reported by ianeta hutchinson
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
openstack-manuals | Won't Fix | Low | Unassigned |
Bug Description
An explanation of Systemd is missing from the HA Guide. At the moment, an explanation of cluster managers is present, but with no alternative. A comparison to communicate why a user would utilize Systemd or a cluster manager would also be beneficial.
tags: added: ha-guide
tags: added: ha-guide-draft, removed: ha-guide
description: updated
summary: "Missing HA Guide Content: Cluster Managers" → "Missing HA Guide Content: Systemd (vs Cluster Managers)"
description: updated
Changed in openstack-manuals:
status: New → Triaged
importance: Undecided → Low
milestone: none → pike
Briefly (for now):
http://blog.clusterlabs.org/blog/2016/next-openstack-ha-arch contends that management of OpenStack services via Pacemaker can be replaced by systemd under certain circumstances. Whilst the article makes several good points, I feel that it misses others. I have already talked to Andrew at length about this, and I think we are mostly on the same page by now.
My take is that systemd *may* be adequate for active/active services where:
1) systemd is configured to auto-restart the services on failure
2) no cross-node ordering is required
3) the services can keep functioning correctly (e.g. graceful failure) even if their dependencies go down
4) the services can recover correctly if their dependencies come back up
5) nothing more than pid-level monitoring is required
6) an external alerting / notification system is present
Whilst 1) should be easily satisfied, and 2) is true in some cases, especially if 3) and 4) hold, some caveats are as follows:
- 3) and 4) are all well and good in theory, but in practice I'm dubious that all OpenStack services have reached this level of robustness yet.
- Regarding 5), only doing pid-level monitoring misses some key failure cases such as a service hanging rather than crashing, or falling victim to a bug which renders it non-functional even though the process is still running. This is why I continue to believe that the openstack-resource-agents project still brings value, although as the (bad) maintainer I am of course horribly biased towards it.
- 6) can of course be satisfied, but requires a lot of additional work to ensure that such a system is deployed and configured in parallel to the services, in a way which matches the deployment of the services. It's definitely worth doing, but the effort shouldn't be underestimated. Pacemaker serves as a poor man's alternative to a real monitoring system, which is not a good long-term solution, but is useful in the short term.
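
To illustrate the gap described under 5), here is a minimal Python sketch contrasting a pid-level check with a functional one. It is not how openstack-resource-agents or systemd implement monitoring; the function names, the state labels, and the idea of probing an HTTP endpoint with a timeout are all assumptions made for illustration, and it assumes a POSIX system.

```python
import os
import urllib.error
import urllib.request


def pid_alive(pid: int) -> bool:
    """PID-level check: does a process with this PID exist? (POSIX only)"""
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only checks the PID
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def api_responds(url: str, timeout: float = 2.0) -> bool:
    """Functional check: does the endpoint answer an HTTP request in time?"""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        # The service answered; treat only server-side errors as failure.
        return err.code < 500
    except OSError:
        # Connection refused, timeout, DNS failure and similar.
        return False


def classify(pid: int, url: str, timeout: float = 2.0) -> str:
    """Hypothetical three-way verdict combining both checks."""
    if not pid_alive(pid):
        return "crashed"
    if not api_responds(url, timeout):
        return "hung or non-functional"
    return "healthy"
```

A pid-only monitor can distinguish only the first state from the other two. A service whose process is alive but which never answers requests lands in the second state, which is exactly the case that restart-on-exit supervision would not notice.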