> So haproxy is using pacemaker as well?

No, Pacemaker is managing HAProxy. But as I said, the HA Guide is out of date and currently undergoing a huge revamp, so please don't rely on it for accurate information right now. If you need answers now, please use one of the other resources available, which I list further down in this comment.

> I was reading some haproxy docs earlier and it looks like the
> creator recommends keepalived for haproxy use for the following
> reasons:
>
> http://www.formilux.org/archives/haproxy/1003/3259.html

That post, whilst very old (so old, in fact, that it refers to Heartbeat, which is Pacemaker's predecessor), is mostly spot-on. However, I disagree with the sentence "A cluster-based product may very well end up with none of the nodes offering the service, to ensure that the shared resource is never corrupted by concurrent accesses." In a correctly configured Pacemaker cluster managing HAProxy, this should not happen, because at least one node should always have quorum and be able to run the service. And whilst Keepalived is a perfectly good solution for network-based resources which don't require fencing, if Pacemaker is already required for other reasons, it doesn't make much sense to add Keepalived when Pacemaker can already achieve the same thing.

> I'm not opposed to pacemaker, but it is a bit of overkill for some of this.

With respect, that's a slightly vague generalization :-) There are contexts within OpenStack where (say) fencing is required, and then Pacemaker is the obvious choice. Yes, it might look a bit like a sledgehammer, but if you need to crack not only nuts but also some large stones, sometimes it makes sense to reuse the sledgehammer for the nuts instead of spending extra effort getting a smaller hammer just for the nuts.

> We have used it in the public cloud and it always seems to turn out
> badly. The original builders may know how to use it, but your
> average tech has a hard time with it.
> It usually ends up with them
> getting in and breaking pacemaker in the process of bringing up the
> services manually. It usually ends up staying in a broken status as
> no-one wants to take it down while in the process of fixing the
> cluster to keep SLA.

I appreciate what you're saying. Yes, Pacemaker is a complex beast, but in some cases that complexity is necessary. That's why we are revamping this HA guide: to mitigate exactly these kinds of problems.

> If all we are doing is restarting the service if it dies,

No, in general that's not all we are doing. It may make sense to move some of the active/active services to be managed by systemd (RH are already switching to this, in fact), but that does not eliminate the need for a cluster manager altogether. But this bug is not the correct place for a Pacemaker vs. Keepalived debate, so please let's not continue it here.

> This brings up other questions that we might need to think about. Do
> all of the services need to be run directly on those controller
> nodes? Most of the deployment tools out there are starting to use
> lxc or docker to house each of the services. Are each of them going
> to need to be a member of the same controller cluster? If not, can
> the openstack pcs resources support the handling of services sitting
> in docker or lxc containers?

I really do welcome your thoughts and input on these kinds of topics, but please provide them in the right place, e.g. any of:

- the openstack-dev mailing list (put "[HA]" as a prefix in the Subject header)
- the #openstack-ha IRC channel on Freenode
- the weekly OpenStack HA meetings on IRC
- any of the OpenStack events (e.g. if you're going to Boston then do grab me for a chat)

In contrast, this bug is *specifically* about the keystone section of the HA guide, so please don't pollute it with other topics. Thanks for your understanding!
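P.S. For anyone landing on this bug before the revamped guide is published and wondering what "Pacemaker managing HAProxy" looks like in practice, here is a minimal sketch using the `pcs` shell. The virtual IP address, resource names, and monitor intervals below are hypothetical placeholders for illustration, not text from the HA guide:

```
# Minimal sketch: Pacemaker managing HAProxy plus a virtual IP.
# The address 192.0.2.10 and the resource names "vip"/"haproxy"
# are made-up examples -- adjust for your own deployment.

# A virtual IP that clients (e.g. keystone endpoints) point at:
pcs resource create vip ocf:heartbeat:IPaddr2 \
    ip=192.0.2.10 cidr_netmask=24 op monitor interval=10s

# HAProxy itself, managed by the cluster (via its systemd unit):
pcs resource create haproxy systemd:haproxy op monitor interval=10s

# Keep HAProxy on the node holding the VIP, and start the VIP first:
pcs constraint colocation add haproxy with vip INFINITY
pcs constraint order vip then haproxy
```

As long as quorum holds, Pacemaker will keep exactly one node running this pair and fail them over together, which is the behaviour the post above is worried a "cluster-based product" might not provide.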