system health resource agent

Bug #1880613 reported by Andrea Ieri
This bug affects 1 person
Affects: OpenStack HA Cluster Charm
Status: Triaged
Importance: Wishlist
Assigned to: Unassigned

Bug Description

The hacluster charm currently uses pacemaker only to manage VIPs and run haproxy. The assumption is that haproxy should run whenever possible, and that haproxy health checks should take care of not routing traffic to backends that aren't functional.

The haproxy resource agent (lsb:haproxy → /etc/init.d/haproxy), however, provides only a basic status check via the systemd service: if the haproxy service is running, the resource is considered functional. In some cases the unit running the haproxy frontend may not be fully healthy (e.g. under memory pressure or high load) and should be fenced.

This is traditionally handled via STONITH, but I think a lighter-weight approach could also be successful: a new resource agent could dynamically set node attributes reporting CPU load and memory pressure, while additionally calculating an "overloaded" boolean attribute. Having a separate "overloaded" attribute (instead of directly setting node scores) would be needed to make the calculation[*] smarter, because although haproxy shouldn't run on an overloaded node, it definitely should if all of the cluster nodes are equally under stress.
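The cluster-wide part of that reasoning can be sketched as follows. This is only an illustration: the function name `effective_overloaded` and the node names are hypothetical, and "equally under stress" is approximated here as "every node is individually flagged as overloaded", which is an assumption rather than something the report specifies.

```python
from typing import Dict


def effective_overloaded(raw_flags: Dict[str, bool]) -> Dict[str, bool]:
    """Return the per-node "overloaded" value to publish.

    raw_flags maps a node name to its locally computed overloaded flag.
    Assumption: if every node is flagged, the flags carry no placement
    information, so they are all cleared and haproxy may run anywhere.
    """
    if raw_flags and all(raw_flags.values()):
        # Whole cluster is under stress: do not exclude any node.
        return {node: False for node in raw_flags}
    # Otherwise keep each node's own flag unchanged.
    return dict(raw_flags)


# Hypothetical three-node cluster.
mixed = effective_overloaded({"node1": True, "node2": False, "node3": False})
all_busy = effective_overloaded({"node1": True, "node2": True, "node3": True})
```

In the first call only node1 stays flagged; in the second, all flags are cleared so that haproxy still has somewhere to run.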

Once we have an "overloaded" attribute, placement rules could then be set to ensure haproxy runs only on non-overloaded nodes. Colocation constraints (see bug 1810919) would ensure that the VIPs are only instantiated where a haproxy service is running.

The new resource agent should then be upstreamed to https://github.com/ClusterLabs/resource-agents

[*] How to calculate what "overloaded" means is still to be defined. A starting point could be "1m load > 8*nproc OR 5m load > 4*nproc OR 15m load > 2*nproc OR (>90% used RAM if 100% used swap)", but the actual parameters may need to be exposed as config options so that the attribute fits all deployments.
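A minimal sketch of that starting-point rule, with the multipliers exposed as parameters. The names `Thresholds` and `is_overloaded` are hypothetical, and the memory clause is read as "more than 90% of RAM used AND swap fully used", which is one interpretation of the wording above. Inputs are passed in explicitly; in practice the load averages could come from `os.getloadavg()` and the CPU count from `os.cpu_count()`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Tunable limits; defaults mirror the starting point in the footnote."""
    load1_per_cpu: float = 8.0    # 1m load  > 8 * nproc
    load5_per_cpu: float = 4.0    # 5m load  > 4 * nproc
    load15_per_cpu: float = 2.0   # 15m load > 2 * nproc
    ram_used: float = 0.90        # fraction of RAM in use
    swap_used: float = 1.0        # fraction of swap in use


def is_overloaded(load1: float, load5: float, load15: float, nproc: int,
                  ram_used: float, swap_used: float,
                  t: Thresholds = Thresholds()) -> bool:
    """Evaluate the proposed rule for a single node.

    ram_used and swap_used are fractions between 0.0 and 1.0.
    """
    cpu_pressure = (
        load1 > t.load1_per_cpu * nproc
        or load5 > t.load5_per_cpu * nproc
        or load15 > t.load15_per_cpu * nproc
    )
    # Assumed reading: RAM above its limit only counts once swap is exhausted.
    mem_pressure = ram_used > t.ram_used and swap_used >= t.swap_used
    return cpu_pressure or mem_pressure


# Hypothetical 4-CPU node: a 1m load of 40 exceeds 8 * 4 = 32.
busy = is_overloaded(40.0, 1.0, 1.0, nproc=4, ram_used=0.5, swap_used=0.0)
idle = is_overloaded(1.0, 1.0, 1.0, nproc=4, ram_used=0.5, swap_used=0.0)
```

How a node without any swap should be treated is left open here; the caller would have to decide what `swap_used` means in that case.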

Note: once OpenStack resource agents are available (bug 1880611), placement rules could additionally affect the backends themselves.

James Page (james-page)
Changed in charm-hacluster:
status: New → Triaged
importance: Undecided → Wishlist