Masking haproxy.service makes the haproxy RA unable to detect failures
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Neutron API Charm | Triaged | High | Unassigned |
Bug Description
On a recently deployed cloud using the 19.10 charms, I paused the neutron-api unit holding the VIP. This masked and stopped the haproxy service, but because the haproxy resource agent (RA) did not report the failure, Pacemaker took no corrective action and left the VIP in place. As a consequence, no neutron API command could be submitted anymore.
This can also be easily reproduced manually:
$ /etc/init.d/haproxy status &>/dev/null; echo $?
0
$ systemctl status haproxy &>/dev/null; echo $?
0
$ systemctl mask haproxy
Created symlink /etc/systemd/
$ /etc/init.d/haproxy status &>/dev/null; echo $?
0
$ systemctl status haproxy &>/dev/null; echo $?
0
$ systemctl stop haproxy
$ /etc/init.d/haproxy status &>/dev/null; echo $?
0
$ systemctl status haproxy &>/dev/null; echo $?
3
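The comparison above can be scripted. The sketch below takes only the two exit codes as input; report_divergence is a hypothetical helper name, not part of haproxy or lsb-base, and the commands that produce the codes are left as comments because they need a host with haproxy and systemd.

```shell
#!/bin/sh
# Hypothetical helper: compare the "running" verdicts of the LSB initscript
# and of systemd. For both status commands, exit code 0 means "running".
report_divergence() {
    init_rc=$1       # exit code of: /etc/init.d/haproxy status
    systemctl_rc=$2  # exit code of: systemctl status haproxy
    if [ "$init_rc" -eq 0 ]; then init_running=yes; else init_running=no; fi
    if [ "$systemctl_rc" -eq 0 ]; then systemd_running=yes; else systemd_running=no; fi
    if [ "$init_running" = "$systemd_running" ]; then
        echo consistent
    else
        echo divergent
    fi
}

# Example on a host with haproxy installed (not executed here):
#   /etc/init.d/haproxy status >/dev/null 2>&1; init_rc=$?
#   systemctl status haproxy >/dev/null 2>&1; systemctl_rc=$?
#   report_divergence "$init_rc" "$systemctl_rc"
```

With the exit codes from the transcript after systemctl stop (0 from the initscript, 3 from systemctl), the helper prints divergent.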
Digging further, the problem comes from /lib/lsb/
$ bash -x /etc/init.d/haproxy status
+ PATH=/sbin:
+ BASENAME=haproxy
+ PIDFILE=
+ CONFIG=
+ HAPROXY=
+ RUNDIR=/run/haproxy
+ EXTRAOPTS=
+ test -x /usr/sbin/haproxy
+ '[' -e /etc/default/
+ . /etc/default/
++ ENABLED=1
+ test -f /etc/haproxy/
+ '[' -f /etc/default/rcS ']'
+ . /lib/lsb/
+++ run-parts --lsbsysinit --list /lib/lsb/
++ for hook in $(run-parts --lsbsysinit --list /lib/lsb/
++ '[' -r /lib/lsb/
++ . /lib/lsb/
++ for hook in $(run-parts --lsbsysinit --list /lib/lsb/
++ '[' -r /lib/lsb/
++ . /lib/lsb/
+++ _use_systemctl=0
+++ '[' -d /run/systemd/system ']'
+++ prog=haproxy
+++ service=
++++ systemctl -p LoadState --value show haproxy.service
+++ state=masked
+++ '[' masked = masked ']'
+++ exit 0
root@juju-
13: [ "$state" = "masked" ] && exit 0
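The line above makes the haproxy RA non-LSB-compliant [0]: because the hook exits 0 as soon as the unit is masked, before the initscript reaches its own status handling, "status" keeps reporting the service as running even after it has been stopped, so Pacemaker never sees a failure. If the RA decides not to investigate a masked service, it should, I believe, at least return 4 (the LSB code for "program or service status is unknown").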
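A minimal sketch of how a status check could stay LSB-compliant for a masked unit, assuming it consults systemd's ActiveState instead of exiting 0 on LoadState=masked. The function name lsb_status_from_active_state is hypothetical; neither haproxy nor lsb-base ships it.

```shell
#!/bin/sh
# Hypothetical helper: derive an LSB "status" exit code from systemd's
# ActiveState rather than exiting 0 whenever LoadState is "masked".
# LSB status codes used: 0 = running, 3 = not running, 4 = status unknown.
lsb_status_from_active_state() {
    case "$1" in
        active|reloading)
            echo 0 ;;  # unit is up (it may be masked but still running)
        inactive|failed)
            echo 3 ;;  # unit is stopped, e.g. masked and then stopped
        *)
            echo 4 ;;  # any other state, or an empty string if the query failed
    esac
}

# Possible use in the status branch of a hook (not executed here):
#   state=$(systemctl show -p ActiveState --value haproxy.service)
#   exit "$(lsb_status_from_active_state "$state")"
```

Applied to the reproduction, the masked but still running unit would report 0 and the masked and stopped unit would report 3, matching what systemctl status returns.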
I think that either the haproxy initscript should be made LSB-compliant, or we should switch to a different resource agent altogether (e.g. an OCF one).
This bug has been opened for neutron-api, but will affect any charm using the haproxy LSB resource agent.
Please also note that the API failure described initially would still occur even if this bug were resolved, due to LP#1810918 [1].
[0] https:/
[1] https:/
Changed in charm-neutron-api:
assignee: Alex Kavanagh (ajkavanagh) → nobody
description: updated
TRIAGE: High, because stopping a service with the pause action actually breaks the system.