pike to queens update leaves a pacemaker "failed action" behind on glance haproxy
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack HA Cluster Charm | Triaged | Medium | Unassigned |
Bug Description
Hi,
I just upgraded a cloud from pike to queens, on xenial, with 18.02 charms.
All OpenStack infra services are HA (on 2 nodes), except glance, which is deployed on a single node. glance is related to a glance-hacluster subordinate.
After the upgrade, the Nagios checks for the glance unit started failing:
+ /usr/local/
check_crm WARNING - : res_glance_haproxy failure detected, fail-count=1
This is because there was a failed action recorded for res_glance_haproxy (the resource named in the check output above).
The charms should probably handle (automatically clean up?) such failed actions.
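In the meantime, the failed action can be cleared by hand on the glance unit with a standard pacemaker cleanup (resource name taken from the check_crm output above), which resets the fail-count and returns the Nagios check to OK:

sudo crm resource cleanup res_glance_haproxy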
Thanks
Relevant logs from syslog:
Apr 4 09:22:50 juju-000dfe-4-lxd-3 systemd[1]: Stopping HAProxy Load Balancer...
Apr 4 09:22:50 juju-000dfe-4-lxd-3 systemd[1]: Stopped HAProxy Load Balancer.
Apr 4 09:22:50 juju-000dfe-4-lxd-3 systemd[1]: Stopping OpenStack Image Service API...
Apr 4 09:22:50 juju-000dfe-4-lxd-3 systemd[1]: Stopped OpenStack Image Service API.
Apr 4 09:22:50 juju-000dfe-4-lxd-3 systemd[1]: Stopping OpenStack Image Service Registry...
Apr 4 09:22:50 juju-000dfe-4-lxd-3 systemd[1]: Stopped OpenStack Image Service Registry.
Apr 4 09:22:50 juju-000dfe-4-lxd-3 systemd-
Apr 4 09:22:50 juju-000dfe-4-lxd-3 systemd[1]: Stopping memcached daemon...
Apr 4 09:22:50 juju-000dfe-4-lxd-3 systemd[1]: Stopped memcached daemon.
Apr 4 09:22:50 juju-000dfe-4-lxd-3 systemd[1]: Stopping LSB: Apache2 web server...
Apr 4 09:22:51 juju-000dfe-4-lxd-3 systemd[1]: Stopped LSB: Apache2 web server.
Apr 4 09:22:53 juju-000dfe-4-lxd-3 crmd[1449]: notice: juju-000dfe-
Apr 4 09:22:53 juju-000dfe-4-lxd-3 crmd[1449]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_
Apr 4 09:22:53 juju-000dfe-4-lxd-3 crmd[1449]: notice: juju-000dfe-
Apr 4 09:22:53 juju-000dfe-4-lxd-3 systemd[1]: Failed to reset devices.list on /system.
Apr 4 09:22:54 juju-000dfe-4-lxd-3 systemd[1]: Starting HAProxy Load Balancer...
Apr 4 09:22:54 juju-000dfe-4-lxd-3 systemd[1]: Started HAProxy Load Balancer.
Apr 4 09:22:54 juju-000dfe-4-lxd-3 systemd[1]: Failed to reset devices.list on /system.
Apr 4 09:22:54 juju-000dfe-4-lxd-3 systemd[1]: Starting OpenStack Image Service API...
Apr 4 09:22:54 juju-000dfe-4-lxd-3 systemd[1]: Started OpenStack Image Service API.
no longer affects: charm-keystone
Changed in charm-hacluster:
status: New → Triaged
importance: Undecided → Medium
tags: added: openstack-upgrade
This goes back to a management conflict over the haproxy service: the principal charm manages haproxy's configuration file and restarts the daemon when that file changes, while the pacemaker cluster is also 'managing' the daemon, including monitoring checks to ensure it keeps running.
A haproxy reload might be a good compromise, OR awareness of HA in the principal charm could trigger a "crm resource restart res_<svc>_haproxy" call to restart the service via pacemaker.
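A rough sketch of the second option, assuming charmhelpers' is_clustered() and service_restart() helpers and the res_<svc>_haproxy naming convention; restart_haproxy itself is a hypothetical helper, not existing charm code:

import subprocess

from charmhelpers.contrib.hahelpers.cluster import is_clustered
from charmhelpers.core.host import service_restart

def restart_haproxy(service_name='glance'):
    """Restart haproxy, deferring to pacemaker when the unit is clustered."""
    if is_clustered():
        # Let pacemaker perform the restart so its monitor operation
        # sees an orderly stop/start instead of an unexpectedly dead
        # daemon, and no failed action gets recorded.
        subprocess.check_call(
            ['crm', 'resource', 'restart',
             'res_{}_haproxy'.format(service_name)])
    else:
        # Non-HA deployment: a plain service restart is fine.
        service_restart('haproxy')

Routing the restart through pacemaker keeps a single owner for the daemon's lifecycle, which is what the monitor operation assumes.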