pike to queens update leaves a pacemaker "failed action" behind on glance haproxy

Bug #1761170 reported by Junien F
Affects: OpenStack HA Cluster Charm
Status: Triaged
Importance: Medium
Assigned to: Unassigned

Bug Description

Hi,

I just upgraded a cloud from pike to queens, on xenial, with 18.02 charms.

All openstack infra services are HA (on 2 nodes), except glance, which is deployed on a single node. glance is related to a glance-hacluster subordinate.

After the upgrade, the Nagios checks for the glance unit started failing:
+ /usr/local/lib/nagios/plugins/check_crm
check_crm WARNING - : res_glance_haproxy failure detected, fail-count=1

This is because pacemaker recorded a failed action for res_glance_haproxy_monitor_5000 (sadly, I failed to capture the output of "crm status"). I think this is simply because haproxy was restarted during the OpenStack upgrade. I was able to get rid of it by running "sudo crm resource cleanup res_glance_haproxy".

The charms should probably handle (automatically clean up?) such failed actions.
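
A minimal sketch of what automatic cleanup could look like on the charm side, assuming the resource name follows the existing res_<svc>_haproxy pattern; the helper name and the point where it would be called are hypothetical, not existing charm code:

    # Hypothetical helper: clear a stale fail-count after a charm-driven restart.
    import subprocess

    def cleanup_failed_actions(resource):
        """Ask pacemaker to forget failed actions for a resource that the
        charm itself restarted (e.g. res_glance_haproxy)."""
        # 'crm resource cleanup' resets the fail-count and removes the
        # failed-action entry, so check_crm stops warning about it.
        subprocess.check_call(['crm', 'resource', 'cleanup', resource])

    # For example, called at the end of the openstack-upgrade path once
    # haproxy has been restarted:
    # cleanup_failed_actions('res_glance_haproxy')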

Thanks

Relevant logs from syslog:

Apr 4 09:22:50 juju-000dfe-4-lxd-3 systemd[1]: Stopping HAProxy Load Balancer...
Apr 4 09:22:50 juju-000dfe-4-lxd-3 systemd[1]: Stopped HAProxy Load Balancer.
Apr 4 09:22:50 juju-000dfe-4-lxd-3 systemd[1]: Stopping OpenStack Image Service API...
Apr 4 09:22:50 juju-000dfe-4-lxd-3 systemd[1]: Stopped OpenStack Image Service API.
Apr 4 09:22:50 juju-000dfe-4-lxd-3 systemd[1]: Stopping OpenStack Image Service Registry...
Apr 4 09:22:50 juju-000dfe-4-lxd-3 systemd[1]: Stopped OpenStack Image Service Registry.
Apr 4 09:22:50 juju-000dfe-4-lxd-3 systemd-memcached-wrapper[1227]: Signal handled: Terminated.
Apr 4 09:22:50 juju-000dfe-4-lxd-3 systemd[1]: Stopping memcached daemon...
Apr 4 09:22:50 juju-000dfe-4-lxd-3 systemd[1]: Stopped memcached daemon.
Apr 4 09:22:50 juju-000dfe-4-lxd-3 systemd[1]: Stopping LSB: Apache2 web server...
Apr 4 09:22:51 juju-000dfe-4-lxd-3 systemd[1]: Stopped LSB: Apache2 web server.
Apr 4 09:22:53 juju-000dfe-4-lxd-3 crmd[1449]: notice: juju-000dfe-4-lxd-3-res_glance_haproxy_monitor_5000:12 [ * haproxy.service - HAProxy Load Balancer\n Loaded: loaded (/lib/systemd/system/haproxy.service; enabled; vendor preset: enabled)\n Active: inactive (dead) since Wed 2018-04-04 09:22:50 UTC; 3s ago\n Docs: man:haproxy(1)\n file:/usr/share/doc/haproxy/configuration.txt.gz\n Main PID: 1409 (code=exited, status=0/SUCCESS)\n\nApr 03 08:00:10 juju-000dfe-4-lxd-3 systemd[1]: Starting HAProxy Load Balancer...\nApr 03 08:00:11 juj
Apr 4 09:22:53 juju-000dfe-4-lxd-3 crmd[1449]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Apr 4 09:22:53 juju-000dfe-4-lxd-3 crmd[1449]: notice: juju-000dfe-4-lxd-3-res_glance_haproxy_monitor_5000:12 [ * haproxy.service - HAProxy Load Balancer\n Loaded: loaded (/lib/systemd/system/haproxy.service; enabled; vendor preset: enabled)\n Active: inactive (dead) since Wed 2018-04-04 09:22:50 UTC; 3s ago\n Docs: man:haproxy(1)\n file:/usr/share/doc/haproxy/configuration.txt.gz\n Main PID: 1409 (code=exited, status=0/SUCCESS)\n\nApr 03 08:00:10 juju-000dfe-4-lxd-3 systemd[1]: Starting HAProxy Load Balancer...\nApr 03 08:00:11 juj
Apr 4 09:22:53 juju-000dfe-4-lxd-3 systemd[1]: Failed to reset devices.list on /system.slice/haproxy.service: Operation not permitted
Apr 4 09:22:54 juju-000dfe-4-lxd-3 systemd[1]: Starting HAProxy Load Balancer...
Apr 4 09:22:54 juju-000dfe-4-lxd-3 systemd[1]: Started HAProxy Load Balancer.
Apr 4 09:22:54 juju-000dfe-4-lxd-3 systemd[1]: Failed to reset devices.list on /system.slice/glance-api.service: Operation not permitted
Apr 4 09:22:54 juju-000dfe-4-lxd-3 systemd[1]: Starting OpenStack Image Service API...
Apr 4 09:22:54 juju-000dfe-4-lxd-3 systemd[1]: Started OpenStack Image Service API.

Junien F (axino)
no longer affects: charm-keystone
James Page (james-page) wrote:

This kind of goes back to a management conflict on the haproxy service: the principal service manages its configuration file and will restart haproxy when the config file changes, while the pacemaker cluster is also 'managing' the daemon, including monitoring checks to ensure it keeps running.

A haproxy reload might be a good compromise, OR HA awareness in the principal charm could trigger a "crm resource restart res_<svc>_haproxy" call so the restart happens via pacemaker.
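
A rough sketch of the second option, assuming the principal charm can tell whether an hacluster subordinate is related; the function and flag below are hypothetical, not existing charm-helpers API:

    import subprocess

    def restart_haproxy(ha_managed):
        """Restart haproxy via pacemaker when the service is clustered,
        otherwise fall back to systemd."""
        if ha_managed:
            # Letting pacemaker perform the restart avoids the monitor
            # operation seeing an unexpectedly dead daemon and recording
            # a failed action against the resource.
            subprocess.check_call(
                ['crm', 'resource', 'restart', 'res_glance_haproxy'])
        else:
            subprocess.check_call(['systemctl', 'restart', 'haproxy'])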

James Page (james-page)
Changed in charm-hacluster:
status: New → Triaged
importance: Undecided → Medium
tags: added: openstack-upgrade