Incomplete CRM resource cleanup, res_nova_consoleauth, res_heat_haproxy

Bug #1719279 reported by Nobuto Murata
This bug affects 2 people
Affects                                 Status      Importance  Assigned to  Milestone
OpenStack HA Cluster Charm              Confirmed   Low         Unassigned
OpenStack Heat Charm                    Invalid     Undecided   Unassigned
OpenStack Nova Cloud Controller Charm   Invalid     Undecided   Unassigned

Bug Description

Some charms perform resource cleanup like the following:

unit-hacluster-ceilometer-2: 18:24:01 DEBUG unit.hacluster-ceilometer/2.ha-relation-changed Cleaning up res_ceilometer_haproxy:0 on juju-152473-0, removing fail-count-res_ceilometer_haproxy
unit-hacluster-ceilometer-2: 18:24:01 DEBUG unit.hacluster-ceilometer/2.ha-relation-changed Cleaning up res_ceilometer_haproxy:0 on juju-152473-1, removing fail-count-res_ceilometer_haproxy
unit-hacluster-ceilometer-2: 18:24:01 DEBUG unit.hacluster-ceilometer/2.ha-relation-changed Cleaning up res_ceilometer_haproxy:0 on juju-152473-2, removing fail-count-res_ceilometer_haproxy
unit-hacluster-ceilometer-0: 18:24:01 DEBUG unit.hacluster-ceilometer/0.ha-relation-changed Waiting for 3 replies from the CRMd... OK
unit-hacluster-ceilometer-0: 18:24:01 DEBUG unit.hacluster-ceilometer/0.ha-relation-changed Cleaning up res_ceilometer_eth0_vip on juju-152473-0, removing fail-count-res_ceilometer_eth0_vip
unit-hacluster-ceilometer-0: 18:24:01 DEBUG unit.hacluster-ceilometer/0.ha-relation-changed Cleaning up res_ceilometer_eth0_vip on juju-152473-1, removing fail-count-res_ceilometer_eth0_vip
unit-hacluster-ceilometer-0: 18:24:01 DEBUG unit.hacluster-ceilometer/0.ha-relation-changed Cleaning up res_ceilometer_eth0_vip on juju-152473-2, removing fail-count-res_ceilometer_eth0_vip
unit-hacluster-ceilometer-2: 18:24:01 DEBUG unit.hacluster-ceilometer/2.ha-relation-changed Waiting for 3 replies from the CRMd... OK

However, in some deployments I had to clean up resources manually at the end of the deployment. Those resources may be missing from the cleanup coverage. It would be nice if every resource were cleaned up automatically.

juju run --unit nova-cloud-controller/0 '
    sudo crm resource cleanup res_nova_consoleauth
'

juju run --unit heat/0 '
    sudo crm resource cleanup res_heat_haproxy
'
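
A hedged sketch of the same workaround extended to every unit and to the fail counts (unit names and counts are assumptions for illustration, not something the charm does today):

# Repeat the cleanup and the fail-count removal on each unit so that every
# cluster node is covered; adjust unit names to the actual deployment.
for unit in nova-cloud-controller/0 nova-cloud-controller/1 nova-cloud-controller/2; do
    juju run --unit "$unit" 'sudo crm resource cleanup res_nova_consoleauth'
    juju run --unit "$unit" 'sudo crm_failcount -r res_nova_consoleauth -D'
done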

Tags: cpe-onsite
Nobuto Murata (nobuto)
summary: - Incomplete CRM resource cleanup
+ Incomplete CRM resource cleanup, res_nova_consoleauth, res_heat_haproxy
James Page (james-page) wrote :

The hacluster charm should attempt to clean up all resources that it's passed from its principal via the ha relation.
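
For illustration, a hedged sketch (not the charm's actual code) of the per-resource, per-node cleanup that the ceilometer log in the description corresponds to; resource and node names are taken from that example:

# Clean up each resource handed over the ha relation on every cluster node,
# as the hacluster log lines in the description show; names are illustrative.
for res in res_ceilometer_haproxy res_ceilometer_eth0_vip; do
    for node in juju-152473-0 juju-152473-1 juju-152473-2; do
        sudo crm resource cleanup "$res" "$node"
    done
done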

Changed in charm-heat:
status: New → Invalid
Changed in charm-nova-cloud-controller:
status: New → Invalid
James Page (james-page) wrote :

Nobuto

The code specifically won't clean up a resource *if* it's already running. Are services actually down at the end of your deployments, or are there some failed tasks reported by pacemaker while the actual services are still active?

Changed in charm-hacluster:
status: New → Incomplete
importance: Undecided → Low
Nobuto Murata (nobuto) wrote :

> The code specifically won't clean up a resource *if* it's already running. Are services actually down at the end of your deployments, or are there some failed tasks reported by pacemaker while the actual services are still active?

According to crm status, all services were up and actually running, but there were some failed actions during the deployment, so the nrpe check reports those failed actions as Critical.

tags: added: cpe-onsite
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack hacluster charm because there has been no activity for 60 days.]

Changed in charm-hacluster:
status: Incomplete → Expired
Nobuto Murata (nobuto) wrote :

I'm still seeing the failures just after deployments, so the cleanup does not seem to be run by the charms.

* res_cinder_haproxy_monitor_5000 on juju-d1f5be-18-lxd-2 'not running' (7): call=33, status=complete, exitreason='',
* res_heat_haproxy_monitor_5000 on juju-d1f5be-19-lxd-5 'not running' (7): call=42, status=complete, exitreason='',
* res_horizon_haproxy_monitor_5000 on juju-d1f5be-21-lxd-6 'not running' (7): call=33, status=complete, exitreason=''

Changed in charm-hacluster:
status: Expired → New
Xav Paice (xavpaice) wrote :

The cleanup action also didn't work for fail counts; I needed to run the following on all the cluster nodes to clean up:

crm_failcount -r res_horizon_haproxy -D

It's possible that what I failed to do was run the action on all 3 nodes in the cluster rather than just on one.
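
For reference, a hedged example of covering all nodes from the Juju side instead (the application name here is an assumption):

# Run the fail-count removal on every unit of the subordinate hacluster
# application so all three cluster nodes are covered.
juju run --application hacluster-horizon 'sudo crm_failcount -r res_horizon_haproxy -D'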

Changed in charm-hacluster:
status: New → Confirmed