pause failing on HA deployment: haproxy is running

Bug #1599636 reported by Ursula Junque
This bug affects 3 people
Affects                                Status        Importance  Assigned to    Milestone
Ceph RADOS Gateway Charm               Triaged       Low         Unassigned
OpenStack Ceilometer Charm             Triaged       Low         Unassigned
OpenStack Charms Deployment Guide      Fix Released  High        Peter Matulis
OpenStack Cinder Charm                 Triaged       Low         Unassigned
OpenStack Dashboard Charm              Triaged       Low         Unassigned
OpenStack Glance Charm                 Triaged       Low         Unassigned
OpenStack Keystone Charm               Triaged       Low         Unassigned
OpenStack Neutron API Charm            Triaged       Low         Unassigned
OpenStack Nova Cloud Controller Charm  Triaged       Low         Unassigned

Bug Description

I have an HA mitaka cloud (deployed with autopilot), and am checking the pause/resume actions of its units.

When trying to "action pause" the units of most services that use hacluster (keystone, cinder, neutron-api, ceilometer, nova-cloud-controller, ceph-radosgw and, less often, openstack-dashboard), the action fails *most of the time* because haproxy is still running. "juju action pause" tries to stop the haproxy service, but once pacemaker/corosync detects that haproxy isn't running, it restarts the service. haproxy is never really stopped, so the action fails.

Keystone example:
$ juju action fetch 783e4ee0-a498-42e3-8448-45ac26f6a847
message: 'Couldn''t pause: Services should be paused but these services running: haproxy, these ports which should be closed, but are open: 5000, 35357, Paused. Use ''resume'' action to resume normal service.'
status: failed
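
For anyone scripting around this, here is a minimal sketch of pulling the still-running services and still-open ports out of a failure message like the one above. It assumes the message always follows the wording seen in this single example; `parse_pause_failure` and `EXAMPLE` are illustrative names, not part of juju or the charms.

```python
import re

# The keystone failure message from this report, with the YAML quoting removed.
EXAMPLE = (
    "Couldn't pause: Services should be paused but these services running: "
    "haproxy, these ports which should be closed, but are open: 5000, 35357, "
    "Paused. Use 'resume' action to resume normal service."
)


def parse_pause_failure(message):
    """Return (services, ports) named in a pause failure message.

    Assumption: the message uses the same phrases as the example above.
    Anything that does not match yields empty lists.
    """
    services = []
    ports = []

    # Service names sit between "these services running: " and the next clause.
    match = re.search(
        r"these services running: (.*?)(?:, these ports|, Paused|$)", message
    )
    if match:
        services = [s.strip() for s in match.group(1).split(",") if s.strip()]

    # Port numbers follow "but are open: " as a comma-separated list.
    match = re.search(r"but are open: ([\d, ]+)", message)
    if match:
        ports = [int(p) for p in re.findall(r"\d+", match.group(1))]

    return services, ports


if __name__ == "__main__":
    print(parse_pause_failure(EXAMPLE))
```
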

juju logs excerpt for the paused unit: https://pastebin.canonical.com/160484/

juju status of the whole environment:
https://pastebin.canonical.com/160483/

/var/log/syslog excerpt right after juju action pause is issued:
https://pastebin.canonical.com/160379/

The actions for these services sometimes work, but the vast majority of attempts fail. This could indicate that something incidental is being relied upon (e.g. assuming the network is "fast" enough that races aren't an issue).

Output of a script that pauses and resumes one unit per service to check the behavior: https://pastebin.canonical.com/160490/. Notice that neutron-api, despite failing the action, reports the unit as successfully paused shortly after.

Tags: landscape sts
Ursula Junque (ursinha)
description: updated
Ursula Junque (ursinha)
description: updated
summary: - pause/resume failing: haproxy is running and ports are open
+ pause/resume failing on HA deployment: haproxy is running and ports are
+ open
Ursula Junque (ursinha)
summary: - pause/resume failing on HA deployment: haproxy is running and ports are
- open
+ pause/resume failing on HA deployment: haproxy is running
Ursula Junque (ursinha)
description: updated
Ursula Junque (ursinha)
description: updated
Ursula Junque (ursinha)
description: updated
Ursula Junque (ursinha)
description: updated
summary: - pause/resume failing on HA deployment: haproxy is running
+ pause failing on HA deployment: haproxy is running
Ursula Junque (ursinha)
description: updated
James Page (james-page) wrote :

The problem here is that corosync and pacemaker take control of the execution of the haproxy process, so even if the principal charm stops and disables haproxy, pacemaker will just restart it again. Really, the stop/disable should happen in the pacemaker layer for processes that it is taking control of.
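
The behaviour described in this comment can be illustrated with a toy model. This is only a sketch under simplifying assumptions: `Service`, `Supervisor`, `pause_principal` and `pause_via_supervisor` are hypothetical names, and the supervisor here always gets one monitor pass in before the check, whereas in the real deployment the timing varies (which would explain why the action occasionally succeeds).

```python
class Service:
    """A process that is either running or stopped (stands in for haproxy)."""

    def __init__(self, name):
        self.name = name
        self.running = True


class Supervisor:
    """Toy stand-in for a resource manager that restarts services it manages."""

    def __init__(self, service):
        self.service = service
        self.managing = True

    def monitor(self):
        # Restart the service if it is managed and found stopped.
        if self.managing and not self.service.running:
            self.service.running = True


def pause_principal(service, supervisor):
    """Stop the service directly, as the principal charm's pause would."""
    service.running = False
    supervisor.monitor()  # the supervisor notices and restarts the service
    return not service.running  # True only if the service stayed stopped


def pause_via_supervisor(service, supervisor):
    """Tell the supervisor to stop managing the service, then stop it."""
    supervisor.managing = False
    service.running = False
    supervisor.monitor()  # no restart: the service is no longer managed
    return not service.running


if __name__ == "__main__":
    haproxy = Service("haproxy")
    print(pause_principal(haproxy, Supervisor(haproxy)))
    haproxy = Service("haproxy")
    print(pause_via_supervisor(haproxy, Supervisor(haproxy)))
```
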

Changed in ceilometer (Juju Charms Collection):
importance: Undecided → High
Changed in ceph-radosgw (Juju Charms Collection):
importance: Undecided → High
Changed in openstack-dashboard (Juju Charms Collection):
importance: Undecided → High
Changed in keystone (Juju Charms Collection):
importance: Undecided → High
Changed in cinder (Juju Charms Collection):
status: New → Triaged
Changed in keystone (Juju Charms Collection):
status: New → Triaged
Changed in cinder (Juju Charms Collection):
importance: Undecided → High
Changed in neutron-api (Juju Charms Collection):
importance: Undecided → High
Changed in nova-cloud-controller (Juju Charms Collection):
importance: Undecided → High
Changed in neutron-api (Juju Charms Collection):
status: New → Triaged
Changed in nova-cloud-controller (Juju Charms Collection):
status: New → Triaged
Changed in openstack-dashboard (Juju Charms Collection):
status: New → Triaged
Changed in ceph-radosgw (Juju Charms Collection):
status: New → Triaged
Changed in ceilometer (Juju Charms Collection):
status: New → Triaged
Ursula Junque (ursinha)
tags: added: kanban-cross-team
tags: removed: kanban-cross-team
Ursula Junque (ursinha) wrote :

I just hit the issue with glance as well. Unlike with the other charms, "juju action do pause" completed successfully, but juju status was in fact the same: "blocked", with an error message that haproxy was still up.

James Page (james-page)
Changed in glance (Juju Charms Collection):
status: New → Triaged
importance: Undecided → High
Liam Young (gnuoy) wrote :

Could you please try pausing hacluster on the unit before pausing the principal unit?
Thanks
Liam

James Page (james-page) wrote :

Pausing the hacluster unit first does disable the haproxy process running on the unit. Pausing the principal will not pause haproxy, as it's directly managed by pacemaker, so it will get restarted as soon as it's shut down using the normal init script.

James Page (james-page) wrote :

In fact, pausing the hacluster subordinate first is really important, as that will ensure that any virtual IPs are also moved to different units in the cluster.

That said, the principal should really know that it's no longer in charge of haproxy, and report the correct status.
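
As a rough sketch of the workaround order discussed in these comments (hacluster subordinate first, then the principal), the helper below only builds the command lines and does not run them. It assumes the `juju action do <unit> <action>` form used by the Juju release in this report; later Juju releases changed the action subcommands. `pause_commands` and the unit names `keystone/0` and `hacluster/0` are hypothetical.

```python
def pause_commands(principal_unit, hacluster_unit):
    """Return the pause commands as argv lists, in the order to run them.

    Assumption: both units expose a "pause" action, and the CLI form is
    "juju action do <unit> <action>".
    """
    return [
        # 1. Pause the hacluster subordinate so pacemaker stops haproxy
        #    and any virtual IPs move to other units in the cluster.
        ["juju", "action", "do", hacluster_unit, "pause"],
        # 2. Only then pause the principal unit.
        ["juju", "action", "do", principal_unit, "pause"],
    ]


if __name__ == "__main__":
    # Hypothetical unit names, for illustration only.
    for argv in pause_commands("keystone/0", "hacluster/0"):
        print(" ".join(argv))
```

Each command returns an action id, and the result of each action can then be checked with "juju action fetch" as in the bug description.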

Changed in ceilometer (Juju Charms Collection):
importance: High → Low
Changed in ceph-radosgw (Juju Charms Collection):
importance: High → Low
Changed in cinder (Juju Charms Collection):
importance: High → Low
Changed in glance (Juju Charms Collection):
importance: High → Low
Changed in keystone (Juju Charms Collection):
importance: High → Low
Changed in neutron-api (Juju Charms Collection):
importance: High → Low
Changed in nova-cloud-controller (Juju Charms Collection):
importance: High → Low
Changed in openstack-dashboard (Juju Charms Collection):
importance: High → Low
Changed in ceilometer (Juju Charms Collection):
milestone: none → 16.07
Changed in cinder (Juju Charms Collection):
milestone: none → 16.10
Changed in ceph-radosgw (Juju Charms Collection):
milestone: none → 16.07
milestone: 16.07 → 16.10
Changed in ceilometer (Juju Charms Collection):
milestone: 16.07 → 16.10
Changed in glance (Juju Charms Collection):
milestone: none → 16.10
Changed in keystone (Juju Charms Collection):
milestone: none → 16.10
Changed in neutron-api (Juju Charms Collection):
milestone: none → 16.10
Changed in nova-cloud-controller (Juju Charms Collection):
milestone: none → 16.10
Changed in openstack-dashboard (Juju Charms Collection):
milestone: none → 16.10
James Page (james-page) wrote :

Dropping priority to low as a workaround exists.

James Page (james-page)
Changed in charm-ceilometer:
importance: Undecided → Low
status: New → Triaged
Changed in ceilometer (Juju Charms Collection):
status: Triaged → Invalid
James Page (james-page)
Changed in charm-ceph-radosgw:
importance: Undecided → Low
status: New → Triaged
Changed in ceph-radosgw (Juju Charms Collection):
status: Triaged → Invalid
Changed in charm-cinder:
importance: Undecided → Low
status: New → Triaged
Changed in cinder (Juju Charms Collection):
status: Triaged → Invalid
Changed in charm-glance:
importance: Undecided → Low
status: New → Triaged
Changed in glance (Juju Charms Collection):
status: Triaged → Invalid
James Page (james-page)
Changed in charm-keystone:
importance: Undecided → Low
status: New → Triaged
Changed in keystone (Juju Charms Collection):
status: Triaged → Invalid
Changed in charm-neutron-api:
importance: Undecided → Low
status: New → Triaged
Changed in neutron-api (Juju Charms Collection):
status: Triaged → Invalid
James Page (james-page)
Changed in charm-nova-cloud-controller:
importance: Undecided → Low
status: New → Triaged
Changed in nova-cloud-controller (Juju Charms Collection):
status: Triaged → Invalid
Changed in charm-openstack-dashboard:
importance: Undecided → Low
status: New → Triaged
Changed in openstack-dashboard (Juju Charms Collection):
status: Triaged → Invalid
Edward Hope-Morley (hopem) wrote :

There is currently no mention of this workaround at [1] and since it is too easy to miss this when doing upgrades I think we should bump priority (and fix the guide).

[1] https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/app-upgrade-openstack.html#cinder-ceph-topology-change-upgrading-from-newton-to-ocata

no longer affects: openstack-dashboard (Juju Charms Collection)
no longer affects: nova-cloud-controller (Juju Charms Collection)
no longer affects: neutron-api (Juju Charms Collection)
no longer affects: keystone (Juju Charms Collection)
no longer affects: glance (Juju Charms Collection)
no longer affects: cinder (Juju Charms Collection)
no longer affects: ceph-radosgw (Juju Charms Collection)
no longer affects: ceilometer (Juju Charms Collection)
Changed in charm-ceilometer:
milestone: none → 19.04
Changed in charm-ceph-radosgw:
milestone: none → 19.04
Changed in charm-cinder:
milestone: none → 19.04
Changed in charm-glance:
milestone: none → 19.04
Changed in charm-keystone:
milestone: none → 19.04
Changed in charm-neutron-api:
milestone: none → 19.04
Changed in charm-nova-cloud-controller:
milestone: none → 19.04
Changed in charm-openstack-dashboard:
milestone: none → 19.04
tags: added: sts
Changed in charm-deployment-guide:
importance: Undecided → High
David Ames (thedac)
Changed in charm-ceilometer:
milestone: 19.04 → 19.07
Changed in charm-ceph-radosgw:
milestone: 19.04 → 19.07
Changed in charm-cinder:
milestone: 19.04 → 19.07
Changed in charm-glance:
milestone: 19.04 → 19.07
Changed in charm-keystone:
milestone: 19.04 → 19.07
Changed in charm-neutron-api:
milestone: 19.04 → 19.07
Changed in charm-nova-cloud-controller:
milestone: 19.04 → 19.07
Changed in charm-openstack-dashboard:
milestone: 19.04 → 19.07
David Ames (thedac)
Changed in charm-ceilometer:
milestone: 19.07 → 19.10
Changed in charm-ceph-radosgw:
milestone: 19.07 → 19.10
Changed in charm-cinder:
milestone: 19.07 → 19.10
Changed in charm-glance:
milestone: 19.07 → 19.10
Changed in charm-keystone:
milestone: 19.07 → 19.10
Changed in charm-neutron-api:
milestone: 19.07 → 19.10
Changed in charm-nova-cloud-controller:
milestone: 19.07 → 19.10
Changed in charm-openstack-dashboard:
milestone: 19.07 → 19.10
David Ames (thedac)
Changed in charm-ceilometer:
milestone: 19.10 → 20.01
Changed in charm-ceph-radosgw:
milestone: 19.10 → 20.01
Changed in charm-cinder:
milestone: 19.10 → 20.01
Changed in charm-glance:
milestone: 19.10 → 20.01
Changed in charm-keystone:
milestone: 19.10 → 20.01
Changed in charm-neutron-api:
milestone: 19.10 → 20.01
Changed in charm-nova-cloud-controller:
milestone: 19.10 → 20.01
Changed in charm-openstack-dashboard:
milestone: 19.10 → 20.01
Changed in charm-deployment-guide:
assignee: nobody → Peter Matulis (petermatulis)
status: New → In Progress
Peter Matulis (petermatulis) wrote :

This has been fixed here:

https://review.opendev.org/690454

Changed in charm-deployment-guide:
status: In Progress → Fix Released
James Page (james-page)
Changed in charm-ceilometer:
milestone: 20.01 → 20.05
Changed in charm-ceph-radosgw:
milestone: 20.01 → 20.05
Changed in charm-cinder:
milestone: 20.01 → 20.05
Changed in charm-glance:
milestone: 20.01 → 20.05
Changed in charm-keystone:
milestone: 20.01 → 20.05
Changed in charm-neutron-api:
milestone: 20.01 → 20.05
Changed in charm-nova-cloud-controller:
milestone: 20.01 → 20.05
Changed in charm-openstack-dashboard:
milestone: 20.01 → 20.05
David Ames (thedac)
Changed in charm-ceilometer:
milestone: 20.05 → 20.08
Changed in charm-ceph-radosgw:
milestone: 20.05 → 20.08
Changed in charm-cinder:
milestone: 20.05 → 20.08
Changed in charm-glance:
milestone: 20.05 → 20.08
Changed in charm-keystone:
milestone: 20.05 → 20.08
Changed in charm-neutron-api:
milestone: 20.05 → 20.08
Changed in charm-nova-cloud-controller:
milestone: 20.05 → 20.08
Changed in charm-openstack-dashboard:
milestone: 20.05 → 20.08
James Page (james-page)
Changed in charm-ceilometer:
milestone: 20.08 → none
Changed in charm-ceph-radosgw:
milestone: 20.08 → none
Changed in charm-cinder:
milestone: 20.08 → none
Changed in charm-glance:
milestone: 20.08 → none
Changed in charm-keystone:
milestone: 20.08 → none
Changed in charm-neutron-api:
milestone: 20.08 → none
Changed in charm-nova-cloud-controller:
milestone: 20.08 → none
Changed in charm-openstack-dashboard:
milestone: 20.08 → none