HAProxy not set into maintenance for various services during upgrades

Bug #2047017 reported by Andrew Bonney
Affects: OpenStack-Ansible
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

Various service playbooks make use of the 'haproxy-endpoint-manage' playbook to prevent access to API endpoints whilst they are being upgraded. However, this coverage appears to be incomplete: services such as Magnum and Horizon do not make use of this mechanism. As a result, user requests can fail with errors whilst upgrades are being carried out.

In principle the fix appears to require a simple copy/paste of the existing 'haproxy-endpoint-manage' tasks into the full set of os-*-install playbooks.
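As a rough illustration of the copy/paste being proposed, the existing core-service playbooks wrap their install plays with disable/enable tasks along the following lines. This is a hedged sketch only: the task file path, variable names and the 'magnum_service-back' backend name are assumptions modelled on the core-service pattern, not taken from this report.

```yaml
# Sketch: wrap the service install plays with haproxy backend management.
# Backend name and task path are assumptions, not confirmed by this bug.
- name: Disable the service's haproxy backend before upgrading
  hosts: magnum_all[0]
  gather_facts: false
  tasks:
    - name: Put the backend into MAINT
      include_tasks: common-tasks/haproxy-endpoint-manage.yml
      vars:
        haproxy_backend: magnum_service-back
        haproxy_state: disabled

# ... the usual os-magnum-install plays run here ...

- name: Re-enable the service's haproxy backend after upgrading
  hosts: magnum_all[0]
  gather_facts: false
  tasks:
    - name: Put the backend back into service
      include_tasks: common-tasks/haproxy-endpoint-manage.yml
      vars:
        haproxy_backend: magnum_service-back
        haproxy_state: enabled
```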

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

I'm pretty sure I pushed something to address this years ago, but there was no consensus on whether the approach was actually needed at all...

Revision history for this message
Andrew Bonney (andrewbonney) wrote :

It may be that some services handle the upgrade process more gracefully than others, but Magnum certainly starts responding with 503s or similar if the back ends aren't put into maintenance whilst they're worked on. I'm not sure why the HAProxy health check apparently remains healthy throughout in this case.

It feels like managing maintenance mode consistently for all services would at least handle cases like this, where the health check isn't necessarily representative during an upgrade.
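For context on why the health check can stay green: a typical HAProxy backend check only probes one URL, so a service can keep answering that probe with 200 while other API calls fail during an upgrade. A minimal sketch (backend name, check path and server address are hypothetical; 9511 is Magnum's default API port):

```
backend magnum-back
    # The check only hits this one path; if the service still answers it
    # with 200 during an upgrade, HAProxy keeps the server UP even though
    # other API calls may return 503.
    option httpchk GET /healthcheck
    server infra1 172.29.236.11:9511 check
```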

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

We also have a haproxy_drain variable, which handles endpoint disablement even more gracefully, since it waits for all connections to finish before moving to maintenance mode.
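For reference, this distinction maps onto HAProxy's runtime API server states: DRAIN stops accepting new connections but lets existing ones finish, whereas MAINT takes the server out of service outright. The commands below are standard HAProxy runtime API usage, assuming a stats socket at /run/haproxy.stat and a hypothetical backend/server name:

```shell
# Drain: no new connections, existing ones are allowed to complete
echo "set server magnum-back/infra1 state drain" | socat stdio /run/haproxy.stat

# Maint: hard-disable the server immediately
echo "set server magnum-back/infra1 state maint" | socat stdio /run/haproxy.stat

# Return the server to normal service
echo "set server magnum-back/infra1 state ready" | socat stdio /run/haproxy.stat
```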

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible (master)
Changed in openstack-ansible:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible (master)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible/+/904452
Committed: https://opendev.org/openstack/openstack-ansible/commit/9694ae8c2346daec443c5d74f288c146eb05d93f
Submitter: "Zuul (22348)"
Branch: master

commit 9694ae8c2346daec443c5d74f288c146eb05d93f
Author: Dmitriy Rabotyagov <email address hidden>
Date: Fri Dec 29 17:41:58 2023 +0100

    Ensure disable/enable haproxy backends exists for all services

    Right now we only ensure that services are enabled/disabled while running
    playbooks for core services. At the same time, some services still do not
    have this mechanism, which might result in unexpected outages.

    So we ensure that all service playbooks behave in the same way and
    disable backends in advance, before the playbook makes any modifications.

    With that, setting variable `haproxy_drain: true` will ensure that moving
    backend to the MAINT state will be graceful and all current connections
    will close normally unless a timeout is reached, which is 2 min by default.

    Closes-Bug: #2047017
    Change-Id: I8554defec4df54d14be72ae9a1560907ff1aaddf

Changed in openstack-ansible:
status: In Progress → Fix Released