HAProxy not set into maintenance for various services during upgrades

Bug #2047017 reported by Andrew Bonney
Affects: OpenStack-Ansible
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

Various service playbooks make use of the 'haproxy-endpoint-manage' playbook to prevent access to API endpoints whilst they are being upgraded. However, this coverage appears to be incomplete: services such as Magnum and Horizon do not make use of this mechanism. As a result, user requests can fail with errors whilst upgrades are being carried out.

In principle the fix appears to require a simple copy/paste of the existing 'haproxy-endpoint-manage' tasks into the full set of os-*-install playbooks.
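As a rough illustration of the copy/paste being proposed, the existing core-service playbooks wrap their install plays with disable/enable tasks along the following lines. This is a hedged sketch only: the task file path, variable names and the 'magnum_service-back' backend name are assumptions modelled on the core-service pattern, not taken from this report.

```yaml
# Sketch: wrap the service install plays with haproxy backend management.
# Backend name and task path are assumptions, not confirmed by this bug.
- name: Disable the service's haproxy backend before upgrading
  hosts: magnum_all[0]
  gather_facts: false
  tasks:
    - name: Put the backend into MAINT
      include_tasks: common-tasks/haproxy-endpoint-manage.yml
      vars:
        haproxy_backend: magnum_service-back
        haproxy_state: disabled

# ... the usual os-magnum-install plays run here ...

- name: Re-enable the service's haproxy backend after upgrading
  hosts: magnum_all[0]
  gather_facts: false
  tasks:
    - name: Put the backend back into service
      include_tasks: common-tasks/haproxy-endpoint-manage.yml
      vars:
        haproxy_backend: magnum_service-back
        haproxy_state: enabled
```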

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

I'm pretty sure I pushed something to address this years ago, but there was no consensus on whether the approach was actually needed at all...

Revision history for this message
Andrew Bonney (andrewbonney) wrote :

It may be that some services handle the upgrade process more gracefully than others, but Magnum certainly starts responding with 503s or similar if the back ends aren't put into maintenance whilst they're worked on. I'm not sure why the HAProxy health check apparently remains healthy throughout in this case.

It feels like managing maintenance mode consistently for all services would at least handle cases like this, where the health check isn't necessarily representative during an upgrade.
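For context on why the health check can stay green: a typical HAProxy backend check only probes one URL, so a service can keep answering that probe with 200 while other API calls fail during an upgrade. A minimal sketch (backend name, check path and server address are hypothetical; 9511 is Magnum's default API port):

```
backend magnum-back
    # The check only hits this one path; if the service still answers it
    # with 200 during an upgrade, HAProxy keeps the server UP even though
    # other API calls may return 503.
    option httpchk GET /healthcheck
    server infra1 172.29.236.11:9511 check
```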

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

We also have a haproxy_drain variable, which handles endpoint disablement even more gracefully, since it waits for all connections to finish before moving to maintenance mode.
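For reference, this distinction maps onto HAProxy's runtime API server states: DRAIN stops accepting new connections but lets existing ones finish, whereas MAINT takes the server out of service outright. The commands below are standard HAProxy runtime API usage, assuming a stats socket at /run/haproxy.stat and a hypothetical backend/server name:

```shell
# Drain: no new connections, existing ones are allowed to complete
echo "set server magnum-back/infra1 state drain" | socat stdio /run/haproxy.stat

# Maint: hard-disable the server immediately
echo "set server magnum-back/infra1 state maint" | socat stdio /run/haproxy.stat

# Return the server to normal service
echo "set server magnum-back/infra1 state ready" | socat stdio /run/haproxy.stat
```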

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible (master)
Changed in openstack-ansible:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible (master)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible/+/904452
Committed: https://opendev.org/openstack/openstack-ansible/commit/9694ae8c2346daec443c5d74f288c146eb05d93f
Submitter: "Zuul (22348)"
Branch: master

commit 9694ae8c2346daec443c5d74f288c146eb05d93f
Author: Dmitriy Rabotyagov <email address hidden>
Date: Fri Dec 29 17:41:58 2023 +0100

    Ensure disable/enable haproxy backends exists for all services

    Right now we only ensure that services are enabled/disabled while running
    playbooks for core services. At the same time, some services still do not
    have this mechanism, which might result in unexpected outages.

    So we ensure that all service playbooks behave in the same way and
    disable backends in advance, before the playbook makes any modifications.

    With that, setting variable `haproxy_drain: true` will ensure that moving
    backend to the MAINT state will be graceful and all current connections
    will close normally unless a timeout is reached, which is 2 min by default.

    Closes-Bug: #2047017
    Change-Id: I8554defec4df54d14be72ae9a1560907ff1aaddf

Changed in openstack-ansible:
status: In Progress → Fix Released