Minor updates in composable HA break due to haproxy rules being applied too late

Bug #1871646 reported by Michele Baldessari
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Michele Baldessari
Milestone: (none)

Bug Description

Any role that has haproxy needs custom iptables rules that open up the traffic for all the haproxy stanzas. This normally does not matter much when the role containing haproxy also contains all the other controller services (mysql/redis/rabbit/etc.), because those controller services open up their own ports. However, in the composable HA case, where databases and/or messaging are split off to a separate role, these haproxy iptables rules become crucial.

In such a composable HA scenario, minor updates can break. Note that a minor update only runs the update tasks, host_prep_tasks and the docker_config tasks (i.e. the transient containers).

Imagine the following scenario:
1) Minor update on controller-2, followed by controller-1.
At this point the haproxy rules have disappeared from controller-2 and controller-1, because they are created in the deployment steps, which are not run during a minor update.
2) Minor update of controller-0.
At this point any transient container that tries to update or poke the DB will be stuck with:
 2020-04-07 15:00:53.606 12 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -300 attempts left.: oslo_db.exception.DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'overcloud.internalapi.redhat.local' (timed out)
This happens because the haproxy ports (3306 in this specific case) will not be opened again until we run the converge step.
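
The ordering problem in the scenario above can be illustrated with a toy model. This is only a sketch: the phase names come from the description, but the list `MINOR_UPDATE_PHASES`, its ordering, and the function `transient_containers_see_rules` are hypothetical names made up for illustration and are not part of tripleo-heat-templates.

```python
# Toy model of a minor update on one node (an assumption-laden sketch,
# not real TripleO code). Per the bug description, a minor update runs
# only these task groups; deployment steps run later, at converge.
MINOR_UPDATE_PHASES = ["update_tasks", "host_prep_tasks", "docker_config"]


def transient_containers_see_rules(rules_created_in: str) -> bool:
    """Return True if the haproxy firewall rules exist when the
    transient containers (docker_config) start during a minor update.

    rules_created_in is the task group that creates the rules,
    e.g. "deploy_steps" (old behaviour) or "host_prep_tasks".
    """
    # Assume the rules were cleaned up earlier in the update.
    rules_present = False
    for phase in MINOR_UPDATE_PHASES:
        if phase == rules_created_in:
            rules_present = True
        if phase == "docker_config":
            # Transient containers start here and need the DB port.
            return rules_present
    return rules_present


print(transient_containers_see_rules("deploy_steps"))
print(transient_containers_see_rules("host_prep_tasks"))
```

In this model, rules created in the deployment steps are never in place when the transient containers start, while rules created in host_prep_tasks are.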

Only stein, train and later releases are affected (queens created the iptables rules inside a transient container, which was run at the right time).
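
To tell this failure apart from a database that is simply down, one could probe the haproxy frontend port directly from an affected node. The following is a minimal sketch using only the Python standard library; `probe_port` is a hypothetical helper, not part of any TripleO tooling. A timeout is consistent with packets being dropped by a firewall (as in the log above), whereas a refused connection suggests that nothing is listening; neither result is proof on its own.

```python
import socket


def probe_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all: consistent with a firewall dropping packets.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # DNS failures, unreachable networks, and so on.
        return f"error: {exc}"


# Example call, using the host name and port from the log line above:
# probe_port("overcloud.internalapi.redhat.local", 3306)
```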

Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/718159
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=6220fe1bd319b8fabcc46b4b3fc705ac2f5526ed
Submitter: Zuul
Branch: master

commit 6220fe1bd319b8fabcc46b4b3fc705ac2f5526ed
Author: Michele Baldessari <email address hidden>
Date: Tue Apr 7 18:00:13 2020 +0200

    Move the haproxy iptables rules creation to host_prep_tasks

    The reason for this is that under deploy_tasks they won't be run during
    an update (until the converge command is run). This is problematic
    because when a composable HA environment is updated, the haproxy
    firewall rules might disappear due to other tasks cleaning the rules up,
    and they won't be recreated until converge. The problem is that the
    temporary containers will run during the minor update and try to access
    the db, which is now effectively firewalled off.

    Historically this was at step 2, because haproxy was configured during
    that step. Nothing should prevent us from creating the rules before and
    that is what we do for the non-haproxy rules too anyway.

    While moving it we need to take out the code from
    ::tripleo::profile::base::haproxy and use it directly because we do not
    have the required 'step' variable set in host_prep_tasks and silly
    puppet has no way of passing a hiera value on the command line (or via
    other simple means).

    Tested as follows:
    1) Deployed a fresh Train environment with this patch and correctly
    observed the haproxy fw rules:
    [root@controller-0 ~]# iptables -nvL INPUT |grep _haproxy |wc -l
    27

    2) Ran a minor update of controller-2, controller-1 and controller-0
    (in that order) and verified that afterwards all _haproxy rules
    are in place *before* the converge.

    3) Confirmed that in the minor update logs we do see the step where
    haproxy rules are enforced (previously this was not the case):
    $ grep 'Run puppet on the host to apply IPtables rules' update-controller-2.log
    TASK [Run puppet on the host to apply IPtables rules] **************************

    4) Ran a full minor update + converge of a composable HA environment.

    Closes-Bug: #1871646

    Change-Id: Icba8a8292d1e2675c7da3513d00a4a0f4191747e
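
The checks from test steps 1 and 3 of the commit message above can be sketched in Python. This is an illustration under stated assumptions: the helper names `count_haproxy_rules` and `task_ran` are hypothetical, and the two sample strings are made up to resemble `iptables -nvL INPUT` output and an update log; they are not captured from a real environment.

```python
def count_haproxy_rules(iptables_output: str) -> int:
    """Count lines mentioning _haproxy, like `grep _haproxy | wc -l`."""
    return sum(1 for line in iptables_output.splitlines() if "_haproxy" in line)


def task_ran(log_text: str, task_name: str) -> bool:
    """Check whether an Ansible TASK banner for task_name is in the log."""
    return f"TASK [{task_name}]" in log_text


# Made-up sample data for illustration only.
SAMPLE_IPTABLES = """\
    0     0 ACCEPT  tcp  --  *  *  0.0.0.0/0  0.0.0.0/0  multiport dports 3306 /* 100 mysql_haproxy ipv4 */
    0     0 ACCEPT  tcp  --  *  *  0.0.0.0/0  0.0.0.0/0  multiport dports 6379 /* 100 redis_haproxy ipv4 */
    0     0 ACCEPT  tcp  --  *  *  0.0.0.0/0  0.0.0.0/0  tcp dpt:22 /* 003 accept ssh ipv4 */
"""
SAMPLE_LOG = "TASK [Run puppet on the host to apply IPtables rules] ****\n"

print(count_haproxy_rules(SAMPLE_IPTABLES))
print(task_ran(SAMPLE_LOG, "Run puppet on the host to apply IPtables rules"))
```

On a real node the first function would be fed the output of `iptables -nvL INPUT`, and the second the contents of a file such as update-controller-2.log.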

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/718201
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=6c04309a35c4b9e30696795e25c7fe80076b733a
Submitter: Zuul
Branch: stable/train

commit 6c04309a35c4b9e30696795e25c7fe80076b733a
Author: Michele Baldessari <email address hidden>
Date: Tue Apr 7 18:00:13 2020 +0200

    Move the haproxy iptables rules creation to host_prep_tasks

    The reason for this is that under deploy_tasks they won't be run during
    an update (until the converge command is run). This is problematic
    because when a composable HA environment is updated, the haproxy
    firewall rules might disappear due to other tasks cleaning the rules up,
    and they won't be recreated until converge. The problem is that the
    temporary containers will run during the minor update and try to access
    the db, which is now effectively firewalled off.

    Historically this was at step 2, because haproxy was configured during
    that step. Nothing should prevent us from creating the rules before and
    that is what we do for the non-haproxy rules too anyway.

    While moving it we need to take out the code from
    ::tripleo::profile::base::haproxy and use it directly because we do not
    have the required 'step' variable set in host_prep_tasks and silly
    puppet has no way of passing a hiera value on the command line (or via
    other simple means).

    Tested as follows:
    1) Deployed a fresh Train environment with this patch and correctly
    observed the haproxy fw rules:
    [root@controller-0 ~]# iptables -nvL INPUT |grep _haproxy |wc -l
    27

    2) Ran a minor update of controller-2, controller-1 and controller-0
    (in that order) and verified that afterwards all _haproxy rules
    are in place *before* the converge.

    3) Confirmed that in the minor update logs we do see the step where
    haproxy rules are enforced (previously this was not the case):
    $ grep 'Run puppet on the host to apply IPtables rules' update-controller-2.log
    TASK [Run puppet on the host to apply IPtables rules] **************************

    4) Ran a full minor update + converge of a composable HA environment.

    NB: Cherry-pick not 100% clean due to context

    Closes-Bug: #1871646

    Change-Id: Icba8a8292d1e2675c7da3513d00a4a0f4191747e
    (cherry picked from commit 6220fe1bd319b8fabcc46b4b3fc705ac2f5526ed)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/718995

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/stein)

Reviewed: https://review.opendev.org/718995
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=09c8fdb977c964d62008c7a1b0e54c3ab47d69db
Submitter: Zuul
Branch: stable/stein

commit 09c8fdb977c964d62008c7a1b0e54c3ab47d69db
Author: Michele Baldessari <email address hidden>
Date: Tue Apr 7 18:00:13 2020 +0200

    Move the haproxy iptables rules creation to host_prep_tasks

    The reason for this is that under deploy_tasks they won't be run during
    an update (until the converge command is run). This is problematic
    because when a composable HA environment is updated, the haproxy
    firewall rules might disappear due to other tasks cleaning the rules up,
    and they won't be recreated until converge. The problem is that the
    temporary containers will run during the minor update and try to access
    the db, which is now effectively firewalled off.

    Historically this was at step 2, because haproxy was configured during
    that step. Nothing should prevent us from creating the rules before and
    that is what we do for the non-haproxy rules too anyway.

    While moving it we need to take out the code from
    ::tripleo::profile::base::haproxy and use it directly because we do not
    have the required 'step' variable set in host_prep_tasks and silly
    puppet has no way of passing a hiera value on the command line (or via
    other simple means).

    Tested as follows:
    1) Deployed a fresh Train environment with this patch and correctly
    observed the haproxy fw rules:
    [root@controller-0 ~]# iptables -nvL INPUT |grep _haproxy |wc -l
    27

    2) Ran a minor update of controller-2, controller-1 and controller-0
    (in that order) and verified that afterwards all _haproxy rules
    are in place *before* the converge.

    3) Confirmed that in the minor update logs we do see the step where
    haproxy rules are enforced (previously this was not the case):
    $ grep 'Run puppet on the host to apply IPtables rules' update-controller-2.log
    TASK [Run puppet on the host to apply IPtables rules] **************************

    4) Ran a full minor update + converge of a composable HA environment.

    NB: Cherry-pick not 100% clean due to context

    Closes-Bug: #1871646

    Change-Id: Icba8a8292d1e2675c7da3513d00a4a0f4191747e
    (cherry picked from commit 6220fe1bd319b8fabcc46b4b3fc705ac2f5526ed)
    (cherry picked from commit 6c04309a35c4b9e30696795e25c7fe80076b733a)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 11.4.0

This issue was fixed in the openstack/tripleo-heat-templates 11.4.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates stein-eol

This issue was fixed in the openstack/tripleo-heat-templates stein-eol release.
