We need an explicit guarantee that firewall rules are opened before we invoke any pcs commands

Bug #1866209 reported by Michele Baldessari
This bug affects 1 person
Affects   Status         Importance   Assigned to          Milestone
tripleo   Fix Released   Medium       Michele Baldessari

Bug Description

We want to make sure that any firewall rule that opens pacemaker ports
is applied before we run any commands that invoke pcs to
authenticate remote nodes.

From a high-level point of view, it simply makes sense to explicitly
open the firewall before we invoke pcs commands that talk to
remote nodes.

I have seen one case in the wild where, during a scale-up,
the node being scaled up was waiting on Exec['wait-for-settle']
and the bootstrap node failed to contact pcs on that node: its
firewall rules had not been opened yet because it was still
waiting on the 'wait-for-settle' step.

Changed in tripleo:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Michele Baldessari (michele)
tags: added: train-backport-potential
tags: added: queens-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (master)

Fix proposed to branch: master
Review: https://review.opendev.org/711509

Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (master)

Reviewed: https://review.opendev.org/711509
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=88e119d747c432030b6491592c17293783ea3d19
Submitter: Zuul
Branch: master

commit 88e119d747c432030b6491592c17293783ea3d19
Author: Michele Baldessari <email address hidden>
Date: Thu Mar 5 17:38:30 2020 +0100

    Enforce firewall rules before pacemaker-auth

    We want to make sure that any firewall rule set to open pacemaker ports
    is executed before we run any commands that invoke pcs to
    authenticate remote nodes.

    It simply makes sense from a high-level POV to explicitly open
    up firewall rules before we invoke pcs commands that will talk to
    remote nodes.

    I have actually seen one case in the wild where during a scaleup
    the node being scaled up was waiting on Exec['wait-for-settle']
    and the bootstrap node failed to contact pcs on the scaled up node
    because there the firewall rules were never opened up as it was
    waiting on the 'wait-for-settle' step.

    Note that we *cannot* impose the ordering via a too-generic
    Firewall<||> collector because in tripleo::firewall we have

        Service<||> -> Class['tripleo::firewall::post']

    and we would create a circular dependency.

    Tested a queens deploy with this change and we are correctly
    guaranteed to open up firewalling before invoking pcs:
    Mar 05 16:22:51 controller-0. puppet-user[18840]: (/Stage[main]/Tripleo::Firewall/Tripleo::Firewall::Service_rules[swift_storage]/Tripleo::Firewall::Rule[123 swift storage]/Firewall[123 swift storage ipv4]/ensure) created
    Mar 05 16:22:52 controller-0. puppet-user[18840]: (/Stage[main]/Tripleo::Firewall/Tripleo::Firewall::Service_rules[swift_storage]/Tripleo::Firewall::Rule[123 swift storage]/Firewall[123 swift storage ipv6]/ensure) created
    Mar 05 16:22:52 controller-0. puppet-user[18840]: (/Stage[main]/Tripleo::Firewall/Tripleo::Firewall::Service_rules[tripleo_firewall]/Tripleo::Firewall::Rule[003 accept ssh from any]/Firewall[003 accept ssh from any ipv4]/ensure) created
    Mar 05 16:22:52 controller-0. puppet-user[18840]: (/Stage[main]/Tripleo::Firewall/Tripleo::Firewall::Service_rules[tripleo_firewall]/Tripleo::Firewall::Rule[003 accept ssh from any]/Firewall[003 accept ssh from any ipv6]/ensure) created
    Mar 05 16:22:52 controller-0. puppet-user[18840]: (Exec[reauthenticate-across-all-nodes](provider=posix)) Executing '/sbin/pcs cluster auth controller-0 controller-1 controller-2 database-0 database-1 database-2 messaging-0 messaging-1 messaging-2 -u hacluster -p foobar --force'
    Mar 05 16:22:52 controller-0. puppet-user[18840]: Executing: '/sbin/pcs cluster auth controller-0 controller-1 controller-2 database-0 database-1 database-2 messaging-0 messaging-1 messaging-2 -u hacluster -p AQtEeE6e3FDEqrfm --force'
    Mar 05 16:22:55 controller-0. puppet-user[18840]: (Exec[Create Cluster tripleo_cluster](provider=posix)) Executing '/sbin/pcs cluster setup --wait --name tripleo_cluster controller-0 controller-1 controller-2 database-0 database-1 database-2 messaging-0 messaging-1 messaging-2 --token 10000 --encryption 1'
    M...
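
The circular-dependency note in the commit message above can be illustrated with a small dependency-graph sketch. It is written in Python rather than Puppet, and it is only a model: the resource names are hypothetical, and both the edge from the pcs authentication to Service[pacemaker] and the presence of a firewall rule inside tripleo::firewall::post are assumptions made for illustration, not details taken from the puppet-tripleo catalog.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def apply_order(edges):
    """Return one valid apply order for (before, after) pairs, or None on a cycle."""
    sorter = TopologicalSorter()
    for before, after in edges:
        sorter.add(after, before)  # "after" depends on "before"
    try:
        return list(sorter.static_order())
    except CycleError:
        return None


# Hypothetical, simplified resource names (assumptions, not the real catalog).
PCS_AUTH = "Exec[pcs-auth]"
PACEMAKER_RULE = "Firewall[pacemaker ports]"
POST_RULE = "Firewall[rule inside tripleo::firewall::post]"
SERVICE = "Service[pacemaker]"

base = [
    (PCS_AUTH, SERVICE),   # assumed: the cluster service starts after authentication
    (SERVICE, POST_RULE),  # models Service<||> -> Class['tripleo::firewall::post']
]

# Narrow ordering: only the pacemaker rule must precede the pcs authentication.
narrow = base + [(PACEMAKER_RULE, PCS_AUTH)]

# Broad ordering, like Firewall<||>: every rule, including the post rule,
# must precede the pcs authentication.
broad = narrow + [(POST_RULE, PCS_AUTH)]

narrow_order = apply_order(narrow)
broad_order = apply_order(broad)
print(narrow_order)  # a valid order with the pacemaker rule first
print(broad_order)   # None, because the broad ordering closes a loop
```

Under these assumptions, the broad ordering fails because the post rule would have to come both before the authentication and after the service that the authentication precedes. Restricting the ordering to the pacemaker rule leaves the graph acyclic.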


Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/711814

OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (stable/train)

Reviewed: https://review.opendev.org/711814
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=4db9d1531ab90724b4750a4dea19e25c84237477
Submitter: Zuul
Branch: stable/train

commit 4db9d1531ab90724b4750a4dea19e25c84237477
Author: Michele Baldessari <email address hidden>
Date: Thu Mar 5 17:38:30 2020 +0100

    Enforce firewall rules before pacemaker-auth


tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 12.2.0

This issue was fixed in the openstack/puppet-tripleo 12.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 11.5.0

This issue was fixed in the openstack/puppet-tripleo 11.5.0 release.
