The tripleo firewall module has fundamentally three pieces:
1) firewall::pre (allows existing connections/ssh/icmp)
2) firewall::rule (allows services traffic)
3) firewall::post (drops all traffic)
One of the assumptions coded in the module is the following line:
Service<||> -> Class['tripleo::firewall::post']
It was added for the following reason:
"""
use ordering to make sure we start all Services in catalog before post
rules. It ensure that we don't drop all traffic before starting the
services, which could lead to services errors (e.g. trying to reach database or amqp)
"""
(see also bug LP#1643575)
The problem is that, while we guarantee that pre comes before post and that services start before post, we do not guarantee that the service rules (firewall::rule) are applied before post.
In fact in my deployment I see the following:
Jul 10 05:04:13 overcloud-controller-1 systemd: Started OpenSSH server daemon.
Jul 10 05:04:13 overcloud-controller-1 puppet-user[32418]: (/Stage[main]/Ssh::Server::Service/Service[sshd]) Triggered 'refresh' from 2 events
Jul 10 05:04:13 overcloud-controller-1 puppet-user[32418]: (/Stage[main]/Tripleo::Firewall::Pre/Tripleo::Firewall::Rule[000 accept related established rules]/Firewall[000 accept related established rules ipv4]/ensure) created
Jul 10 05:04:13 overcloud-controller-1 puppet-user[32418]: (/Stage[main]/Tripleo::Firewall::Pre/Tripleo::Firewall::Rule[000 accept related established rules]/Firewall[000 accept related established rules ipv6]/ensure) created
Jul 10 05:04:13 overcloud-controller-1 puppet-user[32418]: (/Stage[main]/Tripleo::Firewall::Pre/Tripleo::Firewall::Rule[001 accept all icmp]/Firewall[001 accept all icmp ipv4]/ensure) created
Jul 10 05:04:13 overcloud-controller-1 puppet-user[32418]: (/Stage[main]/Tripleo::Firewall::Pre/Tripleo::Firewall::Rule[001 accept all icmp]/Firewall[001 accept all icmp ipv6]/ensure) created
Jul 10 05:04:13 overcloud-controller-1 puppet-user[32418]: (/Stage[main]/Tripleo::Firewall::Pre/Tripleo::Firewall::Rule[002 accept all to lo interface]/Firewall[002 accept all to lo interface ipv4]/ensure) created
Jul 10 05:04:13 overcloud-controller-1 puppet-user[32418]: (/Stage[main]/Tripleo::Firewall::Pre/Tripleo::Firewall::Rule[002 accept all to lo interface]/Firewall[002 accept all to lo interface ipv6]/ensure) created
Jul 10 05:04:13 overcloud-controller-1 puppet-user[32418]: (/Stage[main]/Tripleo::Firewall::Pre/Tripleo::Firewall::Rule[003 accept ssh]/Firewall[003 accept ssh ipv4]/ensure) created
Jul 10 05:04:13 overcloud-controller-1 puppet-user[32418]: (/Stage[main]/Tripleo::Firewall::Pre/Tripleo::Firewall::Rule[003 accept ssh]/Firewall[003 accept ssh ipv6]/ensure) created
Jul 10 05:04:13 overcloud-controller-1 puppet-user[32418]: (/Stage[main]/Tripleo::Firewall::Pre/Tripleo::Firewall::Rule[004 accept ipv6 dhcpv6]/Firewall[004 accept ipv6 dhcpv6 ipv6]/ensure) created
Jul 10 05:04:13 overcloud-controller-1 puppet-user[32418]: (/Stage[main]/Tripleo::Firewall::Post/Tripleo::Firewall::Rule[998 log all]/Firewall[998 log all ipv4]/ensure) created
Jul 10 05:04:14 overcloud-controller-1 puppet-user[32418]: (/Stage[main]/Tripleo::Firewall::Post/Tripleo::Firewall::Rule[998 log all]/Firewall[998 log all ipv6]/ensure) created
Jul 10 05:04:14 overcloud-controller-1 puppet-user[32418]: (/Stage[main]/Tripleo::Firewall::Post/Tripleo::Firewall::Rule[999 drop all]/Firewall[999 drop all ipv4]/ensure) created
Jul 10 05:04:14 overcloud-controller-1 puppet-user[32418]: (/Stage[main]/Tripleo::Firewall::Post/Tripleo::Firewall::Rule[999 drop all]/Firewall[999 drop all ipv6]/ensure) created
Jul 10 05:04:14 overcloud-controller-1 puppet-user[32418]: (/Stage[main]/Tripleo::Firewall/Tripleo::Firewall::Service_rules[cinder_api]/Tripleo::Firewall::Rule[119 cinder]/Firewall[119 cinder ipv4]/ensure) created
Jul 10 05:04:14 overcloud-controller-1 puppet-user[32418]: (/Stage[main]/Tripleo::Firewall/Tripleo::Firewall::Service_rules[cinder_api]/Tripleo::Firewall::Rule[119 cinder]/Firewall[119 cinder ipv6]/ensure) created
Jul 10 05:04:14 overcloud-controller-1 puppet-user[32418]: (/Stage[main]/Tripleo::Firewall/Tripleo::Firewall::Service_rules[cinder_volume]/Tripleo::Firewall::Rule[120 iscsi initiator]/Firewall[120 iscsi initiator ipv4]/ensure) created
As the log above shows, the service rules (item 2) were added after the post rules, which breaks the assumption that the services are up, running, and reachable when the post rules are applied.
In fact, I am hitting this issue while trying to scale up controllers: the cluster is already up and running, and only later do we apply pre+post. Since it takes some time to apply all the iptables rules, the cluster thinks the other nodes are unreachable and fences them.
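To make the missing ordering concrete, here is a minimal sketch of the kind of constraint that would close the gap. It is an illustration only and not necessarily what the proposed fix does. It assumes the fragment lives in a manifest where Class['tripleo::firewall::pre'] and Class['tripleo::firewall::post'] are already declared, and it uses the Tripleo::Firewall::Service_rules resource type seen in the log above:

```puppet
# Constraints that already exist, as described above:
# pre before post, and every Service started before post.
Class['tripleo::firewall::pre'] -> Class['tripleo::firewall::post']
Service<||> -> Class['tripleo::firewall::post']

# Assumed addition (illustrative): every per-service rule set must be
# applied before the post class, so the drop-all rule always lands last.
Tripleo::Firewall::Service_rules<||> -> Class['tripleo::firewall::post']
```

The sketch collects Tripleo::Firewall::Service_rules rather than Tripleo::Firewall::Rule on purpose: as the log shows, the pre and post classes themselves contain Tripleo::Firewall::Rule resources (for example "999 drop all"), so ordering every Rule before post would make post depend on its own contents and create a dependency cycle.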
Fix proposed to branch: master
Review: https://review.openstack.org/581634