tripleo::firewall does not work as intended if the image has prepopulated firewall rules

Bug #1657108 reported by Michele Baldessari on 2017-01-17
This bug affects 1 person
Affects: tripleo
Importance: High
Assigned to: Michele Baldessari

Bug Description

I believe we've seen something of the sort here:
http://logs.openstack.org/99/418099/6/experimental/gate-tripleo-ci-centos-7-scenario005-multinode/320e09e/logs/postci.txt.gz#_2017-01-16_20_35_51_000

So my initial thinking is the following:
- We started off with an image that has firewall enabled and lets only
  ICMP and ssh through
- We call the cluster setup stuff which will fail because the pcsd port
  is not open
- The tripleo firewall opens the cluster ports too late in the game

tags: added: composable-roles
Changed in tripleo:
importance: Undecided → High
status: New → Triaged
Changed in tripleo:
milestone: ocata-3 → ocata-rc1
Michele Baldessari (michele) wrote :

So I can confirm this theory. I have injected the following (standard?) iptables file in /etc/sysconfig:
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT

On a deployment where pacemaker is included on controller-0,1,2 and galera-0, and where pacemaker-remote is added to the remote-0 and rabbit-0 nodes, I get the following situation:
A) controller0,1,2:
[root@overcloud-controller-0 ~]# iptables -nvL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
 6394 1363K ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
    0 0 ACCEPT icmp -- * * 0.0.0.0/0 0.0.0.0/0
  189 11340 ACCEPT all -- lo * 0.0.0.0/0 0.0.0.0/0
    1 60 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:22
  861 54822 REJECT all -- * * 0.0.0.0/0 0.0.0.0/0 reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
    0 0 REJECT all -- * * 0.0.0.0/0 0.0.0.0/0 reject-with icmp-host-prohibited

Chain OUTPUT (policy ACCEPT 7933 packets, 1231K bytes)
 pkts bytes target prot opt in out source destination
[root@overcloud-controller-0 ~]# systemctl is-active iptables
active

So in this situation the command that sets up the cluster, which expects to be able to talk to the pcsd ports, will fail:
2017-01-18 11:39:54 +0000 Puppet (debug): Executing: '/sbin/pcs cluster auth overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 overcloud-galera-0 -u hacluster -p a6KKVbqfDqep2zgL --force'
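
To make the failure concrete, here is a minimal Python sketch of first-match evaluation over the INPUT rules injected above. The rule model, the `verdict` and `new_tcp` helpers and the `eth0` interface name are illustrative assumptions, not part of iptables or TripleO. 2224/tcp is the default pcsd port; the log above does not name it.

```python
# Simplified model of the INPUT chain from the injected /etc/sysconfig/iptables.
# Each rule is (match predicate, target); the first matching rule decides,
# as in iptables. Helper names are hypothetical, for illustration only.

PCSD_PORT = 2224  # default pcsd port (tcp); assumption, not taken from the log

INPUT_RULES = [
    # -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
    (lambda p: p["state"] in ("RELATED", "ESTABLISHED"), "ACCEPT"),
    # -A INPUT -p icmp -j ACCEPT
    (lambda p: p["proto"] == "icmp", "ACCEPT"),
    # -A INPUT -i lo -j ACCEPT
    (lambda p: p["iface"] == "lo", "ACCEPT"),
    # -A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
    (lambda p: p["proto"] == "tcp" and p["state"] == "NEW" and p["dport"] == 22, "ACCEPT"),
    # -A INPUT -j REJECT --reject-with icmp-host-prohibited
    (lambda p: True, "REJECT"),
]


def verdict(packet, rules=INPUT_RULES, policy="ACCEPT"):
    """Return the target of the first matching rule, else the chain policy."""
    for matches, target in rules:
        if matches(packet):
            return target
    return policy


def new_tcp(dport, iface="eth0"):
    """Build a NEW tcp connection attempt arriving on a non-loopback interface."""
    return {"proto": "tcp", "state": "NEW", "iface": iface, "dport": dport}


print("ssh :", verdict(new_tcp(22)))
print("pcsd:", verdict(new_tcp(PCSD_PORT)))
```

Under this model a new connection to the pcsd port falls through to the final REJECT rule, which matches the 861 rejected packets in the counters above, while the same connection against an empty chain is accepted by the ACCEPT policy.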

Interestingly we also have
B) overcloud-galera-0
[root@overcloud-galera-0 ~]# more /etc/sysconfig/iptables
# sample configuration for iptables service
# you can edit this manually or use system-config-firewall
# please do not ask us to add additional ports/services to this default configuration
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT
[root@overcloud-galera-0 ~]# systemctl status iptables
● iptables.service - IPv4 firewall with iptables
   Loaded: loaded (/usr/lib/systemd/system/iptables.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

C) overcloud-rabbit-0
[root@overcloud-rabbit-0 ~]# iptables -nvL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot op...

summary: - pacemaker cluster setup fails if the image has prepopulated firewall
- rules with no cluster/pcsd access
+ tripleo::firewall does not work as intended if the image has
+ prepopulated firewall rules
Michele Baldessari (michele) wrote :

Ok so the bug is really somewhere in tripleo::firewall. Here is what happens:
1) Image starts with prepopulated /etc/sysconfig/iptables only allowing ssh and icmp
2) During boot (either because the puppet class "firewall" enforces it or because the image has it configured) the "iptables" systemd service starts and sets the above iptables rules
3) We have the following in puppet-pacemaker:
Service['pcsd'] -> exec { 'auth-successful-across-all-nodes':
  command => "${::pacemaker::pcs_bin} cluster auth ${cluster_members} -u hacluster -p ${::pacemaker::hacluster_pwd}",
4) tripleo::firewall guarantees the following:
Class['tripleo::firewall::pre'] -> Class['tripleo::firewall::post']
Service<||> -> Class['tripleo::firewall::post']

So we have this sequence overall:
A) Class['tripleo::firewall::pre'] -> B) Service['pcsd'] -> C) exec { 'auth-successful-across-all-nodes'} -> D) Class['tripleo::firewall::post']

The problem is that when C) runs the cluster ports are not open yet, so it hangs and retries many times.
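
As a rough illustration, the ordering constraints quoted in 3) and 4) can be written down as pairs and checked against the observed sequence. This is only a sketch: the resource titles are spelled as plain strings, and `respects` is a hypothetical helper, not something Puppet provides.

```python
# Resources from the sequence A) -> B) -> C) -> D) above, as plain strings.
PRE = "Class[tripleo::firewall::pre]"
PCSD = "Service[pcsd]"
AUTH = "Exec[auth-successful-across-all-nodes]"
POST = "Class[tripleo::firewall::post]"

# (before, after) pairs taken from the snippets quoted in 3) and 4).
CONSTRAINTS = [
    (PRE, POST),   # Class['tripleo::firewall::pre'] -> Class['tripleo::firewall::post']
    (PCSD, POST),  # Service<||> -> Class['tripleo::firewall::post']
    (PCSD, AUTH),  # Service['pcsd'] -> exec { 'auth-successful-across-all-nodes': ... }
]

OBSERVED = [PRE, PCSD, AUTH, POST]


def respects(order, constraints):
    """True if every (before, after) pair is honoured by the given order."""
    position = {name: index for index, name in enumerate(order)}
    return all(position[a] < position[b] for a, b in constraints)


print(respects(OBSERVED, CONSTRAINTS))
print(respects([PRE, AUTH, PCSD, POST], CONSTRAINTS))
```

The observed sequence satisfies every declared constraint, so nothing in the catalog prevents the auth exec from running while the image's stock rules are still the ones in effect.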

Potential solutions:
A) Empty /etc/sysconfig/iptables in the image itself (it makes little sense to have it anyway)
B) Find another way to purge the rules in there. I tried the following:
diff --git a/manifests/firewall.pp b/manifests/firewall.pp
index 8c6a53b..d577ca1 100644
--- a/manifests/firewall.pp
+++ b/manifests/firewall.pp
@@ -56,7 +56,8 @@ class tripleo::firewall(
     # Only purges IPv4 rules
     if $purge_firewall_rules {
       resources { 'firewall':
-        purge => true
+        purge => true,
+        before => Class['tripleo::firewall::pre'],
       }
     }

and with setting:
parameter_defaults:
  PurgeFirewallRules: true

For some reason this does clean the live rules on the system, but the iptables service is then started again, which reprovisions the previous rules.

C) Add some special rules in firewall::pre that open up the cluster ports (or any service that might be impacted)
D) Any other approaches here?
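
The behaviour seen with option B can be sketched with a toy model that keeps the live ruleset and the saved file separate. This is an assumption-laden illustration: `IptablesHost` and its methods are hypothetical, and the model simply assumes that the purge does not rewrite /etc/sysconfig/iptables, which is consistent with the behaviour described above but not confirmed here.

```python
class IptablesHost:
    """Toy model: a saved rules file plus the live (kernel) ruleset."""

    def __init__(self, saved_rules):
        self.saved_rules = list(saved_rules)  # stands in for /etc/sysconfig/iptables
        self.live_rules = []                  # stands in for what `iptables -nvL` shows

    def start_service(self):
        # Starting the iptables service loads the saved file into the live ruleset.
        self.live_rules = list(self.saved_rules)

    def purge_live(self):
        # Assumption: the purge only touches the live rules, not the saved file.
        self.live_rules = []


STOCK = [
    "-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT",
    "-A INPUT -j REJECT --reject-with icmp-host-prohibited",
]

host = IptablesHost(STOCK)
host.start_service()   # boot: stock rules become live
host.purge_live()      # purge: live rules are gone
host.start_service()   # service started again: stock rules are back
print(host.live_rules == STOCK)
```

In the same model, a host whose saved file is empty ends up with no stock rules after the restart, which is the idea behind option A.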

Note that with commit 2ca3cb03ad5f05469e5ae181981e559ccc77371f "firewall: stop using stdlib stages" we stated that:
- use ordering to make sure we start all Services in catalog before post
  rules. It ensure that we don't drop all traffic before starting the
  services, which could lead to services errors (e.g. trying to reach database
  or amqp)

The problem is that the above holds true only when iptables starts out clean.

Michele Baldessari (michele) wrote :

So I *think* the reason why even with "purge" enabled this is not working is due to the following bug in puppetlabs-firewall:
https://tickets.puppetlabs.com/browse/MODULES-3184

So option E) to fix this would be to enable purge and have the bug above fixed.

Fix proposed to branch: master
Review: https://review.openstack.org/422425

Changed in tripleo:
assignee: nobody → Michele Baldessari (michele)
status: Triaged → In Progress

Reviewed: https://review.openstack.org/422472
Committed: https://git.openstack.org/cgit/openstack/tripleo-image-elements/commit/?id=48c2a3f7ce958a8593795e29bbe244ba48f2708e
Submitter: Jenkins
Branch: master

commit 48c2a3f7ce958a8593795e29bbe244ba48f2708e
Author: Michele Baldessari <email address hidden>
Date: Thu Jan 19 09:53:19 2017 +0100

    Add a script to zero /etc/sysconfig/iptables at build time

    When including this element we empty the stock /etc/sysconfig/iptables
    file as shipped by the iptables rpm package. The reason for this is that
    puppet firewall has a hard time to cope with existing rules when
    /etc/sysconfig/iptables is populated and the iptables service is not
    active. The referenced bug has a full explanation for the problem.

    Partial-Bug: #1657108

    Change-Id: Iddc21316a1a3d42a1a43cbb4b9c178adba8f8db3
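
For readers who want to see what "zeroing" the file amounts to, here is a minimal Python sketch. It is not the script from the commit above (that one lives in tripleo-image-elements and is not reproduced in this report); the `zero_rules_file` name and the `image_root` parameter are hypothetical.

```python
from pathlib import Path


def zero_rules_file(image_root, relative="etc/sysconfig/iptables"):
    """Truncate the rules file under image_root, keeping the file itself.

    Returns True if the file existed and was emptied, False if it was absent.
    """
    target = Path(image_root) / relative
    if not target.is_file():
        return False
    target.write_text("")  # truncate to zero bytes
    return True
```

Calling it with the mounted image root empties etc/sysconfig/iptables; passing `relative="etc/sysconfig/ip6tables"` would do the same for the ipv6 file.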

Change abandoned by Michele Baldessari (<email address hidden>) on branch: master
Review: https://review.openstack.org/422436
Reason: Abandoned in favour of https://review.openstack.org/#/c/422475/ and https://review.openstack.org/#/c/422472/ aka clear the file in the image building process

Change abandoned by Michele Baldessari (<email address hidden>) on branch: master
Review: https://review.openstack.org/422425
Reason: Abandoned in favour of https://review.openstack.org/#/c/422475/ and https://review.openstack.org/#/c/422472/ aka clear the file in the image building process

Reviewed: https://review.openstack.org/422475
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=439d0d5d057258e9950ccc9b356c24a31e833d53
Submitter: Jenkins
Branch: master

commit 439d0d5d057258e9950ccc9b356c24a31e833d53
Author: Michele Baldessari <email address hidden>
Date: Thu Jan 19 09:55:03 2017 +0100

    Add the iptables element to the image building process

    Adding the iptables element will effectively empty
    /etc/sysconfig/iptables and will solve the linked bug where a full
    explanation of the different approaches has been provided.

    This approach has been chosen because other approaches proved to
    be either too complex *or* required that we enable the purging of
    rules by default which might be disruptive for existing installations.

    Closes-Bug: #1657108
    Change-Id: Id0498a1158b5ace0df961248946f3ab5f11c26da
    Depends-On: Iddc21316a1a3d42a1a43cbb4b9c178adba8f8db3

Changed in tripleo:
status: In Progress → Fix Released
Changed in tripleo:
status: Fix Released → In Progress
Michele Baldessari (michele) wrote :

So I think I have seen ipv6 jobs timeout because we did not address the ipv6 part of this problem. Namely in a stock iptables rpm we have the following /etc/sysconfig/ip6tables:
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p ipv6-icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -d fe80::/64 -p udp -m udp --dport 546 -m state --state NEW -j ACCEPT
-A INPUT -j REJECT --reject-with icmp6-adm-prohibited
-A FORWARD -j REJECT --reject-with icmp6-adm-prohibited
COMMIT

Now we can just clean this file like we did for ipv4 via Iddc21316a1a3d42a1a43cbb4b9c178adba8f8db3, but ipv6 is slightly different because of the dhcpv6 rule which we would lose:
-A INPUT -d fe80::/64 -p udp -m udp --dport 546 -m state --state NEW -j ACCEPT

Losing it would, according to https://bugzilla.redhat.com/show_bug.cgi?id=1169036, break dhcpv6 responses; the rule is present today in our installations where the firewall is enabled.

So to fix this we either do
A) add this only rule to /etc/sysconfig/ip6tables like this:
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -d fe80::/64 -p udp -m udp --dport 546 -m state --state NEW -j ACCEPT
COMMIT

That way, when puppet-firewall kicks in for ipv6, it will add its rules and leave the above rule in place.

B) Add a specific generic rule for ipv6 through puppet-firewall that mimics the above rule.
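
For comparison, option A can be sketched as a filter over the stock file that keeps only the chain policies, the dhcpv6 client rule and COMMIT. This is an illustrative sketch, not what was implemented; `keep_only_dhcpv6` is a hypothetical helper and `STOCK_IP6TABLES` is copied from the listing above.

```python
STOCK_IP6TABLES = """\
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p ipv6-icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -d fe80::/64 -p udp -m udp --dport 546 -m state --state NEW -j ACCEPT
-A INPUT -j REJECT --reject-with icmp6-adm-prohibited
-A FORWARD -j REJECT --reject-with icmp6-adm-prohibited
COMMIT
"""


def is_dhcpv6_client_rule(line):
    """True for an INPUT rule whose --dport is exactly 546 (dhcpv6 client)."""
    tokens = line.split()
    if tokens[:2] != ["-A", "INPUT"] or "--dport" not in tokens:
        return False
    index = tokens.index("--dport")
    return index + 1 < len(tokens) and tokens[index + 1] == "546"


def keep_only_dhcpv6(text):
    """Keep table/chain headers, COMMIT and the dhcpv6 rule; drop everything else."""
    kept = []
    for line in text.splitlines():
        structural = line.startswith(("*", ":")) or line == "COMMIT"
        if structural or is_dhcpv6_client_rule(line):
            kept.append(line)
    return "\n".join(kept) + "\n"


print(keep_only_dhcpv6(STOCK_IP6TABLES), end="")
```

Run against the stock file, this yields the five-line file shown under option A.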

Also note that the dhcpv6 iptables client rules started being shipped by default in the iptables package due to https://bugzilla.redhat.com/show_bug.cgi?id=1169036

I will implement B.

Reviewed: https://review.openstack.org/426143
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=d5d4cc1094365b6bb147216d2ec99ddc36020a31
Submitter: Jenkins
Branch: master

commit d5d4cc1094365b6bb147216d2ec99ddc36020a31
Author: Michele Baldessari <email address hidden>
Date: Fri Jan 27 10:54:28 2017 +0100

    Add a default rule for dhcpv6 traffic

    Via bug https://bugs.launchpad.net/tripleo/+bug/1657108 we need
    to zero out the default rules in /etc/sysconfig/ip{6}tables in
    the image.
    We have done this for ipv4, but when we will do it for ipv6 we
    will also need to make sure we add a rule for dhcpv6 traffic
    as it is shipped in the iptables rpm. (See
    https://bugzilla.redhat.com/show_bug.cgi?id=1169036 for more info)

    With this change we correctly get the rule present (aka the first
    ACCEPT line. The second line is due to the stock ip6tables rule
    I had in my testing):
    [root@overcloud-controller-0 ~]# iptables -nvL |grep 546
    [root@overcloud-controller-0 ~]# ip6tables -nvL |grep 546
        0 0 ACCEPT udp * * ::/0 fe80::/64 multiport dports 546 /* 004 accept ipv6 dhcpv6 ipv6 */ state NEW
        0 0 ACCEPT udp * * ::/0 fe80::/64 udp dpt:546 state NEW

    Change-Id: If22080054b2b1fa7acfd101e8c34d2707e8e7864
    Partial-Bug: #1657108

Reviewed: https://review.openstack.org/426144
Committed: https://git.openstack.org/cgit/openstack/tripleo-image-elements/commit/?id=96cb130c5ac5fb3a312d9831ed2f92568d778399
Submitter: Jenkins
Branch: master

commit 96cb130c5ac5fb3a312d9831ed2f92568d778399
Author: Michele Baldessari <email address hidden>
Date: Fri Jan 27 10:49:12 2017 +0100

    Add a script to zero /etc/sysconfig/ip6tables at build time

    In change Iddc21316a1a3d42a1a43cbb4b9c178adba8f8db3 we zeroed out
    /etc/sysconfig/iptables, but we did not take care of ipv6. This change
    is meant to take care of the ipv6 part of the problem.
    When including this element we empty the stock /etc/sysconfig/ip6tables
    file as shipped by the iptables rpm package. The reason for this is that
    puppet firewall has a hard time to cope with existing rules when
    /etc/sysconfig/iptables is populated and the iptables service is not
    active. The referenced bug has a full explanation for the problem.

    Note that ipv6 is slightly more delicate because we will also need a puppet-tripleo
    change that implements the dhcpv6 rule that is contained by default
    in /etc/sysconfig/ip6tables:
    Depends-On: If22080054b2b1fa7acfd101e8c34d2707e8e7864

    Change-Id: I0dee5ff045fbfe7b55d078583e16b107eec534aa
    Partial-Bug: #1657108

Changed in tripleo:
status: In Progress → Fix Released

This issue was fixed in the openstack/tripleo-common 5.8.0 release.
