Periodic rocky fs020 job fails tempest tests tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps.test_cross_tenant_traffic and tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps.test_multiple_security_groups

Bug #1843259 reported by Gabriele Cerami
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Terry Wilson

Bug Description

The logs at https://logs.rdoproject.org/openstack-periodic-24hr/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset020-rocky/e9d92b4/logs/undercloud/home/zuul/tempest.log.txt.gz#_2019-09-09_09_12_54 show:

2019-09-09 09:12:54 | tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps.test_cross_tenant_traffic[compute,id-e79f879e-debb-440c-a7e4-efeda05b6848,network]
2019-09-09 09:12:54 | -------------------------------------------------------------------------------------------------------------------------------------------------------------
2019-09-09 09:12:54 |
2019-09-09 09:12:54 | Captured traceback:
2019-09-09 09:12:54 | ~~~~~~~~~~~~~~~~~~~
2019-09-09 09:12:54 | Traceback (most recent call last):
2019-09-09 09:12:54 | File "/usr/lib/python2.7/site-packages/tempest/common/utils/__init__.py", line 89, in wrapper
2019-09-09 09:12:54 | return f(*func_args, **func_kwargs)
2019-09-09 09:12:54 | File "/usr/lib/python2.7/site-packages/tempest/scenario/test_security_groups_basic_ops.py", line 488, in test_cross_tenant_traffic
2019-09-09 09:12:54 | self._test_cross_tenant_block(source_tenant, dest_tenant)
2019-09-09 09:12:54 | File "/usr/lib/python2.7/site-packages/tempest/scenario/test_security_groups_basic_ops.py", line 406, in _test_cross_tenant_block
2019-09-09 09:12:54 | should_succeed=False)
2019-09-09 09:12:54 | File "/usr/lib/python2.7/site-packages/tempest/scenario/manager.py", line 960, in check_remote_connectivity
2019-09-09 09:12:54 | self.fail(msg)
2019-09-09 09:12:54 | File "/usr/lib/python2.7/site-packages/unittest2/case.py", line 690, in fail
2019-09-09 09:12:54 | raise self.failureException(msg)
2019-09-09 09:12:54 | AssertionError: 10.0.0.105 is reachable from 10.0.0.106

2019-09-09 09:12:54 | tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps.test_multiple_security_groups[compute,id-d2f77418-fcc4-439d-b935-72eca704e293,network,slow]
2019-09-09 09:12:54 | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
2019-09-09 09:12:54 |
2019-09-09 09:12:54 | Captured traceback:
2019-09-09 09:12:54 | ~~~~~~~~~~~~~~~~~~~
2019-09-09 09:12:54 | Traceback (most recent call last):
2019-09-09 09:12:54 | File "/usr/lib/python2.7/site-packages/tempest/common/utils/__init__.py", line 89, in wrapper
2019-09-09 09:12:54 | return f(*func_args, **func_kwargs)
2019-09-09 09:12:54 | File "/usr/lib/python2.7/site-packages/tempest/scenario/test_security_groups_basic_ops.py", line 575, in test_multiple_security_groups
2019-09-09 09:12:54 | should_connect=False)
2019-09-09 09:12:54 | File "/usr/lib/python2.7/site-packages/tempest/scenario/manager.py", line 622, in check_vm_connectivity
2019-09-09 09:12:54 | msg=msg)
2019-09-09 09:12:54 | File "/usr/lib/python2.7/site-packages/unittest2/case.py", line 702, in assertTrue
2019-09-09 09:12:54 | raise self.failureException(msg)
2019-09-09 09:12:54 | AssertionError: False is not true : ip address 10.0.0.105 is reachable

Both failures indicate errors in the security group setup: 10.0.0.105 is reachable when the tests expect the traffic to be blocked.

Changed in tripleo:
importance: Undecided → Critical
tags: added: tempest

Sagi (Sergey) Shnaidman (sshnaidm) wrote :

Nate Johnston (nate-johnston) wrote :

Can we get access to CI nodes with those failed tests? It would be much easier to debug, if that is possible. If so please let me know, plus <email address hidden> - thanks!

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Just some info to review the logs:
Port:
- id: 427e785f-...
- ip: 10.0.0.105
- mac: fa:16:3e:d2:16:58
- subnet: 0a891adb-...
- net: 90e0670a-...
SG:
- id: 929f0211-...
- rule(ssh): 524aa39b-...

The port timeline (from the OVS agent logs on compute1) is:
- bound: 08:53:16.262
- processed by the OVS agent: 08:53:18.384
- preparing filters for port: 08:53:19.657
- iptables finishes applying 83 rules: 08:53:19.742
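
As a rough cross-check of the timeline above, the gaps between the steps can be computed with a small Python sketch. The event labels are just shorthand for the list items; the timestamps are copied from the list and assumed to be on the same day.

```python
from datetime import datetime

# Timestamps copied from the port timeline above (HH:MM:SS.mmm, same day).
events = [
    ("port bound", "08:53:16.262"),
    ("processed by the OVS agent", "08:53:18.384"),
    ("preparing filters for port", "08:53:19.657"),
    ("iptables finished applying rules", "08:53:19.742"),
]


def elapsed_ms(start, end):
    """Return the whole milliseconds between two HH:MM:SS.mmm timestamps."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return round(delta.total_seconds() * 1000)


# Gap between each event and the one before it.
steps = [(b[0], elapsed_ms(a[1], b[1])) for a, b in zip(events, events[1:])]
total = elapsed_ms(events[0][1], events[-1][1])

for name, ms in steps:
    print(f"{name}: +{ms} ms")
print(f"total: {total} ms")
```

So the port went from binding to having its iptables rules applied in about 3.5 seconds, and nothing in the timing itself looks wrong.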

The main problem here is that, unlike with the OVS firewall, the iptables rules are not logged (even at DEBUG level). I'm going to propose a patch to include this output in the logs.

Slawek Kaplonski (slaweq) wrote :

I checked the logs from the failed jobs and I don't see anything that could cause this issue.
I also checked the patches merged to stable/rocky in those days, as suggested by Sagi, even widening the window by a couple of days: https://review.opendev.org/#/q/status:merged+AND+branch:stable/rocky+before:2019-09-09+after:2019-09-05 - there is nothing really suspicious there.

So for now I think it may be some change in the CentOS 7 image used for the tests, e.g. a different docker, iptables or kernel version.
Can we somehow compare which versions of this software were used in those different jobs?

Slawek Kaplonski (slaweq) wrote :

OK, I found the list of packages in https://logs.rdoproject.org/openstack-periodic-24hr/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset020-rocky/e9d92b4/logs/overcloud-novacompute-1/var/log/extra/ and it looks like the packages installed in the passing job (6.09) are exactly the same as those installed on e.g. 9.09, when the job failed.
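
For anyone who wants to repeat this kind of comparison, here is a minimal Python sketch. It assumes each job's package list has been saved as plain text with one package per line; the file format and the package names below are made up for illustration and are not taken from the job logs.

```python
def package_diff(passing_text, failing_text):
    """Return (only_in_passing, only_in_failing) as sorted lists."""
    passing = {line.strip() for line in passing_text.splitlines() if line.strip()}
    failing = {line.strip() for line in failing_text.splitlines() if line.strip()}
    return sorted(passing - failing), sorted(failing - passing)


# Hypothetical package names, for illustration only.
passing_job = """
example-kernel-1.0-1.el7.x86_64
example-iptables-2.0-1.el7.x86_64
"""
failing_job = """
example-kernel-1.0-1.el7.x86_64
example-iptables-2.0-2.el7.x86_64
"""

only_passing, only_failing = package_diff(passing_job, failing_job)
print("only in passing job:", only_passing)
print("only in failing job:", only_failing)
```

Two empty lists mean the package sets are identical, which is the result described above for the 6.09 and 9.09 jobs.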

Slawek Kaplonski (slaweq) wrote :

I think I might have found the reason. In the job where the tests are failing, I see:

net.bridge.bridge-nf-call-arptables = 0
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0

while on the "passing" job those values are set to 1.
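
A quick way to check these values on a node is to read them from /proc/sys. The following is a minimal Python sketch, not part of any TripleO tooling; the function names are illustrative, and the base path is a parameter only so the code can be tried against a scratch directory.

```python
from pathlib import Path

BRIDGE_KEYS = (
    "bridge-nf-call-arptables",
    "bridge-nf-call-ip6tables",
    "bridge-nf-call-iptables",
)


def read_bridge_nf(base="/proc/sys/net/bridge"):
    """Return {key: int value, or None if the entry does not exist}."""
    result = {}
    for key in BRIDGE_KEYS:
        path = Path(base) / key
        try:
            result[key] = int(path.read_text().strip())
        except FileNotFoundError:
            # The entries only exist once br_netfilter is loaded.
            result[key] = None
    return result


def disabled_keys(values):
    """Return the keys that are missing or not set to 1."""
    return sorted(k for k, v in values.items() if v != 1)


if __name__ == "__main__":
    values = read_bridge_nf()
    for key, value in sorted(values.items()):
        print(f"net.bridge.{key} = {value}")
    print("not enabled:", disabled_keys(values))
```

On a node showing the failure described here, all three keys would be reported as not enabled.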

Slawek Kaplonski (slaweq) wrote :

So I have no idea why those settings are switched to 0, but I'm pretty sure this is the reason why this job is failing.
If we could get access to the CI nodes that run this job, we could then switch those settings to 1 and run the tests again to see if this really helps.
I also think that someone much more familiar with those CI jobs and TripleO should take a look to check where those values are changed.

Emilien Macchi (emilienm) wrote :

sysctl settings managed by Puppet are visible in this hieradata:

https://logs.rdoproject.org/openstack-periodic-24hr/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset020-rocky/e9d92b4/logs/overcloud-novacompute-1/etc/puppet/hieradata/service_configs.json.txt.gz

Pasting here:

    "sysctl_settings": {
        "fs.inotify.max_user_instances": {
            "value": 1024
        },
        "fs.suid_dumpable": {
            "value": 0
        },
        "kernel.dmesg_restrict": {
            "value": 1
        },
        "kernel.pid_max": {
            "value": 1048576
        },
        "net.core.netdev_max_backlog": {
            "value": 10000
        },
        "net.ipv4.conf.all.arp_accept": {
            "value": 1
        },
        "net.ipv4.conf.all.arp_notify": {
            "value": 1
        },
        "net.ipv4.conf.all.log_martians": {
            "value": 1
        },
        "net.ipv4.conf.all.secure_redirects": {
            "value": 0
        },
        "net.ipv4.conf.all.send_redirects": {
            "value": 0
        },
        "net.ipv4.conf.default.accept_redirects": {
            "value": 0
        },
        "net.ipv4.conf.default.log_martians": {
            "value": 1
        },
        "net.ipv4.conf.default.secure_redirects": {
            "value": 0
        },
        "net.ipv4.conf.default.send_redirects": {
            "value": 0
        },
        "net.ipv4.ip_forward": {
            "value": 1
        },
        "net.ipv4.ip_nonlocal_bind": {
            "value": 0
        },
        "net.ipv4.neigh.default.gc_thresh1": {
            "value": 1024
        },
        "net.ipv4.neigh.default.gc_thresh2": {
            "value": 2048
        },
        "net.ipv4.neigh.default.gc_thresh3": {
            "value": 4096
        },
        "net.ipv4.tcp_keepalive_intvl": {
            "value": 1
        },
        "net.ipv4.tcp_keepalive_probes": {
            "value": 5
        },
        "net.ipv4.tcp_keepalive_time": {
            "value": 5
        },
        "net.ipv6.conf.all.accept_ra": {
            "value": 0
        },
        "net.ipv6.conf.all.accept_redirects": {
            "value": 0
        },
        "net.ipv6.conf.all.autoconf": {
            "value": 0
        },
        "net.ipv6.conf.all.disable_ipv6": {
            "value": 0
        },
        "net.ipv6.conf.all.ndisc_notify": {
            "value": 1
        },
        "net.ipv6.conf.default.accept_ra": {
            "value": 0
        },
        "net.ipv6.conf.default.accept_redirects": {
            "value": 0
        },
        "net.ipv6.conf.default.autoconf": {
            "value": 0
        },
        "net.ipv6.conf.default.disable_ipv6": {
            "value": 0
        },
        "net.ipv6.conf.lo.disable_ipv6": {
            "value": 0
        },
        "net.ipv6.ip_nonlocal_bind": {
            "value": 0
        },
        "net.netfilter.nf_conntrack_max": {
            "value": 500000
        },
        "net.nf_conntrack_max": {
            "value": 500000
        }
    },

As you can see, there is nothing about net.bridge.* in there, so I suspect this is done outside of TripleO.
Maybe on the RDO node...
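
The same check can be done programmatically. This is a minimal Python sketch that assumes the hieradata file is a JSON object with a top-level "sysctl_settings" key, as the pasted fragment suggests; the function name and the shortened sample are illustrative.

```python
import json


def bridge_settings(service_configs):
    """Return the net.bridge.* entries from the sysctl_settings mapping."""
    settings = service_configs.get("sysctl_settings", {})
    return {k: v for k, v in settings.items() if k.startswith("net.bridge.")}


# A short excerpt of the hieradata pasted above, for illustration.
sample = json.loads("""
{
    "sysctl_settings": {
        "net.ipv4.ip_forward": {"value": 1},
        "net.netfilter.nf_conntrack_max": {"value": 500000},
        "net.nf_conntrack_max": {"value": 500000}
    }
}
""")

print(bridge_settings(sample))  # prints {} for this excerpt
```

An empty result means the deployment does not manage the bridge-nf-call-* values at all, so whatever the node sets by default wins.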

Changed in tripleo:
milestone: train-3 → train-rc1

Terry Wilson (otherwiseguy) wrote :

From /usr/lib/sysctl.d/00-system.conf:

# Kernel sysctl configuration file
#
# For binary values, 0 is disabled, 1 is enabled. See sysctl(8) and
# sysctl.conf(5) for more details.

# Disable netfilter on bridges.
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0

Also, when loading br_netfilter, those values default to 1. BUT, doing a sysctl network restart resets them to 0.

It seems pretty clear that "THINGS BREAK WHEN THESE ARE 0", so "TRUST THE DEFAULT VALUE" is not the best idea. I'm not super familiar with where these get configured in OOO. I'll try to make a quick patch to force the correct values, but if someone else beats me to it that's great. :)
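
To spot this kind of distro override on other images, a sysctl.d file can be scanned for bridge-nf-call-* keys set to 0. The following is a minimal Python sketch, not an existing tool; it handles only plain "key = value" lines and comment lines starting with # or ;, and the sample text is the excerpt quoted above.

```python
def parse_sysctl_conf(text):
    """Parse sysctl.conf-style text into a {key: value} dict of strings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


conf_text = """\
# Disable netfilter on bridges.
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0
"""

overrides = {
    key: value
    for key, value in parse_sysctl_conf(conf_text).items()
    if key.startswith("net.bridge.bridge-nf-call-") and value == "0"
}
print(sorted(overrides))
```

Any key listed in the output is one that a sysctl reload would force back to 0 unless a later file overrides it.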

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.opendev.org/685766

Changed in tripleo:
assignee: Gabriele Cerami (gcerami) → Terry Wilson (otherwiseguy)
status: Triaged → In Progress

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.opendev.org/683350
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=62babbfe379dad4dbc9a87c216f236e3a70d711b
Submitter: Zuul
Branch: master

commit 62babbfe379dad4dbc9a87c216f236e3a70d711b
Author: Arx Cruz <email address hidden>
Date: Fri Sep 20 12:07:49 2019 +0200

    Adding tests to skip list in Rocky

    The two tests added to skip list are failing on featureset020. A
    launchpad bug is open and the responsible team are investigating. Once
    get fixed, we will remove from skip list.

    Related-Bug: 1843259
    Change-Id: I3aaa38ff0180826a6e749d0686d1a0e37e97eb84

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.opendev.org/687057

Changed in tripleo:
assignee: Terry Wilson (otherwiseguy) → Luke Short (ekultails)

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by Luke Short (<email address hidden>) on branch: master
Review: https://review.opendev.org/687057
Reason: An existing/conflicting review already covers the exact same logic: https://review.opendev.org/#/c/685766/

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-ansible (master)

Fix proposed to branch: master
Review: https://review.opendev.org/687084

Changed in tripleo:
milestone: train-rc1 → ussuri-1
Changed in tripleo:
assignee: Luke Short (ekultails) → Terry Wilson (otherwiseguy)

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to puppet-tripleo (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/687375

Changed in tripleo:
assignee: Terry Wilson (otherwiseguy) → Luke Short (ekultails)

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to puppet-tripleo (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/687383

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to puppet-tripleo (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.opendev.org/687384

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/687386

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/687387

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-ansible (master)

Reviewed: https://review.opendev.org/687084
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=9bf90c0af856162968ff3678832edde7ad1f8fee
Submitter: Zuul
Branch: master

commit 9bf90c0af856162968ff3678832edde7ad1f8fee
Author: ekultails <email address hidden>
Date: Tue Oct 8 09:50:48 2019 -0400

    Install packages and load kernel modules before configuring sysctl.

    Otherwise we run into dependency problems where a kernel module
    may not be loaded yet. This results in the sysctl options not
    existing yet in the virtual file system /proc/sys/.

    Also add br_netfilter to the required modules for TripleO.
    It is required for bridge-nf-call-* sysctl options to work.

    Change-Id: Ia28f2fdef34e739801c51828c99e9e6598dd2efb
    Related-Bug: #1843259
    Signed-off-by: ekultails <email address hidden>

OpenStack Infra (hudson-openstack) wrote : Change abandoned on puppet-tripleo (master)

Change abandoned by Luke Short (<email address hidden>) on branch: master
Review: https://review.opendev.org/687375
Reason: As Alex has explained, the existing functionality actually works as we want it to. Kernel modules are loaded before configuring sysctl.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on puppet-tripleo (stable/stein)

Change abandoned by Luke Short (<email address hidden>) on branch: stable/stein
Review: https://review.opendev.org/687383

OpenStack Infra (hudson-openstack) wrote : Change abandoned on puppet-tripleo (stable/rocky)

Change abandoned by Luke Short (<email address hidden>) on branch: stable/rocky
Review: https://review.opendev.org/687384

Marios Andreou (marios-b) wrote :

The patch https://review.opendev.org/687084 merged and we had a couple of green runs at https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset020-rocky, but I'm wondering whether it is wise to have a job that is in the promotion criteria [1] run only in the Wednesday/weekend pipeline [2].

Maybe we want to move it back to a more frequent pipeline, periodic-master or at least periodic-24hr?

[1] https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/dlrnapi_promoter/config/CentOS-7/rocky.ini
[2] https://github.com/rdo-infra/review.rdoproject.org-config/blob/0d44eb2851086d54e537bfad87008f35b35ec12a/zuul.d/tripleo.yaml#L329-L353

Changed in tripleo:
assignee: Luke Short (ekultails) → Terry Wilson (otherwiseguy)

Marios Andreou (marios-b) wrote :

To answer my own question from comment #24: it looks like the weekend pipeline is for pike/queens, which we don't need to promote as often, so it was intentional.

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/685766
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=3d722dbc810b0f9521ce1cfc461789bdfe20e36d
Submitter: Zuul
Branch: master

commit 3d722dbc810b0f9521ce1cfc461789bdfe20e36d
Author: Terry Wilson <email address hidden>
Date: Mon Sep 30 13:00:49 2019 -0500

    Set bridge-nf-call-* values to 1

    Although the kernel default is 1, some distros override the defaults
    via sysctl.conf. Loading br_netfilter manually will show values of
    1, but then doing a 'sysctl network restart' will set the values to
    0--so go ahead and override these values.

    Co-Author: Luke Short <email address hidden>
    Depends-On: Ia28f2fdef34e739801c51828c99e9e6598dd2efb
    Change-Id: I53dec308d359b27e62ed44e91a8eaae38d945a4f
    Closes-Bug: #1843259

Changed in tripleo:
status: In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/689451

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/rocky)

Reviewed: https://review.opendev.org/687387
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=1d5eae2651dc544c7948afed2f50c2c59e347428
Submitter: Zuul
Branch: stable/rocky

commit 1d5eae2651dc544c7948afed2f50c2c59e347428
Author: Terry Wilson <email address hidden>
Date: Mon Sep 30 13:00:49 2019 -0500

    Set bridge-nf-call-* values to 1

    Although the kernel default is 1, some distros override the defaults
    via sysctl.conf. Loading br_netfilter manually will show values of
    1, but then doing a 'sysctl network restart' will set the values to
    0--so go ahead and override these values.

    Co-Author: Luke Short <email address hidden>
    Change-Id: I53dec308d359b27e62ed44e91a8eaae38d945a4f
    Closes-Bug: #1843259
    (cherry picked from commit 3d722dbc810b0f9521ce1cfc461789bdfe20e36d)

tags: added: in-stable-rocky

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/stein)

Reviewed: https://review.opendev.org/687386
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=6f5c3f9444ae31bbbd65044137853384a12390ad
Submitter: Zuul
Branch: stable/stein

commit 6f5c3f9444ae31bbbd65044137853384a12390ad
Author: Terry Wilson <email address hidden>
Date: Mon Sep 30 13:00:49 2019 -0500

    Set bridge-nf-call-* values to 1

    Although the kernel default is 1, some distros override the defaults
    via sysctl.conf. Loading br_netfilter manually will show values of
    1, but then doing a 'sysctl network restart' will set the values to
    0--so go ahead and override these values.

    Co-Author: Luke Short <email address hidden>
    Change-Id: I53dec308d359b27e62ed44e91a8eaae38d945a4f
    Closes-Bug: #1843259
    (cherry picked from commit 3d722dbc810b0f9521ce1cfc461789bdfe20e36d)

tags: added: in-stable-stein

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 11.3.0

This issue was fixed in the openstack/tripleo-heat-templates 11.3.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/queens)

Reviewed: https://review.opendev.org/689451
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=187d8cc0436149b65efdd6ba778efd03ec3c367f
Submitter: Zuul
Branch: stable/queens

commit 187d8cc0436149b65efdd6ba778efd03ec3c367f
Author: Terry Wilson <email address hidden>
Date: Mon Sep 30 13:00:49 2019 -0500

    Set bridge-nf-call-* values to 1

    Although the kernel default is 1, some distros override the defaults
    via sysctl.conf. Loading br_netfilter manually will show values of
    1, but then doing a 'sysctl network restart' will set the values to
    0--so go ahead and override these values.

    Co-Author: Luke Short <email address hidden>
    Change-Id: I53dec308d359b27e62ed44e91a8eaae38d945a4f
    Closes-Bug: #1843259
    (cherry picked from commit 3d722dbc810b0f9521ce1cfc461789bdfe20e36d)

tags: added: in-stable-queens

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 10.6.2

This issue was fixed in the openstack/tripleo-heat-templates 10.6.2 release.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.opendev.org/699394
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=f1b00f047569913eb33fe67c6cf714ca0e8b4373
Submitter: Zuul
Branch: master

commit f1b00f047569913eb33fe67c6cf714ca0e8b4373
Author: Soniya Vyas <email address hidden>
Date: Fri Jan 3 15:40:56 2020 +0530

    [Rocky] Removed passing tests from skiplist

    Class neutron_tempest_plugin.scenario has maximum
    number of tests passed. Hence, it is removed and
    only failing testsis kept.

    In addition to above, added two more tests from
    class 'TestSecurityGroupsBasicOps' and a test from
    class 'ServersOnMultiNodesTest'

    Related-bug: #1737940
    Related-bug: #1753209
    Related-bug: #1843259
    Related-bug: #1793482
    Related-bug: #1831223
    Related-bug: #1857365

    Signed-off by: Soniya Vyas<email address hidden>
    Change-Id: I8239bb694187d7f912163742e836a7362cdb1483

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates rocky-eol

This issue was fixed in the openstack/tripleo-heat-templates rocky-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates queens-eol

This issue was fixed in the openstack/tripleo-heat-templates queens-eol release.
