All tempest tests fail on traceroute call with "TimeoutException: Request timed out"

Bug #1783997 reported by Bernard Cafarelli
This bug affects 1 person
Affects: networking-sfc
Status: Fix Released
Importance: Undecided
Assigned to: Bernard Cafarelli
Milestone: (none listed)

Bug Description

Since around the middle of the Rocky cycle, the tempest gate has failed on every test. Sample failure:
http://logs.openstack.org/05/575705/4/check/networking-sfc-tempest-dsvm/fefcd56/

VM creation looks OK, but the test fails when it connects to one VM and runs traceroute to the other:
2018-07-23 17:56:56.323 6755 INFO tempest.lib.common.ssh [-] Creating ssh connection to '172.24.5.20:22' as 'cirros' with public key authentication
2018-07-23 17:56:56.333 6755 INFO paramiko.transport [-] Connected (version 2.0, client dropbear_2012.55)
2018-07-23 17:56:56.607 6755 INFO paramiko.transport [-] Authentication (publickey) successful!
2018-07-23 17:56:56.608 6755 INFO tempest.lib.common.ssh [-] ssh connection to cirros@172.24.5.20 successfully created
2018-07-23 18:00:13.667 6755 ERROR tempest.lib.common.utils.linux.remote_client [-] (TestSfc:test_create_port_chain) Executing command on 172.24.5.20 failed. Error: Request timed out
Details: Command: 'set -eu -o pipefail; PATH=$PATH:/sbin; traceroute -n -I 10.1.0.13' executed on host '172.24.5.20'.: TimeoutException: Request timed out
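
For reference, the two timestamps in the log bracket how long the traceroute command ran before the timeout was raised. A minimal Python sketch of that arithmetic follows; the helper name `elapsed_seconds` and the constant names are made up for illustration, and the timestamps are copied from the log lines above.

```python
from datetime import datetime

# Timestamps copied from the log excerpt: SSH session ready, then command failure.
SSH_READY = "2018-07-23 17:56:56.608"
COMMAND_FAILED = "2018-07-23 18:00:13.667"

# Format used by the log lines (date, time, fractional seconds).
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    ended = datetime.strptime(end, LOG_TIME_FORMAT)
    return (ended - started).total_seconds()


if __name__ == "__main__":
    print(f"{elapsed_seconds(SSH_READY, COMMAND_FAILED):.3f} s")
```

So the command was left running for a little over three minutes (about 197 seconds) before the TimeoutException was reported.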

After some digging, I suspect a security group issue: I deployed devstack from master and tested SFC manually, and it still works fine. However, I disable port security in my manual tests.

Bernard Cafarelli (bcafarel) wrote :

While the tempest test was running, I did a quick test and ran "openstack port set --disable-port-security --no-security-group" on all ports related to the test.
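
Applying that command to several ports can be scripted. The following is only a sketch, not what was actually run for this bug: the function names are hypothetical, the port IDs have to be supplied by the caller, and it assumes the `openstack` CLI is installed with credentials already loaded in the environment.

```python
import subprocess
from typing import Iterable, List


def build_disable_commands(port_ids: Iterable[str]) -> List[List[str]]:
    """Build one `openstack port set` argument list per port ID."""
    return [
        ["openstack", "port", "set",
         "--disable-port-security", "--no-security-group", port_id]
        for port_id in port_ids
    ]


def disable_port_security(port_ids: Iterable[str]) -> None:
    """Run the commands; stops at the first one that exits non-zero."""
    for argv in build_disable_commands(port_ids):
        subprocess.run(argv, check=True)
```

Keeping the command construction separate from the execution makes it possible to print and review the commands before touching any port.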

This allowed traceroute to finally report in:
    traceroute to 10.0.0.5 (10.0.0.5), 30 hops max, 46 byte packets
     1 * * *
     2 * * *
     3 * * *
     4 * * *
     5 * * *
     6 * 10.0.0.5 2.316 ms 1.935 ms

    2018-07-27 15:07:36,557 16774 ERROR [networking_sfc.tests.tempest_plugin.tests.scenario.base] length mismatch:
    [u' 1 * * *', u' 2 * * *', u' 3 * * *', u' 4 * * *', u' 5 * * *']
    vs
    [[u'10.0.0.8']]

The first '* * *' lines are probes that timed out before I disabled port security.
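
To make the output above easier to read, here is a small Python sketch that splits traceroute output into hops and lists which hops answered. It is an illustration only and not the parsing code used by the networking-sfc tempest plugin; `parse_hops` and `SAMPLE` are hypothetical names, and the sample text is the output quoted above.

```python
import re
from typing import List, Tuple

# Matches a dotted-quad token such as 10.0.0.5 (no range check on each octet).
IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

SAMPLE = """\
traceroute to 10.0.0.5 (10.0.0.5), 30 hops max, 46 byte packets
 1 * * *
 2 * * *
 3 * * *
 4 * * *
 5 * * *
 6 * 10.0.0.5 2.316 ms 1.935 ms
"""


def parse_hops(output: str) -> List[Tuple[int, List[str]]]:
    """Return (hop number, responding addresses) for each hop line."""
    hops = []
    for line in output.splitlines():
        tokens = line.split()
        # Hop lines start with the hop number; skip the header and blank lines.
        if not tokens or not tokens[0].isdigit():
            continue
        addresses = [tok for tok in tokens[1:] if IPV4.fullmatch(tok)]
        hops.append((int(tokens[0]), addresses))
    return hops


if __name__ == "__main__":
    for number, addresses in parse_hops(SAMPLE):
        print(number, addresses or "timed out")
```

On the sample, hops 1 to 5 have no responding address and hop 6 reports 10.0.0.5, which matches the point at which port security was disabled during the run.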

Bernard Cafarelli (bcafarel) wrote :

Also, after tweaking the test code to run with port security disabled, all tests pass.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on networking-sfc (master)

Change abandoned by Bernard Cafarelli (<email address hidden>) on branch: master
Review: https://review.openstack.org/575705
Reason: Merged in https://review.openstack.org/#/c/584873/ to get zuul +1

Changed in networking-sfc:
status: New → In Progress
assignee: nobody → Bernard Cafarelli (bcafarel)
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-sfc (master)

Reviewed: https://review.openstack.org/584873
Committed: https://git.openstack.org/cgit/openstack/networking-sfc/commit/?id=2000d47b57e093f7de844b22ff67555d7933bc55
Submitter: Zuul
Branch: master

commit 2000d47b57e093f7de844b22ff67555d7933bc55
Author: Boden R <email address hidden>
Date: Mon Jul 23 07:20:08 2018 -0600

    update requirements for neutron-lib 1.18.0

    Neutron-lib 1.18.0 is our Rocky RC and is already being used by neutron
    [1]. This patch updates the neutron-lib required version to match
    neutron [1] in prep for the Rocky release.

    lower bounds of individual python modules are declared to
    (test-)requirements.txt as requirements-check requires.

    To work with neutron-lib 1.18.0, we need Rocky version of neutron,
    so the minimum version of neutron is bumped to 13.0.0.0b2 (Rocky-2).

    We also need to bump the minimum version of SQLAlchemy. Rocky neutron
    depends on pending_to_persistent ORM event in SQLAlchemy which was added
    in SQLAlchemy 1.1. Rocky neutron now requires SqlAlchemy>=1.2.0,
    so the min version of SQLAlchemy is bumped to 1.2.0.

    To fix tempest tests, we also disable port security. As current tests
    already run with a wildcard security group to allow all traffic,
    switching to port security disabled does not change much

    [1] https://review.openstack.org/#/c/583671/

    Co-Authored-By: Akihiro Motoki <email address hidden>
    Co-Authored-By: Bernard Cafarelli <email address hidden>
    Change-Id: Ie83ab14ae74d1f4c35876f1d0d5ca3106b4cd12e
    Closes-Bug: #1783997

Changed in networking-sfc:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-sfc 7.0.0.0rc1

This issue was fixed in the openstack/networking-sfc 7.0.0.0rc1 release candidate.
