[Stable/Queens] Functional tests in neutron.tests.functional.agent.l3.test_ha_router failing 100% of the time

Bug #1788185 reported by Slawek Kaplonski on 2018-08-21
This bug affects 2 people
Affects: neutron
Importance: Critical
Assigned to: Slawek Kaplonski

Bug Description

Tests from the module neutron.tests.functional.agent.l3.test_ha_router.L3HATestCase have been failing 100% of the time for the past few days on the stable/queens branch.
Example of failure: http://logs.openstack.org/78/593078/1/check/neutron-functional/28fe681/logs/testr_results.html.gz

Miguel Lavalle (minsel) on 2018-08-21
Changed in neutron:
assignee: nobody → Miguel Lavalle (minsel)
Miguel Lavalle (minsel) on 2018-08-23
tags: added: l3-ha
Slawek Kaplonski (slaweq) wrote :

After some checks, it looks to me like the cause of the issue is a "broken" keepalived package. On the master branch, version "1:1.3.9-1ubuntu0.18.04.1~cloud0" is used, and that one works fine. Patches for Queens, Pike and Ocata use version "1:1.2.24-1ubuntu0.16.04.1", which seems to be broken: it removes all IP addresses from the qr-XXX interface during the spawning process.
I checked locally that versions both older and newer than 1:1.2.24-1ubuntu0.16.04.1 work fine, while this one version shows the issue every time.
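
The observation above could be turned into a small guard for a test environment. This is only a minimal sketch: the names `split_deb_version` and `is_known_bad_keepalived` are hypothetical, the version string is assumed to be obtained elsewhere (nothing here queries the package manager), and the only version treated as affected is the one named in this comment.

```python
# Hypothetical helpers; the single "bad" version is the one named in this report.
KNOWN_BAD = "1:1.2.24-1ubuntu0.16.04.1"


def split_deb_version(version):
    """Split a Debian-style version into (epoch, upstream, revision)."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        # No epoch present; Debian treats a missing epoch as 0.
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        # No revision part; the whole remainder is the upstream version.
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def is_known_bad_keepalived(version):
    """Return True only for the exact package version reported as broken."""
    return version.strip() == KNOWN_BAD
```

The exact-match check is deliberate: according to the comment, both older and newer versions behaved correctly, so a range comparison would be misleading.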

Slawek Kaplonski (slaweq) wrote :

I have now reported a bug against the keepalived package: https://bugs.launchpad.net/ubuntu/+source/keepalived/+bug/1789045

Reviewed: https://review.openstack.org/593078
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=159490502e206f474c6828090e8b86c613f9c8db
Submitter: Zuul
Branch: stable/queens

commit 159490502e206f474c6828090e8b86c613f9c8db
Author: Slawek Kaplonski <email address hidden>
Date: Fri Aug 17 17:14:21 2018 +0200

    cap bandit in test-requirements.txt

    bandit is a linter and is listed in the "blacklist" from the
    requirements repo, so it does not appear in the constraints lists.
    Project teams are expected to manage the version(s) allowed on their
    own, to allow different teams to roll ahead to new versions as they can
    rather than having the entire community do it in lock-step. This change
    caps the version of bandit to the one available during the rocky
    development cycle to avoid introducing the new rules from newer releases
    into a stable branch.

    This patch also switches the functional tests to an older keepalived
    version.
    The issue is reported in bug 1788185.

    It looks like the current keepalived version available in the
    Ubuntu Xenial repositories (1:1.2.24-1ubuntu0.16.04.1) is broken
    and causes some functional tests in Neutron to fail.
    Details are in [1].
    An older version works fine, so as a temporary solution we can use
    that version in functional tests.

    This issue doesn't happen on the master and stable/rocky branches, as
    a newer cloud-archive repo is used there and it has a newer version of
    keepalived, which works fine.

    [1] https://bugs.launchpad.net/ubuntu/+source/keepalived/+bug/1789045

    Change-Id: Ia59de069b29f584cce21163a77812ec0ed243e65
    Closes-Bug: #1788185

tags: added: in-stable-queens

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: stable/queens
Review: https://review.openstack.org/596556
Reason: it's already merged together with "cap bandit version" patch

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: stable/queens
Review: https://review.openstack.org/596488

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master
Review: https://review.openstack.org/596500

Reviewed: https://review.openstack.org/596560
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=01d82626646963d3f614d4586b1285d5dad58a87
Submitter: Zuul
Branch: stable/ocata

commit 01d82626646963d3f614d4586b1285d5dad58a87
Author: Slawek Kaplonski <email address hidden>
Date: Sat Aug 25 22:15:22 2018 +0200

    Use older keepalived version in functional tests

    It looks like the current keepalived version available in the
    Ubuntu Xenial repositories (1:1.2.24-1ubuntu0.16.04.1) is broken
    and causes some functional tests in Neutron to fail.
    Details are in [1].
    An older version works fine, so as a temporary solution we can use
    that version in functional tests.

    This issue doesn't happen on the master and stable/rocky branches, as
    a newer cloud-archive repo is used there and it has a newer version of
    keepalived, which works fine.

    [1] https://bugs.launchpad.net/ubuntu/+source/keepalived/+bug/1789045

    Change-Id: I418a967cd503991736e72134d4a105b6e97021e8
    Closes-Bug: #1788185

tags: added: in-stable-ocata
Dr. Jens Harbott (j-harbott) wrote :

I tested a bit with 1.2.24, and it seems to be doing the correct thing, as opposed to 1.2.19, in that it removes the virtual IP from the qr- interface while it is in backup state. When I add a sleep(10) to neutron.tests.functional.agent.l3.test_ha_router.L3HATestCase.test_ha_router_process_ipv6_subnets_to_existing_port, the test passes in that scenario [0]. So I think the correct fix for this particular test would be to have it wait for router.ha_state == 'master', as is done for some other tests in the same file.

Also, test_ha_router_process_ipv6_subnets_to_existing_port is the only test that seems to be failing on master with that keepalived version. On stable branches there are some additional failures, but they seem to happen only about 50% of the time rather than consistently.

I'm still not sure why the issue isn't seen with the version 1.3.9 from Queens UCA.

[0] http://paste.openstack.org/show/729042/
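
The suggestion to wait for router.ha_state == 'master' instead of sleeping for a fixed time can be sketched as a polling loop. This is a stand-alone illustration under stated assumptions, not the helper the existing tests use: `wait_until_true`, `WaitTimeout` and `FakeRouter` are hypothetical names, and the fake router simply reports 'backup' a few times before reporting 'master'.

```python
import time


class WaitTimeout(Exception):
    """Raised when the predicate does not become true in time."""


def wait_until_true(predicate, timeout=60.0, sleep=0.5):
    """Poll predicate() until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within %s seconds" % timeout)
        time.sleep(sleep)


class FakeRouter:
    """Hypothetical stand-in for an HA router object."""

    def __init__(self, polls_until_master):
        self._remaining = polls_until_master

    @property
    def ha_state(self):
        # Report 'backup' for the first few reads, then 'master'.
        if self._remaining > 0:
            self._remaining -= 1
            return "backup"
        return "master"


router = FakeRouter(polls_until_master=3)
wait_until_true(lambda: router.ha_state == "master", timeout=5.0, sleep=0.01)
# Only at this point would a test go on to check addresses on the qr- interface.
```

Compared with a fixed sleep(10), polling returns as soon as the condition holds and fails with a clear error if it never does.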

Dr. Jens Harbott (j-harbott) wrote :

Here are more results from my testing: http://paste.openstack.org/show/729057/

After adding timestamps, it looks like there is some flapping happening with 1.2.24 while it transitions to master state. Adding the sleep makes the test run after the flapping is over, so it passes, but this would still be a regression in keepalived. Sadly, I have still had no success reproducing this with a standalone keepalived outside of neutron, and I currently have no time to continue working on it.
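
The kind of flapping described above can be quantified from timestamped state observations. The following is a minimal sketch with made-up sample data (it is not taken from the pastes linked in this report); `count_transitions` and `settled_at` are hypothetical helper names.

```python
def count_transitions(events):
    """Count state changes in a list of (timestamp, state) tuples."""
    states = [state for _, state in sorted(events)]
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)


def settled_at(events, final_state="master"):
    """Return the timestamp from which the state stays at final_state.

    Returns None if the last observed state is not final_state.
    """
    ordered = sorted(events)
    if not ordered or ordered[-1][1] != final_state:
        return None
    start = ordered[-1][0]
    # Walk backwards while the state is still final_state.
    for timestamp, state in reversed(ordered):
        if state != final_state:
            break
        start = timestamp
    return start


# Invented sample: backup -> master -> backup -> master, i.e. one flap.
sample = [
    (0.0, "backup"),
    (2.0, "master"),
    (2.5, "backup"),
    (4.0, "master"),
    (5.0, "master"),
]
```

A clean election would show a single backup-to-master transition; more than one transition before the state settles indicates flapping, and a check made before the settle time may observe the interface without its addresses.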

Miguel Lavalle (minsel) on 2018-08-30
Changed in neutron:
assignee: Miguel Lavalle (minsel) → Slawek Kaplonski (slaweq)

Reviewed: https://review.openstack.org/596559
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=27519e8ff5e11e0fbb69bb2211bae13a015d1654
Submitter: Zuul
Branch: stable/pike

commit 27519e8ff5e11e0fbb69bb2211bae13a015d1654
Author: Slawek Kaplonski <email address hidden>
Date: Sat Aug 25 22:15:22 2018 +0200

    Use older keepalived version in functional tests

    It looks like the current keepalived version available in the
    Ubuntu Xenial repositories (1:1.2.24-1ubuntu0.16.04.1) is broken
    and causes some functional tests in Neutron to fail.
    Details are in [1].
    An older version works fine, so as a temporary solution we can use
    that version in functional tests.

    This issue doesn't happen on the master and stable/rocky branches, as
    a newer cloud-archive repo is used there and it has a newer version of
    keepalived, which works fine.

    [1] https://bugs.launchpad.net/ubuntu/+source/keepalived/+bug/1789045

    Change-Id: I418a967cd503991736e72134d4a105b6e97021e8
    Closes-Bug: #1788185

tags: added: in-stable-pike
Changed in neutron:
status: Confirmed → Fix Committed

This issue was fixed in the openstack/neutron 12.0.4 release.

Edward Hope-Morley (hopem) wrote :

Looks like this is the real cause of your problems - https://bugs.launchpad.net/neutron/+bug/1793102

This issue was fixed in the openstack/neutron 11.0.6 release.

Changed in neutron:
status: Fix Committed → Fix Released

Reviewed: https://review.opendev.org/663363
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=4bdd17a74359eaf42c0653e73e84183dba12491a
Submitter: Zuul
Branch: stable/pike

commit 4bdd17a74359eaf42c0653e73e84183dba12491a
Author: Slawek Kaplonski <email address hidden>
Date: Fri Aug 17 17:14:21 2018 +0200

    cap bandit in test-requirements.txt

    bandit is a linter and is listed in the "blacklist" from the
    requirements repo, so it does not appear in the constraints lists.
    Project teams are expected to manage the version(s) allowed on their
    own, to allow different teams to roll ahead to new versions as they can
    rather than having the entire community do it in lock-step. This change
    caps the version of bandit to the one available during the rocky
    development cycle to avoid introducing the new rules from newer releases
    into a stable branch.

    This patch also switches the functional tests to an older keepalived
    version.
    The issue is reported in bug 1788185.

    It looks like the current keepalived version available in the
    Ubuntu Xenial repositories (1:1.2.24-1ubuntu0.16.04.1) is broken
    and causes some functional tests in Neutron to fail.
    Details are in [1].
    An older version works fine, so as a temporary solution we can use
    that version in functional tests.

    This issue doesn't happen on the master and stable/rocky branches, as
    a newer cloud-archive repo is used there and it has a newer version of
    keepalived, which works fine.

    [1] https://bugs.launchpad.net/ubuntu/+source/keepalived/+bug/1789045

    Change-Id: Ia59de069b29f584cce21163a77812ec0ed243e65
    Closes-Bug: #1788185
    (cherry picked from commit 159490502e206f474c6828090e8b86c613f9c8db)
