neutron-tempest-plugin-openvswitch-* jobs randomly failing in gate

Bug #2037239 reported by Brian Haley
This bug affects 1 person
Affects | Status       | Importance | Assigned to | Milestone
neutron | Fix Released | Critical   | Unassigned  |

Bug Description

A number of different scenario tests seem to be failing randomly in the same way:

Details: Router 01dda41e-67ed-4af0-ac56-72fd895cef9a is not active on any of the L3 agents

One example is in https://review.opendev.org/c/openstack/neutron/+/895832 where these three jobs are failing:

neutron-tempest-plugin-openvswitch-iptables_hybrid FAILURE
neutron-tempest-plugin-openvswitch FAILURE
neutron-tempest-plugin-openvswitch-enforce-scope-new-defaults FAILURE

I see combinations of these three failing in other recent checks as well.

Further investigation required.

Tags: gate-failure
Changed in neutron:
status: New → Confirmed
Felix Huettner (felix.huettner) wrote :
Lajos Katona (lajos-katona) wrote :
Lajos Katona (lajos-katona) wrote :

I can't see anything recently merged or released that could be related.
(Of course many things were released in recent weeks, but nothing that looks related.)
The neutron-tempest-plugin-linuxbridge job in the periodic queue has been failing with the same pattern for the last 3 days. (https://zuul.openstack.org/buildsets?project=openstack%2Fneutron&pipeline=periodic&branch=master)

Brian Haley (brian-haley) wrote :

I also didn't see any recent changes in this area. Looking through reviews, this was the first one where I saw these jobs fail, on September 22nd at 7pm:

https://review.opendev.org/c/openstack/neutron/+/896299

And looking at the logs, this was the parent commit when the job ran:

commit 8e38e57b8000ca6ce9ab84692a9aba6220556a3d
Merge: dbe4ba910b 2c0e9cfa71
Author: Zuul <email address hidden>
Date: Thu Sep 21 17:52:23 2023 +0000

    Merge "Create a single method to set the quota usage dirty bit"

It does seem that only the master and 2023.2 gates are affected, at least judging by the neutron-tempest-plugin (n-t-p) change that introduces the new 2023.2 checks: https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/895516

And the above change is only on master, not 2023.2.

The only thing I've noticed is that when the l3-agent complains about an interface not existing in the namespace, the ovs-vswitchd log shows that the interface in question was added to br-int just a few milliseconds later. The l3-agent then goes on to fail a number of sysctl calls, but it's not yet clear whether this is all related.

I will try and continue looking for clues tomorrow.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/896463

yatin (yatinkarel) wrote :

Based on the timing and the affected gates (linuxbridge/openvswitch on master and stable/2023.2), https://review.opendev.org/c/openstack/os-vif/+/881751, which was added to upper-constraints around that time, looks suspicious. I will try a revert plus job tests to confirm.

Lajos Katona (lajos-katona) wrote :

Test patch for the revert: https://review.opendev.org/c/openstack/neutron/+/896504

Judging by the results, it does not solve the issue.

yatin (yatinkarel) wrote :

Pushed https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/897233 to temporarily turn off l3_ha in the ovs/lb master/bobcat jobs.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron-tempest-plugin (master)

Reviewed: https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/897233
Committed: https://opendev.org/openstack/neutron-tempest-plugin/commit/209d363cce20d48d21455a196086a19f1d0d97e1
Submitter: "Zuul (22348)"
Branch: master

commit 209d363cce20d48d21455a196086a19f1d0d97e1
Author: yatinkarel <email address hidden>
Date: Tue Oct 3 20:10:44 2023 +0530

    Temporary turn off l3_ha in ovs/lb master/bobcat jobs

    Tests fails randomly with l3_ha in openvswitch and
    linuxbridge master and stable/2023.2 jobs, until the
    issue is fixed let's turn off l3_ha temporary.

    Related-Bug: #2037239
    Change-Id: Ia23b39819834e2bc12ec7113bb841007f7cf1ff5

Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/2023.2)

Fix proposed to branch: stable/2023.2
Review: https://review.opendev.org/c/openstack/neutron/+/897439

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/2023.1)

Fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/neutron/+/897462

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/2023.1)

Change abandoned by "Rodolfo Alonso <email address hidden>" on branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/neutron/+/897462
Reason: This version is not affected, my bad.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/897332
Committed: https://opendev.org/openstack/neutron/commit/0aa154b5ce9dc8da73309fb212843a2b69b68696
Submitter: "Zuul (22348)"
Branch: master

commit 0aa154b5ce9dc8da73309fb212843a2b69b68696
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Tue Oct 3 21:20:34 2023 +0000

    Fix the ``log.setup`` method call with "fix_eventlet=False"

    Since [1], present in oslo.log 5.3.0, the ``log.setup`` method is
    unpatching the eventlet thread module. That is causing several problems
    in some Neutron services, in particular the keepalived-state-change
    service.

    Within this oslo.log version, the patch [2] is provided to call this
    method without unpatching any eventlet module.

    This patch is also bumping the minimum required version of oslo.log
    to 5.3.0, in order to call the ``log.setup`` method with the kwarg
    "fix_eventlet=False".

    [1]https://review.opendev.org/c/openstack/oslo.log/+/852443
    [2]I4bbcfe7db6d75188e61b9084cb02b2dd2aaa0c76

    Closes-Bug: #2037239

    Change-Id: Iea77d20bec330b692e3e8c9e38b3a62e2047b4f4

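The fix passes ``fix_eventlet=False`` to ``log.setup`` and raises the oslo.log minimum to 5.3.0 so that keyword is always available. For code that must also run against older oslo.log releases, one alternative (an assumption, not what the commit does) is to pass the keyword only when the callable's signature accepts it. Per the commit message, the unpatching and the keyword both arrived in 5.3.0, so omitting the keyword on older releases should be harmless. The helper ``call_log_setup`` and the two stand-in setup functions are hypothetical; the sketch does not import oslo.log.

```python
import inspect
from typing import Any, Callable


def call_log_setup(setup_func: Callable[..., Any], conf: Any,
                   product_name: str) -> Any:
    """Call an oslo.log-style setup() without undoing eventlet patching.

    Hypothetical helper: if the callable declares a ``fix_eventlet``
    parameter (present in oslo.log >= 5.3.0 per the neutron commit
    message), pass fix_eventlet=False so eventlet's monkey patching is
    left in place; otherwise call it without the keyword.
    """
    params = inspect.signature(setup_func).parameters
    if "fix_eventlet" in params:
        return setup_func(conf, product_name, fix_eventlet=False)
    return setup_func(conf, product_name)


# Stand-ins for the two signature styles (hypothetical, not oslo.log itself).
def new_style_setup(conf, product_name, version="unknown", *,
                    fix_eventlet=True):
    return ("new", fix_eventlet)


def old_style_setup(conf, product_name, version="unknown"):
    return ("old", None)


print(call_log_setup(new_style_setup, None, "neutron"))  # ('new', False)
print(call_log_setup(old_style_setup, None, "neutron"))  # ('old', None)
```

Bumping the minimum version, as the commit does, is simpler and avoids signature introspection at service start-up; the introspection route only makes sense when a project cannot yet raise its lower bound.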
Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/2023.2)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/897439
Committed: https://opendev.org/openstack/neutron/commit/de121943ee8edcdbf8c4e8c0212f89f3e2526fd5
Submitter: "Zuul (22348)"
Branch: stable/2023.2

commit de121943ee8edcdbf8c4e8c0212f89f3e2526fd5
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Tue Oct 3 21:20:34 2023 +0000

    Fix the ``log.setup`` method call with "fix_eventlet=False"

    Since [1], present in oslo.log 5.3.0, the ``log.setup`` method is
    unpatching the eventlet thread module. That is causing several problems
    in some Neutron services, in particular the keepalived-state-change
    service.

    Within this oslo.log version, the patch [2] is provided to call this
    method without unpatching any eventlet module.

    This patch is also bumping the minimum required version of oslo.log
    to 5.3.0, in order to call the ``log.setup`` method with the kwarg
    "fix_eventlet=False".

    [1]https://review.opendev.org/c/openstack/oslo.log/+/852443
    [2]I4bbcfe7db6d75188e61b9084cb02b2dd2aaa0c76

    Closes-Bug: #2037239

    Change-Id: Iea77d20bec330b692e3e8c9e38b3a62e2047b4f4
    (cherry picked from commit 0aa154b5ce9dc8da73309fb212843a2b69b68696)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/896463
Committed: https://opendev.org/openstack/neutron/commit/b7201b9fbbd4d86f3cf580f00130f3e96a24f350
Submitter: "Zuul (22348)"
Branch: master

commit b7201b9fbbd4d86f3cf580f00130f3e96a24f350
Author: Brian Haley <email address hidden>
Date: Mon Sep 25 14:53:23 2023 -0400

    Add router ID in HA router process() debug message

    All the other messages in this file have the router ID
    in them, add it in the process() method as well to aid
    in debugging HA port issues.

    Related-bug: #2037239
    Change-Id: Ic6fba46d5e80aae95c977b63228ec8458ea60f5d

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 23.1.0

This issue was fixed in the openstack/neutron 23.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 24.0.0.0b1

This issue was fixed in the openstack/neutron 24.0.0.0b1 development milestone.
