Race condition while waiting for L2 agent to be DOWN

Bug #2045757 reported by Slawek Kaplonski
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Slawek Kaplonski
Milestone: (none listed)

Bug Description

In the fullstack tests neutron.tests.fullstack.test_ports_rebind.TestVMPortRebind.test_vm_port_rebound_when_L2_agent_revived and neutron.tests.fullstack.test_ports_rebind.TestRouterPortRebind.test_vm_port_rebound_when_L2_agent_revived, the L2 agent is disabled and the test waits for the agent to be reported as DOWN. It then tries to create a port, which is expected to be marked as "binding failed" because the agent on the compute node is dead.

In some cases like:
http://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_abf/901827/1/check/neutron-fullstack-with-uwsgi/abf43a8/testr_results.html
or
http://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_607/901894/6/check/neutron-fullstack-with-uwsgi/6071fab/testr_results.html

it may happen that the L2 agent is found dead, but immediately after that state is reported to the client the agent is revived because a heartbeat has just been received. In the meantime the test's client creates the port, expecting the binding to fail, but the port is actually bound properly and the test fails.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/902762

Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/902762
Committed: https://opendev.org/openstack/neutron/commit/58dcd30dbba67464f6fd7880ce7aee543156af65
Submitter: "Zuul (22348)"
Branch: master

commit 58dcd30dbba67464f6fd7880ce7aee543156af65
Author: Slawek Kaplonski <email address hidden>
Date: Wed Dec 6 12:56:30 2023 +0100

    [Fullstack] Double check that agent is dead when it should be dead

    In some fullstack tests it is expected that agent is DOWN in the Neutron
    DB. It could happen sometimes that in almost the same time test's client
    was doing GET /v2.0/agents/{agent_id} call and got result with
    "alive=False" and in other thread rpc worker was processing heartbeat
    from the agent so it was revived just after API request was finished.
    That was causing test failures in some cases.
    This patch adds second API call to get agent again after 2 seconds if it
    was already marked as DEAD, just to make sure that it is really dead ;)

    Closes-Bug: #2045757
    Change-Id: I1c20c90b8abd760f3a53b24024f19ef2bd189b5a
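
The double check described in the commit message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the merged patch: the function name `wait_until_agent_really_dead`, its parameters, and the `is_alive` callable (standing in for a GET /v2.0/agents/{agent_id} call that returns the agent's "alive" flag) are all hypothetical. Only the 2-second re-check delay comes from the commit message.

```python
import time
from typing import Callable


def wait_until_agent_really_dead(
    is_alive: Callable[[], bool],
    recheck_delay: float = 2.0,   # delay mentioned in the commit message
    timeout: float = 60.0,        # hypothetical overall limit
    poll_interval: float = 1.0,   # hypothetical polling period
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Return once two reads, recheck_delay seconds apart, both say dead.

    is_alive is a stand-in for fetching the agent from the API and
    returning its "alive" field; it is not a real Neutron helper.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_alive():
            # Agent still reported alive: keep polling.
            sleep(poll_interval)
            continue
        # First read says dead, but a heartbeat may be processed right
        # after this response was built. Wait and read once more.
        sleep(recheck_delay)
        if not is_alive():
            return
        # The agent was revived between the two reads: start over.
    raise TimeoutError("agent was not confirmed dead within the timeout")
```

A test would call this helper before creating the port, so that a single stale "alive=False" response is not enough to proceed. Note that a second read only narrows the race window; it does not remove it completely.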

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/2023.2)

Fix proposed to branch: stable/2023.2
Review: https://review.opendev.org/c/openstack/neutron/+/902881

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/2023.1)

Fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/neutron/+/902882

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/zed)

Fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/neutron/+/902883

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/902882
Committed: https://opendev.org/openstack/neutron/commit/a28ec9ed3c4495b554233ee2d055387203b528cd
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit a28ec9ed3c4495b554233ee2d055387203b528cd
Author: Slawek Kaplonski <email address hidden>
Date: Wed Dec 6 12:56:30 2023 +0100

    [Fullstack] Double check that agent is dead when it should be dead

    In some fullstack tests it is expected that agent is DOWN in the Neutron
    DB. It could happen sometimes that in almost the same time test's client
    was doing GET /v2.0/agents/{agent_id} call and got result with
    "alive=False" and in other thread rpc worker was processing heartbeat
    from the agent so it was revived just after API request was finished.
    That was causing test failures in some cases.
    This patch adds second API call to get agent again after 2 seconds if it
    was already marked as DEAD, just to make sure that it is really dead ;)

    Closes-Bug: #2045757
    Change-Id: I1c20c90b8abd760f3a53b24024f19ef2bd189b5a
    (cherry picked from commit 58dcd30dbba67464f6fd7880ce7aee543156af65)

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/902883
Committed: https://opendev.org/openstack/neutron/commit/de5b7244ee116590bf187471647b113bfb0c9472
Submitter: "Zuul (22348)"
Branch: stable/zed

commit de5b7244ee116590bf187471647b113bfb0c9472
Author: Slawek Kaplonski <email address hidden>
Date: Wed Dec 6 12:56:30 2023 +0100

    [Fullstack] Double check that agent is dead when it should be dead

    In some fullstack tests it is expected that agent is DOWN in the Neutron
    DB. It could happen sometimes that in almost the same time test's client
    was doing GET /v2.0/agents/{agent_id} call and got result with
    "alive=False" and in other thread rpc worker was processing heartbeat
    from the agent so it was revived just after API request was finished.
    That was causing test failures in some cases.
    This patch adds second API call to get agent again after 2 seconds if it
    was already marked as DEAD, just to make sure that it is really dead ;)

    Closes-Bug: #2045757
    Change-Id: I1c20c90b8abd760f3a53b24024f19ef2bd189b5a
    (cherry picked from commit 58dcd30dbba67464f6fd7880ce7aee543156af65)

tags: added: in-stable-zed
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/2023.2)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/902881
Committed: https://opendev.org/openstack/neutron/commit/9fcec1d59bc1e15151719668f80056d7b9ea57e7
Submitter: "Zuul (22348)"
Branch: stable/2023.2

commit 9fcec1d59bc1e15151719668f80056d7b9ea57e7
Author: Slawek Kaplonski <email address hidden>
Date: Wed Dec 6 12:56:30 2023 +0100

    [Fullstack] Double check that agent is dead when it should be dead

    In some fullstack tests it is expected that agent is DOWN in the Neutron
    DB. It could happen sometimes that in almost the same time test's client
    was doing GET /v2.0/agents/{agent_id} call and got result with
    "alive=False" and in other thread rpc worker was processing heartbeat
    from the agent so it was revived just after API request was finished.
    That was causing test failures in some cases.
    This patch adds second API call to get agent again after 2 seconds if it
    was already marked as DEAD, just to make sure that it is really dead ;)

    Closes-Bug: #2045757
    Change-Id: I1c20c90b8abd760f3a53b24024f19ef2bd189b5a
    (cherry picked from commit 58dcd30dbba67464f6fd7880ce7aee543156af65)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 24.0.0.0b1

This issue was fixed in the openstack/neutron 24.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 21.2.1

This issue was fixed in the openstack/neutron 21.2.1 release.
