[OVN] LSP register race condition with two controllers

Bug #1922934 reported by Rodolfo Alonso
This bug affects 2 people
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Rodolfo Alonso

Bug Description

A race condition between two Neutron controllers occurred during the creation and binding of a port. It was triggered when one Neutron controller received the port creation command and added the new LSP (Logical Switch Port) to the OVN database.

However, the second controller did not receive the OVN database update and therefore did not update its local database cache (in the IDL instance). As a result, one second after the first controller created the port, the second controller could not find the LSP.

Nova error: http://paste.openstack.org/show/804261/
First Neutron controller adding the port: http://paste.openstack.org/show/804262/
Second Neutron controller failing to find the port: http://paste.openstack.org/show/804263/

As seen in the logs, the second controller did not receive the transaction update adding the LSP.
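
The sequence described above can be reproduced with a toy model. This is only an illustrative sketch: `LocalCache` and `demo` are hypothetical names, not Neutron or ovsdbapp code, and a timer stands in for the delayed OVN database update reaching the second controller.

```python
import threading


class LocalCache:
    """Toy stand-in for one controller's local IDL cache (hypothetical)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def apply_update(self, name, row):
        # Called when a database update reaches this controller.
        with self._lock:
            self._rows[name] = row

    def lookup(self, name):
        # Fails immediately if the update has not been applied yet.
        with self._lock:
            if name not in self._rows:
                raise LookupError(f"LSP {name} not found in local cache")
            return self._rows[name]


def demo(delay=0.2):
    first, second = LocalCache(), LocalCache()
    row = {"addresses": ["10.0.0.5"]}

    # The first controller creates the LSP and sees it at once.
    first.apply_update("port-1", row)

    # The update reaches the second controller only after `delay` seconds.
    timer = threading.Timer(delay, second.apply_update, args=("port-1", row))
    timer.start()

    try:
        second.lookup("port-1")
        found_early = True
    except LookupError:
        found_early = False

    timer.join()  # wait until the delayed update has been applied
    found_late = second.lookup("port-1") == row
    return found_early, found_late
```

Calling `demo()` shows the second cache missing the row right after creation and finding it once the delayed update lands, which matches the behaviour seen in the logs.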

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1946262

Changed in neutron:
assignee: nobody → Rodolfo Alonso (rodolfo-alonso-hernandez)
Miguel Lavalle (minsel)
Changed in neutron:
status: New → Confirmed
importance: Undecided → Medium
status: Confirmed → In Progress
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/787337

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/787586

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/787337
Committed: https://opendev.org/openstack/neutron/commit/32e938e698f260af6896f6179a5ffc0838677124
Submitter: "Zuul (22348)"
Branch: master

commit 32e938e698f260af6896f6179a5ffc0838677124
Author: Daniel Alvarez Sanchez <email address hidden>
Date: Wed Apr 21 11:41:57 2021 +0200

    [OVN] Only account for bound ports in metadata agent

    Due to bug #1922934 there might be situations with stale ports
    in the OVN database. When instances request metadata, the agent
    will try to fetch all ports for that particular network with the
    same IP address and there should be only one.

    However, when there are stale ports in OVN database but not in
    Neutron, those ports may have the same IP address and the result
    is that metadata won't work. As the stale ports will never be
    bound to any hypervisor, this patch is accounting only for
    bound ports so that those ports don't interfere until they
    eventually get fixed by the maintenance task.

    Related-Bug: #1922934
    Signed-off-by: Daniel Alvarez Sanchez <email address hidden>
    Change-Id: I708904b982d243359f2eeda809beae0321f1a7db
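
The filtering idea in this commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the patch: `PortBinding` is a simplified, hypothetical record (the real OVN rows carry more columns), and `bound_ports_by_ip` is a made-up helper name. The only point it shows is that a port with no chassis is skipped.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PortBinding:
    """Simplified, hypothetical view of a port binding record."""
    logical_port: str
    datapath: str
    ips: List[str] = field(default_factory=list)
    chassis: Optional[str] = None  # None means not bound to any hypervisor


def bound_ports_by_ip(bindings, datapath, ip):
    """Return bindings on `datapath` that own `ip` and are bound to a chassis."""
    return [
        pb for pb in bindings
        if pb.datapath == datapath and ip in pb.ips and pb.chassis
    ]


bindings = [
    # Stale port left behind in the OVN database: same IP, never bound.
    PortBinding("stale-port", "net-1", ["10.0.0.5"]),
    # The port actually in use by an instance.
    PortBinding("live-port", "net-1", ["10.0.0.5"], chassis="compute-0"),
]

matches = bound_ports_by_ip(bindings, "net-1", "10.0.0.5")
```

With the stale port filtered out, `matches` holds a single entry, so a lookup that expects exactly one port per IP address succeeds again.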

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/neutron/+/788154

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/victoria)

Related fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/neutron/+/788155

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/neutron/+/788237

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/788154
Committed: https://opendev.org/openstack/neutron/commit/f748962e5acfe7c6f44f02aaf9883e7a71de63cd
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit f748962e5acfe7c6f44f02aaf9883e7a71de63cd
Author: Daniel Alvarez Sanchez <email address hidden>
Date: Wed Apr 21 11:41:57 2021 +0200

    [OVN] Only account for bound ports in metadata agent

    Related-Bug: #1922934
    Signed-off-by: Daniel Alvarez Sanchez <email address hidden>
    Change-Id: I708904b982d243359f2eeda809beae0321f1a7db
    (cherry picked from commit 32e938e698f260af6896f6179a5ffc0838677124)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/788237
Committed: https://opendev.org/openstack/neutron/commit/a9ed835e69102397b192a3d23d80084b5d76a7f8
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit a9ed835e69102397b192a3d23d80084b5d76a7f8
Author: Daniel Alvarez Sanchez <email address hidden>
Date: Wed Apr 21 11:41:57 2021 +0200

    [OVN] Only account for bound ports in metadata agent

    Conflicts:
            neutron/agent/ovn/metadata/server.py

    Related-Bug: #1922934
    Signed-off-by: Daniel Alvarez Sanchez <email address hidden>
    Change-Id: I708904b982d243359f2eeda809beae0321f1a7db
    (cherry picked from commit 32e938e698f260af6896f6179a5ffc0838677124)

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/787586
Committed: https://opendev.org/openstack/neutron/commit/1a3fa2206db3c9485db84ade3d5cc179bc1c22ce
Submitter: "Zuul (22348)"
Branch: master

commit 1a3fa2206db3c9485db84ade3d5cc179bc1c22ce
Author: Jakub Libosvar <email address hidden>
Date: Thu Apr 22 15:26:12 2021 +0000

    ovn: Add functional tests for get_network_port_bindings_by_ip

    The change I708904b982d243359f2eeda809beae0321f1a7db lacked tests for
    unbound ports with the same IP. This patch adds a simple test that
    creates 2 LSPs on the same LS with the same IP, one of which is
    unbound. Such a setup should return only the bound port.

    Related-bug: #1922934
    Change-Id: I71659a1c846852f9cb9bedba23c946438357b079
    Signed-off-by: Jakub Libosvar <email address hidden>

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/788155
Committed: https://opendev.org/openstack/neutron/commit/60597d2503bba839dca96def3fa2bf2cb28f5205
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 60597d2503bba839dca96def3fa2bf2cb28f5205
Author: Daniel Alvarez Sanchez <email address hidden>
Date: Wed Apr 21 11:41:57 2021 +0200

    [OVN] Only account for bound ports in metadata agent

    Related-Bug: #1922934
    Signed-off-by: Daniel Alvarez Sanchez <email address hidden>
    Change-Id: I708904b982d243359f2eeda809beae0321f1a7db
    (cherry picked from commit 32e938e698f260af6896f6179a5ffc0838677124)

tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ovsdbapp 1.10.0

This issue was fixed in the openstack/ovsdbapp 1.10.0 release.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/790912

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/790912
Committed: https://opendev.org/openstack/neutron/commit/f9bda4b1e439b5e761b445ce6686166eb41a0e45
Submitter: "Zuul (22348)"
Branch: master

commit f9bda4b1e439b5e761b445ce6686166eb41a0e45
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Wed May 12 07:55:06 2021 +0000

    Set the default ``Backend.lookup`` timeout to 2 seconds

    In order to broadly cover the issue described in the referred bug,
    this patch sets a default timeout of 2 seconds in the
    ``ovs_idl.Backend.lookup`` method.

    This timeout should cover most of the situations where the IDL local
    cache update is delayed. This patch does not change the default
    behavior, except it will not fail if the DB cache is synchronized
    within 2 seconds.

    If no notify handler is passed, or the backend does not implement
    one, the method raises as before when the requested record is not
    found.

    ovsdbapp library is bumped to version 1.10.0 to receive the change
    that introduces the active wait in ``Backend.lookup`` and adds the
    timeout parameter to the method signature.

    Change-Id: Ib40eabd6a8e9d59896e0e20383d8061eb4b5c710
    Related-Bug: #1922934
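
The effect of a lookup timeout can be sketched as below. This is not the ovsdbapp implementation, which is not shown in this report. It is a hypothetical stand-in (`WaitingCache`, `RowNotFound` and `demo` are invented names) that uses a condition variable to wait until the row appears or the timeout expires, and raises immediately when no timeout is given.

```python
import threading


class RowNotFound(Exception):
    """Raised when the requested row is not in the local cache."""


class WaitingCache:
    """Toy local cache whose lookup can wait for a pending update."""

    def __init__(self):
        self._rows = {}
        self._cond = threading.Condition()

    def apply_update(self, name, row):
        # Store the row and wake up any lookup waiting for it.
        with self._cond:
            self._rows[name] = row
            self._cond.notify_all()

    def lookup(self, name, timeout=None):
        with self._cond:
            if timeout:
                # Block until the row shows up or `timeout` seconds elapse.
                self._cond.wait_for(lambda: name in self._rows, timeout=timeout)
            try:
                return self._rows[name]
            except KeyError:
                raise RowNotFound(name) from None


def demo():
    cache = WaitingCache()
    # The update arrives 0.2 seconds late, well inside the 2 second window.
    timer = threading.Timer(0.2, cache.apply_update, args=("port-1", {"up": False}))
    timer.start()
    return cache.lookup("port-1", timeout=2)
```

In this sketch a late update is absorbed by the wait, while a lookup without a timeout, or one for a row that never arrives, still raises `RowNotFound`.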

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ovsdbapp 1.2.3

This issue was fixed in the openstack/ovsdbapp 1.2.3 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ovsdbapp 1.9.1

This issue was fixed in the openstack/ovsdbapp 1.9.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ovsdbapp 1.6.1

This issue was fixed in the openstack/ovsdbapp 1.6.1 release.

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ovsdbapp train-eol

This issue was fixed in the openstack/ovsdbapp train-eol release.
