[OVN] RowNotFound exception while waiting for Chassis metadata networks

Bug #1914394 reported by Lucas Alvares Gomes
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Lucas Alvares Gomes
Milestone: (none)

Bug Description

In set_port_status_up(), the OVN driver waits up to 15 seconds for the metadata to be provisioned before sending Nova the event indicating that everything is done (network-vif-plugged). However, a race condition while fetching that information can cause a RowNotFound exception to be raised inside the waiting loop.

Once that happens, the exception bubbles up, the OVN driver never sends the event to Nova, and the instance fails to deploy (it stays stuck in the BUILD state until it times out).

Here's a traceback from neutron server (q-svc) when it happens:

Jan 28 10:59:55.066255 ubuntu-focal-airship-kna1-0022760767 neutron-server[68996]: INFO neutron.plugins.ml2.drivers.ovn.mech_driver.mech_driver [None req-17c35f6c-ba4f-49c3-9795-77b2edd352c7 None None] OVN reports status up for port: 2694a4c0-4ff0-414d-b780-b74da2c91197
Jan 28 10:59:55.068736 ubuntu-focal-airship-kna1-0022760767 neutron-server[68995]: DEBUG neutron.wsgi [-] (68995) accepted ('10.0.1.179', 42564) {{(pid=68995) server /usr/local/lib/python3.8/dist-packages/eventlet/wsgi.py:992}}
Jan 28 10:59:55.110806 ubuntu-focal-airship-kna1-0022760767 neutron-server[68995]: INFO neutron.wsgi [req-177263b2-82ba-4f7e-84c9-d5b86115b3e7 req-f545cbf1-c1f0-4bc2-9d7e-d0420a9b43ee service neutron] 10.0.1.208,10.0.1.179 "GET /v2.0/floatingips?fixed_ip_address=10.1.0.12&port_id=2694a4c0-4ff0-414d-b780-b74da2c91197 HTTP/1.1" status: 200 len: 217 time: 0.0412221
Jan 28 10:59:55.134847 ubuntu-focal-airship-kna1-0022760767 neutron-server[68995]: DEBUG neutron.wsgi [-] (68995) accepted ('10.0.1.179', 42568) {{(pid=68995) server /usr/local/lib/python3.8/dist-packages/eventlet/wsgi.py:992}}
Jan 28 10:59:55.135218 ubuntu-focal-airship-kna1-0022760767 neutron-server[68996]: ERROR ovsdbapp.event [None req-17c35f6c-ba4f-49c3-9795-77b2edd352c7 None None] Unexpected exception in notify_loop: ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Chassis with name=282e849e-30b0-4e7c-9df2-2d0b14050df0
Jan 28 10:59:55.135218 ubuntu-focal-airship-kna1-0022760767 neutron-server[68996]: ERROR ovsdbapp.event Traceback (most recent call last):
Jan 28 10:59:55.135218 ubuntu-focal-airship-kna1-0022760767 neutron-server[68996]: ERROR ovsdbapp.event File "/usr/local/lib/python3.8/dist-packages/ovsdbapp/event.py", line 159, in notify_loop
Jan 28 10:59:55.135218 ubuntu-focal-airship-kna1-0022760767 neutron-server[68996]: ERROR ovsdbapp.event match.run(event, row, updates)
Jan 28 10:59:55.135218 ubuntu-focal-airship-kna1-0022760767 neutron-server[68996]: ERROR ovsdbapp.event File "/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py", line 392, in run
Jan 28 10:59:55.135218 ubuntu-focal-airship-kna1-0022760767 neutron-server[68996]: ERROR ovsdbapp.event self.driver.set_port_status_up(row.name)
Jan 28 10:59:55.135218 ubuntu-focal-airship-kna1-0022760767 neutron-server[68996]: ERROR ovsdbapp.event File "/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 982, in set_port_status_up
Jan 28 10:59:55.135218 ubuntu-focal-airship-kna1-0022760767 neutron-server[68996]: ERROR ovsdbapp.event self._wait_for_metadata_provisioned_if_needed(port_id)
Jan 28 10:59:55.135218 ubuntu-focal-airship-kna1-0022760767 neutron-server[68996]: ERROR ovsdbapp.event File "/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 1116, in _wait_for_metadata_provisioned_if_needed
Jan 28 10:59:55.135218 ubuntu-focal-airship-kna1-0022760767 neutron-server[68996]: ERROR ovsdbapp.event n_utils.wait_until_true(
Jan 28 10:59:55.135218 ubuntu-focal-airship-kna1-0022760767 neutron-server[68996]: ERROR ovsdbapp.event File "/opt/stack/neutron/neutron/common/utils.py", line 703, in wait_until_true
Jan 28 10:59:55.135218 ubuntu-focal-airship-kna1-0022760767 neutron-server[68996]: ERROR ovsdbapp.event while not predicate():
Jan 28 10:59:55.135218 ubuntu-focal-airship-kna1-0022760767 neutron-server[68996]: ERROR ovsdbapp.event File "/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 1118, in <lambda>
Jan 28 10:59:55.135218 ubuntu-focal-airship-kna1-0022760767 neutron-server[68996]: ERROR ovsdbapp.event self._sb_ovn.get_chassis_metadata_networks(chassis),
Jan 28 10:59:55.135218 ubuntu-focal-airship-kna1-0022760767 neutron-server[68996]: ERROR ovsdbapp.event File "/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/impl_idl_ovn.py", line 795, in get_chassis_metadata_networks
Jan 28 10:59:55.135218 ubuntu-focal-airship-kna1-0022760767 neutron-server[68996]: ERROR ovsdbapp.event chassis = self.lookup('Chassis', chassis_name)
Jan 28 10:59:55.135218 ubuntu-focal-airship-kna1-0022760767 neutron-server[68996]: ERROR ovsdbapp.event File "/usr/local/lib/python3.8/dist-packages/ovsdbapp/backend/ovs_idl/__init__.py", line 177, in lookup
Jan 28 10:59:55.135218 ubuntu-focal-airship-kna1-0022760767 neutron-server[68996]: ERROR ovsdbapp.event return self._lookup(table, record)
Jan 28 10:59:55.135218 ubuntu-focal-airship-kna1-0022760767 neutron-server[68996]: ERROR ovsdbapp.event File "/usr/local/lib/python3.8/dist-packages/ovsdbapp/backend/ovs_idl/__init__.py", line 224, in _lookup
Jan 28 10:59:55.135218 ubuntu-focal-airship-kna1-0022760767 neutron-server[68996]: ERROR ovsdbapp.event row = idlutils.row_by_value(self, rl.table, rl.column, record)
Jan 28 10:59:55.135218 ubuntu-focal-airship-kna1-0022760767 neutron-server[68996]: ERROR ovsdbapp.event File "/usr/local/lib/python3.8/dist-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 114, in row_by_value
Jan 28 10:59:55.135218 ubuntu-focal-airship-kna1-0022760767 neutron-server[68996]: ERROR ovsdbapp.event raise RowNotFound(table=table, col=column, match=match)
Jan 28 10:59:55.135218 ubuntu-focal-airship-kna1-0022760767 neutron-server[68996]: ERROR ovsdbapp.event ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Chassis with name=282e849e-30b0-4e7c-9df2-2d0b14050df0
Jan 28 10:59:55.135218 ubuntu-focal-airship-kna1-0022760767 neutron-server[68996]: ERROR ovsdbapp.event
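
The failure mode in the traceback above can be reproduced in isolation. The following is a minimal sketch, not neutron code: RowNotFound here is a local stand-in for the ovsdbapp exception, FakeSouthbound and notify_port_up are hypothetical names, and wait_until_true is a simplified polling loop written for this example (the real helper in neutron/common/utils.py may differ). It only shows that an exception raised by the predicate aborts the wait, so the step after the wait never runs.

```python
import time


class RowNotFound(Exception):
    """Local stand-in for ovsdbapp.backend.ovs_idl.idlutils.RowNotFound."""


class FakeSouthbound:
    """Hypothetical in-memory replacement for the southbound DB API."""

    def __init__(self):
        self.chassis = {}  # chassis name -> list of metadata network names

    def get_chassis_metadata_networks(self, chassis_name):
        # Mirrors the behaviour seen in the traceback: a missing Chassis
        # row raises instead of returning an empty result.
        if chassis_name not in self.chassis:
            raise RowNotFound(f"Cannot find Chassis with name={chassis_name}")
        return self.chassis[chassis_name]


def wait_until_true(predicate, timeout=15.0, sleep=0.01):
    """Simplified polling loop; exceptions from predicate() are not caught."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError("predicate never became true")
        time.sleep(sleep)


def notify_port_up(sb, chassis_name, network, events):
    """Hypothetical, heavily simplified analogue of set_port_status_up()."""
    wait_until_true(
        lambda: network in sb.get_chassis_metadata_networks(chassis_name),
        timeout=1.0)
    # Only reached if the wait finishes without an exception.
    events.append("network-vif-plugged")
```

Calling notify_port_up() before the chassis name is present in FakeSouthbound.chassis raises RowNotFound on the first poll and leaves the events list empty, which corresponds to Nova never receiving network-vif-plugged.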

Changed in neutron:
importance: Undecided → High
assignee: nobody → Lucas Alvares Gomes (lucasagomes)
status: New → Confirmed
tags: added: neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 18.0.0.0rc1

This issue was fixed in the openstack/neutron 18.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/neutron/+/823593

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/neutron/+/823626

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/neutron/+/823627

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/ussuri)

Change abandoned by "Frode Nordahl <email address hidden>" on branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/neutron/+/823626
Reason: This cherry-pick erroneously introduced a new Change-ID, superseded by https://review.opendev.org/c/openstack/neutron/+/823627

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/neutron/+/823628

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/ussuri)

Change abandoned by "Frode Nordahl <email address hidden>" on branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/neutron/+/823627
Reason: This cherry-pick erroneously introduced a new Change-ID, superseded by https://review.opendev.org/c/openstack/neutron/+/823628

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/823593
Committed: https://opendev.org/openstack/neutron/commit/cbf3fe098bb44605174b8e582d56814fd36632c7
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit cbf3fe098bb44605174b8e582d56814fd36632c7
Author: Lucas Alvares Gomes <email address hidden>
Date: Wed Feb 3 14:00:13 2021 +0000

    [OVN] Fix RowNotFound exception while waiting for metadata networks

    In the set_port_status_up() the OVN driver waits for the metadata to be
    provisioned (15 seconds) [0] prior to sending the event to Nova notifying
    that the provisioning of the port is done (network-vif-plugged). But
    there could be a race condition while trying to get that information
    which results in a RowNotFound being raised in the waiting loop.

    Once that happens, the exception is bubbled up and the OVN driver ends up
    not sending the event to Nova and the instance will fail to deploy (it
    will be stuck in BUILD state until it times out).

    This patch changes the logic of the method looking for the metadata
    network information to not raise RowNotFound so that the waiting loop
    can iterate again [0] until the information is available.

    [0]
    https://github.com/openstack/neutron/blob/cbd72e2f4846ec64ff6e6ef24099a8e90ddebf31/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py#L1124

    Change-Id: I3c106ace74b5c6e4ed0cb7e827baf5d6595ec5d0
    Closes-Bug: #1914394
    Signed-off-by: Lucas Alvares Gomes <email address hidden>
    (cherry picked from commit b618d98541599ad0ef73be41a3edd83d1fc75a56)
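
The approach described in the commit message above (have the lookup not raise RowNotFound so the waiting loop can poll again) can be sketched as follows. This is an illustration under assumptions, not the merged patch: RowNotFound is a local stand-in for the ovsdbapp exception, FakeSouthbound, its lookup() helper and demo() are hypothetical, and wait_until_true is a simplified loop that returns False on timeout.

```python
import time


class RowNotFound(Exception):
    """Local stand-in for ovsdbapp.backend.ovs_idl.idlutils.RowNotFound."""


class FakeSouthbound:
    """Hypothetical in-memory replacement for the southbound DB API."""

    def __init__(self):
        self.chassis = {}  # chassis name -> list of metadata network names

    def lookup(self, chassis_name):
        try:
            return self.chassis[chassis_name]
        except KeyError:
            raise RowNotFound(f"Cannot find Chassis with name={chassis_name}")

    def get_chassis_metadata_networks(self, chassis_name):
        # Treat a missing Chassis row as "nothing provisioned yet" so the
        # caller's polling loop simply tries again.
        try:
            return self.lookup(chassis_name)
        except RowNotFound:
            return []


def wait_until_true(predicate, timeout=15.0, sleep=0.01):
    """Simplified polling loop; returns False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            return False
        time.sleep(sleep)
    return True


def demo():
    """Simulate the Chassis row showing up on the third poll."""
    sb = FakeSouthbound()
    polls = 0

    def predicate():
        nonlocal polls
        polls += 1
        if polls == 3:
            sb.chassis["chassis-1"] = ["net-a"]
        return "net-a" in sb.get_chassis_metadata_networks("chassis-1")

    return wait_until_true(predicate, timeout=1.0), polls
```

In this sketch the first two polls see an empty list rather than an exception, and the wait completes once the row appears. Returning an empty list keeps the retry policy in one place (the waiting loop) instead of spreading exception handling across callers.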

tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/823628
Committed: https://opendev.org/openstack/neutron/commit/7699feb4a61661129c79bc7da517f07f1257124c
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 7699feb4a61661129c79bc7da517f07f1257124c
Author: Lucas Alvares Gomes <email address hidden>
Date: Wed Feb 3 14:00:13 2021 +0000

    [OVN] Fix RowNotFound exception while waiting for metadata networks

    In the set_port_status_up() the OVN driver waits for the metadata to be
    provisioned (15 seconds) [0] prior to sending the event to Nova notifying
    that the provisioning of the port is done (network-vif-plugged). But
    there could be a race condition while trying to get that information
    which results in a RowNotFound being raised in the waiting loop.

    Once that happens, the exception is bubbled up and the OVN driver ends up
    not sending the event to Nova and the instance will fail to deploy (it
    will be stuck in BUILD state until it times out).

    This patch changes the logic of the method looking for the metadata
    network information to not raise RowNotFound so that the waiting loop
    can iterate again [0] until the information is available.

    [0]
    https://github.com/openstack/neutron/blob/cbd72e2f4846ec64ff6e6ef24099a8e90ddebf31/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py#L1124

    Change-Id: I3c106ace74b5c6e4ed0cb7e827baf5d6595ec5d0
    Closes-Bug: #1914394
    Signed-off-by: Lucas Alvares Gomes <email address hidden>
    (cherry picked from commit b618d98541599ad0ef73be41a3edd83d1fc75a56)
    (cherry picked from commit cbf3fe098bb44605174b8e582d56814fd36632c7)
    Conflicts cleanly resolved by removing irrelevant code added before
    the addition of the TestSBImplIdlOvn class. Also added missing
    ovsdbapp.backend import:
        neutron/tests/unit/plugins/ml2/drivers/ovn/mech_driver/ovsdb/test_impl_idl_ovn.py

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 17.3.0

This issue was fixed in the openstack/neutron 17.3.0 release.

Changed in neutron:
status: Confirmed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron ussuri-eol

This issue was fixed in the openstack/neutron ussuri-eol release.
