RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X when using trunk bridges with DPDK vhostuser mode

Bug #1869244 reported by Nate Johnston
This bug affects 1 person
Affects   Status         Importance   Assigned to     Milestone
neutron   Fix Released   High         Nate Johnston

Bug Description

DPDK vhostuser mode (DPDK/vhu) means that when an instance is powered
off, the port is deleted, and when an instance is powered on, a port is
created. This means a reboot is functionally a super fast
delete-then-create. Neutron trunking mode in combination with DPDK/vhu
implements a trunk bridge for each tenant, and the ports for the
instances are created as subports of that bridge. The standard way a
trunk bridge works is that when all the subports are deleted, a thread
is spawned to delete the trunk bridge, because that is an expensive and
time-consuming operation. That means that if the port in question is
the only port on the trunk on that compute node, this happens:

1. The port is deleted
2. A thread is spawned to delete the trunk
3. The port is recreated

If the trunk is deleted after #3 happens, then the instance has no
networking and is inaccessible; this is the scenario that was dealt with
in a previous change [1]. But errors of the form
"RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X" continue to occur.

[1] https://review.opendev.org/623275

2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command [-] Error executing command: RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command Traceback (most recent call last):
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 37, in execute
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command self.run_idl(None)
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command File "/usr/lib/python2.7/site-packages/ovsdbapp/schema/open_vswitch/commands.py", line 335, in run_idl
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command br = idlutils.row_by_value(self.api.idl, 'Bridge', 'name', self.bridge)
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 63, in row_by_value
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command raise RowNotFound(table=table, col=column, match=match)
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command
2020-03-02 10:37:45.932 6278 ERROR neutron.services.trunk.drivers.openvswitch.agent.ovsdb_handler [-] Cannot obtain interface list for bridge tbr-XXXXXXXX-X: Cannot find Bridge with name=tbr-XXXXXXXX-X: RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X

What I believe is happening in this case is that the trunk bridge is
deleted during step #3: it stops existing partway through the port
creation logic, before the port has actually been recreated.
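The interleaving hypothesized above can be reproduced deterministically in a few lines of Python. This is a minimal sketch under stated assumptions, not Neutron or ovsdbapp code: `FakeOvsdb` and `RowNotFound` are hypothetical in-memory stand-ins, and two `threading.Event` objects force the deletion thread (step 2) to run exactly between the bridge lookup and the interface listing of the port-creation path (step 3).

```python
import threading


class RowNotFound(Exception):
    """Hypothetical stand-in for ovsdbapp's RowNotFound (not the real class)."""


class FakeOvsdb:
    """Hypothetical in-memory stand-in for the OVSDB Bridge table."""

    def __init__(self):
        self._lock = threading.Lock()
        self._bridges = {}  # bridge name -> set of interface names

    def add_bridge(self, name):
        with self._lock:
            self._bridges.setdefault(name, set())

    def del_bridge(self, name):
        with self._lock:
            self._bridges.pop(name, None)

    def bridge_exists(self, name):
        with self._lock:
            return name in self._bridges

    def list_ifaces(self, name):
        with self._lock:
            if name not in self._bridges:
                raise RowNotFound(f"Cannot find Bridge with name={name}")
            return sorted(self._bridges[name])


def recreate_port(db, bridge, looked_up, deleted, errors):
    """Step 3: the port-creation path finds the bridge, then lists its ports."""
    found = db.bridge_exists(bridge)
    looked_up.set()   # the bridge existed when creation started...
    if not found:
        return
    deleted.wait()    # ...but the deletion thread gets to run right here
    try:
        db.list_ifaces(bridge)
    except RowNotFound as exc:
        errors.append(str(exc))  # the agent logs this as in the trace above


def delete_trunk_bridge(db, bridge, looked_up, deleted):
    """Step 2: the spawned thread deletes the now-empty trunk bridge."""
    looked_up.wait()
    db.del_bridge(bridge)
    deleted.set()


def reproduce(bridge="tbr-XXXXXXXX-X"):
    db = FakeOvsdb()
    db.add_bridge(bridge)  # step 1 already happened: the bridge has no ports
    looked_up, deleted = threading.Event(), threading.Event()
    errors = []
    workers = [
        threading.Thread(target=recreate_port,
                         args=(db, bridge, looked_up, deleted, errors)),
        threading.Thread(target=delete_trunk_bridge,
                         args=(db, bridge, looked_up, deleted)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return errors
```

Calling `reproduce()` returns `['Cannot find Bridge with name=tbr-XXXXXXXX-X']`, the same message as in the log above. In a real agent the window depends on timing rather than on events, which is why the failure is intermittent.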

This issue was observed in setups running Queens.

Bence Romsics (bence-romsics) wrote :
tags: added: trunk
Bence Romsics (bence-romsics) wrote :

Nate proposed a fix, but Launchpad did not link to it for some reason:
https://review.opendev.org/623275

Bence Romsics (bence-romsics) wrote :

It seems I'm having problems copy-pasting nowadays. :-)

This is what I meant:
https://review.opendev.org/714783

Changed in neutron:
status: New → In Progress
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/714783
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=e37722c0f5f0b746135200db6f654674dc0f6f12
Submitter: Zuul
Branch: master

commit e37722c0f5f0b746135200db6f654674dc0f6f12
Author: Nate Johnston <email address hidden>
Date: Tue Mar 24 18:05:16 2020 -0400

    Wait before deleting trunk bridges for DPDK vhu

    DPDK vhostuser mode (DPDK/vhu) means that when an instance is powered
    off the port is deleted, and when an instance is powered on a port is
    created. This means a reboot is functionally a super fast
    delete-then-create. Neutron trunking mode in combination with DPDK/vhu
    implements a trunk bridge for each tenant, and the ports for the
    instances are created as subports of that bridge. The standard way a
    trunk bridge works is that when all the subports are deleted, a thread
    is spawned to delete the trunk bridge, because that is an expensive and
    time-consuming operation. That means that if the port in question is
    the only port on the trunk on that compute node, this happens:

    1. The port is deleted
    2. A thread is spawned to delete the trunk
    3. The port is recreated

    If the trunk is deleted after #3 happens then the instance has no
    networking and is inaccessible; this is the scenario that was dealt with
    in a previous change [1]. But there continue to be issues with errors
    "RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X". What is
    happening in this case is that the trunk is being deleted in the middle
    of the execution of #3, so that it stops existing in the middle of the
    port creation logic but before the port is actually recreated.

    Since this is a timing issue between two different threads it's
    difficult to stamp out entirely, but I think the best way to do it is to
    add a slight delay in the trunk deletion thread, just a second or two.
    That will give the port time to come back online and avoid the trunk
    deletion entirely.

    [1] https://review.opendev.org/623275

    Related-Bug: #1869244
    Change-Id: I36a98fe5da85da1f3a0315dd1a470f062de6f38b
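The approach in this commit, delaying the deletion thread so that a quickly recreated port can reappear first, might look roughly like the sketch below. All names are hypothetical, and the 2-second default only mirrors the commit's "a second or two". The re-check after the wait is an assumption drawn from the commit's wording that the delay lets the agent "avoid the trunk deletion entirely". This is not the actual Neutron patch.

```python
import threading
import time

# Hypothetical default; the commit message only says "just a second or two".
DELETE_DELAY_SECONDS = 2.0


class TrunkBridge:
    """Hypothetical model of a trunk bridge and the ports plugged into it."""

    def __init__(self, name):
        self.name = name
        self.ports = set()
        self.exists = True
        self.lock = threading.Lock()


def delayed_delete(bridge, delay=DELETE_DELAY_SECONDS, sleep=time.sleep):
    """Deletion thread body: wait first, then delete only if still unused.

    Returns True if the bridge was deleted, False if a port came back.
    """
    sleep(delay)
    with bridge.lock:
        if bridge.ports:
            return False  # the instance's port was recreated during the wait
        bridge.exists = False
        return True


def on_last_port_removed(bridge, delay=DELETE_DELAY_SECONDS):
    """Spawn the expensive deletion in the background instead of blocking."""
    worker = threading.Thread(target=delayed_delete, args=(bridge, delay))
    worker.start()
    return worker
```

For example, if `on_last_port_removed(bridge, delay=0.5)` is followed by adding a port to `bridge.ports` (under `bridge.lock`) before the half second elapses, the bridge survives. A fixed window is still a guess, though: a reboot slower than the delay loses the race again.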

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/717392

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.opendev.org/717393

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.opendev.org/717394

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/queens)

Reviewed: https://review.opendev.org/717394
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=1c768810cdd6949d27989ff1ae3bb9d657a9be1c
Submitter: Zuul
Branch: stable/queens

commit 1c768810cdd6949d27989ff1ae3bb9d657a9be1c
Author: Nate Johnston <email address hidden>
Date: Tue Mar 24 18:05:16 2020 -0400

    Wait before deleting trunk bridges for DPDK vhu

    Related-Bug: #1869244
    Change-Id: I36a98fe5da85da1f3a0315dd1a470f062de6f38b
    (cherry picked from commit e37722c0f5f0b746135200db6f654674dc0f6f12)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/train)

Reviewed: https://review.opendev.org/717391
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=92b2d9c25ade84d43538b2c42c620f91468f852c
Submitter: Zuul
Branch: stable/train

commit 92b2d9c25ade84d43538b2c42c620f91468f852c
Author: Nate Johnston <email address hidden>
Date: Tue Mar 24 18:05:16 2020 -0400

    Wait before deleting trunk bridges for DPDK vhu

    Related-Bug: #1869244
    Change-Id: I36a98fe5da85da1f3a0315dd1a470f062de6f38b
    (cherry picked from commit e37722c0f5f0b746135200db6f654674dc0f6f12)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/rocky)

Reviewed: https://review.opendev.org/717393
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c0b15a8cbeb17ed5024aaf2347a04a5929dbbd5d
Submitter: Zuul
Branch: stable/rocky

commit c0b15a8cbeb17ed5024aaf2347a04a5929dbbd5d
Author: Nate Johnston <email address hidden>
Date: Tue Mar 24 18:05:16 2020 -0400

    Wait before deleting trunk bridges for DPDK vhu

    Related-Bug: #1869244
    Change-Id: I36a98fe5da85da1f3a0315dd1a470f062de6f38b
    (cherry picked from commit e37722c0f5f0b746135200db6f654674dc0f6f12)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/stein)

Reviewed: https://review.opendev.org/717392
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=355f454747ea5f6486c45589ae7be61823b2adab
Submitter: Zuul
Branch: stable/stein

commit 355f454747ea5f6486c45589ae7be61823b2adab
Author: Nate Johnston <email address hidden>
Date: Tue Mar 24 18:05:16 2020 -0400

    Wait before deleting trunk bridges for DPDK vhu

    Related-Bug: #1869244
    Change-Id: I36a98fe5da85da1f3a0315dd1a470f062de6f38b
    (cherry picked from commit e37722c0f5f0b746135200db6f654674dc0f6f12)

tags: added: in-stable-stein
tags: added: neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/827580

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/827580
Committed: https://opendev.org/openstack/neutron/commit/140bb63665223d7cd2a7fee8c1d1494ebd2a802f
Submitter: "Zuul (22348)"
Branch: master

commit 140bb63665223d7cd2a7fee8c1d1494ebd2a802f
Author: Miguel Lavalle <email address hidden>
Date: Wed Feb 2 19:25:40 2022 -0600

    Wait longer before deleting DPDK vhu trunk bridges

    In [1] we added a delay before deleting DPDK vhu trunk bridges to
    mitigate a race condition when instances are rebooted. As explained in
    [1], with DPDK vhu, a reboot is essentially a super fast bridge
    delete-then-create that is prone to race conditions. We have recently
    found in customer deployments that the wait added in [1] is not
    long enough. As a consequence, this change increases the wait.

    [1] https://review.opendev.org/c/openstack/neutron/+/717394

    Change-Id: I5c1474b405d436d3b1e5db745d77999f1723b660
    Partial-Bug: #1869244
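The commit says only that the wait is increased. It does not state the new value or change the mechanism, and any fixed sleep remains a guess at how long the slowest reboot takes. One swapped-in alternative, not what this commit does, is a debounced (cancellable) deletion: arm a timer when a trunk bridge loses its last port, and cancel it if a port is plugged back in. The keep-or-delete decision is then driven by the port events themselves rather than by a single check at the end of a sleep. `DebouncedBridgeDeleter` and its `delete_bridge` callback are hypothetical.

```python
import threading


class DebouncedBridgeDeleter:
    """Hypothetical sketch: arm a deletion timer when a trunk bridge loses its
    last port, and cancel it if a port is plugged back in before it fires."""

    def __init__(self, delete_bridge, grace_seconds):
        self._delete_bridge = delete_bridge  # callback taking the bridge name
        self._grace = grace_seconds
        self._lock = threading.Lock()
        self._pending = {}  # bridge name -> armed threading.Timer

    def last_port_removed(self, bridge):
        """Arm (or re-arm) the deletion timer; returns it for inspection."""
        with self._lock:
            self._cancel_locked(bridge)
            timer = threading.Timer(self._grace, self._fire, args=(bridge,))
            self._pending[bridge] = timer
            timer.start()
            return timer

    def port_added(self, bridge):
        """A port came back (for example after a reboot): keep the bridge."""
        with self._lock:
            self._cancel_locked(bridge)

    def _cancel_locked(self, bridge):
        timer = self._pending.pop(bridge, None)
        if timer is not None:
            timer.cancel()

    def _fire(self, bridge):
        with self._lock:
            # A timer that has already started running cannot be cancelled,
            # so check that it is still the armed one before deleting.
            if self._pending.get(bridge) is not threading.current_thread():
                return
            del self._pending[bridge]
            # Deleting under the lock keeps port_added() from interleaving.
            self._delete_bridge(bridge)
```

With this shape the grace period can be generous. A reboot costs only a cancelled timer, while a bridge that really has been abandoned is still removed once the period expires.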

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/829139

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/neutron/+/829039

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/neutron/+/829040

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/neutron/+/829041

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/neutron/+/829042

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/neutron/+/829043

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/c/openstack/neutron/+/829044

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/c/openstack/neutron/+/829045

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/c/openstack/neutron/+/829046

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/829039
Committed: https://opendev.org/openstack/neutron/commit/ac4aa7ef7e33a5ff9fbc9ce5c8249c40334e66a4
Submitter: "Zuul (22348)"
Branch: stable/xena

commit ac4aa7ef7e33a5ff9fbc9ce5c8249c40334e66a4
Author: Miguel Lavalle <email address hidden>
Date: Wed Feb 2 19:25:40 2022 -0600

    Wait longer before deleting DPDK vhu trunk bridges

    Change-Id: I5c1474b405d436d3b1e5db745d77999f1723b660
    Partial-Bug: #1869244
    (cherry picked from commit 140bb63665223d7cd2a7fee8c1d1494ebd2a802f)

tags: added: in-stable-xena
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/829040
Committed: https://opendev.org/openstack/neutron/commit/b9240125e69111b84a44292336315184a34f854a
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit b9240125e69111b84a44292336315184a34f854a
Author: Miguel Lavalle <email address hidden>
Date: Wed Feb 2 19:25:40 2022 -0600

    Wait longer before deleting DPDK vhu trunk bridges

    Change-Id: I5c1474b405d436d3b1e5db745d77999f1723b660
    Partial-Bug: #1869244
    (cherry picked from commit 140bb63665223d7cd2a7fee8c1d1494ebd2a802f)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/829041
Committed: https://opendev.org/openstack/neutron/commit/e9e73d63b1a6989c6732a7372314e60514a73435
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit e9e73d63b1a6989c6732a7372314e60514a73435
Author: Miguel Lavalle <email address hidden>
Date: Wed Feb 2 19:25:40 2022 -0600

    Wait longer before deleting DPDK vhu trunk bridges

    Change-Id: I5c1474b405d436d3b1e5db745d77999f1723b660
    Partial-Bug: #1869244
    (cherry picked from commit 140bb63665223d7cd2a7fee8c1d1494ebd2a802f)

tags: added: in-stable-victoria
tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/829042
Committed: https://opendev.org/openstack/neutron/commit/7338441c41854984b40c614dd6c31af4bf78ecce
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 7338441c41854984b40c614dd6c31af4bf78ecce
Author: Miguel Lavalle <email address hidden>
Date: Wed Feb 2 19:25:40 2022 -0600

    Wait longer before deleting DPDK vhu trunk bridges

    Change-Id: I5c1474b405d436d3b1e5db745d77999f1723b660
    Partial-Bug: #1869244
    (cherry picked from commit 140bb63665223d7cd2a7fee8c1d1494ebd2a802f)

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/train)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/829043
Committed: https://opendev.org/openstack/neutron/commit/34b1360379502039479b2cebca848dad7dc117d7
Submitter: "Zuul (22348)"
Branch: stable/train

commit 34b1360379502039479b2cebca848dad7dc117d7
Author: Miguel Lavalle <email address hidden>
Date: Wed Feb 2 19:25:40 2022 -0600

    Wait longer before deleting DPDK vhu trunk bridges

    Change-Id: I5c1474b405d436d3b1e5db745d77999f1723b660
    Partial-Bug: #1869244
    (cherry picked from commit 140bb63665223d7cd2a7fee8c1d1494ebd2a802f)

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/stein)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/829044
Committed: https://opendev.org/openstack/neutron/commit/7d6bde20ce3252e090400909ff21d286f1c21459
Submitter: "Zuul (22348)"
Branch: stable/stein

commit 7d6bde20ce3252e090400909ff21d286f1c21459
Author: Miguel Lavalle <email address hidden>
Date: Wed Feb 2 19:25:40 2022 -0600

    Wait longer before deleting DPDK vhu trunk bridges

    Change-Id: I5c1474b405d436d3b1e5db745d77999f1723b660
    Partial-Bug: #1869244
    (cherry picked from commit 140bb63665223d7cd2a7fee8c1d1494ebd2a802f)

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/829045
Committed: https://opendev.org/openstack/neutron/commit/9c009ec60eaf1bdebb2885158b6090ad73f2d029
Submitter: "Zuul (22348)"
Branch: stable/rocky

commit 9c009ec60eaf1bdebb2885158b6090ad73f2d029
Author: Miguel Lavalle <email address hidden>
Date: Wed Feb 2 19:25:40 2022 -0600

    Wait longer before deleting DPDK vhu trunk bridges

    Change-Id: I5c1474b405d436d3b1e5db745d77999f1723b660
    Partial-Bug: #1869244
    (cherry picked from commit 140bb63665223d7cd2a7fee8c1d1494ebd2a802f)

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/829046
Committed: https://opendev.org/openstack/neutron/commit/655aa905a37446e6e43186cb8cd6d3a43c391783
Submitter: "Zuul (22348)"
Branch: stable/queens

commit 655aa905a37446e6e43186cb8cd6d3a43c391783
Author: Miguel Lavalle <email address hidden>
Date: Wed Feb 2 19:25:40 2022 -0600

    Wait longer before deleting DPDK vhu trunk bridges

    Change-Id: I5c1474b405d436d3b1e5db745d77999f1723b660
    Partial-Bug: #1869244
    (cherry picked from commit 140bb63665223d7cd2a7fee8c1d1494ebd2a802f)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/837780

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/837780
Reason: This review is > 4 weeks without comment, and failed Zuul jobs the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/837780
Committed: https://opendev.org/openstack/neutron/commit/33de608f04dcc8117eeba63876598dc2ae93013a
Submitter: "Zuul (22348)"
Branch: master

commit 33de608f04dcc8117eeba63876598dc2ae93013a
Author: Miguel Lavalle <email address hidden>
Date: Wed Apr 13 18:00:12 2022 -0500

    Avoid race condition when deleting trunk bridges

    Prior to this change, trunk bridges were created by os-vif but deleted
    by Neutron when the last vif was removed from them. This creates race
    conditions in some use cases, such as DPDK with vhostuserclient mode,
    when VMs are rebooted. To avoid these races, Neutron will no longer
    delete trunk bridges; both their creation and deletion will be os-vif's
    responsibility. Since [1], Nova uses the os-vif version that contains
    this functionality.

    This patch also changes the trunk status change event handler. During
    a live migration, when the trunk parent port has been bound to the
    destination host (meaning there is only one port binding associated)
    and its status has changed to ACTIVE, the method also triggers the
    binding of the subports to the new host. This is needed because there
    could be a race condition between the subport binding, triggered by the
    OVS agent, and the parent port binding, triggered by Nova. If the parent
    port is still bound to the source host when the OVS agent tries to bind
    the subports, the subport binding remains on the source host instead of
    changing to the destination.

    This patch also reverts [2] and [3]. As described in the previous
    paragraph, this patch fixes the issue reported in LP#1997025, so trunk
    port live migration with ML2/OVS should be fixed by this patch.

    [1] https://review.opendev.org/c/openstack/nova/+/865031
    [2] https://review.opendev.org/c/openstack/neutron/+/865295
    [3] https://review.opendev.org/c/openstack/neutron/+/865424

    Closes-Bug: #1869244
    Closes-Bug: #1997025

    Change-Id: I4e16357f3ff214fcf41e418982806c24088a2665
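The core of this fix is an ownership rule: the component that creates a trunk bridge (os-vif, driven by Nova's plug and unplug calls) is also the one that deletes it. A reboot's unplug and plug are then issued in order by a single actor and cannot interleave with a deletion coming from another process. The sketch below models that rule with hypothetical functions; it is not os-vif or Neutron code, and it assumes the os-vif side removes the bridge when its last port is unplugged.

```python
from dataclasses import dataclass, field


@dataclass
class Switch:
    """Hypothetical model of the OVS bridges on one compute node."""
    bridges: dict = field(default_factory=dict)  # bridge -> set of ports


def vif_plug(switch, bridge, port):
    """os-vif-like side (hypothetical): create the bridge if needed, add the port."""
    switch.bridges.setdefault(bridge, set()).add(port)


def vif_unplug(switch, bridge, port):
    """os-vif-like side (hypothetical): remove the port, then the bridge if empty."""
    ports = switch.bridges.get(bridge)
    if ports is None:
        return
    ports.discard(port)
    if not ports:
        del switch.bridges[bridge]


def agent_on_last_subport_removed(switch, bridge):
    """Neutron-agent-like side after the fix (hypothetical): it may tidy up
    its own wiring, but it never deletes the trunk bridge itself."""
    return bridge in switch.bridges


def reboot(switch, bridge, port):
    """A DPDK/vhu reboot: unplug then plug, issued in order by one owner."""
    vif_unplug(switch, bridge, port)
    agent_on_last_subport_removed(switch, bridge)  # nothing destructive to race with
    vif_plug(switch, bridge, port)
```

However the agent's handler is scheduled relative to `reboot`, it only observes the bridge, so the bridge and its port are always present once the plug step has run.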

Changed in neutron:
status: In Progress → Fix Released
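The commit "Avoid race condition when deleting trunk bridges" also describes when subport binding is triggered during a live migration. That condition can be sketched as a pure function. The data shapes (a list of binding hosts and a status string) are hypothetical simplifications, not Neutron's port or binding objects.

```python
from dataclasses import dataclass


@dataclass
class ParentPortState:
    """Hypothetical snapshot of a trunk parent port."""
    binding_hosts: list  # hosts the port currently has bindings on
    status: str          # e.g. "DOWN" or "ACTIVE"


def subports_to_rebind(before, after, subport_hosts):
    """Return {subport: destination_host} for subports left behind.

    Mirrors the condition in the commit message: act only once the parent
    port has a single binding (on the destination host) and its status has
    just changed to ACTIVE.
    """
    if len(after.binding_hosts) != 1:
        return {}  # still mid-migration: bindings exist on both hosts
    if after.status != "ACTIVE" or before.status == "ACTIVE":
        return {}  # not a transition to ACTIVE
    destination = after.binding_hosts[0]
    return {subport: destination
            for subport, host in subport_hosts.items()
            if host != destination}
```

For example, with `before = ParentPortState(["compute-src", "compute-dst"], "DOWN")` and `after = ParentPortState(["compute-dst"], "ACTIVE")`, calling `subports_to_rebind(before, after, {"sp-1": "compute-src", "sp-2": "compute-dst"})` returns `{"sp-1": "compute-dst"}`.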
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 22.0.0.0rc1

This issue was fixed in the openstack/neutron 22.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Rodolfo Alonso <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/829139
Reason: Please, feel free to restore the patch, address the comments and rebase the patch.
