neutron cleanup service can interfere with agents on reboot

Bug #1913623 reported by Brent Eagles
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
tripleo | Triaged | High | Brent Eagles |

Bug Description

Originally reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1908638

From the analysis included in the Red Hat Bugzilla (rhbz) report:

This is exactly the scenario in which this issue can appear: a node reboot combined with a high port count.

When the node is rebooted, the cleanup script runs. If the port count is low, the script removes the ports and bridges quickly enough not to interfere with the OVS agent. On a heavily loaded node (hundreds of ports), however, the script takes considerably longer to delete the ports [1]. When this loop finishes, the tunnel bridge is deleted [2], but by then the OVS agent has already cached the br-tun datapath ID and assumes that this bridge will always be present (while the OVS agent is running, no other manual operation may be performed on the OVS instance).
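The race described above can be reproduced with a toy model. This is a minimal sketch, not neutron or OVS code: FakeOVS, cleanup_steps, Agent and simulate are hypothetical names, a bridge's datapath ID is modelled as a simple counter, and the agent is assumed to create br-tun if it is missing and to cache its ID once at startup. The parameter agent_start_step is the number of cleanup steps that complete before the agent starts; the cleanup has num_ports + 1 steps in total (one per port, plus the bridge deletion).

```python
import itertools


class FakeOVS:
    """Hypothetical stand-in for an OVS instance (not the real ovsdb API)."""

    def __init__(self):
        self._next_dpid = itertools.count(1)  # toy datapath IDs: 1, 2, 3, ...
        self.bridges = {}                     # bridge name -> datapath ID
        self.ports = set()

    def add_bridge(self, name):
        self.bridges[name] = next(self._next_dpid)

    def del_bridge(self, name):
        self.bridges.pop(name, None)


def cleanup_steps(ovs):
    """Hypothetical cleanup; yields after each unit of work so callers can interleave."""
    for port in sorted(ovs.ports):
        ovs.ports.discard(port)   # per-port loop, the slow part on busy nodes (cf. [1])
        yield
    ovs.del_bridge("br-tun")      # tunnel bridge removed last (cf. [2])
    yield


class Agent:
    """Hypothetical agent that caches the br-tun datapath ID once at startup."""

    def __init__(self, ovs):
        self.ovs = ovs
        self.cached_dpid = None

    def start(self):
        if "br-tun" not in self.ovs.bridges:
            self.ovs.add_bridge("br-tun")
        self.cached_dpid = self.ovs.bridges["br-tun"]

    def bridge_ok(self):
        # Valid only if the bridge the agent cached still exists with the same ID.
        return self.ovs.bridges.get("br-tun") == self.cached_dpid


def simulate(num_ports, agent_start_step):
    """Return True if the agent's cached br-tun ID is still valid after cleanup."""
    if not 0 <= agent_start_step <= num_ports + 1:
        raise ValueError("agent_start_step must be in [0, num_ports + 1]")
    ovs = FakeOVS()
    ovs.add_bridge("br-tun")                       # bridge left over from before the reboot
    ovs.ports = {f"tap{i}" for i in range(num_ports)}
    agent = Agent(ovs)
    steps = cleanup_steps(ovs)
    for _ in range(agent_start_step):              # cleanup progress before the agent starts
        next(steps)
    agent.start()
    for _ in steps:                                # remaining cleanup runs under the agent
        pass
    return agent.bridge_ok()


if __name__ == "__main__":
    print(simulate(300, 10))    # agent starts mid-cleanup
    print(simulate(300, 301))   # agent waits for the whole cleanup
```

With 300 ports, simulate(300, 10) returns False because the agent cached a bridge that the cleanup later deleted, while simulate(300, 301) returns True because the agent only starts after cleanup has finished. Even simulate(3, 3), where the agent starts just before the bridge deletion, returns False.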

This script should run first, and the OVS agent's startup should be delayed until the script has finished.

The fix would be to ensure that the cleanup service has completed before the agents are started.
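The proposed fix can be sketched as a dependency-ordered startup. This is a hedged illustration rather than the TripleO implementation: the service names, DEPENDENCIES and boot are hypothetical, and Python's graphlib.TopologicalSorter stands in for the init system. In systemd terms it resembles ordering a Type=oneshot cleanup unit Before= the agent units, since systemd treats a oneshot unit as started only once its process has exited; whether the merged commit uses exactly those directives is not shown in this report.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical service names, for illustration only. Each entry lists the
# services that must have completed before that service may start.
DEPENDENCIES = {
    "neutron-cleanup": set(),
    "neutron-ovs-agent": {"neutron-cleanup"},
    "neutron-l3-agent": {"neutron-cleanup"},
    "neutron-dhcp-agent": {"neutron-cleanup"},
}


def boot(deps, run_service):
    """Start services in dependency order and return the order used.

    run_service is assumed to block until the service's start-up work is
    done (for the cleanup service, until it has exited), so no agent is
    started while cleanup is still running.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()                      # raises graphlib.CycleError on cycles
    order = []
    while sorter.is_active():
        for name in sorter.get_ready():   # services whose prerequisites are done
            run_service(name)
            order.append(name)
            sorter.done(name)
    return order


if __name__ == "__main__":
    started = boot(DEPENDENCIES, lambda name: print("starting", name))
    assert started[0] == "neutron-cleanup"
```

The three agents may start in any order relative to one another; the only guarantee is that neutron-cleanup runs to completion first.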

Brent Eagles (beagles)
Changed in tripleo:
milestone: none → wallaby-2
assignee: nobody → Brent Eagles (beagles)
importance: Undecided → High
status: New → Triaged
tags: added: train-backport-potential
tags: added: queens-backport-potential
Changed in tripleo:
milestone: wallaby-2 → wallaby-3
Changed in tripleo:
milestone: wallaby-3 → wallaby-rc1
OpenStack Infra (hudson-openstack) wrote: Fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/785217
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/a22239e27938e43f08c0322bb78224499d10a167
Submitter: "Zuul (22348)"
Branch: stable/train

commit a22239e27938e43f08c0322bb78224499d10a167
Author: Brent Eagles <email address hidden>
Date: Thu Jan 28 13:28:11 2021 -0330

    Add service ordering to cleanup service to avoid conflicts with agent startup

    If the port cleanup takes too long, the neutron agents might begin
    operations on the ovs bridges while cleanup is still ongoing. This can
    cause undefined behavior and errors in the agent.

    Change-Id: Ia0e31c9469033c50a8b65af7fee1adf03b22d4c2
    Closes-Bug: #1913623
    (cherry picked from commit 0c20e1e1ac320e30aaedba980270dffbf4528fc3)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote: Fix included in openstack/tripleo-heat-templates 14.1.0

This issue was fixed in the openstack/tripleo-heat-templates 14.1.0 release.

Changed in tripleo:
milestone: wallaby-rc1 → xena-1
OpenStack Infra (hudson-openstack) wrote: Fix included in openstack/tripleo-heat-templates 11.6.0

This issue was fixed in the openstack/tripleo-heat-templates 11.6.0 release.

Changed in tripleo:
milestone: xena-1 → xena-2
Changed in tripleo:
milestone: xena-2 → xena-3
OpenStack Infra (hudson-openstack) wrote: Fix merged to tripleo-heat-templates (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/785195
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/7ce489863a7e2199597c049cf8f3085743128143
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 7ce489863a7e2199597c049cf8f3085743128143
Author: Brent Eagles <email address hidden>
Date: Thu Jan 28 13:28:11 2021 -0330

    Add service ordering to cleanup service to avoid conflicts with agent startup

    If the port cleanup takes too long, the neutron agents might begin
    operations on the ovs bridges while cleanup is still ongoing. This can
    cause undefined behavior and errors in the agent.

    Change-Id: Ia0e31c9469033c50a8b65af7fee1adf03b22d4c2
    Closes-Bug: #1913623
    (cherry picked from commit 0c20e1e1ac320e30aaedba980270dffbf4528fc3)

tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote: Fix merged to tripleo-heat-templates (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/785216
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/5dbffe9002cca67c3ba74fe3a09e368686938830
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 5dbffe9002cca67c3ba74fe3a09e368686938830
Author: Brent Eagles <email address hidden>
Date: Thu Jan 28 13:28:11 2021 -0330

    Add service ordering to cleanup service to avoid conflicts with agent startup

    If the port cleanup takes too long, the neutron agents might begin
    operations on the ovs bridges while cleanup is still ongoing. This can
    cause undefined behavior and errors in the agent.

    Change-Id: Ia0e31c9469033c50a8b65af7fee1adf03b22d4c2
    Closes-Bug: #1913623
    (cherry picked from commit 0c20e1e1ac320e30aaedba980270dffbf4528fc3)

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote: Fix included in openstack/tripleo-heat-templates 12.4.6

This issue was fixed in the openstack/tripleo-heat-templates 12.4.6 release.

OpenStack Infra (hudson-openstack) wrote: Fix included in openstack/tripleo-heat-templates 13.6.0

This issue was fixed in the openstack/tripleo-heat-templates 13.6.0 release.
