OVS agent "hangs" while processing trusted ports

Bug #1836023 reported by Oleg Bondarev
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Oleg Bondarev

Bug Description

Queens, ovsdb native interface.

On a loaded gateway node hosting more than 1000 ports, restarting neutron-openvswitch-agent causes the agent, at some point, to stop sending state reports and stop logging for a significant time that depends on the number of ports. In our case the gateway node hosts more than 1400 ports and the agent hangs for ~100 seconds. Thus, if the configured agent_down_time is less than 100 seconds, the neutron server sees the agent as down and starts rescheduling resources. Once the agent stops hanging, it sees itself as "revived" and starts a new full sync. This loop is almost endless.
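
The condition that triggers the loop can be sketched as a simple comparison. This is a simplified model, not the server's actual code: the function name is hypothetical, and the values 75 and 120 are illustrative settings rather than figures from this report (only the ~100 second hang is).

```python
def seen_as_down(seconds_since_last_report: float,
                 agent_down_time: float) -> bool:
    """Simplified model: the server treats an agent as down once its
    last state report is older than agent_down_time."""
    return seconds_since_last_report > agent_down_time


if __name__ == "__main__":
    hang_seconds = 100  # hang duration observed in this report
    for down_time in (75, 120):  # illustrative agent_down_time values
        print(down_time, seen_as_down(hang_seconds, down_time))
```

With a hang of about 100 seconds, any agent_down_time below that value lets the server declare the agent down before it reports again.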

Debugging showed the culprit is process_trusted_ports: https://github.com/openstack/neutron/blob/13.0.4/neutron/agent/linux/openvswitch_firewall/firewall.py#L655. This function does not yield control to other greenthreads and blocks until all trusted ports are processed. Since on gateway nodes almost all ports are "trusted" (router and DHCP ports), process_trusted_ports may take significant time.

The proposal is to add eventlet.sleep(0) inside the loop in process_trusted_ports; that fixed the issue in our environment.
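
The effect of an explicit yield inside a long loop can be illustrated with a minimal sketch. It uses asyncio from the standard library as a stand-in for the agent's greenthreads (neutron itself uses eventlet), so `asyncio.sleep(0)` plays the role of the proposed zero-length sleep. All function names here are hypothetical, and the "heartbeat" task only loosely stands for the agent's state reporting.

```python
import asyncio


async def process_ports(port_ids, handled, cooperative):
    """Handle every port; optionally yield after each one."""
    for port_id in port_ids:
        handled.append(port_id)  # stand-in for per-port flow setup
        if cooperative:
            # Zero-length sleep: give other tasks a chance to run.
            await asyncio.sleep(0)


async def heartbeat(beats, stop):
    """Record a beat each time this task gets scheduled."""
    while not stop.is_set():
        beats.append(len(beats))
        await asyncio.sleep(0)


async def run(cooperative, n_ports=5):
    handled, beats = [], []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(beats, stop))
    await process_ports(range(n_ports), handled, cooperative)
    stop.set()
    await hb
    return len(handled), len(beats)


if __name__ == "__main__":
    print("blocking:   ", asyncio.run(run(cooperative=False)))
    print("cooperative:", asyncio.run(run(cooperative=True)))
```

In the blocking variant the heartbeat task never runs while ports are being processed; with the yield it is scheduled between ports.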

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/670014

Changed in neutron:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/670162

tags: added: loadimpact
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/670014
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=da539da3780188f01e18ef106dde9ca180324c2a
Submitter: Zuul
Branch: master

commit da539da3780188f01e18ef106dde9ca180324c2a
Author: Oleg Bondarev <email address hidden>
Date: Wed Jul 10 12:39:13 2019 +0400

    Yield control to other greenthreads while processing trusted ports

    process_trusted_ports() appeared to be greenthread unfriendly, so
    if there are many trusted ports on a node, openvswitch agent may
    "hang" for a significant time.
    This patch adds explicit yield.

    Change-Id: I7c00812f877e2fc966bbac3060e1187ce1b809ca
    Closes-Bug: #1836023

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/670498

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/670499

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/670500

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.opendev.org/670501

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/670162
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=ae1d36fa9d8e2115a5241b5da2e941cdefa2c463
Submitter: Zuul
Branch: master

commit ae1d36fa9d8e2115a5241b5da2e941cdefa2c463
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Wed Jul 10 18:57:02 2019 +0000

    Improve "OVSFirewallDriver.process_trusted_ports"

    When "OVSFirewallDriver.process_trusted_ports" is called with many
    ports, "_initialize_egress_no_port_security" retrieves the VIF ports
    ("Interface" registers in OVS DB) one per iteration, based on the
    port_id. If the DB is instead called only once to retrieve all the
    VIF ports, the performance increase is noticeable.
    E.g.: bridge with 1000 ports and interfaces.

    Retrieving 100 ports:
    - Bulk operation: 0.08 secs
    - Loop operation: 5.6 secs

    Retrieving 1000 ports:
    - Bulk operation: 0.08 secs
    - Loop operation: 59 secs

    Closes-Bug: #1836095
    Related-Bug: #1836023

    Change-Id: I5b259717c0fdb8991f1df86b1ef4fb8ad0f18e70
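
The difference between the loop and bulk retrieval described in this commit message can be sketched as follows. The table class and both lookup helpers are hypothetical stand-ins, not neutron or OVS DB APIs; the sketch only counts round trips and does not reproduce the timings quoted above.

```python
class FakeInterfaceTable:
    """Hypothetical stand-in for the OVS DB "Interface" table."""

    def __init__(self, port_ids):
        self._rows = {
            pid: {"iface-id": pid, "ofport": n}
            for n, pid in enumerate(port_ids, start=1)
        }
        self.queries = 0  # number of simulated DB round trips

    def find_one(self, port_id):
        self.queries += 1
        return self._rows.get(port_id)

    def find_many(self, port_ids):
        self.queries += 1
        return {p: self._rows[p] for p in port_ids if p in self._rows}


def lookup_per_port(table, port_ids):
    """Loop operation: one DB call per port."""
    return {pid: table.find_one(pid) for pid in port_ids}


def lookup_bulk(table, port_ids):
    """Bulk operation: a single DB call for all ports."""
    return table.find_many(port_ids)


if __name__ == "__main__":
    ids = [f"port-{i}" for i in range(1000)]
    loop_table, bulk_table = FakeInterfaceTable(ids), FakeInterfaceTable(ids)
    lookup_per_port(loop_table, ids)
    lookup_bulk(bulk_table, ids)
    print(loop_table.queries, bulk_table.queries)
```

Both helpers return the same mapping; only the number of calls to the table differs, which is where the per-iteration cost in the loop variant comes from.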

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/pike)

Reviewed: https://review.opendev.org/670501
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=fe403825ab08728e8b2d0fe15cfba3b452921f57
Submitter: Zuul
Branch: stable/pike

commit fe403825ab08728e8b2d0fe15cfba3b452921f57
Author: Oleg Bondarev <email address hidden>
Date: Wed Jul 10 12:39:13 2019 +0400

    Yield control to other greenthreads while processing trusted ports

    process_trusted_ports() appeared to be greenthread unfriendly, so
    if there are many trusted ports on a node, openvswitch agent may
    "hang" for a significant time.
    This patch adds explicit yield.

    Change-Id: I7c00812f877e2fc966bbac3060e1187ce1b809ca
    Closes-Bug: #1836023
    (cherry picked from commit da539da3780188f01e18ef106dde9ca180324c2a)

tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.opendev.org/670500
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=639f5788bf42e3ace6425f3c727751f60b880e4d
Submitter: Zuul
Branch: stable/queens

commit 639f5788bf42e3ace6425f3c727751f60b880e4d
Author: Oleg Bondarev <email address hidden>
Date: Wed Jul 10 12:39:13 2019 +0400

    Yield control to other greenthreads while processing trusted ports

    process_trusted_ports() appeared to be greenthread unfriendly, so
    if there are many trusted ports on a node, openvswitch agent may
    "hang" for a significant time.
    This patch adds explicit yield.

    Change-Id: I7c00812f877e2fc966bbac3060e1187ce1b809ca
    Closes-Bug: #1836023
    (cherry picked from commit da539da3780188f01e18ef106dde9ca180324c2a)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.opendev.org/670499
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=eabd114a9b926a95a124acec15a98a3b1d96a6c9
Submitter: Zuul
Branch: stable/rocky

commit eabd114a9b926a95a124acec15a98a3b1d96a6c9
Author: Oleg Bondarev <email address hidden>
Date: Wed Jul 10 12:39:13 2019 +0400

    Yield control to other greenthreads while processing trusted ports

    process_trusted_ports() appeared to be greenthread unfriendly, so
    if there are many trusted ports on a node, openvswitch agent may
    "hang" for a significant time.
    This patch adds explicit yield.

    Change-Id: I7c00812f877e2fc966bbac3060e1187ce1b809ca
    Closes-Bug: #1836023
    (cherry picked from commit da539da3780188f01e18ef106dde9ca180324c2a)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/stein)

Reviewed: https://review.opendev.org/670498
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=20e3f25cf76e48bc6106fba7eb5cafe9cf35fdbc
Submitter: Zuul
Branch: stable/stein

commit 20e3f25cf76e48bc6106fba7eb5cafe9cf35fdbc
Author: Oleg Bondarev <email address hidden>
Date: Wed Jul 10 12:39:13 2019 +0400

    Yield control to other greenthreads while processing trusted ports

    process_trusted_ports() appeared to be greenthread unfriendly, so
    if there are many trusted ports on a node, openvswitch agent may
    "hang" for a significant time.
    This patch adds explicit yield.

    Change-Id: I7c00812f877e2fc966bbac3060e1187ce1b809ca
    Closes-Bug: #1836023
    (cherry picked from commit da539da3780188f01e18ef106dde9ca180324c2a)

tags: added: in-stable-stein
tags: added: neutron-proactive-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 15.0.0.0b1

This issue was fixed in the openstack/neutron 15.0.0.0b1 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 14.0.3

This issue was fixed in the openstack/neutron 14.0.3 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 13.0.5

This issue was fixed in the openstack/neutron 13.0.5 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.1.1

This issue was fixed in the openstack/neutron 12.1.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/705186

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.opendev.org/705187

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.opendev.org/705188

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/stein)

Reviewed: https://review.opendev.org/705186
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=843eccb9ee481f6eb5c47a84f91f8f8487d8a7c1
Submitter: Zuul
Branch: stable/stein

commit 843eccb9ee481f6eb5c47a84f91f8f8487d8a7c1
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Wed Jul 10 18:57:02 2019 +0000

    Improve "OVSFirewallDriver.process_trusted_ports"

    When "OVSFirewallDriver.process_trusted_ports" is called with many
    ports, "_initialize_egress_no_port_security" retrieves the VIF ports
    ("Interface" registers in OVS DB) one per iteration, based on the
    port_id. If the DB is instead called only once to retrieve all the
    VIF ports, the performance increase is noticeable.
    E.g.: bridge with 1000 ports and interfaces.

    Retrieving 100 ports:
    - Bulk operation: 0.08 secs
    - Loop operation: 5.6 secs

    Retrieving 1000 ports:
    - Bulk operation: 0.08 secs
    - Loop operation: 59 secs

    Closes-Bug: #1836095
    Related-Bug: #1836023

    Change-Id: I5b259717c0fdb8991f1df86b1ef4fb8ad0f18e70
    (cherry picked from commit ae1d36fa9d8e2115a5241b5da2e941cdefa2c463)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/rocky)

Reviewed: https://review.opendev.org/705187
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7d10d29020f3b2d4c0331f8becd81677c61121a9
Submitter: Zuul
Branch: stable/rocky

commit 7d10d29020f3b2d4c0331f8becd81677c61121a9
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Wed Jul 10 18:57:02 2019 +0000

    Improve "OVSFirewallDriver.process_trusted_ports"

    When "OVSFirewallDriver.process_trusted_ports" is called with many
    ports, "_initialize_egress_no_port_security" retrieves the VIF ports
    ("Interface" registers in OVS DB) one per iteration, based on the
    port_id. If the DB is instead called only once to retrieve all the
    VIF ports, the performance increase is noticeable.
    E.g.: bridge with 1000 ports and interfaces.

    Retrieving 100 ports:
    - Bulk operation: 0.08 secs
    - Loop operation: 5.6 secs

    Retrieving 1000 ports:
    - Bulk operation: 0.08 secs
    - Loop operation: 59 secs

    Closes-Bug: #1836095
    Related-Bug: #1836023

    Change-Id: I5b259717c0fdb8991f1df86b1ef4fb8ad0f18e70
    (cherry picked from commit ae1d36fa9d8e2115a5241b5da2e941cdefa2c463)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/queens)

Reviewed: https://review.opendev.org/705188
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f26f986f1325750747f3510a6cb8f125308eba01
Submitter: Zuul
Branch: stable/queens

commit f26f986f1325750747f3510a6cb8f125308eba01
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Wed Jul 10 18:57:02 2019 +0000

    Improve "OVSFirewallDriver.process_trusted_ports"

    When "OVSFirewallDriver.process_trusted_ports" is called with many
    ports, "_initialize_egress_no_port_security" retrieves the VIF ports
    ("Interface" registers in OVS DB) one per iteration, based on the
    port_id. If the DB is instead called only once to retrieve all the
    VIF ports, the performance increase is noticeable.
    E.g.: bridge with 1000 ports and interfaces.

    Retrieving 100 ports:
    - Bulk operation: 0.08 secs
    - Loop operation: 5.6 secs

    Retrieving 1000 ports:
    - Bulk operation: 0.08 secs
    - Loop operation: 59 secs

    Closes-Bug: #1836095
    Related-Bug: #1836023

    Change-Id: I5b259717c0fdb8991f1df86b1ef4fb8ad0f18e70
    (cherry picked from commit ae1d36fa9d8e2115a5241b5da2e941cdefa2c463)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron pike-eol

This issue was fixed in the openstack/neutron pike-eol release.
