OVS agent "hangs" while processing trusted ports

Bug #1836023 reported by Oleg Bondarev on 2019-07-10
This bug affects 1 person
Affects: neutron
Importance: High
Assigned to: Oleg Bondarev

Bug Description

Queens, ovsdb native interface.

On a loaded gateway (gtw) node hosting > 1000 ports, restarting neutron-openvswitch-agent causes the agent, at some point, to stop sending state reports and stop logging for a significant time that depends on the number of ports. In our case the gtw node hosts > 1400 ports and the agent hangs for ~100 seconds. If the configured agent_down_time is less than 100 seconds, the neutron server therefore sees the agent as down and starts rescheduling resources. Once the agent stops hanging, it sees itself as "revived" and starts a new full sync. This loop is almost endless.

Debugging showed the culprit is process_trusted_ports: https://github.com/openstack/neutron/blob/13.0.4/neutron/agent/linux/openvswitch_firewall/firewall.py#L655 - this function does not yield control to other greenthreads and blocks until all trusted ports are processed. Since almost all ports on gateway nodes are "trusted" (router and dhcp ports), process_trusted_ports may take significant time.

The proposal is to add greenlet.sleep(0) inside the loop in process_trusted_ports; that fixed the issue in our environment.
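
To illustrate why an explicit yield inside the loop helps, here is a minimal sketch of cooperative scheduling. It is not the neutron code: it uses the standard library's asyncio as a stand-in for greenthreads, with `await asyncio.sleep(0)` playing the role of the zero-length sleep proposed above. The names `heartbeat`, `process_ports` and `run_agent` are hypothetical and exist only for this example.

```python
import asyncio


async def heartbeat(log, state):
    # Stands in for the agent's periodic state report (hypothetical).
    while not state["done"]:
        log.append("report")
        await asyncio.sleep(0)  # let other tasks run


async def process_ports(port_ids, log, cooperative):
    # Stands in for the per-port loop in process_trusted_ports (hypothetical).
    for port_id in port_ids:
        log.append("port-%d" % port_id)  # placeholder for per-port work
        if cooperative:
            # Explicit yield: give the scheduler a chance to run the
            # heartbeat between ports.
            await asyncio.sleep(0)


async def run_agent(cooperative, n_ports=5):
    log = []
    state = {"done": False}
    reporter = asyncio.ensure_future(heartbeat(log, state))
    await process_ports(range(n_ports), log, cooperative)
    state["done"] = True
    await reporter
    return log


if __name__ == "__main__":
    print("no yield:  ", asyncio.run(run_agent(cooperative=False)))
    print("with yield:", asyncio.run(run_agent(cooperative=True)))
```

Without the yield, the heartbeat task gets no chance to run until every port has been processed, which mirrors the missing state reports described above. With the yield, reports are interleaved with port processing. The total work is unchanged; only the scheduling differs.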

Fix proposed to branch: master
Review: https://review.opendev.org/670014

Changed in neutron:
status: New → In Progress
tags: added: loadimpact

Reviewed: https://review.opendev.org/670014
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=da539da3780188f01e18ef106dde9ca180324c2a
Submitter: Zuul
Branch: master

commit da539da3780188f01e18ef106dde9ca180324c2a
Author: Oleg Bondarev <email address hidden>
Date: Wed Jul 10 12:39:13 2019 +0400

    Yield control to other greenthreads while processing trusted ports

    process_trusted_ports() appeared to be greenthread unfriendly, so
    if there are many trusted ports on a node, openvswitch agent may
    "hang" for a significant time.
    This patch adds explicit yield.

    Change-Id: I7c00812f877e2fc966bbac3060e1187ce1b809ca
    Closes-Bug: #1836023

Changed in neutron:
status: In Progress → Fix Released

Reviewed: https://review.opendev.org/670162
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=ae1d36fa9d8e2115a5241b5da2e941cdefa2c463
Submitter: Zuul
Branch: master

commit ae1d36fa9d8e2115a5241b5da2e941cdefa2c463
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Wed Jul 10 18:57:02 2019 +0000

    Improve "OVSFirewallDriver.process_trusted_ports"

    When "OVSFirewallDriver.process_trusted_ports" is called with many ports,
    "_initialize_egress_no_port_security" retrieves the VIF ports
    ("Interface" registers in OVS DB), one per iteration, based on the
    port_id. Instead of this procedure, if the DB is called only once to
    retrieve all the VIF ports, the performance increase is noticeable.
    E.g.: bridge with 1000 ports and interfaces.

    Retrieving 100 ports:
    - Bulk operation: 0.08 secs
    - Loop operation: 5.6 secs

    Retrieving 1000 ports:
    - Bulk operation: 0.08 secs
    - Loop operation: 59 secs

    Closes-Bug: #1836095
    Related-Bug: #1836023

    Change-Id: I5b259717c0fdb8991f1df86b1ef4fb8ad0f18e70
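
The difference between the loop and bulk retrieval described in the commit above can be sketched as follows. This is an illustration under assumptions, not the neutron implementation: `FakeOvsdb`, `find_interface`, `list_interfaces`, `lookup_in_loop` and `lookup_in_bulk` are hypothetical names, and the fake database only counts round trips instead of talking to OVS.

```python
class FakeOvsdb:
    """Hypothetical stand-in for an OVS DB connection; counts round trips."""

    def __init__(self, interfaces):
        # interfaces: mapping of port_id -> "Interface" record
        self._interfaces = dict(interfaces)
        self.queries = 0

    def find_interface(self, port_id):
        # One round trip per call (the per-iteration pattern).
        self.queries += 1
        return self._interfaces.get(port_id)

    def list_interfaces(self):
        # One round trip returning every record (the bulk pattern).
        self.queries += 1
        return dict(self._interfaces)


def lookup_in_loop(db, port_ids):
    # Queries the DB once per port: cost grows with len(port_ids).
    return {pid: db.find_interface(pid) for pid in port_ids}


def lookup_in_bulk(db, port_ids):
    # Queries the DB once, then filters in memory.
    all_interfaces = db.list_interfaces()
    return {pid: all_interfaces.get(pid) for pid in port_ids}


if __name__ == "__main__":
    records = {"port-%d" % i: {"ofport": i} for i in range(1000)}
    wanted = list(records)

    loop_db = FakeOvsdb(records)
    lookup_in_loop(loop_db, wanted)
    bulk_db = FakeOvsdb(records)
    lookup_in_bulk(bulk_db, wanted)
    print("loop queries:", loop_db.queries, "bulk queries:", bulk_db.queries)
```

Both functions return the same mapping; the loop variant issues one query per port while the bulk variant issues a single query regardless of the number of ports, which is consistent with the roughly linear loop timings and flat bulk timings quoted in the commit message.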

Reviewed: https://review.opendev.org/670501
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=fe403825ab08728e8b2d0fe15cfba3b452921f57
Submitter: Zuul
Branch: stable/pike

commit fe403825ab08728e8b2d0fe15cfba3b452921f57
Author: Oleg Bondarev <email address hidden>
Date: Wed Jul 10 12:39:13 2019 +0400

    Yield control to other greenthreads while processing trusted ports

    process_trusted_ports() appeared to be greenthread unfriendly, so
    if there are many trusted ports on a node, openvswitch agent may
    "hang" for a significant time.
    This patch adds explicit yield.

    Change-Id: I7c00812f877e2fc966bbac3060e1187ce1b809ca
    Closes-Bug: #1836023
    (cherry picked from commit da539da3780188f01e18ef106dde9ca180324c2a)

tags: added: in-stable-pike

Reviewed: https://review.opendev.org/670500
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=639f5788bf42e3ace6425f3c727751f60b880e4d
Submitter: Zuul
Branch: stable/queens

commit 639f5788bf42e3ace6425f3c727751f60b880e4d
Author: Oleg Bondarev <email address hidden>
Date: Wed Jul 10 12:39:13 2019 +0400

    Yield control to other greenthreads while processing trusted ports

    process_trusted_ports() appeared to be greenthread unfriendly, so
    if there are many trusted ports on a node, openvswitch agent may
    "hang" for a significant time.
    This patch adds explicit yield.

    Change-Id: I7c00812f877e2fc966bbac3060e1187ce1b809ca
    Closes-Bug: #1836023
    (cherry picked from commit da539da3780188f01e18ef106dde9ca180324c2a)

tags: added: in-stable-queens

Reviewed: https://review.opendev.org/670499
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=eabd114a9b926a95a124acec15a98a3b1d96a6c9
Submitter: Zuul
Branch: stable/rocky

commit eabd114a9b926a95a124acec15a98a3b1d96a6c9
Author: Oleg Bondarev <email address hidden>
Date: Wed Jul 10 12:39:13 2019 +0400

    Yield control to other greenthreads while processing trusted ports

    process_trusted_ports() appeared to be greenthread unfriendly, so
    if there are many trusted ports on a node, openvswitch agent may
    "hang" for a significant time.
    This patch adds explicit yield.

    Change-Id: I7c00812f877e2fc966bbac3060e1187ce1b809ca
    Closes-Bug: #1836023
    (cherry picked from commit da539da3780188f01e18ef106dde9ca180324c2a)

tags: added: in-stable-rocky

Reviewed: https://review.opendev.org/670498
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=20e3f25cf76e48bc6106fba7eb5cafe9cf35fdbc
Submitter: Zuul
Branch: stable/stein

commit 20e3f25cf76e48bc6106fba7eb5cafe9cf35fdbc
Author: Oleg Bondarev <email address hidden>
Date: Wed Jul 10 12:39:13 2019 +0400

    Yield control to other greenthreads while processing trusted ports

    process_trusted_ports() appeared to be greenthread unfriendly, so
    if there are many trusted ports on a node, openvswitch agent may
    "hang" for a significant time.
    This patch adds explicit yield.

    Change-Id: I7c00812f877e2fc966bbac3060e1187ce1b809ca
    Closes-Bug: #1836023
    (cherry picked from commit da539da3780188f01e18ef106dde9ca180324c2a)

tags: added: in-stable-stein
tags: added: neutron-proactive-backport-potential

This issue was fixed in the openstack/neutron 15.0.0.0b1 development milestone.

This issue was fixed in the openstack/neutron 14.0.3 release.

This issue was fixed in the openstack/neutron 13.0.5 release.

This issue was fixed in the openstack/neutron 12.1.1 release.
