OVN metadata agent can be slow with a large number of subnets

Bug #1981113 reported by Miro Tomaska
This bug affects 1 person
Affects | Status       | Importance | Assigned to  | Milestone
neutron | Fix Released | High       | Miro Tomaska |

Bug Description

The OVN metadata agent can take a very long time (observed ~40 s) to add CIDRs to the tap interface in a metadata namespace when a network consists of many subnets (observed ~1700 subnets). Because of this long processing time, ovn-metadata-agent may not have haproxy ready by the time the first VM's cloud-init requests its metadata. As a result, the VM does not receive the metadata it needs to operate properly.
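
A quick back-of-the-envelope reading of the figures above helps size the problem. It assumes that provisioning time grows linearly with the subnet count, with one address add per subnet CIDR. Under that assumption, 40 s / 1700 works out to about 23.5 ms per subnet. The helper names below are illustrative only.

```python
# Back-of-the-envelope estimate from the figures reported above.
# Assumption: cost is linear in the number of subnet CIDRs added.
OBSERVED_SECONDS = 40.0   # "~40s" from the report
OBSERVED_SUBNETS = 1700   # "~1700 subnets" from the report


def per_subnet_cost(seconds: float, subnets: int) -> float:
    """Average time spent per subnet CIDR, in seconds."""
    return seconds / subnets


def projected_seconds(subnets: int, cost: float) -> float:
    """Projected provisioning time for a network with `subnets` CIDRs."""
    return subnets * cost


if __name__ == "__main__":
    cost = per_subnet_cost(OBSERVED_SECONDS, OBSERVED_SUBNETS)
    print(f"~{cost * 1000:.1f} ms per subnet")
    for n in (10, 100, 1700, 5000):
        print(f"{n:5d} subnets -> ~{projected_seconds(n, cost):.1f} s")
```

Under this linear model, every additional 100 subnets adds roughly 2.4 s before haproxy is ready for the first VM.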

Steps to reproduce:
- Create a network with hundreds or thousands of subnets. The more subnets, the more obvious the problem.
- Create a VM connected to this network. Make sure it is the first VM on the compute node (hypervisor) it is deployed to.
- Once the VM is created, observe that its cloud-init requests to 169.254.169.254/openstack time out with no response.
- Inspect the ovn-metadata-agent log and notice that this is caused by ovn-metadata-agent taking a very long time to process [1].
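
To confirm the symptom from inside the guest, a small probe along these lines may help. This is a sketch. It assumes Python 3 is available in the guest and that the OpenStack metadata path is served at http://169.254.169.254/openstack. The function name and the 5 s default timeout are arbitrary choices and do not reflect cloud-init's own retry behavior.

```python
import urllib.error
import urllib.request

# Link-local address of the OpenStack metadata service.
METADATA_URL = "http://169.254.169.254/openstack"


def metadata_reachable(url: str = METADATA_URL, timeout: float = 5.0) -> bool:
    """Return True if `url` answers with an HTTP 2xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # URLError covers HTTP error statuses (HTTPError is a subclass),
        # refused connections and name-resolution failures; OSError covers
        # socket timeouts; ValueError covers malformed URLs.
        return False


if __name__ == "__main__":
    print("metadata reachable:", metadata_reachable())
```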

Possible solutions:
1. (Low-hanging fruit?) See if there is a way to improve the execution time of the `ip.add` call. Perhaps passing a list of CIDRs instead of a single CIDR at a time would improve performance?
2. (More involved) Refactor the code so that ovn-metadata-agent adds only the CIDR that belongs to the VM being created. The current implementation instead unconditionally adds all CIDRs of the network when the first VM is created.
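
Option 2 boils down to selecting, from all of the network's subnet CIDRs, only those that contain the new port's fixed IPs. Below is a minimal sketch using Python's standard ipaddress module. The function name and input shapes are hypothetical and are not taken from the agent.

```python
import ipaddress
from typing import Iterable, List


def cidrs_for_port(port_ips: Iterable[str], network_cidrs: Iterable[str]) -> List[str]:
    """Return the network CIDRs that contain at least one of the port's IPs."""
    ips = [ipaddress.ip_address(ip) for ip in port_ips]
    selected = []
    for cidr in network_cidrs:
        net = ipaddress.ip_network(cidr)
        # Membership is simply False when IP versions differ (v4 vs v6).
        if any(ip in net for ip in ips):
            selected.append(cidr)
    return selected


if __name__ == "__main__":
    # 1700 hypothetical /24 subnets on one network: 10.0.0.0/24 ... 10.6.163.0/24
    network_cidrs = [f"10.{i // 256}.{i % 256}.0/24" for i in range(1700)]
    print(cidrs_for_port(["10.3.5.20"], network_cidrs))  # ['10.3.5.0/24']
```

The scan is still linear in the number of subnets, but it is pure in-memory work. Only the selected CIDR would then need a privileged address add.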

[1] https://github.com/openstack/neutron/blob/41bf8054017c72815226d5df50fd321b30fcba13/neutron/agent/ovn/metadata/agent.py#L488-L495

Miro Tomaska (mtomaska)
Changed in neutron:
assignee: nobody → Miro Tomaska (mtomaska)
description: updated
Changed in neutron:
importance: Undecided → High
status: New → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/855677

Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/861124

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/855677
Committed: https://opendev.org/openstack/neutron/commit/81980146cbdcc42d2398f4e777e659fc59c3b55f
Submitter: "Zuul (22348)"
Branch: master

commit 81980146cbdcc42d2398f4e777e659fc59c3b55f
Author: Miro Tomaska <email address hidden>
Date: Fri Sep 2 10:26:11 2022 -0500

    Add and delete multiple ip addresses in one priv call

    Created new add_ip_addresses privileged function
    which takes an iterable of cidrs and adds them
    in one privileged call. This is so we don't have to
    take on additional priv overhead when calling
    add_ip_address in a loop.
    For parity, performed the same change on the
    delete_ip_address function.

    Closes-Bug: #1987281
    Partial-Bug: #1981113
    Change-Id: Ib1278af20c3b3b057712453cb249aba34b684a21
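
The commit above replaces N privileged round trips with one. The sketch below only models that cost. FakePrivChannel, add_in_loop and add_batched are hypothetical stand-ins; they do not reflect neutron's privsep code or the real add_ip_addresses signature.

```python
from typing import Callable, Iterable, Set


class FakePrivChannel:
    """Hypothetical stand-in for a channel to a privileged helper process.

    Each call() models one round trip, i.e. the fixed per-call overhead
    that batching avoids.
    """

    def __init__(self) -> None:
        self.round_trips = 0
        self.addresses: Set[str] = set()

    def call(self, fn: Callable[..., None], *args) -> None:
        self.round_trips += 1
        fn(self, *args)


def _add_one(chan: FakePrivChannel, cidr: str) -> None:
    chan.addresses.add(cidr)


def _add_many(chan: FakePrivChannel, cidrs: Iterable[str]) -> None:
    # Every CIDR is handled inside the same privileged invocation.
    chan.addresses.update(cidrs)


def add_in_loop(chan: FakePrivChannel, cidrs: Iterable[str]) -> None:
    """Old pattern: one privileged call per CIDR."""
    for cidr in cidrs:
        chan.call(_add_one, cidr)


def add_batched(chan: FakePrivChannel, cidrs: Iterable[str]) -> None:
    """New pattern: one privileged call for the whole iterable."""
    chan.call(_add_many, list(cidrs))


if __name__ == "__main__":
    cidrs = [f"10.{i // 256}.{i % 256}.1/24" for i in range(1700)]
    loop, batch = FakePrivChannel(), FakePrivChannel()
    add_in_loop(loop, cidrs)
    add_batched(batch, cidrs)
    print(loop.round_trips, batch.round_trips)  # 1700 1
```

Both paths end with the same set of addresses. Only the number of privileged invocations differs, and that count is what dominated the runtime in the loop-based version.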

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/c/openstack/neutron/+/861124
Committed: https://opendev.org/openstack/neutron/commit/edf48e46a1f0227f84b05ab39da005393e5fa73f
Submitter: "Zuul (22348)"
Branch: master

commit edf48e46a1f0227f84b05ab39da005393e5fa73f
Author: Miro Tomaska <email address hidden>
Date: Wed Oct 12 08:42:18 2022 -0500

    Improve agent provision performance for large networks

    Before this patch, the metadata agent would provision network namespace
    for all subnets under a network(datapath) as soon as the first
    VM(vif port) was mounted on the chassis. This operation can take very
    long time for networks with lots of subnets. See the linked bug for
    more details.
    This patch changes this mechanism to "lazy load" where metadata agent
    provisions metadata namespace with only the subnets belonging to the
    active ports on the chassis. This results in virtually constant
    throughput not affected by the number of subnets.

    Closes-Bug: #1981113
    Change-Id: Ia2a66cfd3fd1380c5204109742d44f09160548d2
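
One way to picture the "lazy load" approach is to derive the wanted CIDRs from the ports that are active on the chassis, then diff them against what the namespace already has. This is a sketch built on assumptions. The port-to-CIDR mapping and the reconcile helper are hypothetical. Removing CIDRs that no active port uses is this sketch's own choice, not something the commit message states.

```python
from typing import Dict, Iterable, Set, Tuple


def reconcile(provisioned: Set[str],
              active_ports: Dict[str, Iterable[str]]) -> Tuple[Set[str], Set[str]]:
    """Bring `provisioned` in line with the subnets of the active ports.

    `active_ports` maps a port ID to the subnet CIDRs its fixed IPs belong to.
    Returns (added, removed); `provisioned` is updated in place.
    """
    wanted: Set[str] = set()
    for cidrs in active_ports.values():
        wanted.update(cidrs)
    to_add = wanted - provisioned
    to_remove = provisioned - wanted
    provisioned |= to_add      # in-place set update
    provisioned -= to_remove   # in-place set update
    return to_add, to_remove


if __name__ == "__main__":
    namespace: Set[str] = set()
    # First VM on the chassis: only its own subnet is provisioned,
    # regardless of how many subnets the network has.
    print(reconcile(namespace, {"port-a": ["10.3.5.0/24"]}))  # ({'10.3.5.0/24'}, set())
    # A second VM on the same subnet needs no new addresses.
    print(reconcile(namespace, {"port-a": ["10.3.5.0/24"],
                                "port-b": ["10.3.5.0/24"]}))  # (set(), set())
```

With this shape, the work per event scales with the subnets used by ports on the chassis rather than with the network's total subnet count.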

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/zed)

Fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/neutron/+/869940

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/neutron/+/869941

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/neutron/+/869942

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/neutron/+/869943

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/neutron/+/869944

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/neutron/+/869945

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/zed)

Fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/neutron/+/870474

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/neutron/+/870504

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/zed)

Change abandoned by "Miro Tomaska <email address hidden>" on branch: stable/zed
Review: https://review.opendev.org/c/openstack/neutron/+/869940
Reason: Many backport conflicts outweigh the benefit of backporting.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/yoga)

Change abandoned by "Miro Tomaska <email address hidden>" on branch: stable/yoga
Review: https://review.opendev.org/c/openstack/neutron/+/869941
Reason: Many backport conflicts outweigh the benefit of backporting.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/xena)

Change abandoned by "Miro Tomaska <email address hidden>" on branch: stable/xena
Review: https://review.opendev.org/c/openstack/neutron/+/869942
Reason: Many backport conflicts outweigh the benefit of backporting.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/wallaby)

Change abandoned by "Miro Tomaska <email address hidden>" on branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/neutron/+/869943
Reason: Many backport conflicts outweigh the benefit of backporting.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/victoria)

Change abandoned by "Miro Tomaska <email address hidden>" on branch: stable/victoria
Review: https://review.opendev.org/c/openstack/neutron/+/869944
Reason: Many backport conflicts outweigh the benefit of backporting.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/ussuri)

Change abandoned by "Miro Tomaska <email address hidden>" on branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/neutron/+/869945
Reason: Many backport conflicts outweigh the benefit of backporting.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/neutron/+/870785

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/neutron/+/870786

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/neutron/+/870787

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/victoria)

Change abandoned by "Miro Tomaska <email address hidden>" on branch: stable/victoria
Review: https://review.opendev.org/c/openstack/neutron/+/870787
Reason: Decided to backport only to Wallaby. The potential backport problems outweigh the benefit of this patch.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/870474
Committed: https://opendev.org/openstack/neutron/commit/bbaca0c86b3cb12aa3414a5e2921f86a8d01de7f
Submitter: "Zuul (22348)"
Branch: stable/zed

commit bbaca0c86b3cb12aa3414a5e2921f86a8d01de7f
Author: Miro Tomaska <email address hidden>
Date: Wed Oct 12 08:42:18 2022 -0500

    Improve agent provision performance for large networks

    Before this patch, the metadata agent would provision network namespace
    for all subnets under a network(datapath) as soon as the first
    VM(vif port) was mounted on the chassis. This operation can take very
    long time for networks with lots of subnets. See the linked bug for
    more details.
    This patch changes this mechanism to "lazy load" where metadata agent
    provisions metadata namespace with only the subnets belonging to the
    active ports on the chassis. This results in virtually constant
    throughput not affected by the number of subnets.

    Merge Conflict:
            Implicitly removes now dead function
            ensure_all_networks_provisioned
            neutron/agent/ovn/metadata/agent.py

    Closes-Bug: #1981113
    Change-Id: Ia2a66cfd3fd1380c5204109742d44f09160548d2
    (cherry picked from commit edf3b3f191c2eae229a754dcbfc448fa41bd8bc3)

tags: added: in-stable-zed
tags: added: in-stable-yoga
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/870504
Committed: https://opendev.org/openstack/neutron/commit/1d9ce0406883dc9ec386a01eba826af6fde6ceb1
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 1d9ce0406883dc9ec386a01eba826af6fde6ceb1
Author: Miro Tomaska <email address hidden>
Date: Wed Oct 12 08:42:18 2022 -0500

    Improve agent provision performance for large networks

    Before this patch, the metadata agent would provision network namespace
    for all subnets under a network(datapath) as soon as the first
    VM(vif port) was mounted on the chassis. This operation can take very
    long time for networks with lots of subnets. See the linked bug for
    more details.
    This patch changes this mechanism to "lazy load" where metadata agent
    provisions metadata namespace with only the subnets belonging to the
    active ports on the chassis. This results in virtually constant
    throughput not affected by the number of subnets.

    Merge Conflict:
            Using datapath_uuid :str in addition to net_name for
            teardown_datapath method to remain compatible with the
            method implementation in Yoga and before. Updated unit
            tests accordingly
            neutron/agent/ovn/metadata/agent.py
            neutron/tests/unit/agent/ovn/metadata/test_agent.py

    Closes-Bug: #1981113
    Change-Id: Ia2a66cfd3fd1380c5204109742d44f09160548d2
    (cherry picked from commit edf3b3f191c2eae229a754dcbfc448fa41bd8bc3)

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/870785
Committed: https://opendev.org/openstack/neutron/commit/44f95a48fc6346d009528245dac472f87131ce0a
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 44f95a48fc6346d009528245dac472f87131ce0a
Author: Miro Tomaska <email address hidden>
Date: Wed Oct 12 08:42:18 2022 -0500

    Improve agent provision performance for large networks

    Before this patch, the metadata agent would provision network namespace
    for all subnets under a network(datapath) as soon as the first
    VM(vif port) was mounted on the chassis. This operation can take very
    long time for networks with lots of subnets. See the linked bug for
    more details.
    This patch changes this mechanism to "lazy load" where metadata agent
    provisions metadata namespace with only the subnets belonging to the
    active ports on the chassis. This results in virtually constant
    throughput not affected by the number of subnets.

    Merge Conflict:
            Using datapath_uuid :str in addition to net_name for
            teardown_datapath method to remain compatible with the
            method implementation in Yoga and before. Updated unit
            tests accordingly
            neutron/agent/ovn/metadata/agent.py
            neutron/tests/unit/agent/ovn/metadata/test_agent.py

    Closes-Bug: #1981113
    Change-Id: Ia2a66cfd3fd1380c5204109742d44f09160548d2
    (cherry picked from commit edf3b3f191c2eae229a754dcbfc448fa41bd8bc3)

tags: added: in-stable-xena
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/870786
Committed: https://opendev.org/openstack/neutron/commit/9f90d18ad2bc5209c3caece51898401851ee738b
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 9f90d18ad2bc5209c3caece51898401851ee738b
Author: Miro Tomaska <email address hidden>
Date: Wed Oct 12 08:42:18 2022 -0500

    Improve agent provision performance for large networks

    Before this patch, the metadata agent would provision network namespace
    for all subnets under a network(datapath) as soon as the first
    VM(vif port) was mounted on the chassis. This operation can take very
    long time for networks with lots of subnets. See the linked bug for
    more details.
    This patch changes this mechanism to "lazy load" where metadata agent
    provisions metadata namespace with only the subnets belonging to the
    active ports on the chassis. This results in virtually constant
    throughput not affected by the number of subnets.

    Merge Conflict:
            Using datapath_uuid :str in addition to net_name for
            teardown_datapath method to remain compatible with the
            method implementation in Yoga and before. Updated unit
            tests accordingly
            neutron/agent/ovn/metadata/agent.py
            neutron/tests/unit/agent/ovn/metadata/test_agent.py

    Closes-Bug: #1981113
    Change-Id: Ia2a66cfd3fd1380c5204109742d44f09160548d2
    (cherry picked from commit edf3b3f191c2eae229a754dcbfc448fa41bd8bc3)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 19.5.0

This issue was fixed in the openstack/neutron 19.5.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 22.0.0.0rc1

This issue was fixed in the openstack/neutron 22.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 20.3.0

This issue was fixed in the openstack/neutron 20.3.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 21.1.0

This issue was fixed in the openstack/neutron 21.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron wallaby-eom

This issue was fixed in the openstack/neutron wallaby-eom release.
