Port create time grows at scale due to dvr arp update

Bug #1614452 reported by Oleg Bondarev
This bug affects 1 person
Affects | Status       | Importance | Assigned to   | Milestone
neutron | Fix Released | High       | Oleg Bondarev |

Bug Description

Scale tests show that VMs sometimes fail to spawn because port creation times out.
Neutron server logs show that port creation time grows because DVR ARP table updates are sent one by one to each L3 DVR agent hosting the router. This accounts for more than 90% of the time: http://paste.openstack.org/show/560761/

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/357052

Changed in neutron:
status: Confirmed → In Progress
Armando Migliaccio (armando-migliaccio) wrote :

Kevin mentioned something similar and that's where [1] stemmed from. There might be something else going on.

[1] https://review.openstack.org/#/c/355078/

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/357052
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=4bdab5cf1da333cf4e7aaf893e14b094fc5fad61
Submitter: Jenkins
Branch: master

commit 4bdab5cf1da333cf4e7aaf893e14b094fc5fad61
Author: Oleg Bondarev <email address hidden>
Date: Thu Aug 18 11:55:33 2016 +0300

    L3 DVR: use fanout when sending dvr arp table update

    Sending arp update to each l3 dvr agent one by one on every port
    creation is not scalable and causes serious performance degradation
    if router is hosted on lots of l3 dvr agents on compute nodes (see
    bug report). This increases port creation time and eventually leads
    to timeouts in Nova and VMs going to ERROR state.

    This patch changes notification to be fanout.
    The downside is that with fanout the arp notification will be sent to
    each l3 agent, even those not hosting the router. However such agents
    will just skip the notification if not hosting the router - this should
    be quite cheap.

    Closes-Bug: #1614452
    Change-Id: I1fb533d7804b131f709b790fc730ed7b97cb5499
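
The trade-off described in the commit message can be sketched with a small, self-contained simulation. This is not Neutron code: FakeL3Agent, FakeTransport, notify_one_by_one, notify_fanout and compare are hypothetical names made up for illustration, and the transport only counts how many messages the server has to send. It assumes the message bus handles fanout delivery, so the server pays for a single send regardless of the number of agents.

```python
from dataclasses import dataclass, field


@dataclass
class FakeL3Agent:
    """Hypothetical stand-in for an L3 DVR agent; not Neutron's real class."""
    host: str
    hosted_routers: set = field(default_factory=set)
    arp_table: dict = field(default_factory=dict)

    def add_arp_entry(self, router_id, ip, mac):
        # An agent that does not host the router skips the notification.
        if router_id not in self.hosted_routers:
            return False
        self.arp_table[(router_id, ip)] = mac
        return True


class FakeTransport:
    """Hypothetical message bus that counts sends issued by the server."""

    def __init__(self, agents):
        self.agents = {agent.host: agent for agent in agents}
        self.server_sends = 0

    def cast_to_host(self, host, router_id, ip, mac):
        # One server-side send per targeted agent.
        self.server_sends += 1
        self.agents[host].add_arp_entry(router_id, ip, mac)

    def cast_fanout(self, router_id, ip, mac):
        # One server-side send; the bus delivers a copy to every agent.
        self.server_sends += 1
        for agent in self.agents.values():
            agent.add_arp_entry(router_id, ip, mac)


def notify_one_by_one(transport, router_id, ip, mac):
    """Old behaviour: a separate send to each agent hosting the router."""
    for agent in transport.agents.values():
        if router_id in agent.hosted_routers:
            transport.cast_to_host(agent.host, router_id, ip, mac)


def notify_fanout(transport, router_id, ip, mac):
    """New behaviour: a single fanout send; non-hosting agents ignore it."""
    transport.cast_fanout(router_id, ip, mac)


def compare(num_agents, num_hosting):
    """Return (server sends one by one, server sends with fanout)."""
    def build():
        agents = [
            FakeL3Agent(f"compute-{i}", {"r1"} if i < num_hosting else set())
            for i in range(num_agents)
        ]
        return FakeTransport(agents)

    one_by_one, fanout = build(), build()
    notify_one_by_one(one_by_one, "r1", "10.0.0.5", "fa:16:3e:00:00:01")
    notify_fanout(fanout, "r1", "10.0.0.5", "fa:16:3e:00:00:01")
    return one_by_one.server_sends, fanout.server_sends
```

In this model, compare(100, 80) returns (80, 1): the per-agent approach costs the server one send per hosting agent on every port creation, while fanout costs one send and moves the filtering to the agents, which is the cheap skip the commit message refers to.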

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/360732

tags: added: mitaka-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/mitaka)

Reviewed: https://review.openstack.org/360732
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=d56016425d85118dba8a8123cd357b4082478278
Submitter: Jenkins
Branch: stable/mitaka

commit d56016425d85118dba8a8123cd357b4082478278
Author: Oleg Bondarev <email address hidden>
Date: Thu Aug 18 11:55:33 2016 +0300

    L3 DVR: use fanout when sending dvr arp table update

    Sending arp update to each l3 dvr agent one by one on every port
    creation is not scalable and causes serious performance degradation
    if router is hosted on lots of l3 dvr agents on compute nodes (see
    bug report). This increases port creation time and eventually leads
    to timeouts in Nova and VMs going to ERROR state.

    This patch changes notification to be fanout.
    The downside is that with fanout the arp notification will be sent to
    each l3 agent, even those not hosting the router. However such agents
    will just skip the notification if not hosting the router - this should
    be quite cheap.

    Closes-Bug: #1614452
    Change-Id: I1fb533d7804b131f709b790fc730ed7b97cb5499
    (cherry picked from commit 4bdab5cf1da333cf4e7aaf893e14b094fc5fad61)

tags: added: in-stable-mitaka
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 9.0.0.0b3

This issue was fixed in the openstack/neutron 9.0.0.0b3 development milestone.

tags: removed: mitaka-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 8.3.0

This issue was fixed in the openstack/neutron 8.3.0 release.
