dvr router slow response during port update

Bug #1830456 reported by norman shen
This bug affects 2 people
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: norman shen

Bug Description

We have a distributed router that is used by hundreds of virtual machines scattered across roughly 150 compute nodes. When nova sends a port update request to neutron, the request generally takes nearly 4 minutes to complete.

Neutron version is openstack Queens 12.0.5.

I found the following log entry printed by neutron-server:

2019-05-25 05:24:16,285.285 11834 INFO neutron.wsgi [req-xxxx xxxxx - default default] x.x.x.x "PUT /v2.0/ports/8c252d91-741a-4627-9600-916d1da5178f HTTP/1.1" status: 200 len: 0 time: 233.6103470

You can see that the request takes about 234 seconds (nearly 4 minutes) to finish.
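
To find other slow requests of this kind, the `time:` field can be pulled out of the neutron.wsgi lines. The following is a minimal sketch: the regular expression is derived only from the single sample line quoted above, so other deployments or log formats may need a different pattern, and `slow_requests` and the 60-second threshold are made up for illustration.

```python
import re

# Pattern for the request summary logged by neutron.wsgi, based only on the
# one sample line in this report; treat the exact layout as an assumption.
WSGI_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" '
    r'status: (?P<status>\d+)\s+len: (?P<length>\d+)\s+time: (?P<time>[\d.]+)'
)


def slow_requests(lines, threshold_seconds):
    """Return (method, path, seconds) for requests slower than the threshold."""
    hits = []
    for line in lines:
        match = WSGI_LINE.search(line)
        if match is None:
            continue  # not a request summary line
        seconds = float(match.group("time"))
        if seconds > threshold_seconds:
            hits.append((match.group("method"), match.group("path"), seconds))
    return hits


SAMPLE = (
    '2019-05-25 05:24:16,285.285 11834 INFO neutron.wsgi '
    '[req-xxxx xxxxx - default default] x.x.x.x '
    '"PUT /v2.0/ports/8c252d91-741a-4627-9600-916d1da5178f HTTP/1.1" '
    'status: 200 len: 0 time: 233.6103470'
)

for method, path, seconds in slow_requests([SAMPLE], threshold_seconds=60.0):
    print(f"{method} {path} took {seconds:.1f}s")
```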

Right now I suspect that this code snippet causes the issue: https://github.com/openstack/neutron/blob/de59a21754747335d0d9d26082c7f0df105a30c9/neutron/db/l3_dvrscheduler_db.py#L139

Revision history for this message
Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

So if I understand correctly, this only happens when the VM is created first and the subnet is then associated with the router.

Based on what you mention here, it seems that checking for the connected routers is what takes the time.

We should probably check whether there are connected routers for the router_id before going through the other hosts and updating all of them.

We should probably check with this function first: self._get_other_dvr_router_ids_connected_router
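
The suggestion above amounts to an early exit: look up the connected routers once, and walk the host list only when that lookup returns something. A minimal sketch of that control flow is below. Everything in it is hypothetical (the `FakePlugin` class and its methods `get_connected_router_ids`, `hosts_for_router` and `notify_host`); it is a toy stand-in, not the neutron code, and only `_get_other_dvr_router_ids_connected_router` is a name taken from this thread.

```python
class FakePlugin:
    """Toy stand-in for the L3 DVR scheduler; not neutron's real class."""

    def __init__(self, connected, hosts):
        self._connected = connected  # router_id -> set of connected router ids
        self._hosts = hosts          # router_id -> list of host names
        self.notified = []           # (host, router_id) pairs, for inspection

    def get_connected_router_ids(self, router_id):
        # Hypothetical analogue of _get_other_dvr_router_ids_connected_router.
        return self._connected.get(router_id, set())

    def hosts_for_router(self, router_id):
        return self._hosts.get(router_id, [])

    def notify_host(self, host, router_id):
        self.notified.append((host, router_id))


def update_connected_routers(plugin, router_id):
    """Notify hosts about connected routers; skip the walk when there are none."""
    connected = plugin.get_connected_router_ids(router_id)
    if not connected:
        # Cheap check first: nothing is connected, so no host needs an update.
        return 0
    count = 0
    for host in plugin.hosts_for_router(router_id):
        for other_id in sorted(connected):
            plugin.notify_host(host, other_id)
            count += 1
    return count
```

With this shape, a router that has no connected routers costs one lookup regardless of how many hosts it runs on.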

Changed in neutron:
status: New → Confirmed
Changed in neutron:
importance: Undecided → Medium
tags: added: l3-dvr-backlog
Changed in neutron:
assignee: nobody → Swaminathan Vasudevan (swaminathan-vasudevan)
norman shen (jshen28)
Changed in neutron:
assignee: Swaminathan Vasudevan (swaminathan-vasudevan) → norman shen (jshen28)
Changed in neutron:
status: Confirmed → In Progress
norman shen (jshen28)
Changed in neutron:
status: In Progress → Fix Committed
Revision history for this message
norman shen (jshen28) wrote :

A patch has been submitted here: https://review.opendev.org/#/c/661522/

Changed in neutron:
status: Fix Committed → In Progress
Revision history for this message
norman shen (jshen28) wrote :

My observation is that `_check_for_rtr_serviceable_ports`, which calls `get_subnet_ids_on_router` internally, takes about a second to return. Connected DVR routers might also affect performance, but I do not know, since we have not tested scenarios where multiple routers share one subnet.
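
If a lookup that costs about a second is repeated once per host, roughly 150 hosts would put the total in the range of minutes, which is the same order as the 233 seconds in the log from the bug description. That is an estimate, not a measurement. The sketch below shows the general remedy of moving a lookup out of the per-host loop. It assumes the lookup result depends only on the router and not on the host; `CountingLookup` and both helper functions are invented for illustration and do not mirror neutron's actual code.

```python
class CountingLookup:
    """Toy stand-in for a slow, host-independent query; counts its calls."""

    def __init__(self, subnets_by_router):
        self._subnets_by_router = subnets_by_router
        self.calls = 0

    def subnet_ids_on_router(self, router_id):
        self.calls += 1
        return self._subnets_by_router[router_id]


def hosts_needing_router_naive(lookup, router_id, ports_by_host):
    """Repeat the router lookup for every host (the slow shape)."""
    result = []
    for host, port_subnets in ports_by_host.items():
        subnet_ids = set(lookup.subnet_ids_on_router(router_id))
        if subnet_ids & set(port_subnets):
            result.append(host)
    return result


def hosts_needing_router_hoisted(lookup, router_id, ports_by_host):
    """Do the router lookup once, then only compare sets per host."""
    subnet_ids = set(lookup.subnet_ids_on_router(router_id))
    return [host for host, port_subnets in ports_by_host.items()
            if subnet_ids & set(port_subnets)]


# 150 made-up hosts; odd-numbered ones have a port on the router's subnet.
ports_by_host = {
    f"compute-{i}": (["subnet-a"] if i % 2 else ["subnet-z"])
    for i in range(150)
}
naive = CountingLookup({"router-1": ["subnet-a"]})
hoisted = CountingLookup({"router-1": ["subnet-a"]})

same = (hosts_needing_router_naive(naive, "router-1", ports_by_host)
        == hosts_needing_router_hoisted(hoisted, "router-1", ports_by_host))
print(same, naive.calls, hoisted.calls)  # True 150 1
```

Both functions return the same hosts; only the number of slow lookups differs.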

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/661522
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=00eb6f26f6165a00d647d2bf35fb7996534cfc09
Submitter: Zuul
Branch: master

commit 00eb6f26f6165a00d647d2bf35fb7996534cfc09
Author: shenjiatong <email address hidden>
Date: Mon May 27 11:26:49 2019 +0800

    improve dvr port update under large scale deployment

    update port may takes an excessive number of seconds
    to complete if dvr routers are running on more than 100
    compute nodes. This patch tries to save some time by removing
    unnecessary calls inside looping through hosts.

    Change-Id: Ide740e0c5c43c2d2b842460a37c8ce125da12b28
    Closes-Bug: #1830456

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/663167

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/663311

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.opendev.org/663433

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/663461

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.opendev.org/663167
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=fc0e0a5725a2331e6ccd9707701198de2073da1c
Submitter: Zuul
Branch: stable/queens

commit fc0e0a5725a2331e6ccd9707701198de2073da1c
Author: shenjiatong <email address hidden>
Date: Mon May 27 11:26:49 2019 +0800

    improve dvr port update under large scale deployment

    update port may takes an excessive number of seconds
    to complete if dvr routers are running on more than 100
    compute nodes. This patch tries to save some time by removing
    unnecessary calls inside looping through hosts.

    Change-Id: Ide740e0c5c43c2d2b842460a37c8ce125da12b28
    Closes-Bug: #1830456
    (cherry picked from commit 00eb6f26f6165a00d647d2bf35fb7996534cfc09)

tags: added: in-stable-queens
tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.opendev.org/663461
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=cace9239514c76a76a4172199e6823fd02aa7f06
Submitter: Zuul
Branch: stable/rocky

commit cace9239514c76a76a4172199e6823fd02aa7f06
Author: shenjiatong <email address hidden>
Date: Mon May 27 11:26:49 2019 +0800

    improve dvr port update under large scale deployment

    update port may takes an excessive number of seconds
    to complete if dvr routers are running on more than 100
    compute nodes. This patch tries to save some time by removing
    unnecessary calls inside looping through hosts.

    Change-Id: Ide740e0c5c43c2d2b842460a37c8ce125da12b28
    Closes-Bug: #1830456
    (cherry picked from commit 00eb6f26f6165a00d647d2bf35fb7996534cfc09)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/stein)

Reviewed: https://review.opendev.org/663311
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=cb172b4c204f823d73f4b020aff39b4d09f4d7d9
Submitter: Zuul
Branch: stable/stein

commit cb172b4c204f823d73f4b020aff39b4d09f4d7d9
Author: shenjiatong <email address hidden>
Date: Mon May 27 11:26:49 2019 +0800

    improve dvr port update under large scale deployment

    update port may takes an excessive number of seconds
    to complete if dvr routers are running on more than 100
    compute nodes. This patch tries to save some time by removing
    unnecessary calls inside looping through hosts.

    Change-Id: Ide740e0c5c43c2d2b842460a37c8ce125da12b28
    Closes-Bug: #1830456
    (cherry picked from commit 00eb6f26f6165a00d647d2bf35fb7996534cfc09)

tags: added: in-stable-stein
tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/pike)

Reviewed: https://review.opendev.org/663433
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=4f2dd7c3f304b0bb5562cbe7dbd8241eab6a2b72
Submitter: Zuul
Branch: stable/pike

commit 4f2dd7c3f304b0bb5562cbe7dbd8241eab6a2b72
Author: shenjiatong <email address hidden>
Date: Mon May 27 11:26:49 2019 +0800

    improve dvr port update under large scale deployment

    update port may takes an excessive number of seconds
    to complete if dvr routers are running on more than 100
    compute nodes. This patch tries to save some time by removing
    unnecessary calls inside looping through hosts.

    Change-Id: Ide740e0c5c43c2d2b842460a37c8ce125da12b28
    Closes-Bug: #1830456
    (cherry picked from commit 00eb6f26f6165a00d647d2bf35fb7996534cfc09)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/664525

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/664525
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=52529bc949acff9a3454abd44925342468064b78
Submitter: Zuul
Branch: master

commit 52529bc949acff9a3454abd44925342468064b78
Author: Oleg Bondarev <email address hidden>
Date: Tue Jun 11 12:22:14 2019 +0400

    DVR: on new port only send router update on port's host

    When new DVR serviceable port appears on new node we need
    to update node's l3 agent with all routers which have the
    port's subnets, including connected routers.
    We don't need to update all nodes hosting these routers.
    It costs us much as all l3 agents then go back to neutron server
    and request routers info for no good reason.
    This was one of the main issues with DVR at scale fixed in Mitaka.

    Change-Id: I99d01d7bf29f236eff0f80d1ae8659f64ac55d39
    Related-Bug: #1830456
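
The difference this commit message describes can be illustrated with a toy model that compares the set of (host, router) updates sent in each case. This is a sketch under assumptions, not the neutron implementation: all function names and data shapes are invented, and "connected routers" is modelled as a plain dictionary.

```python
def routers_for_port(port_subnets, subnets_by_router, connected_routers):
    """Routers with an interface on the port's subnets, plus connected routers."""
    direct = {router_id for router_id, subnets in subnets_by_router.items()
              if set(subnets) & set(port_subnets)}
    related = set()
    for router_id in direct:
        related |= connected_routers.get(router_id, set())
    return direct | related


def notifications_all_hosts(router_ids, hosts_by_router, port_host):
    """Broad behaviour: update the port's host and every host of each router."""
    return sorted({(host, router_id)
                   for router_id in router_ids
                   for host in set(hosts_by_router.get(router_id, [])) | {port_host}})


def notifications_port_host_only(router_ids, port_host):
    """Narrow behaviour: only the new port's host learns about the routers."""
    return sorted((port_host, router_id) for router_id in router_ids)


subnets_by_router = {"r1": ["s1"], "r2": ["s2"], "r3": ["s3"]}
connected = {"r1": {"r2"}}
hosts_by_router = {"r1": ["h1", "h2", "h3"], "r2": ["h1"]}

router_ids = routers_for_port(["s1"], subnets_by_router, connected)
broad = notifications_all_hosts(router_ids, hosts_by_router, "h9")
narrow = notifications_port_host_only(router_ids, "h9")
print(len(broad), len(narrow))  # 6 2
```

In the broad case every notified L3 agent would then query the server for router info, so the gap grows with the number of hosts per router.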

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/665831

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.opendev.org/665832

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.opendev.org/665833

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.opendev.org/665858

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/stein)

Reviewed: https://review.opendev.org/665831
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=4cd7ba7fde57fb57407863606fc8f12ef0af3d6e
Submitter: Zuul
Branch: stable/stein

commit 4cd7ba7fde57fb57407863606fc8f12ef0af3d6e
Author: Oleg Bondarev <email address hidden>
Date: Tue Jun 11 12:22:14 2019 +0400

    DVR: on new port only send router update on port's host

    When new DVR serviceable port appears on new node we need
    to update node's l3 agent with all routers which have the
    port's subnets, including connected routers.
    We don't need to update all nodes hosting these routers.
    It costs us much as all l3 agents then go back to neutron server
    and request routers info for no good reason.
    This was one of the main issues with DVR at scale fixed in Mitaka.

    Change-Id: I99d01d7bf29f236eff0f80d1ae8659f64ac55d39
    Related-Bug: #1830456
    (cherry picked from commit 52529bc949acff9a3454abd44925342468064b78)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/rocky)

Reviewed: https://review.opendev.org/665832
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=b1b0937eb42442ae6f585eba477189321ef08c80
Submitter: Zuul
Branch: stable/rocky

commit b1b0937eb42442ae6f585eba477189321ef08c80
Author: Oleg Bondarev <email address hidden>
Date: Tue Jun 11 12:22:14 2019 +0400

    DVR: on new port only send router update on port's host

    When new DVR serviceable port appears on new node we need
    to update node's l3 agent with all routers which have the
    port's subnets, including connected routers.
    We don't need to update all nodes hosting these routers.
    It costs us much as all l3 agents then go back to neutron server
    and request routers info for no good reason.
    This was one of the main issues with DVR at scale fixed in Mitaka.

    Change-Id: I99d01d7bf29f236eff0f80d1ae8659f64ac55d39
    Related-Bug: #1830456
    (cherry picked from commit 52529bc949acff9a3454abd44925342468064b78)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/queens)

Reviewed: https://review.opendev.org/665833
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=05d6f0892f204fb9b4b2e35a557ebf3ed1d61177
Submitter: Zuul
Branch: stable/queens

commit 05d6f0892f204fb9b4b2e35a557ebf3ed1d61177
Author: Oleg Bondarev <email address hidden>
Date: Tue Jun 11 12:22:14 2019 +0400

    DVR: on new port only send router update on port's host

    When new DVR serviceable port appears on new node we need
    to update node's l3 agent with all routers which have the
    port's subnets, including connected routers.
    We don't need to update all nodes hosting these routers.
    It costs us much as all l3 agents then go back to neutron server
    and request routers info for no good reason.
    This was one of the main issues with DVR at scale fixed in Mitaka.

    Change-Id: I99d01d7bf29f236eff0f80d1ae8659f64ac55d39
    Related-Bug: #1830456
    (cherry picked from commit 52529bc949acff9a3454abd44925342468064b78)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/pike)

Reviewed: https://review.opendev.org/665858
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=249549379cdb2d43ebde061b76acb07fa4274146
Submitter: Zuul
Branch: stable/pike

commit 249549379cdb2d43ebde061b76acb07fa4274146
Author: Oleg Bondarev <email address hidden>
Date: Tue Jun 11 12:22:14 2019 +0400

    DVR: on new port only send router update on port's host

    When new DVR serviceable port appears on new node we need
    to update node's l3 agent with all routers which have the
    port's subnets, including connected routers.
    We don't need to update all nodes hosting these routers.
    It costs us much as all l3 agents then go back to neutron server
    and request routers info for no good reason.
    This was one of the main issues with DVR at scale fixed in Mitaka.

    Change-Id: I99d01d7bf29f236eff0f80d1ae8659f64ac55d39
    Related-Bug: #1830456
    (cherry picked from commit 52529bc949acff9a3454abd44925342468064b78)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 13.0.4

This issue was fixed in the openstack/neutron 13.0.4 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 14.0.2

This issue was fixed in the openstack/neutron 14.0.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.1.0

This issue was fixed in the openstack/neutron 12.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 15.0.0.0b1

This issue was fixed in the openstack/neutron 15.0.0.0b1 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron pike-eol

This issue was fixed in the openstack/neutron pike-eol release.
