[scale issue] the root rootwrap deamon causes l3 agent router procssing very very slow

Bug #1825152 reported by LIU Yulong
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
neutron
Medium
Unassigned

Bug Description

[scale issue] the root rootwrap deamon causes l3 agent router procssing very very slow

ENV: stable/queens

When l3 agent has a large number of scheduled routers, during the agent restart time, l3-agent will re-sync all router info, and processing them again. Routers will be processed concurrently in the green thread pool.
But, we noticed that it taks really long time for all the router to finish the processing. For 50 router, it could be 30 mins or more.

By comparison, if we set the `use_helper_for_ns_read=False` and `root_helper_daemon=''`, the total processing time reduced.

The following are the LOG comparison. The critical log message is "Starting router update for" and "Finished a router update".

Before change the config:

2019-04-17 14:51:56.418 2702 INFO neutron.agent.l3.agent [-] Starting router update for 4e7af6a8-830d-4647-8dfc-3b9b825a3c37, action None, priority 0
2019-04-17 14:51:56.419 2702 INFO neutron.agent.l3.agent [-] Starting router update for 834be7ac-a2ac-4125-8fda-1a54c3e9e12a, action None, priority 0
2019-04-17 14:51:56.427 2702 INFO neutron.agent.l3.agent [-] Starting router update for e40915aa-5d5a-4b39-a823-d935b91e3255, action None, priority 0
2019-04-17 14:51:56.435 2702 INFO neutron.agent.l3.agent [-] Starting router update for 9d1207e3-b8e9-4959-9d03-bded785c76bd, action None, priority 0
2019-04-17 14:51:56.442 2702 INFO neutron.agent.l3.agent [-] Starting router update for 4247f759-827c-476b-90e9-29f725158daf, action None, priority 0
2019-04-17 14:51:56.449 2702 INFO neutron.agent.l3.agent [-] Starting router update for c74132d7-4490-45d2-bedb-c7b09f19460f, action None, priority 0
2019-04-17 14:51:56.457 2702 INFO neutron.agent.l3.agent [-] Starting router update for 03a87c31-d25f-4260-8b43-408b6f963459, action None, priority 0
2019-04-17 14:51:56.464 2702 INFO neutron.agent.l3.agent [-] Starting router update for b6c8d929-b0aa-4d35-86dd-34a7a62c89fa, action None, priority 0
2019-04-17 14:52:39.798 2702 INFO neutron.agent.l3.agent [-] Starting router update for 02f9e40f-715e-48f3-966a-e2f8038dc41c, action None, priority 0
2019-04-17 14:54:02.965 2702 INFO neutron.agent.l3.agent [-] Starting router update for 3a9dbd08-bd30-4e7f-95c3-dda0b1b8fadc, action None, priority 0
2019-04-17 14:54:02.966 2702 INFO neutron.agent.l3.agent [-] Starting router update for 36c9cc1c-bef2-4970-adb5-a3c42ac1a434, action None, priority 0
2019-04-17 14:54:06.014 2702 INFO neutron.agent.l3.agent [-] Starting router update for 3936e670-4c26-41d0-b216-0566a832bc64, action None, priority 0

2019-04-17 14:52:39.798 2702 INFO neutron.agent.l3.agent [None req-3b510ecb-c216-4b16-9135-a43b64424469 - 001a18cadd4b401e9fdeab6c411d9816 - - -] Finished a router update for 03a87c31-d25f-4260-8b43-408b6f963459
2019-04-17 14:54:02.965 2702 INFO neutron.agent.l3.agent [None req-2e15f903-d609-46a0-9276-f7ab3bba46e4 - ac0e1aa4871d45618e7df05b5c2a5845 - - -] Finished a router update for c74132d7-4490-45d2-bedb-c7b09f19460f
2019-04-17 14:54:06.014 2702 INFO neutron.agent.l3.agent [None req-f5b64995-61b2-4c26-891c-cb76afee6568 - 410c89fde3d84cb68d6f574dddf415ce - - -] Finished a router update for 834be7ac-a2ac-4125-8fda-1a54c3e9e12a
2019-04-17 14:54:08.441 2702 INFO neutron.agent.l3.agent [None req-af5f15ed-5728-41d9-90e0-88aeadd020a9 - 49c0d60352614e6dba4257d8ad40418a - - -] Finished a router update for b6c8d929-b0aa-4d35-86dd-34a7a62c89fa
2019-04-17 14:54:11.582 2702 INFO neutron.agent.l3.agent [None req-d5378b2e-905d-42a8-8939-aa23dbb08641 - d2ba0b9b94e244a6a69129a480a8f2d0 - - -] Finished a router update for 4247f759-827c-476b-90e9-29f725158daf
2019-04-17 14:54:14.922 2702 INFO neutron.agent.l3.agent [None req-1829b2e3-1567-402e-8605-42ceb3d2192f - 072dc794d1d048c3a0c57831bbe2c7b9 - - -] Finished a router update for 4e7af6a8-830d-4647-8dfc-3b9b825a3c37
2019-04-17 14:54:18.070 2702 INFO neutron.agent.l3.agent [None req-cd35fb0a-8870-4af2-b177-bff89618ecf4 - d18a6c1d1a094b3695b7b7e553e756ef - - -] Finished a router update for 9d1207e3-b8e9-4959-9d03-bded785c76bd
2019-04-17 14:55:05.289 2702 INFO neutron.agent.l3.agent [None req-52f31054-f145-4df2-8acc-39a79e53ad10 - 6f993511df2049578d814f9c56426893 - - -] Finished a router update for 02f9e40f-715e-48f3-966a-e2f8038dc41c
2019-04-17 14:55:14.674 2702 INFO neutron.agent.l3.agent [None req-7db354ce-6282-45fe-a0d9-98b0e4d93db5 - 001a18cadd4b401e9fdeab6c411d9816 - - -] Finished a router update for e40915aa-5d5a-4b39-a823-d935b91e3255
2019-04-17 14:59:19.380 2702 INFO neutron.agent.l3.agent [None req-0ea12c69-5e81-4cf7-97fc-cdbdedcc8125 - 6a4c2f76e5dc4f9e9addd4dcd615adbe - - -] Finished a router update for 3a9dbd08-bd30-4e7f-95c3-dda0b1b8fadc
2019-04-17 14:59:19.567 2702 INFO neutron.agent.l3.agent [None req-a6396764-c74e-46ee-8d40-16ce59a5a859 - ce284ef67a8d457ca5f087d1ee626a83 - - -] Finished a router update for 36c9cc1c-bef2-4970-adb5-a3c42ac1a434
2019-04-17 14:59:54.042 2702 INFO neutron.agent.l3.agent [None req-469fd4a1-591b-4447-9ba6-22f60d6f66c4 - 11fcdd40b7c34b7b9bea56b41a422ab2 - - -] Finished a router update for 3936e670-4c26-41d0-b216-0566a832bc64

After set `use_helper_for_ns_read=False` and `root_helper_daemon=''`:
2019-04-17 16:38:14.622 25866 INFO neutron.agent.l3.agent [-] Starting router update for 00b4373c-1c61-4ac6-9997-65b0f93c5f5b, action None, priority 1
2019-04-17 16:38:14.631 25866 INFO neutron.agent.l3.agent [-] Starting router update for 039bec69-a997-4dcd-a099-a6700ef39741, action None, priority 1
2019-04-17 16:38:14.640 25866 INFO neutron.agent.l3.agent [-] Starting router update for 03a87c31-d25f-4260-8b43-408b6f963459, action None, priority 1
2019-04-17 16:38:14.648 25866 INFO neutron.agent.l3.agent [-] Starting router update for 13f685f3-da25-4c80-b320-edf5f7593feb, action None, priority 1
2019-04-17 16:38:14.656 25866 INFO neutron.agent.l3.agent [-] Starting router update for 188c8e34-8790-412a-93b1-27221d5a2827, action None, priority 1
2019-04-17 16:38:14.664 25866 INFO neutron.agent.l3.agent [-] Starting router update for 636fb7ca-da52-4c39-b2cb-8654f9e87a90, action None, priority 1
2019-04-17 16:38:14.673 25866 INFO neutron.agent.l3.agent [-] Starting router update for 708720d6-1a67-444e-b145-921a87f826df, action None, priority 1
2019-04-17 16:38:14.681 25866 INFO neutron.agent.l3.agent [-] Starting router update for bb98923a-ce8f-4586-bf84-a1b0e7d74bf9, action None, priority 1
2019-04-17 16:38:23.694 25866 INFO neutron.agent.l3.agent [-] Starting router update for bcd0539a-6230-4de3-a8be-aafdf5c98132, action None, priority 1
2019-04-17 16:38:24.127 25866 INFO neutron.agent.l3.agent [-] Starting router update for bd4b90a5-8fa1-4209-80cd-eb01bbca540c, action None, priority 1
2019-04-17 16:38:36.203 25866 INFO neutron.agent.l3.agent [-] Starting router update for bf608480-0791-47bb-b95c-11c719696acc, action None, priority 1
2019-04-17 16:38:36.215 25866 INFO neutron.agent.l3.agent [-] Starting router update for cf57179c-0005-4f43-9c2c-53cd46fc2328, action None, priority 1
2019-04-17 16:38:36.225 25866 INFO neutron.agent.l3.agent [-] Starting router update for db1cfdb4-18cc-4f1a-8316-61a52f402fcd, action None, priority 1
2019-04-17 16:38:37.764 25866 INFO neutron.agent.l3.agent [-] Starting router update for e40915aa-5d5a-4b39-a823-d935b91e3255, action None, priority 1
2019-04-17 16:38:42.710 25866 INFO neutron.agent.l3.agent [-] Starting router update for e451f0ce-689b-438d-baa1-acd21610a295, action None, priority 1

2019-04-17 16:38:23.693 25866 INFO neutron.agent.l3.agent [None req-f7091185-b4fb-4345-9c66-4ba58a16981b - 001a18cadd4b401e9fdeab6c411d9816 - - -] Finished a router update for 03a87c31-d25f-4260-8b43-408b6f963459
2019-04-17 16:38:24.126 25866 INFO neutron.agent.l3.agent [None req-4ce442a6-0ddc-4a8f-bc30-2c2c6fdacd5d - 001a18cadd4b401e9fdeab6c411d9816 - - -] Finished a router update for 13f685f3-da25-4c80-b320-edf5f7593feb
2019-04-17 16:38:36.203 25866 INFO neutron.agent.l3.agent [None req-322a11d4-51ea-4591-b207-58b2b77193f9 - e8c177ba044240978aa13c03479ba91a - - -] Finished a router update for 188c8e34-8790-412a-93b1-27221d5a2827
2019-04-17 16:38:37.763 25866 INFO neutron.agent.l3.agent [None req-b3c3d41c-e9de-4fbb-9a4e-09b39d27fc19 - 001a18cadd4b401e9fdeab6c411d9816 - - -] Finished a router update for bd4b90a5-8fa1-4209-80cd-eb01bbca540c
2019-04-17 16:38:42.709 25866 INFO neutron.agent.l3.agent [None req-81be2ff5-7049-40ea-afb8-011744838127 - 001a18cadd4b401e9fdeab6c411d9816 - - -] Finished a router update for 039bec69-a997-4dcd-a099-a6700ef39741
2019-04-17 16:38:43.039 25866 INFO neutron.agent.l3.agent [None req-21ab2ab3-512c-4163-a006-8d342ea2ac9b - e8c177ba044240978aa13c03479ba91a - - -] Finished a router update for bb98923a-ce8f-4586-bf84-a1b0e7d74bf9
2019-04-17 16:38:44.376 25866 INFO neutron.agent.l3.agent [None req-e3913fd2-6b34-470e-86e3-f423256d85af - 001a18cadd4b401e9fdeab6c411d9816 - - -] Finished a router update for 00b4373c-1c61-4ac6-9997-65b0f93c5f5b
2019-04-17 16:38:44.416 25866 INFO neutron.agent.l3.agent [None req-f595f102-fed9-440c-a451-42ebaf3edcbd - e8c177ba044240978aa13c03479ba91a - - -] Finished a router update for 708720d6-1a67-444e-b145-921a87f826df
2019-04-17 16:38:45.825 25866 INFO neutron.agent.l3.agent [None req-d8f77b43-f4fe-4f04-9a40-c488cf11eaf2 - 001a18cadd4b401e9fdeab6c411d9816 - - -] Finished a router update for 636fb7ca-da52-4c39-b2cb-8654f9e87a90
2019-04-17 16:38:52.694 25866 INFO neutron.agent.l3.agent [None req-cd9b9373-e64f-495f-b9c6-6050910dcbe5 - 001a18cadd4b401e9fdeab6c411d9816 - - -] Finished a router update for bf608480-0791-47bb-b95c-11c719696acc
2019-04-17 16:39:04.784 25866 INFO neutron.agent.l3.agent [None req-05300052-040a-4f28-b6c7-82b78e29356e - b3a80b3348204d15be2f36afe6873a6a - - -] Finished a router update for db1cfdb4-18cc-4f1a-8316-61a52f402fcd
2019-04-17 16:39:06.388 25866 INFO neutron.agent.l3.agent [None req-2aaec35a-f326-4660-b097-f38cabdef986 - 001a18cadd4b401e9fdeab6c411d9816 - - -] Finished a router update for e40915aa-5d5a-4b39-a823-d935b91e3255
2019-04-17 16:39:13.406 25866 INFO neutron.agent.l3.agent [None req-4c88c40f-dafd-4731-b73a-6f95e1bf5edf - 66095bb6c95c411a86a17f89fa6af571 - - -] Finished a router update for e451f0ce-689b-438d-baa1-acd21610a295
2019-04-17 16:39:23.395 25866 INFO neutron.agent.l3.agent [None req-4f4e768b-30e9-42cc-a58c-3d2eddf1d57e - 001a18cadd4b401e9fdeab6c411d9816 - - -] Finished a router update for cf57179c-0005-4f43-9c2c-53cd46fc2328
2019-04-17 16:40:12.184 25866 INFO neutron.agent.l3.agent [None req-669f303b-7860-4c97-b6b9-f31fb56b84e7 - 001a18cadd4b401e9fdeab6c411d9816 - - -] Finished a router update for bcd0539a-6230-4de3-a8be-aafdf5c98132

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/653378

Changed in neutron:
assignee: nobody → LIU Yulong (dragon889)
status: New → In Progress
Hongbin Lu (hongbin.lu)
Changed in neutron:
status: In Progress → Confirmed
importance: Undecided → Low
Hongbin Lu (hongbin.lu)
Changed in neutron:
importance: Low → Medium
Changed in neutron:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/654815

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/654815
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9d60716cf1d61286a684f20ef8e05c77a0df5aa3
Submitter: Zuul
Branch: master

commit 9d60716cf1d61286a684f20ef8e05c77a0df5aa3
Author: LIU Yulong <email address hidden>
Date: Tue Apr 23 15:27:02 2019 +0800

    Add update_id for ResourceUpdate

    Add a unique id for resource update, then we can calculate
    the resource processing time and track it.

    Related-Bug: #1825152
    Related-Bug: #1824911
    Related-Bug: #1821912
    Related-Bug: #1813787

    Change-Id: Ib4d197c6c180c32860964440882393794aabb6ef

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/660758

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/stein)

Reviewed: https://review.opendev.org/660758
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=fe4fc33f1c1bbb655e99733edf762e3e9debdd3e
Submitter: Zuul
Branch: stable/stein

commit fe4fc33f1c1bbb655e99733edf762e3e9debdd3e
Author: LIU Yulong <email address hidden>
Date: Tue Apr 23 15:27:02 2019 +0800

    Add update_id for ResourceUpdate

    Add a unique id for resource update, then we can calculate
    the resource processing time and track it.

    Related-Bug: #1825152
    Related-Bug: #1824911
    Related-Bug: #1821912
    Related-Bug: #1813787

    Change-Id: Ib4d197c6c180c32860964440882393794aabb6ef
    (cherry picked from commit 9d60716cf1d61286a684f20ef8e05c77a0df5aa3)

tags: added: in-stable-stein
Revision history for this message
LIU Yulong (dragon889) wrote :

Increase the bug level, because this issue has been submitted for a long time.

Changed in neutron:
importance: Medium → High
Revision history for this message
Xav Paice (xavpaice) wrote :

also relevant, from http://docs.openvswitch.org/en/latest/faq/issues/

 I want to add thousands of ports to an Open vSwitch bridge, but it takes too long (minutes or hours) to do it with ovs-vsctl. How can I do it faster?

    A: If you add them one at a time with ovs-vsctl, it can take a long time to add thousands of ports to an Open vSwitch bridge. This is because every invocation of ovs-vsctl first reads the current configuration from OVSDB. As the number of ports grows, this starts to take an appreciable amount of time, and when it is repeated thousands of times the total time becomes significant.

    The solution is to add the ports in one invocation of ovs-vsctl (or a small number of them).

If there's a way we could query the API to get a delta of what the ovs db should look like, and does look like, then we could construct a much smaller statement to run rather than one per router.

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

@Xav, what version of Neutron are You using? Since long time default value of ovsdb_interface config option was set to "native" and it is even removed with https://review.opendev.org/#/c/570259/

Revision history for this message
Xav Paice (xavpaice) wrote :

The cloud we support with 500+ routers is a xenial-queens cloud, packages from the Ubuntu Cloud Archive. You're absolutely right, this is using the native interface, not ovs-vsctl, so the docs linked aren't a lot of use directly. The point I'm making (not very clearly) is that we're creating each router individually, and that's slow for openvswitch (maybe others too?). I was wondering if it's an option to group those transactions together so there's less calls to the ovs api, to speed things up a bit. For some of the clouds I support, it can take hours to restart a L3 agent machine, and even moving a router from one agent to another takes 15 mins+ because (we think) of the slow ovs db calls at that scale.

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

@Xav: are You sure it's due to slow ovs db calls? IIUC L3 agent is only creating port in br-int and it don't have any other interactions with ovs db. Everything else is configured by neutron-ovs-agent later.
So if it is slow L3 agent, maybe the root cause is different?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.opendev.org/757662

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/rocky)

Reviewed: https://review.opendev.org/757662
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=297f4c20c4c914f7fc8528e70d572adbeff68b0f
Submitter: Zuul
Branch: stable/rocky

commit 297f4c20c4c914f7fc8528e70d572adbeff68b0f
Author: LIU Yulong <email address hidden>
Date: Tue Apr 23 15:27:02 2019 +0800

    Add update_id for ResourceUpdate

    Add a unique id for resource update, then we can calculate
    the resource processing time and track it.

    Related-Bug: #1825152
    Related-Bug: #1824911
    Related-Bug: #1821912
    Related-Bug: #1813787

    Conflicts:
            neutron/agent/l3/agent.py

    Change-Id: Ib4d197c6c180c32860964440882393794aabb6ef
    (cherry picked from commit 9d60716cf1d61286a684f20ef8e05c77a0df5aa3)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "liuyulong <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/653378
Reason: in particular, looks like only needed for stable branches.

Revision history for this message
LIU Yulong (dragon889) wrote :

Revisit if someone wants to take care of this issue.

Changed in neutron:
assignee: LIU Yulong (dragon889) → nobody
status: In Progress → New
importance: High → Medium
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers