Fullstack tests failing due to "hang" neutron-server process

Bug #1862178 reported by Slawek Kaplonski
This bug affects 1 person
Affects   Status         Importance   Assigned to      Milestone
neutron   Fix Released   High         Rodolfo Alonso

Bug Description

From time to time, some fullstack tests fail, and the problem seems to be a neutron-server process that does not respond.
In such cases there is almost nothing in the neutron-server logs.

Example of such failure: https://c355270b22583c2d2af0-42801c5a43c64ea303a559bec7f7cdd7.ssl.cf5.rackcdn.com/705903/2/check/neutron-fullstack/79cf197/testr_results.html

The neutron-server logs (https://c355270b22583c2d2af0-42801c5a43c64ea303a559bec7f7cdd7.ssl.cf5.rackcdn.com/705903/2/check/neutron-fullstack/79cf197/controller/logs/dsvm-fullstack-logs/TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle_egress_/neutron-server--2020-02-05--11-00-19-067345_log.txt) end with:

2020-02-05 11:00:36.851 3180 INFO neutron.plugins.ml2.managers [req-03044462-4bac-4f8e-acac-d6aae381a3d9 - - - - -] Initializing driver for type 'gre'
2020-02-05 11:00:36.852 3180 INFO neutron.plugins.ml2.drivers.type_tunnel [req-03044462-4bac-4f8e-acac-d6aae381a3d9 - - - - -] gre ID ranges: [(1, 1000)]
2020-02-05 11:00:37.796 3180 INFO neutron.plugins.ml2.managers [req-03044462-4bac-4f8e-acac-d6aae381a3d9 - - - - -] Initializing driver for type 'local'
2020-02-05 11:00:37.797 3180 INFO neutron.plugins.ml2.managers [req-03044462-4bac-4f8e-acac-d6aae381a3d9 - - - - -] Initializing driver for type 'vlan'
2020-02-05 11:01:11.476 3180 INFO neutron.plugins.ml2.drivers.type_vlan [req-03044462-4bac-4f8e-acac-d6aae381a3d9 - - - - -] VlanTypeDriver initialization complete
2020-02-05 11:01:11.479 3180 INFO neutron.plugins.ml2.managers [req-03044462-4bac-4f8e-acac-d6aae381a3d9 - - - - -] Initializing driver for type 'vxlan'
2020-02-05 11:01:11.480 3180 INFO neutron.plugins.ml2.drivers.type_tunnel [req-03044462-4bac-4f8e-acac-d6aae381a3d9 - - - - -] vxlan ID ranges: [(1001, 2000)]

So it seems that neutron-server did not properly initialize the ML2 extension drivers and mechanism drivers.
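To see where the startup time goes, the pauses between consecutive log lines can be measured. The following is a minimal sketch, not part of Neutron: the `largest_gap` helper is hypothetical, and the log lines are abbreviated copies of the excerpt above (request IDs and module names trimmed).

```python
from datetime import datetime

# Timestamps and abbreviated messages from the log excerpt above
# (request IDs and module names trimmed for readability).
LOG_LINES = [
    "2020-02-05 11:00:36.851 Initializing driver for type 'gre'",
    "2020-02-05 11:00:36.852 gre ID ranges: [(1, 1000)]",
    "2020-02-05 11:00:37.796 Initializing driver for type 'local'",
    "2020-02-05 11:00:37.797 Initializing driver for type 'vlan'",
    "2020-02-05 11:01:11.476 VlanTypeDriver initialization complete",
    "2020-02-05 11:01:11.479 Initializing driver for type 'vxlan'",
    "2020-02-05 11:01:11.480 vxlan ID ranges: [(1001, 2000)]",
]


def largest_gap(lines):
    """Return (seconds, line_before, line_after) for the longest pause."""
    # The first 23 characters of each line hold the timestamp.
    stamps = [datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")
              for line in lines]
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]
    index = max(range(len(gaps)), key=gaps.__getitem__)
    return gaps[index], lines[index], lines[index + 1]


seconds, before, after = largest_gap(LOG_LINES)
print(f"{seconds:.3f} s between:\n  {before}\n  {after}")
```

For this excerpt the longest pause, about 33.7 seconds, falls between "Initializing driver for type 'vlan'" and "VlanTypeDriver initialization complete".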

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello:

After reviewing the logs and running the Neutron server initialization under a profiler, I found that the most time-consuming operation in the VLAN driver is retrieving all VLAN allocations from the DB [1]. This can take several seconds on a loaded system.

Usually the number of VLANs per physnet is >1000. In fullstack tests we can reduce this number [2] to reduce the number of records in the DB.
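As a rough illustration of why the range size matters, the sketch below counts how many allocation records a set of VLAN ranges implies, assuming one DB record per VLAN ID and the `physnet:min:max` range format. The `allocation_rows` helper and the example ranges are hypothetical and are not taken from the fullstack configuration.

```python
def allocation_rows(vlan_ranges):
    """Count the VLAN IDs covered by 'physnet:min:max' entries.

    Assumes one allocation record per VLAN ID in each range.
    """
    total = 0
    for entry in vlan_ranges:
        _physnet, vlan_min, vlan_max = entry.split(":")
        total += int(vlan_max) - int(vlan_min) + 1
    return total


# Hypothetical ranges: a wide one and a reduced one.
wide = allocation_rows(["physnet1:1:1000"])
narrow = allocation_rows(["physnet1:1:30"])
print(wide, narrow)
```

Every record in those ranges has to be read and compared during server start, so a narrower range means less work in the driver initialization.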

We can also improve the way the VLAN driver handles those records:
- Retrieve only the records whose physical_network is one of the networks in the configured ranges.
- Delete the VLAN records with an unused physical network in a single bulk operation.
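The two ideas above can be sketched as follows. This is only an illustration using the standard-library `sqlite3` module, not the Neutron ORM code; the `vlan_allocations` table, its columns, and the `prune_and_load` helper are assumptions made for the example, and the sketch ignores whether a record is currently allocated.

```python
import sqlite3


def make_db():
    """Create an in-memory table with records for two physical networks."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vlan_allocations ("
        " physical_network TEXT, vlan_id INTEGER, allocated INTEGER,"
        " PRIMARY KEY (physical_network, vlan_id))"
    )
    rows = [(net, vid, 0)
            for net in ("physnet1", "stale_net")
            for vid in range(1, 6)]
    conn.executemany("INSERT INTO vlan_allocations VALUES (?, ?, ?)", rows)
    return conn


def prune_and_load(conn, configured):
    """Bulk-delete records of unconfigured networks, then load the rest."""
    marks = ",".join("?" * len(configured))
    # One DELETE statement instead of deleting record by record.
    conn.execute(
        f"DELETE FROM vlan_allocations"
        f" WHERE physical_network NOT IN ({marks})",
        configured,
    )
    # Retrieve only the records for the configured physical networks.
    cursor = conn.execute(
        f"SELECT physical_network, vlan_id FROM vlan_allocations"
        f" WHERE physical_network IN ({marks}) ORDER BY vlan_id",
        configured,
    )
    return cursor.fetchall()


conn = make_db()
rows = prune_and_load(conn, ["physnet1"])
remaining = conn.execute("SELECT COUNT(*) FROM vlan_allocations").fetchone()[0]
print(len(rows), remaining)
```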

Regards.

[1] https://github.com/openstack/neutron/blob/de0d9da2fe9431401b282923018f80c195bdaf55/neutron/plugins/ml2/drivers/type_vlan.py#L109
[2] https://github.com/openstack/neutron/blob/de0d9da2fe9431401b282923018f80c195bdaf55/neutron/tests/fullstack/resources/config.py#L176

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/707151

Changed in neutron:
assignee: nobody → Rodolfo Alonso (rodolfo-alonso-hernandez)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/707222

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/707151
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=efec8fc153e143068dff6ef3ac02ef722677b045
Submitter: Zuul
Branch: master

commit efec8fc153e143068dff6ef3ac02ef722677b045
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Tue Feb 11 13:56:59 2020 +0000

    Reduce the VLAN/tunneled ranges in fullstack tests

    Reduce the number of VLAN and tunneled network ranges (GRE, VXLAN)
    to 30 tags, in fullstack tests. This will reduce the amount of time
    spent, during the Neutron server start, in the VLAN and tunnel
    drivers initialization.

    Partial-Bug: #1862178

    Change-Id: I7ae82d163c46bbc3ee7430293555c66fbda17b08

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/707222
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=016e7826f165e69c3956375fc3aa8b8d642c9dc9
Submitter: Zuul
Branch: master

commit 016e7826f165e69c3956375fc3aa8b8d642c9dc9
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Tue Feb 11 14:09:06 2020 +0000

    Improve VLAN allocations synchronization

    In order to reduce the number of elements retrieved from the DB, this
    patch, before processing the VLAN allocations per physical network,
    deleted those registers belonging to any unconfigured physical network.

    The VLAN registers per physical network are deleted using a bulk delete
    operation, to speed up the process.

    Those missing VLAN registers per network are now created using a bulk
    insert operation, available in the ORM. This bulk operation speeds up
    the sync process.

    Change-Id: I8568e2277e157754aaff87a059a40e34e6a43e2b
    Partial-Bug: #1862178
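The bulk insert of missing records described in the commit message above can be sketched like this. It is a simplified stand-in that uses the standard-library `sqlite3` module and `executemany` rather than the ORM bulk operation the patch refers to; the table layout and the `sync_range` helper are hypothetical.

```python
import sqlite3


def sync_range(conn, physnet, vlan_min, vlan_max):
    """Insert the VLAN IDs missing from the range in one bulk call."""
    existing = {
        row[0]
        for row in conn.execute(
            "SELECT vlan_id FROM vlan_allocations"
            " WHERE physical_network = ?",
            (physnet,),
        )
    }
    missing = [(physnet, vid, 0)
               for vid in range(vlan_min, vlan_max + 1)
               if vid not in existing]
    # A single executemany call instead of one INSERT per record.
    conn.executemany("INSERT INTO vlan_allocations VALUES (?, ?, ?)", missing)
    conn.commit()
    return len(missing)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vlan_allocations ("
    " physical_network TEXT, vlan_id INTEGER, allocated INTEGER,"
    " PRIMARY KEY (physical_network, vlan_id))"
)
# Three records already exist; the configured range is 1..30.
conn.executemany(
    "INSERT INTO vlan_allocations VALUES (?, ?, ?)",
    [("physnet1", vid, 0) for vid in (1, 2, 3)],
)
created = sync_range(conn, "physnet1", 1, 30)
total = conn.execute("SELECT COUNT(*) FROM vlan_allocations").fetchone()[0]
print(created, total)
```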

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/713281

Slawek Kaplonski (slaweq) wrote :

@Rodolfo, can you tell what more we need to do, besides https://review.opendev.org/707222, to mark this bug as fixed? I haven't seen this issue in the gate in the last few weeks, so IMO your patch fixed it.

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hi @Slawek:

IMO, that was the main (and only) problem in this bug, and it is addressed in this patch. Maybe I should have used "Closes-Bug" in the patch.

I think we can close it. The stable patch has a +2 vote.

Regards.

Slawek Kaplonski (slaweq) wrote :

Thx Rodolfo :)

Changed in neutron:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/train)

Reviewed: https://review.opendev.org/713281
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=651eb12bec5e5e986bed79f3a5c006c617e79bda
Submitter: Zuul
Branch: stable/train

commit 651eb12bec5e5e986bed79f3a5c006c617e79bda
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Tue Feb 11 14:09:06 2020 +0000

    Improve VLAN allocations synchronization

    In order to reduce the number of elements retrieved from the DB, this
    patch, before processing the VLAN allocations per physical network,
    deleted those registers belonging to any unconfigured physical network.

    The VLAN registers per physical network are deleted using a bulk delete
    operation, to speed up the process.

    Those missing VLAN registers per network are now created using a bulk
    insert operation, available in the ORM. This bulk operation speeds up
    the sync process.

    Change-Id: I8568e2277e157754aaff87a059a40e34e6a43e2b
    Partial-Bug: #1862178
    (cherry picked from commit 016e7826f165e69c3956375fc3aa8b8d642c9dc9)

tags: added: in-stable-train
tags: added: neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/721639

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/721642

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/721644

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.opendev.org/721644
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=89c56d738df62fd4806e958679cfd64fe2f29631
Submitter: Zuul
Branch: stable/queens

commit 89c56d738df62fd4806e958679cfd64fe2f29631
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Tue Feb 11 14:09:06 2020 +0000

    Improve VLAN allocations synchronization

    In order to reduce the number of elements retrieved from the DB, this
    patch, before processing the VLAN allocations per physical network,
    deleted those registers belonging to any unconfigured physical network.

    The VLAN registers per physical network are deleted using a bulk delete
    operation, to speed up the process.

    Those missing VLAN registers per network are now created using a bulk
    insert operation, available in the ORM. This bulk operation speeds up
    the sync process.

    Conflicts:
          neutron/plugins/ml2/drivers/type_vlan.py

    Change-Id: I8568e2277e157754aaff87a059a40e34e6a43e2b
    Partial-Bug: #1862178
    (cherry picked from commit 016e7826f165e69c3956375fc3aa8b8d642c9dc9)
    (cherry picked from commit 651eb12bec5e5e986bed79f3a5c006c617e79bda)
    (cherry picked from commit 4fff732b76ee5d9a4917901b33ce80ccafb5203f)
    (cherry picked from commit 8a6521e4457acbd7ceed941f363ee7c9872422ba)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/stein)

Reviewed: https://review.opendev.org/721639
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=bdfdb812a2d36a669e2a771a36737c7165cdd722
Submitter: Zuul
Branch: stable/stein

commit bdfdb812a2d36a669e2a771a36737c7165cdd722
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Tue Feb 11 14:09:06 2020 +0000

    Improve VLAN allocations synchronization

    In order to reduce the number of elements retrieved from the DB, this
    patch, before processing the VLAN allocations per physical network,
    deleted those registers belonging to any unconfigured physical network.

    The VLAN registers per physical network are deleted using a bulk delete
    operation, to speed up the process.

    Those missing VLAN registers per network are now created using a bulk
    insert operation, available in the ORM. This bulk operation speeds up
    the sync process.

    Change-Id: I8568e2277e157754aaff87a059a40e34e6a43e2b
    Partial-Bug: #1862178
    (cherry picked from commit 016e7826f165e69c3956375fc3aa8b8d642c9dc9)
    (cherry picked from commit 651eb12bec5e5e986bed79f3a5c006c617e79bda)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.opendev.org/721642
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=8a6521e4457acbd7ceed941f363ee7c9872422ba
Submitter: Zuul
Branch: stable/rocky

commit 8a6521e4457acbd7ceed941f363ee7c9872422ba
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Tue Feb 11 14:09:06 2020 +0000

    Improve VLAN allocations synchronization

    In order to reduce the number of elements retrieved from the DB, this
    patch, before processing the VLAN allocations per physical network,
    deleted those registers belonging to any unconfigured physical network.

    The VLAN registers per physical network are deleted using a bulk delete
    operation, to speed up the process.

    Those missing VLAN registers per network are now created using a bulk
    insert operation, available in the ORM. This bulk operation speeds up
    the sync process.

    Conflicts:
          neutron/plugins/ml2/drivers/type_vlan.py

    Change-Id: I8568e2277e157754aaff87a059a40e34e6a43e2b
    Partial-Bug: #1862178
    (cherry picked from commit 016e7826f165e69c3956375fc3aa8b8d642c9dc9)
    (cherry picked from commit 651eb12bec5e5e986bed79f3a5c006c617e79bda)
    (cherry picked from commit 4fff732b76ee5d9a4917901b33ce80ccafb5203f)

tags: added: in-stable-rocky
tags: removed: neutron-proactive-backport-potential
Changed in neutron:
status: Fix Committed → Fix Released