Fullstack l3 agent tests failing due to timeout waiting until port is active

Bug #1930401 reported by Slawek Kaplonski
This bug affects 2 people
Affects        Status         Importance   Assigned to    Milestone
neutron        Fix Released   Critical     Lajos Katona
oslo.privsep   New            Undecided    Unassigned
LIU Yulong (dragon889) wrote :

I've checked one log; this seems to be related to bug:
https://bugs.launchpad.net/neutron/+bug/1930432

The router interface's port never received the provisioning_complete action, so its status will never be set to ACTIVE.

Oleg Bondarev (obondarev) wrote :

So I checked logs of one failure:

 - port 45ed2855-2935-406e-87a8-83ac7cc08a3b failed to reach ACTIVE status; the test failed at 08:20:28
 - L2 agent provisioned port in time:
   - 2021-05-27 08:19:36.646 138296 DEBUG neutron.db.provisioning_blocks [req-c2b1dbf8-4d37-4ae0-8006-e0cc28097338 - - - - -] Provisioning for port 45ed2855-2935-406e-87a8-83ac7cc08a3b completed by entity L2. provisioning_complete
 - provisioning was not completed by the DHCP agent; its logs stop at 08:19:28.266:
   - https://b87ba208d44b7f1356ad-f27c11edabee52a7804784593cf2712d.ssl.cf5.rackcdn.com/791365/5/check/neutron-fullstack-with-uwsgi/634ccb1/controller/logs/dsvm-fullstack-logs/TestLegacyL3Agent.test_external_subnet_changed/neutron-dhcp-agent--2021-05-27--08-19-12-515519_log.txt

So it seems the issue is that the DHCP agent hung or stopped.
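
To make the reasoning above concrete: a port only transitions to ACTIVE once every entity holding a provisioning block has reported completion, so a DHCP agent that stops responding leaves the port stuck even though L2 finished in time. The following is a toy model of that rule only. The class and method names are hypothetical, and this is not neutron's actual provisioning_blocks implementation.

```python
class PortProvisioning:
    """Toy model: a port goes ACTIVE only when no provisioning blocks remain."""

    def __init__(self, port_id, entities):
        self.port_id = port_id
        self.pending = set(entities)  # entities that still have to report
        self.status = "DOWN"

    def provisioning_complete(self, entity):
        """Record that `entity` finished; activate the port if none are left."""
        self.pending.discard(entity)
        if not self.pending:
            self.status = "ACTIVE"
        return self.status


port = PortProvisioning("45ed2855-2935-406e-87a8-83ac7cc08a3b", {"L2", "DHCP"})
# L2 reports in time, but the DHCP entity never does, so the port stays DOWN.
status_after_l2 = port.provisioning_complete("L2")
```

In this model the test's wait for ACTIVE can only time out, because nothing ever clears the remaining DHCP block.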

Oleg Bondarev (obondarev) wrote :

Same for the other failures: the DHCP agent hangs after

2021-05-27 16:39:36.049 177163 DEBUG neutron.agent.linux.dhcp [req-93c95823-781c-4647-b0bb-03a77d0a634b - - - - -] DHCP port dhcp71422628-436d-5fcb-b3f9-d2554dc1fc25-d0842bd2-8f81-4454-a912-ee8942f3aaf3 on network d0842bd2-8f81-4454-a912-ee8942f3aaf3 does not yet exist. Creating new one. _setup_new_dhcp_port /home/zuul/src/opendev.org/openstack/neutron/neutron/agent/linux/dhcp.py:1536
2021-05-27 16:39:36.667 178150 DEBUG oslo.privsep.daemon [req-ae0eb87f-1b2b-4785-8dc4-b14bf84f860f - - - - -] privsep: reply[140041414596384]: (4, None) _call_back /home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-fullstack-gate/lib/python3.8/site-packages/oslo_privsep/daemon.py:510

The issue is probably related to privsep.

Oleg Bondarev (obondarev) wrote :

According to the log, the DHCP agent hangs at:
https://github.com/openstack/oslo.privsep/blob/master/oslo_privsep/daemon.py#L363

when spawning the daemon for the neutron.privileged.default context:

Running privsep helper: ['sudo', '/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-fullstack-gate/bin/neutron-rootwrap', '/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-fullstack-gate/etc/neutron/rootwrap.conf', 'privsep-helper', '--config-file', '/tmp/tmp85j24qwy/tmp66u3c94b/neutron.conf', '--config-file', '/tmp/tmp85j24qwy/tmp66u3c94b/dhcp_agent.ini', '--privsep_context', 'neutron.privileged.default', '--privsep_sock_path', '/tmp/tmpihodg8vw/privsep.sock']

Oleg Bondarev (obondarev) wrote :

I think we need to set a timeout for proc.wait() at https://github.com/openstack/oslo.privsep/blob/master/oslo_privsep/daemon.py#L363 and handle the TimeoutExpired exception properly.
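
A minimal sketch of that idea, using only the standard library: `Popen.wait()` accepts a `timeout` argument and raises `subprocess.TimeoutExpired` when the child has not exited in time. The `run_helper` function and the sleeping stand-in command are hypothetical and only illustrate the pattern; this is not the oslo.privsep code.

```python
import subprocess
import sys


def run_helper(cmd, timeout):
    """Run `cmd` and wait at most `timeout` seconds for it to exit.

    Returns the exit code, or None if the helper had to be killed.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # do not leave the stuck helper behind
        proc.wait()  # reap it so no zombie process remains
        return None


# Stand-in for a helper that never returns (hypothetical command).
hung = [sys.executable, "-c", "import time; time.sleep(30)"]
result = run_helper(hung, timeout=0.5)
```

Returning None here is just one way to signal the failure; a real caller would more likely raise an error so that the agent can log it and retry instead of hanging.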

Lajos Katona (lajos-katona) wrote :

Yeah, that's a good idea. The hard part is to drive it through the call chain without breaking everything else. I'll check it.

Changed in neutron:
assignee: nobody → Lajos Katona (lajos-katona)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to oslo.privsep (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/oslo.privsep/+/794993

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/794994

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/795300

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/795300
Committed: https://opendev.org/openstack/neutron/commit/07337f9e99cdcfb1af3546a537d5595330e8bded
Submitter: "Zuul (22348)"
Branch: master

commit 07337f9e99cdcfb1af3546a537d5595330e8bded
Author: Oleg Bondarev <email address hidden>
Date: Tue Jun 8 14:39:43 2021 +0300

    Use 2 dhcp agents in TestLegacyL3Agent

    This is a workaround for privsep hanging issue described in bug 1930401.
    Proper fix is developed in
    https://review.opendev.org/c/openstack/neutron/+/794994
    - this fix will revert current change to reproduce and verify
    privsep issue is fixed.

    Related-Bug: #1930401
    Change-Id: I143cd55612118f243c0e502fb77a611d1ee48761

tags: added: neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Related fix merged to oslo.privsep (master)

Reviewed: https://review.opendev.org/c/openstack/oslo.privsep/+/794993
Committed: https://opendev.org/openstack/oslo.privsep/commit/f7f3349d6a4def52f810ab1728879521c12fe2d0
Submitter: "Zuul (22348)"
Branch: master

commit f7f3349d6a4def52f810ab1728879521c12fe2d0
Author: elajkat <email address hidden>
Date: Tue Jun 8 18:09:31 2021 +0200

    Add timeout to PrivContext and entrypoint_with_timeout decorator

    entrypoint_with_timeout decorator can be used with a timeout parameter,
    if the timeout is reached PrivsepTimeout is raised.
    The PrivContext has timeout variable, which will be used for all
    functions decorated with entrypoint, and PrivsepTimeout is raised if
    timeout is reached.

    Co-authored-by: Rodolfo Alonso <email address hidden>
    Change-Id: Ie3b1fc255c0c05fd5403b90ef49b954fe397fb77
    Related-Bug: #1930401
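
The caller-facing behaviour described in the commit message above (a decorator that takes a timeout and raises an exception when it is reached) can be sketched in plain Python. This is an illustration only: `with_timeout` and `CallTimeout` are hypothetical names standing in for `entrypoint_with_timeout` and `PrivsepTimeout`, and the thread-based waiting used here is an assumption for the sake of a runnable example, not how oslo.privsep implements it.

```python
import concurrent.futures
import functools
import time


class CallTimeout(Exception):
    """Hypothetical stand-in for the PrivsepTimeout named in the commit."""


def with_timeout(timeout):
    """Decorator factory: raise CallTimeout if the call exceeds `timeout` s."""
    def wrap(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
            future = pool.submit(func, *args, **kwargs)
            try:
                return future.result(timeout=timeout)
            except concurrent.futures.TimeoutError:
                raise CallTimeout(func.__name__) from None
            finally:
                # The caller stops waiting; the worker thread is not cancelled.
                pool.shutdown(wait=False)
        return inner
    return wrap


@with_timeout(0.2)
def slow_call():
    time.sleep(1)  # simulates a call that does not answer in time
    return "done"


@with_timeout(2)
def fast_call():
    return "done"
```

With this shape, a call that never answers surfaces as an exception the caller can handle, rather than blocking the agent indefinitely.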

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/794994
Reason: This review is > 4 weeks without comment, and failed Zuul jobs the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/807134

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/813129

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/807134
Reason: This review is > 4 weeks without comment, and failed Zuul jobs the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/813129
Committed: https://opendev.org/openstack/neutron/commit/b57fdf7038727a488bd73bfb05858347a9ebcc09
Submitter: "Zuul (22348)"
Branch: master

commit b57fdf7038727a488bd73bfb05858347a9ebcc09
Author: Slawek Kaplonski <email address hidden>
Date: Fri Oct 8 08:50:07 2021 +0200

    Revert "Use 2 dhcp agents in TestLegacyL3Agent"

    This reverts commit 07337f9e99cdcfb1af3546a537d5595330e8bded.

    Now we don't use dhcp in the L3 agent tests at all so this isn't
    needed anymore.

    Related-Bug: #1930401
    Related-Bug: #1946186
    Change-Id: If3a48251770c3e669ac5a9d6a44085d295809240

Changed in neutron:
status: Confirmed → Fix Released