neutron

Timeout while executing an ovs command crashes the ovs agent

Bug #1838563 reported by Slawek Kaplonski on 2019-07-31

Affects	Status	Importance	Assigned to	Milestone
neutron	Fix Released	Medium	Slawek Kaplonski	

Bug Description

If an ovs command times out during agent initialization, the agent crashes and does not try to recover.

Example of such an error in CI: http://logs.openstack.org/84/673784/1/check/tempest-multinode-full-py3/283e76b/compute1/logs/screen-q-agt.txt.gz#_Jul_31_17_48_48_877755

Slawek Kaplonski (slaweq) wrote on 2019-08-01:

I'm not sure whether this is really a bug that should be changed.
The exception raised in this case was ovsdbapp.exceptions.TimeoutException, which inherits from RuntimeError. Errors of this kind are handled in the ovs agent's code at https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L2574
so it looks like this is already handled properly.

Maybe we could handle this TimeoutException in a special way and retry initializing the agent a couple of times?
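
To illustrate why the timeout ends in a clean agent termination rather than an unhandled traceback, here is a minimal sketch. It does not reproduce the neutron code: `TimeoutException` is a local stand-in for `ovsdbapp.exceptions.TimeoutException`, and `init_agent` and `main` are hypothetical names used only for this illustration. It only assumes, as the comment above states, that the exception derives from RuntimeError.

```python
class TimeoutException(RuntimeError):
    """Local stand-in for ovsdbapp.exceptions.TimeoutException (assumption)."""


def init_agent():
    # Hypothetical initializer that simulates an ovs command timing out.
    raise TimeoutException("ovs command timed out")


def main():
    """Return a process exit code instead of calling sys.exit directly."""
    try:
        init_agent()
    except (RuntimeError, ValueError) as exc:
        # TimeoutException is caught here because it subclasses RuntimeError,
        # so the agent logs the error and terminates without retrying.
        print(f"{exc} Agent terminated!")
        return 1
    return 0
```

With a handler of this shape, a single timeout is enough to stop the agent, which is the behaviour the bug report describes.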

Slawek Kaplonski (slaweq) on 2019-08-01

Changed in neutron:
importance:	High → Medium

OpenStack Infra (hudson-openstack) wrote on 2019-08-01: Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/674085

Changed in neutron:
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2019-08-08: Fix merged to neutron (master)

Reviewed: https://review.opendev.org/674085
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=30a60d04f098581340f83b38b7a79104308c66bc
Submitter: Zuul
Branch: master

commit 30a60d04f098581340f83b38b7a79104308c66bc
Author: Slawek Kaplonski <email address hidden>
Date: Thu Aug 1 18:10:19 2019 +0200

Add 3 retry attempts to initialize ovs agent

    If ovsdbapp raises a TimeoutException during initialization of
    neutron-ovs-agent, the agent now tries to create an object of
    the agent's class 3 times before it is terminated.

    Such timeouts shouldn't usually happen, but if one does happen
    for some reason, e.g. once in a CI job, it is better to just
    retry and initialize the agent instead of leaving it dead on
    the node.

Change-Id: I93e8d21d612e343479f26f8adc4477473579bab1
Closes-Bug: #1838563
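
The retry behaviour described in the commit message above can be sketched roughly as follows. This is not the merged patch: `TimeoutException` is a local stand-in for the ovsdbapp exception, and `create_agent_with_retries`, `FlakyFactory` and `MAX_INIT_ATTEMPTS` are hypothetical names chosen for the example. Only the limit of 3 attempts comes from the commit message.

```python
class TimeoutException(RuntimeError):
    """Local stand-in for ovsdbapp.exceptions.TimeoutException (assumption)."""


MAX_INIT_ATTEMPTS = 3  # the commit message mentions 3 attempts


def create_agent_with_retries(factory, max_attempts=MAX_INIT_ATTEMPTS):
    """Call factory() until it succeeds or max_attempts timeouts occur."""
    for attempt in range(1, max_attempts + 1):
        try:
            return factory()
        except TimeoutException as exc:
            # Only timeouts are retried; any other error propagates at once.
            print(f"Attempt {attempt}/{max_attempts} timed out: {exc}")
    raise RuntimeError(
        f"agent initialization timed out {max_attempts} times")


class FlakyFactory:
    """Hypothetical agent factory that times out a fixed number of times."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise TimeoutException("ovs command timed out")
        return "agent"
```

For example, `create_agent_with_retries(FlakyFactory(2))` returns the agent on the third call, while a factory that keeps timing out causes a RuntimeError after three attempts, at which point the caller can terminate the agent as before.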