Timeout in executing ovs command crashes ovs agent

Bug #1838563 reported by Slawek Kaplonski
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Slawek Kaplonski

Bug Description

If an ovs command times out during agent initialization, the agent crashes and does not try to recover.

Example of such error in CI: http://logs.openstack.org/84/673784/1/check/tempest-multinode-full-py3/283e76b/compute1/logs/screen-q-agt.txt.gz#_Jul_31_17_48_48_877755

Slawek Kaplonski (slaweq) wrote :

I'm not sure this is really a bug that needs to be changed.
The exception raised in this case was ovsdbapp.exceptions.TimeoutException, which inherits from RuntimeError. Errors of this kind are handled in the ovs agent's code at https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L2574
so it looks like this is handled properly now.
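
To illustrate why a generic handler also catches the timeout, here is a minimal sketch. It does not import ovsdbapp or neutron: `TimeoutException` below is a local stand-in that only mirrors the RuntimeError ancestry mentioned above, and `init_agent` and `main` are hypothetical names, not the agent's real functions.

```python
class TimeoutException(RuntimeError):
    """Local stand-in for ovsdbapp.exceptions.TimeoutException (assumption)."""


def init_agent():
    # Hypothetical initializer that simulates an ovs command timing out.
    raise TimeoutException("ovs command timed out")


def main():
    try:
        init_agent()
    except (RuntimeError, ValueError) as exc:
        # The timeout is a RuntimeError subclass, so it lands here and the
        # process exits with an error instead of retrying.
        print("%s Agent terminated!" % exc)
        return 1
    return 0
```

With a handler like this the timeout is "handled" only in the sense that the agent logs it and terminates cleanly; nothing attempts the initialization again.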

Maybe we could handle this TimeoutException in a special way and retry the agent initialization a couple of times?

Changed in neutron:
importance: High → Medium
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/674085

Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/674085
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=30a60d04f098581340f83b38b7a79104308c66bc
Submitter: Zuul
Branch: master

commit 30a60d04f098581340f83b38b7a79104308c66bc
Author: Slawek Kaplonski <email address hidden>
Date: Thu Aug 1 18:10:19 2019 +0200

    Add 3 retry attempts to initialize ovs agent

    If ovsdbapp raises TimeoutException during initialization of
    neutron-ovs-agent, the agent now tries to create the agent class
    object 3 times before it is terminated.

    Such timeouts shouldn't usually happen, but if one does occur for
    some reason, e.g. once in a CI job, it is better to just retry and
    initialize the agent instead of leaving it dead on the node.

    Change-Id: I93e8d21d612e343479f26f8adc4477473579bab1
    Closes-Bug: #1838563
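
The retry behaviour described in the commit message could be sketched roughly as follows. This is not the merged neutron code: `TimeoutException` is a local stand-in for the ovsdbapp exception, and `MAX_INIT_TRIES`, `init_with_retries` and `make_flaky_factory` are hypothetical names introduced only for illustration.

```python
import logging

LOG = logging.getLogger(__name__)

# Assumption: three attempts in total, matching the commit message.
MAX_INIT_TRIES = 3


class TimeoutException(RuntimeError):
    """Local stand-in for ovsdbapp.exceptions.TimeoutException (assumption)."""


def init_with_retries(create_agent, max_tries=MAX_INIT_TRIES):
    """Call create_agent() until it succeeds or max_tries is exhausted."""
    for attempt in range(1, max_tries + 1):
        try:
            return create_agent()
        except TimeoutException as exc:
            if attempt == max_tries:
                LOG.error("Agent init failed after %d tries: %s",
                          max_tries, exc)
                raise
            LOG.warning("Timeout on init attempt %d of %d, retrying",
                        attempt, max_tries)


def make_flaky_factory(failures):
    """Hypothetical factory that times out `failures` times, then succeeds."""
    calls = {"count": 0}

    def create_agent():
        calls["count"] += 1
        if calls["count"] <= failures:
            raise TimeoutException("ovsdb command timed out")
        return "agent"

    return create_agent, calls
```

Only the timeout is retried in this sketch; any other RuntimeError or ValueError would still propagate on the first attempt, so genuine configuration errors are not masked by the retry loop.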

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 15.0.0.0b1

This issue was fixed in the openstack/neutron 15.0.0.0b1 development milestone.
