neutron openvswitch agent exits if unix:/var/run/openvswitch/db.sock is not yet created

Bug #1634123 reported by Serguei Bezverkhi on 2016-10-17
This bug affects 1 person
Affects: neutron
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

In a Kubernetes environment, it is very difficult to ensure that the neutron-openvswitch-agent pod starts after the openvswitch db and vswitchd pods during cluster startup. As a result, the agent does not find the socket, fails, and stays in a failed state. If it retried several times instead of failing immediately, the outcome would be different, because by then openvswitch would have come up and created the db socket. It would be great to add a retry mechanism to neutron-openvswitch-agent to make it more robust in Kubernetes environments.
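
To illustrate the kind of retry being asked for, here is a minimal sketch in Python. It is not the agent's code and not a proposed patch: the function name `wait_for_path`, its parameters, and the default attempt count and delay are all hypothetical. It only polls for a path (such as the db socket) a bounded number of times before giving up.

```python
import os
import time


def wait_for_path(path, attempts=5, delay=2.0,
                  exists=os.path.exists, sleep=time.sleep):
    """Return True as soon as `path` exists, False after `attempts` checks.

    Hypothetical helper for illustration only. `exists` and `sleep` are
    injectable so the loop can be exercised without touching the filesystem.
    """
    for attempt in range(1, attempts + 1):
        if exists(path):
            return True
        # Do not sleep after the final failed check.
        if attempt < attempts:
            sleep(delay)
    return False
```

A caller could run `wait_for_path("/var/run/openvswitch/db.sock")` before issuing the first ovs-vsctl command and exit only if it returns False.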

neutron (9.0.0)
neutron-lib (0.4.0)
python-neutronclient (6.0.0)

2016-10-17 13:05:57.922 36 INFO ryu.base.app_manager [-] loading app ryu.app.ofctl.service
2016-10-17 13:05:57.984 36 INFO ryu.base.app_manager [-] loading app ryu.controller.ofp_handler
2016-10-17 13:05:57.989 36 INFO ryu.base.app_manager [-] instantiating app neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp of OVSNeutronAgentRyuApp
2016-10-17 13:05:57.990 36 INFO ryu.base.app_manager [-] instantiating app ryu.controller.ofp_handler of OFPHandler
2016-10-17 13:05:57.991 36 INFO ryu.base.app_manager [-] instantiating app ryu.app.ofctl.service of OfctlService
2016-10-17 13:05:57.994 36 DEBUG neutron.callbacks.manager [-] Subscribe: <function init_handler at 0x4549ed8> Open vSwitch agent after_init subscribe /var/lib/kolla/venv/lib/python2.7/site-packages/neutron/callbacks/manager.py:42
2016-10-17 13:05:57.995 36 DEBUG neutron.agent.linux.utils [-] Running command: ['ip', 'addr', 'show', 'to', '172.29.75.24'] create_process /var/lib/kolla/venv/lib/python2.7/site-packages/neutron/agent/linux/utils.py:83
2016-10-17 13:05:58.181 36 DEBUG neutron.agent.linux.utils [-] Exit code: 0 execute /var/lib/kolla/venv/lib/python2.7/site-packages/neutron/agent/linux/utils.py:140
2016-10-17 13:05:58.225 36 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ovs-vsctl', 'set-manager', 'ptcp:6640:0.0.0.0'] create_process /var/lib/kolla/venv/lib/python2.7/site-packages/neutron/agent/linux/utils.py:83
2016-10-17 13:05:58.792 36 ERROR neutron.agent.linux.utils [-] Exit code: 1; Stdin: ; Stdout: ; Stderr: ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)

2016-10-17 13:05:58.793 36 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [-] Exit code: 1; Stdin: ; Stdout: ; Stderr: ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)
 Agent terminated!
2016-10-17 13:05:58.796 36 ERROR ryu.lib.hub [-] hub: uncaught exception: Traceback (most recent call last):
  File "/var/lib/kolla/venv/lib/python2.7/site-packages/ryu/lib/hub.py", line 54, in _launch
    return func(*args, **kwargs)
  File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_ryuapp.py", line 37, in agent_main_wrapper
    ovs_agent.main(bridge_classes)
  File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2174, in main
    sys.exit(1)
SystemExit: 1

Tags: ovs
Assaf Muller (amuller) wrote :

I agree, the OVS agent should keep trying to connect instead of exiting.

tags: added: ovs
Changed in neutron:
importance: Undecided → High
summary: - neutron openvswitch agent exists if unix:/var/run/openvswitch/db.sock is
+ neutron openvswitch agent exits if unix:/var/run/openvswitch/db.sock is
not yet created
Changed in neutron:
assignee: nobody → Serguei Bezverkhi (sbezverk)
Changed in neutron:
status: New → In Progress
Sreekumar S (sreesiv) wrote :

This seems to be handled by PS https://review.openstack.org/#/c/388105/

I believe this should be handled by the --retry option of ovs-vsctl instead of a wait loop in the neutron agent code.
http://openvswitch.org/support/dist-docs/ovs-vsctl.8.txt
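
As a rough sketch of that alternative, the command from the log above could be built with ovs-vsctl's `--retry` and `--timeout` options, so that ovs-vsctl itself waits for the database connection. The helper name `build_ovs_vsctl_cmd` and the 10-second default are assumptions made for illustration; this is not how the neutron agent actually constructs its commands.

```python
def build_ovs_vsctl_cmd(args, timeout=10, retry=True):
    """Build an ovs-vsctl argument list (hypothetical helper).

    --retry asks ovs-vsctl to wait for the database connection to succeed
    instead of failing on the first attempt; --timeout bounds the total
    run time in seconds so the wait cannot last forever.
    """
    cmd = ["ovs-vsctl", "--timeout=%d" % timeout]
    if retry:
        cmd.append("--retry")
    return cmd + list(args)


# The command that failed in the log, with the retry options added.
cmd = build_ovs_vsctl_cmd(["set-manager", "ptcp:6640:0.0.0.0"])
print(" ".join(cmd))
```

The trade-off is that the wait happens inside each ovs-vsctl invocation rather than in a loop in the agent, so the agent still needs to handle the case where the timeout expires.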

Changed in neutron:
assignee: Serguei Bezverkhi (sbezverk) → Kevin Benton (kevinbenton)

This bug has had a related patch abandoned and has been automatically un-assigned due to inactivity. Please re-assign yourself if you are continuing work or adjust the state as appropriate if it is no longer valid.

Changed in neutron:
assignee: Kevin Benton (kevinbenton) → nobody
status: In Progress → New
tags: added: timeout-abandon

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/388105
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

This is still a legit bug. I lowered severity to Medium because it doesn't affect bare metal Neutron installations, and there is a workaround (manage dependencies between containers in some other way, like via systemd unit service files).

Changed in neutron:
status: New → Confirmed
importance: High → Medium
tags: removed: timeout-abandon