OpenvSwitch service doesn't restart after a failure automatically

Bug #1772881 reported by Alexander Rubtsov
Affects            | Status  | Importance | Assigned to       | Milestone
Fuel for OpenStack | Invalid | Medium     | Alexander Rubtsov |

Bug Description

--- Environment ---
MOS 9.2 (build 606)

--- Description ---
The OpenvSwitch service is not restarted automatically after a failure

--- Steps to reproduce ---
1) Deploy an OpenStack environment

2) Log into a Controller node

3) Determine PID of OpenvSwitch process:
[root@node-1 ~]$ ps aux | grep "ovs-vswitchd.pi[d]"
root 7046 0.1 0.3 246308 7224 ? S<Ll May21 4:05 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor

4) Kill this process
kill -9 7046

5) Wait for a while

6) Check the OpenvSwitch process again
ps aux | grep "ovs-vswitchd.pi[d]"
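Steps 5 and 6 can also be scripted instead of waiting and re-running ps by hand. The following is a minimal sketch, not part of the original report: the function name wait_for_new_pid and the 60-second default timeout are hypothetical, and it assumes the pidfile path from the ps output above and a root shell (kill -0 only probes whether a PID exists and sends no signal).

```shell
# Hypothetical helper: succeed once the pidfile names a live process
# whose PID differs from the one that was killed.
wait_for_new_pid() {
    pidfile=$1
    old_pid=$2
    timeout=${3:-60}   # seconds; assumed default, not from the report
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        new_pid=$(cat "$pidfile" 2>/dev/null || true)
        if [ -n "$new_pid" ] && [ "$new_pid" != "$old_pid" ] \
            && kill -0 "$new_pid" 2>/dev/null; then
            echo "restarted with PID $new_pid"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "no new process after ${timeout}s"
    return 1
}
```

With the PID from step 3, the call would be: wait_for_new_pid /var/run/openvswitch/ovs-vswitchd.pid 7046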

--- Actual result ---
There is no OpenvSwitch process running

--- Expected result ---
OpenvSwitch process is running again:
root 13679 0.3 0.1 246308 3304 ? S<Ll 10:03 0:00 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor

--- Notes ---
- 1 -
The corresponding Upstart service is not configured to auto-restart:
[root@node-1 ~]$ grep -i "respawn" /etc/init/openvswitch-switch.conf
[root@node-1 ~]$

However, many other services on Controller nodes are configured to auto-restart:
[root@node-1 ~]$ grep -rl "respawn" /etc/init/ | wc -l
66

- 2 -
The service status does not reflect that the corresponding process has been killed:
[root@node-1 ~]$ service openvswitch-switch status
openvswitch-switch start/running
[root@node-1 ~]$ kill -9 13679
[root@node-1 ~]$ ps aux | grep "ovs-vswitchd.pi[d]"
[root@node-1 ~]$ service openvswitch-switch status
openvswitch-switch start/running
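The mismatch shown above can be detected by comparing the Upstart status line with the process named in the pidfile. This is only a sketch under assumptions: the function name status_matches_process is hypothetical, the pidfile path is taken from the ps output earlier in this report, and the check must run as root for kill -0 to probe a root-owned process.

```shell
# Hypothetical check: does the Upstart status line agree with the
# liveness of the PID stored in the pidfile?
status_matches_process() {
    status_line=$1
    pidfile=$2
    pid=$(cat "$pidfile" 2>/dev/null || true)
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        alive=yes
    else
        alive=no
    fi
    case "$status_line" in
        *start/running*) reported=yes ;;
        *)               reported=no ;;
    esac
    if [ "$reported" = "$alive" ]; then
        echo "consistent"
        return 0
    fi
    echo "MISMATCH: status reports running=$reported, process alive=$alive"
    return 1
}
```

It could be invoked as: status_matches_process "$(service openvswitch-switch status)" /var/run/openvswitch/ovs-vswitchd.pid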

Alexander Rubtsov (arubtsov) wrote :

sla2 for 9.0-updates

Changed in fuel:
importance: Undecided → Medium
assignee: nobody → MOS Maintenance (mos-maintenance)
milestone: none → 9.x-updates
tags: added: customer-found sla2
Changed in fuel:
milestone: 9.x-updates → 9.2-mu-7
status: New → Confirmed
Vladimir Khlyunev (vkhlyunev) wrote :

After some debugging, I found that there is no easy and reliable way to fix this bug. The upstream configuration file for the openvswitch-switch service is written in an odd way: it defines only pre-start and post-stop scripts, which contain the daemon start/stop logic. There is no "script"/"exec" section. Without one, Upstart does not know which process to monitor (this is also the reason for the false-positive output of the "service status" command), so the "respawn" feature will not work even if we add it. A proper solution requires rewriting more than half of the Upstart configuration file, which is quite risky.
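The condition described in this comment can be checked mechanically for any Upstart job file. The sketch below is an illustration rather than an official tool: the function name check_upstart_job is hypothetical, and it uses a simple line-based scan that skips the bodies of pre-start/post-start/pre-stop/post-stop script blocks and then looks for a top-level "exec" or "script" stanza and a "respawn" stanza.

```shell
# Hypothetical helper: report whether an Upstart job file defines a main
# process (top-level exec/script) and a respawn stanza.
check_upstart_job() {
    awk '
        # Entering a hook script block: its body is shell code, not stanzas.
        /^[ \t]*(pre-start|post-start|pre-stop|post-stop)[ \t]+script/ { hook = 1; next }
        /^[ \t]*end[ \t]+script/ { hook = 0; next }
        hook { next }
        /^[ \t]*(exec|script)([ \t]|$)/ { main = 1 }
        /^[ \t]*respawn([ \t]|$)/ { respawn = 1 }
        END {
            printf "main_process=%s respawn=%s\n", (main ? "yes" : "no"), (respawn ? "yes" : "no")
            exit !(main && respawn)
        }
    ' "$1"
}
```

Going by the description in this report, check_upstart_job /etc/init/openvswitch-switch.conf would be expected to print main_process=no respawn=no, although that has not been verified here.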

We were also informed that there are no known cases of the openvswitch daemon failing on its own: the observed failure was a consequence of actions taken during testing. This puts the bug out of scope for maintenance updates, because a fix is too risky and could introduce more errors than it resolves.

Changed in fuel:
assignee: MOS Maintenance (mos-maintenance) → Alexander Rubtsov (arubtsov)
status: Confirmed → Invalid
milestone: 9.2-mu-7 → 9.x-updates