supervisorctl status can't find socket

Bug #1309588 reported by Alexander Charykov
This bug affects 1 person
Affects             Status        Importance  Assigned to       Milestone
Fuel for OpenStack  Fix Released  Low         Matthew Mosesohn

Bug Description

I tried to add a system test for OSTF, and it failed.

While debugging the function update_ostf(func) in fuelweb_test/helpers/decorators.py, I discovered that it fails in this part:

    helpers.wait(
        lambda: "RUNNING" in
        remote.execute("supervisorctl status ostf | awk "
                       "'{print $2}'")['stdout'][0],
        timeout=60)
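A more defensive form of this check could parse the status output directly instead of taking ['stdout'][0] through awk. With a missing socket, the first output line is the error "unix:///var/run/supervisor.sock no such file", so awk's second field is just "no"; with empty output, indexing [0] raises IndexError. The sketch below treats both cases as "not running". It assumes stdout is a list of lines, as returned by remote.execute(...)['stdout'] in the snippet above; the helper name is_program_running is hypothetical and not part of fuelweb_test.

```python
from typing import Iterable


def is_program_running(stdout_lines: Iterable[str], name: str) -> bool:
    """Return True if supervisorctl status output reports `name` as RUNNING.

    Hypothetical helper (not part of fuelweb_test). A normal status line
    looks like:
        ostf    RUNNING   pid 1234, uptime 0:01:02
    Any other line, such as the "unix:///var/run/supervisor.sock no such
    file" error, is skipped, so a missing socket or empty output yields
    False instead of matching the wrong field or raising IndexError.
    """
    for line in stdout_lines:
        fields = line.split()
        # First column is the program name, second is its state.
        if len(fields) >= 2 and fields[0] == name:
            return fields[1] == "RUNNING"
    return False


# Hypothetical usage in the test helper, replacing the awk pipeline:
#   helpers.wait(
#       lambda: is_program_running(
#           remote.execute("supervisorctl status ostf")['stdout'], "ostf"),
#       timeout=60)
```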

I then ran supervisorctl status ostf on the Nailgun node, and it failed as well:

[root@nailgun ~]# supervisorctl status ostf
unix:///var/run/supervisor.sock no such file
[root@nailgun ~]# /etc/init.d/supervisord restart
Stopping supervisord: ERROR: unix:///var/run/supervisor.sock no such file (already shut down?)
Waiting roughly 60 seconds for /var/run/supervisord.pid to be removed after child processes exit
Supervisord exited as expected in under seconds
Starting supervisord:
assassind STARTING
astute STARTING
nailgun STARTING
ostf STARTING
receiverd STARTING
[root@nailgun ~]# /etc/init.d/supervisord restart
Stopping supervisord: Shut down
Waiting roughly 60 seconds for /var/run/supervisord.pid to be removed after child processes exit
Supervisord still working on shutting down. We've waited roughly 60 seconds, we'll let it do its thing from here
Starting supervisord:
ALREADY STARTED
[root@nailgun ~]# supervisorctl status ostf
unix:///var/run/supervisor.sock no such file

Changed in fuel:
milestone: none → 5.0
Alexander Charykov (acharykov) wrote:

Attached: sys_test.log

Changed in fuel:
assignee: nobody → Matthew Mosesohn (raytrac3r)
Changed in fuel:
importance: Undecided → High
status: New → Confirmed
Matthew Mosesohn (raytrac3r) wrote:

I can reproduce it. It happens when the init script waits roughly 60 seconds for supervisord to shut down and then gives up. supervisord does not remove its PID file for a little over a minute, so the subsequent start reports ALREADY STARTED even though supervisorctl can no longer reach the socket. We should lower the time allowed for each daemon to exit, perhaps to 15 seconds. The managed services are all stateless, so it should be safe to kill them.

I'll wait for someone from the Python team to comment on whether lowering the timeout carries any risk.
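The trade-off discussed here is how long to wait for a daemon to exit after asking it to stop before killing it outright. supervisord exposes this per program as the stopwaitsecs option: it sends the program's stopsignal, waits that many seconds, and then sends SIGKILL. The following minimal Python sketch shows the same terminate, wait, then kill pattern using subprocess. It only illustrates the idea and is not supervisord's code; the function name stop_with_timeout is hypothetical.

```python
import subprocess


def stop_with_timeout(proc: subprocess.Popen, timeout: float) -> int:
    """Ask proc to stop; kill it if it is still alive after `timeout` seconds.

    Hypothetical helper illustrating the stop-then-kill pattern.
    Returns Popen's exit status (on POSIX, -N means "terminated by signal N").
    """
    proc.terminate()  # polite stop request (SIGTERM on POSIX)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The process outlived the grace period. For stateless services,
        # killing it outright is acceptable.
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()
```

A shorter grace period, such as the 15 seconds proposed above, bounds how long a restart can block, at the cost of giving slow-exiting daemons less time to clean up.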

Matthew Mosesohn (raytrac3r) wrote:

This is not a blocker, since we don't normally need to restart the supervisord process. When it does eventually shut down, it removes its PID file. If you wait another minute, you can start the supervisord service and check statuses as usual.
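The workaround described here (wait until supervisord has removed /var/run/supervisord.pid, then start the service) can be scripted. A minimal sketch, assuming the PID file path shown in the transcript above and that the caller starts the service itself afterwards. The function name wait_for_file_removal and the 120-second budget in the usage note are illustrative and not part of Fuel.

```python
import os
import time


def wait_for_file_removal(path: str, timeout: float, interval: float = 1.0) -> bool:
    """Poll until `path` no longer exists.

    Hypothetical helper. Returns True once the file is gone, or False if it
    is still present after roughly `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

For example, a script could call wait_for_file_removal("/var/run/supervisord.pid", timeout=120) and run /etc/init.d/supervisord start only after it returns True, which avoids the ALREADY STARTED result seen in the transcript.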

Changed in fuel:
importance: High → Medium
Changed in fuel:
status: Confirmed → Incomplete
Mike Scherbakov (mihgen)
Changed in fuel:
milestone: 5.0 → 5.1
Changed in fuel:
status: Incomplete → Confirmed
importance: Medium → Low
Matthew Mosesohn (raytrac3r) wrote:

Alexander, I believe we've already solved this issue by reducing the wait time for each daemon to exit. I'm marking this as Fix Released; if you can still reproduce it, let me know.

Changed in fuel:
status: Confirmed → Fix Released