Fuel for OpenStack

supervisorctl status can't find socket

Bug #1309588 reported by Alexander Charykov on 2014-04-18

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Fuel for OpenStack	Fix Released	Low	Matthew Mosesohn	Fuel for OpenStack 5.1

Bug Description

I tried to add a system test for OSTF and it fails.

While debugging the function update_ostf(func) in fuelweb_test/helpers/decorators.py, I discovered that it fails in this part:

                helpers.wait(
                    lambda: "RUNNING" in
                    remote.execute("supervisorctl status ostf | awk\
                                   '{print $2}'")['stdout'][0],
                    timeout=60)
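The check above only looks for "RUNNING" in the second column of the first output line, so when supervisorctl prints an error instead of a status line the wait simply times out (and an empty stdout would raise an IndexError). A minimal sketch of a more defensive check follows. The helper names parse_state and wait_for_state are hypothetical and are not part of fuelweb_test; the state names are the process states documented by supervisor.

```python
import time

# Process states documented by supervisor.
KNOWN_STATES = {
    "STOPPED", "STARTING", "RUNNING", "BACKOFF",
    "STOPPING", "EXITED", "FATAL", "UNKNOWN",
}


def parse_state(stdout_lines):
    """Return the state from `supervisorctl status <name>` output, or None.

    Error text such as "unix:///var/run/supervisor.sock no such file"
    has no valid state in its second column, so it yields None.
    """
    if not stdout_lines:
        return None
    fields = stdout_lines[0].split()
    if len(fields) >= 2 and fields[1] in KNOWN_STATES:
        return fields[1]
    return None


def wait_for_state(get_lines, wanted="RUNNING", timeout=60.0, interval=1.0):
    """Poll get_lines() until it reports `wanted`; return True on success."""
    deadline = time.monotonic() + timeout
    while True:
        if parse_state(get_lines()) == wanted:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Assuming remote.execute returns a dict whose 'stdout' entry is a list of lines, as the snippet above suggests, it could be called as wait_for_state(lambda: remote.execute("supervisorctl status ostf")['stdout']).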

I tried to run supervisorctl status ostf on nailgun and it failed:

[root@nailgun ~]# supervisorctl status ostf
unix:///var/run/supervisor.sock no such file
[root@nailgun ~]# /etc/init.d/supervisord restart
Stopping supervisord: ERROR: unix:///var/run/supervisor.sock no such file (already shut down?)
Waiting roughly 60 seconds for /var/run/supervisord.pid to be removed after child processes exit
Supervisord exited as expected in under seconds
Starting supervisord:
assassind STARTING
astute STARTING
nailgun STARTING
ostf STARTING
receiverd STARTING
[root@nailgun ~]# /etc/init.d/supervisord restart
Stopping supervisord: Shut down
Waiting roughly 60 seconds for /var/run/supervisord.pid to be removed after child processes exit
Supervisord still working on shutting down. We've waited roughly 60 seconds, we'll let it do its thing from here
Starting supervisord:
ALREADY STARTED
[root@nailgun ~]# supervisorctl status ostf
unix:///var/run/supervisor.sock no such file

Alexander Charykov (acharykov) on 2014-04-18

Changed in fuel:
milestone:	none → 5.0

Alexander Charykov (acharykov) wrote on 2014-04-18:

Attachment: sys_test.log (308.1 KiB, text/plain)

Matthew Mosesohn (raytrac3r) on 2014-04-18

Changed in fuel:
assignee:	nobody → Matthew Mosesohn (raytrac3r)

Vladimir Kuklin (vkuklin) on 2014-04-18

Changed in fuel:
importance:	Undecided → High
status:	New → Confirmed

Matthew Mosesohn (raytrac3r) wrote on 2014-04-18:

I can reproduce it. It happens when supervisord sleeps for 60s and then gives up waiting for the supervisorctl shutdown to complete. supervisord doesn't remove its PID file for a little over 1 minute. We need to lower the timeout, maybe to 15s. The supervised processes are all stateless, so it should be fine to kill them.

I'll wait for someone from the Python team to comment here on whether there are any risks in lowering the timeout.
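The idea of a shorter timeout followed by a kill can be sketched in general terms as below. This is only an illustration of the terminate-then-kill pattern for a stateless child process, not how supervisord or its init script implements shutdown. The function name stop_with_timeout is hypothetical, and the 15-second default is taken from the suggestion above.

```python
import subprocess


def stop_with_timeout(proc, timeout=15.0):
    """Ask a child to exit; force-kill it if it is still alive after `timeout` seconds.

    `proc` is a subprocess.Popen object. Returns "terminated" if the child
    exited within the timeout, otherwise "killed".
    """
    proc.terminate()  # polite request (SIGTERM on POSIX)
    try:
        proc.wait(timeout=timeout)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()   # forced stop (SIGKILL on POSIX)
        proc.wait()   # reap the child so no zombie is left behind
        return "killed"
```

Killing after a short wait is only reasonable when the process holds no state that a forced stop could corrupt, which is the assumption made in the comment above.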

Matthew Mosesohn (raytrac3r) wrote on 2014-04-18:

This is not a blocker, since we don't normally need to restart the supervisord process. When it does eventually shut down, it removes its PID file. If you wait another minute, you can start the supervisord service and check statuses as usual.
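The workaround described here (wait for the old PID file to go away before starting the service again) could be scripted roughly as follows. This is a sketch: wait_until_removed is a hypothetical helper, and the 90-second default is an assumption chosen to cover the "little over 1 minute" mentioned earlier.

```python
import os
import time


def wait_until_removed(path, timeout=90.0, interval=1.0):
    """Return True once `path` no longer exists, or False if `timeout` elapses first."""
    deadline = time.monotonic() + timeout
    while os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

For example, a test helper might call wait_until_removed("/var/run/supervisord.pid") and only start the service again and query supervisorctl status when it returns True.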

Changed in fuel:
importance:	High → Medium

Vladimir Kuklin (vkuklin) on 2014-04-24

Changed in fuel:
status:	Confirmed → Incomplete

Mike Scherbakov (mihgen) on 2014-04-26

Changed in fuel:
milestone:	5.0 → 5.1

Dmitry Borodaenko (angdraug) on 2014-06-18

Changed in fuel:
status:	Incomplete → Confirmed
importance:	Medium → Low

Matthew Mosesohn (raytrac3r) wrote on 2014-06-27:

Alexander, I believe we've solved this issue already by reducing the wait time for each daemon to die. I'm marking this as fix released, but if you can reproduce it, let me know.

Changed in fuel:
status:	Confirmed → Fix Released
