Looked at the customer setup:

The issue seems to be that the "service supervisor-analytics stop" command issued by fab stop_collector is hanging. "service supervisor-analytics stop" in turn calls "supervisorctl -s unix:///var/run/supervisord_analytics.sock stop all", and that command hangs.

The reason for the hang seems to be that once fab stop_database is run, contrail-alarm-gen loses its connection to kafka and then outputs a lot of messages to stdout:

FailedPayloadsError for -uve-19:0
Connect attempt to returned error 111. Disconnecting.
Skipping unconnected connection:
Connect attempt to returned error 111. Disconnecting.
Skipping unconnected connection:
Connect attempt to returned error 111. Disconnecting.
Skipping unconnected connection:
FailedPayloadsError for -uve-19:0
Connect attempt to returned error 111. Disconnecting.
Skipping unconnected connection:
Connect attempt to returned error 111. Disconnecting.
Skipping unconnected connection:
Connect attempt to returned error 111. Disconnecting.
Skipping unconnected connection:
FailedPayloadsError for -uve-19:0
Connect attempt to returned error 111. Disconnecting.
Skipping unconnected connection:
Connect attempt to returned error 111. Disconnecting.
Skipping unconnected connection:
Connect attempt to returned error 111. Disconnecting.
Skipping unconnected connection:
FailedPayloadsError for -uve-19:0
Connect attempt to returned error 111. Disconnecting.
Skipping unconnected connection:
Connect attempt to returned error 111. Disconnecting.
Skipping unconnected connection:
Connect attempt to returned error 111. Disconnecting.
Skipping unconnected connection:
FailedPayloadsError for -uve-19:0
Connect attempt to returned error 111. Disconnecting.
Skipping unconnected connection:
Connect attempt to returned error 111. Disconnecting.
Skipping unconnected connection:
Connect attempt to returned error 111. Disconnecting.
Skipping unconnected connection:
FailedPayloadsError for -uve-19:0

supervisord gets busy reading those messages and is not able to process the "stop all" command. This was confirmed by removing /var/log/contrail/contrail-alarm-gen-0-stdout.log and then observing that fab stop_collector moved on to the next node.

root@sv-35:~# echo > /var/log/contrail/contrail-alarm-gen-0-stdout.log
root@sv-35:~# ls -lart /var/log/contrail/contrail-alarm-gen-0-stdout.log
-rw-r--r-- 1 root root 18783175 Aug 3 14:24 /var/log/contrail/contrail-alarm-gen-0-stdout.log
root@sv-35:~# date
Thu Aug 3 14:24:42 JST 2017
root@sv-35:~# ls -lart /var/log/contrail/contrail-alarm-gen-0-stdout.log
-rw-r--r-- 1 root root 21450583 Aug 3 14:24 /var/log/contrail/contrail-alarm-gen-0-stdout.log
root@sv-35:~# ls -lart /var/log/contrail/contrail-alarm-gen-0-stdout.log
-rw-r--r-- 1 root root 22888661 Aug 3 14:24 /var/log/contrail/contrail-alarm-gen-0-stdout.log
root@sv-35:~# ls -lart /var/log/contrail/contrail-alarm-gen-0-stdout.log
-rw-r--r-- 1 root root 23655489 Aug 3 14:24 /var/log/contrail/contrail-alarm-gen-0-stdout.log
root@sv-35:~# ls -lart /var/log/contrail/contrail-alarm-gen-0-stdout.log
-rw-r--r-- 1 root root 24355706 Aug 3 14:24 /var/log/contrail/contrail-alarm-gen-0-stdout.log
root@sv-35:~# rm -f /var/log/contrail/contrail-alarm-gen-0-stdout.log
root@sv-35:~# ls -lart /var/log/contrail/contrail-alarm-gen-0-stdout.log
ls: cannot access /var/log/contrail/contrail-alarm-gen-0-stdout.log: No such file or directory
root@sv-35:~# ls -lart /var/log/contrail/contrail-alarm-gen-0-stdout.log
ls: cannot access /var/log/contrail/contrail-alarm-gen-0-stdout.log: No such file or directory
root@sv-35:~# ls -lart /var/log/contrail/contrail-alarm-gen-0-stdout.log
ls: cannot access /var/log/contrail/contrail-alarm-gen-0-stdout.log: No such file or directory

[root@10.0.0.134] Executing task 'stop_collector'
2017-08-03 13:48:37:485017: [root@10.0.0.134] sudo: service supervisor-analytics stop
2017-08-03 13:48:37:485274: [root@10.0.0.134] out: supervisor-analytics stop/waiting
2017-08-03 13:49:22:778913: [root@10.0.0.134] out:
2017-08-03 13:49:22:779408:
2017-08-03 13:49:22:779728: [root@10.0.0.135] Executing task 'stop_collector'
2017-08-03 13:49:22:780124: [root@10.0.0.135] sudo: service supervisor-analytics stop
2017-08-03 13:49:22:780520: [root@10.0.0.135] out:
2017-08-03 14:11:51:352998: [root@10.0.0.135] out:
2017-08-03 14:11:51:785694: [root@10.0.0.135] out:
2017-08-03 14:22:11:156542: [root@10.0.0.135] out:
2017-08-03 14:22:11:357726: [root@10.0.0.135] out:
2017-08-03 14:22:11:522193: [root@10.0.0.135] out:
2017-08-03 14:25:13:052065: [root@10.0.0.135] out:
2017-08-03 14:25:13:252827: [root@10.0.0.135] out:
2017-08-03 14:25:13:653796: [root@10.0.0.135] out:
2017-08-03 14:25:13:955165: [root@10.0.0.135] out: supervisor-analytics stop/waiting
2017-08-03 14:26:04:152905: [root@10.0.0.135] out:
2017-08-03 14:26:04:153275:
2017-08-03 14:26:04:153553: [root@10.0.0.136] Executing task 'stop_collector'
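
To put numbers on the observations above, here is a small Python sketch that works out how long the stop took on each node and how much the stdout log grew between the first and last ls. It is only an illustration: the timestamps and sizes are copied from the output in this comment, the helper name elapsed_seconds is made up for this example, and it assumes the timestamp printed in front of each fab line is the time that line was emitted, so the durations are approximate.

```python
from datetime import datetime

# fab prints timestamps as "YYYY-MM-DD HH:MM:SS:microseconds"
FAB_TS_FORMAT = "%Y-%m-%d %H:%M:%S:%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two fab-style timestamps (hypothetical helper)."""
    t0 = datetime.strptime(start, FAB_TS_FORMAT)
    t1 = datetime.strptime(end, FAB_TS_FORMAT)
    return (t1 - t0).total_seconds()


# (timestamp of the "sudo: service supervisor-analytics stop" line,
#  timestamp of the next node's "Executing task" line), copied from the fab output
stop_windows = {
    "10.0.0.134": ("2017-08-03 13:48:37:485017", "2017-08-03 13:49:22:779728"),
    "10.0.0.135": ("2017-08-03 13:49:22:780124", "2017-08-03 14:26:04:153553"),
}
durations = {
    host: elapsed_seconds(start, end) for host, (start, end) in stop_windows.items()
}

# sizes of contrail-alarm-gen-0-stdout.log from the successive ls calls,
# all stamped 14:24 by ls
log_sizes = [18783175, 21450583, 22888661, 23655489, 24355706]
log_growth_bytes = log_sizes[-1] - log_sizes[0]

for host, secs in durations.items():
    print(f"{host}: stop took about {secs:.0f} s ({secs / 60:.1f} min)")
print(f"stdout log grew by {log_growth_bytes} bytes between the first and last ls")
```

Under those assumptions the stop on 10.0.0.134 completes in well under a minute, while on 10.0.0.135 it takes more than half an hour and only moves on after the stdout log is removed.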