If I change the queue.asynchronous_workers config option from 1 to 2, then start barbican-worker via systemd and stop it again, it hangs on shutdown.
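For reference, the only configuration change needed to reproduce this is in barbican.conf:

    [queue]
    # with the value 1 shutdown is clean; 2 reproduces the hang
    asynchronous_workers = 2

The log from the hanging shutdown: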
2017-07-20 16:35:22.158 8435 INFO barbican.queue.server [-] Halting the TaskServer
2017-07-20 16:35:22.159 8436 INFO barbican.queue.server [-] Halting the TaskServer
2017-07-20 16:35:22.168 8256 INFO oslo_service.service [-] Caught SIGTERM, stopping children
2017-07-20 16:35:22.169 8256 DEBUG oslo_concurrency.lockutils [-] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
2017-07-20 16:35:22.169 8256 DEBUG oslo_concurrency.lockutils [-] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:225
2017-07-20 16:35:22.170 8256 DEBUG oslo_service.service [-] Stop services. stop /usr/lib/python2.7/site-packages/oslo_service/service.py:611
2017-07-20 16:35:22.170 8256 INFO barbican.queue.server [-] Halting the TaskServer
2017-07-20 16:35:26.659 8436 DEBUG oslo_concurrency.lockutils [-] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
2017-07-20 16:35:26.660 8436 DEBUG oslo_concurrency.lockutils [-] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:225
2017-07-20 16:35:26.671 8435 DEBUG oslo_concurrency.lockutils [-] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
2017-07-20 16:35:26.672 8435 DEBUG oslo_concurrency.lockutils [-] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:225
2017-07-20 16:35:52.171 8256 WARNING oslo_messaging.server [-] Possible hang: stop is waiting for start to complete
2017-07-20 16:35:52.173 8256 DEBUG oslo_messaging.server [-]   File "/usr/bin/barbican-worker", line 10, in <module>
    sys.exit(main())
  File "/usr/lib/python2.7/site-packages/barbican/cmd/worker.py", line 68, in main
    workers=CONF.queue.asynchronous_workers
  File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 605, in wait
    self.stop()
  File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 614, in stop
    service.stop()
  File "/usr/lib/python2.7/site-packages/barbican/queue/server.py", line 290, in stop
    self._server.stop()
  File "/usr/lib/python2.7/site-packages/oslo_messaging/server.py", line 264, in wrapper
    log_after, timeout_timer)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/server.py", line 163, in wait_for_completion
    msg, log_after, timeout_timer)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/server.py", line 128, in _wait
    LOG.debug(''.join(traceback.format_stack()))
 _wait /usr/lib/python2.7/site-packages/oslo_messaging/server.py:128
I'm very far from being an oslo.messaging expert, but this *appears* to be the same issue which Sahara had, namely that the RPC server needs to be started before you can safely call wait() on it:
https://bugs.launchpad.net/sahara/+bug/1546119
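The ordering check is easy to trip in isolation. Here is a hypothetical standalone reproducer using oslo.messaging's in-memory 'fake' driver (the names and setup are illustrative assumptions, not barbican code): calling stop() on a server whose start() never ran in the current process blocks in the same wait_for_completion/_wait path shown in the traceback above.

    # Illustrative reproducer (assumption: not barbican code). The parent
    # worker process is effectively doing this on SIGTERM: stopping an RPC
    # server that was only ever started in the forked children.
    from oslo_config import cfg
    import oslo_messaging

    transport = oslo_messaging.get_transport(cfg.CONF, url='fake://')
    target = oslo_messaging.Target(topic='demo', server='demo')
    server = oslo_messaging.get_rpc_server(transport, target, [],
                                           executor='blocking')

    # No server.start() in this process...
    server.stop()  # ...so this logs "Possible hang: stop is waiting for
                   # start to complete" and then blocks indefinitely

Note the timestamps in the log above: the warning fires exactly 30 seconds after the stop, which lines up with oslo.messaging's default log_after interval for these ordering waits.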
I've ported the fix over from Sahara and it seems to resolve the issue, so I'll submit it to gerrit shortly.
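For the record, the shape of the ported fix is roughly as below (a sketch only, assuming barbican's TaskServer follows the oslo.service layout the traceback suggests; the exact change is in the gerrit review). The RPC server is created and started inside start(), which the launcher runs post-fork in each worker, and stop()/wait() are guarded so that a process whose start() never ran (the parent) doesn't call stop() on an unstarted server:

    # Sketch of the fix pattern (assumption: mirrors the Sahara fix; the
    # class and attribute names follow the traceback, not the actual patch).
    import oslo_messaging
    from oslo_service import service


    class TaskServer(service.Service):
        def __init__(self, transport, target, endpoints):
            super(TaskServer, self).__init__()
            # Only record what we need; don't build or start the RPC
            # server pre-fork, or its "started" state ends up wrong in
            # every process except the one that started it.
            self._transport = transport
            self._target = target
            self._endpoints = endpoints
            self._server = None

        def start(self):
            # Runs post-fork in each worker, so oslo.messaging records
            # "started" in the process that will later stop()/wait().
            self._server = oslo_messaging.get_rpc_server(
                self._transport, self._target, self._endpoints)
            self._server.start()
            super(TaskServer, self).start()

        def stop(self):
            # The parent never ran start(), so _server is None there and
            # we skip the stop() that previously hung.
            if self._server is not None:
                self._server.stop()
            super(TaskServer, self).stop()

        def wait(self):
            if self._server is not None:
                self._server.wait()
            super(TaskServer, self).wait()

The key design point is ordering: oslo.messaging's MessageHandlingServer enforces start -> stop -> wait per process, so any process that might stop the server must either have started it first or skip the call entirely.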