Comment 0 for bug 1705543

Adam Spiers (adam.spiers) wrote :

If I change the queue.asynchronous_workers config option from 1 to 2, then start barbican-worker via systemd and stop it again, it hangs on shutdown:

    2017-07-20 16:35:22.158 8435 INFO barbican.queue.server [-] Halting the TaskServer
    2017-07-20 16:35:22.159 8436 INFO barbican.queue.server [-] Halting the TaskServer
    2017-07-20 16:35:22.168 8256 INFO oslo_service.service [-] Caught SIGTERM, stopping children
    2017-07-20 16:35:22.169 8256 DEBUG oslo_concurrency.lockutils [-] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
    2017-07-20 16:35:22.169 8256 DEBUG oslo_concurrency.lockutils [-] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:225
    2017-07-20 16:35:22.170 8256 DEBUG oslo_service.service [-] Stop services. stop /usr/lib/python2.7/site-packages/oslo_service/service.py:611
    2017-07-20 16:35:22.170 8256 INFO barbican.queue.server [-] Halting the TaskServer
    2017-07-20 16:35:26.659 8436 DEBUG oslo_concurrency.lockutils [-] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
    2017-07-20 16:35:26.660 8436 DEBUG oslo_concurrency.lockutils [-] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:225
    2017-07-20 16:35:26.671 8435 DEBUG oslo_concurrency.lockutils [-] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
    2017-07-20 16:35:26.672 8435 DEBUG oslo_concurrency.lockutils [-] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:225
    2017-07-20 16:35:52.171 8256 WARNING oslo_messaging.server [-] Possible hang: stop is waiting for start to complete
    2017-07-20 16:35:52.173 8256 DEBUG oslo_messaging.server [-] File "/usr/bin/barbican-worker", line 10, in <module>
        sys.exit(main())
      File "/usr/lib/python2.7/site-packages/barbican/cmd/worker.py", line 68, in main
        workers=CONF.queue.asynchronous_workers
      File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 605, in wait
        self.stop()
      File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 614, in stop
        service.stop()
      File "/usr/lib/python2.7/site-packages/barbican/queue/server.py", line 290, in stop
        self._server.stop()
      File "/usr/lib/python2.7/site-packages/oslo_messaging/server.py", line 264, in wrapper
        log_after, timeout_timer)
      File "/usr/lib/python2.7/site-packages/oslo_messaging/server.py", line 163, in wait_for_completion
        msg, log_after, timeout_timer)
      File "/usr/lib/python2.7/site-packages/oslo_messaging/server.py", line 128, in _wait
        LOG.debug(''.join(traceback.format_stack()))
     _wait /usr/lib/python2.7/site-packages/oslo_messaging/server.py:128

I'm very far from being an oslo.messaging expert, but this *appears* to be the same issue that Sahara had, namely that the RPC server must be started before you can safely call wait() on it:

    https://bugs.launchpad.net/sahara/+bug/1546119

I've ported the fix over from Sahara and it seems to resolve the issue, so I'll submit it to Gerrit shortly.
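
To illustrate the ordering problem, here is a toy sketch. It is not the Sahara patch and it does not use oslo.messaging; ToyRPCServer and GuardedService are made-up names, and the blocking behaviour is only modelled on the "stop is waiting for start to complete" warning in the log above. The assumption is that a process which never called start() on the server ends up calling stop() on it during shutdown.

```python
import threading


class ToyRPCServer:
    """Stand-in (hypothetical) for a server whose stop() waits for start()."""

    def __init__(self):
        self._started = threading.Event()
        self.stopped = False

    def start(self):
        self._started.set()

    def stop(self, timeout=None):
        # Models "stop is waiting for start to complete": block until
        # start() has been called, or give up after the timeout.
        if not self._started.wait(timeout):
            raise RuntimeError("stop() timed out waiting for start()")
        self.stopped = True


class GuardedService:
    """Hypothetical wrapper that only stops a server it has started."""

    def __init__(self, server):
        self._server = server
        self._running = False

    def start(self):
        self._server.start()
        self._running = True

    def stop(self):
        if not self._running:
            # This process never started the server, so there is
            # nothing to stop and nothing to block on.
            return False
        self._server.stop()
        self._running = False
        return True


if __name__ == "__main__":
    service = GuardedService(ToyRPCServer())
    print(service.stop())   # start() was never called: returns immediately
    service.start()
    print(service.stop())   # started, so the server is stopped normally
```

Calling ToyRPCServer.stop() directly without a prior start() blocks until the timeout, which is the toy equivalent of the hang; the guard in GuardedService avoids that. Whether the real fix guards the call or reorders start(), I'd defer to the linked Sahara bug.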