stopping barbican-worker times out waiting for RPC task service to start

Bug #1705543 reported by Adam Spiers
This bug affects 1 person
Affects    Status      Importance   Assigned to   Milestone
Barbican   Won't Fix   Medium       Unassigned

Bug Description

If I change the queue.asynchronous_workers config option from 1 to 2, then start barbican-worker via systemd and stop it again, it hangs on shutdown:

    2017-07-20 16:35:22.158 8435 INFO barbican.queue.server [-] Halting the TaskServer
    2017-07-20 16:35:22.159 8436 INFO barbican.queue.server [-] Halting the TaskServer
    2017-07-20 16:35:22.168 8256 INFO oslo_service.service [-] Caught SIGTERM, stopping children
    2017-07-20 16:35:22.169 8256 DEBUG oslo_concurrency.lockutils [-] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
    2017-07-20 16:35:22.169 8256 DEBUG oslo_concurrency.lockutils [-] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:225
    2017-07-20 16:35:22.170 8256 DEBUG oslo_service.service [-] Stop services. stop /usr/lib/python2.7/site-packages/oslo_service/service.py:611
    2017-07-20 16:35:22.170 8256 INFO barbican.queue.server [-] Halting the TaskServer
    2017-07-20 16:35:26.659 8436 DEBUG oslo_concurrency.lockutils [-] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
    2017-07-20 16:35:26.660 8436 DEBUG oslo_concurrency.lockutils [-] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:225
    2017-07-20 16:35:26.671 8435 DEBUG oslo_concurrency.lockutils [-] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
    2017-07-20 16:35:26.672 8435 DEBUG oslo_concurrency.lockutils [-] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:225
    2017-07-20 16:35:52.171 8256 WARNING oslo_messaging.server [-] Possible hang: stop is waiting for start to complete
    2017-07-20 16:35:52.173 8256 DEBUG oslo_messaging.server [-] File "/usr/bin/barbican-worker", line 10, in <module>
        sys.exit(main())
      File "/usr/lib/python2.7/site-packages/barbican/cmd/worker.py", line 68, in main
        workers=CONF.queue.asynchronous_workers
      File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 605, in wait
        self.stop()
      File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 614, in stop
        service.stop()
      File "/usr/lib/python2.7/site-packages/barbican/queue/server.py", line 290, in stop
        self._server.stop()
      File "/usr/lib/python2.7/site-packages/oslo_messaging/server.py", line 264, in wrapper
        log_after, timeout_timer)
      File "/usr/lib/python2.7/site-packages/oslo_messaging/server.py", line 163, in wait_for_completion
        msg, log_after, timeout_timer)
      File "/usr/lib/python2.7/site-packages/oslo_messaging/server.py", line 128, in _wait
        LOG.debug(''.join(traceback.format_stack()))
     _wait /usr/lib/python2.7/site-packages/oslo_messaging/server.py:128

I'm very far from being an oslo.messaging expert, but this *appears* to be the same issue which Sahara had, namely that the RPC server needs to be started before you can safely call wait() on it:

    https://bugs.launchpad.net/sahara/+bug/1546119

I've ported the fix over from Sahara and it seems to fix the issue, so I'll submit it to Gerrit shortly.

One weird thing I couldn't explain is that the bug occurs with asynchronous_workers = 2 regardless of whether queue.enabled is True or False ...
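
For illustration, here is a minimal sketch of the suspected failure mode. It uses only the Python standard library and does not call oslo.messaging. `ToyRPCServer` and `GuardedService` are hypothetical names invented for this example, and the guard shown is just one possible workaround, not the patch proposed in the review or the Sahara fix. The sketch assumes that stop() is ordered after start(), which is what the "stop is waiting for start to complete" warning suggests. In the log above, that warning for PID 8256 appears about 30 seconds after its "Halting the TaskServer" line. One possible reading, stated here as an assumption, is that PID 8256 is a parent process that never started its own RPC server.

```python
import threading


class ToyRPCServer:
    """Toy stand-in for an RPC server whose stop() is ordered after start()."""

    def __init__(self):
        self._started = threading.Event()
        self.running = False

    def start(self):
        self.running = True
        self._started.set()

    def stop(self, timeout=None):
        # Block until start() has completed. With timeout=None this would
        # wait forever on a server that was never started (the "hang").
        if not self._started.wait(timeout):
            return False  # timed out waiting for start()
        self.running = False
        return True


class GuardedService:
    """Hypothetical wrapper: only stop the server if this process started it."""

    def __init__(self, server):
        self._server = server
        self._was_started = False

    def start(self):
        self._server.start()
        self._was_started = True

    def stop(self, timeout=None):
        if not self._was_started:
            return True  # nothing to stop, so return immediately
        return self._server.stop(timeout)


def demo():
    # Unguarded stop on a never-started server: times out.
    unguarded = ToyRPCServer().stop(timeout=0.1)
    # Guarded stop on a never-started server: returns at once.
    guarded = GuardedService(ToyRPCServer()).stop(timeout=0.1)
    # Normal lifecycle: start, then stop.
    service = GuardedService(ToyRPCServer())
    service.start()
    normal = service.stop(timeout=0.1)
    return unguarded, guarded, normal
```

Calling `demo()` exercises the three cases: an unguarded stop on a never-started server, a guarded stop on a never-started server, and a normal start followed by stop. The short timeout stands in for the indefinite wait so that the example terminates.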

OpenStack Infra (hudson-openstack) wrote : Fix proposed to barbican (master)

Fix proposed to branch: master
Review: https://review.openstack.org/485755

Changed in barbican:
status: New → In Progress
description: updated
Changed in barbican:
importance: Undecided → Medium
Adam Spiers (adam.spiers) wrote :

I'm here at the PTG - perhaps we could discuss this?

Adam Spiers (adam.spiers) wrote :

We're here at the Dublin PTG and hoping to discuss this shortly.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on barbican (master)

Change abandoned by Douglas Mendizábal (<email address hidden>) on branch: master
Review: https://review.openstack.org/485755
Reason: Abandoning patch due to lack of activity for months. Feel free to re-submit if needed.

Grzegorz Grasza (xek) wrote :

Closing out bugs created before migration to StoryBoard. Please re-open if you are of the opinion it is still current.

Changed in barbican:
status: In Progress → Won't Fix