Comment 5 for bug 1696905

Sanchit Malhotra (isanchitm) wrote:

Further investigation shows that oslo_messaging is stuck in the wait_condition() call inside get():

oslo_messaging/_drivers/pool.py
{code}
def get(self):
    """Return an item from the pool, when one is available.

    This may cause the calling thread to block.
    """
    with self._cond:
        while True:
            try:
                ttl_watch, item = self._items.pop()
                self.expire()
                return item
            except IndexError:
                pass

            if self._current_size < self._max_size:
                self._current_size += 1
                break

            wait_condition(self._cond)

    # We've grabbed a slot and dropped the lock, now do the creation
    try:
        return self.create()
    except Exception:
        with self._cond:
            self._current_size -= 1
        raise
{code}
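To illustrate how get() can block, here is a minimal sketch of a bounded pool that follows the same logic, using plain Python 3 threading. The class name MiniPool, the create callback and the put() method are hypothetical: put() is assumed to return an item and call notify(), and the TTL handling (ttl_watch, expire()) is left out. With max_size=1, a second get() stays in the wait loop until the first item is returned.

{code}
import collections
import threading


class MiniPool:
    """Hypothetical bounded pool that mirrors the get() logic quoted above."""

    def __init__(self, max_size, create):
        self._max_size = max_size
        self._current_size = 0          # items created so far (in use or idle)
        self._items = collections.deque()
        self._cond = threading.Condition(threading.RLock())
        self._create = create

    def get(self):
        with self._cond:
            while True:
                try:
                    return self._items.pop()   # reuse an idle item if any
                except IndexError:
                    pass
                if self._current_size < self._max_size:
                    self._current_size += 1    # reserve a slot, create below
                    break
                # Pool exhausted: sleep until put() notifies, re-checking every second.
                self._cond.wait(timeout=1)
        # Slot reserved and lock released; create the new item outside the lock.
        try:
            return self._create()
        except Exception:
            with self._cond:
                self._current_size -= 1
            raise

    def put(self, item):
        # Assumed counterpart of get(): return an item and wake one waiter.
        with self._cond:
            self._items.appendleft(item)
            self._cond.notify()


def demo_pool():
    pool = MiniPool(max_size=1, create=object)
    first = pool.get()                 # creates the only allowed item
    got = []
    t = threading.Thread(target=lambda: got.append(pool.get()))
    t.start()
    t.join(timeout=0.2)
    still_blocked = t.is_alive()       # second get() is looping in wait()
    pool.put(first)                    # returning the item wakes the waiter
    t.join(timeout=5)
    return still_blocked, bool(got) and got[0] is first
{code}

If nothing ever calls put() (for example because the thread holding an item never returns it), the second get() keeps looping in the wait forever.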

In the problematic case, when requests and the service hang, the condition object is in this state:
self._cond = <Condition(<_RLock owner='MainThread' count=1>, 0)>
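As an aid to reading that state, a small sketch assuming CPython's threading.Condition without eventlet monkey-patching: its repr is "<Condition(<lock>, N)>", where N is the number of threads currently blocked in wait(), and a thread blocked in wait() has released the lock. The helper waiter_count() is hypothetical and parses the repr instead of touching private attributes.

{code}
import threading


def waiter_count(cond):
    # CPython formats Condition as "<Condition(<lock repr>, N)>", N = threads in wait().
    text = repr(cond)
    return int(text.rsplit(",", 1)[1].rstrip(")>").strip())


def observe_waiters():
    cond = threading.Condition(threading.RLock())

    with cond:
        idle = waiter_count(cond)      # lock held by us, nobody waiting

    ready = threading.Event()

    def waiter():
        with cond:
            ready.set()
            cond.wait(timeout=5)       # registers as a waiter, then releases the lock

    t = threading.Thread(target=waiter)
    t.start()
    ready.wait()
    with cond:                         # acquirable only once the waiter is inside wait()
        blocked = waiter_count(cond)
        cond.notify()
    t.join()
    return idle, blocked
{code}

Under that reading, a repr ending in ", 0)>" while the lock shows count=1 means the lock was held and no thread was registered in wait() at the moment it was captured.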

On our system, cond.wait() is called with a timeout of 1 second: we run a Python version older than 3.2, so the first branch of the following code is taken.

{code}
import sys

# TODO(harlowja): remove this when we no longer have to support 2.7
if sys.version_info[0:2] < (3, 2):
    def wait_condition(cond):
        # FIXME(markmc): timeout needed to allow keyboard interrupt
        # http://bugs.python.org/issue8844
        cond.wait(timeout=1)
else:
    def wait_condition(cond):
        cond.wait()
{code}
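The effect of the timeout=1 branch can be sketched as follows, assuming Python 3 semantics where Condition.wait() returns False on timeout. The consume() and producer names and the extra timeout parameter on wait_condition() are hypothetical, and the timeout is shortened to 0.05 s so the example runs quickly. Each timed-out wait() simply returns and the surrounding while loop re-checks its condition, so a thread can keep cycling through wait_condition() indefinitely if nothing ever becomes available.

{code}
import threading
import time


def wait_condition(cond, timeout=1.0):
    # Like the pre-3.2 branch: never block longer than `timeout` seconds per call.
    # Python 3 returns False on timeout and True when notified.
    return cond.wait(timeout=timeout)


def consume(items, cond, timeout):
    timeouts = 0
    with cond:
        while not items:
            if not wait_condition(cond, timeout):
                timeouts += 1          # woke without a notify; loop re-checks
        return items.pop(), timeouts


def demo_polling():
    cond = threading.Condition()
    items = []

    def producer():
        time.sleep(0.3)
        with cond:
            items.append("conn")
            cond.notify()

    t = threading.Thread(target=producer)
    t.start()
    item, timeouts = consume(items, cond, timeout=0.05)
    t.join()
    return item, timeouts
{code}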

There seems to be a problem with the RLock. What do you think?