Comment 4 for bug 1896734

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote : Re: A privsep daemon hangs when debug logging is enabled

The issue got reproduced again.

The problem with accessing c.handlers was something else - in the (r)pdb debugger "c" is interpreted as "continue" so c.handlers was interpreted as "continue" - !c.handlers has to be used instead.

So the issue happens down the line when the PrivsepLogHandler tries to send data over a unix socket back to the client (/tmp/tmpn6wjd_lk/privsep.sock). The socket file does not exist in the file system which is normal considering it gets opened for a short period of time so that processes (client & privsep) can get a handle to it and then gets deleted (while being kept alive by the kernel while it is open).

https://github.com/openstack/oslo.privsep/blob/stable/ussuri/oslo_privsep/daemon.py#L190

I can see that logging locks get acquired just fine so this isn't a contention between threads (the ThreadPoolExecutor was also modified in-place to only have 1 thread to prove this).

...
862 if rv:
863 -> self.acquire()
864 try:
865 self.emit(record)
866 finally:
867 self.release()
868 return rv
(Pdb) dir(self.lock)
['__class__', '__delattr__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_acquire_restore', '_is_owned', '_release_save', 'acquire', 'release']
(Pdb) self.lock
<unlocked _thread.RLock object owner=0 count=0 at 0x7f9f29aaa120>

> /usr/lib/python3/dist-packages/oslo_privsep/daemon.py(170)emit()
-> self.channel.send((None, (Message.LOG, data)))
(Pdb) l
165
166 # serialise msg now so we can drop (potentially unserialisable) args
167 data['msg'] = record.getMessage()
168 data['args'] = ()
169
170 -> self.channel.send((None, (Message.LOG, data)))
171
172
173 class _ClientChannel(comm.ClientChannel):
174 """Our protocol, layered on the basic primitives in comm.ClientChannel"""
175
(Pdb) self.channel
<oslo_privsep.comm.ServerChannel object at 0x7f9f29a98cf8>

(Pdb) self.channel.wlock
<unlocked _thread.lock object at 0x7f9f29af0800>
(Pdb) self.channel.rlock
<unlocked _thread.lock object at 0x7f9f29af0fd0>

And here is the place where the execution halts:

(Pdb) self.writesock
<socket.socket fd=3, family=AddressFamily.AF_UNIX, type=SocketKind.SOCK_STREAM, proto=0, raddr=/tmp/tmpn6wjd_lk/privsep.sock>

(Pdb) l
 51 self.writesock = writesock
 52
 53 def send(self, msg):
 54 buf = msgpack.packb(msg, use_bin_type=True,
 55 unicode_errors='surrogateescape')
 56 -> self.writesock.sendall(buf)
 57
 58 def close(self):
 59 # Hilarious. `socket._socketobject.close()` doesn't actually
 60 # call `self._sock.close()`. Oh well, we really wanted a half
 61 # close anyway.

(Pdb) s

<hangs here>