Comment 68 for bug 1927868

Edward Hope-Morley (hopem) wrote :

@christian-rohmann The problem essentially boils down to the exception at [1] being raised because, prior to that, [2] gets called as a result of a timeout exception that the code does not actually catch. This was traced to a privileged call being passed as an argument to [3] from [4] (which is in the patch we reverted).

So the *real* problem with the privsep code is that if an unexpected exception is raised, it does not get caught, which kills the reader thread and/or leaves the lock unreleased. There is a separate bug [5] raised about the same issue, which led to the fix [6] being added to privsep. Crucially, that fix replaces the raised AttributeError with a continue, which stops it from killing the reader thread. I have not yet tested whether this actually fixes all the agent issues we have seen, and while we should do that, there is still room for improvement in the privsep code, namely [7], which should have an except clause that, if nothing else, logs a message to say that the message timed out.
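
To make the pattern concrete, here is a minimal standalone sketch of what I mean. It is not the oslo.privsep code: SketchChannel and its deliver/send_recv methods are made-up stand-ins, and a queue per message is just an assumption to keep the example short. It shows the two behaviours discussed above: a late reply for an already timed-out message is dropped rather than raising in the reader, and the waiting side logs the timeout and always cleans up.

```python
import logging
import queue
import threading

LOG = logging.getLogger(__name__)


class SketchChannel:
    """Toy request/reply channel for illustration; not the privsep class."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}  # msgid -> queue.Queue that will hold the reply

    def deliver(self, msgid, reply):
        """What a reader thread would do for each incoming reply."""
        with self._lock:
            waiter = self._pending.get(msgid)
        if waiter is None:
            # The caller already timed out and gave up. Drop the reply
            # instead of raising, so the reader loop keeps running.
            LOG.warning("dropping reply for unknown msgid %s", msgid)
            return False
        waiter.put(reply)
        return True

    def send_recv(self, msgid, timeout=None):
        """Register interest in msgid and wait for its reply."""
        waiter = queue.Queue(maxsize=1)
        with self._lock:
            self._pending[msgid] = waiter
        try:
            return waiter.get(timeout=timeout)
        except queue.Empty:
            # If nothing else, say that the message timed out.
            LOG.error("message %s timed out after %s seconds", msgid, timeout)
            raise TimeoutError("no reply for message %s" % msgid)
        finally:
            # Always forget the message, whether it succeeded or not.
            with self._lock:
                self._pending.pop(msgid, None)
```

With something like this, a call such as send_recv("a", timeout=0.01) with no reply raises TimeoutError after logging, and a later deliver("a", ...) simply returns False instead of blowing up the reader.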

[1] https://github.com/openstack/oslo.privsep/blob/6d41ef9f91b297091aa37721ba10456142fc5107/oslo_privsep/comm.py#L141
[2] https://github.com/openstack/oslo.privsep/blob/6d41ef9f91b297091aa37721ba10456142fc5107/oslo_privsep/comm.py#L174
[3] https://github.com/openstack/neutron/blob/d4b1b4a0729c187551e1fa2b2855db136456d496/neutron/common/utils.py#L689
[4] https://github.com/openstack/neutron/blob/d8f1f1118d3cde0b5264220836a250f14687893e/neutron/agent/linux/interface.py#L328
[5] https://bugs.launchpad.net/neutron/+bug/1930401
[6] https://github.com/openstack/oslo.privsep/commit/f7f3349d6a4def52f810ab1728879521c12fe2d0
[7] https://github.com/openstack/oslo.privsep/blob/f7f3349d6a4def52f810ab1728879521c12fe2d0/oslo_privsep/comm.py#L189