Comment 19 for bug 1327946

Joshua Harlow (harlowja) wrote : [Bug 1327946] Re: PosixLock not resistant to program termination

+2 from me,

This bug is going to keep causing problems across multiple releases
(especially since lockutils.py is copied and pasted across many projects),
and it has likely already caused serious problems for operators without us
even knowing it...

When a fundamental lock mechanism is broken (thankfully it just stalls), it
raises serious questions IMHO: how are operators (or other users) coping
with this bug? Restarting nodes to clear old locks? Something else?

Looking forward to seeing a reliable fix...

On Thursday, August 7, 2014, Ben Nemec <email address hidden> wrote:

> Okay, this has lingered too long for as serious a bug as it is. A
> couple of thoughts:
>
> As far as OS X compatibility, it's not a huge concern of mine. If
> something doesn't work there we can just fall back to file locks like we
> do on Windows. If someone wants to provide a file-free lock for either
> of those platforms I'm sure we'd be happy to take it.
>
> That said, if sysv provides a proper way to do this I'm okay with using
> that. If I'm reading the man page correctly, the bug is only relevant
> on kernels <= 2.6.10? None of our supported platforms are using a
> kernel that old, so it isn't a concern for us.
>
> I'll take a closer look at Yuriy's change to switch to sysv_ipc.
>
> https://bugs.launchpad.net/bugs/1327946
>
> Title:
> PosixLock not resistant to program termination
>
> Status in Oslo - a Library of Common OpenStack Code:
> In Progress
>
> Bug description:
> If a program is terminated via ctrl-c or another signal while it holds
> locks acquired through the posix_ipc semaphore, those semaphores are
> never released (unlike the filelock, which is released automatically
> on program termination). This is easy to see by trying the following
> in a shell (see below). You can come back minutes or hours later, try
> to acquire the abandoned semaphore, and still be unable to (it's
> still apparently busy). Perhaps an automatic timeout needs to be
> provided? I can imagine that if people upgrade a service that uses
> this lock (for example with `service nova-api stop`) while the program
> currently holds the lock, they would need some kind of semaphore
> cleaning script to even restart that application (which seems bad).
>
> Python 2.7.6 (default, Mar 22 2014, 22:59:56)
> [GCC 4.8.2] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from openstack.common import lockutils
> >>> import os
> >>> os.getpid()
> 16123
> >>> lockutils.InterProcessLock("testing")
> <openstack.common.lockutils._PosixLock object at 0x7f8bb4a8e210>
> >>> a = lockutils.InterProcessLock("testing")
> >>> a.acquire()
> <openstack.common.lockutils._PosixLock object at 0x7f8bb30e7ad0>
> >>> exit()
> (.dev)josh@lappy:~/Dev/oslo-incubator$ python
> Python 2.7.6 (default, Mar 22 2014, 22:59:56)
> [GCC 4.8.2] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from openstack.common import lockutils
> >>> import os
> >>> os.getpid()
> 16128
> >>> a = lockutils.InterProcessLock("testing")
> >>> a.acquire(timeout=5)
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "openstack/common/lockutils.py", line 168, in acquire
> self.semaphore.acquire(timeout)
> posix_ipc.BusyError: Semaphore is busy
> ....
>
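
For contrast with the stuck semaphore in the transcript above, here is a rough sketch of the file-lock behaviour the bug description mentions: a lock taken with fcntl.lockf goes away when the holding process dies, even on SIGKILL. This is not the lockutils code, just an illustration under some assumptions: Python 3 on a POSIX system, and the names CHILD, try_lock and demo are made up for this example.

```python
import fcntl
import os
import signal
import subprocess
import sys
import tempfile

# Hypothetical child program: take an exclusive lock on the file given as
# argv[1], announce it on stdout, then sleep without ever releasing it.
CHILD = """
import fcntl, sys, time
f = open(sys.argv[1], "a")
fcntl.lockf(f, fcntl.LOCK_EX)
print("locked", flush=True)
time.sleep(60)
"""


def try_lock(path):
    """Return True if a non-blocking exclusive lock on path can be taken."""
    with open(path, "a") as f:
        try:
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            # Another process holds the lock.
            return False
        fcntl.lockf(f, fcntl.LOCK_UN)
        return True


def demo():
    """Return (acquired_while_child_alive, acquired_after_child_killed)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD, path], stdout=subprocess.PIPE
    )
    try:
        # Block until the child reports that it holds the lock.
        child.stdout.readline()
        while_alive = try_lock(path)
        # Kill the child without giving it any chance to clean up.
        child.send_signal(signal.SIGKILL)
        child.wait()
        after_kill = try_lock(path)
    finally:
        if child.poll() is None:
            child.kill()
            child.wait()
        child.stdout.close()
        os.unlink(path)
    return while_alive, after_kill


if __name__ == "__main__":
    print(demo())
```

The kernel drops the fcntl lock as soon as the owning process exits, so no cleanup script is needed for this kind of lock. A named POSIX semaphore has no owner in that sense, which is why the second interpreter session above still sees it as busy.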

--
facebook.com/jshharlow <http://www.facebook.com/jshharlow>
flickr.com/jshharlow
YIM: jshharlow