Failure to grab the secret.lock

Bug #1496401 reported by Blake Rouse
Affects: MAAS
Status: Fix Released
Importance: High
Assigned to: Blake Rouse

Bug Description

This appeared in clusterd.log. It doesn't look like it affected operation; I believe it tried again and worked. We need to either handle this better or not show a stack trace if the retry ends up working.

2015-09-16 21:34:47+0800 [ClusterClient,client] Unhandled failure dispatching AMP command. This is probably a bug. Please ensure that this error is handled within application code or declared in the signature of the Authenticate command. [trusty-maas9:pid=17233:cmd=Authenticate:ask=2]
 Traceback (most recent call last):
   File "/usr/lib/python2.7/dist-packages/provisioningserver/rpc/common.py", line 217, in dispatchCommand
     d = super(RPCProtocol, self).dispatchCommand(box)
   File "/usr/lib/python2.7/dist-packages/twisted/protocols/amp.py", line 946, in dispatchCommand
     return maybeDeferred(responder, box)
   File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 139, in maybeDeferred
     result = f(*args, **kw)
   File "/usr/lib/python2.7/dist-packages/twisted/protocols/amp.py", line 1035, in doit
     return maybeDeferred(aCallable, **kw).addCallback(
 --- <exception caught here> ---
   File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 139, in maybeDeferred
     result = f(*args, **kw)
   File "/usr/lib/python2.7/dist-packages/provisioningserver/rpc/clusterservice.py", line 137, in authenticate
     secret = get_shared_secret_from_filesystem()
   File "/usr/lib/python2.7/dist-packages/provisioningserver/security.py", line 73, in get_shared_secret_from_filesystem
     with FileLock(secret_path):
   File "/usr/lib/python2.7/dist-packages/provisioningserver/utils/fs.py", line 418, in __enter__
     raise self.NotAvailable(self.fslock.name)
 provisioningserver.utils.fs.NotAvailable: /var/lib/maas/secret.lock

Gavin Panella (allenap) wrote :

The problematic code:

    def get_shared_secret_from_filesystem():
        ...
        secret_path = get_shared_secret_filesystem_path()
        ensure_dir(dirname(secret_path))
    --> with FileLock(secret_path):
            ...

This could be changed to:

        with FileLock(secret_path).wait(10):

so that it'll try for up to 10 seconds. At present it will fail
immediately if the lock is already taken.
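
For illustration, here is a minimal, self-contained sketch of the difference between failing immediately and waiting with a timeout. It is not the MAAS implementation: `SimpleFileLock`, its `interval` parameter and the `O_EXCL`-based locking are assumptions made for this example, and it targets Python 3 rather than the Python 2.7 shown in the traceback.

```python
import os
import time
from contextlib import contextmanager


class NotAvailable(Exception):
    """Raised when the lock could not be acquired."""


class SimpleFileLock:
    """Hypothetical stand-in for FileLock; not the real MAAS class."""

    def __init__(self, path):
        self.lock_path = path + ".lock"

    def _try_acquire(self):
        try:
            # O_EXCL makes creation atomic: it fails if the file exists.
            fd = os.open(self.lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        os.close(fd)
        return True

    def release(self):
        os.remove(self.lock_path)

    def __enter__(self):
        # Fail immediately if the lock is taken, as in the traceback above.
        if not self._try_acquire():
            raise NotAvailable(self.lock_path)
        return self

    def __exit__(self, *exc_info):
        self.release()

    @contextmanager
    def wait(self, timeout, interval=0.1):
        """Keep retrying until acquired or `timeout` seconds have elapsed."""
        deadline = time.monotonic() + timeout
        while not self._try_acquire():
            if time.monotonic() >= deadline:
                raise NotAvailable(self.lock_path)
            time.sleep(interval)
        try:
            yield self
        finally:
            self.release()
```

With this sketch, `with SimpleFileLock(path):` raises `NotAvailable` as soon as another holder exists, whereas `with SimpleFileLock(path).wait(10):` only raises if the lock is still held after 10 seconds. The sketch polls with `time.sleep`, which blocks the calling thread while it waits.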

Changed in maas:
status: Triaged → In Progress
assignee: nobody → Blake Rouse (blake-rouse)
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released