Failure to grab the secret.lock

Bug #1496401 reported by Blake Rouse on 2015-09-16
This bug affects 1 person
Affects: MAAS
Importance: High
Assigned to: Blake Rouse

Bug Description

This appeared in clusterd.log. It doesn't look like it affected operation; I believe it tried again and worked. We need to either handle this better or not show a stack trace if the retry ends up working.

2015-09-16 21:34:47+0800 [ClusterClient,client] Unhandled failure dispatching AMP command. This is probably a bug. Please ensure that this error is handled within application code or declared in the signature of the Authenticate command. [trusty-maas9:pid=17233:cmd=Authenticate:ask=2]
 Traceback (most recent call last):
   File "/usr/lib/python2.7/dist-packages/provisioningserver/rpc/common.py", line 217, in dispatchCommand
     d = super(RPCProtocol, self).dispatchCommand(box)
   File "/usr/lib/python2.7/dist-packages/twisted/protocols/amp.py", line 946, in dispatchCommand
     return maybeDeferred(responder, box)
   File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 139, in maybeDeferred
     result = f(*args, **kw)
   File "/usr/lib/python2.7/dist-packages/twisted/protocols/amp.py", line 1035, in doit
     return maybeDeferred(aCallable, **kw).addCallback(
 --- <exception caught here> ---
   File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 139, in maybeDeferred
     result = f(*args, **kw)
   File "/usr/lib/python2.7/dist-packages/provisioningserver/rpc/clusterservice.py", line 137, in authenticate
     secret = get_shared_secret_from_filesystem()
   File "/usr/lib/python2.7/dist-packages/provisioningserver/security.py", line 73, in get_shared_secret_from_filesystem
     with FileLock(secret_path):
   File "/usr/lib/python2.7/dist-packages/provisioningserver/utils/fs.py", line 418, in __enter__
     raise self.NotAvailable(self.fslock.name)
 provisioningserver.utils.fs.NotAvailable: /var/lib/maas/secret.lock

Gavin Panella (allenap) wrote :

The problematic code:

    def get_shared_secret_from_filesystem():
        ...
        secret_path = get_shared_secret_filesystem_path()
        ensure_dir(dirname(secret_path))
        with FileLock(secret_path):  # <-- raises NotAvailable here
            ...

This could be changed to:

        with FileLock(secret_path).wait(10):

so that it'll try for up to 10 seconds. At present it will fail
immediately if the lock is already taken.
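
To illustrate the difference between failing immediately and waiting, here is a minimal, self-contained sketch of a lock that retries until a deadline. It is not the MAAS FileLock implementation: file_lock, LockNotAvailable and the timeout/interval parameters are hypothetical names chosen for the example, the ".lock" suffix only mirrors the secret.lock name seen in the traceback, and the code targets Python 3 rather than the Python 2.7 shown above.

```python
import os
import time
from contextlib import contextmanager


class LockNotAvailable(Exception):
    """Raised when the lock is still held at the deadline (hypothetical name)."""


@contextmanager
def file_lock(path, timeout=0.0, interval=0.1):
    """Hold an exclusive lock file next to `path` for the duration of the block.

    With timeout=0 this fails on the first attempt if the lock is taken,
    which is the behaviour the bug describes. With timeout=10 it keeps
    retrying for up to roughly 10 seconds before giving up.
    """
    lock_path = path + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL makes creation atomic: it fails if the file exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise LockNotAvailable(lock_path)
            time.sleep(interval)
    try:
        yield lock_path
    finally:
        # Release the lock so that waiting callers can proceed.
        os.close(fd)
        os.unlink(lock_path)
```

With a helper like this, a caller that can tolerate a short delay would write `with file_lock(secret_path, timeout=10):`, and LockNotAvailable would only surface if the lock stayed held for the whole window.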

Changed in maas:
status: Triaged → In Progress
assignee: nobody → Blake Rouse (blake-rouse)
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released