Failure to grab the secret.lock
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Fix Released | High | Blake Rouse |
Bug Description
This appeared in clusterd.log. It doesn't look like it affected operation; I believe it tried again and worked. We need to either handle this better or just not show a stack trace if it does end up working.
2015-09-16 21:34:47+0800 [ClusterClient,
Traceback (most recent call last):
File "/usr/lib/
d = super(RPCProtocol, self).dispatchC
File "/usr/lib/
return maybeDeferred(
File "/usr/lib/
result = f(*args, **kw)
File "/usr/lib/
return maybeDeferred(
--- <exception caught here> ---
File "/usr/lib/
result = f(*args, **kw)
File "/usr/lib/
secret = get_shared_
File "/usr/lib/
with FileLock(
File "/usr/lib/
raise self.NotAvailab
provisioningse
Related branches
- Andres Rodriguez (community): Approve
- Gavin Panella (community): Approve
- Diff: 21 lines (+2/-2), 1 file modified: src/provisioningserver/security.py (+2/-2)
Changed in maas:
status: Triaged → In Progress
assignee: nobody → Blake Rouse (blake-rouse)
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released
The problematic code:
def get_shared_secret_from_filesystem():
    ...
    secret_path = get_shared_secret_filesystem_path()
    ensure_dir(dirname(secret_path))
--> with FileLock(secret_path):
    ...
This could be changed to:
    with FileLock(secret_path).wait(10):
so that it'll try for up to 10 seconds. At present it will fail
immediately if the lock is already taken.
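
To illustrate the difference between failing immediately and waiting with a timeout, here is a minimal, self-contained sketch. It does not use MAAS's FileLock; SketchFileLock, LockNotAvailable, and the ".lock" suffix are hypothetical stand-ins chosen for illustration, and the polling interval is an assumption.

```python
import os
import time
from contextlib import contextmanager


class LockNotAvailable(Exception):
    """Raised when the lock cannot be acquired (hypothetical name)."""


class SketchFileLock:
    """Hypothetical stand-in for a file lock; NOT MAAS's FileLock."""

    def __init__(self, path):
        # Assumption: the lock is a sibling file with a ".lock" suffix.
        self.lock_path = path + ".lock"

    def _try_acquire(self):
        # O_CREAT | O_EXCL fails if the lock file already exists.
        try:
            fd = os.open(self.lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        os.close(fd)
        return True

    def release(self):
        os.remove(self.lock_path)

    def __enter__(self):
        # Plain "with lock:" makes a single attempt and fails immediately.
        if not self._try_acquire():
            raise LockNotAvailable(self.lock_path)
        return self

    def __exit__(self, *exc_info):
        self.release()

    @contextmanager
    def wait(self, timeout, interval=0.1):
        # "with lock.wait(n):" keeps retrying for up to n seconds.
        deadline = time.monotonic() + timeout
        while not self._try_acquire():
            if time.monotonic() >= deadline:
                raise LockNotAvailable(self.lock_path)
            time.sleep(interval)
        try:
            yield self
        finally:
            self.release()
```

With this sketch, `with SketchFileLock(path):` raises as soon as another holder has the lock, while `with SketchFileLock(path).wait(10):` only raises if the lock is still held after 10 seconds, so a brief contention no longer produces a traceback.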