test_prepare_holds_startup_lock() fails spuriously

Bug #1308069 reported by Graham Binns
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MAAS
High
Gavin Panella
1.5
Undecided
Gavin Panella
Trunk
High
Gavin Panella
maas (Ubuntu)
Undecided
Unassigned
Trusty
Undecided
Unassigned

Bug Description

[Test Case]
No test case; the code that's being patched is only a test and does not actually appear in the package.

[Description of the problem]

This happened when trying to land a documentation-only branch:

======================================================================
FAIL: maasserver.rpc.tests.test_regionservice.TestRegionAdvertisingService.test_prepare_holds_startup_lock
----------------------------------------------------------------------
_StringException: Traceback (most recent call last):
  File "/tmp/tarmac/branch.wNIVV5/src/maasserver/rpc/tests/test_regionservice.py", line 667, in test_prepare_holds_startup_lock
    self.assertFalse(locks.eventloop.is_locked())
  File "/usr/lib/python2.7/unittest/case.py", line 418, in assertFalse
    raise self.failureException(msg)
AssertionError: True is not false

I've disabled the test and added an XXX.

Related branches

Gavin Panella (allenap)
Changed in maas:
assignee: nobody → Gavin Panella (allenap)
status: Triaged → In Progress
Revision history for this message
Gavin Panella (allenap) wrote :

I have run increasingly large subsets of the maasserver test suite several hundred times and I can't reproduce this. I suspect there was an external factor, like a left-over process, interfering with the test run. I'm going to reenable this test and and hope it either never reappears, or someone is able to reproduce it reliably.

Revision history for this message
Graham Binns (gmb) wrote : Re: [Bug 1308069] Re: test_prepare_holds_startup_lock() fails spuriously

Please make sure it's re-enabled in 1.5 as well; Blake said on my
trunk MP that he'd seen it there and was backporting my @skip.

Revision history for this message
Gavin Panella (allenap) wrote :

Okay, will do. If you see it happen again, freeze and record what's happening. `lsof +D .` is useful to figure out what else might be interfering with the test suite. Advisory locks are taken out per cluster, not per database, so a rogue process could be the culprit.

description: updated
Chris J Arges (arges)
Changed in maas (Ubuntu):
status: New → Fix Released
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

It happened again. I suspect that it may be a matter of ordering of decorators: RegionAdvertisingService.prepare is decorated as @synchronous, and *then* as taking two locks.

Given decorators' "wrapping" behaviour, which reverses the order of entrance, I understand that to mean: "grab these two locks, then run this function synchronously." Which would mean that if the lock weren't available at the time of the call, or if either attempt to grab a lock yielded to the reactor for whatever reason, the function could still run asynchronously from the caller's perspective.

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

No, my hypothesis doesn't look correct. We don't see anything that would make prepare() jump into the reactor thread.

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

New hypothesis: the code in DatabaseLock opens, and closes, a cursor for each locking/unlocking command. Do we actually know that these cursors will be in the same database session? If the command failed, do we know that we would see an error?

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers