test_prepare_holds_startup_lock() fails spuriously

Bug #1308069 reported by Graham Binns on 2014-04-15
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MAAS
High
Gavin Panella
1.5
Undecided
Gavin Panella
Trunk
High
Gavin Panella
maas (Ubuntu)
Undecided
Unassigned
Trusty
Undecided
Unassigned

Bug Description

[Test Case]
No test case; the code that's being patched is only a test and does not actually appear in the package.

[Description of the problem]

This happened when trying to land a documentation-only branch:

======================================================================
FAIL: maasserver.rpc.tests.test_regionservice.TestRegionAdvertisingService.test_prepare_holds_startup_lock
----------------------------------------------------------------------
_StringException: Traceback (most recent call last):
  File "/tmp/tarmac/branch.wNIVV5/src/maasserver/rpc/tests/test_regionservice.py", line 667, in test_prepare_holds_startup_lock
    self.assertFalse(locks.eventloop.is_locked())
  File "/usr/lib/python2.7/unittest/case.py", line 418, in assertFalse
    raise self.failureException(msg)
AssertionError: True is not false

I've disabled the test and added an XXX.

Related branches

Gavin Panella (allenap) on 2014-04-15
Changed in maas:
assignee: nobody → Gavin Panella (allenap)
status: Triaged → In Progress
Gavin Panella (allenap) wrote :

I have run increasingly large subsets of the maasserver test suite several hundred times and I can't reproduce this. I suspect there was an external factor, like a left-over process, interfering with the test run. I'm going to reenable this test and and hope it either never reappears, or someone is able to reproduce it reliably.

Please make sure it's re-enabled in 1.5 as well; Blake said on my
trunk MP that he'd seen it there and was backporting my @skip.

Gavin Panella (allenap) wrote :

Okay, will do. If you see it happen again, freeze and record what's happening. `lsof +D .` is useful to figure out what else might be interfering with the test suite. Advisory locks are taken out per cluster, not per database, so a rogue process could be the culprit.

description: updated
Chris J Arges (arges) on 2014-05-09
Changed in maas (Ubuntu):
status: New → Fix Released
Jeroen T. Vermeulen (jtv) wrote :

It happened again. I suspect that it may be a matter of ordering of decorators: RegionAdvertisingService.prepare is decorated as @synchronous, and *then* as taking two locks.

Given decorators' "wrapping" behaviour, which reverses the order of entrance, I understand that to mean: "grab these two locks, then run this function synchronously." Which would mean that if the lock weren't available at the time of the call, or if either attempt to grab a lock yielded to the reactor for whatever reason, the function could still run asynchronously from the caller's perspective.

Jeroen T. Vermeulen (jtv) wrote :

No, my hypothesis doesn't look correct. We don't see anything that would make prepare() jump into the reactor thread.

Jeroen T. Vermeulen (jtv) wrote :

New hypothesis: the code in DatabaseLock opens, and closes, a cursor for each locking/unlocking command. Do we actually know that these cursors will be in the same database session? If the command failed, do we know that we would see an error?

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers