intermittent unit failure: ExpireSuite.TestExtend_ExpiryInFuture_TimePasses

Bug #1748381 reported by Anastasia
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Canonical Juju
Fix Released
Medium
Unassigned
2.3
Won't Fix
Medium
Unassigned
Changed in juju:
assignee: nobody → Joseph Phillips (manadart)
Revision history for this message
Joseph Phillips (manadart) wrote :

Looks like a copy/paste boo-boo.

I think this fix was the intent:
https://github.com/juju/juju/pull/8360

Changed in juju:
status: Triaged → In Progress
Revision history for this message
Joseph Phillips (manadart) wrote :

My prior comment wasn't quite right.

The expiry tests are advancing the clock with a same duration as the new lease expiry. I have changed the tests so that these advance the clock sightly *longer* than the lease duration.

Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
milestone: none → 2.4-beta1
Revision history for this message
John A Meinel (jameinel) wrote :
Revision history for this message
John A Meinel (jameinel) wrote :

----------------------------------------------------------------------
FAIL: manager_expire_test.go:248: ExpireSuite.TestExtend_ExpiryInFuture_TimePasses

manager_expire_test.go:276:
    fix.RunTest(c, func(manager *lease.Manager, clock *testing.Clock) {
        // Ask for a minute, actually get 63s. Expire on time.
        err := manager.Claim("redis", "redis/0", time.Minute)
        c.Assert(err, jc.ErrorIsNil)
        clock.Advance(justAfterSeconds(newLeaseSecs))
    })
util_test.go:81:
    c.Fatalf("Client test took way too long")
... Error: Client test took way too long

OOPS: 47 passed, 1 FAILED

Changed in juju:
status: Fix Committed → Triaged
Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 1748381] Re: intermittent unit failure: ExpireSuite.TestExtend_ExpiryInFuture_TimePasses

Also happened under CI conditions:
http://10.125.0.203:8080/job/RunUnittests-race-amd64/218/testReport/junit/github/com_juju_juju_worker_lease/TestPackage/

On Sun, Mar 25, 2018 at 10:52 PM, John A Meinel <email address hidden>
wrote:

> Happened again
>
> http://ci.jujucharms.com/job/github-merge-
> juju/255/testReport/junit/github/com_juju_juju_worker_lease/TestPackage/
>
> --
> You received this bug notification because you are subscribed to juju.
> Matching subscriptions: juju bugs
> https://bugs.launchpad.net/bugs/1748381
>
> Title:
> intermittent unit failure:
> ExpireSuite.TestExtend_ExpiryInFuture_TimePasses
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/juju/+bug/1748381/+subscriptions
>

Changed in juju:
status: Triaged → In Progress
Revision history for this message
Joseph Phillips (manadart) wrote :

Based on work-to-date, this *looks* like clock behaviour.

In heisenbug style, beyond a certain level of logging detail, I can't get this to fail.

These are the last test run logs I could capture contrasting passing with failing:

----------

[LOG] 0:00.000 CRITICAL juju.worker.lease Next in 1s
[LOG] 0:00.000 CRITICAL juju.testing *** ADD ALARM
[LOG] 0:00.000 CRITICAL juju.testing *** WERE WAITING 1
[LOG] 0:00.000 CRITICAL juju.testing *** NOW WAITING 1
[LOG] 0:00.000 CRITICAL juju.worker.lease <-manager.claims
[LOG] 0:00.000 CRITICAL juju.worker.lease [] redis/0 extending lease redis for 1m0s
*** CALL ExtendLease
[LOG] 0:00.000 CRITICAL juju.testing *** ADVANCE
[LOG] 0:00.001 CRITICAL juju.worker.lease Next in 1m3s
[LOG] 0:00.001 CRITICAL juju.testing *** WERE WAITING 1
[LOG] 0:00.001 CRITICAL juju.testing *** NOW WAITING 0
*** Advanced 1m3.000000001s

[LOG] 0:00.001 CRITICAL juju.testing *** ADD ALARM
[LOG] 0:00.001 CRITICAL juju.testing *** WERE WAITING 1
[LOG] 0:00.001 CRITICAL juju.testing *** NOW WAITING 0
[LOG] 0:00.001 CRITICAL juju.worker.lease <-manager.nextTick()
[LOG] 0:00.001 CRITICAL juju.worker.lease [] waking up to refresh and expire leases
*** CALL Refresh
[LOG] 0:00.001 CRITICAL juju.worker.lease [] checking expiry on 1 leases
*** CALL ExpireLease
[LOG] 0:00.001 CRITICAL juju.worker.lease [] expired 1 leases: redis
[LOG] 0:00.001 CRITICAL juju.worker.lease Next in 1h0m0s
[LOG] 0:00.001 CRITICAL juju.testing *** ADD ALARM
[LOG] 0:00.001 CRITICAL juju.testing *** WERE WAITING 1
[LOG] 0:00.001 CRITICAL juju.testing *** NOW WAITING 1
[LOG] 0:00.001 CRITICAL juju.worker.lease manager.catacomb.Dying()

----------

[LOG] 0:00.000 CRITICAL juju.worker.lease Next in 1s
[LOG] 0:00.001 CRITICAL juju.testing *** ADD ALARM
[LOG] 0:00.001 CRITICAL juju.testing *** WERE WAITING 1
[LOG] 0:00.001 CRITICAL juju.testing *** NOW WAITING 1
[LOG] 0:00.001 CRITICAL juju.worker.lease <-manager.claims
[LOG] 0:00.001 CRITICAL juju.worker.lease [] redis/0 extending lease redis for 1m0s
*** CALL ExtendLease
[LOG] 0:00.001 CRITICAL juju.worker.lease Next in 1m3s
[LOG] 0:00.001 CRITICAL juju.testing *** ADVANCE
[LOG] 0:00.001 CRITICAL juju.testing *** WERE WAITING 1
[LOG] 0:00.001 CRITICAL juju.testing *** NOW WAITING 0
*** Advanced 1m3.000000001s

[LOG] 0:00.001 CRITICAL juju.testing *** ADD ALARM
[LOG] 0:00.001 CRITICAL juju.testing *** WERE WAITING 1
[LOG] 0:00.001 CRITICAL juju.testing *** NOW WAITING 1
manager_expire_test.go:276:
    fix.RunTest(c, func(manager *lease.Manager, clock *testing.Clock) {
        // Ask for a minute, actually get 63s. Expire on time.
        err := manager.Claim("redis", "redis/0", time.Minute)
        c.Assert(err, jc.ErrorIsNil)
        clock.Advance(justAfterSeconds(newLeaseSecs))
        c.Logf("*** Advanced")
    })
util_test.go:81:
    c.Fatalf("Client test took way too long")
... Error: Client test took way too long

[LOG] 0:10.020 CRITICAL juju.worker.lease manager.catacomb.Dying()

----------

I will tackle some other tasks and see if I can produce a failure with running it in a loop overnight.

Changed in juju:
assignee: Joseph Phillips (manadart) → nobody
status: In Progress → Triaged
Changed in juju:
milestone: 2.4-beta1 → none
Revision history for this message
Anastasia (anastasia-macmood) wrote :

The issue seems to be addressed by other work on testing clocks maybe?.. I can no longer reproduce the failure locally under stress.

Changed in juju:
status: Triaged → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.