intermittent unit failure: ExpireSuite.TestExtend_ExpiryInFuture_TimePasses

Bug #1748381 reported by Anastasia on 2018-02-09
Affects    Importance    Assigned to
juju       Medium        Unassigned
juju 2.3   Medium        Unassigned
Changed in juju:
assignee: nobody → Joseph Phillips (manadart)
Joseph Phillips (manadart) wrote :

Looks like a copy/paste boo-boo.

I think this fix was the intent:
https://github.com/juju/juju/pull/8360

Changed in juju:
status: Triaged → In Progress
Joseph Phillips (manadart) wrote :

My prior comment wasn't quite right.

The expiry tests were advancing the clock by the same duration as the new lease expiry. I have changed the tests so that they advance the clock slightly *longer* than the lease duration.

Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
milestone: none → 2.4-beta1
John A Meinel (jameinel) wrote :

----------------------------------------------------------------------
FAIL: manager_expire_test.go:248: ExpireSuite.TestExtend_ExpiryInFuture_TimePasses

manager_expire_test.go:276:
    fix.RunTest(c, func(manager *lease.Manager, clock *testing.Clock) {
        // Ask for a minute, actually get 63s. Expire on time.
        err := manager.Claim("redis", "redis/0", time.Minute)
        c.Assert(err, jc.ErrorIsNil)
        clock.Advance(justAfterSeconds(newLeaseSecs))
    })
util_test.go:81:
    c.Fatalf("Client test took way too long")
... Error: Client test took way too long

OOPS: 47 passed, 1 FAILED

Changed in juju:
status: Fix Committed → Triaged

Also happened under CI conditions:
http://10.125.0.203:8080/job/RunUnittests-race-amd64/218/testReport/junit/github/com_juju_juju_worker_lease/TestPackage/

On Sun, Mar 25, 2018 at 10:52 PM, John A Meinel <email address hidden>
wrote:

> Happened again
>
> http://ci.jujucharms.com/job/github-merge-
> juju/255/testReport/junit/github/com_juju_juju_worker_lease/TestPackage/

Changed in juju:
status: Triaged → In Progress
Joseph Phillips (manadart) wrote :

Based on work-to-date, this *looks* like clock behaviour.

In heisenbug style, beyond a certain level of logging detail, I can't get this to fail.

These are the last test-run logs I could capture, contrasting a passing run with a failing one:

----------

[LOG] 0:00.000 CRITICAL juju.worker.lease Next in 1s
[LOG] 0:00.000 CRITICAL juju.testing *** ADD ALARM
[LOG] 0:00.000 CRITICAL juju.testing *** WERE WAITING 1
[LOG] 0:00.000 CRITICAL juju.testing *** NOW WAITING 1
[LOG] 0:00.000 CRITICAL juju.worker.lease <-manager.claims
[LOG] 0:00.000 CRITICAL juju.worker.lease [] redis/0 extending lease redis for 1m0s
*** CALL ExtendLease
[LOG] 0:00.000 CRITICAL juju.testing *** ADVANCE
[LOG] 0:00.001 CRITICAL juju.worker.lease Next in 1m3s
[LOG] 0:00.001 CRITICAL juju.testing *** WERE WAITING 1
[LOG] 0:00.001 CRITICAL juju.testing *** NOW WAITING 0
*** Advanced 1m3.000000001s

[LOG] 0:00.001 CRITICAL juju.testing *** ADD ALARM
[LOG] 0:00.001 CRITICAL juju.testing *** WERE WAITING 1
[LOG] 0:00.001 CRITICAL juju.testing *** NOW WAITING 0
[LOG] 0:00.001 CRITICAL juju.worker.lease <-manager.nextTick()
[LOG] 0:00.001 CRITICAL juju.worker.lease [] waking up to refresh and expire leases
*** CALL Refresh
[LOG] 0:00.001 CRITICAL juju.worker.lease [] checking expiry on 1 leases
*** CALL ExpireLease
[LOG] 0:00.001 CRITICAL juju.worker.lease [] expired 1 leases: redis
[LOG] 0:00.001 CRITICAL juju.worker.lease Next in 1h0m0s
[LOG] 0:00.001 CRITICAL juju.testing *** ADD ALARM
[LOG] 0:00.001 CRITICAL juju.testing *** WERE WAITING 1
[LOG] 0:00.001 CRITICAL juju.testing *** NOW WAITING 1
[LOG] 0:00.001 CRITICAL juju.worker.lease manager.catacomb.Dying()

----------

[LOG] 0:00.000 CRITICAL juju.worker.lease Next in 1s
[LOG] 0:00.001 CRITICAL juju.testing *** ADD ALARM
[LOG] 0:00.001 CRITICAL juju.testing *** WERE WAITING 1
[LOG] 0:00.001 CRITICAL juju.testing *** NOW WAITING 1
[LOG] 0:00.001 CRITICAL juju.worker.lease <-manager.claims
[LOG] 0:00.001 CRITICAL juju.worker.lease [] redis/0 extending lease redis for 1m0s
*** CALL ExtendLease
[LOG] 0:00.001 CRITICAL juju.worker.lease Next in 1m3s
[LOG] 0:00.001 CRITICAL juju.testing *** ADVANCE
[LOG] 0:00.001 CRITICAL juju.testing *** WERE WAITING 1
[LOG] 0:00.001 CRITICAL juju.testing *** NOW WAITING 0
*** Advanced 1m3.000000001s

[LOG] 0:00.001 CRITICAL juju.testing *** ADD ALARM
[LOG] 0:00.001 CRITICAL juju.testing *** WERE WAITING 1
[LOG] 0:00.001 CRITICAL juju.testing *** NOW WAITING 1
manager_expire_test.go:276:
    fix.RunTest(c, func(manager *lease.Manager, clock *testing.Clock) {
        // Ask for a minute, actually get 63s. Expire on time.
        err := manager.Claim("redis", "redis/0", time.Minute)
        c.Assert(err, jc.ErrorIsNil)
        clock.Advance(justAfterSeconds(newLeaseSecs))
        c.Logf("*** Advanced")
    })
util_test.go:81:
    c.Fatalf("Client test took way too long")
... Error: Client test took way too long

[LOG] 0:10.020 CRITICAL juju.worker.lease manager.catacomb.Dying()

----------

I will tackle some other tasks and see if I can produce a failure with running it in a loop overnight.

Changed in juju:
assignee: Joseph Phillips (manadart) → nobody
status: In Progress → Triaged
Changed in juju:
milestone: 2.4-beta1 → none