intermittent failure github.com/juju/juju/cmd/juju/action ShowOutputSuite.TestRun

Bug #1759461 reported by John A Meinel on 2018-03-28
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
juju
Medium
John A Meinel

Bug Description

http://ci.jujucharms.com/job/github-merge-juju/272/testReport/junit/github/com_juju_juju_cmd_juju_action/TestPackage/

----------------------------------------------------------------------
FAIL: showoutput_test.go:59: ShowOutputSuite.TestRun

test 0 (model flag -m): should handle wait-time formatting errors
test 0 (model flag --model): should handle wait-time formatting errors
test 1 (model flag -m): should timeout if result never comes
test 1 (model flag --model): should timeout if result never comes
test 2 (model flag -m): should pass api error through properly
test 2 (model flag --model): should pass api error through properly
test 3 (model flag -m): should fail with no tag matches
test 3 (model flag --model): should fail with no tag matches
test 4 (model flag -m): should fail with no results
test 4 (model flag --model): should fail with no results
test 5 (model flag -m): should error correctly with multiple results
test 5 (model flag --model): should error correctly with multiple results
test 6 (model flag -m): should pass through an error from the API server
test 6 (model flag --model): should pass through an error from the API server
test 7 (model flag -m): should only return once status is no longer running or pending
test 7 (model flag --model): should only return once status is no longer running or pending
showoutput_test.go:274:
    testRunHelper(
        c, s,
        makeFakeClient(
            t.withAPIDelay,
            t.withAPITimeout,
            t.withTags,
            t.withAPIResponse,
            params.ActionsByNames{},
            t.withAPIError),
        t.expectedErr,
        t.expectedOutput,
        t.withClientWait,
        t.withClientQueryID,
        modelFlag,
    )
showoutput_test.go:303:
    c.Check(err, gc.ErrorMatches, expectedErr)
... value = nil
... regex string = "test timed out before wait time"
... Error value is nil

test 8 (model flag -m): should pretty-print action output
test 8 (model flag --model): should pretty-print action output
test 9 (model flag -m): should pretty-print action output with no completed time
test 9 (model flag --model): should pretty-print action output with no completed time
test 10 (model flag -m): should pretty-print action output with no enqueued time
test 10 (model flag --model): should pretty-print action output with no enqueued time
test 11 (model flag -m): should pretty-print action output with no started time
test 11 (model flag --model): should pretty-print action output with no started time
test 12 (model flag -m): should set an appropriate timer and wait, get a result
test 12 (model flag --model): should set an appropriate timer and wait, get a result
OOPS: 8 passed, 1 FAILED

Download full text (3.5 KiB)

This happened twice in a row:
http://ci.jujucharms.com/job/github-merge-juju/273/testReport/github/com_juju_juju_cmd_juju_action/TestPackage/

it seems it should be looked at soon.

On Wed, Mar 28, 2018 at 7:52 AM, John A Meinel <email address hidden>
wrote:

> Public bug reported:
>
> http://ci.jujucharms.com/job/github-merge-
> juju/272/testReport/junit/github/com_juju_juju_cmd_juju_
> action/TestPackage/
>
> ----------------------------------------------------------------------
> FAIL: showoutput_test.go:59: ShowOutputSuite.TestRun
>
> test 0 (model flag -m): should handle wait-time formatting errors
> test 0 (model flag --model): should handle wait-time formatting errors
> test 1 (model flag -m): should timeout if result never comes
> test 1 (model flag --model): should timeout if result never comes
> test 2 (model flag -m): should pass api error through properly
> test 2 (model flag --model): should pass api error through properly
> test 3 (model flag -m): should fail with no tag matches
> test 3 (model flag --model): should fail with no tag matches
> test 4 (model flag -m): should fail with no results
> test 4 (model flag --model): should fail with no results
> test 5 (model flag -m): should error correctly with multiple results
> test 5 (model flag --model): should error correctly with multiple results
> test 6 (model flag -m): should pass through an error from the API server
> test 6 (model flag --model): should pass through an error from the API
> server
> test 7 (model flag -m): should only return once status is no longer
> running or pending
> test 7 (model flag --model): should only return once status is no longer
> running or pending
> showoutput_test.go:274:
> testRunHelper(
> c, s,
> makeFakeClient(
> t.withAPIDelay,
> t.withAPITimeout,
> t.withTags,
> t.withAPIResponse,
> params.ActionsByNames{},
> t.withAPIError),
> t.expectedErr,
> t.expectedOutput,
> t.withClientWait,
> t.withClientQueryID,
> modelFlag,
> )
> showoutput_test.go:303:
> c.Check(err, gc.ErrorMatches, expectedErr)
> ... value = nil
> ... regex string = "test timed out before wait time"
> ... Error value is nil
>
> test 8 (model flag -m): should pretty-print action output
> test 8 (model flag --model): should pretty-print action output
> test 9 (model flag -m): should pretty-print action output with no
> completed time
> test 9 (model flag --model): should pretty-print action output with no
> completed time
> test 10 (model flag -m): should pretty-print action output with no
> enqueued time
> test 10 (model flag --model): should pretty-print action output with no
> enqueued time
> test 11 (model flag -m): should pretty-print action output with no started
> time
> test 11 (model flag --model): should pretty-print action output with no
> started time
> test 12 (model flag -m): should set an appropriate timer and wait, get a
> result
> test 12 (model flag --model): should set an appropriate timer and wait,
> get a result
> OOPS: 8 passed, 1 FAILED
>
> ** Affects: juju
> Importance: Medium
> Status: Triaged
>
>
...

Read more...

John A Meinel (jameinel) wrote :
Download full text (4.2 KiB)

The test seems a bit suspicious. It has an "withAPIDelay" of 2 seconds,
"withAPITimeout" of 4 seconds, and "ClientWait" of "6s" which I presume is
also 6 seconds. If API Delay is before we finish connecting, and API
Timeout is before the API returns, and ClientWait is exact time before the
client returns, how do we know for sure whether 6s comes before or after
4+2s ?

And why are we spending 18s mostly asleep in this test. Do we really need
real-world sleep times? Couldn't we do 200ms/400ms/600ms just as easily?

On Wed, Mar 28, 2018 at 10:23 AM, John Meinel <email address hidden>
wrote:

> This happened twice in a row:
> http://ci.jujucharms.com/job/github-merge-juju/273/
> testReport/github/com_juju_juju_cmd_juju_action/TestPackage/
>
> it seems it should be looked at soon.
>
>
> On Wed, Mar 28, 2018 at 7:52 AM, John A Meinel <email address hidden>
> wrote:
>
>> Public bug reported:
>>
>> http://ci.jujucharms.com/job/github-merge-
>> juju/272/testReport/junit/github/com_juju_juju_cmd_juju_acti
>> on/TestPackage/
>>
>> ----------------------------------------------------------------------
>> FAIL: showoutput_test.go:59: ShowOutputSuite.TestRun
>>
>> test 0 (model flag -m): should handle wait-time formatting errors
>> test 0 (model flag --model): should handle wait-time formatting errors
>> test 1 (model flag -m): should timeout if result never comes
>> test 1 (model flag --model): should timeout if result never comes
>> test 2 (model flag -m): should pass api error through properly
>> test 2 (model flag --model): should pass api error through properly
>> test 3 (model flag -m): should fail with no tag matches
>> test 3 (model flag --model): should fail with no tag matches
>> test 4 (model flag -m): should fail with no results
>> test 4 (model flag --model): should fail with no results
>> test 5 (model flag -m): should error correctly with multiple results
>> test 5 (model flag --model): should error correctly with multiple results
>> test 6 (model flag -m): should pass through an error from the API server
>> test 6 (model flag --model): should pass through an error from the API
>> server
>> test 7 (model flag -m): should only return once status is no longer
>> running or pending
>> test 7 (model flag --model): should only return once status is no longer
>> running or pending
>> showoutput_test.go:274:
>> testRunHelper(
>> c, s,
>> makeFakeClient(
>> t.withAPIDelay,
>> t.withAPITimeout,
>> t.withTags,
>> t.withAPIResponse,
>> params.ActionsByNames{},
>> t.withAPIError),
>> t.expectedErr,
>> t.expectedOutput,
>> t.withClientWait,
>> t.withClientQueryID,
>> modelFlag,
>> )
>> showoutput_test.go:303:
>> c.Check(err, gc.ErrorMatches, expectedErr)
>> ... value = nil
>> ... regex string = "test timed out before wait time"
>> ... Error value is nil
>>
>> test 8 (model flag -m): should pretty-print action output
>> test 8 (model flag --model): should pretty-print action output
>> test 9 (model flag -m): should pretty-print action output with no
>> completed time
>> test 9 (model flag --model): should pretty...

Read more...

John A Meinel (jameinel) wrote :
Download full text (7.5 KiB)

One thing I don't understand, with a patch like:

        }, {
                should: "only return once status is no longer
running or pending",
                withAPIDelay: 2 * time.Second,
- withClientWait: "6s",
+ withClientWait: "7s",
                withClientQueryID: validActionId,
- withAPITimeout: 4 * time.Second,
+ withAPITimeout: 4100 * time.Millisecond,
                withTags: tagsForIdPrefix(validActionId,
validActionTagString),

I see the test take 4s longer than if I do:
        }, {
                should: "only return once status is no longer
running or pending",
                withAPIDelay: 2 * time.Second,
                withClientWait: "6s",
                withClientQueryID: validActionId,
- withAPITimeout: 4 * time.Second,
+ withAPITimeout: 3900 * time.Millisecond,
                withTags: tagsForIdPrefix(validActionId,
validActionTagString),
                withAPIResponse: []params.ActionResult{{

I wonder if there is something causing us to cycle for >1 APITimeout
sleeps. (like if we *just* miss the timeout in the 2s tick, we end up
waiting an entire new cycle.)

I really don't understand why there is a full 4s difference, though.

On Wed, Mar 28, 2018 at 10:59 AM, John Meinel <email address hidden>
wrote:

> It seems to fail reliably with this change:
> diff --git a/cmd/juju/action/showoutput_test.go
> b/cmd/juju/action/showoutput_test.go
> index 3d073c4..00ee500 100644
> --- a/cmd/juju/action/showoutput_test.go
> +++ b/cmd/juju/action/showoutput_test.go
> @@ -127,7 +127,7 @@ timing:
> withAPIDelay: 2 * time.Second,
> withClientWait: "6s",
> withClientQueryID: validActionId,
> - withAPITimeout: 4 * time.Second,
> + withAPITimeout: 4010 * time.Millisecond,
> withTags: tagsForIdPrefix(validActionId,
> validActionTagString),
> withAPIResponse: []params.ActionResult{{
> Status: "running",
>
> So just adding 10ms is enough to cause the test to fail.
>
> Digging through the code, we seem to have some hard-coded very long times:
>
>
> func GetActionResult(api APIClient, requestedId string, wait *time.Timer)
> (params.ActionResult, error) {
>
> // tick every two seconds, to delay the loop timer.
> // TODO(fwereade): 2016-03-17 lp:1558657
> tick := time.NewTimer(2 * time.Second)
>
> return timerLoop(api, requestedId, wait, tick)
> }
> ...
> func timerLoop(api APIClient, requestedId string, wait, tick *time.Timer)
> (params.ActionResult, error) {
> ...
> case _ = <-tick.C:
> tick.Reset(2 * time.Second)
> }
> }
> }
>
> so a lot of the code is hard-coded to step on a 2-second real-world cycle.
>
> At 18s real-world time to run the test, it seems pretty poor. (wastes a
> lot of pre-commit check time, just sitting idle, etc.)
>
>
> On Wed, Mar 28, 2018 at 10:29 AM, John Meinel <email address hidden>
> wrote:
>
>> The test seems a bit suspicious. It has an "withAPIDela...

Read more...

John A Meinel (jameinel) wrote :

https://github.com/juju/juju/pull/8547

though I'll admit I didn't fully seek to understand all the issues and/or rewrite the tests to be truly more reliable. I just tweaked a timeout to give a little bit more room for error (as I see it).

Changed in juju:
status: Triaged → In Progress
John A Meinel (jameinel) wrote :

looks like this was already modified in develop where it was changed to be strictly a 3s timeout instead of 4s.

Changed in juju:
assignee: nobody → John A Meinel (jameinel)
milestone: none → 2.3.6
status: In Progress → Fix Committed

Note it seems to fail in different ways now:

----------------------------------------------------------------------
FAIL: showoutput_test.go:59: ShowOutputSuite.TestRun

test 0 (model flag -m): should handle wait-time formatting errors
test 0 (model flag --model): should handle wait-time formatting errors
test 1 (model flag -m): should timeout if result never comes
test 1 (model flag --model): should timeout if result never comes
test 2 (model flag -m): should pass api error through properly
test 2 (model flag --model): should pass api error through properly
test 3 (model flag -m): should fail with no tag matches
test 3 (model flag --model): should fail with no tag matches
test 4 (model flag -m): should fail with no results
test 4 (model flag --model): should fail with no results
test 5 (model flag -m): should error correctly with multiple results
test 5 (model flag --model): should error correctly with multiple results
showoutput_test.go:275:
    testRunHelper(
        c, s,
        makeFakeClient(
            t.withAPIDelay,
            t.withAPITimeout,
            t.withTags,
            t.withAPIResponse,
            params.ActionsByNames{},
            t.withAPIError),
        t.expectedErr,
        t.expectedOutput,
        t.withClientWait,
        t.withClientQueryID,
        modelFlag,
    )
showoutput_test.go:304:
    c.Check(err, gc.ErrorMatches, expectedErr)
... value = nil
... regex string = "too many results for action
f47ac10b-58cc-4372-a567-0e02b2c3d479"
... Error value is nil

On Wed, Mar 28, 2018 at 2:30 PM, John A Meinel <email address hidden>
wrote:

> looks like this was already modified in develop where it was changed to
> be strictly a 3s timeout instead of 4s.
>
>
> ** Changed in: juju
> Status: In Progress => Fix Committed
>
> ** Changed in: juju
> Milestone: None => 2.3.6
>
> ** Changed in: juju
> Assignee: (unassigned) => John A Meinel (jameinel)
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1759461
>
> Title:
> intermittent failure github.com/juju/juju/cmd/juju/action
> ShowOutputSuite.TestRun
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/juju/+bug/1759461/+subscriptions
>

Changed in juju:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers