Canonical Juju

Bug #1759461
Comment #3

Comment 3 for bug 1759461

Revision history for this message

John A Meinel (jameinel) wrote on 2018-03-28: Re: [Bug 1759461] [NEW] intermittent failure github.com/juju/juju/cmd/juju/action ShowOutputSuite.TestRun

One thing I don't understand, with a patch like:

        }, {
                should: "only return once status is no longer
running or pending",
                withAPIDelay: 2 * time.Second,
- withClientWait: "6s",
+ withClientWait: "7s",
                withClientQueryID: validActionId,
- withAPITimeout: 4 * time.Second,
+ withAPITimeout: 4100 * time.Millisecond,
                withTags: tagsForIdPrefix(validActionId,
validActionTagString),

I see the test take 4s longer than if I do:
        }, {
                should: "only return once status is no longer
running or pending",
                withAPIDelay: 2 * time.Second,
                withClientWait: "6s",
                withClientQueryID: validActionId,
- withAPITimeout: 4 * time.Second,
+ withAPITimeout: 3900 * time.Millisecond,
                withTags: tagsForIdPrefix(validActionId,
validActionTagString),
                withAPIResponse: []params.ActionResult{{

I wonder if there is something causing us to cycle for >1 APITimeout
sleeps. (like if we *just* miss the timeout in the 2s tick, we end up
waiting an entire new cycle.)

I really don't understand why there is a full 4s difference, though.

On Wed, Mar 28, 2018 at 10:59 AM, John Meinel <email address hidden>
wrote:

> It seems to fail reliably with this change:
> diff --git a/cmd/juju/action/showoutput_test.go
> b/cmd/juju/action/showoutput_test.go
> index 3d073c4..00ee500 100644
> --- a/cmd/juju/action/showoutput_test.go
> +++ b/cmd/juju/action/showoutput_test.go
> @@ -127,7 +127,7 @@ timing:
> withAPIDelay: 2 * time.Second,
> withClientWait: "6s",
> withClientQueryID: validActionId,
> - withAPITimeout: 4 * time.Second,
> + withAPITimeout: 4010 * time.Millisecond,
> withTags: tagsForIdPrefix(validActionId,
> validActionTagString),
> withAPIResponse: []params.ActionResult{{
> Status: "running",
>
> So just adding 10ms is enough to cause the test to fail.
>
> Digging through the code, we seem to have some hard-coded very long times:
>
>
> func GetActionResult(api APIClient, requestedId string, wait *time.Timer)
> (params.ActionResult, error) {
>
> // tick every two seconds, to delay the loop timer.
> // TODO(fwereade): 2016-03-17 lp:1558657
> tick := time.NewTimer(2 * time.Second)
>
> return timerLoop(api, requestedId, wait, tick)
> }
> ...
> func timerLoop(api APIClient, requestedId string, wait, tick *time.Timer)
> (params.ActionResult, error) {
> ...
> case _ = <-tick.C:
> tick.Reset(2 * time.Second)
> }
> }
> }
>
> so a lot of the code is hard-coded to step on a 2-second real-world cycle.
>
> At 18s real-world time to run the test, it seems pretty poor. (wastes a
> lot of pre-commit check time, just sitting idle, etc.)
>
>
> On Wed, Mar 28, 2018 at 10:29 AM, John Meinel <email address hidden>
> wrote:
>
>> The test seems a bit suspicious. It has an "withAPIDelay" of 2 seconds,
>> "withAPITimeout" of 4 seconds, and "ClientWait" of "6s" which I presume is
>> also 6 seconds. If API Delay is before we finish connecting, and API
>> Timeout is before the API returns, and ClientWait is exact time before the
>> client returns, how do we know for sure whether 6s comes before or after
>> 4+2s ?
>>
>> And why are we spending 18s mostly asleep in this test. Do we really need
>> real-world sleep times? Couldn't we do 200ms/400ms/600ms just as easily?
>>
>>
>> On Wed, Mar 28, 2018 at 10:23 AM, John Meinel <email address hidden>
>> wrote:
>>
>>> This happened twice in a row:
>>> http://ci.jujucharms.com/job/github-merge-juju/273/testRepor
>>> t/github/com_juju_juju_cmd_juju_action/TestPackage/
>>>
>>> it seems it should be looked at soon.
>>>
>>>
>>> On Wed, Mar 28, 2018 at 7:52 AM, John A Meinel <email address hidden>
>>> wrote:
>>>
>>>> Public bug reported:
>>>>
>>>> http://ci.jujucharms.com/job/github-merge-
>>>> juju/272/testReport/junit/github/com_juju_juju_cmd_juju_acti
>>>> on/TestPackage/
>>>>
>>>> ----------------------------------------------------------------------
>>>> FAIL: showoutput_test.go:59: ShowOutputSuite.TestRun
>>>>
>>>> test 0 (model flag -m): should handle wait-time formatting errors
>>>> test 0 (model flag --model): should handle wait-time formatting errors
>>>> test 1 (model flag -m): should timeout if result never comes
>>>> test 1 (model flag --model): should timeout if result never comes
>>>> test 2 (model flag -m): should pass api error through properly
>>>> test 2 (model flag --model): should pass api error through properly
>>>> test 3 (model flag -m): should fail with no tag matches
>>>> test 3 (model flag --model): should fail with no tag matches
>>>> test 4 (model flag -m): should fail with no results
>>>> test 4 (model flag --model): should fail with no results
>>>> test 5 (model flag -m): should error correctly with multiple results
>>>> test 5 (model flag --model): should error correctly with multiple
>>>> results
>>>> test 6 (model flag -m): should pass through an error from the API server
>>>> test 6 (model flag --model): should pass through an error from the API
>>>> server
>>>> test 7 (model flag -m): should only return once status is no longer
>>>> running or pending
>>>> test 7 (model flag --model): should only return once status is no
>>>> longer running or pending
>>>> showoutput_test.go:274:
>>>> testRunHelper(
>>>> c, s,
>>>> makeFakeClient(
>>>> t.withAPIDelay,
>>>> t.withAPITimeout,
>>>> t.withTags,
>>>> t.withAPIResponse,
>>>> params.ActionsByNames{},
>>>> t.withAPIError),
>>>> t.expectedErr,
>>>> t.expectedOutput,
>>>> t.withClientWait,
>>>> t.withClientQueryID,
>>>> modelFlag,
>>>> )
>>>> showoutput_test.go:303:
>>>> c.Check(err, gc.ErrorMatches, expectedErr)
>>>> ... value = nil
>>>> ... regex string = "test timed out before wait time"
>>>> ... Error value is nil
>>>>
>>>> test 8 (model flag -m): should pretty-print action output
>>>> test 8 (model flag --model): should pretty-print action output
>>>> test 9 (model flag -m): should pretty-print action output with no
>>>> completed time
>>>> test 9 (model flag --model): should pretty-print action output with no
>>>> completed time
>>>> test 10 (model flag -m): should pretty-print action output with no
>>>> enqueued time
>>>> test 10 (model flag --model): should pretty-print action output with no
>>>> enqueued time
>>>> test 11 (model flag -m): should pretty-print action output with no
>>>> started time
>>>> test 11 (model flag --model): should pretty-print action output with no
>>>> started time
>>>> test 12 (model flag -m): should set an appropriate timer and wait, get
>>>> a result
>>>> test 12 (model flag --model): should set an appropriate timer and wait,
>>>> get a result
>>>> OOPS: 8 passed, 1 FAILED
>>>>
>>>> ** Affects: juju
>>>> Importance: Medium
>>>> Status: Triaged
>>>>
>>>>
>>>> ** Tags: actions intermittent-failure
>>>>
>>>> --
>>>> You received this bug notification because you are subscribed to the bug
>>>> report.
>>>> https://bugs.launchpad.net/bugs/1759461
>>>>
>>>> Title:
>>>> intermittent failure github.com/juju/juju/cmd/juju/action
>>>> ShowOutputSuite.TestRun
>>>>
>>>> To manage notifications about this bug go to:
>>>> https://bugs.launchpad.net/juju/+bug/1759461/+subscriptions
>>>>
>>>
>>>
>>
>

One thing I don't understand, with a patch like:

}, {
                should:            "only return once status is no longer
running or pending",
                withAPIDelay:      2 * time.Second,
-               withClientWait:    "6s",
+               withClientWait:    "7s",
                withClientQueryID: validActionId,
-               withAPITimeout:    4 * time.Second,
+               withAPITimeout:    4100 * time.Millisecond,
                withTags:          tagsForIdPrefix(validActionId,
validActionTagString),

I see the test take 4s longer than if I do:
        }, {
                should:            "only return once status is no longer
running or pending",
                withAPIDelay:      2 * time.Second,
                withClientWait:    "6s",
                withClientQueryID: validActionId,
-               withAPITimeout:    4 * time.Second,
+               withAPITimeout:    3900 * time.Millisecond,
                withTags:          tagsForIdPrefix(validActionId,
validActionTagString),
                withAPIResponse: []params.ActionResult{{

I wonder if there is something causing us to cycle for >1 APITimeout
sleeps. (like if we *just* miss the timeout in the 2s tick, we end up
waiting an entire new cycle.)

I really don't understand why there is a full 4s difference, though.

On Wed, Mar 28, 2018 at 10:59 AM, John Meinel <john@arbash-meinel.com>
wrote:

> It seems to fail reliably with this change:
> diff --git a/cmd/juju/action/showoutput_test.go
> b/cmd/juju/action/showoutput_test.go
> index 3d073c4..00ee500 100644
> --- a/cmd/juju/action/showoutput_test.go
> +++ b/cmd/juju/action/showoutput_test.go
> @@ -127,7 +127,7 @@ timing:
>                 withAPIDelay:      2 * time.Second,
>                 withClientWait:    "6s",
>                 withClientQueryID: validActionId,
> -               withAPITimeout:    4 * time.Second,
> +               withAPITimeout:    4010 * time.Millisecond,
>                 withTags:          tagsForIdPrefix(validActionId,
> validActionTagString),
>                 withAPIResponse: []params.ActionResult{{
>                         Status: "running",
>
> So just adding 10ms is enough to cause the test to fail.
>
> Digging through the code, we seem to have some hard-coded very long times:
>
>
> func GetActionResult(api APIClient, requestedId string, wait *time.Timer)
> (params.ActionResult, error) {
>
>     // tick every two seconds, to delay the loop timer.
>     // TODO(fwereade): 2016-03-17 lp:1558657
>     tick := time.NewTimer(2 * time.Second)
>
>     return timerLoop(api, requestedId, wait, tick)
> }
> ...
> func timerLoop(api APIClient, requestedId string, wait, tick *time.Timer)
> (params.ActionResult, error) {
>  ...
>         case _ = <-tick.C:
>             tick.Reset(2 * time.Second)
>         }
>     }
> }
>
> so a lot of the code is hard-coded to step on a 2-second real-world cycle.
>
> At 18s real-world time to run the test, it seems pretty poor. (wastes a
> lot of pre-commit check time, just sitting idle, etc.)
>
>
> On Wed, Mar 28, 2018 at 10:29 AM, John Meinel <john@arbash-meinel.com>
> wrote:
>
>> The test seems a bit suspicious. It has an "withAPIDelay" of 2 seconds,
>> "withAPITimeout" of 4 seconds, and "ClientWait" of "6s" which I presume is
>> also 6 seconds. If API Delay is before we finish connecting, and API
>> Timeout is before the API returns, and ClientWait is exact time before the
>> client returns, how do we know for sure whether 6s comes before or after
>> 4+2s ?
>>
>> And why are we spending 18s mostly asleep in this test. Do we really need
>> real-world sleep times? Couldn't we do 200ms/400ms/600ms just as easily?
>>
>>
>> On Wed, Mar 28, 2018 at 10:23 AM, John Meinel <john@arbash-meinel.com>
>> wrote:
>>
>>> This happened twice in a row:
>>> http://ci.jujucharms.com/job/github-merge-juju/273/testRepor
>>> t/github/com_juju_juju_cmd_juju_action/TestPackage/
>>>
>>> it seems it should be looked at soon.
>>>
>>>
>>> On Wed, Mar 28, 2018 at 7:52 AM, John A Meinel <john@arbash-meinel.com>
>>> wrote:
>>>
>>>> Public bug reported:
>>>>
>>>> http://ci.jujucharms.com/job/github-merge-
>>>> juju/272/testReport/junit/github/com_juju_juju_cmd_juju_acti
>>>> on/TestPackage/
>>>>
>>>> ----------------------------------------------------------------------
>>>> FAIL: showoutput_test.go:59: ShowOutputSuite.TestRun
>>>>
>>>> test 0 (model flag -m): should handle wait-time formatting errors
>>>> test 0 (model flag --model): should handle wait-time formatting errors
>>>> test 1 (model flag -m): should timeout if result never comes
>>>> test 1 (model flag --model): should timeout if result never comes
>>>> test 2 (model flag -m): should pass api error through properly
>>>> test 2 (model flag --model): should pass api error through properly
>>>> test 3 (model flag -m): should fail with no tag matches
>>>> test 3 (model flag --model): should fail with no tag matches
>>>> test 4 (model flag -m): should fail with no results
>>>> test 4 (model flag --model): should fail with no results
>>>> test 5 (model flag -m): should error correctly with multiple results
>>>> test 5 (model flag --model): should error correctly with multiple
>>>> results
>>>> test 6 (model flag -m): should pass through an error from the API server
>>>> test 6 (model flag --model): should pass through an error from the API
>>>> server
>>>> test 7 (model flag -m): should only return once status is no longer
>>>> running or pending
>>>> test 7 (model flag --model): should only return once status is no
>>>> longer running or pending
>>>> showoutput_test.go:274:
>>>>     testRunHelper(
>>>>         c, s,
>>>>         makeFakeClient(
>>>>             t.withAPIDelay,
>>>>             t.withAPITimeout,
>>>>             t.withTags,
>>>>             t.withAPIResponse,
>>>>             params.ActionsByNames{},
>>>>             t.withAPIError),
>>>>         t.expectedErr,
>>>>         t.expectedOutput,
>>>>         t.withClientWait,
>>>>         t.withClientQueryID,
>>>>         modelFlag,
>>>>     )
>>>> showoutput_test.go:303:
>>>>     c.Check(err, gc.ErrorMatches, expectedErr)
>>>> ... value = nil
>>>> ... regex string = "test timed out before wait time"
>>>> ... Error value is nil
>>>>
>>>> test 8 (model flag -m): should pretty-print action output
>>>> test 8 (model flag --model): should pretty-print action output
>>>> test 9 (model flag -m): should pretty-print action output with no
>>>> completed time
>>>> test 9 (model flag --model): should pretty-print action output with no
>>>> completed time
>>>> test 10 (model flag -m): should pretty-print action output with no
>>>> enqueued time
>>>> test 10 (model flag --model): should pretty-print action output with no
>>>> enqueued time
>>>> test 11 (model flag -m): should pretty-print action output with no
>>>> started time
>>>> test 11 (model flag --model): should pretty-print action output with no
>>>> started time
>>>> test 12 (model flag -m): should set an appropriate timer and wait, get
>>>> a result
>>>> test 12 (model flag --model): should set an appropriate timer and wait,
>>>> get a result
>>>> OOPS: 8 passed, 1 FAILED
>>>>
>>>> ** Affects: juju
>>>>      Importance: Medium
>>>>          Status: Triaged
>>>>
>>>>
>>>> ** Tags: actions intermittent-failure
>>>>
>>>> --
>>>> You received this bug notification because you are subscribed to the bug
>>>> report.
>>>> https://bugs.launchpad.net/bugs/1759461
>>>>
>>>> Title:
>>>>   intermittent failure github.com/juju/juju/cmd/juju/action
>>>>   ShowOutputSuite.TestRun
>>>>
>>>> To manage notifications about this bug go to:
>>>> https://bugs.launchpad.net/juju/+bug/1759461/+subscriptions
>>>>
>>>
>>>
>>
>