Intermittent unit test failure: StatusSuite.TestRun

Bug #1780766 reported by Simon Richardson
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Canonical Juju
Fix Released
Medium
John A Meinel

Bug Description

The following unit test fails intermittently: StatusSuite.TestRun

http://ci.jujucharms.com/job/github-merge-juju/688/console

```
status_test.go:91:
    s.runTestCase(c, test)
status_test.go:122:
    c.Check(ctx.Stdout.(*bytes.Buffer).String(), gc.Equals, out.String())
... obtained string = "" +
... "actions:\n" +
... "- completed at: n/a\n" +
... " status: pending\n"
... expected string = "" +
... "actions:\n" +
... "- action: fakeName\n" +
... " completed at: n/a\n" +
... " id: deadbeef-0000-4000-8000-feedfacebeef\n" +
... " status: some-random-status\n" +
... " unit: \"\"\n"
```

Changed in juju:
status: New → Incomplete
Tim Penhey (thumper)
tags: added: intermittent-failure test-failure
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for juju because there has been no activity for 60 days.]

Changed in juju:
status: Incomplete → Expired
Revision history for this message
John A Meinel (jameinel) wrote :
Changed in juju:
status: Expired → Incomplete
Revision history for this message
John A Meinel (jameinel) wrote :

$ time go test -count=100 -race -check.v -check.f StatusSuite.TestRun

fails about 5 times in 100. In different places, but it does fail on my machine.

Revision history for this message
John A Meinel (jameinel) wrote :

The problem seems to lie in here:
func (c *fakeAPIClient) Actions(args params.Entities) (params.ActionResults, error) {
 // If the test supplies a delay time too long, we'll return an error
 // to prevent the test hanging. If the given wait is up, then return
 // the results; otherwise, return a pending status.

 // First, sync.
 _ = <-time.NewTimer(0 * time.Second).C

 select {
 case _ = <-c.delay.C:
  // The API delay timer is up. Pass pre-canned results back.
  return params.ActionResults{Results: c.actionResults}, c.apiErr
 case _ = <-c.timeout.C:
  // Timeout to prevent tests from hanging.
  return params.ActionResults{}, errors.New("test timed out before wait time")
 default:
  // Timeout should only be nonzero in case we want to test
  // pending behavior with a --wait flag on FetchCommand.
  return params.ActionResults{Results: []params.ActionResult{{
   Status: params.ActionPending,
   Output: map[string]interface{}{},
   Started: time.Date(2015, time.February, 14, 8, 15, 0, 0, time.UTC),
   Enqueued: time.Date(2015, time.February, 14, 8, 13, 0, 0, time.UTC),
  }}}, nil
 }

The issue is that while it is true that the setup does:
  fakeClient := makeFakeClient(
   0*time.Second, // No API delay
   5*time.Second, // 5 second test timeout
   tc.tags,
   tc.results,
   tc.actionsByNames,
   "", // No API error
  )

which creates:
  delay: time.NewTimer(delay),
  timeout: time.NewTimer(timeout),

Apparently there is no guarantee that time.NewTimer(0) will actually be ready before the 'default:' branch is evaluated.

the _ = <time.NewTimer(0 *time.Second).C seems very strange.
Presumably it is meant to handle the case where c.delay.C isn't actually ready, but I don't think it actually gives us that.

Note also that all of these are *real* time.NewTimer calls, not *clock* calls. Which also adds real-clock-skew issues.

I'd like to fix those things, but the easier thing to do is just add '--wait 1s' to all the calls, in which case the code will see the Pending result, and keep trying until it sees something else.

Revision history for this message
John A Meinel (jameinel) wrote :

Turns out "juju show-action-status" doesn't support --wait, that is only accepted by 'show-action-output --wait'.

So instead, I've just changed the code so that a delay of 0 never goes via a Timer object, it just returns immediately.
On the plus side, that lets me get rid of the "first sync <-" oddity, and also lets the tests run slightly faster (they complete under --race in 11-12ms instead of 15ms.)

And it is more deterministic. yay

Changed in juju:
milestone: none → 2.5.4
assignee: nobody → John A Meinel (jameinel)
importance: Undecided → Medium
status: Incomplete → In Progress
Revision history for this message
John A Meinel (jameinel) wrote :
John A Meinel (jameinel)
Changed in juju:
milestone: 2.5.4 → 2.5.5
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released
Changed in juju:
milestone: 2.5.6 → 2.5.7
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.