test failure UniterSuite.TestUniterUpgradeConflicts

Bug #1311825 reported by Curtis Hovey
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
juju-core
Fix Released
Critical
William Reade
1.18
Invalid
Undecided
Unassigned

Bug Description

UniterSuite.TestUniterUpgradeConflicts frequent fails on amd64 precise. It appears to be the cause of a few cursed revisions. 8 of the last 11 runs failed. Trusty's record is 100%

I am attaching the output from the failed test.

Related branches

Revision history for this message
Curtis Hovey (sinzui) wrote :
Curtis Hovey (sinzui)
Changed in juju-core:
importance: High → Critical
milestone: none → 1.19.1
tags: added: ci
John A Meinel (jameinel)
Changed in juju-core:
assignee: nobody → John A Meinel (jameinel)
status: Triaged → In Progress
Revision history for this message
John A Meinel (jameinel) wrote :

I'm attaching a log for when this test passes, just to see if we can find an obvious difference.
I used this patch to force the test to fail so we could inspect the log:
=== modified file 'worker/uniter/uniter_test.go'
--- worker/uniter/uniter_test.go 2014-04-18 10:33:27 +0000
+++ worker/uniter/uniter_test.go 2014-04-24 06:47:45 +0000
@@ -934,6 +934,11 @@
   },
   verifyCharm{revision: 1},
  ), ut(
+ "fail it all",
+ custom{func(c *gc.C, ctx *context) {
+ c.Fatalf("cause the test suite to fail to get the trace")
+ }},
+ ), ut(
   `upgrade: forced upgrade does work without explicit resolution if underlying problem was fixed`,
   startUpgradeError{},
   resolveError{state.ResolvedNoHooks},

Revision history for this message
John A Meinel (jameinel) wrote :
Download full text (20.3 KiB)

To snip out the relevant bits:

FAILING:
step 7:

uniter_test.waitHooks{"upgrade-charm", "config-changed"}
waiting for hooks: []string{"install", "config-changed", "start", "upgrade-charm", "config-changed"}
ctx.hooksCompleted: []string{"install", "config-changed", "start"}
[LOG] 0:02.087 DEBUG juju.state.apiserver -> [AC] unit-u-0 50.837196ms {"RequestId":145,"Response":{}} NotifyWatcher["20"].Next
[LOG] 0:02.087 DEBUG juju.worker.uniter.filter got unit change
[LOG] 0:02.087 DEBUG juju.state.apiserver <- [AC] unit-u-0 {"RequestId":164,"Type":"NotifyWatcher","Id":"20","Request":"Next","Params":{}}
[LOG] 0:02.087 DEBUG juju.state.apiserver <- [AC] unit-u-0 {"RequestId":165,"Type":"Uniter","Request":"Life","Params":{"Entities":[{"Tag":"unit-u-0"}]}}
[LOG] 0:02.088 DEBUG juju.state.apiserver -> [AC] unit-u-0 414.94us {"RequestId":165,"Response":{"Results":[{"Life":"alive","Error":null}]}} Uniter[""].Life
[LOG] 0:02.088 DEBUG juju.state.apiserver <- [AC] unit-u-0 {"RequestId":166,"Type":"Uniter","Request":"Resolved","Params":{"Entities":[{"Tag":"unit-u-0"}]}}
[LOG] 0:02.088 DEBUG juju.state.apiserver -> [AC] unit-u-0 372.007us {"RequestId":166,"Response":{"Results":[{"Error":null,"Mode":"no-hooks"}]}} Uniter[""].Resolved
[LOG] 0:02.093 DEBUG juju.worker.uniter.charm preparing to deploy charm "cs:quantal/wordpress-1"
ctx.hooksCompleted: []string{"install", "config-changed", "start"}
[LOG] 0:02.136 DEBUG juju.worker.uniter.charm deploying charm "cs:quantal/wordpress-1"
[LOG] 0:02.139 DEBUG juju.worker.uniter.charm finishing deploy of charm "cs:quantal/wordpress-1"
[LOG] 0:02.178 INFO juju.worker.uniter charm "cs:quantal/wordpress-1" is deployed
ctx.hooksCompleted: []string{"install", "config-changed", "start"}
[LOG] 0:02.219 DEBUG juju.worker.uniter ModeUpgrading cs:quantal/wordpress-1 exiting
[LOG] 0:02.219 INFO juju.worker.uniter ModeContinue starting
[LOG] 0:02.219 INFO juju.worker.uniter found queued "upgrade-charm" hook
[LOG] 0:02.220 DEBUG juju.state.apiserver <- [AC] unit-u-0 {"RequestId":167,"Type":"Uniter","Request":"APIAddresses","Params":{}}
[LOG] 0:02.220 DEBUG juju.state.apiserver -> [AC] unit-u-0 379.912us {"RequestId":167,"Response":{"Error":null,"Result":["0.1.2.3:1234"]}} Uniter[""].APIAddresses
[LOG] 0:02.220 DEBUG juju.state.apiserver <- [AC] unit-u-0 {"RequestId":168,"Type":"Uniter","Request":"GetOwnerTag","Params":{"Entities":[{"Tag":"service-u"}]}}
[LOG] 0:02.221 DEBUG juju.state.apiserver -> [AC] unit-u-0 384.062us {"RequestId":168,"Response":{"Error":null,"Result":"user-admin"}} Uniter[""].GetOwnerTag
[LOG] 0:02.221 DEBUG juju.state.apiserver <- [AC] unit-u-0 {"RequestId":169,"Type":"Uniter","Request":"PublicAddress","Params":{"Entities":[{"Tag":"unit-u-0"}]}}
[LOG] 0:02.222 DEBUG juju.state.apiserver -> [AC] unit-u-0 753.759us {"RequestId":169,"Response":{"Results":[{"Error":null,"Result":"public.address.example.com"}]}} Uniter[""].PublicAddress
[LOG] 0:02.222 DEBUG juju.state.apiserver <- [AC] unit-u-0 {"RequestId":170,"Type":"Uniter","Request":"PrivateAddress","Params":{"Entities":[{"Tag":"unit-u-0"}]}}
[LOG] 0:02.223 DEBUG juju.state.apiserver -> [AC] unit-u-0 638.854us {"RequestId":170,"Response":{"Results":[{"Error":nu...

Revision history for this message
John A Meinel (jameinel) wrote :

I don't see much difference between the two, the only difference I see in the log is that in the failing case:

[LOG] 0:02.719 INFO juju.worker.uniter committing "config-changed" hook
ctx.hooksCompleted: []string{"install", "config-changed", "start", "upgrade-charm", "config-changed"}
step 8:

uniter_test.waitUnit{status:"started", info:"", data:params.StatusData(nil), charm:1, resolved:""}
[LOG] 0:02.763 INFO juju.worker.uniter committed "config-changed" hook
[LOG] 0:02.763 DEBUG juju.worker.uniter ModeConfigChanged exiting
[LOG] 0:02.763 INFO juju.worker.uniter ModeContinue starting
[LOG] 0:02.763 INFO juju.worker.uniter continuing after "config-changed" hook
[LOG] 0:02.763 DEBUG juju.worker.uniter ModeContinue exiting
[LOG] 0:02.763 INFO juju.worker.uniter ModeAbide starting
[LOG] 0:02.763 WARNING juju.worker.uniter.charm no current staging repo
[LOG] 0:02.764 DEBUG juju.state.apiserver <- [AC] unit-u-0 {"RequestId":175,"Type":"Uniter","Request":"SetStatus","Params":{"Entities":[{"Tag":"unit-u-0","Status":"started","Info":"","Data":null}]}}
[LOG] 0:02.769 DEBUG juju.state.apiserver -> [AC] unit-u-0 5.198803ms {"RequestId":175,"Response":{"Results":[{"Error":null}]}} Uniter[""].SetStatus
[LOG] 0:02.769 DEBUG juju.worker.uniter.filter want forced upgrade false
[LOG] 0:02.769 DEBUG juju.worker.uniter.filter no new charm event
want resolved mode "", got "no-hooks"; still waiting
want resolved mode "", got "no-hooks"; still waiting

You can see that once we hit "committing" it notices that config-changed is done, and then some stuff starts happening during step 8. while in the successful case:
[LOG] 0:00.996 INFO juju.worker.uniter committing "config-changed" hook
[LOG] 0:00.997 INFO juju.worker.uniter committed "config-changed" hook
[LOG] 0:00.997 DEBUG juju.worker.uniter ModeConfigChanged exiting
[LOG] 0:00.997 INFO juju.worker.uniter ModeContinue starting
[LOG] 0:00.997 INFO juju.worker.uniter continuing after "config-changed" hook
[LOG] 0:00.997 DEBUG juju.worker.uniter ModeContinue exiting
[LOG] 0:00.997 INFO juju.worker.uniter ModeAbide starting
[LOG] 0:00.997 WARNING juju.worker.uniter.charm no current staging repo
[LOG] 0:00.998 DEBUG juju.state.apiserver <- [2] unit-u-0 {"RequestId":188,"Type":"Uniter","Request":"SetStatus","Params":{"Entities":[{"Tag":"unit-u-0","Status":"started","Info":"","Data":null}]}}
[LOG] 0:01.001 DEBUG juju.state.apiserver -> [2] unit-u-0 3.295044ms {"RequestId":188,"Response":{"Results":[{"Error":null}]}} Uniter[""].SetStatus
[LOG] 0:01.001 DEBUG juju.worker.uniter.filter want forced upgrade false
[LOG] 0:01.001 DEBUG juju.worker.uniter.filter no new charm event
ctx.hooksCompleted: []string{"install", "config-changed", "start", "upgrade-charm", "config-changed"}
step 8:

We seem to not notice until after the ModeConfigChanged has already exited and we've set the status of the unit to "started".

On line 335 of modes.go:
   if e := u.f.ClearResolved(); e != nil {
    return nil, e
   }

Calling ClearResolved actually sets the field to ResolvedNode (aka "") rather than ResolvedHooks.

Actually, maybe I have context there in the next comment.

Revision history for this message
John A Meinel (jameinel) wrote :
Download full text (8.9 KiB)

It may be that looking at the tail of the log is the wrong place, because I *do* see ClearResolved early on in the successful case, and I do *not* see it in the failing case:
FAIL
step 6:

uniter_test.resolveError{resolved:"no-hooks"}
[LOG] 0:02.073 DEBUG juju.state.apiserver <- [AC] unit-u-0 {"RequestId":162,"Type":"Uniter","Request":"WatchServiceRelations","Params":{"Entities":[{"Tag":"service-u"}]}}
[LOG] 0:02.074 DEBUG juju.state.apiserver -> [AC] unit-u-0 1.415215ms {"RequestId":162,"Response":{"Results":[{"StringsWatcherId":"25","Changes":[],"Error":null}]}} Uniter[""].WatchServiceRelations
[LOG] 0:02.075 DEBUG juju.worker.uniter.filter no new charm event
[LOG] 0:02.075 DEBUG juju.worker.uniter.filter got config change
[LOG] 0:02.075 DEBUG juju.worker.uniter.filter preparing new config event
[LOG] 0:02.075 DEBUG juju.worker.uniter.filter got relations change
[LOG] 0:02.075 DEBUG juju.state.apiserver <- [AC] unit-u-0 {"RequestId":163,"Type":"StringsWatcher","Id":"25","Request":"Next","Params":{}}

step 7:

uniter_test.waitHooks{"upgrade-charm", "config-changed"}
waiting for hooks: []string{"install", "config-changed", "start", "upgrade-charm", "config-changed"}
ctx.hooksCompleted: []string{"install", "config-changed", "start"}
[LOG] 0:02.087 DEBUG juju.state.apiserver -> [AC] unit-u-0 50.837196ms {"RequestId":145,"Response":{}} NotifyWatcher["20"].Next
[LOG] 0:02.087 DEBUG juju.worker.uniter.filter got unit change
[LOG] 0:02.087 DEBUG juju.state.apiserver <- [AC] unit-u-0 {"RequestId":164,"Type":"NotifyWatcher","Id":"20","Request":"Next","Params":{}}
[LOG] 0:02.087 DEBUG juju.state.apiserver <- [AC] unit-u-0 {"RequestId":165,"Type":"Uniter","Request":"Life","Params":{"Entities":[{"Tag":"unit-u-0"}]}}
[LOG] 0:02.088 DEBUG juju.state.apiserver -> [AC] unit-u-0 414.94us {"RequestId":165,"Response":{"Results":[{"Life":"alive","Error":null}]}} Uniter[""].Life
[LOG] 0:02.088 DEBUG juju.state.apiserver <- [AC] unit-u-0 {"RequestId":166,"Type":"Uniter","Request":"Resolved","Params":{"Entities":[{"Tag":"unit-u-0"}]}}
[LOG] 0:02.088 DEBUG juju.state.apiserver -> [AC] unit-u-0 372.007us {"RequestId":166,"Response":{"Results":[{"Error":null,"Mode":"no-hooks"}]}} Uniter[""].Resolved
[LOG] 0:02.093 DEBUG juju.worker.uniter.charm preparing to deploy charm "cs:quantal/wordpress-1"
ctx.hooksCompleted: []string{"install", "config-changed", "start"}
[LOG] 0:02.136 DEBUG juju.worker.uniter.charm deploying charm "cs:quantal/wordpress-1"
[LOG] 0:02.139 DEBUG juju.worker.uniter.charm finishing deploy of charm "cs:quantal/wordpress-1"
[LOG] 0:02.178 INFO juju.worker.uniter charm "cs:quantal/wordpress-1" is deployed
ctx.hooksCompleted: []string{"install", "config-changed", "start"}
[LOG] 0:02.219 DEBUG juju.worker.uniter ModeUpgrading cs:quantal/wordpress-1 exiting
[LOG] 0:02.219 INFO juju.worker.uniter ModeContinue starting
[LOG] 0:02.219 INFO juju.worker.uniter found queued "upgrade-charm" hook

SUCCESS
step 7:

uniter_test.waitHooks{"upgrade-charm", "config-changed"}
waiting for hooks: []string{"install", "config-changed", "start", "upgrade-charm", "config-changed"}
ctx.hooksCompleted: []string{"install", "config-changed", "start"}
[LOG] 0:00.861 DEBUG juju.s...

Read more...

Revision history for this message
John A Meinel (jameinel) wrote :

=== modified file 'worker/uniter/modes.go'
--- worker/uniter/modes.go 2014-04-18 10:33:27 +0000
+++ worker/uniter/modes.go 2014-04-24 08:33:10 +0000
@@ -6,6 +6,7 @@
 import (
  stderrors "errors"
  "fmt"
+ "time"

  "launchpad.net/tomb"

@@ -86,6 +87,7 @@
   return ModeInstalling(curl), nil
  } else if u.s.Op == Upgrade {
   logger.Infof("resuming charm upgrade")
+ time.Sleep(5 * time.Second)
   return ModeUpgrading(curl), nil
  }
  panic(fmt.Errorf("unhandled uniter operation %q", u.s.Op))

=== modified file 'worker/uniter/uniter_test.go'

William and I paired on this, and it looks like just your normal race. The Uniter is shut down and woken up again, but we don't actually wait to see that it has tried to deploy (since the state before was failing, and the state after is supposed to be failing).
Which means that we then fixUpgradeError{}, and it is possible for the Uniter to try to deploy after we've fixed the error, in which case it never fails and never waits to call ClearResolved().

The above patch demonstrates the failure by forcing the actual upgrade attempt to wait longer.

We're working on a fix now.

John A Meinel (jameinel)
Changed in juju-core:
assignee: John A Meinel (jameinel) → William Reade (fwereade)
Go Bot (go-bot)
Changed in juju-core:
status: In Progress → Fix Committed
Revision history for this message
John A Meinel (jameinel) wrote :

The test was different in 1.18, so we don't need to backport this.

Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.