cmd/jujud/agent MachineSuite.TestHostedModelWorkers fails because compute-provisioner never really starts

Bug #1600301 reported by James Tunnicliffe
18
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Canonical Juju
Fix Released
High
Menno Finlay-Smits

Bug Description

When this test passes it is because this has happened:

/home/dooferlad/out3.txt:64652b3e-26e0-40fa-870c-fc6ae6d16397: waiting for [application-scaler charm-revision-updater compute-provisioner environ-tracker firewaller instance-poller is-responsible-flag metric-worker migration-fortress migration-master not-alive-flag not-dead-flag space-importer state-cleaner status-history-pruner storage-provisioner unit-assigner]
/home/dooferlad/out3.txt:[LOG] 0:03.717 DEBUG juju.worker.dependency "compute-provisioner" manifold worker stopped: <nil>
/home/dooferlad/out3.txt:[LOG] 0:03.734 DEBUG juju.worker.dependency "compute-provisioner" manifold worker started
/home/dooferlad/out3.txt:[LOG] 0:03.736 DEBUG juju.worker.dependency "compute-provisioner" manifold worker started
/home/dooferlad/out3.txt:[LOG] 0:03.760 DEBUG juju.worker.dependency "compute-provisioner" manifold worker stopped: failed to process updated machines: failed to get all instances from broker: model is not prepared
/home/dooferlad/out3.txt:[LOG] 0:03.760 ERROR juju.worker.dependency "compute-provisioner" manifold worker returned unexpected error: failed to process updated machines: failed to get all instances from broker: model is not prepared
/home/dooferlad/out3.txt:64652b3e-26e0-40fa-870c-fc6ae6d16397: has workers [agent api-caller api-config-watcher application-scaler charm-revision-updater clock compute-provisioner environ-tracker firewaller instance-poller is-responsible-flag metric-worker migration-fortress migration-master not-alive-flag not-dead-flag space-importer spaces-imported-gate state-cleaner status-history-pruner storage-provisioner unit-assigner]
/home/dooferlad/out3.txt:[LOG] 0:03.769 DEBUG juju.worker.dependency "compute-provisioner" manifold worker stopped: <nil>
/home/dooferlad/out3.txt:[LOG] 0:03.770 DEBUG juju.worker.dependency "compute-provisioner" manifold worker stopped: could not retrieve the controller config.: connection is shut down

The compute-provisioner worker encounters the error "model is not prepared" from provider/dummy/environs.go:

func (e *environ) state() (*environState, error) {
 dummy.mu.Lock()
 defer dummy.mu.Unlock()
 state, ok := dummy.state[e.modelUUID]
 if !ok {
  return nil, errNotPrepared
 }
 return state, nil
}

This may be because there are two UUIDs in play, but only one has been bootstrapped (deadbeef-...)

See http://reports.vapour.ws/releases/issue/57865094749a567e82e1cc5b

Revision history for this message
James Tunnicliffe (dooferlad) wrote :

Have a fix for func (factory *Factory) MakeModel(c *gc.C, params *ModelParams) *state.State -- it now bootstraps the model.

Changed in juju-core:
status: New → In Progress
importance: Undecided → High
assignee: nobody → James Tunnicliffe (dooferlad)
Revision history for this message
James Tunnicliffe (dooferlad) wrote :

Getting a hang about 10% of the time running this test. Better than nearly 90% I guess. It seems to be in provider/dummy/environs.go Reset. A provider of many mutexes.

==================
WARNING: DATA RACE
Read by goroutine 206:
  github.com/juju/juju/apiserver/observer.(*RequestNotifier).ServerReply()
      /home/dooferlad/dev/go/src/github.com/juju/juju/apiserver/observer/request_notifier.go:92 +0x46a
  github.com/juju/juju/apiserver/observer.(*Multiplexer).ServerReply.func1()
      /home/dooferlad/dev/go/src/github.com/juju/juju/apiserver/observer/observer.go:89 +0xa9
  github.com/juju/juju/apiserver/observer.mapConcurrent.func1()
      /home/dooferlad/dev/go/src/github.com/juju/juju/apiserver/observer/observer.go:112 +0x64

Previous write by goroutine 196:
  github.com/juju/juju/apiserver/observer.(*RequestNotifier).ServerRequest()
      /home/dooferlad/dev/go/src/github.com/juju/juju/apiserver/observer/request_notifier.go:63 +0xc9
  github.com/juju/juju/apiserver/observer.(*Multiplexer).ServerRequest.func1()
      /home/dooferlad/dev/go/src/github.com/juju/juju/apiserver/observer/observer.go:94 +0x5c
  github.com/juju/juju/apiserver/observer.mapConcurrent.func1()
      /home/dooferlad/dev/go/src/github.com/juju/juju/apiserver/observer/observer.go:112 +0x64

Goroutine 206 (running) created at:
  github.com/juju/juju/apiserver/observer.mapConcurrent()
      /home/dooferlad/dev/go/src/github.com/juju/juju/apiserver/observer/observer.go:113 +0x159
  github.com/juju/juju/apiserver/observer.(*Multiplexer).ServerReply()
      /home/dooferlad/dev/go/src/github.com/juju/juju/apiserver/observer/observer.go:89 +0x1a4
  github.com/juju/juju/rpc.(*Conn).runRequest()
      /home/dooferlad/dev/go/src/github.com/juju/juju/rpc/server.go:549 +0x46c

Goroutine 196 (finished) created at:
  github.com/juju/juju/apiserver/observer.mapConcurrent()
      /home/dooferlad/dev/go/src/github.com/juju/juju/apiserver/observer/observer.go:113 +0x159
  github.com/juju/juju/apiserver/observer.(*Multiplexer).ServerRequest()
      /home/dooferlad/dev/go/src/github.com/juju/juju/apiserver/observer/observer.go:94 +0xf8
  github.com/juju/juju/rpc.(*Conn).handleRequest()
      /home/dooferlad/dev/go/src/github.com/juju/juju/rpc/server.go:459 +0x98c
  github.com/juju/juju/rpc.(*Conn).loop()
      /home/dooferlad/dev/go/src/github.com/juju/juju/rpc/server.go:402 +0x23a
  github.com/juju/juju/rpc.(*Conn).input()
      /home/dooferlad/dev/go/src/github.com/juju/juju/rpc/server.go:368 +0x40

summary: - cmd/jujud/agent MachineSuite.TestControllerModelWorkers fails because
+ cmd/jujud/agent MachineSuite.TestHostedModelWorkers fails because
compute-provisioner never really starts
Curtis Hovey (sinzui)
tags: added: ci intermittent-failure regression unit-tests
description: updated
Revision history for this message
James Tunnicliffe (dooferlad) wrote :

I haven't seen this recently so marking as invalid.

Changed in juju-core:
status: In Progress → Invalid
Revision history for this message
Andrew Wilkins (axwalk) wrote :

I'm still biting me regularly. I'll reopen when I see it again on master. Last I checked, the dummy provider didn't cater for multi-model properly.

Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

This is happening a lot, especially when the merge tests run on Windows. I've been bitten by it about 5 times today. Taking a look.

Changed in juju-core:
status: Invalid → Triaged
milestone: none → 2.0-beta14
assignee: James Tunnicliffe (dooferlad) → Menno Smits (menno.smits)
status: Triaged → In Progress
Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

I see what's happening.

The test (and probably TestMigratingModelWorkers and TestDyingModelCleanedUp too) is only ever passing by luck. A hosted model is created in the DB but the dummy provider doesn't know about it (and has next to no provision for hosted models) so the compute-provisioner worker comes up and then fails soon after with a "model is not prepared" error. The worker isn't up for long but the test is checking for it in quite a tight loop so it usually happens to see the worker up and the test passes.

Except on Windows where it fails a lot more often because the timings are different...

I don't know a good way of making these tests pass (although I do wonder how necessary they are...) so I'm going to skip them for now, but leave this ticket open.

Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

You can make the problem happen reliably by slowing down the test cycle time. Changing the coretesting.ShortWait to time.Second did the trick for me.

Revision history for this message
William Reade (fwereade) wrote :

ISTM that the tests are *passing* spuriously -- they need to be less trusting that a single valid snapshot implies stability -- and that the real problem is indeed that the dummy provider wasn't updated to handle multi-model properly.

I would *love* for us not to have to touch the dummy provider; does anyone see an alternative?

Revision history for this message
Andrew Wilkins (axwalk) wrote :

@fwereade: have provisioner the worker depend on worker/environ, and use the Environ that it manages? And then inject a simpler/more programmable Environ. The Environ doesn't need to do much to keep the provisioner happy.

Revision history for this message
William Reade (fwereade) wrote :

@axwalk

1) In general, I'd love to extract that dependency... but it should really be "fix every worker that creates its own environ", otherwise we'll just see the same problems popping up elsewhere. (why does the firewaller work? perhaps the dummy provider doesn't do firewalls?) In short: yes, we should do that, but that helps with provisioner unit tests a lot more than it does with agent integration tests.

2) In this precise context, we'd still be injecting it from quite a significant distance. Is it much of a win over, say, patching environs.New? Not that I *want* to do that, but...

Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

This PR makes TestHostedModelWorkers fail reliably as well as cleaning up lots of other stuff: https://github.com/juju/juju/pull/5908

Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

This PR makes the provisioner get its Environ from the environ-tracker insteadf of grabbing one itself: https://github.com/juju/juju/pull/5911

Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta14 → 2.0-beta15
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
affects: juju-core → juju
Changed in juju:
milestone: 2.0-beta15 → none
milestone: none → 2.0-beta15
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.