Intermittent unit test failure: juju.worker.dependency

Bug #1781250 reported by Simon Richardson
This bug affects 1 person
Affects: Canonical Juju
Status: Invalid
Importance: Low
Assigned to: Unassigned
Milestone: none listed

Bug Description

Runtime panic (nil pointer dereference) in the firewaller worker loop. See the full logs here: http://ci.jujucharms.com/job/github-check-merge-juju/2267/

The following stacktrace was captured from the logs for posterity:

[LOG] 0:02.726 DEBUG juju.worker.dependency "firewaller" manifold worker stopped: panic resulted in: runtime error: invalid memory address or nil pointer dereference
[LOG] 0:02.726 ERROR juju.worker.dependency "firewaller" manifold worker returned unexpected error: panic resulted in: runtime error: invalid memory address or nil pointer dereference
[LOG] 0:02.726 DEBUG juju.worker.dependency stack trace:
panic resulted in: runtime error: invalid memory address or nil pointer dereference
stacktrace:
goroutine 15754 [running]:
runtime/debug.Stack(0x35a767c, 0x15, 0xc4224fd848)
 /snap/go/2130/src/runtime/debug/stack.go:24 +0xa7
github.com/juju/juju/worker/catacomb.runSafely.func1(0xc4224fdf68)
 /workspace/src/github.com/juju/juju/worker/catacomb/catacomb.go:286 +0xc7
panic(0x2ee7180, 0x54194f0)
 /snap/go/2130/src/runtime/panic.go:502 +0x229
github.com/juju/juju/cmd/jujud/agent.(*minModelWorkersEnviron).Instances(0xc421b00a80, 0x38fed40, 0xc42247e1c0, 0xc4226529e0, 0x1, 0x1, 0xc4224fdac8, 0xc4224fdae8, 0x412b59, 0xc4209f5620, ...)
 <autogenerated>:1 +0x36
github.com/juju/juju/worker/firewaller.(*Firewaller).reconcileInstances(0xc420f2c640, 0xc42213d93a, 0x1)
 /workspace/src/github.com/juju/juju/worker/firewaller/firewaller.go:552 +0x288
github.com/juju/juju/worker/firewaller.(*Firewaller).loop(0xc420f2c640, 0x0, 0xc422130f40)
 /workspace/src/github.com/juju/juju/worker/firewaller/firewaller.go:266 +0x4b6
github.com/juju/juju/worker/firewaller.(*Firewaller).(github.com/juju/juju/worker/firewaller.loop)-fm(0xc400000008, 0x36ce648)
 /workspace/src/github.com/juju/juju/worker/firewaller/firewaller.go:202 +0x2a
github.com/juju/juju/worker/catacomb.runSafely(0xc42210bc40, 0x38fdb20, 0xc422033360)
 /workspace/src/github.com/juju/juju/worker/catacomb/catacomb.go:289 +0x55
github.com/juju/juju/worker/catacomb.Invoke.func3(0x0, 0x0)
 /workspace/src/github.com/juju/juju/worker/catacomb/catacomb.go:115 +0x70
gopkg.in/tomb%2ev2.(*Tomb).run(0xc420f2c640, 0xc421a6e100)
 /workspace/src/gopkg.in/tomb.v2/tomb.go:153 +0x2b
created by gopkg.in/tomb%2ev2.(*Tomb).Go
 /workspace/src/gopkg.in/tomb.v2/tomb.go:149 +0xb9
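
A note on reading the trace above: the panicking frame, (*minModelWorkersEnviron).Instances at <autogenerated>:1, is the kind of frame Go reports for a forwarding method generated for an embedded field. One way such a frame can panic is when the embedded value is a nil interface. The sketch below reproduces that shape in isolation. It is a minimal illustration, not juju code: the Environ interface, the wrapper type, and this runSafely helper are hypothetical stand-ins, and whether the real minModelWorkersEnviron embeds a nil interface in this test run is an assumption that the trace alone does not confirm.

```go
package main

import "fmt"

// Environ is a hypothetical stand-in for the interface whose
// Instances method the firewaller ends up calling.
type Environ interface {
	Instances(ids []string) ([]string, error)
}

// wrapper embeds the interface. Go generates a forwarding Instances
// method on *wrapper; in stack traces it shows up as <autogenerated>.
type wrapper struct {
	Environ
}

// runSafely calls f and converts a panic into an error. This is a
// simplified assumption about what a helper like the one in the trace
// does, not a copy of the catacomb implementation.
func runSafely(f func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic resulted in: %v", r)
		}
	}()
	return f()
}

func main() {
	w := &wrapper{} // the embedded Environ is left nil
	err := runSafely(func() error {
		// The forwarding method dereferences the nil interface and panics.
		_, err := w.Instances([]string{"0"})
		return err
	})
	fmt.Println(err)
}
```

If this is what happened, the worker itself is behaving as designed (the panic is recovered and reported as an error), and the question becomes why the wrapped value was nil at that point in the test.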

Changed in juju:
status: New → Incomplete
Anastasia (anastasia-macmood) wrote :

@Simon Richardson,

Looking at the log, you were trying to merge into develop. I was chasing down the same problem yesterday.

The attached stack trace, whilst scary, may not be the cause of the problem. At this stage, I am narrowing it down to a test in cmd/juju/application/deploy_test that we had been skipping for a very long time and that I recently unskipped.

In my case, I was also getting issues in TestTearDown coming from mongo and tomb and some workers not releasing the connection (attaching).

The real solution here is to re-write these tests without JujuConnSuite.

Alternatively, we can skip the test again, although I fail to see the value of keeping skipped tests. My preference, in that case, would be to delete it altogether.

I'll have a look at it today.

Changed in juju:
status: Incomplete → In Progress
importance: Undecided → High
assignee: nobody → Anastasia (anastasia-macmood)
Anastasia (anastasia-macmood) wrote :

Actually, on further investigation, these failures are not related to my unskipping tests. A lot of new things have been added to deploy_test recently, and TestTearDown is failing. It has not failed locally on my machine, though.

I still stand by my position that this package's tests need to be rewritten away from JujuConnSuite. I'll work on it.

Changed in juju:
assignee: Anastasia (anastasia-macmood) → Simon Richardson (simonrichardson)
Simon Richardson (simonrichardson) wrote :

So I'm skipping the test that "seems" to be causing the CI issue, but unfortunately it's intermittent, so only time will tell.
 https://github.com/juju/juju/pull/8918/commits/ec595f73202dd65ee842823d0e50b52083c53753

Anastasia (anastasia-macmood) wrote :

PR that among other things skips the test: https://github.com/juju/juju/pull/8918

Anastasia (anastasia-macmood) wrote :

@Simon Richardson (simonrichardson),

On closer examination of the code, I wonder whether two tests are causing the teardown issues in this package:
* (s *CAASModelDeployCharmStoreSuite) TestDeployBundleDevices (bundle_test.go);
* (s *CAASDeploySuite) TestDevices.

Note that CAASModelDeployCharmStoreSuite extends CAASDeploySuite, so they are related. Also, they are the only tests in the whole package that call s.State.Model()... I wonder if this is the cause.

Anastasia (anastasia-macmood) wrote :

Further to the comment above, s.State.Model() is only used in these 2 tests in the whole of cmd/juju package :D

Changed in juju:
assignee: Simon Richardson (simonrichardson) → Yang Kelvin Liu (kelvin.liu)
John A Meinel (jameinel) wrote :

Is it calling state.Model but not cleaning up/releasing the resource?

John
=:->

Anastasia (anastasia-macmood) wrote :

Possibly...

If the reference to the Model is actually needed (I am sure these tests can be rewritten without it), I think the best option would be to access it from JujuConnSuite.Model() [s.Model] rather than from State.Model() [s.State.Model()].
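
To make the hypothesis in the last few comments concrete (a test acquires a handle and never releases it, so teardown fails later and somewhere else), here is a minimal sketch of an acquire/release pattern with a teardown check. Everything in it is hypothetical: the pool type and its acquire and close methods are invented for illustration and are not juju's state or model API. Nothing in this thread confirms that s.State.Model() hands out a handle that needs releasing.

```go
package main

import (
	"fmt"
	"sync"
)

// pool is a hypothetical reference-counted resource pool,
// used only to illustrate the leak hypothesis.
type pool struct {
	mu   sync.Mutex
	refs int
}

// acquire takes a reference and returns a release func that the
// caller is expected to call exactly once; extra calls are no-ops.
func (p *pool) acquire() (release func()) {
	p.mu.Lock()
	p.refs++
	p.mu.Unlock()

	var once sync.Once
	return func() {
		once.Do(func() {
			p.mu.Lock()
			p.refs--
			p.mu.Unlock()
		})
	}
}

// close reports an error if any reference is still held, the way a
// suite teardown might when a test forgot to release something.
func (p *pool) close() error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.refs != 0 {
		return fmt.Errorf("%d reference(s) not released", p.refs)
	}
	return nil
}

func main() {
	p := &pool{}
	release := p.acquire()

	// Closing while a reference is held surfaces the leak.
	fmt.Println("close before release:", p.close())

	release()
	fmt.Println("close after release:", p.close())
}
```

With a pattern like this, a leak shows up at teardown rather than in the test that caused it, which would fit the intermittent TestTearDown failures described above.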

Anastasia (anastasia-macmood) wrote :

Another occurrence of firewaller worker test failure from the original bug description - http://ci.jujucharms.com/job/github-merge-juju/755/

Heather Lanigan (hmlanigan) wrote :

This test is being rewritten with mocks as part of the deploy command refactor.

Harry Pidcock (hpidcock)
Changed in juju:
status: In Progress → Triaged
assignee: Yang Kelvin Liu (kelvin.liu) → nobody
Joseph Phillips (manadart) wrote :

Marked as invalid, as we have not observed this issue on later versions.

Changed in juju:
status: Triaged → Invalid
importance: High → Low