Intermittent panic: rescanned document

Bug #1449054 reported by Martin Packman
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Canonical Juju
Fix Released
High
Unassigned
juju-core
Fix Released
High
Unassigned
1.22
Fix Released
Critical
Dimiter Naydenov
1.23
Won't Fix
Critical
William Reade
1.24
Fix Released
Critical
William Reade
1.25
Fix Released
High
Unassigned

Bug Description

Both a gated landing job and a utopic unit test run on 1.23 failed on apiserver/client tests, but passed subsequently. The panic is similar to bug 1318366 which was fixed in 1.21.

Occurred in the wild in private bug #1452221.

<http://juju-ci.vapour.ws/job/github-merge-juju/2939/consoleFull>
<http://reports.vapour.ws/releases/2564/job/run-unit-tests-utopic-amd64/attempt/1939>

panic: rescanned document misses transaction in queue

goroutine 2460 [running]:
runtime.panic(0xcd51a0, 0xc210194e20)
 /usr/lib/go/src/pkg/runtime/panic.c:266 +0xb6
gopkg.in/mgo.v2/txn.(*flusher).rescan(0xc2103a14e0, 0xc2104d0680, 0x2b07dd89bf00, 0x0, 0x0, ...)
 /home/ubuntu/juju-core_1.23.2/src/gopkg.in/mgo.v2/txn/flusher.go:461 +0xdab
gopkg.in/mgo.v2/txn.(*flusher).prepare(0xc2103a14e0, 0xc2104d0680, 0x0, 0x0, 0x0, ...)
 /home/ubuntu/juju-core_1.23.2/src/gopkg.in/mgo.v2/txn/flusher.go:224 +0x9a
gopkg.in/mgo.v2/txn.(*flusher).advance(0xc2103a14e0, 0xc2104d0680, 0x0, 0x0, 0xc2104d0680, ...)
 /home/ubuntu/juju-core_1.23.2/src/gopkg.in/mgo.v2/txn/flusher.go:161 +0x64
gopkg.in/mgo.v2/txn.(*flusher).recurse(0xc2103a14e0, 0xc2104d0680, 0xc2103a1630, 0xc2104d0680, 0x0)
 /home/ubuntu/juju-core_1.23.2/src/gopkg.in/mgo.v2/txn/flusher.go:134 +0x8d
gopkg.in/mgo.v2/txn.(*flusher).recurse(0xc2103a14e0, 0xc2104fbe80, 0xc2103a1630, 0xc2104fbe80, 0x0)
 /home/ubuntu/juju-core_1.23.2/src/gopkg.in/mgo.v2/txn/flusher.go:148 +0x329
gopkg.in/mgo.v2/txn.(*flusher).recurse(0xc2103a14e0, 0xc2101cc980, 0xc2103a1630, 0xc2101cc980, 0x0)
 /home/ubuntu/juju-core_1.23.2/src/gopkg.in/mgo.v2/txn/flusher.go:148 +0x329
gopkg.in/mgo.v2/txn.(*flusher).recurse(0xc2103a14e0, 0xc2101cc680, 0xc2103a1630, 0xc2101cc680, 0x0)
 /home/ubuntu/juju-core_1.23.2/src/gopkg.in/mgo.v2/txn/flusher.go:148 +0x329
gopkg.in/mgo.v2/txn.(*flusher).recurse(0xc2103a14e0, 0xc21019ee00, 0xc2103a1630, 0x2b07df3d3368, 0x1)
 /home/ubuntu/juju-core_1.23.2/src/gopkg.in/mgo.v2/txn/flusher.go:148 +0x329
gopkg.in/mgo.v2/txn.(*flusher).run(0xc2103a14e0, 0x0, 0x0)
 /home/ubuntu/juju-core_1.23.2/src/gopkg.in/mgo.v2/txn/flusher.go:40 +0x15b
gopkg.in/mgo.v2/txn.flush(0xc2103dff40, 0xc21019ee00, 0x1, 0x1)
 /home/ubuntu/juju-core_1.23.2/src/gopkg.in/mgo.v2/txn/flusher.go:22 +0x1a8
gopkg.in/mgo.v2/txn.(*Runner).Run(0xc2103dff40, 0xc21030ac60, 0x2, 0x2, 0xc210433520, ...)
 /home/ubuntu/juju-core_1.23.2/src/gopkg.in/mgo.v2/txn/txn.go:302 +0x498
github.com/juju/txn.(*transactionRunner).RunTransaction(0xc210bfad20, 0xc21030ac60, 0x2, 0x2, 0x0, ...)
 /home/ubuntu/juju-core_1.23.2/src/github.com/juju/txn/txn.go:169 +0x202
github.com/juju/txn.(*transactionRunner).Run(0xc210bfad20, 0xc210c12140, 0xc2104b4ea0, 0xc210424180)
 /home/ubuntu/juju-core_1.23.2/src/github.com/juju/txn/txn.go:134 +0x19f
github.com/juju/juju/state.(*multiEnvRunner).Run(0xc210bfad50, 0xc210bfacf0, 0x2b07dd8b27f8, 0xc210bfad50)
 /home/ubuntu/juju-core_1.23.2/src/github.com/juju/juju/state/txns.go:114 +0x9d
github.com/juju/juju/state.(*State).run(0xc210313380, 0xc210bfacf0, 0x0, 0x0)
 /home/ubuntu/juju-core_1.23.2/src/github.com/juju/juju/state/txns.go:68 +0x97
github.com/juju/juju/state.(*State).PrepareStoreCharmUpload(0xc210313380, 0xc21052dd70, 0x0, 0x0, 0x0)
 /home/ubuntu/juju-core_1.23.2/src/github.com/juju/juju/state/state.go:1009 +0x26e
github.com/juju/juju/apiserver/client.(*Client).AddCharm(0xc210c05f50, 0xc210526860, 0x16, 0x0, 0x0)
 /home/ubuntu/juju-core_1.23.2/src/github.com/juju/juju/apiserver/client/client.go:1220 +0x145
reflect.Value.call(0x1048ca0, 0xc210c05f50, 0x538, 0x108a9d0, 0x4, ...)
 /usr/lib/go/src/pkg/reflect/value.go:474 +0xe0b
reflect.Value.Call(0x1048ca0, 0xc210c05f50, 0x538, 0xc210c12000, 0x1, ...)
 /usr/lib/go/src/pkg/reflect/value.go:345 +0x9d
github.com/juju/juju/rpc/rpcreflect.func·005(0x1048ca0, 0xc210c05f50, 0x160, 0xedf360, 0xc210c05440, ...)
 /home/ubuntu/juju-core_1.23.2/src/github.com/juju/juju/rpc/rpcreflect/type.go:306 +0x12f
github.com/juju/juju/apiserver.(*srvCaller).Call(0xc21063a040, 0x0, 0x0, 0xedf360, 0xc210c05440, ...)
 /home/ubuntu/juju-core_1.23.2/src/github.com/juju/juju/apiserver/root.go:125 +0xe3
github.com/juju/juju/rpc.(*Conn).runRequest(0xc210529b40, 0x2b07dd8b85d8, 0xc21063a040, 0x12f2cc0, 0x7, ...)
 /home/ubuntu/juju-core_1.23.2/src/github.com/juju/juju/rpc/server.go:552 +0xd5
created by github.com/juju/juju/rpc.(*Conn).handleRequest
 /home/ubuntu/juju-core_1.23.2/src/github.com/juju/juju/rpc/server.go:481 +0x671

goroutine 10 [chan receive]:
gopkg.in/check%2ev1.(*suiteRunner).runTest(0xc210590080, 0xc210efb070, 0xc210d79d20)
 /home/ubuntu/juju-core_1.23.2/src/gopkg.in/check.v1/check.go:799 +0x51
gopkg.in/check%2ev1.(*suiteRunner).run(0xc210590080, 0xc210098780)
 /home/ubuntu/juju-core_1.23.2/src/gopkg.in/check.v1/check.go:604 +0x257
gopkg.in/check%2ev1.Run(0x1082f00, 0xc210098780, 0xc21009f1c0, 0xc21018d870)
 /home/ubuntu/juju-core_1.23.2/src/gopkg.in/check.v1/run.go:92 +0x49
gopkg.in/check%2ev1.RunAll(0xc21009f1c0, 0x1086490)
 /home/ubuntu/juju-core_1.23.2/src/gopkg.in/check.v1/run.go:84 +0xc1
gopkg.in/check%2ev1.TestingT(0xc21017eea0)
 /home/ubuntu/juju-core_1.23.2/src/gopkg.in/check.v1/run.go:72 +0x205
github.com/juju/testing.MgoTestPackage(0xc21017eea0, 0xc2100fcfc0)
 /home/ubuntu/juju-core_1.23.2/src/github.com/juju/testing/mgo.go:353 +0x105
github.com/juju/juju/testing.MgoTestPackage(0xc21017eea0)
 /home/ubuntu/juju-core_1.23.2/src/github.com/juju/juju/testing/mgo.go:15 +0x34
github.com/juju/juju/apiserver/client_test.TestAll(0xc21017eea0)
 /home/ubuntu/juju-core_1.23.2/src/github.com/juju/juju/apiserver/client/api_test.go:31 +0x27
testing.tRunner(0xc21017eea0, 0x21469b0)
 /usr/lib/go/src/pkg/testing/testing.go:391 +0x8b
created by testing.RunTests
 /usr/lib/go/src/pkg/testing/testing.go:471 +0x8b2

goroutine 2447 [chan receive]:
github.com/juju/juju/rpc.(*Conn).Call(0xc210529be0, 0x108af10, 0x6, 0x0, 0x1086490, ...)
 /home/ubuntu/juju-core_1.23.2/src/github.com/juju/juju/rpc/client.go:148 +0xb5
github.com/juju/juju/api.(*State).APICall(0xc21060e4e0, 0x108af10, 0x6, 0x0, 0x1086490, ...)
 /home/ubuntu/juju-core_1.23.2/src/github.com/juju/juju/api/apiclient.go:305 +0x115
github.com/juju/juju/api/base.facadeCaller.FacadeCall(0x108af10, 0x6, 0x0, 0x2b07dd8b88b8, 0xc21060e4e0, ...)
 /home/ubuntu/juju-core_1.23.2/src/github.com/juju/juju/api/base/caller.go:62 +0xbb
github.com/juju/juju/api/base.(*clientFacade).FacadeCall(0xc21021ae00, 0x10af150, 0x8, 0xedf360, 0xc210314410, ...)
 /home/ubuntu/juju-core_1.23.2/src/github.com/juju/juju/api/base/caller.go:1 +0x95
github.com/juju/juju/api.(*Client).AddCharm(0xc210bfaf90, 0xc21052d410, 0x0, 0x404eac)
 /home/ubuntu/juju-core_1.23.2/src/github.com/juju/juju/api/client.go:787 +0xcf
github.com/juju/juju/apiserver/client_test.func·008(0x1)
 /home/ubuntu/juju-core_1.23.2/src/github.com/juju/juju/apiserver/client/client_test.go:3499 +0x82
created by github.com/juju/juju/apiserver/client_test.(*clientSuite).TestAddCharmConcurrently
 /home/ubuntu/juju-core_1.23.2/src/github.com/juju/juju/apiserver/client/client_test.go:3503 +0x286

FAIL github.com/juju/juju/apiserver/client 17.134s

Revision history for this message
Martin Packman (gz) wrote :

The earliest record we have of this in CI is pre-1.23-alpha1 so this is not a recent regression, or one we seem to hit very often.

<http://reports.vapour.ws/releases/2327/job/run-unit-tests-utopic-amd64/attempt/1509>

Curtis Hovey (sinzui)
Changed in juju-core:
importance: High → Medium
Revision history for this message
Eric Snow (ericsnowcurrently) wrote :

This is likely due to a unit test I added as part of the fix for lp:1447846. I'm going to ask Menno to take a look.

Revision history for this message
Eric Snow (ericsnowcurrently) wrote :

Actually, the issue appears to pre-date my patch. It *could* still be related though.

Revision history for this message
Martin Packman (gz) wrote :

There is exactly one other occurrence of this panic in CI history to date, on a precise test run:

<http://reports.vapour.ws/releases/2463/job/run-unit-tests-precise-amd64/attempt/2541>

In all cases, there is some evidence that the machine was unhealthy at the time (suffering from disk errors or similar), so Juju is likely not the culprit but just behaving badly in a failure case.

Changed in juju-core:
importance: Medium → Low
Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

Based on where the panic has occurred and when it started occurring, and what's in the fix for lp:1447846, I don't think the 2 are related.

John A Meinel (jameinel)
description: updated
Changed in juju-core:
assignee: nobody → William Reade (fwereade)
importance: Low → Critical
milestone: none → 1.25.0
Revision history for this message
William Reade (fwereade) wrote :

Latest state: confirmed that I found a race, also reasonably convinced that the proposed mgo/txn patch does not necessarily render it bulletproof. Determining which is the least risky path out of [close this door, leave panic, hope that legitimate reasons to hit that state are rare enough] and [replace panic with `goto RetryDoc`, hope that there are no illegitimate reasons to hit that state].

Probably best to goto RetryDoc, with a retry limit, but still thinking/discussing.

Revision history for this message
Dimiter Naydenov (dimitern) wrote :

Just a heads up, William asked me to ensure the next 1.22 release has the patched version of mgo (about to be released today).

Revision history for this message
Dimiter Naydenov (dimitern) wrote :

Proposed https://github.com/juju/juju/pull/2457, which updates mgo.v2 revision for juju 1.22 to the latest (see http://blog.labix.org/2015/05/29/mgo-r2015-05-29). Hopefully this will resolve this issue.

Revision history for this message
John A Meinel (jameinel) wrote :

Did this get into master? I'm guessing so, but this bug doesn't reflect that.

Revision history for this message
Dimiter Naydenov (dimitern) wrote :

No, it doesn't appear so as the dependencies.tsv on master has the older mgo revision:

gopkg.in/mgo.v2 git dc255bb679efa273b6544a03261c4053505498a4 2014-07-30T20:00:37Z

Unlike 1.22 and 1.24 which have the most recent revision:
gopkg.in/mgo.v2 git 01ee097136da162d1dd3c9b44fbdf3abf4fd6552 2015-05-29T15:47:11Z

Curtis Hovey (sinzui)
Changed in juju-core:
importance: Critical → High
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.25-alpha1 → 1.25-beta1
Changed in juju-core:
milestone: 1.25-beta1 → 1.26-alpha1
assignee: William Reade (fwereade) → nobody
Revision history for this message
Martin Packman (gz) wrote :

Master and 1.25 include the mgo version with the changes for this issue. CI has not seen a repeat since the last report.

Changed in juju-core:
status: Triaged → Fix Released
milestone: 1.26-alpha1 → none
affects: juju-core → juju
Changed in juju-core:
importance: Undecided → High
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.