Intermittent panic: rescanned document
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| juju | | High | Unassigned | |
| juju-core | | High | Unassigned | |
| 1.22 | | Critical | Dimiter Naydenov | |
| 1.23 | | Critical | William Reade | |
| 1.24 | | Critical | William Reade | |
| 1.25 | | High | Unassigned | |
Bug Description
Both a gated landing job and a utopic unit test run on 1.23 failed in the apiserver/client tests, but passed on subsequent runs. The panic is similar to bug 1318366, which was fixed in 1.21.
Occurred in the wild in private bug #1452221.
<http://
<http://
panic: rescanned document misses transaction in queue
goroutine 2460 [running]:
runtime.
/usr/lib/
gopkg.in/
/home/
gopkg.in/
/home/
gopkg.in/
/home/
gopkg.in/
/home/
gopkg.in/
/home/
gopkg.in/
/home/
gopkg.in/
/home/
gopkg.in/
/home/
gopkg.in/
/home/
gopkg.in/
/home/
gopkg.in/
/home/
github.
/home/
github.
/home/
github.
/home/
github.
/home/
github.
/home/
github.
/home/
reflect.
/usr/lib/
reflect.
/usr/lib/
github.
/home/
github.
/home/
github.
/home/
created by github.
/home/
goroutine 10 [chan receive]:
gopkg.in/
/home/
gopkg.in/
/home/
gopkg.in/
/home/
gopkg.in/
/home/
gopkg.in/
/home/
github.
/home/
github.
/home/
github.
/home/
testing.
/usr/lib/
created by testing.RunTests
/usr/lib/
goroutine 2447 [chan receive]:
github.
/home/
github.
/home/
github.
/home/
github.
/home/
github.
/home/
github.
/home/
created by github.
/home/
FAIL github.
| Martin Packman (gz) wrote : | #1 |
| Changed in juju-core: | |
| importance: | High → Medium |
| Eric Snow (ericsnowcurrently) wrote : | #2 |
This is likely due to a unit test I added as part of the fix for lp:1447846. I'm going to ask Menno to take a look.
| Eric Snow (ericsnowcurrently) wrote : | #3 |
Actually, the issue appears to pre-date my patch. It *could* still be related though.
| Martin Packman (gz) wrote : | #4 |
There is exactly one other occurrence of this panic in CI history to date, on a precise test run:
<http://
In all cases, there is some evidence that the machine was unhealthy at the time (suffering from disk errors or similar), so Juju is likely not the culprit but just behaving badly in a failure case.
| Changed in juju-core: | |
| importance: | Medium → Low |
| Menno Finlay-Smits (menno.smits) wrote : | #5 |
Based on where the panic has occurred, when it started occurring, and what's in the fix for lp:1447846, I don't think the two are related.
| description: | updated |
| Changed in juju-core: | |
| assignee: | nobody → William Reade (fwereade) |
| importance: | Low → Critical |
| milestone: | none → 1.25.0 |
| William Reade (fwereade) wrote : | #6 |
Latest state: I have confirmed a race, and I am also reasonably convinced that the proposed mgo/txn patch does not necessarily render it bulletproof. I am now determining which is the less risky path: [close this door, leave the panic, and hope that legitimate reasons to hit that state are rare enough] or [replace the panic with `goto RetryDoc`, and hope that there are no illegitimate reasons to hit that state].
Probably best to goto RetryDoc, with a retry limit, but still thinking/
| Dimiter Naydenov (dimitern) wrote : | #7 |
Just a heads up, William asked me to ensure the next 1.22 release has the patched version of mgo (about to be released today).
| Dimiter Naydenov (dimitern) wrote : | #8 |
Proposed https:/
| John A Meinel (jameinel) wrote : | #9 |
Did this get into master? I'm guessing so, but this bug doesn't reflect that.
| Dimiter Naydenov (dimitern) wrote : | #10 |
No, it doesn't appear so, as the dependencies.tsv on master has the older mgo revision:
gopkg.in/mgo.v2 git dc255bb679efa27
1.22 and 1.24, by contrast, have the most recent revision:
gopkg.in/mgo.v2 git 01ee097136da162
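A comparison like the one above can be scripted. The following is a minimal sketch, assuming each dependencies.tsv entry is a whitespace-separated line of import path, VCS, and revision, as quoted in this comment; `pinnedRevision` is a hypothetical helper, not part of juju, and the revision strings are the abbreviated ones shown above.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// pinnedRevision scans dependencies.tsv-style text and returns the revision
// recorded for pkg. It assumes the first three whitespace-separated fields
// are: import path, VCS name, revision.
func pinnedRevision(deps, pkg string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(deps))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 3 && fields[0] == pkg {
			return fields[2], true
		}
	}
	return "", false
}

func main() {
	// Entries as quoted in the thread (revisions are abbreviated there).
	master := "gopkg.in/mgo.v2 git dc255bb679efa27\n"
	release := "gopkg.in/mgo.v2 git 01ee097136da162\n"

	m, _ := pinnedRevision(master, "gopkg.in/mgo.v2")
	r, _ := pinnedRevision(release, "gopkg.in/mgo.v2")
	fmt.Println("master pins the same mgo revision as the release branch:", m == r)
}
```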
| Changed in juju-core: | |
| importance: | Critical → High |
| Changed in juju-core: | |
| milestone: | 1.25-alpha1 → 1.25-beta1 |
| Changed in juju-core: | |
| milestone: | 1.25-beta1 → 1.26-alpha1 |
| assignee: | William Reade (fwereade) → nobody |
| Martin Packman (gz) wrote : | #11 |
Master and 1.25 include the mgo version with the changes for this issue. CI has not seen a repeat since the last report.
| Changed in juju-core: | |
| status: | Triaged → Fix Released |
| milestone: | 1.26-alpha1 → none |
| affects: | juju-core → juju |
| Changed in juju-core: | |
| importance: | Undecided → High |
| status: | New → Fix Released |


The earliest record we have of this in CI is pre-1.23-alpha1, so this is neither a recent regression nor one we seem to hit very often.
<http://reports.vapour.ws/releases/2327/job/run-unit-tests-utopic-amd64/attempt/1509>