backup-restore failed creating collection EOF

Bug #1605653 reported by Curtis Hovey
This bug affects 2 people
Affects         Status        Importance  Assigned to   Milestone
Canonical Juju  Fix Released  Critical    Tim Penhey
juju-ci-tools   Fix Released  High        Curtis Hovey

Bug Description

As seen in
    http://reports.vapour.ws/releases/issue/57922732749a5624aac9f7b8

backup-restore failed creating collection juju.blockdevices

22T12:59:33.908+0000 Failed: juju.blockdevices: error creating collection juju.blockdevices: error running create command: EOF;

or

Failed: presence.presence.pings: error creating collection presence.presence.pings: error running create command: EOF;
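
For anyone scanning restore logs for this failure, the two lines above share one shape. The following is a minimal sketch of a matcher for them; the pattern and the `failed_collection` helper are hypothetical illustrations, not the rule used by the CI issue page.

```python
import re

# Hypothetical pattern built only from the two failure lines quoted in this
# report; the real CI matching rule is not shown here.
CREATE_EOF = re.compile(
    r"Failed: (?P<ns>[\w.]+): error creating collection (?P=ns): "
    r"error running create command: EOF"
)


def failed_collection(line):
    """Return the collection namespace if the line is a create-EOF failure, else None."""
    match = CREATE_EOF.search(line)
    return match.group("ns") if match else None
```

The backreference requires the namespace to repeat, as it does in both quoted lines, so unrelated "Failed:" lines are not picked up.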

Curtis Hovey (sinzui)
tags: added: xenial
Curtis Hovey (sinzui)
Changed in juju-core:
importance: High → Critical
tags: added: blocker
summary: - backup-restore failed creating collection juju.blockdevices
+ backup-restore failed creating collection EOF
description: updated
Curtis Hovey (sinzui)
Changed in juju-core:
assignee: nobody → William Reade (fwereade)
tags: removed: blocker
Alexis Bruemmer (alexis-bruemmer) wrote :

William, this may be related to lp:1604641

William Reade (fwereade)
Changed in juju-core:
status: Triaged → In Progress
William Reade (fwereade) wrote :

I don't think it is related; and the closest I can see is https://jira.mongodb.org/browse/TOOLS-939 -- which presents slightly differently, but the workaround is something we should be able to use anyway.

I haven't reproduced it, and there might well be something about the backed-up collection sizes that triggers it; do we have the backup artifact from a failed run, by any chance? An unexpectedly huge backup may be pointing to another issue...

http://reviews.vapour.ws/r/5300/
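
A minimal sketch of the kind of workaround discussed in TOOLS-939, assuming it means passing mongorestore's `--batchSize` option so that documents are inserted in smaller batches. The `mongorestore_args` helper and the default of 10 are assumptions for illustration; this is not the code from the review above.

```python
def mongorestore_args(dump_dir, batch_size=10):
    """Build a mongorestore argument list that limits the insert batch size.

    The default batch size is an illustrative assumption, not a value taken
    from this bug report.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return [
        "mongorestore",
        "--batchSize={}".format(batch_size),  # smaller batches per insert
        dump_dir,
    ]
```

The returned list can be passed to `subprocess.check_call`, with any connection and authentication options the target mongod needs added before `dump_dir`.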

William Reade (fwereade)
Changed in juju-core:
status: In Progress → Incomplete
Curtis Hovey (sinzui) wrote :

This issue can also happen dropping a collection.

William Reade (fwereade) wrote :

Above PR has landed; so this *might* be fixed on master. (But if we see it again, it's definitely not...)

Curtis Hovey (sinzui) wrote :

The EOF is still seen when restoring to a xenial host.

Changed in juju-core:
status: Incomplete → Triaged
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta14 → 2.0-beta15
Changed in juju-core:
assignee: William Reade (fwereade) → Alexis Bruemmer (alexis-bruemmer)
Changed in juju-core:
milestone: 2.0-beta15 → 2.0-beta16
Changed in juju-core:
assignee: Alexis Bruemmer (alexis-bruemmer) → nobody
assignee: nobody → Christian Muirhead (2-xtian)
Christian Muirhead (2-xtian) wrote :

Are we still seeing this happening? On http://reports.vapour.ws/releases/issue/57922732749a5624aac9f7b8 the last occurrence is 2016-07-27, which is 3 weeks ago. Are instances not being added to the page, or is it being seen in some other context? It was happening multiple times a day before then.

That said, that last failure is definitely from a revision after Will's commit adding the batch size.

Changed in juju-core:
status: Triaged → Incomplete
affects: juju-core → juju
Changed in juju:
milestone: 2.0-beta16 → none
milestone: none → 2.0-beta16
Anastasia (anastasia-macmood) wrote :

The fix went in

Christopher Lee (veebers) wrote :

https://bugs.launchpad.net/juju/+bug/1606308 is blocking all restore tests so we're not getting a run of the test that will exercise this bug.

Anastasia (anastasia-macmood) wrote :

@Christian
As per Christopher's comment, the bug still exists. The test results you've looked at are not reliable, as these tests have not run recently.

Changed in juju:
status: Incomplete → Triaged
Christian Muirhead (2-xtian) wrote :

Ok, sorry. I'm looking at this again today.

Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.0-beta16 → 2.0-beta17
Nicholas Skaggs (nskaggs) wrote :

I've updated the issue to better catch examples of this type of failure that are very close to the original EOF. It's still happening now in CI; the issue page is recording the new failures better. Have a new look.

Nicholas Skaggs (nskaggs) wrote :

Also, there have been a couple of instances of rather similar restore failures:

http://reports.vapour.ws/releases/issue/57c72994749a56206988839a
http://reports.vapour.ws/releases/issue/57c72a21749a56206988839b

For now I'm assuming this bug covers it. We'll watch and open a new bug if we feel it's a new strain of error not covered here.

Changed in juju:
assignee: Christian Muirhead (2-xtian) → Horacio Durán (hduran-8)
status: Triaged → In Progress
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.0-beta17 → 2.0-beta18
Horacio Durán (hduran-8) wrote :
Changed in juju:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju:
status: Fix Committed → Fix Released
Curtis Hovey (sinzui) wrote :

This bug is still seen by CI. It is not fixed yet. Manual testing shows a restore can succeed. We are investigating whether the test script introduces timing or network issues.

Changed in juju:
status: Fix Released → Incomplete
Changed in juju-ci-tools:
status: New → Triaged
importance: Undecided → High
Changed in juju:
milestone: 2.0-beta18 → 2.0-rc1
Changed in juju:
assignee: Horacio Durán (hduran-8) → nobody
Curtis Hovey (sinzui) wrote :

The assess_recovery script is updated. It verifies that the hosted model is operational when restore reports it is successful. This bug is still observed. As the error is happening within a single juju command (juju restore-backup), there is nothing the test can change to mitigate it: restore on xenial with mongo 3.x fails, unlike restore on trusty with mongo 2.4. The test passes when we use an older juju 2.

Manual tests of a restore of xenial have been observed to report a pass, but that leads to the next bug: juju thinks it is upgrading, which is impossible. If juju were to think it was upgrading during a restore, EOF might be a symptom.

See bug 1606265 (juju thinks it is upgrading, but cannot possibly be upgrading) and bug 1625258 (juju is not permitted to auto upgrade, but it thinks it is).

Tim Penhey (thumper) wrote :

Curtis, this may have been fixed by my latest restore branch. Can we keep an eye on it?

Changed in juju:
milestone: 2.0-rc1 → 2.0-rc2
Tim Penhey (thumper)
Changed in juju:
assignee: nobody → Tim Penhey (thumper)
status: Incomplete → Fix Committed
Curtis Hovey (sinzui)
Changed in juju:
status: Fix Committed → Fix Released
Curtis Hovey (sinzui)
Changed in juju-ci-tools:
assignee: nobody → Curtis Hovey (sinzui)
status: Triaged → Fix Released