Environment destroy can miss manual machines

Bug #1475212 reported by Jesse Meek
This bug affects 3 people
Affects          Status        Importance  Assigned to      Milestone
juju-ci-tools    Fix Released  Critical    Christopher Lee
juju-core        Invalid       Undecided   Unassigned
  1.25           Won't Fix     Undecided   Unassigned

Bug Description

We need to be able to assert the absence of manual machines as part of the destroy transaction in state/environ.go. To do this, manual machines need to add refcounts to their environments, so that they can be handled in a similar fashion to hosted environments.

Curtis Hovey (sinzui)
tags: added: manual-provider
tags: added: destroy-environment
tags: added: tech-debt
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.25.0 → 1.25.1
Andrew Wilkins (axwalk) wrote :

I think we'll change storage so that it doesn't need to be handled here. Persistent storage will be destroyed via the storage provisioner as usual. There was a gap that caused destroy-env to leak persistent storage; it has just been filled, so we can relax the rules of environment destruction.

Andrew Wilkins (axwalk)
summary: - Environment destroy can miss manual machines and persistent
- volumes
+ Environment destroy can miss manual machines
description: updated
Changed in juju-core:
milestone: 1.25.1 → none
Aaron Bentley (abentley)
Changed in juju-core:
milestone: none → 1.26-alpha2
Changed in juju-core:
milestone: 1.26-alpha2 → 1.26.0
Changed in juju-core:
milestone: 1.26.0 → none
Curtis Hovey (sinzui)
tags: added: manual-story
Changed in juju-core:
milestone: none → 2.0.0
Andrew Wilkins (axwalk) wrote :

We've been refcounting machines for a little while now, so we can now automatically destroy manual machines.

Changed in juju-core:
status: Triaged → In Progress
assignee: nobody → Andrew Wilkins (axwalk)
milestone: 2.0.0 → 2.0-beta15
Changed in juju-core:
milestone: 2.0-beta15 → 2.0-beta16
Andrew Wilkins (axwalk)
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → In Progress
Curtis Hovey (sinzui) wrote :

Sorry, issue http://reports.vapour.ws/releases/issue/57b1c1f9749a567693457040 shows that the situation is now worse: upstart and systemd hosts are left with a juju2 process running. The console logs now show more information about which processes and directories were left behind.

Andrew Wilkins (axwalk) wrote :

I've done a bunch of tests, and the agents always uninstall. It looks like there's a delay, though. I'll dig in and see if we can cut that out.

Andrew Wilkins (axwalk) wrote :

Actually there's not even that much of a delay - only 10 seconds between the signal being delivered, and the agent being uninstalled...

Andrew Wilkins (axwalk) wrote :

I was finally able to reproduce it, using the same CI test. It looks like there might be a race that sometimes causes the agent to skip the uninstall logic.

Andrew Wilkins (axwalk) wrote :
Andrew Wilkins (axwalk)
Changed in juju-core:
status: In Progress → Fix Committed
affects: juju-core → juju
Changed in juju:
milestone: 2.0-beta16 → none
milestone: none → 2.0-beta16
Aaron Bentley (abentley) wrote :

We are still seeing this as of 95d0fd04

Aaron Bentley (abentley) wrote :
Changed in juju:
status: Fix Committed → Triaged
tags: added: blocker
Changed in juju:
importance: High → Critical
Tim Penhey (thumper)
Changed in juju:
assignee: Andrew Wilkins (axwalk) → Tim Penhey (thumper)
status: Triaged → In Progress
Aaron Bentley (abentley) wrote :

AIUI, this is won't-fix for 1.25

no longer affects: juju-core
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.0-beta16 → 2.0-beta17
tags: removed: blocker
Tim Penhey (thumper) wrote :

The tests were failing due to stale CI LXD images.

They were running a pre-release LXD.

Updating and upgrading them fixed the amd64 tests.

All other long-standing CI LXD containers need to be updated.

affects: juju → juju-ci-tools
Changed in juju-ci-tools:
milestone: 2.0-beta17 → none
status: In Progress → Triaged
assignee: Tim Penhey (thumper) → nobody
Curtis Hovey (sinzui)
Changed in juju-core:
status: New → Invalid
Curtis Hovey (sinzui) wrote :

All manual tests now update and upgrade the manual container before bootstrapping Juju. Any other failures are now Juju's fault.

Changed in juju-ci-tools:
assignee: nobody → Christopher Lee (veebers)
status: Triaged → Fix Released
no longer affects: juju