remove-machine --force does not remove lxd container

Bug #1808034 reported by Xav Paice
This bug affects 1 person
Affects: Canonical Juju
Status: Fix Released
Importance: Medium
Assigned to: Christian Muirhead
Milestone: (none shown)

Bug Description

When I run `juju remove-machine --force $machine` for an LXD container on a MAAS-deployed host, the machine disappears from the model but the LXD container itself is not removed and keeps running.

This is a controller with 2.5-rc1 and model 2.5-beta3.

Revision history for this message
Tim Penhey (thumper) wrote :

After investigation, we think this was a bug in the 1.25 upgrade process. When the LXC containers were converted to LXD containers they got new IDs, e.g. juju-machine-1-lxd-0 -> juju-958f87-1-lxd-0, but the instanceid values in the instanceData collection weren't updated to match. So when the LXD provisioner on the machine tries to stop the container, it is told that the container doesn't exist.

From the logs:

2018-12-09 21:07:12 INFO juju.provisioner provisioner_task.go:565 machine 1/lxd/9 already started as instance "juju-958f87-1-lxd-9"
2018-12-09 21:07:12 INFO juju.provisioner provisioner_task.go:565 machine 1/lxd/0 already started as instance "juju-machine-1-lxd-0"
2018-12-09 21:07:12 INFO juju.provisioner provisioner_task.go:565 machine 1/lxd/1 already started as instance "juju-machine-1-lxd-1"
2018-12-09 21:07:12 INFO juju.provisioner provisioner_task.go:565 machine 1/lxd/2 already started as instance "juju-machine-1-lxd-2"
2018-12-09 21:07:12 INFO juju.provisioner provisioner_task.go:565 machine 1/lxd/3 already started as instance "juju-machine-1-lxd-3"
2018-12-09 21:07:12 INFO juju.provisioner provisioner_task.go:565 machine 1/lxd/4 already started as instance "juju-machine-1-lxd-4"
2018-12-09 21:07:12 INFO juju.provisioner provisioner_task.go:565 machine 1/lxd/5 already started as instance "juju-machine-1-lxd-5"
2018-12-09 21:07:12 INFO juju.provisioner provisioner_task.go:565 machine 1/lxd/6 already started as instance "juju-machine-1-lxd-6"
2018-12-09 21:07:12 INFO juju.provisioner provisioner_task.go:282 provisioner-harvest-mode is set to destroyed; unknown instances not stopped [juju-958f87-1-lxd-5 juju-958f87-1-lxd-6 juju-958f87-1-lxd-0 juju-958f87-1-lxd-1 juju-958f87-1-lxd-2 juju-958f87-1-lxd-3 juju-958f87-1-lxd-4]
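The mismatch in the log above can be picked out mechanically. The following is a minimal sketch, not Juju code: the function names parseStartedInstances and legacyNamed are hypothetical, and the line format is assumed to be exactly the "already started as instance" format shown in the excerpt.

```javascript
// Hypothetical helpers for illustration; not part of Juju.

// Extract {machine, instance} pairs from provisioner log lines of the form
//   machine 1/lxd/0 already started as instance "juju-machine-1-lxd-0"
function parseStartedInstances(logText) {
    var pattern = /machine (\S+) already started as instance "([^"]+)"/g;
    var entries = [];
    var match;
    while ((match = pattern.exec(logText)) !== null) {
        entries.push({ machine: match[1], instance: match[2] });
    }
    return entries;
}

// Keep only entries whose recorded instance ID still uses the old
// "juju-machine-" prefix. In the excerpt above these are the machines
// whose real containers show up in the "unknown instances" list.
function legacyNamed(entries) {
    return entries.filter(function (entry) {
        return entry.instance.indexOf("juju-machine-") === 0;
    });
}
```

Passing the text of a machine log to parseStartedInstances and the result to legacyNamed gives the machines that are likely affected, under the assumption that the log format matches the excerpt.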

tags: added: 1.25-upgrade
Changed in juju:
status: New → Triaged
importance: Undecided → Medium
Tim Penhey (thumper) wrote :

NOTE: the fix isn't in Juju itself, but in the 1.25 upgrade process.

Chris Sanders (chris.sanders) wrote :

Subscribed field-medium

Xav Paice (xavpaice) wrote :

When upgrading from Juju 1 to Juju 2 on a cloud that hosts LXD containers, the upgrade process updates the instanceid field in the instanceData collection incorrectly. You need to fix this if you want to be able to remove a unit that is an LXD container. This can be corrected as follows:

First, get a connection to the Juju database. This is a handy script to do so:

##################

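#!/bin/bash
# Opens a mongo shell on the Juju database via `juju ssh`.
# Arguments: controller machine number (default 0), model name (default controller).

machine=${1:-0}
model=${2:-controller}

# Commands to run on the remote machine; the quoted 'EOF' prevents local expansion.
read -d '' -r cmds <<'EOF'
conf=/var/lib/juju/agents/machine-*/agent.conf
user=$(sudo grep tag $conf | cut -d' ' -f2)
password=$(sudo grep statepassword $conf | cut -d' ' -f2)
/usr/lib/juju/mongo*/bin/mongo 127.0.0.1:37017/juju --authenticationDatabase admin --ssl --sslAllowInvalidCertificates --username "$user" --password "$password"
EOF

juju ssh -m "$model" "$machine" "$cmds"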

##################

Once in the database:

juju:PRIMARY> db.instanceData.find( {'instanceid': { $regex: /juju-machine.*/ }})

If that returns records like the one below, you have this issue:
{ "_id" : "371812c9-f9e2-4da0-8eec-326a53958f87:0/lxd/0", "machineid" : "0/lxd/0", "instanceid" : "juju-machine-0-lxd-0", "model-uuid" : "371812c9-f9e2-4da0-8eec-326a53958f87", "arch" : "amd64", "txn-revno" : NumberLong(2), "txn-queue" : [ ] }
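Because the corrected name is derived from each record's model-uuid, it can help to see which models are affected before changing anything. This is a minimal sketch under stated assumptions: affectedByModel is a hypothetical helper, and docs is assumed to be an array of instanceData documents shaped like the record above (for example, the result of calling .toArray() on the find() cursor).

```javascript
// Hypothetical helper: group affected machine IDs by model UUID.
// `docs` is assumed to be an array of instanceData documents with
// "model-uuid" and "machineid" fields, as in the sample record.
function affectedByModel(docs) {
    var byModel = {};
    docs.forEach(function (doc) {
        var model = doc["model-uuid"];
        if (!byModel[model]) {
            byModel[model] = [];
        }
        byModel[model].push(doc.machineid);
    });
    return byModel;
}
```

The result maps each model UUID to the list of machine IDs whose instanceid still has the old form.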

Preview the corrected instanceid values (this only prints them and changes nothing):

juju:PRIMARY> db.instanceData.find( {'instanceid': { $regex: /juju-machine.*/ }}).forEach( function(doc) { var modelSuffix = doc["model-uuid"].substr(30,6); var newID = "juju-" + modelSuffix + doc["instanceid"].substr(12); print(newID); })

juju-958f87-5-lxd-5
juju-958f87-5-lxd-6
juju-958f87-12-lxd-0
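The transformation in the preview can also be written as a small pure function, which makes the string arithmetic explicit and easy to compare against container names collected from each host. This is a sketch, not the upgrade fix itself: lxdInstanceId and missingContainers are hypothetical names, and unlike the unanchored regex in the query, the helper only accepts IDs that start with "juju-machine-".

```javascript
// Hypothetical helpers for illustration; the names are not from Juju.

// Compute the LXD-style instance ID for a legacy "juju-machine-..." ID,
// using the last six characters of the model UUID as in the preview above.
// Returns null when the input does not look like a legacy ID or the UUID
// is not the expected 36 characters, so nothing is renamed by accident.
function lxdInstanceId(modelUUID, instanceId) {
    var legacyPrefix = "juju-machine";
    if (typeof modelUUID !== "string" || modelUUID.length !== 36) {
        return null;
    }
    if (typeof instanceId !== "string" ||
        instanceId.indexOf(legacyPrefix + "-") !== 0) {
        return null;
    }
    // Keep everything after "juju-machine", including the leading "-".
    return "juju-" + modelUUID.slice(-6) + instanceId.slice(legacyPrefix.length);
}

// Return the predicted names that are absent from `actualNames`
// (for example, names copied by hand from the host's container list).
function missingContainers(predictedNames, actualNames) {
    return predictedNames.filter(function (name) {
        return actualNames.indexOf(name) === -1;
    });
}
```

For the sample record shown earlier, lxdInstanceId("371812c9-f9e2-4da0-8eec-326a53958f87", "juju-machine-0-lxd-0") returns "juju-958f87-0-lxd-0". An empty result from missingContainers means every predicted name was found on the host.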

Check that the names printed match the container names shown by 'lxc list' on the host machines, e.g. "juju ssh 5 'sudo lxc list'". If they all match, continue:

Make a Juju backup. Store it locally, rather than just within Juju.

Stop the controller process on all the controller machines (for example, the service named jujud-machine-0 on machine 0 of the controller model).

Run the following to apply the update:

juju:PRIMARY> db.instanceData.find( {'instanceid': { $regex: /juju-machine.*/ }}).forEach(
    function(doc) {
        var modelSuffix = doc["model-uuid"].substr(30,6);
        var newID = "juju-" + modelSuffix + doc["instanceid"].substr(12);
        print("updating "+ doc._id + " instanceid to " + newID);
        db.instanceData.update({_id: doc._id}, {$set: {"instanceid": newID}})
    })

Start controller services again.

Christian Muirhead (2-xtian) wrote :
Ian Booth (wallyworld)
Changed in juju:
assignee: nobody → Christian Muirhead (2-xtian)
status: Triaged → Fix Committed
Harry Pidcock (hpidcock)
Changed in juju:
status: Fix Committed → Fix Released