remove-machine --force does not remove lxd container

Bug #1808034 reported by Xav Paice on 2018-12-11
This bug affects 1 person
Affects: juju
Importance: Medium
Assigned to: Christian Muirhead

Bug Description

When I run `juju remove-machine --force $machine` for an LXD container on a MAAS-deployed host, the machine disappears from the model but the LXD container itself is not removed and keeps running.

The controller is running 2.5-rc1 and the model 2.5-beta3.

Tim Penhey (thumper) wrote :

After investigation, we think this was a bug in the 1.25 upgrade process. When the LXC containers were converted to LXD containers they got new IDs, e.g. juju-machine-1-lxd-0 -> juju-958f87-1-lxd-0, but the instanceid field in the instanceData collection wasn't updated to reflect that. So when the LXD provisioner on the machine tries to stop the container, it is told that the container doesn't exist.

From the logs:

2018-12-09 21:07:12 INFO juju.provisioner provisioner_task.go:565 machine 1/lxd/9 already started as instance "juju-958f87-1-lxd-9"
2018-12-09 21:07:12 INFO juju.provisioner provisioner_task.go:565 machine 1/lxd/0 already started as instance "juju-machine-1-lxd-0"
2018-12-09 21:07:12 INFO juju.provisioner provisioner_task.go:565 machine 1/lxd/1 already started as instance "juju-machine-1-lxd-1"
2018-12-09 21:07:12 INFO juju.provisioner provisioner_task.go:565 machine 1/lxd/2 already started as instance "juju-machine-1-lxd-2"
2018-12-09 21:07:12 INFO juju.provisioner provisioner_task.go:565 machine 1/lxd/3 already started as instance "juju-machine-1-lxd-3"
2018-12-09 21:07:12 INFO juju.provisioner provisioner_task.go:565 machine 1/lxd/4 already started as instance "juju-machine-1-lxd-4"
2018-12-09 21:07:12 INFO juju.provisioner provisioner_task.go:565 machine 1/lxd/5 already started as instance "juju-machine-1-lxd-5"
2018-12-09 21:07:12 INFO juju.provisioner provisioner_task.go:565 machine 1/lxd/6 already started as instance "juju-machine-1-lxd-6"
2018-12-09 21:07:12 INFO juju.provisioner provisioner_task.go:282 provisioner-harvest-mode is set to destroyed; unknown instances not stopped [juju-958f87-1-lxd-5 juju-958f87-1-lxd-6 juju-958f87-1-lxd-0 juju-958f87-1-lxd-1 juju-958f87-1-lxd-2 juju-958f87-1-lxd-3 juju-958f87-1-lxd-4]
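
To illustrate why the renamed containers show up as "unknown instances" in the last log line, here is a minimal sketch. It is not Juju's provisioner code; the function name findUnknownInstances and the two sample arrays are hypothetical, with the IDs taken from the log above.

```javascript
// Hypothetical sketch, not Juju's actual provisioner logic.
// recorded: instance IDs stored in the database (mostly still the old LXC-style names).
// running: container names actually present on the host after conversion to LXD.
function findUnknownInstances(recorded, running) {
  const known = new Set(recorded);
  // A running container is "unknown" when no recorded instance ID matches its name.
  return running.filter((name) => !known.has(name));
}

const recorded = [
  "juju-958f87-1-lxd-9",   // already in the new format
  "juju-machine-1-lxd-0",  // stale, old format
  "juju-machine-1-lxd-1",  // stale, old format
];
const running = [
  "juju-958f87-1-lxd-9",
  "juju-958f87-1-lxd-0",
  "juju-958f87-1-lxd-1",
];

const unknown = findUnknownInstances(recorded, running);
console.log(unknown); // [ 'juju-958f87-1-lxd-0', 'juju-958f87-1-lxd-1' ]
```

Because the stale names never match a real container, a request to stop "juju-machine-1-lxd-0" finds nothing, while the real container is treated as unknown and left running.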

tags: added: 1.25-upgrade
Changed in juju:
status: New → Triaged
importance: Undecided → Medium
Tim Penhey (thumper) wrote :

NOTE: the fix isn't in Juju itself, but in the 1.25 upgrade process.

Chris Sanders (chris.sanders) wrote :

Subscribed field-medium

Xav Paice (xavpaice) wrote :

When upgrading from Juju 1 to Juju 2 on a cloud that hosts LXD containers, the upgrade process leaves the instanceid field in the instanceData collection with the wrong value. This needs to be fixed before you can remove a unit that is an LXD container. It can be corrected as follows:

Get a connection to the Juju database. This is a handy script to do so:

##################

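#!/bin/bash
# Usage: <script> [machine] [model]
# Defaults to machine 0 of the controller model.

machine=${1:-0}
model=${2:-controller}

# Commands to run on the controller machine: read the agent's credentials
# from agent.conf and open a mongo shell on the juju database.
# (read returns non-zero at end of input; that is expected here.)
read -d '' -r cmds <<'EOF'
conf=/var/lib/juju/agents/machine-*/agent.conf
user=$(sudo grep tag $conf | cut -d' ' -f2)
password=$(sudo grep statepassword $conf | cut -d' ' -f2)
/usr/lib/juju/mongo*/bin/mongo 127.0.0.1:37017/juju --authenticationDatabase admin --ssl --sslAllowInvalidCertificates --username "$user" --password "$password"
EOF

juju ssh -m "$model" "$machine" "$cmds"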

##################

Once in the database:

juju:PRIMARY> db.instanceData.find( {'instanceid': { $regex: /juju-machine.*/ }})

If that returns records like the one below, you have this issue:
{ "_id" : "371812c9-f9e2-4da0-8eec-326a53958f87:0/lxd/0", "machineid" : "0/lxd/0", "instanceid" : "juju-machine-0-lxd-0", "model-uuid" : "371812c9-f9e2-4da0-8eec-326a53958f87", "arch" : "amd64", "txn-revno" : NumberLong(2), "txn-queue" : [ ] }

Do a dry run to confirm that the instanceids can be changed correctly:

juju:PRIMARY> db.instanceData.find( {'instanceid': { $regex: /juju-machine.*/ }}).forEach( function(doc) { var modelSuffix = doc["model-uuid"].substr(30,6); var newID = "juju-" + modelSuffix + doc["instanceid"].substr(12); print(newID); })

juju-958f87-5-lxd-5
juju-958f87-5-lxd-6
juju-958f87-12-lxd-0
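
The rename performed in the dry run can also be tried outside the mongo shell. The following is a small sketch of the same string manipulation; the function name newInstanceId is hypothetical, slice is used in place of substr, and the sample document is the record shown earlier in this comment.

```javascript
// Standalone sketch of the rename from the dry run; not part of Juju.
function newInstanceId(doc) {
  // Characters 30..35 are the last six characters of the 36-character model UUID.
  const modelSuffix = doc["model-uuid"].slice(30, 36);
  // Drop the 12-character "juju-machine" prefix, keeping "-<machine>-lxd-<n>".
  return "juju-" + modelSuffix + doc["instanceid"].slice(12);
}

const sampleDoc = {
  "_id": "371812c9-f9e2-4da0-8eec-326a53958f87:0/lxd/0",
  "machineid": "0/lxd/0",
  "instanceid": "juju-machine-0-lxd-0",
  "model-uuid": "371812c9-f9e2-4da0-8eec-326a53958f87",
};

const renamed = newInstanceId(sampleDoc);
console.log(renamed); // juju-958f87-0-lxd-0
```

Note that the function assumes the old ID starts with exactly "juju-machine"; it should only be applied to documents matched by the regex query.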

Check that the names printed match the names in 'lxc list' on the host machines, e.g. "juju ssh 5 'sudo lxc list'". If they all match:

Make a Juju backup. Store it locally, rather than just within Juju.

Stop the controller process on all the controller machines (e.g. the service named jujud-machine-0 on machine 0 of the controller model).

Run the update:

juju:PRIMARY> db.instanceData.find( {'instanceid': { $regex: /juju-machine.*/ }}).forEach(
    function(doc) {
        var modelSuffix = doc["model-uuid"].substr(30,6);
        var newID = "juju-" + modelSuffix + doc["instanceid"].substr(12);
        print("updating "+ doc._id + " instanceid to " + newID);
        db.instanceData.update({_id: doc._id}, {$set: {"instanceid": newID}})
    })

Start controller services again.
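
The two checks in this procedure (comparing the dry-run names with 'lxc list', and confirming afterwards that no old-style IDs remain) could be sketched as below. This is only an illustration: the function names are hypothetical, and the input arrays are assumed to be collected by hand from the dry-run output, 'lxc list', and a query on instanceData.

```javascript
// Hypothetical helpers for checking the fix; not part of Juju.

// Names printed by the dry run that have no matching container on the host.
function missingOnHost(expectedNames, hostContainerNames) {
  const present = new Set(hostContainerNames);
  return expectedNames.filter((name) => !present.has(name));
}

// Documents whose instanceid still starts with the old "juju-machine-" prefix.
function stillLegacy(docs) {
  return docs.filter((doc) => /^juju-machine-/.test(doc.instanceid));
}

// Sample data: dry-run names for machine 5 and an assumed 'lxc list' result.
const expected = ["juju-958f87-5-lxd-5", "juju-958f87-5-lxd-6"];
const onHost = ["juju-958f87-5-lxd-5", "juju-958f87-5-lxd-6"];
const missing = missingOnHost(expected, onHost);
console.log(missing); // [] means every new name exists on the host

// Sample data: assumed instanceData contents after the update.
const docsAfterUpdate = [
  { _id: "model:5/lxd/5", instanceid: "juju-958f87-5-lxd-5" },
  { _id: "model:5/lxd/6", instanceid: "juju-958f87-5-lxd-6" },
];
const leftover = stillLegacy(docsAfterUpdate);
console.log(leftover.length); // 0 means no old-style IDs remain
```

A non-empty result from either check suggests stopping and investigating before going further.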

Ian Booth (wallyworld) on 2019-02-19
Changed in juju:
assignee: nobody → Christian Muirhead (2-xtian)
status: Triaged → Fix Committed