remove-unit on last unit doesn't always remove machine
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Canonical Juju | Fix Released | Medium | Unassigned | | |
Bug Description
As seen here:
http://
If all lxc containers are removed from a machine, a subsequent remove-unit may or may not delete the machine. This appears to depend on whether the containers were completely removed when remove-unit was invoked.
I've attached a script to reproduce the issue. By default, it fails to remove machine 0. If WAIT_LXD is set to "true", it waits until the container is removed before trying to remove the machine, and succeeds.
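For readers without the attachment, the polling step that appears in the traces as wait_for_null can be sketched roughly as follows. This is a minimal reconstruction, not the attached script. It assumes the helper queries `juju status --format json` through jq. The `query_status` wrapper and the `TIMEOUT` and `POLL_INTERVAL` variables are hypothetical names introduced here for illustration.

```shell
# Hypothetical sketch of a wait_for_null-style helper; not the attached script.

# Assumption: the status document comes from `juju status --format json`
# and is filtered with jq, which prints "null" when the queried key is absent.
query_status() {
    juju status --format json | jq "$1"
}

# Poll until the query evaluates to null or the deadline passes.
# TIMEOUT and POLL_INTERVAL (seconds) are assumed names with assumed defaults.
wait_for_null() {
    local query="$1"
    local timeout="${TIMEOUT:-300}"
    local interval="${POLL_INTERVAL:-5}"
    local deadline=$(( $(date +%s) + timeout ))

    while [ "$(date +%s)" -lt "$deadline" ]; do
        if [ "$(query_status "$query")" = "null" ]; then
            echo
            echo "Query $query went null."
            return 0
        fi
        # One progress dot per poll while the entry still exists.
        printf .
        sleep "$interval"
    done
    echo
    return 1
}
```

A caller might then write `wait_for_null '.machines."0"' || echo 'FAILURE: machine 0 was not removed.'`. Keeping the status query in its own function makes it easy to swap in a stub when trying the loop without a live model.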
Example failure:
$ juju add-model container-fun7; ./containers_
+ juju deploy ubuntu
Located charm "cs:ubuntu-10".
Deploying charm "cs:ubuntu-10".
+ sleep 1
+ juju add-machine lxd:0
created container 0/lxd/0
+ juju remove-machine 0/lxd/0
+ echo 'Waiting for removal of machine 0/lxd/0'
Waiting for removal of machine 0/lxd/0
+ '[' false == true ']'
+ juju remove-unit ubuntu/0
+ echo 'Waiting for removal of machine 0.'
Waiting for removal of machine 0.
+ wait_for_null '.machines."0"'
++ date +%s
+ deadline=1491590952
+ set +x
.......
FAILURE: machine 0 was not removed.
Model Controller Cloud/Region Version
container-fun7 container-fun2 aws/us-west-1 2.1.2
App Version Status Scale Charm Store Rev OS Notes
ubuntu waiting 0 ubuntu jujucharms 10 ubuntu
Unit Workload Agent Machine Public address Ports Message
Machine State DNS Inst id Series AZ
0 started 54.193.18.186 i-0ca37bfbfd450a1de xenial us-west-1b
Example success:
$ juju add-model container-fun8; WAIT_LXD=true containers_
Using credential 'credentials' cached in controller
Added 'container-fun8' model on aws/us-west-1 with credential 'credentials' for user 'admin'
+ juju deploy ubuntu
Located charm "cs:ubuntu-10".
Deploying charm "cs:ubuntu-10".
+ sleep 1
+ juju add-machine lxd:0
created container 0/lxd/0
+ juju remove-machine 0/lxd/0
+ echo 'Waiting for removal of machine 0/lxd/0'
Waiting for removal of machine 0/lxd/0
+ '[' true == true ']'
+ wait_for_null '.machines.
++ date +%s
+ deadline=1491591412
+ set +x
.......
Query .machines.
Waiting for removal of machine 0.
.......
Query .machines."0" went null.
SUCCESS: machine 0 was removed.
tags: added: teardown
Changed in juju:
status: Fix Committed → Fix Released
So it doesn't seem surprising that if you have a container on a machine,
removing a unit from the machine should not destroy the machine. It also
seems to follow that if you deployed just a container and then removed that
container, you don't want to destroy that machine (you're quite likely to
want to create another container on it). Really, the only time remove-unit
seems like a good fit for removing the machine is if the machine was
provisioned explicitly for that unit and only that unit.
remove-machine does exist, as does remove-machine --force if you wanted to
cascade delete everything on the machine.
I'm open to other feedback but having remove-machine kill the host machine
if it is the last container doesn't feel like the right answer. (I'm not
sure that killing the host machine when removing the last unit when it's
been used for containers is the right thing either, TBH.)
John
=:->
On Apr 7, 2017 11:05 PM, "Aaron Bentley" <email address hidden>
wrote:
> Public bug reported:
>
> As seen here:
> http://
> xenial/attempt/19
>
> If all lxc containers are removed from a machine, a subsequent remove-
> unit may or may not delete the machine. This appears to depend on
> whether the containers were completely removed when remove-unit was
> invoked.
>
> I've attached a script to reproduce the issue. By default, it fails to
> remove machine 0. If WAIT_LXD is set to "true", it waits until the
> container is removed before trying to remove the machine, and succeeds.
>
> Example failure:
> $ juju add-model container-fun7; ./containers_
> + juju deploy ubuntu
> Located charm "cs:ubuntu-10".
> Deploying charm "cs:ubuntu-10".
> + sleep 1
> + juju add-machine lxd:0
> created container 0/lxd/0
> + juju remove-machine 0/lxd/0
> + echo 'Waiting for removal of machine 0/lxd/0'
> Waiting for removal of machine 0/lxd/0
> + '[' false == true ']'
> + juju remove-unit ubuntu/0
> + echo 'Waiting for removal of machine 0.'
> Waiting for removal of machine 0.
> + wait_for_null '.machines."0"'
> ++ date +%s
> + deadline=1491590952
> + set +x
> .......
> .......
> .......
> FAILURE: machine 0 was not removed.
> Model Controller Cloud/Region Version
> container-fun7 container-fun2 aws/us-west-1 2.1.2
>
> App Version Status Scale Charm Store Rev OS Notes
> ubuntu waiting 0 ubuntu jujucharms 10 ubuntu
>
> Unit Workload Agent Machine Public address Ports Message
>
> Machine State DNS Inst id Series AZ
> 0 started 54.193.18.186 i-0ca37bfbfd450a1de xenial us-west-1b
>
> Example success:
> $ juju add-model container-fun8; WAIT_LXD=true containers_
> Using credential 'credentials' cached in controller
> Added 'container-fun8' model on aws/us-west-1 with credential
> 'credentials' for user 'admin'
> + juju deploy ubuntu
> Located charm "cs:ubuntu-10".
> Deploying charm "cs:ubuntu-10".
> + sleep 1
> + juju add-machine lxd:0
> cre...