Activity log for bug #1768064

Date Who What changed Old value New value Message
2018-04-30 15:38:51 Dmitriy Kropivnitskiy bug added bug
2018-04-30 15:39:30 Dmitriy Kropivnitskiy description (the old and new values are identical except that "bug in Juju where when it is trying" was corrected to "bug in Juju when it is trying"; a reproduction sketch based on this description follows the log below). New value:

Looks like there is some basic "order of actions" bug in Juju when it is trying to terminate multiple AWS instances. I have seen this happen with both the destroy-model and the remove-unit commands. It seems that the instance gets terminated before juju marks the machine as stopped (I can observe the instance being terminated in the AWS console while the machine is still marked as "started" in juju status), so juju repeatedly tries to communicate with a dead instance. As a result, shutting down even a single instance takes a long time, since juju does a lot of retries.

A few specifics of my setup should be noted. I am using an existing VPC, so I bootstrapped my controller with vpc-id-force=true. I have set up multiple spaces (two: public and private) and my machines are spread between them (this does not seem to make any difference; the issue happens to machines in either space). Not sure if it matters, but I am also using "instance-type" constraints. The Juju version is 2.3.7 on both the controller and the model.

The model is as follows: 1 machine is a t2.small running easyrsa and kubernetes-master, and 3 machines are t2.large running 3 units of etcd and 3 units of kubernetes-worker, all tied together with flannel, using the latest charms from "containers" for everything. This should be fairly easy to replicate; once I am done bringing my cluster back up, I will try to create a minimal repeatable setup for this issue.
2018-07-09 23:31:38 Anastasia bug task added juju
2018-07-09 23:31:43 Anastasia bug task deleted juju-core
2018-07-10 12:14:37 Anastasia juju: status New → Triaged
2018-07-10 12:14:40 Anastasia juju: importance Undecided → Medium
2018-07-10 12:14:47 Anastasia tags (none) → usability
2022-11-03 16:51:45 Canonical Juju QA Bot juju: importance Medium → Low
2022-11-03 16:51:46 Canonical Juju QA Bot tags usability → expirebugs-bot usability
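For reference, a minimal reproduction sketch based on the setup described in the 2018-04-30 description above. It is an illustration, not the reporter's exact commands: the controller and model names, region, VPC ID (vpc-0abc123), subnet CIDRs, and the cs:~containers/* charm names are placeholders or assumptions, and placement and charm relations are simplified.

    # Bootstrap into an existing VPC (region and VPC ID are placeholders):
    juju bootstrap aws/us-east-1 aws-ctrl --config vpc-id=vpc-0abc123 --config vpc-id-force=true

    # Model and network layout roughly matching the report (CIDRs are placeholders):
    juju add-model k8s --config vpc-id=vpc-0abc123 --config vpc-id-force=true
    juju add-space private 172.31.16.0/20
    juju add-space public 172.31.0.0/20

    # Workload from the report: 1 x t2.small (easyrsa + kubernetes-master),
    # 3 x t2.large (etcd + kubernetes-worker), flannel throughout.
    # Placement is simplified; relate the charms as in the standard
    # Canonical Kubernetes bundle (relations omitted here).
    juju deploy cs:~containers/kubernetes-master --constraints instance-type=t2.small
    juju deploy cs:~containers/easyrsa --to 0
    juju deploy cs:~containers/kubernetes-worker -n 3 --constraints instance-type=t2.large
    juju deploy cs:~containers/etcd -n 3 --to 1,2,3
    juju deploy cs:~containers/flannel

    # Teardown paths where the reporter observed the ordering problem:
    juju remove-unit kubernetes-worker/0
    juju status        # machine may still show "started" after the EC2 instance is gone
    juju destroy-model k8s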