Juju messes up when terminating AWS instances
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Canonical Juju | Triaged | Low | Unassigned | |
Bug Description
There appears to be a basic "order of actions" bug in Juju when it terminates multiple AWS instances. I have seen this happen with both the destroy-model command and the remove-unit command. The instance gets terminated before Juju marks the machine as stopped (I can observe the instance being terminated in the AWS console while the machine is still marked as "started" in juju status), so Juju repeatedly tries to communicate with a dead instance. As a result, shutting down even a single instance takes a long time, because Juju retries many times.
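One way to observe the mismatch described above is to compare the instance state that AWS reports with the machine status Juju reports. A minimal sketch, assuming the AWS CLI is configured for the same account and region; the instance ID is a placeholder taken from the "Inst id" column of juju status:

```shell
# Placeholder instance ID; substitute the real one from `juju status`.
INSTANCE_ID=i-0abcd1234example

# AWS's view of the instance: shows "shutting-down" / "terminated"
# once the termination has gone through.
aws ec2 describe-instances --instance-ids "$INSTANCE_ID" \
  --query 'Reservations[].Instances[].State.Name' --output text

# Juju's view of the same machine: in this bug it still reports "started".
juju status --format=yaml | grep -A 2 "$INSTANCE_ID"
```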
There are a few specifics of my setup worth noting. I am using an existing VPC, so I bootstrapped my controller with vpc-id-force=true. I have set up multiple spaces (two, actually: public and private) and my machines are spread between them (this does not seem to make any difference; the issue happens to machines in either space). Not sure if it matters, but I am also using "instance-type" constraints. The Juju version is 2.3.7 on both the controller and the model.
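For anyone trying to reproduce the setup above, the bootstrap and space configuration look roughly like this. A hedged sketch: the controller name, region, VPC ID, and CIDRs are placeholders, not values from the report:

```shell
# Bootstrap into an existing VPC, forcing Juju to accept it.
# vpc-0abc123 is a placeholder; use your own VPC ID and region.
juju bootstrap aws/us-east-1 mycontroller \
  --config vpc-id=vpc-0abc123 \
  --config vpc-id-force=true

# Two spaces, public and private, as described above.
# CIDRs are illustrative; they must match subnets in the VPC.
juju add-space public 172.31.0.0/20
juju add-space private 172.31.16.0/20
```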
The model is as follows: one t2.small machine runs easyrsa and kubernetes-master, and three t2.large machines each run a unit of etcd and a unit of kubernetes-worker. Everything is tied together with flannel, using the latest charms from "containers" for everything.
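The model described above can be sketched as a deploy script. This is an approximation, not the reporter's exact commands: machine numbers, colocation, and relation endpoints are illustrative, and charm revisions are omitted:

```shell
# One t2.small machine running easyrsa and kubernetes-master.
juju deploy cs:~containers/easyrsa --constraints "instance-type=t2.small"
juju deploy cs:~containers/kubernetes-master --to 0

# Three t2.large machines, each with a unit of etcd and kubernetes-worker.
juju deploy cs:~containers/etcd -n 3 --constraints "instance-type=t2.large"
juju deploy cs:~containers/kubernetes-worker -n 3 --to 1,2,3

# flannel is a subordinate charm tying the networking together;
# exact relation endpoints may differ from the charms' current interfaces.
juju deploy cs:~containers/flannel
juju add-relation flannel etcd
juju add-relation flannel kubernetes-master
juju add-relation flannel kubernetes-worker
```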
This should be fairly easy to replicate, but once I am done bringing my cluster back up, I will try to create a minimal repeatable setup for this issue.
I am moving this bug under the "juju" project for triaging. The "juju-core" project is dedicated exclusively to the Juju 1.x series.