Canonical Juju

Juju messes up when terminating AWS instances

Bug #1768064 reported by Dmitriy Kropivnitskiy on 2018-04-30

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Canonical Juju	Triaged	Low	Unassigned

Bug Description

There appears to be a basic "order of actions" bug in Juju when it terminates multiple AWS instances. I have seen this happen with both the destroy-model and remove-unit commands. It seems that the instance gets terminated before Juju marks the machine as stopped (I can see the instance being terminated in the AWS console while the machine is still marked as "started" in juju status), so Juju repeatedly tries to communicate with a dead instance. As a result, shutting down even a single instance takes a long time, since Juju does a lot of retries.

There are a few specifics to my setup that should be noted. I am using an existing VPC, so I bootstrapped my controller with vpc-id-force=true. I have set up multiple spaces (two, public and private) and my machines are spread between them (this does not seem to make any difference, though; the issue I am describing happens to machines in either space). Not sure if this matters, but I am using "instance-type" constraints. The Juju version is 2.3.7 on both the controller and the model.

The model I am using is as follows: 1 machine is a t2.small that runs easyrsa and kubernetes-master, and 3 machines are t2.large, running 3 units of etcd and 3 units of kubernetes-worker. Everything is tied together with flannel. I am using the latest charms from "containers" for everything.

This should be fairly easy to replicate, but once I am done bringing my cluster back up, I will try to create a minimal repeatable setup for this issue.



Dmitriy Kropivnitskiy (nigde) on 2018-04-30

description: updated


Anastasia (anastasia-macmood) wrote on 2018-07-09:

I am moving this bug under the "juju" project for triaging. "juju-core" is dedicated exclusively to the Juju 1.x series.

no longer affects:	juju-core


Anastasia (anastasia-macmood) wrote on 2018-07-10:

I had a closer look at your description, and it is by design that Juju first terminates the cloud instance and then eventually marks the machine as terminated in Juju.

I do, however, agree that the lag and the retries are not necessary when we can deterministically decide that the machine needs to be marked as stopped.

We have recently introduced a way for providers to use a predefined set of callbacks that are relevant to the context in which a cloud call is being made. This seems to be a perfect case for adding a callback that allows machines to be marked as stopped on successful instance termination.
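
To illustrate the idea, here is a rough sketch of what such a callback could look like. This is not Juju's actual provider code: every name in it (CloudClient, CallContext, OnInstanceTerminated, Machine, StopInstances, fakeCloud) is hypothetical and invented for this example, and the cloud is simulated in memory. It only shows the ordering being proposed: the machine record is updated as soon as the cloud confirms termination, and is left alone when the termination call fails.

```go
package main

import (
	"errors"
	"fmt"
)

// CloudClient is a hypothetical stand-in for a provider's cloud API.
type CloudClient interface {
	TerminateInstance(instanceID string) error
}

// CallContext is a hypothetical bundle of callbacks handed to a cloud call.
type CallContext struct {
	// OnInstanceTerminated runs once the cloud confirms termination.
	OnInstanceTerminated func(instanceID string)
}

// Machine is a simplified, hypothetical model-side machine record.
type Machine struct {
	InstanceID string
	Status     string
}

// StopInstances terminates each instance and fires the callback only for
// instances whose termination succeeded. It returns the first error seen.
func StopInstances(client CloudClient, ctx CallContext, ids ...string) error {
	var firstErr error
	for _, id := range ids {
		if err := client.TerminateInstance(id); err != nil {
			if firstErr == nil {
				firstErr = fmt.Errorf("terminating %s: %w", id, err)
			}
			continue
		}
		if ctx.OnInstanceTerminated != nil {
			ctx.OnInstanceTerminated(id)
		}
	}
	return firstErr
}

// markStopped builds a context whose callback flips the machine to "stopped".
func markStopped(machines map[string]*Machine) CallContext {
	return CallContext{
		OnInstanceTerminated: func(id string) {
			if m, ok := machines[id]; ok {
				m.Status = "stopped"
			}
		},
	}
}

// fakeCloud simulates a cloud in which selected instances fail to terminate.
type fakeCloud struct{ failing map[string]bool }

func (f fakeCloud) TerminateInstance(id string) error {
	if f.failing[id] {
		return errors.New("simulated cloud error")
	}
	return nil
}

func main() {
	machines := map[string]*Machine{
		"i-aaa": {InstanceID: "i-aaa", Status: "started"},
		"i-bbb": {InstanceID: "i-bbb", Status: "started"},
	}
	cloud := fakeCloud{failing: map[string]bool{"i-bbb": true}}
	err := StopInstances(cloud, markStopped(machines), "i-aaa", "i-bbb")
	fmt.Println(machines["i-aaa"].Status, machines["i-bbb"].Status, err != nil)
}
```

With a callback along these lines, nothing would need to keep retrying against an instance that the cloud has already confirmed as gone.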

Changed in juju:
status:	New → Triaged
importance:	Undecided → Medium
tags:	added: usability


Canonical Juju QA Bot (juju-qa-bot) wrote on 2022-11-03:

This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance:	Medium → Low
tags:	added: expirebugs-bot
