juju doesn't close connections to machines it thinks are dead
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Canonical Juju | Triaged | Low | Unassigned | |
Bug Description
We had a large deployment (~250 machines on MAAS) that was killed all at once with "juju destroy-model". Looking at the output of that command, it slowly reaps a bunch of applications, and then finally starts marking machines as going away, and then marks the model as dead.
Looking at the database, there are no more machine records, nor a model record.
However, it seems that something went wrong with actually shutting those machines down: MAAS reported that ~200 machines were still running.
Watching the API, we saw several calls being hammered:
- LeadershipClai
- Metrics.
- RetryStrategy.
- Upgrader.SetTools
The last one, Upgrader.SetTools, is the first call the Upgrader worker makes when it starts up. So the belief is that we had an active API connection whose associated machine/unit records had already been deleted; the Upgrader therefore got an error trying to SetTools for a machine that no longer existed, and the worker bounced, restarted, and tried again.
However, if we had actually dropped the TCP connections, we would have expected all of those agents to be trying to Login as machines that no longer exist, so we would expect to see Login calls rather than worker API calls.
This hints that we aren't closing active connections to machines that we have otherwise treated as Dead. Those connections likely can't do much, because most requests check auth and would find that the caller owns nothing, since the machine it thinks it is no longer exists. But it does seem cleaner to get rid of them.
Do you have a repro scenario? Could you easily test it on the latest Juju version?
Since there have been a few changes in this area since the original report was filed, I wonder whether it is still relevant.
I'll mark this as Incomplete until we get a confirmation.