machine continuously retries upgrading agent tools when disk is full

Bug #1807717 reported by Jamon Camisso
This bug affects 1 person
Affects: Canonical Juju
Status: Triaged
Importance: Low
Assigned to: Unassigned
Milestone: none

Bug Description

Logging was added for disk full situations over in LP:1782367. A unit that failed because of disk being full shows this in its machine log as expected:

2018-12-10 14:15:13 ERROR juju.worker.upgrader upgrader.go:223 failed to fetch agent binaries from "https://1.2.3.4:17070/model/d1e28ad1-4b1e-462d-8977-e7feb4c5c694/tools/2.4.7-xenial-amd64": cannot unpack agent binaries: write /tmp/tools-tar358695034: no space left on device

However, the controller logs show this cryptic message:

2018-12-10 13:39:37 ERROR juju.apiserver tools.go:89 failed to send agent binaries: write tcp 1.2.3.4:17070->5.6.7.8:48590: write: broken pipe

The failing fetch appears to keep retrying even though the out-of-space condition is detected on the unit itself. The controller log shows many retries:

grep -c 'failed to send agent binaries: write tcp 1.2.3.4:17070->5.6.7.8' /var/log/juju/machine-0.log
209145

So despite detecting the out-of-space condition on the unit/machine, the agent keeps retrying the download and chews up controller resources.

Could the agent be made to stop retrying once it determines there isn't enough free space? Alternatively, could it back off between retries, or fetch only metadata about the new tools version first, rather than attempting the download and then discovering it is out of space?
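
To make the first suggestion concrete, here is a minimal Go sketch of a pre-flight free-space check plus a test for the "no space left on device" error. It is not taken from the Juju source: freeBytes, hasRoom, isDiskFull and the 200 MiB size estimate are hypothetical names and values chosen for illustration, and the check uses syscall.Statfs, so it assumes a Unix-like host.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// freeBytes reports the bytes available to unprivileged users on the
// filesystem that contains path. Hypothetical helper, Unix-only.
func freeBytes(path string) (uint64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, fmt.Errorf("statfs %q: %w", path, err)
	}
	return uint64(st.Bavail) * uint64(st.Bsize), nil
}

// hasRoom is the pure decision: is there at least `need` bytes free?
func hasRoom(free, need uint64) bool {
	return free >= need
}

// isDiskFull reports whether err (or anything it wraps) is ENOSPC,
// i.e. the "no space left on device" seen in the machine log.
func isDiskFull(err error) bool {
	return errors.Is(err, syscall.ENOSPC)
}

func main() {
	// Assumption: a rough upper bound for download plus unpacked binaries.
	const needed = 200 << 20 // 200 MiB

	free, err := freeBytes("/tmp")
	if err != nil {
		fmt.Println("could not check free space:", err)
		return
	}
	if !hasRoom(free, needed) {
		fmt.Printf("skipping download: %d bytes free, %d needed\n", free, needed)
		return
	}
	fmt.Println("enough space; proceeding with download")
}
```

A caller could run the hasRoom check before each fetch attempt, and use isDiskFull on a failed unpack to decide whether an immediate retry is pointless.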

Changed in juju:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.5.1
Tim Penhey (thumper)
tags: added: logging upgrade-juju ux
tags: added: canonical-is
Ian Booth (wallyworld)
Changed in juju:
milestone: 2.5.1 → 2.5.2
John A Meinel (jameinel) wrote :

FWIW, I believe all workers now back off when they fail to operate. So it
will still continue to try, but it will try less frequently the more
failures it encounters.
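
For readers unfamiliar with the pattern, the behaviour described here is commonly implemented as capped exponential backoff. The Go sketch below is a generic illustration, not Juju's actual worker code: backoffDelay, baseDelay and maxDelay are hypothetical names, and the one-second base and five-minute cap are assumed values.

```go
package main

import (
	"fmt"
	"time"
)

// Assumed values for illustration only.
const (
	baseDelay = 1 * time.Second
	maxDelay  = 5 * time.Minute
)

// backoffDelay returns how long to wait after the given number of
// consecutive failures (1-based). The delay doubles with each failure
// and never exceeds maxDelay.
func backoffDelay(failures int) time.Duration {
	if failures < 1 {
		return 0
	}
	d := baseDelay
	for i := 1; i < failures; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	return d
}

func main() {
	// Print the schedule for the first ten consecutive failures.
	for n := 1; n <= 10; n++ {
		fmt.Printf("failure %d: wait %s\n", n, backoffDelay(n))
	}
}
```

With a cap like this the retries never stop entirely, which matches the comment: the agent keeps trying, only less often.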

Changed in juju:
milestone: 2.5.2 → 2.5.3
Changed in juju:
milestone: 2.5.3 → 2.5.4
Changed in juju:
milestone: 2.5.4 → 2.5.5
Changed in juju:
milestone: 2.5.6 → 2.5.8
Changed in juju:
milestone: 2.5.8 → 2.5.9
Anastasia (anastasia-macmood) wrote :

Removing from a milestone as this work will not be done in 2.5 series.

Changed in juju:
milestone: 2.5.9 → none
Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance: High → Low
tags: added: expirebugs-bot