Canonical Juju

Lost action(s) causing juju run to hang?

Bug #1902727 reported by Jake Hill on 2020-11-03


This bug affects 1 person

Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Ian Booth
Milestone: Canonical Juju 2.8.7

Bug Description

Similar to https://bugs.launchpad.net/juju/+bug/1729880 and https://bugs.launchpad.net/juju/+bug/1742175, but here I am running Juju 2.8.5.

I was experimenting with scripted series upgrades (using juju run ...) and running commands such as:

watch juju run --machine <list> --timeout 5s -- uptime

Gradually, some of the machines start timing out. On those machines, new run actions don't seem to execute at all.

The affected machine's log is full of entries like:

2020-11-03 13:50:45 ERROR juju.worker.dependency engine.go:671 "machine-action-runner" manifold worker returned unexpected error: could not retrieve action 32366: action no longer available
2020-11-03 13:52:47 ERROR juju.worker.dependency engine.go:671 "machine-action-runner" manifold worker returned unexpected error: could not retrieve action 32366: action no longer available
2020-11-03 13:54:58 ERROR juju.worker.dependency engine.go:671 "machine-action-runner" manifold worker returned unexpected error: could not retrieve action 32366: action no longer available
2020-11-03 13:57:04 ERROR juju.worker.dependency engine.go:671 "machine-action-runner" manifold worker returned unexpected error: could not retrieve action 32366: action no longer available

Pen Gale (pengale) wrote on 2020-11-03:

#1

Are the machines in question otherwise responsive? Can you see them in juju status, juju ssh into them, etc.?

If so, it sounds like we still have a bug in the way actions are handled. If not, you might be running into a different issue in your cloud that is manifesting as a problem with actions.

(I've been running a test against my localhost cloud, and have not been able to reproduce the issue. I've only been running the test for a couple of hours, though, and only against one machine.)

Changed in juju:
status:	New → Incomplete

Jake Hill (routergod) wrote on 2020-11-05:

#2

Thanks for looking. Yes, the machines seem ok otherwise.

(osc) routergod@juju:~$ juju run --machine 40 --timeout 10s -- uptime
ERROR timed out waiting for result from: machine 40
(osc) routergod@juju:~$ juju ssh 40 -- uptime 2>/dev/null
09:55:30 up 23 days, 19:31, 1 user, load average: 9.94, 15.79, 17.18

Pen Gale (pengale) on 2020-11-05

Changed in juju:
status:	Incomplete → Triaged
status:	Triaged → New

John A Meinel (jameinel) wrote on 2020-11-05: Re: [Bug 1902727] Re: Lost action(s) causing juju run to hang?

#3

I know that we eventually expire actions, though I thought we wouldn't expire actions that are not completed. It may just be that enough actions got queued up that action pressure caused us to remove some from the DB, but we certainly shouldn't be pruning actions that haven't been run yet.

Ian Booth (wallyworld) wrote on 2020-11-12:

#4

When we prune by age, we do avoid pruning any actions that have not completed.
However, when we prune by size, we prune actions regardless of completion status, oldest first, until the collection size is back under the limit.

The size limit defaults to 5G and is set by the model config max-action-results-size.
It is surprising that the 5G limit would be exceeded, unless there are some actions which have run and which have generated a lot of output.

But I do agree that when pruning by size, we should not start with only completed actions.

Ian Booth (wallyworld) wrote on 2020-11-12:

#5

That should read:

But I do agree that when pruning by size, we should start with only completed actions.

Changed in juju:
milestone:	none → 2.8.7
status:	New → Triaged
importance:	Undecided → High

Ian Booth (wallyworld) wrote on 2020-11-12:

#6

https://github.com/juju/juju/pull/12305

Changed in juju:
assignee:	nobody → Ian Booth (wallyworld)
status:	Triaged → In Progress

Ian Booth (wallyworld) on 2020-11-12

Changed in juju:
status:	In Progress → Fix Committed

Canonical Juju QA Bot (juju-qa-bot) on 2020-12-14

Changed in juju:
status:	Fix Committed → Fix Released
