Lost action(s) causing juju run to hang?

Bug #1902727 reported by Jake Hill
This bug affects 1 person
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Ian Booth
Milestone: 2.8.7

Bug Description

Similar to https://bugs.launchpad.net/juju/+bug/1729880 and https://bugs.launchpad.net/juju/+bug/1742175, but here I am running 2.8.5.

I was experimenting with scripted series upgrades (using juju run ...) and running stuff like

  watch juju run --machine <list> --timeout 5s -- uptime

Gradually, some of the machines start timing out. It seems that new run actions don't execute at all.

The respective machine log is full of:

2020-11-03 13:50:45 ERROR juju.worker.dependency engine.go:671 "machine-action-runner" manifold worker returned unexpected error: could not retrieve action 32366: action no longer available
2020-11-03 13:52:47 ERROR juju.worker.dependency engine.go:671 "machine-action-runner" manifold worker returned unexpected error: could not retrieve action 32366: action no longer available
2020-11-03 13:54:58 ERROR juju.worker.dependency engine.go:671 "machine-action-runner" manifold worker returned unexpected error: could not retrieve action 32366: action no longer available
2020-11-03 13:57:04 ERROR juju.worker.dependency engine.go:671 "machine-action-runner" manifold worker returned unexpected error: could not retrieve action 32366: action no longer available
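
To see which action IDs the runner keeps failing on, the log lines above can be tallied with a short script. This is only a sketch: the regular expression is derived from the error text quoted in this report, and `missing_action_counts` is a hypothetical helper name, not part of Juju.

```python
import re
from collections import Counter

# Pattern taken from the error text quoted in this bug report.
PATTERN = re.compile(r"could not retrieve action (\d+): action no longer available")


def missing_action_counts(lines):
    """Count how often each action ID is reported as no longer available."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Two of the log lines from the report, used as sample input.
sample = [
    '2020-11-03 13:50:45 ERROR juju.worker.dependency engine.go:671 '
    '"machine-action-runner" manifold worker returned unexpected error: '
    'could not retrieve action 32366: action no longer available',
    '2020-11-03 13:52:47 ERROR juju.worker.dependency engine.go:671 '
    '"machine-action-runner" manifold worker returned unexpected error: '
    'could not retrieve action 32366: action no longer available',
]

for action_id, count in missing_action_counts(sample).items():
    print(action_id, count)
```

A single ID repeating every couple of minutes, as here, suggests the runner is stuck retrying one action that has been removed rather than failing on many different ones.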

Pen Gale (pengale) wrote :

Are the machines in question otherwise responsive? Can you do juju status and see them, juju ssh into them, etc?

If so, then it sounds like we still have a bug in the way that actions are handled. If not, you might be running into a different issue in your cloud, which is manifesting as an issue with actions.

(I've been running a test against my localhost cloud, and have not been able to reproduce the issue. I've only been running the test for a couple of hours, though, and only against one machine.)

Changed in juju:
status: New → Incomplete
Jake Hill (routergod) wrote :

Thanks for looking. Yes, the machines seem ok otherwise.

(osc) routergod@juju:~$ juju run --machine 40 --timeout 10s -- uptime
ERROR timed out waiting for result from: machine 40
(osc) routergod@juju:~$ juju ssh 40 -- uptime 2>/dev/null
 09:55:30 up 23 days, 19:31, 1 user, load average: 9.94, 15.79, 17.18

Pen Gale (pengale)
Changed in juju:
status: Incomplete → Triaged
status: Triaged → New
John A Meinel (jameinel) wrote :

I know that we eventually expire actions, though I thought we wouldn't
expire actions that are not completed. It may just be that enough actions
got queued up that action pressure caused us to remove some from the DB,
but we certainly shouldn't be pruning actions that haven't been run yet.

Ian Booth (wallyworld) wrote :

When we prune by age, we do avoid pruning any actions that have not completed.
However, when we prune by size, we do prune actions regardless of completed status, the oldest first, to get the collection size back under the limit.

The size limit defaults to 5G and is set by the model config max-action-results-size.
It is surprising that the 5G limit would be exceeded, unless there are some actions which have run and which have generated a lot of output.

But I do agree that when pruning by size, we should not start with only completed actions.

Ian Booth (wallyworld) wrote :

That should read:

But I do agree that when pruning by size, we should start with only completed actions.
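
The difference between the pruning policies discussed in this thread can be illustrated with a small model. This is a minimal sketch under stated assumptions, not Juju's actual implementation: the `Action` record, its fields, and all three function names are hypothetical, and the "completed only" variant assumes pending actions are never removed, whereas the comment above only says pruning should start with completed actions.

```python
from dataclasses import dataclass


@dataclass
class Action:
    id: int
    enqueued: int    # logical timestamp; smaller means older (illustrative)
    size: int        # size of stored results (illustrative units)
    completed: bool


def prune_by_age(actions, now, max_age):
    """Drop completed actions older than max_age; keep every incomplete one."""
    return [a for a in actions
            if not (a.completed and now - a.enqueued > max_age)]


def prune_by_size_oldest_first(actions, max_size):
    """Behaviour described in the thread: drop oldest actions, completed
    or not, until the total size is back under the limit."""
    kept = sorted(actions, key=lambda a: a.enqueued)
    total = sum(a.size for a in kept)
    while kept and total > max_size:
        total -= kept.pop(0).size
    return kept


def prune_by_size_completed_only(actions, max_size):
    """Assumed fix: drop oldest completed actions only, leaving pending ones."""
    total = sum(a.size for a in actions)
    pruned = set()
    for a in sorted(actions, key=lambda a: a.enqueued):
        if total <= max_size:
            break
        if a.completed:
            pruned.add(a.id)
            total -= a.size
    return [a for a in actions if a.id not in pruned]


actions = [
    Action(id=1, enqueued=1, size=40, completed=False),  # oldest, still pending
    Action(id=2, enqueued=2, size=40, completed=True),
    Action(id=3, enqueued=3, size=40, completed=True),
]

print([a.id for a in prune_by_size_oldest_first(actions, max_size=80)])    # [2, 3]
print([a.id for a in prune_by_size_completed_only(actions, max_size=80)])  # [1, 3]
print([a.id for a in prune_by_age(actions, now=10, max_age=5)])            # [1]
```

In the oldest-first case the pending action 1 is removed before it runs, which matches the "action no longer available" symptom in the machine log.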

Changed in juju:
milestone: none → 2.8.7
status: New → Triaged
importance: Undecided → High
Ian Booth (wallyworld) wrote :
Changed in juju:
assignee: nobody → Ian Booth (wallyworld)
status: Triaged → In Progress
Ian Booth (wallyworld)
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released