timeout does not work for juju run when a unit is pending

Bug #1638332 reported by Greg Lutostanski
This bug affects 3 people
Affects  Status        Importance  Assigned to     Milestone
juju     Triaged       High        Unassigned
2.1      Fix Released  High        Andrew Wilkins

Bug Description

http://pastebin.ubuntu.com/23412167/ is my juju status on the local provider (lxd, where nesting is not enabled...). I just did:

$ juju deploy cs:bundle/kubernetes-core-7
$ juju run --all --timeout=4s 'sleep 40'

This takes way longer than either 4 or 40 seconds. It actually *never* times out (it even outlasts the 5 min default).

lutostag@cia:~/work/src/oil$ juju --version
2.0-rc3-yakkety-amd64

Tags: oil-2.0 sts
tags: added: oil-2.0
description: updated
Changed in juju:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.1.0-beta2
Changed in juju:
assignee: nobody → Alexis Bruemmer (alexis-bruemmer)
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.1-beta2 → none
Curtis Hovey (sinzui)
Changed in juju:
milestone: none → 2.1-rc1
Changed in juju:
milestone: 2.1-beta4 → 2.1-rc1
Changed in juju:
assignee: Alexis Bruemmer (alexis-bruemmer) → nobody
Andrew Wilkins (axwalk)
Changed in juju:
status: Triaged → In Progress
assignee: nobody → Andrew Wilkins (axwalk)
Andrew Wilkins (axwalk) wrote :
Andrew Wilkins (axwalk)
Changed in juju:
status: In Progress → Fix Committed
Anastasia (anastasia-macmood) wrote :

PR into develop (2.2) that included this fix: https://github.com/juju/juju/pull/6934

Changed in juju:
milestone: 2.1-rc1 → 2.2.0-alpha1
assignee: Andrew Wilkins (axwalk) → John A Meinel (jameinel)
Felipe Reyes (freyes) wrote :

In 2.1-rc1, --timeout keeps "juju run" from getting stuck, but it does not mark the queued action as "failed", "timeout" or any other state that prevents it from being executed once the agent comes back online.

Here you can see how I reproduced this issue -> http://pastebin.ubuntu.com/24002969/

I would expect that if "juju run" timed out for a given agent, the command will never be executed in the future. In the case of "juju actions" it is expected that they are queued and executed only when possible, but "juju run" is about "I want to run this now" (related bug #1588092).

Felipe Reyes (freyes)
tags: added: sts
Andrew Wilkins (axwalk) wrote :

Felipe, fair call, but unfortunately we're not going to be able to fix that for 2.1. We would need significant changes to support both this and a fix for lp:1588092.

To do this, we'll need:
 - deadlines on actions (independent of timeout, which only applies from the time an action is picked up by the agent)
 - a worker to watch actions, and cancel them when the deadline is reached/passed
 - modifications to the unit agent to react to action cancellation

With those changes in place, fixing this bug would just mean setting the deadline on the actions created to be now+timeout. Fixing lp:1588092 would be a simple matter of exposing a method to cancel actions in the same way the worker described above would, minus the deadline watching.

I'll reopen this for 2.2 for a complete fix.

Changed in juju:
status: Fix Committed → Triaged
assignee: John A Meinel (jameinel) → nobody
milestone: 2.2.0-alpha1 → 2.2.0
Felipe Reyes (freyes) wrote : Re: [Bug 1638332] Re: timeout does not work for juju run when a unit is pending

On Thu, Feb 16, 2017 at 04:12:21AM -0000, Andrew Wilkins wrote:
> Fixing lp:1588092 would be a simple matter of exposing a method to
> cancel actions in the same way the worker described above would, minus
> the deadline watching.

This would be great, since it gives users a tool to clean up
leftover actions themselves until all the rest is added.

> I'll reopen this for 2.2 for a complete fix.

Thanks, I appreciate it.

Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.2-beta1 → 2.2-beta2
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.2-beta2 → 2.2-beta3
Changed in juju:
milestone: 2.2-beta3 → 2.2-beta4
Doug Parrish (dparrish) wrote :

As of 2.2-beta3, I observed the following behavior with a three-machine model and one machine down.

dparrish@maasrr0:/home/dparrish
$ juju run --all -- uname -r
- MachineId: "2"
  Stdout: |
    4.4.0-77-generic
- MachineId: "4"
  Stdout: |
    4.4.0-77-generic

ERROR timed out waiting for result from: machine 3

[ timeout was 5 minutes ]

dparrish@maasrr0:/home/dparrish
$ juju show-action-status --name juju-run
actions:
- id: 37b9c667-f7bb-4498-83bd-fb67dcff2f05
  status: completed
  unit: machine-2
- id: c1e3e42f-4747-495f-8fce-9f93124309f1
  status: pending
  unit: machine-3
- id: 895cf3de-7ea6-47ff-8860-10322dcd6481
  status: completed
  unit: machine-4

[ machine 3 brought back up ]

dparrish@maasrr0:/home/dparrish
$ juju show-action-status --name juju-run
actions:
- id: 37b9c667-f7bb-4498-83bd-fb67dcff2f05
  status: completed
  unit: machine-2
- id: c1e3e42f-4747-495f-8fce-9f93124309f1
  status: completed
  unit: machine-3
- id: 895cf3de-7ea6-47ff-8860-10322dcd6481
  status: completed
  unit: machine-4

[ action for machine-3 now shows completed ]

Where did stdout/stderr get logged, if anywhere? On the controller node, machine-0.log shows the action [initiation?] being logged, but I didn't find any record of it in default-model machine:/var/log/juju/machine-<n>.log.

Other than this, it appears the behavior has changed for the better. Is it fully fixed, as freyes was seeking?

Changed in juju:
milestone: 2.2-beta4 → 2.2-rc1
Tim Penhey (thumper) wrote :

@doug, I don't think the stdout/stderr for actions is actually logged anywhere.

Changed in juju:
milestone: 2.2-rc1 → none