A subordinate charm hook scheduled to run (but waiting for the principal charm hook to release the lock) goes to an error state after the principal charm hook triggers a reboot.

Bug #1464470 reported by Adrian Vladu
Affects     Status        Importance  Assigned to     Milestone
juju-core   Fix Released  High        Bogdan Teleaga
1.24        Fix Released  High        Bogdan Teleaga

Bug Description

This scenario needs at least 3 charms:
- one principal and one subordinate charm
- a third charm that provides a service to the principal charm

A relation must exist between the principal and the third charm.

This issue happens only when the principal charm executes the relation hook triggered by the third charm (let's name this hook third-relation-joined-hook) while the subordinate charm has at least one hook in the queue (let's name this hook secondary-relation-hook). secondary-relation-hook must wait for third-relation-joined-hook to release the lock before it can execute.

If third-relation-joined-hook triggers a reboot using the command "juju-reboot --now", then after the subsequent reboot, secondary-relation-hook goes into an error state.

This issue happens on Windows 2012 R2 with Juju versions 1.24 and 1.25.

Adrian Vladu (avladu)
summary: - A subordinate charm hook scheduled to run(but it is waiting for the
- principal charm hook to release the lock) goes to an error state after
- the principal charm triggers a reboot.
+ A subordinate charm hook scheduled to run(but waiting for the principal
+ charm hook to release the lock) goes to an error state after the
+ principal charm hook triggers a reboot.
Revision history for this message
Gabriel Samfira (gabriel-samfira) wrote :

I believe this happens because of the way hooks are run. A hook run has been split into three steps:

* prepare
* execute
* commit

During prepare, the state of the hook is written, but we only take the lock for hook execution in the "execute" phase. So we can have one hook that requires a reboot executing, and another one that has finished the prepare phase and is ready to execute. If we reboot the machine, then when the uniter comes back up it will see the second hook stuck in the prepare phase, have no knowledge of how it got there, and set it to an error state.

I think this can be fixed by simply taking the lock in the prepare phase and releasing it after execute, instead of locking only around execute.

Revision history for this message
Gabriel Samfira (gabriel-samfira) wrote :

Here is a quick and dirty fix for this:

https://github.com/juju/juju/compare/1.24...gabriel-samfira:executor-lock?expand=1

I am unsure how to do this cleanly, or what the deeper implications of locking in "prepare" are. Any advice from @fwreade would be welcome :).

Curtis Hovey (sinzui)
tags: added: reboot subordinate
tags: added: windows
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 1.25.0
Curtis Hovey (sinzui)
tags: added: regression
tags: added: hooks
Revision history for this message
William Reade (fwereade) wrote :

I think this is essentially sound, but that logic is a bit tangly. Can we do it in operation.executor perhaps? (e.g. add a NeedsGlobalMachineLock() bool method to the Operation interface, and check/acquire the lock first of all, before even calling prepare?)
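The suggestion above could look roughly like this. A hedged sketch, not juju's actual implementation: only NeedsGlobalMachineLock is named in the comment, so the Operation method set, the Executor type, and the toy runHookOp below are all illustrative assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// Operation is a hypothetical version of the interface fwereade describes:
// each operation declares up front whether it needs the machine-wide lock.
type Operation interface {
	NeedsGlobalMachineLock() bool
	Prepare() error
	Execute() error
	Commit() error
}

// Executor acquires the lock (when the operation asks for it) before even
// calling Prepare, so no operation can be observed half-prepared after a
// reboot.
type Executor struct {
	machineLock sync.Mutex // stands in for the global machine lock
}

func (e *Executor) Run(op Operation) error {
	if op.NeedsGlobalMachineLock() {
		e.machineLock.Lock()
		defer e.machineLock.Unlock()
	}
	if err := op.Prepare(); err != nil {
		return err
	}
	if err := op.Execute(); err != nil {
		return err
	}
	return op.Commit()
}

// runHookOp is a toy Operation that just records which phases ran.
type runHookOp struct{ phases []string }

func (o *runHookOp) NeedsGlobalMachineLock() bool { return true }
func (o *runHookOp) Prepare() error { o.phases = append(o.phases, "prepare"); return nil }
func (o *runHookOp) Execute() error { o.phases = append(o.phases, "execute"); return nil }
func (o *runHookOp) Commit() error  { o.phases = append(o.phases, "commit"); return nil }

func main() {
	e := &Executor{}
	op := &runHookOp{}
	if err := e.Run(op); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(op.phases) // [prepare execute commit]
}
```

Pushing the decision into the Operation interface keeps the lock-ordering logic in one place (the executor) instead of scattering it through individual hook runners, which is presumably why it was preferred over locking inside prepare itself.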

Changed in juju-core:
assignee: nobody → Gabriel Samfira (gabriel-samfira)
Changed in juju-core:
assignee: Gabriel Samfira (gabriel-samfira) → Bogdan Teleaga (bteleaga)
status: Triaged → In Progress
Revision history for this message
Bogdan Teleaga (bteleaga) wrote :
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released