Subordinate units won't die until all principal relations are gone

Bug #1686696 reported by William Grant
This bug affects 5 people
Affects          Status         Importance   Assigned to           Milestone
Canonical Juju   Fix Released   High         Christian Muirhead

Bug Description

Juju appears not to destroy any of a subordinate application's units until all of the subordinate application's relations to principal applications are destroyed. This is unfortunate when you want to, e.g., remove nrpe from a service before destroying just that service, in order to avoid false alerts.

Tested on Juju 2.0.2, 2.1.1 and 2.1.2:

juju deploy cs:ubuntu
juju deploy cs:ubuntu ubuntu2
juju deploy cs:nrpe
juju add-relation ubuntu nrpe
juju add-relation ubuntu2 nrpe
# Wait for both subordinate units to deploy.
juju remove-relation ubuntu nrpe
# Notice that both subordinates are still alive.
juju remove-relation ubuntu2 nrpe
# Both subordinates die.

This doesn't affect a single principal application with multiple units; the bug seems to require multiple principal applications.

William Grant (wgrant) wrote :

The removed relation is gone from the DB (though the relationscopes still appear to be present), and adding a new unit to its principal leaves you with inconsistent subordinates across the units of that principal.

Ian Booth (wallyworld)
Changed in juju:
milestone: none → 2.2-rc1
importance: Undecided → High
status: New → Triaged
Changed in juju:
assignee: nobody → Christian Muirhead (2-xtian)
Christian Muirhead (2-xtian) wrote :

After some time digging around in the guts of the uniter, I think the problem is this: when an application is related to two other applications, its units consider themselves to be involved in all of the relations for that application.

This is wrong for subordinate applications: even though the application has multiple relations, a given subordinate unit attached to a principal of one specific application should only consider its relations with that application. This causes the bug because there is a loop at the end of uniter/relation/relations.go:relation.update (https://github.com/juju/juju/blob/develop/worker/uniter/relation/relations.go#L417) that goes through all of the relations and only destroys the unit if all of them are marked as dying. So here, the nrpe-ubuntu2 relation keeps the nrpe unit on the ubuntu machine alive.

My plan to fix this is to change the Uniter API WatchApplicationRelations to accept a unit tag, and if that's a subordinate unit, only signal changes for relations between this application and the principal's application. (But I'm going to run this by some others to confirm my understanding first.)

Christian Muirhead (2-xtian) wrote :

Oops: after the change, the method will be WatchUnitRelations rather than WatchApplicationRelations.

Christian Muirhead (2-xtian) wrote :
Changed in juju:
status: Triaged → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released