Juju occasionally kills off its own instances
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
pyjuju | Triaged | Low | Unassigned |
Bug Description
Occasionally, the datacenter hosting my juju environments has internal connectivity problems. Juju will detect that instances are no longer reachable and attempt to remove them from their relations. During this removal, or while re-adding the node once it is reachable again, juju will fail to deliver config information to the relation hooks and will mark the instance as blocked because of a config error.
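To the extent the failure manifests as hooks running with missing relation data (rather than the internal traceback below), a charm-side workaround is to write relation hooks defensively: if `relation-get` returns nothing because data was withdrawn or has not yet been delivered, exit 0 and wait for the next relation-changed event instead of failing. A minimal sketch, assuming a hypothetical `database` setting name and the `database-relation-changed` hook suggested by the `hook database-` line in the log:

```python
#!/usr/bin/env python
# hooks/database-relation-changed -- defensive sketch, not the charm's
# actual hook. The "database" setting name is an assumption; substitute
# whatever keys the relation actually exchanges.
import subprocess
import sys

def relation_get(key):
    """Fetch one relation setting via juju's relation-get hook tool."""
    try:
        return subprocess.check_output(["relation-get", key]).strip()
    except subprocess.CalledProcessError:
        # Treat a failed lookup the same as missing data.
        return b""

def main():
    database = relation_get("database")
    if not database:
        # Data not delivered yet, or withdrawn during a connectivity blip.
        # Exit 0 so the unit does not enter an error state; the hook will
        # fire again when the relation settings change.
        sys.exit(0)
    # ...render config and restart the service here...

if __name__ == "__main__":
    main()
```

Exiting zero on missing data is the conventional charm pattern: relation-changed fires again when the data does arrive, so the hook loses nothing by deferring.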
Tracebacks in the debug-log look like this:
2013-01-04 23:59:53,474 unit:houston-db/0: hook.output DEBUG: hook database-
2013-01-04 23:59:53,486 unit:houston-db/0: hook.executor DEBUG: Hook complete: /var/lib/
2013-01-04 23:59:54,069 unit:houston-app/0: twisted ERROR: Unhandled Error
Traceback (most recent call last):
File "/usr/lib/
result = f(*args, **kw)
File "/usr/lib/
return maybeDeferred(
File "/usr/lib/
result = f(*args, **kw)
File "/usr/lib/
return _inlineCallback
--- <exception caught here> ---
File "/usr/lib/
result = g.send(result)
File "/usr/lib/
options = yield context.
exceptions.
2013-01-04 23:59:54,255 unit:houston-app/0: unit.lifecycle DEBUG: relation resolved changed
2013-01-04 23:59:54,256 unit:houston-app/0: unit.lifecycle INFO: processing relation resolved changed
As you can see, one side of the relation updated correctly while the other did not. When this affects the single node in a 1-N relation, it can bring down the entire service, requiring manual intervention to recover from a transient glitch that would otherwise have caused no more than a few seconds of downtime.
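For concreteness, the manual intervention referred to here is pyjuju's resolved mechanism, which the last two log lines show being processed: something along the lines of `juju resolved --retry houston-app/0 database` marks the failed hook as resolved and retries it. The unit name is taken from the log; the relation name is an assumption inferred from the `hook database-` line, so adjust it to the relation your deployment actually uses.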
description: updated
Changed in juju:
importance: Undecided → Low
status: New → Triaged