Juju occasionally kills off its own instances

Bug #1096945 reported by David Owen
This bug affects 1 person
Affects   Status    Importance   Assigned to   Milestone
pyjuju    Triaged   Low          Unassigned

Bug Description

Occasionally, the datacenter hosting my juju environments has internal connectivity problems. Juju detects that instances are no longer reachable and attempts to remove them from their relations. Either during this removal or when the node is re-added once it is reachable again, juju fails to deliver config information to the relation hooks and marks the instance as blocked because of a config error.

Tracebacks in the debug-log look like this:

2013-01-04 23:59:53,474 unit:houston-db/0: hook.output DEBUG: hook database-relation-changed exited, exit code 0.
2013-01-04 23:59:53,486 unit:houston-db/0: hook.executor DEBUG: Hook complete: /var/lib/juju/units/houston-db-0/charm/hooks/database-relation-changed
2013-01-04 23:59:54,069 unit:houston-app/0: twisted ERROR: Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 134, in maybeDeferred
    result = f(*args, **kw)
  File "/usr/lib/python2.7/dist-packages/twisted/protocols/amp.py", line 1035, in doit
    return maybeDeferred(aCallable, **kw).addCallback(
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 134, in maybeDeferred
    result = f(*args, **kw)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1181, in unwindGenerator
    return _inlineCallbacks(None, gen, Deferred())
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1039, in _inlineCallbacks
    result = g.send(result)
  File "/usr/lib/python2.7/dist-packages/juju/hooks/protocol.py", line 311, in config_get
    options = yield context.get_config()
exceptions.AttributeError: 'NoneType' object has no attribute 'get_config'

2013-01-04 23:59:54,255 unit:houston-app/0: unit.lifecycle DEBUG: relation resolved changed
2013-01-04 23:59:54,256 unit:houston-app/0: unit.lifecycle INFO: processing relation resolved changed

As the log shows, one side of the relation updated correctly and the other did not. When this affects the single node in a 1-N relationship, it can bring down the entire service, requiring manual intervention for a transient glitch that would otherwise have cost only a few seconds of downtime.
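
The AttributeError in the traceback suggests that config_get looked up a hook context that was no longer there and got None back. The following is a minimal sketch of that failure mode and of one possible guard. It is not pyjuju's code: the registry, the class, the exception and both function names are hypothetical, and only the method name get_config comes from the traceback.

```python
# Hypothetical illustration only. Apart from the method name get_config,
# which appears in the traceback, nothing here is pyjuju's real API.


class HookContext:
    """Stand-in for the context a hook uses to read its config."""

    def __init__(self, config):
        self._config = config

    def get_config(self):
        return dict(self._config)


class NoSuchContext(Exception):
    """Raised when a hook asks for config but its context is gone."""


# Hypothetical registry of live contexts. houston-app/0 is deliberately
# absent, as if its context had been dropped during the network glitch.
contexts = {"houston-db/0": HookContext({"example-option": "value"})}


def config_get_unguarded(context_id):
    context = contexts.get(context_id)  # None when the context is missing
    return context.get_config()         # AttributeError if context is None


def config_get_guarded(context_id):
    context = contexts.get(context_id)
    if context is None:
        # Fail with a specific error the caller can catch and retry on,
        # rather than an unhandled AttributeError.
        raise NoSuchContext("no hook context for %r" % (context_id,))
    return context.get_config()
```

Whether a missing context should be retried, reported to the hook as a clean failure, or treated as a config error is a design decision for juju. The sketch only shows that the condition can be detected before the attribute access.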

David Owen (dsowen)
description: updated
Curtis Hovey (sinzui)
Changed in juju:
importance: Undecided → Low
status: New → Triaged