Provisioning agent stops watching machine changes in ZK

Reported by Jim Baker on 2011-10-11
This bug affects 1 person
Affects                  Importance   Assigned to
pyjuju                   High         Jim Baker
juju (Ubuntu)            High         Unassigned
juju (Ubuntu Oneiric)    High         Unassigned

Bug Description

The watch callback watch_machine_changes in juju.agents.provision stops working, and the watch is *not* re-established, if the function it calls, process_machines, raises an uncaught exception. One scenario where this happens: txaws attempts to parse a bad payload and raises, for example, a KeyError (one of several errors that have been observed during parsing, along with timeout errors; there may be others).

Once this happens, only periodic_machine_check still runs, so the provisioning agent resyncs only every 60 seconds, which makes it sluggish.
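
A minimal sketch of the failure mode and one way to guard against it, in plain Python rather than juju's Twisted code. It assumes the watch is re-established only after the callback returns normally, which is how the description above reads. OneShotWatch, guard, and make_flaky_processor are hypothetical names used for illustration; this is not necessarily the fix that landed in juju.

```python
import logging

log = logging.getLogger("provision-sketch")


class OneShotWatch:
    """Toy stand-in for a watch that is re-armed only after its callback returns."""

    def __init__(self, callback):
        self.callback = callback
        self.armed = True

    def fire(self, machines):
        """Deliver a change; return False if the watch is no longer armed."""
        if not self.armed:
            return False
        self.armed = False
        self.callback(machines)  # an exception here skips the re-arm below
        self.armed = True
        return True


def guard(process_machines):
    """Wrap process_machines so a failure is logged instead of propagating."""

    def watch_machine_changes(machines):
        try:
            process_machines(machines)
        except Exception:
            # Assumption: logging and waiting for the next change is acceptable,
            # because a periodic check will resync eventually.
            log.exception("process_machines failed; keeping the watch alive")

    return watch_machine_changes


def make_flaky_processor():
    """Return (process_machines, seen); the first call raises KeyError."""
    seen = []

    def process_machines(machines):
        seen.append(list(machines))
        if len(seen) == 1:
            raise KeyError("missing-field")  # stands in for a bad provider payload

    return process_machines, seen
```

With the bare callback, the first KeyError leaves the watch unarmed and every later change is dropped. With guard(...), the error is logged and the next change is still delivered.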

Jim Baker (jimbaker) on 2011-10-11
Changed in juju:
milestone: none → florence
assignee: nobody → Jim Baker (jimbaker)
importance: Undecided → High
Clint Byrum (clint-fewbar) wrote :

We saw this bug while testing heavily against OpenStack. It reproduces quite often when exposing units against Diablo, which occasionally returns an empty response when describing instances.

Changed in juju:
status: New → Confirmed
tags: added: openstack
Jim Baker (jimbaker) on 2011-10-11
description: updated
Jim Baker (jimbaker) on 2011-10-12
Changed in juju:
status: Confirmed → In Progress
Jim Baker (jimbaker) on 2011-10-13
Changed in juju:
status: In Progress → Fix Released
Changed in juju (Ubuntu):
status: New → Triaged
importance: Undecided → High
Changed in juju (Ubuntu Oneiric):
status: New → Triaged
importance: Undecided → High
Clint Byrum (clint-fewbar) wrote :

Fixed in r403; precise has r424.

Changed in juju (Ubuntu):
status: Triaged → Fix Released