Upstart scripts for agents

Bug #770482 reported by Kapil Thangavelu
Affects: pyjuju
Status: Fix Released
Importance: Wishlist
Assigned to: William Reade
Milestone: (none shown)

Bug Description

Use upstart for all agents in the system. Upstart can manage any needed respawns. This probably also removes most of the benefit of our twistd daemon support integration.

Changed in ensemble:
milestone: none → budapest
importance: Undecided → High
Kapil Thangavelu (hazmat) wrote :

I'm thinking we just have a separate upstart job per agent type, and multiple jobs for units. We could also explore upstart's instance functionality.

Each of the upstart jobs for agents would be marked 'manual' with a 'respawn' stanza.

As an example:

cat /etc/init/ensemble-test.conf
# tester - The magic sleep
description "Testing Sleep"
manual
respawn
exec sleep 60
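
The "instances" idea mentioned above could look roughly like the following. This is only a sketch: the job name, the agent binary path, its --unit flag, and the UNIT_NAME variable are all made up for illustration. Only the upstart "instance" stanza and the KEY=VALUE form of "initctl start" are real upstart features.

```python
# One job file shared by every unit agent on the machine; upstart tells the
# running copies apart by the value of $UNIT_NAME (the "instance" stanza).
# The binary path and its --unit flag are hypothetical.
INSTANCE_JOB = """\
description "Ensemble unit agent (one instance per service unit)"
manual
respawn
instance $UNIT_NAME
exec /usr/bin/ensemble-unit-agent --unit "$UNIT_NAME"
"""


def instance_command(action, job, unit_name):
    """Build the initctl command that starts or stops one instance of a job."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    return ["initctl", action, job, "UNIT_NAME=" + unit_name]


if __name__ == "__main__":
    print(INSTANCE_JOB)
    print(" ".join(instance_command("start", "ensemble-unit-agent", "wordpress-0")))
```

With this layout a single conf file would cover all units, at the cost of passing the unit name on every start and stop.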

Kapil Thangavelu (hazmat) wrote :

I was thinking that machine agents would drop in per-service-unit init files for upstart, generated from a template, and then launch the process via upstart. Shutdown would stop the job via upstart and remove the service init file.
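
A rough sketch of that lifecycle, under assumptions: the template text, the "ensemble-unit-" job naming scheme, and the helper names are all hypothetical, and the init directory is a parameter so the code can be tried against a scratch directory instead of /etc/init.

```python
import os
import subprocess
from string import Template

# Hypothetical per-unit job template, modelled on the sleep example above.
JOB_TEMPLATE = Template("""\
description "Ensemble unit agent for $unit_name"
manual
respawn
exec $agent_command
""")


def job_name(unit_name):
    """Map a unit name such as 'wordpress/0' to a job name safe for a file name."""
    return "ensemble-unit-" + unit_name.replace("/", "-")


def job_path(init_dir, unit_name):
    return os.path.join(init_dir, job_name(unit_name) + ".conf")


def start_unit(init_dir, unit_name, agent_command, run=subprocess.check_call):
    """Write the job file from the template, then ask upstart to start it."""
    path = job_path(init_dir, unit_name)
    with open(path, "w") as f:
        f.write(JOB_TEMPLATE.substitute(unit_name=unit_name,
                                        agent_command=agent_command))
    run(["initctl", "start", job_name(unit_name)])
    return path


def stop_unit(init_dir, unit_name, run=subprocess.check_call):
    """Ask upstart to stop the job, then remove its job file."""
    run(["initctl", "stop", job_name(unit_name)])
    path = job_path(init_dir, unit_name)
    if os.path.exists(path):
        os.remove(path)
```

Passing the command runner in as `run` is just a convenience so the file handling can be exercised without touching a real upstart.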

Kapil Thangavelu (hazmat) wrote :

For machine agents and provisioning agents, we could have cloud-init drop in the upstart conf files and start the jobs. There's a minor race while upstart notices the new files via inotify, but we can sync explicitly with "initctl reload-configuration" before starting the agents.
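
The ordering that avoids the race can be sketched as below. The job names are assumptions (borrowed from the agent types discussed here), and the functions are hypothetical helpers, not anything in the current code.

```python
import subprocess


def agent_start_commands(job_names):
    """Return the commands to run once the conf files are on disk.

    The reload comes first so upstart has definitely parsed the new files
    before any start request, rather than relying on inotify timing.
    """
    commands = [["initctl", "reload-configuration"]]
    commands.extend(["initctl", "start", name] for name in job_names)
    return commands


def start_agents(job_names, run=subprocess.check_call):
    """Run the commands in order; check_call aborts on the first failure."""
    for command in agent_start_commands(job_names):
        run(command)


if __name__ == "__main__":
    # Assumed job names, for illustration only.
    for command in agent_start_commands(["ensemble-machine-agent",
                                         "ensemble-provision-agent"]):
        print(" ".join(command))
```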

Changed in ensemble:
milestone: budapest → dublin
Changed in ensemble:
assignee: nobody → Kapil Thangavelu (hazmat)
Changed in ensemble:
assignee: Kapil Thangavelu (hazmat) → nobody
Clint Byrum (clint-fewbar) wrote : Re: [Bug 770482] Re: Upstart scripts for agents

I did some work on this when playing with Orchestra.

lp:~clint-fewbar/ensemble/orchestra-provider/debian has some changes which
build two packages:

ensemble-machine-agent

and

ensemble-provision-agent

They are (at least, in theory) pre-seedable, allowing us to set the
machine_id, zookeeper addresses, etc, before installing the package
(or as part of a reconfiguration step), and then the upstart jobs are
managed just like usual upstart services. ensemble-provision-agent
starts on started ensemble-machine-agent, which starts on runlevel 2345
(basically "when the system is ready to do network stuff").
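
For illustration, the start ordering described above might be generated
like this. It is a sketch, not what the packages in that branch actually
ship: the descriptions and binary paths are assumed, and the respawn line
is borrowed from the earlier suggestion in this bug.

```python
def render_job(description, start_on, command):
    """Render a minimal upstart job with a 'start on' condition."""
    lines = [
        'description "%s"' % description,
        "start on %s" % start_on,
        "respawn",
        "exec %s" % command,
    ]
    return "\n".join(lines) + "\n"


# The machine agent comes up with the normal multi-user runlevels.
MACHINE_AGENT_JOB = render_job(
    "Ensemble machine agent",
    "runlevel [2345]",
    "/usr/bin/ensemble-machine-agent",  # hypothetical path
)

# The provisioning agent waits for upstart's 'started' event of the machine agent.
PROVISION_AGENT_JOB = render_job(
    "Ensemble provisioning agent",
    "started ensemble-machine-agent",
    "/usr/bin/ensemble-provision-agent",  # hypothetical path
)

if __name__ == "__main__":
    print(MACHINE_AGENT_JOB)
    print(PROVISION_AGENT_JOB)
```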

I'll do more to split this into a branch to fix this bug next week.

Excerpts from Kapil Thangavelu's message of Sat Jul 09 00:19:24 UTC 2011:
> ** Changed in: ensemble
> Assignee: Kapil Thangavelu (hazmat) => (unassigned)

Kapil Thangavelu (hazmat) wrote :

I realize there's actually quite a bit of work needed to allow unit agents to be restarted. They currently hold some important state in memory, which will need persistence and delta evaluation against remote state to ensure formulas don't miss any hook executions. I'm going to switch this to a wishlist item for now.
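
To make "persistence and delta evaluation" concrete, here is a minimal sketch under heavy assumptions. It pretends the agent's important state is a mapping of related unit name to a change version, stored as JSON on disk. The state shape, the file format, and the hook labels are all hypothetical; the real unit agent state is richer than this.

```python
import json
import os


def load_state(path):
    """Return the last persisted view of the relation, or {} on first run."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def save_state(path, state):
    """Persist atomically: write a temp file, then rename it over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def pending_hooks(local, remote):
    """Compare the persisted view with remote state and list the hooks missed.

    Both arguments map a unit name to a version number (an assumed shape).
    """
    hooks = []
    for unit in sorted(set(remote) - set(local)):
        hooks.append(("joined", unit))
    for unit in sorted(set(local) & set(remote)):
        if local[unit] != remote[unit]:
            hooks.append(("changed", unit))
    for unit in sorted(set(local) - set(remote)):
        hooks.append(("departed", unit))
    return hooks
```

On restart the agent would load the file, fetch the current remote state, run whatever `pending_hooks` returns, and save the new view only after the hooks have executed.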

Changed in ensemble:
importance: High → Wishlist
Clint Byrum (clint-fewbar) wrote :

Excerpts from Kapil Thangavelu's message of Wed Aug 10 18:55:03 UTC 2011:
> There's actually quite a bit of work i realize to allow for unit-agents
> to be restarted, as they currently have some important state in memory,
> that will need persistence and delta evaluation against remote state in
> order to ensure formulas don't miss any hook executions. I'm going to
> switch this out to wishlist item for now.
>

I don't think we can just drop this to Wishlist, even if it's hard.

Being able to reboot a machine with a ton of data on it is really quite
important. Sometimes kernels get wedged, or userspace is non-responsive
enough to force a reboot. Also, t1.micro instances need to be shut down
and started again to transform into larger nodes.

Changed in ensemble:
milestone: dublin → none
Changed in juju:
milestone: none → florence
William Reade (fwereade)
Changed in juju:
assignee: nobody → William Reade (fwereade)
William Reade (fwereade)
Changed in juju:
status: New → In Progress
William Reade (fwereade)
Changed in juju:
status: In Progress → Fix Released