pyjuju

Activity log for bug #639888

Date	Who	What changed	Old value	New value	Message
2010-09-15 19:50:59	Gustavo Niemeyer	bug			added bug
2010-09-15 19:51:13	Gustavo Niemeyer	ensemble: status	New	Confirmed
2010-09-15 19:51:16	Gustavo Niemeyer	ensemble: importance	Undecided	High
2010-09-15 19:52:03	Gustavo Niemeyer	description	The EC2 API is "eventually consistent", which also means it's hard to deal with when one wants to infer decisions from a retrieved state. The ProvisioningAgent is in charge of firing new machines to cover requested machine states that were never seen, but also to cover machine states that were alive but died for whatever reason when they shouldn't. Now, imagine the following sequence of actions within the ProvisioningAgent: 1. Acquire the topology lock to ensure no one else attempts changes for now 2. Detect a machine state without an id (new machine requested by the admin) 3. Fire the new machine 4. Store the new machine id in the machine state in zookeeper 5. Release the topology lock 6. Acquire the topology lock again, and start over 7. Detect a machine state with an id (set in 4) 8. Observe that EC2 doesn't know about this id yet (eventual consistency FTW!) 9. Behave as if the machine had died, and fire another machine! 10. Repeat from 4.	The EC2 API is "eventually consistent", which also means it's hard to deal with when one wants to infer decisions from a retrieved state. The ProvisioningAgent is in charge of firing new machines to cover requested machine states that were never seen, but also to cover machine states that were alive but died for whatever reason when they shouldn't. Now, imagine the following sequence of actions within the ProvisioningAgent: 1. Acquire the topology lock to ensure no one else attempts changes for now 2. Detect a machine state without an id (new machine requested by the admin) 3. Fire the new machine 4. Store the new machine id in the machine state in zookeeper 5. Release the topology lock 6. Acquire the topology lock again, and start over 7. Detect a machine state with an id (set in 4) 8. Observe that EC2 doesn't know about this id yet (eventual consistency FTW!) 9. Behave as if the machine had died, and fire another machine! 10. Repeat from 4. 
This problem may be fixed by introducing a "started_time" parameter into the machine state, and ignoring machines which were acted upon recently.
2010-12-24 00:03:53	Kapil Thangavelu	ensemble: milestone		0.4
2011-02-03 14:03:55	Kapil Thangavelu	ensemble: importance	High	Medium
2011-02-03 14:26:46	Kapil Thangavelu	ensemble: milestone	0.4	budapest
2011-02-03 15:55:08	Kapil Thangavelu	tags		agents
2011-05-11 15:25:08	Kapil Thangavelu	ensemble: milestone	budapest	dublin
2011-08-17 01:18:53	Kapil Thangavelu	ensemble: milestone	dublin
2013-10-12 03:42:12	Curtis Hovey	juju: status	Confirmed	Triaged
2013-10-15 03:26:54	Curtis Hovey	juju: importance	Medium	Low
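
The fix proposed in the 2010-09-15 19:52:03 description entry (record a "started_time" in the machine state and ignore machines that were acted upon recently) could look roughly like the sketch below. This is only an illustration, not the pyjuju implementation: the dict-based machine state, the "instance_id" key, the helper names needs_launch and record_launch, and the 300-second grace period are all assumptions made for the example. Only the "started_time" name comes from the bug report.

```python
import time

# Hypothetical grace period; the bug report does not specify a value.
GRACE_PERIOD_SECONDS = 300


def needs_launch(machine_state, provider_instance_ids, now=None,
                 grace_period=GRACE_PERIOD_SECONDS):
    """Decide whether a new machine should be fired for this machine state.

    machine_state is a plain dict standing in for the state kept in
    zookeeper (an assumption for this sketch). provider_instance_ids is
    the set of instance ids the provider currently reports.
    """
    now = time.time() if now is None else now
    instance_id = machine_state.get("instance_id")

    # Steps 2-3 of the report: no id yet, so the machine was never fired.
    if instance_id is None:
        return True

    # The provider knows the instance, so nothing needs to be done.
    if instance_id in provider_instance_ids:
        return False

    # Step 8 of the report: the provider does not list the id. If the
    # machine was fired recently, assume eventual consistency rather
    # than a dead machine, and do not fire a replacement yet.
    started = machine_state.get("started_time")
    if started is not None and now - started < grace_period:
        return False

    # The id is unknown to the provider and the grace period has passed.
    return True


def record_launch(machine_state, instance_id, now=None):
    """Store the new instance id together with the time it was fired."""
    machine_state["instance_id"] = instance_id
    machine_state["started_time"] = time.time() if now is None else now
```

With a check like this, step 9 of the sequence (firing a second machine) would be skipped while the first launch is still inside the grace period. The right length for that period depends on how long the provider takes to become consistent, which the report does not state.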