Activity log for bug #639888

Date Who What changed Old value New value Message
2010-09-15 19:50:59 Gustavo Niemeyer bug added bug
2010-09-15 19:51:13 Gustavo Niemeyer ensemble: status New Confirmed
2010-09-15 19:51:16 Gustavo Niemeyer ensemble: importance Undecided High
2010-09-15 19:52:03 Gustavo Niemeyer description
Old value:
The EC2 API is "eventually consistent", which makes it hard to base decisions on a retrieved state. The ProvisioningAgent is in charge of firing new machines to cover requested machine states that were never seen, but also to cover machine states that were alive but died for whatever reason when they shouldn't have.
Now, imagine the following sequence of actions within the ProvisioningAgent:
1. Acquire the topology lock to ensure no one else attempts changes for now
2. Detect a machine state without an id (new machine requested by the admin)
3. Fire the new machine
4. Store the new machine id in the machine state in ZooKeeper
5. Release the topology lock
6. Acquire the topology lock again, and start over
7. Detect a machine state with an id (set in 4)
8. Observe that EC2 doesn't know about this id yet (eventual consistency FTW!)
9. Behave as if the machine had died, and fire another machine!
10. Repeat from 4.
New value:
The old value, with the following paragraph appended:
This problem may be fixed by introducing a "started_time" parameter into the machine state, and ignoring machines which were acted upon recently.
2010-12-24 00:03:53 Kapil Thangavelu ensemble: milestone 0.4
2011-02-03 14:03:55 Kapil Thangavelu ensemble: importance High Medium
2011-02-03 14:26:46 Kapil Thangavelu ensemble: milestone 0.4 budapest
2011-02-03 15:55:08 Kapil Thangavelu tags agents
2011-05-11 15:25:08 Kapil Thangavelu ensemble: milestone budapest dublin
2011-08-17 01:18:53 Kapil Thangavelu ensemble: milestone dublin
2013-10-12 03:42:12 Curtis Hovey juju: status Confirmed Triaged
2013-10-15 03:26:54 Curtis Hovey juju: importance Medium Low
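
The "started_time" idea from the description can be sketched roughly as follows. This is only an illustration of the proposed check, not the actual ProvisioningAgent code: the MachineState class, the needs_launch function, the known_ids set, and the 120-second grace period are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical grace period (seconds) during which a freshly fired machine
# is allowed to be missing from the provider's listing.
GRACE_PERIOD = 120.0


@dataclass
class MachineState:
    """Hypothetical stand-in for the machine state kept in ZooKeeper."""
    instance_id: Optional[str] = None
    started_time: Optional[float] = None  # when the machine was fired


def needs_launch(state: MachineState, known_ids: Set[str], now: float,
                 grace_period: float = GRACE_PERIOD) -> bool:
    """Decide whether a machine should be fired for this state.

    known_ids is the set of instance ids the provider currently reports.
    """
    if state.instance_id is None:
        # Never launched: a new machine requested by the admin.
        return True
    if state.instance_id in known_ids:
        # The provider knows the machine, so it is alive.
        return False
    if (state.started_time is not None
            and now - state.started_time < grace_period):
        # Acted upon recently: the provider may simply not be consistent
        # yet, so do not treat the machine as dead.
        return False
    # Unknown to the provider and past the grace period: treat as dead.
    return True
```

With this check, step 9 in the sequence above would only fire a replacement once the grace period has elapsed. The right length for that period is an open question that the bug does not settle.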