busy unit agents can starve other agents from the hook lock

Bug #1642541 reported by John A Meinel
This bug affects 3 people
Affects  Status   Importance  Assigned to  Milestone
juju     Triaged  High        Unassigned

Bug Description

Our current shared lock implementation polls every 250ms to see if the lock is available. However, if a unit agent has several hooks to run, it is quite likely to release the lock and then grab it again well before any other process wakes up and sees that the lock is available.

It would be nice to allow multiple agents to make progress, rather than starving all-but-one process.

A few possibilities:
a) Switch to "blocking flock" as our primitive. Rather than polling, we use a blocking system call, so a waiting process should be woken up immediately when the lock is released, rather than needing 100+ms for other processes to notice. I'm not sure if we need to be able to interrupt the flock. C 'flock' uses a timer to trigger a signal so that flock() returns EINTR; I'm not sure if that is kosher inside golang code.

b) Introduce a 'wait before re-acquiring' delay for the lock. We wouldn't always have to sleep; we would only sleep if we had recently released the lock. This does slow down the 'happy path' where a single process wants to acquire the lock repeatedly. We could lower the wait and poll times, though. (A 10ms poll and a 15ms wait wouldn't be particularly bad at slowing things down.)

c) Just increase the polling frequency, possibly with randomization. However, it seems we would still need some amount of sleeping; otherwise the process that just released the lock is always on the fastest path to testing whether it is available.

tags: added: landscape
tags: added: kanban-cross-team
tags: removed: kanban-cross-team
Ryan Beisner (1chb1n)
tags: added: uosci
