Comment 4 for bug 1803629
Ian Booth (wallyworld) wrote :

The issue isn't the recently changed code. The new code essentially just reorders the business logic a little: the state object is still used to fetch a unit, and the status is extracted from there. The root cause appears to be that the multiwatcher worker passes its worker loop a state object that may be closed outside the loop while things are still being initialised.

The sequence of events is:

state is started using start()
start() method creates workers

Some time later, state is watched:

func (st *State) Watch(params WatchParams) *Multiwatcher {
 return NewMultiwatcher(st.workers.allManager(params))
}

The multiwatcher startup code runs the worker loop in a goroutine. Before the loop starts monitoring the dying channel, all the entities are loaded:

 if err := sm.backing.GetAll(sm.all); err != nil {
  return err
 }

The process of loading the entities uses state. However, the state may be closed while this goroutine is running, hence the panic.
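
To make the ordering concrete, here is a minimal sketch of it. Everything in it is hypothetical (the backing type, its Close method, errStateClosed and loop are stand-ins, not the real juju code), and an error is returned where the real code panics. It only shows that the initial load runs before the dying channel is ever consulted, so a state closed in the meantime is still used.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errStateClosed is a hypothetical stand-in for the panic seen in the bug.
var errStateClosed = errors.New("state closed")

// backing is a hypothetical stand-in for the state-backed store the loop loads from.
type backing struct {
	mu     sync.Mutex
	closed bool
}

// Close marks the underlying state as closed.
func (b *backing) Close() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.closed = true
}

// GetAll fails if the state has already been closed.
func (b *backing) GetAll() error {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.closed {
		return errStateClosed
	}
	return nil
}

// loop mirrors the ordering described above: the initial load happens
// first, and only afterwards is the dying channel monitored.
func loop(b *backing, dying <-chan struct{}) error {
	if err := b.GetAll(); err != nil {
		return err
	}
	<-dying // a real loop would also handle requests here
	return nil
}

func main() {
	b := &backing{}
	dying := make(chan struct{})
	done := make(chan error, 1)

	// The state is closed, and dying signalled, before the load runs.
	b.Close()
	close(dying)

	go func() { done <- loop(b, dying) }()
	fmt.Println("loop result:", <-done)
}
```

In this sketch the loop reports the closed state even though dying had already been signalled, because the load never looks at that channel.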

Inside the code to load all the entities, there is an attempt to use a copy of the state:

 // Use a single new MongoDB connection for all the work here.
 db, closer := st.newDB()
 defer closer()

However, this db is not used everywhere; in some places st is used directly instead. That appears to be the problem.
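
As a rough illustration of why the mixed usage matters, the sketch below contrasts a load that falls back to st for one query with a load that sends every query through the copy. The session and state types, errClosed, and both load functions are hypothetical; only the newDB shape (a handle plus a closer) is taken from the snippet above, and an error again stands in for the panic.

```go
package main

import (
	"errors"
	"fmt"
)

// errClosed is a hypothetical stand-in for the panic seen in the bug.
var errClosed = errors.New("session closed")

// session is a hypothetical stand-in for a MongoDB session.
type session struct{ closed bool }

// query fails once the session has been closed.
func (s *session) query(name string) (string, error) {
	if s.closed {
		return "", errClosed
	}
	return "doc:" + name, nil
}

// state is a hypothetical stand-in for *State holding a shared session.
type state struct{ sess *session }

// newDB mimics the shape of st.newDB(): an independent session and a closer.
func (st *state) newDB() (*session, func()) {
	db := &session{}
	return db, func() { db.closed = true }
}

// loadMixed uses the copy for one query but falls back to st for another.
func loadMixed(st *state) error {
	db, closer := st.newDB()
	defer closer()
	if _, err := db.query("machines"); err != nil {
		return err
	}
	_, err := st.sess.query("units") // still goes through the shared state
	return err
}

// loadCopyOnly sends every query through the copy.
func loadCopyOnly(st *state) error {
	db, closer := st.newDB()
	defer closer()
	for _, name := range []string{"machines", "units"} {
		if _, err := db.query(name); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	st := &state{sess: &session{}}
	st.sess.closed = true // the state is closed while the load is in flight
	fmt.Println("mixed:", loadMixed(st))
	fmt.Println("copy only:", loadCopyOnly(st))
}
```

Under these assumptions only the mixed version fails, which matches the suspicion that the remaining direct uses of st are what trip over the closed state.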