I have not looked at panic 2, which seems to be occurring in the metrics collection path.
I have looked at the code behind panics 1 & 3. Both are the same and come from the allwatcher code where we retrieve agent and machine status. I believe the problem is that we create a full state object, state.Entity, in https://github.com/juju/juju/blob/2.4/state/allwatcher.go#L252. We then use this object to determine machine status, instance status, agent version, etc. in various subsequent calls.
I do not think we can guarantee that the *State, st, used to create this *state.Entity will still be open when the subsequent calls are made. I have examined all the code in the allwatcher, and this approach seems to be unique to this spot. It is also new, introduced recently by https://github.com/juju/juju/commit/37a0f7f1774de24d2b8317bfabfa16561db17151
I believe this code needs to be rewritten so that it does not keep and use a reference to the state, because that reference can become stale.
I could not figure out a reproducible scenario to easily test my suggestion.