I'm quite sure it's related to the power checks, but I still haven't found the root cause.
What I've found so far is that when a power check fails, it the rack issues an RPC call to the region to create a POWER_QUERY_FAILED event. If I comment out that RPC call, I don't see any issues anymore.
I haven't been able to reproduce this in a unit test yet, but I can reproduce it locally. I've confirmed that the database triggers seem ok, since if I listen to all the machine notifications, I only see a 'machine_update' notification.
So the problem should be somewhere in the websocket code.
I'm quite sure it's related to the power checks, but I still haven't found the root cause.
What I've found so far is that when a power check fails, it the rack issues an RPC call to the region to create a POWER_QUERY_FAILED event. If I comment out that RPC call, I don't see any issues anymore.
I haven't been able to reproduce this in a unit test yet, but I can reproduce it locally. I've confirmed that the database triggers seem ok, since if I listen to all the machine notifications, I only see a 'machine_update' notification.
So the problem should be somewhere in the websocket code.