I have looked at the environment, and here is what I found.
Nailgun sends a network check message to Astute, but there is nothing about this message in the Astute logs. After checking the RabbitMQ naily queue, I found the message. It had been delivered to Astute, but Astute never responded with an acknowledgement, so RabbitMQ kept it as unacknowledged and did not redeliver it to other workers.
So EventMachine received the message but got stuck, either before it tried to log it [1] or during the logging attempt itself.
We have probably hit a similar issue before, where logging just got stuck [2].
After the worker that had received the message was killed, the message was requeued and received by another worker.
We had a snapshot of the environment. After it was reverted, Astute instantly reconnected and the message was requeued.
This makes the issue harder to debug.
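The acknowledgement behaviour described above can be illustrated with a small in-memory sketch. This is a toy model, not the RabbitMQ, AMQP or EventMachine API. `ToyQueue`, `consumer_gone`, the worker names and the `verify_networks` message body are all hypothetical. The model assumes manual acknowledgements. Under that mode, the broker holds a delivered message as unacknowledged until the consumer acks it. It requeues the message only when that consumer's channel or connection closes, which happens when the worker is killed or when Astute reconnects.

```python
from collections import deque


class ToyQueue:
    """In-memory model of one queue with manual acks (hypothetical, not the RabbitMQ API)."""

    def __init__(self):
        self.ready = deque()   # messages waiting to be delivered
        self.unacked = {}      # delivery tag -> (consumer name, message)
        self._next_tag = 1

    def publish(self, message):
        self.ready.append(message)

    def deliver(self, consumer):
        """Give the next ready message to `consumer`; it stays unacked until ack()."""
        if not self.ready:
            return None
        message = self.ready.popleft()
        tag = self._next_tag
        self._next_tag += 1
        self.unacked[tag] = (consumer, message)
        return tag, message

    def ack(self, tag):
        """Consumer confirms processing; the broker forgets the message."""
        del self.unacked[tag]

    def consumer_gone(self, consumer):
        """Consumer's connection closed: requeue everything it never acked, keeping order."""
        for tag in sorted(self.unacked, reverse=True):
            owner, message = self.unacked[tag]
            if owner == consumer:
                del self.unacked[tag]
                self.ready.appendleft(message)


# Scenario from the report (worker names and message body are hypothetical).
queue = ToyQueue()
queue.publish("verify_networks")

first = queue.deliver("worker-1")        # worker-1 hangs and never acks
while_stuck = queue.deliver("worker-2")  # nothing to hand out: the message is held unacked

queue.consumer_gone("worker-1")          # worker killed, or reconnect after snapshot revert
second = queue.deliver("worker-2")       # the same message now reaches another worker
queue.ack(second[0])
```

`while_stuck` is `None`. While worker-1's connection stays open, the message is neither processed nor handed to anyone else, so nothing appears in the logs until the stuck worker goes away.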
[1] https://github.com/stackforge/fuel-astute/blob/53c86cba593ddbac776ce5a3360240274c20738c/lib/astute/server/server.rb#L62
[2] https://github.com/stackforge/fuel-astute/commit/3ce8643c2d8447256561f0eafb71a258b6f74f17#diff-e58148f7ac9ffd88d4681162773da473