Comment 2 for bug 1487397

Evgeniy L (rustyrobot) wrote :

I looked at the environment; here is what I found.
Nailgun sent a network check message to Astute, but the Astute logs contain nothing about it. The message was still present in the RabbitMQ naily queue: it had been delivered to an Astute worker, the worker never responded with an acknowledgement, and so RabbitMQ kept the message without redelivering it to other workers.

So EventMachine received the message but got stuck either before trying to log it [1] or on the logging attempt itself.
We probably had a similar issue before, where logging simply got stuck [2].

After the worker that had received the message was killed, the message was rescheduled and received by another worker.
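
To illustrate the acknowledgement behaviour described above, here is a minimal sketch in plain Ruby. It is a toy model only, not RabbitMQ's implementation and not Astute code: the ToyQueue class, the worker names and the 'verify_networks' payload are all made up for the example. It assumes the broker keeps a delivered message attached to its consumer until it is acked, and requeues it only when that consumer's connection goes away.

```ruby
# Toy model of manual-acknowledgement delivery; not RabbitMQ or Astute code.
class ToyQueue
  def initialize
    @ready = []    # messages waiting for delivery
    @unacked = {}  # consumer name => messages delivered but not yet acked
  end

  def publish(message)
    @ready << message
  end

  # Hand the next ready message to a consumer and track it as unacknowledged.
  def deliver_to(consumer)
    message = @ready.shift
    return nil if message.nil?
    (@unacked[consumer] ||= []) << message
    message
  end

  # The consumer confirms it has processed the message; the broker forgets it.
  def ack(consumer, message)
    list = @unacked.fetch(consumer, [])
    index = list.index(message)
    list.delete_at(index) if index
  end

  # The consumer's connection is gone: requeue everything it had not acked.
  def disconnect(consumer)
    @ready.unshift(*@unacked.delete(consumer).to_a)
  end

  def ready_count
    @ready.size
  end

  def unacked_count(consumer)
    @unacked.fetch(consumer, []).size
  end
end

queue = ToyQueue.new
queue.publish('verify_networks') # hypothetical payload name

# worker_1 receives the message but hangs before acknowledging it.
stuck = queue.deliver_to('worker_1')

# Nothing is left for worker_2 while worker_1 stays connected.
idle = queue.deliver_to('worker_2')

# Killing worker_1 closes its connection, so the message is requeued.
queue.disconnect('worker_1')
retried = queue.deliver_to('worker_2')
queue.ack('worker_2', retried)
```

In this model a worker that hangs after delivery keeps the message invisible to every other worker for as long as its connection stays open, which matches what we saw until the stuck worker was killed.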

We had a snapshot of the environment. After it was reverted, Astute instantly reconnected and the message was rescheduled,
which makes the issue harder to debug.

[1] https://github.com/stackforge/fuel-astute/blob/53c86cba593ddbac776ce5a3360240274c20738c/lib/astute/server/server.rb#L62
[2] https://github.com/stackforge/fuel-astute/commit/3ce8643c2d8447256561f0eafb71a258b6f74f17#diff-e58148f7ac9ffd88d4681162773da473