Comment 20 for bug 1461562

Ihor Kalnytskyi (ikalnytskyi) wrote :

Well, today I tested our Astute connection error handling, and it looks good. I tried to:

* close the Astute->RabbitMQ socket using gdb while Astute is processing a message from nailgun (before it sends back the result)
* drop outgoing traffic

In both cases Astute tries to reconnect to the message broker almost immediately.
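The check above (connection dropped, client notices and reconnects) can be sketched roughly as follows. This is a hypothetical Python model, not Astute's code (Astute uses its own Ruby AMQP client). `socket.socketpair()` stands in for the Astute->RabbitMQ TCP connection, and closing the "broker" end approximates the socket being closed underneath the client. `ReconnectingClient`, `peer_closed` and `ensure_connected` are illustrative names only.

```python
import socket


class ReconnectingClient:
    """Hypothetical client that reopens its connection once the peer closes it.

    socket.socketpair() stands in for the Astute->RabbitMQ connection;
    `broker_end` plays the role of the broker's side of the socket.
    """

    def __init__(self) -> None:
        self.reconnects = 0
        self._open()

    def _open(self) -> None:
        self.sock, self.broker_end = socket.socketpair()

    def peer_closed(self) -> bool:
        """Check, without consuming data, whether the other end went away."""
        self.sock.settimeout(0.0)  # non-blocking: don't wait for data
        try:
            # An orderly close (EOF) makes recv return b"".
            return self.sock.recv(1, socket.MSG_PEEK) == b""
        except BlockingIOError:
            return False  # no data yet, but the connection is still open
        except ConnectionResetError:
            return True  # abortive close (RST)

    def ensure_connected(self) -> None:
        # Reconnect immediately, as observed for Astute in the tests above.
        if self.peer_closed():
            self.sock.close()
            self.broker_end.close()  # closing an already-closed socket is a no-op
            self._open()
            self.reconnects += 1
```

Usage: call `client.broker_end.close()` to simulate the drop. After that, `client.peer_closed()` returns `True`, and one `client.ensure_connected()` call leaves a fresh connection with `client.reconnects == 1`. Note that reconnecting alone does not re-send whatever was in flight when the connection died.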

So, I've prepared a patch that adds more verbose logging (https://review.openstack.org/#/c/220146/1). I'm going to build an ISO and ask QA to help me reproduce this issue.

I also want to note that we may have started facing this issue because Astute now uses almost 100% CPU. For instance, Dennis's case could be:

* Astute sends a message to RabbitMQ (4Mb, which is quite large)
* The CPU is busy with other workers
* Sending the message takes some time
* There are no heartbeats from Astute to RabbitMQ (because it's busy sending the message)
* RabbitMQ closes the connection
* Astute reconnects, but afterwards it fails to send the original message, and apparently there are no retries
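The suspected sequence, and the missing retry, can be modelled with a small simulation. This is a hypothetical sketch, not Astute's or any AMQP library's API: `FakeBroker`, `RetryingPublisher` and `send_times` are illustrative names. As a simplification, the broker is assumed to drop the connection whenever a single send blocks the client for longer than the heartbeat timeout. RabbitMQ's real rule is based on missed heartbeat frames.

```python
from dataclasses import dataclass, field
from typing import List


class ConnectionClosed(Exception):
    """Raised when the simulated broker drops the connection."""


@dataclass
class FakeBroker:
    # Simplification: the broker closes the connection if the client stays
    # silent (no heartbeats) for longer than this many seconds.
    heartbeat_timeout: float
    delivered: List[bytes] = field(default_factory=list)

    def publish(self, body: bytes, send_seconds: float) -> None:
        # While a slow send blocks the client, it sends no heartbeats; if
        # that lasts longer than the timeout, the message is lost.
        if send_seconds > self.heartbeat_timeout:
            raise ConnectionClosed("missed heartbeats")
        self.delivered.append(body)


@dataclass
class RetryingPublisher:
    broker: FakeBroker
    max_attempts: int = 3
    reconnects: int = 0

    def reconnect(self) -> None:
        # Stand-in for reopening the AMQP connection and channel.
        self.reconnects += 1

    def publish(self, body: bytes, send_times: List[float]) -> bool:
        """Publish `body`, reconnecting and retrying after each drop.

        send_times[i] is how long attempt i takes (simulated CPU load).
        Returns True once the broker has accepted the message.
        """
        for seconds in send_times[: self.max_attempts]:
            try:
                self.broker.publish(body, seconds)
                return True
            except ConnectionClosed:
                self.reconnect()
        return False
```

Usage: with `FakeBroker(heartbeat_timeout=10)` and `send_times=[15, 2]`, the first attempt is dropped, the publisher reconnects once, and the second attempt delivers the message. With `max_attempts=1`, which matches the "no retries" behaviour described above, `publish` returns `False` and nothing is delivered. With a real broker, the client would need something like RabbitMQ publisher confirms to know a message was accepted. Retrying also means the consumer may see duplicates.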