Mirantis OpenStack

Comment 9 for bug 1463802

Roman Podoliaka (rpodolyaka) wrote on 2015-06-12: Re: RPC clients do not recreate a reply queue after restart of the last RabbitMQ server in the cluster

@Igor

Short answer: no, it won't. The root cause of the problem is that oslo.messaging does not recreate the reply queue, so we are going to get a Timeout error again and again on every subsequent RPC call.
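To illustrate why the timeouts keep repeating, here is a toy in-memory model. It is only a sketch under stated assumptions: `ToyBroker`, `ToyClient`, the queue name `reply_1` and `simulate` are all made up for illustration, and none of this is oslo.messaging or RabbitMQ code. It assumes that the reply queue disappears when the broker restarts and that a reply published to a missing queue is silently dropped.

```python
class ToyBroker:
    """In-memory stand-in for a broker whose queues do not survive a restart."""

    def __init__(self):
        self.queues = {}

    def declare(self, name):
        # Declaring an existing queue is a no-op.
        self.queues.setdefault(name, [])

    def publish(self, name, message):
        # Assumption: a message for a missing queue is dropped.
        if name not in self.queues:
            return False
        self.queues[name].append(message)
        return True

    def restart(self):
        # Assumption: every queue is lost when the last server restarts.
        self.queues.clear()


class ToyClient:
    """Hypothetical RPC client that waits for replies on one reply queue."""

    def __init__(self, broker, recreate_reply_queue):
        self.broker = broker
        self.reply_queue = "reply_1"
        self.recreate_reply_queue = recreate_reply_queue
        broker.declare(self.reply_queue)

    def call(self, request):
        if self.recreate_reply_queue:
            self.broker.declare(self.reply_queue)
        # The "server" handles the request and publishes its reply.
        delivered = self.broker.publish(self.reply_queue, "reply to " + request)
        if not delivered:
            # The reply never arrives, so the caller eventually gives up.
            raise TimeoutError("no reply for " + request)
        return self.broker.queues[self.reply_queue].pop(0)


def simulate(recreate_reply_queue):
    """One call before a broker restart, then two calls after it."""
    broker = ToyBroker()
    client = ToyClient(broker, recreate_reply_queue)
    outcomes = []
    for step in ("call", "restart", "call", "call"):
        if step == "restart":
            broker.restart()
            continue
        try:
            client.call("ping")
            outcomes.append("ok")
        except TimeoutError:
            outcomes.append("timeout")
    return outcomes
```

In this model, `simulate(False)` times out on every call made after the restart, no matter how many times the caller retries, while `simulate(True)` recovers because the queue is declared again before each call.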

Long answer: handling Timeout errors *might* help us mitigate short RabbitMQ server disruptions in which some requests or replies were lost. But that is a much bigger question. In oslo.messaging we look at the MQ as just a layer on which to implement a simple RPC protocol, treating local and remote calls in exactly the same way.

The problem with that is that, if you wanted to handle Timeout errors gracefully, you would end up retrying all possible RPC calls in your code. And not all of those calls are idempotent (i.e. can be safely retried). So we could do that in oslo.messaging, but the consequences might be even worse than without retries.
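A minimal sketch of the idempotency concern, assuming a server that processes every request but loses its first reply. All names here (`RpcTimeout`, `FlakyServer`, `call_with_retry`, `create_instance`, `set_power_state`) are hypothetical and chosen for illustration. They are not oslo.messaging or Nova APIs.

```python
class RpcTimeout(Exception):
    """Hypothetical stand-in for an RPC timeout error."""


class FlakyServer:
    """Processes every request, but the first reply is lost on the way back."""

    def __init__(self):
        self.instances = []
        self.power_state = {}
        self.replies_to_lose = 1

    def _reply(self, value):
        if self.replies_to_lose > 0:
            self.replies_to_lose -= 1
            # The work is already done; only the reply goes missing.
            raise RpcTimeout()
        return value

    def create_instance(self, name):
        # Not idempotent: every call adds another instance.
        self.instances.append(name)
        return self._reply(len(self.instances))

    def set_power_state(self, name, state):
        # Idempotent: repeating the call leaves the same final state.
        self.power_state[name] = state
        return self._reply(state)


def call_with_retry(rpc_call, idempotent, attempts=3):
    """Invoke rpc_call, retrying on RpcTimeout only if marked idempotent."""
    tries = attempts if idempotent else 1
    for attempt in range(1, tries + 1):
        try:
            return rpc_call()
        except RpcTimeout:
            if attempt == tries:
                raise
```

Retrying `set_power_state` after the lost reply is harmless. Retrying `create_instance` the same way would leave two instances behind, because the first request had already succeeded on the server. A generic retry layer cannot tell the two cases apart unless each call is explicitly marked, which is why the sketch takes an `idempotent` flag from the caller.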

There is nothing specific to Nova here; this is how all OpenStack projects work with RPC right now.