As far as I can see, the timestamp of the first "timed out waiting for reply" error falls within the failover window described in https://bugs.launchpad.net/fuel/+bug/1460762/comments/27.
So we should dig deeper into this case. Hopefully a developer from the MOS Oslo team can help us here.
As for the underlying AMQP layer, as I mentioned earlier, I see no issues there, just a clean failover.
My guess is that we have a classic AP+ C- case here: at least two AMQP nodes were available (A+), the partition was recovered (P+), and some reply_queues were lost (C-), but the app layer failed to survive that loss.
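
To make the guess concrete, here is a toy, in-memory sketch of the suspected sequence. It is not oslo.messaging or any real AMQP client code: `ToyBroker`, `call`, and the `redeclare_on_timeout` flag are hypothetical names invented for illustration, and the assumption that reply queues simply vanish on failover is my reading of the logs, not a confirmed fact.

```python
import queue


class ToyBroker:
    """Hypothetical in-memory stand-in for an AMQP cluster (not a real client API)."""

    def __init__(self):
        self.queues = {}

    def declare(self, name):
        # Create the queue if it does not exist yet.
        self.queues.setdefault(name, queue.Queue())

    def publish(self, name, msg):
        q = self.queues.get(name)
        if q is None:
            return False  # queue is gone: the message is dropped
        q.put(msg)
        return True

    def failover(self):
        # Assumption: the cluster stays up (A+, P+) but queues are lost (C-).
        self.queues.clear()


def wait_for_reply(broker, reply_queue, timeout=0.01):
    q = broker.queues.get(reply_queue)
    if q is None:
        return None
    try:
        return q.get(timeout=timeout)
    except queue.Empty:
        return None


def call(broker, reply_queue, redeclare_on_timeout):
    """Simulate one RPC call whose reply queue is lost during failover."""
    broker.declare(reply_queue)
    broker.failover()  # failover happens before the server replies
    for _attempt in range(2):
        broker.publish(reply_queue, "pong")  # server side tries to reply
        reply = wait_for_reply(broker, reply_queue)
        if reply is not None:
            return reply
        if not redeclare_on_timeout:
            break
        broker.declare(reply_queue)  # app layer recovers the lost queue
    raise TimeoutError("timed out waiting for reply")
```

With `redeclare_on_timeout=False` the call ends in the same "timed out waiting for reply" error, even though the broker itself is healthy again. With `redeclare_on_timeout=True` the second attempt succeeds, which is the kind of app-layer recovery that seems to be missing.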