RPC client doesn't retry if the server fails to deliver the reply

Bug #1349301 reported by Bogdan Dobrelya
This bug affects 5 people
Affects          Status        Importance   Assigned to        Milestone
oslo.messaging   In Progress   Undecided    Viktor Serhieiev

Bug Description

Related issue: https://bugs.launchpad.net/oslo.messaging/+bug/1338732

Glossary:
client - the RPCClient object that sends an RPC call to the server
server - the MessageHandlingServer object that receives the RPC call, dispatches it in Python, and sends replies

Normal flow:
    client sends call message
    server receives and dispatches call
    server sends reply message
    client receives reply

Failover (the case that should be fixed):
    client sends call message
    server receives and dispatches call
    FAILOVER happens here
    server reconnects
    server sends reply message (it assumes all is fine, but in fact fails to deliver it)
    client reconnects and redeclares exchange and queue (Note: 1) the message could already be lost, since it was published to an empty exchange; 2) the reply_* queue might contain a wrong id, i.e. one unexpected for the server (why?))
    Timeout exception is raised, but perhaps the call should be retried instead.
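
A minimal sketch of what retrying in the last step could look like, assuming the caller wraps the RPC call itself. `CallTimeout`, `call_with_retries`, and the `rpc_call` callable are hypothetical names used for illustration and are not oslo.messaging API; in oslo.messaging the corresponding timeout exception would be `MessagingTimeout`. Each retry re-sends the call message, so the server may dispatch the method more than once; that is only safe for idempotent methods.

```python
import time


class CallTimeout(Exception):
    """Hypothetical stand-in for the timeout raised when no reply arrives."""


def call_with_retries(rpc_call, retries=2, delay=0.0):
    """Invoke rpc_call(); on CallTimeout, re-send up to `retries` more times.

    rpc_call is any zero-argument callable that sends the call message and
    blocks until a reply arrives or the timeout expires.
    """
    attempt = 0
    while True:
        try:
            return rpc_call()
        except CallTimeout:
            if attempt >= retries:
                raise  # out of retries: surface the timeout to the caller
            attempt += 1
            time.sleep(delay)  # give the broker time to finish failing over


# Usage: the first reply is lost during failover, the second one is delivered.
attempts = []


def flaky_call():
    attempts.append(1)
    if len(attempts) == 1:
        raise CallTimeout("reply lost during failover")
    return "pong"


result = call_with_retries(flaky_call, retries=2)
```

Here the second attempt succeeds, so `result` holds the reply and `attempts` records two sends. If every attempt times out, the last `CallTimeout` propagates to the caller unchanged.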

Bogdan Dobrelya (bogdando) wrote :

Of course, retrying could be a bad idea overall... Ideas are welcome.

Changed in oslo.messaging:
status: New → Opinion
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (master)

Fix proposed to branch: master
Review: https://review.openstack.org/110058

Changed in oslo.messaging:
assignee: nobody → Stan Lagun (slagun)
status: Opinion → In Progress
Bogdan Dobrelya (bogdando) wrote :

Some quotes from patch discussion:

Non HA case (1 single rabbitmq server):
1. client declares reply queue
2. client sends message
3. rabbit dies and restarts, *queue is gone*
4. server reconnects, gets message, sends reply to exchange without queue, message is lost
5. client reconnects, waits forever for message.

This patch obviously cannot fix such a case, because a single rabbit cannot "fail over"; you should use clustering in order to preserve queues. Durable queues and durable messages could help a bit as well, but they are not a guarantee either.

HA case this patch intends to fix:
1. client declares reply queue
2. client sends message
3. rabbit dies and restarts, queue is preserved intact
4. server reconnects, gets message, sends reply to exchange with queue bound, message is ok
5. client reconnects, receives the message.
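
The two cases quoted above can be contrasted with a toy in-memory model. This is only a sketch: `ToyBroker`, `run_scenario`, and the queue names `calls` and `reply_1` are hypothetical, and the model is neither RabbitMQ nor oslo.messaging. It assumes the call queue and its message always survive the restart (step 4 says the server still gets the message in both cases) and varies only whether the reply queue survives.

```python
class ToyBroker:
    """In-memory toy model of a broker; not RabbitMQ and not oslo.messaging."""

    def __init__(self):
        # queue name -> (survives_restart, list of pending messages)
        self.queues = {}

    def declare_queue(self, name, survives_restart=False):
        # Redeclaring an existing queue keeps its pending messages.
        self.queues.setdefault(name, (survives_restart, []))

    def publish(self, queue_name, message):
        """Route a message; return False if no such queue exists (lost)."""
        if queue_name not in self.queues:
            return False
        self.queues[queue_name][1].append(message)
        return True

    def restart(self):
        # Only queues marked as surviving (e.g. kept by a cluster) remain.
        self.queues = {n: q for n, q in self.queues.items() if q[0]}

    def consume(self, queue_name):
        messages = self.queues.get(queue_name, (False, []))[1]
        return messages.pop(0) if messages else None


def run_scenario(queue_survives):
    broker = ToyBroker()
    # Assumption: the call queue always survives the restart.
    broker.declare_queue("calls", survives_restart=True)
    broker.declare_queue("reply_1", survives_restart=queue_survives)  # step 1
    broker.publish("calls", "call")                                   # step 2
    broker.restart()                                                  # step 3
    broker.consume("calls")                                           # step 4: server gets message
    delivered = broker.publish("reply_1", "reply")                    # step 4: server sends reply
    broker.declare_queue("reply_1", survives_restart=queue_survives)  # step 5: client redeclares
    return delivered, broker.consume("reply_1")


non_ha = run_scenario(queue_survives=False)  # reply is dropped, client gets nothing
ha = run_scenario(queue_survives=True)       # reply is routed and received
```

In the first scenario the reply is published before the client redeclares its queue, so redeclaring afterwards cannot recover it; the client can only time out.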

OpenStack Infra (hudson-openstack) wrote : Change abandoned on oslo.messaging (master)

Change abandoned by Davanum Srinivas (dims) (<email address hidden>) on branch: master
Review: https://review.openstack.org/110058
Reason: This review is > 12 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (master)

Fix proposed to branch: master
Review: https://review.openstack.org/185525

Changed in oslo.messaging:
assignee: Stan Lagun (slagun) → Victor Sergeyev (vsergeyev)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on oslo.messaging (master)

Change abandoned by Victor Sergeyev (<email address hidden>) on branch: master
Review: https://review.openstack.org/185525
Reason: Abandon in favour of Change-Id: I1fb8e9a6fb9a0fceb7a3f9c841906501548c3670
