Needs to skip duplicated messages when using rabbitmq mirrored queue.

Bug #1107064 reported by Kei Masumoto
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
oslo-incubator | Fix Released | High | Kei Masumoto |
Grizzly | Fix Released | High | Kei Masumoto |

Bug Description

Hello,

I checked the rabbitmq mirrored queue feature, and it is great work! However, when I looked at the current implementation, I found a case where OpenStack services receive the same message twice. Here is a detailed description.

In the current implementation, rpc.call() flows as follows.

1) Producer (sender) sends a message using TopicProducer.
2) Consumer (receiver) accepts the message using TopicConsumer.
3) Consumer dispatches the appropriate method and gets the return value.
4) Consumer returns the result obtained in 3) using DirectProducer.
5) Producer receives the result using DirectConsumer.
     (Producer then sends an ack, and the result is deleted from rabbitmq.)
6) Consumer sends the message "ending=True", which signals that rpc.call() is now complete.
7) Producer receives the message "ending=True" using DirectConsumer.
     (Producer then sends an ack, and the "ending=True" message is deleted from rabbitmq.)
8) Finally, Producer sends ack for the message described at 2).

Once an OpenStack service receives a message, it has to send an ack for it; otherwise, the message is not deleted from rabbitmq.
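
To illustrate the ack requirement, here is a minimal sketch using a toy in-memory queue. ToyQueue and its methods are hypothetical names made up for this example; they are not RabbitMQ's or kombu's API, and real brokers track deliveries per channel. The only point is that a message which was delivered but never acked goes back on the queue when the connection is lost.

```python
from collections import deque


class ToyQueue:
    """In-memory stand-in for a broker queue (hypothetical, not a real API)."""

    def __init__(self):
        self._ready = deque()   # messages waiting to be delivered
        self._unacked = {}      # delivery tag -> message delivered but not acked
        self._next_tag = 0

    def publish(self, body):
        self._ready.append(body)

    def get(self):
        """Deliver the next message; it stays tracked until it is acked."""
        body = self._ready.popleft()
        self._next_tag += 1
        self._unacked[self._next_tag] = body
        return self._next_tag, body

    def ack(self, tag):
        """Acknowledge a delivery so the queue can forget the message."""
        del self._unacked[tag]

    def connection_lost(self):
        """Requeue every message that was delivered but never acked."""
        for tag in sorted(self._unacked, reverse=True):
            self._ready.appendleft(self._unacked.pop(tag))

    def __len__(self):
        return len(self._ready)


queue = ToyQueue()
queue.publish("run_instance")

tag, body = queue.get()        # the consumer handles the message here ...
queue.connection_lost()        # ... but the connection drops before the ack
redelivered_tag, redelivered = queue.get()
queue.ack(redelivered_tag)     # only now is the message really gone
```

In this model the consumer sees "run_instance" twice, once per delivery, even though it already handled it the first time.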

Now consider what happens if rabbitmq goes down between 3) and 8) above. The OpenStack services have already handled the message, but after reconnecting to the slave rabbitmq, the same message still remains in the queue. The same message is then dispatched again, which causes an error.
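
One possible way to skip such redelivered messages is to tag each outgoing message with a unique id and have the receiver remember the ids it has seen recently. The sketch below only illustrates that idea and is not the code that was merged for this bug; UNIQUE_ID, SeenMessageCache, tag_message and dispatch are hypothetical names, and the cache size of 16 is an arbitrary assumption.

```python
import uuid
from collections import deque

UNIQUE_ID = "_unique_id"  # hypothetical key name used to tag messages


class DuplicateMessage(Exception):
    """Raised when a message id has already been seen."""


class SeenMessageCache:
    """Remembers the ids of the most recently received messages."""

    def __init__(self, size=16):  # size is an arbitrary assumption
        self._seen = deque(maxlen=size)

    def check(self, msg):
        msg_id = msg.get(UNIQUE_ID)
        if msg_id is None:
            return  # untagged message: nothing to compare against
        if msg_id in self._seen:
            raise DuplicateMessage(msg_id)
        self._seen.append(msg_id)


def tag_message(msg):
    """Sender side: attach a fresh unique id before publishing."""
    msg[UNIQUE_ID] = uuid.uuid4().hex
    return msg


def dispatch(cache, msg, handled):
    """Receiver side: handle a message once, skip redelivered copies."""
    try:
        cache.check(msg)
    except DuplicateMessage:
        return False  # already handled before the failover
    handled.append(msg["method"])
    return True


cache = SeenMessageCache()
handled = []
msg = tag_message({"method": "run_instance"})

first = dispatch(cache, msg, handled)          # normal delivery
second = dispatch(cache, dict(msg), handled)   # same message redelivered
```

A bounded cache keeps memory use constant, at the cost of missing a duplicate that arrives after more than `size` other messages.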

I tested this using the patch below. It adds time.sleep() around 2) and 3) above; while the OpenStack services were sleeping, I killed rabbitmq.

> diff --git a/nova/openstack/common/rpc/impl_kombu.py b/nova/openstack/common/rpc/impl_kombu.py
> index 3544fc..de41b33 100644
> --- a/nova/openstack/common/rpc/impl_kombu.py
> +++ b/nova/openstack/common/rpc/impl_kombu.py
> @@ -187,13 +187,29 @@ class ConsumerBase(object):
>
>          def _callback(raw_message):
>              message = self.channel.message_to_python(raw_message)
>              try:
> +                time.sleep(10)
>                  msg = rpc_common.deserialize_msg(message.payload)

Mark McLoughlin (markmc) wrote :
Changed in oslo:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Kei Masumoto (masumotok)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo-incubator (master)

Fix proposed to branch: master
Review: https://review.openstack.org/22495

Changed in oslo:
assignee: Kei Masumoto (masumotok) → Mark McLoughlin (markmc)
Mark McLoughlin (markmc)
Changed in oslo:
assignee: Mark McLoughlin (markmc) → Kei Masumoto (masumotok)
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo-incubator (master)

Reviewed: https://review.openstack.org/22495
Committed: http://github.com/openstack/oslo-incubator/commit/6f9cef85353155538fe14e5025e2d0bab5cc63e4
Submitter: Jenkins
Branch: master

commit 6f9cef85353155538fe14e5025e2d0bab5cc63e4
Author: Mark McLoughlin <email address hidden>
Date: Wed Feb 20 23:08:50 2013 +0000

    Revert "Implement replay detection."

    This reverts Ib0260a0c62e3d312d2e3448a125bed64d861319e (commit a603678)

    The issue we're trying to fix here is bug #1107064 - when using mirrored
    queues with AMQP, acks can be lost while a master is failing over to a
    slave causing the new slave to re-send messages which had previously
    been acked.

    The "replay detection" code applies to more than just amqp and also has
    the appearance of a security measure (e.g. the use of the term 'nonce')
    when clearly it serves no security purpose until we actually have
    message signing.

    Revert the "replay detection" approach in favour of the more targeted
    amqp bugfix.

    Change-Id: I8b8d15835c8b4c85cd388f5df08b60ff4c74e38d

Changed in oslo:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in oslo:
milestone: none → grizzly-3
status: Fix Committed → Fix Released
Mark McLoughlin (markmc)
Changed in oslo:
milestone: grizzly-3 → none
status: Fix Released → In Progress
Mark McLoughlin (markmc)
Changed in oslo:
status: In Progress → Fix Committed
Mark McLoughlin (markmc)
Changed in oslo:
milestone: none → grizzly-rc1
Thierry Carrez (ttx)
Changed in oslo:
status: Fix Committed → Fix Released