Responses mismatch requests resulting in ValueError being raised.

Bug #686048 reported by Marek Kowalski
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
txAMQP
Invalid
Undecided
Unassigned

Bug Description

The problem here is quite complex. I have been testing the following scenario.

1. Connect to RabbitMQ.
2. Open two channels (with id 1, 2).
3. For both channels perform the following operations:
Deferred chain I:
a) queue_declare durable=True (each channel uses different name)
b) basic_consume name=<matches the queue>, no_ack=False
c) queue consumer_tag=<matches resp.consumer_tag>
Deferred chain II:
a) exchange_declare type=direct nowait=False durable=True name='lobby' (same for both channels)
b) queue_bind nowait=False, exchange='lobby', queue=<matches queue for each channel>

Steps above basicly creates the setup for some more stuff I want to test. This usually works. But once for a while there is an exception raised inside the txamqp library. Please see the attached diff "fix" for details. The frequency of the failures is different depending on the machine speed. Also increasing number of channels makes the test fail more often. Unfortunately I cannot separate the failing testcase - it depends on rest of the project too much to share.

The problem requires some special timing: steb Ib) is waiting for response, but than the IIa) sends the request. Once upon a time the responses for the requests will get confused resulting in a crash.

You are probably thinking now 'just connect both deferred chains and stop bugging me!'. I have made such tests, and this fixes the problem at hand. But it will pop up later, I just don't want to synchronize all the action being performed on the channel.
I consider this a bug in txamqp.
Putting things simple: the design assumes that RabbitMQ sends responses to the channel in the same order the requests has been sent. This assumption is not met, the design should be updated.

I'm attaching the diff with the "fix" (citation marks denotes how dirty it is) I made to get the scenario working. The solution is far from perfect, please take a look at the inline comments.

Revision history for this message
Marek Kowalski (kowalski0123) wrote :
Revision history for this message
Esteve Fernandez (esteve) wrote :

Hi,

I suffered from this problem too, but it happened to be a problem in the code calling txAMQP, not txAMQP itself. In section 2.2.1 of the AMQP 0.9.1 specification [1], methods are grouped in two classes: synchronous and asynchronous. Protocol methods that expect a response (e.g. Queue.declare, Exchange.delete, etc.) must be issued synchronously (i.e. wait for its response) and cannot be pipelined. That means that you'll have to synchronize your methods so that you don't issue a command before the response for the previous command has been processed.

In our case, this was caused by a missing "yield" in an inlineCallbacks decorated function. Given that you can't provide an isolated test, if you send me your code (it'll be treated confidentially), I can have a look at it.

Cheers.

1 - http://www.amqp.org/confluence/download/attachments/720900/amqp0-9-1.pdf?version=1&modificationDate=1227526523000

Revision history for this message
Marek Kowalski (kowalski0123) wrote :

Hi Esteve,

Thanks for your reply. Well, about sending the code I would have to ask my boss first. ;) I will get back to it.

I do know how to synchronize method calls inside my communication layer. In fact I can make every call synchronous. But, of course, I would loose on performance and it just doesn't seem elegant.
The point I was making submitting the bug reprot, was that I feel I shouldn't be forced to take care about such subtleties. In my opinion this synchronization should be managed by txAMQP itself.

Maybe AMQChannel.invoke method method should acquire() DeferredLock in case the method being called is synchronous and release() it when it's done ?

Cheers,
MK

Revision history for this message
Esteve Fernandez (esteve) wrote : Re: [Bug 686048] Re: Responses mismatch requests resulting in ValueError being raised.

Hi

On Wed, Dec 8, 2010 at 5:21 PM, Marek Kowalski
<email address hidden> wrote:
> Thanks for your reply. Well, about sending the code I would have to ask
> my boss first. ;) I will get back to it.

No problem, just trying to help :-)

> I do know how to synchronize method calls inside my communication layer. In fact I can make every call synchronous. But, of course, I would loose on performance and it just doesn't seem elegant.
> The point I was making submitting the bug reprot, was that I feel I shouldn't be forced to take care about such subtleties. In my opinion this synchronization should be managed by txAMQP itself.

txAMQP already handles serialization so that multiple requests are
sent in the right order. However adding a second layer to it increases
its complexity, and adding a DeferredLock to it would create yet
another Deferred for every method call, which seriously hampers
performance.

Also, I'm not so sure about your concerns regarding performance with
the callee issuing one control method at a time, after all you're
doing this during the setup of your application, unless you create
tons of consumers at several times while the application is running
(which you shouldn't be doing at all). You only need to call
queue_declare, basic_consume, etc. once. Also, basic_deliver, which is
called often during the application lifecycle, doesn't expect a
response. And listening on a queue for messages is inherently
asynchronous.

Cheers.

Revision history for this message
Marek Kowalski (kowalski0123) wrote :

Hey,

The reason I don't want to synchronize the calls in my messaging layer is that I would have to know which methods are synchronous. I can read the spec file and hardcode this knowledge into my code but I would rather not do this. The layer using the library should not be concerned about its implementation details, it is just a good design principle to follow.

I agree the idea with DeferredLock is missed, fact is that if I knew how to fix it cleanly myself we wouldn't be having this discussion. So maybe lets attack this problem from the different side ?

The response is fetched from TimeoutDefferedQueue, which is implemented inside the txAMQP. Maybe lets substitute it with something which would inherit from this class ? Instead of calling get() method, lets use get(id), where id is a index of response method we are waiting for. It will fire the deferred only for selected message. This seems to be clearer solution than re-putting message back to the queue, which I'm doing know (I'm talking about the diff attached). Although I think that such approach may just allow calling more than one synchronous message in time, which I consider a nice feature. What weak sides of this approach do you see ?

One more issue nags me... What would happen if I start the synchronous call and receive the response for get() (the one called with the consumer_tag) while waiting for response. Is this situation different than the one we are discussing?

Cheers,
MK

Revision history for this message
Esteve Fernandez (esteve) wrote :

Hi Marek,

On Wed, Dec 8, 2010 at 7:46 PM, Marek Kowalski
<email address hidden> wrote:
> The reason I don't want to synchronize the calls in my messaging layer
> is that I would have to know which methods are synchronous. I can read
> the spec file and hardcode this knowledge into my code but I would
> rather not do this.

That's fine, you don't need to memorize the spec (although I strongly
recommend you to have a look at it ;-)), and that's the reason why all
methods return a Deferred, even if they don't expect a response (e.g.
basic_deliver returns an already-fired Deferred whose result is None),
so you can issue any method without worrying whether it's synchronous
or not.

> The layer using the library should not be concerned
> about its implementation details, it is just a good design principle to
> follow.

No, txAMQP aims to be a direct translation of the AMQP spec to
Twisted, it follows the exact same design as py-amqplib and Qpid
(actually, txAMQP started as a patch against Qpid, before forking it).
This has the advantage of allowing users to build their own
higher-level libraries on top of it, if they deem so. For example, the
University of California San Diego, uses txAMQP and implemented an
adapter for Carrot, a high-level AMQP library [1]

> I agree the idea with DeferredLock is missed, fact is that if I knew how
> to fix it cleanly myself we wouldn't be having this discussion. So maybe
> lets attack this problem from the different side ?

Sorry if I sound obtuse, but I can't see where's the problem in txAMQP
that needs fixing. Maybe you could provide an example of your problem?

> The response is fetched from TimeoutDefferedQueue, which is implemented
> inside the txAMQP. Maybe lets substitute it with something which would
> inherit from this class ? Instead of calling get() method, lets use
> get(id), where id is a index of response method we are waiting for. It
> will fire the deferred only for selected message. This seems to be
> clearer solution than re-putting message back to the queue, which I'm
> doing know (I'm talking about the diff attached). Although I think that
> such approach may just allow calling more than one synchronous message
> in time, which I consider a nice feature. What weak sides of this
> approach do you see ?
>
> One more issue nags me... What would happen if I start the synchronous
> call and receive the response for get() (the one called with the
> consumer_tag) while waiting for response. Is this situation different
> than the one we are discussing?

I'm sorry, I'm afraid I didn't understood you. Could you elaborate it
with an example?

Thanks!

1 - http://oceanobservatories.org/spaces/display/CIDev/Magnet+Development+environment

Revision history for this message
Marek Kowalski (kowalski0123) wrote :

Ok, maybe I went to far. To explain more I shall cite a paragraph from the amqp spec you have attached earlier (section 2.2.1):

'''
To make method processing simple, we define distinct replies for each synchronous request. That is, no
method is used as the reply for two different requests. This means that a peer, sending a synchronous
request, can accept and process incoming methods until getting one of the valid synchronous replies. This
differentiates AMQP from more traditional RPC protocols.
'''

The problem in txAMQP, which I think needs fixing, is that it assumes that the first incoming message is the response for the synchronous call. This is not a case according to specification nor to the actual behavior of the RabbitMQ server. So I propose fixing it with using more advanced implementation of the DeferredQueue, which would allow specifying (during calling get) the list of method id's which should fire the Deferred. What do you think ?

Just forget the last paragraph (with receiving messages) after digging into spec I see it is different form of communication so it isn't a case.

Cheers!
MK

Revision history for this message
Esteve Fernandez (esteve) wrote :

Hi Marek,

On Thu, Dec 9, 2010 at 12:05 PM, Marek Kowalski
<email address hidden> wrote:
> '''
> To make method processing simple, we define distinct replies for each synchronous request. That is, no
> method is used as the reply for two different requests. This means that a peer, sending a synchronous
> request, can accept and process incoming methods until getting one of the valid synchronous replies. This
> differentiates AMQP from more traditional RPC protocols.
> '''
>
> The problem in txAMQP, which I think needs fixing, is that it assumes
> that the first incoming message is the response for the synchronous
> call. This is not a case according to specification nor to the actual
> behavior of the RabbitMQ server. So I propose fixing it with using more
> advanced implementation of the DeferredQueue, which would allow
> specifying (during calling get) the list of method id's which should
> fire the Deferred. What do you think ?

I see, my interpretation (which could be wrong) and also Qpid's and
py-amqplib's is that a client may receive an incoming method at any
time and in between replies. For example, a client must be able to
process a basic_deliver command, even if the client is waiting for a
response for a queue_declare. This is supported by txAMQP by having
two different queues: responses and incoming (see
AMQChannel#dispatch), so that even if you're waiting for a response
for a synchronous method on the first queue, messages are still
delivered asynchronously on the second queue.

However, I don't agree with you that neither the spec or RabbitMQ
enforce ordering of the responses. Actually, the fact that the spec
calls them "synchronous" is because clients must wait for a response
before issuing another command. This is enforced in txAMQP by putting
commands into a queue before they are sent, however it's not obvious
from a non-asynchronous POV because calls return immediately and the
client code must wait for the returned Deferred to be fired (i.e. you
can't issue commands concurrently)

For a more detailed explanation of how an AMQP client works, please
see [1]. More details regarding ordering can be found in item 6 of
"Design Considerations" and item 2 of "Layering".

Cheers.

1 - http://hopper.squarespace.com/blog/2008/6/21/build-your-own-amqp-client.html

Revision history for this message
Marek Kowalski (kowalski0123) wrote :

Hi Esteve,
Unfortunately this doesn't actually answer my arguments. Let's leave it.
I will fix myself the way I described. Are you interested in merging the patch after I'm done ?

Cheers,
MK

Changed in txamqp:
status: New → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.