notifier endless loops in is_primitive

Bug #934575 reported by Chris Behrens
30
This bug affects 4 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Medium
Russell Bryant

Bug Description

The recent change for rpc connection pool stuff (https://review.openstack.org/#change,3893) unfortunately broke notifications in the manager when exceptions are raised.

The notifier payload includes the 'context' used for the manager call. Unfortunately this is typically an RpcContext which now includes the rpc connection_pool instance.

Debug output:

working on {'connection_cls': <class 'nova.rpc.impl_kombu.Connection'>, 'current_size': 1, 'order_as_stack': True, 'free_items': deque([<nova.rpc.impl_kombu.Connection object at 0x389aa50>]), 'min_size': 0, 'max_size': 30, 'channel': <LightQueue at 0x3592a90 maxsize=0>} (level=2)

This is the connection pool. unfortunately 'deque' has a list of connections... and kombu's Connection class has an attribute 'consumer_num' which is: itertools.count(). When to_primitive() reaches this.... it spins forever iterating on it.

I think there are multiple problems here.

1) to_primitive tries to recurse only so many levels... and there are a number of recursive calls that do not increase the 'level'.
2) We probably shouldn't store the connection_pool in the rpc context.

Revision history for this message
Russell Bryant (russellb) wrote :

Since I introduced the bug, I'll assign this to myself and work on it first thing this week.

Changed in nova:
assignee: nobody → Russell Bryant (russellb)
status: New → Confirmed
importance: Undecided → Medium
milestone: none → essex-4
Revision history for this message
Chris Behrens (cbehrens) wrote :

Thanks! My initial instinct was to not store connection_pool in the context but to just pass it as an arg to msg_reply.... however... that wouldn't work for the case where you can do a context.reply() in a manager. Now... we don't use that behavior currently, though... and I wonder how useful it is. If we can ditch the context.reply() support, the fix is rather easy.

Maybe you can come up with something different.

Revision history for this message
Chris Behrens (cbehrens) wrote :

Er, s/msg_reply/ctxt.reply() in ProxyCallback.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/4356

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
Monsyne Dragon (mdragon) wrote :

it should also be noted that there is no reason to include the context in the notifications. the fact it does was pretty much accidental.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/4356
Committed: http://github.com/openstack/nova/commit/2a9b66c3dba5f7fb13c7b4e7442eae2bc5dbc130
Submitter: Jenkins
Branch: master

commit 2a9b66c3dba5f7fb13c7b4e7442eae2bc5dbc130
Author: Russell Bryant <email address hidden>
Date: Tue Feb 21 11:33:51 2012 -0500

    Don't store connection pool in RpcContext.

    Fix bug 934575.

    As Chris pointed out in the bug, there is a downside to this approach to
    fixing the bug, in that a manager will no longer be able to use
    context.reply(). However, it wasn't being used at all, and it's no loss
    in functionality. A remote method can still return multiple values (in
    response to a multicall()) by using yield.

    Change-Id: I0e5aff2e8a40ffd8390c0e19d89dd17e60a74130

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: essex-4 → 2012.1
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.