Recent RPC namespacing breaks rolling upgrades

Bug #1430984 reported by Assaf Muller
This bug affects 2 people
Affects         Status        Importance  Assigned to   Milestone
neutron         Fix Released  High        Assaf Muller
oslo.messaging  Fix Released  Undecided   Assaf Muller

Bug Description

Several patches merged as part of this topic broke Neutron rolling upgrades (specifically, upgrading the server(s) before the agents or vice versa):
https://review.openstack.org/#/q/status:merged+project:openstack/neutron+branch:master+topic:bp/rpc-docs-and-namespaces,n,z

This was done knowingly and discussed in the spec process. While we don't test a rolling upgrade scenario, there is no reason to break it knowingly. I've spoken to operators who have successfully performed such an upgrade from I to J, and it will be very surprising to them if the same doesn't work from J to K.

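The breakage comes from the introduction of RPC namespaces, a very useful concept that places RPC endpoints in separate namespaces, i.e. the same method name may be exposed more than once in the same process as long as each copy belongs to a different namespace. A pre-upgrade peer still sends its messages in the root (null) namespace, so they no longer match the namespaced endpoints.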
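To make the failure mode concrete, here is a minimal sketch of namespace-based dispatch. It is a toy model only: `Endpoint`, `dispatch` and the `"dhcp"` namespace are hypothetical names chosen for illustration and are not the oslo.messaging classes.

```python
class Endpoint:
    """Toy stand-in for an RPC endpoint; not the oslo.messaging class."""

    def __init__(self, namespace):
        self.namespace = namespace

    def get_active_networks(self):
        return "served from namespace %r" % (self.namespace,)


def dispatch(endpoints, namespace, method):
    """Invoke method on the first endpoint whose namespace matches exactly."""
    for endpoint in endpoints:
        if endpoint.namespace == namespace and hasattr(endpoint, method):
            return getattr(endpoint, method)()
    raise LookupError("no endpoint for %s in namespace %r" % (method, namespace))


# Upgraded server: the endpoint now lives in a named namespace.
server_endpoints = [Endpoint("dhcp")]

# Upgraded agent sends in the new namespace and is served.
new_agent_result = dispatch(server_endpoints, "dhcp", "get_active_networks")

# Pre-upgrade agent still sends in the root (None) namespace and is rejected.
try:
    dispatch(server_endpoints, None, "get_active_networks")
    old_agent_rejected = False
except LookupError:
    old_agent_rejected = True
```

In this model the upgraded server has no endpoint left in the root namespace, which is the situation a not-yet-upgraded agent runs into.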

Possible solutions:
Have the server listen both in the new namespaces and in the root namespace. However, this effectively brings all such methods back into one big namespace, which defeats the purpose of namespacing. We could limit this to a transition period by changing Oslo messaging so that, if a new, optional backwards_compatibility flag is passed to a target along with a namespace, the dispatcher checks the root namespace as well as the given one. We would then simply stop passing the flag in the L cycle (which means we only support rolling upgrades from version N to N+1). Even with this solution, supporting the scenario where an agent is upgraded before the server would require all of the K agents to implement a fallback.
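A rough sketch of how such a flag could behave, assuming a toy `Target` class written for this example. The `backwards_compatibility` argument is the hypothetical flag described above, and `accepts` is an invented helper; neither is an existing oslo.messaging API.

```python
class Target:
    """Toy target; backwards_compatibility is the hypothetical flag from the text."""

    def __init__(self, namespace=None, backwards_compatibility=False):
        self.namespace = namespace
        self.backwards_compatibility = backwards_compatibility

    def accepts(self, namespace):
        """Return True if a message sent in `namespace` should be dispatched."""
        if namespace == self.namespace:
            return True
        # With the flag set, also accept messages sent in the root namespace.
        return self.backwards_compatibility and namespace is None


# K cycle: the flag is passed, so J agents (root namespace) still work.
k_target = Target("dhcp", backwards_compatibility=True)

# L cycle: the flag is dropped, so only the named namespace is accepted.
l_target = Target("dhcp")
```

Under these assumptions a K server accepts both J and K agents, while an L server accepts only agents that already send in the named namespace, which matches the N to N+1 support window mentioned above.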

Testing:
I've been working on basic RPC tests:
https://review.openstack.org/#/q/status:open+project:openstack/neutron+branch:master+topic:rpc_tests,n,z

But I don't think such a framework will allow us to test rolling upgrades. I can't think of an alternative to actually performing one and seeing what happens (A spin on the grenade job).

Assaf Muller (amuller)
Changed in neutron:
assignee: nobody → Assaf Muller (amuller)
Assaf Muller (amuller)
description: updated
description: updated
Revision history for this message
Russell Bryant (russellb) wrote :

Regarding testing rolling upgrades, Nova has a job for this that's currently based on nova-network. Take a look at it and see about tweaking it to provide a version that uses Neutron.

Regarding the bug, I think it will require work in both oslo.messaging and neutron. Here's a proposal ... take a current example:

    class DhcpRpcCallback(object):

        target = oslo_messaging.Target(
            namespace=constants.RPC_NAMESPACE_DHCP_PLUGIN,
            version='1.1')

1) update the Target class to accept namespaces (a list of namespaces) as an alternative to specifying a single one. This would allow you to specify that this class should be considered for methods targeted at more than one namespace. In Neutron we would then set:

    class DhcpRpcCallback(object):

        target = oslo_messaging.Target(
            namespaces=[constants.RPC_NAMESPACE_DHCP_PLUGIN, None],
            version='1.1')

2) Neutron will need an option that gets set during a live upgrade... something like juno_compat=True ... whatever. When this is enabled, all of the rpc clients should not use the namespace. When it gets unset after the upgrade is complete, clients can start using the namespace. There is precedent for this sort of thing during an upgrade. There are version pinning options you have to set in nova so you'd just add this to that process.
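The client side of step 2 could look roughly like the following. This is only a sketch: `client_namespace` is a hypothetical helper, and `juno_compat` is the placeholder option name used above, not an existing Neutron setting.

```python
def client_namespace(configured_namespace, juno_compat):
    """Return the namespace an RPC client should put on outgoing messages."""
    # While the compatibility option is set, send in the root namespace so
    # servers that predate namespacing still accept the message.
    return None if juno_compat else configured_namespace


# During the live upgrade the operator sets the option.
during_upgrade = client_namespace("dhcp", juno_compat=True)

# Once every node is upgraded the option is unset again.
after_upgrade = client_namespace("dhcp", juno_compat=False)
```

This only works together with step 1, since an already upgraded server has to keep accepting root-namespace messages while the option is set.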

I appreciate you taking a look at this. I can't commit to doing this work, but I'm happy to review. As a backup plan, setting all of the NAMESPACE constants in neutron/common/constants.py to None would let us punt the problem to be addressed during Liberty development.

Assaf Muller (amuller)
Changed in oslo.messaging:
assignee: nobody → Assaf Muller (amuller)
Revision history for this message
Assaf Muller (amuller) wrote :

I'm with you on 1.

About 2, maybe instead of a configuration option, I'd implement a decorator for RPC methods used by clients: if the message is to be sent in a namespace, it is sent there, and if an unsupported version exception is caught, the message is resent (exactly once) in the null namespace.

Cons: Every message is sent twice during an upgrade for upgraded agents IF they were upgraded *before* the server (I don't think this is common practice?).

Pros: Simpler for the admin.

I could also enable Oslo messaging to send a message in multiple namespaces, but, meh. Doesn't make too much sense.
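A minimal sketch of the decorator idea, under stated assumptions: `UnsupportedVersion` is a toy exception defined here, and `fallback_to_root_namespace`, `call` and `old_server` are hypothetical names. A real client would go through the messaging library rather than calling a function directly.

```python
import functools


class UnsupportedVersion(Exception):
    """Toy stand-in for the error a server raises when it cannot dispatch."""


def fallback_to_root_namespace(func):
    """Retry a namespaced RPC call exactly once in the root namespace."""

    @functools.wraps(func)
    def wrapper(send, method, namespace):
        try:
            return func(send, method, namespace)
        except UnsupportedVersion:
            if namespace is None:
                raise  # already in the root namespace; nothing to fall back to
            return func(send, method, None)

    return wrapper


@fallback_to_root_namespace
def call(send, method, namespace):
    return send(method, namespace)


attempts = []


def old_server(method, namespace):
    """Pre-upgrade server: only understands the root namespace."""
    attempts.append(namespace)
    if namespace is not None:
        raise UnsupportedVersion(method)
    return "ok"


result = call(old_server, "get_active_networks", "dhcp")
```

The `attempts` list records both sends, which is the doubled traffic listed under the cons above.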

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (master)

Fix proposed to branch: master
Review: https://review.openstack.org/163673

Changed in oslo.messaging:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/163676

Revision history for this message
Russell Bryant (russellb) wrote :

Regarding the exception handling in clients, that will only work for call(), not cast(). You can't rely on any sort of response coming back about whether it worked or not in the general solution here, I'm afraid.
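The difference can be sketched with a toy model in which `call` returns the server's answer or error and `cast` is fire-and-forget. `OldServer`, `call` and `cast` here are hypothetical stand-ins written for this example, not the real client methods.

```python
class OldServer:
    """Pre-upgrade server that only accepts the root (None) namespace."""

    def __init__(self):
        self.handled = []

    def receive(self, method, namespace):
        if namespace is not None:
            raise LookupError("unknown namespace %r" % (namespace,))
        self.handled.append(method)
        return "ok"


def call(server, method, namespace):
    """Blocking request: the server's error propagates to the caller."""
    return server.receive(method, namespace)


def cast(server, method, namespace):
    """Fire-and-forget: the server-side failure never reaches the caller."""
    try:
        server.receive(method, namespace)
    except LookupError:
        pass  # the failure stays on the server side in this model
    return None


server = OldServer()

try:
    call(server, "get_active_networks", "dhcp")
    call_failed_visibly = False
except LookupError:
    call_failed_visibly = True

cast_result = cast(server, "report_state", "dhcp")
```

In this model the cast is silently dropped and the caller sees nothing it could react to, so a catch-and-resend fallback has no trigger.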

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/163676
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=0f0b8cfe535dc306e8acd3b190a40516ddb84280
Submitter: Jenkins
Branch: master

commit 0f0b8cfe535dc306e8acd3b190a40516ddb84280
Author: Assaf Muller <email address hidden>
Date: Wed Mar 11 22:11:33 2015 -0400

    Stop using RPC namespace to unbreak rolling upgrades

    This is a temporary patch until we get an Oslo messaging
    release that supports Targets with multiple namespaces:
    https://review.openstack.org/#/c/163673/

    Change-Id: I96e01c00991a9d8602ebc89dbad5206b805c67eb
    Related-Bug: #1430984

Changed in neutron:
milestone: none → kilo-3
Revision history for this message
Assaf Muller (amuller) wrote :

Gah, you're right Russell... Now I see the point of (yet another) configuration option.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (master)

Reviewed: https://review.openstack.org/163673
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=3be95adcebfef6e862eda2827a216048b4e9007a
Submitter: Jenkins
Branch: master

commit 3be95adcebfef6e862eda2827a216048b4e9007a
Author: Assaf Muller <email address hidden>
Date: Wed Mar 11 21:39:54 2015 -0400

    Add support for multiple namespaces in Targets

    In order for projects to use the namespace property of Targets
    in a backwards compatible way (To support rolling upgrades),
    a Target may now belong to more than a single namespace (i.e.
    'namespace1' and None). This way, if the server is upgraded first,
    the version that introduces namespaces to a project will place
    the server RPC methods in ['some_namespace', None]. Pre-upgrade
    agents will send messages in the null namespace while post-upgrade
    agents will send messages in 'some_namespace', and both will be
    accepted.

    Change-Id: I713fe9228111c36aa3f7fb95cbd59c99100e8c96
    Closes-Bug: #1430984

Changed in oslo.messaging:
status: In Progress → Fix Committed
Assaf Muller (amuller)
Changed in neutron:
status: New → In Progress
Kyle Mestery (mestery)
Changed in neutron:
milestone: kilo-3 → kilo-rc1
Revision history for this message
Assaf Muller (amuller) wrote :

Quick update: https://review.openstack.org/#/c/163676/ was merged, which unbreaks rolling upgrades. A follow-up fix requires a new version of Oslo messaging (with this patch: https://review.openstack.org/#/c/163673/) to allow Neutron to listen on two namespaces and support a smooth transition.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/166349

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

I've backported the fix to stable/kilo for oslo.messaging. Doug told me the team is considering another release, so it has a chance to get in.

Kyle Mestery (mestery)
Changed in neutron:
importance: Undecided → High
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (stable/kilo)

Reviewed: https://review.openstack.org/166349
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=9b14d1aa8aab0209b48cd706b771b9b55d0fece2
Submitter: Jenkins
Branch: stable/kilo

commit 9b14d1aa8aab0209b48cd706b771b9b55d0fece2
Author: Assaf Muller <email address hidden>
Date: Wed Mar 11 21:39:54 2015 -0400

    Add support for multiple namespaces in Targets

    In order for projects to use the namespace property of Targets
    in a backwards compatible way (To support rolling upgrades),
    a Target may now belong to more than a single namespace (i.e.
    'namespace1' and None). This way, if the server is upgraded first,
    the version that introduces namespaces to a project will place
    the server RPC methods in ['some_namespace', None]. Pre-upgrade
    agents will send messages in the null namespace while post-upgrade
    agents will send messages in 'some_namespace', and both will be
    accepted.

    Change-Id: I713fe9228111c36aa3f7fb95cbd59c99100e8c96
    Closes-Bug: #1430984
    (cherry picked from commit 3be95adcebfef6e862eda2827a216048b4e9007a)

tags: added: in-stable-kilo
Changed in oslo.messaging:
milestone: none → 1.8.1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-rc1 → 2015.1.0