Several patches merged as a part of:
https://review.openstack.org/#/q/status:merged+project:openstack/neutron+branch:master+topic:bp/rpc-docs-and-namespaces,n,z
broke Neutron rolling upgrades (specifically, upgrading the server(s) before the agents, or vice versa). This was done knowingly and discussed during the spec process. While we don't test a rolling upgrade scenario, there is no reason to break it knowingly. I've spoken to operators who have successfully performed such an upgrade from I to J, and it will be very surprising to them if the same doesn't work from J to K.
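The breakage comes from the introduction of RPC namespaces, a very useful concept that places RPC endpoints in separate namespaces, i.e. the same method name can be exposed in the same process as long as each instance belongs to a different namespace. The catch is that the previous release sends and listens in the root namespace, while the upgraded side expects the new namespaces: an upgraded server's endpoints only match messages addressed to their namespace, so calls from J agents are rejected, and conversely a K agent's namespaced calls don't match a J server's root-namespace endpoints.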
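To make the matching rule concrete, here is a toy model of namespace-aware dispatch. It is not oslo.messaging's dispatcher: the Endpoint and ToyDispatcher classes, the namespace strings "dhcp" and "metadata", and the method name get_info are all hypothetical. It shows that two endpoints can expose the same method in different namespaces, and that a caller sending no namespace (as a J agent would) matches neither.

```python
# Toy model of namespace-aware RPC dispatch (hypothetical; NOT the real
# oslo.messaging dispatcher). None stands for the root namespace.


class UnsupportedMethod(Exception):
    """Raised when no endpoint matches the (namespace, method) pair."""


class Endpoint:
    def __init__(self, namespace, methods):
        self.namespace = namespace  # None means the root namespace
        self.methods = methods      # dict: method name -> callable


class ToyDispatcher:
    def __init__(self, endpoints):
        self.endpoints = endpoints

    def dispatch(self, method, namespace=None, **kwargs):
        for ep in self.endpoints:
            # An endpoint only sees messages addressed to its own namespace.
            if ep.namespace == namespace and method in ep.methods:
                return ep.methods[method](**kwargs)
        raise UnsupportedMethod((namespace, method))


# Two endpoints expose the same method name in different namespaces.
server = ToyDispatcher([
    Endpoint("dhcp", {"get_info": lambda host: "dhcp:" + host}),
    Endpoint("metadata", {"get_info": lambda host: "metadata:" + host}),
])

print(server.dispatch("get_info", namespace="dhcp", host="h1"))  # dhcp:h1

try:
    # An old-style caller sends no namespace, so nothing matches.
    server.dispatch("get_info", host="h1")
except UnsupportedMethod as exc:
    print("rejected:", exc)  # rejected: (None, 'get_info')
```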
Possible solutions:
One option is to have the server listen both in the new namespaces and in the root namespace. However, this effectively puts all such methods into one big namespace, which kind of defeats the purpose of namespacing. We could instead defer the problem with a change in oslo.messaging: if a new, optional backwards_compatibility flag is passed to a Target along with a namespace, the dispatcher checks incoming messages against that namespace as well as the root namespace, and we simply stop passing the flag in the L cycle (this means we only support rolling upgrades from version N to N+1). Even with this proposed solution, supporting a scenario where an agent is upgraded before the server would require all of the K agents to implement a fallback (e.g. retrying a call in the root namespace when the server rejects the namespaced one).
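The backwards_compatibility flag and the agent-side fallback could look roughly like the toy model below. The flag, the Endpoint class, dispatch() and call_with_fallback() are hypothetical illustrations of the proposal, not existing oslo.messaging or Neutron code; None again stands for the root namespace.

```python
# Hypothetical sketch of the proposed backwards_compatibility flag and a
# K-agent fallback. Not real oslo.messaging or Neutron code.


class UnsupportedMethod(Exception):
    """Raised when no endpoint accepts the (namespace, method) pair."""


class Endpoint:
    def __init__(self, namespace, methods, backwards_compatibility=False):
        self.namespace = namespace  # None means the root namespace
        self.methods = methods      # dict: method name -> callable
        # Proposed flag: also accept calls sent to the root namespace.
        self.backwards_compatibility = backwards_compatibility

    def accepts(self, namespace):
        if namespace == self.namespace:
            return True
        return self.backwards_compatibility and namespace is None


def dispatch(endpoints, method, namespace=None, **kwargs):
    for ep in endpoints:
        if ep.accepts(namespace) and method in ep.methods:
            return ep.methods[method](**kwargs)
    raise UnsupportedMethod((namespace, method))


def call_with_fallback(endpoints, method, namespace, **kwargs):
    """K-agent fallback: try the namespaced call first, then retry in the
    root namespace if an older (J) server rejects it."""
    try:
        return dispatch(endpoints, method, namespace=namespace, **kwargs)
    except UnsupportedMethod:
        return dispatch(endpoints, method, namespace=None, **kwargs)


# K server with the flag set; J server listening only in the root namespace.
k_server = [Endpoint("dhcp", {"get_info": lambda host: "k:" + host},
                     backwards_compatibility=True)]
j_server = [Endpoint(None, {"get_info": lambda host: "j:" + host})]

print(dispatch(k_server, "get_info", host="h1"))                  # k:h1
print(call_with_fallback(j_server, "get_info", "dhcp", host="h1"))  # j:h1
```

Retrying in the root namespace only after a rejection keeps K-to-K traffic namespaced, so the fallback costs an extra round trip only while J servers remain.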
Testing:
I've been working on basic RPC tests:
https://review.openstack.org/#/q/status:open+project:openstack/neutron+branch:master+topic:rpc_tests,n,z
But I don't think such a framework will allow us to test rolling upgrades. I can't think of an alternative to actually performing one and seeing what happens (a spin on the grenade job).
Regarding testing rolling upgrades, Nova has a job for this that's currently based on nova-network. It's worth taking a look at it and seeing whether it can be tweaked to provide a version that uses Neutron.
Regarding the bug, I think it will require work in both oslo.messaging and Neutron. Here's a proposal. Take a current example:
import oslo_messaging
from neutron.common import constants

class DhcpRpcCallback(object):
    target = oslo_messaging.Target(
        namespace=constants.RPC_NAMESPACE_DHCP_PLUGIN,
        version='1.1')
1) Update the Target class to accept namespaces (a list of namespaces) as an alternative to specifying a single one. This would let you declare that this class should be considered for methods targeted at more than one namespace. In Neutron we would then set:
class DhcpRpcCallback(object):
    # Proposed API: Target does not currently accept 'namespaces'.
    target = oslo_messaging.Target(
        namespaces=[constants.RPC_NAMESPACE_DHCP_PLUGIN, None],
        version='1.1')
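A minimal sketch of what accepting a list of namespaces would mean for matching, assuming a hypothetical ToyTarget class (this is not oslo_messaging.Target) and an illustrative value "dhcp" for RPC_NAMESPACE_DHCP_PLUGIN. Both spellings are normalized to one list, and a message matches if its namespace is in that list.

```python
# Hypothetical stand-in for a Target that accepts either a single namespace
# or a list of namespaces. None stands for the root namespace.


class ToyTarget:
    def __init__(self, namespace=None, namespaces=None, version=None):
        if namespace is not None and namespaces is not None:
            raise ValueError("pass either namespace or namespaces, not both")
        # Normalize both spellings to a single list of accepted namespaces.
        if namespaces is not None:
            self.namespaces = list(namespaces)
        else:
            self.namespaces = [namespace]
        self.version = version

    def accepts(self, namespace):
        return namespace in self.namespaces


RPC_NAMESPACE_DHCP_PLUGIN = "dhcp"  # illustrative value only

old_style = ToyTarget(namespace=RPC_NAMESPACE_DHCP_PLUGIN, version="1.1")
new_style = ToyTarget(namespaces=[RPC_NAMESPACE_DHCP_PLUGIN, None],
                      version="1.1")

print(old_style.accepts(None))    # False: J agents are rejected
print(new_style.accepts(None))    # True: J agents still reach the endpoint
print(new_style.accepts("dhcp"))  # True: K agents use the namespace
```

Keeping None as an explicit list entry makes the root-namespace compatibility visible at the call site, so removing it later is a one-line change.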
2) Neutron will need an option that gets set during a live upgrade, something like juno_compat=True (the exact name doesn't matter). While it is enabled, none of the RPC clients should use the namespace. Once it is unset after the upgrade is complete, clients can start using the namespace. There is precedent for this sort of thing during an upgrade: Nova has version pinning options you have to set, so this would just be added to that process.
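On the client side, the option could be as simple as dropping the namespace while it is enabled. In this sketch juno_compat is a constructor argument of a hypothetical ToyRpcClient; in Neutron it would presumably be a configuration option, and the message dict layout is a simplified assumption rather than oslo.messaging's wire format.

```python
# Hypothetical client wrapper: while juno_compat is enabled, messages go to
# the root namespace so that J servers can still dispatch them.


class ToyRpcClient:
    def __init__(self, namespace, juno_compat=False):
        self.namespace = namespace
        self.juno_compat = juno_compat

    def effective_namespace(self):
        # During the upgrade window, fall back to the root namespace (None).
        return None if self.juno_compat else self.namespace

    def build_message(self, method, **kwargs):
        msg = {"method": method, "args": kwargs}
        ns = self.effective_namespace()
        if ns is not None:
            msg["namespace"] = ns
        return msg


during = ToyRpcClient("dhcp", juno_compat=True)
after = ToyRpcClient("dhcp", juno_compat=False)

print(during.build_message("get_info", host="h1"))
# {'method': 'get_info', 'args': {'host': 'h1'}}
print(after.build_message("get_info", host="h1"))
# {'method': 'get_info', 'args': {'host': 'h1'}, 'namespace': 'dhcp'}
```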
I appreciate you taking a look at this. I can't commit to doing this work, but I'm happy to review. As a backup plan, setting all of the NAMESPACE constants in neutron/common/constants.py to None would let us punt the problem to Liberty development.