Liberty server and Kilo security group aware agent fail to refresh firewall for DHCP and router IPv6 ports
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Fix Released | High | Ihar Hrachyshka |
Bug Description
When we try to mix a Liberty server with a Kilo L2 agent, we get the following traceback in the agent log:
ERROR oslo_messaging.
TRACE oslo_messaging.
TRACE oslo_messaging.
TRACE oslo_messaging.
TRACE oslo_messaging.
TRACE oslo_messaging.
TRACE oslo_messaging.
In Kilo, the server just sent a bare notification that something had changed, and the agent reset the firewall for all devices; in Liberty, the server passes the list of devices to refresh, so that firewall updates on security group changes are more efficient.
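To make the difference concrete, here is a toy model of the agent side. It is a sketch only, not neutron's implementation: the method name `security_groups_provider_updated` is assumed to mirror the agent RPC callback, and the `devices_to_update` parameter name, the `ToyFirewallAgent` class, its `refresh_firewall` stand-in and the device names are all hypothetical. Passing `None` models the Kilo-style bare notification; passing a list models the Liberty-style targeted one.

```python
from typing import Iterable, List, Optional, Set


class ToyFirewallAgent:
    """Hypothetical L2 agent model; not neutron code."""

    def __init__(self, devices: Iterable[str]) -> None:
        self.devices: Set[str] = set(devices)
        # Each entry records which ports one refresh touched, for inspection.
        self.refreshed: List[List[str]] = []

    def refresh_firewall(self, devices: Set[str]) -> None:
        # Stand-in for reprogramming firewall rules on the given ports.
        self.refreshed.append(sorted(devices))

    def security_groups_provider_updated(
            self, devices_to_update: Optional[Iterable[str]] = None) -> None:
        if devices_to_update is None:
            # Kilo-style bare notification: reset the firewall for every port.
            self.refresh_firewall(self.devices)
        else:
            # Liberty-style notification: refresh only the listed ports
            # that this agent actually hosts.
            self.refresh_firewall(self.devices & set(devices_to_update))


agent = ToyFirewallAgent(["tap1", "tap2", "dhcp-port"])
agent.security_groups_provider_updated()               # bare: all three ports
agent.security_groups_provider_updated(["dhcp-port"])  # targeted: one port
print(agent.refreshed)  # [['dhcp-port', 'tap1', 'tap2'], ['dhcp-port']]
```

The targeted call touches one port instead of three, which is the optimization Liberty introduced.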
Missing the notification can cause all kinds of issues that boil down to ‘my firewall is not updated after a security group change’. From what I see in the code, it affects only DHCP and router IPv6 ports.
Now, since the signature of the RPC method changed (it gained the list of devices), the server requires version 1.3 of the agent endpoint, the one that knows about the new argument. If this were a regular notification directed at a specific agent, we would just use call() instead of cast() and handle the UnsupportedVersion exception by calling the method again without the device list. But since it’s a fanout cast, we can’t do that.
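The call()-versus-fanout argument can be illustrated with a self-contained model of version negotiation; no real transport is involved. It assumes the oslo.messaging-style rule that an endpoint at version X.Y accepts a message at X.Z only when Z <= Y. The `UnsupportedVersion` class is a local stand-in, and the Kilo endpoint version (1.2) and the fallback version are hypothetical values chosen for the example.

```python
from typing import List, Tuple


class UnsupportedVersion(Exception):
    """Local stand-in for oslo.messaging's exception of the same name."""


def parse(version: str) -> Tuple[int, int]:
    major, minor = version.split(".")
    return int(major), int(minor)


def is_compatible(endpoint_version: str, requested: str) -> bool:
    # Same major version, and the endpoint's minor version is at least
    # the requested one.
    e_major, e_minor = parse(endpoint_version)
    r_major, r_minor = parse(requested)
    return e_major == r_major and e_minor >= r_minor


class ToyAgentEndpoint:
    def __init__(self, name: str, version: str) -> None:
        self.name = name
        self.version = version
        self.errors: List[str] = []  # what would land in the agent log

    def dispatch(self, requested: str, **kwargs) -> str:
        if not is_compatible(self.version, requested):
            raise UnsupportedVersion(
                f"{self.name} supports {self.version}, got {requested}")
        if kwargs.get("devices_to_update") is None:
            return "refresh all"
        return "refresh subset"


def notify_with_call(agent: ToyAgentEndpoint, devices: List[str]) -> str:
    # call() returns the error to the server, which can retry with the
    # old signature (no device list) at an older, hypothetical version.
    try:
        return agent.dispatch("1.3", devices_to_update=devices)
    except UnsupportedVersion:
        return agent.dispatch("1.2")


def notify_with_fanout_cast(agents: List[ToyAgentEndpoint],
                            devices: List[str]) -> None:
    # A fanout cast has no reply: each agent handles (or fails to handle)
    # the message on its own, and the server never learns about failures.
    for agent in agents:
        try:
            agent.dispatch("1.3", devices_to_update=devices)
        except UnsupportedVersion as exc:
            agent.errors.append(str(exc))


kilo = ToyAgentEndpoint("kilo-agent", "1.2")
liberty = ToyAgentEndpoint("liberty-agent", "1.3")
print(notify_with_call(kilo, ["dhcp-port"]))     # refresh all
print(notify_with_call(liberty, ["dhcp-port"]))  # refresh subset
notify_with_fanout_cast([kilo, liberty], ["dhcp-port"])
print(kilo.errors)  # ['kilo-agent supports 1.2, got 1.3']
```

With call(), the Kilo agent still ends up refreshing (via the fallback); with the fanout cast, its only trace of the notification is the logged error, and no refresh happens.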
The solution for the upgrade issue would probably be to revert the optimization in Liberty. Since we don’t yet support upgrades that span multiple release cycles, that should be enough.
Other alternatives do not seem to work here:
- cast()ing with both the new and the old signatures would effectively disable the optimization, because the same agent would receive both versions of the method, and the old one would trigger a full firewall reset anyway;
- calling cast() with the new signature but without specifying the version would probably make the older Kilo agent crash in an even worse way (note: I need to check that locally).
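A quick simulation of the first alternative shows why it defeats the purpose. This is a hypothetical model (the class, the port names and the 100-port count are made up): a Liberty agent that understands both signatures receives the targeted message and the bare one, and the bare one still reprograms every port.

```python
from typing import List, Optional, Set


class ToyLibertyAgent:
    """Hypothetical agent that understands both the old and new signatures."""

    def __init__(self, devices: List[str]) -> None:
        self.devices: Set[str] = set(devices)
        self.refreshed_ports = 0  # total ports reprogrammed, a proxy for cost

    def security_groups_provider_updated(
            self, devices_to_update: Optional[List[str]] = None) -> None:
        if devices_to_update is None:
            targets = self.devices  # old signature: full reset
        else:
            targets = self.devices & set(devices_to_update)
        self.refreshed_ports += len(targets)


def cast_both_signatures(agent: ToyLibertyAgent, devices: List[str]) -> None:
    # The server fans out the new message and the old one; the same agent
    # receives and handles both.
    agent.security_groups_provider_updated(devices_to_update=devices)
    agent.security_groups_provider_updated()


ports = [f"port{i}" for i in range(100)]

optimized = ToyLibertyAgent(ports)
optimized.security_groups_provider_updated(devices_to_update=["port0"])
print(optimized.refreshed_ports)  # 1

both = ToyLibertyAgent(ports)
cast_both_signatures(both, ["port0"])
print(both.refreshed_ports)  # 101: the full reset dominates anyway
```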
Side note: it’s interesting that we have backward-compatible code on the agent side to accommodate older servers. I will probably remove it, since it’s not in line with the usual rolling upgrade scenarios we support, where you never run a server older than an agent in the cluster.
Changed in neutron:
importance: Undecided → High
assignee: nobody → Ihar Hrachyshka (ihar-hrachyshka)
tags: added: upgrade
tags: added: liberty-backport-potential
tags: removed: liberty-backport-potential upgrade
Hi Ihar,
thanks for looking into this.
I think we need a good long-term strategy for this kind of problem. We might want to bump the RPC version of a cast whose server side lives on the agent again in the future.
Here is a possible solution: in Liberty, we introduce the new code in the agent and bump the version there, but the neutron server, which uses the client side of the RPC, does not require the newer version and keeps sending the old one. That way, the Liberty server can work with Kilo agents. In Mitaka, we can require version 1.3, since Liberty agents will be able to handle it.
If we go this way, we should keep the backward-compatible code on the agent side to accommodate older servers, because it will still be needed when both the agents and the server run Liberty. What do you think?
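The proposal above amounts to pinning the version on the client (server) side; oslo.messaging's RPC client exposes a `version_cap` setting and a `can_send_version()` check for this kind of thing. The sketch below models the idea without oslo.messaging: `ToyServerClient`, `agent_handle` and the pre-1.3 version number 1.2 are hypothetical. A Liberty server capped below 1.3 keeps sending the bare notification, which both Kilo and Liberty agents accept; a Mitaka server capped at 1.3 sends the device list. The agent's backward-compatible branch is what keeps the Liberty-server-to-Liberty-agent case working.

```python
from typing import Dict, List, Tuple


def parse(version: str) -> Tuple[int, int]:
    major, minor = version.split(".")
    return int(major), int(minor)


class ToyServerClient:
    """Hypothetical client-side RPC API pinned to a maximum version."""

    def __init__(self, version_cap: str) -> None:
        self.version_cap = version_cap

    def can_send_version(self, version: str) -> bool:
        cap_major, cap_minor = parse(self.version_cap)
        major, minor = parse(version)
        return major == cap_major and minor <= cap_minor

    def provider_updated_message(self, devices: List[str]) -> Tuple[str, Dict]:
        # Send the optimized payload only when the cap allows 1.3;
        # otherwise fall back to the bare notification at the old version.
        if self.can_send_version("1.3"):
            return "1.3", {"devices_to_update": devices}
        return "1.2", {}


def agent_handle(agent_version: str, msg_version: str, payload: Dict) -> str:
    a_major, a_minor = parse(agent_version)
    m_major, m_minor = parse(msg_version)
    if a_major != m_major or a_minor < m_minor:
        return "UnsupportedVersion"
    # Backward-compatible branch kept on the agent: no device list means
    # the sender is an older (or pinned) server, so reset everything.
    if payload.get("devices_to_update") is None:
        return "refresh all"
    return "refresh subset"


liberty_server = ToyServerClient(version_cap="1.2")
mitaka_server = ToyServerClient(version_cap="1.3")
KILO_AGENT, LIBERTY_AGENT = "1.2", "1.3"

for server, name in ((liberty_server, "Liberty"), (mitaka_server, "Mitaka")):
    version, payload = server.provider_updated_message(["dhcp-port"])
    for agent_version, agent_name in ((KILO_AGENT, "Kilo"),
                                      (LIBERTY_AGENT, "Liberty")):
        print(name, "server ->", agent_name, "agent:",
              agent_handle(agent_version, version, payload))
```

Running it, both agents do a full refresh under the Liberty server, the Liberty agent gets the targeted refresh under the Mitaka server, and only the Mitaka-server-with-Kilo-agent pair fails, a combination that spans two cycles and is not supported anyway.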