Properly consume Neutron security group updates

Bug #1629310 reported by Neil Jerram on 2016-09-30
Affects: networking-calico
Importance: Undecided
Assigned to: Unassigned

Bug Description

Moved here from https://github.com/projectcalico/felix/issues/652:

Lukasa commented on 15 Jun 2015

Assuming https://github.com/projectcalico/felix/pull/641 is merged, we'll be relying on a fairly brittle monkeypatch to get hold of security group updates on Neutron. This is fundamentally because the ML2 model takes control out of our hands, assuming that the mechanism driver does not want access to security group rule changes.

We should aim to write code that does this properly. There are a few ways I can see of doing it:

    Become a fully-fledged Neutron plugin. My understanding is that proper plugins get access to the full set of data, which means we can properly handle SG calls.

    Write an 'agent' (may be an extension to the ML2 Mechanism Driver) that can run as a Firewall Driver. Neutron expects this to run as a proper agent, which doesn't really suit us, because we'd need to run one per machine. Painful.

    Attempt to lie to Neutron by consuming the security group AMQP RPC mechanism ourselves. This is also probably quite painful, and additionally will require us to write quite a lot of backing code.

I'm not really sure what the best approach here would be. It'd be good if @neiljerram could consider this problem a little bit; @fasaxc as well.

Edit: For clarity, I want to add a bit more explanation about what's going on here.

What we currently have is a bit of code that monkeypatches Neutron on import: whenever Calico is imported, it modifies Neutron code on the fly.

Doing this essentially pins us to an unstable interface, because the Neutron folks will rightly consider themselves able to change this code at any time without telling us. This runs the very real risk of breaking us down the road with absolutely no warning. Customers will believe they can take patches that they cannot safely take. Additionally, the maintenance burden of maintaining several different monkeypatches will be tricky.
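
To make the technique concrete, here is a runnable, stripped-down illustration of the monkeypatch pattern being described. The class below is a stub standing in for Neutron's AgentNotifierApi; nothing here imports Neutron, and the names are only meant to mirror the real patch.

```python
class AgentNotifierApi:
    """Stub standing in for Neutron's security-group RPC notifier."""
    def security_groups_rule_updated(self, context, sgids):
        return ('notified', sgids)

calls = []

# Save the original method, then replace it on the class at import time.
original = AgentNotifierApi.security_groups_rule_updated

def patched(self, context, sgids):
    calls.append(sgids)                    # Calico gets told about the update...
    return original(self, context, sgids)  # ...then normal behaviour resumes.

AgentNotifierApi.security_groups_rule_updated = patched

# Any code that later calls the method now goes through the patch:
result = AgentNotifierApi().security_groups_rule_updated(None, ['sg-1'])
```

The fragility is visible in the sketch: the patch silently assumes the class, the method name, and its signature never change upstream, and that the class is never re-imported or reinitialised after patching.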

This interface has been stable for our three currently-supported releases (Icehouse, Juno, Kilo), but there is no guarantee that it will remain stable, even within patch releases of those versions. This makes this code essentially a ticking time-bomb, waiting to make our lives difficult at a time we don't control.

To do this right we really need to find a supported interface. Right now the only appropriate supported interface for security groups is an RPC-based one that expects an agent, which is a bit annoying, but we can handle that. We're interested in whether we can get a better option, or whether we have to write an agent to consume that interface.

neiljerram commented on 15 Jun 2015

Or, I think, a 4th option: Per Kevin Benton's email at http://lists.openstack.org/pipermail/openstack-dev/2015-June/066218.html, use Neutron's callbacks registry to implement a (general) way for mechanism drivers to be notified after the forking has completed; then use that in the Calico mechanism driver to set up the CalicoNotifierProxy. Then back port all of that to all the OpenStack releases that we care about.
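
The shape of that 4th option can be sketched as follows. The registry class, resource, and event names here are simplified stand-ins, not Neutron's real API; in Neutron this role would be played by the callbacks registry with a post-fork/after-init style event.

```python
# Toy stand-in for Neutron's callback registry (option 4 sketch).
class Registry:
    def __init__(self):
        self._subscribers = {}

    def subscribe(self, callback, resource, event):
        self._subscribers.setdefault((resource, event), []).append(callback)

    def notify(self, resource, event, trigger, **kwargs):
        for cb in self._subscribers.get((resource, event), []):
            cb(resource, event, trigger, **kwargs)

registry = Registry()
installed = []

def install_notifier_proxy(resource, event, trigger, **kwargs):
    # In the real driver this is where CalicoNotifierProxy would be set up,
    # safely after Neutron has finished forking its API workers.
    installed.append((resource, event))

# The mechanism driver subscribes when it initialises...
registry.subscribe(install_notifier_proxy, 'process', 'after_init')

# ...and Neutron fires the event once forking has completed.
registry.notify('process', 'after_init', trigger=None)
```

The point of the design is that the driver no longer rewrites Neutron code; it only asks to be called back at a well-defined moment.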

fasaxc commented on 22 Jun 2015

Option 5: update Neutron to allow us to consume those events from an ML2.

Option 6: ask upstream for an architectural way to do it, or at least a recommendation?

@neiljerram maybe I'm missing something but it feels like the notifier proxy is also a bit of a monkey-patch. It just uses a different technique and it still assumes that the object in question is never reinitialised (an assumption that could be broken upstream fairly easily).

neiljerram commented on 23 Jun 2015

Agreed - but I think it's down the list compared to the other OpenStack interactions that I'm pushing on at the moment.

I forget where we are as regards any immediate problems, though. Do we have working code at the moment?

Lukasa commented on 23 Jun 2015

We do, but it "works" for a given definition of works. It's a monkeypatch, and it's pretty horrible.

fasaxc commented on 30 Jun 2015

@Lukasa, does @neiljerram have the next action here: ask upstream if there's a template we can follow?

neiljerram commented on 30 Jun 2015

@fasaxc Sorry - I actually got a steer on this yesterday. Will paste that in here:

On 29/06/15 14:08, Gal Sagie wrote:
> Yes, look at this patch: https://review.openstack.org/#/c/174588/
>
>
> On Mon, Jun 29, 2015 at 3:42 PM, Neil Jerram <<email address hidden>
> <mailto:<email address hidden>>> wrote:
>
> Hi there,
>
> For my team's networking backend, we want to catch security group
> updates in our ML2 mechanism driver code.
>
> Currently we're doing this by monkey patching the AgentNotifierApi:
>
>     # This section monkeypatches the AgentNotifierApi.security_groups_rule_updated
>     # method to ensure that the Calico driver gets told about security group
>     # updates at all times. This is a deeply unpleasant hack. Please, do as
>     # I say, not as I do.
>     #
>     # For more info, please see issues #635 and #641.
>     original_sgr_updated = rpc.AgentNotifierApi.security_groups_rule_updated
>
>     def security_groups_rule_updated(self, context, sgids):
>         LOG.info("security_groups_rule_updated: %s %s" % (context, sgids))
>         mech_driver.send_sg_updates(sgids, context)
>         original_sgr_updated(self, context, sgids)
>
>     rpc.AgentNotifierApi.security_groups_rule_updated = (
>         security_groups_rule_updated
>     )
>
> But, as the comment says, this is a hack. Is there a better way?
>
> Many thanks,
> Neil

I've looked at the patch that Gal pointed to, and it looks like it will indeed address our problem in a reasonable way.

Unfortunately the change was only merged after the Kilo release (2015.1.0). Hence we would need to do some non-trivial backporting work for Kilo, Juno and Icehouse. (Increasingly non-trivial as we go back in time, I guess.) But at least we will be good in Liberty!

I'm happy to do that backporting work, but can't easily absorb it in current sprint. Could we live with the current situation until next sprint?
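
Assuming the patch Gal pointed to adds registry notifications for security-group changes, consuming them from the mechanism driver might look roughly like this. The registry here is a simplified stand-in (plain functions and a dict), and the resource/event names are assumptions, not the patch's actual identifiers.

```python
# Simplified stand-in for a callbacks registry with security-group events.
_callbacks = {}

def subscribe(callback, resource, event):
    _callbacks.setdefault((resource, event), []).append(callback)

def notify(resource, event, **kwargs):
    for cb in _callbacks.get((resource, event), []):
        cb(resource, event, **kwargs)

received = []

def sg_rule_created(resource, event, security_group_rule_id=None, **kwargs):
    # In the Calico driver this would call mech_driver.send_sg_updates(...)
    # instead of going through the monkeypatched AgentNotifierApi method.
    received.append((resource, event, security_group_rule_id))

# The mechanism driver subscribes to the events it cares about...
subscribe(sg_rule_created, 'security_group_rule', 'after_create')

# ...and Neutron (in this sketch) fires the event when an SG rule is created.
notify('security_group_rule', 'after_create', security_group_rule_id='rule-1')
```

With this in place the driver depends only on a supported notification interface, which is exactly what the monkeypatch was standing in for.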

Lukasa commented on 30 Jun 2015

So, the current expectation here is that we'll backport this work next sprint and then build on top of it.

Lukasa commented on 8 Jul 2015

New plan.

Given that this change is going to be in Liberty we're going to do the following:

    When Liberty drops, update the mechanism driver to use the proper interface where it is available.

    Assume that the odds of the Neutron team breaking the interface we monkeypatched mid-cycle in Icehouse, Juno, or Kilo are really low, and that we can detect such a break early.

    As a result, not bother to backport the work to those releases.

That is the most efficient use of our time, so for now that's what we'll do.

neiljerram commented on 8 Jul 2015

Sounds good to me

fasaxc commented on 19 Jul

I checked the code, looks like we still do this monkeypatching, even though we're now a core plugin. Did the patch to allow us to trap SG updates in a mechanism driver ever land?
