hacking rules for callback notifications

Bug #1547283 reported by Armando Migliaccio
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Won't Fix
Wishlist
Unassigned

Bug Description

Callbacks were made available during the Liberty timeframe [1]. They have been put to use in a number of places, like [2] and [3].

Currently there's no formal convention or place where to locate callback notifications. This can represent a potential drawback where an inexperienced developer may accidentally add a duplicated notification (with exact signature) without carefully checking for existing notifications. A reviewer can also easily miss the error during code review.

This could lead to callbacks called multiple times from different places of workflow, without the callback being able to distinguish the difference: so long as callbacks are designed to be idempotent, this is fine, but if they aren't, then this could lead to unexpected results.

We should add a hacking rule that validates that a patch introducing a new notification hook, does not duplicate the exact signature of an existing notification.

Callbacks can (un)subscribe multiple times to the same resource event, and that's idempotent by design.

[1] http://docs.openstack.org/developer/neutron/devref/callbacks.html
[2] https://github.com/openstack/neutron/commit/868e67b480b08cc815d802cf950547c6b5ac0153
[3] https://github.com/openstack/neutron/commit/593b64dee4c0923fc85d6656e29a2beb27f27b17

Changed in neutron:
importance: Undecided → Wishlist
tags: added: low-hanging-fruit
Changed in neutron:
status: New → Confirmed
description: updated
description: updated
Jeff Augustine (ja224e)
Changed in neutron:
assignee: nobody → Jeffrey Augustine (ja224e)
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Jeffrey, thanks for taking this on. If you need clarifications, feel free to reach out.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/288964

Changed in neutron:
assignee: Jeffrey Augustine (ja224e) → Gabriele Cerami (gcerami)
status: Confirmed → In Progress
Revision history for this message
Gabriele Cerami (gcerami) wrote :

It's actually not really easy to get a proper 'signature'. Using only function names and filenames, it may still happens that the same callbacks is subscribed twice, even if it's called with two different names.

Revision history for this message
Gabriele Cerami (gcerami) wrote :

I tend to agree with Kevin Benton's comment. Callback manager _get_id should make sure that the same exact callback get the same exact id. Then subscribe method should check for the presence of that id before adding it to the registry and maybe raise an error. Error like this could be probably be catched as early as in unit tests, if subscribers are forced to subscribe all available callbacks.

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

I am not sure I fully understand the comment, and I must admit I only skimmed over the patch, but this isn't about callbacks being called more than once, because (un)subscription is idempotent [3,6] (I should probably clarify this in [1] as I realize I only documented this in [2,4,5]). It's the notification to the same event/resource with exact same payload that can be risky

[1] http://docs.openstack.org/developer/neutron/devref/callbacks.html
[2] https://github.com/openstack/neutron/blob/master/neutron/db/l3_db.py#L1723
[3] https://github.com/openstack/neutron/blob/master/neutron/callbacks/manager.py#L52
[4] https://github.com/openstack/neutron/blob/master/neutron/tests/unit/callbacks/test_manager.py#L60
[5] https://github.com/openstack/neutron/blob/master/neutron/tests/unit/callbacks/test_manager.py#L97
[6] https://github.com/openstack/neutron/blob/master/neutron/callbacks/manager.py#L67

Changed in neutron:
milestone: none → newton-1
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

I misspoke: 'called' being 'subscribed'

description: updated
Revision history for this message
Gabriele Cerami (gcerami) wrote :

Thanks for the clarification, rule has to deal with registry.notify calls, not registry.subscribe. I'll try to refactor and reuse the same code, trying to get a signature from the notify arguments instead. The most difficult part will probably be including the **kwargs in the signature creation.

Revision history for this message
Gabriele Cerami (gcerami) wrote :

I've hit a wall. I'm still trying to understand what to consider signature for a notify call.

Let's make an example to explain my doubts.

In neutron/plugins/ml2/plugin.py there's this notify call

kwargs = {'context': context, 'port': result}
registry.notify(resources.PORT, events.AFTER_CREATE, self, **kwargs)

Let's rule out the triggers, since we want to check notifies that come from different part
of the code, they are not part of the signature, so at least there's no reason to
evaluate 'self' in the call.

Now, suppose I'm the inexperienced developer, and I add a notify call in file
neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py

That looks like this.
kwargs = {'context': context, 'port': result, 'bridge': bridge } (yes, it's probably wrong, but I'm inexperienced)
registry.notify(resources.PORT, events.AFTER_CREATE, self, **kwargs)

Those two calls have the same signature ? I think they should, but they have different payloads
Even after understanding this, how do I evaluate **kwargs without executing the code ?
If it's not possible, and the above calls should NOT have the same signature instead, there's
no way to tell that the two **kwargs evaluate to different data.

So, is it enough to put only resources and events arguments as signature ? If not
any suggestions on how to evaluate kwargs ? (trying various combinations of compile and AST modules, but the results are poor and painfully complicated)
Once I have the **kwargs contents how do I compare the data ?

Thanks for any feedback.

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

As you point out, two signatures may look alike and yet be different due to the content of the payload, and we have a few of those already.

I imagined a whitelist would help us bring the human judgement into the hacking rule, because a fully automated solution can get pretty complicated.

Revision history for this message
Gabriele Cerami (gcerami) wrote :

How did you imagine this whitelist to be ?

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

I am thinking along these lines, not sure on the feasibility/complexity though:

* Every time we add new notification we could be explicit and add a unique token (i.e. an integer) to the payload. The token can be identify the entire notification signature

* We'd keep the list of all notifications (token -> notify location), for documentation purposes, format to figure out.

* whenever a developer runs hackings rules, we'd fail on missing token, either in the code or in the list

* On failure we can determine whether it's appropriate to reuse an already assigned token (depending on human review).

This might add some burden to the dev process, but it may end up helping us catch the potential issue ahead of time.

Revision history for this message
Gabriele Cerami (gcerami) wrote :

A callback is subscribed for a pair (resource, event). Can the rest of the payload be so different between two notify calls to the same pair to stop considering it a threat ? Is it really out of question to consider only the pair (resource, event) a signature ?
It may be difficult to manually maintain a map of arbitrary token associations. Rules have to be established and forced for token assignment too.
If we have to force developers to do something, why don't we force them to test callbacks for idempotency ? Whenever we encounter a subscribe(... callback_name .. ) we check for presence of a test_callback_idempotency_${callback_name} function in the tests.

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Idempotency is a desirable property but not necessary for ensuring the intended behavior of a callback, so I wouldn't enforce it automatically.

Perhaps this report an example where human intervention cannot be replaced by automatic testing. I personally only wanted an alert mechanism for the reviewer/developer to think carefully about what is being added.

If we can't find an acceptable one, then I suppose we can punt until we did end up facing the issue. The process may lead us to learn more about the problem.

Revision history for this message
Gabriele Cerami (gcerami) wrote :

I can only suggest to make a warning rule that uses only resource,event as signature to check when two notifies are called with the same pair, and warns the coder about possible side effects.
If this is not enough, I'll abandon the change. Maybe you should remove the low-hanging-fruit tag if you want to keep this bug open.

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Why are you suggesting to remove the tag? I personally see low-hanging-fruit bugs as an opportunity for Neutron inexperienced developers (with python experience) to learn more about the internals of Neutron and get their teeth stuck into something that is not particularly critical to the system's lifeblood so that they don't feel the pressure of causing havoc if something goes astray.

To be this bug fits the description pretty well: one gets to learn about callbacks and how certain interactions are architected without necessarily getting into the deep woods of networking.

On the contrary, consider removing yourself to give someone else the chance to pick it up and/or to allow it to expire if deemed not worth the hassle.

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Thanks for effort though! Much appreciated it.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Gabriele Cerami (<email address hidden>) on branch: master
Review: https://review.openstack.org/288964
Reason: The path to an automated detection to this problem is still clouded (and not in the good sense) by the lack of real world cases. A befitting solution will have to wait.

Revision history for this message
Gabriele Cerami (gcerami) wrote :

I didn't know the description of low-hanging-fruit, the name suggests that is something that could be reached easily, and I was imagining it described a problem which solution was easy to reach.Your description make a lot more sense. I'm abandoning and reassigning to nobody. If you have any suggestion for a medium-hanging-fruit, even involving deep woods of networking, I'll be happy to take a look. Thanks for the learning opportunity!

Changed in neutron:
assignee: Gabriele Cerami (gcerami) → nobody
status: In Progress → Confirmed
Changed in neutron:
assignee: nobody → j_king (james-agentultra)
Revision history for this message
j_king (james-agentultra) wrote :

My approach has been to generate two rules. The first enforces a strict convention for calling registry.notify and the second uses the implication of the convention to create hash signatures from the call sites to detect duplicates (class name + arguments).

This only has one drawback/limitation and that is we can only maintain state per-file due to the plugin architecture of the tool (without creating a multi-level, global cache in the factory module).

Once I have the tests passing I'll post a review.

Changed in neutron:
milestone: newton-1 → newton-2
Revision history for this message
j_king (james-agentultra) wrote :

Another way we could do this is to let the registry, which knows about notifiers at run-time, to log a warning when it detects a duplicate notifier. We could then have a gate job look out for these warnings.

The static analysis approach is proving brittle. It has to be highly restrictive and there's no guarantee it will catch all the possible ways one can work around it.

Changed in neutron:
milestone: newton-2 → newton-3
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

This probably does not turn out to be as simple as I initially anticipated.

Changed in neutron:
status: Confirmed → Incomplete
milestone: newton-3 → none
assignee: j_king (james-agentultra) → nobody
tags: removed: low-hanging-fruit
Changed in neutron:
status: Incomplete → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.