[RFE] Make controllers with different lists of supported API extensions behave identically

Bug #1672852 reported by Ihar Hrachyshka
Affects: neutron
Status: Won't Fix
Importance: Wishlist
Assigned to: Unassigned

Bug Description

The idea is to make controllers behave identically at the API layer even when they support different lists of API extensions, whether because they run different major versions or because they use different configuration files.

The primary use case is a rolling upgrade of the controllers, when different major versions are running at the same time and likely serving API requests in round-robin fashion behind a frontend load balancer. If version N exposes extensions A,B,C,D, while N+1 exposes A,B,C,D,E, then during the upgrade, while both versions are running, the API /extensions/ endpoint should return [A,B,C,D]. After all controllers reach the new major version, they can switch to [A,B,C,D,E].
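
To illustrate the intended semantics, a minimal sketch (plain Python, purely illustrative) of what the mixed-version window should expose:

    # While controllers on versions N and N+1 both serve requests,
    # only the extensions common to both should be advertised.
    version_n = {"A", "B", "C", "D"}
    version_n1 = {"A", "B", "C", "D", "E"}

    exposed_during_upgrade = sorted(version_n & version_n1)  # ['A', 'B', 'C', 'D']
    exposed_after_upgrade = sorted(version_n1)               # ['A', 'B', 'C', 'D', 'E']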

This proposal implies that controller services are mutually aware of each other and of each other's lists of supported extensions. That awareness would be achieved by storing the lists in a new servers table, similar to the agents tables we already have.
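
For illustration only, such a table could look roughly like the following (column names and types are assumptions, not an agreed-upon schema):

    import sqlalchemy as sa

    metadata = sa.MetaData()

    # Hypothetical layout for the proposed servers table.
    servers = sa.Table(
        'servers', metadata,
        sa.Column('host', sa.String(255), primary_key=True),
        # Comma-separated aliases of the extensions this controller loads.
        sa.Column('supported_extensions', sa.Text(), nullable=False),
        # Refreshed periodically so that dead controllers can be aged out.
        sa.Column('heartbeat_timestamp', sa.DateTime(), nullable=False),
    )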

On service startup, controllers will discover information about other controllers from the table and load only those extensions that are supported by all controller peers. We may also introduce a mechanism where a signal triggers a reload of extensions based on the current table state, or a periodic reloading thread that looks at the table, e.g. every 60 seconds. (An alternative would be discovering that info on each API request, but that would be too expensive.)
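
As a rough sketch of that mechanism (all names are hypothetical; fetch_peer_lists and reload_extensions stand in for code that does not exist in neutron):

    import threading
    import time

    RELOAD_INTERVAL = 60  # seconds, as suggested above

    def common_extensions(peer_extension_lists):
        """Intersect the extension lists reported by all controller peers."""
        common = None
        for exts in peer_extension_lists:
            common = set(exts) if common is None else common & set(exts)
        return common or set()

    def start_periodic_reload(fetch_peer_lists, reload_extensions):
        """Re-evaluate the common extension set every RELOAD_INTERVAL seconds.

        fetch_peer_lists would read the servers table; reload_extensions
        would re-plug the resulting list into the running API server.
        """
        def _loop():
            while True:
                reload_extensions(common_extensions(fetch_peer_lists()))
                time.sleep(RELOAD_INTERVAL)

        thread = threading.Thread(target=_loop, daemon=True)
        thread.start()
        return thread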

This proposal does not handle the case where we drop an extension within a single cycle (like replacing the timestamp extension with timestamp_core). We may need to handle those cases by some other means (the easiest being to disallow such drastic in-place replacement of attribute formats).

Changed in neutron:
assignee: nobody → Ihar Hrachyshka (ihar-hrachyshka)
importance: Undecided → Wishlist
tags: added: rfe
Revision history for this message
Dolph Mathews (dolph) wrote :

As an aside, many of the rolling upgrade conversations with deployers have involved abandoning round-robin (which is currently the typical load balancing strategy) in favor of sticky sessions, which (in the base case, at least) prevents clients from seeing API extensions appear and disappear "randomly." Disregarding the other upsides and downsides of sticky sessions, that choice may mitigate the impact of this issue.

The proposal here describes a clustering behavior, which is (in my experience) relatively complicated (versus a non-clustered service), difficult to get correct (what controls the servers table, and what happens when it fails?), and complicates the deployer experience (order of operations, recovering from failure modes, etc).

Is there any reason why a simpler approach along the lines of feature flags would not solve the same issue? For example, if you assert that new features (i.e. API extensions) are exposed not automatically as a result of the upgrade process, but only through configuration changes, then they could be deployed via canary deployment processes intended explicitly to roll out and test new features, rather than appearing automatically along with upgraded code.
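
Sketched in oslo.config terms, such a flag could look something like this (the option name and its semantics are hypothetical, not something neutron provides):

    from oslo_config import cfg

    opts = [
        cfg.ListOpt('exposed_api_extensions',
                    default=[],
                    help='API extension aliases to expose. Extensions shipped '
                         'by an upgrade stay hidden until added here.'),
    ]
    cfg.CONF.register_opts(opts)

    def is_extension_exposed(alias):
        # An empty list could mean "expose everything", keeping today's behaviour.
        exposed = cfg.CONF.exposed_api_extensions
        return not exposed or alias in exposed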

tags: added: api
Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

> Disregarding the other upsides and downsides of sticky sessions, that choice may mitigate the impact of this issue.

Yes, that would solve extensions flipping for the same requester. It will not solve the issue where a feature is activated through a new service but requires cooperation from an old one. Let's say Nova detected the presence of the multiple port bindings API extension and live migrated an instance using that new extension. Then another Nova node talks to an old Neutron node, fetches the port binding for the port, and live migrates the same port using the old API. Will it cooperate properly? That probably depends on how the new extension is modeled, and we may try to solve it without this mechanism.

> Is there any reason why a simpler approach along the lines of feature flags would not solve the same issue?

Eventually that results in a bunch of options to maintain. Or do you mean a single option that would guard all new extensions and that you would bump after the whole cluster is upgraded? In a way, that sounds like API version pinning, which would imply some form of API versioning in the first place. That is something we never actually accepted as a way forward in the Neutron community, and it would be hard to implement considering we don't fully control what is loaded into neutron-server as API definitions.

I accept the complexity concerns though, and we may need to consider punting the proposal for this cycle.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron-specs (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/451993

Changed in neutron:
status: New → Triaged
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron-specs (master)

Change abandoned by Ihar Hrachyshka (<email address hidden>) on branch: master
Review: https://review.openstack.org/451993
Reason: I am not going to pursue it this cycle. I will focus on OVO transition and neutron-db-manage CLI.

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

With the late OSIC flip-offs, I am not sure this is the best use of my time right now. I am going to unassign myself and suggest that we don't move forward with approving the proposal.

Changed in neutron:
assignee: Ihar Hrachyshka (ihar-hrachyshka) → nobody
Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

This is a good idea in general, but it doesn't fit the plans of the existing upgrades team contributors. We may get back to it later in the life of neutron, after we close gaps in the database layer. In the meantime, there are solutions that resolve some of the potential issues with rolling upgrades, such as sticky load balancers.

We can revisit the RFE later.

tags: added: rfe-postponed
removed: rfe
Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Bug closed due to lack of activity; please feel free to reopen if needed.

Changed in neutron:
status: Triaged → Won't Fix