[RFE] Make controllers with different lists of supported API extensions behave identically

Bug #1672852 reported by Ihar Hrachyshka
Affects: neutron
Status: Won't Fix
Importance: Wishlist
Assigned to: Unassigned

Bug Description

The idea is to make controllers behave identically at the API layer even when they support different lists of API extensions, whether because they run different major versions or because they use different configuration files.

The primary use case is a rolling upgrade of the controllers, when different major versions are running at the same time and likely serving API requests in round-robin fashion behind a frontend load balancer. If version N exposes extensions A,B,C,D, while N+1 exposes A,B,C,D,E, then during the upgrade, while both versions are running, the API /extensions/ endpoint should return [A,B,C,D]. After all controllers reach the new major version, they can switch to [A,B,C,D,E].
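
To illustrate the intended semantics, a minimal sketch (plain Python, purely illustrative) of what the mixed-version window should expose:

    # While controllers on versions N and N+1 both serve requests,
    # only the extensions common to both should be advertised.
    version_n = {"A", "B", "C", "D"}
    version_n1 = {"A", "B", "C", "D", "E"}

    exposed_during_upgrade = sorted(version_n & version_n1)  # ['A', 'B', 'C', 'D']
    exposed_after_upgrade = sorted(version_n1)               # ['A', 'B', 'C', 'D', 'E']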

This proposal implies that controller services are mutually aware of each other and of each other's lists of supported extensions. That awareness would be achieved by storing the lists in a new servers table, similar to the agents tables we already have.
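
For illustration only, such a table could look roughly like the following (column names and types are assumptions, not an agreed-upon schema):

    import sqlalchemy as sa

    metadata = sa.MetaData()

    # Hypothetical layout for the proposed servers table.
    servers = sa.Table(
        'servers', metadata,
        sa.Column('host', sa.String(255), primary_key=True),
        # Comma-separated aliases of the extensions this controller loads.
        sa.Column('supported_extensions', sa.Text(), nullable=False),
        # Refreshed periodically so that dead controllers can be aged out.
        sa.Column('heartbeat_timestamp', sa.DateTime(), nullable=False),
    )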

On service startup, controllers will discover information about other controllers from the table and load only those extensions that are supported by all controller peers. We may also introduce a mechanism where a signal triggers a reload of extensions based on the current table state, or a periodic reloading thread that looks at the table, e.g. every 60 seconds. (An alternative would be discovering that info on each API request, but that would be too expensive.)
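
As a rough sketch of that mechanism (all names are hypothetical; fetch_peer_lists and reload_extensions stand in for code that does not exist in neutron):

    import threading
    import time

    RELOAD_INTERVAL = 60  # seconds, as suggested above

    def common_extensions(peer_extension_lists):
        """Intersect the extension lists reported by all controller peers."""
        common = None
        for exts in peer_extension_lists:
            common = set(exts) if common is None else common & set(exts)
        return common or set()

    def start_periodic_reload(fetch_peer_lists, reload_extensions):
        """Re-evaluate the common extension set every RELOAD_INTERVAL seconds.

        fetch_peer_lists would read the servers table; reload_extensions
        would re-plug the resulting list into the running API server.
        """
        def _loop():
            while True:
                reload_extensions(common_extensions(fetch_peer_lists()))
                time.sleep(RELOAD_INTERVAL)

        thread = threading.Thread(target=_loop, daemon=True)
        thread.start()
        return thread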

This proposal does not handle the case where we drop an extension within a single cycle (like replacing the timestamp extension with timestamp_core). We may need to handle those cases by some other means (the easiest being to disallow such drastic in-place replacement of attribute formats).

Changed in neutron:
assignee: nobody → Ihar Hrachyshka (ihar-hrachyshka)
importance: Undecided → Wishlist
tags: added: rfe
Revision history for this message
Dolph Mathews (dolph) wrote :

As an aside, many of the rolling upgrade conversations with deployers have involved abandoning round-robin (which is currently the typical load balancing strategy) in favor of sticky sessions, which (in the base case, at least) prevents clients from seeing API extensions appear and disappear "randomly." Disregarding the other upsides and downsides of sticky sessions, that choice may mitigate the impact of this issue.

The proposal here describes a clustering behavior, which is (in my experience) relatively complicated (versus a non-clustered service), difficult to get correct (what controls the servers table, and what happens when it fails?), and complicates the deployer experience (order of operations, recovering from failure modes, etc).

Is there any reason why a simpler approach along the lines of feature flags would not solve the same issue? For example, if you assert that new features (i.e. API extensions) are exposed not automatically as a result of the upgrade process, but only through configuration changes, then they could be deployed via canary deployment processes intended explicitly to roll out and test new features, rather than appearing automatically along with upgraded code.
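
Sketched in oslo.config terms, such a flag could look something like this (the option name and its semantics are hypothetical, not something neutron provides):

    from oslo_config import cfg

    opts = [
        cfg.ListOpt('exposed_api_extensions',
                    default=[],
                    help='API extension aliases to expose. Extensions shipped '
                         'by an upgrade stay hidden until added here.'),
    ]
    cfg.CONF.register_opts(opts)

    def is_extension_exposed(alias):
        # An empty list could mean "expose everything", keeping today's behaviour.
        exposed = cfg.CONF.exposed_api_extensions
        return not exposed or alias in exposed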

tags: added: api
Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

> Disregarding the other upsides and downsides of sticky sessions, that choice may mitigate the impact of this issue.

Yes, that would solve extensions flipping for the same requester. It will not solve the issue where a feature is activated through a new service but requires cooperation from an old one. Let's say Nova detected the presence of the multiple port bindings API extension and live migrated an instance using that new extension. Then another Nova node talks to an old Neutron node, fetches the port binding for the port, and live migrates the same port using the old API. Will it cooperate properly? That probably depends on how the new extension is modeled, and we may try to solve it without this mechanism.

> Is there any reason why a simpler approach along the lines of feature flags would not solve the same issue?

Eventually that results in a bunch of options to maintain. Or do you mean a single option that would guard all new extensions and that you would bump after the whole cluster is upgraded? In a way, that sounds like API version pinning, which would imply some form of API versioning in the first place. That is something we never actually accepted as a way forward in the Neutron community, and it would be hard to implement considering we don't fully control what is loaded into neutron-server as API definitions.

I accept the complexity concerns though, and we may need to consider punting the proposal for this cycle.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron-specs (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/451993

Changed in neutron:
status: New → Triaged
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron-specs (master)

Change abandoned by Ihar Hrachyshka (<email address hidden>) on branch: master
Review: https://review.openstack.org/451993
Reason: I am not going to pursue it this cycle. I will focus on OVO transition and neutron-db-manage CLI.

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

With the late OSIC flip-offs, I am not sure this is the best use of my time right now. I am going to unassign myself and suggest that we don't move forward with approving the proposal.

Changed in neutron:
assignee: Ihar Hrachyshka (ihar-hrachyshka) → nobody
Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

This is a good idea in general, but it doesn't fit the plans of the existing upgrades team contributors. We may get back to it later in the life of neutron, after we close gaps in the database layer. In the meantime, there are solutions that resolve some of the potential issues with rolling upgrades, such as sticky load balancers.

We can revisit the RFE later.

tags: added: rfe-postponed
removed: rfe
Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Bug closed due to lack of activity; please feel free to reopen if needed.

Changed in neutron:
status: Triaged → Won't Fix