Bug #1566520 “[RFE] Upgrade controllers with no API downtime” : Bugs : neutron

Ihar Hrachyshka (ihar-hrachyshka) on 2016-04-05

Changed in neutron:
importance:	Undecided → Wishlist
tags:	added: rfe

Armando Migliaccio (armando-migliaccio) on 2016-04-14

Changed in neutron:
status:	New → Confirmed
summary:	- Upgrade controllers with no API downtime + [RFE]Upgrade controllers with no API downtime
summary:	- [RFE]Upgrade controllers with no API downtime + [RFE] Upgrade controllers with no API downtime

Revision history for this message

Armando Migliaccio (armando-migliaccio) wrote on 2016-04-14:

#1

I was under the impression that the EXPAND branch could already be executed without shutting down the services. That said, the problem certainly lies with the CONTRACT branches. This feels like a rehash of parts of [1], and as such is part of a much larger solution required to accommodate a no downtime upgrade for the neutron servers. Looking at the DB alone won't suffice to ensure that requests are handled correctly during the upgrade. Please provide a more detailed plan, and break it down into chewable pieces for Newton. I suspect that once you do that the RFE title won't sound like 'Upgrade controllers with no API downtime'.

[1] https://blueprints.launchpad.net/neutron/+spec/online-schema-migrations

Revision history for this message

Artur Korzeniewski (artur-korzeniewski) wrote on 2016-04-15:

#2

The EXPAND branch can be executed in runtime, that improvement was introduced in Liberty.
The spec [1] introduces the expand/contract migrations but does not address the full story for rolling upgrades. There are pieces like oslo.versionedobjects, moving contracting migration to next releases and coexistence of N and N+1 neutron servers in the same time working on DB access.

I guess that this BP is an umbrella for finer grained RFEs like it is listed in the description:
- adopting object facades for all interaction with database models - already existing [2]
- moving contract migration to next releases - need new RFE
- coexistence of mixed version of neturon servers in the same time - new RFE needed

[1] https://github.com/openstack/neutron-specs/blob/master/specs/liberty/online-schema-migrations.rst
[2] https://bugs.launchpad.net/neutron/+bug/1541928

Revision history for this message

Artur Korzeniewski (artur-korzeniewski) wrote on 2016-04-15:

#3

And there is also the topic of online data migration, which can be done partially in runtime by accessing the values in OVO, and for data not used background python script should be added to migrate the data in small chunks. This also requires the separate RFE.

But all the work in rolling upgrades matter is dependent on finishing the OVO implementation first...

Revision history for this message

Ihar Hrachyshka (ihar-hrachyshka) wrote on 2016-04-21:

#4

Upgrades effort map Edit (162.1 KiB, image/svg+xml)

I tried to map the effort around upgrades (in the .svg attachment). The sections in a black box are defining the required pieces to deliver for 'no-downtime' controller upgrade experience.

Legend:
- green: completed
- yellow: in progress
- orange: not started
- arrow: depends on
- dotted arrow: would benefit from

Revision history for this message

Armando Migliaccio (armando-migliaccio) wrote on 2016-05-19:

#5

I don't understand what the right side of the picture means. Can you elaborate?

Having said that, It seems to me that to address this need all that's required is mostly procedural (e.g. left side of the diagram), once hurdles like versioning of objects and API endpoints is put in place. As for the former there's been an ongoing plan but what about the latter? How can we force API request handling to be at the lowest supportable version without microversioning? Have you given this any thought?

Revision history for this message

Armando Migliaccio (armando-migliaccio) wrote on 2016-05-26:

#6

Ping?

Assaf Muller (amuller) on 2016-06-16

Changed in neutron:
status:	Confirmed → Triaged

Revision history for this message

Assaf Muller (amuller) wrote on 2016-06-16:

#7

http://eavesdrop.openstack.org/irclogs/%23openstack-meeting/%23openstack-meeting.2016-06-16.log.html#t2016-06-16T22:09:21

Ihar to supply more information next week, hopefully the work scoped for N will be clearer then.

Revision history for this message

Ihar Hrachyshka (ihar-hrachyshka) wrote on 2016-06-20:

#8

Armando, sorry for a really late response. I somehow missed your pings in the whirl of other stuff.

> I don't understand what the right side of the picture means. Can you elaborate?

The .svg image maps out the effort that is being covered by upgrades subteam right now or planned for the future. The right side of the picture is part of the effort, but is not directly related to this RFE. Sorry for putting it there probably misleading readers.

What we should care about in the context of the RFE is the left side that is contoured by a solid line. Note that API versioning/pinning is out of the contour. This is because the RFE speaks about technical ability to run mixed versions of the controller service, without considering usability limitations like inconsistent replies from different load balanced controllers. Those are indeed put into the scheme to show the next steps, but those are beyond the consideration for this RFE.

What this RFE is to cover is getting to a point where:
- we have the code supporting the new mode of operation;
- ...and it's proved through targeted gating jobs that it's working;
- we provide framework to avoid data migration in alembic;
- ...and we actually forbid data migration in alembic.

Once all this is covered, I claim that neutron-server indeed supports the new model of operation. With that in place, we can start looking into tackling usability limitations we spot.

One of those limitations is indeed potentially inconsistent behaviour of neutron-server between different major versions. I actually believe that we will need to return back to microversioning idea for the next cycle. And if we manage to achieve most of the things that I mapped above for Newton/start of Ocata, I would love to look at detailed plan for API versioning/pinning.

That said, I don't believe we need such a detailed plan in place right now while we haven't laid down the foundation for mixed versions mode with objects work and proper gating for the feature.

So tl;dr I believe it's only the contoured blocks that should be tracked by the RFE, and other blocks will require a separate discussion.