Newton to Ocata - upgrade to ovs 2.5->2.6 with current workaround and lose connectivity

Bug #1669714 reported by Marios Andreou on 2017-03-03
This bug affects 1 person
Affects: tripleo
Importance: High
Assigned to: Marios Andreou

Bug Description

Up to openvswitch 2.5, for both minor update and major upgrade in TripleO, we have had to use a special case workaround [1] to upgrade ovs 2.4->2.5 because of bug 1635205 [2]. The workaround is executed before the general package update, as otherwise connectivity would be lost if openvswitch were upgraded with a regular yum update.

However, with the arrival of openvswitch 2.6, *using* the workaround causes the node to lose its IPs, whilst a regular yum update delivers ovs 2.6 without problems (it seems the rpm script fixes have arrived).
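
One way to check whether a given openvswitch package still carries the problematic restart is to inspect its rpm scriptlets. The following is a minimal sketch, not the code from [1]: the helper name postun_restarts_service is hypothetical, and it assumes the restart shows up as a "systemctl ... try-restart" line in the postuninstall section of the `rpm -q --scripts` output.

```shell
#!/usr/bin/env bash
# Sketch only: postun_restarts_service is a hypothetical helper name.
# It reads the text printed by `rpm -q --scripts <package>` on stdin and
# succeeds (exit 0) only if the postuninstall scriptlet contains a
# "systemctl ... try-restart" call (assumed form of the restart).
postun_restarts_service() {
    awk '
        # Entering the postuninstall section.
        /^postuninstall scriptlet/ { in_postun = 1; next }
        # Any other scriptlet header ends the postuninstall section.
        /^[a-z]+ scriptlet/        { in_postun = 0 }
        in_postun && /systemctl.*try-restart/ { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Example use on a node (not executed here):
#   if rpm -q --scripts openvswitch | postun_restarts_service; then
#       echo "postun restarts the service; a plain yum update may drop connectivity"
#   fi
```

The function only parses text, so it can be tried against saved scriptlet output before being run on a live node.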

So we need to remove this workaround, and we will need the removal backported as far as possible (mitaka presently?). Otherwise, wherever openvswitch 2.6 is available, using the workaround to upgrade it will cause the upgrade to hang as the node loses connectivity:

In the local environment where I hit this, I was going from openvswitch-2.5.0-14.git20160727.el7fdp.x86_64 to openvswitch-2.6.1-8.git20161206.el7fdb.x86_64 while testing the N..O upgrade workflow:

        [stack@instack ~]$ upgrade-non-controller.sh --upgrade overcloud-novacompute-0

        Feb 22 12:47:09 overcloud-compute-0.localdomain ovs-ctl[302367]: Configuring Open vSwitch system IDs [ OK ]
        Feb 22 12:47:09 overcloud-compute-0.localdomain systemd[1]: Started Open vSwitch Database Unit.
        Feb 22 12:47:09 overcloud-compute-0.localdomain systemd[1]: Starting Open vSwitch Forwarding Uni
        ...
        --> Running transaction check
        ---> Package openvswitch.x86_64 1:2.6.1-8.git20161206.el7fdb will be installed
        --> Finished Dependency Resolution
        Updating openvswitch-2.6.1-8.git20161206.el7fdb.x86_64.rpm with nopostun option
        ... (nothing)

        [stack@instack ~]$ ping 192.0.2.16
        PING 192.0.2.16 (192.0.2.16) 56(84) bytes of data.
        From 192.0.2.1 icmp_seq=1 Destination Host Unreachable

I posted https://review.openstack.org/#/c/436990/ ("Remove the openvswitch special case in tripleo_upgrade_node.sh", Change-Id Icd1517bcade36781fa0da21d045ffd9ec68efc38, tripleo-heat-templates) to remove this for N..O, but I am filing this bug since we'll need the backports too.

[1] https://github.com/openstack/tripleo-heat-templates/blob/afcb6e01f3af573a7bdd286a65b71eee48cec204/extraconfig/tasks/pacemaker_common_functions.sh#L301-L320
[2] https://bugs.launchpad.net/tripleo/+bug/1635205

description: updated
description: updated
Changed in tripleo:
milestone: pike-1 → ocata-rc2
tags: added: ocata-backport-potential
tags: added: mitaka-backport-potential newton-backport-potential
Marios Andreou (marios-b) wrote :

So I added the backport potential tags but I'm waiting to hear confirmation about how far back ovs 2.6 will be available. I posted the reviews for stable/ocata (like master) and stable/newton (different, more files to fix because of the workflow difference) at https://review.openstack.org/441192

Changed in tripleo:
milestone: ocata-rc2 → pike-1

Reviewed: https://review.openstack.org/436990
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=9025a3bc23834e31efc5021acaef80b8d0f5de73
Submitter: Jenkins
Branch: master

commit 9025a3bc23834e31efc5021acaef80b8d0f5de73
Author: marios <email address hidden>
Date: Wed Feb 22 17:29:45 2017 +0200

    Remove the openvswitch special case upgrade code

    Removed from the tripleo_upgrade_node.sh (major upgrade) & yum_update.sh
    (minor update). The workaround is no longer needed and in fact has the
    opposite effect killing connectivity to the node. The 'normal' yum
    update on nodes delivers the latest openvswitch 2.6.1 with no drama.

    Also adds a 'complete' message, some extra debug echo for logs
    and removes the python-zaqarclient install no longer needed

    Closes-Bug: 1669714
    Change-Id: Icd1517bcade36781fa0da21d045ffd9ec68efc38

Changed in tripleo:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/441184
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=59e5f9597eb37f69045e470eb457b878728477d7
Submitter: Jenkins
Branch: stable/ocata

commit 59e5f9597eb37f69045e470eb457b878728477d7
Author: marios <email address hidden>
Date: Wed Feb 22 17:29:45 2017 +0200

    Remove the openvswitch special case upgrade code

    Removed from the tripleo_upgrade_node.sh (major upgrade) & yum_update.sh
    (minor update). The workaround is no longer needed and in fact has the
    opposite effect killing connectivity to the node. The 'normal' yum
    update on nodes delivers the latest openvswitch 2.6.1 with no drama.

    Also adds a 'complete' message, some extra debug echo for logs
    and removes the python-zaqarclient install no longer needed

    Closes-Bug: 1669714
    Change-Id: Icd1517bcade36781fa0da21d045ffd9ec68efc38
    (cherry picked from commit 9025a3bc23834e31efc5021acaef80b8d0f5de73)

tags: added: in-stable-ocata
Marios Andreou (marios-b) wrote :

Update: I spent some time digging into this further today. The issue is also discussed in https://bugzilla.redhat.com/show_bug.cgi?id=1431108 (OSP 11), which refers to this bug, so I am pointing back for completeness.

For ocata we landed https://review.openstack.org/441184. Note that the special case ovs code isn't currently exercised in tripleo upgrade ci: we don't use it in the composable ansible upgrade steps, and the part that did have it (removed in 441184), which is upgrade-non-controller.sh run against compute/swift nodes, is not exercised there.

For newton, and even earlier, the answer depends entirely on the OS you have installed on your overcloud nodes. For rhel7.3/centos7.3 the answer is ovs 2.6, regardless of whether you are deploying mitaka/newton/ocata.
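
Since the outcome depends on which openvswitch version a node ends up with rather than on the OpenStack release, a version comparison can be a quick sanity check. This is only a sketch: version_at_least is a hypothetical helper, and it assumes GNU sort with the -V (version sort) option is available on the node.

```shell
#!/usr/bin/env bash
# Sketch only: version_at_least is a hypothetical helper name.
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on GNU sort -V (version sort), assumed to be present.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Example use on a node (not executed here):
#   installed=$(rpm -q --qf '%{VERSION}' openvswitch)
#   if version_at_least "$installed" 2.6; then
#       echo "openvswitch $installed is already 2.6 or newer"
#   fi
```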

Unfortunately we don't currently have ci coverage for mitaka to newton upgrades. I suspect we'd see the issue there (i.e. the upgrade hanging because it is trying to special case upgrade ovs), because we are using the special case code as part of the upgrade at https://github.com/openstack/tripleo-heat-templates/blob/7240998433f9fac3af47d4a2b40f52b241e8bdef/extraconfig/tasks/major_upgrade_controller_pacemaker_2.sh#L103. There is a similar case for mitaka, but let's get newton sorted first ;) https://github.com/openstack/tripleo-heat-templates/blob/191df2bf1f2c0ae36be503285e84f81f8173a6b8/extraconfig/tasks/major_upgrade_controller_pacemaker_1.sh#L150

Newton review at https://review.openstack.org/#/c/441192/ - I will remove my -2 since ocata landed, and afaics we need it. Some folks have said they'd start jobs locally to test m..n, and hopefully we can get more confirmation from them too.

thanks, marios

Changed in tripleo:
milestone: pike-1 → ongoing

Reviewed: https://review.openstack.org/441192
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=f8e6b03458b0a0799f93972e050776bc317f199a
Submitter: Jenkins
Branch: stable/newton

commit f8e6b03458b0a0799f93972e050776bc317f199a
Author: marios <email address hidden>
Date: Fri Mar 3 17:18:34 2017 +0200

    Remove the openvswitch special case upgrade code

    Removed from the major upgrade and minor update scripts. The
    workaround is no longer needed and in fact has the opposite
    effect killing connectivity to the node. The 'normal' yum
    update on nodes delivers the latest openvswitch 2.6.1 OK.

    Closes-Bug: 1669714
    Change-Id: I5d0ae4e76a2c61faceddc444f91b09aef52b62f0
    (cherry picked from commit 59e5f9597eb37f69045e470eb457b878728477d7)

tags: added: in-stable-newton
Marios Andreou (marios-b) wrote :

Update: it seems we should *still* carry a special case upgrade for openvswitch, specifically for ovs 2.5.0-14. I've decided to use the same bug in an attempt to minimize the inevitable confusion here :(

Please see the discussion at https://bugzilla.redhat.com/show_bug.cgi?id=1424945#c11 for more information but essentially the workaround is the same as the one we previously had, with the addition of the '--notriggerun' flag for the package update.
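
As a rough illustration of the revised workaround described above, the sketch below gates on the 2.5.0-14 build and prints the rpm command instead of running it. The function names are hypothetical. Only the --nopostun and --notriggerun flags are taken from this report; the real scripts may pass further options, and per the later commit messages older builds whose postun tries a restart may also need the special case.

```shell
#!/usr/bin/env bash
# Sketch only: both function names are hypothetical.

# Succeeds only for the openvswitch 2.5.0-14 build named in this bug.
# $1 is a VERSION-RELEASE string, e.g. from:
#   rpm -q --qf '%{VERSION}-%{RELEASE}' openvswitch
needs_ovs_special_case() {
    case "$1" in
        2.5.0-14|2.5.0-14.*) return 0 ;;
        *)                   return 1 ;;
    esac
}

# Prints (does not run) the rpm invocation for one downloaded package file.
# --nopostun skips the postuninstall scriptlet; --notriggerun skips the
# triggerun scriptlets. Other flags used by the real scripts are omitted.
ovs_upgrade_command() {
    printf 'rpm -U --notriggerun --nopostun %s\n' "$1"
}

# Example use on a node (not executed here):
#   current=$(rpm -q --qf '%{VERSION}-%{RELEASE}' openvswitch)
#   if needs_ovs_special_case "$current"; then
#       ovs_upgrade_command ./openvswitch-new.rpm   # hypothetical file name
#   fi
```

Printing the command first makes it possible to review what would run on a node before risking its connectivity.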

I have just posted https://review.openstack.org/450607 ("Add special case upgrade from openvswitch 2.5.0-14"), and matbu has this for the ansible steps at https://review.openstack.org/#/c/434346/ (which I'll also be updating momentarily to point to this bug).

Marios Andreou (marios-b) wrote :

So I wonder if we *can* use this bug since it is Fix Released... I think I may have to file a new one; I will update here if I do.

Reviewed: https://review.openstack.org/450607
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=25983882c2f7a8e8f8fb83bd967a67d008a556a4
Submitter: Jenkins
Branch: master

commit 25983882c2f7a8e8f8fb83bd967a67d008a556a4
Author: marios <email address hidden>
Date: Tue Mar 28 10:44:41 2017 +0300

    Add special case upgrade from openvswitch 2.5.0-14

    In [1] we removed the previously used special case upgrade code.
    However we have since discovered that for openvswitch 2.5.0-14
    the special case is still required with an extra flag to prevent
    the restart. This adds the upgrade code back into the minor
    update and 'manual upgrade' scripts for compute/swift. The
    review at If998704b3c4199bbae8a1d068c31a71763f5c8a2 is adding
    this logic for the ansible upgrade steps.

    Related-Bug: 1669714
    [1] https://review.openstack.org/#/q/59e5f9597eb37f69045e470eb457b878728477d7
    Change-Id: I3e5899e2d831b89745b2f37e61ff69dbf83ff595

Reviewed: https://review.openstack.org/452524
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=2e7c850fa1d78f2b21e5017e03b15a066a0666da
Submitter: Jenkins
Branch: stable/ocata

commit 2e7c850fa1d78f2b21e5017e03b15a066a0666da
Author: marios <email address hidden>
Date: Tue Mar 28 10:44:41 2017 +0300

    Add special case upgrade from openvswitch 2.5.0-14

    In [1] we removed the previously used special case upgrade code.
    However we have since discovered that for openvswitch 2.5.0-14
    the special case is still required with an extra flag to prevent
    the restart. This adds the upgrade code back into the minor
    update and 'manual upgrade' scripts for compute/swift. The
    review at If998704b3c4199bbae8a1d068c31a71763f5c8a2 is adding
    this logic for the ansible upgrade steps.

    Related-Bug: 1669714
    [1] https://review.openstack.org/#/q/59e5f9597eb37f69045e470eb457b878728477d7
    Change-Id: I3e5899e2d831b89745b2f37e61ff69dbf83ff595
    (cherry picked from commit 25983882c2f7a8e8f8fb83bd967a67d008a556a4)

Reviewed: https://review.openstack.org/434346
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=d2d319ec0ead06b860f8464b001048fb4f723788
Submitter: Jenkins
Branch: master

commit d2d319ec0ead06b860f8464b001048fb4f723788
Author: Mathieu Bultel <email address hidden>
Date: Wed Feb 15 16:36:17 2017 +0100

    Add manual ovs upgrade script for workaround ovs upgrade issue

    When we upgrade OVS from 2.5 to 2.6, the postrun package update
    restart the services and drop the connectivity
    We need to push this manual upgrade script and executed to the
    nodes for newton to ocata

    The special case is needed for 2.5.0-14 specifically see related
    bug for more info (or, older where the postun tries restart).
    See related review at [1] for the minor update/manual upgrade.

    Related-Bug: 1669714
    Depends-On: I3227189691df85f265cf84bd4115d8d4c9f979f3
    Co-Authored-By: Sofer Athlan-Guyot <email address hidden>

    [1] https://review.openstack.org/#/c/450607/

    Change-Id: If998704b3c4199bbae8a1d068c31a71763f5c8a2

Reviewed: https://review.openstack.org/451231
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=d3f47eb0b97bab298759021162efebed45c658d0
Submitter: Jenkins
Branch: stable/ocata

commit d3f47eb0b97bab298759021162efebed45c658d0
Author: Mathieu Bultel <email address hidden>
Date: Wed Feb 15 16:36:17 2017 +0100

    Add manual ovs upgrade script for workaround ovs upgrade issue

    When we upgrade OVS from 2.5 to 2.6, the postrun package update
    restart the services and drop the connectivity
    We need to push this manual upgrade script and executed to the
    nodes for newton to ocata

    The special case is needed for 2.5.0-14 specifically see related
    bug for more info (or, older where the postun tries restart).
    See related review at [1] for the minor update/manual upgrade.

    Related-Bug: 1669714
    Depends-On: I3227189691df85f265cf84bd4115d8d4c9f979f3
    Co-Authored-By: Sofer Athlan-Guyot <email address hidden>

    [1] https://review.openstack.org/#/c/450607/

    Change-Id: If998704b3c4199bbae8a1d068c31a71763f5c8a2
    (cherry picked from commit d2d319ec0ead06b860f8464b001048fb4f723788)

This issue was fixed in the openstack/tripleo-heat-templates 7.0.0.0b1 development milestone.

Marios Andreou (marios-b) wrote :

Just posted the stable/newton re-addition of the workaround with '--notriggerun' at https://review.openstack.org/458737, prompted by discussion at https://bugzilla.redhat.com/show_bug.cgi?id=1416088#c5

Reviewed: https://review.openstack.org/458737
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=b0e4970a7c7bd0a8e908abe7071d700968f8a3ff
Submitter: Jenkins
Branch: stable/newton

commit b0e4970a7c7bd0a8e908abe7071d700968f8a3ff
Author: marios <email address hidden>
Date: Tue Mar 28 10:44:41 2017 +0300

    Add special case upgrade from openvswitch 2.5.0-14

    In [1] we removed the previously used special case upgrade code.
    However we have since discovered that for openvswitch 2.5.0-14
    the special case is still required with an extra flag to prevent
    the restart. This adds the upgrade code back into the update
    and upgrade scripts.

    Related-Bug: 1669714
    [1] https://review.openstack.org/#/q/59e5f9597eb37f69045e470eb457b878728477d7
    Change-Id: I3e5899e2d831b89745b2f37e61ff69dbf83ff595
    (cherry picked from commit 2e7c850fa1d78f2b21e5017e03b15a066a0666da)

This issue was fixed in the openstack/tripleo-heat-templates 5.3.0 release.

This issue was fixed in the openstack/tripleo-heat-templates 6.1.0 release.
