pacemaker neutron resource constraints and cleanup

Bug #1501378 reported by Marios Andreou
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Marios Andreou

Bug Description

As discussed in [1], the pacemaker resource constraints chain defined in the overcloud_controller_pacemaker puppet manifest [2] means that restarting neutron-server restarts all neutron-related services, including the neutron-ovs-cleanup and neutron-netns-cleanup services. Apparently these only need to run when decommissioning a node, for cleanup, and not whenever one of the neutron services is restarted on that node.

Given the current constraints chain, a "pcs resource restart neutron-server" may leave the neutron services in erroneous states after the restart, and may leave router interfaces without an IP (not hosted anywhere). In the bug report at [1], the reproducer is to restart haproxy, which results in a restart of all openstack services, including neutron-*.

Discussion of the solution is ongoing. I will shortly link a review (WIP for now) that tries to implement a different constraints chain, as suggested by ajo at [3].

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1266910#c12
[2] https://github.com/openstack/tripleo-heat-templates/blob/9e918a4a517f62d4417909311041e3e54a726462/puppet/manifests/overcloud_controller_pacemaker.pp#L1060
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1266910#c19

Revision history for this message
Marios Andreou (marios-b) wrote :

WIP review for discussion at https://review.openstack.org/#/c/229466/

Changed in tripleo:
importance: Undecided → High
status: Triaged → In Progress
assignee: nobody → Marios Andreou (marios-b)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/229466
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=3dd47a7c2a4ef9f565249053c49d65561ad95741
Submitter: Jenkins
Branch: master

commit 3dd47a7c2a4ef9f565249053c49d65561ad95741
Author: marios <email address hidden>
Date: Wed Sep 30 13:47:58 2015 +0300

    Rework pacemaker constraints from ovs and netns cleanup agents

    In the current neutron-* services constraints chain, the ovs and
    netns cleanup services are re-run after a neutron-server restart.
    As discussed at [1], this may not be desirable, as it can leave
    some neutron services down and any tenant routers without IP.

    This review introduces a second constraints chain so we now have:

    neutron-server-->openvswitch-->dhcp-->l3-->metadata
    and
    ovs-cleanup-->netns-cleanup-->openvswitch

    Instead of a single chain like

    neutron-server-->ovs-cleanup-->netns-cleanup-->openvswitch-->
    dhcp-->l3-->metadata

    [1] https://bugzilla.redhat.com/show_bug.cgi?id=1266910#c12

    Related-Bug: 1501378
    Change-Id: I4096704257aff74ff5bd37d8d01d8a776c6c6a76
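
To make the effect of the rework concrete, here is a minimal sketch that models the two layouts as "first -> then" ordering edges and computes which resources would be restarted along with neutron-server. It assumes that a restart propagates to every resource ordered after the restarted one, as described in the bug report. The short names are the labels used in the commit message, not real pacemaker resource IDs, and the helper functions are hypothetical.

```python
# Chains written with the short labels from the commit message
# (assumption: these are illustrative labels, not pacemaker resource IDs).
OLD_CHAIN = ["neutron-server", "ovs-cleanup", "netns-cleanup",
             "openvswitch", "dhcp", "l3", "metadata"]
NEW_CHAINS = [
    ["neutron-server", "openvswitch", "dhcp", "l3", "metadata"],
    ["ovs-cleanup", "netns-cleanup", "openvswitch"],
]


def edges_from_chains(chains):
    """Turn each chain a-->b-->c into the ordering edges (a, b), (b, c)."""
    edges = set()
    for chain in chains:
        edges.update(zip(chain, chain[1:]))
    return edges


def restarted_with(resource, edges):
    """Return every resource ordered (transitively) after `resource`.

    Assumption: restarting a resource also restarts everything that
    depends on it through an ordering edge.
    """
    affected, stack = set(), [resource]
    while stack:
        current = stack.pop()
        for first, then in edges:
            if first == current and then not in affected:
                affected.add(then)
                stack.append(then)
    return affected


old_edges = edges_from_chains([OLD_CHAIN])
new_edges = edges_from_chains(NEW_CHAINS)

old_hit = restarted_with("neutron-server", old_edges)
new_hit = restarted_with("neutron-server", new_edges)

print("single chain:", sorted(old_hit))
print("two chains:  ", sorted(new_hit))
```

In this model the single chain pulls both cleanup services into a neutron-server restart, while the two-chain layout leaves them alone and still keeps them ordered before openvswitch.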

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/248572

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/248572
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=f070fa189c8f218f979263816975fdfa13451623
Submitter: Jenkins
Branch: master

commit f070fa189c8f218f979263816975fdfa13451623
Author: marios <email address hidden>
Date: Mon Nov 23 10:24:00 2015 +0200

    Fixup neutron constraints in older overclouds before updating

    The neutron pcs constraints were reworked in
    https://review.openstack.org/#/c/229466/

    For overclouds deployed with older tripleo-heat-templates the
    current pcs ordering constraints will not have those changes,
    meaning that the behaviour discussed at
    https://bugs.launchpad.net/tripleo/+bug/1501378 is likely
    given we will stop and restart all services. This review
    applies those changes: in short, it removes ovs-cleanup from
    after neutron-server and adds openvswitch-agent in its place.
    Detail in the bug report and linked BZ.

    Change-Id: I45822c5fe9029f11635400b7fbd386880ac80a4e
    Related-Bug: 1501378
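
As a rough illustration of the fixup this commit describes, the sketch below applies the same change to a set of ordering edges: drop the neutron-server -> ovs-cleanup edge and add neutron-server -> openvswitch-agent. This is only a model of the constraint rewrite, not the actual update code, which operates on pcs constraints in a running cluster. The function name and the short resource labels are assumptions made for illustration.

```python
# Ordering edges as deployed by older templates, i.e. the single chain
# (assumption: short labels stand in for the real pacemaker resource IDs).
OLD_EDGES = {
    ("neutron-server", "ovs-cleanup"),
    ("ovs-cleanup", "netns-cleanup"),
    ("netns-cleanup", "openvswitch-agent"),
    ("openvswitch-agent", "dhcp"),
    ("dhcp", "l3"),
    ("l3", "metadata"),
}


def fixup_constraints(edges):
    """Return a copy of `edges` with the reworked neutron ordering.

    Removes ovs-cleanup from after neutron-server and orders
    openvswitch-agent after neutron-server instead.
    """
    fixed = set(edges)
    fixed.discard(("neutron-server", "ovs-cleanup"))
    fixed.add(("neutron-server", "openvswitch-agent"))
    return fixed


fixed_edges = fixup_constraints(OLD_EDGES)
for first, then in sorted(fixed_edges):
    print(f"{first} --> {then}")
```

Because the function only discards and adds edges on a set, running it against an overcloud model that already has the new layout leaves the edges unchanged.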

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/liberty)

Related fix proposed to branch: stable/liberty
Review: https://review.openstack.org/255303

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/liberty)

Reviewed: https://review.openstack.org/255303
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=8793308151c56eb9f371749221a1fb5a4e801f47
Submitter: Jenkins
Branch: stable/liberty

commit 8793308151c56eb9f371749221a1fb5a4e801f47
Author: marios <email address hidden>
Date: Mon Nov 23 10:24:00 2015 +0200

    Fixup neutron constraints in older overclouds before updating

    The neutron pcs constraints were reworked in
    https://review.openstack.org/#/c/229466/

    For overclouds deployed with older tripleo-heat-templates the
    current pcs ordering constraints will not have those changes,
    meaning that the behaviour discussed at
    https://bugs.launchpad.net/tripleo/+bug/1501378 is likely
    given we will stop and restart all services. This review
    applies those changes: in short, it removes ovs-cleanup from
    after neutron-server and adds openvswitch-agent in its place.
    Detail in the bug report and linked BZ.

    Change-Id: I45822c5fe9029f11635400b7fbd386880ac80a4e
    Related-Bug: 1501378
    (cherry picked from commit f070fa189c8f218f979263816975fdfa13451623)

tags: added: in-stable-liberty
Changed in tripleo:
status: In Progress → Fix Released