deferred services restart hangs in functional tests

Bug #2012553 reported by Corey Bryant
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
charm-ovn-chassis
Fix Committed
High
Unassigned
charm-ovn-dedicated-chassis
Fix Committed
Undecided
Unassigned

Bug Description

This can be seen here: https://review.opendev.org/c/x/charm-ovn-chassis/+/878038

...
2023-03-22 17:55:46 [INFO] ## Running Test zaza.openstack.charm_tests.ovn.tests.OVNChassisDeferredRestartTest ##
2023-03-22 17:55:46 [INFO] test_deferred_restarts (zaza.openstack.charm_tests.ovn.tests.OVNChassisDeferredRestartTest)
2023-03-22 17:55:46 [INFO] Run deferred restart tests.
2023-03-22 17:55:46 [INFO] ...
2023-03-22 17:55:46 [INFO] Turning off auto restarts
2023-03-22 17:55:46 [INFO] Waiting for /etc/policy-rc.d/charm-ovn-chassis.policy to appear on units of ovn-chassis
2023-03-22 17:55:57 [INFO] Waiting for units to be idle
2023-03-22 17:55:58 [INFO] Triggering deferred restart via config change
2023-03-22 17:55:58 [INFO] Setting enable-sriov: True
2023-03-22 17:55:58 [INFO] Waiting for ovn-chassis/0 to show deferred hook
2023-03-22 17:56:06 [INFO] Waiting for ovn-chassis/1 to show deferred hook
2023-03-22 17:56:14 [INFO] Waiting for units to be idle
2023-03-22 17:56:14 [INFO] Checking ovn-chassis/0 is marked as having deferred hook in workload message
2023-03-22 17:56:14 [INFO] Checking ovn-chassis/1 is marked as having deferred hook in workload message
2023-03-22 17:56:14 [INFO] Checking configure_ovs is marked as skipped in show-deferred-events action
2023-03-22 17:56:18 [INFO] configure_ovs in configure_ovs
2023-03-22 17:56:18 [INFO] Triggering deferred restart via config change
2023-03-22 17:56:18 [INFO] Setting enable-sriov: False
2023-03-22 17:56:18 [INFO] Waiting for ovn-chassis/0 to show deferred hook
2023-03-22 17:56:22 [INFO] Waiting for ovn-chassis/1 to show deferred hook 2023-03-22 17:56:30 [INFO] Waiting for units to be idle
2023-03-22 17:56:31 [INFO] Running restart action to clear deferred hooks
2023-03-22 17:56:31 [INFO] Running run-deferred-hooks on ovn-chassis/0
2023-03-22 17:56:36 [INFO] Running run-deferred-hooks on ovn-chassis/1
2023-03-22 17:56:39 [INFO] Triggering deferred restart via package change
2023-03-22 17:56:57 [INFO] Service was openvswitch-switch not restarted. 2023-03-22 17:56:57 [INFO] Checking openvswitch-switch is marked as needing restart in workload message of ovn-chassis/0
2023-03-22 17:56:57 [INFO] Checking openvswitch-switch is marked as needing restart in workload message of ovn-chassis/1
2023-03-22 17:56:57 [INFO] Checking openvswitch-switch is marked as needing restart in show-deferred-events action
2023-03-22 17:56:59 [INFO] openvswitch-switch in 1679507800 ovs-record-hostname.service Package update and Package update in 1679507800 ovs-record-hostname.service Package update
2023-03-22 17:56:59 [INFO] openvswitch-switch in 1679507800 ovs-vswitchd.service Package update and Package update in 1679507800 ovs-vswitchd.service Package update
2023-03-22 17:56:59 [INFO] openvswitch-switch in 1679507800 ovsdb-server.service Package update and Package update in 1679507800 ovsdb-server.service Package update
2023-03-22 17:56:59 [INFO] openvswitch-switch in 1679507801 openvswitch-switch Package update and Package update in 1679507801 openvswitch-switch Package update
2023-03-22 17:56:59 [INFO] Running restart action to clear deferred restarts
2023-03-22 17:56:59 [INFO] Running restart-services on ovn-chassis/0
2023-03-22 17:57:02 [INFO] Running restart-services on ovn-chassis/1
.... hang until timeout ...

juju status shows:

magpie/0* active idle 0 10.5.0.91 icmp ok, local hostname ok (juju-d7c54a-zaza-7e03420ac6bd-0), dns ok, iperf leader, mtu: 8942
  ovn-chassis/1 active idle 10.5.0.91 Services queued for stop: ovs-record-hostname.service, ovs-vswitchd.service, ovsdb-server.service. Hooks skipped due ...

I think this may be something to do with this function from charmhelpers/contrib/openstack/deferred_events.py as it is only checking for 'restart':

def get_deferred_restarts():
    """List of deferred restart events requested by the charm and packages.

    :returns: List of deferred restarts
    :rtype: List[ServiceEvent]
    """
    return [e for e in get_deferred_events() if e.action == 'restart']

The deferred events files on a ovn-chassis charm show 'stop' events so there is no match because the code is looking for 'restart':

ubuntu@juju-d7c54a-zaza-7e03420ac6bd-1:/var/lib/juju/agents/unit-ovn-chassis-0$ ls /var/lib/policy-rc.d/
charm-ovn-chassis-e5bac61a-c8da-11ed-bca5-350a38bbd268.deferred charm-ovn-chassis-e5ce8ac4-c8da-11ed-bca5-350a38bbd268.deferred charm-ovn-chassis-e5e23024-c8da-11ed-bca5-350a38bbd268.deferred

ubuntu@juju-d7c54a-zaza-7e03420ac6bd-1:/var/lib/juju/agents/unit-ovn-chassis-0$ cat /var/lib/policy-rc.d/*
action: stop
policy_requestor_name: ovn-chassis
policy_requestor_type: charm
reason: Package update
service: ovsdb-server.service
timestamp: 1679507800
action: stop
policy_requestor_name: ovn-chassis
policy_requestor_type: charm
reason: Package update
service: ovs-vswitchd.service
timestamp: 1679507800
action: stop
policy_requestor_name: ovn-chassis
policy_requestor_type: charm
reason: Package update
service: ovs-record-hostname.service
timestamp: 1679507800

description: updated
description: updated
description: updated
Revision history for this message
Corey Bryant (corey.bryant) wrote :
Revision history for this message
Corey Bryant (corey.bryant) wrote (last edit ):

Interesting point to note. jammy-yoga tests ran successfully here: https://review.opendev.org/c/x/charm-ovn-chassis/+/878712

The only difference between the jammy-yoga and jammy-zed bundles in that review is:

openstack-origin: &openstack-origin distro
vs
openstack-origin: &openstack-origin cloud:jammy-zed

All of the charms are latest/edge. They use the same (test-)requirements.txt files, therefore use the same zaza tests etc.

which makes me think this is a difference in behavior at the package level.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

I ran the the jammy-yoga tests locally a few times. I then added the zed cloud-archive to both ovn-chassis units, upgraded openvswitch-switch to 3.0.1-0ubuntu0.22.10.1~cloud0 and tests failed as described in this bug.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

I'm guessing this is a side-effect of the following package changes:

$ diff -Naur ./openvswitch-2.17.3/debian/ ./openvswitch-3.0.1/debian/ | less
...
+override_dh_installsystemd:
+ dh_installsystemd -popenvswitch-switch --name=ovsdb-server --no-start
+ dh_installsystemd -popenvswitch-switch --name=ovs-vswitchd --no-start
+ dh_installsystemd -popenvswitch-switch --name=ovs-record-hostname --no-start
+ dh_installsystemd --restart-after-upgrade -Xovs-vswitchd.service -Xovsdb-server.service -Xovs-record-hostname.service

Revision history for this message
Corey Bryant (corey.bryant) wrote :

The /var/lib/dpkg/info/openvswitch-switch.preinst file (pasted below) is new with the kinetic-based 3.0.1 package. If I comment out the 'deb-systemd-invoke stop' calls at the end of this script, the tests succeed.

#!/bin/sh
# preinst script for openvswitch-switch
#
# see: dh_installdeb(1)

set -e

# summary of how this script can be called:
# * <new-preinst> `install'
# * <new-preinst> `install' <old-version>
# * <new-preinst> `upgrade' <old-version>
# * <old-preinst> `abort-upgrade' <new-version>
# for details, see http://www.debian.org/doc/debian-policy/ or
# the debian-policy package

case "$1" in
    install|upgrade)
       if dpkg --compare-versions "$2" lt-nl "2.17.2-1"; then
           # the conffile was not owned by the pkg before if it had any
           # custom content, retain as is to avoid an upgrade error or
           # conffile prompt we back it up in that case and restore it
           # in the postinst
           # Since it wasn#t owned we can't query the old checksum via
           # dpkg-query -W -f='${Conffiles}' as one usually would
           conffile="/etc/default/openvswitch-switch"
           md5olddebian="167668db26d5b29ec1469413b12d9bbe"
           md5oldubuntu="ae4d44b501cfb1eb362d87644a1bae0d"
           md5new="$(md5sum ${conffile} | sed -e 's/ .*//')"
           if [ "${md5olddebian}" = "${md5new}" ]; then
               # was unmodified, remove - will drop the new at unpack
               rm -f "${conffile}"
           else
               if [ ! "${md5oldubuntu}" = "${md5new}" ]; then
                   # neither matches old default Debian, nor Ubuntu.
                   # move to restore in postinst after taking conffile ownership
                   mv "${conffile}" "${conffile}.dpkg-bak"
               fi
           fi
       fi
    ;;

    *)
        echo "preinst called with unknown argument \`$1'" >&2
        exit 1
    ;;
esac

# Automatically added by dh_installsystemd/13.6ubuntu1
if [ -z "${DPKG_ROOT:-}" ] && [ "$1" = upgrade ] && [ -d /run/systemd/system ] ; then
        deb-systemd-invoke stop 'ovsdb-server.service' >/dev/null || true
fi
# End automatically added section
# Automatically added by dh_installsystemd/13.6ubuntu1
if [ -z "${DPKG_ROOT:-}" ] && [ "$1" = upgrade ] && [ -d /run/systemd/system ] ; then
        deb-systemd-invoke stop 'ovs-vswitchd.service' >/dev/null || true
fi
# End automatically added section
# Automatically added by dh_installsystemd/13.6ubuntu1
if [ -z "${DPKG_ROOT:-}" ] && [ "$1" = upgrade ] && [ -d /run/systemd/system ] ; then
        deb-systemd-invoke stop 'ovs-record-hostname.service' >/dev/null || true
fi
# End automatically added section

exit 0

Revision history for this message
Corey Bryant (corey.bryant) wrote (last edit ):

The stop events are added as a result of the dpkg reconfigure in trigger_deferred_restart_via_package() from zaza/openstack/charm_tests/test_utils.py (sleeps were added so that I could watch creation of deferred files in /var/lib/policy-rc.d/):

def trigger_deferred_restart_via_package(self, restart_package):
    """Update a package which requires a service restart.

    :param restart_package: Package that will be changed to trigger a
                            service restart.
    :type restart_package: str
    """
    logging.info("Triggering deferred restart via package change")
    # Test restart requested by package
    logging.info("Sleeping for 20")
    time.sleep(20)
    for unit in model.get_units(self.application_name):
        model.run_on_unit(
            unit.entity_id,
            ('dpkg-reconfigure {}; '
            'JUJU_HOOK_NAME=update-status ./hooks/update-status').format(
            restart_package))
        logging.info("Sleeping for 20")
        time.sleep(20)

Revision history for this message
Corey Bryant (corey.bryant) wrote (last edit ):

It seems like the problem is as follows:

This is what is called:
    clear_deferred_restarts(['openvswitch-switch.service', 'openvswitch-switch'])

however, there are deferred files that don't have matching service names, for example:
    service: ovs-vswitchd.service

So it seems that run_package_change_test() needs to support multiple services. It is currently being called with:

self.run_package_change_test(
    'openvswitch-switch',
    'openvswitch-switch')

The services would need to match the openvswitch services from:
_DEFERABLE_SVC_LIST = ['openvswitch-switch', 'ovn-controller', 'ovn-host',
                       'ovs-vswitchd', 'ovsdb-server', 'ovs-record-hostname']

Changed in charm-ovn-chassis:
status: New → Triaged
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ovn-chassis (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/x/charm-ovn-chassis/+/878921

Changed in charm-ovn-chassis:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ovn-chassis (master)

Reviewed: https://review.opendev.org/c/x/charm-ovn-chassis/+/878921
Committed: https://opendev.org/x/charm-ovn-chassis/commit/364fa7fa45946f084c7924d9ca61fe56944bb6e1
Submitter: "Zuul (22348)"
Branch: master

commit 364fa7fa45946f084c7924d9ca61fe56944bb6e1
Author: Liam Young <email address hidden>
Date: Wed Mar 29 15:40:55 2023 +0000

    Handle deferred stops

    The ovs package in kinetic+ (zed+) uses dh_installsystemd with
    --no-start for ovsdb-server, ovs-vswitchd, and ovs-record-hostname.
    this results in stop events rather than restart events. This
    change updates the restart-services actions for this charm to
    handle stop events.

    Change-Id: I947f6eb6f74c35d3b3f0786ec7a5274709558e17
    Closes-Bug: #2012553

Changed in charm-ovn-chassis:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ovn-dedicated-chassis (master)
Changed in charm-ovn-dedicated-chassis:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ovn-dedicated-chassis (master)

Reviewed: https://review.opendev.org/c/x/charm-ovn-dedicated-chassis/+/881544
Committed: https://opendev.org/x/charm-ovn-dedicated-chassis/commit/bc15fdac16609d486ad2a16bc7fdfda769f15003
Submitter: "Zuul (22348)"
Branch: master

commit bc15fdac16609d486ad2a16bc7fdfda769f15003
Author: Dmitrii Shcherbakov <email address hidden>
Date: Wed Apr 26 13:11:07 2023 +0300

    Handle deferred stops

    The ovs package in kinetic+ (zed+) uses dh_installsystemd with
    --no-start for ovsdb-server, ovs-vswitchd, and ovs-record-hostname.
    this results in stop events rather than restart events. This
    change updates the restart-services actions for this charm to
    handle stop events.

    Closes-Bug: #2012553
    Change-Id: If1dd5d0e5ca640a91893376366b23e78649d5772

Changed in charm-ovn-dedicated-chassis:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ovn-chassis (stable/22.09)

Fix proposed to branch: stable/22.09
Review: https://review.opendev.org/c/x/charm-ovn-chassis/+/902673

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ovn-dedicated-chassis (stable/22.09)

Fix proposed to branch: stable/22.09
Review: https://review.opendev.org/c/x/charm-ovn-dedicated-chassis/+/902672

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ovn-chassis (stable/22.09)

Reviewed: https://review.opendev.org/c/x/charm-ovn-chassis/+/902673
Committed: https://opendev.org/x/charm-ovn-chassis/commit/3a3acb2be20573ce0a807258db23a6a30f8e7705
Submitter: "Zuul (22348)"
Branch: stable/22.09

commit 3a3acb2be20573ce0a807258db23a6a30f8e7705
Author: Liam Young <email address hidden>
Date: Wed Mar 29 15:40:55 2023 +0000

    Handle deferred stops

    The ovs package in kinetic+ (zed+) uses dh_installsystemd with
    --no-start for ovsdb-server, ovs-vswitchd, and ovs-record-hostname.
    this results in stop events rather than restart events. This
    change updates the restart-services actions for this charm to
    handle stop events.

    Change-Id: I947f6eb6f74c35d3b3f0786ec7a5274709558e17
    Closes-Bug: #2012553

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ovn-dedicated-chassis (stable/22.09)

Reviewed: https://review.opendev.org/c/x/charm-ovn-dedicated-chassis/+/902672
Committed: https://opendev.org/x/charm-ovn-dedicated-chassis/commit/ea04d86cacdc581c7be4372f856d7f2ec2fc249e
Submitter: "Zuul (22348)"
Branch: stable/22.09

commit ea04d86cacdc581c7be4372f856d7f2ec2fc249e
Author: Dmitrii Shcherbakov <email address hidden>
Date: Wed Apr 26 13:11:07 2023 +0300

    Handle deferred stops

    The ovs package in kinetic+ (zed+) uses dh_installsystemd with
    --no-start for ovsdb-server, ovs-vswitchd, and ovs-record-hostname.
    this results in stop events rather than restart events. This
    change updates the restart-services actions for this charm to
    handle stop events.

    Closes-Bug: #2012553
    Change-Id: If1dd5d0e5ca640a91893376366b23e78649d5772
    (cherry picked from commit bc15fdac16609d486ad2a16bc7fdfda769f15003)

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.