tripleo

Comment 8 for bug 1792701

OpenStack Infra (hudson-openstack) wrote on 2018-10-01: Fix merged to puppet-tripleo (stable/queens)

Reviewed:  https://review.openstack.org/606849
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=5f43470da1f80675ac6144136ec8e60f23f9356b
Submitter: Zuul
Branch:    stable/queens

commit 5f43470da1f80675ac6144136ec8e60f23f9356b
Author: Michele Baldessari <michele@acksyn.org>
Date:   Sat Sep 15 15:19:26 2018 +0200

Make sure rhel-push-plugin.service is stopped after pacemaker stops
    
    When issuing a normal reboot command on an overcloud node the following
    stop sequence can take place:
    ------------- -----------------------------
    | Pacemaker | | paunch-container-shutdown |
    ------------- -----------------------------
              |     |
               \   /
                \ /
            ----------
            | docker |
            ----------
    
    If docker plugins are allowed to stop before docker and also before
    pacemaker, stopping them while pacemaker is still shutting down can
    cause a series of timeouts and a failure to stop containers:
    Sep 13 17:53:00.821030 controller-0.localdomain pacemakerd[6147]: notice: Shutting down Pacemaker
    Sep 13 17:54:15.798026 controller-0.localdomain lrmd[6284]: warning: galera-bundle-docker-0_monitor_60000 process (PID 226329) timed out
    Sep 13 17:54:15.799004 controller-0.localdomain lrmd[6284]: warning: galera-bundle-docker-0_monitor_60000:226329 - timed out after 20000ms
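
The operation key in these warnings, 'galera-bundle-docker-0_monitor_60000', follows Pacemaker's <resource>_<action>_<interval in ms> naming. These lines therefore report the 60-second recurring monitor of galera-bundle-docker-0 hitting its 20-second timeout. The Python below is a minimal sketch for finding such warnings in a saved journal excerpt. It parses the two lrmd warning shapes quoted above. The names parse_lrmd_timeout, split_op_key and OpTimeout are hypothetical, and the regular expressions are fitted only to these sample lines. They do not cover every lrmd message format.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical patterns fitted only to the two lrmd warning lines quoted above.
PID_FORM = re.compile(r"lrmd\[\d+\]: warning: (\S+) process \(PID (\d+)\) timed out$")
MS_FORM = re.compile(r"lrmd\[\d+\]: warning: (\S+):(\d+) - timed out after (\d+)ms$")


@dataclass
class OpTimeout:
    resource: str
    action: str
    interval_ms: int
    pid: int
    timeout_ms: Optional[int]  # only the "timed out after Nms" form carries it


def split_op_key(key: str) -> Tuple[str, str, int]:
    """Split a Pacemaker operation key '<resource>_<action>_<interval_ms>'.

    rsplit keeps any underscores inside the resource name intact.
    """
    resource, action, interval = key.rsplit("_", 2)
    return resource, action, int(interval)


def parse_lrmd_timeout(line: str) -> Optional[OpTimeout]:
    """Return an OpTimeout for an lrmd timeout warning, else None."""
    m = MS_FORM.search(line)
    if m:
        resource, action, interval = split_op_key(m.group(1))
        return OpTimeout(resource, action, interval, int(m.group(2)), int(m.group(3)))
    m = PID_FORM.search(line)
    if m:
        resource, action, interval = split_op_key(m.group(1))
        return OpTimeout(resource, action, interval, int(m.group(2)), None)
    return None


if __name__ == "__main__":
    sample = (
        "Sep 13 17:54:15.799004 controller-0.localdomain lrmd[6284]: warning: "
        "galera-bundle-docker-0_monitor_60000:226329 - timed out after 20000ms"
    )
    print(parse_lrmd_timeout(sample))
```

Lines that are not lrmd timeout warnings, such as the pacemakerd "Shutting down Pacemaker" notice, return None.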
    
    One of these plugins is 'rhel-push-plugin.service'. It appears that when
    this plugin is free to stop before docker during shutdown, docker
    commands are likely to start timing out.
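
On shutdown, systemd stops units in the reverse of their start order. If unit A is ordered After= unit B, then A is stopped before B. Units with no ordering between them may stop in any relative order, including in parallel. The Python sketch below models the diagram above as a hypothetical After= graph. Its edges are illustrative and are not copied from the real unit files. It checks whether pacemaker is guaranteed to stop before rhel-push-plugin.service. The "after fix" graph assumes that the symlink added by this change makes the plugin a dependency of a target that pacemaker.service is ordered after. 'resource-agents-deps.target' stands in for that target here, but the name is an assumption, because this message does not say where the symlink points. must_stop_before and shutdown_order are hypothetical helpers, and graphlib needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical model: unit -> units it is ordered After= (illustrative edges only;
# any Before= edges would be folded in here as the reverse After= edge).
BEFORE_FIX: Dict[str, Set[str]] = {
    "pacemaker.service": {"docker.service"},
    "paunch-container-shutdown.service": {"docker.service"},
    "docker.service": set(),
    "rhel-push-plugin.service": set(),  # no ordering against pacemaker
}

# Assumed effect of the symlink: the plugin becomes a dependency of a target
# that pacemaker.service is ordered after (the target name is an assumption).
AFTER_FIX: Dict[str, Set[str]] = {
    **BEFORE_FIX,
    "pacemaker.service": {"docker.service", "resource-agents-deps.target"},
    "resource-agents-deps.target": {"rhel-push-plugin.service"},
}


def must_stop_before(after: Dict[str, Set[str]], first: str, later: str) -> bool:
    """True if `first` is (transitively) ordered After= `later`.

    Shutdown reverses start order, so in that case `first` is guaranteed
    to be stopped before `later` starts stopping.
    """
    stack, seen = [first], {first}
    while stack:
        for dep in after.get(stack.pop(), set()):
            if dep == later:
                return True
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return False


def shutdown_order(after: Dict[str, Set[str]]) -> List[str]:
    """One stop order consistent with the After= edges (reverse start order)."""
    # If A is After= B, A must stop first, so A becomes a predecessor of B.
    stop_preds: Dict[str, Set[str]] = {unit: set() for unit in after}
    for unit, deps in after.items():
        for dep in deps:
            stop_preds.setdefault(dep, set()).add(unit)
    return list(TopologicalSorter(stop_preds).static_order())


if __name__ == "__main__":
    for name, graph in (("before fix", BEFORE_FIX), ("after fix", AFTER_FIX)):
        ok = must_stop_before(graph, "pacemaker.service", "rhel-push-plugin.service")
        print(f"{name}: pacemaker guaranteed to stop before the plugin = {ok}")
        print("  one possible stop order:", shutdown_order(graph))
```

Without the extra edge, nothing stops systemd from stopping the plugin while pacemaker is still tearing down its containers. With the edge, every valid stop order places pacemaker before the plugin.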
    
    Before:
    Without the symlink, rebooting a node took 15 minutes, with many
    timeouts during shutdown and some failed actions on boot.
    
    After:
    A reboot completes in a reasonable couple of minutes, with no timeouts
    during shutdown and no failed actions at boot.
    
    NB: We add the symlink unconditionally, because systemd ignores it if
    the service is not installed.
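
The message does not show the symlink itself, and the actual change is Puppet code in puppet-tripleo. Purely as an illustration, the Python sketch below idempotently creates a systemd '.wants' symlink under a configurable root directory. The target name 'resource-agents-deps.target', the unit directory '/usr/lib/systemd/system' and the function name ensure_wants_symlink are all assumptions and are not taken from the change. The link is written whether or not the unit file exists, so it may dangle on nodes without the plugin, as the NB note above allows.

```python
import os
import tempfile
from pathlib import Path


def ensure_wants_symlink(root: Path, target: str, unit: str,
                         unit_dir: str = "/usr/lib/systemd/system") -> Path:
    """Idempotently create <root>/etc/systemd/system/<target>.wants/<unit>.

    Hypothetical helper. The link is created even when the unit file is not
    installed, so it may dangle; per the NB note, systemd ignores it then.
    """
    wants_dir = root / "etc" / "systemd" / "system" / f"{target}.wants"
    wants_dir.mkdir(parents=True, exist_ok=True)
    link = wants_dir / unit
    desired = os.path.join(unit_dir, unit)

    if link.is_symlink():  # lstat-based, so this also detects dangling links
        if os.readlink(link) == desired:
            return link  # already in place: nothing to do
        link.unlink()  # points somewhere else: replace it
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")

    link.symlink_to(desired)
    return link


if __name__ == "__main__":
    # Demonstrate against a throwaway root instead of the real /etc.
    with tempfile.TemporaryDirectory() as tmp:
        made = ensure_wants_symlink(Path(tmp), "resource-agents-deps.target",
                                    "rhel-push-plugin.service")
        print(made, "->", os.readlink(made))
```

On a real node, the running systemd instance only picks up a new .wants link after 'systemctl daemon-reload' or a reboot.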
    
    Closes-Bug: #1792701
    
    Change-Id: I6f6d27f2457efcc49d9edd8a2f98484c5f7c0933
    (cherry picked from commit e288dbd8252765020816639b9b53f8212292cfaf)