hacluster should disable unattended-upgrades by default

Bug #1826898 reported by Felipe Reyes
This bug affects 5 people
Affects: OpenStack HA Cluster Charm
Status: Triaged
Importance: High
Assigned to: Unassigned

Bug Description

Last week a security update for pacemaker was published (http://changelogs.ubuntu.com/changelogs/pool/main/p/pacemaker/pacemaker_1.1.18-0ubuntu1.1/changelog). It left our units with pacemaker stopped. Here are some of the logs as evidence:

/var/log/pacemaker.log -> https://paste.ubuntu.com/p/rh8kP8KGKm/
journalctl -u pacemaker -> https://pastebin.ubuntu.com/p/c5sDP4BNmt/
journalctl -u corosync -> http://paste.ubuntu.com/p/vDWgjwvkFh/
grep -A3 'Start-Date: 2019-04-24' /var/log/apt/history.log -> http://paste.ubuntu.com/p/KqJD3JgWPQ/

[Proposed solution]

Add an equivalent of the following to the hacluster charm. It should be manageable through "juju config" so it can be enabled or disabled, with "0" (unattended upgrades disabled) as the default:

echo 'APT::Periodic::Unattended-Upgrade "0";' > /etc/apt/apt.conf.d/90hacluster
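
A minimal sketch of how the charm could render that file from a config option. The option name `enable-unattended-upgrades` and the helper names are hypothetical. Only the apt.conf directive and the 90hacluster file name come from this report. The root directory is a parameter so the helper can be tried outside /etc.

```python
from pathlib import Path

# File name from the proposed solution; prefixed with a configurable root.
APT_CONF_RELPATH = Path("etc/apt/apt.conf.d/90hacluster")


def render_apt_conf(enable_unattended_upgrades: bool = False) -> str:
    """Render the apt.conf fragment.

    The value is an interval in days: "0" disables the periodic run,
    "1" runs it daily. The default (False) disables unattended upgrades.
    """
    value = "1" if enable_unattended_upgrades else "0"
    return f'APT::Periodic::Unattended-Upgrade "{value}";\n'


def write_apt_conf(enable_unattended_upgrades: bool = False, root: str = "/") -> Path:
    """Write the fragment to <root>/etc/apt/apt.conf.d/90hacluster (hypothetical helper)."""
    path = Path(root) / APT_CONF_RELPATH
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_apt_conf(enable_unattended_upgrades))
    return path


if __name__ == "__main__":
    # Hypothetical: in the charm this flag would come from the juju config option.
    print(render_apt_conf(False), end="")
```

Keeping the rendering separate from the write makes the default easy to unit-test without root access.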

[Workaround]
Disabling unattended-upgrades with juju on all the units:
juju run --all "echo 'APT::Periodic::Unattended-Upgrade \"0\";' | sudo tee /etc/apt/apt.conf.d/90hacluster"

Tags: sts
Felipe Reyes (freyes)
tags: added: sts
Felipe Reyes (freyes)
description: updated
Felipe Reyes (freyes) wrote :

Another option that just came to mind is to blacklist pacemaker/corosync AND their dependencies:

Unattended-Upgrade::Package-Blacklist {
 "^pacemaker.*";
 "^corosync.*";
 "^libquorum.*";
 ...
};

Finding the right list of packages to blacklist will be tricky, though.
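
To illustrate why, the sketch below checks which cluster-stack packages from the apt history.log excerpt later in this report would be caught by the three patterns above. It assumes Package-Blacklist entries are matched as regular expressions from the start of the package name, which the ^ anchors suggest. Check the behaviour of the unattended-upgrades version in use before relying on this. The package list is copied from that log and is not exhaustive.

```python
import re

# Patterns from the Package-Blacklist example above (the "..." entries omitted).
BLACKLIST = [r"^pacemaker.*", r"^corosync.*", r"^libquorum.*"]

# Cluster-stack packages taken from the apt history.log excerpt in this report.
CLUSTER_PACKAGES = [
    "corosync", "libcmap4", "libquorum5", "libvotequorum8", "libcfg6",
    "libtotem-pg5", "libqb0", "pacemaker", "pacemaker-resource-agents",
    "resource-agents", "liblrmd1", "libcrmcommon3", "libcrmcluster4",
    "libcrmservice3", "libpe-rules2", "libpengine10", "libtransitioner2",
]


def is_blacklisted(pkg: str, patterns=BLACKLIST) -> bool:
    """True if any pattern matches the package name (assumes re.match semantics)."""
    return any(re.match(p, pkg) for p in patterns)


def uncovered(packages, patterns=BLACKLIST):
    """Packages that would still be upgraded unattended."""
    return [p for p in packages if not is_blacklisted(p, patterns)]


if __name__ == "__main__":
    for pkg in uncovered(CLUSTER_PACKAGES):
        print(pkg)
```

Only corosync, libquorum5, pacemaker and pacemaker-resource-agents are covered. The corosync libraries (libcmap4, libvotequorum8, libcfg6, ...) and the pacemaker libraries (libcrm*, libpe*, ...) would still be upgraded.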

description: updated
Trent Lloyd (lathiat) wrote :

I would suggest that we should fix the real problem, which is that restarting corosync causes pacemaker to stop but does not cause it to start again.

On previous installations where I used Corosync/Pacemaker, we actually configured corosync to start pacemaker. I remember looking into this myself at one point, and I think that functionality may technically have been deprecated, but I'd have to check; I may be recalling it incorrectly.

It seems the same should be possible by updating the systemd units with the appropriate configuration so that they restart together. Off the top of my head, I'm guessing that, like most packages, they don't ship native systemd units. I wonder if we could add special headers to the SysV scripts, as with other systemd generator sources, to set this anyway.
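
One way to express "corosync starts pacemaker" in systemd terms is a drop-in on corosync.service with Wants=pacemaker.service. Whenever corosync is started again after being stopped, a start job for pacemaker is queued as well. This is a sketch under assumptions. It assumes pacemaker.service is already ordered After=corosync.service (check the packaged units). The drop-in file name is hypothetical. `systemctl daemon-reload` is needed after writing it.

```python
from pathlib import Path

# Hypothetical drop-in name; systemd reads any *.conf file in the unit's .d directory.
DROPIN_RELPATH = Path("etc/systemd/system/corosync.service.d/hacluster-pacemaker.conf")


def render_dropin() -> str:
    """Drop-in that makes starting corosync also queue a pacemaker start.

    Assumes pacemaker.service declares After=corosync.service, so
    pacemaker is still started only once corosync is up.
    """
    return "[Unit]\nWants=pacemaker.service\n"


def write_dropin(root: str = "/") -> Path:
    """Write the drop-in under <root>; run `systemctl daemon-reload` afterwards."""
    path = Path(root) / DROPIN_RELPATH
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_dropin())
    return path


if __name__ == "__main__":
    print(render_dropin(), end="")
```

A drop-in survives package upgrades, unlike edits to the packaged unit file, which is why it is preferred here over patching the units directly.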

Changed in charm-hacluster:
status: New → Triaged
importance: Undecided → High
Sandor Zeestraten (szeestraten) wrote :

Just got hit by this over the weekend with corosync (https://usn.ubuntu.com/4000-1/), which left pacemaker and haproxy dead. We're running an older hacluster charm (rev 33), but it was still a bit annoying.

Felipe Reyes (freyes) wrote : Re: [Bug 1826898] Re: hacluster should disable unattended-upgrades by default

On Wed, 2019-05-01 at 06:31 +0000, Trent Lloyd wrote:
> I would suggest that we should fix the real problem, which is that
> restarting corosync causes pacemaker to stop but does not cause it to
> start again.
>
> On previous installations where I used Corosync/Pacemaker we actually
> configured corosync to start pacemaker. I remember looking at this
> myself at one point and I think technically that functionality might
> have been deprecated but I'd have to check I may be recalling that
> incorrectly.
>
> It would seem the same should be possible by updating the systemd
> units
> with the appropriate configuration to make them restart with each
> other.
> Off the top of my head I am guessing like most packages they are not
> native scripts though I wonder if we can add special headers to the
> sysV
> scripts like most other systemd generator sources, to set them
> anyway?

Even once that aspect is fixed, you will still run into problems when
pacemaker/corosync restarts on all 3 units at once (I'm assuming that
in some environments the update will hit all the clustered machines at
the same time). You may get resource migrations without any real need.
While moving something like a VIP is a cheap-ish operation, other
operations are not, for example promoting postgres from standby to
active.
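
The quorum side of this is simple arithmetic. The sketch below assumes corosync votequorum's default of one vote per node and a simple majority, with no two_node, auto_tie_breaker or similar options. It computes how many members can be down at once while the remaining ones stay quorate. For a 3-unit cluster that is one, so simultaneous restarts on all units would drop quorum and the restarts would have to be serialized.

```python
def quorum(nodes: int) -> int:
    """Votes needed for a simple majority, assuming one vote per node."""
    return nodes // 2 + 1


def max_simultaneous_restarts(nodes: int) -> int:
    """How many nodes can be down at once while the rest remain quorate."""
    return nodes - quorum(nodes)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, quorum(n), max_simultaneous_restarts(n))
```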

Alan MacMillan (amac8585) wrote :

We have correlated the apt update of the hacluster components with the start of the issue, when pacemaker loses quorum.

history.log:
Start-Date: 2019-05-09 11:56:50
Commandline: apt-get --assume-yes --option=Dpkg::Options::=--force-confold install crmsh corosync pacemaker ipmitool libmonitoring-plugin-perl python3-requests-oauthlib python3-libmaas
Install: corosync:amd64 (2.4.3-0ubuntu1), resource-agents:amd64 (1:4.1.0~rc1-1ubuntu1, automatic), libcmap4:amd64 (2.4.3-0ubuntu1, automatic), libb-hooks-op-check-perl:amd64 (0.22-1, automatic), python3-argcomplete:amd64 (1.8.1-1ubuntu1, automatic), python3-async-timeout:amd64 (2.0.0-1, automatic), libmodule-runtime-perl:amd64 (0.016-1, automatic), libmath-calc-units-perl:amd64 (1.07-1, automatic), libparams-validate-perl:amd64 (1.29-1, automatic), python3-terminaltables:amd64 (3.1.0-2, automatic), libxml2-utils:amd64 (2.9.4+dfsg1-6.1ubuntu1.2, automatic), openhpid:amd64 (3.6.1-3.1build1, automatic), pacemaker:amd64 (1.1.18-0ubuntu1.1), libfreeipmi16:amd64 (1.4.11-1.1ubuntu4.1, automatic), python3-pymongo-ext:amd64 (3.6.1+dfsg1-1, automatic), libtry-tiny-perl:amd64 (0.30-1, automatic), pacemaker-resource-agents:amd64 (1.1.18-0ubuntu1.1, automatic), liblrmd1:amd64 (1.1.18-0ubuntu1.1, automatic), libdevel-callchecker-perl:amd64 (0.007-2build1, automatic), python3-libmaas:amd64 (0.6.0-0ubuntu1), libquorum5:amd64 (2.4.3-0ubuntu1, automatic), libcrmcommon3:amd64 (1.1.18-0ubuntu1.1, automatic), libesmtp6:amd64 (1.0.6-4.3build1, automatic), freeipmi-common:amd64 (1.4.11-1.1ubuntu4.1, automatic), libcrmcluster4:amd64 (1.1.18-0ubuntu1.1, automatic), libmodule-implementation-perl:amd64 (0.09-1, automatic), python3-colorclass:amd64 (2.2.0-2, automatic), libtotem-pg5:amd64 (2.4.3-0ubuntu1, automatic), libpe-rules2:amd64 (1.1.18-0ubuntu1.1, automatic), liblrm2:amd64 (1.0.12-7build1, automatic), libpengine10:amd64 (1.1.18-0ubuntu1.1, automatic), libtransitioner2:amd64 (1.1.18-0ubuntu1.1, automatic), librdmacm1:amd64 (17.1-1ubuntu0.1, automatic), libqb0:amd64 (1.0.1-1ubuntu1, automatic), python3-pymongo:amd64 (3.6.1+dfsg1-1, automatic), libdynaloader-functions-perl:amd64 (0.003-1, automatic), libltdl7:amd64 (2.4.6-2, automatic), ipmitool:amd64 (1.8.18-5ubuntu0.1), libconfig-tiny-perl:amd64 (2.23-1, automatic), libplumb2:amd64 (1.0.12-7build1, automatic), libnet1:amd64 
(1.1.6+dfsg-3.1, automatic), python3-requests-oauthlib:amd64 (0.8.0-0.1), libparams-classify-perl:amd64 (0.015-1, automatic), libstatgrab10:amd64 (0.91-1build1, automatic), libplumbgpl2:amd64 (1.0.12-7build1, automatic), libsub-name-perl:amd64 (0.21-1build1, automatic), python3-multidict:amd64 (4.1.0-1, automatic), libvotequorum8:amd64 (2.4.3-0ubuntu1, automatic), crmsh:amd64 (3.0.1-3ubuntu1), python3-bson-ext:amd64 (3.6.1+dfsg1-1, automatic), python3-aiohttp:amd64 (3.0.1-1, automatic), python3-yarl:amd64 (1.1.0-1, automatic), libdbus-glib-1-2:amd64 (0.110-2, automatic), libopenhpi3:amd64 (3.6.1-3.1build1, automatic), python3-bson:amd64 (3.6.1+dfsg1-1, automatic), libcfg6:amd64 (2.4.3-0ubuntu1, automatic), libcrmservice3:amd64 (1.1.18-0ubuntu1.1, automatic), libpils2:amd64 (1.0.12-7build1, automatic), libstonith1:amd64 (1.0.12-7build1, automa...
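
To repeat this correlation across hosts, a small parser for /var/log/apt/history.log can list the Start-Date of every transaction that touched the cluster stack, much like the grep earlier in this report. It assumes the usual layout: entries separated by blank lines, with Start-Date:, Commandline: and Install:/Upgrade:/Remove: lines, and packages written as name:arch (versions). The prefix list is an illustrative choice, not an exhaustive set.

```python
import re

# Package names appear as "name:arch (" in Install:/Upgrade:/Remove: lines.
PKG_RE = re.compile(r"([a-z0-9][a-z0-9+.\-]*):[a-z0-9]+ \(")
ACTION_KEYS = ("Install", "Upgrade", "Remove", "Purge", "Downgrade", "Reinstall")


def parse_entries(text: str):
    """Yield (start_date, set_of_package_names) for each history.log entry."""
    for block in text.strip().split("\n\n"):
        start, pkgs = None, set()
        for line in block.splitlines():
            key, _, value = line.partition(": ")
            if key == "Start-Date":
                start = value.strip()
            elif key in ACTION_KEYS:
                pkgs.update(PKG_RE.findall(value))
        if start:
            yield start, pkgs


def cluster_transactions(text: str, prefixes=("pacemaker", "corosync", "libqb")):
    """Start-Dates of entries that touched packages beginning with any prefix."""
    return [start for start, pkgs in parse_entries(text)
            if any(p.startswith(prefixes) for p in pkgs)]
```

Pass it the contents of /var/log/apt/history.log. Rotated copies (history.log.1.gz, ...) are gzip-compressed and would need to be read with gzip.open first.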

Drew Freiberger (afreiberger) wrote :

As a note, in Foundations/Bootstack clouds we solve this unattended-upgrades issue by deploying charm-landscape-client to all hosts and disabling unattended-upgrades through its config options.

Trent Lloyd (lathiat) wrote :

Previous bug, which supposedly fixed this issue on xenial & bionic:
https://bugs.launchpad.net/charm-hacluster/+bug/1740892

Felipe Reyes (freyes) wrote :
Changed in charm-hacluster:
status: Triaged → In Progress
assignee: nobody → Felipe Reyes (freyes)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-hacluster (master)

Change abandoned by Felipe Reyes (<email address hidden>) on branch: master
Review: https://review.opendev.org/693080

Felipe Reyes (freyes)
Changed in charm-hacluster:
assignee: Felipe Reyes (freyes) → nobody
status: In Progress → New
Andrew McLeod (admcleod) wrote :

The review in #10 mentions another approach to address this issue

Changed in charm-hacluster:
status: New → Triaged