corosync upgrade on 2018-01-02 caused pacemaker to fail

Bug #1740892 reported by Drew Freiberger
This bug affects 5 people
Affects                     Status        Importance  Assigned to      Milestone
OpenStack HA Cluster Charm  Invalid       Undecided   Unassigned
corosync (Debian)           Fix Released  Unknown
corosync (Ubuntu)           Fix Released  Medium      Nish Aravamudan
  Trusty                    Won't Fix     Medium      Nish Aravamudan
  Xenial                    Fix Released  High        Eric Desrochers
  Artful                    Fix Released  High        Eric Desrochers
  Bionic                    Fix Released  Medium      Nish Aravamudan
pacemaker (Ubuntu)          Fix Released  Medium      Nish Aravamudan
  Trusty                    Won't Fix     Medium      Nish Aravamudan
  Xenial                    Fix Released  High        Eric Desrochers
  Artful                    Fix Released  High        Eric Desrochers

Bug Description

[Impact]

When corosync and pacemaker are both installed, a corosync upgrade causes pacemaker to fail. Pacemaker then has to be restarted manually to work again; it won't recover by itself.

[Test Case]

1) Have corosync (< 2.3.5-3ubuntu2) and pacemaker (< 1.1.14-2ubuntu1.3) installed
2) Make sure corosync and pacemaker are running, using systemctl status.
3) Upgrade corosync
4) Check corosync and pacemaker with systemctl status again.

You will notice pacemaker is dead (inactive) and doesn't recover, unless a systemctl start pacemaker is done manually.
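
The test case can be scripted roughly as follows. This is only a sketch and not part of the SRU: the helper names unit_state and pacemaker_lost are made up for illustration, and it assumes a systemd host where "systemctl is-active" prints the unit state.

```shell
#!/bin/sh
# Hypothetical helpers for the test case; the names are illustrative only.

# Print the state of a unit ("active", "inactive", "failed", ...).
# systemctl is-active exits non-zero for anything but "active",
# so the exit status is deliberately ignored here.
unit_state() {
    systemctl is-active "$1" 2>/dev/null || true
}

# Succeed (exit 0) when a unit was active before the upgrade
# and is no longer active afterwards.
pacemaker_lost() {
    [ "$1" = "active" ] && [ "$2" != "active" ]
}

# Intended use on a disposable test node, as root:
#   before=$(unit_state pacemaker)
#   apt-get install --only-upgrade corosync
#   after=$(unit_state pacemaker)
#   pacemaker_lost "$before" "$after" && echo "pacemaker did not recover"
```

Keeping the comparison separate from the systemctl call makes it possible to check the logic without touching a real cluster node.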

[Regression Potential]

Regression potential is low; the change doesn't alter corosync/pacemaker core functionality. This patch makes sure things go more smoothly at the packaging level during a corosync upgrade where pacemaker is installed/involved.

This is particularly useful where the system has "unattended-upgrades" enabled (software upgrades without supervision) and no sysadmin is available to start pacemaker manually because the upgrade isn't part of a scheduled maintenance.

For the symbol tag change in Artful to (optional), please refer to comment #60 from slangasek.

For the asctime change in Artful, please refer to comment #51 & comment #52.

Note that both Artful changes in pacemaker above are only necessary for the package to build (even as-is without this patch). They aren't a requirement for the patch to work, but for the src pkg to build.

[Other Info]

XENIAL Merge-proposal:
https://code.launchpad.net/~nacc/ubuntu/+source/corosync/+git/corosync/+merge/336338
https://code.launchpad.net/~nacc/ubuntu/+source/pacemaker/+git/pacemaker/+merge/336339

[Original Description]

During upgrades on 2018-01-02, corosync and its libs were upgraded:

(from a trusty/mitaka cloud)

Upgrade: libcmap4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), corosync:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libcfg6:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libcpg4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libquorum5:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libcorosync-common4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libsam4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libvotequorum6:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libtotem-pg5:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4)

During this process, it appears that the pacemaker service is restarted and errors out:

syslog:Jan 2 16:09:33 juju-machine-0-lxc-4 pacemakerd[1994]: notice: crm_update_peer_state: pcmk_quorum_notification: Node juju-machine-1-lxc-3[1001] - state is now lost (was member)
syslog:Jan 2 16:09:34 juju-machine-0-lxc-4 pacemakerd[1994]: notice: crm_update_peer_state: pcmk_quorum_notification: Node juju-machine-1-lxc-3[1001] - state is now member (was lost)
syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: error: cfg_connection_destroy: Connection destroyed
syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: notice: pcmk_shutdown_worker: Shuting down Pacemaker
syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: notice: stop_child: Stopping crmd: Sent -15 to process 2050
syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: error: mcp_cpg_destroy: Connection destroyed

Also affected xenial/ocata

summary: - corosync upgrade on 2018-01-02 caused pacemaker daemon restart to fail
+ corosync upgrade on 2018-01-02 caused pacemaker to fail
Drew Freiberger (afreiberger) wrote :
James Troup (elmo) wrote :

This took down the control plane on several production clouds today; subscribing ~field-critical.

David Ames (thedac)
Changed in charm-hacluster:
status: New → Invalid
David Britton (dpb)
Changed in corosync (Ubuntu):
assignee: nobody → Nish Aravamudan (nacc)
James Page (james-page) wrote :

bug 1739033 was the stable release update that went in around this time.

Nish Aravamudan (nacc) wrote :

Hello,

Being relatively new to pacemaker/corosync -- what is the actual error condition here? That when corosync is stopped (as part of the package upgrade), pacemakerd stops and does not restart?

Thanks,
Nish

David Britton (dpb)
Changed in corosync (Ubuntu):
status: New → Incomplete
Drew Freiberger (afreiberger) wrote :

@nacc:

The error condition was that when corosync restarted, pacemaker disconnected (as was normal) and then tried reconnecting, but when reconnecting ran into this error:

error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)

So, pacemaker is trying to re-handshake with the revived corosync, and when it does, the API fails due to a library error. Given that it's the CPG API, and the libcpg4 package was updated, I'd guess that a patch added to the libcpg4 library was incompatible with the previous version of libcpg4 that was linked in memory into the running pacemaker binary. Once we restarted the dead pacemaker service, pacemaker reloaded the new library and was able to connect to the CPG API as normal.

I don't know if that's a library failure or a change to the CPG API that was not version-compatible with the previously running version of libcpg4 whenever the dying pacemaker had been started.

The issue occurred in trusty and xenial clouds across Mitaka and Ocata cloud archives.
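
One way to test the in-memory library hypothesis on an affected node is to look for shared objects that a running process still has mapped although the file on disk has since been replaced; the kernel marks such mappings "(deleted)" in /proc/<pid>/maps. A minimal sketch, assuming a Linux /proc filesystem; the function name stale_libs is hypothetical:

```shell
#!/bin/sh
# List shared objects that a process still has mapped although the file
# on disk has been deleted or replaced (the kernel appends "(deleted)").
# $1 is a maps file, e.g. /proc/<pid>/maps.
stale_libs() {
    awk '$NF == "(deleted)" && $(NF-1) ~ /\.so/ { print $(NF-1) }' "$1" | sort -u
}

# Intended use, as root, while pacemakerd is still running:
#   stale_libs "/proc/$(pidof pacemakerd)/maps"
```

An empty result only says that no replaced library is mapped; it does not by itself rule out other causes.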

Changed in corosync (Ubuntu):
status: Incomplete → In Progress
Drew Freiberger (afreiberger) wrote :

So, it was NOT a pacemaker restart; it was pacemaker trying to perform a standard reconnect after a dropped connection, using an incompatible library for CPG API access to the restarted corosync.

Robie Basak (racb)
tags: added: regression-update
Nish Aravamudan (nacc) wrote :

@afreiberger, thank you for the extra info!

Reading the upstream patch referred to in Bug 1739033, I see this change:

struct main_cp_cb_data {
- enum main_cp_cb_data_state state;
-
  int ringnumber;
  char *bindnetaddr;
  char *mcastaddr;

Now, that is a struct size change, which seems like it could easily affect an exported symbol / API / ABI.

afaict, there would be no harm in leaving this struct member in place, but now unused, in the backported patch. The replacement, in the code, is to pass this struct member in the callers themselves, but this in turn changed the callback layout, which again might be part of the exported interface of the library.

How easy is this to reproduce? Do you have a testcase handy that I might be able to run?

Felipe Reyes (freyes)
tags: added: sts
David Britton (dpb) wrote :

Assigning to slashd as agreed on IRC. This was introduced from the SRU in bug 1739033 and would be best fixed with that context in mind.

Changed in corosync (Ubuntu):
assignee: Nish Aravamudan (nacc) → Eric Desrochers (slashd)
Eric Desrochers (slashd) wrote :

This particular bug is already under investigation by some of my teammates. I'll sync up with them on Monday.

- Eric

Mario Splivalo (mariosplivalo) wrote :

Hi, David. This was not induced by the SRU from bug 1739033, as this issue existed before.

I just tried installing the previous version of corosync, and when I stop/start corosync, pacemaker dies and needs to be restarted.

Since dpkg, when upgrading packages, first stops the service, copies/extracts the new files, and then starts the service, the package upgrade itself induced the pacemaker breakage.

I remember this always being an issue - each time you stop/start corosync you need to restart pacemaker too.

Eric Desrochers (slashd)
Changed in corosync (Ubuntu):
assignee: Eric Desrochers (slashd) → Victor Tapia (vtapia)
James Page (james-page) wrote :

Based on #10 and some testing I did early today, it really feels like this might be a bug in pacemaker when re-connecting to corosync after a restart; I think there are two ways forward on this:

a) corosync package updates should always restart pacemaker
b) we should look for a fix in pacemaker to make it more resilient to corosync restarts

either would have prevented this issue from occurring in the first place.

Felipe Reyes (freyes) wrote :

"""
> 3.) Stopping and starting corosync doesn't awake the node up again:
> systemctl stop corosync;sleep 10;systemctl restart corosync
> Online: [ kvm01 lb01 ]
> OFFLINE: [ lb02 ]
> Stays in that state until pacemaker is restarted: systemctl restart
> pacemaker
> Bug?

No, pacemaker should always restart if corosync restarts. That is
specified in the systemd units, so I'm not sure why pacemaker didn't
automatically restart in your case.
"""

Source: http://lists.clusterlabs.org/pipermail/users/2017-January/004828.html

Victor Tapia (vtapia) wrote :

As mentioned by Mario @ #10, stopping corosync while pacemaker runs throws the same error as the upgrade. Syslog from Xenial + corosync=2.3.5-3ubuntu1:

Jan 8 16:24:37 xenial-corosync systemd[1]: Stopping Pacemaker High Availability Cluster Manager...
Jan 8 16:24:37 xenial-corosync pacemakerd[28747]: notice: Invoking handler for signal 15: Terminated
Jan 8 16:24:37 xenial-corosync crmd[28753]: notice: Invoking handler for signal 15: Terminated
Jan 8 16:24:37 xenial-corosync crmd[28753]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
Jan 8 16:24:37 xenial-corosync pengine[28752]: notice: Delaying fencing operations until there are resources to manage
Jan 8 16:24:37 xenial-corosync pengine[28752]: notice: Scheduling Node xenial-corosync for shutdown
Jan 8 16:24:37 xenial-corosync pengine[28752]: notice: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-52.bz2
Jan 8 16:24:37 xenial-corosync crmd[28753]: notice: Transition 1 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-52.bz2): Complete
Jan 8 16:24:37 xenial-corosync crmd[28753]: notice: Disconnecting from Corosync
Jan 8 16:24:37 xenial-corosync cib[28748]: warning: new_event_notification (28748-28753-12): Broken pipe (32)
Jan 8 16:24:37 xenial-corosync pengine[28752]: notice: Invoking handler for signal 15: Terminated
Jan 8 16:24:37 xenial-corosync attrd[28751]: notice: Invoking handler for signal 15: Terminated
Jan 8 16:24:37 xenial-corosync lrmd[28750]: notice: Invoking handler for signal 15: Terminated
Jan 8 16:24:37 xenial-corosync stonith-ng[28749]: notice: Invoking handler for signal 15: Terminated
Jan 8 16:24:37 xenial-corosync cib[28748]: notice: Invoking handler for signal 15: Terminated
Jan 8 16:24:37 xenial-corosync cib[28748]: notice: Disconnecting from Corosync
Jan 8 16:24:37 xenial-corosync cib[28748]: notice: Disconnecting from Corosync
Jan 8 16:24:37 xenial-corosync systemd[1]: Stopped Pacemaker High Availability Cluster Manager.

Pacemakerd shuts down, sending SIGTERM to its components, but after the install, corosync does not start pacemaker. BTW, "systemctl restart corosync" restarts both services perfectly.

I think that option A from James Page (#11) is the way to go.

Nish Aravamudan (nacc) wrote : Re: [Bug 1740892] Re: corosync upgrade on 2018-01-02 caused pacemaker to fail

I took a quick look at a LXD container after seeing Felipe and
Victor's posts. It seems like this is a bug in the xenial (at least)
systemd unit files:

# grep pacemaker /lib/systemd/system/corosync.service
# pacemaker.service, and if you want to exert the watchdog when a

# grep corosync /lib/systemd/system/pacemaker.service
After=corosync.service
Requires=corosync.service
# ExecStopPost=/bin/sh -c 'pidof crmd || killall -TERM corosync'

So, what I see is that corosync.service has no dependency on
pacemaker.service (in the file).

pacemaker.service will start after corosync.service. And when
pacemaker.service is shut down, that will happen before corosync.service.
Additionally, if pacemaker.service is started, ...


Nish Aravamudan (nacc) wrote :

Nish Aravamudan (nacc) wrote :

Nish Aravamudan (nacc) wrote :

On Mon, Jan 8, 2018 at 8:48 AM, Victor Tapia <email address hidden> wrote:
> As mentioned by Mario @ #10, stopping corosync while pacemaker runs
> throws the same error as the upgrade. Syslog from Xenial +
> corosync=2.3.5-3ubuntu1:
>
> Jan 8 16:24:37 xenial-corosync systemd[1]: Stopping Pacemaker High Availability Cluster Manager...
> Jan 8 16:24:37 xenial-corosync pacemakerd[28747]: notice: Invoking handler for signal 15: Terminated
> Jan 8 16:24:37 xenial-corosync crmd[28753]: notice: Invoking handler for signal 15: Terminated
> Jan 8 16:24:37 xenial-corosync crmd[28753]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
> Jan 8 16:24:37 xenial-corosync pengine[28752]: notice: Delaying fencing operations until there are resources to manage
> Jan 8 16:24:37 xenial-corosync pengine[28752]: notice: Scheduling Node xenial-corosync for shutdown
> Jan 8 16:24:37 xenial-corosync pengine[28752]: notice: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-52.bz2
> Jan 8 16:24:37 xenial-corosync crmd[28753]: notice: Transition 1 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-52.bz2): Complete
> Jan 8 16:24:37 xenial-corosync crmd[28753]: notice: Disconnecting from Corosync
> Jan 8 16:24:37 xenial-corosync cib[28748]: warning: new_event_notification (28748-28753-12): Broken pipe (32)
> Jan 8 16:24:37 xenial-corosync pengine[28752]: notice: Invoking handler for signal 15: Terminated
> Jan 8 16:24:37 xenial-corosync attrd[28751]: notice: Invoking handler for signal 15: Terminated
> Jan 8 16:24:37 xenial-corosync lrmd[28750]: notice: Invoking handler for signal 15: Terminated
> Jan 8 16:24:37 xenial-corosync stonith-ng[28749]: notice: Invoking handler for signal 15: Terminated
> Jan 8 16:24:37 xenial-corosync cib[28748]: notice: Invoking handler for signal 15: Terminated
> Jan 8 16:24:37 xenial-corosync cib[28748]: notice: Disconnecting from Corosync
> Jan 8 16:24:37 xenial-corosync cib[28748]: notice: Disconnecting from Corosync
> Jan 8 16:24:37 xenial-corosync systemd[1]: Stopped Pacemaker High Availability Cluster Manager.
>
>
> Pacemakerd shuts down sending SIGTERM to its components, but after the install, corosync does not start pacemaker. BTW, "systemctl restart corosync" restarts both services perfectly
>
> I think that the option A from James Page (#11) is the way to go

Hrm, sorry, the above isn't exactly what is relevant here. That is, if
you only issued stop on corosync, then the above is fully correct
(and I'm not seeing the 'same' error you mention, which would be about
the library or so).

It is fully correct based upon the unit files that if you issue
`systemctl stop corosync` that you will see in the logs that pacemaker
is stopped first (the last line above shows this happening). It is not
expected that stopping corosync would cause pacemaker to restart.

If you, in the above env, issue `systemctl start corosync`, does
pacemaker start? Here is what I see in a Xenial LXD:

root@splendid-viper:~# systemctl status corosync
● corosync.service - Corosync Cluster Engin...

Eric Desrochers (slashd) wrote :

As per nacc's comment it seems like "Wants=" is the recommended way to hook start-up of one unit to the start-up of another unit.[1]

[1] - https://www.freedesktop.org/software/systemd/man/systemd.unit.html#Wants=

So far I have tested using 2 scenarios (including "Wants=pacemaker.service") and it looks good.

---------------------------------------
* Scenario #1
[Both corosync & pacemaker installed]
---------------------------------------

- pacemaker starts when corosync starts.

root@xenialcorosyncpacemaker:~# systemctl status corosync | egrep "PID|Active:"
   Active: active (running) since Mon 2018-01-08 19:29:44 UTC; 21s ago
 Main PID: 445 (corosync)

root@xenialcorosyncpacemaker:~# systemctl status pacemaker | egrep "PID|Active:"
   Active: active (running) since Mon 2018-01-08 19:29:44 UTC; 27s ago
 Main PID: 447 (pacemakerd)

root@xenialcorosyncpacemaker:~# systemctl stop corosync

root@xenialcorosyncpacemaker:~# systemctl status corosync | egrep "PID|Active:"
   Active: inactive (dead) since Mon 2018-01-08 19:30:29 UTC; 1s ago
 Main PID: 445 (code=exited, status=0/SUCCESS)

root@xenialcorosyncpacemaker:~# systemctl status pacemaker | egrep "PID|Active:"
   Active: inactive (dead) since Mon 2018-01-08 19:30:29 UTC; 3s ago
 Main PID: 447 (code=exited, status=0/SUCCESS)

root@xenialcorosyncpacemaker:~# systemctl start corosync

root@xenialcorosync:~# systemctl status corosync | egrep "PID|Active:"
   Active: active (running) since Mon 2018-01-08 19:30:56 UTC; 1s ago
 Main PID: 474 (corosync)

root@xenialcorosyncpacemaker:~# systemctl status pacemaker | egrep "PID|Active:"
   Active: active (running) since Mon 2018-01-08 19:30:56 UTC; 3s ago
 Main PID: 476 (pacemakerd)

---------------------------------------
* Scenario #2
[corosync installed & pacemaker not installed]
---------------------------------------

- It doesn't seem to have any side effects when pacemaker isn't installed.
The Wants= option is simply ignored since pacemaker.service is not present.

root@xenialcorosyncnopacemaker:~# systemctl status corosync | egrep "PID|Active:"
   Active: active (running) since Mon 2018-01-08 19:32:11 UTC; 53s ago
 Main PID: 1284 (corosync)

root@xenialcorosyncnopacemake:~# systemctl status pacemaker
● pacemaker.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)

root@v:~# systemctl stop corosync

root@xenialcorosyncnopacemake:~# systemctl status pacemaker
● pacemaker.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)

root@xenialcorosyncnopacemake:~# systemctl status corosync | egrep "PID|Active:"
   Active: inactive (dead) since Mon 2018-01-08 19:33:17 UTC; 4s ago
 Main PID: 1284 (code=exited, status=0/SUCCESS)

root@xenialcorosyncnopacemake:~# systemctl start corosync

root@xenialcorosyncnopacemake:~# systemctl status corosync | egrep "PID|Active:"
   Active: active (running) since Mon 2018-01-08 19:33:26 UTC; 1s ago
 Main PID: 1378 (corosync)

- Eric
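
For local experiments, the Wants= relationship tested above can also be added without editing the packaged unit file, by using a systemd drop-in. This is a minimal sketch under that assumption, not the packaging change itself; the function name write_wants_dropin and the file name wants-pacemaker.conf are hypothetical.

```shell
#!/bin/sh
# Write a drop-in that adds Wants=pacemaker.service to corosync.service.
# $1 is the systemd configuration root; /etc/systemd/system on a real host.
write_wants_dropin() {
    dir="$1/corosync.service.d"
    mkdir -p "$dir" || return 1
    # The file name is arbitrary; systemd reads every *.conf in the directory.
    printf '[Unit]\nWants=pacemaker.service\n' > "$dir/wants-pacemaker.conf"
}

# Intended use, as root:
#   write_wants_dropin /etc/systemd/system
#   systemctl daemon-reload
```

Removing the drop-in file and running "systemctl daemon-reload" again reverts the experiment.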

Victor Tapia (vtapia) wrote :

On Trusty + corosync=2.3.3-1ubuntu1:

- Stopping corosync while pacemaker is still running:

Jan 9 09:26:55 trusty-corosync corosync[5492]: [MAIN ] Node was shut down by a signal
Jan 9 09:26:55 trusty-corosync corosync[5492]: [SERV ] Unloading all Corosync service engines.
Jan 9 09:26:55 trusty-corosync corosync[5492]: [QB ] withdrawing server sockets
Jan 9 09:26:55 trusty-corosync corosync[5492]: [SERV ] Service engine unloaded: corosync vote quorum service v1.0
Jan 9 09:26:55 trusty-corosync corosync[5492]: [QB ] withdrawing server sockets
Jan 9 09:26:55 trusty-corosync corosync[5492]: [SERV ] Service engine unloaded: corosync configuration map access
Jan 9 09:26:55 trusty-corosync pacemakerd[5514]: error: cfg_connection_destroy: Connection destroyed
Jan 9 09:26:55 trusty-corosync pacemakerd[5514]: notice: pcmk_shutdown_worker: Shuting down Pacemaker
Jan 9 09:26:55 trusty-corosync pacemakerd[5514]: notice: stop_child: Stopping crmd: Sent -15 to process 5519
Jan 9 09:26:55 trusty-corosync corosync[5492]: [QB ] withdrawing server sockets
Jan 9 09:26:55 trusty-corosync crmd[5519]: notice: crm_shutdown: Requesting shutdown, upper limit is 1200000ms
Jan 9 09:26:55 trusty-corosync crmd[5519]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
Jan 9 09:26:55 trusty-corosync crmd[5519]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Jan 9 09:26:55 trusty-corosync crmd[5519]: error: crmd_cs_destroy: connection terminated
Jan 9 09:26:55 trusty-corosync crmd[5519]: notice: crmd_exit: Forcing immediate exit: Link has been severed (67)
Jan 9 09:26:55 trusty-corosync corosync[5492]: [SERV ] Service engine unloaded: corosync configuration service
Jan 9 09:26:55 trusty-corosync attrd[5518]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Jan 9 09:26:55 trusty-corosync attrd[5518]: crit: attrd_cs_destroy: Lost connection to Corosync service!
Jan 9 09:26:55 trusty-corosync attrd[5518]: notice: main: Exiting...
Jan 9 09:26:55 trusty-corosync attrd[5518]: error: attrd_cib_connection_destroy: Connection to the CIB terminated...
Jan 9 09:26:55 trusty-corosync pacemakerd[5514]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Jan 9 09:26:55 trusty-corosync pacemakerd[5514]: error: mcp_cpg_destroy: Connection destroyed
Jan 9 09:26:55 trusty-corosync stonith-ng[5517]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
...

Regarding my previous comment (#13), I meant to say that stopping corosync or upgrading it showed the same output (I used "error" instead of "output", my bad).

Eric Desrochers (slashd)
no longer affects: pacemaker (Ubuntu)
Changed in corosync (Ubuntu Bionic):
assignee: Victor Tapia (vtapia) → Eric Desrochers (slashd)
Changed in corosync (Ubuntu Artful):
assignee: nobody → Eric Desrochers (slashd)
no longer affects: pacemaker (Ubuntu Trusty)
no longer affects: pacemaker (Ubuntu Xenial)
no longer affects: pacemaker (Ubuntu Zesty)
no longer affects: pacemaker (Ubuntu Artful)
no longer affects: pacemaker (Ubuntu Bionic)
Changed in corosync (Ubuntu Zesty):
assignee: nobody → Eric Desrochers (slashd)
Changed in corosync (Ubuntu Xenial):
assignee: nobody → Eric Desrochers (slashd)
Changed in corosync (Ubuntu Trusty):
assignee: nobody → Victor Tapia (vtapia)
Changed in corosync (Ubuntu Bionic):
importance: Undecided → Medium
Changed in corosync (Ubuntu Artful):
importance: Undecided → Medium
Changed in corosync (Ubuntu Zesty):
importance: Undecided → Medium
Changed in corosync (Ubuntu Xenial):
importance: Undecided → Medium
Changed in corosync (Ubuntu Trusty):
importance: Undecided → Medium
Dimitri John Ledkov (xnox) wrote :

Currently, in bionic:
$ systemctl cat pacemaker.service
# /lib/systemd/system/pacemaker.service
After=corosync.service
Requires=corosync.service

$ systemctl cat corosync.service
<nothing about pacemaker>

Desired properties:
i) when corosync is started, attempt to start pacemaker
ii) when corosync is restarted, attempt to restart pacemaker too
iii) when corosync is stopped, do not stop pacemaker

1) Property i) can be satisfied with [Install] WantedBy=corosync.service, in pacemaker.service.

2) Requires=corosync.service is too strong, as it means that pacemaker cannot operate without corosync. Is this true?

3) Currently on upgrade the corosync prerm script does "stop corosync" and later the postinst does "start corosync". My understanding is that it would be better, on upgrades, to simply "restart" corosync instead of doing stop and start. Please consider switching the corosync package to use dh_systemd and the --restart-after-upgrade dh_installinit/systemd option.

4) Properties ii) and iii) cannot currently be satisfied simultaneously with simple stanzas. If pacemaker requires corosync at all times, then pacemaker.service should declare PartOf=corosync.service. A stop/restart of corosync will then stop and restart pacemaker, which satisfies condition ii) but violates condition iii). However, we can instead introduce a helper unit to achieve both ii) and iii) simultaneously, e.g.:

pacemaker-restart.service
[Unit]
PartOf=corosync.service
[Service]
ExecStart=/bin/true
ExecStop=/bin/systemctl restart pacemaker.service
[Install]
WantedBy=corosync.service

This means that whenever corosync is stopped, or restarted, pacemaker.service will be restarted too. This extra unit will satisfy the conditions `ii` and `iii` as stated.

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

(maybe actually even /bin/systemctl try-restart pacemaker.service -> restart, if it's running)
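
A minimal sketch of the helper unit with the try-restart variant, written out by a small shell function. Type=oneshot and RemainAfterExit=yes are assumptions added here (they are not part of the unit proposed above); the intent is that the unit stays active after its no-op start, so ExecStop= should only run when corosync is stopped or restarted. The function name and the target-directory argument are hypothetical.

```shell
#!/bin/sh
# Write a sketch of pacemaker-restart.service into the directory given as $1.
# Type=oneshot and RemainAfterExit=yes are assumptions, not part of the
# original proposal: they keep the unit "active" after /bin/true exits.
write_pacemaker_restart_unit() {
    unit_dir="$1"
    cat > "$unit_dir/pacemaker-restart.service" <<'EOF'
[Unit]
Description=Restart pacemaker when corosync is stopped or restarted (sketch)
PartOf=corosync.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=/bin/systemctl try-restart pacemaker.service

[Install]
WantedBy=corosync.service
EOF
}
```

On a test machine this could be tried with `write_pacemaker_restart_unit /etc/systemd/system`, followed by `systemctl daemon-reload` and `systemctl enable pacemaker-restart.service`. Whether it behaves as intended alongside Requires=corosync.service has not been verified here.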

Revision history for this message
Victor Tapia (vtapia) wrote :

In my opinion, from the list of desired properties, only the second one is true:
i) Corosync can be used on its own, regardless of having pacemaker installed or not. Starting both of them would force users to mask pacemaker's unit file in particular scenarios.
iii) IIRC, pacemaker requires corosync to run, so this property can't happen (in fact pacemaker SIGTERMs its components when corosync is not available).

I like the idea stated at point 3) (restart on upgrade instead of stop+start). It would solve the issue without having to change the unit files.

Regarding Trusty, both corosync and pacemaker currently use sysV scripts. I ran a short test switching to upstart using the scripts in source [1] and it seems to work fine (thanks to the 'respawn' directive for pacemaker).

[1]
master/mcp/pacemaker.upstart.in
master/init/corosync.conf.in

Revision history for this message
Victor Tapia (vtapia) wrote :

I was wrong regarding iii) "when corosync is stopped, do not stop pacemaker": Pacemaker can use other applications[1] (e.g. heartbeat) instead of corosync, so this is a property we want to keep.

Revision history for this message
Victor Tapia (vtapia) wrote :

Forgot the link to Pacemaker's FAQ

[1] https://wiki.clusterlabs.org/wiki/FAQ

Revision history for this message
Chris Gregan (cgregan) wrote :

This issue has been identified as Field Critical. The medium importance assigned to this defect seems counterintuitive given the SLA level assigned.

@Eric
Can you provide some insight into a time frame for a fix? Typically under Critical SLA we expect a dedicated engineer and a fix in a week.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in corosync (Ubuntu Artful):
status: New → Confirmed
Changed in corosync (Ubuntu Trusty):
status: New → Confirmed
Changed in corosync (Ubuntu Xenial):
status: New → Confirmed
Changed in corosync (Ubuntu Zesty):
status: New → Confirmed
Revision history for this message
Victor Tapia (vtapia) wrote :

1. Xenial+:

- Overriding dh_installinit[1] would still fail the first time the package is upgraded because of the old corosync.prerm file [2], which contains:

# Automatically added by dh_installinit
if [ -x "/etc/init.d/corosync" ] || [ -e "/etc/init/corosync.conf" ]; then
        invoke-rc.d corosync stop || exit $?
fi
# End automatically added section

- After the change, the same file will stop only for removal:

# Automatically added by dh_installinit
if ([ -x "/etc/init.d/corosync" ] || [ -e "/etc/init/corosync.conf" ]) && \
   [ "$1" = remove ]; then
        invoke-rc.d corosync stop || exit $?
fi
# End automatically added section

I still prefer this fix instead of changing how pacemaker and corosync unit files relate to each other.
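
To make the difference between the two prerm snippets concrete, here is a small sketch that only reports the decision instead of calling invoke-rc.d. The function name is hypothetical, and the checks for the init script's existence are left out for brevity; $1 stands for the action dpkg passes to the prerm script.

```shell
#!/bin/sh
# Hypothetical stand-in for the updated dh_installinit prerm snippet.
# It prints what the snippet would do for a given maintainer-script action
# rather than actually stopping the service.
prerm_decision() {
    if [ "$1" = remove ]; then
        echo "stop"
    else
        echo "keep-running"
    fi
}

prerm_decision upgrade
prerm_decision remove
```

With the old snippet both calls would effectively be "stop", which is why the first upgrade away from the old package still stops corosync.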

2. Trusty:

- corosync and pacemaker use sysv scripts (however, upstart files are present in the source for both coro&pace).
- Switching pacemaker to upstart with the respawn stanza should be enough to fix this issue.

[1]
override_dh_installinit:
        dh_installinit --restart-after-upgrade

[2] https://www.debian.org/doc/debian-policy/#details-of-unpack-phase-of-installation-or-upgrade

Eric Desrochers (slashd)
Changed in corosync (Ubuntu Bionic):
assignee: Eric Desrochers (slashd) → Nish Aravamudan (nacc)
Eric Desrochers (slashd)
no longer affects: corosync (Ubuntu Zesty)
Nish Aravamudan (nacc)
tags: removed: regression-update
Revision history for this message
Nish Aravamudan (nacc) wrote :

My findings from today:

1) This situation has always existed on Trusty, afaict. Removing the regression related tag.

2) There are 24 possible combinations to consider (some are by definition green already, but I'm including them for completeness; and some are not achievable) for each release: `service {corosync,pacemaker} {start,stop,restart}` where each of corosync and pacemaker can begin in one of {started,stopped}; 3 * 2 * 2 * 2 = 24.

3) For now, I'm ignoring the case of pacemaker configured to use heartbeat, as that is not the default in the current Ubuntu release.

4) On Trusty, 6 of those combinations are not possible by default (corosync stopped but pacemaker running).

5) On Trusty, the only failing situation I can provoke is `service corosync restart` when corosync and pacemaker are running already. In all other 17 cases, the expected result is obtained with existing packages.

6) I have submitted an MP to the Ubuntu Server Git repository for general review and submitted a build to a PPA at: https://launchpad.net/~nacc/+archive/ubuntu/lp1740892/, which adds a manual SysV start of pacemaker in corosync's SysV restart logic, if pacemaker was running before corosync was restarted. I think this is the least likely path to affect any existing configurations. In particular, this does not affect the corosync start path, which may or may not have previously started pacemaker (that is a local configuration decision, afaict).

7) In my investigation (this relates to xnox's and others' comments), there is no SysV link between pacemaker and corosync. Instead, pacemaker itself quits due to not finding corosync if it's not already started. This is why the SysV do_stop routine for corosync ends up resulting in pacemaker stopping.
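
The enumeration in point 2) can be sketched as a nested loop. This is only an illustration of the counting; the function name is made up and nothing here touches real services.

```shell
#!/bin/sh
# Enumerate action x target x initial corosync state x initial pacemaker state.
list_combinations() {
    for action in start stop restart; do
        for target in corosync pacemaker; do
            for corosync_state in started stopped; do
                for pacemaker_state in started stopped; do
                    echo "service $target $action (corosync=$corosync_state pacemaker=$pacemaker_state)"
                done
            done
        done
    done
}

# Total number of combinations.
list_combinations | wc -l
# Combinations point 4) calls impossible by default:
# corosync stopped but pacemaker running.
list_combinations | grep -c 'corosync=stopped pacemaker=started'
```

The two counts correspond to the 24 combinations in point 2) and the 6 unreachable ones in point 4).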

Revision history for this message
Nish Aravamudan (nacc) wrote :

8) On Xenial and later, where we have systemd .service files for both corosync and pacemaker, this is my current understanding (after looking at the .service files and thinking about the Trusty case, as well; some of this repeats others' observations):

8a) corosync can be installed and configured without pacemaker, so there is no direct reverse-dependency on pacemaker for corosync.

8b) pacemaker.service already is After= and Requires= corosync.service. This makes me less interested in the heartbeat case for 16.04+, as that would imply administrator customization (afaict).

8c) The above stanza elements only ensure that when pacemaker.service is started, that it is after corosync.service (After=) and ensures corosync.service is running (Requires=). There is no expression that would currently assert that corosync.service should start pacemaker.service (and as mentioned in 8a, as well as based upon Trusty, that would be incorrect).

9) It seems like what we want to express for systemd is that when corosync.service is restarted, pacemaker.service is restarted too, if it was running before; similar to the proposed Trusty change. I believe PartOf would achieve this correctly, as stops are safe to propagate from corosync to pacemaker because the systemd unit files by default have the Requires= dependency already.

10) invoke-rc.d is the means by which both Trusty and Xenial+ end up stopping corosync.service (prerm) and starting it (postinst) in the maintscripts. xnox is right, I believe, that if the scripts were updated to restart corosync.service on upgrade, rather than manually stopping and starting it, this would be solved in coordination with the PartOf= change. However, if that is not doable in the SRU, then it might make sense to check the status of pacemakerd in the prerm (before we stop corosync) and ensure it is back in that state in the postinst (after we start corosync).

11) I think the correct thing to do for Xenial is set PartOf in the pacemaker service file.

12) For both Trusty and Xenial (the actual change may be different), we want to ensure the corosync postinst correctly restores the state of pacemaker regardless of the init system in place.
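
For local experimentation, the PartOf= relationship from points 9) and 11) could be tried as a systemd drop-in rather than by editing the packaged unit file. This is only a sketch under that assumption; the function name and the drop-in file name are hypothetical, and $1 is the systemd configuration root (normally /etc/systemd/system).

```shell
#!/bin/sh
# Write a drop-in that adds PartOf=corosync.service to pacemaker.service.
# $1 is the systemd configuration root, e.g. /etc/systemd/system.
write_partof_dropin() {
    dropin_dir="$1/pacemaker.service.d"
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/partof-corosync.conf" <<'EOF'
[Unit]
PartOf=corosync.service
EOF
}
```

After writing the drop-in under the real configuration root, `systemctl daemon-reload` is needed before systemd picks it up.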

Revision history for this message
Nish Aravamudan (nacc) wrote :

Sorry for the long-winded, and delayed update to the bug!

Here's my TL;DR:

1) We want the postinst of corosync to be created with dh_installinit --restart-after-upgrade, which is the default in compat levels >= 10.

2) We want the init scripts (of whatever type) to restart pacemaker, if they restart corosync and if pacemaker was running before corosync was restarted.

Revision history for this message
Nish Aravamudan (nacc) wrote :

Ok, I've put up MPs for Trusty (just corosync) and Xenial (corosync and pacemaker). I think that's correct, and the underlying bug here (package upgrade of corosync does not lead to pacemaker restarting) should be resolved in both cases [1) in c#33]. Additionally, case 2) in c#33 is resolved in both cases, but requires different changes in each release.

I will propose similar MPs for bionic as are in xenial.

The corresponding packages are currently building in the PPA.

Revision history for this message
Nish Aravamudan (nacc) wrote :

Testing on Trusty:

# apt-get install corosync pacemaker
# Make corosync start at boot
# sed -i 's/no/yes/' /etc/default/corosync
# Make pacemaker start at boot
# update-rc.d pacemaker defaults
# reboot

# service corosync status; service pacemaker status
 * corosync is running
pacemakerd (pid 1927) is running...

Add PPA and upgrade corosync:

# add-apt-repository ppa:nacc/lp1740892
# apt-get update; apt-get install corosync

# service corosync status; service pacemaker status
 * corosync is running
pacemakerd is stopped

So what is in my PPA is not yet a fix and I think I see why:

Preparing to unpack .../corosync_2.3.3-1ubuntu4.1~ppa3_amd64.deb ...
 * Stopping corosync daemon corosync
...
Setting up corosync (2.3.3-1ubuntu4.1~ppa3) ...
Installing new version of config file /etc/init.d/corosync ...
 * Restarting corosync daemon corosync warning [MAIN ] Could not lock memory of service to avoid page faults: Cannot allocate memory (12)

So the postinst change is correct and we now restart corosync instead of start it. However, because the old package's prerm is run, that leads to a stop of corosync which in turn causes pacemaker to exit. When we run our updated init-script, it does not detect that pacemaker is running and so does not restart it.

Changed in corosync (Debian):
status: Unknown → New
Revision history for this message
Nish Aravamudan (nacc) wrote :

Big thanks to Robie Basak for providing some feedback and discussion on IRC and in the MP.

We came up with the following, which I'm currently testing, to try and resolve this issue:

In addition to all the changes currently in the MP(s), modify:

corosync to Breaks: on older pacemaker versions than the one we are going to provide in this update
pacemaker's preinst to mark via a file in /run if pacemaker is running, if upgrading from an older version of pacemaker
corosync's postinst to check the file in /run and start pacemaker, if upgrading from an older version of corosync

The effect of these changes together is to force corosync to upgrade pacemaker (via the Breaks) and for pacemaker to indicate to corosync whether it should start pacemaker in the maintainer scripts.
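
A rough sketch of the flag-file handshake described above, assuming nothing about the real maintainer scripts. The flag path, function names and the commands passed in are all hypothetical (the thread only says "a file in /run"), and the "upgrading from an older version" checks are omitted.

```shell
#!/bin/sh
# Hypothetical flag path; the actual file name used by the packages is not
# given in this thread.
FLAG_FILE="${FLAG_FILE:-/run/pacemaker-was-running.example}"

# pacemaker preinst side: leave a marker if pacemaker is currently running.
# Arguments: a command that succeeds when pacemaker is running
# (assumed example: pidof pacemakerd).
mark_pacemaker_if_running() {
    if "$@" >/dev/null 2>&1; then
        : > "$FLAG_FILE"
    fi
}

# corosync postinst side: if the marker exists, consume it and start pacemaker.
# Arguments: the command used to start pacemaker
# (assumed example: invoke-rc.d pacemaker start).
start_pacemaker_if_marked() {
    if [ -e "$FLAG_FILE" ]; then
        rm -f "$FLAG_FILE"
        "$@"
    fi
}
```

In a preinst this might be called as `mark_pacemaker_if_running pidof pacemakerd`, and in the corosync postinst as `start_pacemaker_if_marked invoke-rc.d pacemaker start`. Passing the commands as arguments keeps the sketch independent of the init system.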

The currently building versions for Trusty in my PPA (corosync = 2.3.3-1ubuntu4.1~ppa4 and pacemaker=1.1.10+git20130802-1ubuntu2.5~ppa1) are meant to contain these additional changes and upgrade together. I will test them once they are built.

Revision history for this message
Nish Aravamudan (nacc) wrote :

I tested the versions mentioned in the last comment (with one syntax fix) and the upgrade path worked! I need to test more corner cases and would appreciate help with that (e.g., corosync only installed, installing pacemaker separately, etc.).

The MPs have been updated and I'm building packages that are source-identical to the MPs as

corosync - 2.3.3-1ubuntu4.1~ppa6

pacemaker - 1.1.10+git20130802-1ubuntu2.5~ppa3

The reason for the repeated version bumps is these packages now have inter-related versioning.

I will try and do a similar set of MPs for xenial and bionic first thing tomorrow, if not later today.

Revision history for this message
Nish Aravamudan (nacc) wrote :

Xenial MPs updated and packages uploaded to PPA:

 corosync - 2.3.5-3ubuntu2.1~ppa2

pacemaker - 1.1.14-2ubuntu1.4~ppa2

Bionic will have to wait til tomorrow.

Revision history for this message
Nish Aravamudan (nacc) wrote :

After hitting some corner cases with my Trusty packages (and Xenial), I uploaded new versions:

corosync - 2.3.3-1ubuntu4.1~ppa7

pacemaker - 1.1.10+git20130802-1ubuntu2.5~ppa4

And ran the following tests:

1) install corosync and pacemaker, but do not enable either
Upgrade to the PPA packages
No errors, both are still disabled

2) install corosync and pacemaker, enable corosync only
Upgrade to the PPA packages
No errors, corosync is restarted

3) install corosync and pacemaker, enable both
Upgrade to the PPA packages
No errors, corosync and pacemaker restarted

I'm moving on to update the xenial branches now.

Revision history for this message
Nish Aravamudan (nacc) wrote :

4) A case I am not sure what to do with yet (in Trusty or Xenial): stopping pacemaker and then installing the PPA packages. I'm not sure how to handle this, really, for 16.04, as the deb-helper scripts don't seem to have status wrappers and calling systemctl directly apparently does not work. The end result is that pacemaker will be started in these cases, which does not seem directly harmful, but I'm not 100% sure (it would also presumably start on next reboot anyway).

Revision history for this message
Nish Aravamudan (nacc) wrote :

On Xenial:

corosync 2.3.5-3ubuntu2.1~ppa3
pacemaker 1.1.14-2ubuntu1.4~ppa3

The same 3 test cases as in c#39 all passed.

Revision history for this message
Nish Aravamudan (nacc) wrote :

@slashd, @vtapia: I would appreciate additional testing and I will work on the bionic uploads on Monday.

Revision history for this message
Eric Desrochers (slashd) wrote :

[VERIFICATION for XENIAL]

---------------------------------------------
[UPGRADE SCENARIO]

# dpkg -l | egrep "corosync|pacemaker"
ii corosync 2.3.5-3ubuntu2 amd64 cluster engine daemon and utilities
ii crmsh 2.2.0-1 amd64 CRM shell for the pacemaker cluster manager
ii libcorosync-common4:amd64 2.3.5-3ubuntu2 amd64 cluster engine common library
ii pacemaker 1.1.14-2ubuntu1.3 amd64 cluster resource manager
ii pacemaker-cli-utils 1.1.14-2ubuntu1.3 amd64 cluster resource manager command line utilities
ii pacemaker-common 1.1.14-2ubuntu1.3 all cluster resource manager common files
ii pacemaker-resource-agents 1.1.14-2ubuntu1.3 all cluster resource manager general resource agents

# pidof pacemakerd
3647

# pidof corosync
1283

# sudo add-apt-repository ppa:nacc/lp1740892
# sudo apt-get update

# apt-get install corosync -y
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following package was automatically installed and is no longer required:
  libfreetype6
Use 'apt autoremove' to remove it.
The following additional packages will be installed:
  pacemaker
Suggested packages:
  fence-agents
The following packages will be upgraded:
  corosync pacemaker
2 upgraded, 0 newly installed, 0 to remove and 43 not upgraded.
Need to get 765 kB of archives.
After this operation, 1024 B of additional disk space will be used.
Get:1 http://ppa.launchpad.net/nacc/lp1740892/ubuntu xenial/main amd64 pacemaker amd64 1.1.14-2ubuntu1.4~ppa3 [403 kB]
Get:2 http://ppa.launchpad.net/nacc/lp1740892/ubuntu xenial/main amd64 corosync amd64 2.3.5-3ubuntu2.1~ppa3 [361 kB]
Fetched 765 kB in 1s (488 kB/s)
(Reading database ... 28089 files and directories currently installed.)
Preparing to unpack .../pacemaker_1.1.14-2ubuntu1.4~ppa3_amd64.deb ...
Unpacking pacemaker (1.1.14-2ubuntu1.4~ppa3) over (1.1.14-2ubuntu1.3) ...
Preparing to unpack .../corosync_2.3.5-3ubuntu2.1~ppa3_amd64.deb ...
Unpacking corosync (2.3.5-3ubuntu2.1~ppa3) over (2.3.5-3ubuntu2) ...
Processing triggers for systemd (229-4ubuntu21) ...
Processing triggers for ureadahead (0.100.0-19) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up corosync (2.3.5-3ubuntu2.1~ppa3) ...
Setting up pacemaker (1.1.14-2ubuntu1.4~ppa3) ...

# dpkg -l | egrep "corosync|pacemaker"
ii corosync 2.3.5-3ubuntu2.1~ppa3 amd64 cluster engine daemon and utilities
ii crmsh 2.2.0-1 amd64 CRM shell for the pacemaker cluster manager
ii libcorosync-common4:amd64 2.3.5-3ubuntu2 amd64 cluster engine common library
ii pacemaker 1.1.14-2ubuntu1.4~ppa3 amd64 cluster resource manager
ii pacemaker-cli-utils ...

Revision history for this message
Nish Aravamudan (nacc) wrote :

Sorry for the long delay on my end!

I have pushed up MPs for the correct resolution (I think) for Bionic (incl. appropriate comments of what can be dropped after B+1 opens).

I am building them in my PPA (pacemaker 1.1.18~rc4-1ubuntu1~ppa1 and corosync 2.4.2-3ubuntu1~ppa1) now and will test the following scenarios:

(to level-set)
X -> B [should be broken]

X -> B + PPA [should work]

A -> B [may or may not be broken, because there is not a corosync version change]

A -> B + PPA [should work]

as well as the prior cases of fresh install in B and reinstall in B.

Additionally, we should be able to test that starting/stopping/restarting corosync in B successfully applies the same state change to pacemaker.

Presuming these tests pass and the Canonical Server Team reviews and approves them, I will upload them this week.

Eric & co. at that point, I'm wondering if perhaps your team can pick up the SRUs to X, A and T? I think X and A will take the same changes. As we discussed, we would do the minimum required for the older releases, as in my MPs already up. The only thing currently missing is a debconf note prompt, I think, that says pacemaker will have been stopped by the corosync upgrade and will need to be manually restarted.

Revision history for this message
Nish Aravamudan (nacc) wrote :

I did some cursory testing yesterday of B -> B updates (I still need to think about how to do an X -> B upgrade with a PPA). pacemaker stays running as expected. I am checking (via a recent comment on the MP) whether it's possible to minimize our changes even more in 18.04.

Eric Desrochers (slashd)
Changed in corosync (Ubuntu Artful):
assignee: Eric Desrochers (slashd) → Nish Aravamudan (nacc)
Changed in corosync (Ubuntu Xenial):
assignee: Eric Desrochers (slashd) → Nish Aravamudan (nacc)
Changed in corosync (Ubuntu Trusty):
assignee: Victor Tapia (vtapia) → Nish Aravamudan (nacc)
Revision history for this message
Seyeong Kim (seyeongkim) wrote :

Hello

I'm fixing another pacemaker/glib bug [1], but it is blocked by this issue.

I hope there will be good news on this issue soon.

Thanks.

[1] https://bugs.launchpad.net/bugs/1316970

Revision history for this message
Nish Aravamudan (nacc) wrote :

I have uploaded the fixes today to Bionic. I will update the bug once they are migrated, but the SRUs should be startable now.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package corosync - 2.4.2-3ubuntu1

---------------
corosync (2.4.2-3ubuntu1) bionic; urgency=medium

  * Properly restart corosync and pacemaker together (LP: #1740892)
    - d/rules: pass --restart-after-upgrade to dh_installinit
    - d/control: indicate this version breaks all older pacemaker, to
      force an upgrade of pacemaker.
    - d/corosync.postinst: if flagged to do so by pacemaker, start
      pacemaker on upgrade.
      + This can be dropped after bionic releases as long as the other
        changes are maintained.

 -- Nishanth Aravamudan <email address hidden> Tue, 30 Jan 2018 16:40:31 -0800

Changed in corosync (Ubuntu Bionic):
status: In Progress → Fix Released
Revision history for this message
Eric Desrochers (slashd) wrote :

I'll proceed with the SRU next week.

As per my discussion with server team, we will fix Xenial and Artful but we won't fix Trusty.

Changed in corosync (Ubuntu Artful):
assignee: Nish Aravamudan (nacc) → Eric Desrochers (slashd)
Changed in corosync (Ubuntu Xenial):
assignee: Nish Aravamudan (nacc) → Eric Desrochers (slashd)
Changed in corosync (Ubuntu Trusty):
status: Confirmed → Won't Fix
Changed in corosync (Ubuntu Artful):
status: Confirmed → In Progress
Changed in corosync (Ubuntu Xenial):
status: Confirmed → In Progress
tags: added: id-5a53cc961fb7361dbac726f8
Eric Desrochers (slashd)
Changed in pacemaker (Ubuntu):
status: New → In Progress
status: In Progress → Fix Released
assignee: nobody → Nish Aravamudan (nacc)
importance: Undecided → Medium
no longer affects: corosync (Ubuntu Artful)
no longer affects: corosync (Ubuntu Xenial)
no longer affects: corosync (Ubuntu Trusty)
Changed in corosync (Ubuntu Trusty):
assignee: nobody → Eric Desrochers (slashd)
importance: Undecided → Medium
status: New → Won't Fix
assignee: Eric Desrochers (slashd) → Nish Aravamudan (nacc)
Changed in pacemaker (Ubuntu Trusty):
assignee: nobody → Nish Aravamudan (nacc)
importance: Undecided → Medium
status: New → Won't Fix
Changed in corosync (Ubuntu Xenial):
assignee: nobody → Eric Desrochers (slashd)
importance: Undecided → High
status: New → In Progress
Changed in corosync (Ubuntu Artful):
assignee: nobody → Eric Desrochers (slashd)
importance: Undecided → High
status: New → In Progress
Changed in pacemaker (Ubuntu Xenial):
assignee: nobody → Eric Desrochers (slashd)
importance: Undecided → High
status: New → In Progress
Changed in pacemaker (Ubuntu Artful):
assignee: nobody → Eric Desrochers (slashd)
importance: Undecided → High
status: New → In Progress
Revision history for this message
Eric Desrochers (slashd) wrote :

[XENIAL (pre-SRU)]

== BEFORE UPGRADE ==

# dpkg -l | egrep "corosync|pacemaker"
ii corosync 2.3.5-3ubuntu2 amd64 cluster engine daemon and utilities
ii crmsh 2.2.0-1 amd64 CRM shell for the pacemaker cluster manager
ii libcorosync-common4:amd64 2.3.5-3ubuntu2 amd64 cluster engine common library
ii pacemaker 1.1.14-2ubuntu1.3 amd64 cluster resource manager
ii pacemaker-cli-utils 1.1.14-2ubuntu1.3 amd64 cluster resource manager command line utilities
ii pacemaker-common 1.1.14-2ubuntu1.3 all cluster resource manager common files
ii pacemaker-resource-agents 1.1.14-2ubuntu1.3 all cluster resource manager general resource agents

# systemctl status corosync | egrep "Active:|Main PID"
   Active: active (running) since Mon 2018-02-19 15:14:44 UTC; 16min ago
 Main PID: 3228 (corosync)

# systemctl status pacemaker | egrep "Active:|Main PID"
   Active: active (running) since Mon 2018-02-19 15:14:44 UTC; 16min ago
 Main PID: 3321 (pacemakerd)

== UPGRADE ==

# apt-cache policy corosync
corosync:
  Installed: 2.3.5-3ubuntu2
  Candidate: 2.3.5-3ubuntu2.1
  Version table:
     2.3.5-3ubuntu2.1 500
        500 http://ppa.launchpad.net/slashd/test/ubuntu xenial/main amd64 Packages
 *** 2.3.5-3ubuntu2 500
        500 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        100 /var/lib/dpkg/status

# apt-get install corosync
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following package was automatically installed and is no longer required:
  libfreetype6
Use 'apt autoremove' to remove it.
The following additional packages will be installed:
  pacemaker
Suggested packages:
  fence-agents
The following packages will be upgraded:
  corosync pacemaker
2 upgraded, 0 newly installed, 0 to remove and 55 not upgraded.
Need to get 766 kB of archives.
After this operation, 2048 B of additional disk space will be used.
Do you want to continue? [Y/n]
Get:1 http://ppa.launchpad.net/slashd/test/ubuntu xenial/main amd64 pacemaker amd64 1.1.14-2ubuntu1.4 [404 kB]
Get:2 http://ppa.launchpad.net/slashd/test/ubuntu xenial/main amd64 corosync amd64 2.3.5-3ubuntu2.1 [361 kB]
Fetched 766 kB in 1s (507 kB/s)
(Reading database ... 28089 files and directories currently installed.)
Preparing to unpack .../pacemaker_1.1.14-2ubuntu1.4_amd64.deb ...
Unpacking pacemaker (1.1.14-2ubuntu1.4) over (1.1.14-2ubuntu1.3) ...
Preparing to unpack .../corosync_2.3.5-3ubuntu2.1_amd64.deb ...
Unpacking corosync (2.3.5-3ubuntu2.1) over (2.3.5-3ubuntu2) ...
Processing triggers for systemd (229-4ubuntu21) ...
Processing triggers for ureadahead (0.100.0-19) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up corosync (2.3.5-3ubuntu2.1) ...
Setting up pacemaker (1.1.14-2ubuntu1.4) ...

== AFTER UPGRADE ==

# dpkg -l | egrep "corosync|pacemaker"
ii cor...

Eric Desrochers (slashd)
description: updated
Eric Desrochers (slashd)
description: updated
Revision history for this message
Eric Desrochers (slashd) wrote :

[ARTFUL pacemaker - note ]

The pacemaker package doesn't build in Artful, even as-is without any changes. I suspect the package was first built in Zesty and then simply copied to Artful when the release was first created (without going through the build farm again). Additionally, no change has been made in Artful since the suspected Zesty->Artful copy, so no one could have noticed the FTBFS situation before today.

IMHO, the above is the most plausible reason explaining why we have a binary package for Artful even though it fails to build in Artful.

# buildlog : https://launchpadlibrarian.net/357638704/buildlog_ubuntu-artful-amd64.pacemaker_1.1.16-1ubuntu1_BUILDING.txt.gz
...
crm_mon.c: In function ‘print_nvpair’:
crm_mon.c:959:30: error: comparison between pointer and zero character constant [-Werror=pointer-compare]
         for (c = date_str; c != '\0'; ++c) {
                              ^~
crm_mon.c:959:28: note: did you mean to dereference the pointer?
         for (c = date_str; c != '\0'; ++c) {
                            ^
...
cc1: all warnings being treated as errors
Makefile:1028: recipe for target 'crm_mon.o' failed
make[3]: *** [crm_mon.o] Error 1
make[3]: *** Waiting for unfinished jobs...
...
dh_auto_build: make -j4 returned exit code 2
debian/rules:31: recipe for target 'override_dh_auto_build-indep' failed
make[1]: *** [override_dh_auto_build-indep] Error 2
make[1]: Leaving directory '/<<PKGBUILDDIR>>'
debian/rules:15: recipe for target 'build' failed
make: *** [build] Error 2
dpkg-buildpackage: error: debian/rules build gave error exit status 2

Revision history for this message
Eric Desrochers (slashd) wrote :

I *think* this Debian commit could be a good candidate to fix the Artful FTBFS situation.

-------
$ git show a7476dd9
commit a7476dd96e79197f65acf0f049f75ce8e8f9e801
Author: Jan Pokorny <email address hidden>
Date: Thu Feb 2 14:51:46 2017 +0100

    Fix: crm_mon: protect against non-standard or failing asctime

    So far, we have been likely covered by standards requiring asctime to
    produce an output ending with \n\0 bytes, because otherwise, we would
    overrun the buffer, reading unspecified content, possibly segfaulting.
    This was actually discovered with a brand new GCC7 warning
    ( [-Werror=pointer-compare]).

    Another latent issue was that the code was not ready for the case
    of failing asctime call (returning NULL). This is now fixed as well.

diff --git a/tools/crm_mon.c b/tools/crm_mon.c
index 776aea8..023b07b 100644
--- a/tools/crm_mon.c
+++ b/tools/crm_mon.c
@@ -954,10 +954,10 @@ print_nvpair(FILE *stream, const char *name, const char *value,

     /* Otherwise print user-friendly time string */
     } else {
-        char *date_str, *c;
+        static char empty_str[] = "";
+        char *c, *date_str = asctime(localtime(&epoch_time));

-        date_str = asctime(localtime(&epoch_time));
-        for (c = date_str; c != '\0'; ++c) {
+        for (c = (date_str != NULL) ? date_str : empty_str; *c != '\0'; ++c) {
             if (*c == '\n') {
                 *c = '\0';
                 break;

-------

I'll give it a try and update the case with the outcome of my test.

- Eric

Revision history for this message
Eric Desrochers (slashd) wrote :

It still FTBFS with the above Debian candidate commit ^

- Eric

Revision history for this message
Eric Desrochers (slashd) wrote :

Forgot to mention that it still fails to build, but it now fails differently than before applying Debian commit "a7476dd96e79197f65acf0f049f75ce8e8f9e801".

Revision history for this message
Nish Aravamudan (nacc) wrote :

Based upon discussion, we will not be fixing this in Trusty. Updating the SysV scripts to be more error-proof is itself error-prone. Additionally, corosync on Trusty has not received significant updates, and this update itself would lead to at least an additional outage (in order to put the fix into place).

If, in the future, an additional security/bugfix update was to be applied to Trusty for corosync, we should revisit this.

Revision history for this message
Eric Desrochers (slashd) wrote :
Revision history for this message
Eric Desrochers (slashd) wrote :

Quick update ....

The corosync/pacemaker SRU is on hold for now until the FTBFS situation is fixed for Artful.

As mentioned above in comment #58 based on the build log error I had and the debbug #869986, it is related to some libqb header issues.

Server team will have a look at this, and I'll then resume the SRU once completed.

Revision history for this message
Eric Desrochers (slashd) wrote :

Another quick update based on a discussion between nacc, slangasek and myself:

...
<nacc> slangasek: fair, above patch results in https://paste.ubuntu.com/p/hb68G8rpMw/
<slangasek> nacc: those are pretty clearly internal symbols which are not part of the ABI and you should just mark them (optional) instead of doing an architecture-based exclusion list
<slangasek> and when I say mark them (optional), I mean mark them '(optional)'
<slangasek> nacc: dropping the symbols, or marking them optional, both valid. (optional) would make the same source package more cleanly backportable to older toolchains
<slangasek> but a symbol that starts with a __ and isn't listed in the public headers for the library, and especially that doesn't originate in the source of this library, can be assumed safe to drop from .symbols

Revision history for this message
Eric Desrochers (slashd) wrote :

Was able to build fine using (optional)

--------------------
Standard symbol tags

optional

A symbol marked as optional can disappear from the library at any time and that will never cause dpkg-gensymbols to fail. However, disappeared optional symbols will continuously appear as MISSING in the diff in each new package revision. This behaviour serves as a reminder for the maintainer that such a symbol needs to be removed from the symbol file or readded to the library. When the optional symbol, which was previously declared as MISSING, suddenly reappears in the next revision, it will be upgraded back to the “existing” status with its minimum version unchanged.

This tag is useful for symbols which are private where their disappearance do not cause ABI breakage. For example, most of C++ template instantiations fall into this category. Like any other tag, this one may also have an arbitrary value: it could be used to indicate why the symbol is considered optional.
--------------------
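
As an illustration of the tag syntax quoted above, here is a minimal sketch. The library name and symbol names are hypothetical (they are not the ones from d/libcrmservice3.symbols), and the helper count_optional_symbols is only an illustration of how to spot tagged entries.

```shell
# Hypothetical symbols file content; the library and symbol names are made up.
sample_symbols='libexample.so.1 libexample1 #MINVER#
 (optional)__internal_helper@Base 1.0
 (optional=not part of the public ABI)__toolchain_artifact@Base 1.0
 example_public_call@Base 1.0'

# Count the symbol lines on stdin whose tag list includes "optional",
# with or without a value, alone or combined with other tags.
count_optional_symbols() {
    grep -Ec '^ \((.*\|)?optional[=|)]'
}

printf '%s\n' "$sample_symbols" | count_optional_symbols   # prints 2
```

The second entry uses the optional value to record why the symbol is considered optional, as the quoted text allows.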

Revision history for this message
Eric Desrochers (slashd) wrote :

I'll resume the SRU and hopefully upload everything next Monday.

Revision history for this message
Eric Desrochers (slashd) wrote :

[Artful (pre-sru)]

# dpkg -l | egrep "corosync|pacemaker"
ii corosync 2.4.2-3build1 amd64 cluster engine daemon and utilities
ii crmsh 2.3.2-1 amd64 CRM shell for the pacemaker cluster manager
ii libcorosync-common4:amd64 2.4.2-3build1 amd64 cluster engine common library
ii pacemaker 1.1.16-1ubuntu1 amd64 cluster resource manager
ii pacemaker-cli-utils 1.1.16-1ubuntu1 amd64 cluster resource manager command line utilities
ii pacemaker-common 1.1.16-1ubuntu1 all cluster resource manager common files
ii pacemaker-resource-agents 1.1.16-1ubuntu1 all cluster resource manager general resource agents

# systemctl status corosync | egrep -i "Active:|pid"
   Active: active (running) since Mon 2018-02-26 15:23:37 UTC; 15min ago
 Main PID: 8943 (corosync)

# systemctl status pacemaker | egrep -i "Active:|pid"
   Active: active (running) since Mon 2018-02-26 15:23:39 UTC; 15min ago
 Main PID: 9033 (pacemakerd)

# apt-get install corosync -y
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following package was automatically installed and is no longer required:
  libfreetype6
Use 'apt autoremove' to remove it.
The following additional packages will be installed:
  pacemaker
Suggested packages:
  fence-agents
The following packages will be upgraded:
  corosync pacemaker
2 upgraded, 0 newly installed, 0 to remove and 41 not upgraded.
Need to get 768 kB of archives.
After this operation, 11.3 kB of additional disk space will be used.
Get:1 http://ppa.launchpad.net/slashd/lp1740892/ubuntu artful/main amd64 pacemaker amd64 1.1.16-1ubuntu2 [389 kB]
Get:2 http://ppa.launchpad.net/slashd/lp1740892/ubuntu artful/main amd64 corosync amd64 2.4.2-3ubuntu0.17.10.1 [379 kB]
Fetched 768 kB in 1s (420 kB/s)
(Reading database ... 29268 files and directories currently installed.)
Preparing to unpack .../pacemaker_1.1.16-1ubuntu2_amd64.deb ...
Unpacking pacemaker (1.1.16-1ubuntu2) over (1.1.16-1ubuntu1) ...
Preparing to unpack .../corosync_2.4.2-3ubuntu0.17.10.1_amd64.deb ...
Unpacking corosync (2.4.2-3ubuntu0.17.10.1) over (2.4.2-3build1) ...
Processing triggers for ureadahead (0.100.0-20) ...
Processing triggers for systemd (234-2ubuntu12.1) ...
Setting up corosync (2.4.2-3ubuntu0.17.10.1) ...
Processing triggers for man-db (2.7.6.1-2) ...
Setting up pacemaker (1.1.16-1ubuntu2) ...

# dpkg -l | egrep "corosync|pacemaker"
ii corosync 2.4.2-3ubuntu0.17.10.1 amd64 cluster engine daemon and utilities
ii crmsh 2.3.2-1 amd64 CRM shell for the pacemaker cluster manager
ii libcorosync-common4:amd64 2.4.2-3build1...


Revision history for this message
Eric Desrochers (slashd) wrote :

Uploaded for Xenial and Artful; it is now waiting in the upload queue for SRU verification team approval.

Eric Desrochers (slashd)
description: updated
description: updated
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Drew, or anyone else affected,

Accepted corosync into artful-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/corosync/2.4.2-3ubuntu0.17.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-artful to verification-done-artful. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-artful. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in corosync (Ubuntu Artful):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-artful
Revision history for this message
Eric Desrochers (slashd) wrote :

I had to re-upload pacemaker for Artful.

At one point someone tried to push 1.1.17 into Artful and it then got deleted, which left me with no other choice but to re-upload with a version greater than 1.1.17 for the SRU machinery to accept the changes.

Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Hello Drew, or anyone else affected,

Accepted pacemaker into artful-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/pacemaker/1.1.17+really1.1.16-1ubuntu2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-artful to verification-done-artful. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-artful. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in pacemaker (Ubuntu Artful):
status: In Progress → Fix Committed
Changed in corosync (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed-xenial
Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Hello Drew, or anyone else affected,

Accepted corosync into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/corosync/2.3.5-3ubuntu2.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in pacemaker (Ubuntu Xenial):
status: In Progress → Fix Committed
Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Hello Drew, or anyone else affected,

Accepted pacemaker into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/pacemaker/1.1.14-2ubuntu1.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Revision history for this message
Eric Desrochers (slashd) wrote :

[VERIFICATION XENIAL]

* == corosync upgrade (with pacemaker installed) ==

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.4 LTS
Release: 16.04
Codename: xenial

# dpkg -l
ii corosync 2.3.5-3ubuntu2 amd64 cluster engine daemon and utilities
ii pacemaker 1.1.14-2ubuntu1.3 amd64 cluster resource manager

# systemctl status corosync | egrep "Active:|PID:"
Active: active (running) since Tue 2018-02-27 13:38:49 UTC; 1min 19s ago
Main PID: 3214 (corosync)

# systemctl status pacemaker | egrep "Active:|PID:"
Active: active (running) since Tue 2018-02-27 13:38:50 UTC; 1min 23s ago
Main PID: 3307 (pacemakerd)

# apt-get install corosync -y
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following package was automatically installed and is no longer required:
libfreetype6
Use 'apt autoremove' to remove it.
The following additional packages will be installed:
pacemaker
Suggested packages:
fence-agents
The following packages will be upgraded:
corosync pacemaker
2 upgraded, 0 newly installed, 0 to remove and 44 not upgraded.
Need to get 502 kB of archives.
After this operation, 2048 B of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu xenial-proposed/main amd64 pacemaker amd64 1.1.14-2ubuntu1.4 [334 kB]
Get:2 http://archive.ubuntu.com/ubuntu xenial-proposed/main amd64 corosync amd64 2.3.5-3ubuntu2.1 [168 kB]
Fetched 502 kB in 1s (373 kB/s)
(Reading database ... 28101 files and directories currently installed.)
Preparing to unpack .../pacemaker_1.1.14-2ubuntu1.4_amd64.deb ...
Unpacking pacemaker (1.1.14-2ubuntu1.4) over (1.1.14-2ubuntu1.3) ...
Preparing to unpack .../corosync_2.3.5-3ubuntu2.1_amd64.deb ...
Unpacking corosync (2.3.5-3ubuntu2.1) over (2.3.5-3ubuntu2) ...
Processing triggers for systemd (229-4ubuntu21.1) ...
Processing triggers for ureadahead (0.100.0-19) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up corosync (2.3.5-3ubuntu2.1) ...
Setting up pacemaker (1.1.14-2ubuntu1.4) ...

# dpkg
ii corosync 2.3.5-3ubuntu2.1 amd64 cluster engine daemon and utilities
ii pacemaker 1.1.14-2ubuntu1.4 amd64 cluster resource manager

# systemctl status corosync | egrep "Active:|PID:"
Active: active (running) since Tue 2018-02-27 13:42:54 UTC; 23s ago
Main PID: 6562 (corosync)

# systemctl status pacemaker | egrep "Active:|PID:"
Active: active (running) since Tue 2018-02-27 13:42:54 UTC; 25s ago
Main PID: 6652 (pacemakerd)

* == corosync upgrade (with pacemaker not installed) ==

# systemctl status corosync
   Active: active (running) since Tue 2018-02-27 14:02:51 UTC; 17s ago
 Main PID: 1488 (corosync)

# apt-get install corosync -y
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following package was automatically installed and is no longer required:
  libfreetype6
Use 'apt autoremove' to remove it.
The following packages will be upgraded:
  corosync
1 upgraded, 0 newly installed, 0 to remove and 36 not upgraded.
Need to get 168 kB of archives.
After this operation, 1024 B of additional disk space will be used.
Get:1 http:/...

Eric Desrochers (slashd)
tags: added: verification-done-xenial
removed: verification-needed-xenial
Revision history for this message
Eric Desrochers (slashd) wrote :

[VERIFICATION ARTFUL]

The upgrade went well, and pacemaker was restarted during the corosync upgrade, as it should be.

# systemctl status corosync
   Active: active (running) since Thu 2018-03-01 15:18:08 UTC; 2min 37s ago
 Main PID: 2366 (corosync)

# systemctl status pacemaker
   Active: active (running) since Thu 2018-03-01 15:18:10 UTC; 2min 44s ago
 Main PID: 2456 (pacemakerd)

# apt-get install corosync -y
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following package was automatically installed and is no longer required:
  libfreetype6
Use 'apt autoremove' to remove it.
The following additional packages will be installed:
  pacemaker
Suggested packages:
  fence-agents
The following packages will be upgraded:
  corosync pacemaker
2 upgraded, 0 newly installed, 0 to remove and 36 not upgraded.
Need to get 486 kB of archives.
After this operation, 11.3 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu artful-proposed/main amd64 pacemaker amd64 1.1.17+really1.1.16-1ubuntu2 [314 kB]
Get:2 http://archive.ubuntu.com/ubuntu artful-proposed/main amd64 corosync amd64 2.4.2-3ubuntu0.17.10.1 [172 kB]
Fetched 486 kB in 2s (239 kB/s)
(Reading database ... 29280 files and directories currently installed.)
Preparing to unpack .../pacemaker_1.1.17+really1.1.16-1ubuntu2_amd64.deb ...
Unpacking pacemaker (1.1.17+really1.1.16-1ubuntu2) over (1.1.16-1ubuntu1) ...
Preparing to unpack .../corosync_2.4.2-3ubuntu0.17.10.1_amd64.deb ...
Unpacking corosync (2.4.2-3ubuntu0.17.10.1) over (2.4.2-3build1) ...
Processing triggers for ureadahead (0.100.0-20) ...
Processing triggers for systemd (234-2ubuntu12.1) ...
Setting up corosync (2.4.2-3ubuntu0.17.10.1) ...
Processing triggers for man-db (2.7.6.1-2) ...
Setting up pacemaker (1.1.17+really1.1.16-1ubuntu2) ...

# systemctl status corosync | egrep "Active:|PID"
   Active: active (running) since Thu 2018-03-01 15:21:05 UTC; 17s ago
 Main PID: 3091 (corosync)

# systemctl status pacemaker | egrep "Active:|PID"
   Active: active (running) since Thu 2018-03-01 15:21:05 UTC; 21s ago
 Main PID: 3273 (pacemakerd)
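
The before/after comparison done by hand above can be sketched as a small check. This is only an illustration: the function names main_pid and restarted_and_running are made up here, and the sample text is copied from the pacemaker status lines in this comment.

```shell
# Extract the PID from the "Main PID:" line of `systemctl status` output on stdin.
main_pid() {
    sed -n 's/^ *Main PID: *\([0-9][0-9]*\).*/\1/p'
}

# Succeed if the unit is running now and its main PID differs from the earlier one.
# $1: status text captured before the upgrade, $2: status text captured after.
restarted_and_running() {
    before_pid=$(printf '%s\n' "$1" | main_pid)
    after_pid=$(printf '%s\n' "$2" | main_pid)
    printf '%s\n' "$2" | grep -q 'Active: active (running)' &&
        [ -n "$after_pid" ] && [ "$before_pid" != "$after_pid" ]
}

# Sample lines taken from the pacemaker status output in this comment.
before='   Active: active (running) since Thu 2018-03-01 15:18:10 UTC; 2min 44s ago
 Main PID: 2456 (pacemakerd)'
after='   Active: active (running) since Thu 2018-03-01 15:21:05 UTC; 21s ago
 Main PID: 3273 (pacemakerd)'

if restarted_and_running "$before" "$after"; then
    echo "pacemaker restarted and running"
fi
```

On a live system the two arguments would come from capturing `systemctl status pacemaker` before and after running the upgrade.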

tags: added: verification-done-artful
removed: verification-needed-artful
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for corosync has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package corosync - 2.4.2-3ubuntu0.17.10.1

---------------
corosync (2.4.2-3ubuntu0.17.10.1) artful; urgency=high

  * Properly restart corosync and pacemaker together (LP: #1740892)
    - d/rules: pass --restart-after-upgrade to dh_installinit
    - d/control: indicate this version breaks all older pacemaker, to
      force an upgrade of pacemaker.
    - d/corosync.postinst: if flagged to do so by pacemaker, start
      pacemaker on upgrade.

 -- Eric Desrochers <email address hidden> Mon, 26 Feb 2018 08:49:19 -0500

Changed in corosync (Ubuntu Artful):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pacemaker - 1.1.17+really1.1.16-1ubuntu2

---------------
pacemaker (1.1.17+really1.1.16-1ubuntu2) artful; urgency=medium

  * Rebuilding with a version greater than 1.1.17:
    - pacemaker 1.1.17 was pushed to Artful at one point and then got deleted.

pacemaker (1.1.16-1ubuntu2) artful; urgency=high

  * Fix FTBFS situations
    - d/p/crm-mon-protect-against-non-standard-or-failing-asctime.patch:
    Fix: crm_mon: protect against non-standard or failing asctime.

    - d/libcrmservice3.symbols: Mark symbols optional to avoid FTBFS
    with newer toolchains and make it cleanly backportable to older toolchains.

  * Properly restart corosync and pacemaker together (LP: #1740892)
    - d/pacemaker.preinst: flag corosync to restart pacemaker on upgrade.

 -- Eric Desrochers <email address hidden> Mon, 26 Feb 2018 12:36:37 -0500
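
For readers unfamiliar with maintainer scripts, the flag handshake described in the two changelogs (pacemaker.preinst sets a flag, corosync.postinst acts on it) can be sketched roughly as follows. This is not the code from the packages: the flag path, the function names, and the echoed messages are all assumptions made for illustration.

```shell
# Hypothetical flag location; the real packages choose their own path.
FLAG_DIR=${FLAG_DIR:-$(mktemp -d)}
FLAG_FILE="$FLAG_DIR/restart-pacemaker"

# Stand-in for pacemaker's preinst: leave a flag if pacemaker is running
# when the upgrade begins. $1 is the unit state, e.g. "active".
pacemaker_preinst() {
    if [ "$1" = "active" ]; then
        : > "$FLAG_FILE"
    fi
}

# Stand-in for corosync's postinst: if flagged, clear the flag and start pacemaker.
corosync_postinst() {
    if [ -e "$FLAG_FILE" ]; then
        rm -f "$FLAG_FILE"
        echo "start pacemaker"   # a real script would start the service here
    else
        echo "nothing to do"
    fi
}

pacemaker_preinst active
corosync_postinst   # prints: start pacemaker
corosync_postinst   # prints: nothing to do
```

On a real system the state passed to the first function could come from `systemctl is-active pacemaker`. Clearing the flag once it has been handled keeps a later corosync-only upgrade from starting a pacemaker that the administrator stopped on purpose.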

Changed in pacemaker (Ubuntu Artful):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package corosync - 2.3.5-3ubuntu2.1

---------------
corosync (2.3.5-3ubuntu2.1) xenial; urgency=high

  * Properly restart corosync and pacemaker together (LP: #1740892)
    - d/rules: pass --restart-after-upgrade to dh_installinit
    - d/control: indicate this version breaks all older pacemaker, to
      force an upgrade of pacemaker.
    - d/corosync.postinst: if flagged to do so by pacemaker, start
      pacemaker on upgrade.

 -- Eric Desrochers <email address hidden> Mon, 19 Feb 2018 09:28:34 -0500

Changed in corosync (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pacemaker - 1.1.14-2ubuntu1.4

---------------
pacemaker (1.1.14-2ubuntu1.4) xenial; urgency=high

  * Properly restart corosync and pacemaker together (LP: #1740892)
    - d/pacemaker.preinst: flag corosync to restart pacemaker on
      upgrade.

 -- Eric Desrochers <email address hidden> Mon, 19 Feb 2018 09:37:35 -0500

Changed in pacemaker (Ubuntu Xenial):
status: Fix Committed → Fix Released
Changed in corosync (Debian):
status: New → Fix Released