Comment 2 for bug 1931588

Aurelien Lourot (aurelien-lourot) wrote :

TL;DR: this seems to be an intended change of behavior in corosync and we need to adapt the charm.

Details:

On hirsute, `crm -w -F node delete <node-name>` fails with "Transport endpoint is not connected" if <node-name> isn't reachable, which makes the `update-ring` action unusable. On groovy the issue doesn't occur.

The most relevant packages are:

 crmsh | 4.2.0-3ubuntu1 | groovy
 crmsh | 4.2.0-4ubuntu1 | hirsute
 corosync | 3.0.3-2ubuntu3.1 | groovy-updates
 corosync | 3.1.0-2ubuntu3 | hirsute
 pacemaker | 2.0.4-2ubuntu3.2 | groovy-updates
 pacemaker | 2.0.5-2ubuntu1 | hirsute
 pacemaker-cli-utils | 2.0.4-2ubuntu3.2 | groovy-updates
 pacemaker-cli-utils | 2.0.5-2ubuntu1 | hirsute

If I deploy a groovy bundle [0], then upgrade all packages above **EXCEPT corosync** on each node:
# sed s/groovy/hirsute/g /etc/apt/sources.list > /etc/apt/sources.list.d/hirsute.list
# apt update
# apt install pacemaker libpe-status28 libpe-rules26 libcib27 libcrmservice28 libcrmcluster29 libpacemaker1 pacemaker-cli-utils crmsh liblrmd28
Get:1 http://nova.clouds.archive.ubuntu.com/ubuntu hirsute/main amd64 libfakeroot amd64 1.25.3-1.1ubuntu2 [28.1 kB]
Get:2 http://nova.clouds.archive.ubuntu.com/ubuntu hirsute/main amd64 fakeroot amd64 1.25.3-1.1ubuntu2 [62.9 kB]
Get:3 http://nova.clouds.archive.ubuntu.com/ubuntu hirsute/main amd64 locales all 2.33-0ubuntu5 [3876 kB]
Get:4 http://nova.clouds.archive.ubuntu.com/ubuntu hirsute/main amd64 libc6 amd64 2.33-0ubuntu5 [2690 kB]
Get:5 http://nova.clouds.archive.ubuntu.com/ubuntu hirsute/main amd64 libc-bin amd64 2.33-0ubuntu5 [646 kB]
Get:6 http://nova.clouds.archive.ubuntu.com/ubuntu hirsute/main amd64 libc-dev-bin amd64 2.33-0ubuntu5 [19.3 kB]
Get:7 http://nova.clouds.archive.ubuntu.com/ubuntu hirsute/main amd64 libc6-dev amd64 2.33-0ubuntu5 [2143 kB]
Get:8 http://nova.clouds.archive.ubuntu.com/ubuntu hirsute-updates/main amd64 libnettle8 amd64 3.7-2.1ubuntu1.1 [146 kB]
Get:9 http://nova.clouds.archive.ubuntu.com/ubuntu hirsute/main amd64 libgnutls30 amd64 3.7.1-3ubuntu1 [902 kB]
Get:10 http://nova.clouds.archive.ubuntu.com/ubuntu hirsute/main amd64 libqb100 amd64 2.0.2-1 [66.9 kB]
Get:11 http://nova.clouds.archive.ubuntu.com/ubuntu hirsute/main amd64 libcrmcommon34 amd64 2.0.5-2ubuntu1 [175 kB]
Get:12 http://nova.clouds.archive.ubuntu.com/ubuntu hirsute/main amd64 libpe-rules26 amd64 2.0.5-2ubuntu1 [27.7 kB]
Get:13 http://nova.clouds.archive.ubuntu.com/ubuntu hirsute/main amd64 libcib27 amd64 2.0.5-2ubuntu1 [50.6 kB]
Get:14 http://nova.clouds.archive.ubuntu.com/ubuntu hirsute/main amd64 libcrmservice28 amd64 2.0.5-2ubuntu1 [38.1 kB]
Get:15 http://nova.clouds.archive.ubuntu.com/ubuntu hirsute/main amd64 libstonithd26 amd64 2.0.5-2ubuntu1 [39.6 kB]
Get:16 http://nova.clouds.archive.ubuntu.com/ubuntu hirsute/main amd64 liblrmd28 amd64 2.0.5-2ubuntu1 [30.8 kB]
Get:17 http://nova.clouds.archive.ubuntu.com/ubuntu hirsute/main amd64 libpe-status28 amd64 2.0.5-2ubuntu1 [150 kB]
Get:18 http://nova.clouds.archive.ubuntu.com/ubuntu hirsute/main amd64 libpacemaker1 amd64 2.0.5-2ubuntu1 [167 kB]
Get:19 http://nova.clouds.archive.ubuntu.com/ubuntu hirsute/main amd64 crmsh all 4.2.0-4ubuntu1 [500 kB]
Get:20 http://nova.clouds.archive.ubuntu.com/ubuntu hirsute/main amd64 pacemaker amd64 2.0.5-2ubuntu1 [314 kB]
Get:21 http://nova.clouds.archive.ubuntu.com/ubuntu hirsute/main amd64 pacemaker-cli-utils amd64 2.0.5-2ubuntu1 [161 kB]
Get:22 http://nova.clouds.archive.ubuntu.com/ubuntu hirsute/main amd64 libcrmcluster29 amd64 2.0.5-2ubuntu1 [43.7 kB]
# systemctl restart corosync
# systemctl restart pacemaker

then I still don't hit the issue:
functest-test -t zaza.openstack.charm_tests.hacluster.tests.HaclusterScaleBackAndForthTest -m <model-name>

But if I now also upgrade corosync to 3.1.0-2ubuntu3 on each node:
# apt install corosync
Get:1 http://nova.clouds.archive.ubuntu.com/ubuntu hirsute/main amd64 corosync amd64 3.1.0-2ubuntu3 [237 kB]
# systemctl restart corosync
# systemctl restart pacemaker

I then hit the issue again.

There have been many commits between corosync 3.0.3 and 3.1.0, and these two [1][2] make me think that corosync returning this error code is new, intended behavior. Our charm should just accept that error code and move on.
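
To illustrate what "accept that error code and move on" could look like, here is a minimal sketch, not the charm's actual code. The function name `delete_node`, the injectable `run` parameter and the True/False return convention are hypothetical. It also assumes that `crm` exits non-zero in this case and prints the message on stdout or stderr, which I have not verified.

```python
import subprocess

# Error text quoted in this comment. Whether crm prints it on stdout or
# stderr is an assumption, so both streams are checked below.
ENOTCONN_MSG = "Transport endpoint is not connected"


def delete_node(node_name, run=subprocess.run):
    """Run `crm -w -F node delete <node_name>`, tolerating the known error.

    Returns True if crm exited cleanly and False if it failed with the
    tolerated "not connected" message. Any other failure is raised as
    subprocess.CalledProcessError. `run` is injectable for testing only.
    """
    cmd = ["crm", "-w", "-F", "node", "delete", node_name]
    result = run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return True
    output = (result.stdout or "") + (result.stderr or "")
    if ENOTCONN_MSG in output:
        # Assumed to be expected with corosync >= 3.1.0 when the node is
        # unreachable: accept it and move on.
        return False
    raise subprocess.CalledProcessError(
        result.returncode, cmd, output=result.stdout, stderr=result.stderr
    )
```

Matching on the message text rather than on a specific exit status is deliberate here, since I only know the message and not the status crm returns.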

Also note that the packager's default /etc/corosync/corosync.conf seems to have changed, and those changes may have to be reflected in the charm's template. [3]

[0] https://github.com/openstack/charm-hacluster/blob/master/tests/bundles/groovy-victoria.yaml
[1] https://github.com/corosync/corosync/commit/9105d94a
[2] https://github.com/corosync/corosync/commit/0d0febbc
[3] https://github.com/openstack/charm-hacluster/blob/master/templates/corosync.conf