infinite cluster-relation-changed loop when upgrading 21.04 charm-cinder from bionic-queens to rocky

Bug #1928383 reported by Drew Freiberger on 2021-05-13
This bug affects 1 person
Affects: OpenStack cinder charm
Importance: Undecided
Assigned to: Unassigned

Bug Description

I am performing a non-action-managed upgrade of the OpenStack API charms from the bionic distro release (queens) to rocky, following a xenial-queens -> bionic series upgrade, on the 21.04 OpenStack charms.

I have a 3 unit cinder deployment with typical hacluster and cinder-ceph subordinates.

When I run 'juju config cinder openstack-origin=cloud:bionic-rocky action-managed-upgrade=false', all of the packages appear to be upgraded properly and config-changed succeeds. I then see a never-ending cascade of cluster-relation-changed hooks, which appears to be driven by relation data changes from each of the units.

While one cluster-relation-changed is running, I captured the 'juju show-unit' of cinder/1 (one of the non-leaders) and the relation data for "cinder-db-initialised-echo" is constantly changing.

Here are two iterations of the data that diffed out from juju show-unit.
         cinder-db-initialised-echo: cinder/2-0cb907fd-aa11-4486-a471-e000a1f4e881
         cinder-db-initialised-echo: cinder/0-40bb8ddd-488c-4562-883b-10862fd62774
         cinder-db-initialised-echo: cinder/0-40bb8ddd-488c-4562-883b-10862fd62774

         cinder-db-initialised-echo: cinder/1-c1de47c8-dd95-43d6-8e6e-c5a11ef4d50f
         cinder-db-initialised-echo: cinder/2-0cb907fd-aa11-4486-a471-e000a1f4e881
         cinder-db-initialised-echo: cinder/1-c1de47c8-dd95-43d6-8e6e-c5a11ef4d50f

More context of this data from juju show-unit cinder/0:
    cinder/0:
      data:
        cinder-db-initialised: cinder/0-40bb8ddd-488c-4562-883b-10862fd62774
        cinder-db-initialised-echo: cinder/2-0cb907fd-aa11-4486-a471-e000a1f4e881
    related-units:
      cinder/1:
        in-scope: true
        data:
          cinder-db-initialised: cinder/1-c1de47c8-dd95-43d6-8e6e-c5a11ef4d50f
          cinder-db-initialised-echo: cinder/0-40bb8ddd-488c-4562-883b-10862fd62774
      cinder/2:
        in-scope: true
        data:
          cinder-db-initialised: cinder/2-0cb907fd-aa11-4486-a471-e000a1f4e881
          cinder-db-initialised-echo: cinder/0-40bb8ddd-488c-4562-883b-10862fd62774
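
As a rough aid for reading the excerpt above, the following Python sketch collects the distinct cinder-db-initialised values that the peers advertise. The flat unit-to-settings mapping is an assumed, hand-copied shape rather than the literal 'juju show-unit' format, and the helper name is hypothetical.

```python
# Relation settings copied by hand from the 'juju show-unit cinder/0' excerpt.
# The flat {unit: settings} layout is an assumption made for illustration.
OBSERVED = {
    "cinder/0": {
        "cinder-db-initialised": "cinder/0-40bb8ddd-488c-4562-883b-10862fd62774",
        "cinder-db-initialised-echo": "cinder/2-0cb907fd-aa11-4486-a471-e000a1f4e881",
    },
    "cinder/1": {
        "cinder-db-initialised": "cinder/1-c1de47c8-dd95-43d6-8e6e-c5a11ef4d50f",
        "cinder-db-initialised-echo": "cinder/0-40bb8ddd-488c-4562-883b-10862fd62774",
    },
    "cinder/2": {
        "cinder-db-initialised": "cinder/2-0cb907fd-aa11-4486-a471-e000a1f4e881",
        "cinder-db-initialised-echo": "cinder/0-40bb8ddd-488c-4562-883b-10862fd62774",
    },
}


def distinct_init_ids(relation_data):
    """Return the set of cinder-db-initialised values seen across all units."""
    return {
        settings["cinder-db-initialised"]
        for settings in relation_data.values()
        if settings.get("cinder-db-initialised")
    }
```

With the values from this report the helper returns three distinct IDs, one per unit, which matches the picture of every unit announcing its own database initialisation.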

It looks like a race occurred during database initialisation: each unit advertises its own cinder-db-initialised value instead of all units agreeing on a single one.
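
To illustrate why such a state might never settle, here is a toy Python model of the echo behaviour. It assumes that a unit running cluster-relation-changed copies the triggering peer's cinder-db-initialised value into its own echo key whenever the two differ, and that every echo change fires the hook on the other peers. That behaviour is inferred from the relation data in this report and is not taken from the charm's code; the function name and the "init"/"echo" keys are hypothetical.

```python
from collections import deque


def simulate(units, max_events=1000):
    """Replay hook events under the assumed echo model.

    units maps a unit name to {"init": <id>, "echo": <id or None>}.
    Returns (settled, processed): whether the event queue drained, and
    how many hook events were processed before stopping.
    """
    state = {name: dict(settings) for name, settings in units.items()}
    names = sorted(state)
    # Start with one relation-changed event per (receiver, sender) pair.
    queue = deque((r, s) for r in names for s in names if r != s)
    processed = 0
    while queue and processed < max_events:
        receiver, sender = queue.popleft()
        processed += 1
        init_id = state[sender]["init"]
        if init_id and state[receiver]["echo"] != init_id:
            # Assumed behaviour: echo the peer's ID, which changes the
            # receiver's relation data and wakes up every other peer.
            state[receiver]["echo"] = init_id
            queue.extend((peer, receiver) for peer in names if peer != receiver)
    return (not queue, processed)
```

Under this model, three units with three different IDs keep overwriting each other's echo and the queue does not drain within the event budget, whereas units that all carry the same ID process the six initial events without changing anything.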

Drew Freiberger (afreiberger) wrote :

subscribing field-critical as this is actively causing a volume service outage on a production cloud.

Drew Freiberger (afreiberger) wrote :

Worked around this by hard-setting the relation data for the initialised keys on every unit to the value defined by the leader unit:

$ for i in 0 1 2; do juju run -u cinder/$i -- relation-set -r cluster:4 cinder-db-initialised=cinder/2-0cb907fd-aa11-4486-a471-e000a1f4e881 cinder-db-initialised-echo=cinder/2-0cb907fd-aa11-4486-a471-e000a1f4e881; done
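
For reference, the loop above expands to one command per unit. The Python sketch below only builds those command strings for review before running them; the helper name is hypothetical, and the relation ID (cluster:4 here) and the chosen ID value are specific to this deployment.

```python
def workaround_commands(units, relation_id, init_id):
    """Build one relation-set command per unit, mirroring the shell loop."""
    settings = (
        f"cinder-db-initialised={init_id} "
        f"cinder-db-initialised-echo={init_id}"
    )
    return [
        f"juju run -u {unit} -- relation-set -r {relation_id} {settings}"
        for unit in units
    ]


# Values taken from the workaround in this report.
COMMANDS = workaround_commands(
    ["cinder/0", "cinder/1", "cinder/2"],
    "cluster:4",
    "cinder/2-0cb907fd-aa11-4486-a471-e000a1f4e881",
)
```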

Drew Freiberger (afreiberger) wrote :
Alex Kavanagh (ajkavanagh) wrote :

I'm about to do this tomorrow with ServerStack from queens to rocky. I'm going to do an action-managed-upgrade to verify whether it shows the same behaviour. If not, then this might be the recommended workaround for cinder; I'll update here after my upgrade.

Billy Olsen (billy-olsen) wrote :

Since a workaround is available, dropping this to field-high, though this remains an important bug.

@afreiberger is it possible to get some sanitized logs for this?

Chris MacNaughton (chris.macnaughton) wrote :

If I'm reading this correctly, this upgrade has missed one of the key pre-upgrade requirements (https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/upgrade-series.html#pre-upgrade-requirements): "All currently deployed charms should be upgraded to the latest stable charm revision." Over the last two years, many improvements have been made to how the charms handle upgrades, from the charms themselves, to OpenStack, to Ubuntu.

Chris MacNaughton (chris.macnaughton) wrote :

I completely failed to read the 21 bit in the description, apologies!

James Troup (elmo) on 2021-06-15
tags: added: openstack-upgrade