Comment 13 for bug 1928383

Alex Kavanagh (ajkavanagh) wrote :

Okay, I've reproduced the issue. Essentially, I used a small deployment script; the resulting deployment looks like this:

Every 5.0s: timeout 4 juju status -m cinder --color                    Wed May 11 10:20:12 2022

Model   Controller            Cloud/Region             Version  SLA          Timestamp
cinder  tinwood2-serverstack  serverstack/serverstack  2.9.27   unsupported  10:20:12Z

App              Version  Status  Scale  Charm            Channel  Rev  Exposed  Message
cinder           19.0.0   active      3  cinder           stable   530  no       Unit is ready
keystone         17.0.1   active      1  keystone         stable   539  no       Application Ready
percona-cluster  5.7.20   active      1  percona-cluster  stable   302  no       Unit is ready
rabbitmq-server  3.8.2    active      1  rabbitmq-server  stable   123  no       Unit is ready

Unit                Workload  Agent      Machine  Public address  Ports     Message
cinder/0*           active    executing  3        10.5.3.43       8776/tcp  Unit is ready
cinder/1            active    executing  4        10.5.2.251      8776/tcp  Unit is ready
cinder/2            active    executing  5        10.5.2.67       8776/tcp  Unit is ready
keystone/0*         active    idle       0        10.5.1.134      5000/tcp  Unit is ready
percona-cluster/0*  active    idle       1        10.5.3.32       3306/tcp  Unit is ready
rabbitmq-server/0*  active    idle       2        10.5.3.182      5672/tcp  Unit is ready

- I started it at focal/distro for cinder and keystone.
- I then forced a leadership election to move the leader to a different unit (e.g. 0 -> 1).
- I then did an upgrade from distro (ussuri) -> victoria on cinder.
- Then I forced another leadership election, moving the leader from 1 -> 0.
- I did another upgrade (victoria -> wallaby) and it was okay.
- I then forced another leadership election to move the leader to 2.
- I then did an upgrade from wallaby -> xena and triggered the issue (a rough sketch of these steps is below).
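
For reference, the reproduction can be driven with something like the following (a rough sketch, not the actual script I used; the systemd service names, sleep interval and openstack-origin values are illustrative):

import subprocess
import time


def juju(*args):
    """Run a juju CLI command and fail loudly if it errors."""
    subprocess.run(("juju",) + args, check=True)


def bounce_unit_agent(unit_num):
    """Force a leadership election by briefly stopping the current
    leader's unit agent so that the lease lapses and another unit
    is elected."""
    svc = "jujud-unit-cinder-{}".format(unit_num)
    juju("ssh", "cinder/{}".format(unit_num), "sudo", "systemctl", "stop", svc)
    time.sleep(120)
    juju("ssh", "cinder/{}".format(unit_num), "sudo", "systemctl", "start", svc)


def upgrade_cinder(release):
    """Trigger an OpenStack upgrade of cinder via openstack-origin."""
    juju("config", "cinder", "openstack-origin=cloud:focal-{}".format(release))


bounce_unit_agent(0)        # leader moves away from 0 (0 -> 1 in my run)
upgrade_cinder("victoria")  # distro (ussuri) -> victoria
bounce_unit_agent(1)        # leader moves 1 -> 0
upgrade_cinder("wallaby")   # victoria -> wallaby, still okay
bounce_unit_agent(0)        # leader moves to 2
upgrade_cinder("xena")      # wallaby -> xena, triggers the issue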

The 'juju show-unit' output for the 3 units shows that each one has been the leader and 'done' the upgrade at some point:

      cinder/0:
        ...
        cinder-db-initialised: cinder/0-c19dc67e-ee4c-4753-9868-be0e8efa36da
        cinder-db-initialised-echo: cinder/1-9717e388-8b09-4976-9f0f-4690ee1203f2
      cinder/1:
        ...
        cinder-db-initialised: cinder/1-9717e388-8b09-4976-9f0f-4690ee1203f2
        cinder-db-initialised-echo: cinder/0-c19dc67e-ee4c-4753-9868-be0e8efa36da
      cinder/2:
        ...
        cinder-db-initialised: cinder/2-71063595-9742-4950-bad6-6a1a8a5a8ab1
        cinder-db-initialised-echo: cinder/1-9717e388-8b09-4976-9f0f-4690ee1203f2

i.e. cinder-db-initialised for each unit is that unit's own name plus a UUID, set when it was the leader and did the upgrade.

However, as Drew says in the comments, the cinder-db-initialised-echo value keeps bouncing around the units. In the above case, two of them happen to agree (but this will change with the next hook execution).
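
For contrast, after the wallaby -> xena upgrade I would expect the non-leader units to have converged on the current leader's (cinder/2's) notification, i.e. something like this (illustrative, using the ids above):

      cinder/0:
        cinder-db-initialised-echo: cinder/2-71063595-9742-4950-bad6-6a1a8a5a8ab1
      cinder/1:
        cinder-db-initialised-echo: cinder/2-71063595-9742-4950-bad6-6a1a8a5a8ab1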

The code in question is:

def check_local_db_actions_complete():
    """Check if we have received db init'd notification and restart services
    if we have not already.

    NOTE: this must only be called from peer relation context.
    """
    if not is_db_initialised():
        return

    settings = relation_get() or {}
    if settings:
        init_id = settings.get(CINDER_DB_INIT_RKEY)
        echoed_init_id = relation_get(unit=local_unit(),
                                      attribute=CINDER_DB_INIT_ECHO_RKEY)

        # If we have received an init notification from a peer unit
        # (assumed to be the leader) then restart cinder-* and echo the
        # notification and don't restart again unless we receive a new
        # (different) notification.
        if is_new_dbinit_notification(init_id, echoed_init_id):
            if not is_unit_paused_set():
                log("Restarting cinder services following db "
                    "initialisation", level=DEBUG)
                for svc in enabled_services():
                    service_restart(svc)

            # Echo notification
            relation_set(**{CINDER_DB_INIT_ECHO_RKEY: init_id})
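
For completeness, is_new_dbinit_notification() is essentially just a comparison along these lines (a paraphrase, not the exact code): the notification must be set, must not have originated from this unit, and must not already have been echoed.

def is_new_dbinit_notification(init_id, echoed_init_id):
    # Paraphrase: a notification is "new" if it is set, did not come from
    # this unit, and we haven't already echoed it back.
    return (init_id and
            local_unit() not in init_id and
            echoed_init_id != init_id)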

What I think is happening is that the "init_id = settings.get(CINDER_DB_INIT_RKEY)" assignment picks up a different "cinder-db-initialised" value depending on which peer unit's relation data the hook happens to be reading (relation_get() with no unit defaults to the remote unit that triggered the hook), rather than always the current leader's value.

I'll debug that and work out how to fix it.
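
One avenue I'll look at (an untested sketch, not a final fix) is to read the notification explicitly from each peer unit, rather than relying on relation_get() defaulting to whichever remote unit triggered the hook, along these lines:

from charmhelpers.core.hookenv import (
    related_units,
    relation_get,
    relation_ids,
)


def get_peer_db_init_notifications():
    """(Sketch) Collect each peer's cinder-db-initialised value explicitly,
    rather than relying on relation_get()'s implicit remote unit."""
    notifications = {}
    for rid in relation_ids('cluster'):
        for unit in related_units(rid):
            init_id = relation_get(attribute=CINDER_DB_INIT_RKEY,
                                   rid=rid, unit=unit)
            if init_id:
                notifications[unit] = init_id
    return notifications

The caller could then decide which notification to act on (e.g. only the one from the current leader) and echo that, rather than whatever unit's data the hook happened to run against.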