The region should verify database migrations on start

Bug #1644345 reported by Lee Trager
This bug affects 2 people
Affects: MAAS
Status: Invalid
Importance: Wishlist
Assigned to: Unassigned
Milestone: none

Bug Description

MAAS runs a number of database migrations when the maas-region-controller package is upgraded on a system. In HA mode it is currently possible to upgrade one instance of the region while another instance stays at the older version. It is also possible to back up the database, install a newer version of MAAS, and then restore the old database. Either case can cause hard-to-diagnose errors: the older version may not be compatible with the newer database migrations, or the newer version of MAAS may expect database migrations that have not been applied.

We need the following checks to occur:
* On region start the region should make sure only migrations which it knows about have been applied. If unknown migrations have been applied, this should be logged and the region should fail to start. It's important that the systemd service fails, to make this very clear.
* If one region is upgraded in HA mode the other regions should disconnect and log the mismatch error. Again, it's important that the systemd service fails, to make this very clear.
* If the regiond starts and notices some database migrations haven't been applied it should automatically apply them before starting.
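
The three checks above reduce to classifying the database as level with, behind, or ahead of the region. A minimal sketch, assuming migrations can be identified by opaque strings. The names `MigrationState` and `classify` and the sample identifiers are hypothetical, not MAAS code. Treating a database that both lacks known migrations and carries unknown ones as "ahead" is also an assumption.

```python
from enum import Enum


class MigrationState(Enum):
    LEVEL = "level"    # database matches what this region expects
    BEHIND = "behind"  # some known migrations have not been applied
    AHEAD = "ahead"    # database contains migrations this region does not know


def classify(applied, known):
    """Compare migrations recorded in the database with those this region knows."""
    applied, known = set(applied), set(known)
    if applied - known:
        # Unknown migrations win: the region should log this and fail to start.
        return MigrationState.AHEAD
    if known - applied:
        # The region should apply the missing migrations before starting.
        return MigrationState.BEHIND
    return MigrationState.LEVEL


if __name__ == "__main__":
    known = {"0001_initial", "0002_add_field"}  # hypothetical identifiers
    print(classify({"0001_initial"}, known))
    print(classify(known | {"0003_future"}, known))
```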

Tags: ha performance
Changed in maas:
importance: Critical → Wishlist
milestone: none → 2.2.0
Gavin Panella (allenap) wrote :

As a starting point for an implementation, can I suggest the following
approach?

- Agree on a database lock number, say K.

- During start-up of each regiond a single database connection is
  established, in which:

  - A SHARED lock is taken on K.

  - The migration state is checked against what is expected.

  - If there's a mismatch, the lock on K is upgraded to an EXCLUSIVE
    lock.

  - The migration state is checked again, then:

    - If the database is behind on migrations, migrations are applied,
      then the lock on K is downgraded to a SHARED lock (or the
      connection is dropped).

    - If the database is ahead on migrations, regiond logs an error,
      then EXITS.

    - If the database is now level on migrations the lock on K is
      downgraded to a SHARED lock (or the connection is dropped).

- In the main body of the application, as each new connection is opened,
  a SHARED lock is taken on K and it is not released until the
  connection is closed.
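
The start-up steps above can be sketched as follows. This is only an illustration: it assumes K is implemented as a PostgreSQL session-level advisory lock, which the proposal does not specify. The callables `execute`, `check_state` and `apply_migrations`, the `DatabaseAhead` exception, the `%s` placeholder style and the value of `LOCK_KEY` are all hypothetical. As far as I understand, a session's shared and exclusive advisory locks on the same key are held separately, so releasing the exclusive lock leaves the shared one in place.

```python
LOCK_KEY = 1644345  # hypothetical value for K; any agreed integer would do


class DatabaseAhead(RuntimeError):
    """The database has migrations applied that this regiond does not know."""


def start_up(execute, check_state, apply_migrations, key=LOCK_KEY):
    """Run the start-up check on one dedicated database connection.

    execute(sql, params) runs a statement on that connection;
    check_state() returns "level", "behind" or "ahead";
    apply_migrations() brings the database up to date.
    """
    # Take a SHARED lock on K.
    execute("SELECT pg_advisory_lock_shared(%s)", (key,))
    if check_state() == "level":
        return  # nothing to do; the SHARED lock stays held

    # Mismatch: upgrade to EXCLUSIVE. This waits for other sessions'
    # SHARED locks on K to be released.
    execute("SELECT pg_advisory_lock(%s)", (key,))
    try:
        # Check again: another regiond may have migrated while we waited.
        state = check_state()
        if state == "ahead":
            # The caller is expected to log this and exit non-zero.
            raise DatabaseAhead("database has unknown migrations applied")
        if state == "behind":
            apply_migrations()
    finally:
        # Downgrade: drop EXCLUSIVE; the SHARED lock taken above remains.
        execute("SELECT pg_advisory_unlock(%s)", (key,))
```

Raising an exception rather than exiting inside the function keeps the sketch testable; a real service entry point would catch it, log the mismatch and exit so that the systemd unit fails.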

I may have it wrong, but the expected behaviour is:

- Migrations will only ever be applied while holding an EXCLUSIVE lock
  on K.

- Acquisition of a SHARED lock blocks/fails until an EXCLUSIVE lock is
  released. All connections hold at least a SHARED lock on K at all
  times, hence:

  (a) Migrations cannot run while there are other connections using the
      database.

  (b) Other connections cannot use the database while migrations are
      being applied.

  i.e. normal run-time use of the database and the application of
  migrations are mutually exclusive.

- Given a regiond running at migration level M, a newly started regiond
  expecting M+1 will block until the former goes away, apply migrations,
  then complete start-up.

  Consider an installation of MAAS with two region hosts, A and B. If
  MAAS is updated and restarted on A, the regionds on A will wait until
  those on B are stopped (presumably as part of the upgrade, but not
  necessarily). One of those regionds on A will then win the exclusive
  lock race and apply migrations while the others wait. Once it releases
  that exclusive lock all the regionds will finish starting up.

  If B was, say, only rebooted without upgrading MAAS, the regionds
  would find the database now to be ahead of their expectations, and
  thus exit. Their absence would be noted by service tracking and
  administrators would go and investigate.
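
To make the expected behaviour concrete, here is a small in-memory model of the lock on K, walked through the A/B scenario above. It is a sketch only: `SharedExclusiveLock` and `upgrade_scenario` are hypothetical names, acquisition returns False where a real lock would block or fail, and migration levels are plain integers.

```python
class SharedExclusiveLock:
    """In-memory model of the single lock K, with non-blocking acquisition."""

    def __init__(self):
        self.shared = set()    # owners currently holding SHARED
        self.exclusive = None  # owner currently holding EXCLUSIVE, if any

    def try_shared(self, owner):
        # SHARED is refused while another owner holds EXCLUSIVE.
        if self.exclusive not in (None, owner):
            return False
        self.shared.add(owner)
        return True

    def try_exclusive(self, owner):
        # EXCLUSIVE is refused while any other owner holds either mode.
        if self.exclusive not in (None, owner) or self.shared - {owner}:
            return False
        self.exclusive = owner
        return True

    def downgrade(self, owner):
        if self.exclusive == owner:
            self.exclusive = None  # SHARED, if held, is kept

    def release(self, owner):
        self.shared.discard(owner)
        self.downgrade(owner)


def upgrade_scenario():
    lock, log = SharedExclusiveLock(), []
    db_level = 1                 # database at migration level M
    lock.try_shared("B")         # B is running at level M
    lock.try_shared("A")         # upgraded A starts, expecting M+1
    log.append(("A gets exclusive while B runs", lock.try_exclusive("A")))
    lock.release("B")            # B is stopped
    log.append(("A gets exclusive after B stops", lock.try_exclusive("A")))
    db_level = 2                 # A applies migrations under EXCLUSIVE
    log.append(("B gets shared during migration", lock.try_shared("B")))
    lock.downgrade("A")
    log.append(("B gets shared after migration", lock.try_shared("B")))
    log.append(("B finds database ahead and exits", db_level > 1))
    return log


if __name__ == "__main__":
    for step, outcome in upgrade_scenario():
        print(f"{step}: {outcome}")
```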

As you can see, migrations would no longer be applied by packaging. This
makes good sense in a distributed system; a Debian package has a view
only of the local system.

Changed in maas:
milestone: 2.2.0 → 2.2.x
tags: added: performance
tags: added: ha
Changed in maas:
milestone: 2.2.x → next
Adam Collard (adam-collard) wrote :

This bug has not seen any activity in the last 6 months, so it is being automatically closed.

If you are still experiencing this issue, please feel free to re-open.

MAAS Team

Changed in maas:
status: Triaged → Invalid
Changed in maas:
milestone: next → none