Comment 4 for bug 1827690

Revision history for this message
Márton Kiss (marton-kiss) wrote :

After additional debugging and research I can confirm that the issue is the result of a race condition: multiple barbican-worker units try to populate the database with schema data and upgrade the alembic version simultaneously.

As a result, a barbican-worker service stops because it tries to recreate tables already created by other worker units.
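The failure mode can be reproduced in miniature (a sketch only: sqlite and a throwaway `secrets` table stand in for barbican's real database and schema):

```python
import os
import sqlite3
import tempfile

# Shared database file standing in for barbican's backing database.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

def populate(unit):
    # Each simulated "worker" unconditionally creates the schema at
    # startup, as the default barbican-worker behaviour does.
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE secrets (id INTEGER PRIMARY KEY)")
        conn.commit()
        return f"{unit}: schema created"
    except sqlite3.OperationalError as exc:
        # The loser of the race crashes on "table ... already exists".
        return f"{unit}: failed ({exc})"
    finally:
        conn.close()

print(populate("barbican/0"))  # barbican/0: schema created
print(populate("barbican/1"))  # barbican/1: failed (table secrets already exists)
```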

I applied the following manual steps as workaround:

1, stop all barbican-worker services, stop the jujud agent
2, drop db tables
3, clean up alembic db change state files if present: rm -rf /usr/lib/python3/dist-packages/barbican/model/migration/alembic_migrations/versions/*_db_change.py
4, start barbican-worker on a single unit (this populates the db tables to the head revision)
5, restart all barbican-workers, start the jujud agent
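The ordering in steps 4-5 amounts to a "single populator" pattern, which can be sketched as follows (an illustration only: sqlite stands in for the real database, and `run_migrations` / `start_worker` are hypothetical stand-ins for `barbican-db-manage upgrade` and service startup):

```python
import os
import sqlite3
import tempfile

# Stand-in for the shared barbican database.
db_path = os.path.join(tempfile.mkdtemp(), "barbican.db")
migrations_run = []

def run_migrations(unit):
    # Stands in for `barbican-db-manage upgrade` bringing the schema
    # to the head revision.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS secrets (id INTEGER PRIMARY KEY)")
    conn.commit()
    conn.close()
    migrations_run.append(unit)

def start_worker(unit, populate=False):
    # Step 4: exactly one unit populates the schema; the others
    # (step 5) are started only afterwards, so no two units race
    # on table creation.
    if populate:
        run_migrations(unit)
    return f"{unit}: started"

start_worker("barbican/0", populate=True)   # step 4: a single unit first
start_worker("barbican/1")                  # step 5: remaining units
start_worker("barbican/2")
```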

As a result, each unit must be on the same db revision:
$ juju run --application barbican 'barbican-db-manage current -V | grep Revision'
- Stdout: |2
        Revision ID: 39cf2e645cba
  UnitId: barbican/0
- Stdout: |2
        Revision ID: 39cf2e645cba
  UnitId: barbican/1
- Stdout: |2
        Revision ID: 39cf2e645cba
  UnitId: barbican/2
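That check can also be scripted (a sketch; the output string below is copied verbatim from the run above):

```python
import re

# Output copied from the `juju run` command above.
output = """\
- Stdout: |2
        Revision ID: 39cf2e645cba
  UnitId: barbican/0
- Stdout: |2
        Revision ID: 39cf2e645cba
  UnitId: barbican/1
- Stdout: |2
        Revision ID: 39cf2e645cba
  UnitId: barbican/2
"""

revisions = re.findall(r"Revision ID: (\S+)", output)
units = re.findall(r"UnitId: (\S+)", output)
# Every unit must report the same alembic revision.
assert len(units) == 3 and len(set(revisions)) == 1
```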

The root cause of the problem is the default behaviour of the barbican-worker services: each unit populates the database schema after the service starts, which allows the race condition.

A proper permanent charm fix would be:
1, as [1] describes in the Install and configure components / 3. Populate the Key Manager service database section, the charm should set db_auto_create to false in the /etc/barbican/barbican.conf file
2, the leader unit populates the database
3, start the barbican-worker services only after the leader has finished the db schema upgrade
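The first step corresponds to a one-line configuration change (a sketch; I am assuming the option lives in the DEFAULT section, as in the install guide's example):

```ini
[DEFAULT]
# Prevent each barbican-worker from creating/upgrading the schema at startup;
# the leader unit runs the migration instead.
db_auto_create = false
```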

[1] https://docs.openstack.org/barbican/stein/install/install-ubuntu.html