[19.04][stein] barbican-worker is down: Requested revision 1a0c2cdafb38 overlaps with other requested revisions 39cf2e645cba

Bug #1827690 reported by Dmitrii Shcherbakov on 2019-05-04
Affects                   Status       Importance  Assigned to  Milestone
OpenStack Barbican Charm  In Progress  Critical    James Page   20.01
barbican (Ubuntu)         -            Undecided   Unassigned   -

Bug Description

After deploying barbican (stein, 19.04 charms) I got into a situation where the barbican-worker service is down on both units (two-node HA) due to a failed alembic migration:

https://private-fileshare.canonical.com/~dima/charm-dumps/04-05-2019-barbican-0-var-log.tar.gz
https://private-fileshare.canonical.com/~dima/charm-dumps/04-05-2019-barbican-1-var-log.tar.gz

https://paste.ubuntu.com/p/J6GvwkXMWc/ - bundle.yaml

May 03 20:51:11 juju-35b20e-3-lxd-0 systemd[1]: Started OpenStack Barbican Key Management Workers.
May 03 20:51:39 juju-35b20e-3-lxd-0 systemd[1]: barbican-worker.service: Main process exited, code=exited, status=1/FAILURE
May 03 20:51:39 juju-35b20e-3-lxd-0 systemd[1]: barbican-worker.service: Failed with result 'exit-code'.

juju status barbican
Model Controller Cloud/Region Version SLA Timestamp
openstack samaas samaas 2.6-rc1 unsupported 23:27:23Z

App Version Status Scale Charm Store Rev OS Notes
barbican 8.0.0 blocked 2 barbican jujucharms 18 ubuntu
barbican-vault 1.2.2 active 2 barbican-vault jujucharms 2 ubuntu
hacluster-barbican active 2 hacluster jujucharms 54 ubuntu

Unit Workload Agent Machine Public address Ports Message
barbican/0 blocked idle 3/lxd/0 10.232.46.135 9311/tcp,9312/tcp Services not running that should be: barbican-worker
  barbican-vault/1 active idle 10.232.46.135 Unit is ready
  hacluster-barbican/1 active idle 10.232.46.135 Unit is ready and clustered
barbican/1* blocked idle 2/lxd/0 10.232.46.130 9311/tcp,9312/tcp Services not running that should be: barbican-worker
  barbican-vault/0* active idle 10.232.46.130 Unit is ready
  hacluster-barbican/0* active idle 10.232.46.130 Unit is ready and clustered

https://paste.ubuntu.com/p/BS3fHw287r/
2019-05-03 21:13:46.582 115638 ERROR barbican alembic.util.exc.CommandError: Requested revision 1a0c2cdafb38 overlaps with other requested revisions 39cf2e645cba

mysql> use barbican;
mysql> select * from alembic_version;
+--------------+
| version_num  |
+--------------+
| 1a0c2cdafb38 |
| 39cf2e645cba |
+--------------+
2 rows in set (0.00 sec)
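
Alembic treats each row in alembic_version as a current head, and the error above is raised because one of the two stored revisions is an ancestor of the other in barbican's migration graph. As a quick way to inspect the table from Python, here is a minimal sketch using pymysql (the driver already visible in the tracebacks); host and credentials are placeholders. Recovery would mean deleting the row for whichever revision is the stale ancestor, but that should be confirmed against barbican's alembic migration history before touching the table.

import pymysql

# Placeholder connection details; substitute the real MySQL endpoint
# and credentials from the deployment.
conn = pymysql.connect(host='<mysql-host>', user='barbican',
                       password='<password>', database='barbican')
try:
    with conn.cursor() as cur:
        # alembic stores one row per current head; two rows where one
        # revision is an ancestor of the other triggers the overlap error.
        cur.execute('SELECT version_num FROM alembic_version')
        for (version_num,) in cur.fetchall():
            print(version_num)
finally:
    conn.close()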

description: updated
Dmitrii Shcherbakov (dmitriis) wrote:

Added the UCA package as well, since I am not exactly sure where the root cause is.

The logs attached above point to a synchronization issue in the charm code: both units hit similar errors, but the "Table already exists" messages refer to different tables on each unit. The barbican-worker logs on both units also contain messages about model migrations, which is problematic because model migrations are not idempotent (see the sketch after the log excerpts below).

barbican/0:

2019-05-03 20:51:24.375 78559 INFO barbican.model.repositories [-] Auto-creating barbican registry DB
2019-05-03 20:51:38.930 78559 ERROR oslo_db.sqlalchemy.exc_filters [-] DBAPIError exception wrapped from (pymysql.err.InternalError) (1050, "Table 'kek_data' already exists") [SQL: '\nCREATE TABLE kek_data (\n\tid VARCHAR(36) NOT NULL, \n\tcreated_at DATETIME NOT NULL, \n\tupdated_at DATETIME NOT NULL, \n\tdeleted_at DATETIME, \n\tdeleted BOOL NOT NULL, \n\tstatus VARCHAR(20) NOT NULL, \n\tplugin_name VARCHAR(255) NOT NULL, \n\tkek_label VARCHAR(255), \n\tproject_id VARCHAR(36) NOT NULL, \n\tactive BOOL NOT NULL, \n\tbind_completed BOOL NOT NULL, \n\talgorithm VARCHAR(255), \n\tbit_length INTEGER, \n\tmode VARCHAR(255), \n\tplugin_meta TEXT, \n\tPRIMARY KEY (id), \n\tCHECK (deleted IN (0, 1)), \n\tCONSTRAINT kek_data_project_fk FOREIGN KEY(project_id) REFERENCES projects (id), \n\tCHECK (active IN (0, 1)), \n\tCHECK (bind_completed IN (0, 1))\n)ENGINE=InnoDB\n\n'] (Background on this error at: http://sqlalche.me/e/2j85): pymysql.err.InternalError: (1050, "Table 'kek_data' already exists")

barbican/1:

2019-05-03 20:51:08.942 96228 INFO barbican.model.repositories [-] Auto-creating barbican registry DB
2019-05-03 20:51:41.095 96228 ERROR oslo_db.sqlalchemy.exc_filters [-] DBAPIError exception wrapped from (pymysql.err.InternalError) (1050, "Table 'secret_store_metadata' already exists") [SQL: '\nCREATE TABLE secret_store_metadata (\n\tid VARCHAR(36) NOT NULL, \n\tcreated_at DATETIME NOT NULL, \n\tupdated_at DATETIME NOT NULL, \n\tdeleted_at DATETIME, \n\tdeleted BOOL NOT NULL, \n\tstatus VARCHAR(20) NOT NULL, \n\t`key` VARCHAR(255) NOT NULL, \n\tvalue VARCHAR(255) NOT NULL, \n\tsecret_id VARCHAR(36) NOT NULL, \n\tPRIMARY KEY (id), \n\tCHECK (deleted IN (0, 1)), \n\tFOREIGN KEY(secret_id) REFERENCES secrets (id)\n)ENGINE=InnoDB\n\n'] (Background on this error at: http://sqlalche.me/e/2j85): pymysql.err.InternalError: (1050, "Table 'secret_store_metadata' already exists")
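
Both units logged "Auto-creating barbican registry DB" within seconds of each other, which is exactly the race window: each process checks which tables are missing and then issues CREATE TABLE, so the slower one fails with MySQL error 1050. A minimal sketch of the racy pattern, assuming SQLAlchemy (which barbican drives via oslo.db); the connection URL and table definition are illustrative, not barbican's actual models:

from sqlalchemy import create_engine, MetaData, Table, Column, String

# Illustrative connection URL and schema fragment.
engine = create_engine('mysql+pymysql://barbican:<password>@<mysql-host>/barbican')
metadata = MetaData()
Table('kek_data', metadata, Column('id', String(36), primary_key=True))

# If process A and process B both reach this call concurrently, both
# existence checks can pass before either CREATE TABLE is issued; the
# slower process then fails with (1050, "Table 'kek_data' already exists").
metadata.create_all(engine, checkfirst=True)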

description: updated
David Ames (thedac) wrote:

TRIAGE:

I suspect we are not limiting the DB migration to only the leader.
The fix is to guarantee that only the leader runs the migration (sketched below).
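
A minimal sketch of that gate, assuming the charm uses charmhelpers; is_leader() and log() are real charmhelpers APIs, but maybe_migrate_database() and the run_migration callable are illustrative names, not the actual charm-barbican code:

from charmhelpers.core.hookenv import is_leader, log


def maybe_migrate_database(run_migration):
    # Only the elected Juju leader performs the schema migration; the
    # other units skip it instead of racing on the shared database.
    if not is_leader():
        log('Skipping DB migration: this unit is not the leader')
        return
    run_migration()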

Changed in charm-barbican:
status: New → Triaged
importance: Undecided → Critical
milestone: none → 19.07
James Page (james-page) on 2019-07-01
Changed in charm-barbican:
assignee: nobody → James Page (james-page)
status: Triaged → In Progress
David Ames (thedac) on 2019-08-12
Changed in charm-barbican:
milestone: 19.07 → 19.10
David Ames (thedac) on 2019-10-24
Changed in charm-barbican:
milestone: 19.10 → 20.01