Occasional deadlock during db_sync --contract during Newton to Pike live upgrade
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Identity (keystone) | Expired | Medium | Unassigned |
Bug Description
We are testing an upgrade from Newton to Pike (skipping Ocata). The Newton system has two controllers. With controller0 active, we upgrade controller1 to Pike. Then we swact to make controller1 active and controller0 standby, and upgrade controller0 to Pike. After that we upgrade the other nodes to Pike.
Now, with all the nodes running Pike and controller1 active, we finalize the upgrade by running "keystone-manage db_sync --contract". Occasionally the contract step fails with a keystone DB deadlock.
The log from postgres during the deadlock:
-------
2018-02-
2018-02-
2018-02-
2018-02-
2018-02-
2018-02-
2018-02-
2018-02-
2018-02-
2018-02-
The log from keystone during the deadlock:
-------
keystone:log 2018-02-22 21:26:47.447 76427 CRITICAL keystone [-] Unhandled error: DbMigrationError: (psycopg2.
DETAIL: Process 76959 waits for AccessExclusiveLock on relation 17886 of database 16401; blocked by process 76955.
Process 76955 waits for AccessShareLock on relation 17776 of database 16401; blocked by process 76959.
HINT: See server log for query details.
[SQL: 'ALTER TABLE local_user ADD CONSTRAINT local_user_
2018-02-22 21:26:47.447 76427 ERROR keystone Traceback (most recent call last):
2018-02-22 21:26:47.447 76427 ERROR keystone File "/bin/keystone-
2018-02-22 21:26:47.447 76427 ERROR keystone sys.exit(main())
2018-02-22 21:26:47.447 76427 ERROR keystone File "/usr/lib/
2018-02-22 21:26:47.447 76427 ERROR keystone cli.main(
2018-02-22 21:26:47.447 76427 ERROR keystone File "/usr/lib/
2018-02-22 21:26:47.447 76427 ERROR keystone CONF.command.
2018-02-22 21:26:47.447 76427 ERROR keystone File "/usr/lib/
2018-02-22 21:26:47.447 76427 ERROR keystone upgrades.
2018-02-22 21:26:47.447 76427 ERROR keystone File "/usr/lib/
2018-02-22 21:26:47.447 76427 ERROR keystone _sync_repo(
2018-02-22 21:26:47.447 76427 ERROR keystone File "/usr/lib/
2018-02-22 21:26:47.447 76427 ERROR keystone init_version=
2018-02-22 21:26:47.447 76427 ERROR keystone File "/usr/lib/
2018-02-22 21:26:47.447 76427 ERROR keystone raise exception.
2018-02-22 21:26:47.447 76427 ERROR keystone DbMigrationError: (psycopg2.
2018-02-22 21:26:47.447 76427 ERROR keystone DETAIL: Process 76959 waits for AccessExclusiveLock on relation 17886 of database 16401; blocked by process 76955.
2018-02-22 21:26:47.447 76427 ERROR keystone Process 76955 waits for AccessShareLock on relation 17776 of database 16401; blocked by process 76959.
2018-02-22 21:26:47.447 76427 ERROR keystone HINT: See server log for query details.
2018-02-22 21:26:47.447 76427 ERROR keystone [SQL: 'ALTER TABLE local_user ADD CONSTRAINT local_user_
2018-02-22 21:26:47.447 76427 ERROR keystone
It looks like the two processes involved are the keystone-manage command and a query from an OpenStack service. The keystone tables involved are "user" and "local_user", but we have also seen the deadlock happen on the nonlocal_user table.
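To make the lock cycle in the DETAIL lines easier to see, here is a minimal sketch that parses those two lines and follows the waiter-to-blocker edges. The helper names (`parse_waits`, `find_cycle`) are made up for illustration and are not part of keystone or PostgreSQL; the input text is copied from the keystone log above.

```python
import re

# Matches the "Process N waits for ... blocked by process M." lines
# as they appear in the DETAIL section of the log above.
WAIT_RE = re.compile(
    r"Process (\d+) waits for (\w+) on relation (\d+) of database (\d+); "
    r"blocked by process (\d+)\."
)


def parse_waits(text):
    """Return a list of (waiter_pid, lock_mode, relation_oid, blocker_pid)."""
    return [
        (int(m.group(1)), m.group(2), int(m.group(3)), int(m.group(5)))
        for m in WAIT_RE.finditer(text)
    ]


def find_cycle(waits):
    """Follow waiter -> blocker edges; return the pids in a cycle, or []."""
    blocked_by = {waiter: blocker for waiter, _, _, blocker in waits}
    for start in blocked_by:
        path = [start]
        current = blocked_by[start]
        # Walk the chain until it leaves the graph or revisits a pid.
        while current in blocked_by and current not in path:
            path.append(current)
            current = blocked_by[current]
        if current == start:
            return path
    return []


detail = """
DETAIL: Process 76959 waits for AccessExclusiveLock on relation 17886 of database 16401; blocked by process 76955.
Process 76955 waits for AccessShareLock on relation 17776 of database 16401; blocked by process 76959.
"""

waits = parse_waits(detail)
cycle = find_cycle(waits)
print(waits)
print(cycle)
```

Each process holds a lock the other one needs: the ALTER TABLE wants an AccessExclusiveLock on one relation while the other session wants an AccessShareLock on a second relation, so neither can proceed.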
When we re-run keystone-manage db_sync --contract, it fails with the following complaint:
-------
2018-02-
2018-02-
This seems to be because the previous failed (deadlocked) run had already created the unique constraint in step 014 of contract_repo just before the deadlock happened, so the re-run fails when it tries to create that constraint again.
If the deadlock can't be fixed completely, making db_sync --contract re-runnable seems to be a reasonable solution as well.
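A rough sketch of what "re-runnable" could mean for a step like this: check whether the object already exists before creating it. This is not keystone's migration code. It uses Python's built-in sqlite3 module and a unique index as a stand-in for the PostgreSQL "ALTER TABLE ... ADD CONSTRAINT" statement, and the index name and columns (`uq_example`, `col_a`, `col_b`) are hypothetical.

```python
import sqlite3


def index_exists(conn, table, index_name):
    """Return True if `table` already has an index named `index_name`."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = ? AND name = ?",
        (table, index_name),
    ).fetchone()
    return row is not None


def contract_step(conn):
    """Hypothetical contract step that is safe to run more than once."""
    if index_exists(conn, "local_user", "uq_example"):
        # A previous (possibly failed) run already created it; do nothing.
        return "skipped"
    conn.execute("CREATE UNIQUE INDEX uq_example ON local_user (col_a, col_b)")
    return "created"


conn = sqlite3.connect(":memory:")
# Stand-in table; the real local_user schema is not reproduced here.
conn.execute(
    "CREATE TABLE local_user (id INTEGER PRIMARY KEY, col_a TEXT, col_b TEXT)"
)

first = contract_step(conn)   # simulates the run that died after this step
second = contract_step(conn)  # simulates the re-run
print(first, second)
```

With a guard like this, a re-run after a deadlock would skip the part that already succeeded instead of failing on the duplicate constraint.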
description: updated
Changed in keystone:
status: New → Confirmed
importance: Undecided → High
We had another user report this behavior with MySQL today [0].
[0] http://eavesdrop.openstack.org/irclogs/%23openstack-keystone/%23openstack-keystone.2018-03-27.log.html#t2018-03-27T13:04:13