Duplicate Cinder services DB entries after upgrade
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Cinder | Triaged | Low | Unassigned |
Bug Description
Environment
===========
- SETUP: HA setup with 2 controllers.
- OS: Red Hat Enterprise Linux Server release 7.7 (Maipo)
- KERNEL: 3.10.0-
- DOCKER VERSION: 1.13.1
- DOCKER IMAGES: binary
- upgrade from Ocata to Rocky by using Rocky kolla-ansible
Description
===========
This issue was first reported in the kolla-ansible project ( https:/
It seems that the best way to fix it is to use a uniqueness constraint in the DB, which was proposed in this change some time ago: https:/
Unfortunately that change was abandoned. Can it be reconsidered?
Original ticket description:
During an OpenStack upgrade there is an Ansible task which configures the Cinder services and then restarts their containers. In an HA setup the services are restarted on both controllers in parallel, and there appears to be a race condition that sometimes causes a cinder-backup or cinder-scheduler entry to be added to the DB twice, apparently one entry per controller.
The upgrade finished successfully and OpenStack was fully functional, but a duplicated entry for the cinder-backup service was visible in OpenStack:
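A minimal sketch of the suspected check-then-insert race, using a hypothetical schema rather than Cinder's actual code: each starting service looks for an existing row for its (host, binary) pair and inserts one only if none was found. The two racing controllers are simulated deterministically by interleaving their steps in the losing order, with both checks running before either insert.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE services (id INTEGER PRIMARY KEY, host TEXT, "binary" TEXT)')

def find_service(host, binary):
    """The 'check' half of check-then-insert: return the existing row, if any."""
    cur = db.execute(
        'SELECT id FROM services WHERE host = ? AND "binary" = ?', (host, binary))
    return cur.fetchone()

# Step 1: both controllers check for an existing row -- neither finds one,
# because neither has inserted yet.
seen_by_a = find_service("128.0.0.50", "cinder-backup")
seen_by_b = find_service("128.0.0.50", "cinder-backup")

# Step 2: both insert, because each believed the row was missing.
for seen in (seen_by_a, seen_by_b):
    if seen is None:
        db.execute('INSERT INTO services (host, "binary") VALUES (?, ?)',
                   ("128.0.0.50", "cinder-backup"))

count = db.execute("SELECT COUNT(*) FROM services").fetchone()[0]
print(count)  # 2 -- the duplicate row pattern seen after the upgrade
```

Without a uniqueness constraint the database happily accepts both inserts, which matches the two cinder-backup rows with identical created_at timestamps shown below.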
[root@osc1 softi-ops(
+------
| Binary | Host | Zone | Status | State | Updated At |
+------
| cinder-scheduler | 128.0.0.50 | nova | enabled | up | 2020-05-
| cinder-volume | 128.0.0.50@lvm | nova | enabled | up | 2020-05-
| cinder-backup | 128.0.0.50 | nova | enabled | up | 2020-05-
| cinder-backup | 128.0.0.50 | nova | enabled | down | 2020-05-
+------
And also in DB:
MariaDB [(none)]> select created_
+------
| created_at | updated_at | deleted_at | deleted | id | host | binary |
+------
| 2020-05-15 13:18:18 | 2020-05-15 14:46:25 | NULL | 0 | 2 | 128.0.0.50 | cinder-scheduler |
| 2020-05-15 13:18:20 | 2020-05-15 14:46:23 | NULL | 0 | 4 | 128.0.0.50@lvm | cinder-volume |
| 2020-05-15 13:18:20 | 2020-05-15 14:46:25 | NULL | 0 | 8 | 128.0.0.50 | cinder-backup |
| 2020-05-15 13:18:20 | 2020-05-15 13:23:51 | NULL | 0 | 10 | 128.0.0.50 | cinder-backup |
+------
The output of the DB query above shows that there are two cinder-backup entries, both created at exactly the same time.
The kolla-ansible logs show that, a few moments before that time, the handler restarting the cinder-backup containers was called. So it looks like a race condition when both services start and connect to the DB.
Steps to reproduce
==================
Upgrade from Ocata to Rocky using the Rocky kolla-ansible. This issue is hard to reproduce because it occurs only intermittently, but it has been hit several times.
Expected result
===============
After the upgrade, only one cinder-backup entry should be present in the DB.
The patch adding a DB uniqueness constraint was rejected because A/A was not yet supported. It is now, so we should probably reconsider the general approach of https://review.opendev.org/#/c/389049/ (there were some other objections to the patch).
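A sketch of what the constraint-based fix looks like in practice (the exact column set is an assumption based on the abandoned review, not a confirmed design): with a uniqueness constraint over (host, "binary", deleted), the second racing INSERT fails, and the losing service can catch the integrity error and reuse the existing row instead of duplicating it.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical services table with the proposed uniqueness constraint.
# Cinder uses soft deletes, so including the deleted column means the
# constraint only blocks duplicate *live* rows (deleted = 0).
db.execute("""
    CREATE TABLE services (
        id       INTEGER PRIMARY KEY,
        host     TEXT,
        "binary" TEXT,
        deleted  INTEGER DEFAULT 0,
        UNIQUE (host, "binary", deleted)
    )
""")

db.execute('INSERT INTO services (host, "binary") VALUES (?, ?)',
           ("128.0.0.50", "cinder-backup"))
try:
    # The second racing insert for the same (host, binary) pair.
    db.execute('INSERT INTO services (host, "binary") VALUES (?, ?)',
               ("128.0.0.50", "cinder-backup"))
    outcome = "duplicate inserted"
except sqlite3.IntegrityError:
    # The loser of the race falls back to the already-existing row.
    outcome = "duplicate rejected"
print(outcome)  # duplicate rejected
```

This pushes the race resolution into the database, which is the only place it can be decided atomically when two processes on different controllers start at the same moment.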
I haven't been able to reproduce this locally, but am going ahead and marking this 'triaged' based on yoctozepto's confirmation in https://bugs.launchpad.net/kolla-ansible/+bug/1889202/comments/4