workers table has many volumes stuck with service_id being NULL

Bug #2077172 reported by Walt Boring
This bug affects 1 person
Affects  Status  Importance  Assigned to  Milestone
Cinder   New     Undecided   Unassigned

Bug Description

If cinder or rabbitmq restarts (or bounces) before a volume is scheduled on a host, the volume can be stuck in 'creating' and the workers table is left with an entry that has no service_id. The volume then cannot be cleaned up and will never get out of 'creating' status, even on restarts.

The query output below is from our QA system, where we frequently redeploy cinder pods on Kubernetes to update rabbitmq or cinder services.

These entries will NEVER get cleaned up, because the volume service requires service_id to be set in order for the DB query to find them at do_cleanup() time in the volume manager.

https://github.com/openstack/cinder/blob/master/cinder/manager.py#L241-L246
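
To illustrate why such rows are never picked up, here is a minimal sketch using an in-memory SQLite table. The schema is a simplified, hypothetical stand-in for the workers table, and workers_for_service() is an illustrative helper, not cinder's actual query code. It only shows the general SQL behaviour: a filter of the form service_id = <value> never matches a row whose service_id is NULL.

```python
import sqlite3

# Hypothetical, simplified stand-in for the cinder workers table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workers ("
    " id INTEGER PRIMARY KEY,"
    " resource_id TEXT,"
    " status TEXT,"
    " service_id INTEGER,"
    " deleted INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO workers (resource_id, status, service_id) VALUES (?, ?, ?)",
    [
        ("vol-a", "creating", 7),     # claimed by service 7
        ("vol-b", "creating", None),  # never scheduled, service_id is NULL
    ],
)


def workers_for_service(conn, service_id):
    """Return resource ids of live worker rows owned by service_id."""
    rows = conn.execute(
        "SELECT resource_id FROM workers"
        " WHERE deleted = 0 AND service_id = ? ORDER BY id",
        (service_id,),
    )
    return [row[0] for row in rows]


# A per-service lookup only sees rows that carry that service's id.
found = workers_for_service(conn, 7)

# The unscheduled row is only reachable with an explicit IS NULL filter.
orphans = [
    row[0]
    for row in conn.execute(
        "SELECT resource_id FROM workers"
        " WHERE deleted = 0 AND service_id IS NULL"
    )
]

print(found)
print(orphans)
```

In this sketch no service id, whatever its value, ever returns vol-b, which matches the behaviour described above.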

MariaDB root@127.0.0.1:cinder> select count(*), status from workers where deleted=0 and service_id is NULL group by status;
+----------+----------+
| count(*) | status   |
+----------+----------+
|     1951 | creating |
|      211 | deleting |
|        1 | OK       |
+----------+----------+

The same query on one of our production systems:

MariaDB root@127.0.0.1:cinder> select count(*), status from workers where deleted=0 and service_id is NULL group by status;
+----------+----------+
| count(*) | status   |
+----------+----------+
|      292 | creating |
|       42 | deleting |
|        1 | OK       |
+----------+----------+
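
For anyone who wants to check their own deployment, the query above could be wrapped in a small helper along these lines. This is only a sketch: orphan_counts() is a hypothetical name, it assumes a Python DB-API connection, and the demo runs against an in-memory SQLite table with made-up rows rather than a real cinder database.

```python
import sqlite3

# The query from this report, unchanged apart from formatting.
ORPHAN_QUERY = (
    "SELECT COUNT(*), status FROM workers"
    " WHERE deleted = 0 AND service_id IS NULL"
    " GROUP BY status"
)


def orphan_counts(conn):
    """Map each status to the number of live worker rows with no service_id."""
    cur = conn.cursor()
    cur.execute(ORPHAN_QUERY)
    return {status: count for count, status in cur.fetchall()}


# Demo against an in-memory SQLite table with made-up rows.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE workers (status TEXT, service_id INTEGER, deleted INTEGER)"
)
demo.executemany(
    "INSERT INTO workers VALUES (?, ?, ?)",
    [
        ("creating", None, 0),
        ("creating", None, 0),
        ("deleting", None, 0),
        ("creating", 3, 0),     # owned by a service, not counted
        ("deleting", None, 1),  # soft-deleted, not counted
    ],
)

counts = orphan_counts(demo)
print(counts)
```

A non-empty result means there are worker entries that no service will claim at cleanup time.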
