Volumes stuck in downloading state

Bug #1896769 reported by Herve Beraud
This bug affects 1 person
Affects: oslo.privsep
Status: Fix Released
Importance: Undecided
Assigned to: Herve Beraud
Milestone: (not set)

Bug Description

Description of problem:
Volumes created from images were getting stuck in the downloading status. Empty volumes, by contrast, were created without any issue.

When we tried to restart the cinder-volume container, STONITH rebooted the controller, and pcs status reported the following for the cinder-volume service:
~~~
 Docker container: openstack-cinder-volume [<HIDDEN>/openstack-cinder-volume-dellemc:pcmklatest]
   openstack-cinder-volume-docker-0 (ocf::heartbeat:docker): Started controller-2 (UNCLEAN, disabled)

 Docker container: openstack-cinder-volume [<HIDDEN>/openstack-cinder-volume-dellemc:pcmklatest]
   openstack-cinder-volume-docker-0 (ocf::heartbeat:docker): FAILED controller-2 (disabled)
~~~

When the controller was powered back on, the cinder-volume service moved to another controller, and it has been working fine since.

Image caching is disabled, and use_multipath_for_image_xfer is set in the backend array section of cinder.conf:
~~~
$ grep image_volume_cache <HIDDEN>controller-2/var/lib/config-data/puppet-generated/cinder/etc/cinder/cinder.conf
#image_volume_cache_enabled = false
#image_volume_cache_max_size_gb = 0
#image_volume_cache_max_count = 0
#image_volume_cache_enabled = false
#image_volume_cache_max_size_gb = 0
#image_volume_cache_max_count = 0

$ grep use_multipath_for_image_xfer <HIDDEN>controller-2/var/lib/config-data/puppet-generated/cinder/etc/cinder/cinder.conf
#use_multipath_for_image_xfer = false
use_multipath_for_image_xfer=True
#use_multipath_for_image_xfer = false
use_multipath_for_image_xfer = True
~~~
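
The grep output above can be read as follows:
- Lines starting with # are comments and set nothing.
- Only the uncommented use_multipath_for_image_xfer lines take effect.
- image_volume_cache_enabled is never set explicitly, so it keeps its default value.

A minimal sketch of that check follows. It uses Python's configparser rather than oslo.config, which is what Cinder actually uses. The section names DEFAULT and backend_array and the helper effective_options are hypothetical, because the grep output does not show which section each line belongs to.

```python
import configparser

# Hypothetical cinder.conf excerpt; section names are assumptions.
SAMPLE_CONF = """
[DEFAULT]
#image_volume_cache_enabled = false
#use_multipath_for_image_xfer = false

[backend_array]
#image_volume_cache_enabled = false
use_multipath_for_image_xfer = True
"""


def effective_options(text, option):
    """Return {section: value} for every section that explicitly sets `option`.

    Full-line '#' comments are skipped by the parser, so commented-out
    settings never appear; sections absent from the result use the default.
    """
    # Treat [DEFAULT] as an ordinary section so its options are not
    # inherited by every other section; strict=False lets a repeated key
    # within one section overwrite the earlier value instead of raising.
    parser = configparser.ConfigParser(default_section="__none__", strict=False)
    parser.read_string(text)
    return {
        section: parser.get(section, option)
        for section in parser.sections()
        if parser.has_option(section, option)
    }


print(effective_options(SAMPLE_CONF, "use_multipath_for_image_xfer"))  # {'backend_array': 'True'}
print(effective_options(SAMPLE_CONF, "image_volume_cache_enabled"))    # {}
```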

This appears to be a problem with privsep getting stuck on a request. Because privsep serializes requests in OSP13 (Queens), a single stuck request prevents it from handling any new calls, so Cinder cannot run any other privileged command.

The solution would be to backport privsep's concurrency support to OSP13 (Queens). That support was added in OSP15 (Stein) with patch https://review.opendev.org/#/c/593556/. With it, even if one request gets stuck, subsequent requests can still proceed.
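
The difference between serialized and concurrent request handling can be illustrated with a toy model. This is a minimal sketch, not oslo.privsep's implementation. It assumes a daemon that runs each privileged call on a worker taken from a thread pool:
- max_workers=1 stands in for the serialized Queens behaviour.
- A larger pool stands in for the concurrency support added in Stein.

The function run_requests is hypothetical.

```python
import concurrent.futures
import threading


def run_requests(max_workers, timeout=0.5):
    """Submit one call that hangs, then three quick calls.

    Returns the sorted names of the quick calls that finished within
    `timeout` seconds. Toy model only: a thread pool of `max_workers`
    threads plays the role of the privileged daemon.
    """
    release = threading.Event()  # not set until cleanup: simulates a stuck call

    def stuck_call():
        release.wait()  # blocks like a hung privileged command
        return "stuck"

    def quick_call(name):
        return name

    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)
    try:
        pool.submit(stuck_call)  # occupies one worker thread indefinitely
        quick = [pool.submit(quick_call, f"cmd{i}") for i in range(3)]
        done, _ = concurrent.futures.wait(quick, timeout=timeout)
        return sorted(f.result() for f in done)
    finally:
        release.set()  # unblock the stuck call so the pool can shut down
        pool.shutdown(wait=True)


print(run_requests(max_workers=1))  # serialized: []
print(run_requests(max_workers=4))  # concurrent: ['cmd0', 'cmd1', 'cmd2']
```

With a single worker, the hung call holds the only thread, so the quick calls wait in the queue until the timeout expires. With four workers, the quick calls complete on the remaining threads.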

Version-Release number of selected component (if applicable):
- openstack-cinder-12.0.7-5
- oslo.privsep < 1.30.1 (rocky, queens)

How reproducible:
Not reproducible anymore; the issue occurred when volumes were created from images.

OpenStack Infra (hudson-openstack) wrote: Related fix proposed to oslo.privsep (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.opendev.org/753643

OpenStack Infra (hudson-openstack) wrote: Fix proposed to oslo.privsep (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/753644

Changed in oslo.privsep:
assignee: nobody → Herve Beraud (herveberaud)
Takashi Kajinami (kajinamit) wrote:

Queens and Rocky are already EOL. The issue was fixed in Stein.

Changed in oslo.privsep:
status: New → Fix Released