creating many volumes results in a few volumes failing to create and be ready for instances

Bug #1957848 reported by Syed Mohammad Adnan Karim
Affects Status Importance Assigned to Milestone
OpenStack Cinder Charm
In Progress
Medium
Unassigned

Bug Description

In a focal-ussuri cloud, I created 100 cirros tiny instances and 3 of them failed.
The cinder-volume logs show (https://pastebin.canonical.com/p/n55W56mj5j/):
|__Flow 'volume_create_manager': cinder.exception.ImageCopyFailure: Failed to copy image to volume: The device in the path /dev/dm-25 is unavailable: Unable to access the backend storage via the path /dev/dm-25.
The nova logs show the typical "too many retries" error, and there is nothing relevant in the glance logs.

The other instance/volume failures have a similar message with a different device (dm-21 and dm-28).
There is no Ceph in this environment, only Pure Storage, and Glance is a single instance with local file storage.

When I turned on debug logging, it revealed a new error around 6 minutes before the failed instance/volume creation:
https://pastebin.canonical.com/p/scbyhXJsHW/
ERROR oslo_service.periodic_task oslo_messaging.exceptions.MessageDeliveryFailure: Unable to connect to AMQP server on 10.101.222.177:5672 after inf tries: Basic.publish: (404) NOT_FOUND - no exchange 'cinder-scheduler_fanout' in vhost 'openstack'
This looks like some kind of RabbitMQ error, but I was not able to reproduce it.

Also, when I tried to create 50 focal small instances, 4 of them failed, and I found this error (https://pastebin.canonical.com/p/GpDtKHB88T/):
os_brick.exception.VolumePathNotRemoved: Volume path ['/dev/sdak'] was not removed in time.
This could be related to os-brick.

Revision history for this message
Billy Olsen (billy-olsen) wrote :

Okay, after some further investigation - the Pure Storage backend used in this scenario should not be taking the slow-path volume copy when an instance is booted from an image backed by a cinder volume.

The slow path is taken for backends that cannot do an efficient clone (Pure Storage can), but the fast path is only used if allowed_direct_url_schemes in cinder.conf is set and includes the 'cinder' scheme [0] - which it does not by default, and that logic was not added as part of the Cinder-backend-for-Glance enablement. I do not think it hurts to include this option by default.

[0] - https://opendev.org/openstack/cinder/src/branch/stable/yoga/cinder/volume/flows/manager/create_volume.py#L1077

        # Try and clone the image if we have it set as a glance location.
        if not cloned and 'cinder' in CONF.allowed_direct_url_schemes:
            model_update, cloned = self._clone_image_volume(context,
                                                            volume,
                                                            image_location,
                                                            image_meta)
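Based on the check above, a sketch of the cinder.conf change being suggested might look like the following (the option name and section are from the cinder code referenced in [0]; whether the charm should template this by default is the open question in this bug):

```ini
# Hypothetical cinder.conf fragment: allow the image-clone fast path
# when Glance images are themselves backed by Cinder volumes.
[DEFAULT]
allowed_direct_url_schemes = cinder
```

With this set, the `'cinder' in CONF.allowed_direct_url_schemes` condition is true and _clone_image_volume() is attempted instead of the slow image-to-volume copy.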

tags: added: good-first-bug
Changed in charm-cinder:
status: New → Triaged
importance: Undecided → Medium
Nishant Dash (dash3)
Changed in charm-cinder:
assignee: nobody → Nishant Dash (dash3)
status: Triaged → In Progress
Nishant Dash (dash3)
Changed in charm-cinder:
assignee: Nishant Dash (dash3) → nobody