Cinder volumes not always attached to instance in order presented

Bug #1697580 reported by Kevin Lambright
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Description
===========

Our application requires a number of Cinder volumes to be attached to the Nova instance. They always need to be attached in the same order; the order matters to the software-defined storage application. How they are presented determines which "disk" each Cinder volume becomes in the application (our software-defined storage VM has boot, root, coredump, data disks, etc.).

We use the OpenStack API to create the resources (Cinder, Neutron, Nova, etc.) and attach them with a Nova server create call.

Most of the time the volumes are attached in the correct order, but about 1 out of 10 times the order of the volumes as presented in the Nova API call (a Python list) is not preserved. This causes our SDS VM to fail to boot because it does not get the disks it expects in the correct order.

Most VMs do not care about the order in which the Cinder volumes are presented in the VM; in our case it is significant.

Steps to reproduce
==================

This has been reproduced using the OpenStack API, which is the best way to reproduce the problem programmatically, but it could likely be done with the OpenStack CLI as well.

1. Create a number of Cinder volumes in a way that lets them be uniquely identified in the VM instance (different sizes, etc.).
2. Attach the volumes to a Nova instance and boot it.
3. Repeat steps 1 and 2 enough times, and the Cinder volumes will be attached to the Nova instance in a different order than specified. This can be verified by checking the libvirt XML file that Nova generates (virsh dumpxml <domain name>).
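The check in step 3 can be sketched as a small helper (order_preserved is a hypothetical name; the attached list would come from parsing the <disk> elements out of the virsh dumpxml output):

```python
def order_preserved(requested, attached):
    """Return True if the volumes we requested appear in the attached
    list in the same relative order.

    requested: volume IDs in the order passed to the Nova API.
    attached:  volume IDs in the order the guest actually received them
               (e.g. parsed from the libvirt domain XML); may contain
               extra devices such as the boot disk, which are ignored.
    """
    wanted = set(requested)
    # Keep only the requested volumes, preserving their attached order.
    seen = [vol for vol in attached if vol in wanted]
    return seen == list(requested)
```

Running this after each boot in the reproduce loop flags the roughly 1-in-10 runs where the order was shuffled.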

Expected result
===============

Given an ordered list of Cinder volumes to be attached to a nova instance, the expected result is that they are attached in the specified order every time.

Actual result
=============

Most of the time the expected result holds, but about 1 out of 10 times the volumes are attached to the Nova instance in an order other than the one specified.

Environment
===========

1. Exact version of OpenStack you are running (see http://docs.openstack.org/releases/ for all releases):

   RedHat RDO - Liberty
   openstack-nova-compute-12.0.4-1.el7.noarch

2. Which hypervisor did you use?

   Libvirt + KVM

   libvirt-daemon-2.0.0-10.el7_3.5.x86_64
   libvirt-daemon-driver-qemu-2.0.0-10.el7_3.5.x86_64
   qemu-kvm-common-rhev-2.6.0-28.el7_3.9.x86_64
   qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64

3. Which storage type did you use?

   Cinder NFS driver for NetApp FAS; Cinder iSCSI driver for SolidFire

4. Which networking type did you use?

   Neutron with Open vSwitch.

Logs & Configs
==============

N/A

Revision history for this message
Matt Riedemann (mriedem) wrote :

Are you waiting for each volume to show up as "in-use" before attaching another one?

Are you passing in the device_name field and expecting that to be honored when attaching the volume? Because with the libvirt driver it won't be:

https://developer.openstack.org/api-ref/compute/?expanded=attach-a-volume-to-an-instance-detail#attach-a-volume-to-an-instance

"Name of the device such as, /dev/vdb. Omit or set this parameter to null for auto-assignment, if supported. If you specify this parameter, the device must not exist in the guest operating system. Note that as of the 12.0.0 Liberty release, the Nova libvirt driver no longer honors a user-supplied device name. This is the same behavior as if the device name parameter is not supplied on the request."

Also see: https://review.openstack.org/#/c/452546/

tags: added: volumes
tags: added: libvirt
Revision history for this message
John Griffith (john-griffith) wrote :

I was wondering if you could provide details on the calls you use here? My earlier suggestion (I haven't heard anything back) was to use auto-assign, then query with cinder show to get the device assignment and map them on the instance via that device (i.e. /dev/vdb, /dev/vdc, ...).

If you use auto-assign and just let Nova/libvirt do what it does, the device entry in the Cinder volume object should be correct/valid.

Most importantly though, how are you creating these? A Heat template? nova boot --block-device-mapping ...? nova volume-attach, ten times over?
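The workaround suggested above (let Nova auto-assign, then read the device back from Cinder and assign application roles by device name) could be sketched like this. map_roles_by_device is a hypothetical helper, and it assumes the guest sees the same /dev/vdX names that libvirt assigned and that a lexicographic sort of device names is sufficient (true for vdb through vdz, but not beyond):

```python
def map_roles_by_device(volumes, roles):
    """Assign application roles to volumes based on their attach device.

    volumes: list of (volume_id, device) pairs, where device is the
             attachment device reported by Cinder after attach
             (e.g. the output of cinder show / volumes.get()).
    roles:   application roles in the order they must be assigned,
             e.g. ["boot", "root", "coredump", "data"].
    """
    # Sort by device name: /dev/vdb < /dev/vdc < ...  (lexicographic;
    # this breaks past /dev/vdz, where a natural sort would be needed).
    ordered = sorted(volumes, key=lambda pair: pair[1])
    return {vol_id: role for (vol_id, _), role in zip(ordered, roles)}
```

This sidesteps relying on the order of the list passed to the API, at the cost of an extra Cinder query per volume.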

Sean Dague (sdague)
Changed in nova:
status: New → Incomplete
Revision history for this message
Kevin Lambright (kflambright) wrote :

All volumes are attached using block device mapping.

I am not passing in any device name, just letting it do auto-assignment. The software-defined storage application is not expecting specific device names; rather, it relies on the order in which they show up on first boot, which is one of the reasons we can't boot and then attach the volumes after the fact.

>> Are you waiting for each volume to show up as "in-use" before attaching another one?

The volumes are not explicitly attached to the instance; they are created, we wait until they reach the available state, and they are put in order into a Python list.
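That wait step could be sketched as a small generic poller (wait_for_status is a hypothetical helper, not our exact code; the get_status callable would wrap something like cinder.volumes.get(volume_id).status):

```python
import time


def wait_for_status(get_status, wanted, timeout=300.0, interval=2.0,
                    sleep=time.sleep):
    """Poll get_status() until it returns `wanted`.

    Raises RuntimeError if the resource enters an error state, and
    TimeoutError if `wanted` is not reached within `timeout` seconds.
    The sleep callable is injectable to keep the helper testable.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == wanted:
            return status
        if status in ("error", "error_deleting"):
            raise RuntimeError("resource entered %s state" % status)
        if time.monotonic() >= deadline:
            raise TimeoutError("gave up waiting for status %r" % wanted)
        sleep(interval)
```

Each volume goes through this before being appended to the list handed to servers.create().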

The call to boot the server looks like this:

 server = self.nova.servers.create(name=self.name,
                                   image=self.boot_image_base,
                                   flavor=self.flavor,
                                   config_drive=self.config_drive,
                                   userdata=self.userdata_fp,
                                   block_device_mapping_v2=self.block_device_list,
                                   nics=self.nic_device_list)

self.block_device_list is the list of volumes that we want mapped to the instance, in a very specific order. As stated in the original message, something on the order of 9 times out of 10 that order is preserved.
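For reference, each entry in a block_device_mapping_v2 list is a dict; a sketch of how such a list might be built follows (build_bdm_v2 is a hypothetical helper, and whether non-zero boot_index values actually influence attach order is essentially what this bug is about):

```python
def build_bdm_v2(volume_ids):
    """Build a block_device_mapping_v2 list for existing Cinder volumes.

    volume_ids: volume UUIDs in the order we want them attached.
    boot_index 0 marks the boot volume; the rest are numbered
    sequentially in the hope of preserving the requested order.
    """
    return [
        {
            "uuid": vol_id,
            "source_type": "volume",
            "destination_type": "volume",
            "boot_index": idx,
            "delete_on_termination": False,
        }
        for idx, vol_id in enumerate(volume_ids)
    ]
```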

BTW, we do have a Heat template - we don't use it for our internal deployments, but it is something we could give out to customers. I'm doing a number of deployments with the Heat template to see if I can get it to fail in the same way.

Let me know what other information should be provided.

Revision history for this message
Kevin Lambright (kflambright) wrote :

Since I've added more information, I'm wondering if there's any update to this, or anything else I need to provide - other things I should try?

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired