nova libvirt re-write broken with multiple ephemeral disks

Bug #1305423 reported by Ken Schroeder
This bug affects 6 people
Affects                   Status        Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released  High        Thang Pham
Icehouse                  Fix Released  High        Martin Falatic

Bug Description

We seem to be hitting a bug with libvirt.xml device formatting when the --ephemeral flag is used at initial boot and the instance is later restarted with nova stop/start or nova reboot --hard. We are using the following libvirt options in nova.conf for storage:
libvirt_images_type=lvm
libvirt_images_volume_group=vglocal

When nova boot is used normally with a flavor that defines ephemeral storage, it creates two LVM volumes as expected, for example:
instance-0000077e_disk
instance-0000077e_disk.local

The instance's libvirt.xml contains the following disk device entries:
<devices>
    <disk type="block" device="disk">
      <driver name="qemu" type="raw" cache="none"/>
      <source dev="/dev/vgephemeral/instance-0000077e_disk"/>
      <target bus="virtio" dev="vda"/>
    </disk>
    <disk type="block" device="disk">
      <driver name="qemu" type="raw" cache="none"/>
      <source dev="/dev/vgephemeral/instance-0000077e_disk.local"/>
      <target bus="virtio" dev="vdb"/>
    </disk>

If we use "nova boot --flavor 757c75fa-0b6d-4d4f-a128-27813009bff4 --image caa978e0-acae-4205-a4a4-2cf159c166fd --nic net-id=44f2fb0b-0a7a-475c-8fff-54cd4b37958b --ephemeral size=1 --ephemeral size=1 localdisk-1", the ephemeral LVM disks go through the enumeration logic regardless of whether one or more --ephemeral options are given:
 instance-000007ed_disk
 instance-000007ed_disk.eph0
 instance-000007ed_disk.eph1

After the instance spawns, its libvirt.xml has disk device entries like the ones below, and the instance boots happily.
 <devices>
    <disk type="block" device="disk">
      <driver name="qemu" type="raw" cache="none"/>
      <source dev="/dev/vgephemeral/instance-000007ed_disk"/>
      <target bus="virtio" dev="vda"/>
    </disk>
    <disk type="block" device="disk">
      <driver name="qemu" type="raw" cache="none"/>
      <source dev="/dev/vgephemeral/instance-000007ed_disk.eph0"/>
      <target bus="virtio" dev="vdb"/>
    </disk>
    <disk type="block" device="disk">
      <driver name="qemu" type="raw" cache="none"/>
      <source dev="/dev/vgephemeral/instance-000007ed_disk.eph1"/>
      <target bus="virtio" dev="vdc"/>
    </disk>

If nova stop/start or nova reboot --hard is executed, the instance is destroyed and libvirt.xml is recreated. At this stage the values we passed with --ephemeral are not respected, and libvirt.xml reverts to the configuration that would have been generated without the --ephemeral option, as shown below: there is only one extra disk, and it does not use the enumerated naming.
  <devices>
    <disk type="block" device="disk">
      <driver name="qemu" type="raw" cache="none"/>
      <source dev="/dev/vgephemeral/instance-000007ed_disk"/>
      <target bus="virtio" dev="vda"/>
    </disk>
    <disk type="block" device="disk">
      <driver name="qemu" type="raw" cache="none"/>
      <source dev="/dev/vgephemeral/instance-000007ed_disk.local"/>
      <target bus="virtio" dev="vdb"/>
    </disk>

This causes the instance to fail to boot at this stage. The nova block_device_mapping table has records for all 3 devices.
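A minimal sketch of how the mismatch could be checked from outside nova, assuming a well-formed copy of the regenerated libvirt.xml and a list of device names taken from block_device_mapping. The helper names disk_targets and missing_devices are hypothetical, and the XML is a trimmed, closed version of the excerpt above:

```python
import xml.etree.ElementTree as ET


def disk_targets(domain_xml):
    """Return {target dev: source dev} for every <disk> in a libvirt domain XML."""
    root = ET.fromstring(domain_xml)
    disks = {}
    for disk in root.iter("disk"):
        source = disk.find("source")
        target = disk.find("target")
        if source is not None and target is not None:
            disks[target.get("dev")] = source.get("dev")
    return disks


def missing_devices(domain_xml, bdm_device_names):
    """Return BDM device names (e.g. /dev/vdc) that have no <disk> target."""
    present = {"/dev/" + dev for dev in disk_targets(domain_xml)}
    return sorted(set(bdm_device_names) - present)


# Trimmed, well-formed copy of the XML produced after the hard reboot.
REGENERATED_XML = """
<domain>
  <devices>
    <disk type="block" device="disk">
      <source dev="/dev/vgephemeral/instance-000007ed_disk"/>
      <target bus="virtio" dev="vda"/>
    </disk>
    <disk type="block" device="disk">
      <source dev="/dev/vgephemeral/instance-000007ed_disk.local"/>
      <target bus="virtio" dev="vdb"/>
    </disk>
  </devices>
</domain>
"""

# Device names are illustrative: one root disk plus two ephemeral disks.
print(missing_devices(REGENERATED_XML, ["/dev/vda", "/dev/vdb", "/dev/vdc"]))
```

This only reports devices that are absent from the XML. In the case above vdb is present but points at the .local volume, which was never created, so a fuller check would compare source paths as well.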

Tracy Jones (tjones-i)
tags: added: libvirt
Kevin Bringard (kbringard) wrote :

In digging into this some more, it looks like the issue may be that block_device_info is set to None in compute/api.py:

compute/api.py: self.compute_rpcapi.reboot_instance(context, instance=instance,
compute/api.py- block_device_info=None,
compute/api.py- reboot_type=reboot_type)

It's then dutifully passed along to the message queue:

compute/rpcapi.py: def reboot_instance(self, ctxt, instance, block_device_info,
compute/rpcapi.py- reboot_type):
compute/rpcapi.py- if not self.client.can_send_version('2.32'):
--
compute/rpcapi.py: cctxt.cast(ctxt, 'reboot_instance',
compute/rpcapi.py- instance=instance,
compute/rpcapi.py- block_device_info=block_device_info,

I don't see any code in the reboot methods which calls get_instance_bdms, which would imply the block_device_info is never populated when reboot is called.

I believe it works when the instance is first booted because block_device_info is populated from the API call, but when reboot gets to has_default_ephemeral in libvirt/blockinfo.py, this conditional fails:

if (instance['ephemeral_gb'] <= 0) or ephemerals:

Because ephemerals is empty and ephemeral_gb is > 0.
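To make the effect of that conditional concrete, here is a simplified stand-alone sketch. The function name wants_default_ephemeral is hypothetical and the real has_default_ephemeral does more than this; only the quoted condition is reproduced, and the ephemeral_gb value of 2 is an assumption:

```python
def wants_default_ephemeral(ephemeral_gb, ephemerals):
    """Return True when a default 'disk.local' ephemeral disk would be added.

    Mirrors the quoted check: the default disk is skipped only when the
    flavor has no ephemeral space or explicit ephemerals are known.
    """
    if ephemeral_gb <= 0 or ephemerals:
        return False
    return True


# First boot: block_device_info carries the two --ephemeral disks,
# so no default disk is added and disk.eph0/disk.eph1 are used.
print(wants_default_ephemeral(2, [{"size": 1}, {"size": 1}]))  # False

# Hard reboot: ephemerals arrives empty while ephemeral_gb is still > 0,
# so the driver falls back to a single default 'disk.local'.
print(wants_default_ephemeral(2, []))  # True
```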

I'm guessing we need to do a _get_bdm_image_metadata somewhere in the reboot method to make sure we populate "ephemerals", but I don't know enough about this code to know if I'm being completely crazy or not.

Kevin Bringard (kbringard) wrote :

So I spent some more time digging into this, and it looks like, in compute/manager.py, reboot_instance is calling:

"""Reboot an instance on this host."""
context = context.elevated()
LOG.audit(_("Rebooting instance"), context=context, instance=instance)

block_device_info = self._get_instance_volume_block_device_info(
    context, instance)

However, self._get_instance_volume_block_device_info is returning an empty array.

I dug a bit further into this and discovered that _get_volume_bdms only returns BDMs that have a volume_id:

    def _get_volume_bdms(self, bdms, legacy=True):
        """Return only bdms that have a volume_id."""
        if legacy:
            return [bdm for bdm in bdms if bdm['volume_id']]
        else:
            return [bdm for bdm in bdms
                    if bdm['destination_type'] == 'volume']

However, ephemeral volumes don't have a volume_id:

mysql> select volume_id,device_name,volume_size from block_device_mapping where instance_uuid = 'b70670f5-0c10-4fca-9e5e-e3cbabeddbf2' \G
*************************** 1. row ***************************
  volume_id: NULL
device_name: /dev/vda
volume_size: NULL
*************************** 2. row ***************************
  volume_id: NULL
device_name: /dev/vdb
volume_size: 1
*************************** 3. row ***************************
  volume_id: NULL
device_name: /dev/vdc
volume_size: 1
3 rows in set (0.00 sec)

This makes sense, because these aren't cinder volumes, but are instead ephemeral volumes, and don't have a volume_id.
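The filtering can be reproduced in isolation. The sketch below copies the quoted method as a plain function and feeds it rows shaped like the query output above; the destination_type value of "local" for these rows is an assumption, since the query did not select that column:

```python
def get_volume_bdms(bdms, legacy=True):
    """Stand-alone copy of the quoted _get_volume_bdms filter."""
    if legacy:
        return [bdm for bdm in bdms if bdm['volume_id']]
    else:
        return [bdm for bdm in bdms
                if bdm['destination_type'] == 'volume']


# Rows modelled on the block_device_mapping query output.
bdms = [
    {"device_name": "/dev/vda", "volume_id": None, "volume_size": None,
     "destination_type": "local"},
    {"device_name": "/dev/vdb", "volume_id": None, "volume_size": 1,
     "destination_type": "local"},
    {"device_name": "/dev/vdc", "volume_id": None, "volume_size": 1,
     "destination_type": "local"},
]

# Both code paths drop every row, so reboot sees no ephemeral disks.
print(get_volume_bdms(bdms, legacy=True))
print(get_volume_bdms(bdms, legacy=False))
```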

I'm not entirely certain what the fix would be here, but it seems like we'd need to add logic to the _get_volume_bdms method to account for destination_type: local and/or volume_id is NULL.

Kevin Bringard (kbringard) wrote :

So I think we probably need to add an entirely new method which returns local ephemeral "volumes". It looks like the existing code returns cinder volumes well, but isn't geared toward non-cinder volumes.

Will have to go through and look at the data structures to see what is a good way to differentiate them, and then what information we need to pass back to the reboot method to make sure they are recreated in the same way they are during a nova boot.

Solly Ross (sross-7)
Changed in nova:
status: New → Confirmed
importance: Undecided → High
Thang Pham (thang-pham)
Changed in nova:
assignee: nobody → Thang Pham (thang-pham)
Thang Pham (thang-pham) wrote :

The solution is actually not too bad. If you look at nova/compute/manager.py _prep_block_device(), the block_device_info dictionary that it returns contains swap, ephemerals, and block_device_mapping keys. Since _get_instance_volume_block_device_info is used to retrieve the block_device_info on reboot, we just need to modify _get_instance_volume_block_device_info to return swap and ephemerals in addition to the block_device_mapping it already returns today. Or we could create another method based on _get_instance_volume_block_device_info to return swap, ephemerals, and block_device_mapping. I will get a patch submitted for this soon. We will see if we can merge it into _get_instance_volume_block_device_info or create another method.
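A rough sketch of the shape such a return value could take, assuming BDM records are plain dicts. The function name build_block_device_info is hypothetical, and the use of source_type, destination_type, and guest_format to tell swap, ephemeral, and volume entries apart is an assumption for illustration, not the merged patch:

```python
def build_block_device_info(bdms):
    """Group BDM records under swap, ephemerals, and block_device_mapping."""
    info = {"swap": None, "ephemerals": [], "block_device_mapping": []}
    for bdm in bdms:
        if bdm.get("destination_type") == "volume":
            # Cinder-backed volumes, which reboot already received.
            info["block_device_mapping"].append(bdm)
        elif bdm.get("source_type") == "blank":
            if bdm.get("guest_format") == "swap":
                info["swap"] = bdm
            else:
                info["ephemerals"].append(bdm)
        # Anything else (e.g. the image-backed root disk) is left out.
    return info


# Illustrative records: one root disk and two 1 GB ephemeral disks.
bdms = [
    {"device_name": "/dev/vda", "source_type": "image",
     "destination_type": "local", "guest_format": None},
    {"device_name": "/dev/vdb", "source_type": "blank",
     "destination_type": "local", "guest_format": None, "volume_size": 1},
    {"device_name": "/dev/vdc", "source_type": "blank",
     "destination_type": "local", "guest_format": None, "volume_size": 1},
]

info = build_block_device_info(bdms)
print([e["device_name"] for e in info["ephemerals"]])
```

With the ephemerals key populated on reboot, the libvirt driver would have what it needs to regenerate the enumerated disk entries instead of falling back to a single default disk.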

Changed in nova:
status: Confirmed → In Progress
Openstack Gerrit (openstack-gerrit) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/90583

John Dewey (retr0h)
tags: added: havana-backport-potential
Jay Pipes (jaypipes)
tags: added: icehouse-backport-potential
Openstack Gerrit (openstack-gerrit) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/90583
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f087a6f77ef1338bb8d10943d2a18712220c3c44
Submitter: Jenkins
Branch: master

commit f087a6f77ef1338bb8d10943d2a18712220c3c44
Author: Thang Pham <email address hidden>
Date: Sun Apr 27 00:28:35 2014 -0400

    Update block_device_info to contain swap and ephemeral disks

    An ephemeral or swap disk is attached to an instance on boot
    as follows: nova boot --flavor FLAVOR --image IMAGE_ID
    --swap 512 --ephemeral size=2 INSTANCE. When a hard reboot is
    performed on the instance, nova fails to recreate the
    appropriate libvirt XML definition, containing the ephemeral
    disk. This is because the correct block_device_info dict that
    is passed to the compute manager's reboot_instance method does
    not contain swap or ephemeral disk key values that are necessary
    to recreate those disks. In addition to nova boot, the correct
    block_device_info dict is also needed by nova rebuild, reboot,
    resize, and migrate to recreate those disks. This patch updates
    _get_instance_volume_block_device_info (renamed
    _get_instance_block_device_info) to return the swap and ephemeral
    disk key values, in addition to the block_device_mapping it
    already returns today.

    Change-Id: Iec329d1c12a48ea90ba9d57decd0996fde6544f0
    Closes-Bug: #1305423

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → juno-1
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/100362

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/icehouse)

Reviewed: https://review.openstack.org/100362
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b5913892869ba98fecd4f8d597eb55defee905e7
Submitter: Jenkins
Branch: stable/icehouse

commit b5913892869ba98fecd4f8d597eb55defee905e7
Author: Thang Pham <email address hidden>
Date: Sun Apr 27 00:28:35 2014 -0400

    Update block_device_info to contain swap and ephemeral disks

    An ephemeral or swap disk is attached to an instance on boot
    as follows: nova boot --flavor FLAVOR --image IMAGE_ID
    --swap 512 --ephemeral size=2 INSTANCE. When a hard reboot is
    performed on the instance, nova fails to recreate the
    appropriate libvirt XML definition, containing the ephemeral
    disk. This is because the correct block_device_info dict that
    is passed to the compute manager's reboot_instance method does
    not contain swap or ephemeral disk key values that are necessary
    to recreate those disks. In addition to nova boot, the correct
    block_device_info dict is also needed by nova rebuild, reboot,
    resize, and migrate to recreate those disks. This patch updates
    _get_instance_volume_block_device_info (renamed
    _get_instance_block_device_info) to return the swap and ephemeral
    disk key values, in addition to the block_device_mapping it
    already returns today.

    Change-Id: Iec329d1c12a48ea90ba9d57decd0996fde6544f0
    Closes-Bug: #1305423
    (cherry picked from commit f087a6f77ef1338bb8d10943d2a18712220c3c44)

Jian Wen (wenjianhn)
tags: removed: icehouse-backport-potential
Thierry Carrez (ttx)
Changed in nova:
milestone: juno-1 → 2014.2