OpenStack Compute (nova)

nova libvirt re-write broken with multiple ephemeral disks

Bug #1305423 reported by Ken Schroeder on 2014-04-10

This bug affects 6 people

Affects	Status	Importance	Assigned to	Milestone
OpenStack Compute (nova)	Fix Released	High	Thang Pham	OpenStack Compute (nova) 2014.2 "juno"
Icehouse	Fix Released	High	Martin Falatic	OpenStack Compute (nova) 2014.1.3

Bug Description

We seem to be hitting a bug in libvirt.xml device formatting when the --ephemeral flag is used: the initial boot works, but the instance breaks after nova stop/start or nova reboot --hard. We are using the following libvirt options in nova.conf for storage:
libvirt_images_type=lvm
libvirt_images_volume_group=vglocal

When we use nova boot normally with a flavor that has ephemeral storage defined, it creates two LVM volumes as expected, e.g.:
instance-0000077e_disk
instance-0000077e_disk.local

The instance's libvirt.xml contains disk device entries as follows:
<devices>
    <disk type="block" device="disk">
      <driver name="qemu" type="raw" cache="none"/>
      <source dev="/dev/vgephemeral/instance-0000077e_disk"/>
      <target bus="virtio" dev="vda"/>
    </disk>
    <disk type="block" device="disk">
      <driver name="qemu" type="raw" cache="none"/>
      <source dev="/dev/vgephemeral/instance-0000077e_disk.local"/>
      <target bus="virtio" dev="vdb"/>
    </disk>

If we use "nova boot --flavor 757c75fa-0b6d-4d4f-a128-27813009bff4 --image caa978e0-acae-4205-a4a4-2cf159c166fd --nic net-id=44f2fb0b-0a7a-475c-8fff-54cd4b37958b --ephemeral size=1 --ephemeral size=1 localdisk-1", the ephemeral LVM disks go through the enumeration logic, whether there is one --ephemeral option or several:
instance-000007ed_disk
instance-000007ed_disk.eph0
instance-000007ed_disk.eph1

After the instance spawns, its libvirt.xml has disk device entries like the ones below, and the instance happily boots.
<devices>
    <disk type="block" device="disk">
      <driver name="qemu" type="raw" cache="none"/>
      <source dev="/dev/vgephemeral/instance-000007ed_disk"/>
      <target bus="virtio" dev="vda"/>
    </disk>
    <disk type="block" device="disk">
      <driver name="qemu" type="raw" cache="none"/>
      <source dev="/dev/vgephemeral/instance-000007ed_disk.eph0"/>
      <target bus="virtio" dev="vdb"/>
    </disk>
    <disk type="block" device="disk">
      <driver name="qemu" type="raw" cache="none"/>
      <source dev="/dev/vgephemeral/instance-000007ed_disk.eph1"/>
      <target bus="virtio" dev="vdc"/>
    </disk>

If nova stop/start or nova reboot --hard is executed, the instance is destroyed and libvirt.xml is recreated. At this stage the values we passed with --ephemeral are not respected, and libvirt.xml reverts to the configuration that would have been generated without the --ephemeral option, as shown below: there is only one extra disk, and it does not use the enumerated naming.
  <devices>
    <disk type="block" device="disk">
      <driver name="qemu" type="raw" cache="none"/>
      <source dev="/dev/vgephemeral/instance-000007ed_disk"/>
      <target bus="virtio" dev="vda"/>
    </disk>
    <disk type="block" device="disk">
      <driver name="qemu" type="raw" cache="none"/>
      <source dev="/dev/vgephemeral/instance-000007ed_disk.local"/>
      <target bus="virtio" dev="vdb"/>
    </disk>

This causes the instance to fail to boot at this stage. The nova block_device_mapping table has records for all 3 devices.
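The naming mismatch described above can be summarized in a small sketch. This is not nova's code: `expected_disk_names` is a hypothetical helper that only mirrors the volume names observed in this report, and it assumes the flavor defines ephemeral space.

```python
def expected_disk_names(instance_name, ephemeral_sizes):
    """Return the LVM volume names observed in this report.

    ephemeral_sizes: sizes passed via --ephemeral. An empty list means no
    ephemeral information is available, so the flavor default is used.
    (Hypothetical helper, for illustration only.)
    """
    names = [instance_name + "_disk"]
    if ephemeral_sizes:
        # Explicit --ephemeral options are enumerated: eph0, eph1, ...
        names += ["%s_disk.eph%d" % (instance_name, i)
                  for i in range(len(ephemeral_sizes))]
    else:
        # Flavor default: a single, non-enumerated "disk.local"
        names.append(instance_name + "_disk.local")
    return names


# What exists on the volume group after the initial boot
at_boot = expected_disk_names("instance-000007ed", [1, 1])
# What the regenerated libvirt.xml points at after a hard reboot
after_hard_reboot = expected_disk_names("instance-000007ed", [])

print(at_boot)
print(after_hard_reboot)
```

The regenerated XML references instance-000007ed_disk.local, which was never created for this instance, which would explain the failed boot.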

Tracy Jones (tjones-i) on 2014-04-10

tags:

added: libvirt

Kevin Bringard (kbringard) wrote on 2014-04-10:

In digging into this some more, it looks like the issue may be that block_device_info is set to None in compute/api.py:

compute/api.py: self.compute_rpcapi.reboot_instance(context, instance=instance,
compute/api.py- block_device_info=None,
compute/api.py- reboot_type=reboot_type)

It's then dutifully passed along to the message queue:

compute/rpcapi.py: def reboot_instance(self, ctxt, instance, block_device_info,
compute/rpcapi.py- reboot_type):
compute/rpcapi.py- if not self.client.can_send_version('2.32'):
--
compute/rpcapi.py: cctxt.cast(ctxt, 'reboot_instance',
compute/rpcapi.py- instance=instance,
compute/rpcapi.py- block_device_info=block_device_info,

I don't see any code in the reboot methods which calls get_instance_bdms, which would imply the block_device_info is never populated when reboot is called.

I believe it works when the instance is first booted because block_device_info is populated from the API call, but when reboot gets to has_default_ephemeral in libvirt/blockinfo.py, this conditional fails:

if (instance['ephemeral_gb'] <= 0) or ephemerals:

Because ephemerals is empty and ephemeral_gb is > 0.
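To make the effect of that conditional concrete, here is a simplified stand-in. It is only a sketch: `uses_default_ephemeral` is a hypothetical name, the real has_default_ephemeral takes different arguments, and the ephemeral_gb value of 2 is made up for illustration.

```python
def uses_default_ephemeral(ephemeral_gb, ephemerals):
    """Simplified stand-in for the conditional quoted above.

    Returns True when a single default ephemeral disk would be generated
    instead of the enumerated ones. (Hypothetical helper, not nova's API.)
    """
    if (ephemeral_gb <= 0) or ephemerals:
        return False
    return True


# Initial boot: block_device_info carries the two --ephemeral disks.
on_boot = uses_default_ephemeral(2, [{"size": 1}, {"size": 1}])
# Hard reboot: block_device_info is empty, so ephemerals is [].
on_reboot = uses_default_ephemeral(2, [])

print(on_boot)    # False
print(on_reboot)  # True
```

With an empty ephemerals list and a flavor that has ephemeral space, the default-disk path is taken, which matches the disk.local entry seen in the regenerated XML.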

I'm guessing we need to do a _get_bdm_image_metadata somewhere in the reboot method to make sure we populate "ephemerals", but I don't know enough about this code to know if I'm being completely crazy or not.

Kevin Bringard (kbringard) wrote on 2014-04-11:

So I spent some more time digging into this, and it looks like, in compute/manager.py, reboot_instance is calling:

    """Reboot an instance on this host."""
    context = context.elevated()
    LOG.audit(_("Rebooting instance"), context=context, instance=instance)

    block_device_info = self._get_instance_volume_block_device_info(
        context, instance)

However, self._get_instance_volume_block_device_info is returning an empty array.

I dug a bit further into this and discovered that _get_volume_bdms only returns BDMs that have a volume_id:

def _get_volume_bdms(self, bdms, legacy=True):
    """Return only bdms that have a volume_id."""
    if legacy:
        return [bdm for bdm in bdms if bdm['volume_id']]
    else:
        return [bdm for bdm in bdms
                if bdm['destination_type'] == 'volume']

However, ephemeral volumes don't have a volume_id:

mysql> select volume_id,device_name,volume_size from block_device_mapping where instance_uuid = 'b70670f5-0c10-4fca-9e5e-e3cbabeddbf2' \G
*************************** 1. row ***************************
  volume_id: NULL
device_name: /dev/vda
volume_size: NULL
*************************** 2. row ***************************
  volume_id: NULL
device_name: /dev/vdb
volume_size: 1
*************************** 3. row ***************************
  volume_id: NULL
device_name: /dev/vdc
volume_size: 1
3 rows in set (0.00 sec)

This makes sense, because these aren't cinder volumes, but are instead ephemeral volumes, and don't have a volume_id.

I'm not entirely certain what the fix would be here, but it seems like we'd need to add logic to the _get_volume_bdms method to account for destination_type: local and/or volume_id is NULL.
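A quick way to see this is to run the same filter over rows shaped like the query output above. This is a standalone sketch: the function is copied out of its class, and the destination_type value of 'local' is an assumption, since that column was not part of the query.

```python
def get_volume_bdms(bdms, legacy=True):
    """Standalone copy of the filter quoted above (no self)."""
    if legacy:
        return [bdm for bdm in bdms if bdm['volume_id']]
    return [bdm for bdm in bdms if bdm['destination_type'] == 'volume']


# Rows modelled on the block_device_mapping query output.
# destination_type='local' is assumed; it was not selected in the query.
bdms = [
    {'volume_id': None, 'device_name': '/dev/vda', 'volume_size': None,
     'destination_type': 'local'},
    {'volume_id': None, 'device_name': '/dev/vdb', 'volume_size': 1,
     'destination_type': 'local'},
    {'volume_id': None, 'device_name': '/dev/vdc', 'volume_size': 1,
     'destination_type': 'local'},
]

legacy_result = get_volume_bdms(bdms)
new_style_result = get_volume_bdms(bdms, legacy=False)

print(legacy_result)     # []
print(new_style_result)  # []
```

Under that assumption, both branches drop every row, so nothing about the ephemeral disks reaches the reboot path.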

Kevin Bringard (kbringard) wrote on 2014-04-11:

So I think we probably need to add an entirely new method which returns local ephemeral "volumes". It looks like the existing code returns cinder volumes well, but isn't geared toward non-cinder volumes.

I will have to go through and look at the data structures to find a good way to differentiate them, and then work out what information we need to pass back to the reboot method to make sure they are recreated the same way they are during a nova boot.

Solly Ross (sross-7) on 2014-04-15

Changed in nova:
status:	New → Confirmed
importance:	Undecided → High

Thang Pham (thang-pham) on 2014-04-26

Changed in nova:
assignee:	nobody → Thang Pham (thang-pham)

Thang Pham (thang-pham) wrote on 2014-04-26:

The solution is actually not too bad. If you look at nova/compute/manager.py _prep_block_device(), the block_device_info dictionary that it returns contains swap, ephemerals, and block_device_mapping keys. Since _get_instance_volume_block_device_info is used to retrieve the block_device_info on reboot, we just need to modify _get_instance_volume_block_device_info to return swap and ephemerals in addition to the block_device_mapping it already returns today. Or we could create another method based on _get_instance_volume_block_device_info to return swap, ephemerals, and block_device_mapping. I will get a patch submitted for this soon. We will see if we can merge it into _get_instance_volume_block_device_info or create another method.
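As a rough illustration of the shape being described, the sketch below groups BDM rows under the three keys named above. It is not the submitted patch: `build_block_device_info`, `is_swap`, and `is_ephemeral` are hypothetical helpers, and the source_type and guest_format fields used to tell the disks apart are assumptions not shown in this report.

```python
def is_swap(bdm):
    # Assumption: a swap disk is a blank local disk formatted as swap.
    return (bdm.get('source_type') == 'blank' and
            bdm.get('destination_type') == 'local' and
            bdm.get('guest_format') == 'swap')


def is_ephemeral(bdm):
    # Assumption: an ephemeral disk is any other blank local disk.
    return (bdm.get('source_type') == 'blank' and
            bdm.get('destination_type') == 'local' and
            bdm.get('guest_format') != 'swap')


def build_block_device_info(bdms):
    """Group BDM rows under swap, ephemerals and block_device_mapping."""
    swaps = [bdm for bdm in bdms if is_swap(bdm)]
    return {
        'swap': swaps[0] if swaps else None,
        'ephemerals': [bdm for bdm in bdms if is_ephemeral(bdm)],
        'block_device_mapping': [bdm for bdm in bdms
                                 if bdm.get('destination_type') == 'volume'],
    }


# Rows modelled on the instance from this report: an image-backed root
# disk plus two 1 GB ephemeral disks (field values are illustrative).
bdms = [
    {'device_name': '/dev/vda', 'source_type': 'image',
     'destination_type': 'local', 'guest_format': None},
    {'device_name': '/dev/vdb', 'source_type': 'blank',
     'destination_type': 'local', 'guest_format': None, 'volume_size': 1},
    {'device_name': '/dev/vdc', 'source_type': 'blank',
     'destination_type': 'local', 'guest_format': None, 'volume_size': 1},
]

info = build_block_device_info(bdms)
print([bdm['device_name'] for bdm in info['ephemerals']])
```

With the ephemerals key populated like this on reboot, the disk layout code would see the same two disks it saw at boot.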

Changed in nova:
status:	Confirmed → In Progress

Openstack Gerrit (openstack-gerrit) wrote on 2014-04-27: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/90583

John Dewey (retr0h) on 2014-04-29

tags:

added: havana-backport-potential

Jay Pipes (jaypipes) on 2014-05-05

tags:

added: icehouse-backport-potential

Openstack Gerrit (openstack-gerrit) wrote on 2014-05-06: Fix merged to nova (master)

Reviewed: https://review.openstack.org/90583
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f087a6f77ef1338bb8d10943d2a18712220c3c44
Submitter: Jenkins
Branch: master

commit f087a6f77ef1338bb8d10943d2a18712220c3c44
Author: Thang Pham <email address hidden>
Date: Sun Apr 27 00:28:35 2014 -0400

Update block_device_info to contain swap and ephemeral disks

    An ephemeral or swap disk is attached to an instance on boot
    as follows: nova boot --flavor FLAVOR --image IMAGE_ID
    --swap 512 --ephemeral size=2 INSTANCE. When a hard reboot is
    performed on the instance, nova fails to recreate the
    appropriate libvirt XML definition, containing the ephemeral
    disk. This is because the correct block_device_info dict that
    is passed to the compute manager's reboot_instance method does
    not contain swap or ephemeral disk key values that are necessary
    to recreate those disks. In addition to nova boot, the correct
    block_device_info dict is also needed by nova rebuild, reboot,
    resize, and migrate to recreate those disks. This patch updates
    _get_instance_volume_block_device_info (renamed
    _get_instance_block_device_info) to return the swap and ephemeral
    disk key values, in addition to the block_device_mapping it
    already returns today.

Change-Id: Iec329d1c12a48ea90ba9d57decd0996fde6544f0
Closes-Bug: #1305423

Changed in nova:
status:	In Progress → Fix Committed

Thierry Carrez (ttx) on 2014-06-11

Changed in nova:
milestone:	none → juno-1
status:	Fix Committed → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2014-06-16: Fix proposed to nova (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/100362

OpenStack Infra (hudson-openstack) wrote on 2014-09-24: Fix merged to nova (stable/icehouse)

Reviewed: https://review.openstack.org/100362
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b5913892869ba98fecd4f8d597eb55defee905e7
Submitter: Jenkins
Branch: stable/icehouse

commit b5913892869ba98fecd4f8d597eb55defee905e7
Author: Thang Pham <email address hidden>
Date: Sun Apr 27 00:28:35 2014 -0400

Update block_device_info to contain swap and ephemeral disks

    Change-Id: Iec329d1c12a48ea90ba9d57decd0996fde6544f0
    Closes-Bug: #1305423
    (cherry picked from commit f087a6f77ef1338bb8d10943d2a18712220c3c44)

Jian Wen (wenjianhn) on 2014-09-29

tags:

removed: icehouse-backport-potential

Thierry Carrez (ttx) on 2014-10-16

Changed in nova:
milestone:	juno-1 → 2014.2

Duplicates of this bug

Bug #1313107

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.