RBD backed instance can't shutdown and restart

Bug #1245719 reported by Quenten Grasso
This bug affects 11 people
Affects                         Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)        Fix Released  Undecided   Mike Perez
Havana                          Fix Released  Undecided   Unassigned
OpenStack Dashboard (Horizon)   Invalid       Undecided   Unassigned
Ubuntu                          Confirmed     Undecided   Unassigned

Bug Description

Version: Havana from the Ubuntu repos, with Ceph for RBD.

When launching an instance with "Boot from image (Creates a new volume)", the instance is created fine and all is well. However, if I shut down the instance, I can't turn it back on again.

I get the following error in nova-compute.log when trying to power on a shut-down instance.

#######################################################################################
2013-10-29 00:48:33.859 2746 WARNING nova.compute.utils [req-89bbd72f-2280-4fac-802a-1211ec774980 27106b78ceac4e389558566857a7875f 464099f86eb94d049ed1f7b0f0144275] [instance: cc370f6d-4be0-4cd3-9f20-bf86f5ad7c09] Can't access image $
2013-10-29 00:48:34.040 2746 WARNING nova.virt.libvirt.vif [req-89bbd72f-2280-4fac-802a-1211ec774980 27106b78ceac4e389558566857a7875f 464099f86eb94d049ed1f7b0f0144275] Deprecated: The LibvirtHybridOVSBridgeDriver VIF driver is now de$
2013-10-29 00:48:34.578 2746 ERROR nova.openstack.common.rpc.amqp [req-89bbd72f-2280-4fac-802a-1211ec774980 27106b78ceac4e389558566857a7875f 464099f86eb94d049ed1f7b0f0144275] Exception during message handling
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 461, in _process_data
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp **args)
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp result = getattr(proxyobj, method)(ctxt, **kwargs)
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 353, in decorated_function
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 90, in wrapped
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp payload)
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 73, in wrapped
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp return f(self, context, *args, **kw)
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 243, in decorated_function
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp pass
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 229, in decorated_function
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 294, in decorated_function
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp function(self, context, *args, **kwargs)
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 271, in decorated_function
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp e, sys.exc_info())
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 258, in decorated_function
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1832, in start_instance
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp self._power_on(context, instance)
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1819, in _power_on
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp block_device_info)
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 1948, in power_on
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp self._hard_reboot(context, instance, network_info, block_device_info)
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 1903, in _hard_reboot
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp block_device_info)
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4318, in get_instance_disk_info
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp dk_size = int(os.path.getsize(path))
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/genericpath.py", line 49, in getsize
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp return os.stat(filename).st_size
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp OSError: [Errno 2] No such file or directory: '/var/lib/nova/instances/cc370f6d-4be0-4cd3-9f20-bf86f5ad7c09/disk'
2013-10-29 00:48:34.578 2746 TRACE nova.openstack.common.rpc.amqp
#######################################################################################
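
The last frames of the traceback show `os.path.getsize` being called on a local `disk` file that does not exist for a volume-backed instance. The following is a minimal sketch of that failure mode only, not nova code; `local_disk_size` is a hypothetical helper, and the path is a made-up one placed under a temporary directory so that it is guaranteed to be missing.

```python
import errno
import os
import tempfile


def local_disk_size(path):
    """Return the size of a local disk file, or None if it does not exist.

    Hypothetical helper for illustration; it is not part of nova.
    """
    try:
        return int(os.path.getsize(path))
    except OSError as exc:
        # ENOENT is the "[Errno 2] No such file or directory" case
        # seen at the bottom of the traceback.
        if exc.errno == errno.ENOENT:
            return None
        raise


# A path shaped like the one in the log, rooted in a fresh temp directory
# so that it cannot exist on the machine running this sketch.
missing = os.path.join(tempfile.mkdtemp(), "instances", "some-uuid", "disk")
print(local_disk_size(missing))
```

Swallowing the error like this would only hide the symptom; the underlying problem is that the code path expects a local file for a disk that actually lives in RBD.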

On closer inspection, it seems the libvirt.xml file for the instance gets all screwed up.

This is the libvirt.xml file for this instance before shutdown.

#######################################################################################
<domain type="kvm">
  <uuid>dc9749cd-0002-41c9-ac55-5f691637146a</uuid>
  <name>instance-00000004</name>
  <memory>524288</memory>
  <vcpu>1</vcpu>
  <sysinfo type="smbios">
    <system>
      <entry name="manufacturer">OpenStack Foundation</entry>
      <entry name="product">OpenStack Nova</entry>
      <entry name="version">2013.2</entry>
      <entry name="serial">4c4c4544-0053-3210-8032-b6c04f5a5931</entry>
      <entry name="uuid">dc9749cd-0002-41c9-ac55-5f691637146a</entry>
    </system>
  </sysinfo>
  <os>
    <type>hvm</type>
    <boot dev="hd"/>
    <smbios mode="sysinfo"/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <clock offset="utc">
    <timer name="pit" tickpolicy="delay"/>
    <timer name="rtc" tickpolicy="catchup"/>
  </clock>
  <cpu mode="host-model" match="exact"/>
  <devices>
    <disk type="network" device="disk">
      <driver name="qemu" type="raw" cache="none"/>
      <source protocol="rbd" name="volumes/volume-5abadeb8-49e9-4628-a54d-d742d6f2e012">
        <host name="10.100.96.10" port="6789"/>
        <host name="10.100.96.11" port="6789"/>
        <host name="10.100.96.12" port="6789"/>
      </source>
      <auth username="volumes">
        <secret type="ceph" uuid="13a673af-ff80-3036-8310-4c72f566673d"/>
      </auth>
      <target bus="virtio" dev="vda"/>
      <serial>5abadeb8-49e9-4628-a54d-d742d6f2e012</serial>
    </disk>
    <interface type="bridge">
      <mac address="fa:16:3e:45:17:d4"/>
      <model type="virtio"/>
      <source bridge="qbr125aa659-01"/>
      <target dev="tap125aa659-01"/>
    </interface>
    <serial type="file">
      <source path="/var/lib/nova/instances/dc9749cd-0002-41c9-ac55-5f691637146a/console.log"/>
    </serial>
    <serial type="pty"/>
    <input type="tablet" bus="usb"/>
    <graphics type="vnc" autoport="yes" keymap="en-us" listen="10.100.32.10"/>
  </devices>
</domain>
#######################################################################################

All looks fine here; I can see the Ceph mons etc.

Next I'll shut down the instance, and as expected the file stays the same. This is the file after I try to power the instance back on again.

You'll notice all of the details regarding the disk have changed, and now it thinks it's a qcow2 disk located on the local hard drive... How does this happen?

#######################################################################################

<domain type="kvm">
  <uuid>dc9749cd-0002-41c9-ac55-5f691637146a</uuid>
  <name>instance-00000004</name>
  <memory>524288</memory>
  <vcpu>1</vcpu>
  <sysinfo type="smbios">
    <system>
      <entry name="manufacturer">OpenStack Foundation</entry>
      <entry name="product">OpenStack Nova</entry>
      <entry name="version">2013.2</entry>
      <entry name="serial">4c4c4544-0053-3210-8032-b6c04f5a5931</entry>
      <entry name="uuid">dc9749cd-0002-41c9-ac55-5f691637146a</entry>
    </system>
  </sysinfo>
  <os>
    <type>hvm</type>
    <boot dev="hd"/>
    <smbios mode="sysinfo"/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <clock offset="utc">
    <timer name="pit" tickpolicy="delay"/>
    <timer name="rtc" tickpolicy="catchup"/>
  </clock>
  <cpu mode="host-model" match="exact"/>
  <devices>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2" cache="none"/>
      <source file="/var/lib/nova/instances/dc9749cd-0002-41c9-ac55-5f691637146a/disk"/>
      <target bus="virtio" dev="vda"/>
    </disk>
    <interface type="bridge">
      <mac address="fa:16:3e:45:17:d4"/>
      <model type="virtio"/>
      <source bridge="qbr125aa659-01"/>
      <target dev="tap125aa659-01"/>
    </interface>
    <serial type="file">
      <source path="/var/lib/nova/instances/dc9749cd-0002-41c9-ac55-5f691637146a/console.log"/>
    </serial>
    <serial type="pty"/>
    <input type="tablet" bus="usb"/>
    <graphics type="vnc" autoport="yes" keymap="en-us" listen="10.100.32.10"/>
  </devices>
</domain>

#######################################################################################
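
To compare the two files without reading them line by line, the disk definitions can be pulled out with a few lines of Python. This is a rough sketch using only the standard library; `describe_disks` is a hypothetical helper, and the two XML strings are trimmed copies of the `<disk>` elements from the files above.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the <disk> elements from the two libvirt.xml files.
BEFORE_XML = """
<domain type="kvm">
  <devices>
    <disk type="network" device="disk">
      <driver name="qemu" type="raw" cache="none"/>
      <source protocol="rbd" name="volumes/volume-5abadeb8-49e9-4628-a54d-d742d6f2e012"/>
      <target bus="virtio" dev="vda"/>
    </disk>
  </devices>
</domain>
"""

AFTER_XML = """
<domain type="kvm">
  <devices>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2" cache="none"/>
      <source file="/var/lib/nova/instances/dc9749cd-0002-41c9-ac55-5f691637146a/disk"/>
      <target bus="virtio" dev="vda"/>
    </disk>
  </devices>
</domain>
"""


def describe_disks(domain_xml):
    """Summarise each <disk> in a libvirt domain XML string."""
    root = ET.fromstring(domain_xml)
    disks = []
    for disk in root.findall("./devices/disk"):
        source = disk.find("source")
        driver = disk.find("driver")
        disks.append({
            "type": disk.get("type"),
            "format": driver.get("type") if driver is not None else None,
            "protocol": source.get("protocol") if source is not None else None,
            # Network disks carry the location in "name", file disks in "file".
            "source": (source.get("name") or source.get("file"))
            if source is not None else None,
        })
    return disks


print(describe_disks(BEFORE_XML))
print(describe_disks(AFTER_XML))
```

The first call reports a `network` disk with the `rbd` protocol, the second a `file` disk in `qcow2` format, which is the change described above.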

Revision history for this message
Quenten Grasso (qgrasso-d) wrote :

I've found that if I create the instance via Horizon using "Launch Instance",

with Instance Boot Source: Boot from image (creates a new volume),

selecting an image etc.,

the error happens every time the instance is powered off and back on again.

However!

If I create the volume first and use the "Boot from Volume" option, selecting the volume I pre-created from the image,
the instance can power off and on without any issues.

So could this be a Horizon issue in how it's storing the data in SQL?

tags: added: nova-manage
affects: nova → horizon
Revision history for this message
Josh Durgin (jdurgin) wrote :

The 'Boot from image (creates a new volume)' code path uses the new block_device_mapping_v2 API. This is most likely a bug in the implementation of this in nova that affects all volume backends, not just RBD.
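
For readers unfamiliar with it, a `block_device_mapping_v2` entry for this case looks roughly like the following. This is a sketch of the request fragment only, not the full server-create body; `image_to_volume_bdm` is a hypothetical helper, and the image ID and size are placeholders.

```python
import json


def image_to_volume_bdm(image_id, size_gb):
    """Build one block_device_mapping_v2 entry: image source, volume destination.

    Hypothetical helper for illustration; values are placeholders.
    """
    return {
        "boot_index": 0,
        "uuid": image_id,
        "source_type": "image",
        "destination_type": "volume",
        "volume_size": size_gb,
        "delete_on_termination": False,
    }


IMAGE_ID = "00000000-0000-0000-0000-000000000000"  # placeholder, not a real image
fragment = {"block_device_mapping_v2": [image_to_volume_bdm(IMAGE_ID, 1)]}
print(json.dumps(fragment, indent=2))
```

The combination to note is `source_type` of `image` with `destination_type` of `volume`, which is what distinguishes this path from booting from a pre-created volume.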

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ubuntu:
status: New → Confirmed
Revision history for this message
Blane Bramble (blane) wrote :

This seems to be similar to Bug #1248695

Revision history for this message
Blane Bramble (blane) wrote :

I believe this is a problem with the source_type being set to image and the destination_type being set to volume, which prevents the DriverVolumeBlockDevice._transform code from firing. A workaround is attached that just detects the source_type as image and forces it to volume for the check. A better solution would be to not set it to image in the first place, but I haven't tracked down where this is done yet.
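
The idea behind the workaround can be sketched as follows. This is a simplified stand-in written for this thread, not the nova implementation and not the attached patch; `transform_volume_bdm`, `InvalidType`, and the `treat_image_as_volume` flag are all hypothetical names.

```python
class InvalidType(Exception):
    """Raised when a mapping is not accepted by the volume transform."""


def transform_volume_bdm(bdm, treat_image_as_volume=False):
    """Accept only volume-to-volume mappings, optionally relaxing the check.

    With treat_image_as_volume=True, an 'image' source is treated as
    'volume' for the purpose of the check, as the workaround describes.
    """
    source_type = bdm.get("source_type")
    if treat_image_as_volume and source_type == "image":
        source_type = "volume"
    if source_type != "volume" or bdm.get("destination_type") != "volume":
        raise InvalidType("not a volume mapping: %r" % (bdm,))
    return {
        "mount_device": bdm.get("device_name"),
        "volume_id": bdm.get("volume_id"),
    }


# Placeholder mapping shaped like the image-to-volume case.
bdm = {
    "source_type": "image",
    "destination_type": "volume",
    "device_name": "/dev/vda",
    "volume_id": "placeholder-volume-id",
}

try:
    transform_volume_bdm(bdm)
except InvalidType:
    print("rejected by the strict check")

print(transform_volume_bdm(bdm, treat_image_as_volume=True))
```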

Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "Work-around rather than fix" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Revision history for this message
Quenten Grasso (qgrasso-d) wrote :

I tried to implement the patch from Blane, but it doesn't seem to work.

When trying to launch an instance, it now fails when it tries to map the block device.

Any ideas?

Revision history for this message
Blane Bramble (blane) wrote :

Hi Quenten, what version are you installing, and what error do you get?

Revision history for this message
Quenten Grasso (qgrasso-d) wrote :

Hi Blane,

Sorry for my late reply. I'm using Ubuntu Havana with the latest from the "updates" repo.

Q

Revision history for this message
Mike Perez (thingee) wrote :

I'm also having the same problem. I'm using RBD as the backend for both Cinder and Glance. If I create an instance and select 'boot from image (create new volume)' in Horizon, soft/hard reboot fails later.

I get the following error from the compute manager:

2013-11-19 09:56:26.036 ERROR nova.openstack.common.rpc.amqp [req-b5d86ef0-f03d-4708-b94a-e94f30055b18 admin demo] Exception during message handling
2013-11-19 09:56:26.036 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2013-11-19 09:56:26.036 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/openstack/common/rpc/amqp.py", line 461, in _process_data
2013-11-19 09:56:26.036 TRACE nova.openstack.common.rpc.amqp **args)
2013-11-19 09:56:26.036 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2013-11-19 09:56:26.036 TRACE nova.openstack.common.rpc.amqp result = getattr(proxyobj, method)(ctxt, **kwargs)
2013-11-19 09:56:26.036 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 353, in decorated_function
2013-11-19 09:56:26.036 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2013-11-19 09:56:26.036 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/exception.py", line 90, in wrapped
2013-11-19 09:56:26.036 TRACE nova.openstack.common.rpc.amqp payload)
2013-11-19 09:56:26.036 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/exception.py", line 73, in wrapped
2013-11-19 09:56:26.036 TRACE nova.openstack.common.rpc.amqp return f(self, context, *args, **kw)
2013-11-19 09:56:26.036 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 243, in decorated_function
2013-11-19 09:56:26.036 TRACE nova.openstack.common.rpc.amqp pass
2013-11-19 09:56:26.036 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 229, in decorated_function
2013-11-19 09:56:26.036 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2013-11-19 09:56:26.036 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 294, in decorated_function
2013-11-19 09:56:26.036 TRACE nova.openstack.common.rpc.amqp function(self, context, *args, **kwargs)
2013-11-19 09:56:26.036 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 271, in decorated_function
2013-11-19 09:56:26.036 TRACE nova.openstack.common.rpc.amqp e, sys.exc_info())
2013-11-19 09:56:26.036 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 258, in decorated_function
2013-11-19 09:56:26.036 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2013-11-19 09:56:26.036 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 2121, in reboot_instance
2013-11-19 09:56:26.036 TRACE nova.openstack.common.rpc.amqp block_device_info = self._get_instance_volume_block_device_info(
2013-11-19 09:56:26.036 TRACE nova.openstack.common.rpc.amq...


Mike Perez (thingee)
Changed in nova:
status: New → Confirmed
status: Confirmed → In Progress
assignee: nobody → Mike Perez (thingee)
tags: added: havana-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/57406

Revision history for this message
Nikola Đipanov (ndipanov) wrote :

I've acked the fix on the master branch - we need this on stable/havana too. I can't seem to propose the bug, but will happily do the backport.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/57406
Committed: http://github.com/openstack/nova/commit/85d0ace169be513c30b09e13a35fa7d912f5b380
Submitter: Jenkins
Branch: master

commit 85d0ace169be513c30b09e13a35fa7d912f5b380
Author: Mike Perez <email address hidden>
Date: Wed Nov 20 01:22:38 2013 -0800

    Include image block device maps in info

    In cases for example soft/hard reboot, we were just getting volume and
    snapshot block device maps. For block devices with a source type image,
    Nova would fall back on a local disk location and fail.

    This change allows us to include block devices with a source of an
    image, so that we correctly get the right the location of the block
    device.

    Closes-Bug: #1245719
    Change-Id: I1f726293e85183d4b84c05b635a7c606a092992f
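
The behaviour change described in the commit message can be illustrated with a small filter. This is a sketch under assumptions, not the merged code: `volume_backed_maps` is a hypothetical function, and the mappings are placeholder dicts rather than real nova objects.

```python
def volume_backed_maps(bdms, include_image_source=True):
    """Select block device maps whose destination is a volume.

    With include_image_source=False only 'volume' and 'snapshot' sources
    are kept, mirroring the behaviour the commit message says was failing.
    """
    wanted = {"volume", "snapshot"}
    if include_image_source:
        wanted.add("image")
    return [
        bdm for bdm in bdms
        if bdm.get("destination_type") == "volume"
        and bdm.get("source_type") in wanted
    ]


# Placeholder mappings for illustration.
bdms = [
    {"source_type": "image", "destination_type": "volume", "device_name": "/dev/vda"},
    {"source_type": "volume", "destination_type": "volume", "device_name": "/dev/vdb"},
    {"source_type": "image", "destination_type": "local", "device_name": "/dev/vdc"},
]

print([b["device_name"] for b in volume_backed_maps(bdms, include_image_source=False)])
print([b["device_name"] for b in volume_backed_maps(bdms)])
```

With the image source excluded, the boot disk at `/dev/vda` drops out of the list, which corresponds to the fallback to a local disk location described in the commit message.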

Changed in nova:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/58102

Changed in nova:
milestone: none → icehouse-1
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/havana)

Reviewed: https://review.openstack.org/58102
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=dfde60b4bb05c45daedc2eaa98fcd772e647fba5
Submitter: Jenkins
Branch: stable/havana

commit dfde60b4bb05c45daedc2eaa98fcd772e647fba5
Author: Mike Perez <email address hidden>
Date: Wed Nov 20 01:22:38 2013 -0800

    Include image block device maps in info

    In cases for example soft/hard reboot, we were just getting volume and
    snapshot block device maps. For block devices with a source type image,
    Nova would fall back on a local disk location and fail.

    This change allows us to include block devices with a source of an
    image, so that we correctly get the right the location of the block
    device.

    Closes-Bug: #1245719
    Change-Id: I1f726293e85183d4b84c05b635a7c606a092992f
    (cherry picked from commit 85d0ace169be513c30b09e13a35fa7d912f5b380)

tags: added: in-stable-havana
Revision history for this message
Akihiro Motoki (amotoki) wrote :

Regarding Horizon: does this bug still exist after the nova patch was merged?
I don't have a Ceph environment now and cannot check it.
The Horizon team needs more information about the current status. Marking it "Incomplete".

Changed in horizon:
status: New → Incomplete
Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-1 → 2014.1
Revision history for this message
Gary W. Smith (gary-w-smith) wrote :

Marking the Horizon bug as Invalid due to lack of confirmation and the low probability that there is a bug in Horizon.

Changed in horizon:
status: Incomplete → Invalid