VMs that don't have config drive fail to start when force_config_drive=Always

Bug #1356534 reported by Mike Dorman
This bug affects 3 people
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: Medium
Assigned to: Zhenzan Zhou

Bug Description

When force_config_drive=Always is set, VMs that did not previously have a config drive created for them will fail to start.

In our particular use case, we had NOT been using config drive for a while, and then enabled it with force_config_drive=Always. Any VMs created before that time did not have a config drive created, and are now failing to start because nova-compute expects all VMs to have one.

2014-08-13 11:32:22.459 4711 ERROR nova.openstack.common.rpc.amqp [req-3d24e130-a682-415f-a6be-c3e9f3e97e39 02d7755lyxlnA 1be95d2dfcae4ab281004e22553c0d92] Exception during message handling
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py", line 461, in _process_data
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp **args)
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp result = getattr(proxyobj, method)(ctxt, **kwargs)
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 353, in decorated_function
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/exception.py", line 90, in wrapped
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp payload)
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/exception.py", line 73, in wrapped
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp return f(self, context, *args, **kw)
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 243, in decorated_function
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp pass
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 229, in decorated_function
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 294, in decorated_function
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp function(self, context, *args, **kwargs)
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 271, in decorated_function
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp e, sys.exc_info())
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 258, in decorated_function
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1853, in start_instance
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp self._power_on(context, instance)
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1840, in _power_on
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp block_device_info)
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 1969, in power_on
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp self._hard_reboot(context, instance, network_info, block_device_info)
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 1924, in _hard_reboot
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp block_device_info)
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 4380, in get_instance_disk_info
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp dk_size = int(os.path.getsize(path))
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp File "/usr/lib64/python2.6/genericpath.py", line 49, in getsize
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp return os.stat(filename).st_size
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp OSError: [Errno 2] No such file or directory: '/var/lib/nova/instances/8625b6cf-2ab2-4a05-8e15-ae834f250393/disk.config'
2014-08-13 11:32:22.459 4711 TRACE nova.openstack.common.rpc.amqp

The workaround for now is to create a 0-byte file (or an empty ISO image) named disk.config for each VM that didn't previously have one.
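
The workaround described above could be scripted roughly as follows. This is only a sketch, not part of nova: the function name create_placeholder_config_drives and its arguments are hypothetical, it assumes Python 3, and it assumes the instance directory layout seen in the traceback (<instances_dir>/<uuid>/disk.config).

```python
import os


def create_placeholder_config_drives(instances_dir, instance_uuids):
    """Create an empty disk.config for each listed instance that lacks one.

    Hypothetical helper: instances_dir would typically be something like
    /var/lib/nova/instances, and instance_uuids the VMs created before
    config drive was enabled. Returns the list of files that were created.
    """
    created = []
    for uuid in instance_uuids:
        instance_dir = os.path.join(instances_dir, uuid)
        if not os.path.isdir(instance_dir):
            # Unknown instance directory on this host; nothing to do.
            continue
        config_path = os.path.join(instance_dir, "disk.config")
        if os.path.exists(config_path):
            # Never overwrite a real config drive.
            continue
        # Opening in append mode creates a 0-byte file.
        with open(config_path, "a"):
            pass
        created.append(config_path)
    return created
```

The list of UUIDs is passed in explicitly so the script only touches VMs known to predate the setting. File ownership and permissions are not handled here and may need adjusting on a real compute host.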

But, in our opinion, nova should not fail to boot the VM in this situation. I see three valid behaviors:

1. Recognize that the config drive is missing, but ignore it and do not attach a config drive device to the VM.
2. Retroactively create the config drive disk for the VM if it is attempting to start and does not already have one.
3. Recreate the config drive disk for VMs on every boot. This has the added advantage of making the config drive dynamic.
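
All three behaviors start from the same step: noticing that a disk file is missing instead of letting os.path.getsize raise, as it does in the traceback. A minimal sketch of that check, assuming Python 3; split_existing_disks is a hypothetical name and this is not nova's actual code:

```python
import os


def split_existing_disks(disk_paths):
    """Return (sizes, missing) for a list of disk file paths.

    sizes maps each existing path to its size in bytes; missing lists
    the paths that do not exist. The caller can then skip the missing
    disks (option 1) or create them before starting the VM (options 2
    and 3).
    """
    sizes = {}
    missing = []
    for path in disk_paths:
        try:
            sizes[path] = os.path.getsize(path)
        except FileNotFoundError:
            # Same condition as the OSError [Errno 2] in the traceback.
            missing.append(path)
    return sizes, missing
```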

Mike Dorman (mdorman-m)
description: updated
Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

Mike,

Do you have more logs? (especially nova-cpu)

thanks,
dims

Revision history for this message
Mike Dorman (mdorman-m) wrote :

I am not familiar with the nova-cpu log. The above is from nova-compute. If you can tell me more specifically what you're looking for, I can probably pull it out. Thanks.

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

Mike,

Yep, that's the one. If you look at the snippet above, you'll see that req-3d24e130-a682-415f-a6be-c3e9f3e97e39 is the ID of the HTTP request that caused this problem. A quick way is to grep the logs for that request ID to see what happens before the stack trace occurs. That would help.
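
For anyone who prefers a script to grep, the same filtering could be sketched in Python. The function name lines_for_request is hypothetical, and the log file location is left to the caller since it varies by deployment:

```python
def lines_for_request(log_lines, request_id):
    """Return the log lines that mention the given request ID.

    log_lines can be any iterable of strings, for example an open
    nova-compute log file.
    """
    return [line for line in log_lines if request_id in line]
```

Note that, in the snippet above, only the ERROR line carries the request ID; the TRACE lines that follow it do not, so they would not be matched by this filter (or by a plain grep).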

-- dims

Revision history for this message
Mike Dorman (mdorman-m) wrote :

Here are some further details, and how to reproduce.

Steps to reproduce:

- Run nova-compute with force_config_drive off (note that =False doesn't work, see https://bugs.launchpad.net/nova/+bug/1244725 )

- Create a VM (which does not have a config drive). For reference, here is the libvirt.xml file generated:

<domain type="kvm">
  <uuid>8a0042a8-adb9-4cca-9b67-9e61708cbd04</uuid>
  <name>instance-000005f5</name>
  <memory>1048576</memory>
  <vcpu>1</vcpu>
  <sysinfo type="smbios">
    <system>
      <entry name="manufacturer">OpenStack Foundation</entry>
      <entry name="product">OpenStack Nova</entry>
      <entry name="version">2014.1.2</entry>
      <entry name="serial">44454c4c-5000-104a-8036-c3c04f484d31</entry>
      <entry name="uuid">8a0042a8-adb9-4cca-9b67-9e61708cbd04</entry>
    </system>
  </sysinfo>
  <os>
    <type>hvm</type>
    <boot dev="hd"/>
    <smbios mode="sysinfo"/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <clock offset="utc">
    <timer name="pit" tickpolicy="delay"/>
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="hpet" present="no"/>
  </clock>
  <cpu mode="host-model" match="exact"/>
  <devices>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2" cache="none"/>
      <source file="/var/lib/nova/instances/8a0042a8-adb9-4cca-9b67-9e61708cbd04/disk"/>
      <target bus="virtio" dev="vda"/>
    </disk>
    <interface type="bridge">
      <mac address="fa:16:3e:e2:d3:4f"/>
      <model type="virtio"/>
      <source bridge="qbre29bb8dd-e3"/>
      <target dev="tape29bb8dd-e3"/>
    </interface>
    <serial type="file">
      <source path="/var/lib/nova/instances/8a0042a8-adb9-4cca-9b67-9e61708cbd04/console.log"/>
    </serial>
    <serial type="pty"/>
    <input type="tablet" bus="usb"/>
    <graphics type="spice" autoport="yes" keymap="en-us" listen="10.224.52.4"/>
    <video>
      <model type="qxl"/>
    </video>
  </devices>
</domain>

- Now configure force_config_drive=True for nova-compute

- Stop the VM that was created earlier

- Start the VM. This will fail, and the libvirt.xml file now looks like the following (with the addition of the disk.config cdrom device):

<domain type="kvm">
  <uuid>8a0042a8-adb9-4cca-9b67-9e61708cbd04</uuid>
  <name>instance-000005f5</name>
  <memory>1048576</memory>
  <vcpu>1</vcpu>
  <sysinfo type="smbios">
    <system>
      <entry name="manufacturer">OpenStack Foundation</entry>
      <entry name="product">OpenStack Nova</entry>
      <entry name="version">2014.1.2</entry>
      <entry name="serial">44454c4c-5000-104a-8036-c3c04f484d31</entry>
      <entry name="uuid">8a0042a8-adb9-4cca-9b67-9e61708cbd04</entry>
    </system>
  </sysinfo>
  <os>
    <type>hvm</type>
    <boot dev="hd"/>
    <smbios mode="sysinfo"/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <clock offset="utc">
    <timer name="pit" tickpolicy="delay"/>
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="hpet" present="no"/>
  </clock>
  <cpu mode="host-model" match="exact"/>
  <devices>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2" cache="none"/>
      <source file="/var/lib/nova/instances/8a0042a...
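
To compare the two libvirt.xml files without reading them line by line, the disk sources can be listed programmatically. This is a rough sketch using Python's standard xml.etree.ElementTree; the function names disk_sources and references_config_drive are hypothetical, and the check simply assumes the config drive file is named disk.config as in the traceback.

```python
import xml.etree.ElementTree as ET


def disk_sources(libvirt_xml):
    """Return the source file of every file-backed <disk> in a domain XML string."""
    root = ET.fromstring(libvirt_xml)
    sources = []
    # <disk> elements live under <domain>/<devices>.
    for disk in root.findall("./devices/disk"):
        source = disk.find("source")
        if source is not None and source.get("file"):
            sources.append(source.get("file"))
    return sources


def references_config_drive(libvirt_xml):
    """Return True if any disk in the domain XML points at a disk.config file."""
    return any(path.endswith("/disk.config")
               for path in disk_sources(libvirt_xml))
```

Run against the first file above, disk_sources would return only the instance's root disk path, so references_config_drive would be False.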


Tracy Jones (tjones-i)
tags: added: compute
Revision history for this message
Sean Dague (sdague) wrote :

Seems solid. I agree that #3 is probably the right option.

Changed in nova:
status: New → Confirmed
importance: Undecided → Low
status: Confirmed → Triaged
tags: added: low-hanging-fruit
Changed in nova:
assignee: nobody → BUSSY Jean-Daniel (silversurfer972)
Revision history for this message
Michael Still (mikal) wrote :

I think this is a duplicate of 1241806, but am too mid-summit to think clearly about it. I don't feel recreating the config drive is the answer.

Revision history for this message
Mike Dorman (mdorman-m) wrote :

Sorry, just now getting back to this.

Agreed, this is a special case of 1241806.

I think the suggestion made in the other bug, that a reboot of an existing instance should just "give you what you had before", is valid.

The idea of rebuilding the config drive on every boot was attractive to us, because that way the config drive would contain the latest metadata, rather than just the metadata that existed when the instance first booted.

Either way, it's fine. Just making things not crash and burn when changing force_config_drive would be good.

wuhao (wuhao)
Changed in nova:
assignee: Jean-Daniel Bussy (silversurfer972) → wuhao (wuhao)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/166514

Changed in nova:
status: Triaged → In Progress
Changed in nova:
importance: Low → Medium
status: In Progress → Triaged
assignee: wuhao (wuhao) → nobody
tags: added: libvirt
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by John Garbutt (<email address hidden>) on branch: master
Review: https://review.openstack.org/166514
Reason: This patch seems to have stalled, lets abandon it.
Please restore the patch if that is no longer true.

Any questions, please catch me via email or on IRC johnthetubaguy

Changed in nova:
assignee: nobody → Zhenzan Zhou (zhenzan-zhou)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/222465

Changed in nova:
status: Triaged → In Progress
Revision history for this message
Zhenzan Zhou (zhenzan-zhou) wrote :

This should be a dup of bug 1241806

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Michael Still (<email address hidden>) on branch: master
Review: https://review.openstack.org/222465
Reason: Looks like this has stalled. Please restore the patch if it still needs review.
