VMs failed to hard reboot and went to ERROR after setting force_config_drive on compute nodes

Bug #1827492 reported by pandatt on 2019-05-03
This bug affects 2 people
Affects                    Importance  Assigned to
OpenStack Compute (nova)   Low         pandatt
Queens                     Low         Lee Yarwood
Rocky                      Low         Lee Yarwood
Stein                      Low         Lee Yarwood

Bug Description

Description
===========
Hi guys, I ran into a problem in our POC cluster.
At first, our cluster was configured to use ONLY the metadata service
to assist cloud-init, and none of the KVM instances were configured
with the `--config-drive` option.
Later, to verify how to inject metadata and network configuration in a
pure L2 tenant network where DHCP/L3 services are not allowed (and
therefore the metadata service is not available either), we configured
all the compute nodes with the `force_config_drive=true` option. Then I
noticed the problems:
a. powered-off instances cannot power on
b. active instances fail to hard reboot, get stuck powering-on and
   finally go to ERROR.
After inspecting the compute log, I believe this case is not taken into
account when nova-compute regenerates the virt XML, then defines and
starts instances.

Steps to reproduce
==================
1. boot a new instance without the `--config-drive` option on a certain compute host:
   # nova boot --flavor 512-1-1 --image cirros \
               --nic net-id=d9897882-607a-47ba-8b28-91043a5c2d58 POC
2. configure the compute host with the `force_config_drive` option and restart
   the nova-compute service or its service container (if kolla is used).
3. shutoff the instance `POC`
   # nova stop <UUID of instance `POC`>
4. start the instance `POC`
   # nova start <UUID of instance `POC`>
5. hard reboot the instance `POC`
   # nova reboot --hard <UUID of instance `POC`>

Expected result
===============
After step 4, instance `POC` will be active.
After step 5, instance `POC` will be active.

Actual result
=============
After step 4, instance `POC` stays shutoff.
After step 5, instance `POC` remains shutoff, gets stuck powering-on and
finally goes to ERROR.

Environment
===========
1. version: OpenStack Rocky + centOS7
2. hypervisor: Libvirt + KVM
3. storage: Ceph
4. networking: Neutron with Open vSwitch

Logs & Configs
==============
(1) nova.conf in compute node:
[DEFAULT]
...
config_drive_format=vfat
force_config_drive=true
flat_injected=true
...

(2) nova-compute.log in compute node:
2019-05-02 12:32:35.000 6 INFO nova.compute.manager [req-2a9948c2-0c51-4950-9a40-3d72d362ead8 9fef2099c3254226a96e48311d124131
380f701f5575430195526229dc143a1f - - -] [instance: 27730cc2-25ba-4ebc-a73d-f8d2e071ae92] Rebooting instance
2019-05-02 12:32:36.030 6 INFO nova.virt.libvirt.driver [-] [instance: 27730cc2-25ba-4ebc-a73d-f8d2e071ae92] Instance destroyed successfully.
2019-05-02 12:32:38.128 6 WARNING nova.virt.osinfo [req-2a9948c2-0c51-4950-9a40-3d72d362ead8 9fef2099c3254226a96e48311d124131 380f701f5575430195526229dc143a1f - - -] Cannot find OS information - Reason: (No configuration information found for operating system CirrOS-64)
2019-05-02 12:32:38.129 6 WARNING nova.virt.osinfo [req-2a9948c2-0c51-4950-9a40-3d72d362ead8 9fef2099c3254226a96e48311d124131 380f701f5575430195526229dc143a1f - - -] Cannot find OS information - Reason: (No configuration information found for operating system CirrOS-64)
2019-05-02 12:32:38.890 6 WARNING nova.virt.osinfo [req-2a9948c2-0c51-4950-9a40-3d72d362ead8 9fef2099c3254226a96e48311d124131 380f701f5575430195526229dc143a1f - - -] Cannot find OS information - Reason: (No configuration information found for operating system CirrOS-64)
2019-05-02 12:32:38.898 6 INFO nova.virt.libvirt.driver [req-2a9948c2-0c51-4950-9a40-3d72d362ead8 9fef2099c3254226a96e48311d124131 380f701f5575430195526229dc143a1f - - -] before plug_vifs
2019-05-02 12:32:38.938 6 INFO os_vif [req-2a9948c2-0c51-4950-9a40-3d72d362ead8 9fef2099c3254226a96e48311d124131 380f701f5575430195526229dc143a1f - - -] Successfully plugged vif VIFBridge(active=True,address=fa:16:3e:36:a2:36,bridge_name='qbr60e943ff-4e',has_traffic_filtering=True,id=60e943ff-4e12-491b-a39c-0eb6bdca7ebb,network=Network(ed4829d3-d1b8-40fa-ab11-c59772a0d68e),plugin='ovs',port_profile=VIFPortProfileBase,preserve_on_delete=False,vif_name='tap60e943ff-4e')
2019-05-02 12:32:38.939 6 INFO nova.virt.libvirt.driver [req-2a9948c2-0c51-4950-9a40-3d72d362ead8 9fef2099c3254226a96e48311d124131 380f701f5575430195526229dc143a1f - - -] after plug_vifs
2019-05-02 12:32:38.939 6 INFO nova.virt.libvirt.driver [req-2a9948c2-0c51-4950-9a40-3d72d362ead8 9fef2099c3254226a96e48311d124131 380f701f5575430195526229dc143a1f - - -] after setup_basic_filtering
2019-05-02 12:32:38.940 6 INFO nova.virt.libvirt.driver [req-2a9948c2-0c51-4950-9a40-3d72d362ead8 9fef2099c3254226a96e48311d124131 380f701f5575430195526229dc143a1f - - -] after prepare_instance_filter
2019-05-02 12:32:38.940 6 INFO nova.virt.libvirt.driver [req-2a9948c2-0c51-4950-9a40-3d72d362ead8 9fef2099c3254226a96e48311d124131 380f701f5575430195526229dc143a1f - - -] before _create_domain
2019-05-02 12:32:40.772 6 ERROR nova.virt.libvirt.guest [req-2a9948c2-0c51-4950-9a40-3d72d362ead8 9fef2099c3254226a96e48311d124131 380f701f5575430195526229dc143a1f - - -] Error launching a defined domain with XML: <domain type='kvm'>
  <name>instance-000000c1</name>
  <uuid>27730cc2-25ba-4ebc-a73d-f8d2e071ae92</uuid>
  <metadata>
    <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
      <nova:package version="0.0.1"/>
      <nova:name>jingyu</nova:name>
      <nova:creationTime>2019-05-02 04:32:38</nova:creationTime>
      <nova:flavor name="2-2-1">
        <nova:memory>2048</nova:memory>
        <nova:disk>1</nova:disk>
        <nova:swap>0</nova:swap>
        <nova:ephemeral>0</nova:ephemeral>
        <nova:vcpus>2</nova:vcpus>
      </nova:flavor>
      <nova:owner>
        <nova:user uuid="9fef2099c3254226a96e48311d124131">admin</nova:user>
        <nova:project uuid="380f701f5575430195526229dc143a1f">admin</nova:project>
      </nova:owner>
      <nova:root type="image" uuid="da4e5e0b-e421-434c-a970-7b2ac680b3b5"/>
    </nova:instance>
  </metadata>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <vcpu placement='static' current='2'>4</vcpu>
  <cputune>
    <shares>2048</shares>
  </cputune>
  <sysinfo type='smbios'>
    <system>
      <entry name='manufacturer'>OpenStack Foundation</entry>
      <entry name='product'>OpenStack Nova</entry>
      <entry name='version'>0.0.1</entry>
      <entry name='serial'>d8127418-14a7-50e1-9e31-6f9fe4de8ca2</entry>
      <entry name='uuid'>27730cc2-25ba-4ebc-a73d-f8d2e071ae92</entry>
      <entry name='family'>Virtual Machine</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.3.0'>hvm</type>
    <boot dev='hd'/>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-model'>
    <model fallback='allow'/>
    <topology sockets='2' cores='1' threads='2'/>
  </cpu>
  <clock offset='utc'>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <auth username='admin'>
        <secret type='ceph' uuid='acf3fb4f-94b9-45b8-bcd4-4a6a7fac1f4e'/>
      </auth>
      <source protocol='rbd' name='vms/27730cc2-25ba-4ebc-a73d-f8d2e071ae92_disk'>
        <host name='100.2.29.231' port='6789'/>
      </source>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <auth username='admin'>
        <secret type='ceph' uuid='acf3fb4f-94b9-45b8-bcd4-4a6a7fac1f4e'/>
      </auth>
      <source protocol='rbd' name='vms/27730cc2-25ba-4ebc-a73d-f8d2e071ae92_disk.config'>
        <host name='100.2.29.231' port='6789'/>
      </source>
      <target dev='vdb' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <interface type='bridge'>
      <mac address='fa:16:3e:36:a2:36'/>
      <source bridge='qbr60e943ff-4e'/>
      <target dev='tap60e943ff-4e'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='file'>
      <source path='/var/lib/nova/instances/27730cc2-25ba-4ebc-a73d-f8d2e071ae92/console.log'/>
      <target port='0'/>
    </serial>
    <serial type='pty'>
      <target port='1'/>
    </serial>
    <console type='file'>
      <source path='/var/lib/nova/instances/27730cc2-25ba-4ebc-a73d-f8d2e071ae92/console.log'/>
      <target type='serial' port='0'/>
    </console>
    <input type='tablet' bus='usb'>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' listen='100.2.96.159' keymap='en-us'>
      <listen type='address' address='100.2.96.159'/>
    </graphics>
    <video>
      <model type='cirrus' vram='16384' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <stats period='10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
</domain>

2019-05-02 12:32:40.773 6 ERROR nova.virt.libvirt.driver [req-2a9948c2-0c51-4950-9a40-3d72d362ead8 9fef2099c3254226a96e48311d124131 380f701f5575430195526229dc143a1f - - -] [instance: 27730cc2-25ba-4ebc-a73d-f8d2e071ae92] Failed to start libvirt guest
2019-05-02 12:32:41.058 6 INFO os_vif [req-2a9948c2-0c51-4950-9a40-3d72d362ead8 9fef2099c3254226a96e48311d124131 380f701f5575430195526229dc143a1f - - -] Successfully unplugged vif VIFBridge(active=True,address=fa:16:3e:36:a2:36,bridge_name='qbr60e943ff-4e',has_traffic_filtering=True,id=60e943ff-4e12-491b-a39c-0eb6bdca7ebb,network=Network(ed4829d3-d1b8-40fa-ab11-c59772a0d68e),plugin='ovs',port_profile=VIFPortProfileBase,preserve_on_delete=False,vif_name='tap60e943ff-4e')
2019-05-02 12:32:41.064 6 ERROR nova.compute.manager [req-2a9948c2-0c51-4950-9a40-3d72d362ead8 9fef2099c3254226a96e48311d124131 380f701f5575430195526229dc143a1f - - -] [instance: 27730cc2-25ba-4ebc-a73d-f8d2e071ae92] Cannot reboot instance: internal error: qemu unexpectedly closed the monitor: 2019-05-02T04:32:40.560702Z qemu-kvm: -drive file=rbd:vms/27730cc2-25ba-4ebc-a73d-f8d2e071ae92_disk.config:id=admin:auth_supported=cephx\;none:mon_host=100.2.29.231\:6789,file.password-secret=virtio-disk1-secret0,format=raw,if=none,id=drive-virtio-disk1,cache=none: error reading header from 27730cc2-25ba-4ebc-a73d-f8d2e071ae92_disk.config: No such file or directory
2019-05-02 12:32:41.865 6 INFO nova.compute.manager [req-2a9948c2-0c51-4950-9a40-3d72d362ead8 9fef2099c3254226a96e48311d124131 380f701f5575430195526229dc143a1f - - -] [instance: 27730cc2-25ba-4ebc-a73d-f8d2e071ae92] Successfully reverted task state from reboot_started_hard on failure for instance.
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server [req-2a9948c2-0c51-4950-9a40-3d72d362ead8 9fef2099c3254226a96e48311d124131 380f701f5575430195526229dc143a1f - - -] Exception during message handling
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 155, in _process_incoming
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 222, in dispatch
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 192, in _do_dispatch
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/exception_wrapper.py", line 75, in wrapped
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server function_name, call_dict, binary)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server self.force_reraise()
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/exception_wrapper.py", line 66, in wrapped
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server return f(self, context, *args, **kw)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/compute/manager.py", line 189, in decorated_function
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server LOG.warning(msg, e, instance=instance)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server self.force_reraise()
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/compute/manager.py", line 158, in decorated_function
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/compute/utils.py", line 686, in decorated_function
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/compute/manager.py", line 217, in decorated_function
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server kwargs['instance'], e, sys.exc_info())
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server self.force_reraise()
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/compute/manager.py", line 205, in decorated_function
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/compute/manager.py", line 3048, in reboot_instance
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server self._set_instance_obj_error_state(context, instance)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server self.force_reraise()
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/compute/manager.py", line 3029, in reboot_instance
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server bad_volumes_callback=bad_volumes_callback)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2318, in reboot
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server block_device_info)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2422, in _hard_reboot
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server vifs_already_plugged=True)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5151, in _create_domain_and_network
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server destroy_disks_on_failure)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server self.force_reraise()
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5122, in _create_domain_and_network
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server post_xml_callback=post_xml_callback)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5035, in _create_domain
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server guest.launch(pause=pause)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 145, in launch
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server self._encoded_xml, errors='ignore')
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server self.force_reraise()
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 140, in launch
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server return self._domain.createWithFlags(flags)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server result = proxy_call(self._autowrap, f, *args, **kwargs)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server rv = execute(f, *args, **kwargs)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server six.reraise(c, e, tb)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server rv = meth(*args, **kwargs)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/libvirt.py", line 1065, in createWithFlags
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.rpc.server libvirtError: internal error: qemu unexpectedly closed the monitor: 2019-05-02T04:32:40.560702Z qemu-kvm: -drive file=rbd:vms/27730cc2-25ba-4ebc-a73d-f8d2e071ae92_disk.config:id=admin:auth_supported=cephx\;none:mon_host=100.2.29.231\:6789,file.password-secret=virtio-disk1-secret0,format=raw,if=none,id=drive-virtio-disk1,cache=none: error reading header from 27730cc2-25ba-4ebc-a73d-f8d2e071ae92_disk.config: No such file or directory

pandatt (pandatt) on 2019-05-03
Changed in nova:
status: New → Confirmed
assignee: nobody → pandatt (pandatt)
pandatt (pandatt) on 2019-05-08
Changed in nova:
status: Confirmed → In Progress
pandatt (pandatt) wrote :

I found the root cause: the `required_by` function in the `nova.virt.configdrive` module does not take the following case into consideration:
"""
def required_by(instance):
    image_prop = instance.image_meta.properties.get(
        "img_config_drive",
        fields.ConfigDrivePolicy.OPTIONAL)

    return (instance.config_drive or
            CONF.force_config_drive or
            image_prop == fields.ConfigDrivePolicy.MANDATORY
            )
"""
When instances that were initially booted without the `--config-drive` option and without the host-level `force_config_drive=True` option are hard rebooted on a host where `force_config_drive=True` has been newly configured, and the virt XML therefore needs to be regenerated, they should not be forced to have config drives. Only newly-built instances should be.

pandatt (pandatt) wrote :

We can distinguish existing VMs from newly-built VMs by checking `instance.launched_at`.
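The proposed fix can be sketched as follows. This is a minimal, simplified model of `required_by`, not nova's actual code: `FakeInstance` and the module-level `FORCE_CONFIG_DRIVE` flag are stand-ins for nova's `Instance` object and `CONF.force_config_drive`. The key change is that the host-level override only applies when `launched_at` is unset, i.e. the instance has never launched:

```python
class FakeInstance:
    """Simplified stand-in for nova's Instance object."""
    def __init__(self, config_drive=False, launched_at=None,
                 img_config_drive="optional"):
        self.config_drive = config_drive          # user requested a drive
        self.launched_at = launched_at            # None until first launch
        self.img_config_drive = img_config_drive  # image property

FORCE_CONFIG_DRIVE = True  # stand-in for CONF.force_config_drive

def required_by(instance):
    # Image-mandated or user-requested config drives are always honored.
    if instance.img_config_drive == "mandatory":
        return True
    if instance.config_drive:
        return True
    # Apply the host-level override only to instances that have not
    # launched yet; existing VMs keep their original (no-drive) setup.
    return FORCE_CONFIG_DRIVE and instance.launched_at is None

# An existing VM (already launched without a drive) is skipped:
assert required_by(FakeInstance(launched_at="2019-05-02")) is False
# A newly-built VM gets the forced drive:
assert required_by(FakeInstance(launched_at=None)) is True
```

With this check, a hard reboot of a pre-existing VM regenerates XML without the `_disk.config` RBD device, avoiding the "No such file or directory" failure from qemu.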

Matt Riedemann (mriedem) on 2019-05-17
tags: added: libvirt
Changed in nova:
assignee: pandatt (pandatt) → Matt Riedemann (mriedem)
Matt Riedemann (mriedem) on 2019-05-22
Changed in nova:
assignee: Matt Riedemann (mriedem) → pandatt (pandatt)
importance: Undecided → Medium
importance: Medium → Low

Reviewed: https://review.opendev.org/659703
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2af89cfea0531ab75b4f5765bb220073f42662d0
Submitter: Zuul
Branch: master

commit 2af89cfea0531ab75b4f5765bb220073f42662d0
Author: pandatt <guojy8993@163.com>
Date: Thu May 16 02:50:57 2019 +0800

    Skip existing VMs when hosts apply force_config_drive

    When hosts apply config `CONF.force_config_drive=True`, the existing
    VMs shouldn't be enforced to must have config drive device. For they
    may have been cloud-inited via metadata service, and may not need and
    have any config drive device ever. In contrast, the newly being-built
    ones should. Instance attr `launched_at` serves as an apparent flag
    to distinguish the two kinds of VMs.

    When hard reboots happend, existing VMs skip config drive enforcement,
    and therefore avoid hitting 'No such file or directory (config drive
    device)' error.

    Change-Id: I0558ece92f8657c2f6294e07965c619eb7c8dfcf
    Closes-Bug: #1827492

Changed in nova:
status: In Progress → Fix Released
sean mooney (sean-k-mooney) wrote :

There is a bug in the patch that was merged on master, which I'm going to fix once I file the new bug.

Basically these two lines need to be swapped:
https://github.com/openstack/nova/blob/86524773b8cd3a52c98409c7ca183b4e1873e2b8/nova/compute/manager.py#L1757-L1758

or required_by will always be false if a config drive is not requested via the spawn API.
https://review.opendev.org/#/c/659703/8/nova/virt/configdrive.py@169

Before https://review.opendev.org/#/c/659703/8, required_by did not depend on instance.launched_at; now it does.

The effect of this bug is that if you boot a VM with CONF.force_config_drive=True and then hard reboot the VM, it will no longer have access to the config drive. If the deployment does not have the metadata service, or uses the config drive for file injection, this is a regression and possible data loss.

For clouds without the metadata service deployed, like Rackspace, this would mean that VMs no longer have access to vendordata or other metadata, like device role tagging, after their first boot, which is incorrect.
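The ordering regression described above can be sketched as follows. This is a simplified model, not nova's actual code: `Instance`, `update_instance`, and the `FORCE` flag stand in for nova's `Instance` object, `nova.virt.configdrive.update_instance`, and `CONF.force_config_drive`. Once `required_by` depends on `launched_at` being unset, the spawn path must persist the config-drive flag *before* stamping `launched_at`, or the forced drive is never made sticky:

```python
import datetime

class Instance:
    """Simplified stand-in for nova's Instance object."""
    config_drive = ""   # persisted flag; sticky once set
    launched_at = None  # set when the VM first launches

FORCE = True  # stand-in for CONF.force_config_drive

def required_by(instance):
    # Post-fix behaviour: the host-level force flag only applies
    # to instances that have never launched.
    return bool(instance.config_drive) or (
        FORCE and instance.launched_at is None)

def update_instance(instance):
    # Persist the forced drive on the instance record so later
    # hard reboots keep rebuilding it (the "sticky" behaviour).
    if required_by(instance):
        instance.config_drive = "True"

# Buggy order: launched_at stamped first, so required_by() is already
# False and the drive is never persisted; a hard reboot drops it.
buggy = Instance()
buggy.launched_at = datetime.datetime.now()
update_instance(buggy)
assert buggy.config_drive == ""

# Fixed order (the swap sean describes): persist first, then stamp.
fixed = Instance()
update_instance(fixed)
fixed.launched_at = datetime.datetime.now()
assert fixed.config_drive == "True"
```

This matches the reorder later committed in change I9194423f: call `configdrive.update_instance` before setting `instance.launched_at`.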

Matt Riedemann (mriedem) wrote :

Note this is essentially a duplicate of bug 1241806.

Reviewed: https://review.opendev.org/669738
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=641feb62a33e211a3ed1f9a2309ee16d827f5fe2
Submitter: Zuul
Branch: master

commit 641feb62a33e211a3ed1f9a2309ee16d827f5fe2
Author: Sean Mooney <email address hidden>
Date: Mon Jul 8 18:35:11 2019 +0000

    make config drives sticky bug 1835822

    This change reorders the call in
    _update_instance_after_spawn so that we call
    configdrive.update_instance before we set
    instance.launched_at. This ensures that if the vm
    is booted with a config drive because the host had
    force_config_drive=true the instance will keep its
    config drive across reboots.

    This change fixes a regression introduced as part
    of fixing bug #1827492 which addressed failing
    to boot vms after changing force_config_drive=false
    to force_config_drive=true.

    Change-Id: I9194423f5f95e9799bd891548e24756131d65e76
    Related-Bug: #1827492
    Closes-Bug: #1835822

This issue was fixed in the openstack/nova 20.0.0.0rc1 release candidate.

Change abandoned by Lee Yarwood (<email address hidden>) on branch: stable/stein
Review: https://review.opendev.org/660914

Change abandoned by Lee Yarwood (<email address hidden>) on branch: stable/rocky
Review: https://review.opendev.org/660915

Change abandoned by Lee Yarwood (<email address hidden>) on branch: stable/queens
Review: https://review.opendev.org/660917
