VMs fail to hard reboot and go to ERROR after setting force_config_drive on compute nodes
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Compute (nova) | Fix Released | Low | pandatt | |
| Queens | In Progress | Low | Lee Yarwood | |
| Rocky | In Progress | Low | Lee Yarwood | |
| Stein | In Progress | Low | Lee Yarwood | |
Bug Description
Description
===========
Hi guys, I ran into a problem in our POC cluster.
Initially, the cluster was configured to use ONLY the metadata service
to assist cloud-init, and none of the KVM instances were created with
the `--config-drive` option.
Later, in order to verify how to inject metadata and network
configuration in a pure L2 tenant network where DHCP/L3 services are
not allowed (and the metadata service is therefore not available
either), we configured all compute nodes with `force_config_drive=true`.
Then I noticed the following problems:
a. powered-off instances cannot be powered on
b. active instances fail to hard reboot: they get stuck powering on and
finally go to ERROR.
After inspecting the compute log, I believe this case is not taken into
account when nova-compute re-generates the virt XML, then defines and
starts instances.
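The failure mechanism described above can be modelled roughly as follows. This is an illustrative sketch, not nova code; the disk names are hypothetical placeholders:

```python
# Illustrative model (not nova code) of why regenerating the domain XML
# after enabling force_config_drive breaks existing instances: the new
# XML references a config-drive disk that was never created for them.

def build_disk_list(existing_disks, config_drive_required):
    """Return the disks the regenerated domain XML would reference."""
    disks = list(existing_disks)
    if config_drive_required:
        # For an instance built before force_config_drive was set, no
        # such disk exists on the storage backend, so the domain
        # definition references a missing disk and fails to start.
        disks.append("disk.config")
    return disks

# Instance originally built with only a root disk:
assert build_disk_list(["disk"], config_drive_required=False) == ["disk"]
# After the host-level option is flipped, the regenerated XML gains a
# config-drive disk that does not exist for this instance:
assert build_disk_list(["disk"], config_drive_required=True) == ["disk", "disk.config"]
```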
Steps to reproduce
==================
1. boot a new instance named `POC`, without the `--config-drive` option, on a certain compute host:
# nova boot --flavor 512-1-1 --image cirros \
2. configure that compute host with `force_config_drive=true` and restart
the nova-compute service or service container (if kolla is used).
3. shut off the instance `POC`
# nova stop <UUID of instance `POC`>
4. start the instance `POC`
# nova start <UUID of instance `POC`>
5. hard reboot the instance `POC`
# nova reboot --hard <UUID of instance `POC`>
Expected result
===============
After step 4, instance `POC` should become ACTIVE.
After step 5, instance `POC` should become ACTIVE.
Actual result
=============
After step 4, instance `POC` remains SHUTOFF.
After step 5, instance `POC` is still not running: it gets stuck
powering on and finally goes to ERROR.
Environment
===========
1. version: OpenStack Rocky + CentOS 7
2. hypervisor: libvirt + KVM
3. storage: Ceph
4. networking: Neutron with Open vSwitch
Logs & Configs
==============
(1) nova.conf in compute node:
[DEFAULT]
...
config_
force_config_drive=true
flat_injected=true
...
(2) nova-compute.log in compute node:
2019-05-02 12:32:35.000 6 INFO nova.compute.
380f701f5575430
2019-05-02 12:32:36.030 6 INFO nova.virt.
2019-05-02 12:32:38.128 6 WARNING nova.virt.osinfo [req-2a9948c2-
2019-05-02 12:32:38.129 6 WARNING nova.virt.osinfo [req-2a9948c2-
2019-05-02 12:32:38.890 6 WARNING nova.virt.osinfo [req-2a9948c2-
2019-05-02 12:32:38.898 6 INFO nova.virt.
2019-05-02 12:32:38.938 6 INFO os_vif [req-2a9948c2-
2019-05-02 12:32:38.939 6 INFO nova.virt.
2019-05-02 12:32:38.939 6 INFO nova.virt.
2019-05-02 12:32:38.940 6 INFO nova.virt.
2019-05-02 12:32:38.940 6 INFO nova.virt.
2019-05-02 12:32:40.772 6 ERROR nova.virt.
<name>
<uuid>
<metadata>
<nova:instance xmlns:nova="http://
<nova:package version="0.0.1"/>
<
<
<nova:flavor name="2-2-1">
<
<nova:owner>
<nova:user uuid="9fef2099c
</nova:owner>
<nova:root type="image" uuid="da4e5e0b-
</nova:
</metadata>
<memory unit='KiB'
<currentMemory unit='KiB'
<vcpu placement='static' current=
<cputune>
<shares>
</cputune>
<sysinfo type='smbios'>
<system>
<entry name='manufactu
<entry name='product'
<entry name='version'
<entry name='serial'
<entry name='uuid'
<entry name='family'
</system>
</sysinfo>
<os>
<type arch='x86_64' machine=
<boot dev='hd'/>
<smbios mode='sysinfo'/>
</os>
<features>
<acpi/>
<apic/>
</features>
<cpu mode='host-model'>
<model fallback='allow'/>
<topology sockets='2' cores='1' threads='2'/>
</cpu>
<clock offset='utc'>
<timer name='pit' tickpolicy=
<timer name='rtc' tickpolicy=
<timer name='hpet' present='no'/>
</clock>
<on_poweroff>
<on_reboot>
<on_crash>
<devices>
<emulator>
<disk type='network' device='disk'>
<driver name='qemu' type='raw' cache='none'/>
<auth username='admin'>
<secret type='ceph' uuid='acf3fb4f-
</auth>
<source protocol='rbd' name='vms/
<host name='100.2.29.231' port='6789'/>
</source>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>
<disk type='network' device='disk'>
<driver name='qemu' type='raw' cache='none'/>
<auth username='admin'>
<secret type='ceph' uuid='acf3fb4f-
</auth>
<source protocol='rbd' name='vms/
<host name='100.2.29.231' port='6789'/>
</source>
<target dev='vdb' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</disk>
<controller type='usb' index='0'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
</controller>
<controller type='pci' index='0' model='pci-root'/>
<interface type='bridge'>
<mac address=
<source bridge=
<target dev='tap60e943f
<model type='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
<serial type='file'>
<source path='/
<target port='0'/>
</serial>
<serial type='pty'>
<target port='1'/>
</serial>
<console type='file'>
<source path='/
<target type='serial' port='0'/>
</console>
<input type='tablet' bus='usb'>
<address type='usb' bus='0' port='1'/>
</input>
<input type='mouse' bus='ps2'/>
<input type='keyboard' bus='ps2'/>
<graphics type='vnc' port='-1' autoport='yes' listen=
<listen type='address' address=
</graphics>
<video>
<model type='cirrus' vram='16384' heads='1' primary='yes'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</video>
<memballoon model='virtio'>
<stats period='10'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</memballoon>
</devices>
</domain>
2019-05-02 12:32:40.773 6 ERROR nova.virt.
2019-05-02 12:32:41.058 6 INFO os_vif [req-2a9948c2-
2019-05-02 12:32:41.064 6 ERROR nova.compute.
2019-05-02 12:32:41.865 6 INFO nova.compute.
2019-05-02 12:32:41.890 6 ERROR oslo_messaging.
Changed in nova:
status: New → Confirmed
assignee: nobody → pandatt (pandatt)

Changed in nova:
status: Confirmed → In Progress
tags: added: libvirt

Changed in nova:
assignee: pandatt (pandatt) → Matt Riedemann (mriedem)

Changed in nova:
assignee: Matt Riedemann (mriedem) → pandatt (pandatt)
importance: Undecided → Medium
importance: Medium → Low
I found the root cause: the `required_by` function in the
`nova.virt.configdrive` module does not take the case described below
into consideration:
"""
def required_by(instance):
    image_prop = instance.image_meta.properties.get(
        "img_config_drive",
        fields.ConfigDrivePolicy.OPTIONAL)

    return (instance.config_drive or
            CONF.force_config_drive or
            image_prop == fields.ConfigDrivePolicy.MANDATORY)
"""
When instances that were initially booted without the `--config-drive`
option are hard rebooted on a host where `force_config_drive=true` has
since been configured (so the virt XML has to be re-generated), they
should not be forced to have config drives. Only newly built instances
should be.
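The behaviour argued for above can be sketched as follows: apply the host-wide `force_config_drive` override only when an instance is first built, not when an existing instance is hard rebooted. This is a minimal, self-contained illustration; the `is_first_boot` parameter and the stand-in policy constants are hypothetical, not nova's actual signature or the merged fix:

```python
# Hypothetical sketch of the proposed behaviour (not nova's actual code).
# MANDATORY/OPTIONAL stand in for fields.ConfigDrivePolicy values.
MANDATORY = "mandatory"
OPTIONAL = "optional"

def config_drive_required(instance_requested, image_policy,
                          force_config_drive, is_first_boot):
    """Return True if a config drive should be attached.

    Per the bug report, the host-level force_config_drive option should
    only apply while an instance is first built; an existing instance
    booted without a config drive should keep working across
    stop/start and hard reboot.
    """
    if instance_requested or image_policy == MANDATORY:
        return True
    # Apply the host-wide override only to newly built instances.
    return force_config_drive and is_first_boot

# Instance booted before force_config_drive=true was set: no config
# drive is forced on hard reboot of the existing instance.
assert config_drive_required(False, OPTIONAL, True, is_first_boot=False) is False
# A freshly built instance on the same host does get a config drive.
assert config_drive_required(False, OPTIONAL, True, is_first_boot=True) is True
```

The design point is that enforcement becomes a property of how the instance was built rather than of the host's current configuration, which matches the reporter's expectation that only newly built instances gain config drives.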