instances libvirt config not present post-upgrade

Bug #1336115 reported by Adam Gandelman
This bug affects 1 person
Affects   Status         Importance   Assigned to      Milestone
tripleo   Fix Released   High         Adam Gandelman

Bug Description

After an image-based overcloud upgrade, instances that were ACTIVE before the rebuild are no longer running, though Nova still reports them as ACTIVE:

| 75c5d266-1a5d-449c-83cf-0ebfd68613ed | demo | ACTIVE | - | NOSTATE | default-net=192.168.10.2, 10.22.167.112

There is no sign of the missing instance on the compute host that was running it. Restarting the nova-compute service logs:

2014-07-01 00:56:46.257 4034 INFO oslo.messaging._drivers.impl_rabbit [req-47e724cc-774f-4373-b062-47b55cfb8c59 ] Connected to AMQP server on 10.22.167.73:5672
2014-07-01 00:56:46.278 4034 INFO oslo.messaging._drivers.impl_rabbit [req-47e724cc-774f-4373-b062-47b55cfb8c59 ] Connected to AMQP server on 10.22.167.73:5672
2014-07-01 00:56:49.851 4034 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 10.22.167.73:5672
2014-07-01 00:56:49.956 4034 WARNING nova.compute.manager [req-ea2b98ce-6d59-440c-8cbc-111f4344c486 None] Found 1 in the database and 0 on the hypervisor.
2014-07-01 00:56:50.174 4034 WARNING nova.compute.manager [req-ea2b98ce-6d59-440c-8cbc-111f4344c486 None] [instance: 75c5d266-1a5d-449c-83cf-0ebfd68613ed] Instance is unexpectedly not found. Ignore.

As expected, the instance is not defined in libvirt:

root@overcloud-novacompute2-zd67ege6kr6v:/mnt/state/var/lib/nova/instances# sudo virsh list --all
 Id Name State
----------------------------------------------------

However, its directory under Nova's state path is still present:

root@overcloud-novacompute2-zd67ege6kr6v:~# find /mnt/state/var/lib/nova//instances/75c5d266-1a5d-449c-83cf-0ebfd68613ed/

/mnt/state/var/lib/nova//instances/75c5d266-1a5d-449c-83cf-0ebfd68613ed/
/mnt/state/var/lib/nova//instances/75c5d266-1a5d-449c-83cf-0ebfd68613ed/console.log
/mnt/state/var/lib/nova//instances/75c5d266-1a5d-449c-83cf-0ebfd68613ed/libvirt.xml
/mnt/state/var/lib/nova//instances/75c5d266-1a5d-449c-83cf-0ebfd68613ed/disk
/mnt/state/var/lib/nova//instances/75c5d266-1a5d-449c-83cf-0ebfd68613ed/disk.info

I believe we need to ensure libvirt's state directory ends up on the persistent partition with the rest.
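
A quick way to spot instances left in this state is to compare the UUID-named directories under Nova's instances path with the domains libvirt still knows about. The Python sketch below assumes that Nova's libvirt driver gives each domain the same UUID as the Nova instance, and that `virsh list --all --uuid` is available on the compute host; the helper names are hypothetical and not part of Nova.

```python
import os
import re
import subprocess

# Matches lower-case UUIDs such as 75c5d266-1a5d-449c-83cf-0ebfd68613ed.
UUID_RE = re.compile(r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$")


def instance_dirs(instances_path):
    """Return the UUID-named subdirectories of Nova's instances path.

    Entries that are not instances (for example _base or locks) are skipped.
    """
    return {
        name
        for name in os.listdir(instances_path)
        if UUID_RE.match(name) and os.path.isdir(os.path.join(instances_path, name))
    }


def parse_virsh_uuids(output):
    """Parse `virsh list --all --uuid` output: one UUID per non-empty line."""
    return {line.strip() for line in output.splitlines() if line.strip()}


def defined_domain_uuids():
    """Ask libvirt for the UUIDs of all defined domains, running or shut off."""
    out = subprocess.run(
        ["virsh", "list", "--all", "--uuid"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_virsh_uuids(out)


def orphaned_instances(instances_path, domain_uuids):
    """Instance directories that have no matching libvirt domain definition."""
    return sorted(instance_dirs(instances_path) - set(domain_uuids))
```

On the compute host above this would be called as `orphaned_instances("/mnt/state/var/lib/nova/instances", defined_domain_uuids())`. Keeping the virsh call separate from the comparison lets the set logic be checked without a running libvirtd.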

Adam Gandelman (gandelman-a) wrote :

We also need to move /etc/libvirt/ to the stateful partition. Removing a libvirt domain out of band from Nova leaves the instance ACTIVE with a power state of NOSTATE:

https://git.openstack.org/cgit/openstack/nova/tree/nova/compute/manager.py#n5443

libvirt stores its persistent domain, network, and other object definitions as XML files under /etc/libvirt/.
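
To see which domains libvirt has persisted, the top-level XML files in /etc/libvirt/qemu can be read directly. This read-only Python sketch assumes the system QEMU driver's default configuration directory and that each domain file has a `<domain>` root with `<name>` and `<uuid>` children; `persisted_domains` is a hypothetical helper. libvirt expects changes to go through virsh or its API, so the files should not be edited by hand.

```python
import glob
import os
import xml.etree.ElementTree as ET


def persisted_domains(qemu_dir="/etc/libvirt/qemu"):
    """Map domain UUID -> domain name for each domain XML file in qemu_dir.

    Only top-level *.xml files are read. The networks/ subdirectory holds
    network definitions rather than domains, so it is not scanned, and any
    top-level file whose root element is not <domain> is ignored.
    """
    domains = {}
    for path in sorted(glob.glob(os.path.join(qemu_dir, "*.xml"))):
        root = ET.parse(path).getroot()
        if root.tag != "domain":
            continue
        domains[root.findtext("uuid")] = root.findtext("name")
    return domains
```

Recording this mapping before an upgrade and comparing it afterwards shows whether the domain definitions survived the new image.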

Package installation populates /etc/libvirt with files that must be moved to the stateful partition before the service starts. Note the default network entries created at install time:

$ tree /etc/libvirt/
/etc/libvirt/
├── hooks
├── libvirt.conf
├── libvirtd.conf
├── lxc.conf
├── nwfilter
│   ├── allow-arp.xml
│   ├── allow-dhcp-server.xml
│   ├── allow-dhcp.xml
│   ├── allow-incoming-ipv4.xml
│   ├── allow-ipv4.xml
│   ├── clean-traffic.xml
│   ├── no-arp-ip-spoofing.xml
│   ├── no-arp-mac-spoofing.xml
│   ├── no-arp-spoofing.xml
│   ├── no-ip-multicast.xml
│   ├── no-ip-spoofing.xml
│   ├── no-mac-broadcast.xml
│   ├── no-mac-spoofing.xml
│   ├── no-other-l2-traffic.xml
│   ├── no-other-rarp-traffic.xml
│   ├── qemu-announce-self-rarp.xml
│   └── qemu-announce-self.xml
├── qemu
│   └── networks
│       ├── autostart
│       │   └── default.xml -> /etc/libvirt/qemu/networks/default.xml
│       └── default.xml
├── qemu.conf
├── qemu-lockd.conf
├── virtlockd.conf
└── virt-login-shell.conf
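
The fix boils down to relocating a configuration directory to the state partition before libvirtd starts and leaving a symlink in its place. The sketch below illustrates that idea under stated assumptions: `move_to_state` is a hypothetical helper, not the script tripleo-image-elements actually uses, and the state copy always takes precedence over whatever a newly deployed image ships, so package defaults are only copied in on first boot.

```python
import os
import shutil


def move_to_state(path, state_root):
    """Relocate directory `path` under `state_root`, leaving a symlink behind.

    On first boot the package-provided contents of `path` are copied into the
    state partition. On later boots (e.g. after an image upgrade) the existing
    state copy wins and the new image's contents of `path` are discarded, so
    definitions created at runtime survive.
    """
    target = os.path.join(state_root, path.lstrip("/"))
    if os.path.islink(path):
        return target  # already relocated on this image
    if not os.path.exists(target):
        os.makedirs(os.path.dirname(target), exist_ok=True)
        if os.path.isdir(path):
            # symlinks=True copies links (like autostart/default.xml) as links.
            shutil.copytree(path, target, symlinks=True)
        else:
            os.makedirs(target)
    if os.path.isdir(path):
        shutil.rmtree(path)
    os.symlink(target, path)
    return target
```

For the fix described here the call would be `move_to_state("/etc/libvirt/qemu", "/mnt/state")`. Because the autostart link in the tree is absolute (/etc/libvirt/qemu/networks/default.xml), it still resolves once /etc/libvirt/qemu itself points at the state copy.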

Changed in tripleo:
assignee: nobody → Adam Gandelman (gandelman-a)
status: New → In Progress
no longer affects: tripleo-image-elements (Ubuntu)
Adam Gandelman (gandelman-a) wrote :

I'm not certain /var/lib/libvirt/ needs to be preserved; libvirt appears to repopulate it as needed.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-image-elements (master)

Fix proposed to branch: master
Review: https://review.openstack.org/104407

Ben Nemec (bnemec)
Changed in tripleo:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-image-elements (master)

Reviewed: https://review.openstack.org/104407
Committed: https://git.openstack.org/cgit/openstack/tripleo-image-elements/commit/?id=455e35e8fc845449cf8fd9b03afd804651612173
Submitter: Jenkins
Branch: master

commit 455e35e8fc845449cf8fd9b03afd804651612173
Author: Adam Gandelman <email address hidden>
Date: Wed Jul 2 18:28:10 2014 -0700

    Move libvirt's qemu configuration dir to state fs

    We need libvirt domain entries created by Nova to persist across
    image upgrades. Otherwise, instances are effectively removed from the
    hypervisor after a new image deploys with an empty libvirt config tree.
    This registers /etc/libvirt/qemu as a stateful path and ensures persistent
    domain configurations across upgrades.

    Change-Id: I419bb25069314c7e639f980b5c236711b2d58edc
    Closes-bug: #1336115

Changed in tripleo:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-image-elements (master)

Fix proposed to branch: master
Review: https://review.openstack.org/107461

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-image-elements (master)

Reviewed: https://review.openstack.org/107461
Committed: https://git.openstack.org/cgit/openstack/tripleo-image-elements/commit/?id=81ebbdce33cb695a5c6b40e06cf5102220be791f
Submitter: Jenkins
Branch: master

commit 81ebbdce33cb695a5c6b40e06cf5102220be791f
Author: Adam Gandelman <email address hidden>
Date: Wed Jul 16 11:13:36 2014 -0700

    Move libvirt's qemu configuration dir to state fs

    We need libvirt domain entries created by Nova to persist across
    image upgrades. Otherwise, instances are effectively removed from the
    hypervisor after a new image deploys with an empty libvirt config tree.
    This registers /etc/libvirt/qemu as a stateful path and ensures persistent
    domain configurations across upgrades.

    This reapplies a previously reverted fix, this time including the proper
    element-deps.

    Change-Id: I716b9e6f3c8d36c56749b7de0915e5a3d2eb1973
    Closes-bug: #1336115

Jay Dobies (jdob)
Changed in tripleo:
status: Fix Committed → Fix Released
summary: - instances no longer running post-upgrade
+ instances libvirt config not present post-upgrade