unable to start lxd container instances after host reboot

Bug #1630891 reported by James Page on 2016-10-06
This bug affects 2 people
Affects              Importance   Assigned to
nova-lxd             Undecided    Unassigned
nova-lxd (Ubuntu)    Low          Unassigned

Bug Description

After a compute hypervisor is rebooted, instances revert to the SHUTDOWN state (which is the correct default); however, issuing nova start <uuid> does not start the instance:

Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 133, in _process_incoming
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 150, in dispatch
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 121, in _do_dispatch
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/nova/exception_wrapper.py", line 75, in wrapped
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server function_name, call_dict, binary)
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server self.force_reraise()
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/nova/exception_wrapper.py", line 66, in wrapped
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server return f(self, context, *args, **kw)
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 188, in decorated_function
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server LOG.warning(msg, e, instance=instance)
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server self.force_reraise()
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 157, in decorated_function
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/nova/compute/utils.py", line 613, in decorated_function
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 216, in decorated_function
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server kwargs['instance'], e, sys.exc_info())
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server self.force_reraise()
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 204, in decorated_function
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2475, in start_instance
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server self._power_on(context, instance)
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2445, in _power_on
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server block_device_info)
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/nova/virt/lxd/driver.py", line 747, in power_on
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server container.start(wait=True)
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/pylxd/models/container.py", line 159, in start
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server wait=wait)
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/pylxd/models/container.py", line 144, in _set_state
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server self.client, response.json()['operation'])
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/pylxd/models/operation.py", line 29, in wait_for_operation
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server operation.wait()
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/pylxd/models/operation.py", line 51, in wait
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server raise exceptions.LXDAPIException(response)
Oct 6 08:25:57 juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10 nova-compute[2958]: 2016-10-06 08:25:57.910 2958 ERROR oslo_messaging.rpc.server LXDAPIException: Missing parent 'qbrd853dc7e-2c' for nic 'qbrd853dc7e-2c'

It looks like the neutron network setup is not being re-created before the container is started; the compute driver must do this before attempting to start the instance. The qbrd853dc7e-2c bridge named in the error is absent from both listings below:

root@juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10:~# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether fa:16:3e:9d:8d:e1 brd ff:ff:ff:ff:ff:ff
3: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1
    link/ether fa:8d:01:a5:24:0a brd ff:ff:ff:ff:ff:ff
4: br-ex: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1
    link/ether 36:1b:99:1f:fd:4f brd ff:ff:ff:ff:ff:ff
5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1
    link/ether 26:e8:f0:7c:58:43 brd ff:ff:ff:ff:ff:ff
6: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN mode DEFAULT group default qlen 1
    link/gre 0.0.0.0 brd 0.0.0.0
7: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
8: gre_sys@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65490 qdisc pfifo_fast master ovs-system state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 86:a0:50:4a:1b:b0 brd ff:ff:ff:ff:ff:ff
9: br-tun: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1
    link/ether ae:20:44:00:f9:4b brd ff:ff:ff:ff:ff:ff
10: br-data: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1
    link/ether ee:3d:ef:52:b3:45 brd ff:ff:ff:ff:ff:ff
11: lxdbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 62:46:66:1a:af:04 brd ff:ff:ff:ff:ff:ff
root@juju-cf19f1e0-16f6-4889-8211-f9d4ec880eaf-machine-10:~# sudo ovs-vsctl show
86ce50a2-d54f-4d1a-92f2-655ff728d897
    Manager "ptcp:6640:127.0.0.1"
        is_connected: true
    Bridge br-tun
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port "gre-0a051413"
            Interface "gre-0a051413"
                type: gre
                options: {df_default="true", in_key=flow, local_ip="10.5.20.25", out_key=flow, remote_ip="10.5.20.19"}
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port "gre-0a051412"
            Interface "gre-0a051412"
                type: gre
                options: {df_default="true", in_key=flow, local_ip="10.5.20.25", out_key=flow, remote_ip="10.5.20.18"}
        Port br-tun
            Interface br-tun
                type: internal
    Bridge br-ex
        Port br-ex
            Interface br-ex
                type: internal
    Bridge br-data
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port br-data
            Interface br-data
                type: internal
        Port phy-br-data
            Interface phy-br-data
                type: patch
                options: {peer=int-br-data}
    Bridge br-int
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port br-int
            Interface br-int
                type: internal
        Port int-br-data
            Interface int-br-data
                type: patch
                options: {peer=phy-br-data}
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
    ovs_version: "2.6.0"
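
For anyone triaging a similar failure, the comparison between the error and the host's interfaces can be scripted. This is only a minimal sketch: the helper names missing_parent and link_names are made up for illustration, and the sample `ip link` text is abridged from the output in this report.

```python
import re


def missing_parent(error_message):
    """Return the bridge name LXD reports as missing, or None if absent."""
    match = re.search(r"Missing parent '([^']+)' for nic", error_message)
    return match.group(1) if match else None


def link_names(ip_link_output):
    """Collect interface names from `ip link` output."""
    names = set()
    for line in ip_link_output.splitlines():
        # Interface lines look like "11: lxdbr0: <FLAGS> ...";
        # the indented address lines do not start with an index.
        match = re.match(r"^\d+:\s+([^:@\s]+)", line)
        if match:
            names.add(match.group(1))
    return names


# Error text and (abridged) interface listing taken from this report.
ERROR = "LXDAPIException: Missing parent 'qbrd853dc7e-2c' for nic 'qbrd853dc7e-2c'"
IP_LINK = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1
8: gre_sys@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65490 qdisc pfifo_fast master ovs-system state UNKNOWN mode DEFAULT group default qlen 1000
11: lxdbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
"""

bridge = missing_parent(ERROR)
present = link_names(IP_LINK)
print(bridge, bridge in present)  # prints: qbrd853dc7e-2c False
```

A False result means the bridge LXD expects as the nic parent does not exist on the host, which matches the symptom described here.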

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: nova-compute-lxd 14.0.0~rc1-0ubuntu1~cloud0 [origin: Canonical]
ProcVersionSignature: Ubuntu 4.4.0-38.57-generic 4.4.19
Uname: Linux 4.4.0-38-generic x86_64
NonfreeKernelModules: zfs zunicode zcommon znvpair zavl
ApportVersion: 2.20.1-0ubuntu2.1
Architecture: amd64
CrashDB:
 {
                "impl": "launchpad",
                "project": "cloud-archive",
                "bug_pattern_url": "http://people.canonical.com/~ubuntu-archive/bugpatterns/bugpatterns.xml",
             }
Date: Thu Oct 6 08:26:12 2016
Ec2AMI: ami-0000044a
Ec2AMIManifest: FIXME
Ec2AvailabilityZone: nova
Ec2InstanceType: m1.medium
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
PackageArchitecture: all
ProcEnviron:
 TERM=screen-256color-bce
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: nova-lxd
UpgradeStatus: No upgrade log present (probably fresh install)

James Page (james-page) wrote :
Chuck Short (zulcss) on 2016-10-14
Changed in nova-lxd (Ubuntu):
status: New → Fix Released
Kevin Metz (pertinent) wrote :

This issue still exists. It also affects instances that start automatically after a reboot.

James Page (james-page) wrote :

Kevin - can you confirm versions please? This issue was quite specific to OpenStack Newton - you might be seeing the same symptom but the underlying cause may be different.

Chuck Short (zulcss) on 2016-11-09
Changed in nova-lxd:
status: New → Fix Released
Changed in nova-lxd (Ubuntu):
status: Fix Released → Confirmed
James Page (james-page) wrote :

From the stable/newton branch:

commit c3ed3574a5c0aa59c9e68e6ac4ab2bff1fce3ca0
Author: Chuck Short <email address hidden>
Date: Thu Oct 6 09:08:37 2016 -0400

    Reset vifs after reboot

    When a host is rebooted, instances are set into a shutdown
    state, which is the correct thing.

    However when a user tries to restart an instance, LXD will fail
    to restart the container because the bridge is not present.

    This is due to the fact that when the instance status transitions
    from shutdown to active the compute daemon tries to run the
    plug_vifs method, which was not present in the driver.

    To correct this we re-add the plug_vifs and unplug_vifs
    methods.

    LP: #1630891

    Change-Id: Iae80f4667debab4d1e21d67d882ed52ef59cf15b
    Signed-off-by: Chuck Short <email address hidden>
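
The failure mode and the fix described in the commit message can be illustrated with a toy model. This is a sketch under stated assumptions, not the nova-lxd code: ToyHost, ToyDriver, and MissingParentError are hypothetical names, and the bridge bookkeeping is a plain Python set rather than real network setup. It only shows why a start fails when the bridge is gone and succeeds once plug_vifs has run again.

```python
class MissingParentError(Exception):
    """Hypothetical stand-in for the LXDAPIException in the traceback."""


class ToyHost:
    """Hypothetical host model: bridges do not survive a reboot."""

    def __init__(self):
        self.bridges = set()

    def reboot(self):
        self.bridges.clear()


class ToyDriver:
    """Hypothetical driver used for illustration only."""

    def __init__(self, host):
        self.host = host

    def plug_vifs(self, instance, network_info):
        # Recreate the bridge that each container nic uses as its parent.
        for vif in network_info:
            self.host.bridges.add(vif["bridge"])

    def unplug_vifs(self, instance, network_info):
        for vif in network_info:
            self.host.bridges.discard(vif["bridge"])

    def power_on(self, instance, network_info):
        # Starting fails if any parent bridge is missing, as in the report.
        for vif in network_info:
            if vif["bridge"] not in self.host.bridges:
                raise MissingParentError(
                    "Missing parent '%s' for nic '%s'"
                    % (vif["bridge"], vif["bridge"]))
        return "RUNNING"


host = ToyHost()
driver = ToyDriver(host)
vifs = [{"bridge": "qbrd853dc7e-2c"}]

driver.plug_vifs("instance-1", vifs)   # initial boot
host.reboot()                          # bridge is lost

try:
    driver.power_on("instance-1", vifs)
except MissingParentError as exc:
    print("start failed:", exc)

driver.plug_vifs("instance-1", vifs)   # what the re-added method allows
print(driver.power_on("instance-1", vifs))
```

In this model the first power_on after the reboot raises, and the second succeeds because plug_vifs was called first.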

James Page (james-page) wrote :

I tried to reproduce this on xenial-mitaka - on reboot, the instance is in the shutdown state, but I was able to start it and confirm that networking was OK.

Kevin Metz (pertinent) wrote :

The versions were LXD 2.3.0 and OpenStack Mitaka. The nova-compute-lxd nodes have since been removed, so I am not able to replicate the exact issue.

James Page (james-page) on 2016-12-15
Changed in nova-lxd (Ubuntu):
status: Confirmed → Triaged
importance: Undecided → Medium
no longer affects: cloud-archive
Changed in nova-lxd (Ubuntu):
importance: Medium → Low
status: Triaged → Incomplete