juju deployed openstack nova-lxd cannot start instances

Bug #1690345 reported by Patrizio Bassi
This bug affects 1 person
Affects: nova-lxd
Status: Incomplete
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

When launching a tar-root image (Xenial cloud image) on a nova-compute lxc hypervisor, I get:

2017-05-12 09:36:35.270 9689 WARNING nova.scheduler.client.report [req-e3ce840a-b823-40ce-9c32-85b99e118a70 8446f64e0b504345b48673a3e3a328f1 35515180b8b646329e2caa2372250e0b - - -] Unable to refresh my resource provider record
2017-05-12 09:36:53.532 9689 ERROR nova.virt.lxd.driver [req-e3ce840a-b823-40ce-9c32-85b99e118a70 8446f64e0b504345b48673a3e3a328f1 35515180b8b646329e2caa2372250e0b - - -] [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] Failed to upload to LXD: Image is unacceptable: Bad Image format: Image could not be found.
2017-05-12 09:36:53.533 9689 ERROR nova.compute.manager [req-e3ce840a-b823-40ce-9c32-85b99e118a70 8446f64e0b504345b48673a3e3a328f1 35515180b8b646329e2caa2372250e0b - - -] [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] Instance failed to spawn
2017-05-12 09:36:53.533 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] Traceback (most recent call last):
2017-05-12 09:36:53.533 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2082, in _build_resources
2017-05-12 09:36:53.533 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] yield resources
2017-05-12 09:36:53.533 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1924, in _build_and_run_instance
2017-05-12 09:36:53.533 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] block_device_info=block_device_info)
2017-05-12 09:36:53.533 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] File "/usr/lib/python2.7/dist-packages/nova/virt/lxd/driver.py", line 274, in spawn
2017-05-12 09:36:53.533 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] self.setup_image(context, instance, image_meta)
2017-05-12 09:36:53.533 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] File "/usr/lib/python2.7/dist-packages/nova/virt/lxd/driver.py", line 1333, in setup_image
2017-05-12 09:36:53.533 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] os.unlink(container_manifest)
2017-05-12 09:36:53.533 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-05-12 09:36:53.533 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] self.force_reraise()
2017-05-12 09:36:53.533 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-05-12 09:36:53.533 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] six.reraise(self.type_, self.value, self.tb)
2017-05-12 09:36:53.533 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] File "/usr/lib/python2.7/dist-packages/nova/virt/lxd/driver.py", line 1216, in setup_image
2017-05-12 09:36:53.533 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] image_id=instance.image_ref, reason=reason)
2017-05-12 09:36:53.533 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] ImageUnacceptable: Image is unacceptable: Bad Image format: Image could not be found.
2017-05-12 09:36:53.533 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840]
2017-05-12 09:36:53.533 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840]
2017-05-12 09:36:53.549 9689 WARNING nova.virt.lxd.driver [req-e3ce840a-b823-40ce-9c32-85b99e118a70 8446f64e0b504345b48673a3e3a328f1 35515180b8b646329e2caa2372250e0b - - -] Failed to delete instance. Container does not exist for instance-00000097.
2017-05-12 09:36:54.085 9689 WARNING nova.compute.manager [req-e3ce840a-b823-40ce-9c32-85b99e118a70 8446f64e0b504345b48673a3e3a328f1 35515180b8b646329e2caa2372250e0b - - -] Could not clean up failed build, not rescheduling. Error: Unexpected error while running command.
Command: sudo nova-rootwrap /etc/nova/rootwrap.conf zfs destroy lxd/instance-00000097-ephemeral
Exit code: 1
Stdout: u''
Stderr: u"cannot open 'lxd/instance-00000097-ephemeral': dataset does not exist\n"
2017-05-12 09:36:54.277 9689 WARNING nova.scheduler.client.report [req-e3ce840a-b823-40ce-9c32-85b99e118a70 8446f64e0b504345b48673a3e3a328f1 35515180b8b646329e2caa2372250e0b - - -] No authentication information found for placement API. Placement is optional in Newton, but required in Ocata. Please enable the placement service before upgrading.
2017-05-12 09:36:54.278 9689 WARNING nova.scheduler.client.report [req-e3ce840a-b823-40ce-9c32-85b99e118a70 8446f64e0b504345b48673a3e3a328f1 35515180b8b646329e2caa2372250e0b - - -] Unable to refresh my resource provider record
2017-05-12 09:36:54.280 9689 ERROR nova.compute.manager [req-e3ce840a-b823-40ce-9c32-85b99e118a70 8446f64e0b504345b48673a3e3a328f1 35515180b8b646329e2caa2372250e0b - - -] [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] Build of instance 1ad75f54-e9a3-420f-bd4c-defe890fa840 aborted: Image is unacceptable: Bad Image format: Image could not be found.
2017-05-12 09:36:54.280 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] Traceback (most recent call last):
2017-05-12 09:36:54.280 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1783, in _do_build_and_run_instance
2017-05-12 09:36:54.280 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] filter_properties)
2017-05-12 09:36:54.280 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1943, in _build_and_run_instance
2017-05-12 09:36:54.280 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] 'create.error', fault=e)
2017-05-12 09:36:54.280 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-05-12 09:36:54.280 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] self.force_reraise()
2017-05-12 09:36:54.280 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-05-12 09:36:54.280 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] six.reraise(self.type_, self.value, self.tb)
2017-05-12 09:36:54.280 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1927, in _build_and_run_instance
2017-05-12 09:36:54.280 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] instance=instance)
2017-05-12 09:36:54.280 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] File "/usr/lib/python2.7/contextlib.py", line 35, in __exit__
2017-05-12 09:36:54.280 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] self.gen.throw(type, value, traceback)
2017-05-12 09:36:54.280 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2109, in _build_resources
2017-05-12 09:36:54.280 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] reason=six.text_type(exc))
2017-05-12 09:36:54.280 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840] BuildAbortException: Build of instance 1ad75f54-e9a3-420f-bd4c-defe890fa840 aborted: Image is unacceptable: Bad Image format: Image could not be found.
2017-05-12 09:36:54.280 9689 ERROR nova.compute.manager [instance: 1ad75f54-e9a3-420f-bd4c-defe890fa840]
2017-05-12 09:36:55.391 9689 WARNING nova.compute.manager [req-e3ce840a-b823-40ce-9c32-85b99e118a70 8446f64e0b504345b48673a3e3a328f1 35515180b8b646329e2caa2372250e0b - - -] Failed to delete volume: 7c49db87-16ab-4b30-9b13-832bd7cbf720 due to Invalid input received: Invalid volume: Volume status must be available or error or error_restoring or error_extending and must not be migrating, attached, belong to a group or have snapshots. (HTTP 400) (Request-ID: req-f82b164d-9044-4661-ab69-0bdbab083b4b)
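A minimal sketch, assuming the log lines keep the format shown above, for pulling only the exception summary lines out of a nova-compute log. The function name and both regular expressions are illustrative and are not part of nova or nova-lxd.

```python
import re

# Tail of a nova-compute log line after its "[instance: <uuid>]" tag.
INSTANCE_TAG = re.compile(r"\[instance: (?P<uuid>[0-9a-f-]{36})\]\s*(?P<msg>.*)$")
# Exception summary such as "ImageUnacceptable: <detail>": a single
# capitalised word immediately followed by ": ". Traceback frames
# ('File "..."', 'Traceback (most recent call last):') do not match.
EXCEPTION_LINE = re.compile(r"^(?P<name>[A-Z]\w*): (?P<detail>.+)$")


def exception_summaries(log_lines):
    """Return unique (instance uuid, exception name, detail) tuples, in order."""
    found = []
    for line in log_lines:
        tag = INSTANCE_TAG.search(line)
        if tag is None:
            continue  # line carries no instance tag
        exc = EXCEPTION_LINE.match(tag.group("msg"))
        if exc is None:
            continue  # traceback frame or ordinary message
        item = (tag.group("uuid"), exc.group("name"), exc.group("detail"))
        if item not in found:
            found.append(item)
    return found
```

Fed the lines of the log above, this should leave two entries for instance 1ad75f54-e9a3-420f-bd4c-defe890fa840: the ImageUnacceptable raised in setup_image and the BuildAbortException that wraps it.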

It looks like nova cannot retrieve the image from glance.

I checked /etc/nova/nova.conf on the KVM nodes and the LXD hypervisor nodes and they look the same.
The glance section contains the same endpoint on both:

[glance]
api_servers = http://10.10.10.95:9292

Following the backtrace, it seems it cannot get the image metadata at all. I checked the upstream source at https://github.com/openstack/nova-lxd/blob/master/nova/virt/lxd/driver.py#L398 and the glance-lxd sync code has changed there.
Maybe it has been fixed already. Did anyone try? Do we have a bleeding-edge nova-lxd PPA to test against?

James Page (james-page)
affects: juju → nova-lxd
Patrizio Bassi (patrizio-bassi) wrote :

Hi all, is anyone checking this issue?

James Page (james-page) wrote :

Hi Patrizio

Yes, we are checking this issue, but right now we're not able to reproduce the problem using either devstack or the OpenStack charms.

Just for completeness here, please could you confirm which version of nova-compute and nova-compute-lxd you are using so we can match up on versions.

Changed in nova-lxd:
status: New → Incomplete
importance: Undecided → Low
Patrizio Bassi (patrizio-bassi) wrote :

Dear James,

I'm using the same nova-compute charm for both, deployed as two applications:
nova-compute 14.0.4 active 5 nova-compute jujucharms 264 ubuntu
nova-compute-lxd 14.0.4 active 1 nova-compute jujucharms 264 ubuntu

Of course, I change the virt-type for them.

James Page (james-page) wrote :

Thanks for that information; I actually need the version of the nova-compute-lxd package that's installed on the nova-compute-lxd units. 14.0.4 is the main nova version, not the driver version for LXD.

If you ssh to a unit and do "apt-cache policy nova-compute-lxd" we'll get the right information.
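A minimal sketch for reading the installed version out of that command's output, assuming the usual "Installed:" line that apt-cache policy prints. The function name is hypothetical and the sample text only mirrors the shape of the real output.

```python
def installed_version(policy_output):
    """Return the version on the 'Installed:' line, or None if absent or '(none)'."""
    for line in policy_output.splitlines():
        # Split on the first colon only, so versions with an epoch stay intact.
        key, sep, value = line.strip().partition(":")
        if sep and key == "Installed":
            value = value.strip()
            return None if value == "(none)" else value
    return None


# Sample text in the shape printed by `apt-cache policy nova-compute-lxd`.
sample = """nova-compute-lxd:
  Installed: 14.2.2-0ubuntu0.16.10.2~cloud0
  Candidate: 14.2.2-0ubuntu0.16.10.2~cloud0
"""
print(installed_version(sample))
```

Comparing this value across units is a quick way to confirm that every nova-compute-lxd unit runs the same driver package.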

James Page (james-page) wrote :

Patrizio

How are you creating your instance? via Horizon or from the CLI?

I could do with visibility over what options you are using to attempt to boot an instance.

Patrizio Bassi (patrizio-bassi) wrote :

Dear James,

I wiped the hypervisor and redeployed this morning using:
nova-lxd 14.0.5 active 1 nova-compute jujucharms 269 ubuntu
lxd 2.0.9 active 1 lxd jujucharms 11 ubuntu

nova-compute-lxd:
  Installed: 14.2.2-0ubuntu0.16.10.2~cloud0
  Candidate: 14.2.2-0ubuntu0.16.10.2~cloud0
  Version table:
 *** 14.2.2-0ubuntu0.16.10.2~cloud0 500
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu xenial-updates/newton/main amd64 Packages
        100 /var/lib/dpkg/status
     13.3.0-0ubuntu1 500
        500 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
     13.2.0-0ubuntu1.16.04.1 500
        500 http://archive.ubuntu.com/ubuntu xenial-security/main amd64 Packages
     13.0.0-0ubuntu3 500
        500 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages

I changed from the ZFS backend to the LVM backend.

juju deploy nova-compute nova-lxd --to=spare002
juju config nova-lxd virt-type=lxd
juju config nova-lxd openstack-origin=cloud:xenial-newton enable-live-migration=true enable-resize=true migration-auth-type=ssh
juju deploy lxd
juju config lxd block-devices=/dev/sdb storage-type=lvm overwrite=true
juju add-relation lxd nova-lxd

juju add-relation nova-lxd rabbitmq-server
juju add-relation nova-lxd glance
juju add-relation nova-lxd neutron-openvswitch
juju add-relation ceph-mon nova-lxd
juju add-relation ntp nova-lxd
juju add-relation nova-cloud-controller nova-lxd
juju add-relation ceilometer-agent nova-lxd

When I try to launch an instance from Horizon (no volume creation), I get:

2017-06-19 07:47:40.528 21769 ERROR nova.compute.manager [req-ada8fb6a-9153-4c4f-a44f-d3e438ed5b9d 87ea33bede84421a94de0c9bb7bb49bb d554493bcaea40d5a16a3a7e7d14e590 - - -] [instance: 29ce4c98-77eb-4675-9150-1d58e228e143] Instance failed to spawn
2017-06-19 07:47:40.528 21769 ERROR nova.compute.manager [instance: 29ce4c98-77eb-4675-9150-1d58e228e143] Traceback (most recent call last):
2017-06-19 07:47:40.528 21769 ERROR nova.compute.manager [instance: 29ce4c98-77eb-4675-9150-1d58e228e143] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2083, in _build_resources
2017-06-19 07:47:40.528 21769 ERROR nova.compute.manager [instance: 29ce4c98-77eb-4675-9150-1d58e228e143] yield resources
2017-06-19 07:47:40.528 21769 ERROR nova.compute.manager [instance: 29ce4c98-77eb-4675-9150-1d58e228e143] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1924, in _build_and_run_instance
2017-06-19 07:47:40.528 21769 ERROR nova.compute.manager [instance: 29ce4c98-77eb-4675-9150-1d58e228e143] block_device_info=block_device_info)
2017-06-19 07:47:40.528 21769 ERROR nova.compute.manager [instance: 29ce4c98-77eb-4675-9150-1d58e228e143] File "/usr/lib/python2.7/dist-packages/nova/virt/lxd/driver.py", line 317, in spawn
2017-06-19 07:47:40.528 21769 ERROR nova.compute.manager [instance: 29ce4c98-77eb-4675-9150-1d58e228e143] self._add_ephemeral(block_device_info, lxd_config, instance)
2017-06-19 07:47:40.528 21769 ERROR nova.compute.manager [instance: 29ce4c98-77eb-4675-9150-1d58e228e143] File "/usr/lib/python2.7/dist-packages/nova/v...

Patrizio Bassi (patrizio-bassi) wrote :

As it's not working, I changed back to ZFS (and rebooted the hypervisor).

lxd on /lxd type zfs (rw,relatime,xattr,noacl)

root@spare002:/var/log/nova# df -h
Filesystem                 Size  Used Avail Use% Mounted on
udev                        16G     0   16G   0% /dev
tmpfs                      3.1G   17M  3.1G   1% /run
/dev/mapper/vgroot-lvroot  549G   11G  511G   3% /
tmpfs                       16G     0   16G   0% /dev/shm
tmpfs                      5.0M     0  5.0M   0% /run/lock
tmpfs                       16G     0   16G   0% /sys/fs/cgroup
lxd                        539G     0  539G   0% /lxd
tmpfs                      3.1G     0  3.1G   0% /run/user/1000

2017-06-19 09:17:56.348 12306 ERROR nova.compute.manager [req-ecd17988-6aa9-40a9-b1e7-74f3c6493a82 87ea33bede84421a94de0c9bb7bb49bb d554493bcaea40d5a16a3a7e7d14e590 - - -] [instance: 9e345600-d27e-4401-97a5-cc474f7a0aa2] Instance failed to spawn
2017-06-19 09:17:56.348 12306 ERROR nova.compute.manager [instance: 9e345600-d27e-4401-97a5-cc474f7a0aa2] Traceback (most recent call last):
2017-06-19 09:17:56.348 12306 ERROR nova.compute.manager [instance: 9e345600-d27e-4401-97a5-cc474f7a0aa2] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2083, in _build_resources
2017-06-19 09:17:56.348 12306 ERROR nova.compute.manager [instance: 9e345600-d27e-4401-97a5-cc474f7a0aa2] yield resources
2017-06-19 09:17:56.348 12306 ERROR nova.compute.manager [instance: 9e345600-d27e-4401-97a5-cc474f7a0aa2] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1924, in _build_and_run_instance
2017-06-19 09:17:56.348 12306 ERROR nova.compute.manager [instance: 9e345600-d27e-4401-97a5-cc474f7a0aa2] block_device_info=block_device_info)
2017-06-19 09:17:56.348 12306 ERROR nova.compute.manager [instance: 9e345600-d27e-4401-97a5-cc474f7a0aa2] File "/usr/lib/python2.7/dist-packages/nova/virt/lxd/driver.py", line 317, in spawn
2017-06-19 09:17:56.348 12306 ERROR nova.compute.manager [instance: 9e345600-d27e-4401-97a5-cc474f7a0aa2] self._add_ephemeral(block_device_info, lxd_config, instance)
2017-06-19 09:17:56.348 12306 ERROR nova.compute.manager [instance: 9e345600-d27e-4401-97a5-cc474f7a0aa2] File "/usr/lib/python2.7/dist-packages/nova/virt/lxd/driver.py", line 1070, in _add_ephemeral
2017-06-19 09:17:56.348 12306 ERROR nova.compute.manager [instance: 9e345600-d27e-4401-97a5-cc474f7a0aa2] raise exception.NovaException(reason)
2017-06-19 09:17:56.348 12306 ERROR nova.compute.manager [instance: 9e345600-d27e-4401-97a5-cc474f7a0aa2] NovaException: Unsupport LXD storage detected. Supported storage drivers are zfs and btrfs.
2017-06-19 09:17:56.348 12306 ERROR nova.compute.manager [instance: 9e345600-d27e-4401-97a5-cc474f7a0aa2]

Patrizio Bassi (patrizio-bassi) wrote :

James,

After wiping and trying again (I would guess that changing the storage backend type after deploy doesn't work), I finally had a Xenial LXC container running. I could not get Trusty running as well: it fails because of an invalid image. I will try to delete it from glance, download it and deploy again.

Back to Xenial: the container is running, I can see it on the hypervisor via lxc list, I can see it in the nova instances, and I can access the startup log, where I see:

1) Lots of denied operations on physical devices. Of course, it's a container.
2) eth0 can't get an IP address. systemctl brings up the network service and DHCP occurs, but on the hypervisor I cannot see anything on the lxdbr0 device. It's not bridged.

It's the same problem reported here: https://askubuntu.com/questions/776632/ubuntu-16-04-lxd-openstack-network

but I can't find a solution.
I even tried to configure the bridge by issuing: dpkg-reconfigure -p medium lxd
and configuring IPv4-only networking there, but it made no difference.

What's missing?

By the way, I'm creating the instance via Horizon, setting no new volume, just a flavor, a network (other VMs on other hypervisors are working fine there) and a key.

James Page (james-page) wrote :

lxdbr0 is not used by the nova-lxd driver (it wires things into Neutron, rather than into anything LXD itself controls) so I'd not expect that to fix anything.

Things that would help debug this include:

   sudo lxc profile show <name>
   sudo ovs-vsctl show
   sudo ip link

That will give me a little visibility into what's not happening.

James Page (james-page) wrote :

Re "juju add-relation ceilometer-agent nova-lxd" - we don't have ceilometer metric collection support for LXD yet so that relation is superfluous at the moment.
