neutron-openvswitch charm lxc profile not getting set with correct configuration

Bug #1856832 reported by Yanos Angelopoulos
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Heather Lanigan

Bug Description

Using juju version 2.6.9

I got the following error trying to install the neutron-openvswitch charm (train) inside an lxd container:
 dpkg: error processing package openvswitch-switch (--configure):
  installed openvswitch-switch package post-installation script subprocess returned error exit status 1

Investigating the host machine, there was a "juju-modelname-neutron-openvswitch-269" lxc profile with config: {},
while it should have been config: {linux.kernel_modules: openvswitch,ip_tables,ip6_tables}.
Changing that and rebooting the lxc container got past the error.

The question is: why did Juju not create the lxc profile with the correct config field?

On another host, the config in the lxc profile had the correct values.
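For reference, a hedged sketch of detecting the symptom and applying the manual fix described above. The here-doc stands in for real `lxc profile show` output (pipe the real command in practice); the profile name is the one from this report, and the container name is a placeholder:

```shell
# Stand-in for: lxc profile show juju-modelname-neutron-openvswitch-269
# (in practice, use the real command output instead of this here-doc)
profile_yaml=$(cat <<'EOF'
config: {}
description: ""
name: juju-modelname-neutron-openvswitch-269
EOF
)

# The bug's symptom is an empty config map, missing linux.kernel_modules.
if echo "$profile_yaml" | grep -q 'linux.kernel_modules'; then
  status="profile ok"
else
  status="profile config empty"
  # Manual fix as described above, then reboot the container:
  #   lxc profile set juju-modelname-neutron-openvswitch-269 \
  #       linux.kernel_modules openvswitch,ip_tables,ip6_tables
  #   lxc restart <container>
fi
echo "$status"
```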

Tags: sts
Revision history for this message
Richard Harding (rharding) wrote :

Thanks for the report. That is troubling. It'd be good if we could get any related log information from the creation of the container and the controller logs for the model that would have handled the provisioning. Thanks!

Changed in juju:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Brett Milford (brettmilford) wrote :

Hi Richard,

We appear to be able to replicate this issue with juju version 2.7.1.

Attached is the model log from the controller.

As observed by Yanos, this doesn't occur on other instances of the application in the deployment.

Revision history for this message
Brett Milford (brettmilford) wrote :
Revision history for this message
Brett Milford (brettmilford) wrote :
Revision history for this message
Brett Milford (brettmilford) wrote :
Revision history for this message
Ian Booth (wallyworld) wrote :

As a test, I bootstrapped 2.7.1 AWS controller.
I added a machine, and deployed a test charm cs:~juju-qa/bionic/lxd-profile to a LXD container on that new machine.
It all looked ok:

$ sudo lxc profile list
+---------------------+---------+
| NAME | USED BY |
+---------------------+---------+
| default | 0 |
+---------------------+---------+
| juju-default-test-0 | 0 |
+---------------------+---------+

$ sudo lxc profile show juju-default-test-0
config:
  environment.http_proxy: ""
  linux.kernel_modules: openvswitch,nbd,ip_tables,ip6_tables
  security.nesting: "true"
  security.privileged: "true"
description: lxd profile for testing, black list items grouped commented out
devices:
  bdisk:
    source: /dev/loop0
    type: unix-block
  gpu:
    type: gpu
  sony:
    productid: 51da
    type: usb
    vendorid: 0fce
  tun:
    path: /dev/net/tun
    type: unix-char
name: juju-default-test-0
used_by:
- /1.0/containers/juju-b95fdf-0-lxd-0

Changed in juju:
milestone: none → 2.7.3
Revision history for this message
Brett Milford (brettmilford) wrote :

N.B. the issue doesn't occur on every unit of the charm. Rather, in each reproduction of the issue thus far it has happened on `juju-machine-3`. The charm is being deployed as part of a bundle like the one attached.

Revision history for this message
Simon Richardson (simonrichardson) wrote :

Looking at the logs, juju does know about the profile (see below). I'm currently attempting to replicate it.

```
2020-02-06 21:40:48 DEBUG juju.core.cache lxdprofilewatcher.go:246 end of unit change map[string]cache.appInfo{"neutron-openvswitch":cache.appInfo{charmURL:"cs:neutron-openvswitch-271", charmProfile:lxdprofile.Profile{Config:map[string]string{"linux.kernel_modules":"openvswitch,ip_tables,ip6_tables"}, Description:"", Devices:map[string]map[string]string{}}, units:set.Strings{"neutron-openvswitch/3":true}}, "nova-compute":cache.appInfo{charmURL:"cs:nova-compute-311", charmProfile:lxdprofile.Profile{Config:map[string]string(nil), Description:"", Devices:map[string]map[string]string(nil)}, units:set.Strings{"nova-compute/3":true}}}
```

Revision history for this message
Simon Richardson (simonrichardson) wrote :

So I've tested a couple of scenarios trying to replicate this (see below), but I can't get it to fail.

Scenarios:

 - Testing with just deploying to lxd with lxd-profiles[1]
 - Testing with just deploying to lxd with lxd-profiles within a bundle[2]
 - Testing with deploying to lxd with subordinates[3]

I repeated all of the above with AWS and didn't hit the issue.

-

So from the logs we can see there is an issue with openvswitch-3 (see below), but it's hard to work out why the profile wasn't applied.

unit-neutron-openvswitch-3 2020-02-06 21:40:26 DEBUG unit.neutron-openvswitch/3.install runner.go:360 A dependency job for openvswitch-switch.service failed. See 'journalctl -xe' for details.

-

Can we get the syslog/lxd logs from machine 3?
Is the hardware the same for all machines or is there any differences?

---------

 1. charm cs:~juju-qa/bionic/lxd-profile
 2. https://github.com/juju/juju/blob/develop/acceptancetests/repository/bundles-lxd-profile.yaml
 3. https://paste.ubuntu.com/p/xdZnGS7TS8/

Revision history for this message
Simon Richardson (simonrichardson) wrote :

I also tried editing your bundle to work on AWS and that also worked.

Revision history for this message
Brett Milford (brettmilford) wrote :

Simon, the lxd logs from machine 3 are on #3; the syslog from machine 3 is attached.

Revision history for this message
Brett Milford (brettmilford) wrote :

I can replicate the issue with the bundle from https://bugs.launchpad.net/juju/+bug/1856832/comments/7 on a MAAS cloud and an Openstack cloud.

- On the MAAS cloud the problem occurred on 3/4 neutron-openvswitch units
- On the Openstack cloud the problem occurred on 1/4 neutron-openvswitch units.

I can't replicate the issue with the juju-qa charm bundle (https://paste.ubuntu.com/p/xdZnGS7TS8/) on either cloud.

My understanding is that
- juju allocates the machine,
- then the lxd container machine, juju will merge this with any other relevant lxd profiles that apply to the applications in the lxd machines and apply them?
- applies the primary charm (nova-compute)
- applies the applies the subordinate charm (neutron-openvswitch), and by virtue of the charm having specified an 'lxd-profile.yaml' in its base, juju will merge this with any other relevant lxd profiles that apply to the applications in the lxd machines and apply them?

Is there any other interaction necessary on the part of the charm?

Revision history for this message
Brett Milford (brettmilford) wrote :

* should read
My understanding is that
- juju allocates the machine,
- then the lxd container machine,
- applies the primary charm (nova-compute),
- applies the subordinate charm (neutron-openvswitch), and by virtue of the charm having specified an 'lxd-profile.yaml' in its base, juju will merge this with any other relevant lxd profiles that apply to the applications in the lxd machines and apply them?
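Assuming that sequence, one way to sanity-check the result is to confirm the subordinate's profile actually got attached to the container. A sketch: the here-doc stands in for the `profiles` section of real `lxc config show <container>` output, and the profile names are illustrative, not taken from an actual deployment:

```shell
# Stand-in for the 'profiles' section of: lxc config show <container>
# (in practice, use the real command output instead of this here-doc)
container_yaml=$(cat <<'EOF'
profiles:
- default
- juju-openstack
- juju-openstack-neutron-openvswitch-271
EOF
)

# If the merge/attach step worked, the charm's profile should be listed.
if echo "$container_yaml" | grep -q 'neutron-openvswitch'; then
  result="charm profile attached"
else
  result="charm profile missing"
fi
echo "$result"
```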

Revision history for this message
Simon Richardson (simonrichardson) wrote :

Can we get the following commands when it does fail for the containers?

lxc profile list
lxc config show <container name>
lxc info <container name> --show-log

Revision history for this message
Simon Richardson (simonrichardson) wrote :

If you change the tags around in your bundle does the problem still follow machine-3 around or does it move to another machine?

Revision history for this message
Simon Richardson (simonrichardson) wrote :

I've added some additional logging and backported a fix for a case where we weren't correctly handling an error from an API call. Is there any chance you could test with this (it'll reach the snap --channel 2.7/edge once it's landed) or build the binary?

https://github.com/juju/juju/pull/11188

Revision history for this message
Brett Milford (brettmilford) wrote :

In the MAAS test case it failed on machines 1-4; in the Openstack test case it failed on machines 0 and 2. So the issue no longer appears to be specific to machine-3.

I've attached the requested output from the 4 failed machines in the MAAS test case.

I can test it again when the pr lands in the edge snap.

Revision history for this message
Heather Lanigan (hmlanigan) wrote :

I've been able to reproduce this myself on LXD. I have a 50% fail rate on the neutron-openvswitch profiles being empty. Working on resolution.

Nick Niehoff (nniehoff)
tags: added: sts
Revision history for this message
Heather Lanigan (hmlanigan) wrote :
Changed in juju:
assignee: nobody → Heather Lanigan (hmlanigan)
status: Triaged → In Progress
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released
Revision history for this message
Nikolay Vinogradov (nikolay.vinogradov) wrote :

Also hit this. The octavia-diskimage-retrofit charm was deployed with KVM acceleration disabled, so it ran non-accelerated and it took qemu 3 hours (instead of just 10 minutes) to prepare an amphora image.
