neutron-openvswitch charm lxc profile not getting set with correct configuration

Bug #1856832 reported by Yanos Angelopoulos
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Heather Lanigan

Bug Description

Using juju version 2.6.9

I got the following error trying to install the neutron-openvswitch charm (train) inside an lxd container:
 dpkg: error processing package openvswitch-switch (--configure):
  installed openvswitch-switch package post-installation script subprocess returned error exit status 1

Investigating the host machine, there was a "juju-modelname-neutron-openvswitch-269" lxc profile with config: {},
while it should have been config: {linux.kernel_modules: openvswitch,ip_tables,ip6_tables}.
Changing that and rebooting the lxc container got past the error.

The question is: why did Juju not create the lxc profile with the correct config field?

On another host, the config in the lxc profile had the correct values.
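For reference, a hedged sketch of detecting the symptom and applying the manual fix described above. The here-doc stands in for real `lxc profile show` output (pipe the real command in practice); the profile name is the one from this report, and the container name is a placeholder:

```shell
# Stand-in for: lxc profile show juju-modelname-neutron-openvswitch-269
# (in practice, use the real command output instead of this here-doc)
profile_yaml=$(cat <<'EOF'
config: {}
description: ""
name: juju-modelname-neutron-openvswitch-269
EOF
)

# The bug's symptom is an empty config map, missing linux.kernel_modules.
if echo "$profile_yaml" | grep -q 'linux.kernel_modules'; then
  status="profile ok"
else
  status="profile config empty"
  # Manual fix as described above, then reboot the container:
  #   lxc profile set juju-modelname-neutron-openvswitch-269 \
  #       linux.kernel_modules openvswitch,ip_tables,ip6_tables
  #   lxc restart <container>
fi
echo "$status"
```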

Tags: sts
Revision history for this message
Richard Harding (rharding) wrote :

Thanks for the report. That is troubling. It'd be good if we could get any related log information from the creation of the container and the controller logs for the model that would have handled the provisioning. Thanks!

Changed in juju:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Brett Milford (brettmilford) wrote :

Hi Richard,

We appear to be able to replicate this issue with juju version 2.7.1.

Attached is the model log from the controller.

As observed by Yanos, this doesn't occur on other instances of the application in the deployment.

Revision history for this message
Brett Milford (brettmilford) wrote :
Revision history for this message
Brett Milford (brettmilford) wrote :
Revision history for this message
Brett Milford (brettmilford) wrote :
Revision history for this message
Ian Booth (wallyworld) wrote :

As a test, I bootstrapped 2.7.1 AWS controller.
I added a machine, and deployed a test charm cs:~juju-qa/bionic/lxd-profile to a LXD container on that new machine.
It all looked ok:

$ sudo lxc profile list
+---------------------+---------+
| NAME | USED BY |
+---------------------+---------+
| default | 0 |
+---------------------+---------+
| juju-default-test-0 | 0 |
+---------------------+---------+

$ sudo lxc profile show juju-default-test-0
config:
  environment.http_proxy: ""
  linux.kernel_modules: openvswitch,nbd,ip_tables,ip6_tables
  security.nesting: "true"
  security.privileged: "true"
description: lxd profile for testing, black list items grouped commented out
devices:
  bdisk:
    source: /dev/loop0
    type: unix-block
  gpu:
    type: gpu
  sony:
    productid: 51da
    type: usb
    vendorid: 0fce
  tun:
    path: /dev/net/tun
    type: unix-char
name: juju-default-test-0
used_by:
- /1.0/containers/juju-b95fdf-0-lxd-0

Changed in juju:
milestone: none → 2.7.3
Revision history for this message
Brett Milford (brettmilford) wrote :

N.B. the issue doesn't occur on every unit of the charm. Rather, in each reproduction of the issue thus far it has happened on `juju-machine-3`. The charm is being deployed as part of a bundle like the one attached.

Revision history for this message
Simon Richardson (simonrichardson) wrote :

Looking at the logs, juju does know about the profile (see below). I'm currently attempting to replicate it.

```
2020-02-06 21:40:48 DEBUG juju.core.cache lxdprofilewatcher.go:246 end of unit change map[string]cache.appInfo{"neutron-openvswitch":cache.appInfo{charmURL:"cs:neutron-openvswitch-271", charmProfile:lxdprofile.Profile{Config:map[string]string{"linux.kernel_modules":"openvswitch,ip_tables,ip6_tables"}, Description:"", Devices:map[string]map[string]string{}}, units:set.Strings{"neutron-openvswitch/3":true}}, "nova-compute":cache.appInfo{charmURL:"cs:nova-compute-311", charmProfile:lxdprofile.Profile{Config:map[string]string(nil), Description:"", Devices:map[string]map[string]string(nil)}, units:set.Strings{"nova-compute/3":true}}}
```

Revision history for this message
Simon Richardson (simonrichardson) wrote :

So I've tested a couple of scenarios trying to replicate this (see below), but I can't get it to fail.

Scenarios:

 - Testing with just deploying to lxd with lxd-profiles[1]
 - Testing with just deploying to lxd with lxd-profiles within a bundle[2]
 - Testing with deploying to lxd with subordinates[3]

I repeated all of the above with AWS and didn't hit the issue.

-

So from the logs we can see there is an issue with openvswitch-3 (see below), but it's hard to work out why the profile wasn't applied.

unit-neutron-openvswitch-3 2020-02-06 21:40:26 DEBUG unit.neutron-openvswitch/3.install runner.go:360 A dependency job for openvswitch-switch.service failed. See 'journalctl -xe' for details.

-

Can we get the syslog/lxd logs from machine 3?
Is the hardware the same for all machines or is there any differences?

---------

 1. charm cs:~juju-qa/bionic/lxd-profile
 2. https://github.com/juju/juju/blob/develop/acceptancetests/repository/bundles-lxd-profile.yaml
 3. https://paste.ubuntu.com/p/xdZnGS7TS8/

Revision history for this message
Simon Richardson (simonrichardson) wrote :

I also tried editing your bundle to work on AWS and that also worked.

Revision history for this message
Brett Milford (brettmilford) wrote :

Simon, the lxd logs from machine 3 are on #3; the syslog from machine 3 is attached.

Revision history for this message
Brett Milford (brettmilford) wrote :

I can replicate the issue with the bundle from https://bugs.launchpad.net/juju/+bug/1856832/comments/7 on a MAAS cloud and an Openstack cloud.

- On the MAAS cloud the problem occurred on 3/4 neutron-openvswitch units
- On the Openstack cloud the problem occurred on 1/4 neutron-openvswitch units.

I can't replicate the issue with the juju-qa charm bundle (https://paste.ubuntu.com/p/xdZnGS7TS8/) on either cloud.

My understanding is that
- juju allocates the machine,
- then the lxd container machine, juju will merge this with any other relevant lxd profiles that apply to the applications in the lxd machines and apply them?
- applies the primary charm (nova-compute)
- applies the applies the subordinate charm (neutron-openvswitch), and by virtue of the charm having specified an 'lxd-profile.yaml' in its base, juju will merge this with any other relevant lxd profiles that apply to the applications in the lxd machines and apply them?

Is there any other interaction necessary on the part of the charm?

Revision history for this message
Brett Milford (brettmilford) wrote :

* should read
My understanding is that
- juju allocates the machine,
- then the lxd container machine,
- applies the primary charm (nova-compute),
- applies the subordinate charm (neutron-openvswitch), and by virtue of the charm having specified an 'lxd-profile.yaml' in its base, juju will merge this with any other relevant lxd profiles that apply to the applications in the lxd machines and apply them?
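Assuming that sequence, one way to sanity-check the result is to confirm the subordinate's profile actually got attached to the container. A sketch: the here-doc stands in for the `profiles` section of real `lxc config show <container>` output, and the profile names are illustrative, not taken from an actual deployment:

```shell
# Stand-in for the 'profiles' section of: lxc config show <container>
# (in practice, use the real command output instead of this here-doc)
container_yaml=$(cat <<'EOF'
profiles:
- default
- juju-openstack
- juju-openstack-neutron-openvswitch-271
EOF
)

# If the merge/attach step worked, the charm's profile should be listed.
if echo "$container_yaml" | grep -q 'neutron-openvswitch'; then
  result="charm profile attached"
else
  result="charm profile missing"
fi
echo "$result"
```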

Revision history for this message
Simon Richardson (simonrichardson) wrote :

Can we get the following commands when it does fail for the containers?

lxc profile list
lxc config show <container name>
lxc info <container name> --show-log

Revision history for this message
Simon Richardson (simonrichardson) wrote :

If you change the tags around in your bundle does the problem still follow machine-3 around or does it move to another machine?

Revision history for this message
Simon Richardson (simonrichardson) wrote :

I've added some additional logging and backported a fix for a case where we weren't correctly handling an error from an API call. Is there any chance you could test with this (it'll reach the snap --channel 2.7/edge once it's landed) or build the binary?

https://github.com/juju/juju/pull/11188

Revision history for this message
Brett Milford (brettmilford) wrote :

In the MAAS test case it failed on machines 1-4; in the Openstack test case it failed on machines 0 and 2. So the issue no longer appears to be specific to machine-3.

I've attached the requested output from the 4 failed machines in the MAAS test case.

I can test it again when the pr lands in the edge snap.

Revision history for this message
Heather Lanigan (hmlanigan) wrote :

I've been able to reproduce this myself on LXD. I have a 50% fail rate on the neutron-openvswitch profiles being empty. Working on resolution.

Nick Niehoff (nniehoff)
tags: added: sts
Revision history for this message
Heather Lanigan (hmlanigan) wrote :
Changed in juju:
assignee: nobody → Heather Lanigan (hmlanigan)
status: Triaged → In Progress
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released
Revision history for this message
Nikolay Vinogradov (nikolay.vinogradov) wrote :

Also hit this. The octavia-diskimage-retrofit charm was deployed with KVM acceleration disabled, so it ran non-accelerated and it took qemu 3 hours (instead of just 10 minutes) to prepare an amphora image.
