[bionic-stein] openvswitch kernel module was not loaded prior to a container startup which lead to an error

Bug #1876849 reported by John George
Affects                                Status        Importance  Assigned to      Milestone
Canonical Juju                         Fix Released  High        Heather Lanigan
OpenStack Neutron Open vSwitch Charm   Invalid       Undecided   Unassigned

Bug Description

neutron-openvswitch install hook post-installation script subprocess returned error

juju unit log has:

2020-05-05 00:26:34 DEBUG install invoke-rc.d: initscript openvswitch-switch, action "start" failed.
2020-05-05 00:26:35 DEBUG install ● openvswitch-switch.service - Open vSwitch
2020-05-05 00:26:35 DEBUG install Loaded: loaded (/lib/systemd/system/openvswitch-switch.service; enabled; vendor preset: enabled)
2020-05-05 00:26:35 DEBUG install Active: inactive (dead)
2020-05-05 00:26:35 DEBUG install
2020-05-05 00:26:35 DEBUG install May 05 00:25:46 juju-8d5b8e-15-lxd-8 systemd[1]: Dependency failed for Open vSwitch.
2020-05-05 00:26:35 DEBUG install May 05 00:25:46 juju-8d5b8e-15-lxd-8 systemd[1]: openvswitch-switch.service: Job openvswitch-switch.service/start failed with result 'dependency'.
2020-05-05 00:26:35 DEBUG install May 05 00:26:07 juju-8d5b8e-15-lxd-8 systemd[1]: Dependency failed for Open vSwitch.
2020-05-05 00:26:35 DEBUG install May 05 00:26:07 juju-8d5b8e-15-lxd-8 systemd[1]: openvswitch-switch.service: Job openvswitch-switch.service/start failed with result 'dependency'.
2020-05-05 00:26:35 DEBUG install May 05 00:26:20 juju-8d5b8e-15-lxd-8 systemd[1]: Dependency failed for Open vSwitch.
2020-05-05 00:26:35 DEBUG install May 05 00:26:20 juju-8d5b8e-15-lxd-8 systemd[1]: openvswitch-switch.service: Job openvswitch-switch.service/start failed with result 'dependency'.
2020-05-05 00:26:35 DEBUG install May 05 00:26:34 juju-8d5b8e-15-lxd-8 systemd[1]: Dependency failed for Open vSwitch.
2020-05-05 00:26:35 DEBUG install May 05 00:26:34 juju-8d5b8e-15-lxd-8 systemd[1]: openvswitch-switch.service: Job openvswitch-switch.service/start failed with result 'dependency'.
2020-05-05 00:26:35 DEBUG install dpkg: error processing package openvswitch-switch (--configure):
2020-05-05 00:26:35 DEBUG install installed openvswitch-switch package post-installation script subprocess returned error exit status 1
2020-05-05 00:26:35 DEBUG install dpkg: dependency problems prevent configuration of neutron-openvswitch-agent:
2020-05-05 00:26:35 DEBUG install neutron-openvswitch-agent depends on openvswitch-switch; however:
2020-05-05 00:26:35 DEBUG install Package openvswitch-switch is not configured yet.

journalctl -xe shows:

-- Unit ovs-vswitchd.service has begun starting up.
May 05 04:02:48 juju-8d5b8e-15-lxd-8 ovs-ctl[251118]: modprobe: FATAL: Module openvswitch not found in directory /lib/modules/4.15.0-99-generic
May 05 04:02:48 juju-8d5b8e-15-lxd-8 ovs-ctl[251118]: * Inserting openvswitch module
May 05 04:02:48 juju-8d5b8e-15-lxd-8 ovs-ctl[251118]: rmmod: ERROR: ../libkmod/libkmod-module.c:793 kmod_module_remove_module() could not remove 'bridge': Function not implemented
May 05 04:02:48 juju-8d5b8e-15-lxd-8 ovs-ctl[251118]: rmmod: ERROR: could not remove module bridge: Function not implemented
May 05 04:02:48 juju-8d5b8e-15-lxd-8 ovs-ctl[251118]: * removing bridge module
May 05 04:02:48 juju-8d5b8e-15-lxd-8 systemd[1]: ovs-vswitchd.service: Control process exited, code=exited status=1
May 05 04:02:48 juju-8d5b8e-15-lxd-8 systemd[1]: ovs-vswitchd.service: Failed with result 'exit-code'.
May 05 04:02:48 juju-8d5b8e-15-lxd-8 systemd[1]: Failed to start Open vSwitch Forwarding Unit.

Please see the juju crashdump available under: https://oil-jenkins.canonical.com/artifacts/dbc75568-fd66-4274-a9a0-06faaef210b9/index.html

Tags: cdo-qa
summary: - neutron-openvswitch install hook post-installation script subprocess
- returned error
+ [bionic-stein] neutron-openvswitch install hook post-installation script
+ subprocess returned error
Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote : Re: [bionic-stein] neutron-openvswitch install hook post-installation script subprocess returned error

Thanks John, this doesn't happen on our CI system with this bundle: https://github.com/openstack/charm-neutron-openvswitch/blob/master/tests/bundles/bionic-stein-dvr-snat.yaml

Do you have a bundle or do you remember what juju config options you were using?

Revision history for this message
Michael Skalka (mskalka) wrote :

Hit this again on this run: https://solutions.qa.canonical.com/#/qa/testRun/0e498f36-1078-49f2-b88d-43ad555bb294

The bundle, crashdump, and other artifacts from the run can be found via the link at the bottom of the test run page.

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

By the looks of it, the ovs-ctl tool is trying to load the openvswitch module

May 05 04:02:48 juju-8d5b8e-15-lxd-8 ovs-ctl[251118]: modprobe: FATAL: Module openvswitch not found in directory /lib/modules/4.15.0-99-generic
May 05 04:02:48 juju-8d5b8e-15-lxd-8 ovs-ctl[251118]: * Inserting openvswitch module
May 05 04:02:48 juju-8d5b8e-15-lxd-8 ovs-ctl[251118]: rmmod: ERROR: ../libkmod/libkmod-module.c:793 kmod_module_remove_module() could not remove 'bridge': Function not implemented

inside a LXD container for a charm-octavia unit:

23:53:10 DEBUG juju.cmd.juju.application bundle.go:857 created 15/lxd/8 container in machine 15 for holding octavia unit

ovs-ctl should not attempt this inside a container, because module loading is not allowed there; the expectation is that the necessary module is pre-loaded before the container starts:

https://git.launchpad.net/~ubuntu-server-dev/ubuntu/+source/openvswitch/tree/utilities/ovs-ctl.in?h=ubuntu/disco#n192
https://git.launchpad.net/~ubuntu-server-dev/ubuntu/+source/openvswitch/tree/utilities/ovs-kmod-ctl.in?h=ubuntu/disco#n43
    # If openvswitch is already loaded then we're done.
    test -e /sys/module/openvswitch && return 0

The fact that it continued and attempted to unload the bridge module shows that the openvswitch module was not loaded before the container was started.
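
A minimal sketch of how one can confirm and pre-load this on the metal host (not the charm fix; the machine number 15 is taken from the logs in this bug, substitute your own):

```
# Run against the bare-metal host that carries the LXD container.
# If the module is already loaded, ovs-ctl inside the container will see
# /sys/module/openvswitch and skip its modprobe/rmmod attempts.
juju ssh 15 'test -e /sys/module/openvswitch || sudo modprobe openvswitch'
# The load message should then be visible in the host kernel ring buffer:
juju ssh 15 'dmesg | grep -i "Open vSwitch switching datapath"'
```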

Octavia itself doesn't contain a LXD profile for Juju to apply but neutron-openvswitch (subordinate to Octavia) does:

https://opendev.org/openstack/charm-neutron-openvswitch/src/commit/677a31b95ecc261446b92fd2608c34f08061d6aa/lxd-profile.yaml
config:
  linux.kernel_modules: openvswitch,ip_tables,ip6_tables

Judging by machine-15.log, Juju noticed the profile for neutron-openvswitch at least:

2020-05-05 00:11:17 INFO juju.container.lxd container.go:220 starting new container "juju-8d5b8e-15-lxd-8" (image "ubuntu-18.04-server-cloudimg-amd64-lxd.tar.xz")

2020-05-05 00:21:56 INFO juju.worker.instancemutater mutater.go:237 machine-15/lxd/8 (juju-8d5b8e-15-lxd-8) assign lxd profiles ["default" "juju-openstack-neutron-openvswitch-274"], []lxdprofile.ProfilePost{lxdprofile.ProfilePost{Name:"juju-openstack-neutron-openvswitch-274", Profile:(*lxdprofile.Profile)(0xc000428a98)}}
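
To cross-check what LXD itself ended up with (as opposed to what Juju believes it assigned), one can query LXD on the host directly; a sketch, with container and profile names taken from the logs above:

```
# On the bare-metal host (machine 15): show the profile contents and the
# expanded config LXD actually applied to the container.
lxc profile show juju-openstack-neutron-openvswitch-274
lxc config show juju-8d5b8e-15-lxd-8 --expanded | grep -A1 linux.kernel_modules
```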

The openvswitch kernel module prints "Open vSwitch switching datapath" to dmesg when it is loaded (but this doesn't get saved into kern.log or syslog):
https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/net/openvswitch/datapath.c?h=Ubuntu-4.15.0-99.100#n2408

systemd saves messages sent to dmesg so the message should be in system.journal but it isn't:

/15/baremetal/var/log/journal/2248c676df704297bea3a3a7c95cc3de/system.journal

journalctl --file system.journal --utc -k | grep openvswitch ; echo $?
1

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Comparing to machine 13 which also has an octavia unit:

- add unit octavia/1 to 13/lxd/7

The openvswitch module was loaded much earlier than the container got started (likely a different neutron-openvswitch on the machine 13 itself triggered it to be loaded):

journalctl --file system.journal --utc -k | grep openvswitch
May 05 00:19:15 azurill kernel: openvswitch: Open vSwitch switching datapath

2020-05-05 00:30:42 INFO juju.worker.provisioner provisioner_task.go:1220 started machine 13/lxd/7 as instance juju-8d5b8e-13-lxd-7 with hardware "availability-zone=zone1", network config

2020-05-05 00:42:30 INFO juju.worker.instancemutater mutater.go:237 machine-13/lxd/7 (juju-8d5b8e-13-lxd-7) assign lxd profiles ["default" "juju-openstack-neutron-openvswitch-274"], []lxdprofile.ProfilePost{lxdprofile.ProfilePost{Name:"juju-openstack-neutron-openvswitch-274", Profile:(*lxdprofile.Profile)(0xc000220618)}}

Revision history for this message
Frode Nordahl (fnordahl) wrote :

Is this not a resurface of juju bug 1856832 ?

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Juju 2.7.6 was used for the SQA test runs:
https://oil-jenkins.canonical.com/artifacts/dbc75568-fd66-4274-a9a0-06faaef210b9/versions.yaml

But it looks very similar to bug 1856832.

summary: - [bionic-stein] neutron-openvswitch install hook post-installation script
- subprocess returned error
+ [bionic-stein] openvswitch kernel module was not loaded prior to a
+ container startup which lead to an error
Changed in charm-neutron-openvswitch:
status: New → Incomplete
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Added Juju as well because it doesn't look like there's anything wrong from the charm or maintainer script perspective.

Revision history for this message
Heather Lanigan (hmlanigan) wrote :

I cannot reproduce 1856832 in 2.7.6, nor in 2.8-rc1, with the bundle from the bug: https://bugs.launchpad.net/juju/+bug/1856832/comments/7, nor with the qa steps from the fix. The fix is still in place.

With the logs above, there is no verification of the actual contents of the juju-openstack-neutron-openvswitch-274 profile, only what juju thinks it is.

Are there any reproducers outside of the above solutions-qa runs?

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

I tried to reproduce it on ServerStack but couldn't reach the same state.

Are there any guarantees that the LXD profile will be applied/updated before the subordinate that needs it starts and processes events?

Revision history for this message
Tim Penhey (thumper) wrote : Re: [Bug 1876849] Re: [bionic-stein] openvswitch kernel module was not loaded prior to a container startup which lead to an error

I would say no.

The subordinate unit isn't created until the principal unit enters the
relation scope. This then triggers the deployer on the machine to lay down
the charm and start the service.

At the same time the LXD Profile watcher will fire because of the new unit.
So I think you may have found the race we need to solve.

I don't want to posit solutions on the bug, but will discuss with the team.

Revision history for this message
Pen Gale (pengale) wrote :

Notes from some discussion with the team:

The profile watcher fires asynchronously by design. This is because there are many circumstances where we might need to add or remove profiles over the life cycle of a deployment, and finding and rolling the profile watcher logic into each of those circumstances is non-trivial and bug-prone.

Most of the time, the asynchronous nature of the watcher doesn't cause problems -- an install hook might initially fail, for example, but then succeed on retry, which is standard Juju behavior (charms can be written to handle this more or less elegantly, but the default Juju behavior is to fail, move on, and make it work next time).

There are three things that might be going on here:

1) This might be a request for a redesign of the way profiles get applied. Per the above, this is a non-trivial task, which may work less well than the current solution.

2) The profile may not have been applied correctly, due to a bug in lxd (if the profile had simply failed to apply due to system incompatibility, the unit would have wound up in an error state).

3) The profile may not have been applied correctly, due to a Juju bug.

Regardless, to troubleshoot, it would be extremely helpful to have a machine eye view of what profiles have actually been applied to the containers. You can get that with:

    ```
    lxc profile list
    lxc profile show <juju-...> # repeat for all of the juju-* containers
    ```

That would help us understand the root cause of the issue. Is this behavior on the charm level, which could be fixed by rewriting the charm, or changing how Juju handles profile writing (again, with the cautions from above), or is this a bug in the actual writing of the profile?

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

That sounds like a great thing for juju to log - lxc profiles of the
containers it's creating.

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

juju team, this is sub'd to field high, please have a look!

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Also, to be clear, we don't use retries in our CI. So, if the charms have to handle the asynchronous nature of the profile watcher in order to prevent errors from bubbling up, this seems to be a charm issue.
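
For reference, the model-level switch being discussed looks roughly like the sketch below (assuming the stock automatically-retry-hooks key; Solutions QA runs with it disabled so transient failures surface instead of being retried):

```
# Disable automatic retry of failed hooks for the whole model.
juju model-config automatically-retry-hooks=false
# Inspect the current setting.
juju model-config automatically-retry-hooks
```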

Revision history for this message
James Page (james-page) wrote :

I tend to disagree - I don't believe there is a specific hook execution so that the charm can catch this event - feels like Juju needs to make some baseline assurances to the charm unit before its install hook gets fired.

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

In my view, making charms responsible for waiting on profile changes would over-complicate them and would require charm authors to do repetitive work for every such case.

I think the guarantees that charm authors need to have are:

* no hooks fire for a unit before the profile of its application is applied to the unit's container;
* upgrade-charm does not fire before the profile is applied.

I understand that it's difficult to coordinate this across agents on different machines.
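
For illustration only, a defensive wait that a charm's install hook could carry today looks roughly like the sketch below, although the whole point above is that charms should not have to do this:

```
#!/bin/sh
# Hypothetical charm-side guard: wait up to 5 minutes for the profile-provided
# module to become visible before touching openvswitch-switch.
for i in $(seq 1 30); do
    test -e /sys/module/openvswitch && break
    sleep 10
done
test -e /sys/module/openvswitch || { echo "openvswitch module still missing"; exit 1; }
```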

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

That seems totally reasonable to me. I meant to say that if juju doesn't
make the guarantee of the lxc profiles being applied prior to the hooks
executing, it's a charm issue. It sounds like this needs to be discussed
between the Openstack and Juju teams.

Revision history for this message
Pen Gale (pengale) wrote :

Thanks for the feedback, @dmitriis. I agree that this puts a burden on the charm author, and it would be nicer to have Juju abstract it away.

I don't think that the field high label is appropriate, however. In the field, timing issues with the profile being applied should get resolved via a retry. This issue seems to be uniquely damaging in a testing environment with retry turned off, while testing a charm that was written with incorrect assumptions about the way that the profile watcher currently works.

I think that the next steps here are to treat this like a feature request, and add it to the list of things in consideration for the roadmap. The scope of the feature is to redesign the profile watcher so that it can behave more linearly. This is a non-trivial chunk of work, involving an approach which was initially considered and then discarded by the Juju team, which merits a more complete discussion of the tradeoffs involved.

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Hi Pete,

The Solutions QA testing environment is covered explicitly by the SLA.
Here's part of the criteria for Field Critical:

"Stable release Solutions QA Foundation Cloud build blocked, no workaround
exists"

Solutions QA does not run with retry on, and we haven't for a couple of
years now; we want to find as many race conditions and issues in charms as
possible and using retries will cover that up. Relying on retries can mask
other issues and can lead to increased deployment time. We've got agreement
with this approach from the Openstack Engineering team, which I believe
also runs their CI without retries.

AFAIK there is no way to turn on retries for a single application, so
turning them on would affect our testing of all the other charms in this
model, which is unacceptable.

Thanks,
Jason

Revision history for this message
Ian Booth (wallyworld) wrote :

As per comment 16, if the contract between juju and the charm is that Juju is expected to fully set up the machine on which the charm runs before starting hook execution, then it's on Juju to coordinate that (if subordinate charm deployments currently behave differently to principal charms in this regard, then IMO that's also an issue). If the required machine set up includes the LXD profile, just the same as mem or other requirements, then IMO until that's done, the charm unit agent should not start running hooks, and if there's an issue applying the profile the agent should report an error or go to blocked until the profile is applied. So to me this is on Juju to get right.

Revision history for this message
Tim Penhey (thumper) wrote :

I don't think Juju could guarantee that the host container has the profile
fully in place before a charm starts, because it is always possible to add
the machine first and then use a placement directive to put the unit there.
In fact, many bundles do exactly this. However, there should be a hook that
we are able to fire that indicates that the environment has changed.

Currently the config-changed hook is fired when config doesn't change to
have charms deal with address changes. There is an outstanding request to
fire a hook when proxy information changes. I feel that this fits into that
area.

This is sitting very much in that grey area between a bug and a feature
request.

Tim Penhey (thumper)
Changed in juju:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Juju team, can you provide an estimated time to fix for this?

Revision history for this message
Michael Skalka (mskalka) wrote :

In lieu of a Juju fix, would it be possible to use the LXD profile from the neutron-openvswitch charm in the octavia charm? That way we guarantee that the required permissions are applied before the octavia charm's install hook runs.

Michael Skalka (mskalka)
Changed in charm-neutron-openvswitch:
status: Incomplete → New
Revision history for this message
Frode Nordahl (fnordahl) wrote :

Responding to @mskalka's question in #23: with my pragmatic hat on, and with a strong comment and a bug link back to juju, I think we could add that to Octavia as a workaround.

Do bear in mind that we have been using this Juju feature from subordinates since the genesis of the feature in Juju, but it would be fair to say that it has only recently gone into our regular on-metal test runs, which I guess is why it has surfaced as a recurring and blocking problem now.

At this moment the Octavia charm assumes OVS or OVN as transport between the Octavia worker and its Amphorae, and as such we could work around the issue by adding the common kernel module dependencies in the Octavia charm LXD profile.

At some point we will have a plugin story for other SDNs for Octavia just as we have with the Neutron charms (we already have had out of band conversations with third party SDN vendors).

For the plugin story to become a reality the subordinate dictating the LXD profile must work reliably.

What I'll do is put up a PoC for review and make a built charm artifact available for testing in my personal namespace as soon as we have initial smoke test results back.
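
A rough sketch of what that workaround could look like (the actual PoC may differ; the profile name used for verification here is hypothetical):

```
# Ship the same kernel-module list in the octavia charm's own lxd-profile.yaml,
# mirroring the neutron-openvswitch profile quoted earlier in this bug.
cat > lxd-profile.yaml <<'EOF'
config:
  linux.kernel_modules: openvswitch,ip_tables,ip6_tables
EOF
# After deploying, confirm from the host that LXD picked the profile up
# (profile name below is a hypothetical example).
lxc profile show juju-openstack-octavia-0 | grep linux.kernel_modules
```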

Revision history for this message
Frode Nordahl (fnordahl) wrote :

Please help test if the workaround provided in cs:~fnordahl/octavia-lp1876849-0 alleviates the problem.

Revision history for this message
Heather Lanigan (hmlanigan) wrote :

I've added some logging to juju to confirm the contents of the lxd profile written. It's now available in the 2.8/edge snap.

To use, add "juju.container.lxd=DEBUG;juju.provider.lxd=DEBUG" to the juju model-config logging-config.

It'd be extremely helpful to have this output when this issue is seen again. Thank you.
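
For reference, the full command form of that setting is roughly as follows (the `<root>=INFO` level is an assumption; keep whatever root level the model already uses):

```
# Turn on the extra LXD logging described above, then watch the machine log.
juju model-config logging-config="<root>=INFO;juju.container.lxd=DEBUG;juju.provider.lxd=DEBUG"
juju debug-log --include machine-15 | grep -i profile
```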

Changed in juju:
milestone: none → 2.8.1
Revision history for this message
Heather Lanigan (hmlanigan) wrote :

https://github.com/juju/juju/pull/11701 : Charm hooks will not be run until an lxd profile is applied, if required by the charm and possible with the machine.

Changed in juju:
assignee: nobody → Heather Lanigan (hmlanigan)
status: Triaged → Fix Committed
Revision history for this message
John George (jog) wrote :

Solutions QA ran into this again with the additional logging enabled.
https://solutions.qa.canonical.com/#/qa/testRun/c896f181-32d9-4911-a152-2e9fe9167189

Changed in juju:
status: Fix Committed → Fix Released
James Page (james-page)
Changed in charm-neutron-openvswitch:
status: New → Invalid