Comment 48 for bug 1906280

Michael Skalka (mskalka) wrote:

We are still seeing this issue with the -next version of the ovn-chassis charm, as observed during this test run for the charm release: https://solutions.qa.canonical.com/testruns/testRun/23d8528d-2931-4be6-a0d1-bad21e3d75a5

Artifacts can be found here: https://oil-jenkins.canonical.com/artifacts/23d8528d-2931-4be6-a0d1-bad21e3d75a5/index.html

And specifically the openstack crashdump here: https://oil-jenkins.canonical.com/artifacts/23d8528d-2931-4be6-a0d1-bad21e3d75a5/generated/generated/openstack/juju-crashdump-openstack-2021-01-27-18.32.08.tar.gz
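
For anyone digging into the crashdump, a quick way to look for the same mlockall symptoms is to pull the tarball and grep the Open vSwitch logs it contains. This is only a sketch; the directory layout and log file names inside the crashdump may differ from what I assume below:

wget https://oil-jenkins.canonical.com/artifacts/23d8528d-2931-4be6-a0d1-bad21e3d75a5/generated/generated/openstack/juju-crashdump-openstack-2021-01-27-18.32.08.tar.gz
mkdir crashdump && tar -xzf juju-crashdump-openstack-2021-01-27-18.32.08.tar.gz -C crashdump
grep -r --include='ovs-vswitchd.log*' mlockall crashdump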

The symptoms are the same: the octavia-ovn-chassis units stay stuck waiting with 'ovsdb' incomplete:

ubuntu@production-cpe-23d8528d-2931-4be6-a0d1-bad21e3d75a5:~$ juju status octavia-ovn-chassis
Model      Controller        Cloud/Region        Version  SLA          Timestamp
openstack  foundations-maas  maas_cloud/default  2.8.7    unsupported  18:29:58Z

App                    Version  Status   Scale  Charm             Store       Rev  OS      Notes
hacluster-octavia               active       0  hacluster         jujucharms  161  ubuntu
logrotated                      active       0  logrotated        jujucharms    2  ubuntu
octavia                6.1.0    blocked      3  octavia           jujucharms   90  ubuntu
octavia-ovn-chassis    20.03.1  waiting      3  ovn-chassis       jujucharms   49  ubuntu
public-policy-routing           active       0  advanced-routing  jujucharms    3  ubuntu

Unit                          Workload  Agent      Machine  Public address   Ports     Message
octavia/0*                    blocked   idle       1/lxd/8  10.244.40.229    9876/tcp  Awaiting end-user execution of `configure-resources` action to create required resources
  hacluster-octavia/0*        active    idle                10.244.40.229              Unit is ready and clustered
  logrotated/62               active    idle                10.244.40.229              Unit is ready.
  octavia-ovn-chassis/0*      waiting   executing           10.244.40.229              'ovsdb' incomplete
  public-policy-routing/44    active    idle                10.244.40.229              Unit is ready
octavia/1                     blocked   idle       3/lxd/8  10.244.40.244    9876/tcp  Awaiting leader to create required resources
  hacluster-octavia/1         active    idle                10.244.40.244              Unit is ready and clustered
  logrotated/63               active    idle                10.244.40.244              Unit is ready.
  octavia-ovn-chassis/1       waiting   executing           10.244.40.244              'ovsdb' incomplete
  public-policy-routing/45    active    idle                10.244.40.244              Unit is ready
octavia/2                     blocked   idle       5/lxd/8  10.244.40.250    9876/tcp  Awaiting leader to create required resources
  hacluster-octavia/2         active    idle                10.244.40.250              Unit is ready and clustered
  logrotated/64               active    idle                10.244.40.250              Unit is ready.
  octavia-ovn-chassis/2       waiting   executing           10.244.40.250              'ovsdb' incomplete
  public-policy-routing/46    active    idle                10.244.40.250              Unit is ready

Machine  State    DNS            Inst id              Series  AZ     Message
1        started  10.244.41.35   armaldo              bionic  zone1  Deployed
1/lxd/8  started  10.244.40.229  juju-15ff71-1-lxd-8  bionic  zone1  Container started
3        started  10.244.41.17   spearow              bionic  zone2  Deployed
3/lxd/8  started  10.244.40.244  juju-15ff71-3-lxd-8  bionic  zone2  Container started
5        started  10.244.41.18   beartic              bionic  zone3  Deployed
5/lxd/8  started  10.244.40.250  juju-15ff71-5-lxd-8  bionic  zone3  Container started

I confirmed that the --no-mlockall flag was present in /etc/default/openvswitch-switch:

# This is a POSIX shell fragment -*- sh -*-
###############################################################################
# [ WARNING ]
# Configuration file maintained by Juju. Local changes may be overwritten.
# Configuration managed by neutron-openvswitch charm
# Service restart triggered by remote application:
#
###############################################################################
OVS_CTL_OPTS='--no-mlockall'
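
As a side note, the same flag could be checked on all three chassis units at once rather than per container; not something I ran here, just a sketch using juju run on Juju 2.8:

juju run --application octavia-ovn-chassis 'grep OVS_CTL_OPTS /etc/default/openvswitch-switch'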

Then, at the request of Billy Olsen, I restarted the ovs-vswitchd service on one of the units, which brought it back to ready:

root@juju-15ff71-5-lxd-8:~# service ovs-vswitchd restart
root@juju-15ff71-5-lxd-8:~# service ovs-vswitchd status
● ovs-vswitchd.service - Open vSwitch Forwarding Unit
   Loaded: loaded (/lib/systemd/system/ovs-vswitchd.service; static; vendor preset: enabled)
   Active: active (running) since Wed 2021-01-27 18:33:34 UTC; 7s ago
  Process: 70258 ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server stop (code=exited, status=0/SUCCESS)
  Process: 79634 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server --no-monitor --system-id=random start $OVS_CTL_OPTS (code=exited, status=0/SUCCESS)
    Tasks: 22 (limit: 314572)
   CGroup: /system.slice/ovs-vswitchd.service
           └─79674 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vs

Jan 27 18:33:34 juju-15ff71-5-lxd-8 systemd[1]: Starting Open vSwitch Forwarding Unit...
Jan 27 18:33:34 juju-15ff71-5-lxd-8 ovs-ctl[79634]: nice: cannot set niceness: Permission denied
Jan 27 18:33:34 juju-15ff71-5-lxd-8 ovs-ctl[79634]: * Starting ovs-vswitchd
Jan 27 18:33:34 juju-15ff71-5-lxd-8 ovs-vsctl[79702]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait set Open_vSwitch . external-ids:hostname=juju-15ff71-5-lxd-8.production.solutionsqa
Jan 27 18:33:34 juju-15ff71-5-lxd-8 ovs-ctl[79634]: * Enabling remote OVSDB managers
Jan 27 18:33:34 juju-15ff71-5-lxd-8 systemd[1]: Started Open vSwitch Forwarding Unit.

Shortly after that I was booted out of the system because our CI cleaned up the run.
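
If the restart does turn out to be a reliable workaround, it could presumably be applied to all of the affected units in one shot instead of per container. Untested on this deployment, but something along these lines:

juju run --application octavia-ovn-chassis 'systemctl restart ovs-vswitchd'
juju status octavia-ovn-chassis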