Comment 48 for bug 1906280

Michael Skalka (mskalka) wrote:

We are still seeing this issue with the -next version of the ovn-chassis charm, as observed during this test run for the charm release: https://solutions.qa.canonical.com/testruns/testRun/23d8528d-2931-4be6-a0d1-bad21e3d75a5

Artifacts can be found here: https://oil-jenkins.canonical.com/artifacts/23d8528d-2931-4be6-a0d1-bad21e3d75a5/index.html

And specifically the openstack crashdump here: https://oil-jenkins.canonical.com/artifacts/23d8528d-2931-4be6-a0d1-bad21e3d75a5/generated/generated/openstack/juju-crashdump-openstack-2021-01-27-18.32.08.tar.gz
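
For anyone digging into the crashdump, a quick way to look for the same mlockall symptoms is to pull the tarball and grep the Open vSwitch logs it contains. This is only a sketch; the directory layout and log file names inside the crashdump may differ from what I assume below:

wget https://oil-jenkins.canonical.com/artifacts/23d8528d-2931-4be6-a0d1-bad21e3d75a5/generated/generated/openstack/juju-crashdump-openstack-2021-01-27-18.32.08.tar.gz
mkdir crashdump && tar -xzf juju-crashdump-openstack-2021-01-27-18.32.08.tar.gz -C crashdump
grep -r --include='ovs-vswitchd.log*' mlockall crashdump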

The symptoms are the same: the octavia-ovn-chassis units stay stuck waiting with 'ovsdb' incomplete:

ubuntu@production-cpe-23d8528d-2931-4be6-a0d1-bad21e3d75a5:~$ juju status octavia-ovn-chassis
Model      Controller        Cloud/Region        Version  SLA          Timestamp
openstack  foundations-maas  maas_cloud/default  2.8.7    unsupported  18:29:58Z

App                    Version  Status   Scale  Charm             Store       Rev  OS      Notes
hacluster-octavia               active       0  hacluster         jujucharms  161  ubuntu
logrotated                      active       0  logrotated        jujucharms    2  ubuntu
octavia                6.1.0    blocked      3  octavia           jujucharms   90  ubuntu
octavia-ovn-chassis    20.03.1  waiting      3  ovn-chassis       jujucharms   49  ubuntu
public-policy-routing           active       0  advanced-routing  jujucharms    3  ubuntu

Unit                          Workload  Agent      Machine  Public address   Ports     Message
octavia/0*                    blocked   idle       1/lxd/8  10.244.40.229    9876/tcp  Awaiting end-user execution of `configure-resources` action to create required resources
  hacluster-octavia/0*        active    idle                10.244.40.229              Unit is ready and clustered
  logrotated/62               active    idle                10.244.40.229              Unit is ready.
  octavia-ovn-chassis/0*      waiting   executing           10.244.40.229              'ovsdb' incomplete
  public-policy-routing/44    active    idle                10.244.40.229              Unit is ready
octavia/1                     blocked   idle       3/lxd/8  10.244.40.244    9876/tcp  Awaiting leader to create required resources
  hacluster-octavia/1         active    idle                10.244.40.244              Unit is ready and clustered
  logrotated/63               active    idle                10.244.40.244              Unit is ready.
  octavia-ovn-chassis/1       waiting   executing           10.244.40.244              'ovsdb' incomplete
  public-policy-routing/45    active    idle                10.244.40.244              Unit is ready
octavia/2                     blocked   idle       5/lxd/8  10.244.40.250    9876/tcp  Awaiting leader to create required resources
  hacluster-octavia/2         active    idle                10.244.40.250              Unit is ready and clustered
  logrotated/64               active    idle                10.244.40.250              Unit is ready.
  octavia-ovn-chassis/2       waiting   executing           10.244.40.250              'ovsdb' incomplete
  public-policy-routing/46    active    idle                10.244.40.250              Unit is ready

Machine  State    DNS            Inst id              Series  AZ     Message
1        started  10.244.41.35   armaldo              bionic  zone1  Deployed
1/lxd/8  started  10.244.40.229  juju-15ff71-1-lxd-8  bionic  zone1  Container started
3        started  10.244.41.17   spearow              bionic  zone2  Deployed
3/lxd/8  started  10.244.40.244  juju-15ff71-3-lxd-8  bionic  zone2  Container started
5        started  10.244.41.18   beartic              bionic  zone3  Deployed
5/lxd/8  started  10.244.40.250  juju-15ff71-5-lxd-8  bionic  zone3  Container started

I confirmed that the --no-mlockall flag was present in /etc/default/openvswitch-switch:

# This is a POSIX shell fragment -*- sh -*-
###############################################################################
# [ WARNING ]
# Configuration file maintained by Juju. Local changes may be overwritten.
# Configuration managed by neutron-openvswitch charm
# Service restart triggered by remote application:
#
###############################################################################
OVS_CTL_OPTS='--no-mlockall'
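
As a side note, the same flag could be checked on all three chassis units at once rather than per container; not something I ran here, just a sketch using juju run on Juju 2.8:

juju run --application octavia-ovn-chassis 'grep OVS_CTL_OPTS /etc/default/openvswitch-switch'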

Then, at the request of Billy Olsen, I restarted the ovs-vswitchd service on one of the units, which brought it back to ready:

root@juju-15ff71-5-lxd-8:~# service ovs-vswitchd restart
root@juju-15ff71-5-lxd-8:~# service ovs-vswitchd status
● ovs-vswitchd.service - Open vSwitch Forwarding Unit
   Loaded: loaded (/lib/systemd/system/ovs-vswitchd.service; static; vendor preset: enabled)
   Active: active (running) since Wed 2021-01-27 18:33:34 UTC; 7s ago
  Process: 70258 ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server stop (code=exited, status=0/SUCCESS)
  Process: 79634 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server --no-monitor --system-id=random start $OVS_CTL_OPTS (code=exited, status=0/SUCCESS)
    Tasks: 22 (limit: 314572)
   CGroup: /system.slice/ovs-vswitchd.service
           └─79674 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vs

Jan 27 18:33:34 juju-15ff71-5-lxd-8 systemd[1]: Starting Open vSwitch Forwarding Unit...
Jan 27 18:33:34 juju-15ff71-5-lxd-8 ovs-ctl[79634]: nice: cannot set niceness: Permission denied
Jan 27 18:33:34 juju-15ff71-5-lxd-8 ovs-ctl[79634]: * Starting ovs-vswitchd
Jan 27 18:33:34 juju-15ff71-5-lxd-8 ovs-vsctl[79702]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait set Open_vSwitch . external-ids:hostname=juju-15ff71-5-lxd-8.production.solutionsqa
Jan 27 18:33:34 juju-15ff71-5-lxd-8 ovs-ctl[79634]: * Enabling remote OVSDB managers
Jan 27 18:33:34 juju-15ff71-5-lxd-8 systemd[1]: Started Open vSwitch Forwarding Unit.

Shortly after that I was booted out of the system because our CI cleaned up the run.
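
If the restart does turn out to be a reliable workaround, it could presumably be applied to all of the affected units in one shot instead of per container. Untested on this deployment, but something along these lines:

juju run --application octavia-ovn-chassis 'systemctl restart ovs-vswitchd'
juju status octavia-ovn-chassis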