paused ovn-chassis unit attempts to run ovs-vsctl command

Bug #1908615 reported by Corey Bryant
Affects            Status  Importance  Assigned to  Milestone
charm-ovn-chassis  New     Undecided   Unassigned

Bug Description

I'm not certain this is a bug but I need to at least document it somewhere.

The neutron-openvswitch smoke test starts ovn-chassis in a paused state. After neutron-openvswitch gets paused later on in the test, ovn-chassis then attempts to run an ovs-vsctl command (while paused) that fails because it can't connect to the database unix:/var/run/openvswitch/db.sock.

This only occurs with this change:

https://review.opendev.org/c/openstack/charm-neutron-openvswitch/+/767212

And specifically due to this code removal:

@@ -423,9 +424,6 @@ def resource_map():
         )
         if not use_dpdk():
             drop_config.append(DPDK_INTERFACES)
-            drop_config.append(OVS_DEFAULT)
-        elif ovs_has_late_dpdk_init():
-            drop_config.append(OVS_DEFAULT)

(Note: the drop_config.append(OVS_DEFAULT) logic is dropped in the above patch set because it prevents the config template from being rewritten when charm config is reset. With that code removed, /etc/default/openvswitch-switch is written with comments only when the corresponding config options aren't set (see template). This is very similar to what the package installs by default: a file full of comments.)

The neutron-openvswitch smoke test fails with:

...
2020-12-17 22:52:13 [INFO] Connected (version 2.0, client OpenSSH_7.6p1)
2020-12-17 22:52:13 [INFO] Authentication (publickey) successful!
2020-12-17 22:52:13 [INFO] Running ping -M do -s 1414 -c 1 192.168.0.1 on instance
2020-12-17 22:52:14 [INFO] ok
2020-12-17 22:52:14 [INFO] ----------------------------------------------------------------------
2020-12-17 22:52:14 [INFO] Ran 1 test in 394.044s
2020-12-17 22:52:14 [INFO] OK
2020-12-17 22:52:14 [INFO] ## Running Test zaza.openstack.charm_tests.ovn.tests.OVSOVNMigrationTest ##
2020-12-17 22:52:15 [WARNING] unknown delta type: id
2020-12-17 22:52:15 [WARNING] unknown delta type: id
2020-12-17 22:52:15 [INFO] test_ovs_ovn_migration (zaza.openstack.charm_tests.ovn.tests.OVSOVNMigrationTest)
2020-12-17 22:52:15 [INFO] Test migration of existing Neutron ML2+OVS deployment to OVN.
2020-12-17 22:52:15 [INFO] ...
2020-12-17 22:52:15 [INFO] Performing migration steps.
2020-12-17 22:52:15 [INFO] Pausing neutron-openvswitch units
2020-12-17 22:52:15 [WARNING] unknown delta type: id
2020-12-17 22:52:15 [WARNING] unknown delta type: id
2020-12-17 22:54:18 [INFO] Pausing neutron-gateway units
2020-12-17 22:54:19 [WARNING] unknown delta type: id
2020-12-17 22:54:19 [INFO] No neutron-gateway in deployment, skip pausing it.
2020-12-17 22:54:19 [INFO] Adding relation neutron-api-plugin-ovn -> neutron-api
2020-12-17 22:54:19 [WARNING] unknown delta type: id
2020-12-17 22:54:20 [WARNING] unknown delta type: id
2020-12-17 22:54:20 [INFO] Waiting for at least one unit with agent status "executing"
2020-12-17 22:54:20 [INFO] ERROR
2020-12-17 22:54:20 [INFO] ======================================================================
2020-12-17 22:54:20 [INFO] ERROR: test_ovs_ovn_migration (zaza.openstack.charm_tests.ovn.tests.OVSOVNMigrationTest)
2020-12-17 22:54:20 [INFO] Test migration of existing Neutron ML2+OVS deployment to OVN.
2020-12-17 22:54:20 [INFO] ----------------------------------------------------------------------
2020-12-17 22:54:20 [INFO] Traceback (most recent call last):
2020-12-17 22:54:20 [INFO] File "/home/ubuntu/charms/focal/neutron-openvswitch/.tox/func-smoke/lib/python3.8/site-packages/zaza/openstack/charm_tests/ovn/tests.py", line 180, in setUp
2020-12-17 22:54:20 [INFO] self._add_neutron_api_plugin_ovn_subordinate_relation()
2020-12-17 22:54:20 [INFO] File "/home/ubuntu/charms/focal/neutron-openvswitch/.tox/func-smoke/lib/python3.8/site-packages/zaza/openstack/charm_tests/ovn/tests.py", line 258, in _add_neutron_api_plugin_ovn_subordinate_relation
2020-12-17 22:54:20 [INFO] File "/home/ubuntu/charms/focal/neutron-openvswitch/.tox/func-smoke/lib/python3.8/site-packages/zaza/__init__.py", line 48, in _wrapper
2020-12-17 22:54:20 [INFO] File "/home/ubuntu/charms/focal/neutron-openvswitch/.tox/func-smoke/lib/python3.8/site-packages/zaza/__init__.py", line 36, in run
2020-12-17 22:54:20 [INFO] return task.result()
2020-12-17 22:54:20 [INFO] return await f(*args, **kwargs)
2020-12-17 22:54:20 [INFO] await model.block_until(
2020-12-17 22:54:20 [INFO] await utils.block_until(done,
2020-12-17 22:54:20 [INFO] await asyncio.wait_for(_block(), timeout, loop=loop)
2020-12-17 22:54:20 [INFO] while not all(c() for c in conditions):
2020-12-17 22:54:20 [INFO] while not all(c() for c in conditions):
2020-12-17 22:54:20 [INFO] return _disconnected() or all(c() for c in conditions)
2020-12-17 22:54:20 [INFO] lambda: one_agent_status(model, status), timeout=timeout)
2020-12-17 22:54:20 [INFO] check_model_for_hard_errors(model)
2020-12-17 22:54:20 [INFO] File "/home/ubuntu/charms/focal/neutron-openvswitch/.tox/func-smoke/lib/python3.8/site-packages/zaza/model.py", line 977, in check_model_for_hard_errors
2020-12-17 22:54:20 [INFO] zaza.model.UnitError: Units ovn-chassis/0,ovn-chassis/1 in error state
2020-12-17 22:54:20 [INFO] FAILED
2020-12-17 22:54:20 [INFO] (errors=1)
2020-12-17 22:54:20 [ERROR] {'migrate-ovn': 'zaza-017192566596'}
2020-12-17 22:54:20 [ERROR] Model migrate-ovn (zaza-017192566596)
2020-12-17 22:54:20 [WARNING] unknown delta type: id
2020-12-17 22:54:20 [ERROR] Applications in error state: ovn-chassis
Traceback (most recent call last):
  File "/home/ubuntu/charms/focal/neutron-openvswitch/.tox/func-smoke/bin/functest-run-suite", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/charms/focal/neutron-openvswitch/.tox/func-smoke/lib/python3.8/site-packages/zaza/charm_lifecycle/func_test_runner.py", line 272, in main
    func_test_runner(
  File "/home/ubuntu/charms/focal/neutron-openvswitch/.tox/func-smoke/lib/python3.8/site-packages/zaza/charm_lifecycle/func_test_runner.py", line 212, in func_test_runner
    run_env_deployment(env_deployment, keep_model=preserve_model,
  File "/home/ubuntu/charms/focal/neutron-openvswitch/.tox/func-smoke/lib/python3.8/site-packages/zaza/charm_lifecycle/func_test_runner.py", line 146, in run_env_deployment

So the test fails because the ovn-chassis units are in a hard error state.

Specifically the test fails at: https://github.com/openstack-charmers/zaza-openstack-tests/blob/master/zaza/openstack/charm_tests/ovn/tests.py#L258

Where:

ubuntu@coreycb-bastion:~/charms/focal/ovn-chassis$ juju status ovn-chassis
Model              Controller           Cloud/Region             Version  SLA          Timestamp
zaza-017192566596  coreycb-serverstack  serverstack/serverstack  2.8.3    unsupported  00:18:55Z

App                  Version  Status       Scale  Charm                Store       Rev  OS      Notes
neutron-openvswitch  16.2.0   maintenance      0  neutron-openvswitch  local         0  ubuntu
nova-compute         21.1.0   active           2  nova-compute         jujucharms  518  ubuntu
ovn-chassis          20.03.1  error            2  ovn-chassis          jujucharms   43  ubuntu

Unit                      Workload     Agent  Machine  Public address  Ports  Message
nova-compute/0*           active       idle   8        10.5.0.37              Unit is ready
  neutron-openvswitch/0*  maintenance  idle            10.5.0.37              Paused. Use 'resume' action to resume normal service.
  ovn-chassis/0*          error        idle            10.5.0.37              hook failed: "config-changed"
nova-compute/1            active       idle   9        10.5.0.29              Unit is ready
  neutron-openvswitch/1   maintenance  idle            10.5.0.29              Paused. Use 'resume' action to resume normal service.
  ovn-chassis/1           error        idle            10.5.0.29              hook failed: "config-changed"

Machine  State    DNS        Inst id                               Series  AZ    Message
8        started  10.5.0.37  4a37f83f-10f1-42ff-89c5-b37c0fa1b980  focal   nova  ACTIVE
9        started  10.5.0.29  a36e09e0-acd0-4c78-aa32-c0083631d61d  focal   nova  ACTIVE

ubuntu@juju-ba08dd-zaza-017192566596-8:~$ sudo cat /var/log/juju/unit-ovn-chassis-0.log
...
2020-12-17 22:54:14 ERROR juju-log Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-ovn-chassis-0/.venv/lib/python3.8/site-packages/charms/reactive/__init__.py", line 74, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-ovn-chassis-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-ovn-chassis-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-ovn-chassis-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-ovn-chassis-0/charm/reactive/layer_openstack.py", line 129, in default_request_certificates
    for cn, req in instance.get_certificate_requests().items():
  File "lib/charms/ovn_charm.py", line 303, in get_certificate_requests
    return {self.get_ovs_hostname(): {'sans': []}}
  File "lib/charms/ovn_charm.py", line 392, in get_ovs_hostname
    for row in ch_ovsdb.SimpleOVSDB('ovs-vsctl').open_vswitch:
  File "/var/lib/juju/agents/unit-ovn-chassis-0/.venv/lib/python3.8/site-packages/charmhelpers/contrib/network/ovs/ovsdb.py", line 221, in _find_tbl
    output = utils._run(*cmd)
  File "/var/lib/juju/agents/unit-ovn-chassis-0/.venv/lib/python3.8/site-packages/charmhelpers/contrib/network/ovs/utils.py", line 26, in _run
    return subprocess.check_output(args, universal_newlines=True)
  File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '('ovs-vsctl', '-f', 'json', 'find', 'open_vswitch')' returned non-zero exit status 1.

2020-12-17 22:54:14 ERROR juju.worker.uniter.operation runhook.go:136 hook "config-changed" (via explicit, bespoke hook script) failed: exit status 1

where:

ubuntu@juju-ba08dd-zaza-017192566596-8:~$ sudo ovs-vsctl -f json find open_vswitch
ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)
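
The missing socket could in principle be detected before ovs-vsctl is invoked, so the hook can back off instead of raising. Below is a minimal sketch of that idea, assuming the socket path from the error message above. The helper names ovsdb_socket_available and find_open_vswitch are hypothetical; they are not part of charmhelpers or the ovn-chassis charm, and whether the charm should behave this way is an open question.

```python
import json
import os
import stat
import subprocess

# Socket path taken from the ovs-vsctl error message in this report.
OVSDB_SOCKET = '/var/run/openvswitch/db.sock'


def ovsdb_socket_available(path=OVSDB_SOCKET):
    """Return True if a UNIX socket exists at `path` (hypothetical helper)."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False


def find_open_vswitch(path=OVSDB_SOCKET, run=subprocess.check_output):
    """Return parsed `ovs-vsctl -f json find open_vswitch` output, or None.

    None means the database could not be queried, for example because
    no socket exists at `path` or the command exited non-zero.
    """
    if not ovsdb_socket_available(path):
        return None
    try:
        out = run(('ovs-vsctl', '-f', 'json', 'find', 'open_vswitch'),
                  universal_newlines=True)
    except (subprocess.CalledProcessError, OSError):
        return None
    return json.loads(out)
```

A caller such as get_ovs_hostname could then treat a None result as "not ready yet" and defer its work. The `run` parameter exists only so the command can be replaced in tests.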

Should a paused ovn-chassis be running hooks? I'm not sure what the expectation is. It seems like it shouldn't be running this command while paused.

Note that neutron-openvswitch/tests/bundles/focal-ussuri-dvr-snat.yaml defines new ovn-chassis units as paused:

   ovn-chassis:
     charm: cs:~openstack-charmers-next/ovn-chassis
     options:
       # start new units paused to allow unit by unit OVS to OVN migration
       new-units-paused: true

Note that this happens right around the time neutron-openvswitch is paused:

ubuntu@juju-ba08dd-zaza-017192566596-8:~$ sudo cat /var/log/juju/unit-neutron-openvswitch-0.log
...
2020-12-17 22:44:58 WARNING juju-log Support for use of upstream ``apt_pkg`` module in conjunctionwith charm-helpers is deprecated since 2019-06-25
2020-12-17 22:54:25 WARNING juju-log Unit is pause or upgrading. Skipping config_changed

Corey Bryant (corey.bryant) wrote :

neutron-openvswitch has this in its config-changed:

@hooks.hook('neutron-plugin-relation-changed')
@hooks.hook('config-changed')
@restart_on_change(restart_map())
def config_changed():
    # if we are paused, delay doing any config changed hooks.
    # It is forced on the resume.
    if is_unit_paused_set():
        log("Unit is pause or upgrading. Skipping config_changed", "WARN")
        return
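
The same early-return guard can be factored into a decorator so that any hook can opt in. This is only a sketch of the pattern, not code from either charm: skip_when_paused and the `state` dict are hypothetical stand-ins, and in neutron-openvswitch the pause check would be is_unit_paused_set().

```python
import functools


def skip_when_paused(is_paused, log=print):
    """Build a decorator that turns a hook into a no-op while paused.

    `is_paused` is a zero-argument callable returning True when the
    unit is paused (hypothetical stand-in for is_unit_paused_set()).
    """
    def decorator(hook):
        @functools.wraps(hook)
        def wrapper(*args, **kwargs):
            if is_paused():
                log('Unit is paused. Skipping {}'.format(hook.__name__))
                return None
            return hook(*args, **kwargs)
        return wrapper
    return decorator


# Stand-in pause flag used only for this illustration.
state = {'paused': True}


@skip_when_paused(lambda: state['paused'])
def config_changed():
    return 'configured'
```

While state['paused'] is True, calling config_changed() logs a message and returns None. Once the flag is cleared, the wrapped hook body runs as usual.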

Corey Bryant (corey.bryant) wrote :

There are also several when_not('charm_paused') decorators throughout reactive charm layers so I wonder if we're missing one here.

Of particular consideration for this issue is:
File "/var/lib/juju/agents/unit-ovn-chassis-0/charm/reactive/layer_openstack.py", line 129, in default_request_certificates

 121 @reactive.when('certificates.available',
 122                'charms.openstack.do-default-certificates.available')
 123 def default_request_certificates():
 124     """When the certificates interface is available, this default handler
 125     requests TLS certificates.
 126     """
 127     tls = reactive.endpoint_from_flag('certificates.available')
 128     with charm.provide_charm_instance() as instance:
 129         for cn, req in instance.get_certificate_requests().items():
 130             tls.add_request_server_cert(cn, req['sans'])
 131         tls.request_server_certs()
 132         instance.assess_status()
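
To illustrate what an extra when_not guard would change, here is a toy model of flag-gated handlers. It is not charms.reactive and does not reproduce its dispatch rules in full. The `handler` and `dispatch` helpers are hypothetical, and the flag name 'charm_paused' is simply the one quoted above.

```python
# Toy model of flag-gated handlers; this is NOT charms.reactive.
flags = set()
handlers = []


def handler(when=(), when_not=()):
    """Register a function that runs only if every `when` flag is set
    and none of the `when_not` flags are set."""
    def register(func):
        handlers.append((frozenset(when), frozenset(when_not), func))
        return func
    return register


def dispatch():
    """Run each handler whose flag conditions hold; return their names."""
    ran = []
    for when, when_not, func in handlers:
        if when <= flags and not (when_not & flags):
            func()
            ran.append(func.__name__)
    return ran


@handler(when=('certificates.available',), when_not=('charm_paused',))
def request_certificates():
    # In the real handler this is where the OVSDB hostname lookup
    # (and therefore the ovs-vsctl call) would happen.
    pass
```

With both 'certificates.available' and 'charm_paused' in `flags`, dispatch() skips request_certificates. After 'charm_paused' is removed, the handler runs on the next dispatch.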
