Neutron with ovn driver failed to start on Fedora

Bug #1911128 reported by Slawek Kaplonski on 2021-01-12
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
neutron
Critical
Bernard Cafarelli

Bug Description

Our periodic job neutron-ovn-tempest-ovs-master-fedora is failing since some time 100% of times. It fails due to failed neutron-server start. Error in the neutron logs is like below:

Jan 11 06:30:42.031451 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: /usr/local/lib64/python3.8/site-packages/sqlalchemy/orm/relationships.py:1994: SAWarning: Setting backref / back_populates on relationship QosNetworkPolicyBinding.port to refer to viewonly relationship Port.qos_network_policy_binding should include sync_backref=False set on the QosNetworkPolicyBinding.port relationship. (this warning may be suppressed after 10 occurrences)
Jan 11 06:30:42.031451 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: util.warn_limited(
Jan 11 06:30:42.032081 fedora-32-limestone-regionone-0022468210 neutron-server[89301]: ERROR neutron_lib.callbacks.manager [-] Error during notification for neutron.plugins.ml2.drivers.ovn.mech_driver.mech_driver.OVNMechanismDriver.post_fork_initialize-627086 process, after_init: Exception: Could not retrieve schema from ssl:10.4.70.225:6641
Jan 11 06:30:42.032081 fedora-32-limestone-regionone-0022468210 neutron-server[89301]: ERROR neutron_lib.callbacks.manager Traceback (most recent call last):
Jan 11 06:30:42.032081 fedora-32-limestone-regionone-0022468210 neutron-server[89301]: ERROR neutron_lib.callbacks.manager File "/usr/local/lib/python3.8/site-packages/neutron_lib/callbacks/manager.py", line 197, in _notify_loop
Jan 11 06:30:42.032081 fedora-32-limestone-regionone-0022468210 neutron-server[89301]: ERROR neutron_lib.callbacks.manager callback(resource, event, trigger, **kwargs)
Jan 11 06:30:42.032081 fedora-32-limestone-regionone-0022468210 neutron-server[89301]: ERROR neutron_lib.callbacks.manager File "/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 272, in post_fork_initialize
Jan 11 06:30:42.032081 fedora-32-limestone-regionone-0022468210 neutron-server[89301]: ERROR neutron_lib.callbacks.manager self._wait_for_pg_drop_event()
Jan 11 06:30:42.032081 fedora-32-limestone-regionone-0022468210 neutron-server[89301]: ERROR neutron_lib.callbacks.manager File "/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 340, in _wait_for_pg_drop_event
Jan 11 06:30:42.032081 fedora-32-limestone-regionone-0022468210 neutron-server[89301]: ERROR neutron_lib.callbacks.manager idl = ovsdb_monitor.OvnInitPGNbIdl.from_server(
Jan 11 06:30:42.032081 fedora-32-limestone-regionone-0022468210 neutron-server[89301]: ERROR neutron_lib.callbacks.manager File "/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py", line 582, in from_server
Jan 11 06:30:42.032081 fedora-32-limestone-regionone-0022468210 neutron-server[89301]: ERROR neutron_lib.callbacks.manager helper = idlutils.get_schema_helper(connection_string, schema_name)
Jan 11 06:30:42.032081 fedora-32-limestone-regionone-0022468210 neutron-server[89301]: ERROR neutron_lib.callbacks.manager File "/usr/local/lib/python3.8/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 204, in get_schema_helper
Jan 11 06:30:42.032081 fedora-32-limestone-regionone-0022468210 neutron-server[89301]: ERROR neutron_lib.callbacks.manager raise Exception("Could not retrieve schema from %s" % connection)
Jan 11 06:30:42.032081 fedora-32-limestone-regionone-0022468210 neutron-server[89301]: ERROR neutron_lib.callbacks.manager Exception: Could not retrieve schema from ssl:10.4.70.225:6641
Jan 11 06:30:42.032081 fedora-32-limestone-regionone-0022468210 neutron-server[89301]: ERROR neutron_lib.callbacks.manager
Jan 11 06:30:42.033155 fedora-32-limestone-regionone-0022468210 neutron-server[89302]: ERROR neutron_lib.callbacks.manager [-] Error during notification for neutron.plugins.ml2.drivers.ovn.mech_driver.mech_driver.OVNMechanismDriver.post_fork_initialize-627086 process, after_init: Exception: Could not retrieve schema from ssl:10.4.70.225:6641
Jan 11 06:30:42.033155 fedora-32-limestone-regionone-0022468210 neutron-server[89302]: ERROR neutron_lib.callbacks.manager Traceback (most recent call last):
Jan 11 06:30:42.033155 fedora-32-limestone-regionone-0022468210 neutron-server[89302]: ERROR neutron_lib.callbacks.manager File "/usr/local/lib/python3.8/site-packages/neutron_lib/callbacks/manager.py", line 197, in _notify_loop
Jan 11 06:30:42.033155 fedora-32-limestone-regionone-0022468210 neutron-server[89302]: ERROR neutron_lib.callbacks.manager callback(resource, event, trigger, **kwargs)
Jan 11 06:30:42.033155 fedora-32-limestone-regionone-0022468210 neutron-server[89302]: ERROR neutron_lib.callbacks.manager File "/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 272, in post_fork_initialize
Jan 11 06:30:42.033155 fedora-32-limestone-regionone-0022468210 neutron-server[89302]: ERROR neutron_lib.callbacks.manager self._wait_for_pg_drop_event()
Jan 11 06:30:42.033155 fedora-32-limestone-regionone-0022468210 neutron-server[89302]: ERROR neutron_lib.callbacks.manager File "/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 340, in _wait_for_pg_drop_event
Jan 11 06:30:42.033155 fedora-32-limestone-regionone-0022468210 neutron-server[89302]: ERROR neutron_lib.callbacks.manager idl = ovsdb_monitor.OvnInitPGNbIdl.from_server(
Jan 11 06:30:42.033155 fedora-32-limestone-regionone-0022468210 neutron-server[89302]: ERROR neutron_lib.callbacks.manager File "/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py", line 582, in from_server
Jan 11 06:30:42.033155 fedora-32-limestone-regionone-0022468210 neutron-server[89302]: ERROR neutron_lib.callbacks.manager helper = idlutils.get_schema_helper(connection_string, schema_name)
Jan 11 06:30:42.033155 fedora-32-limestone-regionone-0022468210 neutron-server[89302]: ERROR neutron_lib.callbacks.manager File "/usr/local/lib/python3.8/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 204, in get_schema_helper
Jan 11 06:30:42.033155 fedora-32-limestone-regionone-0022468210 neutron-server[89302]: ERROR neutron_lib.callbacks.manager raise Exception("Could not retrieve schema from %s" % connection)
Jan 11 06:30:42.033155 fedora-32-limestone-regionone-0022468210 neutron-server[89302]: ERROR neutron_lib.callbacks.manager Exception: Could not retrieve schema from ssl:10.4.70.225:6641
Jan 11 06:30:42.033155 fedora-32-limestone-regionone-0022468210 neutron-server[89302]: ERROR neutron_lib.callbacks.manager
Jan 11 06:30:42.033666 fedora-32-limestone-regionone-0022468210 neutron-server[89301]: INFO neutron.wsgi [-] (89301) wsgi starting up on http://0.0.0.0:19696
Jan 11 06:30:42.034612 fedora-32-limestone-regionone-0022468210 neutron-server[89302]: INFO neutron.wsgi [-] (89302) wsgi starting up on http://0.0.0.0:19696
Jan 11 06:30:42.037436 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: CRITICAL neutron [None req-4c1185cb-214e-4848-91b8-ea3b529f1d30 None None] Unhandled error: neutron_lib.callbacks.exceptions.CallbackFailure: Callback neutron.plugins.ml2.drivers.ovn.mech_driver.mech_driver.OVNMechanismDriver.pre_fork_initialize-627113 failed with "Could not retrieve schema from ssl:10.4.70.225:6641"
Jan 11 06:30:42.037436 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron Traceback (most recent call last):
Jan 11 06:30:42.037436 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron File "/usr/local/bin/neutron-server", line 10, in <module>
Jan 11 06:30:42.037436 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron sys.exit(main())
Jan 11 06:30:42.037436 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron File "/opt/stack/neutron/neutron/cmd/eventlet/server/__init__.py", line 19, in main
Jan 11 06:30:42.037436 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron server.boot_server(wsgi_eventlet.eventlet_wsgi_server)
Jan 11 06:30:42.037436 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron File "/opt/stack/neutron/neutron/server/__init__.py", line 68, in boot_server
Jan 11 06:30:42.037436 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron server_func()
Jan 11 06:30:42.037436 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron File "/opt/stack/neutron/neutron/server/wsgi_eventlet.py", line 24, in eventlet_wsgi_server
Jan 11 06:30:42.037436 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron neutron_api = service.serve_wsgi(service.NeutronApiService)
Jan 11 06:30:42.037436 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron File "/opt/stack/neutron/neutron/service.py", line 94, in serve_wsgi
Jan 11 06:30:42.037436 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron registry.publish(resources.PROCESS, events.BEFORE_SPAWN, service)
Jan 11 06:30:42.037436 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron File "/usr/local/lib/python3.8/site-packages/neutron_lib/callbacks/registry.py", line 60, in publish
Jan 11 06:30:42.037436 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron _get_callback_manager().publish(resource, event, trigger, payload=payload)
Jan 11 06:30:42.037436 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron File "/usr/local/lib/python3.8/site-packages/neutron_lib/callbacks/manager.py", line 149, in publish
Jan 11 06:30:42.037436 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron return self.notify(resource, event, trigger, payload=payload)
Jan 11 06:30:42.037436 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron File "/usr/local/lib/python3.8/site-packages/neutron_lib/db/utils.py", line 109, in _wrapped
Jan 11 06:30:42.037436 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron raise db_exc.RetryRequest(e)
Jan 11 06:30:42.038157 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron File "/usr/local/lib/python3.8/site-packages/oslo_utils/excutils.py", line 220, in __exit__
Jan 11 06:30:42.038157 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron self.force_reraise()
Jan 11 06:30:42.038157 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron File "/usr/local/lib/python3.8/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
Jan 11 06:30:42.038157 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron six.reraise(self.type_, self.value, self.tb)
Jan 11 06:30:42.038157 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron File "/usr/local/lib/python3.8/site-packages/six.py", line 703, in reraise
Jan 11 06:30:42.038157 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron raise value
Jan 11 06:30:42.038157 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron File "/usr/local/lib/python3.8/site-packages/neutron_lib/db/utils.py", line 104, in _wrapped
Jan 11 06:30:42.038157 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron return function(*args, **kwargs)
Jan 11 06:30:42.038157 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron File "/usr/local/lib/python3.8/site-packages/neutron_lib/callbacks/manager.py", line 174, in notify
Jan 11 06:30:42.038157 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron raise exceptions.CallbackFailure(errors=errors)
Jan 11 06:30:42.038157 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron neutron_lib.callbacks.exceptions.CallbackFailure: Callback neutron.plugins.ml2.drivers.ovn.mech_driver.mech_driver.OVNMechanismDriver.pre_fork_initialize-627113 failed with "Could not retrieve schema from ssl:10.4.70.225:6641"
Jan 11 06:30:42.038157 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: ERROR neutron
Jan 11 06:30:42.075747 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: /usr/local/lib64/python3.8/site-packages/sqlalchemy/orm/relationships.py:1994: SAWarning: Setting backref / back_populates on relationship Tag.standard_attr to refer to viewonly relationship StandardAttribute.tags should include sync_backref=False set on the Tag.standard_attr relationship. (this warning may be suppressed after 10 occurrences)
Jan 11 06:30:42.075747 fedora-32-limestone-regionone-0022468210 neutron-server[88725]: util.warn_limited(
Jan 11 06:30:42.267854 fedora-32-limestone-regionone-0022468210 neutron-server[89301]: INFO oslo_service.service [-] Parent process has died unexpectedly, exiting
Jan 11 06:30:42.267794 fedora-32-limestone-regionone-0022468210 systemd[1]: <email address hidden>: Main process exited, code=exited, status=1/FAILURE
Jan 11 06:30:42.268621 fedora-32-limestone-regionone-0022468210 neutron-server[89302]: INFO oslo_service.service [-] Parent process has died unexpectedly, exiting
Jan 11 06:30:42.268621 fedora-32-limestone-regionone-0022468210 neutron-server[89302]: INFO neutron.wsgi [-] (89302) wsgi exited, is_accepting=True
Jan 11 06:30:42.267851 fedora-32-limestone-regionone-0022468210 systemd[1]: <email address hidden>: Failed with result 'exit-code'.
Jan 11 06:30:42.269078 fedora-32-limestone-regionone-0022468210 neutron-server[89301]: INFO neutron.wsgi [-] (89301) wsgi exited, is_accepting=True
Jan 11 06:30:42.268577 fedora-32-limestone-regionone-0022468210 systemd[1]: <email address hidden>: Consumed 4.684s CPU time.

Link to logs of the failed job: https://zuul.openstack.org/build/4c2d09b13a8c4e9391433aa6382d344a/logs

Terry Wilson (otherwiseguy) wrote :

It's not connecting to ovsdb-server on 6641/6642. The worlddump log shows ovsdb-server running for both north and soutbound dbs, and the devstack log shows the connection being setup for ssl:6641/6642.

Using Vagrant with fedora/32-cloud-base locally and running devstack doesn't have a problem with connecting/starting neutron-server either. So this is almost certainly related to neutron running via wsgi and ml2/ovn not calling post_fork_initialize. I'm re-deploying with NEUTRON_DEPLOY_MOD_WSGI=true to verify.

Terry Wilson (otherwiseguy) wrote :

I can verify that with NEUTRON_DEPLOY_MOD_WSGI=true that nothing is connecting to the running ovn ovsdb servers.

[vagrant@localhost devstack]$ sudo lsof -itcp:6641
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
ovsdb-ser 103806 openvswitch 17u IPv4 207781 0t0 TCP *:6641 (LISTEN)

Terry Wilson (otherwiseguy) wrote :

I was wrong above (it was a different failure running with wsgi and the fedora test didn't do that), but it looks like https://review.opendev.org/c/openstack/devstack/+/735097 actually resolves this locally for me. ./stack.sh failed exactly as in the logs w/o the patch. Re-running with the patch applied completed successfully.

Bernard Cafarelli (bcafarel) wrote :

OK I was about to report new bug for this job, but we apparently already have one!

As can be seen from this bug dates, the job has failed for some time, confirmed with recent runs:
https://zuul.openstack.org/builds?job_name=neutron-ovn-tempest-ovs-master-fedora&project=openstack/neutron
And recent neutron-server failure log is still similar:
https://439bb5cdebadfdb73b9f-ace28e47b1213bdab56305960a7f8056.ssl.cf1.rackcdn.com/periodic/opendev.org/openstack/neutron/master/neutron-ovn-tempest-ovs-master-fedora/4b526e7/controller/logs/screen-q-svc.txt

Digging in other logs, I think I got the root cause though: TLS cert files are not readable by ovs daemons:
https://439bb5cdebadfdb73b9f-ace28e47b1213bdab56305960a7f8056.ssl.cf1.rackcdn.com/periodic/opendev.org/openstack/neutron/master/neutron-ovn-tempest-ovs-master-fedora/4b526e7/controller/logs/openvswitch/ovs-vswitchd_log.txt
https://439bb5cdebadfdb73b9f-ace28e47b1213bdab56305960a7f8056.ssl.cf1.rackcdn.com/periodic/opendev.org/openstack/neutron/master/neutron-ovn-tempest-ovs-master-fedora/4b526e7/controller/logs/openvswitch/ovsdb-server_log.txt

Both have many permission denied errors on opening cert files:
2021-03-08T06:28:04.106Z|00005|stream_ssl|ERR|/opt/stack/data/CA/int-ca/private/devstack-cert.key: stat failed (Permission denied)
2021-03-08T06:28:04.107Z|00006|stream_ssl|ERR|SSL_use_PrivateKey_file: error:0200100D:system library:fopen:Permission denied
2021-03-08T06:28:04.107Z|00007|stream_ssl|ERR|/opt/stack/data/CA/int-ca/private/devstack-cert.key: stat failed (Permission denied)

Which can explain why neutron-server has issue connecting then to ssl:6641/6642

Compared to a Focal deployment where ovsdb-server process runs as root, Fedora has it running as its own user openvswitch. And cert files are user-only readable by stack user

I will try a job update that disables TLS to see if other issues are hidden behind (as possible from previous comments by Terry), this could also be an acceptable fix for the job - and in that case a later devstack-general fix to have tls working with Fedora jobs

Changed in neutron:
assignee: nobody → Bernard Cafarelli (bcafarel)
status: Confirmed → In Progress
Bernard Cafarelli (bcafarel) wrote :

https://review.opendev.org/c/openstack/neutron/+/779494 disables tls-proxy and has the job green

Bernard Cafarelli (bcafarel) wrote :

https://review.opendev.org/c/openstack/neutron/+/779494 merged, next periodic run should be green

Changed in neutron:
status: In Progress → Fix Released

This issue was fixed in the openstack/neutron 18.0.0.0rc1 release candidate.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers