Unit status flapping: Services not running that should be: neutron-ovn-metadata-agent

Bug #1907178 reported by Nobuto Murata
This bug affects 1 person
Affects            Status   Importance  Assigned to  Milestone
charm-ovn-chassis  Triaged  High        Unassigned

Bug Description

I see the unit status flapping on multiple units between two statuses:
- Unit is ready
- Services not running that should be: neutron-ovn-metadata-agent

When I logged into the unit, neutron-ovn-metadata-agent was actually running, and it looked like the charm was just waiting for another update-status hook execution to refresh the status.

However, it's annoying because it prevents the model from settling down, which is needed before moving forward with subsequent steps such as `juju run-action ceilometer-upgrade`.

$ juju show-status-log ovn-chassis/31
Time                   Type       Status     Message
07 Dec 2020 16:23:11Z  juju-unit  executing  running nova-compute-relation-changed hook
07 Dec 2020 16:23:31Z  juju-unit  executing  running ovsdb-relation-changed hook
07 Dec 2020 16:23:53Z  juju-unit  executing  running ovsdb-relation-joined hook
07 Dec 2020 16:24:11Z  juju-unit  executing  running ovsdb-relation-changed hook
07 Dec 2020 16:24:19Z  juju-unit  idle
07 Dec 2020 16:55:30Z  workload   waiting    'certificates' awaiting server certificate data, 'ovsdb' incomplete
07 Dec 2020 17:00:27Z  juju-unit  executing  running certificates-relation-changed hook
07 Dec 2020 17:00:37Z  juju-unit  idle
07 Dec 2020 17:01:23Z  juju-unit  executing  running ovsdb-relation-changed hook
07 Dec 2020 17:01:39Z  workload   waiting    'ovsdb' incomplete
07 Dec 2020 17:04:44Z  juju-unit  idle
07 Dec 2020 17:33:32Z  workload   active     Unit is ready
07 Dec 2020 17:39:06Z  workload   blocked    Services not running that should be: neutron-ovn-metadata-agent
07 Dec 2020 20:42:11Z  workload   active     Unit is ready
07 Dec 2020 20:47:39Z  workload   blocked    Services not running that should be: neutron-ovn-metadata-agent
08 Dec 2020 02:22:25Z  workload   active     Unit is ready
08 Dec 2020 02:27:23Z  workload   blocked    Services not running that should be: neutron-ovn-metadata-agent
08 Dec 2020 02:41:46Z  workload   active     Unit is ready
08 Dec 2020 02:46:10Z  workload   blocked    Services not running that should be: neutron-ovn-metadata-agent
08 Dec 2020 02:50:51Z  workload   active     Unit is ready
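
The timestamps in this status log can be checked for a pattern: how long does each "Unit is ready" window last before the unit goes back to blocked? The following is a minimal Python sketch, assuming the log is available as plain text in the format shown above. The function names `parse_workload_entries` and `active_window_seconds` are hypothetical helpers written for this illustration; they are not part of Juju or the charm.

```python
from datetime import datetime

# Workload lines copied from the status log in this report.
STATUS_LOG = """\
07 Dec 2020 17:33:32Z workload active Unit is ready
07 Dec 2020 17:39:06Z workload blocked Services not running that should be: neutron-ovn-metadata-agent
07 Dec 2020 20:42:11Z workload active Unit is ready
07 Dec 2020 20:47:39Z workload blocked Services not running that should be: neutron-ovn-metadata-agent
08 Dec 2020 02:22:25Z workload active Unit is ready
08 Dec 2020 02:27:23Z workload blocked Services not running that should be: neutron-ovn-metadata-agent
08 Dec 2020 02:41:46Z workload active Unit is ready
08 Dec 2020 02:46:10Z workload blocked Services not running that should be: neutron-ovn-metadata-agent
08 Dec 2020 02:50:51Z workload active Unit is ready
"""


def parse_workload_entries(text):
    """Return (timestamp, status, message) for every 'workload' line."""
    entries = []
    for line in text.splitlines():
        # Fields: day, month, year, time, type, status, optional message.
        fields = line.split(None, 6)
        if len(fields) < 6 or fields[4] != "workload":
            continue  # skips the header, blank lines and juju-unit lines
        timestamp = datetime.strptime(" ".join(fields[:4]), "%d %b %Y %H:%M:%SZ")
        message = fields[6] if len(fields) > 6 else ""
        entries.append((timestamp, fields[5], message))
    return entries


def active_window_seconds(entries):
    """Length in seconds of each active window that ended in blocked."""
    windows = []
    for (start, status, _), (end, next_status, _) in zip(entries, entries[1:]):
        if status == "active" and next_status == "blocked":
            windows.append(int((end - start).total_seconds()))
    return windows


print(active_window_seconds(parse_workload_entries(STATUS_LOG)))
# prints [334, 328, 298, 264]
```

Each active window in this excerpt lasts roughly five minutes, which is about one update-status interval (the charm log later in this report shows update-status running at 02:27:20 and 02:32:21). That would be consistent with the status only being re-evaluated on update-status, though it does not by itself explain why the service check alternates.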

Revision history for this message
Nobuto Murata (nobuto) wrote :

Hmm, this is probably due to an incorrectly generated cert. However, the charm should error out at that point instead of flapping the status. I'm trying to gather logs around it.

Nobuto Murata (nobuto) wrote :

Nothing suspicious in the charm unit log, so it may be hard to address this from the charm side. Marking this as incomplete for the time being.

====

08 Dec 2020 02:41:46Z workload active Unit is ready
08 Dec 2020 02:46:10Z workload blocked Services not running that should be: neutron-ovn-metadata-agent

====

2020-12-08 02:27:20 INFO juju-log Reactive main running for hook update-status
2020-12-08 02:27:20 INFO juju-log Invoking reactive handler: reactive/layer_openstack.py:59:default_update_status
2020-12-08 02:27:20 INFO juju-log Invoking reactive handler: reactive/ovn_chassis_charm_handlers.py:28:enable_chassis_reactive_code
2020-12-08 02:27:20 INFO juju-log Invoking reactive handler: reactive/layer_openstack.py:77:check_really_is_update_status
2020-12-08 02:27:20 INFO juju-log Invoking reactive handler: reactive/layer_openstack.py:88:run_default_update_status
2020-12-08 02:27:20 INFO juju-log Invoking reactive handler: reactive/layer_openstack.py:121:default_request_certificates
2020-12-08 02:27:20 INFO juju-log Invoking reactive handler: reactive/layer_openstack.py:163:default_config_rendered
2020-12-08 02:27:22 WARNING update-status Synchronizing state of neutron-ovn-metadata-agent.service with SysV service script with /lib/systemd/systemd-sysv-install.
2020-12-08 02:27:22 WARNING update-status Executing: /lib/systemd/systemd-sysv-install enable neutron-ovn-metadata-agent
2020-12-08 02:27:23 INFO juju-log Invoking reactive handler: hooks/relations/juju-info/requires.py:24:broken:juju-info
2020-12-08 02:27:23 INFO juju-log Invoking reactive handler: hooks/relations/tls-certificates/requires.py:79:joined:certificates
2020-12-08 02:27:23 INFO juju-log Invoking reactive handler: hooks/relations/ovsdb/requires.py:34:joined:ovsdb
2020-12-08 02:27:23 INFO juju-log ovsdb: OVSDBRequires -> joined
2020-12-08 02:27:23 INFO juju-log ovsdb: OVSDBRequires -> joined
2020-12-08 02:27:23 INFO juju-log Invoking reactive handler: hooks/relations/ovsdb-subordinate/provides.py:104:broken:ovsdb-subordinate
2020-12-08 02:27:24 INFO juju.worker.uniter.operation runhook.go:142 ran "update-status" hook (via explicit, bespoke hook script)
2020-12-08 02:32:21 INFO juju-log Reactive main running for hook update-status
2020-12-08 02:32:21 INFO juju-log Invoking reactive handler: reactive/layer_openstack.py:59:default_update_status
2020-12-08 02:32:21 INFO juju-log Invoking reactive handler: reactive/ovn_chassis_charm_handlers.py:28:enable_chassis_reactive_code
2020-12-08 02:32:21 INFO juju-log Invoking reactive handler: reactive/layer_openstack.py:77:check_really_is_update_status
2020-12-08 02:32:21 INFO juju-log Invoking reactive handler: reactive/layer_openstack.py:88:run_default_update_status
2020-12-08 02:32:21 INFO juju-log Invoking reactive handler: reactive/layer_openstack.py:121:default_request_certificates
2020-12-08 02:32:21 INFO juju-log Invoking reactive handler: reactive/layer_openstack.py:163:default_config_rendered
2020-12-08 02:32:23 WARNING update-status Synchronizing state of neutron-ovn-metadata-agent.service with SysV service script with /lib/systemd/systemd-sysv-install.
2020-12-08 02:32:23 WARNING update-status Executing: /lib/systemd/systemd...

Changed in charm-ovn-chassis:
status: New → Incomplete
Nobuto Murata (nobuto) wrote :

Dec 08 02:45:55 ps5-rb1-n4 systemd[1]: Started Neutron OVN Metadata Agent.
Dec 08 02:45:56 ps5-rb1-n4 sudo[1047095]: root : TTY=unknown ; PWD=/root ; USER=root ; COMMAND=/usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf privsep-helper --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/neutron_ovn_metadata_agent.ini --privsep_context neutron.privileged.default --privsep_sock_path /tmp/tmpsew1dsp6/privsep.sock
Dec 08 02:45:56 ps5-rb1-n4 sudo[1047095]: pam_unix(sudo:session): session opened for user root by (uid=0)
Dec 08 02:45:57 ps5-rb1-n4 sudo[1047095]: pam_unix(sudo:session): session closed for user root
Dec 08 02:45:57 ps5-rb1-n4 systemd[1]: neutron-ovn-metadata-agent.service: Main process exited, code=exited, status=1/FAILURE
Dec 08 02:45:57 ps5-rb1-n4 systemd[1]: neutron-ovn-metadata-agent.service: Failed with result 'exit-code'.
Dec 08 02:45:57 ps5-rb1-n4 systemd[1]: neutron-ovn-metadata-agent.service: Scheduled restart job, restart counter is at 15173.
Dec 08 02:45:57 ps5-rb1-n4 systemd[1]: Stopped Neutron OVN Metadata Agent.
Dec 08 02:45:57 ps5-rb1-n4 systemd[1]: Started Neutron OVN Metadata Agent.
Dec 08 02:45:58 ps5-rb1-n4 sudo[1047120]: root : TTY=unknown ; PWD=/root ; USER=root ; COMMAND=/usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf privsep-helper --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/neutron_ovn_metadata_agent.ini --privsep_context neutron.privileged.default --privsep_sock_path /tmp/tmp8ydl_4_4/privsep.sock
Dec 08 02:45:58 ps5-rb1-n4 sudo[1047120]: pam_unix(sudo:session): session opened for user root by (uid=0)
Dec 08 02:45:59 ps5-rb1-n4 sudo[1047120]: pam_unix(sudo:session): session closed for user root
Dec 08 02:45:59 ps5-rb1-n4 systemd[1]: neutron-ovn-metadata-agent.service: Main process exited, code=exited, status=1/FAILURE
Dec 08 02:45:59 ps5-rb1-n4 systemd[1]: neutron-ovn-metadata-agent.service: Failed with result 'exit-code'.
Dec 08 02:45:59 ps5-rb1-n4 systemd[1]: neutron-ovn-metadata-agent.service: Scheduled restart job, restart counter is at 15174.
Dec 08 02:45:59 ps5-rb1-n4 systemd[1]: Stopped Neutron OVN Metadata Agent.
Dec 08 02:45:59 ps5-rb1-n4 systemd[1]: Started Neutron OVN Metadata Agent.
Dec 08 02:46:01 ps5-rb1-n4 sudo[1047145]: root : TTY=unknown ; PWD=/root ; USER=root ; COMMAND=/usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf privsep-helper --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/neutron_ovn_metadata_agent.ini --privsep_context neutron.privileged.default --privsep_sock_path /tmp/tmpjxz33cog/privsep.sock
Dec 08 02:46:01 ps5-rb1-n4 sudo[1047145]: pam_unix(sudo:session): session opened for user root by (uid=0)
Dec 08 02:46:01 ps5-rb1-n4 sudo[1047145]: pam_unix(sudo:session): session closed for user root
Dec 08 02:46:02 ps5-rb1-n4 systemd[1]: neutron-ovn-metadata-agent.service: Main process exited, code=exited, status=1/FAILURE
Dec 08 02:46:02 ps5-rb1-n4 systemd[1]: neutron-ovn-metadata-agent.service: Failed with result 'exit-code'.
Dec 08 02:46:02 ps5-rb1-n4 systemd[1]: neutron-ovn-metadata-agent.service: Scheduled restart job, restart counter is at 15175.
Dec 08 02:46:02 ps5-rb1-n4 systemd[1]: Stopped Neutron OVN M...
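
The journal excerpt above shows systemd restarting the agent every few seconds, with the restart counter already above 15000. A small Python sketch, assuming journal lines in exactly the format pasted here, pulls out the counter values and the gaps between restarts. The names `restart_events` and `restart_gaps_seconds` are hypothetical and exist only for this illustration.

```python
import re
from datetime import datetime

# "Scheduled restart job" lines copied from the journal excerpt in this report.
JOURNAL = """\
Dec 08 02:45:57 ps5-rb1-n4 systemd[1]: neutron-ovn-metadata-agent.service: Scheduled restart job, restart counter is at 15173.
Dec 08 02:45:59 ps5-rb1-n4 systemd[1]: neutron-ovn-metadata-agent.service: Scheduled restart job, restart counter is at 15174.
Dec 08 02:46:02 ps5-rb1-n4 systemd[1]: neutron-ovn-metadata-agent.service: Scheduled restart job, restart counter is at 15175.
"""

RESTART_RE = re.compile(
    r"^\w{3} \d{2} (\d{2}:\d{2}:\d{2}) \S+ systemd\[\d+\]: "
    r"(\S+)\.service: Scheduled restart job, restart counter is at (\d+)\.$"
)


def restart_events(text):
    """Return (time, unit, counter) for each scheduled-restart line."""
    events = []
    for line in text.splitlines():
        match = RESTART_RE.match(line)
        if match:
            when = datetime.strptime(match.group(1), "%H:%M:%S")
            events.append((when, match.group(2), int(match.group(3))))
    return events


def restart_gaps_seconds(events):
    """Seconds between consecutive restarts (assumes all lines are from one day)."""
    return [
        int((later[0] - earlier[0]).total_seconds())
        for earlier, later in zip(events, events[1:])
    ]


events = restart_events(JOURNAL)
print([counter for _, _, counter in events])  # prints [15173, 15174, 15175]
print(restart_gaps_seconds(events))           # prints [2, 3]
```

With the process being started again every two to three seconds, a point-in-time check can plausibly catch it in the brief "running" state between a start and the next exit. That may be why the service looked like it was running on login and why update-status occasionally reports "Unit is ready"; this is an interpretation of the logs, not something confirmed in this report.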

Nobuto Murata (nobuto) wrote :

2020-12-08 02:46:58.032 1049476 CRITICAL neutron [-] Unhandled error: ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Chassis with name=c56b5ed2-f31f-4a55-b53a-471252d920d7
2020-12-08 02:46:58.032 1049476 ERROR neutron Traceback (most recent call last):
2020-12-08 02:46:58.032 1049476 ERROR neutron File "/usr/bin/neutron-ovn-metadata-agent", line 10, in <module>
2020-12-08 02:46:58.032 1049476 ERROR neutron sys.exit(main())
2020-12-08 02:46:58.032 1049476 ERROR neutron File "/usr/lib/python3/dist-packages/neutron/cmd/eventlet/agents/ovn_metadata.py", line 17, in main
2020-12-08 02:46:58.032 1049476 ERROR neutron metadata_agent.main()
2020-12-08 02:46:58.032 1049476 ERROR neutron File "/usr/lib/python3/dist-packages/neutron/agent/ovn/metadata_agent.py", line 39, in main
2020-12-08 02:46:58.032 1049476 ERROR neutron agt.start()
2020-12-08 02:46:58.032 1049476 ERROR neutron File "/usr/lib/python3/dist-packages/neutron/agent/ovn/metadata/agent.py", line 229, in start
2020-12-08 02:46:58.032 1049476 ERROR neutron self.register_metadata_agent()
2020-12-08 02:46:58.032 1049476 ERROR neutron File "/usr/lib/python3/dist-packages/neutron/agent/ovn/metadata/agent.py", line 239, in register_metadata_agent
2020-12-08 02:46:58.032 1049476 ERROR neutron self.sb_idl.db_add(table, self.chassis, 'external_ids',
2020-12-08 02:46:58.032 1049476 ERROR neutron File "/usr/lib/python3/dist-packages/ovsdbapp/backend/ovs_idl/command.py", line 40, in execute
2020-12-08 02:46:58.032 1049476 ERROR neutron t.add(self)
2020-12-08 02:46:58.032 1049476 ERROR neutron File "/usr/lib/python3.8/contextlib.py", line 120, in __exit__
2020-12-08 02:46:58.032 1049476 ERROR neutron next(self.gen)
2020-12-08 02:46:58.032 1049476 ERROR neutron File "/usr/lib/python3/dist-packages/ovsdbapp/api.py", line 119, in transaction
2020-12-08 02:46:58.032 1049476 ERROR neutron del self._nested_txns_map[cur_thread_id]
2020-12-08 02:46:58.032 1049476 ERROR neutron File "/usr/lib/python3/dist-packages/ovsdbapp/api.py", line 69, in __exit__
2020-12-08 02:46:58.032 1049476 ERROR neutron self.result = self.commit()
2020-12-08 02:46:58.032 1049476 ERROR neutron File "/usr/lib/python3/dist-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 62, in commit
2020-12-08 02:46:58.032 1049476 ERROR neutron raise result.ex
2020-12-08 02:46:58.032 1049476 ERROR neutron File "/usr/lib/python3/dist-packages/ovsdbapp/backend/ovs_idl/connection.py", line 122, in run
2020-12-08 02:46:58.032 1049476 ERROR neutron txn.results.put(txn.do_commit())
2020-12-08 02:46:58.032 1049476 ERROR neutron File "/usr/lib/python3/dist-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 89, in do_commit
2020-12-08 02:46:58.032 1049476 ERROR neutron command.run_idl(txn)
2020-12-08 02:46:58.032 1049476 ERROR neutron File "/usr/lib/python3/dist-packages/ovsdbapp/backend/ovs_idl/command.py", line 157, in run_idl
2020-12-08 02:46:58.032 1049476 ERROR neutron record = self.api.lookup(self.table, self.record)
2020-12-08 02:46:58.032 1049476 ERROR neutron File "/usr/lib/python3/dist-packages/ovsdbapp/backend/ovs_idl/__init__.py", line 107, in loo...
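
The first line of the traceback above carries the useful detail: which table the agent looked in and which row name it could not find. A minimal Python sketch, assuming the CRITICAL line format shown in this report, extracts both so the name can be compared by hand against the Chassis table in the OVN Southbound database. `missing_row` is a hypothetical helper, not part of neutron or ovsdbapp.

```python
import re

# CRITICAL line copied from the agent log in this report.
LOG_LINE = (
    "2020-12-08 02:46:58.032 1049476 CRITICAL neutron [-] Unhandled error: "
    "ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: "
    "Cannot find Chassis with name=c56b5ed2-f31f-4a55-b53a-471252d920d7"
)

ROW_NOT_FOUND_RE = re.compile(r"RowNotFound: Cannot find (\w+) with name=(\S+)")


def missing_row(line):
    """Return (table, name) from a RowNotFound log line, or None if absent."""
    match = ROW_NOT_FOUND_RE.search(line)
    return (match.group(1), match.group(2)) if match else None


print(missing_row(LOG_LINE))
# prints ('Chassis', 'c56b5ed2-f31f-4a55-b53a-471252d920d7')
```

If the extracted name never appears in the Southbound Chassis table, the agent exits during `register_metadata_agent()` on every start, which would match the restart loop seen in the journal.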

Aurelien Lourot (aurelien-lourot) wrote :

Seen on OSCI on focal-ussuri:

https://review.opendev.org/c/openstack/charm-neutron-gateway/+/770412
https://openstack-ci-reports.ubuntu.com/artifacts/test_charm_pipeline_func_full/openstack/charm-neutron-gateway/770412/1/7758/index.html

All ovn-chassis instances are blocked with "Services not running that should be: neutron-ovn-metadata-agent"

Changed in charm-ovn-chassis:
status: Incomplete → Triaged
importance: Undecided → High
Liam Young (gnuoy) wrote :

I think Comment #5 is a different bug and I've raised https://bugs.launchpad.net/charm-ovn-chassis/+bug/1912471 to cover it.
