The job tripleo-ci-centos-8-standalone-ffu-wallaby is failing while the standalone node is being upgraded.
The error is:
2022-11-21 21:48:52 | 2022-11-21 21:48:52.584124 | bc764e10-0e1a-faaf-8d43-00000000220c | FATAL | Set connection | standalone | error={"changed": true, "cmd": "podman exec ovn_cluster_north_db_server bash -c \"ovn-nbctl --no-leader-only --inactivity-probe=60000 set-connection ptcp:6641:0.0.0.0\"\npodman exec ovn_cluster_south_db_server bash -c \"ovn-sbctl --no-leader-only --inactivity-probe=60000 set-connection ptcp:6642:0.0.0.0\"\n", "delta": "0:00:00.330020", "end": "2022-11-21 19:25:13.053914", "msg": "non-zero return code", "rc": 255, "start": "2022-11-21 19:25:12.723894", "stderr": "Error: can only create exec sessions on running containers: container state improper\nError: can only create exec sessions on running containers: container state improper", "stderr_lines": ["Error: can only create exec sessions on running containers: container state improper", "Error: can only create exec sessions on running containers: container state improper"], "stdout": "", "stdout_lines": []}
It can be seen at:
https://zuul.opendev.org/t/openstack/build/5fbff124520c41ff9e27ae1a3756cb34/log/logs/undercloud/home/zuul/standalone_upgrade.log#4568
Other logs:
var/log/containers/openvswitch/ovsdb-server-nb.log
2022-11-21T21:31:35.017Z|00013|reconnect|INFO|tcp:192.0.2.254:6641: connecting...
2022-11-21T21:31:39.020Z|00014|reconnect|INFO|tcp:192.0.2.254:6641: connection attempt timed out
2022-11-21T21:31:39.020Z|00015|reconnect|INFO|tcp:192.0.2.254:6641: continuing to reconnect in the background but suppressing further logging
2022-11-21T21:48:10.570Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
2022-11-21T21:48:10.571Z|00002|daemon_unix|WARN|/var/run/ovn/ovnnb_db.pid: stale pidfile for pid 136 being deleted by pid 0
2022-11-21T21:48:10.571Z|00003|daemon_unix|EMER|/var/run/ovn/ovnnb_db.pid: pidfile check failed (No such process), aborting
2022-11-21T21:48:15.219Z|00016|jsonrpc|WARN|unix#107: send error: Broken pipe
2022-11-21T21:48:15.220Z|00017|reconnect|WARN|unix#107: connection dropped (Broken pipe)
2022-11-21T21:48:15.226Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
2022-11-21T21:48:15.226Z|00002|daemon_unix|WARN|/var/run/ovn/ovnnb_db.pid: stale pidfile for pid 136 being deleted by pid 0
2022-11-21T21:48:15.226Z|00003|daemon_unix|EMER|/var/run/ovn/ovnnb_db.pid: pidfile check failed (No such process), aborting
2022-11-21T21:48:16.658Z|00018|jsonrpc|WARN|unix#110: send error: Broken pipe
2022-11-21T21:48:16.658Z|00019|reconnect|WARN|unix#110: connection dropped (Broken pipe)
2022-11-21T21:48:16.663Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
2022-11-21T21:48:16.663Z|00002|daemon_unix|WARN|/var/run/ovn/ovnnb_db.pid: stale pidfile for pid 136 being deleted by pid 0
var/log/containers/openvswitch/ovn-northd.log
2022-11-21T21:31:40.739Z|00004|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connected
2022-11-21T21:31:40.739Z|00005|reconnect|INFO|unix:/var/run/ovn/ovnsb_db.sock: connected
2022-11-21T21:31:40.739Z|00006|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
2022-11-21T21:48:46.797Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovn-northd.log
2022-11-21T21:48:46.801Z|00002|daemon_unix|WARN|/run/openvswitch/ovn-northd.pid: stale pidfile for pid 244 being deleted by pid 0
2022-11-21T21:48:46.801Z|00003|daemon_unix|EMER|/run/openvswitch/ovn-northd.pid: pidfile check failed (No such process), aborting
2022-11-21T21:48:51.875Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovn-northd.log
2022-11-21T21:48:51.875Z|00002|daemon_unix|WARN|/run/openvswitch/ovn-northd.pid: stale pidfile for pid 244 being deleted by pid 0
2022-11-21T21:48:51.875Z|00003|daemon_unix|EMER|/run/openvswitch/ovn-northd.pid: pidfile check failed (No such process), aborting
2022-11-21T21:48:53.050Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovn-northd.log
2022-11-21T21:48:53.050Z|00002|daemon_unix|WARN|/run/openvswitch/ovn-northd.pid: stale pidfile for pid 244 being deleted by pid 0
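Both logs above show the same restart loop: the daemon starts, finds a stale pidfile, and aborts with an EMER message. As a triage aid, here is a minimal Python sketch that counts those aborts per pidfile. The function name count_pidfile_aborts is hypothetical, and the "timestamp|seq|module|LEVEL|message" layout is assumed only from the lines pasted in this report.

```python
import re
from collections import Counter

# Assumed vlog layout, taken from the log excerpts in this report:
#   timestamp|sequence|module|LEVEL|message
VLOG_RE = re.compile(
    r"^(?P<ts>[^|]+)\|(?P<seq>\d+)\|(?P<module>[^|]+)\|(?P<level>[A-Z]+)\|(?P<msg>.*)$"
)
# Message form seen above: "<pidfile path>: pidfile check failed (...), aborting"
PIDFILE_RE = re.compile(r"^(?P<path>\S+): pidfile check failed")


def count_pidfile_aborts(lines):
    """Return a Counter mapping pidfile path -> number of EMER aborts.

    Hypothetical helper for triage; lines that do not match the assumed
    vlog layout (for example wrapped continuation lines) are skipped.
    """
    aborts = Counter()
    for line in lines:
        match = VLOG_RE.match(line.strip())
        if match is None or match.group("level") != "EMER":
            continue
        pidfile = PIDFILE_RE.match(match.group("msg"))
        if pidfile is not None:
            aborts[pidfile.group("path")] += 1
    return aborts


if __name__ == "__main__":
    import sys

    # Usage: python count_aborts.py < ovsdb-server-nb.log
    for path, count in count_pidfile_aborts(sys.stdin).items():
        print(f"{path}: {count} abort(s)")
```

A high count for a single pidfile within a few seconds would be consistent with the container being restarted repeatedly and never reaching the running state, which matches the "container state improper" error from podman exec.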
I took a quick look at this, and to me it looks like it may be an issue with the container's configuration. Here's what I see in var/log/extra/podman/containers/ovn_cluster_north_db_server/stdout.log on the undercloud:
Running command: 'bash -c $* -- eval source /etc/sysconfig/ovn_cluster; exec /usr/local/bin/start-nb-db-server ${OVN_NB_DB_OPTS}'
+ umask 0022
+ exec bash -c '$*' -- eval source '/etc/sysconfig/ovn_cluster;' exec /usr/local/bin/start-nb-db-server '${OVN_NB_DB_OPTS}'
Creating cluster database /etc/ovn/ovnnb_db.db ovsdb-tool: I/O error: /etc/ovn/ovnnb_db.db: failed to lock lockfile (Resource temporarily unavailable)
[FAILED]
ovsdb-server: /var/run/ovn/ovnnb_db.pid: pidfile check failed (No such process), aborting