'Socket /var/run/openvswitch/ovnnb_db.sock not found' during ovn_start

Bug #1980421 reported by melanie witt
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
devstack
Fix Released
Undecided
Unassigned
neutron
Invalid
High
Brian Haley

Bug Description

This is the same as bug 1942697 [1] and I'm opening a new bug per comment #2.

I've seen this in the gate occasionally, ovn fails to start because a socket file it expects is not present. Here's an example trace [2]:

[...]

2022-06-10 06:16:20.567 | + lib/neutron_plugins/ovn_agent:start_ovn:709 : _start_process ovn-central.service
2022-06-10 06:16:20.569 | + lib/neutron_plugins/ovn_agent:_start_process:237 : sudo systemctl daemon-reload
2022-06-10 06:16:21.393 | + lib/neutron_plugins/ovn_agent:_start_process:238 : sudo systemctl enable ovn-central.service
2022-06-10 06:16:23.834 | + lib/neutron_plugins/ovn_agent:_start_process:239 : sudo systemctl restart ovn-central.service
2022-06-10 06:16:24.814 | + lib/neutron_plugins/ovn_agent:start_ovn:714 : wait_for_sock_file /var/run/openvswitch/ovnnb_db.sock
2022-06-10 06:16:24.816 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:186 : local count=0
2022-06-10 06:16:24.818 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:187 : '[' '!' -S /var/run/openvswitch/ovnnb_db.sock ']'
2022-06-10 06:16:24.821 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:188 : sleep 1
2022-06-10 06:16:26.152 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:189 : count=1
2022-06-10 06:16:26.156 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:190 : '[' 1 -gt 5 ']'
2022-06-10 06:16:26.160 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:187 : '[' '!' -S /var/run/openvswitch/ovnnb_db.sock ']'
2022-06-10 06:16:26.163 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:188 : sleep 1
2022-06-10 06:16:27.170 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:189 : count=2
2022-06-10 06:16:27.174 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:190 : '[' 2 -gt 5 ']'
2022-06-10 06:16:27.178 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:187 : '[' '!' -S /var/run/openvswitch/ovnnb_db.sock ']'
2022-06-10 06:16:27.181 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:188 : sleep 1
2022-06-10 06:16:28.188 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:189 : count=3
2022-06-10 06:16:28.192 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:190 : '[' 3 -gt 5 ']'
2022-06-10 06:16:28.196 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:187 : '[' '!' -S /var/run/openvswitch/ovnnb_db.sock ']'
2022-06-10 06:16:28.200 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:188 : sleep 1
2022-06-10 06:16:29.207 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:189 : count=4
2022-06-10 06:16:29.211 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:190 : '[' 4 -gt 5 ']'
2022-06-10 06:16:29.215 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:187 : '[' '!' -S /var/run/openvswitch/ovnnb_db.sock ']'
2022-06-10 06:16:29.218 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:188 : sleep 1
2022-06-10 06:16:30.225 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:189 : count=5
2022-06-10 06:16:30.229 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:190 : '[' 5 -gt 5 ']'
2022-06-10 06:16:30.233 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:187 : '[' '!' -S /var/run/openvswitch/ovnnb_db.sock ']'
2022-06-10 06:16:30.236 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:188 : sleep 1
2022-06-10 06:16:31.243 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:189 : count=6
2022-06-10 06:16:31.247 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:190 : '[' 6 -gt 5 ']'
2022-06-10 06:16:31.251 | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:191 : die 191 'Socket /var/run/openvswitch/ovnnb_db.sock not found'
2022-06-10 06:16:31.255 | + functions-common:die:264 : local exitcode=0
2022-06-10 06:16:31.259 | [Call Trace]
2022-06-10 06:16:31.259 | ./stack.sh:1277:start_ovn_services
2022-06-10 06:16:31.260 | /opt/stack/devstack/lib/neutron-legacy:512:start_ovn
2022-06-10 06:16:31.260 | /opt/stack/devstack/lib/neutron_plugins/ovn_agent:714:wait_for_sock_file
2022-06-10 06:16:31.260 | /opt/stack/devstack/lib/neutron_plugins/ovn_agent:191:die
2022-06-10 06:16:31.266 | [ERROR] /opt/stack/devstack/lib/neutron_plugins/ovn_agent:191 Socket /var/run/openvswitch/ovnnb_db.sock not found
2022-06-10 06:16:32.272 | exit_trap: cleaning up child processes
2022-06-10 06:16:32.275 | Error on exit

opensearch query:

11 hits in the last 30 days, check and gate queues, all failures

https://opensearch.logs.openstack.org/_dashboards/app/discover?security_tenant=global#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-30d,to:now))&_a=(columns:!(build_change,build_status,build_queue),filters:!(),index:'94869730-aea8-11ec-9e6a-83741af3fdcd',interval:auto,query:(language:lucene,query:'message:%20%22%5BERROR%5D%22%20AND%20message:%22Socket%20%2Fvar%2Frun%2Fopenvswitch%2Fovnnb_db.sock%20not%20found%22%20AND%20tags:devstack'),sort:!())

[1] https://bugs.launchpad.net/devstack/+bug/1942697
[2] https://zuul.opendev.org/t/openstack/build/435a3650d30c40668ae06a3de7efb70f/log/controller/logs/devstacklog.txt#13448

melanie witt (melwitt)
description: updated
Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

Adding neutron as affected project as I don't know how to tackle this within the QA team.

Revision history for this message
yatin (yatinkarel) wrote :

Ok so it aligns to what we have seen in neutron-designate-tempest job, more details in https://bugs.launchpad.net/neutron/+bug/1970679/comments/5

I checked in above logs, it took 6 - 28 seconds for ovn db sockets to be available and thus failed, couldn't find what's the exact reason to be this much slow sometimes, may be slow infra nodes or some bug in ovn packages? as mentioned in other bug increasing timeout[1] should atleast help here.

[1] https://github.com/openstack/devstack/blob/master/lib/neutron_plugins/ovn_agent#L190

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to devstack (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/devstack/+/848548

Changed in devstack:
status: New → In Progress
Changed in neutron:
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to devstack (master)

Reviewed: https://review.opendev.org/c/openstack/devstack/+/848548
Committed: https://opendev.org/openstack/devstack/commit/bd6e5205b115fb0cafed7f50a676699a4b9fc0fe
Submitter: "Zuul (22348)"
Branch: master

commit bd6e5205b115fb0cafed7f50a676699a4b9fc0fe
Author: Dr. Jens Harbott <email address hidden>
Date: Sun Jul 3 22:27:15 2022 +0200

    Increase timeout waiting for OVN startup

    We see some cases where OVN startup takes much longer than 5 seconds, up
    to 28 seconds have been observed, so increase the limit to 40 to be on
    the safe side.

    Signed-off-by: Dr. Jens Harbott <email address hidden>
    Closes-Bug: 1980421
    Change-Id: I6da4a537e6a8d527ff71a821f07164fc7d342882

Changed in devstack:
status: In Progress → Fix Released
Revision history for this message
yatin (yatinkarel) wrote :
Revision history for this message
Pedro Arduengo (petarducu) wrote :

You can resolve with a couple of modifications on lines 714 and 715, replace de OVS_RUNDIR by OVN_RUNDIR...

Revision history for this message
Vincent Gerris (vgerris) wrote :

That did not solve the problem for me. Would be great if this script can be made more reliable. It's not very idempotent either.
It worked fine a few weeks ago on another machine, now I can't get it to work properly.

Revision history for this message
Amit Uniyal (auniyal) wrote (last edit ):

I often get this on devstack redployment.

============= Logs ==========
[Call Trace]
./stack.sh:1273:start_ovn_services
/opt/stack/devstack/lib/neutron-legacy:521:start_ovn
/opt/stack/devstack/lib/neutron_plugins/ovn_agent:714:wait_for_sock_file
/opt/stack/devstack/lib/neutron_plugins/ovn_agent:191:die
[ERROR] /opt/stack/devstack/lib/neutron_plugins/ovn_agent:191 Socket /var/run/openvswitch/ovnnb_db.sock not found
Error on exit

Revision history for this message
Brian Haley (brian-haley) wrote :

I have seen this as well, and some tweaks to use OVN_RUNDIR on a number of lines was necessary, as mentioned in #6.

Part of this is due to this line in install_ovn():

  sudo ln -s $OVS_RUNDIR $OVN_RUNDIR

On my test system I had already installed OVS and OVN packages, so it is making the symlink in /var/run/ovn:

/var/run$ ls -l ovn
total 20
lrwxrwxrwx 1 root root 20 Oct 12 21:34 openvswitch -> /var/run/openvswitch

It should have made /var/run/ovn -> /var/run/openvswitch

I'll send a patch that I hope will address the issue.

Changed in neutron:
assignee: nobody → Brian Haley (brian-haley)
status: New → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to devstack (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/devstack/+/861915

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to devstack (master)

Reviewed: https://review.opendev.org/c/openstack/devstack/+/861915
Committed: https://opendev.org/openstack/devstack/commit/71c99655479174750bcedfe458328328a1596766
Submitter: "Zuul (22348)"
Branch: master

commit 71c99655479174750bcedfe458328328a1596766
Author: Brian Haley <email address hidden>
Date: Wed Oct 19 14:08:43 2022 -0400

    Use separate OVS and OVN directories

    If stack.sh is run on a system that already has OVN packages
    installed, it could fail to find its DB sockets. This is because
    the 'ln -s' will place the symlink inside of /var/run/ovn
    instead of using a single directory as intended.

    Change the code in neutron_plugins/ovn_agent to not make the
    symlink and instead use separate directories for OVS and OVN.

    Closes-bug: #1980421

    Change-Id: Ic28a93bdc3dfe4a6159234baeabd0064db452b07

Changed in neutron:
status: Confirmed → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.