devstack build in the gate fails with: ovnnb_db.sock: database connection failed

Bug #2002629 reported by Bence Romsics
This bug affects 2 people
Affects    Status         Importance   Assigned to   Milestone
devstack   Fix Released   Undecided    yatin

Bug Description

Recently we seem to be hitting the same devstack build failure in many different gate jobs. The usual error message is:

+ lib/neutron_plugins/ovn_agent:start_ovn:714 : wait_for_db_file /var/lib/ovn/ovnsb_db.db
+ lib/neutron_plugins/ovn_agent:wait_for_db_file:175 : local count=0
+ lib/neutron_plugins/ovn_agent:wait_for_db_file:176 : '[' '!' -f /var/lib/ovn/ovnsb_db.db ']'
+ lib/neutron_plugins/ovn_agent:start_ovn:716 : is_service_enabled tls-proxy
+ functions-common:is_service_enabled:2089 : return 0
+ lib/neutron_plugins/ovn_agent:start_ovn:717 : sudo ovn-nbctl --db=unix:/var/run/ovn/ovnnb_db.sock set-ssl /opt/stack/data/CA/int-ca/private/devstack-cert.key /opt/stack/data/CA/int-ca/devstack-cert.crt /opt/stack/data/CA/int-ca/ca-chain.pem
ovn-nbctl: unix:/var/run/ovn/ovnnb_db.sock: database connection failed (No such file or directory)
+ lib/neutron_plugins/ovn_agent:start_ovn:1 : exit_trap

A few example logs:

https://zuul.opendev.org/t/openstack/build/ec852d75c8094afcb4140871bc9ffa36
https://zuul.opendev.org/t/openstack/build/eae988aa8cd24c78894a3d3438392357

The search expression 'message:"ovnnb_db.sock: database connection failed"' gives me 1200+ hits in https://opensearch.logs.openstack.org for the last 2 weeks.

yatin (yatinkarel) wrote :

<< The search expression 'message:"ovnnb_db.sock: database connection failed"' gives me 1200+ hits in https://opensearch.logs.openstack.org for the last 2 weeks.

I added some more filters, and with those it gives 50 such results:
https://opensearch.logs.openstack.org/_dashboards/app/discover/?security_tenant=global#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-30d,to:now))&_a=(columns:!(_source),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:build_name,negate:!t,params:(query:tripleo-ci-centos-9-standalone-external-compute-target-host),type:phrase),query:(match_phrase:(build_name:tripleo-ci-centos-9-standalone-external-compute-target-host))),('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:build_name,negate:!t,params:(query:tobiko-tripleo-minimal),type:phrase),query:(match_phrase:(build_name:tobiko-tripleo-minimal))),('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:filename,negate:!f,params:(query:job-output.txt),type:phrase),query:(match_phrase:(filename:job-output.txt)))),index:'94869730-aea8-11ec-9e6a-83741af3fdcd',interval:auto,query:(language:kuery,query:'message:%22ovnnb_db.sock:%20database%20connection%20failed%22%20AND%20build_status:FAILURE'),sort:!())

In the past we have seen the services take a long time to start and the db files take a long time to become available; for those cases increasing the timeout helped: https://review.opendev.org/c/openstack/devstack/+/848548.
But this is a slightly different issue. Here the service takes time to stop, and in that window (less than a second in the example below) wait_for_sock_file returns true because the old socket file is still present, so the script moves forward. The later connection to those .sock files then fails because the service has not finished restarting by that time.

2023-01-11 09:24:11.273593 | controller | + lib/neutron_plugins/ovn_agent:_start_process:239 : sudo systemctl restart ovn-central.service
2023-01-11 09:24:11.295863 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:711 : wait_for_sock_file /var/run/ovn/ovnnb_db.sock
2023-01-11 09:24:11.298605 | controller | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:186 : local count=0
2023-01-11 09:24:11.300757 | controller | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:187 : '[' '!' -S /var/run/ovn/ovnnb_db.sock ']'
2023-01-11 09:24:11.303155 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:712 : wait_for_sock_file /var/run/ovn/ovnsb_db.sock
2023-01-11 09:24:11.305826 | controller | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:186 : local count=0
2023-01-11 09:24:11.308367 | controller | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:187 : '[' '!' -S /var/run/ovn/ovnsb_db.sock ']'
2023-01-11 09:24:11.310862 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:713 : wait_for_db_file /var/lib/ovn/ovnnb_db.db
2023-01-11 09:24:11.313570 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:175 : local count=0
2023-01-11 09:24:11.316126 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:176 : '[' '!' -f /var/lib/ovn/ovnnb_db.db ']'
2023-01-11 09:24:11.319726 | controller | + lib/neutron_...
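
To make the race easier to see, here is a minimal sketch of a polling helper in the spirit of wait_for_sock_file and wait_for_db_file. It is not the devstack implementation: the function name wait_for_path_sketch, its arguments, and the default timeout are assumptions chosen for illustration. The only parts taken from the trace above are the `-S` and `-f` existence tests and the count variable.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT devstack's actual code.
# Usage: wait_for_path_sketch <test-flag> <path> [timeout-seconds]
#   <test-flag> is a unary test(1) operator such as -S (socket) or -f (file).
wait_for_path_sketch() {
    local flag=$1 path=$2 timeout=${3:-5} count=0
    # Poll once per second until `test <flag> <path>` succeeds.
    while ! test "$flag" "$path"; do
        if [ "$count" -ge "$timeout" ]; then
            echo "timed out waiting for $path" >&2
            return 1
        fi
        sleep 1
        count=$((count + 1))
    done
    # The path exists. Nothing here tells us whether it was created by the
    # freshly restarted service or left behind by the instance that is
    # still shutting down.
    return 0
}
```

With a check like this, `wait_for_path_sketch -S /var/run/ovn/ovnnb_db.sock` succeeds on the first iteration whenever a socket file from the previous instance is still on disk, which matches the sub-second timestamps in the trace above.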

Changed in neutron:
importance: Undecided → High
Changed in devstack:
assignee: nobody → yatin (yatinkarel)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to devstack (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/devstack/+/869969

Changed in devstack:
status: New → In Progress
Bence Romsics (bence-romsics) wrote :

Removing neutron from the affected projects, since Yatin found the cause in devstack.

no longer affects: neutron
OpenStack Infra (hudson-openstack) wrote : Fix merged to devstack (master)

Reviewed: https://review.opendev.org/c/openstack/devstack/+/869969
Committed: https://opendev.org/openstack/devstack/commit/7fecba2f135f16204050b627bb850a87aa597bad
Submitter: "Zuul (22348)"
Branch: master

commit 7fecba2f135f16204050b627bb850a87aa597bad
Author: yatinkarel <email address hidden>
Date: Thu Jan 12 17:31:36 2023 +0530

    [OVN] Ensure socket files are absent in init_ovn

    Just like we remove db files let's also remove
    socket files when initializing ovn. Those will
    reappear once service fully restarts along with
    db files. Without it we see random issue as
    described in the below bug.

    Closes-Bug: #2002629
    Change-Id: I726a9cac9c805d017273aa79e844724f0d00cdf0
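
As a rough illustration of the idea in this commit message (not the merged diff itself), the cleanup could look like the sketch below. The function name clear_ovn_state_sketch and its two directory arguments are assumptions made for this example; in the failing jobs the directories were /var/lib/ovn and /var/run/ovn, where removal would also need root privileges.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of "remove socket files just like db files".
# Function name and arguments are assumptions, not devstack's code.
# Usage: clear_ovn_state_sketch <db-dir> <run-dir>
clear_ovn_state_sketch() {
    local db_dir=$1 run_dir=$2
    # -f keeps rm quiet (and successful) when a glob matches nothing.
    rm -f "$db_dir"/*.db "$run_dir"/*.sock
}
```

Once stale sockets are gone before the restart, a later existence check on the .sock path can only succeed after the restarted service has created a new socket.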

Changed in devstack:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to devstack (stable/zed)

Fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/devstack/+/870505

OpenStack Infra (hudson-openstack) wrote : Fix proposed to devstack (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/devstack/+/870506

OpenStack Infra (hudson-openstack) wrote : Fix merged to devstack (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/devstack/+/870505
Committed: https://opendev.org/openstack/devstack/commit/74dbd6ee8d293bbc0e343f98a76ceaab5f76d140
Submitter: "Zuul (22348)"
Branch: stable/zed

commit 74dbd6ee8d293bbc0e343f98a76ceaab5f76d140
Author: yatinkarel <email address hidden>
Date: Thu Jan 12 17:31:36 2023 +0530

    [OVN] Ensure socket files are absent in init_ovn

    Just like we remove db files let's also remove
    socket files when initializing ovn. Those will
    reappear once service fully restarts along with
    db files. Without it we see random issue as
    described in the below bug.

    Closes-Bug: #2002629
    Change-Id: I726a9cac9c805d017273aa79e844724f0d00cdf0
    (cherry picked from commit 7fecba2f135f16204050b627bb850a87aa597bad)

tags: added: in-stable-zed
OpenStack Infra (hudson-openstack) wrote : Fix merged to devstack (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/devstack/+/870506
Committed: https://opendev.org/openstack/devstack/commit/b0fbae15a65590171412a3429cbe2d786ce43789
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit b0fbae15a65590171412a3429cbe2d786ce43789
Author: yatinkarel <email address hidden>
Date: Thu Jan 12 17:31:36 2023 +0530

    [OVN] Ensure socket files are absent in init_ovn

    Just like we remove db files let's also remove
    socket files when initializing ovn. Those will
    reappear once service fully restarts along with
    db files. Without it we see random issue as
    described in the below bug.

    Closes-Bug: #2002629
    Change-Id: I726a9cac9c805d017273aa79e844724f0d00cdf0
    (cherry picked from commit 7fecba2f135f16204050b627bb850a87aa597bad)

tags: added: in-stable-yoga