Recently we seem to be hitting the same devstack build failure in many different gate jobs. The usual error message is:
+ lib/neutron_plugins/ovn_agent:start_ovn:714 : wait_for_db_file /var/lib/ovn/ovnsb_db.db
+ lib/neutron_plugins/ovn_agent:wait_for_db_file:175 : local count=0
+ lib/neutron_plugins/ovn_agent:wait_for_db_file:176 : '[' '!' -f /var/lib/ovn/ovnsb_db.db ']'
+ lib/neutron_plugins/ovn_agent:start_ovn:716 : is_service_enabled tls-proxy
+ functions-common:is_service_enabled:2089 : return 0
+ lib/neutron_plugins/ovn_agent:start_ovn:717 : sudo ovn-nbctl --db=unix:/var/run/ovn/ovnnb_db.sock set-ssl /opt/stack/data/CA/int-ca/private/devstack-cert.key /opt/stack/data/CA/int-ca/devstack-cert.crt /opt/stack/data/CA/int-ca/ca-chain.pem
ovn-nbctl: unix:/var/run/ovn/ovnnb_db.sock: database connection failed (No such file or directory)
+ lib/neutron_plugins/ovn_agent:start_ovn:1 : exit_trap
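For reference, the wait helpers in the trace above are simple polling loops; their shape can be read off the wait_for_db_file:175/176 lines. A paraphrased sketch, not the verbatim devstack source (the sleep interval and the 40-iteration bound are assumptions):

# Paraphrased sketch of the wait_for_db_file helper, reconstructed
# from the trace above; interval and bound are assumed, not verbatim.
function wait_for_db_file {
    local count=0
    # Poll until the OVN database file shows up, or give up.
    while [ ! -f "$1" ]; do
        sleep 1
        count=$((count + 1))
        if [ "$count" -gt 40 ]; then
            echo "DB file $1 did not appear in time" >&2
            return 1
        fi
    done
}

wait_for_sock_file is the same loop with '-S' (socket exists) in place of '-f', which matters for the race described further down.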
A few example logs:
https://zuul.opendev.org/t/openstack/build/ec852d75c8094afcb4140871bc9ffa36
https://zuul.opendev.org/t/openstack/build/eae988aa8cd24c78894a3d3438392357
The search expression 'message:"ovnnb_db.sock: database connection failed"' gives me 1200+ hits in https://opensearch.logs.openstack.org for the last 2 weeks.
I added some more filters and it gives 50 such results: https://opensearch.logs.openstack.org/_dashboards/app/discover/?security_tenant=global#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-30d,to:now))&_a=(columns:!(_source),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:build_name,negate:!t,params:(query:tripleo-ci-centos-9-standalone-external-compute-target-host),type:phrase),query:(match_phrase:(build_name:tripleo-ci-centos-9-standalone-external-compute-target-host))),('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:build_name,negate:!t,params:(query:tobiko-tripleo-minimal),type:phrase),query:(match_phrase:(build_name:tobiko-tripleo-minimal))),('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:filename,negate:!f,params:(query:job-output.txt),type:phrase),query:(match_phrase:(filename:job-output.txt)))),index:'94869730-aea8-11ec-9e6a-83741af3fdcd',interval:auto,query:(language:kuery,query:'message:%22ovnnb_db.sock:%20database%20connection%20failed%22%20AND%20build_status:FAILURE'),sort:!())
In the past we have seen the OVN services take a long time to start and the db files take a while to become available; in those cases increasing the timeout helped: https://review.opendev.org/c/openstack/devstack/+/848548.
But now it's a slightly different issue: the service takes time to stop, and in that window (which can be less than a second, as seen in the example below) wait_for_sock_file still finds the stale socket file and returns true, so the script moves forward, and the later connection to those .sock files fails because the service has not finished restarting by that time.
2023-01-11 09:24:11.273593 | controller | + lib/neutron_plugins/ovn_agent:_start_process:239 : sudo systemctl restart ovn-central.service
2023-01-11 09:24:11.295863 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:711 : wait_for_sock_file /var/run/ovn/ovnnb_db.sock
2023-01-11 09:24:11.298605 | controller | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:186 : local count=0
2023-01-11 09:24:11.300757 | controller | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:187 : '[' '!' -S /var/run/ovn/ovnnb_db.sock ']'
2023-01-11 09:24:11.303155 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:712 : wait_for_sock_file /var/run/ovn/ovnsb_db.sock
2023-01-11 09:24:11.305826 | controller | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:186 : local count=0
2023-01-11 09:24:11.308367 | controller | + lib/neutron_plugins/ovn_agent:wait_for_sock_file:187 : '[' '!' -S /var/run/ovn/ovnsb_db.sock ']'
2023-01-11 09:24:11.310862 | controller | + lib/neutron_plugins/ovn_agent:start_ovn:713 : wait_for_db_file /var/lib/ovn/ovnnb_db.db
2023-01-11 09:24:11.313570 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:175 : local count=0
2023-01-11 09:24:11.316126 | controller | + lib/neutron_plugins/ovn_agent:wait_for_db_file:176 : '[' '!' -f /var/lib/ovn/ovnnb_db.db ']'
2023-01-11 09:24:11.319726 | controller | + lib/neutron_...
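One way to close this race would be to wait for an actual OVSDB connection to succeed instead of only testing that the socket file exists. A minimal sketch, assuming ovsdb-client is available on the node; wait_for_db_connection is a hypothetical helper, not existing devstack code:

# Hypothetical hardening sketch: a stale socket file can pass the
# '-S' test while the old ovsdb-server is still shutting down, so
# poll until a real OVSDB connection succeeds.
function wait_for_db_connection {
    local db_sock=$1
    local count=0
    # ovsdb-client list-dbs opens a connection to the server behind
    # the socket; it fails while the daemon is down or restarting.
    while ! sudo ovsdb-client list-dbs "unix:$db_sock" >/dev/null 2>&1; do
        sleep 1
        count=$((count + 1))
        if [ "$count" -gt 40 ]; then
            echo "OVSDB at $db_sock did not accept connections in time" >&2
            return 1
        fi
    done
}

# Usage, with the paths from the trace above:
# wait_for_db_connection /var/run/ovn/ovnnb_db.sock
# wait_for_db_connection /var/run/ovn/ovnsb_db.sock

That would make start_ovn tolerant of the sub-second window where systemctl restart has not yet torn down the old socket.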