I looked into this and agree with slawek that something is wrong on the Neutron OVN side. My findings are below.
Some data points:
- The issue is random: jobs succeed sometimes[1], so it is likely a race or events being missed.
- The issue is not specific to Wallaby; I can see similar failures in all releases (Train onwards)[2].
- The issue is not distro-specific; it is seen in C8, RHEL8 and C9 jobs[2].
- The issue has been happening for a long time: I could see failures from a month ago, and older logs are not persisted. References to logs from last month are in [2].
- The issue is also seen in jobs running with 1 controller[3], though I found only a few occurrences (I looked only at Wallaby and Train).
<< - and OVN reports status UP, but it's way too long after the VM was already deleted:
<< 2022-03-15 16:50:31.218 15 INFO neutron.plugins.ml2.drivers.ovn.mech_driver.mech_driver [req-dbbfd0fb-bec7-4a80-83af-c863ca531175 - - - - -] OVN reports status up for port: 6a712e97-bc61-49a0-aee6-66d4fcd7b72d
It seems the status UP above is triggered by the maintenance task instead (the PortBindingUpdateUpEvent was missed somehow): Fixing resource 6a712e97-bc61-49a0-aee6-66d4fcd7b72d (type: ports) at create/update check_for_inconsistencies /usr/lib/python3.9/site-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/maintenance.py:358
One thing I noticed in the logs I checked is that "OVN reports status up for port" is not logged after the PortBindingUpdateUpEvent. I couldn't work out what could cause this, since that log statement is the first one executed when the event is handled[4][5].
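To make the suspected failure mode concrete, here is a toy Python model. It is hypothetical: it is not Neutron or ovsdbapp code, and the classes, methods and the `missed_events` set are invented for illustration. It only assumes what is described above: the status-UP log line is the first thing the event handler does, and the maintenance task's consistency check acts as a later fallback. If the event never reaches the handler, the handler's log line is absent and the status only flips when the periodic pass runs, which can be long after the VM is gone.

```python
# Toy model only: hypothetical names, NOT Neutron/ovsdbapp code.
# It illustrates a status update that normally comes from an event handler,
# with a periodic consistency check as a late fallback when the event is missed.
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Port:
    port_id: str
    status: str = "DOWN"


@dataclass
class ToyOvnDriver:
    ports: Dict[str, Port] = field(default_factory=dict)
    # Events that never reach the handler (race, filtered out, lost, ...).
    missed_events: Set[str] = field(default_factory=set)
    log: List[str] = field(default_factory=list)

    def on_port_binding_up(self, port_id: str) -> None:
        """Event path: logging is the handler's first action, so a missing
        log line means the handler never ran for this port."""
        if port_id in self.missed_events:
            return  # handler not invoked: no log line, status stays DOWN
        self.log.append(f"event: OVN reports status up for port: {port_id}")
        self.ports[port_id].status = "UP"

    def maintenance_pass(self, ovn_up: Dict[str, bool]) -> None:
        """Fallback path: a periodic check repairs ports whose status
        disagrees with what OVN reports, possibly much later."""
        for port_id, port in self.ports.items():
            if ovn_up.get(port_id) and port.status != "UP":
                self.log.append(f"maintenance: Fixing resource {port_id} (type: ports)")
                self.log.append(f"maintenance: OVN reports status up for port: {port_id}")
                port.status = "UP"


driver = ToyOvnDriver(ports={"p1": Port("p1"), "p2": Port("p2")}, missed_events={"p2"})
driver.on_port_binding_up("p1")
driver.on_port_binding_up("p2")  # missed: no event-path log line for p2
driver.maintenance_pass(ovn_up={"p1": True, "p2": True})
for line in driver.log:
    print(line)
```

The `event:`/`maintenance:` prefixes exist only to tell the two paths apart in the model. In the log excerpt above, the same "OVN reports status up" message appears either way, which is why the preceding "Fixing resource ... (type: ports)" line from maintenance.py is the useful hint.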
Considering the PortBindingUpdateUpEvent, [6] looked suspicious; it has been backported as far as Ussuri. But since the issue is also seen in Train, [6] may have just increased reproducibility, or this may be a more general issue with event processing. I will involve Luis (author of [6]) and someone with a better understanding of this area.
[1]
https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-wallaby
https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-wallaby
[2]
https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-master/66b141d/logs/undercloud/var/log/tempest/stestr_results.html.gz
https://logserver.rdoproject.org/openstack-periodic-integration-stable1-cs9/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-wallaby/6b04066/logs/undercloud/var/log/tempest/stestr_results.html.gz
https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-wallaby/ee71b17/logs/undercloud/var/log/tempest/stestr_results.html.gz
https://logserver.rdoproject.org/openstack-periodic-integration-stable2/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-victoria/ef11bd8/logs/undercloud/var/log/tempest/stestr_results.html.gz
https://logserver.rdoproject.org/openstack-periodic-integration-stable3/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-ussuri/da5136f/logs/undercloud/var/log/tempest/stestr_results.html.gz
https://logserver.rdoproject.org/openstack-periodic-integration-stable4/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-train/1cc8115/stestr_results.html.gz
[3]
https://logserver.rdoproject.org/openstack-periodic-integration-stable4/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-train/cea1b09/stestr_results.html.gz
https://logserver.rdoproject.org/29/37029/24/check/periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-wallaby/99bccb6/stestr_results.html.gz
[4] https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py#L531
[5] https://github.com/openstack/neutron/blob/2d160d9eec6c09b01e4d7ae0507eb2d09527b576/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py#L1077
[6] https://review.opendev.org/q/Ib071889271f4e4d6acd83b219bf908a9ae80ce5c
<< i wonder if this may be related to https://bugs.launchpad.net/neutron/+bug/1961184 ?
This doesn't look related, based on the points above, and that bug only targets virtual ports.