ml2/ovn refuses to bind port due to dead agent randomly in the nova-live-migrate ci job
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Confirmed | High | sean mooney |
Bug Description
we have seen random failures of
test_volume_
in the nova-live-
Details: {'code': 400, 'message': 'Migration pre-check error: Binding failed for port e3308a61-
looking at the neutron log we see
May 09 00:10:26.714817 np0033982852 neutron-
May 09 00:10:26.716243 np0033982852 neutron-
and the following in the neutron-
May 09 00:10:23.765529 np0033982853 neutron-
This looks like it might be related to
https:/
This modified the code to add some randomness due to https:/
but that seems to negatively impact the stability of the agent.
To fix this I will propose a patch to change the interval from
interval = randint(0, cfg.CONF.
to
interval = randint(0, cfg.CONF.
to increase the likelihood that we send the heartbeat in time.
When we are making calls to privsep and ovs, the logs stop for multiple seconds while those operations are happening, and if that happens at the wrong time I believe this leads to us missing the heartbeat interval.
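To illustrate the reasoning above, here is a minimal sketch of why a smaller upper bound on the random delay leaves more headroom for a stall. It is not the neutron code: the function name `miss_probability` and the numbers used (a 75 second deadline, a 5 second stall, upper bounds of 75 and 37) are hypothetical values picked for illustration, since the actual config options are not shown in this report.

```python
from fractions import Fraction


def miss_probability(max_delay: int, stall: int, deadline: int) -> Fraction:
    """Chance that a randomized heartbeat delay plus a stall exceeds the deadline.

    Models delay = randint(0, max_delay), which is inclusive on both ends,
    followed by a blocking operation that lasts `stall` seconds.
    All three parameters are hypothetical and are in seconds.
    """
    outcomes = range(max_delay + 1)  # every value randint(0, max_delay) can return
    late = sum(1 for delay in outcomes if delay + stall > deadline)
    return Fraction(late, len(outcomes))


# Upper bound equal to the deadline: a stall can push the heartbeat past it.
print(miss_probability(max_delay=75, stall=5, deadline=75))  # 5/76

# Upper bound at roughly half the deadline: the same stall is absorbed.
print(miss_probability(max_delay=37, stall=5, deadline=75))  # 0
```

With the larger bound, any delay in the last `stall` seconds before the deadline results in a late heartbeat. With the smaller bound, the stall would have to be longer than the remaining headroom before a heartbeat is missed.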
Changed in neutron: | |
assignee: | nobody → sean mooney (sean-k-mooney) |
Changed in neutron: | |
status: | New → Confirmed |
tags: | added: ovn |
Changed in neutron: | |
importance: | Undecided → High |
Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/883687