Resize and shelve server tests fail intermittently in the multinode CI jobs
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| neutron | Fix Released | High | Slawek Kaplonski | |
Bug Description
I noticed failures of two tests in the Neutron CI multinode job. Both failures look similar to me at first glance, but if they are different issues, feel free to open another bug for one of them.
Failed tests:
tempest.
and
tempest.
Failure examples:
https:/
https:/
https:/
Stacktrace:
Traceback (most recent call last):
File "/opt/stack/
self.
File "/opt/stack/
waiters.
File "/opt/stack/
raise lib_exc.
tempest.
Details: (ServersNegativ
and:
Traceback (most recent call last):
File "/opt/stack/
waiters.
File "/opt/stack/
raise lib_exc.
tempest.
Details: (ServerActionsT
Changed in nova:
importance: Undecided → High
affects: nova → neutron
tags: added: neutron-proactive-backport-potential
Changed in neutron:
status: Confirmed → In Progress
Changed in neutron:
status: In Progress → Fix Released
tags: removed: neutron-proactive-backport-potential
So I looked at the first reported failed job [1], where unshelving timed out. Here is the grepped output with the relevant logs [2].
Sequence of events from the nova-compute perspective:
1) unshelving starts
?) (I don't see when the binding of the port happened, but it should be done by the compute at [3])
2) vif-plugged event is received but treated as unexpected and ignored
3) nova starts waiting for the vif-plugged event
5) nova plugs the vif
4) nova times out waiting for the vif-plugged event
Does Neutron send the vif-plugged event at bind time instead of plug time in this case?!
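The race described in the sequence above can be sketched in a few lines: if the vif-plugged notification is delivered before the compute service has registered a waiter for it, the event is dropped as "unexpected", and the later wait times out even though the event was sent. This is a simplified illustration, not nova's actual implementation; the `EventWaiter` class and its methods are hypothetical names for the sketch.

```python
# Hypothetical sketch (NOT nova code): shows how an event that arrives
# before anyone is waiting for it gets dropped, so the later wait()
# times out -- matching steps 2) and 4) above.
import queue


class EventWaiter:
    def __init__(self):
        self._waiting = False          # no waiter registered yet
        self._events = queue.Queue()

    def deliver(self, event):
        """Called when the network service sends an external event."""
        if not self._waiting:
            # Step 2: event treated as unexpected and ignored.
            return False
        self._events.put(event)
        return True

    def wait(self, timeout=0.1):
        """Step 3: compute starts waiting for the event."""
        self._waiting = True
        try:
            return self._events.get(timeout=timeout)
        except queue.Empty:
            # Step 4: the event was sent earlier but dropped.
            raise TimeoutError("vif-plugged event never arrived")


waiter = EventWaiter()
# Event sent early (e.g. at bind time), before the compute waits:
assert waiter.deliver("vif-plugged") is False   # dropped
try:
    waiter.wait(timeout=0.05)
except TimeoutError as exc:
    print(exc)   # times out even though the event was sent
```

If the event were instead delivered after `wait()` registers the waiter (i.e. sent at plug time rather than bind time), `deliver()` would queue it and `wait()` would return normally, which is the behavior the bug report expects.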
[1] https://zuul.opendev.org/t/openstack/build/bbf40b69b30d42a194af50f60915f9cd/logs
[2] https://paste.opendev.org/show/811535/
[3] https://github.com/openstack/nova/blob/9f296d775d8f58fcbd03393c81a023268c7071cb/nova/compute/manager.py#L6675