At the moment, if the cloud sustains a large number of VIF plugging timeouts, it leaks a large number of green threads, which can eventually cause the nova-compute process to stop reporting and responding.
Each timeout produces a traceback like the following:
===
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] Traceback (most recent call last):
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] File "/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", line 7230, in _create_guest_with_network
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] guest = self._create_guest(
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] File "/usr/lib/python3.8/contextlib.py", line 120, in __exit__
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] next(self.gen)
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] File "/var/lib/openstack/lib/python3.8/site-packages/nova/compute/manager.py", line 479, in wait_for_instance_event
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] actual_event = event.wait()
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] File "/var/lib/openstack/lib/python3.8/site-packages/eventlet/event.py", line 125, in wait
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] result = hub.switch()
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] File "/var/lib/openstack/lib/python3.8/site-packages/eventlet/hubs/hub.py", line 313, in switch
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] return self.greenlet.switch()
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] eventlet.timeout.Timeout: 300 seconds
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] During handling of the above exception, another exception occurred:
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] Traceback (most recent call last):
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] File "/var/lib/openstack/lib/python3.8/site-packages/nova/compute/manager.py", line 2409, in _build_and_run_instance
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] self.driver.spawn(context, instance, image_meta,
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] File "/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", line 4193, in spawn
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] self._create_guest_with_network(
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] File "/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", line 7256, in _create_guest_with_network
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] raise exception.VirtualInterfaceCreateException()
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]
===
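For context, this is the pattern behind the first exception: a green thread waits on an eventlet Event under an eventlet Timeout (waiting for neutron's network-vif-plugged notification), and the timeout is translated into VirtualInterfaceCreateException. A minimal sketch of that pattern, with hypothetical names, not nova's actual code:
===
# Minimal sketch; wait_for_vif_plugged and the exception class are
# stand-ins, not nova's real code.
from eventlet import event, timeout


class VirtualInterfaceCreateException(Exception):
    """Stand-in for nova.exception.VirtualInterfaceCreateException."""


def wait_for_vif_plugged(ev, deadline=300):
    try:
        with timeout.Timeout(deadline):
            # Parks this green thread in hub.switch() until another green
            # thread calls ev.send(); if the notification never arrives,
            # the Timeout is thrown into us right here.
            return ev.wait()
    except timeout.Timeout:
        raise VirtualInterfaceCreateException(
            'Virtual Interface creation failed')


if __name__ == '__main__':
    ev = event.Event()  # nobody will ever send() it
    try:
        wait_for_vif_plugged(ev, deadline=1)  # 1s instead of 300 for demo
    except VirtualInterfaceCreateException as exc:
        print('raised:', exc)
===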
Eventually, with enough of these, the nova-compute process hangs. The output of a Guru Meditation Report (GMR) shows roughly 6094 threads, around 3038 of which have the traceback below:
===
------ Green Thread ------
/var/lib/openstack/lib/python3.8/site-packages/eventlet/hubs/hub.py:355 in run
`self.fire_timers(self.clock())`
/var/lib/openstack/lib/python3.8/site-packages/eventlet/hubs/hub.py:476 in fire_timers
`timer()`
/var/lib/openstack/lib/python3.8/site-packages/eventlet/hubs/timer.py:59 in __call__
`cb(*args, **kw)`
/var/lib/openstack/lib/python3.8/site-packages/eventlet/hubs/__init__.py:151 in _timeout
`current.throw(exc)`
===
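That stack is eventlet's timeout machinery itself: the hub fires a timer whose callback throws the exception into the green thread parked in hub.switch() (hubs/__init__.py, _timeout -> current.throw(exc)). A rough illustration of that mechanism, using a plain RuntimeError and the public schedule_call_global/throw APIs:
===
# Rough illustration of the timer-throw mechanism captured in the
# GMR stack above; not nova or eventlet internals verbatim.
import eventlet
from eventlet import hubs


def parked():
    try:
        hubs.get_hub().switch()   # same parking spot as event.wait()
    except RuntimeError as exc:
        print('hub timer threw into this green thread:', exc)


gt = eventlet.spawn(parked)
eventlet.sleep(0)                 # let it reach hub.switch()

# Schedule a timer whose callback throws into the parked green thread,
# just like _timeout() -> current.throw(exc) in the trace above.
hubs.get_hub().schedule_call_global(0, gt.throw, RuntimeError('300 seconds'))
eventlet.sleep(0)
===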
In addition, 3039 of those threads show no traceback at all:
===
------ Green Thread ------
No Traceback!
===
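The "No Traceback!" entries are most likely green threads whose underlying greenlet no longer has a frame: the GMR generator walks every live greenlet object, and a greenlet that has finished (or never started) but is still referenced from somewhere has no stack left to render. Assuming that is indeed how the report behaves, a small demonstration:
===
# Assumption: the GMR green-thread generator renders a greenlet whose
# gr_frame is None as "No Traceback!". A finished-but-still-referenced
# green thread is exactly that.
import eventlet


def noop():
    pass


gt = eventlet.spawn(noop)
gt.wait()                 # let it run to completion
print(gt.dead)            # True: still referenced, so a GMR would list it
print(gt.gr_frame)        # None: nothing left to format as a traceback
===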
That makes 6077 green threads (3038 + 3039) in this odd state in total. We've had a discussion about this on IRC, and it seems it may be related to the use of `spawn_n`:
https://meetings.opendev.org/irclogs/%23openstack-nova/%23openstack-nova.2022-05-05.log.html#t2022-05-05T16:20:37
found this https://github.com/openstack/nova/blob/0190d585418f088728533334872820689642a9e3/nova/compute/manager.py#L479 which goes to https://github.com/openstack/nova/blob/0190d585418f088728533334872820689642a9e3/nova/network/model.py#L623
which references https://github.com/openstack/nova/blob/0190d585418f088728533334872820689642a9e3/nova/network/model.py#L590
and we've got a mix of calls to `.spawn_n` and `.spawn` ...
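The practical difference between the two, and why `spawn_n` makes these failures invisible, in a short sketch:
===
# Sketch contrasting eventlet.spawn and eventlet.spawn_n: spawn returns
# a GreenThread whose result (or exception) can be collected with
# .wait(), while spawn_n returns a bare greenlet with no way to observe
# how it ended.
import eventlet


def worker():
    raise RuntimeError('boom')


gt = eventlet.spawn(worker)
try:
    gt.wait()                    # the worker's exception reaches the caller
except RuntimeError as exc:
    print('spawn surfaced:', exc)

eventlet.spawn_n(worker)         # no GreenThread, no .wait(), no result
eventlet.sleep(0)                # exception is only dumped to stderr
===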