network-vif-plugged event timeouts during resize-confirm can result in VMs entering an error state with a mix of the old and new flavor
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack Compute (nova) | Triaged | Medium | Unassigned | | |
Bug Description
If a network-vif-plugged event times out during resize confirm, the VM will enter an error state.
If the VM is not using NUMA, then a hard reboot should be enough to fix that.
If it has a NUMA topology the instance_
In this case the VM can try to boot with the instance NUMA topology for the new flavor on the dest host but with the flavor.vcpus from the old flavor.
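The mixed state can be illustrated with a small consistency check. This is only a sketch: `NumaCell` and `topology_matches_flavor` are hypothetical stand-ins written for this report, not nova objects or nova APIs, and it assumes the guest CPU count of a topology is simply the total number of CPUs across its cells.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class NumaCell:
    # Hypothetical stand-in for one guest NUMA cell; not nova's own object.
    cpuset: Set[int]


def topology_matches_flavor(cells: List[NumaCell], flavor_vcpus: int) -> bool:
    """Return True when the cells together hold exactly flavor_vcpus CPUs."""
    total = sum(len(cell.cpuset) for cell in cells)
    return total == flavor_vcpus


# Topology generated for a new 4-vCPU flavor (illustrative values only).
new_topology = [NumaCell({0, 1}), NumaCell({2, 3})]

consistent = topology_matches_flavor(new_topology, 4)  # new flavor.vcpus
mixed = topology_matches_flavor(new_topology, 2)       # old flavor.vcpus
print(consistent, mixed)
```

The second call models the situation described in this bug: the topology belongs to the new flavor while the vCPU count still comes from the old one, so the check fails.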
Ideally, if we have such a failure, the VM should either revert to verify_resize or you should be able to do resize_confirm again to try and finish the resize.
Alternatively, we could provide a nova-manage command to help fix the embedded flavor and/or the flavor in the request spec and reconcile those with the instance NUMA topology.
The intent would be to ensure it is possible to recover the VM either with a second confirm or by using the nova-manage command and then hard rebooting the instance.
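The decision such a reconciliation step would need to make can be sketched as follows. No such nova-manage command exists in this report; `pick_matching_flavor` is a hypothetical helper, and it assumes the only signal available is the CPU count of the instance NUMA topology compared with the vCPU counts of the old and new flavors.

```python
from typing import Optional


def pick_matching_flavor(topology_cpus: int,
                         old_vcpus: int,
                         new_vcpus: int) -> Optional[str]:
    """Say which flavor agrees with the NUMA topology's CPU count.

    Returns "new" or "old", or None when neither flavor matches and an
    operator has to inspect the instance by hand. If both flavors have the
    same vCPU count there is no CPU mismatch and "new" is returned.
    """
    if topology_cpus == new_vcpus:
        return "new"
    if topology_cpus == old_vcpus:
        return "old"
    return None


# Illustrative values: the topology was built for the new 4-vCPU flavor.
print(pick_matching_flavor(topology_cpus=4, old_vcpus=2, new_vcpus=4))
```

Once the matching flavor is known, the embedded flavor and the request spec flavor could be set to it before hard rebooting the instance.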
If the instance NUMA topology and the flavor disagree on the CPU count, the error presents in the log like this:
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 274, in dispatch
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 79, in wrapped
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server     function_name, call_dict, binary, tb)
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server     self.force_reraise()
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server     raise value
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 69, in wrapped
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server     return f(self, context, *args, **kw)
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 191, in decorated_function
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server     "Error: %s", e, instance=instance)
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server     self.force_reraise()
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server     raise value
2023-01-17 23:34:32.972 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-pac...