VMware VCDriver: A node crash, vSphere HA and badly timed _sync_power_states() will shut instances down
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Expired | Undecided | Unassigned |
Bug Description
Release: Icehouse; however, the code in Juno seems to be the same.
When a VMware node crashes, the instances will be restarted on a new node because of vSphere HA.
If _sync_power_
"Instance shutdown by itself. Calling the stop API."
On next _sync_power_
"Instance is not stopped. Calling the stop API.". This happens because vSphere HA has started instances meanwhile.
To my understanding, to fix this we need to either
1. change the logic (I don't have ideas unfortunately) or
2. add a config option that controls whether we force a stop when an instance is stopped from the database's point of view.
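The sequence above, together with option 2, can be sketched as a toy model. This is not nova's actual code: the state names, `sync_once`, `replay_crash`, and the `force_stop` flag are all hypothetical and exist only to illustrate the idea. Only the two "Calling the stop API" log messages are taken from the report.

```python
import logging

LOG = logging.getLogger("power_sync_sketch")

# Simplified stand-ins for nova's states; the names are illustrative only.
ACTIVE, STOPPED = "active", "stopped"      # vm_state as recorded in the database
RUNNING, SHUTDOWN = "running", "shutdown"  # power state reported by the hypervisor


def sync_once(instance, hv_power_state, force_stop=True):
    """Run one simplified sync pass and return the action taken.

    force_stop is a HYPOTHETICAL option (proposal 2); it does not exist in nova.
    """
    if instance["vm_state"] == ACTIVE and hv_power_state == SHUTDOWN:
        LOG.warning("Instance shutdown by itself. Calling the stop API.")
        instance["vm_state"] = STOPPED
        return "stop"
    if instance["vm_state"] == STOPPED and hv_power_state == RUNNING:
        if force_stop:
            LOG.warning("Instance is not stopped. Calling the stop API.")
            return "stop"
        LOG.warning("Instance is running but marked stopped; only logging.")
        return "log_only"
    return "none"


def replay_crash(force_stop):
    """Replay the reported sequence: crash, sync pass, HA restart, sync pass."""
    instance = {"vm_state": ACTIVE}
    first = sync_once(instance, SHUTDOWN, force_stop)  # node crashed, VM is down
    second = sync_once(instance, RUNNING, force_stop)  # vSphere HA restarted the VM
    return first, second
```

In this model, `replay_crash(True)` stops the instance that HA has just restarted, while `replay_crash(False)` leaves it running and only logs. Note that even with the flag off, the first pass still marks the instance as stopped in the database, so the database and the hypervisor remain out of sync until something reconciles them.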
description: | updated |
tags: | added: vmware |
Changed in nova: | |
status: | New → Confirmed |
Changed in nova: | |
assignee: | Gary Kotton (garyk) → nobody |
Changed in nova: | |
assignee: | nobody → Giridhar Jayavelu (gjayavelu) |
Changed in nova: | |
assignee: | Giridhar Jayavelu (gjayavelu) → nobody |
This issue can be avoided by setting sync_power_state_interval = -1 in nova.conf.
But that stops the periodic task completely, and you lose its main benefit: syncing the database state with the hypervisor state.
In vSphere, if HA is enabled on a cluster, we want to just log the discrepancy between power state and VM state, but not take the action of powering down the instances, as it is a transient state while the instances are being migrated to the new host. Unfortunately, there is no way at the compute/manager level to know this. Another solution is yet ANOTHER config variable to control whether the task only logs the warning or performs the power-off.