Taskmanager resize/migration actions Exception does not properly handle failures.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack DBaaS (Trove) | Fix Released | Undecided | Joe Cruz |
Bug Description
* First, in case of a failure, the Taskmanager should call the guest methods that restart MySQL only if the Nova server is ACTIVE.
When resizing, Reddwarf checks that the Nova status switches from ACTIVE to VERIFY_RESIZE. If anything goes wrong, it restarts MySQL regardless. If the MySQL app can be restarted, why not?
However, we need to add code that first checks that the Nova server status is ACTIVE and only then makes the call to restart.
Why?
Let's say a customer is resizing their database and triggers a migration. As the migration runs, the network is cut, causing the migration to error out. Nova sets the status to ERROR.
The Reddwarf taskmanager code sees this and sends a message to restart MySQL.
If the server isn't running, the guest doesn't pick up the message until the server is turned back on. What happens then depends on the state of the system at that time.
If the original server is still running, the guest may pick up the message. In theory, if the volume was disconnected during the Nova migration but the Nova server is otherwise OK, MySQL may start up and create a new database in place of the old one. In this scenario it might look as if data was deleted. Fixing this would require ops to stop MySQL, delete the new database, then reattach the volume and start MySQL.
We can avoid this theoretical scenario by checking that the server is in ACTIVE status before restarting the guest.
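A minimal sketch of the guard described in the first point, in Python. `Server`, `Guest` and `recover_after_failed_resize` are hypothetical stand-ins, not Reddwarf's actual classes; the only detail taken from the report is that the restart should be requested only when the Nova status is `ACTIVE`.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical stand-in for a Nova server record; only the status matters here."""
    status: str


@dataclass
class Guest:
    """Hypothetical stand-in for the guest agent; counts restart requests."""
    restart_calls: int = 0

    def restart(self) -> None:
        # In the real system this would send a restart message to the guest.
        self.restart_calls += 1


def recover_after_failed_resize(server: Server, guest: Guest) -> bool:
    """Request a MySQL restart only if Nova reports the server as ACTIVE.

    Returns True if a restart was requested, False if it was skipped.
    """
    if server.status != "ACTIVE":
        # e.g. ERROR after a migration lost its network: do not queue a
        # restart that the guest might act on later in an unknown state.
        return False
    guest.restart()
    return True


if __name__ == "__main__":
    guest = Guest()
    print(recover_after_failed_resize(Server("ERROR"), guest))
    print(recover_after_failed_resize(Server("ACTIVE"), guest))
    print(guest.restart_calls)
```

Any status other than ACTIVE, such as ERROR or VERIFY_RESIZE, skips the restart in this sketch, so no restart message is queued for a server whose state is uncertain.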
* Second, the revert barrier should be placed right after VERIFY_RESIZE is confirmed, but before the flavor is confirmed for a resize action.
Currently, if there is a resize failure where the Nova server is in VERIFY_RESIZE but has the old flavor id, the _perform_
Changed in reddwarf:
assignee: nobody → Joe Cruz (jcruz7)
summary:
- Taskmanager should call the restart MySQL guest methods only if the Nova server is ACTIVE
+ Taskmanager resize/migration actions Exception does not properly handle failures.
description: updated
Changed in trove:
milestone: none → havana-2
status: Fix Committed → Fix Released
Changed in trove:
milestone: havana-2 → 2013.2
Fix proposed to branch: master
Review: https://review.openstack.org/20351