2012-03-27 15:14:41 |
Johannes Erdfelt |
description |
Rescue is currently allowed when an instance is in the RESIZE_VERIFY state. Rescuing resets the task_state, so nova-compute forgets about the original instance even though the resize is still pending (neither confirmed nor reverted).
Either the original instance should be deleted during a rescue (as happens on confirm resize), or rescue should be disallowed while in the RESIZE_VERIFY state.
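A minimal sketch of the first option, assuming a toy in-memory model rather than nova's real objects: `Instance`, `confirm_resize`, `rescue`, the `copies` list and the `RESIZE_VERIFY` string are all hypothetical stand-ins. Rescue finishes a pending resize the way confirm resize would (dropping the original copy) before entering rescue mode, so no copy is left untracked.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-in for nova's RESIZE_VERIFY task state constant.
RESIZE_VERIFY = "resize_verify"


@dataclass
class Instance:
    uuid: str
    vm_state: str = "active"
    task_state: Optional[str] = None
    # Disk copies that exist for this instance; a pending resize has two.
    copies: List[str] = field(default_factory=lambda: ["original"])


def confirm_resize(instance: Instance) -> None:
    """Keep the resized copy, delete the original, and clear task_state."""
    if instance.task_state != RESIZE_VERIFY:
        raise RuntimeError("no resize pending for " + instance.uuid)
    instance.copies.remove("original")
    instance.task_state = None


def rescue(instance: Instance) -> None:
    """Rescue that first finishes a pending resize, as confirm resize would."""
    if instance.task_state == RESIZE_VERIFY:
        confirm_resize(instance)
    instance.vm_state = "rescued"
    instance.task_state = None
```

Rescuing an instance with task_state RESIZE_VERIFY and copies ["original", "resized"] leaves only ["resized"], so nothing is orphaned; an instance with no pending resize is rescued unchanged.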
johannes@compute1:~/openstack/nova/trunk$ nova list
+--------------------------------------+------+--------+----------------------+
| ID                                   | Name | Status | Networks             |
+--------------------------------------+------+--------+----------------------+
| 4865d9d7-346d-469a-9786-55a4b4127b5d | test | ACTIVE | public=192.168.128.3 |
+--------------------------------------+------+--------+----------------------+
johannes@compute1:~/openstack/nova/trunk$ nova resize 4865d9d7-346d-469a-9786-55a4b4127b5d 1
johannes@compute1:~/openstack/nova/trunk$ nova list
+--------------------------------------+------+---------------+----------------------+
| ID                                   | Name | Status        | Networks             |
+--------------------------------------+------+---------------+----------------------+
| 4865d9d7-346d-469a-9786-55a4b4127b5d | test | VERIFY_RESIZE | public=192.168.128.3 |
+--------------------------------------+------+---------------+----------------------+
johannes@compute1:~/openstack/nova/trunk$ nova rescue 4865d9d7-346d-469a-9786-55a4b4127b5d
[...]
johannes@compute1:~/openstack/nova/trunk$ nova list
+--------------------------------------+------+--------+----------------------+
| ID                                   | Name | Status | Networks             |
+--------------------------------------+------+--------+----------------------+
| 4865d9d7-346d-469a-9786-55a4b4127b5d | test | RESCUE | public=192.168.128.3 |
+--------------------------------------+------+--------+----------------------+
johannes@compute1:~/openstack/nova/trunk$ nova unrescue 4865d9d7-346d-469a-9786-55a4b4127b5d
johannes@compute1:~/openstack/nova/trunk$ nova list
+--------------------------------------+------+--------+----------------------+
| ID                                   | Name | Status | Networks             |
+--------------------------------------+------+--------+----------------------+
| 4865d9d7-346d-469a-9786-55a4b4127b5d | test | ACTIVE | public=192.168.128.3 |
+--------------------------------------+------+--------+----------------------+ |
Resizing an instance is a two-step process. First, the instance is resized and its task_state is set to RESIZE_VERIFY. Second, the resize must be confirmed or reverted, which cleans up one of the two instances and clears the task_state.
However, several commands are allowed in RESIZE_VERIFY that clear the task_state without cleaning up either instance. The one left behind is the original, since the resized instance is the one running at that point.
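To make the failure mode concrete, here is a toy model of the two-step resize. It assumes simplified, hypothetical stand-ins (`Instance`, `resize`, `confirm_resize`, `revert_resize`, and a `change_password` that just clears task_state); none of these are nova's actual functions. Confirm and revert each clean up exactly one copy, while an unrelated action that clears task_state leaves both copies behind with nothing recording that a resize is pending.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-in for the real RESIZE_VERIFY constant.
RESIZE_VERIFY = "resize_verify"


@dataclass
class Instance:
    task_state: Optional[str] = None
    copies: List[str] = field(default_factory=lambda: ["original"])


def resize(inst: Instance) -> None:
    # Step 1: build the resized copy; both copies exist until step 2.
    inst.copies.append("resized")
    inst.task_state = RESIZE_VERIFY


def _require_pending_resize(inst: Instance) -> None:
    if inst.task_state != RESIZE_VERIFY:
        raise RuntimeError("no resize pending")


def confirm_resize(inst: Instance) -> None:
    # Step 2a: keep the resized copy, clean up the original.
    _require_pending_resize(inst)
    inst.copies.remove("original")
    inst.task_state = None


def revert_resize(inst: Instance) -> None:
    # Step 2b: keep the original copy, clean up the resized one.
    _require_pending_resize(inst)
    inst.copies.remove("resized")
    inst.task_state = None


def change_password(inst: Instance) -> None:
    # Simplified problematic pattern (hypothetical): the action finishes by
    # clearing task_state, so the pending resize is forgotten.
    inst.task_state = None
```

After `resize` then `confirm_resize`, only the resized copy remains. After `resize` then `change_password`, both copies remain, task_state is None, and `confirm_resize` and `revert_resize` both refuse to run, so the original copy is stranded.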
The following operations appear to trigger this:
stop
reboot
rebuild
pause
suspend
rescue
changePassword
createImage
(I've reproduced this myself with rescue and changePassword; the others I found by reading the code, and there may be more.)
I'm not sure whether the API intends any of these operations to be allowed in RESIZE_VERIFY. If not, the simplest fix would be to reject all of them while in RESIZE_VERIFY.
If it is intended, state tracking needs some redesign so that task_state is not cleared while a resize is still pending. |
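A rough sketch of that simplest fix, assuming a hypothetical `reject_during` decorator and `InstanceInvalidState` exception (illustrative names, not nova's actual API): each affected action refuses to run while task_state is RESIZE_VERIFY instead of running and silently clearing it.

```python
import functools
from typing import Optional

# Hypothetical stand-in for the real RESIZE_VERIFY constant.
RESIZE_VERIFY = "resize_verify"


class InstanceInvalidState(Exception):
    """Raised when an action is requested in a state that forbids it."""


class Instance:
    def __init__(self, task_state: Optional[str] = None) -> None:
        self.task_state = task_state


def reject_during(*blocked_task_states: str):
    """Decorator: refuse the wrapped action if the instance is in a blocked task state."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(instance: Instance, *args, **kwargs):
            if instance.task_state in blocked_task_states:
                raise InstanceInvalidState(
                    f"{func.__name__} not allowed while task_state={instance.task_state}"
                )
            return func(instance, *args, **kwargs)
        return wrapper
    return decorator


@reject_during(RESIZE_VERIFY)
def rescue(instance: Instance) -> str:
    return "rescued"


@reject_during(RESIZE_VERIFY)
def change_password(instance: Instance, new_password: str) -> str:
    return "password changed"
```

The same decorator would go on stop, reboot, rebuild, pause, suspend and createImage; the user then has to confirm or revert the resize before retrying the action.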
|