Live migration failure in API doesn't revert task_state to None
Bug #1276214 reported by Loganathan Parthipan
This bug affects 6 people
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack Compute (nova) | Fix Released | Medium | Maciej Szankin | |
Mitaka | Fix Released | Undecided | Lee Yarwood | |
Bug Description
If the API times out on an RPC while processing a migrate_server request, it does not revert the task_state to NULL, either before or after sending the error response to the user. This can block further API operations on the VM and leave an otherwise healthy VM in a non-operable state, with the possible exception of delete.
This is one possible reproducer. I'm not sure it always holds, and I'd appreciate it if someone else could confirm it.
1. Somehow make RPC requests hang
2. Issue a live migration request
3. The call should return an HTTP error (409 perhaps)
4. Check the VM. It should be in a good state, but its task_state is stuck in 'migrating'
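To illustrate the expected behaviour, here is a minimal sketch of reverting task_state when the RPC fails. It is not nova code: `Instance`, `RPCTimeout`, `revert_task_state`, `hanging_rpc_call` and `live_migrate` are all hypothetical stand-ins invented for this example, and the real fix may look quite different.

```python
from contextlib import contextmanager


class RPCTimeout(Exception):
    """Hypothetical stand-in for a messaging timeout error."""


class Instance:
    """Hypothetical minimal instance record, not nova's Instance object."""

    def __init__(self):
        self.vm_state = "active"
        self.task_state = None


@contextmanager
def revert_task_state(instance):
    """Restore the task_state seen on entry if the wrapped block raises."""
    previous = instance.task_state
    try:
        yield
    except Exception:
        # Put the old value back, then let the caller see the error.
        instance.task_state = previous
        raise


def hanging_rpc_call(instance):
    # Simulates step 1 of the reproducer: the RPC never gets a reply.
    raise RPCTimeout("no reply")


def live_migrate(instance, rpc_call):
    # Assumed flow: set task_state first, then make the RPC call.
    with revert_task_state(instance):
        instance.task_state = "migrating"
        rpc_call(instance)


instance = Instance()
try:
    live_migrate(instance, hanging_rpc_call)
except RPCTimeout:
    pass  # the API would turn this into an HTTP error response

print(instance.vm_state, instance.task_state)
```

Without the `revert_task_state` wrapper, the same run would leave task_state at 'migrating', which matches step 4 of the reproducer.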
Changed in nova: | |
assignee: | nobody → Loganathan Parthipan (parthipan) |
Changed in nova: | |
assignee: | Loganathan Parthipan (parthipan) → Davanum Srinivas (DIMS) (dims-v) |
assignee: | Davanum Srinivas (DIMS) (dims-v) → nobody |
Changed in nova: | |
assignee: | nobody → Pawel Koniszewski (pawel-koniszewski) |
Changed in nova: | |
assignee: | Pawel Koniszewski (pawel-koniszewski) → Bartosz Fic (bartosz-fic) |
tags: | added: liberty-rc-potential |
tags: | added: live-migration removed: live-migrate |
Changed in nova: | |
assignee: | Bartosz Fic (bartosz-fic) → John Garbutt (johngarbutt) |
Changed in nova: | |
assignee: | John Garbutt (johngarbutt) → Bartosz Fic (bartosz-fic) |
Changed in nova: | |
assignee: | Bartosz Fic (bartosz-fic) → Pawel Koniszewski (pawel-koniszewski) |
Changed in nova: | |
assignee: | Pawel Koniszewski (pawel-koniszewski) → nobody |
status: | In Progress → Confirmed |
Changed in nova: | |
assignee: | nobody → Pawel Koniszewski (pawel-koniszewski) |
status: | Confirmed → In Progress |
Changed in nova: | |
assignee: | Pawel Koniszewski (pawel-koniszewski) → Maciej Szankin (mszankin) |
tags: | removed: liberty-backport-potential |
If we can't roll back, we should put the VM into ERROR.
Otherwise we should roll back and reset it to ACTIVE.
I have recently made sure we now record instance faults, so there is a tiny bit more fault tracking.
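The two options above, together with fault recording, could look roughly like the sketch below. Everything here is hypothetical and only meant to show the decision: `Instance`, its `faults` list, `handle_migration_failure` and the two rollback callables are made up for this example, and clearing task_state in the ERROR case is an assumption rather than something stated in this bug.

```python
class Instance:
    """Hypothetical minimal instance record, not nova's Instance object."""

    def __init__(self):
        self.vm_state = "active"
        self.task_state = "migrating"
        self.faults = []  # stand-in for recorded instance faults


def handle_migration_failure(instance, error, rollback):
    """Record the fault, then roll back to ACTIVE or fall back to ERROR."""
    instance.faults.append(repr(error))
    try:
        rollback(instance)
    except Exception as rollback_error:
        # Rollback itself failed: record that too and mark the VM as ERROR.
        instance.faults.append(repr(rollback_error))
        instance.vm_state = "error"
    else:
        instance.vm_state = "active"
    # Assumption: task_state is cleared in both cases so the API is usable.
    instance.task_state = None


def good_rollback(instance):
    # Pretend the rollback succeeded.
    pass


def bad_rollback(instance):
    raise RuntimeError("rollback failed")


recovered = Instance()
handle_migration_failure(recovered, TimeoutError("rpc timed out"), good_rollback)

broken = Instance()
handle_migration_failure(broken, TimeoutError("rpc timed out"), bad_rollback)

print(recovered.vm_state, recovered.task_state, len(recovered.faults))
print(broken.vm_state, broken.task_state, len(broken.faults))
```

In both outcomes the original error is kept in the fault list, so the user can still see why the migration failed.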