Comment 6 for bug 1470420

Rajesh Tailor (rajesh-tailor) wrote:

Hi

The following are some approaches to solving this issue. Please suggest which one would be better.

1) As suggested by Paul Murray, we can modify the resize operation to set the migration status to 'failed' when the resize fails. In this case, we also need to modify the periodic task _cleanup_incomplete_migrations to filter migrations on the 'failed' status instead of 'error' (see the first sketch after this list).

2) We can add a new migration status, 'cleaned', which would be set by the periodic task _cleanup_incomplete_migrations.

In that task we would filter for migrations whose status is 'error' or 'failed', and once the instance files have been deleted from the compute node (either the source or the destination node), we would set the newly added 'cleaned' status so that the same record is not picked up again in subsequent runs of the periodic task (see the second sketch after this list).

3) As suggested by Nikola Dipanov, it is reasonable to have retry logic around the self.driver.live_migration call. In that case, if the retries are unsuccessful (i.e. the situation is unrecoverable), the migration status would ultimately be set to 'error' by _rollback_live_migration. As of now, however, there is no retry logic around the live_migration driver call (see the third sketch after this list).

4) We can stick with the patch that is currently under review and replace the migration status 'failed' with 'error' wherever required.
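
Regarding approach 1, here is a minimal self-contained sketch of the idea. The helper names (finish_resize, do_resize, cleanup_filters) are hypothetical for illustration; the real Nova code would go through objects.Migration / objects.MigrationList:

    # Hypothetical sketch: mark the migration 'failed' when the
    # resize operation raises, instead of leaving it in an
    # intermediate state.
    def finish_resize(migration, do_resize):
        try:
            do_resize()
            migration['status'] = 'finished'
        except Exception:
            migration['status'] = 'failed'
            raise

    # The periodic task would then filter on 'failed' instead of
    # 'error' when looking for migrations to clean up.
    def cleanup_filters(host):
        return {'host': host, 'status': 'failed'}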
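
Regarding approach 2, here is a sketch of the 'cleaned' status flow. The delete_instance_files() callback standing in for the actual file removal is an assumption, not existing code:

    CLEANUP_STATUSES = ('error', 'failed')

    def cleanup_incomplete_migrations(migrations, delete_instance_files):
        # Pick up migrations that ended in 'error' or 'failed',
        # remove their leftover instance files, and mark them
        # 'cleaned' so later runs skip the same records.
        for migration in migrations:
            if migration['status'] not in CLEANUP_STATUSES:
                continue
            delete_instance_files(migration['instance_uuid'])
            migration['status'] = 'cleaned'

For example, with records [{'status': 'failed'}, {'status': 'cleaned'}], only the first would be processed, and after the run it would also be 'cleaned', so the next run of the periodic task skips both.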
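
Regarding approach 3, here is a sketch of what retry logic around the self.driver.live_migration call could look like. The simplified signature and the max_attempts/delay parameters are assumptions, not existing Nova code:

    import time

    def live_migration_with_retries(driver, context, instance, dest,
                                    post_method, recover_method,
                                    max_attempts=3, delay=5):
        for attempt in range(1, max_attempts + 1):
            try:
                return driver.live_migration(context, instance, dest,
                                             post_method, recover_method)
            except Exception:
                if attempt == max_attempts:
                    # Unrecoverable: let the exception propagate so
                    # that _rollback_live_migration sets the
                    # migration status to 'error'.
                    raise
                time.sleep(delay)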