Comment 4 for bug 1799152

Fan Zhang (fanzhang) wrote :

In our case, post-copy live migration was not permitted, so a pre-copy migration was running. During the migration, the VM (that is, the qemu process) on the *source node* was killed by the host OOM killer. The domain state became SHUTOFF, so in get_job_info() the call to self._domain.jobStats() raised a libvirt error with code VIR_ERR_OPERATION_INVALID. The previous code assumes the domain has shut down or gone away and happily returns JobInfo(type=libvirt.VIR_DOMAIN_JOB_COMPLETED), which eventually triggers post_live_migration() and deletes the source VM's files. That's why I reported this bug.

IMHO, if the qemu-kvm process is killed by the source host's OOM killer, libvirt reports VIR_ERR_OPERATION_INVALID because the domain state is SHUTOFF when we call `self._domain.jobStats()`. In this case the migration job should be considered failed. If the migration succeeds, libvirt also kills the qemu-kvm process on the source and the domain state is SHUTOFF, so we can get VIR_ERR_OPERATION_INVALID there too; in that case we should ignore the error and just return JobInfo with type=VIR_DOMAIN_JOB_COMPLETED. The difference between the two cases is that in the latter we would eventually get VIR_ERR_NO_DOMAIN if we try to get the job info a couple more times.
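
To illustrate the idea, here is a rough sketch of the retry logic, not the actual nova patch. So that it runs without libvirt installed, the constants are plain-string stand-ins and `FakeLibvirtError`, `classify_job`, `make_stub` and the `retries` parameter are hypothetical names I made up. Only `get_error_code()` mirrors the real `libvirt.libvirtError` method. Real code would probably also sleep between polls.

```python
# Local stand-ins for the libvirt constants (assumption: strings, not real values).
VIR_ERR_NO_DOMAIN = "VIR_ERR_NO_DOMAIN"
VIR_ERR_OPERATION_INVALID = "VIR_ERR_OPERATION_INVALID"
JOB_COMPLETED = "VIR_DOMAIN_JOB_COMPLETED"
JOB_FAILED = "VIR_DOMAIN_JOB_FAILED"


class FakeLibvirtError(Exception):
    """Hypothetical stand-in for libvirt.libvirtError."""

    def __init__(self, code):
        super().__init__(code)
        self._code = code

    def get_error_code(self):
        return self._code


def classify_job(job_stats, retries=3):
    """Hypothetical helper: poll job_stats() up to `retries` times.

    VIR_ERR_NO_DOMAIN means the domain is gone, so treat the job as completed.
    VIR_ERR_OPERATION_INVALID on every attempt means the domain is still
    defined but SHUTOFF, so treat the job as failed (e.g. qemu was OOM-killed).
    """
    for _ in range(retries):
        try:
            return job_stats()  # job still running: hand back its stats
        except FakeLibvirtError as ex:
            code = ex.get_error_code()
            if code == VIR_ERR_NO_DOMAIN:
                return JOB_COMPLETED
            if code != VIR_ERR_OPERATION_INVALID:
                raise  # unrelated error: let the caller deal with it
    return JOB_FAILED


def make_stub(codes):
    """Build a fake jobStats() that raises the given error codes in order."""
    remaining = iter(codes)

    def job_stats():
        raise FakeLibvirtError(next(remaining))

    return job_stats


# Migration succeeded: OPERATION_INVALID first, then the domain disappears.
succeeded = classify_job(
    make_stub([VIR_ERR_OPERATION_INVALID, VIR_ERR_NO_DOMAIN]))
# qemu was OOM-killed: the domain stays SHUTOFF, so NO_DOMAIN never shows up.
oom_killed = classify_job(make_stub([VIR_ERR_OPERATION_INVALID] * 3))
```

With these stubs, `succeeded` ends up as the completed type and `oom_killed` as the failed type, so post_live_migration() would only run in the first case.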