Live migration fails when the instance is in paused state

Bug #1921306 reported by Xinxin Shen
This bug affects 3 people
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

When a virtual machine in the paused state is live migrated twice in a row, the instance state changes to ERROR. The Nova version is Rocky.

2021-03-22 19:34:38.388 6 DEBUG nova.virt.libvirt.guest [- req-None - - - - -] Failed to get job stats: Unable to read from monitor: Connection reset by peer get_job_info /var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/guest.py:767
2021-03-22 19:34:38.389 6 WARNING nova.virt.libvirt.driver [- req-None - - - - -] [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] Error monitoring migration: Unable to read from monitor: Connection reset by peer: libvirtError: Unable to read from monitor: Connection reset by peer
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] Traceback (most recent call last):
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7793, in _live_migration
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] finish_event, disk_paths)
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7593, in _live_migration_monitor
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] info = guest.get_job_info()
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 751, in get_job_info
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] stats = self._domain.jobStats()
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] File "/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] result = proxy_call(self._autowrap, f, *args, **kwargs)
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] File "/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] rv = execute(f, *args, **kwargs)
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] File "/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] six.reraise(c, e, tb)
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] File "/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] rv = meth(*args, **kwargs)
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1433, in jobStats
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] if ret is None: raise libvirtError ('virDomainGetJobStats() failed', dom=self)
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] libvirtError: Unable to read from monitor: Connection reset by peer
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: d1d2af1f-e973-438c-a6b6-b628091d3596]
2021-03-22 19:34:38.394 6 DEBUG nova.virt.libvirt.driver [- req-None - - - - -] [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] Live migration monitoring is all done _live_migration /var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:7800
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [- req-None - - - - -] [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] Live migration failed.: libvirtError: Unable to read from monitor: Connection reset by peer
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] Traceback (most recent call last):
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/compute/manager.py", line 6510, in _do_live_migration
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] block_migration, migrate_data)
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7279, in live_migration
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] migrate_data)
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7793, in _live_migration
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] finish_event, disk_paths)
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7593, in _live_migration_monitor
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] info = guest.get_job_info()
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 751, in get_job_info
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] stats = self._domain.jobStats()
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] File "/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] result = proxy_call(self._autowrap, f, *args, **kwargs)
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] File "/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] rv = execute(f, *args, **kwargs)
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] File "/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] six.reraise(c, e, tb)
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] File "/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] rv = meth(*args, **kwargs)
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1433, in jobStats
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] if ret is None: raise libvirtError ('virDomainGetJobStats() failed', dom=self)
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] libvirtError: Unable to read from monitor: Connection reset by peer
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: d1d2af1f-e973-438c-a6b6-b628091d3596]
2021-03-22 19:34:38.637 6 ERROR nova.virt.libvirt.driver [- req-None - - - - -] [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] Live Migration failure: operation failed: domain is not running: libvirtError: operation failed: domain is not running

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

Could you please state the version of nova you are using? Is 'nova.inspur.virt.inspur.driver' an out-of-tree virt driver used in your deployment?

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

Marking this Incomplete until version information is provided. Please set it back to New once you have answered the above questions.

Changed in nova:
status: New → Incomplete
Xinxin Shen (runsxx)
description: updated
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired
Changed in nova:
status: Expired → New
Revision history for this message
Pierre Thévenet (pthevenet) wrote :

Hi,

We are experiencing the same issue with the same error "Migration failure: operation failed: domain is not running"

nova-compute version: 22.2.1
virt_type = kvm
qemu-system 4.2.1

Best wishes

Revision history for this message
Artom Lifshitz (notartom) wrote :

A paused domain should be able to be live migrated. Are you sure the domain hasn't been stopped behind Nova's back, for example directly through `virsh`? If you can reproduce this, could you run `virsh list --all` before live migrating the VM through Nova and note what state the instance is in?
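(For illustration, a minimal way to capture that information on the source compute node before triggering the migration; the domain name below is a placeholder, not taken from this report:)
```
# Run on the source compute node before starting the live migration.
# "instance-00000001" is a placeholder; take the real name from `virsh list --all`.
virsh list --all
virsh domstate --reason instance-00000001
```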

Thanks!

Changed in nova:
status: New → Incomplete
Revision history for this message
Pierre Thévenet (pthevenet) wrote :

Hi,

A `virsh list --all` shows the domain:
- running after being created (on compute node 3)
- paused after being paused (on compute node 3)
- paused after being live migrated (on compute node 1) (first migration)

After the second live migration, the instance is not shown in `virsh list --all` in any compute node.

After creating the instance, these are the steps I take:
```
openstack server pause myvm
openstack server migrate --live-migration <instance_name> # first live migration
openstack server migrate --live-migration <instance_name> # second live migration
```
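(As a hedged sketch of the check asked for above, one could compare where Nova thinks the instance is against what libvirt reports between the two migrations; the instance, host, and domain names below are placeholders:)
```
# Placeholders: <instance_name>, compute1 and instance-00000001 are illustrative only.
openstack server show <instance_name> -c status -c OS-EXT-SRV-ATTR:host
ssh compute1 'virsh list --all'                          # domain expected as "paused" after the first migration
ssh compute1 'virsh domstate --reason instance-00000001'
```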

A side note, in case it helps: I managed to recover an instance by doing:
```
openstack server set <instance_name> --state active
openstack server reboot <instance_name> --hard
```

Thanks,
Pierre

Changed in nova:
status: Incomplete → New
Revision history for this message
Lee Yarwood (lyarwood) wrote :

This smells like a known QEMU issue that we encountered in CI a while ago, which causes the underlying guest to crash:

[QEMU] Back-n-forth live migration of a paused VM results in QEMU crash with: "bdrv_inactivate_recurse: Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed."

https://bugzilla.redhat.com/show_bug.cgi?id=1713009

You should see the above error captured in the guest log under /var/log/libvirt/qemu/$domain_name.log.
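(As a minimal sketch, assuming a libvirt/KVM compute host, the assertion can be searched for like this; the domain name is a placeholder:)
```
# Run on the compute node the guest was on; instance-00000001 is a placeholder domain name.
grep -i 'bdrv_inactivate_recurse' /var/log/libvirt/qemu/instance-00000001.log
```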

Marking this bug as invalid for Nova, but feel free to add the QEMU project, assuming you're using the Ubuntu-provided package.

Changed in nova:
status: New → Invalid