libvirtd: Unable to read from monitor: Connection reset by peer

Bug #1397385 reported by Dennis Dmitriev
This bug affects 2 people
Affects: Fuel for OpenStack
Status: Invalid
Importance: High
Assigned to: Matthew Mosesohn

Bug Description

http://jenkins-product.srt.mirantis.net:8080/view/5.1_swarm/job/5.1_fuelmain.system_test.centos.thread_1/60/

OSTF test "Launch instance, create snapshot, launch instance from snapshot" failed because of a libvirtd error.

Steps to reproduce:
    1. Create a cluster (CentOS, 1 compute, 1 controller, nova-network flat-dhcp)
    2. Launch an instance from the image.
    3. Create a snapshot of the instance.
    4. Terminate the instance.
    5. Launch an instance from the snapshot.

The test failed at step 3.
The same scenario ran without errors later on the same environment.
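
For reference, steps 2-5 can be driven with python-novaclient; a rough sketch follows (the credentials, endpoint, image and flavor names are placeholders, not values from this environment, and a real run would wait for the server to reach ACTIVE between steps):

    from novaclient.v1_1 import client

    # Placeholder credentials and endpoint; adjust for the target environment.
    nova = client.Client("admin", "admin", "admin",
                         "http://<controller>:5000/v2.0/")

    flavor = nova.flavors.find(name="m1.tiny")   # assumed flavor name
    image = nova.images.find(name="TestVM")      # assumed image name

    # Step 2: launch an instance from the image
    server = nova.servers.create(name="ostf-test", image=image, flavor=flavor)

    # Step 3: create a snapshot of the instance (the step that failed)
    snapshot_id = server.create_image("ostf-test-snapshot")

    # Step 4: terminate the instance
    server.delete()

    # Step 5: launch a new instance from the snapshot
    nova.servers.create(name="ostf-test-2", image=snapshot_id, flavor=flavor)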

I see the same libvirtd behaviour on my workstation from time to time when the system is overloaded. In those cases, retrying the command helps. Maybe we should add some retries to the operations that go through libvirtd, as sketched below.
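
A minimal sketch of that retry idea (a hypothetical helper, not existing nova code), which simply re-runs a libvirt call a few times when it hits a transient libvirtError:

    import time
    import libvirt

    def retry_libvirt(call, attempts=3, delay=2):
        """Run a libvirt operation, retrying on transient libvirtError."""
        for attempt in range(1, attempts + 1):
            try:
                return call()
            except libvirt.libvirtError:
                # e.g. "Unable to read from monitor: Connection reset by peer"
                if attempt == attempts:
                    raise
                time.sleep(delay)

    # hypothetical usage: retry_libvirt(lambda: dom.blockJobAbort('vda', 0))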

========== /var/log/libvirt/libvirtd.log :
2014-11-28 04:52:07.767+0000: 1601: error : qemuMonitorIORead:557 : Unable to read from monitor: Connection reset by peer

That caused an error on the compute node:

========== /var/log/nova/compute.log
2014-11-28 04:51:57.755 1631 AUDIT nova.compute.manager [req-7309f427-88c6-4db1-b572-2206405a38c3 None] [instance: b885b752-649b-4293-b615-b88d811086e2] Terminating instance
2014-11-28 04:52:08.095 1631 DEBUG nova.virt.driver [-] Emitting event <LifecycleEvent: 1417150328.09, b885b752-649b-4293-b615-b88d811086e2 => Stopped> emit_event /usr/lib/python2.6/site-packages/nova/virt/driver.py:1214
2014-11-28 04:52:08.095 1631 INFO nova.compute.manager [-] [instance: b885b752-649b-4293-b615-b88d811086e2] VM Stopped (Lifecycle Event)
2014-11-28 04:52:08.101 1631 INFO nova.virt.libvirt.driver [req-16a63e32-f662-400d-bf29-23c36289110a None] [instance: b885b752-649b-4293-b615-b88d811086e2] Snapshot extracted, beginning image upload
2014-11-28 04:52:08.102 1631 DEBUG nova.compute.manager [req-16a63e32-f662-400d-bf29-23c36289110a None] [instance: b885b752-649b-4293-b615-b88d811086e2] Cleaning up image 51ff1c8f-4906-498a-a779-a0dd81ec6211 decorated_function /usr/lib/python2.6/site-packages/nova/compute/manager.py:355
2014-11-28 04:52:08.102 1631 TRACE nova.compute.manager [instance: b885b752-649b-4293-b615-b88d811086e2] Traceback (most recent call last):
...
2014-11-28 04:52:08.102 1631 TRACE nova.compute.manager [instance: b885b752-649b-4293-b615-b88d811086e2] rv = meth(*args,**kwargs)
2014-11-28 04:52:08.102 1631 TRACE nova.compute.manager [instance: b885b752-649b-4293-b615-b88d811086e2] File "/usr/lib64/python2.6/site-packages/libvirt.py", line 662, in blockJobAbort
2014-11-28 04:52:08.102 1631 TRACE nova.compute.manager [instance: b885b752-649b-4293-b615-b88d811086e2] if ret == -1: raise libvirtError ('virDomainBlockJobAbort() failed', dom=self)
2014-11-28 04:52:08.102 1631 TRACE nova.compute.manager [instance: b885b752-649b-4293-b615-b88d811086e2] libvirtError: Unable to read from monitor: Connection reset by peer
======================================
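
For context, the failing call in the traceback above is libvirt-python's blockJobAbort() (virDomainBlockJobAbort). A minimal standalone sketch of that call, with an assumed domain name and disk device rather than values from this report:

    import libvirt

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("instance-00000002")  # assumed domain name
    try:
        # Abort the block job on the instance's disk, as nova's snapshot
        # cleanup path does; this is the call that raised the error above.
        dom.blockJobAbort("vda", 0)
    except libvirt.libvirtError as err:
        # "Unable to read from monitor: Connection reset by peer" indicates the
        # QEMU monitor socket dropped while libvirt was waiting for a reply.
        print("blockJobAbort failed: %s" % err)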

Dennis Dmitriev (ddmitriev) wrote :
Changed in fuel:
assignee: nobody → Matthew Mosesohn (raytrac3r)
status: New → Confirmed
Matthew Mosesohn (raytrac3r) wrote :

I did some research and found some possible upstream bugs this may be related to. It's either a race condition or a libvirt bug. We need to reproduce it. The environment from 11/28 is lost, so we are waiting for a reproducer.

Changed in fuel:
importance: Undecided → High
status: Confirmed → Incomplete
Changed in fuel:
milestone: 5.1.1 → 5.1.2
Changed in fuel:
status: Incomplete → Invalid
Oleksiy Molchanov (omolchanov) wrote :

This bug has been Incomplete for more than 4 weeks. We cannot investigate it further, so we are setting the status to Invalid. If you think this is not correct, please provide the requested information and reopen the bug, and we will look into it further.

Dennis Schridde (urzds) wrote :

> I did some research and found some possible upstream bugs this is related to.

Could you please point out these bug reports?
