libvirtd: Unable to read from monitor: Connection reset by peer

Bug #1397385 reported by Dennis Dmitriev
This bug affects 2 people
Affects: Fuel for OpenStack
Status: Invalid
Importance: High
Assigned to: Matthew Mosesohn

Bug Description

http://jenkins-product.srt.mirantis.net:8080/view/5.1_swarm/job/5.1_fuelmain.system_test.centos.thread_1/60/

OSTF test "Launch instance, create snapshot, launch instance from snapshot" failed because of a libvirtd error.

Steps to reproduce:
    1. Create a cluster (CentOS, 1 compute, 1 controller, nova-network flat-dhcp)
    2. Launch an instance from the image.
    3. Create a snapshot of the instance.
    4. Terminate the instance.
    5. Launch an instance from the snapshot.

The test failed at step 3.
The same scenario ran without errors later on the same environment.
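
For reference, steps 2-5 can be driven with python-novaclient; a rough sketch follows (the credentials, endpoint, image and flavor names are placeholders, not values from this environment, and a real run would wait for the server to reach ACTIVE between steps):

    from novaclient.v1_1 import client

    # Placeholder credentials and endpoint; adjust for the target environment.
    nova = client.Client("admin", "admin", "admin",
                         "http://<controller>:5000/v2.0/")

    flavor = nova.flavors.find(name="m1.tiny")   # assumed flavor name
    image = nova.images.find(name="TestVM")      # assumed image name

    # Step 2: launch an instance from the image
    server = nova.servers.create(name="ostf-test", image=image, flavor=flavor)

    # Step 3: create a snapshot of the instance (the step that failed)
    snapshot_id = server.create_image("ostf-test-snapshot")

    # Step 4: terminate the instance
    server.delete()

    # Step 5: launch a new instance from the snapshot
    nova.servers.create(name="ostf-test-2", image=snapshot_id, flavor=flavor)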

I see the same libvirtd behaviour on my workstation from time to time when the system is overloaded. In those cases, retrying the command helps. Maybe we should add some retries to the operations that go through libvirtd, as sketched below.
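
A minimal sketch of that retry idea (a hypothetical helper, not existing nova code), which simply re-runs a libvirt call a few times when it hits a transient libvirtError:

    import time
    import libvirt

    def retry_libvirt(call, attempts=3, delay=2):
        """Run a libvirt operation, retrying on transient libvirtError."""
        for attempt in range(1, attempts + 1):
            try:
                return call()
            except libvirt.libvirtError:
                # e.g. "Unable to read from monitor: Connection reset by peer"
                if attempt == attempts:
                    raise
                time.sleep(delay)

    # hypothetical usage: retry_libvirt(lambda: dom.blockJobAbort('vda', 0))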

========== /var/log/libvirt/libvirtd.log :
2014-11-28 04:52:07.767+0000: 1601: error : qemuMonitorIORead:557 : Unable to read from monitor: Connection reset by peer

That caused an error on the compute node:

========== /var/log/nova/compute.log
2014-11-28 04:51:57.755 1631 AUDIT nova.compute.manager [req-7309f427-88c6-4db1-b572-2206405a38c3 None] [instance: b885b752-649b-4293-b615-b88d811086e2] Terminating instance
2014-11-28 04:52:08.095 1631 DEBUG nova.virt.driver [-] Emitting event <LifecycleEvent: 1417150328.09, b885b752-649b-4293-b615-b88d811086e2 => Stopped> emit_event /usr/lib/python2.6/site-packages/nova/virt/driver.py:1214
2014-11-28 04:52:08.095 1631 INFO nova.compute.manager [-] [instance: b885b752-649b-4293-b615-b88d811086e2] VM Stopped (Lifecycle Event)
2014-11-28 04:52:08.101 1631 INFO nova.virt.libvirt.driver [req-16a63e32-f662-400d-bf29-23c36289110a None] [instance: b885b752-649b-4293-b615-b88d811086e2] Snapshot extracted, beginning image upload
2014-11-28 04:52:08.102 1631 DEBUG nova.compute.manager [req-16a63e32-f662-400d-bf29-23c36289110a None] [instance: b885b752-649b-4293-b615-b88d811086e2] Cleaning up image 51ff1c8f-4906-498a-a779-a0dd81ec6211 decorated_function /usr/lib/python2.6/site-packages/nova/compute/manager.py:355
2014-11-28 04:52:08.102 1631 TRACE nova.compute.manager [instance: b885b752-649b-4293-b615-b88d811086e2] Traceback (most recent call last):
...
2014-11-28 04:52:08.102 1631 TRACE nova.compute.manager [instance: b885b752-649b-4293-b615-b88d811086e2] rv = meth(*args,**kwargs)
2014-11-28 04:52:08.102 1631 TRACE nova.compute.manager [instance: b885b752-649b-4293-b615-b88d811086e2] File "/usr/lib64/python2.6/site-packages/libvirt.py", line 662, in blockJobAbort
2014-11-28 04:52:08.102 1631 TRACE nova.compute.manager [instance: b885b752-649b-4293-b615-b88d811086e2] if ret == -1: raise libvirtError ('virDomainBlockJobAbort() failed', dom=self)
2014-11-28 04:52:08.102 1631 TRACE nova.compute.manager [instance: b885b752-649b-4293-b615-b88d811086e2] libvirtError: Unable to read from monitor: Connection reset by peer
======================================
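
For context, the failing call in the traceback above is libvirt-python's blockJobAbort() (virDomainBlockJobAbort). A minimal standalone sketch of that call, with an assumed domain name and disk device rather than values from this report:

    import libvirt

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("instance-00000002")  # assumed domain name
    try:
        # Abort the block job on the instance's disk, as nova's snapshot
        # cleanup path does; this is the call that raised the error above.
        dom.blockJobAbort("vda", 0)
    except libvirt.libvirtError as err:
        # "Unable to read from monitor: Connection reset by peer" indicates the
        # QEMU monitor socket dropped while libvirt was waiting for a reply.
        print("blockJobAbort failed: %s" % err)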

Dennis Dmitriev (ddmitriev) wrote :
Changed in fuel:
assignee: nobody → Matthew Mosesohn (raytrac3r)
status: New → Confirmed
Matthew Mosesohn (raytrac3r) wrote :

I did some research and found some possible upstream bugs this may be related to. It's either a race condition or a libvirt bug. We need to reproduce it. The environment from 11/28 is lost, so we are waiting for a reproducer.

Changed in fuel:
importance: Undecided → High
status: Confirmed → Incomplete
Changed in fuel:
milestone: 5.1.1 → 5.1.2
Changed in fuel:
status: Incomplete → Invalid
Oleksiy Molchanov (omolchanov) wrote :

This bug has been Incomplete for more than 4 weeks. We cannot investigate it further, so we are setting the status to Invalid. If you think this is not correct, please provide the requested information and reopen the bug, and we will look into it further.

Dennis Schridde (urzds) wrote :

> I did some research and found some possible upstream bugs this is related to.

Could you please point out these bug reports?
