libvirtError: Unable to read from monitor: Connection reset by peer

Bug #1255624 reported by Joe Gordon
This bug affects 8 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

http://logs.openstack.org/67/35767/34/check/check-tempest-devstack-vm-full/d096dde/logs/screen-n-cpu.txt.gz?#_2013-11-27_17_29_40_345

Stacktrace:
...
   2013-11-27 17:29:40.345 23415 TRACE nova.openstack.common.rpc.amqp File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 77, in tworker
   2013-11-27 17:29:40.345 23415 TRACE nova.openstack.common.rpc.amqp rv = meth(*args,**kwargs)
   2013-11-27 17:29:40.345 23415 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1352, in suspend
   2013-11-27 17:29:40.345 23415 TRACE nova.openstack.common.rpc.amqp if ret == -1: raise libvirtError ('virDomainSuspend() failed', dom=self)
   2013-11-27 17:29:40.345 23415 TRACE nova.openstack.common.rpc.amqp libvirtError: Unable to read from monitor: Connection reset by peer

logstash query: message:"libvirtError\: Unable to read from monitor: Connection reset by peer" AND "pause_instance"

This stacktrace causes the gate to fail intermittently. The gate failure linked above occurred during a pause command.
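To look for the same signature in a locally downloaded screen-n-cpu log, the error-message half of the logstash query above can be approximated with a small filter. This is only a sketch: `find_monitor_resets` and `TRACE_RE` are hypothetical names, the line layout is assumed from the TRACE lines quoted in the stacktrace, and it does not reproduce the `pause_instance` part of the query.

```python
import re

# Assumed line layout, taken from the TRACE lines quoted in this report:
# "<date> <time> <pid> TRACE <logger> <message>"
TRACE_RE = re.compile(
    r"^\s*(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<pid>\d+) TRACE (?P<logger>\S+) (?P<msg>.*)$"
)

# Error text copied from the stacktrace in this report.
MONITOR_ERROR = "libvirtError: Unable to read from monitor: Connection reset by peer"


def find_monitor_resets(lines):
    """Return (timestamp, pid) for every TRACE line carrying the monitor error."""
    hits = []
    for line in lines:
        match = TRACE_RE.match(line)
        if match and MONITOR_ERROR in match.group("msg"):
            hits.append((match.group("ts"), int(match.group("pid"))))
    return hits
```

It accepts any iterable of lines, so an open file object (for example one returned by `gzip.open(path, "rt")`) can be passed in directly.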

Joe Gordon (jogo) wrote :

Gate issue. Marking as critical.

Changed in nova:
importance: Undecided → Critical
Daniel Berrange (berrange) wrote :

> 2013-11-27 17:29:40.345 23415 TRACE nova.openstack.common.rpc.amqp libvirtError: Unable to read from monitor: Connection reset by peer

This error message from libvirt indicates that the QEMU monitor has closed the connection. This is usually a result of QEMU crashing, or being killed. I've never seen this happen in response to a virDomainSuspend API call.

To further diagnose this you need to examine more logs, in particular /var/log/libvirt/qemu/$GUESTNAME.log - if we're lucky (we're usually not) then this may indicate why QEMU crashed. Also check /var/log/messages for stupid things like the OOM killer going mental & slaughtering QEMU.
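The OOM-killer check suggested above could be scripted roughly as follows. This is a sketch under assumptions: `find_oom_kills` is a hypothetical helper, the two patterns reflect commonly seen kernel wording ("Out of memory: Kill process ..." and "Killed process ..."), and the exact text varies between kernel versions, so an empty result does not rule out an OOM kill.

```python
import re

# Commonly seen kernel OOM-killer wording; the exact phrasing differs
# between kernel versions, so treat this pattern as an assumption.
OOM_RE = re.compile(
    r"(?:Out of memory: Kill(?:ed)? process|Killed process) (\d+) \(([^)]+)\)"
)


def find_oom_kills(lines, name_fragment="qemu"):
    """Return unique (pid, process_name) pairs for OOM kills matching name_fragment."""
    kills = []
    for line in lines:
        match = OOM_RE.search(line)
        if not match:
            continue
        entry = (int(match.group(1)), match.group(2))
        # The kernel often logs two lines per kill, so skip duplicates.
        if name_fragment in entry[1] and entry not in kills:
            kills.append(entry)
    return kills
```

Running it over the lines of /var/log/messages (or the syslog file the distribution uses) and comparing the reported PIDs with the QEMU process of the affected guest would show whether the OOM killer was involved.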

Joe Gordon (jogo)
description: updated
Michael Still (mikal)
tags: added: libvirt
Changed in nova:
status: New → Triaged
Thierry Carrez (ttx)
Changed in nova:
milestone: none → icehouse-2
Matt Riedemann (mriedem) wrote :

I think bug 1260644 is related, except it's doing a rescue operation instead of pause, but I don't know if that matters.

Matt Riedemann (mriedem) wrote :

Marked bug 1255624 as a dupe of this, since the libvirt driver code now automatically disables the host service when the connection to libvirtd is not available. That causes the ComputeFilter in the nova scheduler to skip over the (only) host and set the instance to ERROR state in some gate runs.

Matt Riedemann (mriedem) wrote :

Hoping that these will provide some libvirt logs that we can use to uniquely fingerprint this error:

https://review.openstack.org/#/c/65834/ - Enable server-side and client-side logs for libvirt
https://review.openstack.org/#/c/65833/ - Capture libvirtd logs

Changed in nova:
milestone: icehouse-2 → icehouse-3
Peter Portante (peter-a-portante) wrote :

Is this failure similar enough to this bug to have the same cause?

http://logs.openstack.org/52/70652/1/check/check-tempest-dsvm-full/a9bf53c/logs/screen-n-cpu.txt.gz

Peter Portante (peter-a-portante) wrote :

Sorry, extracting the line I am keying off of:

2014-02-03 06:04:24.335 28690 TRACE nova.compute.manager [instance: 0103d8f3-f900-4c9e-b718-96552cebc559] libvirtError: Unable to write to monitor: Broken pipe

Joe Gordon (jogo)
tags: added: testing
Joe Gordon (jogo) wrote :
Changed in nova:
importance: Critical → High
Changed in nova:
milestone: icehouse-3 → none
Changed in nova:
status: Triaged → Fix Committed
Changed in nova:
milestone: none → juno-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: juno-2 → 2014.2