Restarting libvirtd breaks Eucalyptus NC

Bug #512887 reported by ariel on 2010-01-26
Affects Status Importance Assigned to Milestone
eucalyptus (Ubuntu)
Dustin Kirkland 

Bug Description

Steps to reproduce:

- make sure Eucalyptus NC is up and running
- run /etc/init.d/libvirt-bin restart
- try to start a new instance. If all other NCs are busy and the new request lands on this NC, the NC log
  will show:

[EUCAINFO ] doRunInstance() invoked (id=i-4912092F cores=2 disk=10 memory=512)
[EUCAERROR ] libvirt: cannot send data: Broken pipe (code=38)
[EUCAINFO ] currently running/booting: i-4912092F
[EUCAERROR ] libvirt: cannot send data: Broken pipe (code=38)
[EUCAFATAL ] hypervisor failed to start domain
[EUCAINFO ] doTerminateInstance() invoked (id=i-4912092F)
[EUCAERROR ] libvirt: cannot send data: Broken pipe (code=38)
[EUCAWARN ] warning: domain i-4912092F to be terminated not running on hypervisor

Apparently the NC keeps trying to use the old libvirtd socket and doesn't notice the daemon was restarted.
Another VERY nasty consequence is that Eucalyptus loses track of the instances that were already running on that node (even though they keep running in KVM)!

In CC.log:
[EUCAINFO ] TerminateInstances(): calling terminate instance (i-42A10746) on (141.x.x.x)
[EUCAERROR ] ERROR: TerminateInstance() could not be invoked (check NC host, port, and credentials)

while on the NC we still have:
# virsh list
Connecting to uri: qemu:///system
 Id Name State
 107 i-42A10746 running
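
The mismatch above (the CC cannot terminate i-42A10746 while virsh still reports it as running) could in principle be detected by comparing the hypervisor's domain list with the instances the NC believes it is tracking. The following is only a minimal sketch, assuming the plain-text `virsh list` layout shown above; `parse_virsh_list` and `untracked_instances` are hypothetical helper names, not part of Eucalyptus or libvirt.

```python
def parse_virsh_list(output):
    """Return {domain_name: state} parsed from `virsh list` text output."""
    domains = {}
    for line in output.splitlines():
        parts = line.split(None, 2)
        # Data rows start with a numeric domain id; headers and separators do not.
        if len(parts) == 3 and parts[0].isdigit():
            _, name, state = parts
            domains[name] = state.strip()
    return domains


def untracked_instances(virsh_output, tracked_ids):
    """Running domains the hypervisor reports but the caller no longer tracks."""
    running = {
        name
        for name, state in parse_virsh_list(virsh_output).items()
        if state == "running"
    }
    return sorted(running - set(tracked_ids))


# Sample text copied from the report; tracked_ids is empty because the NC
# lost track of its instances after the libvirtd restart.
sample = """Connecting to uri: qemu:///system
 Id Name State
 107 i-42A10746 running
"""
orphans = untracked_instances(sample, tracked_ids=[])
```

Here `orphans` holds the instances that are still alive in KVM but unknown to the NC, which is exactly the situation described in this report.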

At the very least, the libvirt-bin/Euca-NC upstart dependencies should be set up so that restarting libvirtd also restarts the NC. Ideally, and much better, Eucalyptus NC should be fixed to handle a restarted libvirtd.
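
To illustrate what "handling a restarted libvirtd" could mean, here is a minimal sketch of a reconnect-on-failure wrapper. It is not the Eucalyptus code (the NC is not written in Python) and it does not use the real libvirt bindings: `ReconnectingClient`, `FakeHypervisor`, `start_domain` and the `connect` factory are all hypothetical names, and the broken pipe is simulated.

```python
class ReconnectingClient:
    """Reopens the connection and retries once when a call hits a dead socket."""

    def __init__(self, connect, retry_on=(BrokenPipeError, ConnectionError)):
        self._connect = connect      # hypothetical factory returning a fresh connection
        self._retry_on = retry_on
        self._conn = None

    def call(self, method, *args, **kwargs):
        if self._conn is None:
            self._conn = self._connect()
        try:
            return getattr(self._conn, method)(*args, **kwargs)
        except self._retry_on:
            # The daemon was probably restarted: drop the stale handle, retry once.
            self._conn = self._connect()
            return getattr(self._conn, method)(*args, **kwargs)


class FakeHypervisor:
    """Stand-in for a hypervisor connection; `alive` goes False on a simulated restart."""

    def __init__(self):
        self.alive = True

    def start_domain(self, name):
        if not self.alive:
            raise BrokenPipeError("cannot send data: Broken pipe")
        return f"{name} started"


opened = []


def connect():
    conn = FakeHypervisor()
    opened.append(conn)
    return conn


client = ReconnectingClient(connect)
client.call("start_domain", "i-4912092F")
opened[0].alive = False  # simulate a libvirtd restart killing the old socket
result = client.call("start_domain", "i-4912092F")
```

The second call fails on the stale connection, opens a new one and succeeds, instead of reporting "hypervisor failed to start domain". Retrying only once keeps a genuinely dead daemon from causing an endless loop.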

Description: Ubuntu 9.10
Release: 9.10

ii euca2ools 1.0+bzr20091007-0ubuntu1.1 managing cloud instances for Eucalyptus
ii eucalyptus-common 1.6~bzr931-0ubuntu7.4 Elastic Utility Computing Architecture - Com
ii eucalyptus-gl 1.6~bzr931-0ubuntu7.4 Elastic Utility Computing Architecture - Log
ii eucalyptus-nc 1.6~bzr931-0ubuntu7.4 Elastic Utility Computing Architecture

Thierry Carrez (ttx) wrote :

Needs reproduction on lucid / 1.6.2

Changed in eucalyptus (Ubuntu):
importance: Undecided → Medium
status: New → Confirmed
Dustin Kirkland  (kirkland) wrote :

We should test and confirm/fix and close this next week at the sprint.

Changed in eucalyptus (Ubuntu):
milestone: none → lucid-alpha-3
Dustin Kirkland  (kirkland) wrote :

Still a problem in current Lucid.

Dustin Kirkland  (kirkland) wrote :

Dan, looking for some advice here...

eucalyptus-nc definitely is not surviving libvirt-bin restarts. It loses the socket. Is there an obvious way we can get eucalyptus-nc to re-attach the socket on a libvirt restart?

Dustin Kirkland  (kirkland) wrote :

Never mind, Dan, I'm on it...

Changed in eucalyptus (Ubuntu Lucid):
assignee: nobody → Dustin Kirkland (kirkland)
status: Confirmed → In Progress
Changed in eucalyptus (Ubuntu Lucid):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eucalyptus - 1.6.2~bzr1166-0ubuntu3

eucalyptus (1.6.2~bzr1166-0ubuntu3) lucid; urgency=low

  * debian/eucalyptus-nc.upstart: handle libvirt restarts, LP: #512887
  * eucalyptus-cc.eucalyptus-cc-publication.upstart,
    eucalyptus-cloud.upstart, eucalyptus-common.eucalyptus.upstart,
    eucalyptus-walrus.upstart, uec-component-listener.upstart: add a few
    inline comments, including a comment at the top of every upstart script
    that seems to be required to get vim syntax highlighting to work
  * eucalyptus-cc.postrm, eucalyptus-cloud.postrm,
    eucalyptus-common.postrm, eucalyptus-sc.postrm,
    eucalyptus-walrus.postrm, uec-component-listener.postrm: fix package
    purging with per-package file purging lists, LP: #503063
  * eucalyptus-cc.eucalyptus-cc-publication.upstart,
    eucalyptus-walrus.eucalyptus-walrus-publication.upstart: stop publication
    jobs if the relevant service stops running
 -- Dustin Kirkland <email address hidden> Wed, 03 Feb 2010 19:01:47 -0800

Changed in eucalyptus (Ubuntu Lucid):
status: Fix Committed → Fix Released
ariel (garcia) wrote :

Great, thanks!

BTW, shouldn't this issue be reported upstream, so that the NC itself is fixed not to lose the socket?
Fixing eucalyptus-nc.upstart will handle the case of a "civilized" restart by the sysadmin, but not the case where, for instance, libvirtd dies and gets started by hand with /usr/sbin/libvirtd -d, right? And the fix only applies to Ubuntu ;-)
Should I report it upstream, does it get forwarded automatically, or does somebody else take care of it?

Hmm, you can report it upstream, if you like. Upstream uses sysvinit
scripts rather than upstart, so the issue may or may not be present
there, I'm not entirely sure...

ariel (garcia) wrote :

It is upstream bug

No, IMHO they shouldn't "fix" it at the init script level but in the code itself.
