Nova no longer reconnects if libvirt is restarted

Bug #1154473 reported by Rafi Khardalian
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Russell Bryant

Bug Description

Nova used to recover gracefully if libvirt was restarted out from under it. This is no longer the case. Steps to reproduce:

1. Start libvirtd (I am using version 1.0.2)
2. Start nova-compute (tested using master branch as of 3/13/2013)
3. Observe that everything functions normally
4. Restart libvirtd
5. nova-compute will now throw exceptions for every subsequent operation (nova.manager libvirtError: internal error client socket is closed)

This is a serious issue, as nova-compute will not recover on its own without being restarted.

Davanum Srinivas (DIMS) (dims-v) wrote :

2013-03-13 09:41:40.865 DEBUG nova.virt.libvirt.driver [-] Updating host stats from (pid=25201) update_status /opt/stack/nova/nova/virt/libvirt/driver.py:3681
2013-03-13 09:41:40.865 DEBUG nova.virt.libvirt.driver [-] Connecting to libvirt: qemu:///system from (pid=25201) _get_connection /opt/stack/nova/nova/virt/libvirt/driver.py:553
libvir: XML-RPC error : Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory
2013-03-13 09:41:40.866 ERROR nova.virt.libvirt.driver [-] Connection to libvirt failed: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory
2013-03-13 09:41:40.866 TRACE nova.virt.libvirt.driver Traceback (most recent call last):
2013-03-13 09:41:40.866 TRACE nova.virt.libvirt.driver File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 626, in _connect
2013-03-13 09:41:40.866 TRACE nova.virt.libvirt.driver return libvirt.openAuth(uri, auth, 0)
2013-03-13 09:41:40.866 TRACE nova.virt.libvirt.driver File "/usr/lib/python2.7/dist-packages/libvirt.py", line 102, in openAuth
2013-03-13 09:41:40.866 TRACE nova.virt.libvirt.driver if ret is None:raise libvirtError('virConnectOpenAuth() failed')
2013-03-13 09:41:40.866 TRACE nova.virt.libvirt.driver libvirtError: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory
2013-03-13 09:41:40.866 TRACE nova.virt.libvirt.driver
2013-03-13 09:41:40.868 DEBUG nova.virt.libvirt.driver [-] Registering for lifecycle events <nova.virt.libvirt.driver.LibvirtDriver object at 0x2a2b910> from (pid=25201) _get_connection /opt/stack/nova/nova/virt/libvirt/driver.py:563
2013-03-13 09:41:40.868 WARNING nova.virt.libvirt.driver [-] URI qemu:///system does not support events
2013-03-13 09:41:40.868 ERROR nova.manager [-] Error during ComputeManager._report_driver_status: 'NoneType' object has no attribute 'getInfo'
2013-03-13 09:41:40.868 TRACE nova.manager Traceback (most recent call last):
2013-03-13 09:41:40.868 TRACE nova.manager File "/opt/stack/nova/nova/manager.py", line 241, in periodic_tasks
2013-03-13 09:41:40.868 TRACE nova.manager task(self, context)
2013-03-13 09:41:40.868 TRACE nova.manager File "/opt/stack/nova/nova/compute/manager.py", line 3597, in _report_driver_status
2013-03-13 09:41:40.868 TRACE nova.manager capabilities = self.driver.get_host_stats(refresh=True)
2013-03-13 09:41:40.868 TRACE nova.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3381, in get_host_stats
2013-03-13 09:41:40.868 TRACE nova.manager return self.host_state.get_host_stats(refresh=refresh)
2013-03-13 09:41:40.868 TRACE nova.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3676, in get_host_stats
2013-03-13 09:41:40.868 TRACE nova.manager self.update_status()
2013-03-13 09:41:40.868 TRACE nova.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3683, in update_status
2013-03-13 09:41:40.868 TRACE nova.manager data["vcpus"] = self.driver.get_vcpu_total()
2013-03-13 09:41:40.868 TRACE nova.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2514, in get_vcpu_total
2013-03-13 09:41:40.868 TRACE nova.manager return s...


Davanum Srinivas (DIMS) (dims-v) wrote :

@rmk, I see the above in the logs; however, a subsequent launch works fine:

"nova boot --image cirros-0.3.0-x86_64 --flavor m1.small my-first-server-2"

dims@dims-desktop:~$ libvirtd --version
libvirtd (libvirt) 0.9.13

tags: added: grizzly-rc-potential
Russell Bryant (russellb) wrote :

Testing on Fedora 18. After restarting libvirtd, once the next periodic task that uses libvirt runs, I get:

libvir: XML-RPC error : internal error client socket is closed
2013-03-13 14:43:55.577 WARNING nova.virt.libvirt.driver [-] Cannot get the number of cpu, because this function is not implemented for this platform.
libvir: XML-RPC error : internal error client socket is closed
2013-03-13 14:43:55.578 ERROR nova.manager [-] Error during ComputeManager._report_driver_status: internal error client socket is closed
2013-03-13 14:43:55.578 TRACE nova.manager Traceback (most recent call last):
2013-03-13 14:43:55.578 TRACE nova.manager File "/opt/stack/nova/nova/manager.py", line 241, in periodic_tasks
2013-03-13 14:43:55.578 TRACE nova.manager task(self, context)
2013-03-13 14:43:55.578 TRACE nova.manager File "/opt/stack/nova/nova/compute/manager.py", line 3529, in _report_driver_status
2013-03-13 14:43:55.578 TRACE nova.manager capability['host_ip'] = CONF.my_ip
2013-03-13 14:43:55.578 TRACE nova.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3367, in get_host_stats
2013-03-13 14:43:55.578 TRACE nova.manager return self.host_state.get_host_stats(refresh=refresh)
2013-03-13 14:43:55.578 TRACE nova.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3662, in get_host_stats
2013-03-13 14:43:55.578 TRACE nova.manager self.update_status()
2013-03-13 14:43:55.578 TRACE nova.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3670, in update_status
2013-03-13 14:43:55.578 TRACE nova.manager data["vcpus_used"] = self.driver.get_vcpu_used()
2013-03-13 14:43:55.578 TRACE nova.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2551, in get_vcpu_used
2013-03-13 14:43:55.578 TRACE nova.manager dom_ids = self.list_instance_ids()
2013-03-13 14:43:55.578 TRACE nova.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 656, in list_instance_ids
2013-03-13 14:43:55.578 TRACE nova.manager if self._conn.numOfDomains() == 0:
2013-03-13 14:43:55.578 TRACE nova.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 552, in _get_connection
2013-03-13 14:43:55.578 TRACE nova.manager if not self._wrapped_conn or not self._test_connection():
2013-03-13 14:43:55.578 TRACE nova.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 579, in _test_connection
2013-03-13 14:43:55.578 TRACE nova.manager self._wrapped_conn.getLibVersion()
2013-03-13 14:43:55.578 TRACE nova.manager File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 187, in doit
2013-03-13 14:43:55.578 TRACE nova.manager result = proxy_call(self._autowrap, f, *args, **kwargs)
2013-03-13 14:43:55.578 TRACE nova.manager File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 147, in proxy_call
2013-03-13 14:43:55.578 TRACE nova.manager rv = execute(f,*args,**kwargs)
2013-03-13 14:43:55.578 TRACE nova.manager File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 76, in tworker
2013-03-13 14:43:55.578 TRACE nova.manager rv = meth(*args,**kwargs)
2013-03-13 14:43:55.578 TRACE nova.manager File "/usr/lib64/python2.7/site...


Changed in nova:
status: New → Confirmed
importance: Undecided → High
milestone: none → grizzly-rc1
Changed in nova:
assignee: nobody → Russell Bryant (russellb)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/24323

Changed in nova:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/24323
Committed: http://github.com/openstack/nova/commit/956be49d0dd266be26673b05480523be55bd6267
Submitter: Jenkins
Branch: master

commit 956be49d0dd266be26673b05480523be55bd6267
Author: Russell Bryant <email address hidden>
Date: Wed Mar 13 11:03:42 2013 -0400

    Fix reconnecting to libvirt.

    The right logic was in place for dealing with losing a connection to
    libvirt and reconnecting. However, it was observed that we actually get
    VIR_ERR_INTERNAL_ERROR, VIR_FROM_RPC after restarting libvirtd. The
    code is just updated to deal with INTERNAL_ERROR.

    Fix bug 1154473.

    Change-Id: Idf4abf3fe485cf534f1732e4340fc35652fec003
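
The idea in the commit message can be sketched roughly as follows. This is an illustration only, not the merged patch: the numeric constants, FakeLibvirtError, connection_is_alive, and ReconnectingClient are hypothetical stand-ins so the example runs without libvirt installed. The sketch assumes the pre-existing check recognised a system error from the remote domain, and shows the internal error from the RPC domain being treated the same way. With libvirt-python, the error class would be libvirt.libvirtError and the constants would come from the libvirt module.

```python
# Placeholder values for illustration; real code should use the constants
# exposed by the libvirt module (libvirt.VIR_ERR_INTERNAL_ERROR, etc.).
VIR_ERR_INTERNAL_ERROR = 1
VIR_ERR_SYSTEM_ERROR = 38
VIR_FROM_RPC = 7
VIR_FROM_REMOTE = 13

# Error code/domain pairs treated as "the connection is gone".
BROKEN_CODES = (VIR_ERR_SYSTEM_ERROR, VIR_ERR_INTERNAL_ERROR)
BROKEN_DOMAINS = (VIR_FROM_REMOTE, VIR_FROM_RPC)


class FakeLibvirtError(Exception):
    """Hypothetical stand-in exposing the two accessors used below."""

    def __init__(self, message, code, domain):
        super().__init__(message)
        self._code = code
        self._domain = domain

    def get_error_code(self):
        return self._code

    def get_error_domain(self):
        return self._domain


def connection_is_alive(conn, error_type=FakeLibvirtError):
    """Return False if a cheap call shows the connection is broken.

    Errors that do not look like a lost connection are re-raised.
    """
    try:
        conn.getLibVersion()
        return True
    except error_type as exc:
        if (exc.get_error_code() in BROKEN_CODES
                and exc.get_error_domain() in BROKEN_DOMAINS):
            return False
        raise


class ReconnectingClient:
    """Caches one connection and reopens it when the health check fails."""

    def __init__(self, opener):
        self._opener = opener  # callable that returns a new connection
        self._conn = None

    def get_connection(self):
        if self._conn is None or not connection_is_alive(self._conn):
            self._conn = self._opener()
        return self._conn
```

If the health check only recognised the system-error case, the "client socket is closed" internal error would be re-raised on every call, which matches the behaviour reported in step 5 of the description.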

Changed in nova:
status: In Progress → Fix Committed
tags: added: folsom-backport-potential
removed: grizzly-rc-potential
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: grizzly-rc1 → 2013.1