instances do not stop correctly

Bug #868349 reported by Eric Dodemont on 2011-10-05
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
nova (Ubuntu)
High
Unassigned

Bug Description

The instances do not stop correctly. An error is triggered and the instances are not removed from the DB and are still shown as ACTIVE (even if the KVM processes and the network configuration are deleted correctly ).

* Package used: nova 2011.3-0ubuntu5 (but the problem is probably there since 2011.3-0ubuntu4)

* Commands: nova delete or euca-terminate-instance

* Error:

(nova.exception): TRACE: File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/connection.py", line 222, in close
(nova.exception): TRACE: os.close(self.fd)
(nova.exception): TRACE: OSError: [Errno 9] Bad file descriptor

The error happens in:

File: /usr/lib/python2.7/dist-packages/nova/virt/libvirt/connection.py
Class: ConsoleLogger(object)

It looks like this error happens since the modifications made to have a ring/fifo console log for KVM:

nova (2011.3-0ubuntu5) oneiric; urgency=low
* debian/patches/backport-libvirt-console-pipe.patch:
  - Patch updated to use correct patchset from upstream - incorrect version
    was uploaded in -0ubuntu4 (LP: #832507).

nova (2011.3-0ubuntu4) oneiric; urgency=low
* debian/patches/backport-libvirt-console-pipe.patch:
  Move console.log to a ringbuffer so that the console.log
  keeps filling up. (LP: #832507)

If I install Nova from the OpenStack PPA trunk (ppa:nova-core/trunk), I do not have the error.

Complete log:

2011-10-05 14:29:24,767 INFO nova.compute.manager [8d5251b5-7df9-4b58-a7c9-ca49beac00c6 dodeeric project-one] check_instance_lock: arguments: |<nova.compute.manager.ComputeManager object at 0x2395990>| |<nova.rpc.impl_kombu.RpcContext object at 0x46ed7d0>| |1|
2011-10-05 14:29:24,768 DEBUG nova.compute.manager [8d5251b5-7df9-4b58-a7c9-ca49beac00c6 dodeeric project-one] instance 1: getting locked state from (pid=912) get_lock /usr/lib/python2.7/dist-packages/nova/compute/manager.py:1165
2011-10-05 14:29:24,843 INFO nova.compute.manager [8d5251b5-7df9-4b58-a7c9-ca49beac00c6 dodeeric project-one] check_instance_lock: locked: |False|
2011-10-05 14:29:24,843 INFO nova.compute.manager [8d5251b5-7df9-4b58-a7c9-ca49beac00c6 dodeeric project-one] check_instance_lock: admin: |True|
2011-10-05 14:29:24,843 INFO nova.compute.manager [8d5251b5-7df9-4b58-a7c9-ca49beac00c6 dodeeric project-one] check_instance_lock: executing: |<function terminate_instance at 0x2af1758>|
2011-10-05 14:29:24,897 AUDIT nova.compute.manager [8d5251b5-7df9-4b58-a7c9-ca49beac00c6 dodeeric project-one] Terminating instance 1
2011-10-05 14:29:24,897 DEBUG nova.rpc [-] Making asynchronous call on network ... from (pid=912) multicall /usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py:721
2011-10-05 14:29:24,897 DEBUG nova.rpc [-] MSG_ID is 5e8492b8c59e48cb8e4d4e745f73c240 from (pid=912) multicall /usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py:724
2011-10-05 14:29:25,044 DEBUG nova.rpc [-] Making asynchronous cast on network... from (pid=912) cast /usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py:747
2011-10-05 14:29:26,701 DEBUG nova.utils [-] Attempting to grab semaphore "iptables" for method "apply"... from (pid=912) inner /usr/lib/python2.7/dist-packages/nova/utils.py:675
2011-10-05 14:29:26,701 DEBUG nova.utils [-] Attempting to grab file lock "iptables" for method "apply"... from (pid=912) inner /usr/lib/python2.7/dist-packages/nova/utils.py:680
2011-10-05 14:29:26,702 DEBUG nova.utils [-] Running cmd (subprocess): sudo iptables-save -t filter from (pid=912) execute /usr/lib/python2.7/dist-packages/nova/utils.py:168
2011-10-05 14:29:27,397 INFO nova.virt.libvirt_conn [-] Instance instance-00000001 destroyed successfully.
2011-10-05 14:29:27,416 DEBUG nova.utils [-] Running cmd (subprocess): sudo iptables-restore from (pid=912) execute /usr/lib/python2.7/dist-packages/nova/utils.py:168
2011-10-05 14:29:27,442 DEBUG nova.utils [-] Running cmd (subprocess): sudo iptables-save -t nat from (pid=912) execute /usr/lib/python2.7/dist-packages/nova/utils.py:168
2011-10-05 14:29:27,472 DEBUG nova.utils [-] Running cmd (subprocess): sudo iptables-restore from (pid=912) execute /usr/lib/python2.7/dist-packages/nova/utils.py:168
2011-10-05 14:29:28,627 DEBUG nova.virt.libvirt.firewall [-] The nwfilter(nova-instance-instance-00000001-secgroup) for instance-00000001 is not found. from (pid=912) unfilter_instance /usr/lib/python2.7/dist-packages/nova/virt/libvirt/firewall.py:320
2011-10-05 14:29:28,631 ERROR nova.exception [-] Uncaught exception
(nova.exception): TRACE: Traceback (most recent call last):
(nova.exception): TRACE: File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 98, in wrapped
(nova.exception): TRACE: return f(*args, **kw)
(nova.exception): TRACE: File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 117, in decorated_function
(nova.exception): TRACE: function(self, context, instance_id, *args, **kwargs)
(nova.exception): TRACE: File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 493, in terminate_instance
(nova.exception): TRACE: self._shutdown_instance(context, instance_id, 'Terminating')
(nova.exception): TRACE: File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 484, in _shutdown_instance
(nova.exception): TRACE: self.driver.destroy(instance, network_info)
(nova.exception): TRACE: File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/connection.py", line 399, in destroy
(nova.exception): TRACE: self._cleanup(instance)
(nova.exception): TRACE: File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/connection.py", line 406, in _cleanup
(nova.exception): TRACE: self._stop_console_logger(instance_name)
(nova.exception): TRACE: File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/connection.py", line 299, in _stop_console_logger
(nova.exception): TRACE: self.console_loggers[name].close()
(nova.exception): TRACE: File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/connection.py", line 222, in close
(nova.exception): TRACE: os.close(self.fd)
(nova.exception): TRACE: OSError: [Errno 9] Bad file descriptor
(nova.exception): TRACE:
2011-10-05 14:29:28,638 ERROR nova.rpc [-] Exception during message handling
(nova.rpc): TRACE: Traceback (most recent call last):
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 620, in _process_data
(nova.rpc): TRACE: rval = node_func(context=ctxt, **node_args)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 129, in wrapped
(nova.rpc): TRACE: raise Error(str(e))
(nova.rpc): TRACE: Error: [Errno 9] Bad file descriptor
(nova.rpc): TRACE:
2011-10-05 14:30:22,457 INFO nova.compute.manager [-] Updating host status
2011-10-05 14:30:23,056 INFO nova.compute.manager [-] Found 1 in the database and 0 on the hypervisor.
2011-10-05 14:31:23,742 INFO nova.compute.manager [-] Found 1 in the database and 0 on the hypervisor.

Related branches

Robie Basak (racb) wrote :

Eric,

Thanks for reporting this.

I haven't been able to reproduce this in my test environment. I suspect a race, I think I know where it is and I've tried writing a few test cases but haven't been able to catch this.

Can you tell me if you are able to reliably reproduce this every time, or at all?

James Page (james-page) on 2011-10-05
tags: added: server-o-rs
Eric Dodemont (dodeeric) wrote :

Yes, this happens every time on my installation.

- If I install Nova from ppa:nova-core/trunk I do not have the problem (there is no console log fifo/ring).

- If I remove line 221 and line 222 in /usr/share/pyshared/nova/virt/libvirt/connection.py-ubuntu:

---
    def close(self):
        self.reader_thread.kill()
        self.data_queue.put(None)
        try:
            self.writer_thread.wait()
        except eventlet.greenlet.GreenletExit:
            pass
        if self.fd is not None: <=========== Line 221: Removed
            os.close(self.fd) <============ Line 222: Removed

    def peek(self):
        return self.ringbuffer.peek()
---

then everything works well.

I will reinstall Ubuntu Oneiric from scratch (because I made a lot of installs/uninstalls :-) to see if the problem is still there.

I keep you updated maybe still today.

Robie Basak (racb) wrote :

Thanks Eric. I've reproduced it on a fresh oneiric install and am working on it now.

Changed in nova (Ubuntu):
status: New → Confirmed
Robie Basak (racb) on 2011-10-05
Changed in nova (Ubuntu):
importance: Undecided → High
Robie Basak (racb) wrote :

It seems that os.open(..., os.O_NONBLOCK) nevertheless does block, so self.fd is not replaced. If ConsoleLogger.close() gets called at this point from another thread, then a double close is attempted. The fix for this is trivial, although I'm still baffled by how this is possible in eventlet. I also can't seem to be able to reproduce it in a test. However I have reproduced by hand, and have verified that the fix appears to work. The fix is also trivial enough that I'm confident it won't cause any other issues.

The attachment "868349.patch" of this bug report has been identified as being a patch. The ubuntu-reviewers team has been subscribed to the bug report so that they can review the patch. In the event that this is in fact not a patch you can resolve this situation by removing the tag 'patch' from the bug report and editing the attachment so that it is not flagged as a patch. Additionally, if you are member of the ubuntu-sponsors please also unsubscribe the team from this bug report.

[This is an automated message performed by a Launchpad user owned by Brian Murray. Please contact him regarding any issues with the action taken in this bug report.]

tags: added: patch
Robie Basak (racb) on 2011-10-06
Changed in nova (Ubuntu):
status: Confirmed → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 2011.3-0ubuntu6

---------------
nova (2011.3-0ubuntu6) oneiric; urgency=low

  * debian/patches/backport-libvirt-console-pipe.patch:
    - Patch updated to fix race on instance termination (LP: #868349)
 -- Robie Basak <email address hidden> Wed, 05 Oct 2011 17:37:49 +0100

Changed in nova (Ubuntu):
status: Fix Committed → Fix Released
Eric Dodemont (dodeeric) wrote :

I reinstalled all from scratch (Ubuntu Oneiric and Nova Diablo), and indeed the bug was still there.

After applying your patch, the problem was solved!

Thanks.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers