Encountering sporadic AMQPChannelException

Bug #855030 reported by David Kranz
This bug affects 2 people
Affects                    Status        Importance  Assigned to    Milestone
OpenStack Compute (nova)   Fix Released  Critical    Vish Ishaya
  Diablo                   Fix Released  Undecided   Unassigned
nova (Ubuntu)              Fix Released  High        Unassigned
  Oneiric                  Won't Fix     High        Unassigned
  Precise                  Fix Released  High        Unassigned

Bug Description

Running one controller and one compute node, using nova-network version 2011.3~rc~20110909.r1155-0ub on Oneiric. Repeating the following:

Create four instances, terminate them.

This mostly works, but one in every three or four times one of the vms does not come up and I see this error in the nova-network log. This can occur on either the controller node (which is also running compute) or the compute node. (The contents of nova.conf follow below.)

2011-09-20 13:22:59,295 DEBUG nova.utils [-] Attempting to grab semaphore "iptables" for method "apply"... from (pid=1082) inner /usr/lib/pymodules/python2.7/nova/utils.py:672
2011-09-20 13:22:59,295 DEBUG nova.utils [-] Attempting to grab file lock "iptables" for method "apply"... from (pid=1082) inner /usr/lib/pymodules/python2.7/nova/utils.py:677
2011-09-20 13:22:59,296 DEBUG nova.utils [-] Running cmd (subprocess): sudo iptables-save -t filter from (pid=1082) execute /usr/lib/pymodules/python2.7/nova/utils.py:165
2011-09-20 13:22:59,311 DEBUG nova.utils [-] Running cmd (subprocess): sudo iptables-restore from (pid=1082) execute /usr/lib/pymodules/python2.7/nova/utils.py:165
2011-09-20 13:22:59,350 DEBUG nova.utils [-] Running cmd (subprocess): sudo iptables-save -t nat from (pid=1082) execute /usr/lib/pymodules/python2.7/nova/utils.py:165
2011-09-20 13:22:59,366 DEBUG nova.utils [-] Running cmd (subprocess): sudo iptables-restore from (pid=1082) execute /usr/lib/pymodules/python2.7/nova/utils.py:165
2011-09-20 13:22:59,424 ERROR nova.rpc [-] Exception during message handling
(nova.rpc): TRACE: Traceback (most recent call last):
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/rpc/impl_kombu.py", line 628, in _process_data
(nova.rpc): TRACE: ctxt.reply(None, None)
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/rpc/impl_kombu.py", line 673, in reply
(nova.rpc): TRACE: msg_reply(self.msg_id, *args, **kwargs)
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/rpc/impl_kombu.py", line 781, in msg_reply
(nova.rpc): TRACE: conn.direct_send(msg_id, msg)
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/rpc/impl_kombu.py", line 562, in __exit__
(nova.rpc): TRACE: self._done()
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/rpc/impl_kombu.py", line 547, in _done
(nova.rpc): TRACE: self.connection.reset()
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/rpc/impl_kombu.py", line 382, in reset
(nova.rpc): TRACE: self.channel.close()
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py", line 194, in close
(nova.rpc): TRACE: (20, 41), # Channel.close_ok
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/abstract_channel.py", line 97, in wait
(nova.rpc): TRACE: return self.dispatch_method(method_sig, args, content)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/abstract_channel.py", line 115, in dispatch_method
(nova.rpc): TRACE: return amqp_method(self, args)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py", line 273, in _close
(nova.rpc): TRACE: (class_id, method_id))
(nova.rpc): TRACE: AMQPChannelException: (404, u"NOT_FOUND - no exchange '3ff1ba7e274a4ec2a2b0a217d7532c70' in vhost '/'", (60, 40), 'Channel.basic_publish')
(nova.rpc): TRACE:
2011-09-20 13:22:59,451 ERROR nova.rpc [-] Returning exception (404, u"NOT_FOUND - no exchange '3ff1ba7e274a4ec2a2b0a217d7532c70' in vhost '/'", (60, 40), 'Channel.basic_publish') to caller
2011-09-20 13:22:59,452 ERROR nova.rpc [-] ['Traceback (most recent call last):\n', ' File "/usr/lib/pymodules/python2.7/nova/rpc/impl_kombu.py", line 628, in _process_data\n ctxt.reply(None, None)\n', ' File "/usr/lib/pymodules/python2.7/nova/rpc/impl_kombu.py", line 673, in reply\n msg_reply(self.msg_id, *args, **kwargs)\n', ' File "/usr/lib/pymodules/python2.7/nova/rpc/impl_kombu.py", line 781, in msg_reply\n conn.direct_send(msg_id, msg)\n', ' File "/usr/lib/pymodules/python2.7/nova/rpc/impl_kombu.py", line 562, in __exit__\n self._done()\n', ' File "/usr/lib/pymodules/python2.7/nova/rpc/impl_kombu.py", line 547, in _done\n self.connection.reset()\n', ' File "/usr/lib/pymodules/python2.7/nova/rpc/impl_kombu.py", line 382, in reset\n self.channel.close()\n', ' File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py", line 194, in close\n (20, 41), # Channel.close_ok\n', ' File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/abstract_channel.py", line 97, in wait\n return self.dispatch_method(method_sig, args, content)\n', ' File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/abstract_channel.py", line 115, in dispatch_method\n return amqp_method(self, args)\n', ' File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py", line 273, in _close\n (class_id, method_id))\n', 'AMQPChannelException: (404, u"NOT_FOUND - no exchange \'3ff1ba7e274a4ec2a2b0a217d7532c70\' in vhost \'/\'", (60, 40), \'Channel.basic_publish\')\n']

I also see this at the same time in the rabbitmq log:

=ERROR REPORT==== 20-Sep-2011::13:22:59 ===
connection <0.1061.0>, channel 1 - error:
{amqp_error,not_found,
            "no exchange '3ff1ba7e274a4ec2a2b0a217d7532c70' in vhost '/'",
            'basic.publish'}

--flagfile=/etc/nova/nova-compute.conf
--use_deprecated_auth
--dhcpbridge_flagfile=/etc/nova/nova.conf
--dhcpbridge=/usr/bin/nova-dhcpbridge
--sql_connection=mysql://nova:notnova@172.18.0.141/nova
--s3_host=172.18.0.141
--rabbit_host=172.18.0.141
--glance_api_servers=172.18.0.141:9292
--logdir=/var/log/nova
--state_path=/var/lib/nova
--lock_path=/var/lock/nova
--verbose
--ec2_url=http://172.18.0.141:8773/services/Cloud
--dmz_cidr=172.18.0.141/32
--fixed_range=172.18.77.0/27
--network_size=8
--flat_network_dhcp_start=172.18.7.6
--image_service=nova.image.glance.GlanceImageService
--bridge_interface=eth1
--flat_network_bridge=br100
--network_manager=nova.network.manager.FlatDHCPManager
--public_interface=eth0
--multi_host
--osapi_host=172.18.0.141

Revision history for this message
Chris Behrens (cbehrens) wrote :

Pretty interesting. This has to be from an rpc.call. The caller creates a queue for the response from the worker. It appears the worker is not finding this queue when it goes to 'msg_reply'.

The only reasons I can think of how this could happen:

1) After the caller publishes a request for a worker.... it is getting an exception during waiting for the response. That would cause the queue to disappear. (Seems unlikely)
2) After the caller publishes a request for a worker... it gets disconnected from rabbit, which causes the queue to be deleted momentarily before it reconnects and re-declares it... and the worker tries to respond in the middle of this. (Seems unlikely)
3) rabbit is returning an ACK to the caller (amqplib) before it is actually finished creating the queue... and there's a race condition where the worker tries to reply before the queue is created. (Seems unlikely)
4) kombu+amqplib randomly doesn't send a declare_queue() ? Seems unlikely without an exception thrown. And an exception would abort the publish to the worker.

I wonder what I'm missing. What version of kombu and what version of rabbit? Have you tried falling back to carrot? You can specify: 'rpc_backend=nova.rpc.impl_carrot' to try it.
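(In the flagfile style of the nova.conf shown in the bug description, that would be a single line such as

--rpc_backend=nova.rpc.impl_carrot

assuming the flag name matches what the installed nova expects; the same spelling appears later in this thread.)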

Revision history for this message
David Kranz (david-kranz) wrote :

ii python-kombu 1.0.4-2 AMQP Messaging Framework for Python
ii rabbitmq-server 2.5.0-1ubuntu2 An AMQP server written in Erlang

I added the carrot suggestion to nova.conf and it still fails at about the same rate, 1 in 20 or so vms not coming up, but with a different error. I would be happy to help diagnose this if you have any other suggestions.

2011-09-21 09:17:24,546 DEBUG nova.network.manager [db549763-93f2-444f-bdd8-d4931b2fa6c7 6deb27e2-4b73-4d73-8196-6ccdbcee6859 kranzproject] network allocations for instance 108 from (pid=1096) allocate_for_instance /usr/lib/pymodules/python2.7/nova/network/manager.py:431
2011-09-21 09:17:24,705 ERROR nova.rpc [-] Exception during message handling
(nova.rpc): TRACE: Traceback (most recent call last):
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/rpc/impl_carrot.py", line 285, in _process_data
(nova.rpc): TRACE: rval = node_func(context=ctxt, **node_args)
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/network/manager.py", line 216, in allocate_for_instance
(nova.rpc): TRACE: ips = super(FloatingIP, self).allocate_for_instance(context, **kwargs)
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/network/manager.py", line 438, in allocate_for_instance
(nova.rpc): TRACE: requested_networks=requested_networks)
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/network/manager.py", line 166, in _allocate_fixed_ips
(nova.rpc): TRACE: vpn=vpn, address=address)
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/network/manager.py", line 609, in allocate_fixed_ip
(nova.rpc): TRACE: instance_id)
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/db/api.py", line 343, in fixed_ip_associate_pool
(nova.rpc): TRACE: instance_id, host)
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/db/sqlalchemy/api.py", line 101, in wrapper
(nova.rpc): TRACE: return f(*args, **kwargs)
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/db/sqlalchemy/api.py", line 720, in fixed_ip_associate_pool
(nova.rpc): TRACE: raise exception.NoMoreFixedIps()
(nova.rpc): TRACE: NoMoreFixedIps: None
(nova.rpc): TRACE:
2011-09-21 09:17:24,739 ERROR nova.rpc [-] Returning exception None to caller
2011-09-21 09:17:24,739 ERROR nova.rpc [-] ['Traceback (most recent call last):\n', ' File "/usr/lib/pymodules/python2.7/nova/rpc/impl_

Revision history for this message
David Kranz (david-kranz) wrote :

My apologies. The carrot test was not valid. The last error was caused by the fact that I was running out of fixed ip addresses. My guess is that when the instances that fail to come up are terminated, their IP addresses are not returned. I will continue to investigate on a clean install.

Revision history for this message
David Kranz (david-kranz) wrote :

I tried again on a freshly reinstalled OS and the behavior is repeatable. I also saw one case where there was a slightly different AMQP error that happened during instance termination:

headline: AMQPConnectionException: (503, u"COMMAND_INVALID - second 'channel.open' seen", (20, 10), 'Channel.open')

2011-09-21 11:33:05,739 DEBUG nova.utils [-] Attempting to grab semaphore "get_dhcp" for method "_get_dhcp_ip"... from (pid=1094) inner /usr/lib/pymodules/python2.7/nova/utils.py:672
2011-09-21 11:33:05,746 ERROR nova.rpc [-] Exception during message handling
(nova.rpc): TRACE: Traceback (most recent call last):
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/rpc/impl_kombu.py", line 626, in _process_data
(nova.rpc): TRACE: ctxt.reply(rval, None)
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/rpc/impl_kombu.py", line 673, in reply
(nova.rpc): TRACE: msg_reply(self.msg_id, *args, **kwargs)
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/rpc/impl_kombu.py", line 781, in msg_reply
(nova.rpc): TRACE: conn.direct_send(msg_id, msg)
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/rpc/impl_kombu.py", line 562, in __exit__
(nova.rpc): TRACE: self._done()
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/rpc/impl_kombu.py", line 547, in _done
(nova.rpc): TRACE: self.connection.reset()
(nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/rpc/impl_kombu.py", line 383, in reset
(nova.rpc): TRACE: self.channel = self.connection.channel()
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 99, in channel
(nova.rpc): TRACE: return self.transport.create_channel(self.connection)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 197, in create_channel
(nova.rpc): TRACE: return connection.channel()
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 130, in channel
(nova.rpc): TRACE: return Channel(self, channel_id)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py", line 82, in __init__
(nova.rpc): TRACE: self._x_open()
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py", line 471, in _x_open
(nova.rpc): TRACE: (20, 11), # Channel.open_ok
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/abstract_channel.py", line 95, in wait
(nova.rpc): TRACE: self.channel_id, allowed_methods)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/connection.py", line 231, in _wait_method
(nova.rpc): TRACE: self.wait()
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/abstract_channel.py", line 97, in wait
(nova.rpc): TRACE: return self.dispatch_method(method_sig, args, content)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/abstract_channel.py", line 115, in dispatch_method
(nova.rpc): TRACE: return amqp_method(self, args)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/connection.py", line 380, in _close...


Revision history for this message
Vish Ishaya (vishvananda) wrote : Re: [Bug 855030] Encountering sporadic AMQPChannelException

Looks like you are running out of ips. Maybe you are running a /27, which only has 32 addresses?
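(A quick check of that arithmetic in the interpreter:

>>> 2 ** (32 - 27)
32

so a /27 yields 32 addresses before any reserved ones are subtracted.)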

On Sep 21, 2011, at 6:22 AM, David Kranz wrote:

> ii python-kombu 1.0.4-2 AMQP Messaging Framework for Python
> ii rabbitmq-server 2.5.0-1ubuntu2 An AMQP server written in Erlang
>
> I added the carrot suggestion to nova.conf and it still fails about the
> same 1 in 20 or so vms not coming up but with a different error. I would
> be happy to help diagnose this if you have any other suggestions.
>
>
>
> 2011-09-21 09:17:24,546 DEBUG nova.network.manager [db549763-93f2-444f-bdd8-d4931b2fa6c7 6deb27e2-4b73-4d73-8196-6ccdbcee6859 kranzproje
> ct] network allocations for instance 108 from (pid=1096) allocate_for_instance /usr/lib/pymodules/python2.7/nova/network/manager.py:431
> 2011-09-21 09:17:24,705 ERROR nova.rpc [-] Exception during message handling
> (nova.rpc): TRACE: Traceback (most recent call last):
> (nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/rpc/impl_carrot.py", line 285, in _process_data
> (nova.rpc): TRACE: rval = node_func(context=ctxt, **node_args)
> (nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/network/manager.py", line 216, in allocate_for_instance
> (nova.rpc): TRACE: ips = super(FloatingIP, self).allocate_for_instance(context, **kwargs)
> (nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/network/manager.py", line 438, in allocate_for_instance
> (nova.rpc): TRACE: requested_networks=requested_networks)
> (nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/network/manager.py", line 166, in _allocate_fixed_ips
> (nova.rpc): TRACE: vpn=vpn, address=address)
> (nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/network/manager.py", line 609, in allocate_fixed_ip
> (nova.rpc): TRACE: instance_id)
> (nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/db/api.py", line 343, in fixed_ip_associate_pool
> (nova.rpc): TRACE: instance_id, host)
> (nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/db/sqlalchemy/api.py", line 101, in wrapper
> (nova.rpc): TRACE: return f(*args, **kwargs)
> (nova.rpc): TRACE: File "/usr/lib/pymodules/python2.7/nova/db/sqlalchemy/api.py", line 720, in fixed_ip_associate_pool
> (nova.rpc): TRACE: raise exception.NoMoreFixedIps()
> (nova.rpc): TRACE: NoMoreFixedIps: None
> (nova.rpc): TRACE:
> 2011-09-21 09:17:24,739 ERROR nova.rpc [-] Returning exception None to caller
> 2011-09-21 09:17:24,739 ERROR nova.rpc [-] ['Traceback (most recent call last):\n', ' File "/usr/lib/pymodules/python2.7/nova/rpc/impl_

Revision history for this message
David Kranz (david-kranz) wrote :

Yes, I am. I did not realize that the DHCP leases were not immediately marked as reusable when an instance is terminated. It seems that the default time is 2 min, and when I was doing rapid testing of this problem I exceeded the 32 available addresses. Anyway, I replaced kombu with carrot and was able to start up many, many instances without problems. I then switched back to kombu and the problem happened on the first shot after that. Not sure what more I can offer at this point...

Revision history for this message
Chris Behrens (cbehrens) wrote :

Hmm.. Does this happen under load? Is it just 4 instances one after another?

Another thing to check: amqplib version:

>>>
>>> from amqplib.client_0_8 import connection
>>> connection.LIBRARY_PROPERTIES
{'library': 'Python amqplib', 'library_version': '1.0.0'} 

It's possible carrot works with an older version, but kombu wants a newer version...
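A similar interactive check works for kombu (from memory; treat this as a sketch rather than exact output):

>>> import kombu
>>> kombu.__version__
'1.0.4'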

Revision history for this message
Chris Behrens (cbehrens) wrote :

Oh, and I was going to also suggest trying to remove the current kombu version and do a 'pip install kombu', which will grab the latest from pypi... and see if the issue goes away. kombu is up to 1.3.* or something now.

Revision history for this message
David Kranz (david-kranz) wrote :

I'm kind of at a loss here. The amqplib version is the same as you said. I used pip to uninstall kombu and upgrade to 1.3.5. That was the latest version this morning, though the latest seems to be 1.4.0 right now. Anyway, it made no difference. I am no expert on the mess that is the relationship between python package versions in pip and dpkg, so it is possible nova was still using the old kombu after my upgrade. I don't think so, because just invoking python from the command line showed the new version of kombu when I imported it.

This does happen under load in the sense that I used the euca command to create 4 instances at once. Has anyone else done something like this with Oneiric? On the one hand it is hard to see how I am having this problem if others are not, but I don't see what I could be doing wrong either, given that everything works for the instance creations where this does not happen.

Revision history for this message
Vish Ishaya (vishvananda) wrote :

Hmm,

One further question. Are you using mysql or sqlite? Sqlite doesn't really support concurrency well, so you could be getting db locking errors that are bubbling up to the messaging layer. If you are using sqlite, could you see if you get the same error with mysql?

On Sep 22, 2011, at 11:11 AM, David Kranz wrote:

> I'm kind of at a loss here. The ampqlib version is the same as you said.
> I used pip to uninstall kombu and upgrade to 1.3.5. That was the latest
> version this morning though the latest seems to be 1.4.0 right how.
> Anyway it made no difference. I am no expert on the mess that is the
> relationship between python package versions between pip and dpkg so it
> is possible nova was still using the old kombu after my upgrade. I don't
> think so because just invoking python from the command line showed the
> new version of kombu when I imported it.
>
> This does happen under load in the sense that I used the euca command to
> create 4 instances at once. Has any one else done something like this
> with Oneiric? On the one hand it is hard to see how I am having this
> problem if others are not, but I don't see what I could be doing wrong
> either given that everything works for instance creations where this
> does not happen.

Revision history for this message
David Kranz (david-kranz) wrote :

I am using mysql. In case it matters I am running nova-network on both nodes and also nova-api on both nodes as suggested by you here http://osdir.com/ml/openstack-cloud-computing/2011-09/msg00283.html

I did a fresh install using lucid and the same thing happens with one exception. On oneiric, the AMQP trace showing in the log is always associated with a vm never transitioning from pending to running. On lucid, I still sometimes see the error in the log but the VM does come up.

There must be some kind of race condition. Is there currently a test for nova that

1. Deploys a multinode configuration
2. Starts and terminates groups of vms for a long time, making sure they all end up running
3. Also fails if any error is written to a log file?

Also, I did the latest install using ppa:openstack-release/2011.3 but I sometimes see things like this in the log. I find these messages in bug tickets that seem like they were fixed for diablo so I am not sure why I see them:

DBError: exceptions must be old-style classes or derived from BaseException, not NoneType

Revision history for this message
Chris Behrens (cbehrens) wrote :

Hm. I haven't used the multi host network stuff, myself, so I can't comment much on the above right now.

There _is_ a bug filed for the DBError above. Unclear exactly what it is, but sqlalchemy is catching an exception, doing transaction cleanup, and then doing a re-raise via a bare raise. Somewhere in sqlalchemy's transaction cleanup, the original exception is getting eaten, so the bare raise does a raise of NoneType.

BTW, there is 1 case I can think of that could happen in the real world where the queue doesn't exist when doing a reply. This can happen if a service makes an rpc.call that results in a long-running transaction.... and the original service is restarted before the long-running transaction completes. I know this probably isn't what is happening in your case, but we probably should catch the AMQPChannelException for this case...
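A rough sketch of what that defensive handling might look like (this is not actual nova code: safe_msg_reply and LOG are hypothetical names, and it assumes amqplib exposes AMQPChannelException from client_0_8.exceptions, as the tracebacks above suggest):

from amqplib.client_0_8.exceptions import AMQPChannelException

def safe_msg_reply(conn, msg_id, msg):
    """Publish an rpc reply, tolerating a caller that has gone away."""
    try:
        conn.direct_send(msg_id, msg)  # the same call msg_reply() makes
    except AMQPChannelException, exc:
        if exc.args and exc.args[0] == 404:
            # The caller's reply queue/exchange no longer exists (e.g. the
            # calling service was restarted); log it and drop the reply.
            LOG.warn("reply exchange %s is gone, dropping reply", msg_id)
        else:
            raise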

Revision history for this message
Vish Ishaya (vishvananda) wrote :

def cursor(self, *args, **kwargs):
    try:
        c = self.connection.cursor(*args, **kwargs)
        return _CursorFairy(self, c)
    except Exception, e:
        self.invalidate(e=e)
        raise

I expect the bare raise is here (sqlalchemy/pool.py). Pretty sure self.invalidate is making socket connections, so the exception is getting eaten by eventlet. I'm trying to investigate if there is a way to make eventlet not clear exception info. This could be especially dangerous if the higher level code is trying to catch a specific type of exception. Might be worth it to try one of the following things:

1) Try to make eventlet stop eating exception info:

diff -r f30a2fa65f30 eventlet/hubs/hub.py
--- a/eventlet/hubs/hub.py Wed Jun 08 23:47:26 2011 -0700
+++ b/eventlet/hubs/hub.py Sat Sep 24 05:04:30 2011 -0700
@@ -173,7 +173,7 @@
                 cur.parent = self.greenlet
         except ValueError:
             pass # gets raised if there is a greenlet parent cycle
-        clear_sys_exc_info()
+        #clear_sys_exc_info()
         return self.greenlet.switch()

     def squelch_exception(self, fileno, exc_info):

or 2) modify the above code to reraise the exception

diff --git a/Library/Python/2.7/site-packages/sqlalchemy/pool.py b/pool.py
index 632d955..21e7055 100644
--- a/Library/Python/2.7/site-packages/sqlalchemy/pool.py
+++ b/pool.py
@@ -383,8 +383,10 @@ class _ConnectionFairy(object):
             c = self.connection.cursor(*args, **kwargs)
             return _CursorFairy(self, c)
         except Exception, e:
+            import sys
+            exc = sys.exc_info()
             self.invalidate(e=e)
-            raise
+            raise exc

     def __getattr__(self, key):
         return getattr(self.connection, key)

I will do some more investigation into this when I can. We need a clear repro case for this.

Revision history for this message
David Kranz (david-kranz) wrote :

I created a small script that calls euca-run-instances four times, creating one instance each time, and then waits for them all to be running. It then terminates them and tries again. I started running this against a single nova node running all services and after a few times through got this error in nova-network:

 ResourceClosedError: This result object does not return rows. It has been closed automatically.

I see that there is a recent bug 857714 about that error. At this point I am forced to conclude that either I have no idea what I am doing (I am pretty new to Nova), or there are some serious issues with Diablo. I think what I am doing is the most elementary thing one could do as part of testing Nova, so I don't know what to make of this. Has this kind of rudimentary stress test been successfully run by anyone?
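For reference, a minimal sketch of the kind of loop described above (the image id is a placeholder and the euca2ools flags and output parsing are from memory, so treat it as illustrative):

import subprocess
import time

IMAGE = "ami-00000001"  # placeholder image id

def euca(cmd):
    # run a euca2ools command and return its output
    return subprocess.check_output(cmd.split())

while True:
    # start four instances, one call each
    for _ in range(4):
        euca("euca-run-instances %s -t m1.small" % IMAGE)
    # wait until nothing is left in the 'pending' state
    while "pending" in euca("euca-describe-instances"):
        time.sleep(5)
    # terminate everything and go around again
    ids = [line.split()[1] for line in
           euca("euca-describe-instances").splitlines()
           if line.startswith("INSTANCE")]
    if ids:
        euca("euca-terminate-instances " + " ".join(ids))
    time.sleep(30)  # give the fixed IPs time to be released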

Revision history for this message
Chris Behrens (cbehrens) wrote :

Give this a try now. This was marked a duplicate of lp:838581 and a fix was merged earlier.

Revision history for this message
David Kranz (david-kranz) wrote :

I will, but I have been installing from oneiric packages. How do I know when an updated package is available?

Revision history for this message
David Kranz (david-kranz) wrote :

I built and installed packages from 2012.1~e1~20110909.1546-0ubuntu1 and now I just get these errors about some bad SQL syntax:

2011-09-30 13:36:59,829 ERROR nova.rpc [-] Exception during message handling
(nova.rpc): TRACE: Traceback (most recent call last):
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 620, in _process_data
(nova.rpc): TRACE: rval = node_func(context=ctxt, **node_args)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 317, in get_floating_ips_by_fixed_address
(nova.rpc): TRACE: fixed_address)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/nova/db/api.py", line 295, in floating_ip_get_by_fixed_address
(nova.rpc): TRACE: return IMPL.floating_ip_get_by_fixed_address(context, fixed_address)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 120, in wrapper
(nova.rpc): TRACE: return f(*args, **kwargs)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 668, in floating_ip_get_by_fixed_address
(nova.rpc): TRACE: fixed_ip = fixed_ip_get_by_address(context, fixed_address, session)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 120, in wrapper
(nova.rpc): TRACE: return f(*args, **kwargs)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 838, in fixed_ip_get_by_address
(nova.rpc): TRACE: options(joinedload('instance')).\
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 1652, in first
(nova.rpc): TRACE: ret = list(self[0:1])
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 1560, in __getitem__
(nova.rpc): TRACE: return list(res)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 1721, in __iter__
(nova.rpc): TRACE: return self._execute_and_instances(context)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 1726, in _execute_and_instances
(nova.rpc): TRACE: mapper=self._mapper_zero_or_none())
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 724, in execute
(nova.rpc): TRACE: clause, params or {})
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1191, in execute
(nova.rpc): TRACE: params)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1271, in _execute_clauseelement
(nova.rpc): TRACE: return self.__execute_context(context)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1302, in __execute_context
(nova.rpc): TRACE: context.parameters[0], context=context)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1401, in _cursor_execute
(nova.rpc): TRACE: context)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1394, in _cursor_execute
(nova.rpc): TRACE: context)
(nova.rpc): TRACE: File ...


Revision history for this message
David Kranz (david-kranz) wrote :

I tested again using the devstack install to create two nodes (after verifying that the fix for bug 838581 is in session.py) and still see the problem, but less frequently. It is not quite the same because this is running from source rather than packages, and using 'nova boot' instead of euca-run-instances. I don't know how to get euca-run-instances to work when using keystone to manage users. 'nova boot' doesn't seem to allow multiple instances to be created in one call but I changed the min_count default in the source to 8 for this test.

diff --git a/novaclient/v1_1/shell.py b/novaclient/v1_1/shell.py
index 6b83292..21ee0c9 100644
--- a/novaclient/v1_1/shell.py
+++ b/novaclient/v1_1/shell.py
@@ -33,7 +33,7 @@ AUTO_KEY = object()
 def _boot(cs, args, reservation_id=None, min_count=None, max_count=None):
     """Boot a new server."""
     if min_count is None:
-        min_count = 1
+        min_count = 8
     if max_count is None:
         max_count = min_count
     if min_count > max_count:

This bug really needs to be reopened but I am not sure how to do that.

Revision history for this message
David Kranz (david-kranz) wrote :

My nova.conf for the last test:

--logdir=/var/log/nova
--verbose
--nodaemon
--scheduler_driver=nova.scheduler.simple.SimpleScheduler
--dhcpbridge_flagfile=/opt/stack/nova/bin/nova.conf
--network_manager=nova.network.manager.FlatDHCPManager
--my_ip=172.18.0.141
--public_interface=eth0
--vlan_interface=eth0
--sql_connection=mysql://root:mysql@172.18.0.141/nova
--libvirt_type=kvm
--osapi_extensions_path=/opt/stack/openstackx/extensions
--vncproxy_url=http://172.18.0.141:6080
--vncproxy_wwwroot=/opt/stack/noVNC/
--api_paste_config=/opt/stack/keystone/examples/paste/nova-api-paste.ini
--image_service=nova.image.glance.GlanceImageService
--ec2_dmz_host=172.18.0.141
--rabbit_host=172.18.0.141
--rabbit_password=rabbit
--glance_api_servers=172.18.0.141:9292
--flat_network_bridge=br100
--flat_interface=eth1
--multi_host=1

Revision history for this message
Scott Moser (smoser) wrote :

Per David's comments, I've removed the 'duplicate' flag for this bug.

Revision history for this message
Scott Moser (smoser) wrote :

David,
  I talked some with Vish in IRC today [1]. It seems like we're in need of some more data to diagnose. I was unable to reproduce this issue in a little "all in one" nova.sh install that I did, and Vish hasn't seen anything like it.
  We're assuming that the bug history and traces pasted represent different failures.

  Since you're able to reproduce with devstack, could you set up one on Ubuntu (either Natty or Oneiric), recreate, and report:
 * git commit tip
 * nova.conf (as you have above)
 * traces that you see in the logs when the issue occurs

 I think you've addressed it, but make sure you're not just running out of IP addresses.

  Thanks in advance.

[1] http://eavesdrop.openstack.org/irclogs/%23openstack-dev/%23openstack-dev.2011-10-19.log

Revision history for this message
David Kranz (david-kranz) wrote :

OK. I just did it again. I am not sure what "git commit tip" means. The running out of addresses was a one-time issue I had before understanding that the leased IP addresses for the vms are not returned immediately when the vms are terminated.
I am assuming that this kind of error and stacktrace in the log would be considered a bug even if it does not cause any obvious misbehavior immediately.

 I used the current version of devstack but with a modified stack.sh from last week. I modified it to log to files in /var/log/nova instead of stdout, so you have to create that directory before running. I used one compute node running api, network, and compute, and one controller node running everything. After running the modified stack.sh I exported the shell variables in exercise.sh, modified the min_count default in the nova client to create 10 vms, and did

nova boot --flavor 1 --image 3 testvms

In this case the vms seem to have started, but there is the AMQP error stacktrace in the log. When this happens, sometimes all the vms come up and sometimes not. I have attached:

1. nova-network log with stacktrace
2. modified stack.sh
3. localrc for both controller and compute
4. nova.conf (even though it is generated)

(nova.rpc): TRACE: Traceback (most recent call last):
(nova.rpc): TRACE: File "/opt/stack/nova/nova/rpc/impl_kombu.py", line 628, in _process_data
(nova.rpc): TRACE: ctxt.reply(None, None)
(nova.rpc): TRACE: File "/opt/stack/nova/nova/rpc/impl_kombu.py", line 673, in reply
(nova.rpc): TRACE: msg_reply(self.msg_id, *args, **kwargs)
(nova.rpc): TRACE: File "/opt/stack/nova/nova/rpc/impl_kombu.py", line 781, in msg_reply
(nova.rpc): TRACE: conn.direct_send(msg_id, msg)
(nova.rpc): TRACE: File "/opt/stack/nova/nova/rpc/impl_kombu.py", line 562, in __exit__
(nova.rpc): TRACE: self._done()
(nova.rpc): TRACE: File "/opt/stack/nova/nova/rpc/impl_kombu.py", line 547, in _done
(nova.rpc): TRACE: self.connection.reset()
(nova.rpc): TRACE: File "/opt/stack/nova/nova/rpc/impl_kombu.py", line 382, in reset
(nova.rpc): TRACE: self.channel.close()
(nova.rpc): TRACE: File "/usr/local/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 201, in close
(nova.rpc): TRACE: super(Channel, self).close()
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py", line 194, in close
(nova.rpc): TRACE: (20, 41), # Channel.close_ok
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/abstract_channel.py", line 97, in wait
(nova.rpc): TRACE: return self.dispatch_method(method_sig, args, content)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/abstract_channel.py", line 115, in dispatch_method
(nova.rpc): TRACE: return amqp_method(self, args)
(nova.rpc): TRACE: File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py", line 273, in _close
(nova.rpc): TRACE: (class_id, method_id))
(nova.rpc): TRACE: AMQPChannelException: (404, u"NOT_FOUND - no exchange '5b9a19d1b5e849d7ba749fada648a375' in vhost '/'", (60, 40), 'Channel.basic_publish')
(nova.rpc): TRACE:
2011-10-20 10:45:24,724 ERROR nova.rpc [-] Returning except...


Revision history for this message
David Kranz (david-kranz) wrote :

One more thing. Since this seems to be some kind of race condition issue, it might be relevant that I was using two real Ubuntu 11.10 machines with the OS reinstalled prior to running this experiment. I left these machines as is so if there is any other state of the machines that is of interest I can provide it.

Revision history for this message
Mark McLoughlin (markmc) wrote :

By the way, I saw a similar exception today when disassociating a floating IP with diablo https://bugzilla.redhat.com/747568

Revision history for this message
Vish Ishaya (vishvananda) wrote :

Thank you for the extensive report. This should help us debug. If it is easy for you to reproduce I'm very curious if you get the same exception using the carrot driver instead of kombu.

--rpc_backend=nova.rpc.impl_carrot

Revision history for this message
David Kranz (david-kranz) wrote :

I tried this before and didn't see the problem, as you can see in comment #6. However, given the randomness of when this bug does and does not happen, I can't be sure it is definitive. I can try it again tomorrow.

Thierry Carrez (ttx)
Changed in nova:
status: New → Incomplete
Revision history for this message
David Kranz (david-kranz) wrote :

I tried creating 10 vms 5 times with carrot and all was well. Back to kombu, it failed the second time.

Revision history for this message
Ask Solem (asksol) wrote :

Hey,

Are you using eventlet with both kombu and carrot?

The difference then would be that Kombu synchronizes *_declare calls by delegating them to another greenthread
and having the current thread wait for that thread to finish (kombu.syn.blocking).
So in a sense it *should* be safer than carrot; why it wouldn't be I'm not sure.
Also, are you declaring entities both when publishing the request *and* when publishing the reply?

Revision history for this message
Chris Behrens (cbehrens) wrote :

Ask: Hm. I implemented the code in nova that uses kombu. There's probably something funny about what I did, because I probably didn't use what one would call the 'common interfaces'. :) No offense, but the default interfaces didn't provide great enough flexibility.

Here's a summary of how things work:

To create a consumer, I'm essentially doing this:

exchange = entity.Exchange()
queue = entity.Queue(...., exchange=exchange)
queue.declare()
queue.consume() ***

*** In some cases we declare multiple queues on the same channel. But, what I don't do is use messaging.Consumer(). What did not appear to be supported in kombu was the ability to specify a different callback for each of those queues if I were to pass them all to the same Consumer(). If I did that, Consumer.consume() would always call the same callback (the one provided to Consumer()). I think carrot actually supported that, in a way, via ConsumerSet(). But anyway... point is here.. I'm not using Consumer() at all. To consume, I'm essentially calling queue.consume() similar to what Consumer.consume() does... except each Queue has its own callback. (I keep a list of the queues that are declared on a channel and loop through them). I can't find anything wrong with this... as I'm essentially just doing the same work that Consumer() does.
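To make that concrete, an illustrative sketch of the per-queue-callback approach described above (hypothetical names, not the actual nova wrapper classes):

class MultiCallbackConsumer(object):
    """Consume from several queues on one channel, each with its own callback."""

    def __init__(self, channel):
        self.channel = channel
        self.queues = []  # list of (queue, callback) pairs

    def declare_consumer(self, queue, callback):
        # each queue keeps its own callback, which a single kombu
        # Consumer() covering all queues would not allow
        queue.declare()
        self.queues.append((queue, callback))

    def consume(self):
        # roughly what Consumer.consume() would do, but per queue
        for queue, callback in self.queues:
            queue.consume(callback=callback)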

For publishing:

exchange = entity.Exchange()
producer = entity.Producer(..., exchange=exchange)
producer.publish()

There was also some sort of issue I ran into with the reconnecting code within kombu, so all of this stuff is wrapped by our own classes that handle re-creating the appropriate kombu instances.

Now.. back to the bug. If you're not familiar with nova RPC... what's happening in this bug report is:

1) generate a unique uuid to use as exchange and queue names. call queue.declare() which declares the exchange and the queue and binds them. This queue will be used to get a response from remote worker.
2) Using the same connection, publish a message into some other queue for the remote service to pick up. Put the unique uuid in the message so it knows what exchange/queue to respond to.
3) Using the same connection, start consuming on the previously declared queue
4) the remote worker does some work... and tries to respond into the queue we created in #1. The above Exception is reported... sometimes and apparently very rarely for most people. I've never seen the above exception at all... nor have any of the people I work with closely. However, David seems to see it quite easily.

#4 can actually happen before #3... which is fine. The queue was declared in #1, so it certainly should exist, should it not?

Any other ideas?

Revision history for this message
Viacheslav Biriukov (v-v-biriukov) wrote :

Have the same issue when using live migration. Changing rpc to the "carrot" driver fixes this.

Revision history for this message
Vish Ishaya (vishvananda) wrote :

I've been trying very hard to reproduce this bug. On my test system with oneiric (which has kombu 1.4.3), I cannot get it to fail. I can run 6 instances simultaneously with euca (works on current devstack) and change max_count to 6 and do a nova boot successfully. I've also tested with current trunk and I don't get any issue there either.

Revision history for this message
David Kranz (david-kranz) wrote :

OK, I will try it again but will be on a trip next week. Unfortunately the fix for the db pool bug was removed from ubuntu6.2 when it was released so I cannot easily do this at the moment because our automated testing is based on packages. This is another reason why it would be good for the automated build process to produce ppas for both trunk and diablo stable.

Revision history for this message
Vish Ishaya (vishvananda) wrote : Re: [Bug 855030] Re: Encountering sporadic AMQPChannelException

Is it possible to use the oneiric packages? They are roughly based on diablo stable.

On Nov 4, 2011, at 6:24 AM, David Kranz wrote:

> OK, I will try it again but will be on a trip next week. Unfortunately
> the fix for the db pool bug was removed from ubuntu6.2 when it was
> released so I cannot easily do this at the moment because our automated
> testing is based on packages. This is another reason why it would be
> good for the automated build process to produce ppas for both trunk and
> diablo stable.

Revision history for this message
David Kranz (david-kranz) wrote :

As far as I can see, the current oneiric packages are based on a security update that does not have any of the diablo fixes. Please correct me if I am wrong. In any event we just built our own packages based on diablo/stable and I am installing now.

Revision history for this message
David Kranz (david-kranz) wrote :

OK, I tried this with packages generated from diablo stable and do not see the problem any more. I think this ticket could be closed for now. It would be a lot of work to figure out exactly which source change made the difference and there are more important things to do. I am working on a more general torture test that will trigger this if it still exists.

Revision history for this message
David Kranz (david-kranz) wrote :

I am sorry but I was a dork and forgot to remove the carrot directive from nova.conf. When I remove that, the same problem easily happens. I really don't know what to suggest at this point, but this is a real bug that is still there. I was again using two nodes, both running compute, with euca-run-instances creating 16 at a time.

Revision history for this message
Vish Ishaya (vishvananda) wrote :

Ok good to know. Perhaps this only occurs on multiple machines. Are you running just one nova-network?

If I can reproduce this I'm going to turn on amqplib debug logging and see if I can track down what is happening.

Vish

On Nov 4, 2011, at 11:04 AM, David Kranz wrote:

> As far as I can see, the current oneiric packages are based on a
> security update that does not have any of the diablo fixes. Please
> correct me if I am wrong. In any event we just built our own packages
> based on diable/stable and I am installing now.

Revision history for this message
David Kranz (david-kranz) wrote :

I am running two nodes, both running network, api, and compute as documented for ha mode.

Revision history for this message
Vish Ishaya (vishvananda) wrote :

OK, I have seen a few error messages, although I'm having a hard time getting it to fail consistently. Turning on logging seems to make the problem go away, so I suspect a tricky race condition. Here is a log of 10 instances going with no failures.

Revision history for this message
Vish Ishaya (vishvananda) wrote :

It appears that carrot works a little bit differently, opening multiple channels as needed. (greenthread ids are logged as well).

Revision history for this message
Vish Ishaya (vishvananda) wrote :

Here is the kombu version with greenthread ids logged (note that it always uses channel 1)

Revision history for this message
Chris Behrens (cbehrens) wrote :

David:

Taking a shot in the dark here... but could you try applying the following patch to impl_kombu.py:

http://paste.openstack.org/show/3207/

And see if the problem goes away?

Revision history for this message
David Kranz (david-kranz) wrote :

I could do this but am out of the office until Monday. Since Vish can now reproduce this he could give it a try. I will do it on Monday if nothing else has happened.

Revision history for this message
Vish Ishaya (vishvananda) wrote : Re: [Bug 855030] Encountering sporadic AMQPChannelException

I will try it, although my errors are extremely inconsistent. I've managed to start and stop 20 vms regularly without errors, so I'm not sure how many I would have to do to consider it fixed.

Vish

On Nov 9, 2011, at 6:17 AM, David Kranz wrote:

> I could do this but am out of the office until Monday. Since Vish can
> now reproduce this he could give it a try. I will do it on Monday if
> nothing else ha happened.

Revision history for this message
David Kranz (david-kranz) wrote :

OK, I tried this patch and it made no difference. Note that I found it even easier to see this problem by starting 16 vms with euca-run-instances. BTW, though it probably does not matter, the vm image I am using is

https://uec-images.ubuntu.com/server/oneiric/current/oneiric-server-cloudimg-amd64.tar.gz

as m1.small

Revision history for this message
Michael Hempel (otaku) wrote :

Hi, this issue happened to me too, when I added my first compute node to a devstack install. There was no way to spin up a vm, with random disconnects/errors from rabbit and glance. It turned out I had just exceeded the quota for RAM set in horizon; perhaps some meaningful error message should be created ;-)

Changed in nova:
status: Incomplete → Confirmed
importance: Undecided → Critical
assignee: nobody → Vish Ishaya (vishvananda)
milestone: none → essex-2
Revision history for this message
Vish Ishaya (vishvananda) wrote :

We finally tracked down the cause of this. Complicated issue regarding the way that we return from rpc calls. If an rpc call was returning None, then it was closing the connection early on the receiving end. This is because we use None to signify the final return in a series of returned values via multicall.

Comstud is working on a fix for multicall so that None is a legal return. In the meantime I found the method that was erroneously returning None and will be submitting a merge (and backport request) right away.

If you want to try out the fix right away, it is pretty simple:

diff --git a/nova/network/manager.py b/nova/network/manager.py
index 96d6dee..955cca0 100644
--- a/nova/network/manager.py
+++ b/nova/network/manager.py
@@ -180,7 +180,7 @@ class RPCAllocateFixedIP(object):
         perform network lookup on the far side of rpc.
         """
         network = self.db.network_get(context, network_id)
-        self.allocate_fixed_ip(context, instance_id, network, **kwargs)
+        return self.allocate_fixed_ip(context, instance_id, network, **kwargs)
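To see why a None return value triggers the AMQP error in the first place, here is an illustrative sketch of the reply protocol described above (not the actual impl_kombu.py code):

# worker side: send each return value, then an end-of-stream marker
def worker_reply(ctxt, rval):
    ctxt.reply(rval, None)   # the return value (None, in the buggy method)
    ctxt.reply(None, None)   # the end-of-stream marker

# caller side: a None payload is treated as "no more results"
def caller_wait(reply_iterator):
    results = []
    for data in reply_iterator:
        if data is None:
            # End of stream: tear down the per-call reply exchange/queue.
            # If the method legitimately returned None, this teardown happens
            # one message too early, and the worker's next publish hits an
            # exchange that no longer exists -- the 404 NOT_FOUND above.
            break
        results.append(data)
    return results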

Revision history for this message
Openstack Gerrit (openstack-gerrit) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/1929
Committed: http://github.com/openstack/nova/commit/cabdc3b1f64b7f022a1c62a4cebce54a2deba807
Submitter: Jenkins
Branch: master

 status fixcommitted
 done

commit cabdc3b1f64b7f022a1c62a4cebce54a2deba807
Author: Vishvananda Ishaya <email address hidden>
Date: Mon Nov 28 14:42:38 2011 -0800

    Makes rpc_allocate_fixed_ip return properly

     * Fixes bug 855030
     * Includes test

    Change-Id: If5b874fb0e4abd567445e67141d61942866cc5ec

Changed in nova:
status: Confirmed → Fix Committed
Revision history for this message
Openstack Gerrit (openstack-gerrit) wrote : Fix merged to nova (stable/diablo)

Reviewed: https://review.openstack.org/1930
Committed: http://github.com/openstack/nova/commit/1e3b88ba20b9f9ab925ecd1238845adbc0157cb1
Submitter: Jenkins
Branch: stable/diablo

 tag in-stable-diablo
 done

commit 1e3b88ba20b9f9ab925ecd1238845adbc0157cb1
Author: Vishvananda Ishaya <email address hidden>
Date: Mon Nov 28 14:42:38 2011 -0800

    Makes rpc_allocate_fixed_ip return properly

     * Fixes bug 855030
     * Includes test

    Change-Id: If5b874fb0e4abd567445e67141d61942866cc5ec

Scott Moser (smoser)
Changed in nova (Ubuntu Oneiric):
importance: Undecided → High
status: New → Triaged
Changed in nova (Ubuntu Precise):
status: New → Triaged
importance: Undecided → High
Revision history for this message
Chuck Short (zulcss) wrote :

This is fixed in precise.

Chuck Short (zulcss)
Changed in nova (Ubuntu Precise):
status: Triaged → Fix Committed
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Revision history for this message
Tom Ellis (tellis) wrote :

Could this be pushed to oneiric too, Chuck?

Thierry Carrez (ttx)
Changed in nova:
milestone: essex-2 → 2012.1
Revision history for this message
Rolf Leggewie (r0lf) wrote :

oneiric has seen the end of its life and is no longer receiving any updates. Marking the oneiric task for this ticket as "Won't Fix".

Changed in nova (Ubuntu Oneiric):
status: Triaged → Won't Fix