nova-compute: can't spawn new instances after restarting RabbitMQ

Bug #718869 reported by Christian Berendt
22
This bug affects 3 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
High
Sandy Walsh

Bug Description

I restarted rabbitmq server and afterwards several services are not reconnecting to the server (nova-compute, nova-api).

nova-scheduler reconnects successfully:
2011-02-14 16:54:46,144 ERROR nova.rpc [-] Reconnected to queue

but other services (I checked nova-compute and nova-api) aren't reconnecting.

i tried spawning a new instance without success (only getting an unknown error..):
# euca-run-instances ami-3x776h8w -t m1.tiny
UnknownError: An unknown error has occurred. Please try your request again.

nova-api, nova-scheduler are not reconnecting:
2011-02-14 16:54:40,786 ERROR nova.rpc [-] Failed to fetch message from queue
(nova.rpc): TRACE: Traceback (most recent call last):
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/nova/rpc.py", line 117, in fetch
(nova.rpc): TRACE: super(Consumer, self).fetch(no_ack, auto_ack, enable_callbacks)
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/carrot/messaging.py", line 307, in fetch
(nova.rpc): TRACE: message = self.backend.get(self.queue, no_ack=no_ack)
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/carrot/backends/pyamqplib.py", line 277, in get
(nova.rpc): TRACE: raw_message = self.channel.basic_get(queue, no_ack=no_ack)
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/channel.py", line 2032, in basic_get
(nova.rpc): TRACE: (60, 72), # Channel.basic_get_empty
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/abstract_channel.py", line 89, in wait
(nova.rpc): TRACE: self.channel_id, allowed_methods)
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/connection.py", line 218, in _wait_method
(nova.rpc): TRACE: self.wait()
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/abstract_channel.py", line 105, in wait
(nova.rpc): TRACE: return amqp_method(self, args)
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/connection.py", line 365, in _close
(nova.rpc): TRACE: self._x_close_ok()
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/connection.py", line 384, in _x_close_ok
(nova.rpc): TRACE: self._send_method((10, 61))
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/abstract_channel.py", line 70, in _send_method
(nova.rpc): TRACE: method_sig, args, content)
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/method_framing.py", line 233, in write_method
(nova.rpc): TRACE: self.dest.write_frame(1, channel, payload)
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/transport.py", line 125, in write_frame
(nova.rpc): TRACE: frame_type, channel, size, payload, 0xce))
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/eventlet/greenio.py", line 273, in sendall
(nova.rpc): TRACE: tail = self.send(data, flags)
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/eventlet/greenio.py", line 259, in send
(nova.rpc): TRACE: total_sent += fd.send(data[total_sent:], flags)
(nova.rpc): TRACE: error: [Errno 32] Broken pipe
(nova.rpc): TRACE:

Related branches

Revision history for this message
Christian Berendt (berendt) wrote :

after restarting nova-compute and nova-api i can spawn new instances, but i think they should reconnect to the queue automatically...

# euca-run-instances ami-3x776h8w -t m1.tiny
RESERVATION r-7x789omn testing default
INSTANCE i-0000000d ami-3x776h8w scheduling None (testing, None) 2011-02-14 17:03:07 None None

Revision history for this message
Christian Berendt (berendt) wrote :

or not.. now i have this error in /var/log/nova/scheduler.log...

2011-02-14 17:05:16,922 ERROR nova.root [-] Exception during message handling
(nova.root): TRACE: Traceback (most recent call last):
(nova.root): TRACE: File "/usr/lib64/python2.6/site-packages/nova/rpc.py", line 192, in receive
(nova.root): TRACE: rval = node_func(context=ctxt, **node_args)
(nova.root): TRACE: File "/usr/lib64/python2.6/site-packages/nova/scheduler/manager.py", line 68, in _schedule
(nova.root): TRACE: "args": kwargs})
(nova.root): TRACE: File "/usr/lib64/python2.6/site-packages/nova/rpc.py", line 350, in cast
(nova.root): TRACE: publisher = TopicPublisher(connection=conn, topic=topic)
(nova.root): TRACE: File "/usr/lib64/python2.6/site-packages/nova/rpc.py", line 210, in __init__
(nova.root): TRACE: super(TopicPublisher, self).__init__(connection=connection)
(nova.root): TRACE: File "/usr/lib64/python2.6/site-packages/carrot/messaging.py", line 657, in __init__
(nova.root): TRACE: self.declare()
(nova.root): TRACE: File "/usr/lib64/python2.6/site-packages/carrot/messaging.py", line 669, in declare
(nova.root): TRACE: auto_delete=self.auto_delete)
(nova.root): TRACE: File "/usr/lib64/python2.6/site-packages/carrot/backends/pyamqplib.py", line 253, in exchange_declare
(nova.root): TRACE: return self.channel.exchange_declare(exchange=exchange,
(nova.root): TRACE: File "/usr/lib64/python2.6/site-packages/carrot/backends/pyamqplib.py", line 188, in channel
(nova.root): TRACE: self._channel_ref = weakref.ref(connection.channel())
(nova.root): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/connection.py", line 230, in channel
(nova.root): TRACE: return Channel(self, channel_id)
(nova.root): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/channel.py", line 82, in __init__
(nova.root): TRACE: self._x_open()
(nova.root): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/channel.py", line 471, in _x_open
(nova.root): TRACE: (20, 11), # Channel.open_ok
(nova.root): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/abstract_channel.py", line 89, in wait
(nova.root): TRACE: self.channel_id, allowed_methods)
(nova.root): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/connection.py", line 198, in _wait_method
(nova.root): TRACE: self.method_reader.read_method()
(nova.root): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/method_framing.py", line 215, in read_method
(nova.root): TRACE: raise m
(nova.root): TRACE: RuntimeError: Second simultaneous read on fileno 10 detected. Unless you really know what you're doing, make sure that only one greenthread can read any particular socket. Consider using a pools.Pool. If you do know what you're doing and want to disable this error, call eventlet.debug.hub_multiple_reader_prevention(False)
(nova.root): TRACE:

Revision history for this message
Thierry Carrez (ttx) wrote :

This was supposed to be fixed in bug 661472, apparently it's not really retrying ?

Changed in nova:
importance: Undecided → High
status: New → Confirmed
Revision history for this message
Christian Berendt (berendt) wrote :
Download full text (9.1 KiB)

tried it again.. restarted rabbitmq-server..

tried to spawn 2 new instances a few seconds after the start of rabbitmq-server:

---snip---
ares:~ # euca-describe-instances
RESERVATION r-n6jwk5yj testing default
INSTANCE i-00000107 ami-2a1izx00 scheduling foobar (testing, None) 0 m1.tiny 2011-02-23T13:52:56Z unknown zone
INSTANCE i-00000108 ami-2a1izx00 scheduling foobar (testing, None) 1 m1.tiny 2011-02-23T13:52:56Z unknown zone
---snap---

---snip---
(nova.rpc): TRACE: Traceback (most recent call last):
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/nova/rpc.py", line 120, in fetch
(nova.rpc): TRACE: super(Consumer, self).fetch(no_ack, auto_ack, enable_callbacks)
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/carrot/messaging.py", line 307, in fetch
(nova.rpc): TRACE: message = self.backend.get(self.queue, no_ack=no_ack)
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/carrot/backends/pyamqplib.py", line 277, in get
(nova.rpc): TRACE: raw_message = self.channel.basic_get(queue, no_ack=no_ack)
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/channel.py", line 2032, in basic_get
(nova.rpc): TRACE: (60, 72), # Channel.basic_get_empty
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/abstract_channel.py", line 89, in wait
(nova.rpc): TRACE: self.channel_id, allowed_methods)
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/connection.py", line 218, in _wait_method
(nova.rpc): TRACE: self.wait()
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/abstract_channel.py", line 105, in wait
(nova.rpc): TRACE: return amqp_method(self, args)
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/connection.py", line 367, in _close
(nova.rpc): TRACE: raise AMQPConnectionException(reply_code, reply_text, (class_id, method_id))
(nova.rpc): TRACE: AMQPConnectionException: (320, u"CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'", (0, 0), '')
(nova.rpc): TRACE:
2011-02-23 14:52:46,720 ERROR nova.rpc [-] Failed to fetch message from queue
(nova.rpc): TRACE: Traceback (most recent call last):
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/nova/rpc.py", line 120, in fetch
(nova.rpc): TRACE: super(Consumer, self).fetch(no_ack, auto_ack, enable_callbacks)
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/carrot/messaging.py", line 307, in fetch
(nova.rpc): TRACE: message = self.backend.get(self.queue, no_ack=no_ack)
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/carrot/backends/pyamqplib.py", line 277, in get
(nova.rpc): TRACE: raw_message = self.channel.basic_get(queue, no_ack=no_ack)
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/channel.py", line 2032, in basic_get
(nova.rpc): TRACE: (60, 72), # Channel.basic_get_empty
(nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/abstract_channel.py", line 89, in wait
(nova.rpc): TRACE: self.channel_id, allowed_methods)
(nova.rpc)...

Read more...

Revision history for this message
Vish Ishaya (vishvananda) wrote : Re: [Bug 718869] Re: nova-compute: can't spawn new instances after restarting RabbitMQ
Download full text (13.1 KiB)

I've also noticed that restarting rabbit breaks things. Apparently our fix was not complete, or it broke again with soren's threadpool addition.

Vish

On Feb 23, 2011, at 6:05 AM, Christian Berendt wrote:

> tried it again.. restarted rabbitmq-server..
>
> tried to spawn 2 new instances a few seconds after the start of
> rabbitmq-server:
>
> ---snip---
> ares:~ # euca-describe-instances
> RESERVATION r-n6jwk5yj testing default
> INSTANCE i-00000107 ami-2a1izx00 scheduling foobar (testing, None) 0 m1.tiny 2011-02-23T13:52:56Z unknown zone
> INSTANCE i-00000108 ami-2a1izx00 scheduling foobar (testing, None) 1 m1.tiny 2011-02-23T13:52:56Z unknown zone
> ---snap---
>
> ---snip---
> (nova.rpc): TRACE: Traceback (most recent call last):
> (nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/nova/rpc.py", line 120, in fetch
> (nova.rpc): TRACE: super(Consumer, self).fetch(no_ack, auto_ack, enable_callbacks)
> (nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/carrot/messaging.py", line 307, in fetch
> (nova.rpc): TRACE: message = self.backend.get(self.queue, no_ack=no_ack)
> (nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/carrot/backends/pyamqplib.py", line 277, in get
> (nova.rpc): TRACE: raw_message = self.channel.basic_get(queue, no_ack=no_ack)
> (nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/channel.py", line 2032, in basic_get
> (nova.rpc): TRACE: (60, 72), # Channel.basic_get_empty
> (nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/abstract_channel.py", line 89, in wait
> (nova.rpc): TRACE: self.channel_id, allowed_methods)
> (nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/connection.py", line 218, in _wait_method
> (nova.rpc): TRACE: self.wait()
> (nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/abstract_channel.py", line 105, in wait
> (nova.rpc): TRACE: return amqp_method(self, args)
> (nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_0_8/connection.py", line 367, in _close
> (nova.rpc): TRACE: raise AMQPConnectionException(reply_code, reply_text, (class_id, method_id))
> (nova.rpc): TRACE: AMQPConnectionException: (320, u"CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'", (0, 0), '')
> (nova.rpc): TRACE:
> 2011-02-23 14:52:46,720 ERROR nova.rpc [-] Failed to fetch message from queue
> (nova.rpc): TRACE: Traceback (most recent call last):
> (nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/nova/rpc.py", line 120, in fetch
> (nova.rpc): TRACE: super(Consumer, self).fetch(no_ack, auto_ack, enable_callbacks)
> (nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/carrot/messaging.py", line 307, in fetch
> (nova.rpc): TRACE: message = self.backend.get(self.queue, no_ack=no_ack)
> (nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/carrot/backends/pyamqplib.py", line 277, in get
> (nova.rpc): TRACE: raw_message = self.channel.basic_get(queue, no_ack=no_ack)
> (nova.rpc): TRACE: File "/usr/lib64/python2.6/site-packages/amqplib/client_...

Changed in nova:
assignee: nobody → Sandy Walsh (sandy-walsh)
Revision history for this message
Sandy Walsh (sandy-walsh) wrote :

Yeah, I'm seeing this as well: http://paste.openstack.org/show/1033/

On it ...

Thierry Carrez (ttx)
Changed in nova:
status: Confirmed → Fix Committed
Revision history for this message
Sandy Walsh (sandy-walsh) wrote :

Seems my previous fix will result in excessive logging on a failure. Will put another branch in against it.

Changed in nova:
status: Fix Committed → Incomplete
Thierry Carrez (ttx)
Changed in nova:
status: Incomplete → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → 2011.2
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.