rabbitmq-server restarted twice, log grows rapidly until service restart

Bug #1504725 reported by Windy Wang
This bug affects 16 people
Affects          Status       Importance   Assigned to   Milestone
Mistral          Incomplete   Medium       Unassigned
neutron          Expired      Undecided    Unassigned
oslo.messaging   Won't Fix    Undecided    Unassigned

Bug Description

After I restart the rabbitmq-server a second time, the service logs (nova, neutron, and so on) grow rapidly with errors such as "TypeError: 'NoneType' object has no attribute '__getitem__'".
It seems that the channel has been set to None.

trace log:

2015-10-10 15:20:59.413 29515 TRACE root Traceback (most recent call last):
2015-10-10 15:20:59.413 29515 TRACE root File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 95, in inner_func
2015-10-10 15:20:59.413 29515 TRACE root return infunc(*args, **kwargs)
2015-10-10 15:20:59.413 29515 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 96, in _executor_thread
2015-10-10 15:20:59.413 29515 TRACE root incoming = self.listener.poll()
2015-10-10 15:20:59.413 29515 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 122, in poll
2015-10-10 15:20:59.413 29515 TRACE root self.conn.consume(limit=1, timeout=timeout)
2015-10-10 15:20:59.413 29515 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1202, in consume
2015-10-10 15:20:59.413 29515 TRACE root six.next(it)
2015-10-10 15:20:59.413 29515 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1100, in iterconsume
2015-10-10 15:20:59.413 29515 TRACE root error_callback=_error_callback)
2015-10-10 15:20:59.413 29515 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 868, in ensure
2015-10-10 15:20:59.413 29515 TRACE root ret, channel = autoretry_method()
2015-10-10 15:20:59.413 29515 TRACE root File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 458, in _ensured
2015-10-10 15:20:59.413 29515 TRACE root return fun(*args, **kwargs)
2015-10-10 15:20:59.413 29515 TRACE root File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 545, in __call__
2015-10-10 15:20:59.413 29515 TRACE root self.revive(create_channel())
2015-10-10 15:20:59.413 29515 TRACE root File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 251, in channel
2015-10-10 15:20:59.413 29515 TRACE root chan = self.transport.create_channel(self.connection)
2015-10-10 15:20:59.413 29515 TRACE root File "/usr/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 91, in create_channel
2015-10-10 15:20:59.413 29515 TRACE root return connection.channel()
2015-10-10 15:20:59.413 29515 TRACE root File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 289, in channel
2015-10-10 15:20:59.413 29515 TRACE root return self.channels[channel_id]
2015-10-10 15:20:59.413 29515 TRACE root TypeError: 'NoneType' object has no attribute '__getitem__'
2015-10-10 15:20:59.413 29515 TRACE root
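
For illustration only, here is a minimal, simplified sketch (not the actual py-amqp code) of how this TypeError can arise: if the connection's channel table is cleared to None when the connection is torn down, and the same stale connection object is later asked for a channel again, the lookup self.channels[channel_id] indexes into None.

class FakeAMQPConnection(object):
    """Toy stand-in for an AMQP connection; names and behaviour here are assumptions."""

    def __init__(self):
        self.channels = {}          # channel_id -> channel object while the connection is alive

    def collect(self):
        # Emulates the connection being torn down, e.g. after a broker restart.
        self.channels = None

    def channel(self, channel_id=1):
        # Mirrors the failing line in the traceback: return self.channels[channel_id]
        return self.channels[channel_id]


conn = FakeAMQPConnection()
conn.channels[1] = object()
conn.channel(1)    # works while the connection is alive
conn.collect()     # broker restarted again, connection torn down
conn.channel(1)    # raises the TypeError above (the wording shown is the Python 2 form)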

Tags: oslo rabbitmq
Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

What version of oslo.messaging are you using?

Changed in oslo.messaging:
status: New → Incomplete
Revision history for this message
Windy Wang (275588680-b) wrote :

 My RPM package is python-oslo-messaging-1.8.2-1.el7.noarch

Revision history for this message
lizheming (lizheming-li) wrote :

I also get this issue.

Revision history for this message
Simon Guilbault (simon-guilbault) wrote :

I am also hitting the same bug.

I can restart the rabbitmq-server multiple times on one of the controllers, but if I also restart the second rabbitmq-server in the HA pair, some OpenStack services fill their log files with almost the same error. The impacted services are:

neutron-dhcp-agent
neutron-l3-agent
neutron-metadata-agent
neutron-openvswitch-agent
openstack-nova-compute (exceptions are occurring at a slower rate)

Similar bug report:
https://bugs.launchpad.net/neutron/+bug/1496774

[root@openstack1 ~]# pip freeze | grep oslo.messaging
oslo.messaging==1.8.3

RPMs are from RDO/kilo
[root@openstack1 ~]# rpm -qa | grep oslo-messaging
python-oslo-messaging-1.8.3-1.el7.noarch

# cat /var/log/neutron/openvswitch-agent.log
http://paste.openstack.org/show/478217/

nova-compute also throws similar stack traces, at a slower rate

# cat /var/log/nova/nova-compute.log
http://paste.openstack.org/show/478218/

Relevant parts of neutron.conf
http://paste.openstack.org/show/478219/

Relevant parts of nova.conf
http://paste.openstack.org/show/478220/

rabbitmq.config
http://paste.openstack.org/show/478221/

# rabbitmqctl -p openstack list_queues name policy
http://paste.openstack.org/show/478222/

Revision history for this message
sampsonhuo (13261149612-5) wrote :

I also get this issue. My oslo.messaging version is python-oslo-messaging-1.8.3-1.el7.noarch.
Has anyone fixed it?

Revision history for this message
Kevin Tibi (ktibi) wrote :

Same bug, after restarting rabbitmq-server.

The bug shows up in heat-engine, nova-compute, and neutron-dhcp.

Changed in oslo.messaging:
status: Incomplete → Confirmed
Revision history for this message
Kevin Tibi (ktibi) wrote :

While debugging, I had to purge the notification.error queue, and then everything worked again :)

Revision history for this message
Mike Merinov (mikhail-merinov) wrote :

This bug affects us as well: services such as nova* and neutron* are unable to survive losing the connection to rabbit twice (tried the latest oslo.messaging stable/kilo from GitHub). RabbitMQ is set up as HA (HA queues etc.), but I'm not yet sure whether that is relevant.
It seems that self.channel in oslo_messaging/_drivers/impl_rabbit.py around line 865 is None after the second reconnect (it is expected to be a kombu.transport.pyamqp.Channel object).
Reproducibility in my environment is 100%.
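
A hypothetical workaround sketch (illustration only; the resolution reported later in this thread was upgrading kombu): instead of reviving a channel on a connection whose channel table has been cleared, discard the stale connection object and build a fresh kombu connection once the broker is reachable again. The URL below is a placeholder.

import kombu

def reconnect(url='amqp://guest:guest@localhost:5672//'):
    # Build a brand-new connection rather than reusing one whose channels are gone.
    conn = kombu.Connection(url)
    conn.ensure_connection(max_retries=None)   # keep retrying until the broker is back
    return conn, conn.channel()                # a fresh connection yields a fresh channel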

information type: Public → Public Security
information type: Public Security → Public
Revision history for this message
Augustina Ragwitz (auggy) wrote :

Marking this as confirmed so someone from Nova can verify whether anything on the Nova side contributes to it.

tags: added: oslo rabbitmq
Changed in nova:
status: New → Confirmed
Revision history for this message
Anton Aksola (aakso) wrote :

In our environment upgrading to Kombu 3.0.23 resolved the issue. We are running CentOS 7.2 with RDO packages.

It seems that RedHat has pushed a source RPM to their Enterprise repo: http://ftp.redhat.com/pub/redhat/linux/enterprise/7Server/en/RHOS/SRPMS/python-kombu-3.0.23-1.el7ost.src.rpm
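
Since rpm and pip can disagree with what a long-running service has actually imported, a quick runtime check (illustration only) can confirm the kombu and amqp versions in use:

import amqp
import kombu

# Print the versions the running Python environment actually imports.
print('kombu %s, amqp %s' % (kombu.__version__, amqp.__version__))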

Revision history for this message
Mike Merinov (mikhail-merinov) wrote :

The kombu 3.0.23 from RedHat seems to resolve the issue for us too, as mentioned by Anton.

information type: Public → Public Security
information type: Public Security → Public
Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

Looks like upgrading kombu fixes the issue, please reopen if there's anything we need to do in oslo.messaging.

Changed in oslo.messaging:
status: Confirmed → Won't Fix
Revision history for this message
sampsonhuo (13261149612-5) wrote :

In our environment, upgrading to Kombu 3.0.23 from Red Hat did not resolve the issue. We are running CentOS 7.2 with RDO packages. After upgrading rabbitmq-server to rabbitmq-server-3.6.1-1, the neutron openvswitch-agent.log grows rapidly until the service is restarted.

trace log:

    2016-03-16 *.*.*.* 2081 ERROR oslo_messaging._drivers.impl_rabbit [-] Failed to consume message from queue: 'NoneType' object has no attribute '__getitem__'
2016-03-16 *.*.*.* 2081 ERROR oslo_messaging._drivers.amqpdriver [-] Failed to process incoming message, retrying...
2016-03-16 *.*.*.* 2081 TRACE oslo_messaging._drivers.amqpdriver Traceback (most recent call last):
2016-03-16 *.*.*.* 2081 TRACE oslo_messaging._drivers.amqpdriver File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 197, in poll
2016-03-16 *.*.*.* 2081 TRACE oslo_messaging._drivers.amqpdriver self.conn.consume(limit=1)
2016-03-16 *.*.*.* 2081 TRACE oslo_messaging._drivers.amqpdriver File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1172, in consume
2016-03-16 *.*.*.* 2081 TRACE oslo_messaging._drivers.amqpdriver six.next(it)
2016-03-16 *.*.*.* 2081 TRACE oslo_messaging._drivers.amqpdriver File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1083, in iterconsume
2016-03-16 *.*.*.* 2081 TRACE oslo_messaging._drivers.amqpdriver error_callback=_error_callback)
2016-03-16 *.*.*.* 2081 TRACE oslo_messaging._drivers.amqpdriver File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 870, in ensure
2016-03-16 *.*.*.* 2081 TRACE oslo_messaging._drivers.amqpdriver ret, channel = autoretry_method()
2016-03-16 *.*.*.* 2081 TRACE oslo_messaging._drivers.amqpdriver File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 453, in _ensured
2016-03-16 *.*.*.* 2081 TRACE oslo_messaging._drivers.amqpdriver return fun(*args, **kwargs)
2016-03-16 *.*.*.* 2081 TRACE oslo_messaging._drivers.amqpdriver File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 520, in __call__
2016-03-16 *.*.*.* 2081 TRACE oslo_messaging._drivers.amqpdriver self.revive(create_channel())
2016-03-16 *.*.*.* 2081 TRACE oslo_messaging._drivers.amqpdriver File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 251, in channel
2016-03-16 *.*.*.* 2081 TRACE oslo_messaging._drivers.amqpdriver chan = self.transport.create_channel(self.connection)
2016-03-16 *.*.*.* 2081 TRACE oslo_messaging._drivers.amqpdriver File "/usr/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 91, in create_channel
2016-03-16 *.*.*.* 2081 TRACE oslo_messaging._drivers.amqpdriver return connection.channel()
2016-03-16 *.*.*.* 2081 TRACE oslo_messaging._drivers.amqpdriver File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 279, in channel
2016-03-16 *.*.*.* 2081 TRACE oslo_messaging._drivers.amqpdriver return self.channels[channel_id]
2016-03-16 *.*.*.* 2081 TRACE oslo_messaging._drivers.amqpdriver TypeError: 'NoneType' object has no attribute '__getitem__'

Changed in nova:
status: Confirmed → Invalid
no longer affects: nova
Revision history for this message
Zhang Yun (zhangyun) wrote :

Does the same problem still occur for neutron in the Newton release? Can anyone help confirm this? Thanks in advance.

sato wataru (watari53)
Changed in neutron:
status: New → Incomplete
Revision history for this message
Evan Powell (evanjpowell) wrote :

I see this currently with the following versions (RDO packages):

python-kombu-3.0.32
python-oslo-messaging-1.8.3
openstack-neutron-2015.1.2 (kilo)

Other logs complain about connection to RabbitMQ, but only neutron-openvswitch-agent will fill the disk.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired
Revision history for this message
Windy Wang (275588680-b) wrote :

I updated kombu to version 3.0.20 or above, and then the problem no longer appeared.

Revision history for this message
sunzuohua (zuohuasun) wrote :

I use kombu version 4.0.2 and amqp version 2.1.4.

Fabo Yi (folkart)
no longer affects: nova
Revision history for this message
Fabo Yi (folkart) wrote :

This bug affects us as well. My oslo.messaging version is python-oslo-messaging-1.8.3-3.1.noarch and my kombu version is python-kombu-2.5.16-1.el7.noarch. openvswitch-agent and nova-conductor will fill the disk.

Revision history for this message
Andras Kovi (akovi) wrote :

In Mistral, with kombu 4.1.0 and amqp 2.2.2, the same issue is present.

Revision history for this message
Dougal Matthews (d0ugal) wrote :

Andras, can you confirm which Mistral version you reproduced this with?

Changed in mistral:
status: New → Confirmed
importance: Undecided → Medium
milestone: none → rocky-2
Dougal Matthews (d0ugal)
Changed in mistral:
status: Confirmed → Incomplete
milestone: rocky-2 → none