Too many reconnects in logs

Bug #1371723 reported by Serg Melikyan on 2014-09-19
336
This bug affects 15 people
Affects Status Importance Assigned to Milestone
Mirantis OpenStack
High
Ilya Pekelny
5.0.x
High
Ilya Pekelny
5.1.x
High
Serg Melikyan
6.0.x
High
Ilya Pekelny

Bug Description

We see too many reconnects in different components logs like:

2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 648, in ensure
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit return method(*args, **kwargs)
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 753, in _publish
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit publisher = cls(self.conf, self.channel, topic, **kwargs)
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 396, in __init__
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit **options)
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 339, in __init__
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit self.reconnect(channel)
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 347, in reconnect
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit routing_key=self.routing_key)
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 84, in __init__
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit self.revive(self._channel)
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 218, in revive
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit self.declare()
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 104, in declare
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit self.exchange.declare()
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/entity.py", line 166, in declare
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit nowait=nowait, passive=passive,
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/channel.py", line 613, in exchange_declare
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit self._send_method((40, 10), args)
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 56, in _send_method
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit self.channel_id, method_sig, args, content,
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/method_framing.py", line 221, in write_method
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit write_frame(1, channel, payload)
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/transport.py", line 177, in write_frame
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit frame_type, channel, size, payload, 0xce,
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/eventlet/greenio.py", line 307, in sendall
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit tail = self.send(data, flags)
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/eventlet/greenio.py", line 293, in send
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit total_sent += fd.send(data[total_sent:], flags)
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit error: [Errno 32] Broken pipe
2014-09-19 15:58:42.807 29170 TRACE oslo.messaging._drivers.impl_rabbit
<182>Sep 19 15:58:42 node-5 nova-api 2014-09-19 15:58:42.819 29170 INFO oslo.messaging._drivers.impl_rabbit [req-408a15b9-532e-4748-a07e-770761bfedf5 ] Reconnecting to AMQP server on 192.168.0.5:5672
<182>Sep 19 15:58:42 node-5 nova-api 2014-09-19 15:58:42.821 29170 INFO oslo.messaging._drivers.impl_rabbit [req-408a15b9-532e-4748-a07e-770761bfedf5 ] Delaying reconnect for 1.0 seconds...
<182>Sep 19 15:58:46 node-5 nova-api 2014-09-19 15:58:46.763 29170 INFO oslo.messaging._drivers.impl_rabbit [req-408a15b9-532e-4748-a07e-770761bfedf5 ] Connected to AMQP server on 192.168.0.5:5672
<179>Sep 19 15:58:46 node-5 nova-api 2014-09-19 15:58:46.773 29170 ERROR oslo.messaging._drivers.impl_rabbit [req-408a15b9-532e-4748-a07e-770761bfedf5 ] Failed to consume message from queue: Socket closed
2014-09-19 15:58:46.773 29170 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2014-09-19 15:58:46.773 29170 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 648, in ensure
2014-09-19 15:58:46.773 29170 TRACE oslo.messaging._drivers.impl_rabbit return method(*args, **kwargs)
2014-09-19 15:58:46.773 29170 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 737, in _consume
2014-09-19 15:58:46.773 29170 TRACE oslo.messaging._drivers.impl_rabbit return self.connection.drain_events(timeout=timeout)
2014-09-19 15:58:46.773 29170 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 279, in drain_events
2014-09-19 15:58:46.773 29170 TRACE oslo.messaging._drivers.impl_rabbit return self.transport.drain_events(self.connection, **kwargs)
2014-09-19 15:58:46.773 29170 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqp.py", line 91, in drain_events
2014-09-19 15:58:46.773 29170 TRACE oslo.messaging._drivers.impl_rabbit return connection.drain_events(**kwargs)
2014-09-19 15:58:46.773 29170 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 299, in drain_events
2014-09-19 15:58:46.773 29170 TRACE oslo.messaging._drivers.impl_rabbit chanmap, None, timeout=timeout,
2014-09-19 15:58:46.773 29170 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 362, in _wait_multiple
2014-09-19 15:58:46.773 29170 TRACE oslo.messaging._drivers.impl_rabbit channel, method_sig, args, content = read_timeout(timeout)
2014-09-19 15:58:46.773 29170 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 333, in read_timeout
2014-09-19 15:58:46.773 29170 TRACE oslo.messaging._drivers.impl_rabbit return self.method_reader.read_method()
2014-09-19 15:58:46.773 29170 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/method_framing.py", line 189, in read_method
2014-09-19 15:58:46.773 29170 TRACE oslo.messaging._drivers.impl_rabbit raise m
2014-09-19 15:58:46.773 29170 TRACE oslo.messaging._drivers.impl_rabbit IOError: Socket closed
2014-09-19 15:58:46.773 29170 TRACE oslo.messaging._drivers.impl_rabbit
<182>Sep 19 15:58:46 node-5 nova-api 2014-09-19 15:58:46.778 29170 INFO oslo.messaging._drivers.impl_rabbit [req-408a15b9-532e-4748-a07e-770761bfedf5 ] Reconnecting to AMQP server on 192.168.0.5:5672
<182>Sep 19 15:58:46 node-5 nova-api 2014-09-19 15:58:46.780 29170 INFO oslo.messaging._drivers.impl_rabbit [req-408a15b9-532e-4748-a07e-770761bfedf5 ] Delaying reconnect for 1.0 seconds...
<182>Sep 19 15:58:49 node-5 nova-api 2014-09-19 15:58:49.519 29170 INFO oslo.messaging._drivers.impl_rabbit [req-408a15b9-532e-4748-a07e-770761bfedf5 ] Connected to AMQP server on 192.168.0.5:5672

Changed in mos:
status: New → Triaged
importance: Medium → Critical
importance: Critical → High
OSCI Robot (oscirobot) wrote :

Package oslo.messaging has been built from changeset: http://gerrit.mirantis.com/28770
DEB Repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.1-stable-28770/ubuntu

OSCI Robot (oscirobot) wrote :

Package oslo.messaging has been built from changeset: http://gerrit.mirantis.com/28770
RPM Repository URL: http://osci-obs.vm.mirantis.net:82/centos-fuel-5.1-stable-28770/centos

OSCI Robot (oscirobot) wrote :

Package oslo.messaging has been built from changeset: http://gerrit.mirantis.com/28770
DEB Repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.1-stable-28770/ubuntu

The problem still exists on the latest nightly iso. is there any fix yet?

Serg Melikyan (smelikyan) wrote :

There is a fix for this issue, we are working on testing that fix, should be release soon.

OSCI Robot (oscirobot) wrote :

Package oslo.messaging has been built from changeset: http://gerrit.mirantis.com/28770
RPM Repository URL: http://osci-obs.vm.mirantis.net:82/centos-fuel-5.1-stable-28770/centos

OSCI Robot (oscirobot) wrote :

Package oslo.messaging has been built from changeset: http://gerrit.mirantis.com/28770
DEB Repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.1-stable-28770/ubuntu

Is this pushed to nightly iso?

Serg Melikyan (smelikyan) wrote :

Unfortunately no, this will be pushed once commit will be tested and merged.

OSCI Robot (oscirobot) wrote :

Package oslo.messaging has been built from changeset: http://gerrit.mirantis.com/28770
DEB Repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.1-stable-28770/ubuntu

OSCI Robot (oscirobot) wrote :

Package oslo.messaging has been built from changeset: http://gerrit.mirantis.com/28770
RPM Repository URL: http://osci-obs.vm.mirantis.net:82/centos-fuel-5.1-stable/centos

OSCI Robot (oscirobot) wrote :

Package oslo.messaging has been built from changeset: http://gerrit.mirantis.com/28770
DEB Repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.1-stable/ubuntu

Tyler Wilson (loth) wrote :
Serg Melikyan (smelikyan) wrote :

Tyler, it is already available in repository that you've mentioned

Dmitry Borodaenko (angdraug) wrote :

Does 5.0.2 include AMQP heartbeat? If not, is this bug valid for 5.0.x?

Serg Melikyan (smelikyan) wrote :

Dmitry, unfortunately no, this is not ported for 5.0.2

Serg Melikyan (smelikyan) wrote :

Dmitry, we are working on porting fix for this issue to 5.0.3 and 6.0

tags: added: scale
OSCI Robot (oscirobot) wrote :

Package oslo.messaging has been built from changeset: http://gerrit.mirantis.com/29966
DEB Repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-6.0-stable-29966/ubuntu

OSCI Robot (oscirobot) wrote :

Package oslo.messaging has been built from changeset: http://gerrit.mirantis.com/29966
RPM Repository URL: http://osci-obs.vm.mirantis.net:82/centos-fuel-6.0-stable-29966/centos

OSCI Robot (oscirobot) wrote :

Package oslo.messaging has been built from changeset: http://gerrit.mirantis.com/29965
RPM Repository URL: http://osci-obs.vm.mirantis.net:82/centos-fuel-5.0.3-stable-29965/centos

OSCI Robot (oscirobot) wrote :

Package oslo.messaging has been built from changeset: http://gerrit.mirantis.com/29965
DEB Repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.0.3-stable-29965/ubuntu

OSCI Robot (oscirobot) wrote :

Package oslo.messaging has been built from changeset: https://review.fuel-infra.org/120
DEB Repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-6.0-stable-120/ubuntu

wandk (wandk000) on 2014-11-06
information type: Public → Public Security
Pavel Vaylov (pvaylov) wrote :

Guys, do we have a manual how to fix existed 5.1 environment and direct download links for updated oslo messaging package ?

Igor Marnat (imarnat) wrote :

Pavel, you can find this information in release notes for 5.1:
http://docs.mirantis.com/openstack/fuel/fuel-5.1/pdf/Mirantis-OpenStack-5.1-RelNotes.pdf

Just look for topic "OpenStack services report multiple AMQP reconnects in the logs" in known issues.

Igor, version of document that you mentioned contains this text about the bug:

============
OpenStack services report multiple AMQP reconnects in the logs
nova-api and other services may call fork() soon after the process starts. This may lead to a situation when the
AMQP heartbeat sending thread is initialized and run in the context of the parent process, while the reply waiters
are added to the list in the child process. When this happens, the thread won't send any heartbeats and RabbitMQ
will close AMQP connections after timeout.
This problem is addressed in oslo.messaging packages from MOS 5.1.1 package repositories (CentOS, Ubuntu).
After installing these packages, restart the affected services. See LP1371723.
============

But current version of docs, updated for 5.1.1 (docs.mirantis.com/openstack/fuel/fuel-5.1/release-notes.html#openstack-bugs-are-fixed) has only this:
============
The Oslo messaging service had a bug that affected MOS HA environments. The bug could make OpenStack unstable; the problem got worse as the load increased. A large number of broken AMQP connections caused Oslo-related errors appear in different logs. See LP1371723.
============

So, instructions for those, who still use 5.1.0 is gone...

OSCI Robot (oscirobot) wrote :

RPM package oslo.messaging has been built for project openstack/oslo.messaging
Package version == 1.8.0, package release == fuel6.1.mira10.git.873c247.acb3abf

Changeset: https://review.fuel-infra.org/4736
project: openstack/oslo.messaging
branch: master
author: Pekelny Ilya
committer: Pekelny Ilya
subject: rabbit: heartbeat implementation
status: patchset-created

Files placed on repository:
python-oslo-messaging-1.8.0-fuel6.1.mira10.git.873c247.acb3abf.noarch.rpm
python-oslo-messaging-doc-1.8.0-fuel6.1.mira10.git.873c247.acb3abf.noarch.rpm

NOTE: Changeset is not merged, created temporary package repository.
RPM repository URL: http://osci-obs.vm.mirantis.net:82/centos-fuel-master-4736/centos

OSCI Robot (oscirobot) wrote :

DEB package oslo.messaging has been built for project openstack/oslo.messaging
Package version == 1.8.0, package release == fuel6.1~mira10+git.873c247.acb3abf

Changeset: https://review.fuel-infra.org/4736
project: openstack/oslo.messaging
branch: master
author: Pekelny Ilya
committer: Pekelny Ilya
subject: rabbit: heartbeat implementation
status: patchset-created

Files placed on repository:
python-oslo.messaging_1.8.0-fuel6.1~mira10+git.873c247.acb3abf_all.deb

NOTE: Changeset is not merged, created temporary package repository.
DEB repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-master-4736/ubuntu

OSCI Robot (oscirobot) wrote :

RPM package oslo.messaging has been built for project openstack/oslo.messaging
Package version == 1.8.0, package release == fuel6.1.mira10.git.689b558.acb3abf

Changeset: https://review.fuel-infra.org/4736
project: openstack/oslo.messaging
branch: master
author: Pekelny Ilya
committer: Pekelny Ilya
subject: rabbit: heartbeat implementation
status: patchset-created

Files placed on repository:
python-oslo-messaging-1.8.0-fuel6.1.mira10.git.689b558.acb3abf.noarch.rpm
python-oslo-messaging-doc-1.8.0-fuel6.1.mira10.git.689b558.acb3abf.noarch.rpm

NOTE: Changeset is not merged, created temporary package repository.
RPM repository URL: http://osci-obs.vm.mirantis.net:82/centos-fuel-master-4736/centos

OSCI Robot (oscirobot) wrote :

DEB package oslo.messaging has been built for project openstack/oslo.messaging
Package version == 1.8.0, package release == fuel6.1~mira10+git.689b558.acb3abf

Changeset: https://review.fuel-infra.org/4736
project: openstack/oslo.messaging
branch: master
author: Pekelny Ilya
committer: Pekelny Ilya
subject: rabbit: heartbeat implementation
status: patchset-created

Files placed on repository:
python-oslo.messaging_1.8.0-fuel6.1~mira10+git.689b558.acb3abf_all.deb

NOTE: Changeset is not merged, created temporary package repository.
DEB repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-master-4736/ubuntu

Dear Team,

It is possible for those packages to be backported for MOS 5.1.1 too?

Thank you.

OSCI Robot (oscirobot) wrote :

RPM package oslo.messaging has been built for project openstack/oslo.messaging
Package version == 1.8.0, package release == fuel6.1.mira10.git.b9e134d.acb3abf

Changeset: https://review.fuel-infra.org/4736
project: openstack/oslo.messaging
branch: master
author: Pekelny Ilya
committer: openstack-ci-mirrorer-jenkins
subject: rabbit: heartbeat implementation
status: patchset-created

Files placed on repository:
python-oslo-messaging-1.8.0-fuel6.1.mira10.git.b9e134d.acb3abf.noarch.rpm
python-oslo-messaging-doc-1.8.0-fuel6.1.mira10.git.b9e134d.acb3abf.noarch.rpm

NOTE: Changeset is not merged, created temporary package repository.
RPM repository URL: http://osci-obs.vm.mirantis.net:82/centos-fuel-master-4736/centos

OSCI Robot (oscirobot) wrote :

RPM package oslo.messaging has been built for project openstack/oslo.messaging
Package version == 1.8.0, package release == fuel6.1.mira10

Changeset: https://review.fuel-infra.org/4736
project: openstack/oslo.messaging
branch: master
author: Pekelny Ilya
committer: openstack-ci-mirrorer-jenkins
subject: rabbit: heartbeat implementation
status: change-merged

Files placed on repository:
python-oslo-messaging-1.8.0-fuel6.1.mira10.noarch.rpm
python-oslo-messaging-doc-1.8.0-fuel6.1.mira10.noarch.rpm

Changeset merged. Package placed on primary repository
RPM repository URL: http://osci-obs.vm.mirantis.net:82/centos-fuel-master/centos

OSCI Robot (oscirobot) wrote :

DEB package oslo.messaging has been built for project openstack/oslo.messaging
Package version == 1.8.0, package release == fuel6.1~mira10

Changeset: https://review.fuel-infra.org/4736
project: openstack/oslo.messaging
branch: master
author: Pekelny Ilya
committer: openstack-ci-mirrorer-jenkins
subject: rabbit: heartbeat implementation
status: change-merged

Files placed on repository:
python-oslo.messaging_1.8.0-fuel6.1~mira10_all.deb

Changeset merged. Package placed on primary repository
DEB repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-master/ubuntu

OSCI Robot (oscirobot) wrote :

DEB package oslo.messaging has been built for project openstack/oslo.messaging
Package version == 1.8.0, package release == fuel6.1~mira10+git.b9e134d.acb3abf

Changeset: https://review.fuel-infra.org/4736
project: openstack/oslo.messaging
branch: master
author: Pekelny Ilya
committer: openstack-ci-mirrorer-jenkins
subject: rabbit: heartbeat implementation
status: patchset-created

Files placed on repository:
python-oslo.messaging_1.8.0-fuel6.1~mira10+git.b9e134d.acb3abf_all.deb

NOTE: Changeset is not merged, created temporary package repository.
DEB repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-master-4736/ubuntu

Vasilios, sorry, I've just noticed your comment. The fix is already included into 5.1.1. If you still see a similar issue, please file a bug for it.

Dear Dmitry, just to check that i run the latest packages. Could you please verify which packages correct this bug?
It would be very helpful. thank you

Sean Murphy (murp) wrote :

We have MOS 5.1.1 installed but we still see some behaviour as described here. Here are the relevant packages on our controller:

root@node-1:~# dpkg -l | grep nova-api
ii nova-api 1:2014.1.3-fuel5.1.1~mira3 OpenStack Compute - API frontend

root@node-1:~# dpkg -l | grep oslo
ii python-oslo.config 1:1.2.1-0ubuntu1~cloud0 Common code for Openstack Projects (configuration API)
ii python-oslo.i18n 1.0.0-fuel5.1.1~mira1 Oslo Internationalization Utilities - Python 2.x
ii python-oslo.messaging 1.3.1-fuel5.1.1~mira0 oslo messaging library
ii python-oslo.rootwrap 1.0.0-0ubuntu2 Fine filtering of shell commands as 'root'
ii python-oslo.serialization 1.0.0-fuel5.1.1~mira0 utilities for serialization , especially JSON - Python 2.x
ii python-oslo.utils 1.0.0-fuel5.1.1~mira0 set of utility functions for OpenStack - Python 2.x
ii python-oslo.vmware 0.2-0ubuntu1 oslo vmware library

root@node-1:~# dpkg -l | grep rabbit
ii rabbitmq-server 3.3.5-ubuntu3 AMQP server written in Erlang

We see issues that the rabbit closes connections to many of the processes (it seems many of the
nova-api processes); when there is new request to nova-api, it often has a broken pipe and then
it started a new connection to rabbitmq. So we have this sequence:

- rabbit kills connection to many nova-api processes due to heartbeat timeouts
- nova-api receives request
- nova-api tries to talk to rabbit but experiences broken pipe
- (many) nova-api processes restart communications with rabbit
- nova-api is able to then communicate

It does not have much of an impact in userland, but it's suboptimal from pov of keeping
clean logs.

@Sean, please teke a look at the following bug:
https://bugs.launchpad.net/fuel/+bug/1440018

You probably may want to backport scripts using that docs.

Sean Murphy (murp) wrote :

@Alexander - thanks for that.

Sean Murphy (murp) wrote :

@Alexander - it looks like that is more targetted at the HA context - we see the problem in a basic single controller context. Do you know of any processes/instructions for the non-HA context?

Thanks,
Seán.

Probably Ilya Pekelny can comment on this...

JohnsonYi (yichengli) wrote :

We faced the same issue on our MOS 6.0 environment, even I replaced the newest oslo.messaging 1.4.1 from MOS 6.1 to my MOS 6.0 environment, this issue came up some times, not only during reboot one controller node, but also when I try to create a new instance from horizon. The reconnect will be finished after 6~8 mins.

Does this issue also impact mos 6.1? Or any fix for oslo.messaging 1.4.1? Thanks in advance.

JohnsonYi (yichengli) wrote :

I can reproduce the error by create new instance fron horizon, during instance "spawning", the error will come out until spawning finished. F.Y.I

lk (1415511065-f) on 2015-10-01
Changed in mos:
status: Fix Committed → New
status: New → Fix Committed

Is there a work around without patching anything (restarting services/cleaning up database, disabling HA etc...)?

tangjinli (767844151-t) wrote :

gg

tangjinli (767844151-t) wrote :
Download full text (27.9 KiB)

part of the log:
2016-05-20 09:27:44.163 5576 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 32] Broken pipe
2016-05-20 09:27:59.164 5576 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 32] Broken pipe
2016-05-20 09:28:14.165 5576 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 32] Broken pipe
2016-05-20 09:28:17.235 5576 INFO nova.compute.resource_tracker [req-73b5a98e-226a-4592-890a-e6e6f76f0090 - - - - -] Auditing locally available compute resources for node vdi11
2016-05-20 09:28:17.637 5576 INFO nova.compute.resource_tracker [req-73b5a98e-226a-4592-890a-e6e6f76f0090 - - - - -] Total usable vcpus: 24, total allocated vcpus: 0
2016-05-20 09:28:17.638 5576 INFO nova.compute.resource_tracker [req-73b5a98e-226a-4592-890a-e6e6f76f0090 - - - - -] Final resource view: name=vdi11 phys_ram=64389MB used_ram=12800MB phys_disk=199GB used_disk=70GB total_vcpus=24 used_vcpus=0 pci_stats=None
2016-05-20 09:28:19.290 5576 INFO nova.compute.cloudos_resource_tracker [req-73b5a98e-226a-4592-890a-e6e6f76f0090 - - - - -] Compute_service record updated for vdi11:vdi11
2016-05-20 09:28:29.166 5576 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 32] Broken pipe
2016-05-20 09:28:32.564 5576 WARNING nova.compute.manager [req-73b5a98e-226a-4592-890a-e6e6f76f0090 - - - - -] While synchronizing instance power states, found 7 instances in the database and 2 instances on the hypervisor.
2016-05-20 09:28:44.167 5576 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 32] Broken pipe
2016-05-20 09:28:59.168 5576 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 32] Broken pipe
2016-05-20 09:29:14.169 5576 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 32] Broken pipe
2016-05-20 09:29:18.236 5576 INFO nova.compute.resource_tracker [req-73b5a98e-226a-4592-890a-e6e6f76f0090 - - - - -] Auditing locally available compute resources for node vdi11
2016-05-20 09:29:18.627 5576 INFO nova.compute.resource_tracker [req-73b5a98e-226a-4592-890a-e6e6f76f0090 - - - - -] Total usable vcpus: 24, total allocated vcpus: 0
2016-05-20 09:29:18.628 5576 INFO nova.compute.resource_tracker [req-73b5a98e-226a-4592-890a-e6e6f76f0090 - - - - -] Final resource view: name=vdi11 phys_ram=64389MB used_ram=12800MB phys_disk=199GB used_disk=70GB total_vcpus=24 used_vcpus=0 pci_stats=None
2016-05-20 09:29:20.310 5576 INFO nova.compute.cloudos_resource_tracker [req-73b5a98e-226a-4592-890a-e6e6f76f0090 - - - - -] Compute_service record updated for vdi11:vdi11
2016-05-20 09:29:29.170 5576 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 32] Broken pipe
2016-05-20 09:29:32.563 5576 WARNING nova.compute.manager [req-73b5a98e-226a-45...

Pooja Ghumre (pooja-9) on 2017-07-10
summary: - To many reconnects in logs
+ Too many reconnects in logs
To post a comment you must log in.
This report contains Public Security information  Edit
Everyone can see this security related information.

Duplicates of this bug

Other bug subscribers