After network glitch, nova-api service fails to reestablish connection

Bug #1477608 reported by Peter Sabaini
This bug affects 4 people
Affects                                         Status   Importance  Assigned to  Milestone
OpenStack Nova Cloud Controller Charm           Invalid  Undecided   Unassigned
nova (Ubuntu)                                   Expired  Low         Unassigned
nova-cloud-controller (Juju Charms Collection)  Invalid  Undecided   Unassigned

Bug Description

Due to a network glitch, the connection between nova-api-os-compute and rabbitmq was temporarily dropping packets (nf_conntrack was set too low). After fixing this, the network became stable again (no more dropped packets). However, nova-api-os-compute could not re-establish a clean rabbitmq connection:

2015-07-23 13:08:38.896 26976 AUDIT nova.api.openstack.compute.contrib.volumes [req-bb32c331-7a40-409f-8657-b0b608932ba5 ed3a885912d142b7bde36dbd58e388c1 12bb569bf909441b90791482ae6f9ca9] Attach volume da2549c8-6091-49db-852e-3558ff5e584c to instance 91f74484-ee0b-4ec4-b2f0-b208206c98ef at None
2015-07-23 13:08:38.940 26976 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to consume message from queue: [Errno 104] Connection reset by peer
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 624, in ensure
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit return method(*args, **kwargs)
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 717, in _consume
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit return self.connection.drain_events(timeout=poll_timeout)
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 279, in drain_events
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit return self.transport.drain_events(self.connection, **kwargs)
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqp.py", line 90, in drain_events
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit return connection.drain_events(**kwargs)
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 303, in drain_events
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit return amqp_method(channel, args)
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 506, in _close
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit self._x_close_ok()
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 534, in _x_close_ok
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit self._send_method((10, 51))
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 62, in _send_method
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit self.channel_id, method_sig, args, content,
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/method_framing.py", line 227, in write_method
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit write_frame(1, channel, payload)
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/transport.py", line 183, in write_frame
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit frame_type, channel, size, payload, 0xce,
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/eventlet/greenio.py", line 308, in sendall
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit tail = self.send(data, flags)
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/eventlet/greenio.py", line 293, in send
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit total_sent += fd.send(data[total_sent:], flags)
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit error: [Errno 104] Connection reset by peer
2015-07-23 13:08:38.940 26976 TRACE oslo.messaging._drivers.impl_rabbit
2015-07-23 13:08:38.967 26976 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on 10.24.0.137:5672
2015-07-23 13:08:38.967 26976 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...

In this case, a nova volume-attach operation failed.

After restarting the nova-api-os-compute process, volume-attach worked again, and no further connection breakage appeared in the log.

Versions:
n-c-c charm: lp:~canonical-bootstack/charms/trusty/nova-cloud-controller/ps45-filters;revno=164
nova-api-os-compute 1:2014.1.5-0ubuntu1.1

Both n-c-c and rabbitmq are containerized on the same physical host.
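The trigger in this report was an nf_conntrack table sized too small; once such a table fills, the kernel drops packets for new connections. A minimal sketch for spotting that condition on a Linux host, assuming the standard `/proc/sys/net/netfilter/nf_conntrack_count` and `nf_conntrack_max` files that exist while the nf_conntrack module is loaded. The helper names `conntrack_usage` and `near_limit` and the 90% threshold are illustrative choices, not part of the original report:

```python
from pathlib import Path
from typing import Tuple

# Standard location of the conntrack sysctls on Linux. These files only
# exist while the nf_conntrack kernel module is loaded.
CONNTRACK_DIR = Path("/proc/sys/net/netfilter")


def conntrack_usage(base: Path = CONNTRACK_DIR) -> Tuple[int, int]:
    """Return (current entries, maximum entries) of the conntrack table."""
    count = int((base / "nf_conntrack_count").read_text().strip())
    limit = int((base / "nf_conntrack_max").read_text().strip())
    return count, limit


def near_limit(count: int, limit: int, threshold: float = 0.9) -> bool:
    """True when the table is filled to at least `threshold` of capacity.

    The 0.9 default is an arbitrary illustrative threshold.
    """
    return limit > 0 and count / limit >= threshold


if __name__ == "__main__":
    try:
        count, limit = conntrack_usage()
    except OSError:
        print("nf_conntrack sysctls not readable; is the module loaded?")
    else:
        if limit > 0:
            print(f"conntrack: {count}/{limit} entries ({100.0 * count / limit:.1f}%)")
        if near_limit(count, limit):
            print("warning: table nearly full; new connections may be dropped")
```

Accepting the directory as a parameter lets the check run against a test fixture. When the table is close to its limit, the usual remedy is raising `net.netfilter.nf_conntrack_max` via sysctl.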

James Page (james-page)
Changed in nova-cloud-controller (Juju Charms Collection):
status: New → Invalid
James Page (james-page) wrote :

This is quite an old bug, but the resilience of AMQP connections in nova is really a nova/oslo.messaging problem rather than a charm problem; raising bug tasks to this effect.

This is also quite an old OpenStack version, so general improvements may have been made since Kilo.
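The log above ends with the client scheduling a reconnect ("Delaying reconnect for 1.0 seconds..."). As a generic illustration of that retry pattern, and explicitly not oslo.messaging's actual implementation, here is a minimal sketch of reconnecting with capped exponential backoff. `connect_with_backoff` and its parameters are hypothetical; `connect` stands in for whatever callable opens a fresh AMQP connection:

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_backoff(
    connect: Callable[[], T],
    initial_delay: float = 1.0,
    backoff: float = 2.0,
    max_delay: float = 30.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds, sleeping between failed attempts.

    The delay starts at `initial_delay`, is multiplied by `backoff` after
    each failure and is capped at `max_delay`. If `max_attempts` is set,
    the last error is re-raised once that many attempts have failed.
    """
    delay = initial_delay
    attempt = 0
    while True:
        attempt += 1
        try:
            return connect()
        except OSError as exc:  # includes ConnectionResetError (errno 104)
            if max_attempts is not None and attempt >= max_attempts:
                raise
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)
            delay = min(delay * backoff, max_delay)
```

Passing `sleep` in as a parameter keeps the function testable without real delays, and the `max_delay` cap stops a long outage from spacing retries arbitrarily far apart.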

Changed in charm-nova-cloud-controller:
status: New → Invalid
Changed in nova (Ubuntu):
importance: Undecided → Low
James Page (james-page) wrote :

Peter

I appreciate this bug was raised some time ago. Do you still see this type of issue in Kilo deployments, and do you see similar issues with later OpenStack release versions?

Marking 'Incomplete' for now.

Changed in nova (Ubuntu):
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for nova (Ubuntu) because there has been no activity for 60 days.]

Changed in nova (Ubuntu):
status: Incomplete → Expired