API hangs if it doesn't receive an ack from conductor

Bug #1510776 reported by Dimitry Ushakov
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Surojit Pathak

Bug Description

The API hangs while waiting for an ack from the conductor when it is unable to send a message to RabbitMQ.

Test infrastructure:

$ uname -a
Linux containers-test 3.13.0-62-generic #102-Ubuntu SMP Tue Aug 11 14:29:36 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 40
On-line CPU(s) list: 0-39
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Stepping: 4
CPU MHz: 2800.000
BogoMIPS: 5601.67
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0-9,20-29
NUMA node1 CPU(s): 10-19,30-39

$ vmstat
procs -----------memory---------- ---swap-- -----io---- -system--
r b swpd free buff cache si so bi bo in cs us sy id wa st
5 0 0 95732384 315932 19664548 0 0 11 10 0 0 7 0 93 0 0

Test setup:

image_id = fedora-21-atomic-3
nic_id = public
keypair_id = default
flavor_id = m1.small


I noticed open sockets leaking in this environment (RabbitMQ is an out-of-the-box setup and may be mismanaged).

$ lsof -i | grep amqp | wc -l

$ lsof -i | grep amqp | grep magnum | wc -l

I asked others to reproduce this, and those using bare-metal servers
saw similar behavior. Digging further, it looks like oslo-messaging
opens a new connection every time it attempts to send a message to
RabbitMQ, even when RabbitMQ is not available to receive the message
(for whatever reason).
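
To make the suspected pattern concrete, here is a minimal sketch of how a descriptor count grows when every send opens a connection that is never closed. It does not use oslo-messaging or RabbitMQ: `leaky_send` and `count_open_sockets` are hypothetical helpers, `socket.socketpair()` stands in for a broker connection, and the counter is a rough in-process analogue of the `lsof` checks above.

```python
import os
import socket
import stat


def count_open_sockets(max_fd=4096):
    """Count descriptors of this process that refer to sockets.

    Scans descriptor numbers 0..max_fd-1; max_fd is an arbitrary cap
    chosen for this sketch.
    """
    count = 0
    for fd in range(max_fd):
        try:
            mode = os.fstat(fd).st_mode
        except OSError:
            continue  # this descriptor number is not open
        if stat.S_ISSOCK(mode):
            count += 1
    return count


def leaky_send(leaked):
    """Hypothetical send: opens a fresh connection and never closes it."""
    local, remote = socket.socketpair()
    # Keep references so garbage collection does not close the sockets.
    leaked.append((local, remote))


baseline = count_open_sockets()

leaked = []
for _ in range(10):
    leaky_send(leaked)
after_leak = count_open_sockets()

# Closing the connections returns the count to where it started.
for local, remote in leaked:
    local.close()
    remote.close()
after_cleanup = count_open_sockets()

print(baseline, after_leak, after_cleanup)
```

Each simulated send holds two descriptors (both ends of the pair), so the count rises steadily until the connections are closed explicitly.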

$ sudo rabbitmqctl status
Status of node 'rabbit@containers-test' ...

As you can see here, sockets are not cleaned up but the run_queue is

hongbin (hongbin034)
Changed in magnum:
status: New → Confirmed
importance: Undecided → Medium
Adrian Otto (aotto)
Changed in magnum:
milestone: none → mitaka-1
Changed in magnum:
assignee: nobody → Surojit Pathak (suro-patz)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to magnum (master)

Fix proposed to branch: master
Review: https://review.openstack.org/264333

Changed in magnum:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to magnum (master)

Reviewed: https://review.openstack.org/264333
Committed: https://git.openstack.org/cgit/openstack/magnum/commit/?id=873214b6e2d8f206ba4a2750e911998c2000745a
Submitter: Jenkins
Branch: master

commit 873214b6e2d8f206ba4a2750e911998c2000745a
Author: Surojit Pathak <email address hidden>
Date: Wed Jan 6 19:30:10 2016 +0000

    Fix socket descriptor leak

    The connection to amqp was not getting cleaned up, even after the
    communication to conductor across amqp was complete, for a given
    request. Thus, sockets were leaking with each communication and finally
    led to a hang situation, where no more fds were available.

    Change-Id: I1deabdbce6ba448fe4c25d7694aabe5e5fec7b5a
    Closes-Bug: #1510776
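
The commit message above describes releasing the connection once communication with the conductor is complete. A minimal sketch of that general pattern follows. It is not the merged Magnum change (see the review linked above for that): `RpcClient`, `cleanup`, and `handle_request` are hypothetical names, and `socket.socketpair()` stands in for the AMQP connection.

```python
import socket


class RpcClient:
    """Hypothetical client standing in for an AMQP connection."""

    def __init__(self):
        # socketpair() yields two connected sockets, so no broker is needed.
        self._local, self._remote = socket.socketpair()
        self.closed = False

    def call(self, payload):
        """Send a small payload and return the simulated conductor's ack."""
        self._local.sendall(payload)
        request = self._remote.recv(1024)
        self._remote.sendall(b"ack:" + request)
        return self._local.recv(1024)

    def cleanup(self):
        """Release both descriptors held by this client."""
        self._local.close()
        self._remote.close()
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs whether call() returned normally or raised.
        self.cleanup()
        return False


def handle_request(payload):
    """Handle one request and release its connection before returning."""
    with RpcClient() as client:
        return client.call(payload)


print(handle_request(b"ping"))
```

Tying cleanup to the end of each request means the number of open descriptors no longer depends on how many requests have been served.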

Changed in magnum:
status: In Progress → Fix Released
hongbin (hongbin034) wrote :

Reopening this bug since the fix has been reverted: https://review.openstack.org/#/c/274910/

Changed in magnum:
status: Fix Released → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/magnum 2.0.0

This issue was fixed in the openstack/magnum 2.0.0 release.
