RabbitMQ OCF RA wastes network bandwidth

Bug #1614071 reported by Alexey Lebedeff
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Fix Released
Importance: High
Assigned to: Alexey Lebedeff

Bug Description

Detailed bug description:
  RabbitMQ OCF RA uses `rabbitmqctl list_queues`/`list_channels` as a health-check.
  E.g. a 200-node OpenStack installation produces around 10k queues and
  10k channels. Doing a single list_queues/list_channels in a cluster in this
  environment results in 27k TCP packets and around 12 megabytes of
  network traffic. Given that these calls happen ~10 times a minute with 3
  controllers, this results in significant overhead.
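
  To put the figures above in perspective, here is a rough back-of-the-envelope
  estimate in Python. It is only a sketch: the report does not say whether
  "~10 times a minute" is the total across the 3 controllers or a per-controller
  rate, so the function takes the aggregate rate as a parameter and both readings
  can be tried. Decimal megabytes (1 MB = 8 Mbit) are assumed, and the function
  name is illustrative.

```python
# Figures quoted in the bug report (per single cluster-wide check).
MB_PER_CHECK = 12          # ~12 megabytes of traffic
PACKETS_PER_CHECK = 27_000  # ~27k TCP packets


def monitoring_overhead(checks_per_minute):
    """Return (MB per minute, Mbit per second, packets per second)
    for a given aggregate number of health checks per minute."""
    mb_per_minute = MB_PER_CHECK * checks_per_minute
    mbit_per_second = mb_per_minute * 8 / 60
    packets_per_second = PACKETS_PER_CHECK * checks_per_minute / 60
    return mb_per_minute, mbit_per_second, packets_per_second


if __name__ == "__main__":
    # Reading 1: ~10 checks per minute in total.
    print(monitoring_overhead(10))
    # Reading 2 (assumption): ~10 checks per minute on each of 3 controllers.
    print(monitoring_overhead(10 * 3))
```

  Under the first reading this is about 120 MB per minute (roughly 16 Mbit/s
  sustained); under the second it is three times that.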

  Since `list_queues`/`list_channels` are only used as a health-check, it doesn't make sense
  to check anything except the queues/channels that exist on the current node.
  RabbitMQ 3.6.6 will contain some improvements that will allow us to perform such local-only checks:
  - https://github.com/rabbitmq/rabbitmq-server/pull/883
  - https://github.com/rabbitmq/rabbitmq-server/pull/911
  - https://github.com/rabbitmq/rabbitmq-server/pull/915
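
  As an illustration of what a local-only check could look like, the sketch
  below shells out to `rabbitmqctl list_queues --local`, which lists only
  queues whose master is on the current node. This is not the actual OCF
  script change (the resource agent is a shell script); the function names,
  the timeout value, and the "exit code 0 means healthy" rule are assumptions
  made for the example, and it does not cover channels.

```python
import subprocess


def local_queue_check_cmd():
    """Build a rabbitmqctl command that lists only node-local queues.

    Assumes a RabbitMQ version whose rabbitmqctl supports `--local`.
    """
    return ["rabbitmqctl", "-q", "list_queues", "--local", "name"]


def node_looks_healthy(run=subprocess.run, timeout_s=30):
    """Return True if the local-only listing finishes successfully in time.

    `run` is injectable so the function can be exercised without a broker.
    """
    try:
        result = run(
            local_queue_check_cmd(),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung or missing rabbitmqctl is treated as unhealthy.
        return False
    return result.returncode == 0
```

  Because only node-local data is requested, the check should not need to
  gather queue information from the other cluster members.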

Reproducibility:
 100%

Impact:
 When another consumer (e.g. LMA) also starts calling `list_queues` regularly, the RabbitMQ cluster can experience a network split.

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: nobody → MOS Packaging Team (mos-packaging)
Changed in fuel:
importance: Undecided → High
assignee: MOS Packaging Team (mos-packaging) → Alexey Lebedeff (alebedev-a)
milestone: none → 9.1
status: New → Confirmed
tags: added: area-mos
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/355477
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=b9cb86a51fc1f6036b9fd6ef417dad48f705776f
Submitter: Jenkins
Branch: stable/mitaka

commit b9cb86a51fc1f6036b9fd6ef417dad48f705776f
Author: Alexey Lebedeff <email address hidden>
Date: Mon Aug 22 14:30:12 2016 +0300

    Monitor rabbitmq from OCF with less overhead

    This will stop wasting network bandwidth for monitoring.

    E.g. a 200-node OpenStack installation produces around 10k queues and
    10k channels. Doing a single list_queues/list_channels in a cluster in this
    environment results in 27k TCP packets and around 12 megabytes of
    network traffic. Given that these calls happen ~10 times a minute with 3
    controllers, this results in significant overhead.

    Upstream change:
    - https://github.com/rabbitmq/rabbitmq-server/pull/916

    To enable these features you should have a RabbitMQ build containing the following patches:
    - https://github.com/rabbitmq/rabbitmq-server/pull/883
    - https://github.com/rabbitmq/rabbitmq-server/pull/911
    - https://github.com/rabbitmq/rabbitmq-server/pull/915

    Change-Id: Icfde3360b42a841ad3a219b94f65a69b2a18cea7
    Closes-Bug: 1614071

tags: added: in-stable-mitaka
Changed in fuel:
status: Confirmed → Fix Committed
tags: added: rabbitmq
Alexey Galkin (agalkin)
tags: added: on-verification
Alexey Galkin (agalkin) wrote :

Validated on 9.x with the `mos-repos/ubuntu/snapshots/9.0-2016-09-09-040322` repository: `list_queues` and `list_channels` are no longer used in the OCF scripts.

Changed in fuel:
status: Fix Committed → Fix Released