metering.sample queue grows with no consumer

Bug #1676586 reported by David Britton
This bug affects 12 people
Affects | Status | Importance | Assigned to | Milestone
OpenStack Ceilometer Charm | Fix Released | Medium | Edward Hope-Morley |
OpenStack Designate Charm | Invalid | Medium | Unassigned |

Bug Description

openstack: ocata
distro: 16.04

root@juju-a7951b-1-lxd-2:~# rabbitmqctl list_queues -p openstack |grep sample
alarm.all.sample 20
event.sample 20
metering.sample 426
notifications.sample 0

This is a non-HA 6-node deployment. On an HA deployment that had been up for a few days, this number was already over 6000 and growing at a rate of about 1k per day.

These messages should either be consumed, or should go in with a TTL.

Revision history for this message
Fairbanks. (fairbanks) wrote :

Hello there,

I'm experiencing the same problem.
With a lot of activity, like creating and destroying instances, these ceilometer queues get very large: over 45000 items in just a day.

I currently have ceilometer disabled in one environment to stop these messages.
But I can't remove ceilometer in another, so it would be nice if it were possible to mitigate this somehow.

Also, since I no longer see the usage tab in the horizon dashboard, it doesn't make sense to enable this anyway. But I think that is more of a charm-openstack-dashboard problem?

Changed in charm-ceilometer:
status: New → Confirmed
importance: Undecided → High
tags: added: cpec
James Page (james-page)
Changed in charm-ceilometer:
assignee: nobody → James Page (james-page)
Revision history for this message
James Page (james-page) wrote :

I'm having trouble re-producing this problem with the latest development charms - I'm running tempest against the deployed cloud and I see messages going through these queues, but they are consumed by ceilometer and aodh correctly afaict.

I'll try to reproduce with stable charms, but bearing in mind we're going to release this week that effort might not be worth it; I've done quite a bit of work on the telemetry stack this cycle so this may have been swept up in one of a number of other changes to ceilometer and notification configuration across the charm set.

Revision history for this message
James Page (james-page) wrote :

If you are *not* running ceilometer I can see how these queues would build up - we have a spec planned for early next cycle which should help services auto-configure themselves correctly based on what other services are deployed in the cloud.

Changed in charm-ceilometer:
status: Confirmed → In Progress
Revision history for this message
James Page (james-page) wrote :

(Scrub #3 that might not be correct)

Ante Karamatić (ivoks)
tags: added: cpe-onsite
removed: cpec
tags: added: 4010
Revision history for this message
Gábor Mészáros (gabor.meszaros) wrote :

For my use case, it turned out that the ceilometer units needed to be shut down (all at once) before purging the queue succeeded. With ceilometer running, the queue kept growing.

Revision history for this message
Drew Freiberger (afreiberger) wrote :

@james-page

I'm seeing this for alarm.all.sample and event.sample coming OUT of ceilometer and destined for aodh, but this cloud has no aodh related to ceilometer. Must we install AODH to consume these queues, or will the next batch of charm fixes determine whether ceilometer should be generating samples for alerting depending on aodh relation? We've only just begun seeing alerts on these queues in rabbitmq filling up on xenial/ocata 17.08 charms environments.

Revision history for this message
James Page (james-page) wrote :

Marking designate task as invalid as this bug covers ceilometer -> aodh notifications for events and alarms.

Short answer would be to deploy aodh to consume this information; however we should update ceilometer to only generate the notifications when aodh is actually deployed.

Changed in charm-designate:
importance: Undecided → Medium
status: New → In Progress
status: In Progress → Triaged
Changed in charm-ceilometer:
status: In Progress → Triaged
assignee: James Page (james-page) → nobody
importance: High → Medium
milestone: none → 18.02
Changed in charm-designate:
status: Triaged → Invalid
Ryan Beisner (1chb1n)
Changed in charm-ceilometer:
milestone: 18.02 → 18.05
Revision history for this message
Tilman Baumann (tilmanbaumann) wrote :

As a workaround, the charm could configure a TTL policy for the queue. That way the queue would at least not grow indefinitely.
Or even better perhaps use max-length. https://www.rabbitmq.com/maxlength.html

rabbitmqctl set_policy ceilometer-overflow metering.sample '{"max-length":20000}' --apply-to queues
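
The two workarounds mentioned in this comment (a TTL and a max-length cap) can be combined in a single policy. The following is a minimal sketch that only builds the `rabbitmqctl set_policy` argument list, so it can be reviewed before being run. The helper name `build_set_policy_args`, the anchored pattern, and the one-day TTL are illustrative assumptions, not values from the charm. The `-p openstack` vhost is taken from the `list_queues` invocations elsewhere in this report.

```python
import json
from typing import List, Optional


def build_set_policy_args(
    name: str,
    pattern: str,
    vhost: str = "openstack",
    max_length: Optional[int] = None,
    message_ttl_ms: Optional[int] = None,
) -> List[str]:
    """Build the argv for `rabbitmqctl set_policy` (hypothetical helper).

    `max-length` caps the number of ready messages in a queue and
    `message-ttl` expires messages after the given number of milliseconds.
    """
    definition = {}
    if max_length is not None:
        definition["max-length"] = max_length
    if message_ttl_ms is not None:
        definition["message-ttl"] = message_ttl_ms
    if not definition:
        raise ValueError("at least one of max_length or message_ttl_ms is required")
    return [
        "rabbitmqctl", "set_policy",
        "-p", vhost,
        "--apply-to", "queues",
        name, pattern, json.dumps(definition),
    ]


# Assumed values: the 20000 cap comes from the comment above, the TTL is made up.
args = build_set_policy_args(
    "ceilometer-overflow",
    r"^metering\.sample$",
    max_length=20000,
    message_ttl_ms=24 * 60 * 60 * 1000,
)
print(" ".join(args))
```

Passing the list to `subprocess.run(args, check=True)` on a RabbitMQ node would apply the policy. The pattern is a regular expression, so anchoring it avoids matching other queues whose names merely contain `metering.sample`.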

tags: added: canonical-bootstack
David Ames (thedac)
Changed in charm-ceilometer:
milestone: 18.05 → 18.08
Revision history for this message
Xav Paice (xavpaice) wrote :

In another very similar case, we have event.sample also:

~$ sudo rabbitmqctl -p openstack list_queues name messages consumers state

event.sample 2129 0 running
notifications_designate.info 190 0 running

There's actually a number of instances where rabbit queues keep on growing, event.sample being one of the bigger ones, but also notifications_designate.* as well as metering.sample.

I'm guessing that this is the same bug?
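
To spot queues in this state across a deployment, the `list_queues` output above can be filtered for queues that hold messages but have no consumers. This is a rough sketch, assuming the column order `name messages consumers` used in the command above. The `QueueStat`, `parse_list_queues`, and `unconsumed` names are hypothetical, and the last row of `SAMPLE` is invented for contrast.

```python
from typing import List, NamedTuple


class QueueStat(NamedTuple):
    name: str
    messages: int
    consumers: int


def parse_list_queues(output: str) -> List[QueueStat]:
    """Parse `rabbitmqctl list_queues name messages consumers ...` output."""
    stats = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and banners such as "Listing queues ...".
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        stats.append(QueueStat(fields[0], int(fields[1]), int(fields[2])))
    return stats


def unconsumed(stats: List[QueueStat], threshold: int = 0) -> List[QueueStat]:
    """Return queues with no consumers and more than `threshold` messages."""
    return [s for s in stats if s.consumers == 0 and s.messages > threshold]


# First two rows are from the comment above; the third is a made-up healthy queue.
SAMPLE = """\
event.sample 2129 0 running
notifications_designate.info 190 0 running
notifications.sample 0 1 running
"""

for stat in unconsumed(parse_list_queues(SAMPLE)):
    print(f"{stat.name}: {stat.messages} messages, no consumers")
```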

James Page (james-page)
Changed in charm-ceilometer:
milestone: 18.08 → 18.11
David Ames (thedac)
Changed in charm-ceilometer:
milestone: 18.11 → 19.04
Michał Ajduk (majduk)
tags: removed: 4010
Revision history for this message
Edward Hope-Morley (hopem) wrote :

Hi @xavpaice, I think I know why you see your event.sample queue piling up with messages. This appears to be quite simply because we are still (in Queens) configuring the event_sink to publish to notifier:// and notifier://?topic=aodh [2], which results in both the aodh and event (default) topics being used, so messages end up in both. This is a problem: if you don't have aodh deployed, the messages will pile up in queues listening on the aodh and ceilometer exchanges, such as alarm.all.sample and event.sample. Upstream Queens simply has this configured to gnocchi [1], and if Panko ever got charmed we could set it to that as well. So solution-wise, I think we need to (a) only set it to aodh if aodh is related, (b) default to gnocchi if aodh is not related, and (c) remove all publishers if neither is available, since ceilometer itself will not process these events anymore.

[1] https://github.com/openstack/ceilometer/blob/stable/queens/ceilometer/pipeline/data/event_pipeline.yaml
[2] https://github.com/openstack/charm-ceilometer/blob/stable/18.11/templates/mitaka/event_pipeline.yaml#L35
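
One possible reading of the (a) to (c) proposal above, sketched as a selection function. This is not the charm's actual code: the function name `select_event_publishers` and its boolean parameters are hypothetical, the aodh publisher URL is copied from the comment rather than from the template, and the merged fix ultimately made the publishers configurable instead.

```python
from typing import List

# Publisher URLs as quoted in the comment above; the real template may differ.
AODH_PUBLISHER = "notifier://?topic=aodh"
GNOCCHI_PUBLISHER = "gnocchi://"


def select_event_publishers(aodh_related: bool, gnocchi_related: bool) -> List[str]:
    """Pick event_sink publishers from the relations that are present.

    (a) publish to aodh only when aodh is related,
    (b) otherwise fall back to gnocchi when it is related,
    (c) otherwise publish nowhere, so no queue fills without a consumer.
    """
    if aodh_related:
        return [AODH_PUBLISHER]
    if gnocchi_related:
        return [GNOCCHI_PUBLISHER]
    return []


for aodh, gnocchi in [(True, True), (False, True), (False, False)]:
    print(aodh, gnocchi, select_event_publishers(aodh, gnocchi))
```

Returning an empty list in case (c) is the part that addresses this bug directly: with no publisher configured, nothing lands in alarm.all.sample or event.sample when no service is deployed to drain them.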

Changed in charm-ceilometer:
assignee: nobody → Edward Hope-Morley (hopem)
Changed in charm-ceilometer:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceilometer (master)

Fix proposed to branch: master
Review: https://review.openstack.org/633564

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceilometer (master)

Reviewed: https://review.openstack.org/633564
Committed: https://git.openstack.org/cgit/openstack/charm-ceilometer/commit/?id=d398cdff364ba07a7f18f59417bb4f3af5cccbc3
Submitter: Zuul
Branch: master

commit d398cdff364ba07a7f18f59417bb4f3af5cccbc3
Author: Edward Hope-Morley <email address hidden>
Date: Mon Jan 28 17:21:18 2019 +0000

    Make event_sink publisher configurable

    The charm currently configures events to be published to
    rabbit on both the config.event_topic (event.sample queue)
    and alarm topic but as of Queens Ceilometer no longer
    consumes event.sample. This patch makes the event_sink
    publishers configurable and defaults to publishing to
    aodh to retain backwards compatibility.

    Change-Id: I5b55f31adcf2b069ff51e387a416f9f1ac4099f8
    Partial-Bug: #1676586

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceilometer (stable/18.11)

Fix proposed to branch: stable/18.11
Review: https://review.openstack.org/638107

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceilometer (stable/18.11)

Reviewed: https://review.openstack.org/638107
Committed: https://git.openstack.org/cgit/openstack/charm-ceilometer/commit/?id=b03fe9696d316bfa47d7f416c9671c1f50ef64f8
Submitter: Zuul
Branch: stable/18.11

commit b03fe9696d316bfa47d7f416c9671c1f50ef64f8
Author: Edward Hope-Morley <email address hidden>
Date: Mon Jan 28 17:21:18 2019 +0000

    Make event_sink publisher configurable

    The charm currently configures events to be published to
    rabbit on both the config.event_topic (event.sample queue)
    and alarm topic but as of Queens Ceilometer no longer
    consumes event.sample. This patch makes the event_sink
    publishers configurable and defaults to publishing to
    aodh to retain backwards compatibility.

    Change-Id: I5b55f31adcf2b069ff51e387a416f9f1ac4099f8
    Partial-Bug: #1676586
    (cherry picked from commit d398cdff364ba07a7f18f59417bb4f3af5cccbc3)

Changed in charm-ceilometer:
status: In Progress → Fix Released