notification agent does not refresh

Bug #1729617 reported by gordon chung
Affects: Ceilometer
Status: Fix Released | Importance: Critical | Assigned to: gordon chung

Affects: ceilometer (Ubuntu)
Status: Invalid | Importance: Undecided | Assigned to: Unassigned

Bug Description

when we switched to partitioning in tooz, we broke the refresh of notification agents, so they no longer pick up new pipelines
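
For context, the refresh mechanism the fix restores relies on tooz group-membership watch callbacks plus a heartbeat. Below is a minimal sketch of that pattern, assuming a redis coordination backend; the group name, member id and rebuild_listeners() callback are illustrative, not the actual ceilometer agent code:

import time

from tooz import coordination

GROUP = b'ceilometer.notification'

# illustrative member id and backend url
coord = coordination.get_coordinator('redis://localhost:6379', b'agent-1')
coord.start(start_heart=True)  # background heartbeat so the member does not expire

try:
    coord.create_group(GROUP).get()
except coordination.GroupAlreadyExist:
    pass
coord.join_group(GROUP).get()


def rebuild_listeners(event):
    # placeholder for what the agent does on a change: re-read membership,
    # recompute its share of the pipeline queues and restart its listeners
    members = coord.get_members(GROUP).get()
    print('group changed (%s), now %d members' % (event, len(members)))


# the part that regressed: without these watches (and a periodic run_watchers
# call) an agent never notices joins/leaves and never refreshes
coord.watch_join_group(GROUP, rebuild_listeners)
coord.watch_leave_group(GROUP, rebuild_listeners)

while True:
    coord.run_watchers()  # fires the callbacks above when membership changed
    time.sleep(10)        # roughly the role of [coordination]/check_watchers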

Changed in ceilometer:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ceilometer (master)

Reviewed: https://review.openstack.org/517337
Committed: https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=75cc518c2f86afd02a7e60df150148c8a0f2e813
Submitter: Zuul
Branch: master

commit 75cc518c2f86afd02a7e60df150148c8a0f2e813
Author: gord chung <email address hidden>
Date: Thu Nov 2 14:49:00 2017 +0000

    refresh agent if group membership changes

    this broke when we switched to tooz partitioner
    - ensure we trigger refresh if group changes
    - ensure we have heartbeat or else members will just die.

    - remove retain_common_targets tests because it doesn't make sense.
    it was originally designed for when we had listener per pipeline
    but that was changed 726b2d4d67ada3df07f36ecfd81b0cf72881e159
    - remove testing workload partitioning path in standard notification
    agent tests
    - correct test_unique test to properly validate a single target
    rather than the number of listeners we have.
    - add test to ensure group_state is updated when a member joins
    - add test to verify that listener assigned topics based on hashring

    Closes-Bug: #1729617
    Change-Id: I5039c93e6845a148c24094f755a78870d49ec19f

Changed in ceilometer:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ceilometer (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/518401

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ceilometer (stable/pike)

Reviewed: https://review.openstack.org/518401
Committed: https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=124d03bf9d0a6628abf171698d94a7e17112f4ee
Submitter: Zuul
Branch: stable/pike

commit 124d03bf9d0a6628abf171698d94a7e17112f4ee
Author: gord chung <email address hidden>
Date: Thu Nov 2 14:49:00 2017 +0000

    refresh agent if group membership changes

    this broke when we switched to tooz partitioner
    - ensure we trigger refresh if group changes
    - ensure we have heartbeat or else members will just die.

    - remove retain_common_targets tests because it doesn't make sense.
    it was originally designed for when we had listener per pipeline
    but that was changed 726b2d4d67ada3df07f36ecfd81b0cf72881e159
    - remove testing workload partitioning path in standard notification
    agent tests
    - correct test_unique test to properly validate a single target
    rather than the number of listeners we have.
    - add test to ensure group_state is updated when a member joins
    - add test to verify that listener assigned topics based on hashring

    Closes-Bug: #1729617
    Change-Id: I5039c93e6845a148c24094f755a78870d49ec19f
    (cherry picked from commit 75cc518c2f86afd02a7e60df150148c8a0f2e813)

tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ceilometer 9.0.2

This issue was fixed in the openstack/ceilometer 9.0.2 release.

Revision history for this message
György Szombathelyi (gyurco) wrote :

Applied the patch to 9.0.1, but some queues still have more than one consumer:

Two agents are running:

rabbitmqctl list_queues name consumers messages_ready messages_unacknowledged -p ceilometer
Listing queues ...
ceilometer-pipe-cpu_source:cpu_sink-0.sample 2 0 501
ceilometer-pipe-cpu_source:cpu_sink-1.sample 1 0 40
ceilometer-pipe-cpu_source:cpu_sink-2.sample 1 0 546
ceilometer-pipe-cpu_source:cpu_sink-3.sample 1 0 273
ceilometer-pipe-cpu_source:cpu_sink-4.sample 1 0 20
ceilometer-pipe-cpu_source:cpu_sink-5.sample 1 0 312
ceilometer-pipe-cpu_source:cpu_sink-6.sample 2 0 385
ceilometer-pipe-cpu_source:cpu_sink-7.sample 2 0 424
ceilometer-pipe-cpu_source:cpu_sink-8.sample 1 0 32
ceilometer-pipe-cpu_source:cpu_sink-9.sample 2 0 139

Three agents:
rabbitmqctl list_queues name consumers messages_ready messages_unacknowledged -p ceilometer
Listing queues ...
ceilometer-pipe-cpu_source:cpu_sink-0.sample 2 0 505
ceilometer-pipe-cpu_source:cpu_sink-1.sample 1 0 40
ceilometer-pipe-cpu_source:cpu_sink-2.sample 1 0 560
ceilometer-pipe-cpu_source:cpu_sink-3.sample 1 0 280
ceilometer-pipe-cpu_source:cpu_sink-4.sample 1 0 20
ceilometer-pipe-cpu_source:cpu_sink-5.sample 2 0 320
ceilometer-pipe-cpu_source:cpu_sink-6.sample 2 0 388
ceilometer-pipe-cpu_source:cpu_sink-7.sample 3 0 31
ceilometer-pipe-cpu_source:cpu_sink-8.sample 1 0 32
ceilometer-pipe-cpu_source:cpu_sink-9.sample 3 0 41

In Ocata, there was only 1 consumer per queue.
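
(For what it's worth, the expected end state of one consumer per pipeline queue follows from the hashring assignment the fix tests: every agent that agrees on the group membership maps each queue name onto the same single member. A rough sketch using tooz's HashRing, with hypothetical member ids:

from tooz import hashring

# hypothetical member ids for two notification agents
ring = hashring.HashRing({'agent-1', 'agent-2'})

for i in range(10):
    queue = b'ceilometer-pipe-cpu_source:cpu_sink-%d.sample' % i
    # each queue name hashes onto exactly one member, so once the agents
    # have resynced, every queue should be left with a single consumer
    print(queue, ring.get_nodes(queue))
)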

Revision history for this message
gordon chung (chungg) wrote :

do you have [coordination]/check_watchers set? it will basically resync every check_watchers seconds.

i have redis as my tooz coordinator and it redistributes to one consumer per queue ~1min after start up
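
for reference, check_watchers lives in the [coordination] section of ceilometer.conf; a minimal example, with the backend URL only as an illustration of a redis coordinator:

[coordination]
backend_url = redis://localhost:6379
check_watchers = 10.0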

Revision history for this message
György Szombathelyi (gyurco) wrote :

No, I don't have it, but as far as I can see it is 10.0 (seconds?) by default.

Revision history for this message
György Szombathelyi (gyurco) wrote :

Ok, it seems that if the queues don't have a long backlog, they sort themselves out eventually.

Revision history for this message
gordon chung (chungg) wrote :

it is 10s by default but for everything to be fully sync'd it sometimes takes more than a single 10s cycle.

just to be clear, when you say "they'll sort them out finally", does it eventually become 1 consumer per queue? i did notice you did not set:

[oslo_messaging_rabbit]
rabbit_qos_prefetch_count = 256

this means you're grabbing the entire queue (even though you only process 'batch_size' messages at a time).
this will cause significant memory usage if your system is backed up, and it is probably why your system didn't redistribute consumers: each agent first handles all the messages it grabbed (the entire queue without prefetch) and only then redistributes, to avoid losing messages.
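
in other words, the prefetch count caps how many messages each listener pulls off a queue at once, independent of how many it processes per batch. a rough example of the two settings together; the [notification] batching values are illustrative, not recommendations:

[oslo_messaging_rabbit]
rabbit_qos_prefetch_count = 256

[notification]
# illustrative batching values; the point is that without the prefetch cap
# above, a listener pulls the whole queue even though it only processes
# batch_size messages at a time
batch_size = 100
batch_timeout = 5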

affects: ceilometer → ceilometer (Ubuntu)
affects: ceilometer (Ubuntu) → ceilometer
Revision history for this message
György Szombathelyi (gyurco) wrote :

No, I did not set rabbit_qos_prefetch_count = 256, good to know about it :)
But it became 1 consumer/queue, since the queues were fully consumed after a while (while experimenting with bug #1729865).

Revision history for this message
György Szombathelyi (gyurco) wrote :

Well, I had something in my long-term memory about the prefetch count and bug #1551667, and this was it:
https://review.openstack.org/#/c/385079/
So it seems it is no longer necessary to set the prefetch count; that code is still in oslo.messaging.

Revision history for this message
James Page (james-page) wrote :

Marking Ubuntu task as Invalid; Ubuntu will pickup any changes in Ceilometer through the current development release and any stable point releases.

Changed in ceilometer (Ubuntu):
status: New → Invalid
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ceilometer 10.0.0

This issue was fixed in the openstack/ceilometer 10.0.0 release.
