Collector continuously re-queues sample when dispatcher reports persistent error when requeue is enabled
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ceilometer | Won't Fix | Medium | Unassigned |
Bug Description
When requeue_
When the sample is requeued, it is picked up again and dispatched to the storage layer. If the same underlying error condition persists, the error is raised back to the collector and the message is requeued again. This cycle continues until the error clears (in which case the sample is not requeued again) or the collector or RabbitMQ (RMQ) is restarted.
In this scenario, when there is a persistent error condition, frequent retrying puts extra load on the storage database and the messaging layer (RabbitMQ). It also wastes collector CPU cycles, since the message (and potentially more samples) keeps getting requeued instead of being cleared from the queue.
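The requeue cycle described above can be simulated in plain Python as a rough illustration. This is a sketch, not Ceilometer or oslo.messaging code. `FlakyDispatcher`, `consume` and `max_steps` are hypothetical names. `max_steps` exists only so the simulation terminates; the collector described in this bug has no such bound.

```python
from collections import deque


class FlakyDispatcher:
    """Hypothetical dispatcher that raises until `fail_times` calls have been made."""

    def __init__(self, fail_times):
        self.fail_times = fail_times
        self.calls = 0

    def record_metering_data(self, sample):
        self.calls += 1
        if self.calls <= self.fail_times:
            raise RuntimeError("storage unavailable")


def consume(queue, dispatcher, max_steps):
    """Deliver samples, requeueing on every dispatcher error (no cap).

    Returns the number of delivery attempts made. `max_steps` is a
    simulation-only guard; without it a persistent error loops forever.
    """
    deliveries = 0
    while queue and deliveries < max_steps:
        sample = queue.popleft()
        deliveries += 1
        try:
            dispatcher.record_metering_data(sample)
        except Exception:
            queue.append(sample)  # requeue and retry later
    return deliveries
```

With a transient error (for example `fail_times=3`), one sample is delivered on the fourth attempt and the queue drains. With a persistent error (`fail_times=float("inf")`), the sample is still in the queue when the simulation guard stops the loop.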
It does not make sense to keep retrying continuously in case of a persistent error condition.
There should be a configurable upper limit on the number of times the collector requeues samples/events after a dispatcher error.
e.g. requeue_
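A configurable cap on requeues could look roughly like the sketch below. The real option name is truncated in this report, so `max_requeues` is a hypothetical parameter. `Message.requeue_count` is also an assumption: in a real deployment the count would have to travel with the message, for instance in a message header. The sketch drops a sample once the cap is reached; routing it to a separate queue for later inspection would be another choice.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    sample: dict
    requeue_count: int = 0  # hypothetical: times this message has been requeued


def consume_with_cap(queue, dispatch, max_requeues):
    """Requeue on dispatcher error, but at most `max_requeues` times per message.

    Returns (delivered, dropped) lists of samples.
    """
    delivered, dropped = [], []
    while queue:
        msg = queue.popleft()
        try:
            dispatch(msg.sample)
            delivered.append(msg.sample)
        except Exception:
            if msg.requeue_count < max_requeues:
                msg.requeue_count += 1
                queue.append(msg)  # retry later
            else:
                dropped.append(msg.sample)  # cap reached: stop looping on this sample
    return delivered, dropped
```

With `max_requeues=3` and a storage error that never clears, each sample is attempted four times (the first delivery plus three requeues) and then dropped, so the queue drains instead of cycling indefinitely.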
Changed in ceilometer: | |
assignee: | nobody → Rohit Jaiswal (rohit-jaiswal-3) |
summary: |
- Collector keeps on requeueing a message in case of a persistent error
- from dispatcher when requeueing is enabled
+ Collector keeps on requeueing and retrying a message in case of a
+ persistent error from dispatcher when requeueing is enabled |
description: | updated |
description: | updated |
description: | updated |
description: | updated |
summary: |
- Collector keeps on requeueing and retrying a message in case of a
- persistent error from dispatcher when requeueing is enabled
+ Collector continuously re-queues sample when dispatcher reports
+ persistent error when requeue is enabled |
Changed in ceilometer: | |
status: | Triaged → New |
assignee: | Rohit Jaiswal (rohit-jaiswal-3) → nobody |
Changed in ceilometer: | |
status: | New → Triaged |
importance: | Undecided → Medium |
What's the proposed solution here? I get the feeling this has multiple different solutions, all with pros/cons.