GetMeterStatus called too frequently
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Fix Released | High | Unassigned |
2.4 | Won't Fix | High | Unassigned |
2.5 | Fix Released | High | @les |
Bug Description
Looking at production Prometheus metrics, we can see that roughly every 5 minutes the controller sees a spike of calls to GetMeterStatus (around 500/s).
Auditing the code, it seems that the WatchMeterStatus watches 2 documents, the MeterStatus document for the individual application, *and* the MetricManager global document.
It does this because if you start failing to send metrics, you'll go into Amber alert after we have 3 failed metric sends. So we need to know if we are starting to fail sends.
However, the MetricManager global document *also* includes a "last successful upload" key, which means that every successful upload also updates that document, causing all agents to wake up and check whether their MeterStatus has changed.
Also confusing is that the 'last successful upload' document is stored as a global singleton, but it appears the "MetricsWorker" is run for every model. (Possibly on each controller.)
A simple fix might be to split out the 'last-successful upload' field into its own document, so that updating it does not notify the MeterStatus watchers.
We could potentially do things like encode the "upload is in amber state" to the db, so that a single failure doesn't wake everything up. But that is of much lower priority (and potentially affects correctness) than just splitting out the fields so we don't end up waking up on every successful send.
A different possibility would be to use a custom watcher that is internally backed by a DocWatcher but omits changes that don't affect the consecutiveerrors count. However, that is harder to actually implement.
Changed in juju:
status: Fix Committed → Fix Released
https://github.com/juju/juju/pull/9676