kolla-ansible

Bug #2048223
Comment #3



OpenStack Infra (hudson-openstack) wrote on 2024-01-06: Fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/904823
Committed: https://opendev.org/openstack/kolla-ansible/commit/97e5c0e9b1906f2993b4c12820ac3cb9ddcfe821
Submitter: "Zuul (22348)"
Branch: master

commit 97e5c0e9b1906f2993b4c12820ac3cb9ddcfe821
Author: Mark Goddard <email address hidden>
Date: Fri Jan 5 11:02:39 2024 +0000

cadvisor: Set housekeeping interval to Prometheus scrape interval

    The prometheus_cadvisor container has high CPU usage. On various
    production systems I checked, it sits at around 13-16% on controllers,
    averaged over the Prometheus 1m scrape interval. When viewed with top,
    we can see it is a bit spiky and can jump above 100%.

    There are various bugs about this, but I found
    https://github.com/google/cadvisor/issues/2523 which suggests reducing
    the per-container housekeeping interval. This defaults to 1s, which
    provides far greater granularity than we need with the default
    Prometheus scrape interval of 60s.
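To make the mismatch concrete, here is a minimal Python sketch that counts how many housekeeping passes cAdvisor makes per container between two Prometheus scrapes, and builds a cAdvisor `--housekeeping_interval` flag from the scrape interval. The helper names (`parse_duration`, `housekeeping_flag`, `passes_per_scrape`) are hypothetical, and the sketch assumes the scrape interval is a simple single-unit duration such as "60s" or "1m". It is an illustration, not the kolla-ansible implementation.

```python
import re

# Seconds per unit for the subset of duration units handled in this sketch.
_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600}


def parse_duration(value: str) -> float:
    """Parse a simple duration such as "60s" or "1m" into seconds.

    Assumption: only single-unit values in ms, s, m or h are supported;
    Prometheus itself accepts more forms than this.
    """
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def housekeeping_flag(scrape_interval: str) -> str:
    """Build a cAdvisor flag matching housekeeping to the scrape interval."""
    parse_duration(scrape_interval)  # validate before passing it through
    return f"--housekeeping_interval={scrape_interval}"


def passes_per_scrape(housekeeping: str, scrape_interval: str) -> float:
    """Housekeeping passes per container between two Prometheus scrapes."""
    return parse_duration(scrape_interval) / parse_duration(housekeeping)


print(passes_per_scrape("1s", "60s"))   # cAdvisor default housekeeping
print(passes_per_scrape("60s", "60s"))  # housekeeping matched to scrape
print(housekeeping_flag("60s"))
```

With the 1s default, each container is housekept 60 times per 60s scrape, but Prometheus only ever collects one of those samples; matching the two intervals brings that down to one pass per scrape.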

    Reducing the housekeeping interval to 60s on a production controller
    reduced average CPU usage from 13% to 3.5%. This still seems high,
    but is more reasonable.
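As a quick check of the figures above, the relative saving from 13% to 3.5% average CPU can be worked out with a few lines of Python. The function name `relative_reduction` is hypothetical; the inputs are the two averages quoted in the commit message.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a value dropped, relative to its original level."""
    return (before - after) / before


# Average CPU usage (%) on one production controller, from the commit message.
saving = relative_reduction(13.0, 3.5)
print(f"{saving:.0%}")  # 73%
```

In other words, the change cut the container's average CPU usage by roughly three quarters.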

Change-Id: I89c62a45b1f358aafadcc0317ce882f4609543e7
Closes-Bug: #2048223