StarlingX

collectd does not breakdown platform cpu usage fully

Bug #1849511 reported by Frank Miller on 2019-10-23

This bug affects 1 person

Affects     Status        Importance  Assigned to  Milestone
StarlingX   Fix Released  Medium      Jim Gauld

Bug Description

Brief Description
-----------------
The collectd platform cpu collection does not split platform cpu usage among the individual platform components (buckets).

Severity
--------
Minor

Steps to Reproduce
------------------

Expected Behavior
------------------
Would like to see collectd report cpu usage broken down by platform and applications. The platform cpu usage should be further broken down into kubernetes system processes, base processes (e.g., flock services), and stx-monitor or stx-openstack processes.

Actual Behavior
----------------
collectd platform cpu does not currently separate out the key platform components (kubernetes, flock, stx-openstack, stx-monitor).

Reproducibility
---------------
Reproducible

System Configuration
--------------------
All configs

Branch/Pull Time/Commit
-----------------------
Any stx.3.0 or earlier load

Last Pass
---------
n/a

Timestamp/Logs
--------------
n/a

Test Activity
-------------
System testing

Tags:

Frank Miller (sensfan22) on 2019-10-23

Changed in starlingx:
assignee:	nobody → Jim Gauld (jgauld)

Ghada Khalil (gkhalil) wrote on 2019-10-23:

stx.3.0 / medium priority - helps w/ debugging system issues

tags:	added: stx.3.0 stx.tools
Changed in starlingx:
importance:	Undecided → Medium
status:	New → Triaged

OpenStack Infra (hudson-openstack) wrote on 2019-10-23: Fix proposed to monitoring (master)

Fix proposed to branch: master
Review: https://review.opendev.org/690743

Changed in starlingx:
status:	Triaged → In Progress

OpenStack Infra (hudson-openstack) wrote on 2019-10-24: Fix merged to monitoring (master)

Reviewed: https://review.opendev.org/690743
Committed: https://git.openstack.org/cgit/starlingx/monitoring/commit/?id=c52e8f11ba87c355ef2bacb41b7384d1ce80f5d3
Submitter: Zuul
Branch: master

commit c52e8f11ba87c355ef2bacb41b7384d1ce80f5d3
Author: Jim Gauld <email address hidden>
Date: Wed Oct 23 16:23:04 2019 -0400

    Update collectd breakdown of platform cpu

    This updates collectd cpu metrics. New metrics are dispatched and
    logged to give a better platform breakdown.

    The platform cpu usage is an average per cpu percent occupancy of
    platform cores, derived from Linux per-cpu schedstats.

    This update adds a breakdown of the platform cpu usage, derived from
    Linux cgroups cpuacct. The platform total is broken down into: base,
    kube-system, and addon.

    This also adds verbose logging of these metrics per memory sample
    collection audit.

    The following collectd samples are dispatched with these headings:
    type, type_instance, plugin, plugin_instance: description

    Based on schedstats:
    percent, used, cpu, platform:
         platform cpu occupancy average per platform core (%)

    Based on cgroup cpuacct:
    percent, occupancy, cpu, platform:
         platform cpu occupancy average per platform core (%)

    percent, occupancy, cpu, base:
         base cpu occupancy average per platform core (%)

    percent, occupancy, cpu, kube-system:
         kube-system cpu occupancy average per platform core (%)

    percent, occupancy, cpu, kube-addon:
         kube-addon cpu occupancy average per platform core (%)

    Change-Id: Id27129cc368f18e5d85b6f3986d750f8bc189230
    Closes-Bug: 1849511
    Depends-On: https://review.opendev.org/689749
    Signed-off-by: Jim Gauld <email address hidden>
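The schedstats-based sample (`percent, used, cpu, platform`) is an average per-core occupancy. The minimal Python sketch below shows how such a value can be computed from two snapshots of `/proc/schedstat` text. It is not the plugin's actual code and relies on the following assumptions:

- The file uses the schedstat version 15 layout, in which the 7th numeric field of each `cpuN` line is the cumulative time (ns) spent running tasks on that cpu.
- The function names are hypothetical.
- The explicit `platform_cpus` list is hypothetical.

```python
"""Sketch: platform cpu occupancy from two /proc/schedstat snapshots.

Assumption: schedstat version 15 layout, where the 7th counter on each
"cpuN" line is the cumulative run time of tasks on that cpu, in ns.
"""


def parse_schedstat(text):
    """Return {cpu_index: cumulative_run_ns} parsed from /proc/schedstat text."""
    run_ns = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only per-cpu lines ("cpu0", "cpu1", ...); skip the version,
        # timestamp and domainN lines.
        if len(fields) >= 8 and fields[0].startswith("cpu") and fields[0][3:].isdigit():
            # Assumption: fields[7] is the 7th counter after the label.
            run_ns[int(fields[0][3:])] = int(fields[7])
    return run_ns


def platform_occupancy(prev, curr, platform_cpus, elapsed_ns):
    """Average percent occupancy per platform core between two snapshots."""
    if elapsed_ns <= 0 or not platform_cpus:
        return 0.0
    # Run time accumulated on the platform cores during the interval.
    busy_ns = sum(curr[c] - prev[c] for c in platform_cpus)
    return 100.0 * busy_ns / (elapsed_ns * len(platform_cpus))
```

For example, if cpu0 ran tasks for 0.5 s and cpu1 for 0.25 s during a 1 s interval, the average occupancy over those two platform cores is 37.5%.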

Changed in starlingx:
status:	In Progress → Fix Released
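The cgroup cpuacct breakdown described in the first commit can be sketched in the same hedged way. The hypothetical Python example below makes these assumptions:

- The caller supplies, for each group, the increase in that group's `cpuacct.usage` (ns) over the sample interval.
- `platform` is the sum of base, kube-system and kube-addon. This is an illustrative assumption, not taken from the commit.

Samples are built in the commit's `type, type_instance, plugin, plugin_instance` order. Only when the code runs inside the collectd Python plugin are they sent with `collectd.Values(...).dispatch()`.

```python
"""Sketch: platform cpu breakdown from cgroup cpuacct usage deltas.

Hypothetical input: deltas_ns maps "base", "kube-system" and "kube-addon"
to the increase of that group's cpuacct.usage (ns) over the interval.
Assumption: "platform" is the sum of the three groups.
"""

try:
    import collectd  # only importable inside the collectd Python plugin
except ImportError:
    collectd = None

GROUPS = ("base", "kube-system", "kube-addon")


def occupancy(delta_ns, elapsed_ns, n_platform_cpus):
    """Percent occupancy averaged over the platform cores."""
    if elapsed_ns <= 0 or n_platform_cpus <= 0:
        return 0.0
    return 100.0 * delta_ns / (elapsed_ns * n_platform_cpus)


def breakdown_samples(deltas_ns, elapsed_ns, n_platform_cpus):
    """Return (type, type_instance, plugin, plugin_instance, value) tuples."""
    total_ns = sum(deltas_ns.get(g, 0) for g in GROUPS)
    samples = [("percent", "occupancy", "cpu", "platform",
                occupancy(total_ns, elapsed_ns, n_platform_cpus))]
    for group in GROUPS:
        # A group with no recorded usage is reported as 0.
        samples.append(("percent", "occupancy", "cpu", group,
                        occupancy(deltas_ns.get(group, 0), elapsed_ns,
                                  n_platform_cpus)))
    return samples


def dispatch(samples):
    """Send samples to collectd when running inside the daemon; else no-op."""
    if collectd is None:
        return
    for type_, type_instance, plugin, plugin_instance, value in samples:
        collectd.Values(type=type_, type_instance=type_instance,
                        plugin=plugin,
                        plugin_instance=plugin_instance).dispatch(values=[value])
```

Keeping the sample construction separate from the dispatch call lets the arithmetic be checked outside collectd.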

OpenStack Infra (hudson-openstack) wrote on 2019-10-27: Fix proposed to monitoring (master)

Fix proposed to branch: master
Review: https://review.opendev.org/691503

OpenStack Infra (hudson-openstack) wrote on 2019-10-27: Fix merged to monitoring (master)

Reviewed: https://review.opendev.org/691503
Committed: https://git.openstack.org/cgit/starlingx/monitoring/commit/?id=22913f67a50bf8362e9308f828dffa2018b2c72d
Submitter: Zuul
Branch: master

commit 22913f67a50bf8362e9308f828dffa2018b2c72d
Author: Jim Gauld <email address hidden>
Date: Sun Oct 27 00:47:07 2019 -0400

    Correct collectd cpu and memory plugin exceptions

    The collectd cpu and memory plugins were failing to initialize properly
    when the kubepods/k8s-infra cgroups were not configured.
    The k8s-infra directory is present only after kubernetes is configured,
    and only on kubernetes nodes.

    This updates the collectd cpu and memory plugins to skip walking the
    kubepods/k8s-infra directory tree if that path does not exist.
    The summary algorithm works correctly when skipping this path since
    it does not need to add values that do not exist. The k8s-system and
    k8s-addon values will be reported as 0.

    This also removes the benign error and warning logs that were emitted
    when there is no worker_reserved.conf file (i.e., on non-worker nodes).

    Change-Id: I36ecd689dad45543f5a7cc6c7bc99a21c00c8122
    Closes-Bug: 1849511
    Signed-off-by: Jim Gauld <email address hidden>
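The fix in the second commit amounts to treating a missing cgroup subtree as "no usage" rather than an error. A minimal Python sketch of that idea follows. The function names are hypothetical, and the path is passed in by the caller rather than hard-coded to the real kubepods/k8s-infra location.

One detail is worth stating explicitly: in cgroup v1, a cgroup's `cpuacct.usage` already includes the usage of its descendants. For that reason the sketch takes the subtree total from the root entry instead of summing every directory, which would double-count.

```python
import os


def walk_cpuacct(cgroup_root):
    """Map each cgroup under cgroup_root (relative path) to cpuacct.usage ns.

    Returns {} when cgroup_root does not exist (e.g. kubernetes not yet
    configured, or a non-kubernetes node) instead of raising.
    """
    usage = {}
    if not os.path.isdir(cgroup_root):
        return usage
    for dirpath, _dirnames, filenames in os.walk(cgroup_root):
        if "cpuacct.usage" in filenames:
            rel = os.path.relpath(dirpath, cgroup_root)  # "." for the root
            with open(os.path.join(dirpath, "cpuacct.usage")) as f:
                usage[rel] = int(f.read())  # int() ignores the trailing newline
    return usage


def group_usage(cgroup_root):
    """Total usage of the subtree; 0 when the path is absent.

    cgroup v1 cpuacct.usage is hierarchical, so the root entry already
    includes all child cgroups.
    """
    return walk_cpuacct(cgroup_root).get(".", 0)
```

With the path absent, `group_usage()` returns 0. That matches the commit's statement that the k8s-system and k8s-addon values are reported as 0.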