Collectd shows UIDs not found for platform memory usage

Bug #2019007 reported by Cesar Bombonate
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Low
Assigned to: Cesar Bombonate
Milestone:

Bug Description

Brief Description
-----------------
The collectd platform memory usage plugin logs "not found" for specific UIDs, with no additional information about the affected pods.
There is also no way to see memory growth over time by day, and no memory details are reported for pods outside the predetermined kubernetes-addon and kubernetes-system namespaces.
Slab info is not displayed for the 4K NUMA nodes.

Severity
--------
Minor

Steps to Reproduce
------------------
Install the latest StarlingX build and observe the collectd logs.

Expected Behavior
------------------
Useful information should be available for UIDs that are not found.
Collectd should show information on all pods instead of a select few.
A file with a memory dump per day should be available for debugging.
Slab info should be present for the 4K NUMA nodes.

Actual Behavior
----------------
Kubernetes System logs contain addon processes

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Kernel: Real-time (low latency)
Hyperthreading: Disabled
Platform cores: 2
Application cores: 34
Labels: kube-cpu-mgr-policy=static
Huge pages: One 1 GB huge page was configured on each processor

Branch/Pull Time/Commit
-----------------------
StarlingX/Master May. 05, 2023

Last Pass
---------

Timestamp/Logs
--------------
2023-05-08T13:10:50.059 controller-0 collectd[72636]: info platform memory usage: uid 261d40cea94de12fc54c41279cf269c9 not found
2023-05-08T13:10:50.059 controller-0 collectd[72636]: info platform memory usage: uid e90a2332-5753-48bc-a706-f611b9fa4f2e not found
2023-05-08T13:10:50.059 controller-0 collectd[72636]: info platform memory usage: uid f38297b6-6940-437d-996b-addacb2cb330 not found
2023-05-08T13:10:50.059 controller-0 collectd[72636]: info platform memory usage: uid 711acaed-df49-448f-908a-4910334dc324 not found
2023-05-08T13:10:50.059 controller-0 collectd[72636]: info platform memory usage: uid 2cfee370-06f8-494b-b147-f9b10347da30 not found
2023-05-08T13:10:50.059 controller-0 collectd[72636]: info platform memory usage: uid 65e394fd-4605-4e72-97a6-54e5c021d1a0 not found
2023-05-08T13:10:50.059 controller-0 collectd[72636]: info platform memory usage: uid 5ff28eb6-d88b-4b3f-849f-d2c939e5d445 not found
2023-05-08T13:10:50.059 controller-0 collectd[72636]: info platform memory usage: uid d307ad16-05b3-49bc-b669-3dfbf722c33a not found
2023-05-08T13:15:50.060 controller-0 collectd[72636]: info platform memory usage: uid b344599a-f51f-48ef-889f-6119ba77e24b not found
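
When triaging log excerpts like the one above, it can help to collect the distinct UIDs that collectd could not resolve. The following is a minimal sketch, not part of the fix: the regular expression is derived only from the sample lines in this report, and the function name `unresolved_uids` is hypothetical.

```python
import re

# Pattern inferred from the sample lines in this report (an assumption);
# other builds or log formats may need a different expression.
NOT_FOUND = re.compile(
    r"^(?P<timestamp>\S+) (?P<host>\S+) collectd\[(?P<pid>\d+)\]: "
    r"info platform memory usage: uid (?P<uid>\S+) not found$"
)


def unresolved_uids(lines):
    """Return {uid: [timestamps]} for every 'uid ... not found' line."""
    found = {}
    for line in lines:
        match = NOT_FOUND.match(line.strip())
        if match:
            found.setdefault(match.group("uid"), []).append(
                match.group("timestamp"))
    return found


# Two lines copied from the Timestamp/Logs section of this report.
sample = [
    "2023-05-08T13:10:50.059 controller-0 collectd[72636]: info platform "
    "memory usage: uid e90a2332-5753-48bc-a706-f611b9fa4f2e not found",
    "2023-05-08T13:15:50.060 controller-0 collectd[72636]: info platform "
    "memory usage: uid b344599a-f51f-48ef-889f-6119ba77e24b not found",
]

for uid, stamps in unresolved_uids(sample).items():
    print(uid, stamps)
```

The pattern only matches the original, non-descriptive message format; lines that carry extra text between the UID and "not found" are skipped.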

Test Activity
-------------
Performance Testing

Workaround
----------
NA

OpenStack Infra (hudson-openstack) wrote : Fix proposed to monitoring (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/monitoring/+/883845

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/883866

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config-files (master)
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/883866
Committed: https://opendev.org/starlingx/stx-puppet/commit/62e2c8b8e58f108aa32b5f9bd5ee11c0f987a4d9
Submitter: "Zuul (22348)"
Branch: master

commit 62e2c8b8e58f108aa32b5f9bd5ee11c0f987a4d9
Author: Cesar Bombonate <email address hidden>
Date: Fri May 12 18:15:08 2023 +0000

    Introduce new log file in /var/log/rss-memory.log

    This change adds a new log /var/log/rss-memory.log for
    memory growth debugging

    The following entry into crontab will output daily at 01:00:
    0 1 * * * /usr/bin/date >> /var/log/rss-memory.log;
     /usr/bin/ps -e -o ppid,pid,nlwp,rss:10,vsz:10,
     comm,cmd --sort=-rss >> /var/log/rss-memory.log

    Test Plan:
            - PASS: Build an image, install and bootstrap successfully
            - PASS: Apply monitor pods so addon logs would be installed.
            - PASS: Check that log entries are correctly displayed.
            - PASS: Tested on controller, AIO, worker and storage hosts.

    Closes-Bug: 2019007
    Change-Id: I6f8e6208d203bcc77320ced3766af04dab977829
    Signed-off-by: Cesar Bombonate <email address hidden>
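
As an illustration of how the daily snapshots in /var/log/rss-memory.log might be compared, here is a minimal sketch. It assumes each snapshot is one `date` line followed by `ps` output whose header starts with `PPID` and whose columns follow the `ppid,pid,nlwp,rss,vsz,comm,cmd` order from the crontab entry above. The sample text and the names `parse_snapshots`, `rss_by_comm` and `rss_growth` are hypothetical, and command names containing spaces are not handled.

```python
from collections import defaultdict

# Illustrative sample only; real ps output will differ in width and content.
SAMPLE = """\
Mon May  8 01:00:00 UTC 2023
   PPID     PID NLWP        RSS        VSZ COMMAND         CMD
      1   72636   12      80000     900000 collectd        /usr/sbin/collectd
      1    1500    1       3000      20000 cron            /usr/sbin/cron -f
Tue May  9 01:00:00 UTC 2023
   PPID     PID NLWP        RSS        VSZ COMMAND         CMD
      1   72636   12      95000     910000 collectd        /usr/sbin/collectd
      1    1500    1       3000      20000 cron            /usr/sbin/cron -f
"""


def parse_snapshots(text):
    """Split log text into a list of (label, rows), one per ps header."""
    snapshots = []
    label = None
    for line in text.splitlines():
        fields = line.split(None, 6)
        if not fields:
            continue
        if fields[0] == "PPID":
            # A ps header starts a new snapshot, labelled by the prior line.
            snapshots.append((label, []))
        elif (snapshots and len(fields) >= 6
              and fields[0].isdigit() and fields[3].isdigit()):
            snapshots[-1][1].append({
                "pid": int(fields[1]),
                "rss_kib": int(fields[3]),  # ps reports RSS in KiB
                "comm": fields[5],
            })
        else:
            # Anything else is treated as the date line for the next snapshot.
            label = line.strip()
    return snapshots


def rss_by_comm(rows):
    """Sum RSS per command name."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["comm"]] += row["rss_kib"]
    return dict(totals)


def rss_growth(old_rows, new_rows):
    """Return (comm, delta_kib) pairs, largest growth first."""
    old, new = rss_by_comm(old_rows), rss_by_comm(new_rows)
    deltas = {c: new.get(c, 0) - old.get(c, 0) for c in set(old) | set(new)}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


snapshots = parse_snapshots(SAMPLE)
for comm, delta in rss_growth(snapshots[0][1], snapshots[-1][1]):
    print(comm, delta)
```

Comparing the first and last snapshots gives a rough view of which commands grew over the retained period; comparing adjacent snapshots would show day-to-day changes instead.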

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to config-files (master)

Reviewed: https://review.opendev.org/c/starlingx/config-files/+/883870
Committed: https://opendev.org/starlingx/config-files/commit/0de73866cd317ca1358caa3d21ffb9743d85db2f
Submitter: "Zuul (22348)"
Branch: master

commit 0de73866cd317ca1358caa3d21ffb9743d85db2f
Author: cpompeud <email address hidden>
Date: Mon May 22 16:45:04 2023 -0300

    Introduce logrotate for /var/log/rss-memory.log

    This change adds a log rotation config for /var/log/rss-memory.log,
    which is used for memory growth debugging

    Test Plan:
            - PASS: Build an image, install and bootstrap successfully
            - PASS: Apply monitor pods so addon logs would be installed.
            - PASS: Check that log entries are correctly displayed.

    Partial-Bug: 2019007
    Depends-On: https://review.opendev.org/c/starlingx/monitoring/+/883866

    Change-Id: Ia440154482cc9907bf43670390cf85efee18960b
    Signed-off-by: cpompeud <email address hidden>

OpenStack Infra (hudson-openstack) wrote : Fix merged to monitoring (master)

Reviewed: https://review.opendev.org/c/starlingx/monitoring/+/883845
Committed: https://opendev.org/starlingx/monitoring/commit/fc336f95b6c772721d48b79a2665f7dd4df5c34e
Submitter: "Zuul (22348)"
Branch: master

commit fc336f95b6c772721d48b79a2665f7dd4df5c34e
Author: Cesar Bombonate <email address hidden>
Date: Mon May 22 13:43:39 2023 +0000

    Add additional logging for Collectd and fix non descriptive output.

    This change adds logging, every 30 minutes, for pods that are not in
    the kube-system or kube-addon namespaces.

    Additionally, more information is now logged for pods
    where the UID was not found.

    The logs now include entries for pods outside of
     kube-addon and kube-system namespaces:
    2023-05-12T15:00:42.351 controller-0 collectd[72599]: info The pod:
    cm-cert-manager-55659b97c7-w52bq running in
    namespace:cert-manager has the following
    processes{95662: {'rss': 55248.0, 'name': 'controller'}
    , 95352: {'rss': 4.0, 'name': 'pause'}}

    The previous non-descriptive logs looked like this:
    2023-05-08T13:10:50.059 controller-0 collectd[72636]: info
     platform memory usage: uid 261d40cea94de12fc54c41279cf269c9 not found
    2023-05-08T13:10:50.059 controller-0 collectd[72636]: info
     platform memory usage: uid e90a2332-5753-48bc-a706-f611b9fa4f2e not found
    2023-05-08T13:10:50.059 controller-0 collectd[72636]: info
     platform memory usage: uid f38297b6-6940-437d-996b-addacb2cb330 not found

    Thus we have changed this to now include the podname and namespace:
    collectd.warning('%s: uid %s for pod %s not found in namespace %s' % (
                        PLUGIN, uid, pod.name, pod.namespace))

    Test Plan:
       - PASS: Build an image, install and bootstrap successfully
       - PASS: Apply monitor pods so addon logs would be installed.
       - PASS: Check that log entries are correctly displayed.

    Closes-Bug: 2019007
    Signed-off-by: Cesar Bombonate <email address hidden>
    Change-Id: If9207b8d23aefe010d0475e36b0644343df911ea
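
The two message formats described in the commit above can be sketched outside collectd as follows. This is not the plugin's code: the `collectd` module is only importable inside a running collectd Python plugin, so the sketch returns (level, message) pairs instead of calling `collectd.warning`. The value of `PLUGIN` is inferred from the log prefix in this report, the namespace names are taken from the commit message, the exact spacing of the pod message is a guess (the commit text is line-wrapped), and `build_messages` with its inputs is hypothetical.

```python
# Inferred from the "platform memory usage: uid ..." log prefix (assumption).
PLUGIN = "platform memory usage"

# Namespace names as written in the commit message (assumption).
PLATFORM_NAMESPACES = {"kube-system", "kube-addon"}


def uid_not_found_message(uid, pod_name, namespace):
    """Format string taken from the commit message."""
    return "%s: uid %s for pod %s not found in namespace %s" % (
        PLUGIN, uid, pod_name, namespace)


def other_pod_message(pod_name, namespace, processes):
    """Approximates the per-pod process log entry quoted in the commit."""
    return "The pod:%s running in namespace:%s has the following processes%s" % (
        pod_name, namespace, processes)


def build_messages(pods, processes_by_uid):
    """pods: iterable of (uid, name, namespace) tuples.

    processes_by_uid: {uid: {pid: {'rss': ..., 'name': ...}}}.
    Returns a list of (level, message) pairs.
    """
    messages = []
    for uid, name, namespace in pods:
        processes = processes_by_uid.get(uid)
        if processes is None:
            # UID could not be resolved: say which pod and namespace.
            messages.append(
                ("warning", uid_not_found_message(uid, name, namespace)))
        elif namespace not in PLATFORM_NAMESPACES:
            # Pod outside the platform namespaces: list its processes.
            messages.append(
                ("info", other_pod_message(name, namespace, processes)))
    return messages


pods = [
    ("u1", "cm-cert-manager-55659b97c7-w52bq", "cert-manager"),
    ("u2", "example-system-pod", "kube-system"),
    ("u3", "example-missing-pod", "default"),
]
processes_by_uid = {
    "u1": {95662: {"rss": 55248.0, "name": "controller"}},
    "u2": {100: {"rss": 4.0, "name": "pause"}},
}
for level, message in build_messages(pods, processes_by_uid):
    print(level, message)
```

Inside an actual plugin, the pairs would presumably be passed to `collectd.warning` and `collectd.info` rather than printed.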

Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Low
assignee: nobody → Cesar Bombonate (cpompeud)
tags: added: stx.9.0 stx.monitor