Comment 4 for bug 1975520

OpenStack Infra (hudson-openstack) wrote : Fix merged to metal (master)

Reviewed: https://review.opendev.org/c/starlingx/metal/+/843139
Committed: https://opendev.org/starlingx/metal/commit/aaf9d080289b63a2d90fc9874b7bc91524bed0a1
Submitter: "Zuul (22348)"
Branch: master

commit aaf9d080289b63a2d90fc9874b7bc91524bed0a1
Author: Eric MacDonald <email address hidden>
Date: Tue May 24 12:10:06 2022 +0000

    Mtce: Fix bmc password fetch error handling

    The mtcAgent process sometimes segfaults while trying to fetch
    the bmc password from a failing barbican process.

    With that issue fixed, the mtcAgent sends the bmc access
    credentials to the hardware monitor (hwmond) process, which
    then segfaults for a similar reason.

    In cases where the process does not segfault but also does not
    get a bmc password, the mtcAgent will flood its log file.

    This update:

     1. Prevents the segfault case by properly managing the release
        of acquired json-c objects. There was one such defect in the
        mtcAgent and another in the hardware monitor (hwmond).

        The json_object_put object release api should only be called
        against objects that were created with very specific apis.
        See new comments in the code.

     2. Avoids the log flooding error case by performing a password
        size check rather than assuming the password is valid after
        the secret payload receive stage.

     3. Simplifies the secret fsm and its error and retry handling.

     4. Removes the needless creation and release of a few unused json
        objects in the common jsonUtil and hwmonJson modules.

    Note: This update temporarily disables sensor and sensorgroup
          suppression support for the debian hardware monitor while
          a suppression type fix in sysinv is being investigated.

    Test Plan:

    PASS: Verify success path bmc password secret fetch
    PASS: Verify secret reference get error handling
    PASS: Verify secret password read error handling
    PASS: Verify 24 hr provision/deprov success path soak
    PASS: Verify 24 hr provision/deprov error path soak
    PASS: Verify no memory leak over success and failure path soaking
    PASS: Verify failure handling stress soak with reduced retry delay
    PASS: Verify blocking secret fetch success and error handling
    PASS: Verify non-blocking secret fetch success and error handling
    PASS: Verify secret fetch is set non-blocking
    PASS: Verify success and failure path logging
    PASS: Verify the entire jsonUtil module manages object release properly
    PASS: Verify hardware monitor sensor model creation, monitoring,
                 alarming and relearning. This test requires suppression
                 to be disabled in order to create sensor groups in debian.
    PASS: Verify both ipmi and redfish, and switching between them with
                 just a bm_type change.
    PASS: Verify all above tests in CentOS
    PASS: Verify over 4000 provision/deprovision cycles across both
                 failure and success path handling with no process
                 failures

    Closes-Bug: 1975520
    Signed-off-by: Eric MacDonald <email address hidden>
    Change-Id: Ibbfdaa1de662290f641d845d3261457904b218ff