sysinv-conductor can lose pending hugepage count data

Bug #1882044 reported by Don Penney
Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  Medium      Don Penney

Bug Description

Brief Description
-----------------
A race condition exists during initial AIO configuration in which sysinv-conductor can lose the hugepage _pending counts between the initial sysinv configuration and the allocation of hugepages after the initial unlock. When a non-zero count has been requested, a query during this window can return a nil pending value with a pagecount of 0 (the pagecount being the number of hugepages actually configured), rather than continuing to show the non-zero pending value until the memory allocation is complete.

Severity
--------
Minor

Steps to Reproduce
------------------
On an AIO-SX system, configure a non-zero hugepage count. After the unlock, query the system memory data in the window that starts when sysinv-conductor launches, spans the memory allocation performed by the worker puppet manifest, and ends when the sysinv-agent audit updates sysinv-conductor with the newly configured hugepages.

Expected Behavior
-----------------
sysinv-conductor should continue to report a non-zero pending value until its hugepage count is updated.

Actual Behavior
---------------
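sysinv-conductor clears the pending value on every update from sysinv-agent while the admin state is unlocked, whether or not the hugepages have been allocated yet. The sysinv-agent audit can send this update multiple times before hugepage allocation, so the pending value can be lost well before the allocation completes.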
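
For illustration only, the reported behavior can be modeled with a small standalone sketch. This is not the sysinv code: the record layout, the field names `hugepages_nr` and `hugepages_nr_pending`, and the function name are hypothetical stand-ins for the conductor's memory record and its update path.

```python
# Hypothetical model of the reported behavior; not the actual sysinv code.
# "hugepages_nr" stands in for the configured pagecount and
# "hugepages_nr_pending" for the requested (_pending) count.

def apply_agent_update_unconditional(record, reported_count, unlocked=True):
    """Clear the pending value on any agent update while unlocked."""
    updated = dict(record)
    updated["hugepages_nr"] = reported_count
    if unlocked:
        # No check that the request has actually been satisfied.
        updated["hugepages_nr_pending"] = None
    return updated


# 512 pages requested, none allocated yet.
record = {"hugepages_nr": 0, "hugepages_nr_pending": 512}

# The agent audit reports before the puppet manifest has allocated anything.
after_audit = apply_agent_update_unconditional(record, reported_count=0)
print(after_audit)
```

In this model a query after the early audit sees a pagecount of 0 and no pending value, which matches the symptom described in the report.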

Reproducibility
---------------
Intermittent

System Configuration
--------------------
AIO

Branch/Pull Time/Commit
-----------------------
master branch, June 2, 2020

Last Pass
---------
n/a

Test Activity
-------------
Sanity

Don Penney (dpenney)
Changed in starlingx:
assignee: nobody → Don Penney (dpenney)
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/733523
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=70f4967b435a9577226022b183f921c007af1acd
Submitter: Zuul
Branch: master

commit 70f4967b435a9577226022b183f921c007af1acd
Author: Don Penney <email address hidden>
Date: Thu Jun 4 05:06:25 2020 -0400

    Fix sysinv hugepage pending value handling in memory updates

    A race condition exists during initial AIO configuration where
    sysinv-conductor can lose the hugepage _pending counts, between the
    initial sysinv configuration and the allocation of hugepages after the
    initial unlock. A query during this window can result in a nil pending
    value with pagecount of 0 (where the pagecount is actual configured
    hugepages), when a non-zero count has been requested, rather than
    continuing to show a non-zero pending value until the memory
    allocation is complete.

    This commit updates the imemory_update_by_ihost function to check
    whether the pending value aligns with the configured hugepage count,
    prior to clearing it.

    Change-Id: I94fee45be2c3bdb2b7b80f9badcec4ff3f6fbbf8
    Closes-bug: 1882044
    Signed-off-by: Don Penney <email address hidden>
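
The check described in the commit message can be sketched roughly as follows. This is a minimal standalone model, not the body of imemory_update_by_ihost: the dictionary layout, the field names `hugepages_nr` and `hugepages_nr_pending`, and the helper name are assumptions made for the example, and the actual change may compare different fields.

```python
# Hypothetical sketch of the guarded update; not the merged sysinv change.

def apply_agent_update_guarded(record, reported_count, unlocked=True):
    """Clear the pending value only once the reported count matches it."""
    updated = dict(record)
    updated["hugepages_nr"] = reported_count
    pending = updated.get("hugepages_nr_pending")
    if unlocked and pending is not None and pending == reported_count:
        # The requested count is now configured, so pending can be dropped.
        updated["hugepages_nr_pending"] = None
    return updated


record = {"hugepages_nr": 0, "hugepages_nr_pending": 512}

# Audits that arrive before allocation leave the pending value in place.
early = apply_agent_update_guarded(record, reported_count=0)
early = apply_agent_update_guarded(early, reported_count=0)

# After allocation the agent reports the requested count and pending clears.
final = apply_agent_update_guarded(early, reported_count=512)
print(early, final)
```

With this guard, repeated audit updates sent before allocation no longer discard the requested count, which is the behavior the Expected Behavior section asks for.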

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
tags: added: stx.4.0 stx.config
Changed in starlingx:
importance: Undecided → Medium