sm and sm-db fail to build

Bug #1917527 reported by Scott Little
This bug affects 1 person
Affects      Status        Importance  Assigned to  Milestone
StarlingX    Fix Released  Critical    Bin Qian

Bug Description

Brief Description
-----------------
The build fails on packages sm and sm-db.
sm has a pre-existing BuildRequires on sm-db-dev.
A recent update added #include "sm_failover_utils.h" to sm-db, but did
not add a BuildRequires on sm, the package that provides that header file,
so the build fails. Adding that BuildRequires would not solve the problem:
sm and sm-db would then require each other at build time, forming an
unresolvable circular dependency.
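To make the circular dependency concrete, here is a minimal Python sketch that models the two packages' build dependencies as a directed graph and looks for a cycle with a depth-first search. The `BUILD_REQUIRES` mapping and the `find_cycle` helper are hypothetical illustrations, not part of build-pkgs; treating the requirement on sm-db-dev as an edge to its source package sm-db is an assumption made for simplicity.

```python
from typing import Dict, List, Optional

# Hypothetical model: source package -> source packages it needs at build time.
# Mapping "sm-db-dev" to its source package "sm-db" is an assumption.
BUILD_REQUIRES: Dict[str, List[str]] = {
    "sm": ["sm-db"],  # existing: sm BuildRequires sm-db-dev
    "sm-db": ["sm"],  # proposed: sm-db would need sm for sm_failover_utils.h
}


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of packages, or None.

    Depth-first search with three colours: WHITE (unvisited), GREY (on the
    current DFS path) and BLACK (fully explored). Reaching a GREY node means
    a back edge, i.e. a cycle.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for dep in graph.get(node, []):
            state = colour.get(dep, WHITE)
            if state == GREY:
                # Back edge: the cycle runs from dep along the path back to dep.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for start in graph:
        if colour[start] == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    print(find_cycle(BUILD_REQUIRES))
```

With both edges present the search reports the loop sm -> sm-db -> sm; with only the pre-existing edge (`{"sm": ["sm-db"], "sm-db": []}`) it returns None, matching the state before the offending update.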

Severity
--------
Critical

Steps to Reproduce
------------------
build-pkgs --clean
build-pkgs

Expected Behavior
------------------
build passes

Actual Behavior
----------------
build fails

Reproducibility
---------------
100%

System Configuration
--------------------
N/A

Branch/Pull Time/Commit
-----------------------
March 2 2021
Commit f39ca95924a0a44dc287c1a560fa9f6f52cdea51 vs 'ha'

Last Pass
---------
March 1 2021

Timestamp/Logs
--------------
15:41:18 Failed to build packages: sm-1.0.0-39.tis.src.rpm sm-db-1.0.0-40.tis.src.rpm

--------------
$MY_WORKSPACE/std/results/*/sm-db-1.0.0-40.tis/build.log:
...
sm_db_foreach.c:12:31: fatal error: sm_failover_utils.h: No such file or directory
 #include "sm_failover_utils.h"
                               ^
compilation terminated.

--------------
$MY_WORKSPACE/std/results/*/sm-1.0.0-39.tis/root.log:
...
DEBUG util.py:446: Error: No Package found for sm-db-dev

Test Activity
-------------
Build

Workaround
----------
revert the commit

Ghada Khalil (gkhalil)
tags: added: stx.5.0 stx.ha
Changed in starlingx:
assignee: nobody → Bin Qian (bqian20)
importance: Undecided → Critical
status: New → Triaged
Ghada Khalil (gkhalil) wrote :

stx.5.0 / critical - recent commit causing build failure

Ghada Khalil (gkhalil) wrote :
Changed in starlingx:
status: Triaged → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ha (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/ha/+/792251

OpenStack Infra (hudson-openstack) wrote : Fix merged to ha (f/centos8)

Reviewed: https://review.opendev.org/c/starlingx/ha/+/792251
Committed: https://opendev.org/starlingx/ha/commit/85bab5d2b394114feabe524504339a55eb8904e0
Submitter: "Zuul (22348)"
Branch: f/centos8

commit 9f70df63fd0d83bf0f94d1b9ac2f98516d5971c8
Author: Bin Qian <email address hidden>
Date: Fri May 7 16:36:23 2021 -0400

    Fix no swact for failure of critical services

    This fix ensures that service failures continue to be counted across
    a successful audit.

    When the service enabled audit completes successfully, SM resets the
    service failure state. However, it should not reset the service
    fail-count, which should only be reset after the grace period.

    Closes-Bug: 1893669
    Change-Id: I6996fe3f1c08c38da6f26243aee2b95b083069f0
    Signed-off-by: Bin Qian <email address hidden>
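The counting rule described in the commit above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `ServiceFailureTracker` class, its field names, and checking the grace period inside the audit handler are all hypothetical and do not reflect SM's actual C data structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceFailureTracker:
    """Hypothetical model of per-service failure bookkeeping (not SM's real code)."""

    grace_period_s: float                   # assumed: quiet time before the count is forgiven
    failed: bool = False                    # current failure state
    fail_count: int = 0                     # failures accumulated within the grace period
    last_failure_at: Optional[float] = None

    def on_failure(self, now: float) -> None:
        self.failed = True
        self.fail_count += 1
        self.last_failure_at = now

    def on_audit_success(self, now: float) -> None:
        # A successful enabled audit clears the failure *state* ...
        self.failed = False
        # ... but the fail-count survives; it is only reset once the
        # grace period has elapsed since the last recorded failure.
        if (self.last_failure_at is not None
                and now - self.last_failure_at >= self.grace_period_s):
            self.fail_count = 0
            self.last_failure_at = None
```

If `on_audit_success` also zeroed `fail_count` (the behavior the commit removes), repeated fail/recover cycles would presumably never accumulate enough failures to trigger the swact the commit title refers to.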

commit 0b99b594f83b7c626cc0c4f7dc970ce373a7b748
Author: Bin Qian <email address hidden>
Date: Tue May 4 11:33:43 2021 -0400

    Fix AIO-DX failover issues

    This fixes the following unexpected AIO failover behaviors:
    1. active controller reboots itself when standby controller
       reboot/lost power
    2. standby controller becomes degraded after active controller
       reboot/lost power

    Closes-bug: 1927133
    Change-Id: If3c9f6251f689a89cd206c672092ba296f00bd6b
    Signed-off-by: Bin Qian <email address hidden>

commit cb5fa9510f3ebda66f9850ac697e542bf041ce8c
Author: Eric MacDonald <email address hidden>
Date: Tue Apr 27 09:43:00 2021 -0400

    Remove hbsAgent restart in failover failure recovery handling

    A forced reboot of the active controller in an AIO DC system
    puts SM into a failover failure recovery loop that prevents
    maintenance from detecting the heartbeat failure of the just-
    rebooted controller.

    The SM's failover failure recovery handling algorithm includes
    a self (sm process) restart preceded by a restart of the
    hbsAgent, both added by the following update last year.

    update: Add unhealthy state recovery audit to service management (sm)
    review: https://review.opendev.org/c/starlingx/ha/+/735219

    The self restart of SM was and is required in this case. However,
    the restart of the hbsAgent was only included as a safety measure,
    at the time, to ensure SM received updated cluster state info. The
    hbsAgent restart was only added at that time with the longer term
    intention to have it removed once the hbsAgent cluster state change
    notification improvement was implemented. That change is now
    implemented and merged by the following update.

    update: Mtce heartbeat cluster state change notification improvement
    review: https://review.opendev.org/c/starlingx/metal/+/769936

    Testing of the fix for the following issue in an AIO DC system
    resulted in the takeover controller not detecting a heartbeat loss
    of the just rebooted standby controller.

    title: Force active controller reboot results in a second reboot
    issue: https://bugs.launchpad.net/starlingx/+bug/1922584

    The hbsAgent is not able to detect the heartbeat loss of the just-
    booted controller because SM keeps re...

tags: added: in-f-centos8