Ansible: sysinv.conf not populated in /opt/platform/... on initial install

Bug #1829004 reported by Allain Legacy
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Ovidiu Poncea

Bug Description

Brief Description
-----------------
After installation with Ansible and unlocking controller-0, the sysinv.conf configuration values must be copied up to /opt/platform/sysinv/19.01/sysinv.conf.default in a deterministic way, so that regardless of when a new node is provisioned following the unlock of controller-0, it will be able to pull down a valid copy of the sysinv.conf file.
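
As a rough illustration of the intent (a minimal sketch only; the runtime path /etc/sysinv/sysinv.conf is an assumption and this is not the actual StarlingX fix), the defaults file simply needs to be refreshed from the live configuration once controller-0 has been configured:

    import shutil

    # Assumed runtime location; the platform path is the one quoted in this report.
    RUNTIME_CONF = '/etc/sysinv/sysinv.conf'
    PLATFORM_CONF = '/opt/platform/sysinv/19.01/sysinv.conf.default'

    def persist_sysinv_conf():
        """Copy the live sysinv configuration into the versioned platform
        directory so nodes provisioned later pull valid values instead of
        the localhost defaults."""
        shutil.copy2(RUNTIME_CONF, PLATFORM_CONF)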

Tee already has a fix in progress for this issue, but it is not being tracked by a bug, and since people are being asked to test with Ansible to ensure no regressions are introduced, this needs to be tracked by a bug for better visibility. Teresa ran into this issue this morning and was unable to make progress with her system installation.

Severity
--------
Critical, unable to install and configure any nodes other than controller-0.

Steps to Reproduce
------------------
Install a system using Ansible, configure controller-0 and unlock it, then configure all other nodes as per normal procedures. After the unlock, observe that the
/opt/platform/sysinv/19.01/sysinv.conf.default file contains "localhost" in the rabbit attributes.

[wrsroot@controller-0 ~(keystone_admin)]$ cat /opt/platform/sysinv/19.01/sysinv.conf.default|grep rabbit_host
rabbit_host=localhost
rabbit_hosts=localhost:5672
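
A quick way to catch this before provisioning other nodes (a hypothetical check, not part of the report) is to verify that the rabbit attributes in the copied file no longer carry the localhost defaults:

    def rabbit_points_at_controller(
            path='/opt/platform/sysinv/19.01/sysinv.conf.default'):
        """Return False if any rabbit_host/rabbit_hosts line still uses localhost."""
        with open(path) as conf:
            for line in conf:
                key = line.split('=', 1)[0].strip()
                if key in ('rabbit_host', 'rabbit_hosts') and 'localhost' in line:
                    return False
        return True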

Expected Behavior
------------------
/opt/platform/sysinv/19.01/sysinv.conf.default should contain valid values that when copied to newly installed nodes will enable their sysinv-agent process to report inventory to the active controller.

Actual Behavior
----------------
/opt/platform/sysinv/19.01/sysinv.conf.default contains default values that cannot be used by the sysinv-agent process running on the remote nodes to communicate with the active controller.
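
To see why the defaults are unusable: the agent's AMQP transport URL is derived from rabbit_host, so a localhost value on a worker or standby node points the agent at its own (nonexistent) broker instead of the active controller. A hypothetical illustration (the helper and credentials below are assumptions, not sysinv code):

    def transport_url(rabbit_host, rabbit_port=5672,
                      user='guest', password='guest'):
        """Build an AMQP transport URL from the rabbit settings."""
        return 'rabbit://%s:%s@%s:%d' % (user, password, rabbit_host, rabbit_port)

    # With the bad default the agent talks to itself instead of controller-0:
    transport_url('localhost')  # -> 'rabbit://guest:guest@localhost:5672'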

Reproducibility
---------------
Not sure. Seems like 100% in my testing.

System Configuration
--------------------
Standard and AIO-DX

Branch/Pull Time/Commit
-----------------------
BUILD_ID="2019-05-12_20-18-02"

Last Pass
---------
unknown

Timestamp/Logs
--------------
n/a

Test Activity
-------------
Developer testing

Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Tee Ngo (teewrs)
summary: - sysinv.conf not populated in /opt/platform/... on initial install
+ Ansible: sysinv.conf not populated in /opt/platform/... on initial
+ install
Revision history for this message
Tee Ngo (teewrs) wrote :

The generic fix for this issue has been merged: https://opendev.org/starlingx/config/commit/1e56cdca940b7df1d407e37505d7f2fc38e1341f

However, the way Ceph OSDs are configured on All-in-one systems has a side effect which causes an issue on AIO-DX.

Assigning to Ovidiu, who has been working on the user story to make Ceph the default storage backend, for resolution.

Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: Tee Ngo (teewrs) → Ovidiu Poncea (ovidiu.poncea)
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as release gating; issue related to new ansible feature

tags: added: stx.2.0 stx.config
Changed in starlingx:
importance: Undecided → High
status: New → Triaged
Revision history for this message
Ovidiu Poncea (ovidiuponcea) wrote :

This is related to https://bugs.launchpad.net/starlingx/+bug/1828271; once that fix is merged, it should fix this issue too. Tee merged the change required to fix this. I'll test both issues when testing the fix for 1828271.

Changed in starlingx:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/658391
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=9720932899b69287871a419422880f04d618286f
Submitter: Zuul
Branch: master

commit 9720932899b69287871a419422880f04d618286f
Author: Ovidiu Poncea <email address hidden>
Date: Fri May 10 17:46:27 2019 +0300

    Fix missing reboot flag for config uuid on unlock

    Due to a limitation in config uuid functionality, on first unlock
    of controller-0 the node remains config out-of-date as we lose
    the reboot flag.

    Example output after unlock:
    $ system host-show controller-0 | grep config
    | config_applied | 62228cc1-e5da-4f2e-a3c3-c468e9a46fb5 |
    | config_status | Config out-of-date |
    | config_target | e2228cc1-e5da-4f2e-a3c3-c468e9a46fb5 |

    The reboot flag is:
    CONFIG_REBOOT_REQUIRED = (1 << 127)

    We set config_target through sysinv and config_applied
    through puppet once manifests have applied. If the reboot
    flag in config_target is set but not in config_applied we are
    "Config out-of-date".

    On host-unlock or runtime manifest apply we set config_uuid in
    hieradata to e.g.:
    platform::config::params::config_uuid: \
       62228cc1-e5da-4f2e-a3c3-c468e9a46fb5

    Then, after runtime manifest apply or after reboot, sysinv-agent
    takes this value and updates config_applied.

    A config uuid with the reboot flag is passed to puppet ONLY when
    the host is unlocked (which makes sense, as this is when we do
    the reboot). Runtime manifests don't pass the reboot flag to
    puppet (it is a runtime apply; the reboot flag has to remain).
    So, in our case, the flag is correctly set at unlock, but then
    sysinv does a runtime manifest apply and resets the uuid to a
    value w/o the reboot flag. Therefore, the reboot flag is no
    longer set, which is why even after unlock we still have
    Config out-of-date.

    To fix the issue we generate a new config_uuid with the reboot
    flag set and we properly send it to puppet as the last operation
    we attempt before reboot.

    Change-Id: I12865d45f4456de81d72689f799441531a444bea
    Closes-Bug: #1828271
    Closes-Bug: #1829004
    Closes-Bug: #1829260
    Signed-off-by: Ovidiu Poncea <email address hidden>
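
To make the encoding described in the commit concrete: CONFIG_REBOOT_REQUIRED occupies the most significant bit of the 128-bit config uuid, which is why the applied value 62228cc1-... differs from the target e2228cc1-... only in its first hex digit. A minimal sketch (the helper names are illustrative, not taken from the sysinv code):

    import uuid

    CONFIG_REBOOT_REQUIRED = 1 << 127  # most significant bit of the 128-bit uuid

    def set_reboot_flag(config_uuid):
        """Return the config uuid with the reboot-required bit set."""
        return str(uuid.UUID(int=uuid.UUID(config_uuid).int | CONFIG_REBOOT_REQUIRED))

    def clear_reboot_flag(config_uuid):
        """Return the config uuid with the reboot-required bit cleared."""
        return str(uuid.UUID(int=uuid.UUID(config_uuid).int & ~CONFIG_REBOOT_REQUIRED))

    # Values from the commit message example:
    applied = '62228cc1-e5da-4f2e-a3c3-c468e9a46fb5'
    target = set_reboot_flag(applied)
    assert target == 'e2228cc1-e5da-4f2e-a3c3-c468e9a46fb5'
    assert applied != target  # host reports "Config out-of-date" until they match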

Changed in starlingx:
status: In Progress → Fix Released