Simplex upgrade corrupts /etc/platform/platform.conf file

Bug #1891935 reported by Bart Wensley
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Yuxing

Bug Description

Brief Description
-----------------
When an AIO-SX (simplex) upgrade is performed, the /etc/platform/platform.conf file is corrupted: many of the key/value pairs in this file are written multiple times. For example:

[root@controller-0 log(keystone_admin)]# cat /etc/platform/platform.conf
nodetype=controller
subfunction=controller,worker
system_type=All-in-one
security_profile=standard
http_port=8080
INSTALL_UUID=5b767715-9d6c-4a6f-8958-e38ce744e73c
UUID=4fac1c66-6ad0-40e4-965b-f2786f4016d9
region_2_name=subcloud3
sdn_enabled=no
region_config=yes
distributed_cloud_role=subcloud
system_mode=simplex
region_1_name=subcloud3
sw_version=20.06
security_feature="nopti nospectre_v2 nospectre_v1"
vswitch_type=none
management_interface=vlan240
region_config=True
security_profile=standard
http_port=8080
region_2_name=subcloud3
sdn_enabled=no
region_config=yes
distributed_cloud_role=subcloud
system_mode=simplex
region_1_name=subcloud3
security_feature="nopti nospectre_v2 nospectre_v1"
vswitch_type=none
region_config=True
cluster_host_interface=vlan241
oam_interface=eno1
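
A quick way to confirm the corruption on an upgraded system is to list the keys that occur more than once. The following is a minimal sketch, not part of the fix; the function name find_duplicate_keys and the sample text are illustrative, and it assumes the file contains only simple key=value lines as in the listing above.

```python
from collections import defaultdict


def find_duplicate_keys(conf_text):
    """Return {key: [values...]} for every key that appears more than once."""
    seen = defaultdict(list)
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        seen[key.strip()].append(value.strip())
    return {key: values for key, values in seen.items() if len(values) > 1}


# Shortened sample based on the listing in this report.
SAMPLE = """\
system_mode=simplex
region_config=yes
http_port=8080
region_config=True
http_port=8080
oam_interface=eno1
"""

duplicates = find_duplicate_keys(SAMPLE)
for key, values in sorted(duplicates.items()):
    print(key, values)
```

On a real system the text would come from reading /etc/platform/platform.conf. Note that region_config appears with two different values (yes and True), so which value a consumer sees depends on how it parses the file.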

Additionally, while this file is being updated (and during database migrations), the new sysinv-fpga-agent process is left running, which could cause problems if it accesses the file (or the database) before the data migration is complete.

Severity
--------
Major: We haven't seen any specific breakages yet, but there is a risk that this will cause issues on an AIO-SX system that has been upgraded.

Steps to Reproduce
------------------
Upgrade an AIO-SX from one release to another (e.g. stx.4.0 to stx.5.0).

Expected Behavior
------------------
The /etc/platform/platform.conf file should be migrated from the old release to the new, keeping entries, updating entries and adding new entries as appropriate. Any services that could be using this file should be stopped before the file is updated.
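
The expected merge can be sketched as follows. This is an illustration of the behavior described above, not the actual migrate_platform_conf code; the names parse_conf, merge_platform_conf and keep_new are hypothetical, and the sample values are made up for the example.

```python
def parse_conf(text):
    """Parse key=value lines into a dict; a later duplicate overwrites an earlier one."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        entries[key.strip()] = value.strip()
    return entries


def merge_platform_conf(old_text, new_text, keep_new=frozenset()):
    """Merge old-release entries into the new-release file, one line per key.

    keep_new is a hypothetical set of keys whose new-release value must win;
    every other key present in the old file keeps its old-release value.
    """
    merged = parse_conf(new_text)  # start from the freshly installed file
    for key, value in parse_conf(old_text).items():
        if key in merged and key in keep_new:
            continue  # the new release owns this key
        merged[key] = value  # update an existing key or add a missing one
    return "".join(f"{key}={value}\n" for key, value in merged.items())


old_conf = "sw_version=20.06\nsystem_mode=simplex\nregion_config=yes\n"
new_conf = "sw_version=21.05\nsystem_mode=duplex\n"
result = merge_platform_conf(old_conf, new_conf, keep_new={"sw_version"})
print(result, end="")
```

Because the merge goes through a dictionary rather than appending lines, each key can appear only once in the output, however many times the migration is run.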

Actual Behavior
----------------
The file ends up with duplicate entries. The sysinv-fpga-agent service is not stopped.

Reproducibility
---------------
Reproducible

System Configuration
--------------------
AIO-SX (One node system)

Branch/Pull Time/Commit
-----------------------
Seen in stx.4.0 load.

Last Pass
---------
Never

Timestamp/Logs
--------------
N/A

Test Activity
-------------
Feature Testing

Workaround
----------
None

tags: added: stx.update
Changed in starlingx:
assignee: nobody → David Sullivan (dsullivanwr)
Ghada Khalil (gkhalil) wrote :

stx.5.0 / medium - should be cleaned up as part of further upgrade framework support

Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
tags: added: stx.5.0
Changed in starlingx:
assignee: David Sullivan (dsullivanwr) → Yuxing (yuxing)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/762018

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)

Fix proposed to branch: master
Review: https://review.opendev.org/762554

OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/762554
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/dd3e8d1106878896eea3458ef68acb1e1f562249
Submitter: Zuul
Branch: master

commit dd3e8d1106878896eea3458ef68acb1e1f562249
Author: Yuxing Jiang <email address hidden>
Date: Thu Nov 12 12:09:15 2020 -0500

    Stop sysinv-fpga-agent during data migration

    This commit stops the sysinv-fpga-agent service before restoring the
    postgres data and platform data, and restarts this service after the
    migration finishes. As the sysinv-fpga-agent service does not run on
    standard (non-AIO) controllers, the stop and restart of this service
    only apply to AIO systems.

    Change-Id: Ic39a09ff757806a3c4226cbe1ea2057c875c5805
    Partial-Bug: 1891935
    Signed-off-by: Yuxing Jiang <email address hidden>
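
The stop/migrate/restart ordering described in this commit can be sketched in Python as below. The real change lives in ansible-playbooks; this is only an illustration of the pattern. The helper names systemctl and service_stopped are hypothetical, and the assumption that the service is controlled through systemctl is not confirmed by this report.

```python
import subprocess
from contextlib import contextmanager


def systemctl(action, unit):
    """Run `systemctl <action> <unit>` and raise if the command fails."""
    subprocess.run(["systemctl", action, unit], check=True)


@contextmanager
def service_stopped(unit, run=systemctl):
    """Stop `unit` for the duration of the with-block, then start it again."""
    run("stop", unit)
    try:
        yield
    finally:
        # Start the service again even if the migration step raised.
        run("start", unit)


# Dry run with a recording stand-in for systemctl, so nothing is executed.
calls = []
with service_stopped("sysinv-fpga-agent", run=lambda action, unit: calls.append((action, unit))):
    calls.append(("migrate", "platform.conf"))  # placeholder for the data migration
print(calls)
```

Passing the runner as a parameter keeps the ordering testable without touching a live system; on an AIO host the default systemctl helper would be used instead.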

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/762018
Committed: https://opendev.org/starlingx/config/commit/98f5d692151d88df46470b162fb2ceeade1dca69
Submitter: Zuul
Branch: master

commit 98f5d692151d88df46470b162fb2ceeade1dca69
Author: Yuxing Jiang <email address hidden>
Date: Mon Nov 9 19:37:59 2020 -0500

    Merges key/value pairs in platform.conf

    The migrate_platform_conf method inserts multiple copies of many
    key/value pairs from the backup file into the newly installed file,
    which results in duplicated values for a given key. This commit
    updates/preserves the values of the keys to ensure every key has a
    unique value.

    Tested by building a new ISO and upgrading an AIO-SX with the new
    ISO; the upgrade activated and completed successfully. In
    /etc/platform/platform.conf, every key has a unique value. The values
    of the keys listed in the "skip_options" list equal their values
    from before the upgrade.

    Closes-bug: 1891935
    Depends-on: https://review.opendev.org/#/c/762554/
    Change-Id: I250f49d8ce4f2d06d068017c46ac8bd0c08b69be
    Signed-off-by: Yuxing Jiang <email address hidden>
