DC Scale lab | 250.001 alarm raised after powering on subcloud

Bug #2026759 reported by Karla Felix
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Karla Felix

Bug Description

Brief Description

A 250.001 alarm (controller-0 Configuration is out-of-date) was raised on some subclouds after they were powered back on.

Severity

Major.

Steps to Reproduce

    Deploy 1000 virtual subclouds
    Install WRA on the System Controller and on the subclouds
    Power off all the subclouds
    Power on all the subclouds

Expected Behavior

All 1000 subclouds are back online without any alarms.

Actual Behavior

Some subclouds entered a degraded state and raised the 250.001 alarm.

Reproducibility

Run 1: 1 out of 1000 subclouds

Run 2: 5 out of 1000 subclouds

System Configuration

Distributed Cloud (DC1000-2)

Load info (e.g. 2022-03-10_20-00-07)

SW_VERSION="22.12"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="2022-11-29_22-00-05"
SRC_BUILD_ID="11"

Last Pass

NA

Timestamp/Logs

// Collect All

System Controller: /folk/cgts_logs/CGTS-41723/ALL_NODES_20221207.123222.tar
Subcloud47: /folk/cgts_logs/CGTS-41723/subcloud47_20221207.122123.tar

// Subcloud47 degraded
$ dcmanager alarm summary | grep -v OK
+--------------+-----------------+--------------+--------------+----------+----------+
| NAME         | CRITICAL_ALARMS | MAJOR_ALARMS | MINOR_ALARMS | WARNINGS | STATUS   |
+--------------+-----------------+--------------+--------------+----------+----------+
| subcloud47   | 0               | 1            | 0            | 0        | degraded |
+--------------+-----------------+--------------+--------------+----------+----------+
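When hundreds of subclouds are involved, it helps to pull the non-OK entries out of the summary table programmatically rather than by eye. The following is a minimal sketch, assuming the pipe-separated ASCII layout shown above (header row first, "+---" border lines); the functions parse_alarm_summary and non_ok_subclouds are hypothetical helpers, not part of dcmanager.

```python
"""Hypothetical parser for `dcmanager alarm summary` table output."""
from typing import Dict, List


def parse_alarm_summary(output: str) -> List[Dict[str, str]]:
    """Return one dict per data row, keyed by the table's header names."""
    rows: List[Dict[str, str]] = []
    header = None
    for line in output.splitlines():
        line = line.strip()
        # Skip '+-----+' border lines and anything that is not a table row.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first pipe-delimited row holds the column names
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def non_ok_subclouds(output: str) -> List[str]:
    """Names of subclouds whose STATUS column is anything other than 'OK'."""
    return [row["NAME"] for row in parse_alarm_summary(output)
            if row.get("STATUS") != "OK"]
```

Feeding it the captured output of the command (without the grep filter) returns the names of degraded subclouds, such as subcloud47 here, which can then be checked individually with fm alarm-list.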

// 250.001 alarm

$ fm alarm-list
+----------+----------------------------------------------------------------------------------------------------------------------+-------------------+----------+----------------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+----------------------------------------------------------------------------------------------------------------------+-------------------+----------+----------------------+
| 250.001 | controller-0 Configuration is out-of-date. (applied: 0db3c488-8b9d-4a73-ab13-6d7e2f1ebbe0 target: | host=controller-0 | major | 2022-12-07T09:45:16. |
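| | 1e7583c2-2e50-4272-8800-f3c35e7f70ce) | | | 351124 |
+----------+----------------------------------------------------------------------------------------------------------------------+-------------------+----------+----------------------+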

[sysadmin@controller-0 ~(keystone_admin)]$ date
Wed 07 Dec 2022 12:18:01 PM UTC

// target UUID grep in sysinv.log
$ grep "1e7583c2-2e50-4272-8800-f3c35e7f70ce" /var/log/sysinv.log
sysinv 2022-12-07 09:45:16.175 73386 INFO sysinv.conductor.manager [-] _config_update_hosts personalities=['controller'] host_uuids=['b8ed3bbe-5d82-49e1-b61b-d488e520518c'] reboot=False config_uuid=1e7583c2-2e50-4272-8800-f3c35e7f70ce tb= File "/usr/lib/python3/dist-packages/sysinv/conductor/manager.py", line 5826, in _controller_config_active_apply
sysinv 2022-12-07 09:45:16.224 73386 INFO sysinv.conductor.manager [-] Setting config target of host 'controller-0' to '1e7583c2-2e50-4272-8800-f3c35e7f70ce'.
sysinv 2022-12-07 09:45:16.347 73386 WARNING sysinv.conductor.manager [-] controller-0: iconfig out of date: target 1e7583c2-2e50-4272-8800-f3c35e7f70ce, applied 0db3c488-8b9d-4a73-ab13-6d7e2f1ebbe0
sysinv 2022-12-07 09:45:16.349 73386 WARNING sysinv.conductor.manager [-] SYS_I Raise system config alarm: host controller-0 config applied: 0db3c488-8b9d-4a73-ab13-6d7e2f1ebbe0 vs. target: 1e7583c2-2e50-4272-8800-f3c35e7f70ce.
sysinv 2022-12-07 09:45:16.439 73386 INFO sysinv.conductor.manager [-] _config_update_hosts config_uuid=1e7583c2-2e50-4272-8800-f3c35e7f70ce
sysinv 2022-12-07 09:45:16.470 73386 INFO sysinv.conductor.manager [-] applying runtime manifest config_uuid=1e7583c2-2e50-4272-8800-f3c35e7f70ce, classes: ['openstack::keystone::endpoint::runtime', 'platform::firewall::runtime']
sysinv 2022-12-07 09:45:16.553 73386 INFO sysinv.puppet.puppet [-] Updating hiera for host: controller-0 with config_uuid: 1e7583c2-2e50-4272-8800-f3c35e7f70ce
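The log shows the sequence behind the alarm: sysinv sets a new config target UUID for controller-0, finds that the applied UUID still differs from it, logs "iconfig out of date" and raises 250.001. To scan many subcloud logs for this condition, the warning can be matched with a regular expression. This is a minimal sketch, assuming the message format seen in this excerpt stays the same; find_out_of_date is a hypothetical helper, not a sysinv API.

```python
"""Hypothetical scanner for sysinv 'iconfig out of date' warnings."""
import re
from typing import List, Tuple

# Lower-case canonical UUID, as printed in sysinv.log.
_UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Assumed format, copied from the warning in this report:
# "controller-0: iconfig out of date: target <uuid>, applied <uuid>"
_OUT_OF_DATE = re.compile(
    rf"(?P<host>\S+): iconfig out of date: "
    rf"target (?P<target>{_UUID}), applied (?P<applied>{_UUID})"
)


def find_out_of_date(log_text: str) -> List[Tuple[str, str, str]]:
    """Return (host, target_uuid, applied_uuid) for every matching warning."""
    return [(m["host"], m["target"], m["applied"])
            for m in _OUT_OF_DATE.finditer(log_text)]
```

Each tuple corresponds to a host whose applied configuration lagged its target when the warning was logged; a warning alone does not prove the alarm is still active, so the result should be cross-checked against fm alarm-list.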

// Config_applied

root@controller-0:/var/home/sysadmin# cat /etc/platform/.config_applied

Alarms

The subcloud was free of alarms after deployment.

Test Activity

Scalability Testing

Workaround

Lock and then unlock controller-0 on the affected subcloud.

Karla Felix (kkarolin)
Changed in starlingx:
assignee: nobody → Karla Felix (kkarolin)
Ghada Khalil (gkhalil) wrote :
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.config stx.distcloud
tags: added: stx.9.0
John Kung (john-kung) wrote :
Changed in starlingx:
status: New → Fix Released