250.001 controller Configuration is out-of-date alarm not cleared after DX installation

Bug #1851874 reported by Peng Peng on 2019-11-08
Affects: StarlingX
Importance: Medium
Assigned to: Austin Sun

Bug Description

Brief Description
-----------------
After a fresh DX installation, the alarm "250.001 | controller-0 Configuration is out-of-date." was raised and never cleared.

Severity
--------
Major

Steps to Reproduce
------------------
Install a DX system, then list the active alarms.

TC-name:

Expected Behavior
------------------
No 250.001 alarm remains after installation completes.

Actual Behavior
----------------
The 250.001 alarm for controller-0 stays in the alarm list and is not cleared.

Reproducibility
---------------
Seen once

System Configuration
--------------------
Two node system
IPv6

Lab-name: WCP_78-79

Branch/Pull Time/Commit
-----------------------
2019-11-04_20-00-00

Last Pass
---------
not sure

Timestamp/Logs
--------------
[2019-11-08 08:58:14,558] 311 DEBUG MainThread ssh.send :: Send 'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[abcd:204::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid'
[2019-11-08 08:58:16,572] 433 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+----------+--------------------------------------------------------------------------+-----------------------+----------+----------------------------+
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+--------------------------------------+----------+--------------------------------------------------------------------------+-----------------------+----------+----------------------------+
| 3cba2cdb-fd95-4ab2-9939-4ba6b6db1f3a | 100.114 | NTP cannot reach external time source; syncing with peer controller only | host=controller-1.ntp | minor | 2019-11-08T08:50:50.668608 |
| 3c61f744-a456-4f0d-b0bb-d7f51e7c0633 | 250.001 | controller-0 Configuration is out-of-date. | host=controller-0 | major | 2019-11-08T08:10:46.354822 |
+--------------------------------------+----------+--------------------------------------------------------------------------+-----------------------+----------+----------------------------+
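
A check like the one in this log can be scripted. The following is a minimal sketch, assuming the table text printed by `fm alarm-list --nowrap --uuid` has already been captured as a string; the helper names `parse_alarm_table` and `has_alarm` are hypothetical and not part of any StarlingX tool.

```python
# Sketch: parse the ASCII table printed by `fm alarm-list` and look for an alarm.
# SAMPLE reproduces the two rows from the log above (border lines shortened).
SAMPLE = """\
+------+----------+-------------+-----------+----------+------------+
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+------+----------+-------------+-----------+----------+------------+
| 3cba2cdb-fd95-4ab2-9939-4ba6b6db1f3a | 100.114 | NTP cannot reach external time source; syncing with peer controller only | host=controller-1.ntp | minor | 2019-11-08T08:50:50.668608 |
| 3c61f744-a456-4f0d-b0bb-d7f51e7c0633 | 250.001 | controller-0 Configuration is out-of-date. | host=controller-0 | major | 2019-11-08T08:10:46.354822 |
+------+----------+-------------+-----------+----------+------------+
"""


def parse_alarm_table(text):
    """Return one dict per data row, keyed by the column headers."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and surrounding log noise
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first '|' line carries the column names
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def has_alarm(rows, alarm_id, entity_id=None):
    """True if any row has the alarm ID (and entity ID, when given)."""
    for row in rows:
        if row.get("Alarm ID") != alarm_id:
            continue
        if entity_id is None or row.get("Entity ID") == entity_id:
            return True
    return False


if __name__ == "__main__":
    alarms = parse_alarm_table(SAMPLE)
    print(has_alarm(alarms, "250.001", "host=controller-0"))
```

The sketch assumes no cell contains a `|` character, which holds for the output shown here.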

Test Activity
-------------
installation

Peng Peng (ppeng) wrote :
summary: 250.001 controller Configuration is out-of-date alarm not cleared after
- hbsClient killed and recovered
+ DX installation
Peng Peng (ppeng) on 2019-11-08
description: updated
Ghada Khalil (gkhalil) wrote :

Marking as stx.3.0 / medium priority as this would result in blocking subsequent swact operations.

tags: added: stx.config
Changed in starlingx:
assignee: nobody → Cindy Xie (xxie1)
importance: Undecided → Medium
status: New → Triaged
tags: added: stx.3.0
Yang Liu (yliu12) on 2019-11-19
tags: added: stx.retestneeded
Cindy Xie (xxie1) on 2019-11-19
Changed in starlingx:
assignee: Cindy Xie (xxie1) → Austin Sun (sunausti)
Austin Sun (sunausti) wrote :

pengpeng:
  From the summary change,
  250.001 controller Configuration is out-of-date alarm not cleared after
  - hbsClient killed and recovered
  + DX installation

  did you kill hbsClient manually and recover it manually?
  Could you share the whole test log?

Changed in starlingx:
status: Triaged → Incomplete
Austin Sun (sunausti) wrote :

From the sysinv log, the sequence is:
L501: config_target: a29dbaf3-d60d-402b-8ad6-6e8147dd416b
L510: config_target: f5d83753-c485-4221-b4aa-e986f5421d6d
L528: config_applied changes to a29dbaf3-d60d-402b-8ad6-6e8147dd416b.
L600~: controller-0 is unlocking and rebooting.
L992: config_target is a9c89b7e-51fe-40cb-8700-e22c83ed5fa9, but config_uuid=29c89b7e-51fe-40cb-8700-e22c83ed5fa9, because the current config_target (f5d83753-c485-4221-b4aa-e986f5421d6d) is marked reboot-required (although the reboot has already happened).
L1040: config_applied changes to f5d83753-c485-4221-b4aa-e986f5421d6d.
L1048: config_applied=f5d83753-c485-4221-b4aa-e986f5421d6d, and the target is a9c89b7e-51fe-40cb-8700-e22c83ed5fa9.
L1060: The agent reports config applied 29c89b7e-51fe-40cb-8700-e22c83ed5fa9.
At this point config_applied (29c89b7e-51fe-40cb-8700-e22c83ed5fa9) is not equal to config_target (a9c89b7e-51fe-40cb-8700-e22c83ed5fa9), so the alarm continues to be reported, although it should have been cleared.
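
The two UUIDs at L1060 differ only in their first hex digit (`a` vs. `2`), that is, in the most significant bit of the 128-bit value. The sketch below illustrates the mismatch under the assumption that this bit is a reboot-required flag on the config UUID. It is not the change proposed in the review; `REBOOT_REQUIRED_FLAG`, `is_reboot_required`, `clear_reboot_required`, and `config_matches` are hypothetical names used only for illustration.

```python
import uuid

# Assumption: the top bit of the 128-bit config UUID marks "reboot required".
REBOOT_REQUIRED_FLAG = 1 << 127


def is_reboot_required(config_uuid):
    """True if the assumed reboot-required bit is set in the UUID."""
    return bool(uuid.UUID(config_uuid).int & REBOOT_REQUIRED_FLAG)


def clear_reboot_required(config_uuid):
    """Return the same UUID string with the assumed flag bit cleared."""
    value = uuid.UUID(config_uuid).int & ~REBOOT_REQUIRED_FLAG
    return str(uuid.UUID(int=value))


def config_matches(applied, target, rebooted):
    """Compare applied and target config UUIDs.

    A strict comparison fails when the target still carries the flag.
    If the host has already rebooted, the flag is ignored here.
    """
    if applied == target:
        return True
    if rebooted and is_reboot_required(target):
        return applied == clear_reboot_required(target)
    return False


if __name__ == "__main__":
    target = "a9c89b7e-51fe-40cb-8700-e22c83ed5fa9"
    applied = "29c89b7e-51fe-40cb-8700-e22c83ed5fa9"
    print(applied == target)                               # strict comparison
    print(config_matches(applied, target, rebooted=True))  # flag-aware comparison
```

With the values from the log, the strict comparison reports a mismatch, which corresponds to the alarm staying raised, while the flag-aware comparison treats the configuration as applied once the reboot has taken place.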

Fix proposed to branch: master
Review: https://review.opendev.org/696030

Changed in starlingx:
status: Incomplete → In Progress