Config out of date alarm is not clearing after lock and unlock during ntp server modify test

Bug #1824814 reported by Anujeyan Manokeran
This bug affects 1 person
Affects   | Status       | Importance | Assigned to | Milestone
StarlingX | Fix Released | Medium     | Austin Sun  |

Bug Description

Brief Description
-----------------
During the ntpservers modify test, the config out-of-date alarm did not clear on the controllers even after a lock and unlock of the standby controller (controller-1). As a result, the swact failed with the message "controller-1 target Config accd7638-35b5-480d-be8b-94a85c489733 not yet applied". The test validates that the NTP server address can be modified. The CLI sequence below shows the commands executed to hit the issue.
Before the address change, the NTP servers were as follows:
Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.222.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne ntp-show'
[2019-04-14 04:48:52,982] 387 DEBUG MainThread ssh.expect :: Output:
+--------------+----------------------------------------------+
| Property | Value |
+--------------+----------------------------------------------+
| uuid | fbdb9c1c-0c71-4088-b40b-5294b968e142 |
| enabled | True |
| ntpservers | 0.pool.ntp.org,1.pool.ntp.org,2.pool.ntp.org |
| isystem_uuid | c07fd33d-338e-4414-a786-3b3f4a73af34 |
| created_at | 2019-04-14T01:06:06.306086+00:00 |
| updated_at | 2019-04-14T01:16:27.727605+00:00 |
+--------------+----------------------------------------------+

The NTP servers were modified as below:
Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.222.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne ntp-modify ntpservers="1.pool.ntp.org,2.pool.ntp.org,2.pool.ntp.org"'
[2019-04-14 04:48:57,661] 387 DEBUG MainThread ssh.expect :: Output:
+--------------+----------------------------------------------+
| Property | Value |
+--------------+----------------------------------------------+
| uuid | fbdb9c1c-0c71-4088-b40b-5294b968e142 |
| enabled | True |
| ntpservers | 1.pool.ntp.org,2.pool.ntp.org,2.pool.ntp.org |
| isystem_uuid | c07fd33d-338e-4414-a786-3b3f4a73af34 |
| created_at | 2019-04-14T01:06:06.306086+00:00 |
| updated_at | 2019-04-14T01:16:27.727605+00:00 |
+--------------+----------------------------------------------+
Lock and unlock the standby controller (controller-1):
send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.222.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-lock controller-1'
[2019-04-14 04:50:08,794] 387 DEBUG MainThread ssh.expect :: Output:
Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.222.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-unlock controller-1'
[2019-04-14 04:50:34,473] 387 DEBUG MainThread ssh.expect :: Output:

Swact so that the active controller can be locked and unlocked:

send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.222.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-swact controller-0'
[2019-04-14 04:57:24,401] 387 DEBUG MainThread ssh.expect :: Output:
controller-1 target Config accd7638-35b5-480d-be8b-94a85c489733 not yet applied. Apply target Config via Lock/Unlock prior to Swact
[wrsroot@controller-0 ~(keystone_admin)]$

Send 'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.222.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid'
[2019-04-14 04:49:00,793] 387 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+----------+------------------------------------------------------------------------+-----------------------+----------+----------------------------+
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+--------------------------------------+----------+------------------------------------------------------------------------+-----------------------+----------+----------------------------+
| 707e83fc-5f75-483f-8b93-4922a75f8e73 | 250.001 | controller-1 Configuration is out-of-date. | host=controller-1 | major | 2019-04-14T04:48:57.522005 |
| 3a1f8ccf-8bc2-4539-8380-c647e1fe3a48 | 250.001 | controller-0 Configuration is out-of-date. | host=controller-0 | major | 2019-04-14T04:48:57.457711 |
| 07c2a434-b339-49da-8f79-eb526fd97c71 | 100.114 | NTP configuration does not contain any valid or reachable NTP servers. | host=controller-0.ntp | major | 2019-04-14T04:48:12.367360 |
+--------------------------------------+----------+------------------------------------------------------------------------+-----------------------+----------+----------------------------+
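The alarm table above can also be checked programmatically. The following is a minimal Python sketch and not part of the original test: it assumes the `fm alarm-list --nowrap --uuid` output keeps the pipe-delimited layout shown above, and the helper names `parse_cli_table` and `out_of_date_hosts` are hypothetical.

```python
def parse_cli_table(text):
    """Parse a '| a | b |' style CLI table into a list of dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip '+---+' borders, log lines and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first pipe-delimited row is the header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def out_of_date_hosts(alarm_rows, alarm_id="250.001"):
    """Return the Entity ID of every alarm row whose Alarm ID matches."""
    return [r["Entity ID"] for r in alarm_rows if r["Alarm ID"] == alarm_id]


# Rows copied from the alarm-list output in this report.
SAMPLE = """
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
| 707e83fc-5f75-483f-8b93-4922a75f8e73 | 250.001 | controller-1 Configuration is out-of-date. | host=controller-1 | major | 2019-04-14T04:48:57.522005 |
| 3a1f8ccf-8bc2-4539-8380-c647e1fe3a48 | 250.001 | controller-0 Configuration is out-of-date. | host=controller-0 | major | 2019-04-14T04:48:57.457711 |
| 07c2a434-b339-49da-8f79-eb526fd97c71 | 100.114 | NTP configuration does not contain any valid or reachable NTP servers. | host=controller-0.ntp | major | 2019-04-14T04:48:12.367360 |
"""

if __name__ == "__main__":
    print(out_of_date_hosts(parse_cli_table(SAMPLE)))
```

A test could call `out_of_date_hosts` again after the lock and unlock and expect the standby controller to be absent from the result.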

Severity
--------
Major
Steps to Reproduce
------------------
1. Install any regular system
2. Check NTP and modify the server address.
'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.222.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne ntp-show'
The NTP servers were modified as below:
Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.222.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne ntp-modify ntpservers="1.pool.ntp.org,2.pool.ntp.org,2.pool.ntp.org"'
[2019-04-14 04:48:57,661] 387 DEBUG MainThread ssh.expect :: Output:
+--------------+----------------------------------------------+
| Property | Value |
+--------------+----------------------------------------------+
| uuid | fbdb9c1c-0c71-4088-b40b-5294b968e142 |
| enabled | True |
| ntpservers | 1.pool.ntp.org,2.pool.ntp.org,2.pool.ntp.org |
| isystem_uuid | c07fd33d-338e-4414-a786-3b3f4a73af34 |
| created_at | 2019-04-14T01:06:06.306086+00:00 |
| updated_at | 2019-04-14T01:16:27.727605+00:00 |
+--------------+----------------------------------------------+

3. Lock and unlock the standby controller

Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.222.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-lock controller-1'
[2019-04-14 04:50:08,794] 387 DEBUG MainThread ssh.expect :: Output:
Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.222.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-unlock controller-1'
[2019-04-14 04:50:34,473] 387 DEBUG MainThread ssh.expect :: Output:

4. Swact so that the active controller can be locked and unlocked
send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.222.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-swact controller-0'
[2019-04-14 04:57:24,401] 387 DEBUG MainThread ssh.expect :: Output:
controller-1 target Config accd7638-35b5-480d-be8b-94a85c489733 not yet applied. Apply target Config via Lock/Unlock prior to Swact
[wrsroot@controller-0 ~(keystone_admin)]$
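The reproduction steps above can be summarized as a command sequence. This is only a hedged sketch: it assumes the `system` CLI is on the PATH and that credentials are already present in the environment (as the `keystone_admin` prompt suggests), so the `--os-*` options are omitted. `reproduce_steps` and `run_steps` are hypothetical helper names.

```python
import subprocess


def reproduce_steps(standby="controller-1", active="controller-0",
                    ntpservers="1.pool.ntp.org,2.pool.ntp.org,2.pool.ntp.org"):
    """Return the CLI calls from the report, in order, as argv lists."""
    return [
        ["system", "ntp-show"],
        ["system", "ntp-modify", "ntpservers=" + ntpservers],
        ["system", "host-lock", standby],
        ["system", "host-unlock", standby],
        ["system", "host-swact", active],
    ]


def run_steps(steps, runner=subprocess.run):
    """Run each step in order and return a list of (argv, stdout) pairs.

    Lock and unlock are asynchronous; the report shows roughly seven
    minutes between the unlock and the swact, so a real script would
    need to wait for the host to come back before the last step.
    """
    results = []
    for argv in steps:
        done = runner(argv, capture_output=True, text=True)
        results.append((argv, done.stdout))
    return results


if __name__ == "__main__":
    # Only print the sequence; nothing is executed against a lab here.
    for argv in reproduce_steps():
        print(" ".join(argv))
```

Passing a fake `runner` keeps the sequence testable without a lab.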

Expected Behavior
------------------
No alarm remains after the lock and unlock, and the swact succeeds.
Actual Behavior
----------------
As described above, a lock and unlock of the standby controller does not clear the alarm.
Reproducibility
---------------
Yes 100%

System Configuration
--------------------
Regular system
Branch/Pull Time/Commit
-----------------------
2019-04-10 01:30:00 +0000
Last Pass
---------
n/a
Timestamp/Logs
--------------
n/a
Test Activity
-------------
Regression Test

Maria Yousaf (myousaf) wrote :

I also observed this behaviour on a standard 2+X system. System recovery was not possible in my case.

Ghada Khalil (gkhalil)
summary: - Config out of alarm is not clearing after lock and unlock during ntp
- server modify test
+ Config out of date alarm is not clearing after lock and unlock during
+ ntp server modify test
description: updated
Ghada Khalil (gkhalil) wrote :

Marking as release gating given the issue is reproducible. Needs further investigation.

tags: added: stx.2.0 stx.config
tags: added: stx.retestneeded
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Cindy Xie (xxie1)
Austin Sun (sunausti)
Changed in starlingx:
assignee: Cindy Xie (xxie1) → Austin Sun (sunausti)
Austin Sun (sunausti) wrote :

Hi, I cannot reproduce this issue in my setup. Could you provide more detailed logs?

Austin Sun (sunausti)
Changed in starlingx:
status: Triaged → Incomplete
Anujeyan Manokeran (anujeyan) wrote :

Please let me know which logs are needed.
Thanks,
Anujeyan

Austin Sun (sunausti) wrote :

When you hit this issue, could you run the collect command and share the tarball?

Thanks.
BR
Austin Sun

Anujeyan Manokeran (anujeyan) wrote :

I have attached the logs that were captured from the original issue.

Anujeyan Manokeran (anujeyan) wrote :

This issue was reproduced in load 20190508T013000Z. It was seen during the install after configuring NTP. Clearing the alarm required a lock and unlock of controller-0.

Austin Sun (sunausti) wrote :

Thanks. From the controller-0 sysinv log, this looks similar to https://bugs.launchpad.net/starlingx/+bug/1828271

2019-04-14 05:38:34.321 178882 WARNING sysinv.conductor.manager [req-e00470c5-4b6c-455c-8573-21ffdad8bb13 admin admin] controller-1: iconfig out of date: target ebc986fc-9051-4bc7-b76b-8a4fbe23f322, applied 6bc986fc-9051-4bc7-b76b-8a4fbe23f322

The reboot flag is not used correctly. Could you watch bug #1828271? Once that change is merged, we can try to reproduce this issue with the latest load.
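The two UUIDs in the log line above differ only in their first hex digit (`e` versus `6`), that is, in the most significant bit of the 128-bit value. The sketch below checks that arithmetic. It assumes, consistent with the reboot-flag remark but not confirmed by this report, that the top bit of the config UUID carries a reboot-required flag. `REBOOT_FLAG` and `strip_reboot_flag` are hypothetical names, not sysinv identifiers.

```python
import uuid

# Assumption: bit 127 of the config UUID marks "reboot required".
REBOOT_FLAG = 1 << 127

# Values copied from the sysinv log line in this report.
target = uuid.UUID("ebc986fc-9051-4bc7-b76b-8a4fbe23f322")
applied = uuid.UUID("6bc986fc-9051-4bc7-b76b-8a4fbe23f322")


def strip_reboot_flag(u):
    """Return a copy of the UUID with bit 127 cleared."""
    return uuid.UUID(int=u.int & ~REBOOT_FLAG)


# XOR leaves only the bits in which the two values differ.
diff = target.int ^ applied.int

if __name__ == "__main__":
    print(diff == REBOOT_FLAG)
    print(strip_reboot_flag(target) == applied)
```

Under that assumption, a plain equality check between target and applied fails even though both refer to the same configuration, which would match the alarm not clearing.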

Austin Sun (sunausti) wrote :

Marking this as a duplicate of #1829260, since it should have the same root cause: the config uuid (reboot) is not right. The fix for #1829260 was just merged, so if you reproduce this issue with a load from today onward, please re-open it.

Changed in starlingx:
status: Incomplete → Confirmed
Ghada Khalil (gkhalil) wrote :

Duplicate bug https://bugs.launchpad.net/starlingx/+bug/1829260 has been addressed by review: https://review.opendev.org/#/c/658391/
The fix merged in stx master on May 15

Changed in starlingx:
status: Confirmed → Fix Released
Anujeyan Manokeran (anujeyan) wrote :

Verified in load 2019-06-13

tags: removed: stx.retestneeded