Unexpected NTP alarms seen at end of AIO-SX install

Bug #1800909 reported by Maria Yousaf
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Eric MacDonald

Bug Description

Brief Description
-----------------
On install of AIO-SX labs, there is an unexpected NTP alarm present.

Severity
--------
Major

Steps to Reproduce
------------------
1. Install an AIO-SX lab
2. When install/configuration is complete, view fm alarm-list:

(2018-10-31 05:49:35) [INFO] [MainThread] Output:
+----------+------------------------------------------------------------------------+-----------------------+----------+----------------------------+
| Alarm ID | Reason Text                                                            | Entity ID             | Severity | Time Stamp                 |
+----------+------------------------------------------------------------------------+-----------------------+----------+----------------------------+
| 100.114  | NTP configuration does not contain any valid or reachable NTP servers. | host=controller-0.ntp | major    | 2018-10-31T05:29:17.898247 |
+----------+------------------------------------------------------------------------+-----------------------+----------+----------------------------+

This alarm does not seem to clear. The problem seems to be AIO-SX specific: I'm not seeing this alarm on other lab types.

This ends up impacting automated regression that runs on AIO-SX.

Here are the settings used for this lab:

DNS -> 8.8.8.8, 4.4.4.4
NTP -> 0.pool.ntp.org,1.pool.ntp.org,2.pool.ntp.org

Expected Behavior
------------------
System is alarm free at the end of an install

Actual Behavior
----------------
System has an unexpected NTP alarm at the end of an install

Reproducibility
---------------
Reproducible. Seen on at least two different labs (different loads)

System Configuration
--------------------
One node system

Branch/Pull Time/Commit
-----------------------
stx master as of 2018-10-30_20-18-00

Ghada Khalil (gkhalil) wrote :

Targeting stx.2019.03 as this issue seems to be reproducible in multiple all-in-one simplex labs

Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Kristine Bujold (kbujold)
tags: added: stx.2019.03 stx.config
Dariush Eslimi (deslimi) wrote :

Triaged by Eric MacDonald:

The issue is that on SX systems no peer controller IP shows up in the ntp query server list, and the bash code does not handle that case well.

The following change to the ntp query script fixes the issue. From:

controller_host_ip=$(getControllerIP $server_list)
server_list=$(echo $server_list | grep -v $controller_host_ip)

to

controller_host_ip=$(getControllerIP $server_list)
if [ "${controller_host_ip}" != "" ] ; then
    server_list=$(echo $server_list | grep -v $controller_host_ip)
fi
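
The guard above can be sketched as a standalone script. This is only an illustration, not the actual rmon ntp query script: the function name filter_servers, the sample addresses, and the assumption that server_list is a space-separated list are all hypothetical. It shows one plausible way the unguarded form can go wrong: when controller_host_ip is empty and unquoted, grep is left without a pattern, fails, and the server list comes back empty.

```shell
#!/bin/bash
# Hypothetical helper (not the real rmon script): drop the controller's
# address from a space-separated server list, but only when one was found.
filter_servers() {
    local server_list="$1"
    local controller_host_ip="$2"
    if [ -n "${controller_host_ip}" ]; then
        # Split the list into one entry per line, then remove exact matches.
        # -F: fixed string, -x: whole-line match, -v: invert the match.
        server_list=$(printf '%s\n' ${server_list} \
            | grep -Fxv -- "${controller_host_ip}" | xargs)
    fi
    echo "${server_list}"
}

# Sample addresses from the documentation range (assumed, not from the report).
servers="192.0.2.10 192.0.2.11"

# Unguarded form with an empty controller IP (the AIO-SX case).
# The unquoted empty variable expands to nothing, so grep has no pattern.
controller_host_ip=""
unguarded=$(echo $servers | grep -v $controller_host_ip 2>/dev/null)

# Guarded form: list is left alone when no controller IP was found.
guarded_sx=$(filter_servers "$servers" "")
# Guarded form: the peer address is removed when one is present.
guarded_dx=$(filter_servers "$servers" "192.0.2.11")

echo "unguarded:              '${unguarded}'"
echo "guarded (no peer):      '${guarded_sx}'"
echo "guarded (peer present): '${guarded_dx}'"
```

Quoting the variable and matching whole entries with grep -Fx is a design choice of this sketch, not something taken from the report. It avoids treating the dots in an address as regular-expression wildcards.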

We should also consider removing the following user.log error log, either generally or just for the SX case:

"err Could not find the Controller's IP address"

Changed in starlingx:
assignee: Kristine Bujold (kbujold) → Bruce Jones (brucej)
Bruce Jones (brucej)
Changed in starlingx:
assignee: Bruce Jones (brucej) → Cindy Xie (xxie1)
Eric MacDonald (rocksolidmtce) wrote :

This issue is fixed with the following two updates.

https://review.openstack.org/#/c/628687/
https://review.openstack.org/#/c/628685/

The first moves ntp monitoring to collectd.
The second removes ntp monitoring from rmon.

This issue was specific to the rmon ntp query script.

Changed in starlingx:
assignee: Cindy Xie (xxie1) → Eric MacDonald (rocksolidmtce)
status: Triaged → Fix Released
Ken Young (kenyis)
tags: added: stx.2019.05
removed: stx.2019.03
Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05