Comment 8 for bug 1851043

Kristine Bujold (kbujold) wrote :

I am seeing errors in daemon.log when the NTP alarms are raised and cleared, for example "Failed to execute clear_fault":

2019-11-02T03:13:23.877 controller-1 collectd[113678]: info WARNING:root:fm_python_extension: Failed to connect to FM manager
2019-11-02T03:13:23.880 controller-1 collectd[113678]: info NTP query plugin 'clear_fault' exception ; 100.114:host=controller-1.ntp ; Failed to execute clear_fault.
2019-11-02T03:13:23.884 controller-1 collectd[113678]: info NTP query plugin 100.114:host=controller-1.ntp=64:ff9b::ce6c:83 alarm cleared
2019-11-02T03:13:23.885 controller-1 collectd[113678]: info NTP query plugin selected server changed from 'None' to '64:ff9b::ce6c:83'
2019-11-02T03:13:23.885 controller-1 collectd[113678]: info NTP query plugin reachable servers: ['64:ff9b::ce6c:83']

The alarm for "64:ff9b::ce6c:83" does appear to get cleared eventually. fm-manager.log shows it being raised at 03:53:23 and deleted 20 minutes later, at 04:13:23:

fm-manager.log:2019-11-02T03:53:23.836 fmMsgServer.cpp(398): Raising Alarm/Log, (100.114) (host=controller-1.ntp=64:ff9b::ce6c:83)
fm-manager.log:2019-11-02T03:53:23.838 fmMsgServer.cpp(421): Alarm created/updated: (100.114) (host=controller-1.ntp=64:ff9b::ce6c:83) (2) (8606362d-0906-4610-b4df-983d11f1eb9d)

fm-manager.log:2019-11-02T04:13:23.837 fmMsgServer.cpp(494): Deleted alarm: (100.114) (host=controller-1.ntp=64:ff9b::ce6c:83)
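
To double-check the 20 minute gap, the two fm-manager.log timestamps above can be subtracted. This is only a minimal sketch: the helper name parse_fm_timestamp is hypothetical, and it assumes the lines keep the grep-style "fm-manager.log:" prefix shown in this comment.

```python
from datetime import datetime

def parse_fm_timestamp(line: str) -> datetime:
    """Extract the timestamp from a 'fm-manager.log:<timestamp> ...' line."""
    # Drop the file-name prefix (everything up to the first ':'),
    # then take the first whitespace-separated token as the timestamp.
    stamp = line.split(":", 1)[1].split()[0]
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f")

raise_line = ("fm-manager.log:2019-11-02T03:53:23.836 fmMsgServer.cpp(398): "
              "Raising Alarm/Log, (100.114) (host=controller-1.ntp=64:ff9b::ce6c:83)")
delete_line = ("fm-manager.log:2019-11-02T04:13:23.837 fmMsgServer.cpp(494): "
               "Deleted alarm: (100.114) (host=controller-1.ntp=64:ff9b::ce6c:83)")

raised = parse_fm_timestamp(raise_line)
deleted = parse_fm_timestamp(delete_line)
elapsed = deleted - raised
print(elapsed)  # 0:20:00.001000
```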

But I also see these generic NTP alarms, which are still active:

cat alarms.info
--------------------------------------------------------------------
Sat Nov 2 14:21:24 UTC 2019 : : fm alarm-list
--------------------------------------------------------------------
+----------+--------------------------------------------------------------------------+-----------------------+----------+----------------------+
| Alarm ID | Reason Text                                                              | Entity ID             | Severity | Time Stamp           |
+----------+--------------------------------------------------------------------------+-----------------------+----------+----------------------+
| 100.114  | NTP cannot reach external time source; syncing with peer controller only | host=controller-0.ntp | minor    | 2019-11-02T14:14:05. |
|          |                                                                          |                       |          | 895979               |
|          |                                                                          |                       |          |                      |
| 100.114  | NTP configuration does not contain any valid or reachable NTP servers.   | host=controller-1.ntp | major    | 2019-11-02T03:03:25. |
|          |                                                                          |                       |          | 836876               |
|          |                                                                          |                       |          |                      |
+----------+--------------------------------------------------------------------------+-----------------------+----------+----------------------+

These should have been cleared based on the ntpq output, which shows a selected server (the "*" entry) and a reach value of 377 for every peer:

2019-11-02T14:13:23.807 controller-1 collectd[113678]: info NTP query plugin server list: ['0.pool.ntp.org', '1.pool.ntp.org', '3.pool.ntp.org']
2019-11-02T14:13:23.836 controller-1 collectd[113678]: info NTPQ: +abcd:204::2
2019-11-02T14:13:23.836 controller-1 collectd[113678]: info NTPQ: 232.178.14.219 3 u 689 1024 377 0.084 0.607 0.352
2019-11-02T14:13:23.836 controller-1 collectd[113678]: info NTPQ: *64:ff9b::ce6c:83
2019-11-02T14:13:23.836 controller-1 collectd[113678]: info NTPQ: .PTP0. 1 u 992 1024 377 8.263 -0.172 0.120
2019-11-02T14:13:23.836 controller-1 collectd[113678]: info NTPQ: +64:ff9b::c7b6:ccc5
2019-11-02T14:13:23.836 controller-1 collectd[113678]: info NTPQ: 219.119.208.14 2 u 754 1024 377 76.361 0.814 0.161
2019-11-02T14:13:23.836 controller-1 collectd[113678]: info NTPQ: +64:ff9b::8180:c14
2019-11-02T14:13:23.836 controller-1 collectd[113678]: info NTPQ: 172.30.248.10 2 u 757 1024 377 47.996 -1.377 0.182
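
For reference, here is a rough sketch of how the wrapped NTPQ lines above could be read to decide which servers are selected and reachable. It is not the collectd plugin's code. The function name parse_wrapped_ntpq is hypothetical, and it assumes each peer is logged as two lines (tally character plus remote address, then refid and statistics) with the "NTPQ: " log prefix already stripped.

```python
def parse_wrapped_ntpq(lines):
    """Pair each 'remote' line with the statistics line that follows it."""
    peers = []
    for remote_line, stats_line in zip(lines[0::2], lines[1::2]):
        # First column is ntpq's tally code: '*' selected, '+' candidate, etc.
        tally, remote = remote_line[0], remote_line[1:].strip()
        # Wrapped columns: refid st t when poll reach delay offset jitter
        fields = stats_line.split()
        reach = int(fields[5], 8)  # reach is printed as an octal bitmask
        peers.append({"remote": remote, "tally": tally, "reach": reach})
    return peers

ntpq_lines = [
    "+abcd:204::2",
    "232.178.14.219 3 u 689 1024 377 0.084 0.607 0.352",
    "*64:ff9b::ce6c:83",
    ".PTP0. 1 u 992 1024 377 8.263 -0.172 0.120",
    "+64:ff9b::c7b6:ccc5",
    "219.119.208.14 2 u 754 1024 377 76.361 0.814 0.161",
    "+64:ff9b::8180:c14",
    "172.30.248.10 2 u 757 1024 377 47.996 -1.377 0.182",
]

peers = parse_wrapped_ntpq(ntpq_lines)
selected = [p["remote"] for p in peers if p["tally"] == "*"]
reachable = [p["remote"] for p in peers if p["reach"] != 0]
print("selected:", selected)
print("reachable:", reachable)
```

With the output above, this reports one selected server and four reachable ones, which is why the "no valid or reachable NTP servers" alarm on controller-1 looks stale.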

Maybe there is an issue with the NTP query plugin code in collectd?