I am seeing errors in daemon.log when the NTP alarms are raised and cleared: "Failed to execute clear_fault"
2019-11-02T03:13:23.877 controller-1 collectd[113678]: info WARNING:root:fm_python_extension: Failed to connect to FM manager
2019-11-02T03:13:23.880 controller-1 collectd[113678]: info NTP query plugin 'clear_fault' exception ; 100.114:host=controller-1.ntp ; Failed to execute clear_fault.
2019-11-02T03:13:23.884 controller-1 collectd[113678]: info NTP query plugin 100.114:host=controller-1.ntp=64:ff9b::ce6c:83 alarm cleared
2019-11-02T03:13:23.885 controller-1 collectd[113678]: info NTP query plugin selected server changed from 'None' to '64:ff9b::ce6c:83'
2019-11-02T03:13:23.885 controller-1 collectd[113678]: info NTP query plugin reachable servers: ['64:ff9b::ce6c:83']
The alarm for "64:ff9b::ce6c:83" appears to get cleared eventually, 20 minutes later:
fm-manager.log:2019-11-02T03:53:23.836 fmMsgServer.cpp(398): Raising Alarm/Log, (100.114) (host=controller-1.ntp=64:ff9b::ce6c:83)
fm-manager.log:2019-11-02T03:53:23.838 fmMsgServer.cpp(421): Alarm created/updated: (100.114) (host=controller-1.ntp=64:ff9b::ce6c:83) (2) (8606362d-0906-4610-b4df-983d11f1eb9d)
fm-manager.log:2019-11-02T04:13:23.837 fmMsgServer.cpp(494): Deleted alarm: (100.114) (host=controller-1.ntp=64:ff9b::ce6c:83)
But I also see these generic NTP alarms:
cat alarms.info
--------------------------------------------------------------------
Sat Nov 2 14:21:24 UTC 2019 : : fm alarm-list
--------------------------------------------------------------------
+----------+--------------------------------------------------------------------------+-----------------------+----------+----------------------+
| Alarm ID | Reason Text                                                              | Entity ID             | Severity | Time Stamp           |
+----------+--------------------------------------------------------------------------+-----------------------+----------+----------------------+
| 100.114  | NTP cannot reach external time source; syncing with peer controller only | host=controller-0.ntp | minor    | 2019-11-02T14:14:05. |
|          |                                                                          |                       |          | 895979               |
|          |                                                                          |                       |          |                      |
| 100.114  | NTP configuration does not contain any valid or reachable NTP servers.   | host=controller-1.ntp | major    | 2019-11-02T03:03:25. |
|          |                                                                          |                       |          | 836876               |
|          |                                                                          |                       |          |                      |
+----------+--------------------------------------------------------------------------+-----------------------+----------+----------------------+
These should have been cleared based on the ntpq output:
2019-11-02T14:13:23.807 controller-1 collectd[113678]: info NTP query plugin server list: ['0.pool.ntp.org', '1.pool.ntp.org', '3.pool.ntp.org']
2019-11-02T14:13:23.836 controller-1 collectd[113678]: info NTPQ: +abcd:204::2
2019-11-02T14:13:23.836 controller-1 collectd[113678]: info NTPQ: 232.178.14.219 3 u 689 1024 377 0.084 0.607 0.352
2019-11-02T14:13:23.836 controller-1 collectd[113678]: info NTPQ: *64:ff9b::ce6c:83
2019-11-02T14:13:23.836 controller-1 collectd[113678]: info NTPQ: .PTP0. 1 u 992 1024 377 8.263 -0.172 0.120
2019-11-02T14:13:23.836 controller-1 collectd[113678]: info NTPQ: +64:ff9b::c7b6:ccc5
2019-11-02T14:13:23.836 controller-1 collectd[113678]: info NTPQ: 219.119.208.14 2 u 754 1024 377 76.361 0.814 0.161
2019-11-02T14:13:23.836 controller-1 collectd[113678]: info NTPQ: +64:ff9b::8180:c14
2019-11-02T14:13:23.836 controller-1 collectd[113678]: info NTPQ: 172.30.248.10 2 u 757 1024 377 47.996 -1.377 0.182
Maybe there is an issue with the NTP query plugin code in collectd?
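To double-check what I expect from the NTPQ lines in the log, here is a minimal Python sketch that reads the tally codes ('*' for the selected peer, '+' for a candidate) and the octal reach column. The function name parse_ntpq_peers and the SAMPLE list are my own for illustration; this is not the plugin's actual code, and it assumes the wrapped two-line layout that ntpq prints for long IPv6 addresses.

```python
def parse_ntpq_peers(lines):
    """Return (selected, reachable) from ntpq peer lines.

    Hypothetical helper for illustration only. Long IPv6 addresses wrap:
    the tally code and address sit on one line, and the
    refid/st/t/when/poll/reach/... columns follow on the next line.
    """
    selected = None
    reachable = []
    pending = None  # (tally, address) waiting for its wrapped stats line
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if pending is not None:
            # This line carries the statistics for the wrapped address.
            tally, addr = pending
            stats = line.split()
            pending = None
        elif line[0] in "*+":
            fields = line.split()
            tally, addr = fields[0][0], fields[0][1:]
            if len(fields) == 1:
                pending = (tally, addr)
                continue
            stats = fields[1:]
        else:
            continue  # header, separator, or a peer without a '*'/'+' tally
        if len(stats) < 6:
            continue  # not a complete statistics line
        # Columns: refid st t when poll reach delay offset jitter
        reach = int(stats[5], 8)  # reach is an octal shift register
        if reach == 0:
            continue  # no recent successful polls
        reachable.append(addr)
        if tally == "*":
            selected = addr
    return selected, reachable


# The NTPQ lines from daemon.log above, with the log prefix stripped.
SAMPLE = [
    "+abcd:204::2",
    "232.178.14.219 3 u 689 1024 377 0.084 0.607 0.352",
    "*64:ff9b::ce6c:83",
    ".PTP0. 1 u 992 1024 377 8.263 -0.172 0.120",
    "+64:ff9b::c7b6:ccc5",
    "219.119.208.14 2 u 754 1024 377 76.361 0.814 0.161",
    "+64:ff9b::8180:c14",
    "172.30.248.10 2 u 757 1024 377 47.996 -1.377 0.182",
]

if __name__ == "__main__":
    print(parse_ntpq_peers(SAMPLE))
```

On the sample this reports '64:ff9b::ce6c:83' as the selected server and all four addresses as reachable (reach 377), which is why I would expect the "does not contain any valid or reachable NTP servers" alarm on controller-1 to be cleared.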