ntp alarms are not raised promptly when testing loss to ntp server

Bug #2042093 reported by Takamasa Takenaka
Affects: StarlingX
Status: Fix Released
Importance: Low
Assigned to: Takamasa Takenaka
Milestone: (none listed)

Bug Description

Brief Description
-----------------
When an NTP server becomes unreachable, the corresponding alarm is not raised promptly.

Severity
--------
Minor

Steps to Reproduce
------------------
1. Make the NTP server unreachable
2. Observe how long it takes for the NTP alarm to be raised

Expected Behavior
------------------
The alarm is raised within a reasonable time after the server becomes unreachable.

Actual Behavior
----------------
The alarm was raised only after about two hours.

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Any configuration

Branch/Pull Time/Commit
-----------------------
master

Last Pass
---------
N/A

Timestamp/Logs
--------------
N/A

Test Activity
-------------
Developer Testing

Workaround
----------
N/A

Changed in starlingx:
assignee: nobody → Takamasa Takenaka (ttakenak)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/899423
Committed: https://opendev.org/starlingx/stx-puppet/commit/7ca9a4c32387ccb663475a250630eb500afd8867
Submitter: "Zuul (22348)"
Branch: master

commit 7ca9a4c32387ccb663475a250630eb500afd8867
Author: Takamasa Takenaka <email address hidden>
Date: Thu Oct 26 14:51:04 2023 -0300

    Add maxpoll for ntp server/peer

    NTP-related alarms are raised and cleared by the script ntpq.py in
    the monitoring repo. This script analyzes the output of the command
    "ntpq -np". ntpd polls its NTP servers/peers at an interval that
    starts at minpoll and is gradually increased up to maxpoll (by
    default, minpoll=6 (2^6 = 64 sec) and maxpoll=10 (2^10 = 1024 sec)).

    The 8-bit p.reach shift register, which is updated on each poll,
    is used to determine whether the server is reachable and its data
    is fresh. If the register contains any nonzero bits, the server is
    considered reachable. Otherwise, it is unreachable.

    Once the polling has stabilized, the interval can be as long as
    1024 seconds. If the server then becomes unreachable, it can take
    up to 1024 sec * 8 = 8192 sec (about 2 hours and 16 minutes) before
    it is recognized as unreachable. The alarm is therefore not raised
    or cleared until a couple of hours later.

    This fix sets maxpoll=7 (128 sec) to shorten the interval. With
    this change, alarms are raised/cleared more promptly: unreachable
    detection takes about 17 minutes (128 sec * 8), and because ntpq.py
    runs every 5 minutes, it might take up to 25 minutes to raise/clear
    NTP alarms.

    maxpoll is set to 7 as a value that is short enough but not too
    short, considering the amount of traffic that more frequent polling
    generates. (The allowable range is 4 (16 s) to 17 (36.4 h).)

    Closes-bug: 2042093

    Test Plan:
    PASS: Build package and ISO. Fresh installation succeeds.
    PASS: Configure ntp server and confirm server/peer entry
          in /etc/ntp.conf has "maxpoll 7" option
    PASS: Set reachable ntp server and wait more than 2 hours,
          confirm poll interval does not become bigger than 128.
    PASS: Make reachable ntp server unreachable. Confirm expected
          ntp alarms are raised in reasonable time
          (up to about 30 minutes)
    PASS: Make unreachable ntp server reachable. Confirm existing
          ntp alarms are cleared in reasonable time
          (up to about 30 minutes)

    Change-Id: I6ff6d754cf9b5526bb51d9277287dabd205bab1f
    Signed-off-by: Takamasa Takenaka <email address hidden>
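
For reference, the detection-time arithmetic in the commit message above can be reproduced with a small sketch. It models the 8-bit reach register as described there (shifted once per poll, unreachable when all bits are zero) and is not taken from ntpd or ntpq.py; the function names are hypothetical and chosen only for illustration.

```python
REACH_BITS = 8  # width of the reach shift register described in the commit message


def shift_reach(reach: int, got_reply: bool) -> int:
    """Shift the register left by one poll and record the newest result in bit 0."""
    return ((reach << 1) | int(got_reply)) & ((1 << REACH_BITS) - 1)


def polls_until_unreachable(reach: int) -> int:
    """Count the consecutive missed polls needed before the register reads zero."""
    polls = 0
    while reach != 0:
        reach = shift_reach(reach, got_reply=False)
        polls += 1
    return polls


def worst_case_seconds(poll_exponent: int) -> int:
    """Time to drain a full register when polling every 2**poll_exponent seconds."""
    return polls_until_unreachable(0xFF) * (2 ** poll_exponent)


if __name__ == "__main__":
    # Compare the default maxpoll (10) with the value set by this fix (7).
    for exponent in (10, 7):
        seconds = worst_case_seconds(exponent)
        print(f"maxpoll={exponent}: {seconds} s (~{seconds / 60:.0f} min)")
```

Under these assumptions a fully populated register needs 8 missed polls to drain, which gives 8192 seconds at maxpoll=10 and 1024 seconds at maxpoll=7. The additional delay from ntpq.py running every 5 minutes is not modeled here.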

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Low
tags: added: stx.9.0 stx.config