StarlingX

ntp alarms are not raised promptly when testing loss to ntp server

Bug #2042093 reported by Takamasa Takenaka on 2023-10-31

This bug affects 1 person

Affects     Status        Importance  Assigned to        Milestone
StarlingX   Fix Released  Low         Takamasa Takenaka

Bug Description

Brief Description
-----------------
When the ntp server becomes unreachable, the ntp alarm is not raised promptly.

Severity
--------
Minor

Steps to Reproduce
------------------
1. Make the ntp server unreachable.
2. Observe the ntp alarms.

Expected Behavior
------------------
The alarm is raised within a reasonable time after the server becomes unreachable.

Actual Behavior
----------------
It took a couple of hours before the alarm was raised.

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Any configuration

Branch/Pull Time/Commit
-----------------------
master

Last Pass
---------
N/A

Timestamp/Logs
--------------
N/A

Test Activity
-------------
Developer Testing

Workaround
----------
N/A

Tags:

Takamasa Takenaka (ttakenak) on 2023-10-31

Changed in starlingx:
assignee:	nobody → Takamasa Takenaka (ttakenak)
status:	New → In Progress


OpenStack Infra (hudson-openstack) wrote on 2023-11-01: Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/899423
Committed: https://opendev.org/starlingx/stx-puppet/commit/7ca9a4c32387ccb663475a250630eb500afd8867
Submitter: "Zuul (22348)"
Branch: master

commit 7ca9a4c32387ccb663475a250630eb500afd8867
Author: Takamasa Takenaka <email address hidden>
Date: Thu Oct 26 14:51:04 2023 -0300

Add maxpoll for ntp server/peer

    NTP-related alarms are raised and cleared by the ntpq.py script in
    the monitoring repo, which analyzes the output of "ntpq -np". ntpd
    polls each ntp server/peer at a poll interval that defaults to
    minpoll=6 (2^6 = 64 s) and maxpoll=10 (2^10 = 1024 s), and it
    gradually increases the interval from minpoll up to maxpoll.
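The following minimal Python sketch extracts the poll interval and reach value from one peer line of `ntpq -np` output. It assumes the standard ntpq peer columns (remote, refid, st, t, when, poll, reach, delay, offset, jitter) and plain integer poll values. The `parse_peer_line` helper, the `PeerStatus` type and the sample line are hypothetical and are not taken from ntpq.py.

```python
from typing import NamedTuple

# Tally codes ntpq may prefix to the remote address (a space is also possible).
TALLY_CODES = "x.-+#*o"


class PeerStatus(NamedTuple):
    remote: str  # server/peer address without the tally code
    poll: int    # current poll interval in seconds
    reach: int   # 8-bit reach register value


def parse_peer_line(line: str) -> PeerStatus:
    """Parse one data line of `ntpq -np` output (hypothetical helper)."""
    fields = line.split()
    remote = fields[0]
    if remote[0] in TALLY_CODES:
        remote = remote[1:]
    poll = int(fields[5])      # assumes a plain integer such as 64 or 1024
    reach = int(fields[6], 8)  # ntpq prints the reach column in octal
    return PeerStatus(remote, poll, reach)


if __name__ == "__main__":
    sample = "*192.0.2.10  .GPS.  1 u  33  128  377  0.512  0.034  0.021"
    status = parse_peer_line(sample)
    print(status.remote, status.poll, status.reach)  # → 192.0.2.10 128 255
```

A reach of octal 377 (255) means that all of the last eight polls were answered.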

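    ntpd uses the 8-bit reach shift register (p.reach) to determine
    whether the server is reachable and whether its data is fresh. On
    each poll the register shifts left by one bit, and the low-order
    bit is set when a valid response arrives, so the register records
    the outcome of the last 8 polls. If the register contains any
    nonzero bit, the server is considered reachable; otherwise, it is
    unreachable.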
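Here is a simplified Python model of the 8-bit reach register. Each poll shifts the value left by one bit, and the low-order bit is set only if that poll was answered. The model merges the "shift on send" and "set on receive" steps into one function, so it sketches the behaviour rather than reproducing ntpd's code. `update_reach` and `is_reachable` are hypothetical names.

```python
REACH_MASK = 0xFF  # the register keeps only the last 8 poll outcomes


def update_reach(reach: int, answered: bool) -> int:
    """Record one poll: shift left, then set the low bit if it was answered."""
    reach = (reach << 1) & REACH_MASK
    if answered:
        reach |= 1
    return reach


def is_reachable(reach: int) -> bool:
    """Any nonzero bit means at least one of the last 8 polls succeeded."""
    return reach != 0


if __name__ == "__main__":
    reach = 0o377  # the last 8 polls were all answered
    for missed in range(1, 9):
        reach = update_reach(reach, answered=False)
        print(f"missed {missed}: reach={reach:03o} reachable={is_reachable(reach)}")
```

The server is still reported reachable after seven consecutive missed polls. Only the eighth clears the last set bit, which is why detection time scales with eight poll intervals.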

    Once polling has stabilized, the poll interval can reach 1024
    seconds. If the server then becomes unreachable, it can take up to
    8 × 1024 s = 8192 s (about 2 hours and 16 minutes) for every bit
    of the register to clear and the server to be recognized as
    unreachable. As a result, the alarm is not raised or cleared until
    a couple of hours later.
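The worst-case detection time is eight consecutive missed polls at the maxpoll interval. The Python sketch below checks that arithmetic. It assumes polling has already settled at maxpoll and that the server goes silent right after a successful poll. `detection_bound` and `hms` are hypothetical helpers.

```python
REACH_BITS = 8  # width of the reach shift register


def detection_bound(maxpoll: int) -> int:
    """Seconds until reach drops to zero when every poll is 2**maxpoll apart."""
    return REACH_BITS * 2 ** maxpoll


def hms(seconds: int) -> str:
    """Format a duration in seconds as hours, minutes and seconds."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes}m {secs}s"


if __name__ == "__main__":
    for maxpoll in (10, 7):
        bound = detection_bound(maxpoll)
        print(maxpoll, bound, hms(bound))
    # → 10 8192 2h 16m 32s
    # → 7 1024 0h 17m 4s
```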

    This fix sets maxpoll=7 (2^7 = 128 s) to shorten the interval, so
    alarms are raised and cleared in a more timely manner. Unreachable
    detection takes about 8 × 128 s = 1024 s (about 17 minutes).
    Because ntpq.py runs every 5 minutes, raising or clearing an ntp
    alarm may take up to about 25 minutes.
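The following Python sketch shows how the 5-minute ntpq.py run adds to detection time. It brute-forces every whole-second phase of the poll schedule and the check schedule. This is a simplified model with several assumptions:

- polling is already at maxpoll;
- the server is lost at t = 0;
- an alarm check sees reach == 0 as soon as the eighth missed poll has happened;
- no other delays are counted.

`alarm_latency` and `worst_case_latency` are hypothetical helpers, not part of ntpq.py.

```python
REACH_BITS = 8  # width of the reach shift register


def alarm_latency(poll: int, check_period: int, poll_phase: int, check_phase: int) -> int:
    """Seconds from losing the server (t = 0) to the first check that sees reach == 0.

    poll_phase:  time of the first missed poll, 1..poll
    check_phase: time of the first alarm check, 0..check_period-1
    """
    eighth_miss = poll_phase + (REACH_BITS - 1) * poll
    # Whole check periods needed to reach or pass the eighth miss
    # (integer ceiling division).
    periods = max(0, -(-(eighth_miss - check_phase) // check_period))
    return check_phase + periods * check_period


def worst_case_latency(maxpoll: int, check_period: int = 300) -> int:
    """Maximum latency over every whole-second alignment of the two schedules."""
    poll = 2 ** maxpoll
    return max(
        alarm_latency(poll, check_period, p, c)
        for p in range(1, poll + 1)
        for c in range(check_period)
    )


if __name__ == "__main__":
    worst = worst_case_latency(7)
    print(worst, divmod(worst, 60))  # → 1323 (22, 3)
```

Under these assumptions the bound is about 22 minutes, which is within the "up to about 25 minutes" estimate above. Calling `worst_case_latency(10)` gives the corresponding figure for the default maxpoll.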

    A maxpoll of 7 was chosen as short enough for timely detection but
    not so short that it generates excessive polling traffic. (The
    allowable range is 4 (16 s) to 17 (36.4 h).)
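Below is a small Python sketch of the poll-exponent arithmetic, using the 4..17 range quoted above. An exponent n means a poll interval of 2^n seconds. `poll_interval_seconds` is a hypothetical helper for illustration.

```python
MIN_EXPONENT, MAX_EXPONENT = 4, 17  # allowable minpoll/maxpoll range


def poll_interval_seconds(exponent: int) -> int:
    """Convert a minpoll/maxpoll exponent to seconds, rejecting out-of-range values."""
    if not MIN_EXPONENT <= exponent <= MAX_EXPONENT:
        raise ValueError(
            f"poll exponent {exponent} outside {MIN_EXPONENT}..{MAX_EXPONENT}"
        )
    return 2 ** exponent


if __name__ == "__main__":
    for exponent in (4, 6, 7, 10, 17):
        seconds = poll_interval_seconds(exponent)
        print(f"{exponent:2d} -> {seconds:6d} s ({seconds / 3600:.1f} h)")
```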

Closes-bug: 2042093

    Test Plan:
    PASS: Build the package and ISO. Fresh installation succeeded.
    PASS: Configure an ntp server and confirm that the server/peer
          entry in /etc/ntp.conf has the "maxpoll 7" option.
    PASS: Set a reachable ntp server, wait more than 2 hours, and
          confirm that the poll interval does not exceed 128 s.
    PASS: Make a reachable ntp server unreachable. Confirm that the
          expected ntp alarms are raised in a reasonable time
          (up to about 30 minutes).
    PASS: Make an unreachable ntp server reachable. Confirm that the
          existing ntp alarms are cleared in a reasonable time
          (up to about 30 minutes).

Change-Id: I6ff6d754cf9b5526bb51d9277287dabd205bab1f
Signed-off-by: Takamasa Takenaka <email address hidden>
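The test-plan step that checks /etc/ntp.conf for "maxpoll 7" could be automated with a Python sketch like the one below. It lists server/peer lines that lack the expected maxpoll option. It assumes a simple ntp.conf layout: one directive per line, "#" comments, and options after the address. `entries_missing_maxpoll` and the sample configuration are hypothetical and are not part of stx-puppet.

```python
from typing import List


def entries_missing_maxpoll(conf_text: str, expected: int = 7) -> List[str]:
    """Return server/peer lines whose maxpoll option is absent or not `expected`."""
    missing = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        tokens = line.split()
        if not tokens or tokens[0] not in ("server", "peer"):
            continue
        value = None
        # tokens[1] is the address; options follow it.
        for i in range(2, len(tokens) - 1):
            if tokens[i] == "maxpoll":
                value = tokens[i + 1]
        if value != str(expected):
            missing.append(line)
    return missing


if __name__ == "__main__":
    sample_conf = """\
driftfile /var/lib/ntp/drift
server 192.0.2.10 maxpoll 7
peer 192.0.2.20 maxpoll 7
server 192.0.2.30 iburst   # no maxpoll yet
"""
    print(entries_missing_maxpoll(sample_conf))
    # → ['server 192.0.2.30 iburst']
```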

Changed in starlingx:
status:	In Progress → Fix Released

Ghada Khalil (gkhalil) on 2023-11-11

Changed in starlingx:
importance:	Undecided → Low
tags:	added: stx.9.0 stx.config
