Subcloud periodically loses NTP sync even with the NTP servers pingable

Bug #2017697 reported by Caio Bruchert
This bug affects 1 person
Affects     Status        Importance  Assigned to    Milestone
StarlingX   Fix Released  Medium      Caio Bruchert

Bug Description

Brief Description
-----------------
Lab nodes (subclouds) lose NTP sync. A node syncs fine at first but eventually loses sync, at which point ntpd writes a daemon log message saying it is deleting the vlan410 interface.

Context/Details:
-----------------
- Seen on a ZT Triton and a ZT Proteous, both losing NTP sync. The issue took about 2 days to surface.

Severity
-----------------
Major

Steps to Reproduce
-----------------
- Configure NTP and leave it running for about 2-3 days.
- ntpd eventually reports that it is deleting the OAM interface (vlan410):

2023-01-29T21:26:17.000 controller-0 ntpd[137389]: info Deleting interface #13 vlan410, 2607:f160:10:8219:ce:40a:0:f401#123, interface stats: received=477, sent=477, dropped=0, active_time=186141 secs
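The `active_time` counter in the deletion message above is in seconds. Converting it with a small helper (`split_duration` is a hypothetical name, not part of ntpd or StarlingX) shows the interface had been bound for roughly 2 days and 3.7 hours before ntpd first dropped it, which fits the "about 2 days" observation:

```python
from typing import Tuple


def split_duration(seconds: int) -> Tuple[int, int, int, int]:
    """Split a duration in seconds into (days, hours, minutes, seconds)."""
    days, rem = divmod(seconds, 86400)  # 86400 s per day
    hours, rem = divmod(rem, 3600)      # 3600 s per hour
    minutes, secs = divmod(rem, 60)
    return days, hours, minutes, secs


# active_time reported when vlan410 was first deleted (from the ntpd log line)
d, h, m, s = split_duration(186141)
print(f"{d} days {h:02d}:{m:02d}:{s:02d}")  # 2 days 03:42:21
```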

# The NTP reach count then drops to 1:
2023-01-29T21:29:18.212 controller-0 collectd[138921]: info NTPQ: 2607:f160:10:8077:ce:40a:0:2 .XFAC. 16 u - 1024 0 0.000 0.000 0.000
2023-01-29T21:29:18.212 controller-0 collectd[138921]: info NTPQ: 2607:f160:10:9200::a 10.139.252.25 2 u 1 64 1 1.409 -0.435 0.000
2023-01-29T21:29:18.216 controller-0 collectd[138921]: info NTPQ: 2607:f160:10:9200::b 10.139.252.25 2 u 1 64 1 1.317 1.299 0.000
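These rows follow the `ntpq -p` column order (remote, refid, st, t, when, poll, reach, delay, offset, jitter). `reach` is an 8-bit shift register shown in octal: every poll shifts it left and sets the low bit when a reply arrives. A value of `1` therefore means only the most recent of the last eight polls was answered, and `0` means none were. The sketch below decodes the logged rows under that reading; `Peer`, `parse_ntpq_line` and the other helpers are illustrative names, not part of the collectd NTP plugin.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    remote: str
    refid: str
    stratum: int
    poll: int   # poll interval in seconds
    reach: int  # reach register, decoded from its octal display


def parse_ntpq_line(line: str) -> Peer:
    """Parse one ntpq peer row, with or without the collectd log prefix."""
    # Keep only the text after "NTPQ: " when the row comes from the collectd log.
    row = line.split("NTPQ: ", 1)[-1]
    # Columns: remote refid st t when poll reach delay offset jitter
    f = row.split()
    return Peer(remote=f[0], refid=f[1], stratum=int(f[2]),
                poll=int(f[5]), reach=int(f[6], 8))


def polls_answered(reach: int) -> int:
    """How many of the last 8 polls got a reply (set bits in the register)."""
    return bin(reach & 0xFF).count("1")


def last_poll_ok(reach: int) -> bool:
    """The least significant bit records the most recent poll."""
    return bool(reach & 1)


# Rows trimmed from the collectd log excerpt above.
rows = [
    "info NTPQ: 2607:f160:10:8077:ce:40a:0:2 .XFAC. 16 u - 1024 0 0.000 0.000 0.000",
    "info NTPQ: 2607:f160:10:9200::a 10.139.252.25 2 u 1 64 1 1.409 -0.435 0.000",
    "info NTPQ: 2607:f160:10:9200::b 10.139.252.25 2 u 1 64 1 1.317 1.299 0.000",
]

for r in rows:
    p = parse_ntpq_line(r)
    print(p.remote, f"reach={p.reach:o}", f"answered={polls_answered(p.reach)}/8")
```

For comparison, a server that answered all of the last eight polls shows `377` (all eight bits set).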

# The NTP servers are then declared unreachable:
2023-01-29T21:29:18.222 controller-0 collectd[138921]: info NTP query plugin raised alarm 100.114:host=controller-0=2607:f160:10:9200::b
2023-01-29T21:29:18.223 controller-0 collectd[138921]: info NTP query plugin added '2607:f160:10:9200::b' to unreachable servers list: ['2607:f160:10:9200::a', '2607:f160:10:9200::b']

# From then until the collect was taken, ntpd re-adds vlan410 and deletes it again about every 30 seconds:
2023-01-30T12:38:44.000 controller-0 ntpd[137389]: info Listen normally on 22960 vlan410 2607:f160:10:8219:ce:40a:0:f401 UDP 123
2023-01-30T12:39:15.000 controller-0 ntpd[137389]: info Deleting interface #22960 vlan410, 2607:f160:10:8219:ce:40a:0:f401#123, interface stats: received=2, sent=2, dropped=0, active_time=31 secs
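The churn can be measured directly from the daemon log by pairing each "Listen normally on" line with the matching "Deleting interface #<id>" line. A minimal sketch, assuming the ntpd message formats shown above; `interface_lifetimes` is a hypothetical helper, not a StarlingX or ntpd tool:

```python
import re
from datetime import datetime

TS_FMT = "%Y-%m-%dT%H:%M:%S.%f"
TS = r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+)"
# "<ts> <host> ntpd[<pid>]: info Listen normally on <id> <ifname> <addr> UDP 123"
LISTEN_RE = re.compile(TS + r" \S+ ntpd\[\d+\]: info Listen normally on (\d+) (\S+) ")
# "<ts> <host> ntpd[<pid>]: info Deleting interface #<id> <ifname>, ... active_time=<n> secs"
DELETE_RE = re.compile(
    TS + r" \S+ ntpd\[\d+\]: info Deleting interface #(\d+) ([^,]+),.*active_time=(\d+) secs"
)


def interface_lifetimes(lines, ifname):
    """Yield (interface id, seconds between Listen and Deleting, ntpd active_time)."""
    opened = {}  # interface id -> time ntpd started listening on it
    for line in lines:
        m = LISTEN_RE.search(line)
        if m and m.group(3) == ifname:
            opened[m.group(2)] = datetime.strptime(m.group(1), TS_FMT)
            continue
        m = DELETE_RE.search(line)
        # Deletions without a matching Listen line in the input are skipped.
        if m and m.group(3) == ifname and m.group(2) in opened:
            deleted = datetime.strptime(m.group(1), TS_FMT)
            lifetime = (deleted - opened.pop(m.group(2))).total_seconds()
            yield m.group(2), lifetime, int(m.group(4))


log = [
    "2023-01-30T12:38:44.000 controller-0 ntpd[137389]: info Listen normally on 22960 vlan410 "
    "2607:f160:10:8219:ce:40a:0:f401 UDP 123",
    "2023-01-30T12:39:15.000 controller-0 ntpd[137389]: info Deleting interface #22960 vlan410, "
    "2607:f160:10:8219:ce:40a:0:f401#123, interface stats: received=2, sent=2, dropped=0, "
    "active_time=31 secs",
]

for iface_id, lifetime, active in interface_lifetimes(log, "vlan410"):
    print(iface_id, lifetime, active)  # 22960 31.0 31
```

The wall-clock gap between the two log lines agrees with the `active_time` that ntpd itself reports, so either value can be used to track how often the interface is being recycled.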

--------------------------------------------------------------------
Mon Jan 30 15:19:04 UTC 2023 : : fm alarm-list
--------------------------------------------------------------------
+----------+----------------------------------------------------------------------------+----------------------------------------+----------+----------------------------+
| Alarm ID | Reason Text                                                                | Entity ID                              | Severity | Time Stamp                 |
+----------+----------------------------------------------------------------------------+----------------------------------------+----------+----------------------------+
| 100.114  | NTP configuration does not contain any valid or reachable NTP servers.     | host=controller-0                      | major    | 2023-01-29T21:29:18.222937 |
| 100.114  | NTP address 2607:f160:10:9200::b is not a valid or a reachable NTP server. | host=controller-0=2607:f160:10:9200::b | minor    | 2023-01-29T21:29:18.216934 |
| 100.114  | NTP address 2607:f160:10:9200::a is not a valid or a reachable NTP server. | host=controller-0=2607:f160:10:9200::a | minor    | 2023-01-29T21:29:18.212782 |
+----------+----------------------------------------------------------------------------+----------------------------------------+----------+----------------------------+

Key logs:
fm-event
daemon
kernel
alarms.info

System Configuration
-----------------
CentOS build

Subcloud
AIO-SX
458: vlan410@ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1380 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether b4:96:91:b5:89:64 brd ff:ff:ff:ff:ff:ff

--------------------------------------------------------------------
Mon Jan 30 15:19:10 UTC 2023 : : ethtool -i ens1f0
--------------------------------------------------------------------
driver: ice
version: 1.5.8
firmware-version: 2.54 0x8000cf15 1.2960.0
expansion-rom-version:
bus-info: 0000:18:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
Load info (eg: 2022-03-10_20-00-07)

Last Pass
---------
Did this test scenario pass previously? If so, please indicate the load/pull time info of the last pass.

Use this section to also indicate if this is a new test scenario.

Timestamp/Logs
--------------
See above

Workaround
----------
No workaround

Caio Bruchert (cbrucher)
Changed in starlingx:
assignee: nobody → Caio Bruchert (cbrucher)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/881511

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.9.0 stx.networking
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/881511
Committed: https://opendev.org/starlingx/stx-puppet/commit/e2b54be8e40a30959ac2faf90291ffc290a3a1fb
Submitter: "Zuul (22348)"
Branch: master

commit e2b54be8e40a30959ac2faf90291ffc290a3a1fb
Author: Caio Bruchert <email address hidden>
Date: Tue Apr 25 17:19:01 2023 -0300

    Fix ntpd losing sync after some days

    The default ntpd configuration enables network interfaces scanning and
    this is causing ntpd to lose sync after about 2 days and 9 to 10 hours.

    This fix disables ntpd interface scanning by adding the -U 0 option.

    Note: this was detected on CentOS, but the same option is added on both
    CentOS and Debian to maintain consistency.

    Test Plan:
    PASSED: Debian: check that ntpd -U 0 configuration is applied
    PASSED: Debian: wait for more than 5 days and check that ntp sync is still working

    Closes-Bug: 2017697

    Change-Id: I1c2727b71d71bf03966c834c470bd225e2a95c81
    Signed-off-by: Caio Bruchert <email address hidden>
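The fix relies on ntpd's `-U` option (long form `--updateinterval`), which sets the interval in seconds between scans for new or dropped interfaces; `0` disables periodic scanning. The test plan checks that `-U 0` is applied. A minimal sketch of such a check, assuming the ntpd command line is already available as a string (for example from `ps -o args= -C ntpd`); `update_interval` is a hypothetical helper and the sample command lines are illustrative, not the exact options StarlingX configures:

```python
import shlex
from typing import Optional


def update_interval(cmdline: str) -> Optional[int]:
    """Return the interface update interval passed to ntpd, or None if unset.

    Handles "-U N", "-UN", "--updateinterval N" and "--updateinterval=N".
    Combined short flags such as "-gU0" are not handled in this sketch.
    """
    args = shlex.split(cmdline)
    value = None
    i = 0
    while i < len(args):
        arg = args[i]
        if arg in ("-U", "--updateinterval") and i + 1 < len(args):
            value = int(args[i + 1])
            i += 2
            continue
        if arg.startswith("--updateinterval="):
            value = int(arg.split("=", 1)[1])
        elif arg.startswith("-U") and len(arg) > 2:
            value = int(arg[2:])
        i += 1
    # This sketch lets the last occurrence win if the option is repeated.
    return value


print(update_interval("/usr/sbin/ntpd -u ntp:ntp -g -U 0"))  # 0
print(update_interval("/usr/sbin/ntpd -u ntp:ntp -g"))       # None
```

A result of `0` means interface scanning is off; `None` means ntpd falls back to its built-in default interval.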

Changed in starlingx:
status: In Progress → Fix Released