NTP 100.14 alarm is not cleared

Bug #1812440 reported by Peng Peng
This bug affects 1 person
Affects    Status        Importance  Assigned to     Milestone
StarlingX  Fix Released  Medium      Eric MacDonald

Bug Description

Brief Description
-----------------
Alarm 100.114 was raised during a regression test suite run and was never cleared.

Severity
--------
Major

Steps to Reproduce
------------------

Expected Behaviour
------------------
All unexpected alarms should be cleared.

Actual Behaviour
----------------
100.114 alarm not cleared

Reproducibility
---------------
Reproducible
rate: unknown

System Configuration
--------------------
Two node system
Low latency

Branch/Pull Time/Commit
-----------------------
master as of 2019-01-15_20-18-00

Timestamp/Logs
--------------
[2019-01-17 12:16:19,350] 262 DEBUG MainThread ssh.send :: Send 'sudo ntpq -pn'
[2019-01-17 12:16:19,476] 387 DEBUG MainThread ssh.expect :: Output:
     remote refid st t when poll reach delay offset jitter
==============================================================================
+192.168.204.4 144.217.65.183 3 u 12 64 377 0.111 1.903 0.156
+54.39.20.247 213.251.128.249 2 u 48 128 377 14.397 -0.533 0.201
*206.108.0.131 .PPS. 1 u 114 128 377 7.141 0.371 0.312
+149.56.121.16 213.251.128.249 2 u 51 128 377 14.227 -0.095 0.220
controller-0:~$
[2019-01-17 12:16:19,476] 262 DEBUG MainThread ssh.send :: Send 'echo $?'
[2019-01-17 12:16:19,580] 387 DEBUG MainThread ssh.expect :: Output:
0
controller-0:~$
[2019-01-17 12:16:19,580] 4006 INFO MainThread host_helper.wait_for_ntp_sync:: NTPQ status: controller-0 NTPQ is in healthy state; NTP alarms: ['100.114::::host=controller-0.ntp=206.108.0.131']
[2019-01-17 12:16:49,593] 423 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2019-01-17 12:16:49,594] 262 DEBUG MainThread ssh.send :: Send 'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid'
[2019-01-17 12:16:51,152] 387 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+----------+---------------------------------------------------------------------+-------------------------------------+----------+----------------------------+
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+--------------------------------------+----------+---------------------------------------------------------------------+-------------------------------------+----------+----------------------------+
| 92156e60-c4c4-4931-b82c-d80b1887fd70 | 100.114 | NTP address 206.108.0.131 is not a valid or a reachable NTP server. | host=controller-0.ntp=206.108.0.131 | minor | 2019-01-17T11:36:44.274685 |
+--------------------------------------+----------+---------------------------------------------------------------------+-------------------------------------+----------+----------------------------+
controller-0:~$

controller-1:~$ sudo ntpq -pn
Password:
     remote refid st t when poll reach delay offset jitter
==============================================================================
+192.168.204.3 206.108.0.131 2 u 843 1024 333 0.077 -0.914 0.494
*144.217.65.183 152.2.133.52 2 u 827 1024 377 14.563 1.353 0.404
+144.217.181.221 5.56.147.93 2 u 825 1024 377 14.530 -0.798 1.446
+159.203.8.72 192.5.41.209 2 u 288 1024 377 7.659 -2.216 2.103
controller-1:~$
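
The controller-0 output above contradicts the alarm: 206.108.0.131 carries the `*` tally (the current system peer) and a reach value of 377 (octal, i.e. binary 11111111: the last eight polls all got a reply), yet the 100.114 alarm claiming that server is not reachable is still raised. The Python sketch below makes that cross-check explicit. It is illustrative only: `parse_ntpq`, `server_from_entity` and `is_alarm_stale` are hypothetical helpers, the entity ID format is copied from the alarm-list output above, and treating the `*`, `+` and `o` tallies as "in use" is an assumption, not the NTP plugin's actual rule.

```python
"""Cross-check an NTP alarm entity against `ntpq -pn` output (sketch)."""

from dataclasses import dataclass
from typing import Dict, Optional

# Tally codes treated as "in use" here (assumption): '*' system peer,
# '+' candidate, 'o' PPS peer.
IN_USE_TALLIES = {"*", "+", "o"}


@dataclass
class Peer:
    tally: str   # first character of the peer line, e.g. '*' or '+'
    remote: str  # server address from the 'remote' column
    reach: int   # 8-bit reachability register (ntpq prints it in octal)


def parse_ntpq(output: str) -> Dict[str, Peer]:
    """Return peers from `ntpq -pn` text, keyed by remote address."""
    peers: Dict[str, Peer] = {}
    for line in output.splitlines():
        stripped = line.strip()
        # Skip blank lines, the column header and the '=====' separator.
        if not stripped or stripped.startswith(("remote", "=")):
            continue
        fields = line[1:].split()
        if len(fields) != 10:
            continue  # not a peer line (e.g. a shell prompt)
        remote = fields[0]
        peers[remote] = Peer(tally=line[0], remote=remote, reach=int(fields[6], 8))
    return peers


def server_from_entity(entity_id: str) -> Optional[str]:
    """Extract the server from an entity ID such as
    'host=controller-0.ntp=206.108.0.131' (format taken from the log)."""
    marker = ".ntp="
    if marker not in entity_id:
        return None
    return entity_id.split(marker, 1)[1]


def is_alarm_stale(entity_id: str, ntpq_output: str) -> bool:
    """True if ntpq reports the alarmed server as reachable and in use."""
    server = server_from_entity(entity_id)
    if server is None:
        return False
    peer = parse_ntpq(ntpq_output).get(server)
    return peer is not None and peer.reach != 0 and peer.tally in IN_USE_TALLIES


# Peer lines copied from the controller-0 log above.
NTPQ_CONTROLLER_0 = """\
     remote refid st t when poll reach delay offset jitter
==============================================================================
+192.168.204.4 144.217.65.183 3 u 12 64 377 0.111 1.903 0.156
+54.39.20.247 213.251.128.249 2 u 48 128 377 14.397 -0.533 0.201
*206.108.0.131 .PPS. 1 u 114 128 377 7.141 0.371 0.312
+149.56.121.16 213.251.128.249 2 u 51 128 377 14.227 -0.095 0.220
"""

if __name__ == "__main__":
    entity = "host=controller-0.ntp=206.108.0.131"
    print(is_alarm_stale(entity, NTPQ_CONTROLLER_0))  # True
```

A zero reach register would mean no reply in the last eight polls, which is the condition the alarm text describes; here every peer shows 377.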

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-integ (master)

Reviewed: https://review.openstack.org/631888
Committed: https://git.openstack.org/cgit/openstack/stx-integ/commit/?id=abaff6b27525aaa91df53319f84004640f75e6a3
Submitter: Zuul
Branch: master

commit abaff6b27525aaa91df53319f84004640f75e6a3
Author: Eric MacDonald <email address hidden>
Date: Fri Jan 18 16:29:56 2019 -0500

    Remove alarm query before clear in NTP plugin

    Issue titled 'NTP 100.14 alarm is not cleared' exposed
    an issue where the NTP plugin alarm clear operation is
    circumvented when its precursor fm_api.get_fault call
    returns None if the fm process is not running.
    From the caller's point of view the None return suggests
    that the alarm to be cleared does not exist, so the code
    skips the call to clear.

    This update works around this by simply issuing the
    clear without the query.

    Change-Id: Idcc05bb0e7e1aa1082af1e8ecdcb1a5463b19440
    Closes-Bug: 1812440
    Signed-off-by: Eric MacDonald <email address hidden>
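
The commit message describes a query-then-clear pattern that silently skips the clear whenever `fm_api.get_fault` returns None. The Python sketch below contrasts that pattern with the unconditional clear the fix adopts. It is a simulation under stated assumptions: `FaultApiStub`, `clear_with_query` and `clear_unconditionally` are hypothetical names, `get_fault` is the only call named in the commit message (`clear_fault` is an assumed name for the clear call, and neither mirrors the real fm_api signatures), and the stub does not model what FM does with a clear sent while the fm process is down.

```python
"""Query-before-clear versus unconditional clear (illustrative sketch)."""

from typing import Dict, Optional, Tuple

AlarmKey = Tuple[str, str]  # (alarm_id, entity_instance_id)


class FaultApiStub:
    """Hypothetical in-memory stand-in for the FM client.

    `query_unavailable` simulates the window described in the commit
    message, where get_fault returns None even though the alarm exists.
    Delivery of a clear while fm is down is NOT modelled; here the
    clear always lands.
    """

    def __init__(self) -> None:
        self.alarms: Dict[AlarmKey, str] = {}  # key -> severity
        self.query_unavailable = False

    def get_fault(self, alarm_id: str, entity_id: str) -> Optional[str]:
        if self.query_unavailable:
            return None  # indistinguishable from "no such alarm"
        return self.alarms.get((alarm_id, entity_id))

    def clear_fault(self, alarm_id: str, entity_id: str) -> bool:
        return self.alarms.pop((alarm_id, entity_id), None) is not None


def clear_with_query(api: FaultApiStub, alarm_id: str, entity_id: str) -> None:
    """Pre-fix pattern: clear only if the query finds the alarm."""
    if api.get_fault(alarm_id, entity_id) is not None:
        api.clear_fault(alarm_id, entity_id)
    # A None result is taken to mean "already clear", so no clear is sent.


def clear_unconditionally(api: FaultApiStub, alarm_id: str, entity_id: str) -> None:
    """Post-fix pattern: always issue the clear."""
    api.clear_fault(alarm_id, entity_id)


if __name__ == "__main__":
    key = ("100.114", "host=controller-0.ntp=206.108.0.131")
    for clear in (clear_with_query, clear_unconditionally):
        api = FaultApiStub()
        api.alarms[key] = "minor"
        api.query_unavailable = True  # fm not answering queries this cycle
        clear(api, *key)
        print(clear.__name__, "still raised:", key in api.alarms)
    # clear_with_query still raised: True
    # clear_unconditionally still raised: False
```

The trade-off is that the plugin may now send a clear for an alarm that was never raised; the sketch assumes such a redundant clear is harmless.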

Changed in starlingx:
status: New → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-integ (f/centos76)

Fix proposed to branch: f/centos76
Review: https://review.openstack.org/632506

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-integ (f/centos76)

Reviewed: https://review.openstack.org/632506
Committed: https://git.openstack.org/cgit/openstack/stx-integ/commit/?id=7b3586b5f84bc0bc2eda34bd74bab50cb3e7bcc6
Submitter: Zuul
Branch: f/centos76

commit abaff6b27525aaa91df53319f84004640f75e6a3
Author: Eric MacDonald <email address hidden>
Date: Fri Jan 18 16:29:56 2019 -0500

    Remove alarm query before clear in NTP plugin

commit 7bb43963d30e77eae84873f497188e4018c21b74
Author: Jerry Sun <email address hidden>
Date: Tue Jan 15 09:47:33 2019 -0500

    Build registry-token-server without dep

    This change reworks the registry-token-server package spec with
    go dependencies downloaded at mirror-download time, rather than
    at build time. The dependencies (at fixed revisions) are
    extracted into the package's build tree for compilation.

    Story: 2002840
    Task: 22783
    Depends-On: https://review.openstack.org/#/c/631001/
    Change-Id: Ib7d745c6469beacf029195c3e6eaa4935f398483
    Signed-off-by: Jerry Sun <email address hidden>
    Signed-off-by: Jason McKenna <email address hidden>

commit 6db8e31b21b271f827f3c9cabf0f0558e8ca6b58
Author: Ovidiu Poncea <email address hidden>
Date: Thu Dec 20 09:10:00 2018 -0500

    Add StarlingX specific restart command for Ceph monitors

    Since we don't use systemd to manage Ceph and we have pmon monitoring we
    have to make sure that:
    1. Restarting is properly handled as "systemctl restart" will return
       error and manifest will fail;
    2. Pmon does not check ceph-mon status during restart. Otherwise we risk
       getting into a race condition between the puppet restart and pmon
       detecting that ceph is down and trying a restart.

    Both are resolved when using /etc/init.d/ceph-init-wrapper restart.

    Change-Id: Ie316bb611a006bbbc92ac22c52c3973cc9f15109
    Co-Authored-By: Ovidiu Poncea <email address hidden>
    Implements: containerization-2002844-CEPH-persistent-storage-backend-for-Kubernetes
    Story: 2002844
    Task: 28723
    Signed-off-by: Ovidiu Poncea <email address hidden>

commit d2a4c3d012d7863221ae059cc9cb7035fcdfcfb4
Author: Angie Wang <email address hidden>
Date: Mon Jan 14 14:53:10 2019 -0500

    Helm repository replication

    This updates the helm-upload to stop syncing charts to standby
    controller as charts are changed to store in drbd fs.

    Story: 2004520
    Task: 28343
    Depends-On: https://review.openstack.org/#/c/630763/
    Change-Id: I12f17fae6124650d878ba7a560f94b7a8ed36e56
   ...

tags: added: in-f-centos76
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-integ (f/stein)

Reviewed: https://review.openstack.org/632811
Committed: https://git.openstack.org/cgit/openstack/stx-integ/commit/?id=d6a1fd98d66811d2336efc640e8fb909b28ab19a
Submitter: Zuul
Branch: f/stein

commit ed8655fa77e774a442f5af565c749d03a6716f33
Author: Wei Zhou <email address hidden>
Date: Mon Jan 21 16:39:07 2019 -0500

    Move STX specific files from stx-ceph to stx-integ

    By moving STX specific files from stx-ceph to stx-integ, we
    decouple STX code from the upstream ceph repo. When making
    changes in those STX files, we don't need to make "pull
    request" in stx-ceph repo any more.

    Change-Id: Ifaaae452798561ddfa7557cf59b072535bec7687
    Story: 2002844
    Task: 28993
    Signed-off-by: Wei Zhou <email address hidden>

commit abaff6b27525aaa91df53319f84004640f75e6a3
Author: Eric MacDonald <email address hidden>
Date: Fri Jan 18 16:29:56 2019 -0500

    Remove alarm query before clear in NTP plugin

commit 7bb43963d30e77eae84873f497188e4018c21b74
Author: Jerry Sun <email address hidden>
Date: Tue Jan 15 09:47:33 2019 -0500

    Build registry-token-server without dep

commit 6db8e31b21b271f827f3c9cabf0f0558e8ca6b58
Author: Ovidiu Poncea <email address hidden>
Date: Thu Dec 20 09:10:00 2018 -0500

    Add StarlingX specific restart command for Ceph monitors

tags: added: in-f-stein
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Eric MacDonald (rocksolidmtce)
importance: Undecided → Medium
tags: added: stx.2019.05 stx.metal
Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05