Distributed cloud - dcorch.log showing error logs for ntp and ptp

Bug #1857068 reported by Gerry Kopec
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Gerry Kopec

Bug Description

Brief Description
-----------------
When running a distributed cloud system, frequent ptp and ntp error logs appear in dcorch.log during the sync audit. The logs are "tuple index out of range: IndexError: tuple index out of range", "get_ptp_resources error enabled" and "get_ntp_resources error enabled". They occur multiple times for each subcloud during every audit cycle.

Further investigation of the code indicates that the ptp and ntp coexistence data schema changes (https://review.opendev.org/#/c/682121/) were not propagated to the dcorch code, and this is causing the exceptions. As a result, the DC audit is probably not working properly for ntp and ptp.

Severity
--------
Major

Steps to Reproduce
------------------
Set up a DC system with subclouds and observe /var/log/dcorch/dcorch.log.

Expected Behavior
------------------
Expect clean audit run without errors.

Actual Behavior
----------------
Error logs for ptp and ntp appear in dcorch.log for each subcloud on every audit cycle (see Timestamp/Logs).

Reproducibility
---------------
Reproducible

System Configuration
--------------------
All-in-one duplex plus worker, DC system controller with 10 subclouds

Branch/Pull Time/Commit
-----------------------
2019-12-09_20-00-00

Last Pass
---------
Unknown

Timestamp/Logs
--------------
2019-12-16 22:21:31.031 1249743 INFO dcorch.engine.sync_thread [-] subcloud10/platform: Audit intp: [None] vs [None]
2019-12-16 22:21:31.032 1249743 ERROR dcorch.engine.sync_thread [-] tuple index out of range: IndexError: tuple index out of range
2019-12-16 22:21:31.032 1249743 ERROR dcorch.engine.sync_thread Traceback (most recent call last):
2019-12-16 22:21:31.032 1249743 ERROR dcorch.engine.sync_thread File "/usr/lib/python2.7/site-packages/dcorch/engine/sync_thread.py", line 474, in sync_audit
2019-12-16 22:21:31.032 1249743 ERROR dcorch.engine.sync_thread abort_resources)
2019-12-16 22:21:31.032 1249743 ERROR dcorch.engine.sync_thread File "/usr/lib/python2.7/site-packages/dcorch/engine/sync_thread.py", line 509, in audit_find_missing
2019-12-16 22:21:31.032 1249743 ERROR dcorch.engine.sync_thread master_id = self.get_resource_id(resource_type, m_r)
2019-12-16 22:21:31.032 1249743 ERROR dcorch.engine.sync_thread File "/usr/lib/python2.7/site-packages/dcorch/engine/sync_services/sysinv.py", line 1059, in get_resource_id
2019-12-16 22:21:31.032 1249743 ERROR dcorch.engine.sync_thread resource_type))
2019-12-16 22:21:31.032 1249743 ERROR dcorch.engine.sync_thread IndexError: tuple index out of range
2019-12-16 22:21:31.032 1249743 ERROR dcorch.engine.sync_thread
...
2019-12-16 22:21:34.747 1249743 INFO dcorch.engine.sync_services.sysinv [-] subcloud5/platform: get_ptp_resources error enabled
...
2019-12-16 22:21:37.863 1249743 INFO dcorch.engine.sync_thread [-] subcloud6/platform: Audit ptp: [None] vs [None]
2019-12-16 22:21:37.863 1249743 ERROR dcorch.engine.sync_thread [-] tuple index out of range: IndexError: tuple index out of range
2019-12-16 22:21:37.863 1249743 ERROR dcorch.engine.sync_thread Traceback (most recent call last):
2019-12-16 22:21:37.863 1249743 ERROR dcorch.engine.sync_thread File "/usr/lib/python2.7/site-packages/dcorch/engine/sync_thread.py", line 474, in sync_audit
2019-12-16 22:21:37.863 1249743 ERROR dcorch.engine.sync_thread abort_resources)
2019-12-16 22:21:37.863 1249743 ERROR dcorch.engine.sync_thread File "/usr/lib/python2.7/site-packages/dcorch/engine/sync_thread.py", line 509, in audit_find_missing
2019-12-16 22:21:37.863 1249743 ERROR dcorch.engine.sync_thread master_id = self.get_resource_id(resource_type, m_r)
2019-12-16 22:21:37.863 1249743 ERROR dcorch.engine.sync_thread File "/usr/lib/python2.7/site-packages/dcorch/engine/sync_services/sysinv.py", line 1059, in get_resource_id
2019-12-16 22:21:37.863 1249743 ERROR dcorch.engine.sync_thread resource_type))
2019-12-16 22:21:37.863 1249743 ERROR dcorch.engine.sync_thread IndexError: tuple index out of range
2019-12-16 22:21:37.863 1249743 ERROR dcorch.engine.sync_thread
...
2019-12-16 22:21:38.633 1249743 INFO dcorch.engine.sync_services.sysinv [-] subcloud5/platform: get_ptp_resources error enabled

Test Activity
-------------
DC engineering

Workaround
----------
n/a

Gerry Kopec (gerry-kopec) wrote :

Looking at the code, some problem areas are:
dcorch/drivers/openstack/sysinv_v1.py:
- get_ptp() -- ptp.enabled no longer exists
        LOG.debug("get_ptp uuid=%s enabled=%s mode=%s "
                  "transport=%s mechanism=%s" %
                  (ptp.uuid, ptp.enabled, ptp.mode,
                   ptp.transport, ptp.mechanism))
- get_ntp() -- intp.enabled no longer exists
        LOG.debug("get_ntp uuid=%s enabled=%s ntpservers=%s" %
                  (intp.uuid, intp.enabled, intp.ntpservers))

It appears the above exceptions are caught, which generates the "error enabled" logs, but ptp/ntp is then set to None, and that triggers the further problems listed below.
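
A minimal sketch of how a removed attribute could produce the "error enabled" log text. It assumes the client resource object raises AttributeError carrying only the attribute name, and that the caller logs the exception with its message appended. FakeResource and this get_ptp_resources are hypothetical stand-ins, not the dcorch or client code.

```python
class FakeResource(object):
    """Hypothetical stand-in for a client resource object."""

    def __init__(self, info):
        self._info = info

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self._info[name]
        except KeyError:
            # Assumption: the exception message is just the attribute name.
            raise AttributeError(name)


def get_ptp_resources(ptp):
    """Hypothetical caller: returns (resource, log_message)."""
    try:
        summary = "get_ptp uuid=%s enabled=%s mode=%s" % (
            ptp.uuid, ptp.enabled, ptp.mode)
        return ptp, summary
    except Exception as e:
        # The exception is swallowed and the resource becomes None.
        return None, "get_ptp_resources error {}".format(e)


# A resource in the newer schema, without the 'enabled' field.
new_schema_ptp = FakeResource({"uuid": "1234", "mode": "hardware"})
resource, message = get_ptp_resources(new_schema_ptp)
print(message)
```

Under these assumptions the resource comes back as None, which matches the "[None] vs [None]" audit lines in the logs.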

dcorch/engine/sync_services/sysinv.py:
- get_resource_id() - the "NO uuid" log format string expects 2 parameters but is only given one, so it throws an exception (the IndexError in the traceback above)
                LOG.info("get_resource_id {} NO uuid resource_type={}".format(
                    resource_type))

- get_resource_info() - resource=None is not handled, so resource._info will throw an exception
        if resource_type in payload_resources:
            if 'payload' not in resource._info:
                dumps = jsonutils.dumps({"payload": resource._info})
            else:
                dumps = jsonutils.dumps(resource._info)
            LOG.info("get_resource_info resource_type={} dumps={}".format(
                resource_type, dumps),
                extra=self.log_extra)
            return dumps
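
The two sysinv.py problems above can be reproduced and guarded against in isolation. This is only a sketch: the function names mirror the excerpts, but the bodies are simplified, the standard json module stands in for jsonutils, and the choice of first format argument and of returning None are assumptions. The fix that was merged for this bug removed ptp and ntp from the audit instead.

```python
import json


def broken_message(resource_type):
    # Two "{}" placeholders but one argument: str.format raises IndexError.
    return "get_resource_id {} NO uuid resource_type={}".format(
        resource_type)


def fixed_message(resource, resource_type):
    # Assumption: the first placeholder was meant to show the resource.
    return "get_resource_id {} NO uuid resource_type={}".format(
        resource, resource_type)


def get_resource_info(resource_type, resource):
    # Guard against resource=None before touching resource._info.
    if resource is None:
        return None  # hypothetical choice; a caller could also skip the audit
    info = resource._info
    if "payload" not in info:
        info = {"payload": info}
    return json.dumps(info)


class Stub(object):
    """Hypothetical resource carrying only an _info dict."""

    def __init__(self, info):
        self._info = info


try:
    broken_message("ptp")
    raised = False
except IndexError:
    raised = True

print(raised)
print(fixed_message(None, "ptp"))
print(get_resource_info("ptp", None))
print(get_resource_info("ptp", Stub({"uuid": "1"})))
```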

summary: - Distributed cloud - dcorch showing error logs for ntp and ptp
+ Distributed cloud - dcorch.log showing error logs for ntp and ptp
Ghada Khalil (gkhalil) wrote :

Marking as stx.4.0 / medium priority - should be fixed in master. No plan to cherrypick to r/stx.3.0 as the Distributed Cloud testing did not report any serious issues as a result of this.

tags: added: stx.4.0 stx.distcloud
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Gerry Kopec (gerry-kopec)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to distcloud (master)

Fix proposed to branch: master
Review: https://review.opendev.org/704276

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to distcloud (master)

Reviewed: https://review.opendev.org/704276
Committed: https://git.openstack.org/cgit/starlingx/distcloud/commit/?id=17fe724edbda7d321c9cb65b5b4527cb08949f10
Submitter: Zuul
Branch: master

commit 17fe724edbda7d321c9cb65b5b4527cb08949f10
Author: Gerry Kopec <email address hidden>
Date: Fri Jan 24 20:56:40 2020 -0500

    Remove ptp and ntp from dcorch platform sync audit

    Remove ptp and ntp from sync audit as it is not required that they are
    in sync with the system controller. This reduces overhead of the audit
    as number of subclouds are increased. It also appears that these
    components were not working since change to remove system wide enabled
    status for ptp and ntp by https://review.opendev.org/#/c/682121/

    Change-Id: Iafcc351a3f001fa259f54184ad023d2a85701289
    Closes-Bug: 1857068
    Signed-off-by: Gerry Kopec <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to distcloud (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/705841

OpenStack Infra (hudson-openstack) wrote : Fix merged to distcloud (f/centos8)

Reviewed: https://review.opendev.org/705841
Committed: https://git.openstack.org/cgit/starlingx/distcloud/commit/?id=70f72b588a8202d459233729904ffd603603b8e6
Submitter: Zuul
Branch: f/centos8

commit 17fe724edbda7d321c9cb65b5b4527cb08949f10
Author: Gerry Kopec <email address hidden>
Date: Fri Jan 24 20:56:40 2020 -0500

    Remove ptp and ntp from dcorch platform sync audit

    Remove ptp and ntp from sync audit as it is not required that they are
    in sync with the system controller. This reduces overhead of the audit
    as number of subclouds are increased. It also appears that these
    components were not working since change to remove system wide enabled
    status for ptp and ntp by https://review.opendev.org/#/c/682121/

    Change-Id: Iafcc351a3f001fa259f54184ad023d2a85701289
    Closes-Bug: 1857068
    Signed-off-by: Gerry Kopec <email address hidden>

commit 05c20371e1df44b1aea4783228b8b3043540de44
Author: Tao Liu <email address hidden>
Date: Fri Jan 24 11:52:27 2020 -0500

    Import subprocess from eventlet.green package

    The python kubernetes client requires a newer version of eventlet.
    However, newer versions of eventlet have an issue with the subprocess
    module, which requires subprocess to be imported from eventlet.green
    instead of being imported directly. See:
    https://github.com/eventlet/eventlet/issues/413

    The eventlet has been upversioned to 0.24.1, therefore, the
    subprocess import is changed. This update also fixes the issue
    that raise e triggers 'CalledProcessError' object is not
    callable error.

    Change-Id: If3fd8506ececf062ee1b390dc8a87771cb01dec9
    Story: 2006980
    Task: 37715
    Signed-off-by: Tao Liu <email address hidden>

commit 46f5626efea2095ab8bcb82e4091f31535445585
Author: Tao Liu <email address hidden>
Date: Wed Jan 15 22:01:45 2020 -0500

    Remote install of sub-cloud controller-0

    This update extends the remote sub-cloud deployment to include
    the installation of controller-0, which provides a complete ZTP
    solution. This is an optional capability that leverages
    Redfish Virtual Media Controller(rvmc).

    Optional install-value parameters, are added to the dcmanager
    subcloud add command, which provides the data required by the
    rvmc tool, and update-iso.sh script.

    Once install-values are provided, the dcmanager prepares for the
    installation and performs the following:
    . Downloads an iso image from the url provided, and creates a new
      bootable image based on the install values.
      The new image contains essential info to config a bootstrap ip
      interface that is used to reach the system controller
    . Creates a config file for the rvmc tool
    . Creates an ansible override file which is used by the install
      playbook

    In the next step, the dcmanager runs the install playbook to
    install the controller-0 of the subcloud. Once the installation
    is completed, the bootstrapping of the sub-cloud would continue.

    Story: 2006980
    Task: 37715

    Depends-On: https://review.opendev.org/#/c/702786/
    Change-Id: Id3a1b97adb83a0da5...


tags: added: in-f-centos8