Distributed cloud - dcorch.log showing error logs for ntp and ptp

Bug #1857068 reported by Gerry Kopec on 2019-12-20
Affects: StarlingX
Importance: Medium
Assigned to: Gerry Kopec

Bug Description

Brief Description
-----------------
When running a distributed cloud system, frequent ptp and ntp error logs appear in dcorch.log during the sync audit. The logs are "tuple index out of range: IndexError: tuple index out of range", "get_ptp_resources error enabled" and "get_ntp_resources error enabled". They occur multiple times for each subcloud during every audit cycle.

Further investigation of the code indicates that the ptp and ntp coexistence data schema changes (https://review.opendev.org/#/c/682121/) were not propagated to the dcorch code, and this is causing the exceptions. I don't think the DC audit is working properly for ntp and ptp.

Severity
--------
Major

Steps to Reproduce
------------------
Set up DC system with subclouds. Observe /var/log/dcorch/dcorch.log

Expected Behavior
------------------
Expect clean audit run without errors.

Actual Behavior
----------------
Seeing error logs

Reproducibility
---------------
Reproducible

System Configuration
--------------------
All-in-one duplex plus worker, DC system controller with 10 subclouds

Branch/Pull Time/Commit
-----------------------
2019-12-09_20-00-00

Last Pass
---------
Unknown

Timestamp/Logs
--------------
2019-12-16 22:21:31.031 1249743 INFO dcorch.engine.sync_thread [-] subcloud10/platform: Audit intp: [None] vs [None]
2019-12-16 22:21:31.032 1249743 ERROR dcorch.engine.sync_thread [-] tuple index out of range: IndexError: tuple index out of range
2019-12-16 22:21:31.032 1249743 ERROR dcorch.engine.sync_thread Traceback (most recent call last):
2019-12-16 22:21:31.032 1249743 ERROR dcorch.engine.sync_thread File "/usr/lib/python2.7/site-packages/dcorch/engine/sync_thread.py", line 474, in sync_audit
2019-12-16 22:21:31.032 1249743 ERROR dcorch.engine.sync_thread abort_resources)
2019-12-16 22:21:31.032 1249743 ERROR dcorch.engine.sync_thread File "/usr/lib/python2.7/site-packages/dcorch/engine/sync_thread.py", line 509, in audit_find_missing
2019-12-16 22:21:31.032 1249743 ERROR dcorch.engine.sync_thread master_id = self.get_resource_id(resource_type, m_r)
2019-12-16 22:21:31.032 1249743 ERROR dcorch.engine.sync_thread File "/usr/lib/python2.7/site-packages/dcorch/engine/sync_services/sysinv.py", line 1059, in get_resource_id
2019-12-16 22:21:31.032 1249743 ERROR dcorch.engine.sync_thread resource_type))
2019-12-16 22:21:31.032 1249743 ERROR dcorch.engine.sync_thread IndexError: tuple index out of range
2019-12-16 22:21:31.032 1249743 ERROR dcorch.engine.sync_thread
...
2019-12-16 22:21:34.747 1249743 INFO dcorch.engine.sync_services.sysinv [-] subcloud5/platform: get_ptp_resources error enabled
...
2019-12-16 22:21:37.863 1249743 INFO dcorch.engine.sync_thread [-] subcloud6/platform: Audit ptp: [None] vs [None]
2019-12-16 22:21:37.863 1249743 ERROR dcorch.engine.sync_thread [-] tuple index out of range: IndexError: tuple index out of range
2019-12-16 22:21:37.863 1249743 ERROR dcorch.engine.sync_thread Traceback (most recent call last):
2019-12-16 22:21:37.863 1249743 ERROR dcorch.engine.sync_thread File "/usr/lib/python2.7/site-packages/dcorch/engine/sync_thread.py", line 474, in sync_audit
2019-12-16 22:21:37.863 1249743 ERROR dcorch.engine.sync_thread abort_resources)
2019-12-16 22:21:37.863 1249743 ERROR dcorch.engine.sync_thread File "/usr/lib/python2.7/site-packages/dcorch/engine/sync_thread.py", line 509, in audit_find_missing
2019-12-16 22:21:37.863 1249743 ERROR dcorch.engine.sync_thread master_id = self.get_resource_id(resource_type, m_r)
2019-12-16 22:21:37.863 1249743 ERROR dcorch.engine.sync_thread File "/usr/lib/python2.7/site-packages/dcorch/engine/sync_services/sysinv.py", line 1059, in get_resource_id
2019-12-16 22:21:37.863 1249743 ERROR dcorch.engine.sync_thread resource_type))
2019-12-16 22:21:37.863 1249743 ERROR dcorch.engine.sync_thread IndexError: tuple index out of range
2019-12-16 22:21:37.863 1249743 ERROR dcorch.engine.sync_thread
...
2019-12-16 22:21:38.633 1249743 INFO dcorch.engine.sync_services.sysinv [-] subcloud5/platform: get_ptp_resources error enabled

Test Activity
-------------
DC engineering

Workaround
----------
n/a

Gerry Kopec (gerry-kopec) wrote :

Looking at the code, some problem areas are:
dcorch/drivers/openstack/sysinv_v1.py:
- get_ptp() -- ptp.enabled no longer exists
        LOG.debug("get_ptp uuid=%s enabled=%s mode=%s "
                  "transport=%s mechanism=%s" %
                  (ptp.uuid, ptp.enabled, ptp.mode,
                   ptp.transport, ptp.mechanism))
- get_ntp() -- intp.enabled no longer exists
        LOG.debug("get_ntp uuid=%s enabled=%s ntpservers=%s" %
                  (intp.uuid, intp.enabled, intp.ntpservers))

It appears the above exceptions are handled and the "error enabled" logs are generated. However, ptp/ntp is then set to None, which triggers other problems.
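To illustrate the failure mode, here is a minimal sketch of building the debug message without assuming every field still exists. FakePtp and describe_ptp are hypothetical names made up for this example and are not part of dcorch or the sysinv client. The actual fix may simply drop the enabled field from the log instead.

```python
class FakePtp(object):
    """Hypothetical stand-in for the ptp object returned by the sysinv client."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


def describe_ptp(ptp):
    # getattr with a default avoids an AttributeError when a field such as
    # 'enabled' has been removed from the schema.
    fields = ("uuid", "enabled", "mode", "transport", "mechanism")
    return " ".join(
        "%s=%s" % (name, getattr(ptp, name, "<absent>")) for name in fields)
```

With an object that has no enabled attribute, describe_ptp() still returns a message instead of raising, so the caller would not fall back to None just because of the debug log.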

dcorch/engine/sync_services/sysinv.py:
- get_resource_id() - the "NO uuid" log format string expects 2 parameters but is only given one, so it throws an exception
                LOG.info("get_resource_id {} NO uuid resource_type={}".format(
                    resource_type))

- get_resource_info() - resource=None is not handled, so resource._info will throw an exception
        if resource_type in payload_resources:
            if 'payload' not in resource._info:
                dumps = jsonutils.dumps({"payload": resource._info})
            else:
                dumps = jsonutils.dumps(resource._info)
            LOG.info("get_resource_info resource_type={} dumps={}".format(
                resource_type, dumps),
                extra=self.log_extra)
            return dumps
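
The two sysinv.py problems above can be reproduced and guarded against in isolation. The following is only a sketch under assumptions: FakeResource is a hypothetical stand-in for the client resource object, the stdlib json module replaces jsonutils, and returning None for a missing resource is a guess at reasonable behavior, not the merged fix. Which value belongs in the first placeholder of the log message is also an assumption.

```python
import json


class FakeResource(object):
    """Hypothetical stand-in for a sysinv client resource object."""

    def __init__(self, info):
        self._info = info


def buggy_format_raises(resource_type):
    # Two placeholders but one argument: str.format raises IndexError
    # (the exact message text varies by Python version).
    try:
        "get_resource_id {} NO uuid resource_type={}".format(resource_type)
    except IndexError:
        return True
    return False


def format_no_uuid_log(resource_type):
    # Supply one argument per placeholder. Reusing resource_type for the
    # first placeholder is an assumption made for this sketch.
    return "get_resource_id {} NO uuid resource_type={}".format(
        resource_type, resource_type)


def get_resource_info(resource_type, resource, payload_resources):
    # Guard against resource=None before touching resource._info.
    if resource is None:
        return None
    if resource_type in payload_resources:
        if 'payload' not in resource._info:
            return json.dumps({"payload": resource._info})
        return json.dumps(resource._info)
    return None
```

Calling get_resource_info("ptp", None, ["ptp"]) returns None here instead of raising, and buggy_format_raises("ptp") confirms that the original format call is what produces the IndexError seen in the traceback.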

summary: - Distributed cloud - dcorch showing error logs for ntp and ptp
+ Distributed cloud - dcorch.log showing error logs for ntp and ptp
Ghada Khalil (gkhalil) wrote :

Marking as stx.4.0 / medium priority - should be fixed in master. No plan to cherrypick to r/stx.3.0 as the Distributed Cloud testing did not report any serious issues as a result of this.

tags: added: stx.4.0 stx.distcloud
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Gerry Kopec (gerry-kopec)