Distributed Cloud: platform ptp fail to sync

Bug #1791997 reported by Andy
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Alexander Kozyrev

Bug Description

Brief Description
-----------------
In a Distributed Cloud system, platform (resource ptp) fails to sync with subcloud.

Severity
--------
Major

Steps to Reproduce
------------------
- Deploy a Distributed Cloud with at least one subcloud.
- Query the subcloud sync status; the platform sync status is "out-of-sync".

[root@controller-0 dcorch(keystone_admin)]# dcmanager subcloud show 2
+-----------------------------+----------------------------+
| Field                       | Value                      |
+-----------------------------+----------------------------+
| id                          | 2                          |
| name                        | subcloud2                  |
| description                 | subcloud2 description      |
| location                    | subcloud 2 location        |
| software_version            | 18.08                      |
| management                  | managed                    |
| availability                | online                     |
| management_subnet           | 192.168.121.0/24           |
| management_start_ip         | 192.168.121.2              |
| management_end_ip           | 192.168.121.50             |
| management_gateway_ip       | 192.168.121.1              |
| systemcontroller_gateway_ip | 192.168.204.1              |
| created_at                  | 2018-09-07 01:39:35.407561 |
| updated_at                  | 2018-09-10 14:43:03.415724 |
| compute_sync_status         | in-sync                    |
| identity_sync_status        | in-sync                    |
| network_sync_status         | in-sync                    |
| patching_sync_status        | in-sync                    |
| platform_sync_status        | out-of-sync                |
| volume_sync_status          | in-sync                    |
+-----------------------------+----------------------------+

Expected Behavior
------------------
After audit, platform_sync_status should be in-sync.

Actual Behavior
----------------
platform_sync_status is out-of-sync

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Two-node SystemController with a two-node subcloud.

Branch/Pull Time/Commit
-----------------------
master as of 2018-09-05_20-18-00

Timestamp/Logs
--------------

dcorch.log:
==========
- In the SystemController dcorch log, we can see that dcorch started the ptp sync job:

52968 2018-09-07 15:09:51.606 5415 INFO dcorch.engine.sync_thread [-] subcloud2/platform: Audit ptp: [<ptp {u'uuid': u'c7fba00b-3970-4a9a-90f5-37f9c373d8fe', u'links': [{u'href': u'http://192.168.204.2:6385/v1/ptps/c7fba00b-3970-4a9a-90f5-37f9c373d8fe', u'rel': u'self'}, {u'href': u'http://192.168.204.2:6385/ptps/c7fba00b-3970-4a9a-90f5-37f9c373d8fe', u'rel': u'bookmark'}], u'created_at': u'2018-09-06T19:46:50.911863+00:00', u'enabled': False, u'updated_at': None, u'mechanism': u'e2e', u'mode': u'hardware', u'isystem_uuid': u'cfb03391-c91c-4abc-b2ac-1b5f357c602c', u'transport': u'l2'}>] vs [<ptp {u'uuid': u'1ae691cf-e427-4a10-ac3e-197d88ae34fc', u'links': [{u'href': u'http://192.168.121.2:6385/v1/ptps/1ae691cf-e427-4a10-ac3e-197d88ae34fc', u'rel': u'self'}, {u'href': u'http://192.168.121.2:6385/ptps/1ae691cf-e427-4a10-ac3e-197d88ae34fc', u'rel': u'bookmark'}], u'created_at': u'2018-09-07T01:49:21.047102+00:00', u'enabled': False, u'updated_at': None, u'mechanism': u'e2e', u'mode': u'hardware', u'isystem_uuid': u'7c839ff5-7afc-460f-836d-6d595644e4d8', u'transport': u'l2'}>]
52969 2018-09-07 15:09:51.606 5415 INFO dcorch.engine.sync_services.sysinv [-] get_resource_id ptp uuid=c7fba00b-3970-4a9a-90f5-37f9c373d8fe
52970 2018-09-07 15:09:51.607 5415 INFO dcorch.engine.sync_thread [-] subcloud2/platform: c7fba00b-3970-4a9a-90f5-37f9c373d8fe not found in DB, will create it
52971 2018-09-07 15:09:51.607 5415 INFO dcorch.engine.sync_services.sysinv [-] subcloud2/platform: audit_action: missing/ptp
52972 2018-09-07 15:09:51.607 5415 INFO dcorch.engine.sync_services.sysinv [-] get_resource_id ptp uuid=c7fba00b-3970-4a9a-90f5-37f9c373d8fe
52973 2018-09-07 15:09:51.607 5415 INFO dcorch.engine.sync_services.sysinv [-] subcloud2/platform: get_resource_info resource_type=ptp dumps={"payload": {"uuid": "c7fba00b-3970-4a9a-90f5-37f9c373d8fe", "links": [{"href": "http://192.168.204.2:6385/v1/ptps/c7fba00b-3970-4a9a-90f5-37f9c373d8fe", "rel": "self"}, {"href": "http://192.168.204.2:6385/ptps/c7fba00b-3970-4a9a-90f5-37f9c373d8fe", "rel": "bookmark"}], "created_at": "2018-09-06T19:46:50.911863+00:00", "enabled": false, "updated_at": null, "mechanism": "e2e", "mode": "hardware", "isystem_uuid": "cfb03391-c91c-4abc-b2ac-1b5f357c602c", "transport": "l2"}}
52974 2018-09-07 15:09:51.608 5415 INFO dcorch.engine.sync_thread [-] subcloud2/platform: Scheduling patch work for ptp/c7fba00b-3970-4a9a-90f5-37f9c373d8fe
52975 2018-09-07 15:09:51.645 5415 INFO dcorch.common.utils [-] Resource created in DB 11/ptp/c7fba00b-3970-4a9a-90f5-37f9c373d8fe/patch
52976 2018-09-07 15:09:51.695 5415 INFO dcorch.common.utils [-] Work order created for Subcloud(availability_status='online',id=2,management_state='managed',region_name='subcloud2',software_version='18.08',uuid=2e1cf964-6f2d-41ec-a92e-e07264f3211e):11/ptp/c7fba00b-3970-4a9a-90f5-37f9c373d8fe/patch
52977 2018-09-07 15:09:51.708 5415 INFO dcorch.engine.sync_thread [-] subcloud2/platform: Got 1 sync request(s)
52978 2018-09-07 15:09:51.718 5415 INFO dcorch.drivers.openstack.sdk_platform [-] get new keystone client for subcloud subcloud2
52979 2018-09-07 15:09:51.764 5415 INFO dcorch.engine.sync_thread [-] subcloud2/platform: Invoking sync_ptp for ptp [patch]
52980 2018-09-07 15:09:51.765 5415 INFO dcorch.engine.sync_services.sysinv [-] subcloud2/platform: sync_ptp resource_info={"payload": {"uuid": "c7fba00b-3970-4a9a-90f5-37f9c373d8fe", "links": [{"href": "http://192.168.204.2:6385/v1/ptps/c7fba00b-3970-4a9a-90f5-37f9c373d8fe", "rel": "self"}, {"href": "http://192.168.204.2:6385/ptps/c7fba00b-3970-4a9a-90f5-37f9c373d8fe", "rel": "bookmark"}], "created_at": "2018-09-06T19:46:50.911863+00:00", "enabled": false, "updated_at": null, "mechanism": "e2e", "mode": "hardware", "isystem_uuid": "cfb03391-c91c-4abc-b2ac-1b5f357c602c", "transport": "l2"}}

- But later audits of the same resource are aborted:

55748 2018-09-07 15:30:47.059 5415 INFO dcorch.engine.sync_thread [-] subcloud2/platform: Audit ptp: [<ptp {u'uuid': u'c7fba00b-3970-4a9a-90f5-37f9c373d8fe', u'links': [{u'href': u'http://192.168.204.2:6385/v1/ptps/c7fba00b-3970-4a9a-90f5-37f9c373d8fe', u'rel': u'self'}, {u'href': u'http://192.168.204.2:6385/ptps/c7fba00b-3970-4a9a-90f5-37f9c373d8fe', u'rel': u'bookmark'}], u'created_at': u'2018-09-06T19:46:50.911863+00:00', u'enabled': False, u'updated_at': None, u'mechanism': u'e2e', u'mode': u'hardware', u'isystem_uuid': u'cfb03391-c91c-4abc-b2ac-1b5f357c602c', u'transport': u'l2'}>] vs [<ptp {u'uuid': u'1ae691cf-e427-4a10-ac3e-197d88ae34fc', u'links': [{u'href': u'http://192.168.121.2:6385/v1/ptps/1ae691cf-e427-4a10-ac3e-197d88ae34fc', u'rel': u'self'}, {u'href': u'http://192.168.121.2:6385/ptps/1ae691cf-e427-4a10-ac3e-197d88ae34fc', u'rel': u'bookmark'}], u'created_at': u'2018-09-07T01:49:21.047102+00:00', u'enabled': False, u'updated_at': None, u'mechanism': u'e2e', u'mode': u'hardware', u'isystem_uuid': u'7c839ff5-7afc-460f-836d-6d595644e4d8', u'transport': u'l2'}>]
55749 2018-09-07 15:30:47.060 5415 INFO dcorch.engine.sync_services.sysinv [-] get_resource_id ptp uuid=c7fba00b-3970-4a9a-90f5-37f9c373d8fe
55750 2018-09-07 15:30:47.060 5415 INFO dcorch.engine.sync_thread [-] subcloud2/platform: audit_find_missing: Aborting audit for c7fba00b-3970-4a9a-90f5-37f9c373d8fe
55751 2018-09-07 15:30:47.060 5415 INFO dcorch.engine.sync_thread [-] subcloud2/platform: audit_find_extra: Aborting audit for c7fba00b-3970-4a9a-90f5-37f9c373d8fe
55752 2018-09-07 15:30:47.068 5415 INFO dcorch.engine.sync_thread [-] subcloud2/platform: Will not audit [u'c7fba00b-3970-4a9a-90f5-37f9c373d8fe']. 1 sync request(s) pending
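
The "Will not audit ... 1 sync request(s) pending" message suggests that the audit skips any resource that still has an unfinished sync request. The following is only a rough sketch of that gating, not the dcorch code: the function name `resources_to_audit` and the `(resource_id, state)` tuple shape are hypothetical, and only the "in-progress" state and the uuid are taken from this report.

```python
def resources_to_audit(resource_ids, sync_requests):
    """Return the resource ids that have no unfinished sync request.

    sync_requests is an iterable of (resource_id, state) tuples
    (a hypothetical shape chosen for this sketch).
    """
    # Any resource with an "in-progress" request is excluded from the audit.
    blocked = {rid for rid, state in sync_requests if state == "in-progress"}
    return [rid for rid in resource_ids if rid not in blocked]


ptp_uuid = "c7fba00b-3970-4a9a-90f5-37f9c373d8fe"

# While the request stays "in-progress", the ptp resource is never audited again.
skipped = resources_to_audit([ptp_uuid], [(ptp_uuid, "in-progress")])

# With no pending request, the same resource would be audited normally.
audited = resources_to_audit([ptp_uuid], [])
```

Under this model, a request that never leaves "in-progress" keeps the resource out of every later audit, which matches the repeated "Aborting audit" lines above.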

- Checking the orch_request and orch_job tables in the dcorch DB, we found the PTP
  job stuck in the "in-progress" state indefinitely.

dcorch=# select * from orch_request where state='in-progress';
 id  | uuid                                 | state       | try_count | api_version | target_region_name | capabilities | orch_job_id | created_at                 | updated_at                 | deleted_at | deleted
-----+--------------------------------------+-------------+-----------+-------------+--------------------+--------------+-------------+----------------------------+----------------------------+------------+---------
 454 | ae320cea-836d-423b-94a1-378272787a72 | in-progress | 0         |             | subcloud1          |              | 454         | 2018-09-10 15:46:03.921932 | 2018-09-10 15:46:04.00064  |            | 0
  11 | e3dcd00d-dd21-4f53-b245-894f84d616be | in-progress | 0         |             | subcloud2          |              | 11          | 2018-09-07 15:09:51.688719 | 2018-09-07 15:09:51.744064 |            | 0
(2 rows)

dcorch=# select * from orch_job where id=11;
 id | uuid | user_id | project_id | endpoint_type | source_resource_id | operation_type | resource_id | resource_info | capabilities | created_at | updated_at | deleted_at | deleted
----+------+---------+------------+---------------+--------------------+----------------+-------------+---------------+--------------+------------+------------+------------+---------
 11 | 18f9a671-2630-4cc1-846c-f8b170a7d5fb | | | platform | c7fba00b-3970-4a9a-90f5-37f9c373d8fe | patch | 11 | {"payload": {"uuid": "c7fba00b-3970-4a9a-90f5-37f9c373d8fe", "links": [{"href": "http://192.168.204.2:6385/v1/ptps/c7fba00b-3970-4a9a-90f5-37f9c373d8fe", "rel": "self"}, {"href": "http://192.168.204.2:6385/ptps/c7fba00b-3970-4a9a-90f5-37f9c373d8fe", "rel": "bookmark"}], "created_at": "2018-09-06T19:46:50.911863+00:00", "enabled": false, "updated_at": null, "mechanism": "e2e", "mode": "hardware", "isystem_uuid": "cfb03391-c91c-4abc-b2ac-1b5f357c602c", "transport": "l2"}} | | 2018-09-07 15:09:51.664904 | | | 0
(1 row)
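
As a triage aid, stale requests like the one for subcloud2 can be separated from ones that are merely in flight by comparing updated_at against the current time. This is a minimal sketch over the two orch_request rows shown above; the helper `find_stuck_requests`, the one-hour threshold, and the chosen "now" value are assumptions made for illustration and are not part of dcorch.

```python
from datetime import datetime, timedelta


def find_stuck_requests(rows, now, max_age=timedelta(hours=1)):
    """Return ids of in-progress requests not updated within max_age."""
    stuck = []
    for row in rows:
        if row["state"] != "in-progress":
            continue
        # Timestamps use the format printed by psql in this report.
        updated = datetime.strptime(row["updated_at"], "%Y-%m-%d %H:%M:%S.%f")
        if now - updated > max_age:
            stuck.append(row["id"])
    return stuck


# The two rows returned by the orch_request query in this report.
rows = [
    {"id": 454, "state": "in-progress", "updated_at": "2018-09-10 15:46:04.00064"},
    {"id": 11, "state": "in-progress", "updated_at": "2018-09-07 15:09:51.744064"},
]

# Assumed "now": shortly after the newest row was updated.
now = datetime(2018, 9, 10, 15, 46, 5)
stuck = find_stuck_requests(rows, now)
```

With these inputs only request 11 (the subcloud2 PTP job) is flagged; request 454 was updated about a second before the assumed "now" and is treated as still running.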

Andy (andy.wrs)
summary: - STX: Distributed Cloud: platform ptp fail to sync
+ Distributed Cloud: platform ptp fail to sync
Ghada Khalil (gkhalil)
tags: added: stx.distcloud
Ghada Khalil (gkhalil) wrote :

The issue results in an out-of-sync condition on the subclouds. However, it doesn't have a functional impact to the subcloud operations. Targeting the stx.2019.03 release.

Changed in starlingx:
assignee: nobody → Alex Kozyrev (akozyrev)
importance: Undecided → Medium
tags: added: stx.2019.03
Changed in starlingx:
status: New → Triaged
Ghada Khalil (gkhalil) wrote :

To clarify further, this is an impacting issue if a user wants to configure Distributed Cloud with PTP.

Ghada Khalil (gkhalil) wrote :

Based on further input from John Kung, increasing the priority of this issue and changing the target release to stx.2018.10

From John Kung:
The platform resources will be out of sync between the system controller and the subcloud because they would be blocked waiting for another platform resource (ptp) to complete.

The PTP feature was introduced recently. This is a new issue which needs to be fixed in order to allow the subcloud sync to the system controller.

Syncing of the following platform resources would be impacted as well:
 - CERTIFICATE - this is the https certificate (including tpm)
 - DNS - nameservers
 - FIREWALL RULES - OAM firewall rules
 - SNMP - affects ability to raise events/alarms to common trap destinations
 - USER - wrsroot password synchronization

Changed in starlingx:
importance: Medium → High
tags: added: stx.2018.10
removed: stx.2019.03
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-distcloud (master)

Fix proposed to branch: master
Review: https://review.openstack.org/607619

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-distcloud (master)

Reviewed: https://review.openstack.org/607619
Committed: https://git.openstack.org/cgit/openstack/stx-distcloud/commit/?id=27654d679cab2d61a1baa9db8ffcc77fdc35dbd9
Submitter: Zuul
Branch: master

commit 27654d679cab2d61a1baa9db8ffcc77fdc35dbd9
Author: Alex Kozyrev <email address hidden>
Date: Wed Oct 3 11:40:14 2018 -0400

    Fix for PTP sync failure n a Distributed Cloud system.

    PTP failed to sync because of "enabled" parameter passed.
    It was passed as a boolean and ingnored by SysInv API.
    Need to convert it to string before passing to SysInv.

    Change-Id: Ice439bcb46bc901390c562f4ca5a8af0a73b738e
    Closes-Bug: 1791997
    Signed-off-by: Alex Kozyrev <email address hidden>
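
    The change described in the commit message can be illustrated roughly as follows. This is not the merged patch: the helper name `build_ptp_patch`, the JSON-patch style list, and the use of str() for the conversion are assumptions for this sketch. Only the attribute names and values come from the resource_info payload logged in this report, and the exact string form SysInv expects is not stated here.

```python
def build_ptp_patch(ptp):
    """Build a JSON-patch style update list from a ptp resource dict.

    Hypothetical helper: boolean values are converted to strings before
    being sent, following the description in the commit message.
    """
    patch = []
    for attr in ("enabled", "mode", "transport", "mechanism"):
        value = ptp[attr]
        if isinstance(value, bool):
            # Assumption: str() yields "True"/"False"; the report does not
            # say which spelling the API requires.
            value = str(value)
        patch.append({"op": "replace", "path": "/" + attr, "value": value})
    return patch


# Values taken from the resource_info payload in the dcorch log above.
ptp = {"enabled": False, "mode": "hardware", "transport": "l2", "mechanism": "e2e"}
patch = build_ptp_patch(ptp)
```

    The point of the sketch is only that "enabled" leaves the sync code as a string instead of a Python boolean, while the other attributes pass through unchanged.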

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-distcloud (r/2018.10)

Fix proposed to branch: r/2018.10
Review: https://review.openstack.org/607978

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-distcloud (r/2018.10)

Reviewed: https://review.openstack.org/607978
Committed: https://git.openstack.org/cgit/openstack/stx-distcloud/commit/?id=b4d2ae3833f108729e8acd3d7fadaec40e0af062
Submitter: Zuul
Branch: r/2018.10

commit b4d2ae3833f108729e8acd3d7fadaec40e0af062
Author: Alex Kozyrev <email address hidden>
Date: Wed Oct 3 11:40:14 2018 -0400

    Fix for PTP sync failure n a Distributed Cloud system.

    PTP failed to sync because of "enabled" parameter passed.
    It was passed as a boolean and ingnored by SysInv API.
    Need to convert it to string before passing to SysInv.

    Change-Id: Ice439bcb46bc901390c562f4ca5a8af0a73b738e
    Closes-Bug: 1791997
    Signed-off-by: Alex Kozyrev <email address hidden>
    (cherry picked from commit 27654d679cab2d61a1baa9db8ffcc77fdc35dbd9)

Ken Young (kenyis)
tags: added: stx.1.0
removed: stx.2018.10