Brief Description
-----------------
In a Distributed Cloud system, the platform endpoint (ptp resource) fails to sync with the subcloud, so platform_sync_status stays "out-of-sync".
Severity
--------
Major
Steps to Reproduce
------------------
- Deploy a Distributed Cloud with at least one subcloud.
- Query the subcloud sync status; platform_sync_status is "out-of-sync".
[root@controller-0 dcorch(keystone_admin)]# dcmanager subcloud show 2
+-----------------------------+----------------------------+
| Field | Value |
+-----------------------------+----------------------------+
| id | 2 |
| name | subcloud2 |
| description | subcloud2 description |
| location | subcloud 2 location |
| software_version | 18.08 |
| management | managed |
| availability | online |
| management_subnet | 192.168.121.0/24 |
| management_start_ip | 192.168.121.2 |
| management_end_ip | 192.168.121.50 |
| management_gateway_ip | 192.168.121.1 |
| systemcontroller_gateway_ip | 192.168.204.1 |
| created_at | 2018-09-07 01:39:35.407561 |
| updated_at | 2018-09-10 14:43:03.415724 |
| compute_sync_status | in-sync |
| identity_sync_status | in-sync |
| network_sync_status | in-sync |
| patching_sync_status | in-sync |
| platform_sync_status | out-of-sync |
| volume_sync_status | in-sync |
+-----------------------------+----------------------------+
Expected Behavior
------------------
After audit, platform_sync_status should be in-sync.
Actual Behavior
----------------
platform_sync_status remains out-of-sync indefinitely.
Reproducibility
---------------
Reproducible
System Configuration
--------------------
Two-node SystemController, with two-node subclouds.
Branch/Pull Time/Commit
-----------------------
master as of 2018-09-05_20-18-00
Timestamp/Logs
--------------
dcorch.log:
==========
- In the SystemController dcorch log, dcorch starts a ptp sync job:
52968 2018-09-07 15:09:51.606 5415 INFO dcorch.engine.sync_thread [-] subcloud2/platform: Audit ptp: [<ptp {u'uuid': u'c7fba00b-3970-4a9a-90f5-37f9c373d8fe', u'links': [{u'href': u'http://192.168.204.2:6385/v1/ptps/c7fba00b-3970-4a9a-90f5-37f9c373d8fe', u'rel': u'self'}, {u'href': u'http://192.168.204.2:6385/ptps/c7fba00b-3970-4a9a-90f5-37f9c373d8fe', u'rel': u'bookmark'}], u'created_at': u'2018-09-06T19:46:50.911863+00:00', u'enabled': False, u'updated_at': None, u'mechanism': u'e2e', u'mode': u'hardware', u'isystem_uuid': u'cfb03391-c91c-4abc-b2ac-1b5f357c602c', u'transport': u'l2'}>] vs [<ptp {u'uuid': u'1ae691cf-e427-4a10-ac3e-197d88ae34fc', u'links': [{u'href': u'http://192.168.121.2:6385/v1/ptps/1ae691cf-e427-4a10-ac3e-197d88ae34fc', u'rel': u'self'}, {u'href': u'http://192.168.121.2:6385/ptps/1ae691cf-e427-4a10-ac3e-197d88ae34fc', u'rel': u'bookmark'}], u'created_at': u'2018-09-07T01:49:21.047102+00:00', u'enabled': False, u'updated_at': None, u'mechanism': u'e2e', u'mode': u'hardware', u'isystem_uuid': u'7c839ff5-7afc-460f-836d-6d595644e4d8', u'transport': u'l2'}>]
52969 2018-09-07 15:09:51.606 5415 INFO dcorch.engine.sync_services.sysinv [-] get_resource_id ptp uuid=c7fba00b-3970-4a9a-90f5-37f9c373d8fe
52970 2018-09-07 15:09:51.607 5415 INFO dcorch.engine.sync_thread [-] subcloud2/platform: c7fba00b-3970-4a9a-90f5-37f9c373d8fe not found in DB, will create it
52971 2018-09-07 15:09:51.607 5415 INFO dcorch.engine.sync_services.sysinv [-] subcloud2/platform: audit_action: missing/ptp
52972 2018-09-07 15:09:51.607 5415 INFO dcorch.engine.sync_services.sysinv [-] get_resource_id ptp uuid=c7fba00b-3970-4a9a-90f5-37f9c373d8fe
52973 2018-09-07 15:09:51.607 5415 INFO dcorch.engine.sync_services.sysinv [-] subcloud2/platform: get_resource_info resource_type=ptp dumps={"payload": {"uuid": "c7fba00b-3970-4a9a-90f5-37f9c373d8fe", "links": [{"href": "http://192.168.204.2:6385/v1/ptps/c7fba00b-3970-4a9a-90f5-37f9c373d8fe", "rel": "self"}, {"href": "http://192.168.204.2:6385/ptps/c7fba00b-3970-4a9a-90f5-37f9c373d8fe", "rel": "bookmark"}], "created_at": "2018-09-06T19:46:50.911863+00:00", "enabled": false, "updated_at": null, "mechanism": "e2e", "mode": "hardware", "isystem_uuid": "cfb03391-c91c-4abc-b2ac-1b5f357c602c", "transport": "l2"}}
52974 2018-09-07 15:09:51.608 5415 INFO dcorch.engine.sync_thread [-] subcloud2/platform: Scheduling patch work for ptp/c7fba00b-3970-4a9a-90f5-37f9c373d8fe
52975 2018-09-07 15:09:51.645 5415 INFO dcorch.common.utils [-] Resource created in DB 11/ptp/c7fba00b-3970-4a9a-90f5-37f9c373d8fe/patch
52976 2018-09-07 15:09:51.695 5415 INFO dcorch.common.utils [-] Work order created for Subcloud(availability_status='online',id=2,management_state='managed',region_name='subcloud2',software_version='18.08',uuid=2e1cf964-6f2d-41ec-a92e-e07264f3211e):11/ptp/c7fba00b-3970-4a9a-90f5-37f9c373d8fe/patch
52977 2018-09-07 15:09:51.708 5415 INFO dcorch.engine.sync_thread [-] subcloud2/platform: Got 1 sync request(s)
52978 2018-09-07 15:09:51.718 5415 INFO dcorch.drivers.openstack.sdk_platform [-] get new keystone client for subcloud subcloud2
52979 2018-09-07 15:09:51.764 5415 INFO dcorch.engine.sync_thread [-] subcloud2/platform: Invoking sync_ptp for ptp [patch]
52980 2018-09-07 15:09:51.765 5415 INFO dcorch.engine.sync_services.sysinv [-] subcloud2/platform: sync_ptp resource_info={"payload": {"uuid": "c7fba00b-3970-4a9a-90f5-37f9c373d8fe", "links": [{"href": "http://192.168.204.2:6385/v1/ptps/c7fba00b-3970-4a9a-90f5-37f9c373d8fe", "rel": "self"}, {"href": "http://192.168.204.2:6385/ptps/c7fba00b-3970-4a9a-90f5-37f9c373d8fe", "rel": "bookmark"}], "created_at": "2018-09-06T19:46:50.911863+00:00", "enabled": false, "updated_at": null, "mechanism": "e2e", "mode": "hardware", "isystem_uuid": "cfb03391-c91c-4abc-b2ac-1b5f357c602c", "transport": "l2"}}
- Later audits are aborted because the sync request is still pending:
55748 2018-09-07 15:30:47.059 5415 INFO dcorch.engine.sync_thread [-] subcloud2/platform: Audit ptp: [<ptp {u'uuid': u'c7fba00b-3970-4a9a-90f5-37f9c373d8fe', u'links': [{u'href': u'http://192.168.204.2:6385/v1/ptps/c7fba00b-3970-4a9a-90f5-37f9c373d8fe', u'rel': u'self'}, {u'href': u'http://192.168.204.2:6385/ptps/c7fba00b-3970-4a9a-90f5-37f9c373d8fe', u'rel': u'bookmark'}], u'created_at': u'2018-09-06T19:46:50.911863+00:00', u'enabled': False, u'updated_at': None, u'mechanism': u'e2e', u'mode': u'hardware', u'isystem_uuid': u'cfb03391-c91c-4abc-b2ac-1b5f357c602c', u'transport': u'l2'}>] vs [<ptp {u'uuid': u'1ae691cf-e427-4a10-ac3e-197d88ae34fc', u'links': [{u'href': u'http://192.168.121.2:6385/v1/ptps/1ae691cf-e427-4a10-ac3e-197d88ae34fc', u'rel': u'self'}, {u'href': u'http://192.168.121.2:6385/ptps/1ae691cf-e427-4a10-ac3e-197d88ae34fc', u'rel': u'bookmark'}], u'created_at': u'2018-09-07T01:49:21.047102+00:00', u'enabled': False, u'updated_at': None, u'mechanism': u'e2e', u'mode': u'hardware', u'isystem_uuid': u'7c839ff5-7afc-460f-836d-6d595644e4d8', u'transport': u'l2'}>]
55749 2018-09-07 15:30:47.060 5415 INFO dcorch.engine.sync_services.sysinv [-] get_resource_id ptp uuid=c7fba00b-3970-4a9a-90f5-37f9c373d8fe
55750 2018-09-07 15:30:47.060 5415 INFO dcorch.engine.sync_thread [-] subcloud2/platform: audit_find_missing: Aborting audit for c7fba00b-3970-4a9a-90f5-37f9c373d8fe
55751 2018-09-07 15:30:47.060 5415 INFO dcorch.engine.sync_thread [-] subcloud2/platform: audit_find_extra: Aborting audit for c7fba00b-3970-4a9a-90f5-37f9c373d8fe
55752 2018-09-07 15:30:47.068 5415 INFO dcorch.engine.sync_thread [-] subcloud2/platform: Will not audit [u'c7fba00b-3970-4a9a-90f5-37f9c373d8fe']. 1 sync request(s) pending
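The abort messages above are consistent with the audit skipping any resource that still has a pending sync request. The following is a minimal Python sketch of that gating, not the dcorch implementation: the function name resources_to_audit and the data shapes are hypothetical, and only the "in-progress" state is taken from this report. It illustrates why a request that never leaves "in-progress" would block every later audit of the same resource.

```python
# Hypothetical model of audit gating; names and data shapes are assumptions,
# not taken from the dcorch source.
PENDING_STATES = {"in-progress"}  # other non-terminal states may exist


def resources_to_audit(master_ids, sync_requests):
    """Split master resource ids into (auditable, skipped).

    sync_requests: iterable of dicts with 'source_resource_id' and 'state'.
    A resource is skipped while any request for it is still pending.
    """
    blocked = {
        req["source_resource_id"]
        for req in sync_requests
        if req["state"] in PENDING_STATES
    }
    auditable = [rid for rid in master_ids if rid not in blocked]
    skipped = [rid for rid in master_ids if rid in blocked]
    return auditable, skipped


ptp_uuid = "c7fba00b-3970-4a9a-90f5-37f9c373d8fe"
requests = [{"source_resource_id": ptp_uuid, "state": "in-progress"}]
auditable, skipped = resources_to_audit([ptp_uuid], requests)
print("auditable:", auditable)
print("skipped:", skipped)
```

Under this model, the ptp resource stays in the skipped list for as long as its request remains "in-progress", which matches the repeated "Will not audit" message.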
- Checking the orch_request and orch_job tables in the dcorch DB shows the ptp sync job stuck in the "in-progress" state indefinitely:
dcorch=# select * from orch_request where state='in-progress';
id | uuid | state | try_count | api_version | target_region_name | capabilities | orch_job_id | created_at | updated_at | deleted_at | deleted
-----+--------------------------------------+-------------+-----------+-------------+--------------------+--------------+-------------+----------------------------+----------------------------+------------+---------
454 | ae320cea-836d-423b-94a1-378272787a72 | in-progress | 0 | | subcloud1 | | 454 | 2018-09-10 15:46:03.921932 | 2018-09-10 15:46:04.00064 | | 0
11 | e3dcd00d-dd21-4f53-b245-894f84d616be | in-progress | 0 | | subcloud2 | | 11 | 2018-09-07 15:09:51.688719 | 2018-09-07 15:09:51.744064 | | 0
(2 rows)
dcorch=# select * from orch_job where id=11;
id | uuid | user_id | project_id | endpoint_type | source_resource_id | operation_type | resource_id | resource_info | capabilities | created_at | updated_at | deleted_at | deleted
----+--------------------------------------+---------+------------+---------------+--------------------------------------+----------------+-------------+---------------+--------------+----------------------------+------------+------------+---------
11 | 18f9a671-2630-4cc1-846c-f8b170a7d5fb | | | platform | c7fba00b-3970-4a9a-90f5-37f9c373d8fe | patch | 11 | {"payload": {"uuid": "c7fba00b-3970-4a9a-90f5-37f9c373d8fe", "links": [{"href": "http://192.168.204.2:6385/v1/ptps/c7fba00b-3970-4a9a-90f5-37f9c373d8fe", "rel": "self"}, {"href": "http://192.168.204.2:6385/ptps/c7fba00b-3970-4a9a-90f5-37f9c373d8fe", "rel": "bookmark"}], "created_at": "2018-09-06T19:46:50.911863+00:00", "enabled": false, "updated_at": null, "mechanism": "e2e", "mode": "hardware", "isystem_uuid": "cfb03391-c91c-4abc-b2ac-1b5f357c602c", "transport": "l2"}} | | 2018-09-07 15:09:51.664904 | | | 0
(1 row)
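To tell a stuck request from one that is simply in flight, the updated_at values of the two "in-progress" rows can be compared against the query time. The sketch below is illustrative only: the helper name stale_requests, the one-hour threshold, and the query time are assumptions (the report does not state when the query was run); the row values are copied from the orch_request output above.

```python
from datetime import datetime, timedelta

# Timestamp layout as printed by psql in this report.
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def stale_requests(rows, now, max_age):
    """Return ids of 'in-progress' rows not updated within max_age."""
    stale = []
    for row in rows:
        if row["state"] != "in-progress":
            continue
        updated = datetime.strptime(row["updated_at"], TS_FORMAT)
        if now - updated > max_age:
            stale.append(row["id"])
    return stale


# Rows copied from the orch_request query output.
rows = [
    {"id": 454, "state": "in-progress", "target_region_name": "subcloud1",
     "updated_at": "2018-09-10 15:46:04.00064"},
    {"id": 11, "state": "in-progress", "target_region_name": "subcloud2",
     "updated_at": "2018-09-07 15:09:51.744064"},
]

# Assumed query time and threshold; neither is given in the report.
query_time = datetime(2018, 9, 10, 15, 47, 0)
print(stale_requests(rows, query_time, timedelta(hours=1)))  # prints [11]
```

With these assumed values only request 11 (the subcloud2 ptp patch job) is flagged, since it has not been updated since 2018-09-07.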
The issue results in an out-of-sync condition on the subclouds. However, it has no functional impact on subcloud operations. Targeting the stx.2019.03 release.