dcorch database purge is deleting in use resources

Bug #1887430 reported by Gerry Kopec
This bug affects 1 person
Affects   | Status       | Importance | Assigned to | Milestone
StarlingX | Fix Released | Medium     | Tee Ngo     |

Bug Description

Brief Description
-----------------
The daily dcorch database purge cronjob is deleting resources that are still in use by subclouds.
The dcorch purge deletes resources that are no longer used by any orch jobs. However, because the subcloud_resource table is set to cascade delete, the purge also removes subcloud resources that are still in use. The next dcorch audit flags the missing entries, and the subcloud resources are re-created over the next audit cycle(s).
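
The cascade effect described above can be reproduced in isolation. The following is a minimal sketch using Python's sqlite3 module, not the dcorch code itself: the three-table schema is a hypothetical simplification of the real tables, and naive_purge is an illustrative name for a purge that only checks orch jobs.

```python
import sqlite3

# Hypothetical, simplified schema; the real dcorch tables have more columns.
SCHEMA = """
CREATE TABLE resource (id INTEGER PRIMARY KEY, master_id TEXT);
CREATE TABLE orch_job (
    id INTEGER PRIMARY KEY,
    resource_id INTEGER NOT NULL REFERENCES resource(id)
);
CREATE TABLE subcloud_resource (
    id INTEGER PRIMARY KEY,
    subcloud_id INTEGER NOT NULL,
    resource_id INTEGER NOT NULL REFERENCES resource(id) ON DELETE CASCADE
);
"""


def naive_purge(conn):
    """Delete resources that have no orch jobs (illustrative, not dcorch code).

    Because subcloud_resource cascades on delete, this also removes
    subcloud rows that still point at the purged resources.
    """
    conn.execute(
        "DELETE FROM resource WHERE NOT EXISTS "
        "(SELECT 1 FROM orch_job WHERE orch_job.resource_id = resource.id)"
    )


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces cascades when enabled
conn.executescript(SCHEMA)
conn.execute("INSERT INTO resource VALUES (1, 'user-a'), (2, 'user-b')")
conn.execute("INSERT INTO orch_job VALUES (1, 2)")  # only resource 2 has a job
conn.execute("INSERT INTO subcloud_resource VALUES (1, 198, 1), (2, 198, 2)")

before = conn.execute("SELECT COUNT(*) FROM subcloud_resource").fetchone()[0]
naive_purge(conn)
after = conn.execute("SELECT COUNT(*) FROM subcloud_resource").fetchone()[0]
print(before, after)
```

Resource 1 has no orch job but is still mapped to a subcloud, so the purge removes it and the cascade silently drops its subcloud_resource row as well.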

Severity
--------
Major

Steps to Reproduce
------------------
Install a distributed cloud system. Wait 3 days, then compare the dcorch resource and subcloud_resource tables before and after the dcorch purge (which runs at 00:20), and check the subsequent dcorch.log.

Expected Behavior
------------------
In use subcloud resources should remain intact over purge.

Actual Behavior
----------------
Subcloud resources are deleted then re-added by next dcorch audit.

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Distributed Cloud - any type

Branch/Pull Time/Commit
-----------------------
build id: 2020-06-26_01-17-51

Last Pass
---------
unknown

Timestamp/Logs
--------------
/var/log/dcorch/dcorch-clean.log:
2020-07-04 00:20:02.740 2302544 INFO dcorch.db.sqlalchemy.api [req-f418912d-3589-4887-be40-5d3130e4fee1 - - - - -] Purging deleted records older than 2020-07-01 00:20:02.740901
2020-07-04 00:20:02.774 2302544 INFO dcorch.db.sqlalchemy.api [req-f418912d-3589-4887-be40-5d3130e4fee1 - - - - -] 0 records were purged from orch_request table.
2020-07-04 00:20:02.778 2302544 INFO dcorch.db.sqlalchemy.api [req-f418912d-3589-4887-be40-5d3130e4fee1 - - - - -] 0 records were purged from orch_job table.
2020-07-04 00:20:02.788 2302544 INFO dcorch.db.sqlalchemy.api [req-f418912d-3589-4887-be40-5d3130e4fee1 - - - - -] 25 records were purged from resource table.
2020-07-05 00:20:02.005 3614351 INFO dcorch.db.sqlalchemy.api [req-322d6bb6-3df2-484f-97d3-d71c302ca437 - - - - -] Purging deleted records older than 2020-07-02 00:20:02.005486
2020-07-05 00:20:02.034 3614351 INFO dcorch.db.sqlalchemy.api [req-322d6bb6-3df2-484f-97d3-d71c302ca437 - - - - -] 360 records were purged from orch_request table.
2020-07-05 00:20:02.042 3614351 INFO dcorch.db.sqlalchemy.api [req-322d6bb6-3df2-484f-97d3-d71c302ca437 - - - - -] 360 records were purged from orch_job table.
2020-07-05 00:20:02.050 3614351 INFO dcorch.db.sqlalchemy.api [req-322d6bb6-3df2-484f-97d3-d71c302ca437 - - - - -] 25 records were purged from resource table.
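
To track how many rows each nightly purge removes, the "records were purged" lines can be tallied per day and table. This is a small sketch and not part of dcorch: purge_counts and the regular expression are illustrative, and they assume the log line layout shown above.

```python
import re
from collections import defaultdict

# Matches the date at the start of the line, then the count and table name
# in "<N> records were purged from <table> table."
PURGE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) \S+ .* (\d+) records were purged from (\w+) table\."
)


def purge_counts(lines):
    """Return {date: {table: purged_row_count}} for the matching log lines."""
    counts = defaultdict(dict)
    for line in lines:
        match = PURGE_RE.search(line)
        if match:
            day, number, table = match.groups()
            counts[day][table] = int(number)
    return dict(counts)


# Two lines taken from the dcorch-clean.log excerpt above.
SAMPLE = [
    "2020-07-04 00:20:02.788 2302544 INFO dcorch.db.sqlalchemy.api "
    "[req-f418912d-3589-4887-be40-5d3130e4fee1 - - - - -] "
    "25 records were purged from resource table.",
    "2020-07-05 00:20:02.042 3614351 INFO dcorch.db.sqlalchemy.api "
    "[req-322d6bb6-3df2-484f-97d3-d71c302ca437 - - - - -] "
    "360 records were purged from orch_job table.",
]

result = purge_counts(SAMPLE)
print(result)
```

A resource count that repeats on consecutive days, as the 25 does here, is a hint that the same rows are being purged and then re-created by the audit.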

See resources being recreated on next dcorch audit:
2020-07-13 00:29:13.451 259367 INFO dcorch.engine.sync_thread [-] subcloud249/identity: Audit users
2020-07-13 00:29:13.452 259367 INFO dcorch.engine.sync_thread [-] subcloud249/identity: b9c0a5ab0861451ca5bf3bfaac66dade not found in DB, will create it
2020-07-13 00:29:13.452 259367 INFO dcorch.engine.sync_services.identity [-] subcloud249/identity: Mapping resource <dcdbsync.dbsyncclient.v1.identity.identity_manager.User object at 0x7fda2f0fa850> to existing subcloud resource <dcdbsync.dbsyncclient.v1.identity.identity_manager.User object at 0x7fda2b8746d0>
2020-07-13 00:29:13.478 259367 INFO dcorch.engine.sync_services.identity [-] Resource created in DB 30347/users/b9c0a5ab0861451ca5bf3bfaac66dade
2020-07-13 00:29:13.510 259367 INFO dcorch.engine.sync_thread [-] subcloud242/identity: Audit users
2020-07-13 00:29:13.512 259367 INFO dcorch.engine.sync_thread [-] subcloud249/identity: 30347 not found in subcloud 198 resource table
2020-07-13 00:29:13.541 259367 INFO dcorch.engine.sync_thread [-] subcloud249/identity: 727e459fcf7d468bb50487c1f7d05477 not found in DB, will create it
2020-07-13 00:29:13.542 259367 INFO dcorch.engine.sync_services.identity [-] subcloud249/identity: Mapping resource <dcdbsync.dbsyncclient.v1.identity.identity_manager.User object at 0x7fda2ca37b50> to existing subcloud resource <dcdbsync.dbsyncclient.v1.identity.identity_manager.User object at 0x7fda2b874d10>
2020-07-13 00:29:13.543 259367 INFO dcorch.engine.sync_thread [-] subcloud242/identity: 30347 not found in subcloud 194 resource table
2020-07-13 00:29:13.543 259367 INFO dcorch.engine.sync_thread [-] subcloud242/identity: Subcloud res b9c0a5ab0861451ca5bf3bfaac66dade not found in DB, will create
2020-07-13 00:29:13.544 259367 INFO dcorch.engine.sync_services.identity [-] subcloud242/identity: Mapping resource <dcdbsync.dbsyncclient.v1.identity.identity_manager.User object at 0x7fda2f0fa850> to existing subcloud resource <dcdbsync.dbsyncclient.v1.identity.identity_manager.User object at 0x7fda2e48a790>
2020-07-13 00:29:13.569 259367 INFO dcorch.engine.sync_services.identity [-] Resource created in DB 30348/users/727e459fcf7d468bb50487c1f7d05477
2020-07-13 00:29:13.572 259367 INFO dcorch.engine.sync_thread [-] subcloud242/identity: 30347 not found in subcloud 194 resource table
2020-07-13 00:29:13.620 259367 INFO dcorch.engine.sync_thread [-] subcloud242/identity: 727e459fcf7d468bb50487c1f7d05477 not found in DB, will create it
2020-07-13 00:29:13.620 259367 INFO dcorch.engine.sync_services.identity [-] subcloud242/identity: Mapping resource <dcdbsync.dbsyncclient.v1.identity.identity_manager.User object at 0x7fda2ca37b50> to existing subcloud resource <dcdbsync.dbsyncclient.v1.identity.identity_manager.User object at 0x7fda31778bd0>
2020-07-13 00:29:13.622 259367 INFO dcorch.engine.sync_thread [-] subcloud249/identity: 30348 not found in subcloud 198 resource table

See the 25 re-created resources (those with a created_at of 2020-07-13) after the audit run:
dcorch=# select * from resource;
  id | uuid | resource_type | master_id | created_at | updated_at | deleted_at | deleted | capabilities
-------+--------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------+----------------------------+------------+------------+---------+--------------
  1783 | 8b615b39-24eb-4362-9bdd-000d02498ecf | idns | 28d50c27-7278-4174-9095-042b51e42a8f | 2020-07-06 00:25:23.405504 | | | 0 |
 30355 | f479a5ba-bd05-4650-8165-d6403f71e152 | users | 1e9244f9204741e09c21f1507218444b | 2020-07-13 00:29:14.735545 | | | 0 |
 29350 | f0c03073-7010-4329-870c-20a8158c997a | projects | 97fa40e3918346bca63abbaea42f8186 | 2020-07-09 00:21:04.235976 | | | 0 |
 30314 | 9a535da8-22e0-4845-b1fc-42fb7a4edb4c | roles | 5be8ca778e204d64a3e58c6704090092 | 2020-07-13 00:20:02.64422 | | | 0 |
 30332 | 7998fb6c-61a1-4356-bdd6-3d729c6c2e2a | roles | 021d46ab1ebc453e9cc320170b840bb2 | 2020-07-13 00:20:03.654625 | | | 0 |
 30333 | 141b9cd3-5e71-447a-b5ed-cfc2f1ff323a | project_role_assignments | 544209edddfc4e5889b84d8444061bab_b9c0a5ab0861451ca5bf3bfaac66dade_91f28c933bd746408c4bb263b8f914f4 | 2020-07-13 00:20:04.598983 | | | 0 |
 30347 | de788888-d66f-48cc-8adf-6414b80ce428 | users | b9c0a5ab0861451ca5bf3bfaac66dade | 2020-07-13 00:29:13.462587 | | | 0 |
 30352 | 3e01cb72-75e2-4134-b869-43f38207dc84 | users | e862391e771447b187fd9e296490e8f7 | 2020-07-13 00:29:14.400431 | | | 0 |
 30322 | 8133e897-563f-4488-bd9f-6e3e0261cfa5 | projects | 544209edddfc4e5889b84d8444061bab | 2020-07-13 00:20:03.222743 | | | 0 |
 30337 | 4b47e345-27fd-4f3f-b5e2-b9dac7de4cc3 | project_role_assignments | 7606cb5eea5e4d048cfd18efccad2183_177e03ecafcf4986aff2d9bd506db81c_91f28c933bd746408c4bb263b8f914f4 | 2020-07-13 00:20:05.523253 | | | 0 |
 30348 | 547dabe5-028c-425e-9928-a10b5bc5be23 | users | 727e459fcf7d468bb50487c1f7d05477 | 2020-07-13 00:29:13.551272 | | | 0 |
  2016 | 74d63359-edc7-4006-811a-5f7254d67d10 | iuser | 28d50c27-7278-4174-9095-042b51e42a8f | 2020-07-06 00:29:32.666244 | | | 0 |
 30350 | def92d6b-f1f3-4faa-aafb-f9d89d35dfc5 | users | 177e03ecafcf4986aff2d9bd506db81c | 2020-07-13 00:29:13.695717 | | | 0 |
 30316 | 141a9562-a72a-4974-94f8-d65d775a035e | projects | <<keystone.domain.root>> | 2020-07-13 00:20:02.73192 | | | 0 |
 30318 | 3e89e6e3-9431-460e-b33e-ffcc1b64eece | projects | default | 2020-07-13 00:20:03.031233 | | | 0 |
 30338 | 9897683a-6d25-495b-a216-e01d2b7029ce | project_role_assignments | 7606cb5eea5e4d048cfd18efccad2183_177e03ecafcf4986aff2d9bd506db81c_021d46ab1ebc453e9cc320170b840bb2 | 2020-07-13 00:20:05.672091 | | | 0 |
 30339 | 5bd86999-b7ce-4ae6-b105-3e837d7bebfd | project_role_assignments | 7606cb5eea5e4d048cfd18efccad2183_1e9244f9204741e09c21f1507218444b_91f28c933bd746408c4bb263b8f914f4 | 2020-07-13 00:20:05.810154 | | | 0 |
 30351 | 7f591b3e-57f7-40e4-96c0-806f5df22e87 | users | f62d73eca8c44ab4b20cbf28879d0998 | 2020-07-13 00:29:13.811944 | | | 0 |
 30317 | 176f011d-590a-4fe6-85f3-b8ac2ac6cac7 | roles | 54fdb9bdb42e4f9c80574623f6f74eb3 | 2020-07-13 00:20:02.92272 | | | 0 |
 30325 | 788c0b24-d155-471c-bf7e-a00e73e4299d | roles | 5ed42ef26d67492d87b05ef74244b629 | 2020-07-13 00:20:03.353337 | | | 0 |
 30326 | 81e3f602-5316-41f5-b449-f70c7f5242af | projects | 7606cb5eea5e4d048cfd18efccad2183 | 2020-07-13 00:20:03.439968 | | | 0 |
  1629 | a60635a9-fad4-49cf-bd47-27fe38990802 | certificates | ssl_ca_10886226602156394257 | 2020-07-06 00:24:10.152876 | | | 0 |
 30334 | 60b74571-d0e4-458d-b5a2-265e727348a1 | project_role_assignments | 7606cb5eea5e4d048cfd18efccad2183_727e459fcf7d468bb50487c1f7d05477_91f28c933bd746408c4bb263b8f914f4 | 2020-07-13 00:20:04.96071 | | | 0 |
 30336 | ea8469ed-a1ff-4580-8668-70cafde9bb84 | project_role_assignments | 7606cb5eea5e4d048cfd18efccad2183_f62d73eca8c44ab4b20cbf28879d0998_91f28c933bd746408c4bb263b8f914f4 | 2020-07-13 00:20:05.25287 | | | 0 |
 30340 | ae6bfb5e-b1c7-4ce4-a198-5edc4c7fac35 | project_role_assignments | 544209edddfc4e5889b84d8444061bab_1e9244f9204741e09c21f1507218444b_91f28c933bd746408c4bb263b8f914f4 | 2020-07-13 00:20:06.023051 | | | 0 |
  2074 | fcb21edf-83d8-4ab4-8f40-b6e1968b6a99 | fernet_repo | keys | 2020-07-06 00:30:35.540059 | | | 0 |
 30341 | b4d9fbad-3c09-43fc-9595-87e0afaaea33 | project_role_assignments | 7606cb5eea5e4d048cfd18efccad2183_e862391e771447b187fd9e296490e8f7_91f28c933bd746408c4bb263b8f914f4 | 2020-07-13 00:20:06.270187 | | | 0 |
 30342 | 20366f3e-c81b-4f1b-b09b-a0c249067b48 | project_role_assignments | 7606cb5eea5e4d048cfd18efccad2183_e7ba1644a5e141bbaafdad5bd9a8afce_91f28c933bd746408c4bb263b8f914f4 | 2020-07-13 00:20:06.587223 | | | 0 |
 30319 | ba6370c4-da9a-4710-8208-ad0369acdd23 | roles | 91f28c933bd746408c4bb263b8f914f4 | 2020-07-13 00:20:03.1831 | | | 0 |
 30354 | a557833e-0946-4ae5-bfa4-98b5b3c736be | users | e7ba1644a5e141bbaafdad5bd9a8afce | 2020-07-13 00:29:14.545771 | | | 0 |
 29215 | 9d66dadd-2ec1-4ec1-bde2-094adb7a3ea1 | projects | 13e24529fe6247a9b444e65fd4bcf874 | 2020-07-07 21:46:46.725401 | | | 0 |

Test Activity
-------------
DC performance testing

Workaround
----------
none

Ghada Khalil (gkhalil) wrote :

@Gerry Kopec, what is the end user impact of this?

tags: added: stx.distcloud
Gerry Kopec (gerry-kopec) wrote :

I don't think there's anything the end user would see. Just wasting time and bogging the orch audit down with unnecessary work, potentially over a few audit cycles.

Changed in starlingx:
assignee: nobody → Tee Ngo (teewrs)
Ghada Khalil (gkhalil) wrote :

stx.5.0 / medium priority - since there is no end user impact. This is more of an optimization and can be merged in stx master only for stx.5.0

Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.5.0
OpenStack Infra (hudson-openstack) wrote : Fix merged to distcloud (master)

Reviewed: https://review.opendev.org/740844
Committed: https://git.openstack.org/cgit/starlingx/distcloud/commit/?id=b5e45f15fe697295b18c62c678c745b3d798cd77
Submitter: Zuul
Branch: master

commit b5e45f15fe697295b18c62c678c745b3d798cd77
Author: Tee Ngo <email address hidden>
Date: Mon Jul 13 21:58:50 2020 -0400

    Update resources purge logic

    The daily dcorch database purge cron job deletes resources that are
    no longer used by any orch jobs but as the subcloud_resource table is
    set to cascade delete it also removes subcloud resources that are
    still in use. This is then flagged by the next dcorch audit and the
    subcloud resources are re-created over the next audit cycle(s).

    In this commit, only purge resources that are not referenced by any
    orch jobs and subclouds.

    Closes-Bug: 1887430
    Change-Id: I7179a2fba21f296dd51660bb224f5d51aaf531d8
    Signed-off-by: Tee Ngo <email address hidden>
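
The rule stated in the commit message, purging only resources that are referenced by neither orch jobs nor subclouds, can be sketched as follows. This is an illustration with Python's sqlite3 module and a hypothetical simplified schema, not the merged distcloud change. purge_unreferenced is an assumed name, and the sketch ignores the age cutoff that the real purge applies.

```python
import sqlite3

# Hypothetical, simplified schema; the real dcorch tables have more columns.
SCHEMA = """
CREATE TABLE resource (id INTEGER PRIMARY KEY, master_id TEXT);
CREATE TABLE orch_job (
    id INTEGER PRIMARY KEY,
    resource_id INTEGER NOT NULL REFERENCES resource(id)
);
CREATE TABLE subcloud_resource (
    id INTEGER PRIMARY KEY,
    subcloud_id INTEGER NOT NULL,
    resource_id INTEGER NOT NULL REFERENCES resource(id) ON DELETE CASCADE
);
"""


def purge_unreferenced(conn):
    """Delete resources that no orch job and no subcloud refers to.

    Returns the number of resource rows deleted.
    """
    cursor = conn.execute(
        "DELETE FROM resource WHERE "
        "NOT EXISTS (SELECT 1 FROM orch_job "
        "            WHERE orch_job.resource_id = resource.id) "
        "AND NOT EXISTS (SELECT 1 FROM subcloud_resource "
        "                WHERE subcloud_resource.resource_id = resource.id)"
    )
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces cascades when enabled
conn.executescript(SCHEMA)
conn.execute("INSERT INTO resource VALUES (1, 'user-a'), (2, 'user-b'), (3, 'user-c')")
conn.execute("INSERT INTO orch_job VALUES (1, 2)")  # resource 2 has a job
conn.execute("INSERT INTO subcloud_resource VALUES (1, 198, 1), (2, 198, 2)")

purged = purge_unreferenced(conn)
remaining = [row[0] for row in conn.execute("SELECT id FROM resource ORDER BY id")]
subcloud_rows = conn.execute("SELECT COUNT(*) FROM subcloud_resource").fetchone()[0]
print(purged, remaining, subcloud_rows)
```

Only resource 3, which nothing refers to, is removed. Resource 1 has no orch job but is kept because a subcloud still maps to it, so the cascade never touches an in-use subcloud_resource row.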

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to distcloud (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/distcloud/+/792298

OpenStack Infra (hudson-openstack) wrote : Change abandoned on distcloud (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/distcloud/+/792298
Reason: Updated merge soon

OpenStack Infra (hudson-openstack) wrote : Fix proposed to distcloud (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/distcloud/+/793405

OpenStack Infra (hudson-openstack) wrote : Change abandoned on distcloud (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/distcloud/+/793405

OpenStack Infra (hudson-openstack) wrote : Fix proposed to distcloud (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/distcloud/+/796528

OpenStack Infra (hudson-openstack) wrote : Fix merged to distcloud (f/centos8)

Reviewed: https://review.opendev.org/c/starlingx/distcloud/+/796528
Committed: https://opendev.org/starlingx/distcloud/commit/4c5344f8765b372cb84d2b1181589c16db2ae6e4
Submitter: "Zuul (22348)"
Branch: f/centos8

commit cb979811017bd193fc1f06e53bb7830fd3184859
Author: Yuxing Jiang <email address hidden>
Date: Wed Jun 9 11:11:27 2021 -0400

    Format the IP addresses in payload before adding a subcloud

    The IPv6 addresses can be represented in multiple formats. As IP
    addresses are stored as text in database, ansible inventory and
    overrides, this commit converts the IP addresses in payload to
    standard text format of IPv6 address during adding a new subcloud.

    Tested with installing and bootstrapping a new subcloud(RVMC
    configured) with the correct IPv6 address values, but with
    unrecommended upper case letters and '0'. The addresses are
    converted to standard format in database, ansible inventory and
    overrides files.

    Partial-Bug: 1931459
    Signed-off-by: Yuxing Jiang <email address hidden>
    Change-Id: I6c26e749941f1ea2597f91886ad8f7da64521f0d

commit 2cf5d6d5cef0808c354f7575336aec34253993b3
Author: albailey <email address hidden>
Date: Thu May 20 14:19:24 2021 -0500

    Delete existing vim strategy from subcloud during patch orch

    When dcmanager creates a patch strategy, if a subcloud has an
    existing vim patch strategy, it will attempt to re-use
    that strategy during its patching phase, which may result in an
    error.

    This commit deletes the existing vim patch strategy in
    a subcloud, if it exists, so it can be re-created.
    If the strategy cannot be deleted, orchestration fails.

    Change-Id: Id35ef26ed3ddae6d71874fc6bac11df147f72323
    Closes-Bug: 1929221
    Signed-off-by: albailey <email address hidden>

commit 9e14c83f0162549a2a94cb8bc1e73dbc4f4d4887
Author: albailey <email address hidden>
Date: Tue Jun 1 14:37:14 2021 -0500

    Adding activation retry to upgrade orchestration

    When performing an activation, the keystone endpoints may not
    be accessible in the subcloud due to the asynchronous way that
    cert-mon can trigger a restart of keystone.

    This would have occasionally resulted in the upgrade activation
    failing to be initiated, and orchestration needing to be invoked
    again to resume.

    This 'hack' adds retries and sleeps to the initial
    activation action.

    Change-Id: Ic757521dec7bdc248a51a70b5463caafe7927360
    Partial-Bug: 1927550
    Signed-off-by: albailey <email address hidden>

commit bb604c0a9b872efd65fa45f1e2269995818c6262
Author: Tee Ngo <email address hidden>
Date: Thu May 27 22:17:16 2021 -0400

    Fix subcloud show --detail command related issues

    If the subcloud is offline, the command stalls and eventually returns
    the "ERROR (app)" output. If the subcloud is online, the oam_floating_ip
    info is excluded from the output when the subcloud id instead of subcloud
    name is specified.

    This commit fixes both of the above issues.

    Closes-Bug: 1929893
    Change-Id: I995591368564539b0e6af185b1adba2db73e0e46
    Sign...

tags: added: in-f-centos8