Distributed Cloud orchestration audit fails to delete snmp trapdest and community on subclouds

Bug #1863045 reported by Gerry Kopec
This bug affects 1 person
Affects   | Status       | Importance | Assigned to | Milestone
StarlingX | Fix Released | Medium     | Yuxing      |

Bug Description

Brief Description
-----------------
The dcorch audit does not delete SNMP trapdest and SNMP community configuration from subclouds after they have been deleted on the system controller.

Severity
--------
Minor

Steps to Reproduce
------------------
Create a trapdest and a community on the system controller:
system snmp-comm-add -c test1
system snmp-trapdest-add -c test1 -i 10.11.2.1

Wait until they are reflected on the subcloud:
system snmp-comm-list
+----------------+------+--------+
| SNMP community | View | Access |
+----------------+------+--------+
| test1 | .1 | ro |
+----------------+------+--------+
system snmp-trapdest-list
+---------------+-----------------------+------+--------------+-----------+
| IP Address | SNMP Community | Port | Type | Transport |
+---------------+-----------------------+------+--------------+-----------+
| 192.168.204.2 | dcorchAlarmAggregator | 162 | snmpv2c_trap | udp |
| 10.11.2.1 | test1 | 162 | snmpv2c_trap | udp |
+---------------+-----------------------+------+--------------+-----------+

Then delete them on the system controller:
system snmp-trapdest-delete 10.11.2.1
Deleted ip 10.11.2.1

system snmp-comm-delete test1
Deleted community test1

Wait at least 10 minutes to ensure that the dcorch audit runs, then check the status on the subcloud.

Expected Behavior
------------------
The trapdest and community should be deleted from the subcloud.

Actual Behavior
----------------
The trapdest and community are not deleted from the subcloud.

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Distributed Cloud (vbox)

Branch/Pull Time/Commit
-----------------------
Design build from master on Jan. 22, 2020
(Note that Tao also reproduced it in her setup with a more recent build.)

Last Pass
---------
unknown

Timestamp/Logs
--------------
Observe these logs in /var/log/dcorch/dcorch.log on the system controller; note the Exception entry at the end:

[sysadmin@controller-0 ~(keystone_admin)] SC$ date; system snmp-trapdest-delete 10.11.2.1
Tue Feb 11 22:01:17 UTC 2020
Deleted ip 10.11.2.1

2020-02-11 22:07:33.290 2803619 INFO dcorch.engine.sync_thread [-] subcloud1/platform: Audit itrapdest: [] vs [<iTrapdest {u'uuid': u'b1504340-a591-4b8c-9159-942dc804baa1', u'links': [{u'href': u'http://192.168.101.2:6385/v1/itrapdest/b1504340-a591-4b8c-9159-942dc804baa1', u'rel': u'self'}, {u'href': u'http://192.168.101.2:6385/itrapdest/b1504340-a591-4b8c-9159-942dc804baa1', u'rel': u'bookmark'}], u'ip_address': u'192.168.204.2', u'community': u'dcorchAlarmAggregator', u'type': u'snmpv2c_trap', u'port': 162, u'transport': u'udp'}>, <iTrapdest {u'uuid': u'b403ad3f-1932-4749-afa7-4efea21b5283', u'links': [{u'href': u'http://192.168.101.2:6385/v1/itrapdest/b403ad3f-1932-4749-afa7-4efea21b5283', u'rel': u'self'}, {u'href': u'http://192.168.101.2:6385/itrapdest/b403ad3f-1932-4749-afa7-4efea21b5283', u'rel': u'bookmark'}], u'ip_address': u'10.11.2.1', u'community': u'test1', u'type': u'snmpv2c_trap', u'port': 162, u'transport': u'udp'}>]
2020-02-11 22:07:33.296 2803619 INFO dcorch.engine.sync_thread [-] subcloud1/platform: Resource (10.11.2.1) and subcloud resource (10.11.2.1) not in sync with master cloud
2020-02-11 22:07:33.296 2803619 INFO dcorch.engine.sync_thread [-] subcloud1/platform: audit_action: extra_resource/itrapdest
2020-02-11 22:07:33.297 2803619 INFO dcorch.engine.sync_thread [-] subcloud1/platform: Scheduling delete work for itrapdest/None
2020-02-11 22:07:33.301 2803619 INFO dcorch.common.utils [-] Resource not in DB itrapdest/None/delete
2020-02-11 22:07:33.302 2803619 INFO dcorch.engine.sync_thread [-] subcloud1/platform: Exception in schedule_work: Field `master_id' cannot be None
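The audit flow in these logs can be sketched as follows. This is a minimal Python illustration under stated assumptions, not dcorch code: `Trapdest`, `audit_extra_resources` and `schedule_delete` are hypothetical names; resources are assumed to be matched by `ip_address`; the subcloud's own `dcorchAlarmAggregator` trap destination is assumed to be excluded from the comparison (the log only flags 10.11.2.1); and a delete job is assumed to be keyed by `master_id`, so a missing id fails the same way as the `Exception in schedule_work` line in the log.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: the subcloud-local alarm aggregator trap destination is not
# synced from the system controller, so the audit ignores it.
IGNORED_COMMUNITIES = {"dcorchAlarmAggregator"}


@dataclass
class Trapdest:
    """Hypothetical, simplified stand-in for an iTrapdest record."""
    ip_address: str
    community: str


def audit_extra_resources(master: List[Trapdest],
                          subcloud: List[Trapdest]) -> List[Trapdest]:
    """Return subcloud trapdests that no longer exist on the master cloud."""
    master_ips = {r.ip_address for r in master}
    return [r for r in subcloud
            if r.community not in IGNORED_COMMUNITIES
            and r.ip_address not in master_ips]


def schedule_delete(master_id: Optional[str]) -> str:
    """Hypothetical delete scheduler: the job is keyed by master_id."""
    if master_id is None:
        # Same failure text as in the log: "Field `master_id' cannot be None"
        raise ValueError("Field `master_id' cannot be None")
    return f"itrapdest/{master_id}/delete"


# Reproduce the audit in the log: master list is empty, subcloud has two.
subcloud = [Trapdest("192.168.204.2", "dcorchAlarmAggregator"),
            Trapdest("10.11.2.1", "test1")]
extra = audit_extra_resources([], subcloud)
print([r.ip_address for r in extra])  # ['10.11.2.1']
```

If the id lookup for the extra resource yields `None` (as in `itrapdest/None` in the log), `schedule_delete(None)` raises and, in this sketch, the subcloud entry is never removed.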

Test Activity
-------------
DC designer testing

Workaround
----------
The issue can be avoided if the system controller region is specified with the delete command. This causes the resource to be deleted on the subcloud immediately (i.e. not by the audit):
system --os-region-name SystemController snmp-trapdest-delete 10.11.2.1

Once the configuration is orphaned, it seems you have to re-add it on the system controller and then delete it with the region name specified.

Ghada Khalil (gkhalil) wrote :

stx.4.0 / medium priority - workaround exists

tags: added: stx.4.0 stx.distcloud
Changed in starlingx:
status: New → Triaged
assignee: nobody → Dariush Eslimi (deslimi)
importance: Undecided → Medium
Dariush Eslimi (deslimi)
Changed in starlingx:
assignee: Dariush Eslimi (deslimi) → Yuxing (yuxing)
Yuxing (yuxing)
Changed in starlingx:
status: Triaged → In Progress
Ghada Khalil (gkhalil) wrote :

Per Dariush Eslimi (flock/DC PL), this is a day 1 issue and should be addressed, but there is no reason to hold up stx.4.0 on it. Moving to stx.5.0.

tags: added: stx.5.0
removed: stx.4.0
OpenStack Infra (hudson-openstack) wrote : Fix merged to distcloud (master)

Reviewed: https://review.opendev.org/738660
Committed: https://git.openstack.org/cgit/starlingx/distcloud/commit/?id=4f5d0b4fca0d9494f3acfd32349848558a9c34eb
Submitter: Zuul
Branch: master

commit 4f5d0b4fca0d9494f3acfd32349848558a9c34eb
Author: Yuxing Jiang <email address hidden>
Date: Tue Jun 30 11:50:15 2020 -0400

    Return master_id from get_resource_id for deleting SNMP resource

    This commit modified the get_resource_id method in SysinvSyncThread.
    The previous code only returned community/ip_address, which is for
    the create operation. With this change, the get_resource_id method
    in this class returns resource.master_id when deleting an SNMP
    community or trapdest.

    Test:
      1 Add snmp trapdest and snmp community configuration from system
    controller, with command:
        system snmp-comm-add -c test1
        system snmp-trapdest-add -c test1 -i 10.11.2.1
      2 Wait for the snmp community and trapdest to be configured in the
    subcloud, then check on the subcloud controller:
        system snmp-comm-list
        system snmp-trapdest-list
      3 Delete the snmp community and trapdest in system controller:
        system snmp-trapdest-delete 10.11.2.1
        system snmp-comm-delete test1
      Expected: trapdest or community should be deleted from subcloud,
    result can also be observed in dcorch DB and /var/log/dcorch.log.

    Change-Id: I8d3b40599f83263cade1799d5a1dae72d3bac2c5
    Closes-Bug: 1863045
    Signed-off-by: Yuxing Jiang <email address hidden>
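The idea behind this fix can be sketched in Python. The names below (`Community`, `Trapdest`, `OrchResource`, and the resource-type strings `icommunity` and `itrapdest`) are simplified assumptions, not the actual SysinvSyncThread code: on the create path the resource is assumed to be an API object carrying `community` or `ip_address`, while on the delete path only an orchestrator record with `master_id` is available, so the method falls back to `master_id`.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed resource-type labels; "itrapdest" matches the type name in the
# dcorch log in this report, "icommunity" is an assumption.
SNMP_COMM = "icommunity"
SNMP_TRAPDEST = "itrapdest"


@dataclass
class Community:
    """Hypothetical API object seen on the create path."""
    community: str


@dataclass
class Trapdest:
    """Hypothetical API object seen on the create path."""
    ip_address: str
    community: str


@dataclass
class OrchResource:
    """Hypothetical orchestrator record seen on the delete path."""
    master_id: str


def get_resource_id(resource_type: str, resource) -> Optional[str]:
    """Return the id used to key a sync job for an SNMP resource."""
    if resource_type == SNMP_TRAPDEST:
        if hasattr(resource, "ip_address"):
            return resource.ip_address  # create: id from the API object
    elif resource_type == SNMP_COMM:
        if hasattr(resource, "community"):
            return resource.community  # create: id from the API object
    else:
        return None
    # delete: fall back to the record's master_id instead of yielding None
    return getattr(resource, "master_id", None)
```

In this sketch, a delete for trapdest 10.11.2.1 is keyed as `10.11.2.1` rather than `None`, which is the situation the original log shows failing.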

Changed in starlingx:
status: In Progress → Fix Released
Difu Hu (difuhu) wrote :

Verified on 2020-07-20_20-00-00

OpenStack Infra (hudson-openstack) wrote : Fix proposed to distcloud (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/distcloud/+/792298

OpenStack Infra (hudson-openstack) wrote : Change abandoned on distcloud (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/distcloud/+/792298
Reason: Updated merge soon

OpenStack Infra (hudson-openstack) wrote : Fix proposed to distcloud (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/distcloud/+/793405

OpenStack Infra (hudson-openstack) wrote : Change abandoned on distcloud (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/distcloud/+/793405

OpenStack Infra (hudson-openstack) wrote : Fix proposed to distcloud (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/distcloud/+/796528

OpenStack Infra (hudson-openstack) wrote : Fix merged to distcloud (f/centos8)

Reviewed: https://review.opendev.org/c/starlingx/distcloud/+/796528
Committed: https://opendev.org/starlingx/distcloud/commit/4c5344f8765b372cb84d2b1181589c16db2ae6e4
Submitter: "Zuul (22348)"
Branch: f/centos8

commit cb979811017bd193fc1f06e53bb7830fd3184859
Author: Yuxing Jiang <email address hidden>
Date: Wed Jun 9 11:11:27 2021 -0400

    Format the IP addresses in payload before adding a subcloud

    IPv6 addresses can be represented in multiple formats. As IP
    addresses are stored as text in the database, Ansible inventory and
    overrides, this commit converts the IP addresses in the payload to
    the standard IPv6 text format when adding a new subcloud.

    Tested by installing and bootstrapping a new subcloud (RVMC
    configured) with correct IPv6 address values, but written with
    non-recommended uppercase letters and '0's. The addresses are
    converted to the standard format in the database, Ansible inventory
    and overrides files.

    Partial-Bug: 1931459
    Signed-off-by: Yuxing Jiang <email address hidden>
    Change-Id: I6c26e749941f1ea2597f91886ad8f7da64521f0d
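The normalization described in this commit can be sketched with Python's standard `ipaddress` module, whose string form of an address is the lowercase, compressed representation. The helper names and the payload key `gateway_ip` below are hypothetical examples, not the actual dcmanager code or subcloud payload field names.

```python
import ipaddress
from typing import Dict, Iterable


def normalize_ip(value: str) -> str:
    """Return the canonical text form of an IPv4 or IPv6 address."""
    # str() of an IPv6Address is lowercase, drops leading zeros in each
    # group and compresses the longest run of two or more zero groups to "::".
    return str(ipaddress.ip_address(value))


def normalize_payload(payload: Dict[str, str],
                      ip_keys: Iterable[str]) -> Dict[str, str]:
    """Return a copy of payload with the given address fields normalized."""
    result = dict(payload)
    for key in ip_keys:
        if result.get(key):
            result[key] = normalize_ip(result[key])
    return result


print(normalize_ip("FD01:0000:0000::0001"))  # fd01::1
```

Normalizing once at subcloud-add time means every later text comparison (database, inventory, overrides) sees the same spelling of an address.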

commit 2cf5d6d5cef0808c354f7575336aec34253993b3
Author: albailey <email address hidden>
Date: Thu May 20 14:19:24 2021 -0500

    Delete existing vim strategy from subcloud during patch orch

    When dcmanager creates a patch strategy, if a subcloud has an
    existing vim patch strategy, it will attempt to re-use
    that strategy during its patching phase, which may result in an
    error.

    This commit deletes the existing vim patch strategy in
    a subcloud, if it exists, so it can be re-created.
    If the strategy cannot be deleted, orchestration fails.

    Change-Id: Id35ef26ed3ddae6d71874fc6bac11df147f72323
    Closes-Bug: 1929221
    Signed-off-by: albailey <email address hidden>
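The delete-then-recreate step can be sketched as follows. `FakeVimClient` and its methods are hypothetical in-memory stand-ins, not the real VIM or dcmanager client API; the sketch only shows the control flow the commit describes: remove an existing strategy if present, fail orchestration if that is not possible, then create a fresh one.

```python
from typing import Optional


class OrchestrationError(Exception):
    """Raised when an existing subcloud strategy cannot be removed."""


class FakeVimClient:
    """Hypothetical in-memory stand-in for a subcloud VIM client."""

    def __init__(self, existing: Optional[str] = None, can_delete: bool = True):
        self.strategy = existing
        self.can_delete = can_delete

    def get_strategy(self) -> Optional[str]:
        return self.strategy

    def delete_strategy(self) -> bool:
        if not self.can_delete:
            return False
        self.strategy = None
        return True

    def create_strategy(self, name: str) -> str:
        self.strategy = name
        return name


def prepare_patch_strategy(client: FakeVimClient, name: str = "patch") -> str:
    """Delete any stale patch strategy, then create a fresh one."""
    if client.get_strategy() is not None:
        if not client.delete_strategy():
            # Per the commit: if the strategy cannot be deleted, fail.
            raise OrchestrationError("existing vim patch strategy not deleted")
    return client.create_strategy(name)
```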

commit 9e14c83f0162549a2a94cb8bc1e73dbc4f4d4887
Author: albailey <email address hidden>
Date: Tue Jun 1 14:37:14 2021 -0500

    Adding activation retry to upgrade orchestration

    When performing an activation, the keystone endpoints may not
    be accessible in the subcloud due to the asynchronous way that
    cert-mon can trigger a restart of keystone.

    This would have occasionally resulted in the upgrade activation
    failing to be initiated, and orchestration needing to be invoked
    again to resume.

    This 'hack' adds retries and sleeps to the initial
    activation action.

    Change-Id: Ic757521dec7bdc248a51a70b5463caafe7927360
    Partial-Bug: 1927550
    Signed-off-by: albailey <email address hidden>
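A generic retry-with-sleep wrapper of the kind this commit describes might look like the following. `call_with_retries` is a hypothetical helper name, and the attempt count, delay and retried exception types are placeholders, since the commit message does not state them.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(action: Callable[[], T],
                      attempts: int = 3,
                      delay_s: float = 10.0,
                      retry_on: Tuple[Type[BaseException], ...] = (Exception,)) -> T:
    """Call action(), retrying on the given exceptions with a fixed sleep."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: propagate the last error
            # Placeholder wait, e.g. for keystone to return after a restart.
            time.sleep(delay_s)
    raise AssertionError("unreachable")
```

A fixed delay keeps the sketch simple; exponential backoff would be an alternative if the restart window varies widely.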

commit bb604c0a9b872efd65fa45f1e2269995818c6262
Author: Tee Ngo <email address hidden>
Date: Thu May 27 22:17:16 2021 -0400

    Fix subcloud show --detail command related issues

    If the subcloud is offline, the command stalls and eventually returns
    the "ERROR (app)" output. If the subcloud is online, the oam_floating_ip
    info is excluded from the output when the subcloud id instead of subcloud
    name is specified.

    This commit fixes both of the above issues.

    Closes-Bug: 1929893
    Change-Id: I995591368564539b0e6af185b1adba2db73e0e46
    Sign...

tags: added: in-f-centos8