IPv6 Distributed Cloud: After reboot required patch applied complete, subcloud has 250.001 alarm not cleared

Bug #1847808 reported by Peng Peng
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Andy

Bug Description

Brief Description
-----------------
Apply a reboot-required patch on the system controller of a distributed cloud; the patch is applied to the subclouds automatically. After the apply is complete, a 250.001 alarm on one DX subcloud is not cleared.
Based on the logs, it looks like the runtime manifest failed to apply and the alarm was not cleared.

Severity
--------
Major

Steps to Reproduce
------------------
Bring up a DC system with 2-3 subclouds, including a DX subcloud
Apply a reboot-required patch to the system
After the apply is complete, check the alarm list on the subclouds

TC-name: DC Patching

Expected Behavior
------------------
After patching completes, no alarms are raised on the subclouds

Actual Behavior
----------------

Reproducibility
---------------
Tried once; reproducibility not confirmed

System Configuration
--------------------
DC system
IPv6

Lab-name: DC Subcloud5 WCP_87-88

Branch/Pull Time/Commit
-----------------------
"2019-10-06_20-00-00"

Last Pass
---------

Timestamp/Logs
--------------
[sysadmin@controller-1 ~(keystone_admin)]$ dcmanager strategy-step list
+------------------+-------+-------------------+----------------------------+----------------------------+-------------+
| cloud | stage | state | details | started_at | finished_at |
+------------------+-------+-------------------+----------------------------+----------------------------+-------------+
| SystemController | 1 | applying strategy | apply phase is 7% complete | 2019-10-11 13:42:12.190736 | None |
| subcloud4 | 2 | initial | | None | None |
| subcloud1 | 2 | initial | | None | None |
| subcloud5 | 2 | initial | | None | None |
+------------------+-------+-------------------+----------------------------+----------------------------+-------------+

[sysadmin@controller-0 ~(keystone_admin)]$ dcmanager strategy-step list
+------------------+-------+----------+---------+----------------------------+----------------------------+
| cloud | stage | state | details | started_at | finished_at |
+------------------+-------+----------+---------+----------------------------+----------------------------+
| SystemController | 1 | complete | | 2019-10-11 13:42:12.190736 | 2019-10-11 14:10:50.020819 |
| subcloud4 | 2 | complete | | 2019-10-11 14:11:00.026734 | 2019-10-11 14:37:04.490499 |
| subcloud1 | 2 | complete | | 2019-10-11 14:11:00.032574 | 2019-10-11 14:35:04.583524 |
| subcloud5 | 2 | complete | | 2019-10-11 14:11:00.042235 | 2019-10-11 14:38:54.640176 |
+------------------+-------+----------+---------+----------------------------+----------------------------+

Subcloud5:
[sysadmin@controller-0 ~(keystone_admin)]$ fm alarm-list
+----------+--------------------------------------------+-------------------+----------+----------------------------+
| Alarm ID | Reason Text                                | Entity ID         | Severity | Time Stamp                 |
+----------+--------------------------------------------+-------------------+----------+----------------------------+
| 250.001  | controller-0 Configuration is out-of-date. | host=controller-0 | major    | 2019-10-11T14:26:49.619405 |
+----------+--------------------------------------------+-------------------+----------+----------------------------+

Log:
2019-10-11 15:04:21.406 292409 WARNING sysinv.conductor.manager [-] controller-1: iconfig out of date: target 6f1f9c18-8d3d-47e2-af50-84d8a3377400, applied 600e1267-8fa6-4635-8e31-b4a09aa64407
2019-10-11 15:04:21.407 292409 WARNING sysinv.conductor.manager [-] SYS_I Raise system config alarm: host controller-1 config applied: 600e1267-8fa6-4635-8e31-b4a09aa64407 vs. target: 6f1f9c18-8d3d-47e2-af50-84d8a3377400.
2019-10-11 15:04:21.461 292409 INFO sysinv.conductor.manager [-] _config_update_hosts config_uuid=6f1f9c18-8d3d-47e2-af50-84d8a3377400
2019-10-11 15:04:21.461 292409 INFO sysinv.conductor.manager [-] applying runtime manifest config_uuid=6f1f9c18-8d3d-47e2-af50-84d8a3377400, classes: ['openstack::keystone::endpoint::runtime', 'platform::firewall::runtime', 'platform::sysinv::runtime']
2019-10-11 15:04:21.472 292409 INFO sysinv.puppet.puppet [-] Updating hiera for host: controller-1 with config_uuid: 6f1f9c18-8d3d-47e2-af50-84d8a3377400
2019-10-11 15:04:22.931 292409 INFO sysinv.puppet.interface [-] Interface data0 has no primary address
2019-10-11 15:04:22.932 292409 INFO sysinv.puppet.interface [-] Interface data1 has no primary address
2019-10-11 15:04:24.633 292409 INFO sysinv.conductor.manager [-] stx-openstack app status does not warrant app re-apply
2019-10-11 15:04:24.633 292409 INFO sysinv.agent.rpcapi [-] config_apply_runtime_manifest: fanout_cast: sending config 6f1f9c18-8d3d-47e2-af50-84d8a3377400 {'classes': ['openstack::keystone::endpoint::runtime', 'platform::firewall::runtime', 'platform::sysinv::runtime'], 'force': False, 'personalities': ['controller'], 'host_uuids': [u'00c8462b-9f84-4bc1-9e49-52dac2e154d3']} to agent
2019-10-11 15:04:24.636 15355 INFO sysinv.agent.manager [req-a1416945-2533-4339-be11-79682a3bec3a admin None] config_apply_runtime_manifest: 6f1f9c18-8d3d-47e2-af50-84d8a3377400 {u'classes': [u'openstack::keystone::endpoint::runtime', u'platform::firewall::runtime', u'platform::sysinv::runtime'], u'force': False, u'personalities': [u'controller'], u'host_uuids': [u'00c8462b-9f84-4bc1-9e49-52dac2e154d3']} controller
2019-10-11 15:04:24.637 15355 INFO sysinv.agent.manager [req-a1416945-2533-4339-be11-79682a3bec3a admin None] controller-active
2019-10-11 15:04:24.637 15355 INFO sysinv.agent.manager [req-a1416945-2533-4339-be11-79682a3bec3a admin None] _apply_runtime_manifest with hieradata_path = '/opt/platform/puppet/19.10/hieradata'
2019-10-11 15:04:28.167 292982 INFO sysinv.api.controllers.v1.host [-] Provisioned storage node(s) []
2019-10-11 15:04:28.180 292982 INFO sysinv.api.controllers.v1.host [-] Provisioned storage node(s) []
2019-10-11 15:05:34.237 292982 INFO sysinv.api.controllers.v1.host [-] Provisioned storage node(s) []
2019-10-11 15:05:34.249 292982 INFO sysinv.api.controllers.v1.host [-] Provisioned storage node(s) []
2019-10-11 15:05:57.081 15355 ERROR sysinv.puppet.common [req-a1416945-2533-4339-be11-79682a3bec3a admin None] Failed to execute runtime manifest for host fd01:4::4
2019-10-11 15:05:57.081 15355 TRACE sysinv.puppet.common Traceback (most recent call last):
2019-10-11 15:05:57.081 15355 TRACE sysinv.puppet.common File "/usr/lib64/python2.7/site-packages/sysinv/puppet/common.py", line 75, in puppet_apply_manifest
2019-10-11 15:05:57.081 15355 TRACE sysinv.puppet.common subprocess.check_call(cmd, stdout=fnull, stderr=fnull)
2019-10-11 15:05:57.081 15355 TRACE sysinv.puppet.common File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
2019-10-11 15:05:57.081 15355 TRACE sysinv.puppet.common raise CalledProcessError(retcode, cmd)
2019-10-11 15:05:57.081 15355 TRACE sysinv.puppet.common CalledProcessError: Command '['/usr/local/bin/puppet-manifest-apply.sh', '/opt/platform/puppet/19.10/hieradata', 'fd01:4::4', 'controller', 'runtime', '/tmp/tmp1XMSjy.yaml']' returned non-zero exit status 1
2019-10-11 15:05:57.081 15355 TRACE sysinv.puppet.common
2019-10-11 15:05:57.084 15355 ERROR sysinv.agent.manager [req-a1416945-2533-4339-be11-79682a3bec3a admin None] failed to apply runtime manifest
2019-10-11 15:05:57.084 15355 TRACE sysinv.agent.manager Traceback (most recent call last):
2019-10-11 15:05:57.084 15355 TRACE sysinv.agent.manager File "/usr/lib64/python2.7/site-packages/sysinv/agent/manager.py", line 1699, in _apply_runtime_manifest

Test Activity
-------------
Regression Testing

Peng Peng (ppeng)
summary: - IPv6 Distributed Cloud: After reboot request patch applied complete,
+ IPv6 Distributed Cloud: After reboot required patch applied complete,
suncloud has 250.001 alarm not cleared
description: updated
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Assigning to Don to triage to help route this to the right team

summary: IPv6 Distributed Cloud: After reboot required patch applied complete,
- suncloud has 250.001 alarm not cleared
+ subcloud has 250.001 alarm not cleared
tags: added: stx.distcloud
description: updated
tags: added: stx.update
Changed in starlingx:
assignee: nobody → Don Penney (dpenney)
Revision history for this message
Al Bailey (albailey1974) wrote :

One of the puppet.log errors is:

var/log/puppet/2019-10-11-14-26-53_controller/puppet.log

2019-10-11T14:28:13.287 Notice: 2019-10-11 14:28:13 +0000 /Stage[main]/Openstack::Keystone::Endpoint::Runtime/Delete_endpoints[Delete keystone endpoints]/Exec[Delete RegionOne keystone admin endpoint]/returns: usage: openstack endpoint delete [-h] <endpoint-id> [<endpoint-id> ...]
2019-10-11T14:28:13.289 Notice: 2019-10-11 14:28:13 +0000 /Stage[main]/Openstack::Keystone::Endpoint::Runtime/Delete_endpoints[Delete keystone endpoints]/Exec[Delete RegionOne keystone admin endpoint]/returns: openstack endpoint delete: error: too few arguments
2019-10-11T14:28:13.291 Error: 2019-10-11 14:28:13 +0000 source /etc/platform/openrc && openstack endpoint list --region RegionOne --service keystone --interface admin -f value -c ID | xargs openstack endpoint delete returned 123 instead of one of [0]
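The failure mode in this log can be reproduced without OpenStack at all: with GNU xargs, an empty pipeline still runs the command once with no arguments, and a command that requires at least one argument then fails, making xargs exit 123 (exactly the "returned 123 instead of one of [0]" above). A minimal sketch, using a hypothetical stand-in script for `openstack endpoint delete`:

```shell
#!/bin/sh
# Hypothetical stand-in for `openstack endpoint delete`: it errors out when
# given no endpoint IDs, just like the real command does.
fake=$(mktemp)
cat > "$fake" <<'EOF'
#!/bin/sh
[ "$#" -gt 0 ] || { echo "error: too few arguments" >&2; exit 2; }
echo "deleted: $*"
EOF
chmod +x "$fake"

# Empty input: GNU xargs still invokes the command once with no arguments,
# the command exits non-zero, and xargs reports exit status 123.
printf '' | xargs "$fake"
echo "plain xargs on empty input -> exit $?"

# One common guard: -r / --no-run-if-empty (GNU extension) skips the
# invocation entirely when stdin is empty, so the pipeline succeeds.
printf '' | xargs -r "$fake"
echo "xargs -r on empty input    -> exit $?"
```

This matches the bug scenario: the endpoint list was empty because the RegionOne endpoints had already been deleted, so the piped delete failed and the manifest apply aborted.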

Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: Don Penney (dpenney) → Andy (andy.wrs)
Revision history for this message
Ghada Khalil (gkhalil) wrote :

The failure seems to be related to deleting openstack endpoints; assigning to Andy.

tags: removed: stx.update
Yang Liu (yliu12)
tags: added: stx.retestneeded
Revision history for this message
Ghada Khalil (gkhalil) wrote :

stx.3.0 / medium priority - related to DC feature which is a deliverable for stx.3.0

tags: added: stx.3.0
Changed in starlingx:
status: New → Triaged
importance: Undecided → Medium
Ghada Khalil (gkhalil)
Changed in starlingx:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/689566

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/689566
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=37ca0899b6b684e7058cb3f53d835d1679bed694
Submitter: Zuul
Branch: master

commit 37ca0899b6b684e7058cb3f53d835d1679bed694
Author: Andy Ning <email address hidden>
Date: Fri Oct 18 15:21:53 2019 -0400

    Remove RegionOne endpoints only if they exist

    In a DC subcloud, when openstack::keystone::endpoint::runtime is
    invoked after system is fully configured by ansible, it still tries
    to delete "RegionOne" endpoints. The operation will fail since
    "RegionOne" endpoints have already been deleted, causing a 250.001
    (configuration out of date) alarm raised. The alarm won't be cleared
    until a lock/unlock of the controllers.

    This update enhanced the runtime class to delete "RegionOne" endpoints
    only if they exist.

    Change-Id: I0e1b7dd88773017281496595aff648cee1dda9ae
    Closes-Bug: 1847808
    Signed-off-by: Andy Ning <email address hidden>
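The pattern the commit message describes can be sketched as a list-then-guard flow. This is a hypothetical illustration, not the actual stx-puppet change (see the review linked above); `list_ids` and `delete_ids` are stand-ins for the `openstack endpoint list` and `openstack endpoint delete` calls:

```shell
#!/bin/sh
# Stand-in for `openstack endpoint list ... -f value -c ID`; printing nothing
# simulates the "already deleted by ansible" state on a DC subcloud.
list_ids() { printf ''; }
# Stand-in for `openstack endpoint delete <id>`.
delete_ids() { echo "deleting: $*"; }

ids=$(list_ids)
if [ -n "$ids" ]; then
    # Only reached when the list step actually returned endpoint IDs.
    for id in $ids; do
        delete_ids "$id"
    done
else
    # Idempotent no-op instead of a failed delete and a 250.001 alarm.
    echo "no RegionOne endpoints to delete"
fi
# prints "no RegionOne endpoints to delete"
```

Guarding on the list result makes the manifest safe to re-run after the endpoints are gone, which is what clears this bug's out-of-date alarm without a lock/unlock.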

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
Peng Peng (ppeng) wrote :

Verified on
Load: 2019-11-05_07-34-20

tags: removed: stx.retestneeded