Unable to unlock controller after swact and lock w/ openstack applied

Bug #1881454 reported by Yang Liu
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
StarlingX
Fix Released
High
Bob Church

Bug Description

Brief Description
-----------------
On a duplex system with stx-openstack applied, after lock/unlock controller-1, swact, and lock controller-0 - unlocking controller-0 failed with following error. Did not find workaround.

[sysadmin@controller-1 ~(keystone_admin)]$ system host-unlock controller-0
u'\'keystone\'\nTraceback (most recent call last):\n\n File "/usr/lib64/python2.7/site-packages/sysinv/openstack/common/rpc/amqp.py", line 437, in _process_data\n **args)\n\n File "/usr/lib64/python2.7/site-packages/sysinv/openstack/common/rpc/dispatcher.py", line 172, in dispatch\n result = getattr(proxyobj, method)(ctxt, **kwargs)\n\n File "/usr/lib64/python2.7/site-packages/sysinv/conductor/manager.py", line 1687, in configure_ihost\n self._puppet.update_system_config()\n\n File "/usr/lib64/python2.7/site-packages/sysinv/puppet/puppet.py", line 31, in _wrapper\n func(self, *args, **kwargs)\n\n File "/usr/lib64/python2.7/site-packages/sysinv/puppet/puppet.py", line 116, in update_system_config\n config.update(puppet_plugin.obj.get_system_config())\n\n File "/usr/lib64/python2.7/site-packages/sysinv/puppet/nfv.py", line 96, in get_system_config\n keystone_auth_data = helm_data.get_keystone_auth_data()\n\n File "/usr/lib64/python2.7/site-packages/sysinv/helm/helm.py", line 49, in _wrapper\n return func(self, *args, **kwargs)\n\n File "/usr/lib64/python2.7/site-packages/sysinv/helm/helm.py", line 845, in get_keystone_auth_data\n keystone_operator = self.chart_operators[self.HELM_CHART_KEYSTONE]\n\nKeyError: \'keystone\'\n'

Severity
--------
Critical

Steps to Reproduce
------------------
- Install and configure a duplex system
- Apply stx-openstack
- Swact a couple times
- lock/unlock controller-1
- swact
- lock/unlock controller-0

Expected Behavior
------------------
- Able to unlock controller-0

Actual Behavior
----------------
- controller-0 unlocked rejected with following error:
[sysadmin@controller-1 ~(keystone_admin)]$ system host-unlock controller-0
u'\'keystone\'\nTraceback (most recent call last):\n\n File "/usr/lib64/python2.7/site-packages/sysinv/openstack/common/rpc/amqp.py", line 437, in _process_data\n **args)\n\n File "/usr/lib64/python2.7/site-packages/sysinv/openstack/common/rpc/dispatcher.py", line 172, in dispatch\n result = getattr(proxyobj, method)(ctxt, **kwargs)\n\n File "/usr/lib64/python2.7/site-packages/sysinv/conductor/manager.py", line 1687, in configure_ihost\n self._puppet.update_system_config()\n\n File "/usr/lib64/python2.7/site-packages/sysinv/puppet/puppet.py", line 31, in _wrapper\n func(self, *args, **kwargs)\n\n File "/usr/lib64/python2.7/site-packages/sysinv/puppet/puppet.py", line 116, in update_system_config\n config.update(puppet_plugin.obj.get_system_config())\n\n File "/usr/lib64/python2.7/site-packages/sysinv/puppet/nfv.py", line 96, in get_system_config\n keystone_auth_data = helm_data.get_keystone_auth_data()\n\n File "/usr/lib64/python2.7/site-packages/sysinv/helm/helm.py", line 49, in _wrapper\n return func(self, *args, **kwargs)\n\n File "/usr/lib64/python2.7/site-packages/sysinv/helm/helm.py", line 845, in get_keystone_auth_data\n keystone_operator = self.chart_operators[self.HELM_CHART_KEYSTONE]\n\nKeyError: \'keystone\'\n'

Reproducibility
---------------
Intermittent

System Configuration
--------------------
Two node system
Lab-name: ip5-6

Branch/Pull Time/Commit
-----------------------
stx master as of "20200530T013359Z"

Last Pass
---------
Same load

Timestamp/Logs
--------------
2020-05-31T02:10:51.000 controller-1 -sh: info HISTORY: PID=791930 UID=42425 system host-lock controller-0

2020-05-31T02:11:45.000 controller-1 -sh: info HISTORY: PID=791930 UID=42425 system host-unlock controller-0

Test Activity
-------------
Normal use

Revision history for this message
Yang Liu (yliu12) wrote :

Seen again on same system. Both set of logs are here:
https://files.starlingx.kube.cengn.ca/launchpad/1881454

Time stamp for unlock failure in second occurrence:
2020-05-31T23:42:12.000 controller-1 -sh: info HISTORY: PID=437060 UID=42425 system host-unlock controller-0

Revision history for this message
Ghada Khalil (gkhalil) wrote :

stx.4.0 / high priority - appears to be an issue only when openstack is applied. Error appears related to keystone.

summary: - Unable to unlock controller after swact and lock
+ Unable to unlock controller after swact and lock w/ openstack applied
tags: added: stx.4.0 stx.config stx.distro.openstack
Changed in starlingx:
importance: Undecided → High
status: New → Triaged
assignee: nobody → yong hu (yhu6)
yong hu (yhu6)
Changed in starlingx:
assignee: yong hu (yhu6) → Yan Chen (ychen2u)
Revision history for this message
zhipeng liu (zhipengs) wrote :
Download full text (3.6 KiB)

Hi all,

This issue might be introduced by below 2 patches submitted by Bob church.
After comparing pass log and failure log, it seems related to openstack plugin decouple patches.
https://review.opendev.org/#/c/686971/
https://review.opendev.org/#/c/688191/

Next step: Propose Bob church to further check it and his patches.

Fail log in sysinv.log with 0530 build

sysinv 2020-06-02 06:18:09.910 256171 INFO sysinv.api.controllers.v1.host [-] Notify VIM host action controller-0 action=unlock sysinv 2020-06-02 06:18:09.988 255277 INFO sysinv.conductor.manager [-] configure_ihost controller-0 sysinv 2020-06-02 06:18:10.703 255277 INFO sysinv.helm.helm [-] HelmOperator: Loading available helm and armada plugins. // https://review.opendev.org/#/c/686971/
sysinv 2020-06-02 06:18:10.705 255277 INFO sysinv.helm.helm [-] Plugins for generic : loaded from sysinv - /usr/lib64/python2.7/site-packages.
sysinv 2020-06-02 06:18:10.706 255277 INFO sysinv.helm.helm [-] Plugins for stx-monitor : loaded from sysinv - /usr/lib64/python2.7/site-packages.
sysinv 2020-06-02 06:18:10.706 255277 ERROR sysinv.puppet.puppet [-] failed to create system config: KeyError: 'keystone'
2020-06-02 06:18:10.706 255277 ERROR sysinv.puppet.puppet Traceback (most recent call last):
2020-06-02 06:18:10.706 255277 ERROR sysinv.puppet.puppet File "/usr/lib64/python2.7/site-packages/sysinv/puppet/puppet.py", line 116, in update_system_config
2020-06-02 06:18:10.706 255277 ERROR sysinv.puppet.puppet config.update(puppet_plugin.obj.get_system_config())
2020-06-02 06:18:10.706 255277 ERROR sysinv.puppet.puppet File "/usr/lib64/python2.7/site-packages/sysinv/puppet/nfv.py", line 96, in get_system_config
2020-06-02 06:18:10.706 255277 ERROR sysinv.puppet.puppet keystone_auth_data = helm_data.get_keystone_auth_data()
2020-06-02 06:18:10.706 255277 ERROR sysinv.puppet.puppet File "/usr/lib64/python2.7/site-packages/sysinv/helm/helm.py", line 49, in _wrapper
2020-06-02 06:18:10.706 255277 ERROR sysinv.puppet.puppet return func(self, *args, **kwargs)
2020-06-02 06:18:10.706 255277 ERROR sysinv.puppet.puppet File "/usr/lib64/python2.7/site-packages/sysinv/helm/helm.py", line 845, in get_keystone_auth_data
2020-06-02 06:18:10.706 255277 ERROR sysinv.puppet.puppet keystone_operator = self.chart_operators[self.HELM_CHART_KEYSTONE]
2020-06-02 06:18:10.706 255277 ERROR sysinv.puppet.puppet KeyError: 'keystone'

pass log in sysinv.log with my EB based on 0422

sysinv 2020-06-01 09:57:56.903 107334 INFO sysinv.api.controllers.v1.host [-] Notify VIM host action controller-1 action=unlock sysinv 2020-06-01 09:57:56.924 106495 INFO sysinv.conductor.manager [-] configure_ihost controller-1 sysinv 2020-06-01 09:57:58.200 106495 INFO sysinv.puppet.puppet [-] Updating hiera for host: controller-1 with config_uuid: None sysinv 2020-06-01 09:57:59.232 106495 INFO sysinv.puppet.interface [-] Interface data0 has no primary address sysinv 2020-06-01 09:57:59.233 106495 INFO sysinv.puppet.interface [-] Interface data1 has no primary address
sysinv 2020-06-01 09:57:59.744 106495 INFO sysinv.puppet.kubernetes [-] get_kubernetes_join_cmd join_c...

Read more...

Revision history for this message
Yan Chen (ychen2u) wrote :
Download full text (5.1 KiB)

Same test steps on "20200521T080014Z" image can pass:
1. Deploy stx duplex;
2. apply stx-openstack application;
3. host-swact;
4. host-lock the stand-by controller;
5. host-unlock the controller.

Log:

sysinv 2020-06-03 06:52:27.708 2690703 INFO sysinv.api.controllers.v1.host [-] controller-1 apply ihost_val {'ihost_action': 'unlock', 'task': u'Unlocking'}
sysinv 2020-06-03 06:52:27.708 2690703 INFO sysinv.api.controllers.v1.host [-] Notify VIM host action controller-1 action=unlock
sysinv 2020-06-03 06:52:27.709 2690703 WARNING sysinv.api.controllers.v1.vim_api [-] vim_host_action hostname=controller-1, action=unlock
sysinv 2020-06-03 06:52:27.709 2690703 WARNING sysinv.api.controllers.v1.vim_api [-] vim_host_action hostname=controller-1, action=unlock api_cmd=http://localhost:30001/nfvi-plugins/v1/hosts/705a679e-db9a-4e5c-8b64-1c3045ea20ed headers={'Content-type': 'application/json', 'User-Agent': 'sysinv/1.0'} payload={'action': 'unlock', 'hostname': u'controller-1', 'uuid': u'705a679e-db9a-4e5c-8b64-1c3045ea20ed'}
sysinv 2020-06-03 06:52:27.709 2690703 INFO sysinv.api.controllers.v1.rest_api [-] PATCH cmd:http://localhost:30001/nfvi-plugins/v1/hosts/705a679e-db9a-4e5c-8b64-1c3045ea20ed hdr:{'Content-type': 'application/json', 'User-Agent': 'sysinv/1.0'} payload:{"action": "unlock", "hostname": "controller-1", "uuid": "705a679e-db9a-4e5c-8b64-1c3045ea20ed"}
sysinv 2020-06-03 06:52:27.726 2690703 INFO sysinv.api.controllers.v1.rest_api [-] Response={}
sysinv 2020-06-03 06:52:27.726 2690703 INFO sysinv.api.controllers.v1.host [-] controller-1 Perform configure_ihost.
sysinv 2020-06-03 06:52:27.744 2690703 INFO sysinv.openstack.common.rpc.common [-] Connected to AMQP server on 192.178.204.2:5672
sysinv 2020-06-03 06:52:27.753 2688547 INFO sysinv.conductor.manager [-] configure_ihost controller-1
sysinv 2020-06-03 06:52:29.999 2688547 INFO sysinv.puppet.puppet [-] Updating hiera for host: controller-1 with config_uuid: None
sysinv 2020-06-03 06:52:31.246 2688547 INFO sysinv.puppet.interface [-] Interface data0 has no primary address
sysinv 2020-06-03 06:52:31.247 2688547 INFO sysinv.puppet.interface [-] Interface data1 has no primary address
sysinv 2020-06-03 06:52:31.827 2688547 INFO sysinv.puppet.kubernetes [-] get_kubernetes_join_cmd join_cmd=kubeadm join 192.168.206.1:6443 --token lvq87b.6bi48m3m9ne3ebss --discovery-token-ca-cert-hash sha256:d15104ba446d1c31356fec51540a9c4d5b14c739598db13b913f144a42a94693 --control-plane --certificate-key cacf347eea2e4f1b5c599dbe4f45d5eea038b8377a8b527ffbb4f7525adbaca1 --cri-socket /var/run/containerd/containerd.sock
sysinv 2020-06-03 06:52:34.368 2797023 INFO sysinv.cmd.dnsmasq_lease_update [-] Called 'old' for mac '52:54:00:f6:1d:0b' with ip '192.178.204.4'
sysinv 2020-06-03 06:52:36.345 2688547 INFO sysinv.helm.neutron [-] _get_neutron_ml2_config={'ml2': {'physical_network_mtus': 'physnet0:1500,physnet1:1500'}, 'ml2_type_flat': {'flat_networks': ''}}
sysinv 2020-06-03 06:52:47.431 2688547 INFO sysinv.conductor.manager [-] Platform managed application oidc-auth-apps: Prerequisites not met.
sysinv 2020-06-03 06:52:54.176 2688547 INFO sysinv.helm.manifest_base [-] Delete manifest file /opt/platform/a...

Read more...

yong hu (yhu6)
Changed in starlingx:
assignee: Yan Chen (ychen2u) → Bob Church (rchurch)
Changed in starlingx:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/733510
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=4617f4d0bec8af850cc77d7504fd02667857ff6b
Submitter: Zuul
Branch: master

commit 4617f4d0bec8af850cc77d7504fd02667857ff6b
Author: Robert Church <email address hidden>
Date: Wed Jun 3 13:52:57 2020 -0400

    Robustness updates to support application plugins

    Changes include:
    - Supporting viewing overrides for non-applied uploaded only
      applications
    - Re-enable sysinv-helm commands to support plugin development
    - Expand version pattern match when discovering plugins to account for
      tarball versions produced from the Cengn build
    - Ensure that applications which are in an apply related state have
      their plugins enabled when the conductor is restarted.

    Change-Id: Ic233ec529065de80105a077d83e56afda2f3ee14
    Closes-Bug: #1881711
    Closes-Bug: #1881454
    Signed-off-by: Robert Church <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
Yang Liu (yliu12) wrote :

Test passed on 20200612 load.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.