application apply fails after compute lock and unlock

Bug #1836609 reported by Anujeyan Manokeran
This bug affects 1 person
Affects: StarlingX
Status: Won't Fix
Importance: High
Assigned to: Anujeyan Manokeran
Milestone:

Bug Description

Brief Description
-----------------

After compute-2 was locked and unlocked, the stx-openstack application apply failed. The steps and logs are as follows.

system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.222.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
[2019-07-13 22:22:10,457] 423 DEBUG MainThread ssh.expect :: Output:
+---------------------+--------------------------------+-------------------------------+--------------------+---------+-----------+
| application | version | manifest name | manifest file | status | progress |
+---------------------+--------------------------------+-------------------------------+--------------------+---------+-----------+
| platform-integ-apps | 1.0-7 | platform-integration-manifest | manifest.yaml | applied | completed |
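| stx-openstack | 1.0-17-centos-stable-versioned | armada-manifest | stx-openstack.yaml | applied | completed |
+---------------------+--------------------------------+-------------------------------+--------------------+---------+-----------+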

[2019-07-13 22:22:14,553] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.222.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-show compute-2'
[2019-07-13 22:22:16,179] 423 DEBUG MainThread ssh.expect :: Output:
+---------------------+-------------------------------------------+
| Property | Value |
+---------------------+-------------------------------------------+
| action | none |
| administrative | unlocked |
| availability | available |
| bm_ip | None |
| bm_type | bmc |
| bm_username | root |
| boot_device | /dev/disk/by-path/pci-0000:85:00.0-nvme-1 |
| capabilities | {} |
| config_applied | 062a2928-eb68-4176-ad2b-d3e5d970d682 |
| config_status | None |
| config_target | 062a2928-eb68-4176-ad2b-d3e5d970d682 |
| console | ttyS0,115200n8 |
| created_at | 2019-07-13T13:04:39.719771+00:00 |
| hostname | compute-2 |
| id | 4 |
| install_output | text |
| install_state | completed |
| install_state_info | None |
| invprovision | provisioned |
| location | {} |
| mgmt_ip | 192.168.222.27 |
| mgmt_mac | 3c:fd:fe:ac:61:9c |
| operational | enabled |
| personality | worker |
| reserved | False |
| rootfs_device | /dev/disk/by-path/pci-0000:85:00.0-nvme-1 |
| serialid | None |
| software_load | 19.01 |
| task | |
| tboot | false |
| ttys_dcd | None |
| updated_at | 2019-07-13T22:18:17.167986+00:00 |
| uptime | 23816 |
| uuid | b0a796c9-0781-4681-930f-df5fbfbf47c7 |
| vim_progress_status | services-enabled |
+---------------------+-------------------------------------------+
[2019-07-13 22:22:16,284] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.222.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-lock compute-2'
[2019-07-13 22:22:17,978] 423 DEBUG MainThread ssh.expect :: Output:
+---------------------+-------------------------------------------+
| Property | Value |
+---------------------+-------------------------------------------+
| action | none |
| administrative | unlocked |
| availability | available |
| bm_ip | None |
| bm_type | bmc |
| bm_username | root |
| boot_device | /dev/disk/by-path/pci-0000:85:00.0-nvme-1 |
| capabilities | {} |
| config_applied | 062a2928-eb68-4176-ad2b-d3e5d970d682 |
| config_status | None |
| config_target | 062a2928-eb68-4176-ad2b-d3e5d970d682 |
| console | ttyS0,115200n8 |
| created_at | 2019-07-13T13:04:39.719771+00:00 |
| hostname | compute-2 |
| id | 4 |
| install_output | text |
| install_state | completed |
| install_state_info | None |
| invprovision | provisioned |
| location | {} |
| mgmt_ip | 192.168.222.27 |
| mgmt_mac | 3c:fd:fe:ac:61:9c |
| operational | enabled |
| personality | worker |
| reserved | False |
| rootfs_device | /dev/disk/by-path/pci-0000:85:00.0-nvme-1 |
| serialid | None |
| software_load | 19.01 |
| task | Locking |
| tboot | false |
| ttys_dcd | None |
| updated_at | 2019-07-13T22:18:17.167986+00:00 |
| uptime | 23816 |
| uuid | b0a796c9-0781-4681-930f-df5fbfbf47c7 |
| vim_progress_status | services-enabled |
+---------------------+-------------------------------------------+

MainThread host_helper.modify_host_cpu:: Modifying host compute-2 CPU function vSwitch to {'p2': 1}
[2019-07-13 22:22:38,042] 1534 DEBUG MainThread ssh.get_active_controller:: Getting active controller client for wcp_113_121
[2019-07-13 22:22:38,042] 466 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2019-07-13 22:22:38,042] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.222.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-cpu-modify -f vswitch -p2 1 compute-2'
[2019-07-13 22:22:39,793] 423 DEBUG MainThread ssh.expect :: Output:
system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.222.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-list'
[2019-07-13 22:22:41,356] 423 DEBUG MainThread ssh.expect :: Output:
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
| 2 | compute-0 | worker | unlocked | enabled | available |
| 3 | compute-1 | worker | unlocked | enabled | available |
| 4 | compute-2 | worker | locked | disabled | online |
| 5 | compute-3 | worker | unlocked | enabled | available |
| 6 | compute-4 | worker | unlocked | enabled | available |
| 7 | controller-1 | controller | unlocked | enabled | available |
| 8 | storage-0 | storage | unlocked | enabled | available |
| 9 | storage-1 | storage | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+
system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.222.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
[2019-07-13 22:22:54,342] 423 DEBUG MainThread ssh.expect :: Output:
+---------------------+--------------------------------+-------------------------------+--------------------+---------+-----------+
| application | version | manifest name | manifest file | status | progress |
+---------------------+--------------------------------+-------------------------------+--------------------+---------+-----------+
| platform-integ-apps | 1.0-7 | platform-integration-manifest | manifest.yaml | applied | completed |
| stx-openstack | 1.0-17-centos-stable-versioned | armada-manifest | stx-openstack.yaml | applied | completed |
+---------------------+--------------------------------+-------------------------------+--------------------+---------+-----------+

'kubectl get pod -o=wide --all-namespaces --field-selector=status.phase!=Running,status.phase!=Succeeded'
[2019-07-13 22:22:54,694] 423 DEBUG MainThread ssh.expect :: Output:
No resources found.
system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.222.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-unlock compute-2'
[2019-07-13 22:23:03,263] 423 DEBUG MainThread ssh.expect :: Output:
+---------------------+-------------------------------------------+
| Property | Value |
+---------------------+-------------------------------------------+
| action | none |
| administrative | locked |
| availability | online |
| bm_ip | None |
| bm_type | bmc |
| bm_username | root |
| boot_device | /dev/disk/by-path/pci-0000:85:00.0-nvme-1 |
| capabilities | {} |
| config_applied | 062a2928-eb68-4176-ad2b-d3e5d970d682 |
| config_status | None |
| config_target | 062a2928-eb68-4176-ad2b-d3e5d970d682 |
| console | ttyS0,115200n8 |
| created_at | 2019-07-13T13:04:39.719771+00:00 |
| hostname | compute-2 |
| id | 4 |
| install_output | text |
| install_state | completed |
| install_state_info | None |
| invprovision | provisioned |
| location | {} |
| mgmt_ip | 192.168.222.27 |
| mgmt_mac | 3c:fd:fe:ac:61:9c |
| operational | disabled |
| personality | worker |
| reserved | False |
| rootfs_device | /dev/disk/by-path/pci-0000:85:00.0-nvme-1 |
| serialid | None |
| software_load | 19.01 |
| task | Unlocking |
| tboot | false |
| ttys_dcd | None |
| updated_at | 2019-07-13T22:22:19.728361+00:00 |
| uptime | 24061 |
| uuid | b0a796c9-0781-4681-930f-df5fbfbf47c7 |
| vim_progress_status | services-disabled |
+---------------------+-------------------------------------------+
Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.222.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-show compute-2'
[2019-07-13 22:29:59,881] 423 DEBUG MainThread ssh.expect :: Output:
+---------------------+-------------------------------------------+
| Property | Value |
+---------------------+-------------------------------------------+
| action | none |
| administrative | unlocked |
| availability | available |
| bm_ip | None |
| bm_type | bmc |
| bm_username | root |
| boot_device | /dev/disk/by-path/pci-0000:85:00.0-nvme-1 |
| capabilities | {} |
| config_applied | 062a2928-eb68-4176-ad2b-d3e5d970d682 |
| config_status | None |
| config_target | 062a2928-eb68-4176-ad2b-d3e5d970d682 |
| console | ttyS0,115200n8 |
| created_at | 2019-07-13T13:04:39.719771+00:00 |
| hostname | compute-2 |
| id | 4 |
| install_output | text |
| install_state | completed |
| install_state_info | None |
| invprovision | provisioned |
| location | {} |
| mgmt_ip | 192.168.222.27 |
| mgmt_mac | 3c:fd:fe:ac:61:9c |
| operational | enabled |
| personality | worker |
| reserved | False |
| rootfs_device | /dev/disk/by-path/pci-0000:85:00.0-nvme-1 |
| serialid | None |
| software_load | 19.01 |
| task | |
| tboot | false |
| ttys_dcd | None |
| updated_at | 2019-07-13T22:29:54.391230+00:00 |
| uptime | 159 |
| uuid | b0a796c9-0781-4681-930f-df5fbfbf47c7 |
| vim_progress_status | services-disabled |
+---------------------+-------------------------------------------+
Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.222.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
[2019-07-13 22:30:01,529] 423 DEBUG MainThread ssh.expect :: Output:
+---------------------+--------------------------------+-------------------------------+--------------------+----------+---------------------------------------------------------------------+
| application | version | manifest name | manifest file | status | progress |
+---------------------+--------------------------------+-------------------------------+--------------------+----------+---------------------------------------------------------------------+
| platform-integ-apps | 1.0-7 | platform-integration-manifest | manifest.yaml | applied | completed |
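| stx-openstack | 1.0-17-centos-stable-versioned | armada-manifest | stx-openstack.yaml | applying | processing chart: osh-kube-system-ingress, overall completion: 4.0% |
+---------------------+--------------------------------+-------------------------------+--------------------+----------+---------------------------------------------------------------------+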
Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.222.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
[2019-07-13 23:03:31,929] 423 DEBUG MainThread ssh.expect :: Output:
+---------------------+--------------------------------+-------------------------------+--------------------+--------------+------------------------------------------+
| application | version | manifest name | manifest file | status | progress |
+---------------------+--------------------------------+-------------------------------+--------------------+--------------+------------------------------------------+
| platform-integ-apps | 1.0-7 | platform-integration-manifest | manifest.yaml | applied | completed |
| stx-openstack | 1.0-17-centos-stable-versioned | armada-manifest | stx-openstack.yaml | apply-failed | operation aborted, check logs for detail |
+---------------------+--------------------------------+-------------------------------+--------------------+--------------+------------------------------------------+
2019-07-14 01:16:26.603 1107082 INFO sysinv.api.controllers.v1.mtce_api [-] number of calls to rest_api_request=1 (max_retry=3)
2019-07-14 01:16:26.603 1107082 INFO sysinv.api.controllers.v1.rest_api [-] PATCH cmd:http://localhost:2112/v1/hosts/b0a796c9-0781-4681-930f-df5fbfbf47c7 hdr:{'Content-type': 'application/json', 'User-Agent': 'sysinv/1.0'} payload:{"tboot": "false", "ttys_dcd": null, "subfunctions": "worker", "bm_ip": null, "install_state": "completed+", "rootfs_device": "/dev/disk/by-path/pci-0000:85:00.0-nvme-1", "bm_username": "root", "operation": "modify", "serialid": null, "id": 4, "console": "ttyS0,115200n8", "uuid": "b0a796c9-0781-4681-930f-df5fbfbf47c7", "mgmt_ip": "192.168.222.27", "software_load": "19.01", "config_status": null, "hostname": "compute-2", "iscsi_initiator_name": "iqn.1994-05.com.redhat:164f9c5ef3", "capabilities": {}, "install_output": "text", "location": {}, "availability": "online", "invprovision": "provisioned", "peer_id": null, "administrative": "locked", "personality": "worker", "recordtype": "standard", "bm_mac": null, "mtce_info": null, "isystem_uuid": "1fb14036-ce32-4aba-a78e-5733f37dc03c", "boot_device": "/dev/disk/by-path/pci-0000:85:00.0-nvme-1", "install_state_info": null, "mgmt_mac": "3c:fd:fe:ac:61:9c", "subfunction_oper": "disabled", "target_load": "19.01", "vsc_controllers": null, "operational": "disabled", "subfunction_avail": "online", "action": "unlock", "bm_type": "bmc"}
2019-07-14 01:16:26.609 1107082 INFO sysinv.api.controllers.v1.rest_api [-] Response={u'status': u'pass'}
2019-07-14 01:16:26.623 1107082 INFO sysinv.api.controllers.v1.host [-] stx-openstack system app is present but not applied, skipping re-apply
2019-07-14 01:16:26.624 1107082 INFO sysinv.api.controllers.v1.host [-] host compute-2 ihost_patch_end_2019-07-14-01-16-26 patch
2019-07-14 01:16:26.639 1107083 INFO sysinv.api.controllers.v1.host [-] compute-2 ihost_patch_start_2019-07-14-01-16-26 patch
2019-07-14 01:16:26.640

Severity
--------
Major

Steps to Reproduce
------------------
1. Verify that the applications are applied and the system is in good health.
2. Lock compute-2.
3. Try to modify the vSwitch CPU function on a non-existent processor (p2) on compute-2. The command is expected to fail because the processor is not found.
system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.222.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-cpu-modify -f vswitch -p2 1 compute-2
4. Unlock compute-2.
5. Wait for the stx-openstack application to be re-applied. The apply never completes.
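
The final "wait" step can be automated by polling the application list. The following is a minimal sketch only, not the test framework's actual helper: `fetch_output` is a hypothetical callable that returns the text printed by `system application-list`, and the timeout and interval defaults are arbitrary assumptions. Only the two terminal statuses seen in this report (`applied`, `apply-failed`) are handled.

```python
import time


def parse_table(text):
    """Turn the ASCII table printed by the `system` CLI into a list of dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Header and data rows start and end with '|'; border rows start with '+'.
        if line.startswith("|") and line.endswith("|"):
            rows.append([cell.strip() for cell in line[1:-1].split("|")])
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def wait_for_app(fetch_output, app="stx-openstack", timeout=3600,
                 interval=30, sleep=time.sleep):
    """Poll until `app` reaches a terminal status or `timeout` seconds pass.

    `fetch_output` is a hypothetical callable; how it runs the CLI is up
    to the caller (ssh, subprocess, ...).
    """
    deadline = time.monotonic() + timeout
    while True:
        statuses = {r["application"]: r["status"]
                    for r in parse_table(fetch_output())}
        status = statuses.get(app)
        if status in ("applied", "apply-failed"):
            return status
        if time.monotonic() >= deadline:
            return "timeout"
        sleep(interval)
```

Because the application is still reported as `applied` right after the unlock is issued, a caller would need to start polling only once the status has changed to `applying`; otherwise the function returns immediately.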

System Configuration
--------------------
Storage system

Expected Behavior
------------------
The application apply completes successfully.

Actual Behavior
----------------
As described above, the application was never applied; its status ended as apply-failed.

Reproducibility
---------------

System Configuration
--------------------
Regular system

Load
----
20190713T013000Z

Last Pass
---------
Last passed on load 20190629T013000Z

Timestamp/Logs
--------------
compute lock 2019-07-13T22:18:17.167986+00:00

apply failure seen 2019-07-13 23:03:31,929

Test Activity
-------------
Regression test

Numan Waheed (nwaheed)
tags: added: stx.regression stx.retestneeded
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as stx.2.0: the application apply fails after a compute lock/unlock, which is a basic operation.

tags: added: stx.2.0 stx.containers
Changed in starlingx:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Bob Church (rchurch)
Revision history for this message
Bob Church (rchurch) wrote :

Do we have logs for this? I don't see anything attached. I'll try and reproduce with a recent load.

Revision history for this message
Anujeyan Manokeran (anujeyan) wrote :
Frank Miller (sensfan22)
Changed in starlingx:
assignee: Bob Church (rchurch) → Stefan Dinescu (stefandinescu)
Frank Miller (sensfan22)
Changed in starlingx:
assignee: Stefan Dinescu (stefandinescu) → Bob Church (rchurch)
Frank Miller (sensfan22)
Changed in starlingx:
assignee: Bob Church (rchurch) → Stefan Dinescu (stefandinescu)
Revision history for this message
Stefan Dinescu (stefandinescu) wrote :

The issue seems reproducible when unlocking a compute node while one of the controllers is locked or in an unavailable/offline state.

In that case, pods cannot be launched on the affected controller. When a compute node is unlocked and the openstack application needs to be re-applied automatically, the apply expects to launch a number of pods equal to the number of controllers (in our case 2). Since one controller cannot run its pods, the apply keeps waiting for all replicas to launch and eventually times out.
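
The condition described above can be checked before a compute node is unlocked. This is an illustrative sketch, not code from sysinv: it takes rows shaped like the `system host-list` output earlier in this report (one dict per host, keyed by column name), and it assumes that a controller can run pods only when it is unlocked, enabled and available. The function names are hypothetical.

```python
def controller_replica_gap(hosts):
    """Return (expected, schedulable) replica counts for the controllers.

    expected:    one replica per controller, as described in the analysis.
    schedulable: controllers assumed able to run pods, i.e. those that are
                 unlocked/enabled/available (an assumption of this sketch).
    """
    controllers = [h for h in hosts if h["personality"] == "controller"]
    ready = [
        h for h in controllers
        if (h["administrative"], h["operational"], h["availability"])
        == ("unlocked", "enabled", "available")
    ]
    return len(controllers), len(ready)


def reapply_can_complete(hosts):
    """True if every expected controller replica has a controller to run on."""
    expected, schedulable = controller_replica_gap(hosts)
    return schedulable >= expected


hosts = [
    {"hostname": "controller-0", "personality": "controller",
     "administrative": "unlocked", "operational": "enabled",
     "availability": "available"},
    {"hostname": "controller-1", "personality": "controller",
     "administrative": "locked", "operational": "disabled",
     "availability": "online"},
    {"hostname": "compute-2", "personality": "worker",
     "administrative": "locked", "operational": "disabled",
     "availability": "online"},
]
print(controller_replica_gap(hosts))  # (2, 1)
print(reapply_can_complete(hosts))    # False
```

With one of the two controllers locked, only one of the two expected replicas can be scheduled, which matches the wait-then-timeout behaviour described in the analysis.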

Revision history for this message
Frank Miller (sensfan22) wrote :

Updating this issue to Won't Fix. According to Stefan's analysis, this issue can only be reproduced when one of the 2 controllers is locked or out of service and a compute is then unlocked. The solution for this scenario is to unlock the controller, after which the stx-openstack application can be applied.
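
The workaround can be written down as an ordered command plan. This sketch only builds the command lines and does not run them. It assumes host rows shaped like the `system host-list` output in this report, omits the `--os-*` authentication options, and covers only the locked-controller case; a controller that is out of service for another reason needs its own recovery first. `recovery_plan` is a hypothetical helper name.

```python
def recovery_plan(hosts, app="stx-openstack"):
    """Return the workaround as a list of argv lists, in execution order."""
    plan = [
        ["system", "host-unlock", h["hostname"]]
        for h in hosts
        if h["personality"] == "controller"
        and h["administrative"] == "locked"
    ]
    # The apply should only be issued once every unlocked controller has
    # come back as unlocked/enabled/available; waiting is left to the caller.
    plan.append(["system", "application-apply", app])
    return plan


hosts = [
    {"hostname": "controller-0", "personality": "controller",
     "administrative": "unlocked"},
    {"hostname": "controller-1", "personality": "controller",
     "administrative": "locked"},
    {"hostname": "compute-2", "personality": "worker",
     "administrative": "unlocked"},
]
for argv in recovery_plan(hosts):
    print(" ".join(argv))
```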

Changed in starlingx:
status: Triaged → Won't Fix
assignee: Stefan Dinescu (stefandinescu) → Anujeyan Manokeran (anujeyan)
Yang Liu (yliu12)
tags: removed: stx.retestneeded