stx-monitor application-delete failed by Timeout while waiting on RPC response

Bug #1868567 reported by Peng Peng
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Kevin Smith

Bug Description

Brief Description
-----------------
The stx-monitor app was applied to the system successfully. After running "application-remove stx-monitor", the status changed from "applied" to "removing" and then to "uploaded". Running "application-delete stx-monitor" then failed with:
Timeout while waiting on RPC response - topic: "sysinv.conductor_manager", RPC method: "perform_app_delete" info: "<unknown>"

However, the stx-monitor app was still deleted successfully afterward.

Severity
--------
Major

Steps to Reproduce
------------------
1. Apply the stx-monitor app successfully.
2. Run "application-remove stx-monitor".
3. Run "application-delete stx-monitor".

TC-name: stx_monitor/test_stx_monitor.py::test_stx_monitor[deploy_and_remove]

Expected Behavior
------------------
The "application-delete stx-monitor" command returns successfully.

Actual Behavior
----------------
The "application-delete stx-monitor" command fails with an RPC timeout error (exit status 1), although the application is deleted afterward.

Reproducibility
---------------
Intermittent (reproduces about 50% of the time)

System Configuration
--------------------
One node system
Two node system
Multi-node system

Lab-name:

Branch/Pull Time/Commit
-----------------------
stx master as of build-date-time

Last Pass
---------
Load: 2020-03-20_04-10-00

Timestamp/Logs
--------------
[2020-03-23 05:46:47,692] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
[2020-03-23 05:46:49,241] 436 DEBUG MainThread ssh.expect :: Output:
+---------------------+---------+-------------------------------+------------------+----------+-----------+
| application | version | manifest name | manifest file | status | progress |
+---------------------+---------+-------------------------------+------------------+----------+-----------+
| oidc-auth-apps | 1.0-0 | oidc-auth-manifest | manifest.yaml | uploaded | completed |
| platform-integ-apps | 1.0-8 | platform-integration-manifest | manifest.yaml | applied | completed |
| stx-monitor | 1.0-1 | monitor-armada-manifest | stx-monitor.yaml | applied | completed |
+---------------------+---------+-------------------------------+------------------+----------+-----------+
controller-1:~$
[2020-03-23 05:46:49,241] 314 DEBUG MainThread ssh.send :: Send 'echo $?'
[2020-03-23 05:46:49,345] 436 DEBUG MainThread ssh.expect :: Output:
0
controller-1:~$
[2020-03-23 05:46:49,346] 1604 DEBUG MainThread ssh.get_active_controller:: Getting active controller client for wp_8_12
[2020-03-23 05:46:49,346] 479 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2020-03-23 05:46:49,346] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-remove stx-monitor'

[2020-03-23 05:46:50,926] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
[2020-03-23 05:46:52,537] 436 DEBUG MainThread ssh.expect :: Output:
+---------------------+---------+-------------------------------+------------------+----------+-------------------------------+
| application | version | manifest name | manifest file | status | progress |
+---------------------+---------+-------------------------------+------------------+----------+-------------------------------+
| oidc-auth-apps | 1.0-0 | oidc-auth-manifest | manifest.yaml | uploaded | completed |
| platform-integ-apps | 1.0-8 | platform-integration-manifest | manifest.yaml | applied | completed |
| stx-monitor | 1.0-1 | monitor-armada-manifest | stx-monitor.yaml | removing | deleting application manifest |
+---------------------+---------+-------------------------------+------------------+----------+-------------------------------+
controller-1:~$
[2020-03-23 05:46:52,537] 314 DEBUG MainThread ssh.send :: Send 'echo $?'
[2020-03-23 05:46:52,645] 436 DEBUG MainThread ssh.expect :: Output:
0
controller-1:~$
[2020-03-23 05:46:57,651] 1604 DEBUG MainThread ssh.get_active_controller:: Getting active controller client for wp_8_12
[2020-03-23 05:46:57,651] 479 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2020-03-23 05:46:57,651] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
[2020-03-23 05:46:59,553] 436 DEBUG MainThread ssh.expect :: Output:
+---------------------+---------+-------------------------------+------------------+----------+-----------+
| application | version | manifest name | manifest file | status | progress |
+---------------------+---------+-------------------------------+------------------+----------+-----------+
| oidc-auth-apps | 1.0-0 | oidc-auth-manifest | manifest.yaml | uploaded | completed |
| platform-integ-apps | 1.0-8 | platform-integration-manifest | manifest.yaml | applied | completed |
| stx-monitor | 1.0-1 | monitor-armada-manifest | stx-monitor.yaml | uploaded | completed |
+---------------------+---------+-------------------------------+------------------+----------+-----------+
controller-1:~$
[2020-03-23 05:46:59,553] 314 DEBUG MainThread ssh.send :: Send 'echo $?'
[2020-03-23 05:46:59,660] 436 DEBUG MainThread ssh.expect :: Output:
0
controller-1:~$
[2020-03-23 05:46:59,661] 254 INFO MainThread container_helper.wait_for_apps_status:: ['stx-monitor'] reached expected status uploaded
[2020-03-23 05:46:59,661] 404 INFO MainThread container_helper.remove_app:: stx-monitor removed successfully
[2020-03-23 05:46:59,661] 146 INFO MainThread test_stx_monitor.cleanup_app:: Delete application stx-monitor
[2020-03-23 05:46:59,661] 1604 DEBUG MainThread ssh.get_active_controller:: Getting active controller client for wp_8_12
[2020-03-23 05:46:59,662] 479 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2020-03-23 05:46:59,662] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
[2020-03-23 05:47:01,523] 436 DEBUG MainThread ssh.expect :: Output:
+---------------------+---------+-------------------------------+------------------+----------+-----------+
| application | version | manifest name | manifest file | status | progress |
+---------------------+---------+-------------------------------+------------------+----------+-----------+
| oidc-auth-apps | 1.0-0 | oidc-auth-manifest | manifest.yaml | uploaded | completed |
| platform-integ-apps | 1.0-8 | platform-integration-manifest | manifest.yaml | applied | completed |
| stx-monitor | 1.0-1 | monitor-armada-manifest | stx-monitor.yaml | uploaded | completed |
+---------------------+---------+-------------------------------+------------------+----------+-----------+
controller-1:~$
[2020-03-23 05:47:01,523] 314 DEBUG MainThread ssh.send :: Send 'echo $?'
[2020-03-23 05:47:01,627] 436 DEBUG MainThread ssh.expect :: Output:
0
controller-1:~$
[2020-03-23 05:47:01,627] 1604 DEBUG MainThread ssh.get_active_controller:: Getting active controller client for wp_8_12
[2020-03-23 05:47:01,627] 479 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2020-03-23 05:47:01,628] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-delete stx-monitor'
[2020-03-23 05:48:03,064] 436 DEBUG MainThread ssh.expect :: Output:
Timeout while waiting on RPC response - topic: "sysinv.conductor_manager", RPC method: "perform_app_delete" info: "<unknown>"
controller-1:~$
[2020-03-23 05:48:03,065] 314 DEBUG MainThread ssh.send :: Send 'echo $?'
[2020-03-23 05:48:03,168] 436 DEBUG MainThread ssh.expect :: Output:
1

[2020-03-23 05:48:53,621] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
[2020-03-23 05:48:54,975] 436 DEBUG MainThread ssh.expect :: Output:
+---------------------+---------+-------------------------------+---------------+----------+-----------+
| application | version | manifest name | manifest file | status | progress |
+---------------------+---------+-------------------------------+---------------+----------+-----------+
| oidc-auth-apps | 1.0-0 | oidc-auth-manifest | manifest.yaml | uploaded | completed |
| platform-integ-apps | 1.0-8 | platform-integration-manifest | manifest.yaml | applied | completed |
+---------------------+---------+-------------------------------+---------------+----------+-----------+
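As the last application-list output shows, stx-monitor was removed even though application-delete exited with status 1 after the RPC timeout. A test could tolerate this by polling application-list until the app disappears instead of trusting the delete command's exit status. The following is a minimal Python sketch, not the test framework's actual container_helper code; it assumes the output uses the ASCII table format shown above, and `fetch_output` is a hypothetical callable that runs application-list and returns its output as a string.

```python
import time
from typing import Callable, Dict, List, Optional


def parse_app_table(output: str) -> Dict[str, str]:
    """Map application name -> status from `system application-list` table output."""
    statuses: Dict[str, str] = {}
    header: Optional[List[str]] = None
    for line in output.splitlines():
        line = line.strip()
        # Only table rows start with '|'; skip borders ('+---') and shell prompts.
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first row holds the column names
            continue
        row = dict(zip(header, cells))
        statuses[row["application"]] = row["status"]
    return statuses


def wait_for_app_absent(
    app: str,
    fetch_output: Callable[[], str],  # hypothetical: runs application-list
    timeout: float = 300.0,
    interval: float = 5.0,
) -> bool:
    """Poll until `app` is no longer listed; return False if `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        if app not in parse_app_table(fetch_output()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With this approach, an RPC timeout from application-delete is treated as inconclusive, and the poll decides whether the delete actually completed.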

Test Activity
-------------
Sanity

Peng Peng (ppeng) wrote :
summary: - CMM application-delete failed by Timeout while waiting on RPC response
+ CMD application-delete failed by Timeout while waiting on RPC response
Changed in starlingx:
assignee: nobody → Kevin Smith (kevin.smith.wrs)
Kevin Smith (kevin.smith.wrs) wrote : Re: CMD application-delete failed by Timeout while waiting on RPC response

This is caused by https://review.opendev.org/#/c/711096/

I believe this is what is occurring:

Before the above change, deleting the namespace during application-remove caused an implicit wait for pod termination before the application advanced to 'uploaded' status. Because the monitor namespace is no longer deleted during application-remove (we want to keep the PVCs, which require the namespace), the application may now advance to 'uploaded' before all pods have terminated, which allows an application-delete command to be issued. Since the namespace is now removed as part of application-delete, any pods that are still terminating delay the namespace deletion, and that delay can cause the application-delete RPC timeout seen here. Note that the application is still deleted successfully (and application-list correctly shows it removed once deletion finishes); the application-delete command just outputs the RPC timeout error.

I am looking into a change that will wait for pod termination as part of application-remove for the stx-monitor application.
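The approach described above, waiting for pod termination during application-remove before reporting 'uploaded', can be sketched as a polling loop. This is a minimal illustration, not the code merged in review 714613: `list_pods` and `set_status` are hypothetical callables standing in for the Kubernetes pod query and the sysinv status update, and raising an exception on timeout is an assumption about how a stuck removal would be reported.

```python
import time
from typing import Callable, List


class PodTerminationTimeout(Exception):
    """Raised when pods are still present after the wait period."""


def wait_for_pod_termination(
    namespace: str,
    list_pods: Callable[[str], List[str]],  # hypothetical: pod names in namespace
    timeout: float = 600.0,
    interval: float = 5.0,
) -> None:
    """Block until `namespace` has no pods, or raise PodTerminationTimeout."""
    deadline = time.monotonic() + timeout
    while True:
        pods = list_pods(namespace)
        if not pods:
            return
        if time.monotonic() >= deadline:
            raise PodTerminationTimeout(
                f"{len(pods)} pod(s) still terminating in {namespace!r}: {pods}"
            )
        time.sleep(interval)


def remove_app(
    app: str,
    namespace: str,
    list_pods: Callable[[str], List[str]],
    set_status: Callable[[str, str], None],  # hypothetical: records app status
) -> None:
    """Report 'uploaded' only once the app's pods are gone.

    The namespace itself is kept (so its PVCs survive); deleting it is
    left to application-delete, which should then find no terminating pods.
    """
    set_status(app, "removing")
    # ... the chart/manifest removal itself would happen here ...
    wait_for_pod_termination(namespace, list_pods)
    set_status(app, "uploaded")
```

Placing the wait in application-remove rather than in application-delete follows the reasoning above: the status only reaches 'uploaded' once nothing is left for the later namespace deletion to block on.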

Ghada Khalil (gkhalil) wrote :

stx.4.0 / medium priority: the issue affects app delete, which is not a common use case, but since it was introduced by a recent commit it should be fixed in stx.4.0.

Changed in starlingx:
status: New → Triaged
importance: Undecided → Medium
tags: added: stx.4.0 stx.monitor
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/714613

Changed in starlingx:
status: Triaged → In Progress
Yang Liu (yliu12)
summary: - CMD application-delete failed by Timeout while waiting on RPC response
+ stx-monitor application-delete failed by Timeout while waiting on RPC response
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/714613
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=d7ba6775212401f2bfc0bee04febe661152e504d
Submitter: Zuul
Branch: master

commit d7ba6775212401f2bfc0bee04febe661152e504d
Author: Kevin Smith <email address hidden>
Date: Mon Mar 23 19:06:49 2020 -0400

    Wait for pod termination on stx-monitor remove

    On removal of the stx-monitor application, wait for all pods
    to have terminated before moving to 'uploaded' status.
    This will prevent the user from issuing an application-delete
    command which could possibly timeout.

    Change-Id: I116a98bdc60a4a7fe05e50eb9b4ddd4e6ef2e24f
    Closes-Bug: 1868567
    Signed-off-by: Kevin Smith <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Yang Liu (yliu12)
tags: added: stx.retestneeded
Peng Peng (ppeng) wrote :

Verified on
Lab: WCP_71_75
Load: 2020-04-20_20-00-00

tags: removed: stx.retestneeded
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/729812

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (f/centos8)

Reviewed: https://review.opendev.org/729812
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=539d476456277c22d0dcbc3cbbc832e623242264
Submitter: Zuul
Branch: f/centos8

commit 320cc40de8518787c2be234d7fdf88ec0a462df2
Author: Don Penney <email address hidden>
Date: Wed May 13 13:06:11 2020 -0400

    Add auto-versioning to starlingx/config packages

    This update makes use of the PKG_GITREVCOUNT variable to auto-version
    the packages in this repo.

    Change-Id: I3a2c8caeb4b4647608978b1f2ccfcf0661508803
    Depends-On: https://review.opendev.org/727837
    Story: 2006166
    Task: 39766
    Signed-off-by: Don Penney <email address hidden>

commit d9f2aea0fb228ed69eb9c9262e29041eedabc15d
Author: Sharath Kumar K <email address hidden>
Date: Wed Apr 22 16:22:22 2020 +0200

    De-branding in starlingx/config: CGCS -> StarlingX

    1. Rename CGCS to StarlingX for .spec files

    Test:
    After the de-brand change, bootimage.iso has been built in the flock
    Layer and installed on the dev machine to validate the changes.

    Please note, doing de-brand changes in batches, this is batch9 changes.

    Story: 2006387
    Task: 39524

    Change-Id: Ia1fe0f2baafb78c974551100f16e6a7d99882f15
    Signed-off-by: Sharath Kumar K <email address hidden>

    De-branding in starlingx/config: CGCS -> StarlingX

    1. Rename CGCS to StarlingX for .spec file
    2. Rename TIS to StarlingX for .service files

    Test:
    After the de-brand change, bootimage.iso has been built in the flock
    Layer and installed on the dev machine to validate the changes.

    Please note, doing de-brand changes in batches, this is batch10 changes.

    Story: 2006387
    Task: 36202

    Change-Id: I404ce0da2621495175ad31489e9ad6f7b0211e26
    Signed-off-by: Sharath Kumar K <email address hidden>

commit d141e954fa6bbf688929ec90d1b6604a97792c43
Author: Teresa Ho <email address hidden>
Date: Tue Mar 31 10:08:57 2020 -0400

    Sysinv extensions for FPGA support

    This update adds cli and restapi to support FPGA device
    programming.

    CLI commands:
    system device-image-apply
    system device-image-create
    system device-image-delete
    system device-image-list
    system device-image-remove
    system device-image-show
    system device-image-state-list
    system device-label-list
    system host-device-image-update
    system host-device-image-update-abort
    system host-device-label-assign
    system host-device-label-list
    system host-device-label-remove

    Story: 2006740
    Task: 39498

    Change-Id: I556c2e7a51b3931b5a66ab27b67f51e3a8aebd9f
    Signed-off-by: Teresa Ho <email address hidden>

commit 491cca42ed854d2cb3ee3646b93c56a4f45f563c
Author: Elena Taivan <email address hidden>
Date: Wed Apr 29 11:25:26 2020 +0000

    Qcow2 conversion to raw can be done using 'image-conversion' filesystem

    1. Conversion filesystem can be added before/after
       stx-openstack is applied
    2. If conversion filesystem is added after stx-openstack
       is applied, changes to stx-openstack will only take effec...

tags: added: in-f-centos8