RPC Timeout on Backup Operation

Bug #1953204 reported by Iago Regiani
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Low
Assigned to: Iago Regiani
Milestone: (not set)

Bug Description

Brief Description
-----------------

The backup playbook notifies all applications via an RPC call, which triggers their lifecycle hooks. nginx-ingress-controller fails to complete its post-backup hook before the RPC call times out.

Severity
--------
Minor

Steps to Reproduce
------------------

Perform an upgrade from r/stx.4.0 to r/stx.5.0, then run the Ansible backup playbook.

Expected Behavior
------------------

The backup playbook should complete successfully.

Actual Behavior
----------------

The backup playbook fails at the task "Fail if there is some other/internal error when sending lifecycle hook".

Reproducibility
---------------
Reproducible

System Configuration
--------------------
SX

Branch/Pull Time/Commit
-----------------------
NA

Last Pass
---------
NA

Timestamp/Logs
--------------
sysinv 2021-10-14 20:56:44.595 398375 CRITICAL sysinv [-] Unhandled error: Timeout: Timeout while waiting on RPC response - topic: "sysinv.conductor_manager", RPC method: "backup_restore_lifecycle_actions" info: "<unknown>"
2021-10-14 20:56:44.595 398375 ERROR sysinv Traceback (most recent call last):
2021-10-14 20:56:44.595 398375 ERROR sysinv File "/usr/bin/sysinv-utils", line 10, in <module>
2021-10-14 20:56:44.595 398375 ERROR sysinv sys.exit(main())
2021-10-14 20:56:44.595 398375 ERROR sysinv File "/usr/lib64/python2.7/site-packages/sysinv/cmd/utils.py", line 290, in main
2021-10-14 20:56:44.595 398375 ERROR sysinv CONF.action.func(CONF.action.operation, success)
2021-10-14 20:56:44.595 398375 ERROR sysinv File "/usr/lib64/python2.7/site-packages/sysinv/cmd/utils.py", line 236, in send_notification
2021-10-14 20:56:44.595 398375 ERROR sysinv ok, app = rpcapi.backup_restore_lifecycle_actions(ctx, operation, success)
2021-10-14 20:56:44.595 398375 ERROR sysinv File "/usr/lib64/python2.7/site-packages/sysinv/conductor/rpcapi.py", line 1909, in backup_restore_lifecycle_actions
2021-10-14 20:56:44.595 398375 ERROR sysinv timeout=120,
2021-10-14 20:56:44.595 398375 ERROR sysinv File "/usr/lib64/python2.7/site-packages/sysinv/openstack/common/rpc/proxy.py", line 126, in call
2021-10-14 20:56:44.595 398375 ERROR sysinv exc.info, real_topic, msg.get('method'))
2021-10-14 20:56:44.595 398375 ERROR sysinv Timeout: Timeout while waiting on RPC response - topic: "sysinv.conductor_manager", RPC method: "backup_restore_lifecycle_actions" info: "<unknown>"
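The failure mode in the traceback above (a blocking RPC call made with timeout=120 whose handler does not answer in time) can be reproduced in miniature. The following is only an illustrative sketch, not sysinv code: `call_with_timeout`, `make_lifecycle_handler`, `RpcTimeout`, and the hook durations are hypothetical stand-ins, the durations are scaled down to fractions of a second, and it assumes the hooks run one after another inside a single call. It is written for Python 3, whereas the traceback comes from Python 2.7.

```python
import concurrent.futures
import time


class RpcTimeout(Exception):
    """Hypothetical stand-in for the Timeout raised by the RPC proxy."""


def call_with_timeout(handler, timeout):
    """Run handler in a worker thread and wait at most `timeout` seconds."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(handler)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        raise RpcTimeout("Timeout while waiting on RPC response")
    finally:
        # Return to the caller without waiting for the slow handler.
        pool.shutdown(wait=False)


def make_lifecycle_handler(hook_durations):
    """Build a handler that runs each application's hook in turn.

    Each hook is simulated by sleeping for its (made-up) duration.
    """
    def handler():
        for _app, seconds in hook_durations.items():
            time.sleep(seconds)
        return True
    return handler


# Made-up durations: one quick application and one slow post-backup hook.
hooks = {"fast-app": 0.01, "nginx-ingress-controller": 0.3}


def notify(timeout):
    """Return True if all hooks finished within the timeout, else False."""
    try:
        return call_with_timeout(make_lifecycle_handler(hooks), timeout)
    except RpcTimeout:
        return False


short_result = notify(timeout=0.1)  # hooks need about 0.31 s, so this times out
long_result = notify(timeout=2.0)   # a larger budget lets the hooks finish
```

With the short timeout the caller gives up while the slow hook is still running; with the larger one the same hooks complete. The values here say nothing about what timeout the real system needs.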

Test Activity
-------------
Feature Testing

Workaround
----------
Run the Ansible backup playbook again.
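If the workaround needs to be scripted, a bounded retry loop is one option. This is a minimal sketch under stated assumptions: `run_with_retries` and `run_backup` are hypothetical names, and `run_backup` stands for any callable that invokes the backup playbook and returns True on success (for example, a wrapper that checks the exit code of the playbook command). The fake below fails once and then succeeds, purely to exercise the loop.

```python
import time


def run_with_retries(run_backup, attempts=3, delay=0.0):
    """Call run_backup() until it returns True or attempts run out.

    Returns the 1-based attempt number that succeeded, or None if
    every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if run_backup():
            return attempt
        if attempt < attempts:
            time.sleep(delay)  # optional pause before the next try
    return None


# Hypothetical stand-in for the real backup: fails once, then succeeds.
outcomes = iter([False, True])


def fake_backup():
    return next(outcomes)


succeeded_on = run_with_retries(fake_backup, attempts=3)
```

Capping the number of attempts keeps a persistent failure from looping forever; a None result should still be treated as a failed backup.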

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil) wrote :

screening: low / not gating - issue w/ specific B&R test; workaround exists

tags: added: stx.update
Changed in starlingx:
importance: Undecided → Low
Ghada Khalil (gkhalil) wrote :

screening: Fix can still be merged in stx master if deemed low risk. However, it's not required for the r/stx.6.0 branch if it doesn't make it before branch creation.

Changed in starlingx:
assignee: nobody → Iago Regiani (iregiani)
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/820280
Committed: https://opendev.org/starlingx/config/commit/4fe81979697c9672f0110b2834bf633bfdaaf3f0
Submitter: "Zuul (22348)"
Branch: master

commit 4fe81979697c9672f0110b2834bf633bfdaaf3f0
Author: Regiani Iago <email address hidden>
Date: Thu Dec 2 21:29:03 2021 -0300

    RPC Timeout on Backup Operation

    The backup playbook notifies all applications via an RPC call,
    triggering lifecycle hooks.

    This increases the timeout to give the applications more time to
    complete, especially the nginx-ingress-controller on the post
    backup hook.

    Test Plan:

    PASS: Backup playbook is executed without errors

    Closes-bug: #1953204

    Signed-off-by: Regiani Iago <email address hidden>
    Change-Id: I0951580465548340958ab3ef33afbf9d61f41d79

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil) wrote :

screening: Adding stx.6.0 since the fix will be available for that release

tags: added: stx.6.0