DC patch orchestration: strategy step failed to update in DB when an exception occurred during the alarm pre-check

Bug #1978882 reported by Yuxing
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Yuxing
Milestone: (none)

Bug Description

Brief Description
-----------------
During patch orchestration, the subcloud is pre-checked for management-affecting alarms. If the alarm query fails, the error is stored in the strategy step's details field at the same time as the step's status is updated in the database.

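When the pre-check raises an exception, for example a connection failure, the exception message can be longer than the details column allows (type character varying(255)). The database update is then rejected, so the strategy step status is never updated.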
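
Nothing in the reported path bounds the details string before it is written. As a minimal sketch, assuming the only constraint is the 255-character limit named in the error, a defensive guard could cap the string before the update. The helper `fit_details` and the marker text are hypothetical and not part of dcmanager; the merged fix, described in the commit message, takes a different route and drops the exception text from the saved message instead.

```python
# Hypothetical guard, not part of dcmanager.
DETAILS_MAX_LEN = 255  # from "character varying(255)" in the logged error
TRUNCATION_MARKER = " ... (truncated)"


def fit_details(details: str, max_len: int = DETAILS_MAX_LEN) -> str:
    """Return details unchanged if it fits in max_len characters;
    otherwise cut it and append TRUNCATION_MARKER so the result is
    exactly max_len characters long."""
    if len(details) <= max_len:
        return details
    if max_len <= len(TRUNCATION_MARKER):
        # No room for the marker: hard-cut instead.
        return details[:max_len]
    keep = max_len - len(TRUNCATION_MARKER)
    return details[:keep] + TRUNCATION_MARKER


if __name__ == "__main__":
    # Build a message shaped like the one in the logs (repeated for length).
    error = ("Unable to establish connection to "
             "https://[2620:10a:a001:ac12::5262]:5001/v3/auth/tokens ") * 5
    details = "Failed to obtain health report for subcloud659 due to " + error
    short = fit_details(details)
    print(len(details), len(short))
```

Truncation keeps the start of the message, which names the subcloud and the failed operation; the tail of a connection error is usually the least useful part. Python's `len` counts characters, which matches how PostgreSQL measures `character varying(n)`.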

Severity
-----------------
Major

Steps to Reproduce
-----------------
Run patch orchestration in a Distributed Cloud (DC) system and lose the connection to a subcloud during the pre-check.

Expected Behavior
-----------------
The strategy fails.

Actual Behavior
-----------------
The strategy gets stuck because the failed step status cannot be saved to the database.

Reproducibility
-----------------
Reproducible

System Configuration
-----------------
DC

Load info (eg: 2022-03-10_20-00-07)
-----------------
WRCP_Dev June 3rd

Last Pass
-----------------
na

Timestamp/Logs
-----------------
2022-06-15 03:46:15.590 2400232 INFO dcmanager.orchestrator.patch_orch_thread [-] Finishing patch strategy for subcloud659
2022-06-15 17:47:43.428 2400232 INFO dcmanager.orchestrator.patch_orch_thread [-] Deleting patch strategy for subcloud659
2022-06-15 21:05:23.439 2682011 ERROR dccommon.drivers.openstack.sdk_platform [-] keystone_client region subcloud659 error: Unable to establish connection to https://[2620:10a:a001:ac12::5262]:5001/v3/auth/tokens: HTTPSConnectionPool(host='2620:10a:a001:ac12::5262', port=5001): Max retries exceeded with url: /v3/auth/tokens (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f8aae517410>: Failed to establish a new connection: [Errno 111] ECONNREFUSED',)): ConnectFailure: Unable to establish connection to https://[2620:10a:a001:ac12::5262]:5001/v3/auth/tokens: HTTPSConnectionPool(host='2620:10a:a001:ac12::5262', port=5001): Max retries exceeded with url: /v3/auth/tokens (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f8aae517410>: Failed to establish a new connection: [Errno 111] ECONNREFUSED',))
2022-06-15 21:05:23.440 2682011 WARNING dcmanager.orchestrator.patch_orch_thread [-] Failure initializing KeystoneClient subcloud659: ConnectFailure: Unable to establish connection to https://[2620:10a:a001:ac12::5262]:5001/v3/auth/tokens: HTTPSConnectionPool(host='2620:10a:a001:ac12::5262', port=5001): Max retries exceeded with url: /v3/auth/tokens (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f8aae517410>: Failed to establish a new connection: [Errno 111] ECONNREFUSED',))
2022-06-15 21:05:23.440 2682011 ERROR dcmanager.orchestrator.patch_orch_thread [-] Failed to obtain health report for subcloud659 due to Unable to establish connection to https://[2620:10a:a001:ac12::5262]:5001/v3/auth/tokens: HTTPSConnectionPool(host='2620:10a:a001:ac12::5262', port=5001): Max retries exceeded with url: /v3/auth/tokens (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f8aae517410>: Failed to establish a new connection: [Errno 111] ECONNREFUSED',)): ConnectFailure: Unable to establish connection to https://[2620:10a:a001:ac12::5262]:5001/v3/auth/tokens: HTTPSConnectionPool(host='2620:10a:a001:ac12::5262', port=5001): Max retries exceeded with url: /v3/auth/tokens (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f8aae517410>: Failed to establish a new connection: [Errno 111] ECONNREFUSED',))
 [SQL: 'UPDATE strategy_steps SET updated_at=%(updated_at)s, state=%(state)s, details=%(details)s, finished_at=%(finished_at)s WHERE strategy_steps.id = %(strategy_steps_id)s'] [parameters: {'strategy_steps_id': 9052, 'finished_at': datetime.datetime(2022, 6, 15, 21, 5, 23, 440975), 'state': 'failed', 'updated_at': datetime.datetime(2022, 6, 15, 21, 5, 23, 709942), 'details': u"Failed to obtain health report for subcloud659 due to Unable to establish connection to https://[2620:10a:a001:ac12::5262]:5001/v3/auth/tokens: HTTP ... (127 characters truncated) ... ctionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f8aae517410>: Failed to establish a new connection: [Errno 111] ECONNREFUSED',))"}]: DataError: value too long for type character varying(255)
 [SQL: 'UPDATE strategy_steps SET updated_at=%(updated_at)s, state=%(state)s, details=%(details)s, finished_at=%(finished_at)s WHERE strategy_steps.id = %(strategy_steps_id)s'] [parameters: {'strategy_steps_id': 9052, 'finished_at': datetime.datetime(2022, 6, 15, 21, 5, 23, 440975), 'state': 'failed', 'updated_at': datetime.datetime(2022, 6, 15, 21, 5, 23, 709942), 'details': u"Failed to obtain health report for subcloud659 due to Unable to establish connection to https://[2620:10a:a001:ac12::5262]:5001/v3/auth/tokens: HTTP ... (127 characters truncated) ... ctionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f8aae517410>: Failed to establish a new connection: [Errno 111] ECONNREFUSED',))"}]: DBError: (psycopg2.DataError) value too long for type character varying(255)
 [SQL: 'UPDATE strategy_steps SET updated_at=%(updated_at)s, state=%(state)s, details=%(details)s, finished_at=%(finished_at)s WHERE strategy_steps.id = %(strategy_steps_id)s'] [parameters: {'strategy_steps_id': 9052, 'finished_at': datetime.datetime(2022, 6, 15, 21, 5, 23, 440975), 'state': 'failed', 'updated_at': datetime.datetime(2022, 6, 15, 21, 5, 23, 709942), 'details': u"Failed to obtain health report for subcloud659 due to Unable to establish connection to https://[2620:10a:a001:ac12::5262]:5001/v3/auth/tokens: HTTP ... (127 characters truncated) ... ctionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f8aae517410>: Failed to establish a new connection: [Errno 111] ECONNREFUSED',))"}]
2022-06-15 21:05:23.920 2682011 ERROR dcmanager.orchestrator.patch_orch_thread [SQL: 'UPDATE strategy_steps SET updated_at=%(updated_at)s, state=%(state)s, details=%(details)s, finished_at=%(finished_at)s WHERE strategy_steps.id = %(strategy_steps_id)s'] [parameters: {'strategy_steps_id': 9052, 'finished_at': datetime.datetime(2022, 6, 15, 21, 5, 23, 440975), 'state': 'failed', 'updated_at': datetime.datetime(2022, 6, 15, 21, 5, 23, 709942), 'details': u"Failed to obtain health report for subcloud659 due to Unable to establish connection to https://[2620:10a:a001:ac12::5262]:5001/v3/auth/tokens: HTTP ... (127 characters truncated) ... ctionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f8aae517410>: Failed to establish a new connection: [Errno 111] ECONNREFUSED',))"}]

Alarms
-----------------
na

Test Activity
-----------------
 Developer Testing

Workaround
-----------------
na

Yuxing (yuxing)
Changed in starlingx:
assignee: nobody → Yuxing (yuxing)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to distcloud (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/distcloud/+/846068

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to distcloud (master)

Reviewed: https://review.opendev.org/c/starlingx/distcloud/+/846068
Committed: https://opendev.org/starlingx/distcloud/commit/2d8c0f9bed4ada1b26fcaa08638a6fc642cd3127
Submitter: "Zuul (22348)"
Branch: master

commit 2d8c0f9bed4ada1b26fcaa08638a6fc642cd3127
Author: Gabriel Silva Trevisan <email address hidden>
Date: Wed Jun 15 19:48:02 2022 -0300

    Shorten subcloud pre-check error on patch update

    When the subcloud management alarm pre-check fails during patch update,
    the exception string is added to the failure message in the strategy
    step. This might cause its details field to exceed the database limit.

    Remove the exception from the saved message and add an instruction for
    where to check for more details. Continue logging the exception and its
    stack trace to the orchestrator logs.

    Test Plan:

    PASS:
    - Ensure tox tests are passing
    - Verify that strategy step fails when subcloud pre-check call fails

    Closes-Bug: 1978882

    Signed-off-by: Gabriel Silva Trevisan <email address hidden>
    Change-Id: I2ac4f53488882a59c892342139e410b516a18d6f
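
The pattern described in the commit message can be sketched as follows. This is a minimal illustration, not the actual dcmanager code: `StrategyStep`, `update_step`, `pre_check_alarms`, the `"initial"` state, the logger name, and the wording of the saved message are all hypothetical. The stand-in `update_step` rejects over-long details the way the varchar(255) column does. The full exception and stack trace go to the log, while the saved details hold only a short message pointing to the logs.

```python
import logging
from typing import Callable

# Hypothetical logger name for this sketch.
LOG = logging.getLogger("patch_orch_sketch")

# Limit of the details column ("character varying(255)" in the logs).
DETAILS_MAX_LEN = 255


class StrategyStep:
    """Hypothetical in-memory stand-in for a strategy_steps row."""

    def __init__(self, subcloud_name: str) -> None:
        self.subcloud_name = subcloud_name
        self.state = "initial"  # hypothetical starting state
        self.details = ""


def update_step(step: StrategyStep, state: str, details: str) -> None:
    """Stand-in for the DB update: rejects over-long details the way a
    varchar(255) column does, leaving the step unchanged."""
    if len(details) > DETAILS_MAX_LEN:
        raise ValueError("value too long for type character varying(255)")
    step.state = state
    step.details = details


def pre_check_alarms(step: StrategyStep,
                     get_alarms: Callable[[str], object]) -> bool:
    """Run the alarm query; on failure, log the exception and mark the
    step failed with a short message. Returns True if the query worked."""
    try:
        get_alarms(step.subcloud_name)
    except Exception:
        # The full exception text and stack trace go to the log only.
        LOG.exception("Failed to obtain health report for %s",
                      step.subcloud_name)
        # The saved message leaves out the exception string, so its length
        # no longer depends on how verbose the error was.
        message = ("Failed to obtain health report for %s. "
                   "Check the orchestrator logs for more details."
                   % step.subcloud_name)
        update_step(step, "failed", message)
        return False
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    def unreachable(subcloud_name):
        # Simulates a long connection error like the one in the logs.
        raise ConnectionError("Unable to establish connection to "
                              "https://[2620:10a:a001:ac12::5262]:5001"
                              "/v3/auth/tokens " * 5)

    step = StrategyStep("subcloud659")
    pre_check_alarms(step, unreachable)
    print(step.state, len(step.details))
```

With the exception text kept out of the details field, the step reaches the failed state even when the error string is long, which is the behavior the test plan checks.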

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.7.0 stx.distcloud