stx-monitor app stuck at "applying" status

Bug #1865210 reported by Peng Peng
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Peng Peng
Milestone: (none)

Bug Description

Brief Description
-----------------
After the stx-monitor app was uploaded and applied, its status remained stuck at "applying".
(LP-1856078 occurred prior to this issue.)

Severity
--------
Major

Steps to Reproduce
------------------
Upload and apply the stx-monitor app.

TC-name: stx_monitor/test_stx_monitor.py::test_stx_monitor

Expected Behavior
------------------
Status becomes "applied".

Actual Behavior
----------------
Status is stuck at "applying".

Reproducibility
---------------
Unknown. This is the first time it has been seen in sanity; will monitor.

System Configuration
--------------------
One node system

Lab-name: wcp-112

Branch/Pull Time/Commit
-----------------------
2020-02-27_04-10-00

Last Pass
---------
2020-02-25_17-07-51

Timestamp/Logs
--------------
[2020-02-28 08:43:24,002] 479 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2020-02-28 08:43:24,002] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[abcd:204::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-upload -n stx-monitor /home/sysadmin/stx-monitor.tgz'
[2020-02-28 08:43:25,274] 436 DEBUG MainThread ssh.expect :: Output:
+---------------+----------------------------------+
| Property | Value |
+---------------+----------------------------------+
| active | False |
| app_version | 1.0-1 |
| created_at | 2020-02-28T08:43:25.230123+00:00 |
| manifest_file | stx-monitor.yaml |
| manifest_name | monitor-armada-manifest |
| name | stx-monitor |
| progress | None |
| status | uploading |
| updated_at | None |
+---------------+----------------------------------+

[2020-02-28 08:43:32,997] 479 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2020-02-28 08:43:32,997] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[abcd:204::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-apply stx-monitor'

[2020-02-28 09:42:48,610] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[abcd:204::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
[2020-02-28 09:42:49,703] 436 DEBUG MainThread ssh.expect :: Output:
+---------------------+---------+-------------------------------+------------------+--------------+------------------------------------------+
| application | version | manifest name | manifest file | status | progress |
+---------------------+---------+-------------------------------+------------------+--------------+------------------------------------------+
| oidc-auth-apps | 1.0-0 | oidc-auth-manifest | manifest.yaml | uploaded | completed |
| platform-integ-apps | 1.0-8 | platform-integration-manifest | manifest.yaml | apply-failed | operation aborted, check logs for detail |
| stx-monitor | 1.0-1 | monitor-armada-manifest | stx-monitor.yaml | applying | None |
+---------------------+---------+-------------------------------+------------------+--------------+------------------------------------------+

Test Activity
-------------
Sanity

Peng Peng (ppeng) wrote :
tags: added: stx.retestneeded
Ghada Khalil (gkhalil) wrote :

If https://bugs.launchpad.net/starlingx/+bug/1856078 was hit first and the workaround was not applied to restart tiller, then it's expected that the stx-monitor application would also be stuck.

Marking as a duplicate of https://bugs.launchpad.net/starlingx/+bug/1856078

tags: added: stx.monitor
tags: added: stx.4.0 stx.containers
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Peng Peng (ppeng)
Angie Wang (angiewang) wrote :

This is a separate issue. Although stx-monitor is expected to get stuck because of the error in https://bugs.launchpad.net/starlingx/+bug/1856078, its status and progress should still be updated properly (i.e., to apply-failed).

However, the DB status was not updated to "apply-failed" and the progress was not updated with an error message, because the error message exceeds the maximum length of the column (character varying(255)).

See the log:
sysinv 2020-02-28 08:43:34.658 86315 ERROR oslo_db.sqlalchemy.exc_filters [-] DBAPIError exception wrapped from (psycopg2.DataError) value too long for type character varying(255)
 [SQL: 'UPDATE kube_app SET updated_at=%(updated_at)s, status=%(status)s, progress=%(progress)s WHERE kube_app.id = %(id_1)s'] [parameters: {'status': u'apply-failed', 'progress': u'(404)\nReason: Not Found\nHTTP response headers: HTTPHeaderDict({\'Date\': \'Fri, 28 Feb 2020 08:43:34 GMT\', \'Content-Length\': \'210\', \'Content ... (161 characters truncated) ... ","message":"secrets \\"ceph-pool-kube-rbd\\" not found","reason":"NotFound","details":{"name":"ceph-pool-kube-rbd","kind":"secrets"},"code":404}\n\n', 'id_1': 3, 'updated_at': datetime.datetime(2020, 2, 28, 8, 43, 34, 657559)}]: DataError: value too long for type character varying(255)
2020-02-28 08:43:34.658 86315 ERROR oslo_db.sqlalchemy.exc_filters Traceback (most recent call last):
2020-02-28 08:43:34.658 86315 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
2020-02-28 08:43:34.658 86315 ERROR oslo_db.sqlalchemy.exc_filters context)
2020-02-28 08:43:34.658 86315 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
2020-02-28 08:43:34.658 86315 ERROR oslo_db.sqlalchemy.exc_filters cursor.execute(statement, parameters)
2020-02-28 08:43:34.658 86315 ERROR oslo_db.sqlalchemy.exc_filters DataError: value too long for type character varying(255)
2020-02-28 08:43:34.658 86315 ERROR oslo_db.sqlalchemy.exc_filters
2020-02-28 08:43:34.658 86315 ERROR oslo_db.sqlalchemy.exc_filters
sysinv 2020-02-28 08:43:34.661 86315 ERROR sysinv.openstack.common.rpc.amqp [-] Exception during message handling: DBError: (psycopg2.DataError) value too long for type character varying(255)
 [SQL: 'UPDATE kube_app SET updated_at=%(updated_at)s, status=%(status)s, progress=%(progress)s WHERE kube_app.id = %(id_1)s'] [parameters: {'status': u'apply-failed', 'progress': u'(404)\nReason: Not Found\nHTTP response headers: HTTPHeaderDict({\'Date\': \'Fri, 28 Feb 2020 08:43:34 GMT\', \'Content-Length\': \'210\', \'Content ... (161 characters truncated) ... ","message":"secrets \\"ceph-pool-kube-rbd\\" not found","reason":"NotFound","details":{"name":"ceph-pool-kube-rbd","kind":"secrets"},"code":404}\n\n', 'id_1': 3, 'updated_at': datetime.datetime(2020, 2, 28, 8, 43, 34, 657559)}]

We should store a custom error message based on the HTTP response instead of the following raw error message taken directly from the HTTP request.

(404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Date': 'Fri, 28 Feb 2020 08:43:34 GMT', 'Content-Length': '210', 'Content-Type': 'applic...
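
A minimal sketch of the suggestion above, not the actual sysinv fix: build a short message from the status code, the reason, and the "message" field of the JSON response body, then cap it at the 255-character column width reported in the log. The function name `summarize_api_error`, its parameters, and the constant `MAX_PROGRESS_LEN` are hypothetical names chosen for illustration.

```python
import json

# Assumption: 255 matches the "character varying(255)" limit shown in the log.
MAX_PROGRESS_LEN = 255


def summarize_api_error(status, reason, body, limit=MAX_PROGRESS_LEN):
    """Build a short progress message from an HTTP error response.

    Hypothetical helper: keeps only the status code, the reason, and the
    "message" field of a JSON body, dropping the response headers entirely.
    """
    message = ""
    try:
        parsed = json.loads(body)
        if isinstance(parsed, dict):
            message = str(parsed.get("message", ""))
    except (TypeError, ValueError):
        # Body is missing or not valid JSON; use status and reason only.
        pass

    summary = "({}) {}".format(status, reason)
    if message:
        summary += ": " + message

    # Hard cap so that the DB update cannot fail on length.
    if len(summary) > limit:
        summary = summary[:limit - 3] + "..."
    return summary
```

With the response body from the log, this would store something like `(404) Not Found: secrets "ceph-pool-kube-rbd" not found`, which fits within the column and still names the missing secret.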


Ghada Khalil (gkhalil) wrote :

Updating the status to match the duplicate LP: https://bugs.launchpad.net/starlingx/+bug/1856078
Merged on 2020-04-22

Changed in starlingx:
status: Triaged → Fix Released
Peng Peng (ppeng)
tags: removed: stx.retestneeded