Subcloud goes offline after managing

Bug #1839377 reported by Tyler Smith
This bug affects 1 person
Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  High        Tyler Smith

Bug Description

Brief Description
-----------------
After provisioning a subcloud and running 'dcmanager subcloud manage' on it, the following series of events occurs:

- Patching fails to sync and stays at unknown; the error in the logs is 401 or 503. Patching commands on the subcloud work fine, and the endpoints are set up correctly. The inventory and identity endpoints go in-sync.

- sysinv and fm on the subcloud fail to respond:
[sysadmin@controller-0 ~(keystone_admin)]$ system host-list
None
[sysadmin@controller-0 ~(keystone_admin)]$ fm alarm-list
The request you have made requires authentication. (HTTP 401) (Request-ID: req-509afff6-520b-4807-8283-3b6131884ada)
The sysinv logs are filled with authentication errors as well.

- The subcloud goes offline after a short time. This is not due to connectivity issues; the systems can still ping each other.

Severity
--------
Major: System/Feature is usable but degraded

Steps to Reproduce
------------------
Provision a subcloud and run 'dcmanager subcloud manage' on it. This seems to happen on both AIO-DX and standard subcloud installs.

Expected Behavior
-----------------
The subcloud should go to the managed state and stay online; sysinv and fm should remain usable.

Reproducibility
---------------
Reproducible
Tried on two different loads with two different configurations

System Configuration
--------------------
standard system controller with AIO-DX or standard subcloud install. Tested in VBOX

Branch/Pull Time/Commit
-----------------------
OS="centos"
SW_VERSION="19.01"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="20190804T233000Z"

JOB="STX_build_master_master"
<email address hidden>"
BUILD_NUMBER="203"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2019-08-04 23:30:00 +0000"

Last Pass
---------
About a month ago deployment was working

Timestamp/Logs
--------------
some sysinv logs:
2019-08-07 19:37:12.575 105160 INFO keystonemiddleware.auth_token [-] Retrying validation
2019-08-07 19:37:12.958 105160 INFO keystonemiddleware.auth_token [-] Identity server rejected authorization
2019-08-07 19:37:12.958 105160 WARNING keystonemiddleware.auth_token [-] Identity response: {"error":{"code":401,"message":"The request you have made requires authentication.","title":"Unauthorized"}}
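
When scanning a large sysinv log for this symptom, it can help to pull the structured error out of each "Identity response" line. The following is only a sketch: the function name identity_error and the MARKER constant are hypothetical, and the sample line is the one quoted above.

```python
import json

# Sample line copied from the sysinv log excerpt in this report.
LOG_LINE = (
    '2019-08-07 19:37:12.958 105160 WARNING keystonemiddleware.auth_token [-] '
    'Identity response: {"error":{"code":401,"message":"The request you have '
    'made requires authentication.","title":"Unauthorized"}}'
)

# Text that precedes the JSON payload in the log line.
MARKER = "Identity response: "


def identity_error(line):
    """Return the 'error' dict from an 'Identity response' line, else None."""
    _, found, payload = line.partition(MARKER)
    if not found:
        return None
    try:
        body = json.loads(payload)
    except ValueError:
        # Payload was not valid JSON (for example, a truncated line).
        return None
    return body.get("error") if isinstance(body, dict) else None


error = identity_error(LOG_LINE)
print(error["code"], error["title"])
```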

Test Activity
-------------
Developer Testing

description: updated
Changed in starlingx:
importance: Undecided → High
Frank Miller (sensfan22) wrote :

Marking high priority/stx.3.0 gating as distributed cloud is an stx.3.0 deliverable.

Changed in starlingx:
status: New → Triaged
tags: added: stx.3.0
Changed in starlingx:
assignee: nobody → Andy (andy.wrs)
Frank Miller (sensfan22) wrote :

Assigning to Andy to start with a triage of this issue. Once the triage is complete and the issue is understood better, we will decide who should implement the solution.

Andy (andy.wrs) wrote :

The issue is that sysinv's keystone password is not set up properly in this subcloud. From /etc/sysinv/sysinv.conf:

[keystone_authtoken]
username=sysinv
user_domain_name=Default
password=u'1fe015eb532dTi0*'

We can see that the password is actually set to "u'1fe015eb532dTi0*'" instead of the expected "1fe015eb532dTi0*" (note the Python unicode string formatting, u'...'). Once the subcloud's users are synced by the SystemController, sysinv's password is changed to match that of sysinv on the SystemController (which is "1fe015eb532dTi0*"). From then on, sysinv can no longer authenticate with keystone.
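
The malformed value can also be detected mechanically. The following is a minimal sketch, not part of the actual fix: it parses a sysinv.conf-style fragment with Python's standard configparser module and flags a password that looks like a Python 2 unicode repr (u'...'). The helper names looks_like_unicode_repr and strip_unicode_repr are hypothetical, and the sample text simply mirrors the excerpt above.

```python
import configparser
import re

# Matches a value written as a Python 2 unicode repr, e.g. u'secret' or u"secret".
_UNICODE_REPR = re.compile(r"""^u(['"])(.*)\1$""", re.DOTALL)


def looks_like_unicode_repr(value):
    """Return True if value is wrapped like u'...' (hypothetical helper)."""
    return _UNICODE_REPR.match(value) is not None


def strip_unicode_repr(value):
    """Return the text inside u'...', or value unchanged (hypothetical helper)."""
    match = _UNICODE_REPR.match(value)
    return match.group(2) if match else value


# Fragment mirroring the sysinv.conf excerpt quoted in this comment.
SAMPLE_CONF = """\
[keystone_authtoken]
username=sysinv
user_domain_name=Default
password=u'1fe015eb532dTi0*'
"""

# interpolation=None so that characters such as '%' in a password are literal.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(SAMPLE_CONF)
password = parser.get("keystone_authtoken", "password")

print(looks_like_unicode_repr(password))
print(strip_unicode_repr(password))
```

A check like this only reports the symptom; the stored value still has to be corrected at the point where the config file is generated.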

I verified this by manually changing sysinv's password in sysinv.conf and restarting the sysinv-inv service, after which system host-list started to work again. (If you change it back to u'1fe015eb532dTi0*', system host-list stops working with exactly the same errors as reported.)

Changed in starlingx:
assignee: Andy (andy.wrs) → Tyler Smith (tyler.smith)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to distcloud (master)

Fix proposed to branch: master
Review: https://review.opendev.org/678248

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to distcloud (master)

Reviewed: https://review.opendev.org/678248
Committed: https://git.openstack.org/cgit/starlingx/distcloud/commit/?id=9217e03a6255d469f296f562c73972e8af68c6fd
Submitter: Zuul
Branch: master

commit 9217e03a6255d469f296f562c73972e8af68c6fd
Author: Tyler Smith <email address hidden>
Date: Fri Aug 23 10:54:23 2019 -0400

    Subcloud goes offline after managing

    Formatting of the passwords was incorrect leading to
    authentication issues

    Change-Id: If181e38f07dc66b6e4b12bf0b5a7fb123d75fbb2
    Closes-Bug: 1839377
    Signed-off-by: Tyler Smith <email address hidden>
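
The commit message does not say where the formatting went wrong. As an illustration only, one common way for a value such as u'...' to end up in a config file is writing a string through its repr instead of its text. This is an assumption about the class of bug, not a description of the change in the review above; the variable names are hypothetical.

```python
password = "1fe015eb532dTi0*"

# Formatting with %s writes the characters of the string itself.
good_line = "password=%s" % password

# Formatting with %r writes repr(password), which adds surrounding quotes.
# On Python 2, repr() of a unicode string also adds a leading "u", so the
# same statement would yield password=u'1fe015eb532dTi0*'.
bad_line = "password=%r" % password

print(good_line)
print(bad_line)
```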

Changed in starlingx:
status: In Progress → Fix Released