remote host collect fails without sourcing openrc first

Bug #1837412 reported by Eric MacDonald
Affects    Status        Importance  Assigned to     Milestone
StarlingX  Fix Released  High        Eric MacDonald

Bug Description

Brief Description
-----------------
collect does not load the openrc environment variables on its
own. As a result, collecting from remote hosts fails on the active
controller unless the openrc variables are already in the
environment before collect runs.

Work Around: manually source /etc/platform/openrc before running collect

Severity
--------
Major

Steps to Reproduce
------------------
Swact to controller-1
Run collect without first sourcing /etc/platform/openrc

Expected Behavior
------------------
collect of remote hosts succeeds

Actual Behavior
----------------
collect of remote hosts fails

Reproducibility
---------------
<Reproducible/Intermittent/Seen once>

System Configuration
--------------------
Duplex system

Branch/Pull Time/Commit
-----------------------

SW_VERSION="19.01"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="2019-07-19_00-10-00"
SRC_BUILD_ID="192"

JOB="Titanium_R6_build"
BUILD_BY="jenkins"
BUILD_NUMBER="192"
BUILD_HOST="yow-cgts4-lx.wrs.com"
BUILD_DATE="2019-07-19 00:13:49 -0400"

Last Pass
---------
Unknown

Timestamp/Logs
--------------
controller-1:~$ collect controller-0
Error: cannot collect data from unknown host 'controller-0' (reason:34)
controller-1:~$ collect compute-1
Error: cannot collect data from unknown host 'compute-1' (reason:34)
controller-1:~$ source /etc/platform/openrc
[sysadmin@controller-1 ~(keystone_admin)]$ collect compute-1
[sudo] password for sysadmin:

Test Activity
-------------
Developer Testing

Changed in starlingx:
assignee: nobody → Eric MacDonald (rocksolidmtce)
Eric MacDonald (rocksolidmtce) wrote :

When we switched from the nova openrc file to the platform openrc file, I found that the file was being sourced multiple times (three, I think). This caused a large delay when running collect.

The common collect_utils file now sources it, and both collect and collect_host source collect_utils.

However, it seems collect is not inheriting the env after sourcing collect_utils, as it was expected to.

Need to investigate that.

The quick fix is to just add it back into both scripts.
However, I think it's worthwhile figuring out how to properly inherit the env.

Eric MacDonald (rocksolidmtce) wrote :

I think the caller is not inheriting the env vars that are created by the nested sourcing.
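
One way to test this hypothesis is with a small stand-alone script. The sketch below is not taken from the collect scripts: the temp files stand in for openrc and collect_utils, and OS_USERNAME is assumed to be one of the variables openrc exports. It compares nested sourcing in the current shell with sourcing inside a subshell.

```shell
# Hypothetical stand-ins for openrc and collect_utils, created in a temp dir.
tmpdir=$(mktemp -d)
cat > "${tmpdir}/openrc" <<'EOF'
export OS_USERNAME=admin
EOF
# The stand-in utils file does nothing but source the stand-in openrc.
cat > "${tmpdir}/utils" <<EOF
. "${tmpdir}/openrc"
EOF

# Case 1: nested sourcing, all in the current shell.
# ('.' is the POSIX spelling of bash's 'source'.)
unset OS_USERNAME
. "${tmpdir}/utils"
nested_result="${OS_USERNAME:-unset}"

# Case 2: the same file sourced inside a subshell.
# The variable is set there, then lost when the subshell exits.
unset OS_USERNAME
( . "${tmpdir}/openrc" )
subshell_result="${OS_USERNAME:-unset}"

echo "nested source:   ${nested_result}"
echo "subshell source: ${subshell_result}"
rm -rf "${tmpdir}"
```

In this sketch the nested case keeps the variable and the subshell case loses it. That suggests the thing to look for is anywhere the source runs in a separate shell (a subshell, a pipeline, a command substitution) rather than the nesting itself.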

Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → High
status: New → Triaged
Eric MacDonald (rocksolidmtce) wrote :

The fix turns the block of code that was being run through nested sourcing into a function that is called from collect and collect_host, the two places it's needed.
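
A minimal sketch of that pattern follows. It is not the merged code: the function name source_openrc_if_needed, the OPENRC_FILE variable, and the OS_USERNAME check are illustrative assumptions, and temp files stand in for the real scripts.

```shell
tmpdir=$(mktemp -d)

# Hypothetical stand-in for /etc/platform/openrc.
cat > "${tmpdir}/openrc" <<'EOF'
export OS_USERNAME=admin
EOF

# Hypothetical stand-in for collect_utils: sourcing it only defines
# the function; nothing is executed until a caller invokes it.
cat > "${tmpdir}/collect_utils" <<'EOF'
source_openrc_if_needed()
{
    # Source openrc only if its variables are not already in context.
    if [ -z "${OS_USERNAME:-}" ] && [ -r "${OPENRC_FILE}" ]; then
        . "${OPENRC_FILE}"
    fi
}
EOF

# What a caller such as collect or collect_host would do in this sketch.
OPENRC_FILE="${tmpdir}/openrc"
unset OS_USERNAME
. "${tmpdir}/collect_utils"
before="${OS_USERNAME:-unset}"   # function defined but not yet called

source_openrc_if_needed          # runs in the caller's own shell
after="${OS_USERNAME:-unset}"

echo "before call: ${before}"
echo "after call:  ${after}"
rm -rf "${tmpdir}"
```

Because the function body executes in the shell that calls it, the exported variables stay in that shell's environment. The guard on an already-set variable is one way to avoid sourcing the file more than once.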

No change in startup delay.

Testing looks good.

Will post review later today once a few more test cases are run.

Ghada Khalil (gkhalil) wrote :

Marking as stx.2.0 as the collect tool is essential for issue triage.

tags: added: stx.2.0 stx.tools
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/672149

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/672149
Committed: https://git.openstack.org/cgit/starlingx/integ/commit/?id=41e60486af83c76f5ac58c5c0bf30ee1835b9d7d
Submitter: Zuul
Branch: master

commit 41e60486af83c76f5ac58c5c0bf30ee1835b9d7d
Author: Eric MacDonald <email address hidden>
Date: Mon Jul 22 15:15:23 2019 -0400

    Fix sourcing openrc in collect

    The openrc file is being sourced in a short lived
    shell rather than in the shell of the sourcing code.

    As a result the environment created by the 'source'
    does not persist and the inventory request fails
    which prevents collect from learning/validating
    remote host names.

    This update corrects that and makes the code block
    involved in learning openrc variables a function
    call rather than inline whenever collect_utils
    is sourced.

    Test Plan:

    PASS: Verify collect all with no openrc already sourced
    PASS: Verify collect select hosts (same no pre-openrc)
    PASS: Verify collect self controller (same no pre-openrc)
    PASS: Verify collect of self on compute

    Change-Id: I41a097d9d751351f178a1366eb76dfb526c57b19
    Closes-Bug: 1837412
    Signed-off-by: Eric MacDonald <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released